
Text-to-Speech With Lip Synchronization Based on Speech-Assisted Text-to-Video Alignment and Masked Unit Prediction

Abstract
Text-to-speech (TTS) with lip synchronization (TTSLS) is the task of generating a speech signal that is synchronized with the lip movements in a video, given the text transcription and the video without speech. Previous approaches to TTSLS aligned the phoneme sequence with the video frames using scaled dot-product attention with a diagonal constraint loss, which was employed to prevent a phoneme from being assigned to video frames too far away. However, the diagonal constraint loss implicitly assumes that all phonemes have roughly the same duration, which does not always hold because speaking styles vary. In this letter, we propose a TTSLS system based on speech-assisted text-to-video alignment and masked unit prediction. Utilizing the ground-truth speech signal available in the training phase, we construct a loss function for text-to-video alignment from the text-to-speech alignment obtained by a pre-trained TTS model. To deal with video frames that lack frontal lip images, we employ a masked unit prediction loss so that the unit predictor in the proposed system can estimate the masked units from the remaining units. In addition, we modify the probability distribution of the unit predictor using a learnable null embedding for video, inspired by classifier-free guidance. Experimental results demonstrated that the proposed method outperformed previous TTSLS systems in both lip-speech synchronization and speech recognition performance. © 1994-2012 IEEE.
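The abstract contrasts two alignment targets. One is a diagonal prior that implicitly spreads phonemes evenly over the video. The other is a target derived from a TTS model's text-to-speech alignment.

The sketch below illustrates that difference. It is hypothetical and rests on these assumptions:

- The pre-trained TTS aligner yields an integer duration, in speech frames, for each phoneme.
- Speech features run at `speech_fps` frames per second and video at `video_fps`.
- The text-to-video attention is a row-stochastic matrix with one row per video frame.

The function names, the frame-center mapping, and the negative log-likelihood loss are illustrative choices, not the paper's exact formulation. `diagonal_targets` is only a hard-target caricature of the equal-duration assumption.

```python
import math
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence


def video_targets_from_durations(
    durations: Sequence[int],
    num_video_frames: int,
    speech_fps: float,
    video_fps: float,
) -> List[int]:
    """Hypothetical: map per-phoneme durations (in speech frames) to a phoneme index per video frame."""
    # End time (in seconds) of each phoneme, from cumulative speech-frame counts.
    ends = [c / speech_fps for c in accumulate(durations)]
    targets = []
    for t in range(num_video_frames):
        center = (t + 0.5) / video_fps  # time at the center of video frame t
        # Index of the first phoneme whose end time lies after the frame center.
        idx = bisect_right(ends, center)
        # Frames past the last phoneme end are clamped to the last phoneme.
        targets.append(min(idx, len(durations) - 1))
    return targets


def diagonal_targets(num_phonemes: int, num_video_frames: int) -> List[int]:
    """Equal-duration assumption: phonemes spread evenly over the video frames."""
    return [min(t * num_phonemes // num_video_frames, num_phonemes - 1)
            for t in range(num_video_frames)]


def alignment_nll(attention: Sequence[Sequence[float]],
                  targets: Sequence[int], eps: float = 1e-8) -> float:
    """Mean negative log attention weight placed on each video frame's target phoneme."""
    if len(attention) != len(targets):
        raise ValueError("need one attention row per video frame")
    total = 0.0
    for row, k in zip(attention, targets):
        total -= math.log(row[k] + eps)
    return total / len(targets)


if __name__ == "__main__":
    # Hypothetical example: 4 phonemes, 100 speech frames/s, 25 video frames/s (0.56 s total).
    durs = [3, 16, 6, 31]
    speech_based = video_targets_from_durations(durs, 14, 100.0, 25.0)
    diagonal = diagonal_targets(len(durs), 14)
    print(speech_based)  # [0, 1, 1, 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3]
    print(diagonal)      # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
```

In this toy example, the diagonal prior gives the very short first phoneme four video frames and the long last phoneme only three. The duration-based targets follow the actual timing. That mismatch is the failure mode the abstract attributes to the diagonal constraint when phoneme durations are uneven.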
Author(s)
Ahn, Youngdo; Chae, Jongwook; Shin, Jong Won
Issued Date
2025-02
Type
Article
DOI
10.1109/LSP.2025.3537949
URI
https://scholar.gist.ac.kr/handle/local/9047
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
IEEE Signal Processing Letters, v.32, pp.961 - 965
ISSN
1070-9908
Appears in Collections:
Department of Electrical Engineering and Computer Science > 1. Journal Articles
Access and License
  • Access type: Public
File List
  • No associated files are available.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.