Speech Enhancement based on cascaded two flows

Author(s)
Lee, Seonggyu; Cheong, Sein; Han, Sangwook; Kim, Kihyuk; Shin, Jong-won
Type
Conference Paper
Citation
26th Interspeech Conference 2025, pp. 4863-4867
Issued Date
2025-08-21
Abstract
Speech enhancement (SE) based on diffusion probabilistic models has exhibited impressive performance, but requires a relatively high number of function evaluations (NFE). Recently, SE based on flow matching has been proposed, which showed competitive performance with a small NFE. Early approaches adopted the noisy speech as the only conditioning variable. Other approaches use speech enhanced by a predictive model both as an additional conditioning variable and to sample an initial value, but they require a separate predictive model on top of the generative SE model. In this work, we propose to employ an identical flow-matching model both for SE and for generating the enhanced speech that serves as the initial starting point and as a conditioning variable. Experimental results showed that the proposed method required the same or fewer NFEs, even with two cascaded generative methods, while achieving performance equivalent to or better than the previous baselines. © 2025 International Speech Communication Association. All rights reserved.
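The cascade described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`euler_flow`, `cascaded_enhance`), the Euler solver, the Gaussian perturbation `sigma` around the starting point, and the choice to pass the noisy speech in both conditioning slots during the first pass are all hypothetical, since the abstract does not specify them. `velocity` stands in for a trained flow-matching network.

```python
import numpy as np

def euler_flow(velocity, x0, cond, n_steps):
    """Integrate dx/dt = velocity(x, t, cond) from t=0 to t=1 with Euler steps.

    Each step calls `velocity` once, so the NFE of one flow equals n_steps.
    """
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * velocity(x, t, cond)
    return x

def cascaded_enhance(velocity, noisy, n_first, n_second, sigma=0.1, rng=None):
    """Run the same (hypothetical) flow model twice in cascade."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = np.asarray(noisy, dtype=float)

    # First flow: only the noisy speech is available as conditioning
    # (assumption: it fills both conditioning slots).
    start = noisy + sigma * rng.standard_normal(noisy.shape)
    first = euler_flow(velocity, start, (noisy, noisy), n_first)

    # Second flow: the SAME model, now started near the first estimate
    # and conditioned on both the noisy speech and that estimate.
    start = first + sigma * rng.standard_normal(first.shape)
    second = euler_flow(velocity, start, (noisy, first), n_second)
    return first, second

# Toy stand-in for a trained network: pulls x toward the second condition.
def toy_velocity(x, t, cond):
    return cond[1] - x

first_est, second_est = cascaded_enhance(toy_velocity, np.ones(16), 2, 3)
```

In this sketch the total NFE is `n_first + n_second`, which is the quantity the abstract compares against the baselines.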
Publisher
International Speech Communication Association
Conference Place
Rotterdam, Netherlands
URI
https://scholar.gist.ac.kr/handle/local/32378
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.