
FlowSE: Flow Matching-based Speech Enhancement

Abstract
Diffusion probabilistic models have shown impressive performance for speech enhancement, but they typically require 25 to 60 function evaluations in the inference phase, which makes inference computationally expensive. Recently, a fine-tuning method was proposed to correct the reverse process, which significantly lowered the number of function evaluations (NFE). Flow matching is a method for training continuous normalizing flows, which model probability paths from known distributions to unknown distributions, including those described by diffusion processes. In this paper, we propose a speech enhancement method based on conditional flow matching. With an NFE of 5, the proposed method achieved performance comparable to that of diffusion-based speech enhancement with an NFE of 60, and it performed similarly to the diffusion model with the corrected reverse process at the same NFE, from 1 to 5, without any additional fine-tuning procedure. We also show that the corresponding diffusion model, derived from the conditional probability path with a modified optimal transport conditional vector field, achieved similar performance with an NFE of 5 without any fine-tuning procedure. © 2025 IEEE.
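As a rough illustration of what the NFE counts in the abstract, the sketch below integrates a flow-matching ODE with the Euler method, where each step costs one evaluation of the vector field. It is not the authors' model: it uses the standard (unmodified) optimal transport conditional vector field from the flow matching literature for a single known target point, and the names `ot_conditional_vector_field`, `euler_sample`, `sigma_min`, and the toy values are all assumptions made for this example. In a real enhancement system the vector field would be a trained neural network conditioned on the noisy speech.

```python
def ot_conditional_vector_field(t, x, x1, sigma_min):
    """Standard OT conditional vector field u_t(x | x1).

    Hypothetical toy setup: x1 is a known target point, so no network is needed.
    """
    denom = 1.0 - (1.0 - sigma_min) * t
    return [(b - (1.0 - sigma_min) * a) / denom for a, b in zip(x, x1)]


def euler_sample(x0, vector_field, nfe):
    """Integrate dx/dt = vector_field(t, x) from t=0 to t=1 in `nfe` Euler steps.

    Returns the final point and the number of vector-field calls made.
    """
    x = list(x0)
    h = 1.0 / nfe
    calls = 0
    for k in range(nfe):
        t = k * h
        v = vector_field(t, x)  # one function evaluation per step
        calls += 1
        x = [a + h * b for a, b in zip(x, v)]
    return x, calls


# Toy values (assumed for illustration only).
x0 = [0.5, -1.0]   # sample from the known starting distribution
x1 = [2.0, 3.0]    # target point
sigma_min = 1e-4


def field(t, x):
    return ot_conditional_vector_field(t, x, x1, sigma_min)


one_step, calls_1 = euler_sample(x0, field, nfe=1)
five_step, calls_5 = euler_sample(x0, field, nfe=5)
```

In this toy case the conditional path is a straight line, so a single Euler step already lands on the same endpoint as five steps. A learned vector field is generally not perfectly straight, which is why the number of steps still matters in practice.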
Author(s)
Lee, Seonggyu; Cheong, Sein; Han, Sangwook; Shin, Jong Won
Issued Date
2025
Type
Conference Paper
DOI
10.1109/ICASSP49660.2025.10888274
URI
https://scholar.gist.ac.kr/handle/local/23649
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
ISBN
979-835036874-1
ISSN
1520-6149
Conference Place
Hyderabad
Appears in Collections:
Department of Electrical Engineering and Computer Science > 2. Conference Papers
Access and License
  • Access type: Public
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.