On Training Speech Separation Models With Various Numbers of Speakers

Abstract
Many monaural speech separation models assume that the exact number of speakers is known in advance, an assumption that does not hold in many real-world scenarios. To deal with an unknown number of speakers, previous approaches either iteratively separate one speaker at a time, or adopt the more relaxed assumption that the maximum number of speakers is known a priori and set the number of outputs accordingly. In the latter case, when the number of speakers in the mixture is smaller than the number of outputs, the extra outputs that are not mapped onto signals in the input mixture are trained to produce predefined target signals such as silence or the input mixture. In this letter, for separation models with a fixed number of output channels, we propose to ignore the extra outputs in training instead of evaluating the cost against a predefined target. We also introduce a method to select valid output signals. Experimental results showed that assigning any type of predefined target degraded separation performance compared with ignoring the extra outputs.
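The idea of ignoring extra outputs during training can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the loss function, the assignment procedure, or the method for selecting valid outputs, so the mean-squared-error cost, the brute-force search over assignments, and the function name `masked_pit_mse` are all assumptions made for illustration. The sketch matches each of the N reference signals to a distinct output channel out of K (with N <= K) and averages the cost over the matched channels only, so the K - N unmatched outputs contribute nothing to the loss.

```python
from itertools import permutations

import numpy as np


def masked_pit_mse(outputs, targets):
    """Permutation-invariant MSE that ignores unmatched output channels.

    outputs: array of shape (K, T), the K model output channels.
    targets: array of shape (N, T), the N reference signals, 1 <= N <= K.
    Returns (loss, assignment), where assignment[i] is the index of the
    output channel matched to target i.

    Hypothetical helper for illustration; MSE and brute-force search are
    assumptions, not details taken from the paper.
    """
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    num_out, num_tgt = outputs.shape[0], targets.shape[0]
    if not 1 <= num_tgt <= num_out:
        raise ValueError("need 1 <= number of targets <= number of outputs")

    # cost[i, j] = MSE between target i and output channel j.
    cost = ((targets[:, None, :] - outputs[None, :, :]) ** 2).mean(axis=-1)

    best_loss, best_assign = np.inf, None
    rows = np.arange(num_tgt)
    # Try every way of assigning the N targets to N distinct output channels.
    for assign in permutations(range(num_out), num_tgt):
        # Only matched channels enter the loss; extra outputs are ignored.
        loss = cost[rows, list(assign)].mean()
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign
```

Because the unmatched channels never enter the cost, changing their contents leaves the loss unchanged, which is the contrast with training them toward silence or the input mixture. The brute-force search is only practical for a small number of output channels.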
Author(s)
Kim, Hyeonseung; Shin, Jong Won
Issued Date
2023-08
Type
Article
DOI
10.1109/LSP.2023.3310881
URI
https://scholar.gist.ac.kr/handle/local/10062
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Citation
IEEE SIGNAL PROCESSING LETTERS, v.30, pp.1202 - 1206
ISSN
1070-9908
Appears in Collections:
Department of Electrical Engineering and Computer Science > 1. Journal Articles
Access and License
  • Access status: Public
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.