
On Training Speech Separation Models With Various Numbers of Speakers

Author(s)
Kim, Hyeonseung; Shin, Jong Won
Type
Article
Citation
IEEE SIGNAL PROCESSING LETTERS, v.30, pp.1202 - 1206
Issued Date
2023-08
Abstract
Many monaural speech separation models assume that the exact number of speakers is known in advance, which is not applicable to many real-world scenarios. To deal with an unknown number of speakers, previous approaches either iteratively separate one speaker's speech at a time, or employ a more relaxed assumption that the maximum number of speakers is known a priori and set the number of outputs accordingly. When the number of speakers in the mixture is smaller than the number of outputs in the latter case, the extra outputs that are not mapped onto signals in the input mixture are trained to produce predefined target signals such as silence or the input mixture. In this letter, for separation models with a fixed number of output channels, we propose to ignore the extra outputs during training instead of evaluating their cost against a predefined target. We also introduce a method to select valid output signals. Experimental results showed that assigning any type of predefined target degraded separation performance compared with ignoring the extra outputs.
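The following is a minimal, illustrative sketch (not the authors' implementation) of the idea described in the abstract: a permutation-invariant training (PIT) style loss in which, when the mixture contains C sources but the model produces M > C outputs, the cost is evaluated only over the best assignment of the C references onto C of the M estimates, and the remaining M - C outputs receive no loss term. The function names, SI-SNR objective, and PyTorch usage are assumptions for illustration.

```python
# Hypothetical sketch: PIT loss that ignores extra output channels.
import itertools
import torch


def si_snr(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-invariant SNR between one estimate and one reference, both shape (T,)."""
    ref_zm = ref - ref.mean()
    est_zm = est - est.mean()
    proj = (torch.dot(est_zm, ref_zm) / (ref_zm.pow(2).sum() + eps)) * ref_zm
    noise = est_zm - proj
    return 10 * torch.log10(proj.pow(2).sum() / (noise.pow(2).sum() + eps) + eps)


def pit_loss_ignore_extra(estimates: torch.Tensor, references: torch.Tensor) -> torch.Tensor:
    """Negative SI-SNR over the best mapping of C references onto M >= C estimates.

    estimates:  (M, T) model outputs (fixed number of channels M)
    references: (C, T) clean sources actually present in the mixture (C <= M)
    Outputs left unmatched contribute nothing to the loss, i.e. they are ignored
    rather than trained toward silence or the input mixture.
    """
    M, C = estimates.shape[0], references.shape[0]
    best = None
    # Try every way of picking and ordering C of the M output channels.
    for perm in itertools.permutations(range(M), C):
        loss = -torch.stack(
            [si_snr(estimates[o], references[c]) for c, o in enumerate(perm)]
        ).mean()
        if best is None or loss < best:
            best = loss
    return best
```

At inference time, when the number of speakers is unknown, some rule is still needed to decide which of the M outputs carry valid speech (e.g., an energy or activity criterion); the letter's specific selection method is not detailed here.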
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
ISSN
1070-9908
DOI
10.1109/LSP.2023.3310881
URI
https://scholar.gist.ac.kr/handle/local/10062
Access and License
  • Access type: Open
File List
  • No related files are available.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.