
Monaural Speech Separation Using Speaker Embedding From Preliminary Separation

Abstract
In speech separation, speaker identity can be an important cue for discriminating the speech signals in a mixture and separating them more accurately. A few recent studies have used speaker embeddings as additional information, but they often require prior information about the target speaker or rely on noisy speaker embeddings extracted from the mixture signal. In this article, we propose a monaural speech separation method in which the later separator blocks use a speaker embedding extracted from the intermediate separation results produced by the early stages of the separator network. For separator networks built from repeated blocks, such as the fully-convolutional time-domain audio separation network (Conv-TasNet) or the successive downsampling and resampling of multi-resolution features (SuDoRM-RF), the later blocks are modified to take the speaker information in the form of an affine transformation of, or an addition to, the original input tensor. The experimental results showed that the proposed methods significantly improved the performance of existing separation systems with a moderate number of additional parameters.
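The two conditioning options named in the abstract (an affine transformation of the input tensor, or an addition to it) can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`linear`, `condition_affine`, `condition_add`), the tensor sizes, the random projection weights, and the per-channel form of the scale and shift are all assumptions made for illustration. In the proposed method the speaker embedding would come from the intermediate separated signals; here it is simply a random vector.

```python
import random

def linear(weights, bias, vec):
    """Dense projection on plain lists: returns weights @ vec + bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def condition_affine(features, scale, shift):
    """Affine conditioning: y[c][t] = scale[c] * x[c][t] + shift[c]."""
    return [[scale[c] * x + shift[c] for x in channel]
            for c, channel in enumerate(features)]

def condition_add(features, shift):
    """Additive conditioning: y[c][t] = x[c][t] + shift[c]."""
    return [[x + shift[c] for x in channel]
            for c, channel in enumerate(features)]

rng = random.Random(0)
channels, frames, emb_dim = 4, 6, 3  # hypothetical sizes

# Stand-in for the input tensor of a later separator block: (channels, frames).
features = [[rng.gauss(0.0, 1.0) for _ in range(frames)]
            for _ in range(channels)]

# Stand-in for a speaker embedding (assumed given; random here).
speaker_embedding = [rng.gauss(0.0, 1.0) for _ in range(emb_dim)]

# Hypothetical projections from embedding space to per-channel scale and shift.
w_scale = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for _ in range(channels)]
b_scale = [1.0] * channels  # a zero embedding then leaves the scale at 1
w_shift = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for _ in range(channels)]
b_shift = [0.0] * channels

scale = linear(w_scale, b_scale, speaker_embedding)
shift = linear(w_shift, b_shift, speaker_embedding)

affine_out = condition_affine(features, scale, shift)  # affine variant
added_out = condition_add(features, shift)             # additive variant
```

Both variants keep the shape of the input tensor, so under these assumptions a conditioned block could replace an unconditioned one without changing the rest of the network. The extra parameters are the projection weights alone.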
Author(s)
Byun, Jaeuk; Shin, Jong Won
Issued Date
2021-08
Type
Article
DOI
10.1109/TASLP.2021.3101617
URI
https://scholar.gist.ac.kr/handle/local/11344
Publisher
IEEE
Citation
IEEE/ACM Transactions on Audio, Speech, and Language Processing, v.29, pp. 2753-2763
ISSN
2329-9290
Appears in Collections:
Department of Electrical Engineering and Computer Science > 1. Journal Articles
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.