Citation: IEEE/ACM Transactions on Speech and Language Processing, v.29, pp.2753 - 2763

Abstract: In speech separation, speaker identity can be an important cue for discriminating among the speech signals in a mixture and separating them more accurately. A few recent studies have used speaker embeddings as additional information, but they often require prior information about the target speaker or rely on noisy speaker embeddings extracted from the mixture signal. In this article, we propose a monaural speech separation method that uses speaker embeddings in the later separator blocks, where the embeddings are extracted from the intermediate separated results produced by the early stages of the separator network. The later blocks of separator networks built from repeated blocks, such as the fully-convolutional time-domain audio separation network (Conv-TasNet) or the successive downsampling and resampling of multi-resolution features (SuDoRM-RF), are modified to take the speaker information in the form of an affine transformation of, or an addition to, the original input tensor. The experimental results showed that the proposed methods significantly improved the performance of existing separation systems with a moderate number of additional parameters.
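
The two ways of injecting speaker information named in the abstract (an affine transformation of the input tensor, or an addition to it) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the linear projections (`w_gamma`, `w_beta`, `w`), the per-channel broadcasting, and the tensor shapes are all assumptions chosen only to show the general idea.

```python
import numpy as np


def affine_condition(x, e, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift each channel of x using projections of embedding e.

    x: (channels, time) input tensor to a later separator block (assumed shape).
    e: (embed_dim,) speaker embedding from an intermediate separated result.
    The projection weights are hypothetical learned parameters.
    """
    gamma = w_gamma @ e + b_gamma  # per-channel scale, shape (channels,)
    beta = w_beta @ e + b_beta     # per-channel shift, shape (channels,)
    return gamma[:, None] * x + beta[:, None]


def additive_condition(x, e, w, b):
    """Add a projection of embedding e to every time frame of x."""
    bias = w @ e + b               # shape (channels,)
    return x + bias[:, None]


# Toy shapes, for illustration only.
rng = np.random.default_rng(0)
channels, frames, embed_dim = 4, 6, 3
x = rng.standard_normal((channels, frames))
e = rng.standard_normal(embed_dim)

y_affine = affine_condition(
    x, e,
    w_gamma=rng.standard_normal((channels, embed_dim)), b_gamma=np.ones(channels),
    w_beta=rng.standard_normal((channels, embed_dim)), b_beta=np.zeros(channels),
)
y_add = additive_condition(
    x, e, w=rng.standard_normal((channels, embed_dim)), b=np.zeros(channels)
)
```

In both variants the output keeps the shape of the input tensor, so under these assumptions a conditioned block could replace an unconditioned one without changing the rest of the network.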

Appears in Collections: Department of Electrical Engineering and Computer Science > 1. Journal Articles
