Multi-channel Speech Separation with Gammatone Filterbank in Reverberant Environments
- Abstract
- Recently, various deep-learning-based multi-channel speech separation (MCSS) models have been proposed to address the performance degradation that single-channel models suffer in reverberant environments. Several of these approaches feed the spectral feature of a reference-channel signal, together with additional inter-channel features, into a masking-based single-channel separator such as the fully convolutional time-domain audio separation network (Conv-TasNet). Some use hand-crafted spatial features such as the inter-channel phase difference (IPD), while others extract cross-channel features in a data-driven manner (e.g., the inter-channel convolution difference, ICD); a minimal sketch of IPD computation follows the abstract.
In this paper, we propose a multi-channel version of a multi-phase gammatone filterbank based speech separation network (a sketch of such a filterbank also appears after the abstract). We show that separation performance varies with the choice of optimization technique. Our experimental results show that the multi-phase gammatone filterbank based feature achieves performance comparable to that of a feature extracted by a learnable encoder while being more explainable. Moreover, evaluation results show that the proposed gammatone feature-based MCSS model outperforms existing MCSS models on both the Wall Street Journal 0 (WSJ0) 2-mix and LibriSpeech 2-mix datasets in reverberant environments. In addition, we show that the model trained on English data also performs well on a Korean dataset without any fine-tuning.
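As context for the hand-crafted spatial features mentioned above, the sketch below computes cos/sin-encoded IPD on an STFT grid. It is a minimal illustration assuming two time-aligned microphone signals; the function name, FFT size, and hop length are illustrative choices, not taken from the thesis.

```python
import numpy as np
from scipy.signal import stft

def ipd_features(ref, other, fs=16000, nfft=512, hop=256):
    """Inter-channel phase difference (IPD) between a reference-channel
    signal and another microphone channel, on the STFT grid.
    All names and parameter values here are illustrative."""
    _, _, R = stft(ref, fs=fs, nperseg=nfft, noverlap=nfft - hop)
    _, _, O = stft(other, fs=fs, nperseg=nfft, noverlap=nfft - hop)
    ipd = np.angle(O) - np.angle(R)  # raw per-bin phase difference
    # cos/sin encoding avoids the 2*pi wrap-around discontinuity
    return np.cos(ipd), np.sin(ipd)
```

In masking-based MCSS models of this kind, the cos/sin IPD maps are typically concatenated with the reference-channel spectral feature before being passed to the separator.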
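The following sketch shows one way to build a multi-phase gammatone analysis filterbank that could stand in for the learnable 1-D convolutional encoder of Conv-TasNet. The filter order, ERB-based bandwidth rule, center-frequency grid, and phase spacing are assumptions for illustration; the thesis may use different design parameters.

```python
import numpy as np

def gammatone_kernel(fc, phase, fs=8000, length=32, order=4):
    """One gammatone impulse response
    t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t + phi),
    with bandwidth b tied to the equivalent rectangular bandwidth (ERB).
    Defaults are illustrative, not taken from the thesis."""
    t = np.arange(1, length + 1) / fs
    erb = 24.7 + fc / 9.265              # Glasberg-Moore ERB at fc (Hz)
    b = 1.019 * erb                      # common bandwidth scaling
    g = t**(order - 1) * np.exp(-2 * np.pi * b * t) \
        * np.cos(2 * np.pi * fc * t + phase)
    return g / np.linalg.norm(g)         # unit-energy kernel

def multiphase_gammatone_bank(n_filters=512, n_phases=8, fs=8000, length=32):
    """Fixed analysis filterbank: center frequencies on a log-spaced grid,
    each replicated at several phase shifts (the 'multi-phase' part).
    A sketch of the idea, not the exact design used in the thesis."""
    fcs = np.geomspace(100, 0.9 * fs / 2, n_filters // n_phases)
    phases = np.linspace(0, np.pi, n_phases, endpoint=False)
    bank = np.stack([gammatone_kernel(fc, ph, fs, length)
                     for fc in fcs for ph in phases])
    return bank                           # shape: (n_filters, length)
```

Because every kernel is an analytic gammatone shape with a known center frequency and phase, the resulting feature is directly interpretable, which is the explainability advantage the abstract refers to.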
- Author(s)
- Jinwoo Oh
- Issued Date
- 2022
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/19508
- Availability and License
- File List
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.