OAK

Multi-channel Speech Separation with Gammatone Filterbank in Reverberant Environments

Metadata
Author(s)
Jinwoo Oh
Type
Thesis
Degree
Master
Department
Graduate School, School of Electrical Engineering and Computer Science
Advisor
Shin, Jong Won
Abstract
Recently, various deep-learning-based multi-channel speech separation (MCSS) models have been proposed to address the performance degradation of single-channel models in reverberant environments. Among them, several approaches feed the spectral features of the reference-channel signal, together with additional inter-channel features, into a masking-based single-channel separator such as the fully convolutional time-domain audio separation network (Conv-TasNet). Some utilize hand-crafted spatial features such as the inter-channel phase difference (IPD), while others extract cross-channel features in a data-driven manner (e.g., the inter-channel convolution difference, ICD).
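The exact feature definitions used in the thesis are not reproduced on this page, but the hand-crafted IPD feature mentioned above is commonly computed as the phase difference between the STFTs of two microphone channels, often encoded as its cosine and sine to avoid phase-wrapping discontinuities. A minimal NumPy sketch (function name and toy data are illustrative, not from the thesis):

```python
import numpy as np

def ipd_features(stft_ref, stft_other):
    """Inter-channel phase difference (IPD) between two channels,
    returned as (cos(IPD), sin(IPD)) to sidestep phase wrapping."""
    ipd = np.angle(stft_other) - np.angle(stft_ref)
    return np.cos(ipd), np.sin(ipd)

# Toy example: two complex "STFT" arrays with a constant 0.3 rad offset,
# as if the second microphone received a slightly delayed copy.
rng = np.random.default_rng(0)
mag = rng.random((4, 8)) + 0.1            # (frames, frequency bins)
phase = rng.uniform(-np.pi, np.pi, (4, 8))
x_ref = mag * np.exp(1j * phase)
x_other = mag * np.exp(1j * (phase + 0.3))

cos_ipd, sin_ipd = ipd_features(x_ref, x_other)
```

In this toy setup every time-frequency bin recovers the same 0.3 rad offset; with real microphone arrays the IPD varies with frequency and source direction, which is what makes it informative as a spatial feature.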
In this thesis, we propose a multi-channel version of a multi-phase gammatone filterbank-based speech separation network. We show that speech separation performance varies depending on which optimization technique is used. Our experimental results show that the multi-phase gammatone filterbank-based features achieve performance comparable to, and are more interpretable than, the features extracted by a learnable encoder. Moreover, evaluation results show that the proposed gammatone-feature-based MCSS model outperforms existing MCSS models on both the Wall Street Journal 0 (WSJ0) 2-mix and LibriSpeech 2-mix datasets in reverberant environments. In addition, we show that the model trained on an English dataset also performs well on a Korean dataset without any fine-tuning.
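The "multi-phase gammatone filterbank" replaces the learned encoder basis of a Conv-TasNet-style model with fixed gammatone impulse responses, with each center frequency instantiated at several phase shifts. The thesis's exact parameterization is not given on this page; the sketch below uses the standard 4th-order gammatone form with the Glasberg-Moore ERB bandwidth, an 8 kHz sampling rate (the usual WSJ0-2mix convention), and illustrative filter counts:

```python
import numpy as np

def gammatone_ir(fc, phase, fs=8000, length=256, order=4):
    """Finite impulse response of one gammatone filter:
    t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t + phase),
    normalized to unit energy like a learned encoder basis vector.
    (A sketch; the thesis's exact parameterization may differ.)"""
    t = np.arange(length) / fs
    erb = 24.7 + 0.108 * fc          # Glasberg-Moore ERB bandwidth in Hz
    b = 1.019 * erb                  # gammatone bandwidth parameter
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) \
        * np.cos(2 * np.pi * fc * t + phase)
    return g / np.linalg.norm(g)

# Multi-phase bank: every center frequency appears at several phases,
# yielding a fixed, interpretable analogue of a learnable encoder.
fcs = np.geomspace(100, 3900, 32)                 # 32 center frequencies
phases = np.linspace(0.0, np.pi, 4, endpoint=False)  # 4 phases per fc
bank = np.stack([gammatone_ir(fc, ph) for fc in fcs for ph in phases])
```

Because each basis function is a known bandpass filter at a known phase, the resulting encoder outputs can be read directly as a time-frequency(-phase) decomposition, which is the interpretability advantage the abstract contrasts against a data-driven encoder.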
URI
https://scholar.gist.ac.kr/handle/local/19508
Fulltext
http://gist.dcollection.net/common/orgView/200000884940
Access and License
  • Access type: Open access
File List
  • No associated files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.