
Multi-band Approach to Deep Learning-Based Artificial Stereo Extension

Abstract
This paper proposes an artificial stereo extension method that creates stereophonic sound from a mono sound source. The proposed method first trains deep neural networks (DNNs) that model the nonlinear relationship between the dominant and residual signals of the stereo channel. In the training stage, the band-wise log spectral magnitude and unwrapped phase of both the dominant and residual signals are used to model the nonlinearities of each sub-band through a deep architecture. Stereo extension is then performed in the sub-band domain by using the trained DNN model to estimate the residual signal that corresponds to the input mono channel signal. The performance of the proposed method was evaluated using a log spectral distortion (LSD) measure and a multiple stimuli with hidden reference and anchor (MUSHRA) test. The results showed that the proposed method achieved a lower LSD and a higher MUSHRA score than conventional methods that use hidden Markov models and a DNN with full-band processing.
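The feature extraction and the LSD measure mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the frame length, hop size, number of sub-bands, equal-width band split, and the function names (`stft_frames`, `band_features`, `log_spectral_distortion`) are all assumptions made for illustration, and the LSD shown is one common definition (per-frame RMS difference of log power spectra in dB, averaged over frames), which may differ from the paper's exact formula.

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=256):
    """One-sided FFT of Hann-windowed frames; returns (frames, bins)."""
    # frame_len and hop are illustrative assumptions, not the paper's values
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def band_features(spec, n_bands=4, eps=1e-10):
    """Per sub-band features: log magnitude and unwrapped phase, concatenated."""
    log_mag = np.log10(np.abs(spec) + eps)
    phase = np.unwrap(np.angle(spec), axis=1)  # unwrap along frequency
    # Equal-width bands are an assumption; array_split tolerates uneven sizes
    mag_bands = np.array_split(log_mag, n_bands, axis=1)
    phase_bands = np.array_split(phase, n_bands, axis=1)
    return [np.concatenate([m, p], axis=1)
            for m, p in zip(mag_bands, phase_bands)]

def log_spectral_distortion(ref_spec, est_spec, eps=1e-10):
    """Mean over frames of the RMS log-power-spectrum difference, in dB."""
    ref_db = 10.0 * np.log10(np.abs(ref_spec) ** 2 + eps)
    est_db = 10.0 * np.log10(np.abs(est_spec) ** 2 + eps)
    per_frame = np.sqrt(np.mean((ref_db - est_db) ** 2, axis=1))
    return float(np.mean(per_frame))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dominant = rng.standard_normal(4096)   # stand-in for a dominant signal
    spec = stft_frames(dominant)
    feats = band_features(spec)
    print(len(feats), feats[0].shape)
    print(log_spectral_distortion(spec, spec))
```

In a setup like the one the abstract describes, each sub-band's feature matrix from the dominant signal would serve as the input to that band's DNN, with the corresponding residual-signal features as the target; the LSD function would then compare the reference and estimated spectra.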
Author(s)
Jeon, Kwang Myung; Park, Su Yeon; Chun, Chan Jun; Park, Nam In; Kim, Hong Kook
Issued Date
2017-06
Type
Article
DOI
10.4218/etrij.17.0116.0773
URI
https://scholar.gist.ac.kr/handle/local/13745
Publisher
Electronics and Telecommunications Research Institute (ETRI, 한국전자통신연구원)
Citation
ETRI Journal, v.39, no.3, pp.398 - 405
ISSN
1225-6463
Appears in Collections:
Department of Electrical Engineering and Computer Science > 1. Journal Articles
Access and License
  • Access status: Open
File List
  • No related files exist.
