Multi-band Approach to Deep Learning-Based Artificial Stereo Extension

Author(s)
Jeon, Kwang Myung; Park, Su Yeon; Chun, Chan Jun; Park, Nam In; Kim, Hong Kook
Type
Article
Citation
ETRI Journal, v.39, no.3, pp.398-405
Issued Date
2017-06
Abstract
In this paper, an artificial stereo extension method that creates stereophonic sound from a mono sound source is proposed. The proposed method first trains deep neural networks (DNNs) that model the nonlinear relationship between the dominant and residual signals of the stereo channel. In the training stage, the band-wise log spectral magnitude and unwrapped phase of both the dominant and residual signals are utilized to model the nonlinearities of each sub-band through deep architecture. From that point, stereo extension is conducted by estimating the residual signal that corresponds to the input mono channel signal with the trained DNN model in a sub-band domain. The performance of the proposed method was evaluated using a log spectral distortion (LSD) measure and multiple stimuli with a hidden reference and anchor (MUSHRA) test. The results showed that the proposed method provided a lower LSD and higher MUSHRA score than conventional methods that use hidden Markov models and DNN with full-band processing.
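The abstract names log spectral distortion (LSD) as its objective measure but does not give the formula. The sketch below uses one common definition, assumed here rather than taken from the paper: the per-frame RMS difference between reference and estimated log power spectra in dB, averaged over frames. The function name `log_spectral_distortion`, the `eps` floor, and the nested-list input layout (frames of magnitude bins) are illustrative choices, not the authors' implementation.

```python
import math


def log_spectral_distortion(ref_mag, est_mag, eps=1e-10):
    """Mean over frames of the RMS log-power-spectrum difference, in dB.

    ref_mag, est_mag: lists of frames, each frame a list of non-negative
    spectral magnitudes. This is an assumed, commonly used LSD form.
    """
    if not ref_mag or len(ref_mag) != len(est_mag):
        raise ValueError("inputs must be non-empty with equal frame counts")
    per_frame = []
    for ref_frame, est_frame in zip(ref_mag, est_mag):
        if not ref_frame or len(ref_frame) != len(est_frame):
            raise ValueError("frames must be non-empty with equal bin counts")
        # Squared difference of log power spectra (dB) for each frequency bin.
        squared = [
            (10.0 * math.log10(r * r + eps) - 10.0 * math.log10(e * e + eps)) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        per_frame.append(math.sqrt(sum(squared) / len(squared)))
    return sum(per_frame) / len(per_frame)


# Hypothetical magnitudes: the estimate is the reference scaled by 10,
# which is a uniform 20 dB offset in every bin.
reference = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0]]
estimate = [[10.0 * m for m in frame] for frame in reference]
lsd_db = log_spectral_distortion(reference, estimate)
```

Under this definition a lower value means the estimated spectrum is closer to the reference, which is the sense in which the abstract reports a lower LSD for the proposed method.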
Publisher
Electronics and Telecommunications Research Institute (ETRI)
ISSN
1225-6463
DOI
10.4218/etrij.17.0116.0773
URI
https://scholar.gist.ac.kr/handle/local/13745
Access and License
  • Access type: Open
File List
  • No related files exist.