Improved Alias-and-Separate Speech Coding Framework With Minimal Algorithmic Delay
- Abstract
- The Alias-and-Separate (AaS) speech coding framework has shown that wideband (WB) speech can be encoded with a narrowband (NB) speech codec and reconstructed using speech separation. WB speech is first decimated, which incurs aliasing, and is then coded, transmitted, and decoded with an NB codec. The decoded signal is then separated by a speech separation module into a lower band and a spectrally flipped high band, which are expanded, lowpass/highpass filtered, and added together to reconstruct the WB speech. The original AaS system, however, has algorithmic delay originating from the overlap-add operation over consecutive segments. This delay can be reduced by omitting the overlap-add procedure, but the quality of the reconstructed speech then degrades due to artifacts at the segment boundaries. In this work, we propose an improved AaS framework with minimal algorithmic delay. The decoded signal is first expanded by inserting zeros between samples before being processed by the source separation module. As the expanded signal can be viewed as a sum of frequency-shifted versions of the original signal, the decoded-and-expanded signal is separated into these frequency-shifted signals, which are multiplied by complex exponentials and summed to reconstruct the original signal. With a carefully designed transposed convolution operation in the separation module, the proposed system requires minimal algorithmic delay while preventing discontinuities at the segment boundaries. Additionally, we propose employing a generative vocoder to further improve the perceived quality, together with a modified multi-resolution short-time Fourier transform (MR-STFT) loss. Experimental results on WB speech coding with an NB codec demonstrated that the proposed system outperformed the original AaS system and the existing WB speech codec in the subjective listening test.
In an experiment on fullband speech coding with a WB codec, we also show that the proposed method can be applied when the decimation factor is not 2. © 2007-2012 IEEE.
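The identity behind the zero-insertion step can be checked numerically. The sketch below is not the authors' implementation. It assumes an ideal, codec-free signal path and computes the frequency-shifted components directly from the known original signal, whereas the proposed system estimates them from the decoded signal with a learned separation module. The function names and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def decimate_then_expand(x, M=2):
    """Keep every M-th sample of x and put zeros at all other positions."""
    y = np.zeros_like(x)
    y[::M] = x[::M]
    return y

def frequency_shifted_components(x, M=2):
    """Return the M signals x[n] * exp(j*2*pi*k*n/M) / M for k = 0..M-1.

    Assumption: computed from the original signal purely for illustration.
    """
    n = np.arange(len(x))
    return [x * np.exp(2j * np.pi * k * n / M) / M for k in range(M)]

def recombine(components, M=2):
    """Undo each frequency shift with a complex exponential and sum."""
    n = np.arange(len(components[0]))
    total = sum(c * np.exp(-2j * np.pi * k * n / M)
                for k, c in enumerate(components))
    return total.real

rng = np.random.default_rng(0)
x = rng.standard_normal(24)  # stand-in for a short speech segment

for M in (2, 3):
    comps = frequency_shifted_components(x, M)
    # The expanded signal equals the sum of the frequency-shifted versions.
    print(M, np.allclose(np.sum(comps, axis=0), decimate_then_expand(x, M)))
    # Demodulating and summing the components recovers the original signal.
    print(M, np.allclose(recombine(comps, M), x))
```

The loop runs the same check for a decimation factor of 3, which mirrors the abstract's remark that the method is not restricted to a factor of 2.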
- Author(s)
- Lee, Eunkyun; Beack, Seungkwon; Shin, Jong Won
- Issued Date
- 2024-12
- Type
- Article
- DOI
- 10.1109/JSTSP.2024.3501681
- URI
- https://scholar.gist.ac.kr/handle/local/9185
- Access and License
-
- File List
-
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.