OAK

GIST Library Login

GIST Scholar College of Information and Computing Department of Electrical Engineering and Computer Science 1. Journal Articles

Improved Alias-and-Separate Speech Coding Framework With Minimal Algorithmic Delay

Metadata Downloads

Author(s): Lee, Eunkyun; Beack, Seungkwon; Shin, Jong Won

Type: Article

Citation: IEEE Journal of Selected Topics in Signal Processing, v.18, no.8, pp.1414 - 1426

Issued Date: 2024-12

Abstract: Alias-and-Separate (AaS) speech coding framework has shown the possibility to encode wideband (WB) speech with a narrowband (NB) speech codec and reconstruct it using speech separation. WB speech is first decimated incurring aliasing and then coded, transmitted, and decoded with a NB codec. The decoded signal is then separated into lower band and spectrally-flipped high band using a speech separation module, which are expanded, lowpass/highpass filtered, and added together to reconstruct the WB speech. The original AaS system, however, has algorithmic delay originated from the overlap-add operation for consecutive segments. This algorithmic delay can be reduced by omitting the overlap-add procedure, but the quality of the reconstructed speech is also degraded due to artifacts on the segment boundaries. In this work, we propose an improved AaS framework with minimum algorithmic delay. The decoded signal is first expanded by inserting zeros in-between samples before being processed by source separation module. As the expanded signal can be viewed as a summation of the frequency-shifted versions of the original signal, the decoded-and-expanded signal is then separated into the frequency-shifted signals, which are multiplied by complex exponentials and summed up to reconstruct the original signal. With carefully designed transposed convolution operation in the separation module, the proposed system requires minimal algorithmic delay while preventing discontinuity at the segment boundaries. Additionally, we propose to employ a generative vocoder to further improve the perceived quality and a modified multi-resolution short-time Fourier transform (MR-STFT) loss. Experimental results on the WB speech coding with a NB codec demonstrated that the proposed system outperformed the original AaS system and the existing WB speech codec in the subjective listening test. We have also shown that the proposed method can be applied when the decimation factor is not 2in the experiment on the fullband speech coding with a WB codec. © 2007-2012 IEEE.

Publisher: Institute of Electrical and Electronics Engineers Inc.

ISSN: 1932-4553

DOI: 10.1109/JSTSP.2024.3501681

URI: https://scholar.gist.ac.kr/handle/local/9185

Appears in Collections:: Department of Electrical Engineering and Computer Science > 1. Journal Articles

메타데이터 간략히 보기메타데이터 전체 보기

공개 및 라이선스

공개 구분공개

qrcode

트윗하기

OAK GIST Scholar는 국립중앙도서관 OAK Repository 보급사업으로 구축되었습니다.