R3VQ: Redundancy-Reduced Residual Vector Quantization for Low-Bitrate Neural Speech Coding

Metadata
Author(s)
Lee, Eunkyun; Chae, Jongwook; Park, Sooyoung; Shin, Jong Won
Type
Article
Citation
IEEE SIGNAL PROCESSING LETTERS, v.33, pp.693 - 697
Issued Date
2026-01
Abstract
Neural speech and audio codecs have demonstrated decent decoded-audio quality at low bitrates. They consist of three parts: an encoder, a decoder, and a quantizer. Residual vector quantization (RVQ), or multi-stage vector quantization, in which the residual signal from the previous stage is quantized in the next stage, is employed in many neural speech codecs and has exhibited good performance while providing bitrate scalability. In this letter, we propose redundancy-reduced residual vector quantization (R3VQ), which improves on RVQ by inserting a neural network called a refiner. The role of the refiner is to reduce the power of the residual signal to be quantized by enhancing the estimate of the original speech obtained from the quantized signals of the previous stages. We also present a part-wise (PW) training scheme suited to training a neural speech codec with R3VQ. Experimental results showed that the proposed R3VQ trained with the PW scheme outperformed RVQ both in objective measures of speech quality and in a subjective MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) test.
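The stage-wise structure described in the abstract can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the codebooks are fixed arrays rather than learned, and `refiner` is a hypothetical stand-in for the neural refiner (here any callable mapping the running estimate to an improved one; in the letter it is a trained network).

```python
import numpy as np

def quantize(x, codebook):
    # Nearest-neighbor vector quantization: pick the codeword (row) closest to x.
    idx = np.argmin(np.linalg.norm(codebook - x, axis=1))
    return codebook[idx]

def rvq(x, codebooks):
    # Standard residual VQ: each stage quantizes the residual left by the
    # previous stages, and the reconstruction is the sum of all codewords.
    recon = np.zeros_like(x)
    for cb in codebooks:
        recon = recon + quantize(x - recon, cb)
    return recon

def r3vq(x, codebooks, refiner):
    # R3VQ sketch: before each later stage, a refiner enhances the running
    # estimate of the original signal, reducing the power of the residual
    # that the next codebook has to cover.
    recon = np.zeros_like(x)
    for i, cb in enumerate(codebooks):
        if i > 0:
            recon = refiner(recon)  # hypothetical stand-in for the neural refiner
        recon = recon + quantize(x - recon, cb)
    return recon
```

With an identity refiner (`lambda r: r`) the sketch reduces exactly to plain RVQ; the gain claimed in the letter comes from training the refiner jointly with the codec so that later-stage residuals carry less redundant energy.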
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
ISSN
1070-9908
DOI
10.1109/LSP.2026.3655351
URI
https://scholar.gist.ac.kr/handle/local/33658
Access and License
  • Access type: Open
File List
  • No related files are available.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.