R3VQ: Redundancy-Reduced Residual Vector Quantization for Low-Bitrate Neural Speech Coding
- Author(s)
- Lee, Eunkyun; Chae, Jongwook; Park, Sooyoung; Shin, Jong Won
- Type
- Article
- Citation
- IEEE SIGNAL PROCESSING LETTERS, v.33, pp.693 - 697
- Issued Date
- 2026-01
- Abstract
- Neural speech and audio codecs have achieved decent decoded-audio quality at low bitrates. They consist of three parts: an encoder, a decoder, and a quantizer. Residual vector quantization (RVQ), or multi-stage vector quantization, in which the residual signal from the previous stage is quantized in the next stage, is employed in many neural speech codecs and has shown good performance while providing bitrate scalability. In this letter, we propose redundancy-reduced residual vector quantization (R3VQ), which improves on RVQ by inserting a neural network called a refiner. The refiner reduces the power of the residual signal to be quantized by enhancing the estimate of the original speech obtained from the quantized signals of the previous stages. We also present a part-wise (PW) training scheme suited to training a neural speech codec that uses R3VQ. Experimental results showed that the proposed R3VQ trained with the PW training scheme outperformed RVQ in both objective speech quality measures and a subjective MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) test.
- Publisher
- IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
- ISSN
- 1070-9908
- DOI
- 10.1109/LSP.2026.3655351
- URI
- https://scholar.gist.ac.kr/handle/local/33658
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
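The abstract describes the multi-stage RVQ loop and the point where R3VQ changes it: the next stage quantizes the original minus a refined estimate built from the earlier stages' quantized signals. What follows is a minimal pure-Python sketch of that idea under stated assumptions. Codebooks are small hand-written lists, nearest-neighbour search is brute force, and `refiner` is a hypothetical callable standing in for the paper's trained refiner network. The network's actual inputs, its architecture, and the PW training scheme are not modeled. With the default `sum_refiner`, the estimate is simply the sum of the stage outputs, so the loop reduces to plain RVQ.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical refiner type: maps the quantized outputs of all stages so far
# to an estimate of the original vector x.
Refiner = Callable[[List[Vector]], Vector]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(codebook: Sequence[Vector], target: Sequence[float]) -> int:
    """Index of the codeword closest to `target` (brute-force search)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], target))


def sum_refiner(quantized: List[Vector]) -> Vector:
    """Plain RVQ: the estimate is just the sum of the quantized stage outputs."""
    return [sum(vals) for vals in zip(*quantized)]


def encode(x: Vector, codebooks: List[List[Vector]],
           refiner: Refiner = sum_refiner) -> List[int]:
    """Multi-stage quantization: each stage quantizes x minus the current estimate."""
    indices: List[int] = []
    quantized: List[Vector] = []
    estimate = [0.0] * len(x)
    for codebook in codebooks:
        residual = [xi - ei for xi, ei in zip(x, estimate)]
        idx = nearest(codebook, residual)
        indices.append(idx)
        quantized.append(list(codebook[idx]))
        # R3VQ-style step (sketch): re-estimate x from all quantized signals so
        # far; a good refiner leaves a lower-power residual for the next stage.
        estimate = refiner(quantized)
    return indices


def decode(indices: List[int], codebooks: List[List[Vector]], dim: int,
           refiner: Refiner = sum_refiner) -> Vector:
    """Rebuild the estimate; must use the same refiner as the encoder."""
    quantized = [list(cb[i]) for i, cb in zip(indices, codebooks)]
    return refiner(quantized) if quantized else [0.0] * dim
```

Passing a different callable as `refiner`, for example a trained model wrapped as a function, changes only how the estimate is formed. Because the decoder repeats the encoder's estimate, decoding only a prefix of `indices` yields the corresponding lower-bitrate reconstruction, mirroring the bitrate scalability of RVQ.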