Entropy coding of compressed feature parameters for distributed speech recognition
- Abstract
- In this paper, we propose several entropy coding methods to further compress quantized mel-frequency cepstral coefficients (MFCCs) used for distributed speech recognition (DSR). As a standard DSR front-end, the European Telecommunications Standards Institute (ETSI) published an extended front-end that includes split-vector quantization of MFCCs together with voicing class information. By examining how the entropy of the compressed MFCCs varies with the voicing class of the analysis frame, and how it is distributed across the MFCC subvector indices, we propose voicing class-dependent and subvector-wise Huffman coding methods. Differential Huffman coding is then applied to further improve the coding gain over the class-dependent and subvector-wise methods. Experiments on the TIMIT database show that subvector-wise differential Huffman coding achieves an average bit-rate of 33.93 bits/frame, the smallest among the proposed methods, whereas a conventional Huffman coding that ignores voicing class and uses a single Huffman tree for all subvectors requires 42.22 bits/frame. In addition, we evaluate the performance of the proposed Huffman coding methods on speech in noise using the Aurora 4 database, a standard speech database for DSR, and again the subvector-wise differential Huffman coding method yields the smallest average bit-rate. (C) 2010 Elsevier B.V. All rights reserved.
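- As a rough illustration of the subvector-wise and differential Huffman coding ideas described in the abstract, the Python sketch below builds a separate Huffman table for each subvector position and, optionally, differences the indices along time before coding. This is not the authors' implementation: the frame data, the 7-subvector layout (matching the ETSI split-VQ front-end's seven two-coefficient subvectors), and the function names are illustrative assumptions.

```python
# Minimal sketch of subvector-wise (and differential) Huffman coding of
# quantized MFCC subvector indices. Hypothetical data; not the paper's code.
import heapq
import random
from collections import Counter
from itertools import count

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a frequency table."""
    if len(freqs) == 1:                      # degenerate single-symbol case
        return {next(iter(freqs)): 1}
    tiebreak = count()                       # unique tags so dicts never compare
    heap = [(f, next(tiebreak), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)      # merge the two least-frequent nodes
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

def avg_bits_per_frame(frames, num_subvectors, differential=False):
    """Estimate average bits/frame when each subvector position gets its own
    Huffman table and, optionally, indices are differenced over time."""
    total = 0
    for k in range(num_subvectors):
        idx = [f[k] for f in frames]
        if differential:                     # code the first index, then deltas
            idx = [idx[0]] + [b - a for a, b in zip(idx, idx[1:])]
        lengths = huffman_code_lengths(Counter(idx))
        total += sum(lengths[s] for s in idx)
    return total / len(frames)

# Hypothetical usage: 7 subvector indices per frame, with toy random data
# standing in for real quantized MFCC streams.
random.seed(0)
frames = [[random.randrange(64) for _ in range(7)] for _ in range(1000)]
print(avg_bits_per_frame(frames, 7))
print(avg_bits_per_frame(frames, 7, differential=True))
```

- Per-position tables exploit the fact that different subvectors have different index statistics, while differencing exploits frame-to-frame correlation; on real speech (unlike the uniform toy data above) both effects reduce entropy, which is the source of the bit-rate gains reported in the abstract.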
- Author(s)
- Lee, Young Han; Kim, Hong Kook
- Issued Date
- 2010-05
- Type
- Article
- DOI
- 10.1016/j.specom.2010.01.002
- URI
- https://scholar.gist.ac.kr/handle/local/16737