Maximum likelihood speaking rate normalization of speech signals for improving access to speech-enabled automatic systems
- Author(s)
- Choi, Seung Ho; Kim, Hong Kook
- Type
- Article
- Citation
- Asia Life Sciences, pp.197 - 206
- Issued Date
- 2015-07
- Abstract
- Speech is the most natural medium for human communication. The goal of automatic speech recognition (ASR) is therefore to help humans communicate easily with computational devices and to integrate technology into human life. ASR has applications in many areas, but speaking rate is one of the sources of variability that degrade its performance. In this paper, we propose a maximum likelihood (ML) speaking rate normalization approach for hidden Markov model (HMM)-based speech recognition, realized by combining signal-level and acoustic model-level approaches. The speaking rate of the input speech is controlled by applying a time-scale modification (TSM) algorithm, and speaking rate normalization is achieved by selecting the TSM scale factor. For both training and testing of a speech recognition system, the scale factor is selected according to an ML criterion during HMM decoding. Connected digit recognition experiments show that a speech recognition system employing the proposed speaking rate normalization technique reduces the average word error rate (WER) by 9.5% compared with a system without any speaking rate normalization.
- Publisher
- Asia Life Sciences
- ISSN
- 0117-3375
- URI
- https://scholar.gist.ac.kr/handle/local/14650
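The scale factor selection described in the abstract can be illustrated as a grid search: each candidate scale factor is applied to the input, the result is scored, and the factor with the highest log-likelihood is kept. The following is a minimal sketch under stated assumptions, not the authors' implementation. The names `time_scale`, `toy_log_likelihood`, and `select_scale_factor` are hypothetical; linear interpolation over a feature sequence stands in for a real TSM algorithm, a Gaussian duration score stands in for the log-likelihood an HMM decoder would return, and the convention that a factor above 1 lengthens the input is an assumption.

```python
import math
from typing import Callable, List, Sequence, Tuple


def time_scale(frames: Sequence[float], alpha: float) -> List[float]:
    """Stand-in for TSM (assumption): resample a sequence by linear
    interpolation so that its length becomes round(len(frames) * alpha)."""
    n_out = max(1, round(len(frames) * alpha))
    if n_out == 1 or len(frames) == 1:
        return [frames[0]] * n_out
    step = (len(frames) - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(math.floor(pos)), len(frames) - 2)
        frac = pos - lo
        out.append((1 - frac) * frames[lo] + frac * frames[lo + 1])
    return out


def toy_log_likelihood(frames: Sequence[float],
                       expected_len: float = 100.0,
                       std: float = 10.0) -> float:
    """Toy scorer (assumption): an unnormalized Gaussian log-likelihood on the
    sequence length. A real system would use the decoder's log-likelihood."""
    return -0.5 * ((len(frames) - expected_len) / std) ** 2


def select_scale_factor(
    frames: Sequence[float],
    candidates: Sequence[float],
    score: Callable[[Sequence[float]], float],
) -> Tuple[float, float]:
    """Return the (scale factor, log-likelihood) pair with the highest score."""
    best_alpha, best_ll = None, -math.inf
    for alpha in candidates:
        ll = score(time_scale(frames, alpha))
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    if best_alpha is None:
        raise ValueError("candidates must not be empty")
    return best_alpha, best_ll


if __name__ == "__main__":
    utterance = [float(i) for i in range(80)]  # 80 synthetic "frames"
    alpha, ll = select_scale_factor(
        utterance, [0.75, 1.0, 1.25, 1.5], toy_log_likelihood
    )
    print(alpha, ll)
```

Passing the scorer as a function keeps the search independent of the recognizer, so the toy scorer could in principle be swapped for a call into an actual HMM decoder.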
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.