OAK

Maximum likelihood speaking rate normalization of speech signals for improving access to speech-enabled automatic systems

Author(s)
Choi, Seung Ho; Kim, Hong Kook
Type
Article
Citation
Asia Life Sciences, pp.197 - 206
Issued Date
2015-07
Abstract
Speech is the most natural medium for human communication. Thus, the goal of automatic speech recognition (ASR) is to help humans easily communicate with computational devices and to integrate technology into human life. ASR is very useful and has many applications in various areas. However, speaking rate is one of the variabilities influencing ASR performance. In this paper, we propose a maximum likelihood (ML) speaking rate normalization approach for hidden Markov model (HMM)-based speech recognition, which is realized through the combination of signal-level and acoustic model-level approaches. The speaking rate of input speech is controlled by applying a time-scale modification (TSM) algorithm. Speaking rate normalization is achieved by selecting a scale factor of TSM. The scale factor selection for training and testing of a speech recognition system is performed based on an ML criterion during HMM decoding. From connected digit recognition experiments, it is shown that a speech recognition system employing the proposed speaking rate normalization technique can reduce average word error rate (WER) by 9.5% compared to that without any speaking rate normalization.
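The scale factor selection described in the abstract can be read as a grid search: warp the input with each candidate factor, score the warped input against the acoustic model, and keep the factor with the highest likelihood. The Python sketch below illustrates only that selection loop and is not the authors' implementation. All names in it (`time_scale`, `select_scale_factor`, `toy_log_likelihood`) are hypothetical, the linear-interpolation warp is a crude stand-in for a real time-scale modification algorithm, the toy scorer is a stand-in for the log-likelihood an HMM decoder would return, and the convention that a factor above 1 lengthens the signal is an assumption.

```python
from typing import Callable, Sequence


def time_scale(frames: Sequence[float], alpha: float) -> list[float]:
    """Stretch (alpha > 1) or compress (alpha < 1) a non-empty sequence.

    Assumption: plain linear interpolation is used here only as a placeholder
    for a real time-scale modification algorithm.
    """
    n = len(frames)
    n_out = max(1, round(n * alpha))
    if n_out == 1 or n == 1:
        return [float(frames[0])] * n_out
    out = []
    for i in range(n_out):
        pos = i * (n - 1) / (n_out - 1)   # fractional index into the input
        lo = min(int(pos), n - 1)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(frames[lo] * (1.0 - frac) + frames[hi] * frac)
    return out


def select_scale_factor(
    frames: Sequence[float],
    candidates: Sequence[float],
    log_likelihood: Callable[[Sequence[float]], float],
) -> tuple[float, float]:
    """Return (best_alpha, best_score) under a maximum likelihood criterion."""
    best_alpha, best_score = None, float("-inf")
    for alpha in candidates:
        score = log_likelihood(time_scale(frames, alpha))
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score


def toy_log_likelihood(frames: Sequence[float]) -> float:
    """Hypothetical scorer: prefers inputs that are 120 frames long.

    A real system would return the decoder's log-likelihood instead.
    """
    return -float((len(frames) - 120) ** 2)


signal = [float(t) for t in range(100)]      # placeholder 100-frame input
candidates = [0.8, 0.9, 1.0, 1.1, 1.2]       # assumed candidate grid
best_alpha, best_score = select_scale_factor(signal, candidates, toy_log_likelihood)
```

With the toy scorer, the search picks the factor that stretches the 100-frame input to the preferred 120 frames. In a recognizer, the scoring callable would wrap the decoder, so the selected factor would be the one under which the acoustic models explain the input best.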
Publisher
Asia Life Sciences
ISSN
0117-3375
URI
https://scholar.gist.ac.kr/handle/local/14650
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.