OAK


Abstract: Speech is the most natural medium for human communication. The goal of automatic speech recognition (ASR) is therefore to let humans communicate easily with computational devices and to integrate technology into human life. ASR has applications in many areas, but speaking rate is one of the sources of variability that affect its performance. In this paper, we propose a maximum likelihood (ML) speaking rate normalization approach for hidden Markov model (HMM)-based speech recognition, realized by combining signal-level and acoustic model-level approaches. The speaking rate of the input speech is controlled by applying a time-scale modification (TSM) algorithm, and speaking rate normalization is achieved by selecting the TSM scale factor. For both training and testing of the speech recognition system, the scale factor is selected according to an ML criterion during HMM decoding. Connected digit recognition experiments show that a speech recognition system employing the proposed speaking rate normalization technique reduces the average word error rate (WER) by 9.5% compared with a system without any speaking rate normalization.
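
The selection step described in the abstract can be pictured as a search over candidate scale factors, keeping the one whose time-scaled input scores highest under the recognizer. The following is only a minimal sketch under stated assumptions, not the paper's implementation: `stretch_frames` is a toy linear-interpolation stand-in for a real time-scale modification algorithm, `toy_score` is a hypothetical placeholder for the log-likelihood an HMM decoder would return, and the candidate grid and the convention that a factor above 1 lengthens the utterance are illustrative choices.

```python
import math
from typing import Callable, List, Sequence, Tuple


def stretch_frames(frames: Sequence[float], alpha: float) -> List[float]:
    """Toy stand-in for time-scale modification (assumption, not the paper's method).

    Resamples a non-empty 1-D frame sequence to round(len * alpha) frames by
    linear interpolation. A real algorithm would operate on the waveform.
    """
    n_in = len(frames)
    n_out = max(2, round(n_in * alpha))
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)  # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        out.append((1.0 - w) * frames[lo] + w * frames[hi])
    return out


def select_scale_factor(
    frames: Sequence[float],
    candidates: Sequence[float],
    score: Callable[[Sequence[float]], float],
) -> Tuple[float, float]:
    """Return the candidate scale factor with the highest score, and that score."""
    best_alpha, best_score = candidates[0], -math.inf
    for alpha in candidates:
        s = score(stretch_frames(frames, alpha))
        if s > best_score:
            best_alpha, best_score = alpha, s
    return best_alpha, best_score


def toy_score(frames: Sequence[float], expected_len: int = 20) -> float:
    """Hypothetical scorer standing in for an HMM decoding log-likelihood.

    It simply prefers utterances close to a fixed duration.
    """
    return -float((len(frames) - expected_len) ** 2)


# A short ("fast") utterance of 14 frames; the search picks the factor that
# brings it closest to the duration the toy scorer prefers.
fast_utterance = [math.sin(0.3 * t) for t in range(14)]
alpha_hat, best = select_scale_factor(
    fast_utterance, [0.8, 1.0, 1.2, 1.4, 1.6], toy_score
)
```

In a real system the `score` callable would wrap the recognizer's decoding pass. How likelihoods of sequences with different lengths are made comparable is a design question this sketch does not address.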

Appears in Collections: Department of Electrical Engineering and Computer Science > 1. Journal Articles


OAK GIST Scholar was built as part of the National Library of Korea's OAK Repository distribution project.