
Speech Emotion Recognition Using Fully Convolutional Capsule Network

Author(s)
Jeonghwa Yoo
Type
Thesis
Degree
Master
Department
Graduate School, School of Electrical Engineering and Computer Science
Advisor
Shin, Jong Won
Abstract
Speech emotion recognition (SER) is becoming more important for natural human-machine interaction. Because it is difficult to find speech features that discriminate strongly between emotions, recent SER techniques use deep learning to classify emotions directly from a spectrogram of the speech. Most of these techniques use Convolutional Neural Networks (CNNs) to capture the spatial information of the spectrogram. Because CNNs lose spatial information, capsule networks, which overcome this weakness, have recently been proposed in the speech domain. The original capsule network was expected to be effective in SER because of its attention mechanism. However, it is not suitable for SER: dynamic routing is fully connected to all pixels, so the input size of the network must be fixed and the number of parameters increases dramatically. In this thesis, we propose a fully convolutional capsule network (FCCaps) that replaces the fully connected layer with a convolutional layer to solve this problem. The last layer of the FCCaps encoder has as many capsule types as there are emotions to classify, and each capsule type represents one emotion. The L2 norm is calculated for each capsule type, and the emotion corresponding to the capsule type with the largest L2 norm is predicted as the emotion of the sentence. We also propose a decoder based on transposed convolution and use it as a regularizer. Experimental results show that our system achieves state-of-the-art performance in spectrogram-based SER on the IEMOCAP dataset over four emotions: angry, happy, neutral, and sad.
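The prediction rule described in the abstract (take the L2 norm of each capsule type and pick the largest) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `l2_norm` and `predict_emotion`, the label order, the assumption that each capsule type has already been reduced to a single vector, and the toy values are all hypothetical and chosen only for illustration.

```python
import math

# The four emotions named in the abstract; this ordering is an assumption.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def l2_norm(vector):
    """Euclidean (L2) norm of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in vector))


def predict_emotion(capsule_outputs, labels=EMOTIONS):
    """Pick the label whose capsule vector has the largest L2 norm.

    capsule_outputs: one vector per capsule type, in the same order as
    labels (assumed here; the abstract does not specify the layout).
    Returns (predicted_label, list_of_norms).
    """
    if len(capsule_outputs) != len(labels):
        raise ValueError("expected one capsule vector per emotion label")
    norms = [l2_norm(v) for v in capsule_outputs]
    best = max(range(len(norms)), key=norms.__getitem__)
    return labels[best], norms


# Toy capsule outputs, not taken from the thesis.
example = [
    [0.1, 0.2, 0.0],  # angry
    [0.6, 0.0, 0.8],  # happy (norm close to 1.0, the largest here)
    [0.3, 0.0, 0.4],  # neutral (norm close to 0.5)
    [0.0, 0.1, 0.1],  # sad
]
label, norms = predict_emotion(example)
print(label)
```

Because only the norm of each capsule vector matters for the decision, the same rule applies regardless of the capsule dimension, as long as one vector is available per emotion.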
URI
https://scholar.gist.ac.kr/handle/local/32777
Fulltext
http://gist.dcollection.net/common/orgView/200000909926
Access and License
  • Access type: Public
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.