
Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability

Abstract
Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora whose emotional factors differ from those in the training databases. Regularization approaches and metric losses have been studied to construct SER models that are robust to unseen corpora. In this paper, we propose an SER method that incorporates the relative difficulty and the labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that assigns higher gradients to the samples in a minibatch whose emotion labels are more difficult to estimate. Because annotators may label an emotion based on emotional expression that resides in the conversational context or in another modality but is not apparent in the given speech utterance, some emotion labels may be unreliable, and such unreliable labels can affect the proposed loss function more severely. We therefore apply label smoothing to the samples misclassified by a pre-trained SER model. Experimental results show that adopting the proposed loss function together with label smoothing on the misclassified data improves SER performance on unseen corpora. © 2024 by the authors.
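The sketch below is a minimal, illustrative PyTorch rendering of the two ideas summarized in the abstract: a Proxy-Anchor loss, whose log-sum-exp form naturally gives larger gradients to the harder samples in a minibatch, and label smoothing applied only to training samples misclassified by a pre-trained SER model. It is not the authors' implementation; the class name, the helper `smoothed_targets`, and the hyperparameter values (margin, scale, smoothing factor) are assumptions made for illustration.

```python
# Hypothetical sketch, not the released code of the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProxyAnchorLoss(nn.Module):
    """Proxy-Anchor loss (Kim et al., 2020): harder samples (low similarity to
    their own class proxy, or high similarity to other proxies) dominate the
    log-sum-exp terms and therefore receive larger gradients."""

    def __init__(self, num_classes: int, embed_dim: int,
                 margin: float = 0.1, alpha: float = 32.0):
        super().__init__()
        self.proxies = nn.Parameter(torch.randn(num_classes, embed_dim))
        nn.init.kaiming_normal_(self.proxies, mode="fan_out")
        self.margin = margin
        self.alpha = alpha

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between each embedding and each class proxy: (B, C).
        sim = F.normalize(embeddings) @ F.normalize(self.proxies).t()
        pos_mask = F.one_hot(labels, self.proxies.size(0)).bool()

        # Hard positives/negatives contribute exponentially larger terms.
        pos_term = torch.exp(-self.alpha * (sim - self.margin)) * pos_mask
        neg_term = torch.exp(self.alpha * (sim + self.margin)) * ~pos_mask

        with_pos = pos_mask.any(dim=0)  # proxies that have positives in the batch
        pos_loss = torch.log1p(pos_term.sum(dim=0))[with_pos].sum() / with_pos.sum().clamp(min=1)
        neg_loss = torch.log1p(neg_term.sum(dim=0)).mean()
        return pos_loss + neg_loss


def smoothed_targets(labels: torch.Tensor, misclassified: torch.Tensor,
                     num_classes: int, eps: float = 0.1) -> torch.Tensor:
    """One-hot targets, with label smoothing applied only to the samples a
    pre-trained SER model got wrong (their labels are treated as less reliable)."""
    targets = F.one_hot(labels, num_classes).float()
    smooth = targets * (1.0 - eps) + eps / num_classes
    return torch.where(misclassified.unsqueeze(1), smooth, targets)
```

In this reading, the Proxy-Anchor-style term supplies the difficulty-aware weighting within each minibatch, while the selective smoothing reduces the influence of labels that a pre-trained model already found inconsistent with the speech signal.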
Author(s)
Ahn, Youngdo; Han, Sangwook; Lee, Seonggyu; Shin, Jong Won
Issued Date
2024-07
Type
Article
DOI
10.3390/s24134111
URI
https://scholar.gist.ac.kr/handle/local/9465
Publisher
Multidisciplinary Digital Publishing Institute (MDPI)
Citation
Sensors, v.24, no.13
ISSN
1424-3210
Appears in Collections:
Department of Electrical Engineering and Computer Science > 1. Journal Articles
Access & License
  • Access type: Open
File List
  • No related files available.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.