Matryoshka Representation Learning with Modality-wise Knowledge Distillation for Multimodal Emotion Recognition in Conversation
- Author(s)
- Young-jin Na
- Type
- Thesis
- Degree
- Master
- Department
- Department of Electrical Engineering and Computer Science, College of Information and Computing
- Advisor
- Shin, Jong Won
- Abstract
- Multimodal Emotion Recognition in Conversation (MERC) aims to estimate emotional states by simultaneously leveraging multiple modalities for each utterance in a conversation. However, existing MERC models based on single-scale embeddings struggle to capture both the transient changes in intonation and the long-range contextual dependencies of a dialogue. Moreover, they often suffer from modality imbalance during training, where dominant modalities suppress the contribution of others, leading to suboptimal fusion. To address these issues, we propose applying Matryoshka Representation Learning (MRL) and Modality-wise Knowledge Distillation (MKD) to a MERC model that classifies the emotion label of each utterance. MRL enables the model to learn embeddings at multiple resolutions, thereby effectively capturing both short-term acoustic cues and broader contextual semantics. In parallel, MKD utilizes uni-modal emotion recognition models to guide the training of each modality-specific encoder, mitigating training imbalance and enhancing fusion performance. Experimental results on benchmark datasets demonstrate that the proposed framework achieves improved emotion recognition performance over baselines, validating the effectiveness of combining MRL and MKD for robust multimodal learning.
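The record does not include the thesis's actual loss formulation, so the following is only a minimal sketch of the two ideas named in the abstract. All names, dimensions, label counts, and loss weights are illustrative assumptions, not the author's implementation: MRL is shown as a classification loss averaged over nested prefix sizes of a single embedding, and MKD as a KL term pulling a modality-specific student's emotion distribution toward a uni-modal teacher's.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, label):
    return float(-np.log(probs[label] + 1e-12))

def kl_div(p, q):
    # KL(p || q) between two discrete distributions
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# --- Matryoshka Representation Learning (sketch) ---
# One embedding is classified at several nested prefix sizes, so every
# prefix is itself a usable representation at that resolution.
DIMS = [64, 128, 256]     # nested resolutions (assumed)
NUM_EMOTIONS = 6          # size of the emotion label set (assumed)

embedding = rng.standard_normal(DIMS[-1])
heads = {d: 0.01 * rng.standard_normal((d, NUM_EMOTIONS)) for d in DIMS}
label = 2                 # ground-truth emotion index (toy example)

mrl_loss = sum(
    cross_entropy(softmax(embedding[:d] @ heads[d]), label) for d in DIMS
) / len(DIMS)

# --- Modality-wise Knowledge Distillation (sketch) ---
# A pretrained uni-modal teacher's emotion distribution guides the
# matching modality-specific encoder in the multimodal student.
teacher_probs = softmax(rng.standard_normal(NUM_EMOTIONS))  # e.g. audio-only teacher (assumed)
student_logits = embedding[:DIMS[0]] @ heads[DIMS[0]]
mkd_loss = kl_div(teacher_probs, softmax(student_logits))

# The 0.5 weighting between the two terms is an assumption for illustration.
total_loss = mrl_loss + 0.5 * mkd_loss
print(round(total_loss, 4))
```

In this sketch, supervision at every prefix size is what forces the short-term and long-range information to coexist in one embedding, while the per-modality KL term counteracts a dominant modality by giving each encoder its own teacher signal.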
- URI
- https://scholar.gist.ac.kr/handle/local/33779
- Fulltext
- http://gist.dcollection.net/common/orgView/200000945085
- Access and License
-
- File List
-
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.