Matryoshka Representation Learning with Modality-wise Knowledge Distillation for Multimodal Emotion Recognition in Conversation
- Author(s)
- Young-jin Na
- Type
- Thesis
- Degree
- Master
- Department
- Department of Electrical Engineering and Computer Science, College of Information and Computing
- Advisor
- Shin, Jong Won
- Abstract
- Multimodal Emotion Recognition in Conversation (MERC) aims to estimate emotional states by simultaneously leveraging multiple modalities for each utterance in a conversation. However, existing MERC models based on single-scale embeddings struggle to capture both the transient changes in intonation and the long-range contextual dependencies of a dialogue. Moreover, they often suffer from modality imbalance during training, where dominant modalities suppress the contribution of others, leading to suboptimal fusion. To address these issues, we propose applying Matryoshka Representation Learning (MRL) and Modality-wise Knowledge Distillation (MKD) to a MERC model that classifies the emotion label of each utterance. MRL enables the model to learn embeddings at multiple resolutions, thereby effectively capturing both short-term acoustic cues and broader contextual semantics. In parallel, MKD utilizes uni-modal emotion recognition models to guide the training of each modality-specific encoder, mitigating training imbalance and enhancing fusion performance. Experimental results on benchmark datasets demonstrate that the proposed framework achieves improved emotion recognition performance over baselines, validating the effectiveness of combining MRL and MKD for robust multimodal learning.
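The record does not include the thesis's actual loss formulation, so the following is only a minimal sketch of the two ideas named in the abstract. All names, dimensions, label counts, and loss weights are illustrative assumptions, not the author's implementation: MRL is shown as a classification loss averaged over nested prefix sizes of a single embedding, and MKD as a KL term pulling a modality-specific student's emotion distribution toward a uni-modal teacher's.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, label):
    return float(-np.log(probs[label] + 1e-12))

def kl_div(p, q):
    # KL(p || q) between two discrete distributions
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# --- Matryoshka Representation Learning (sketch) ---
# One embedding is classified at several nested prefix sizes, so every
# prefix is itself a usable representation at that resolution.
DIMS = [64, 128, 256]     # nested resolutions (assumed)
NUM_EMOTIONS = 6          # size of the emotion label set (assumed)

embedding = rng.standard_normal(DIMS[-1])
heads = {d: 0.01 * rng.standard_normal((d, NUM_EMOTIONS)) for d in DIMS}
label = 2                 # ground-truth emotion index (toy example)

mrl_loss = sum(
    cross_entropy(softmax(embedding[:d] @ heads[d]), label) for d in DIMS
) / len(DIMS)

# --- Modality-wise Knowledge Distillation (sketch) ---
# A pretrained uni-modal teacher's emotion distribution guides the
# matching modality-specific encoder in the multimodal student.
teacher_probs = softmax(rng.standard_normal(NUM_EMOTIONS))  # e.g. audio-only teacher (assumed)
student_logits = embedding[:DIMS[0]] @ heads[DIMS[0]]
mkd_loss = kl_div(teacher_probs, softmax(student_logits))

# The 0.5 weighting between the two terms is an assumption for illustration.
total_loss = mrl_loss + 0.5 * mkd_loss
print(round(total_loss, 4))
```

In this sketch, supervision at every prefix size is what forces the short-term and long-range information to coexist in one embedding, while the per-modality KL term counteracts a dominant modality by giving each encoder its own teacher signal.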
- URI
- https://scholar.gist.ac.kr/handle/local/33779
- Fulltext
- http://gist.dcollection.net/common/orgView/200000945085
- Access and License
-
- File List
-
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.