
Matryoshka Representation Learning with Modality-wise Knowledge Distillation for Multimodal Emotion Recognition in Conversation

Author(s)
Young-jin Na
Type
Thesis
Degree
Master
Department
Department of Electrical Engineering and Computer Science, College of Information and Computing
Advisor
Shin, Jong Won
Abstract
Multimodal Emotion Recognition in Conversation (MERC) aims to estimate emotional states by simultaneously leveraging multiple modalities for each utterance in a conversation. However, existing MERC models based on single-scale embeddings struggle to capture both the transient changes in intonation and the long-range contextual dependencies of a dialogue. Moreover, they often suffer from modality imbalance during training, where dominant modalities suppress the contribution of others, leading to suboptimal fusion. To address these issues, we propose applying Matryoshka Representation Learning (MRL) and Modality-wise Knowledge Distillation (MKD) to a MERC model that classifies the emotional label of each utterance. MRL enables the model to learn embeddings at multiple resolutions, thereby effectively capturing both short-term acoustic cues and broader contextual semantics. In parallel, MKD utilizes unimodal emotion recognition models to guide the training of each modality-specific encoder, mitigating training imbalance and enhancing fusion performance. Experimental results on benchmark datasets demonstrate that the proposed framework achieves improved emotion recognition performance over baselines, validating the effectiveness of combining MRL and MKD for robust multimodal learning.
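The two components described in the abstract can be sketched in PyTorch. The sketch below is illustrative only and does not reproduce the thesis's actual architecture: the embedding size, nested dimensions, class count, and distillation temperature are all assumed values. MRL is shown as a set of classifier heads over nested prefixes of one embedding, trained jointly; MKD is shown as a standard softened-KL distillation loss from a unimodal teacher's logits to a modality encoder's logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MatryoshkaClassifier(nn.Module):
    """MRL sketch: one classifier head per nested prefix of the embedding.

    All dimensions here are hypothetical, not the thesis's configuration.
    """

    def __init__(self, embed_dim=256, num_classes=6, nested_dims=(32, 64, 128, 256)):
        super().__init__()
        self.nested_dims = nested_dims
        self.heads = nn.ModuleList(nn.Linear(d, num_classes) for d in nested_dims)

    def forward(self, z):
        # z: (batch, embed_dim) utterance embedding; each head sees only
        # the first d dimensions, so coarse emotion cues must survive in
        # the low-dimensional prefix while fine context fills the tail.
        return [head(z[:, :d]) for d, head in zip(self.nested_dims, self.heads)]


def mrl_loss(logits_list, labels):
    # Average cross-entropy over all nested resolutions.
    return sum(F.cross_entropy(l, labels) for l in logits_list) / len(logits_list)


def mkd_loss(student_logits, teacher_logits, T=2.0):
    # Modality-wise KD sketch: KL divergence between the softened
    # distributions of a frozen unimodal teacher and the corresponding
    # modality-specific student encoder (temperature T is an assumption).
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```

In a full MERC training loop, one would combine `mrl_loss` on the fused embedding with one `mkd_loss` term per modality, weighted by hyperparameters that the abstract does not specify.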
URI
https://scholar.gist.ac.kr/handle/local/33779
Fulltext
http://gist.dcollection.net/common/orgView/200000945085
Access and License
  • Access type: Open
File list
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.