Revisiting data imbalance in token-based self-supervised learning

Author(s)
Han, Daeyoung; Jung, Hyung Rok; Li, Tianhong; Katabi, Dina; Son, Jeany; Kim, Hong Kook; Jeon, Moongu
Type
Article
Citation
Neurocomputing, v.682
Issued Date
2026-06
Abstract
Token-based self-supervised learning (SSL) has emerged as a powerful paradigm for leveraging large-scale unlabeled data, yet it suffers from a previously overlooked challenge: token-class imbalance. We show that visual token distributions are highly skewed; a small subset of frequent tokens, often representing uninformative backgrounds, dominates the training process. Conversely, semantically rich but rare tokens are severely underrepresented. This imbalance distorts the learning objective, hindering the model's ability to learn robust representations and impairing generalization. To address this, we introduce two solutions adapted from imbalanced learning: a class-balanced cross-entropy loss that re-weights the training signal based on token rarity, and semantic-aware label smoothing (SLS), a novel regularization technique that leverages token embedding similarity to create more meaningful soft targets. We validate our methods on MAGE for representation learning and MaskGIT for image generation. Our experiments demonstrate that these techniques significantly enhance both discriminative and generative performance, evidenced by improved linear separability of the representation space and better mode coverage in image synthesis, respectively. This study underscores the necessity of mitigating token-class imbalance, offering scalable solutions that contribute to more robust and generalizable visual learning.
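The two remedies named in the abstract can be illustrated with a minimal sketch. The PyTorch snippet below is not the authors' released implementation: `token_counts` (per-token frequencies), `codebook` (the token embedding table), and all hyperparameters are assumed, illustrative names, and the effective-number re-weighting and similarity-based soft targets merely stand in for the paper's class-balanced cross-entropy and semantic-aware label smoothing.

```python
# Illustrative sketch only; assumes a PyTorch setup with precomputed
# per-token frequencies (token_counts, shape [K]) and a token embedding
# table (codebook, shape [K, D]). Names and defaults are hypothetical.
import torch
import torch.nn.functional as F

def class_balanced_weights(token_counts: torch.Tensor, beta: float = 0.999) -> torch.Tensor:
    """Effective-number style re-weighting: rarer tokens receive larger weights."""
    effective_num = 1.0 - torch.pow(beta, token_counts.float())
    weights = (1.0 - beta) / effective_num.clamp_min(1e-8)
    return weights * (len(token_counts) / weights.sum())  # normalize to mean 1

def semantic_soft_targets(codebook: torch.Tensor, targets: torch.Tensor,
                          temperature: float = 0.1, eps: float = 0.1) -> torch.Tensor:
    """Soft targets from token-embedding similarity instead of uniform smoothing."""
    sim = codebook[targets] @ codebook.t()            # (N, K) similarity to all tokens
    soft = F.softmax(sim / temperature, dim=-1)       # spread mass onto similar tokens
    hard = F.one_hot(targets, codebook.size(0)).float()
    return (1.0 - eps) * hard + eps * soft

def balanced_sls_loss(logits: torch.Tensor, targets: torch.Tensor,
                      token_counts: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Soft cross-entropy with rarity-based re-weighting of each target token."""
    weights = class_balanced_weights(token_counts).to(logits.device)
    soft = semantic_soft_targets(codebook, targets)
    log_probs = F.log_softmax(logits, dim=-1)
    per_token = -(soft * log_probs).sum(dim=-1)       # soft CE per masked position
    return (weights[targets] * per_token).mean()      # up-weight rare target tokens
```

In such a setup, the loss would replace the standard cross-entropy over masked-token predictions in a MAGE- or MaskGIT-style training loop, with `token_counts` estimated once from tokenized training images.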
Publisher
Elsevier BV
ISSN
0925-2312
DOI
10.1016/j.neucom.2026.133408
URI
https://scholar.gist.ac.kr/handle/local/34014
Access & License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.