NCWP: Unsupervised Semantic Embedding Alignment Post-processing for Improving RAG in Language Models
- Author(s)
- GangHo Lee
- Type
- Thesis
- Degree
- Master
- Department
- Department of AI Convergence, College of Information and Computing (정보컴퓨팅대학 AI융합학과)
- Advisor
- Lee, Yong-Gu
- Abstract
- This paper investigates structural limitations of large language model embeddings in Retrieval-Augmented Generation (RAG) systems, focusing on anisotropy and high dimensionality. When most of the variance is concentrated in a few principal directions, cosine similarity becomes distorted; at the same time, thousand-dimensional vectors incur substantial memory, indexing, and latency costs. Classical post-processing methods such as mean-centering, PCA/LPP, whitening, and random projection can partially restore isotropy by rescaling variances, but, lacking labels, they do not explicitly learn to preserve neighborhood structure or retrieval rankings. Contrastive fine-tuning of encoders can improve retrieval, but it requires updating the entire model and is expensive to deploy.
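A common way to quantify the anisotropy discussed here (a minimal numpy sketch on toy data, not taken from the thesis) is the mean pairwise cosine similarity of an embedding set: it approaches 1 when all vectors share a dominant direction, and mean-centering alone already removes much of that shared component:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "anisotropic" embeddings: a shared dominant direction plus small noise,
# mimicking variance concentrated in a few principal directions.
d, n = 64, 500
common = rng.normal(size=d)
X = common + 0.1 * rng.normal(size=(n, d))

def mean_cosine(X):
    """Average cosine similarity over all distinct pairs (a simple anisotropy score)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    iu = np.triu_indices(len(X), k=1)
    return S[iu].mean()

print(mean_cosine(X))                   # close to 1: highly anisotropic
print(mean_cosine(X - X.mean(axis=0)))  # near 0 after mean-centering
```

On real sentence embeddings the effect is milder than in this toy example, but the same score motivates whitening-style post-processing.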
To address this, the paper proposes NCWP (Neighbour-Contrastive Whitening Projection), a purely post-hoc method that keeps the backbone language model frozen and learns only a single linear projection W. NCWP first applies ZCA-shrink whitening to obtain an approximately isotropic initial space, then constructs positive pairs and hard negatives from k-nearest neighbors and trains W with an InfoNCE-style contrastive loss. Output covariance regularization, orthogonal regularization, and periodic QR retraction prevent collapse and maintain isotropy even at low dimensions. Experiments on a synthetic sentence corpus with STS-based labels (STS-Embed) and traditional IR-based labels (U2/U3 from TF-IDF, Jaccard, and BM25) show that NCWP consistently outperforms PCA-Whitening, LPP, and Random Projection in mAP and nDCG@10, with particularly large gains at low dimensions (r ≤ 64). While the base model exhibits an anisotropy_mean of around 0.39, NCWP reduces this value to near zero across all tested dimensions, and self_sim decreases, indicating stronger separation between non-matching sentences. At the same time, dimensionality reduction with NCWP reduces latency and increases QPS by up to 2–3×, yielding a better quality–efficiency Pareto trade-off than existing post-processing methods. These results demonstrate that NCWP is a practical embedding post-processing strategy for improving RAG retrieval quality without modifying or fine-tuning the underlying language model.
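The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction from the abstract alone, not the thesis implementation: the shrinkage coefficient `alpha`, temperature `tau`, target dimension, and toy data are all made-up, and the gradient loop that actually trains W (with covariance and orthogonality regularizers) is omitted — only the forward computations are shown:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 32)) @ rng.normal(size=(32, 32))  # anisotropic toy embeddings

# Step 1: ZCA whitening with shrinkage (alpha is a hypothetical value).
def zca_shrink(X, alpha=0.1, eps=1e-8):
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)
    # Shrink the covariance toward a scaled identity before inverting its square root.
    C = (1 - alpha) * C + alpha * (np.trace(C) / C.shape[0]) * np.eye(C.shape[0])
    vals, vecs = np.linalg.eigh(C)
    W_zca = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T  # symmetric (ZCA) form
    return Xc @ W_zca

Z = zca_shrink(X)

# Step 2: positive pairs from k-nearest neighbours in the whitened space.
def knn_positives(Z, k=3):
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    S = Zn @ Zn.T
    np.fill_diagonal(S, -np.inf)            # never pick a point as its own neighbour
    return np.argsort(-S, axis=1)[:, :k]    # indices of the k nearest neighbours per row

pos = knn_positives(Z)

# Step 3: InfoNCE-style loss for a linear projection W (forward pass only).
def info_nce(Z, W, pos, tau=0.07):
    P = Z @ W
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    logits = P @ P.T / tau
    np.fill_diagonal(logits, -np.inf)       # exclude self-similarity from the softmax
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(Z)), pos[:, 0]].mean()  # nearest neighbour as positive

# QR retraction keeps the learned projection column-orthogonal between updates.
W = np.linalg.qr(rng.normal(size=(32, 16)))[0]
print(info_nce(Z, W, pos))
```

In the actual method W would be updated by gradient descent on this loss plus the regularizers; the QR step here only illustrates how retraction restores orthogonality.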
- URI
- https://scholar.gist.ac.kr/handle/local/33792
- Fulltext
- http://gist.dcollection.net/common/orgView/200000952386
- Access and License
-
- File List
-
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.