Primitive Basis Learning for Alignment on Universal Representation in Unsupervised Machine Translation

Abstract
Neural machine translation depends on large-scale parallel corpora, which are costly to build because they require specialized expertise and are often nonexistent for low-resource languages. Unsupervised neural machine translation addresses this issue by leveraging monolingual corpora and aligning representations across languages. However, these methods still require an anchor to align the two languages. In this paper, we first propose the concept of a Primitive Basis Vector, based on the universal grammar assumption, and structure it to learn a universal representation that improves alignment between two languages during back-translation. Specifically, we propose the Primitive Basis Learning (PBL) framework, which consists of a Vector Quantization Primitive Dictionary (VQPD) and a Universal Structure Module. This framework encourages the model to learn language-agnostic representations, thereby reducing the semantic discrepancy between cross-lingual sentences. We empirically show that our proposed method outperforms existing approaches by improving cross-lingual alignment through structured universal representations during back-translation.
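The abstract's Vector Quantization Primitive Dictionary suggests a VQ-style codebook lookup, in which each encoder vector is snapped to its nearest learned "primitive" entry. The sketch below illustrates only that generic nearest-neighbor quantization step; the function name `vq_lookup` and all shapes are illustrative assumptions, not the thesis's actual VQPD implementation.

```python
import numpy as np

def vq_lookup(z, codebook):
    """Quantize encoder outputs to their nearest codebook entries.

    z:        (n, d) array of encoder vectors.
    codebook: (k, d) array of learned primitive basis vectors.
    Returns the quantized vectors and the chosen codebook indices.
    Illustrative sketch only; the thesis's VQPD details are not shown here.
    """
    # Squared Euclidean distance from every vector to every codebook entry,
    # computed via broadcasting: result has shape (n, k).
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)        # nearest primitive per input vector
    return codebook[idx], idx
```

In VQ-based models the discrete indices act as a shared, language-agnostic inventory, which is the property the abstract relies on for cross-lingual alignment.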
Author(s)
Hyunyoung Bae
Issued Date
2024
Type
Thesis
URI
https://scholar.gist.ac.kr/handle/local/19610
Alternative Author(s)
배현영
Department
AI Graduate School
Advisor
Kim, Kangil
Degree
Master
Appears in Collections:
Department of AI Convergence > 3. Theses(Master)
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.