Improving Back-Translation with Denoising Auto-Encoding
- Abstract
- The shift from recurrent neural network models to transformer models in neural machine translation has significantly boosted translation quality. However, most neural machine translation models require a large amount of parallel corpus data, which is difficult to acquire. To enhance translation quality, researchers have extensively explored data augmentation methods that leverage easier-to-obtain monolingual corpus data, including dual learning and back-translation. Back-translation, a widely used data augmentation technique in neural machine translation, creates synthetic parallel data by translating target-language monolingual data back into the source language. Neural machine translation models employing back-translation are typically trained on three types of data: (1) original data, (2) reference translations, and (3) translated data. Reference translations (generated by humans) and translated data (generated by a translation model) usually exhibit similar characteristics to each other but differ from original data. As a result, back-translation primarily improves translations for inputs that resemble reference translations or translated data; when the input is original data, its impact may be limited or even negative. To address this limitation, this dissertation aims to enhance the performance of back-translation on original-data inputs. The proposal is to combine back-translation with denoising auto-encoding so that the characteristics of the synthetic data resemble those of original data, improving the effectiveness of back-translation.
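As an illustration of the pipeline the abstract describes, the following minimal Python sketch pairs back-translation with a denoising step applied to the synthetic source side. The `reverse_model` interface, the `add_noise` word-drop/local-shuffle function, and the demo sentences are assumptions introduced here for illustration; they follow common denoising auto-encoding practice and are not necessarily the dissertation's exact formulation.

```python
# Sketch: back-translation with a denoising step on the synthetic source.
# `reverse_model` is a hypothetical target->source translation function;
# any trained NMT checkpoint could stand in for it.
import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3):
    """Corrupt a token sequence: randomly drop words, then locally shuffle.

    This word-drop + local-shuffle scheme is a common denoising
    auto-encoding corruption, assumed here for illustration.
    """
    kept = [t for t in tokens if random.random() > drop_prob] or tokens
    # Local shuffle: each token moves at most ~shuffle_window positions.
    keys = [i + random.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

def back_translate(target_sentences, reverse_model):
    """Build synthetic (source, target) pairs from target-side monolingual data."""
    pairs = []
    for tgt in target_sentences:
        synthetic_src = reverse_model(tgt)            # target -> source
        noisy_src = " ".join(add_noise(synthetic_src.split()))
        pairs.append((noisy_src, tgt))                # train the forward model on these
    return pairs

if __name__ == "__main__":
    # Stub reverse model for demonstration only; a real system would
    # decode with a trained target->source translation model.
    demo = back_translate(["ein kleines Beispiel"],
                          reverse_model=lambda s: "a small example")
    print(demo)
```

The key design point the sketch makes concrete: the noise is applied only to the machine-generated source side, so the target side stays clean while the synthetic source distribution is pushed away from fluent model output and toward the noisier character of original data.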
- Author(s)
- Seokhyun Oh
- Issued Date
- 2024
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/19397
- Access & License
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.