Exploration for Combining Fine-tuning Methods in Abstractive Summarization
- Author(s)
- UkDong Gim
- Type
- Thesis
- Degree
- Master
- Department
- School of Electrical Engineering and Computer Science (Graduate School)
- Advisor
- Lee, Hyunju
- Abstract
- Motivation With the advent of the big data era, people can attend to only a small fraction of the information available to them, and text data account for a considerable portion of big data. If a machine can automatically extract the important information from a source text and provide a concise, high-quality summary of it, readers can obtain highly concentrated information in a short time.
Methods Five methods were used to fine-tune the BART model for abstractive text summarization. The R3F method, proposed by Aghajanyan et al., was applied without modification. The concepts of ROUGE-based reinforcement learning, ROUGE-based validation, and cosine-similarity loss were adopted from other studies and applied with slight modifications, and we proposed a mid-epoch validation method. We conducted ablation studies in low-resource and full-data environments with various combinations of these five methods.
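The sketch below, assuming PyTorch and the Hugging Face `facebook/bart-base` checkpoint (neither is specified in the abstract), illustrates an R3F-style regularizer of the kind described by Aghajanyan et al.: the input embeddings are perturbed with small noise and a symmetric KL divergence between the clean and noisy output distributions is added to the cross-entropy loss. The noise scale `eps` and weight `lam` are illustrative values, not the thesis settings.

```python
# A minimal sketch (details assumed, not the thesis implementation) of an
# R3F-style regularizer for BART fine-tuning: perturb the input embeddings
# with small uniform noise and add a symmetric KL divergence between the
# clean and noisy output distributions to the cross-entropy loss.
import torch
import torch.nn.functional as F
from transformers import BartForConditionalGeneration, BartTokenizerFast

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")


def r3f_loss(batch, eps=1e-5, lam=1.0):
    """Cross-entropy loss plus a symmetric-KL consistency term (R3F-style)."""
    embeddings = model.get_input_embeddings()(batch["input_ids"])

    clean = model(inputs_embeds=embeddings,
                  attention_mask=batch["attention_mask"],
                  labels=batch["labels"])

    # Small uniform noise on the encoder input embeddings.
    noise = torch.empty_like(embeddings).uniform_(-eps, eps)
    noisy = model(inputs_embeds=embeddings + noise,
                  attention_mask=batch["attention_mask"],
                  labels=batch["labels"])

    p = F.log_softmax(clean.logits, dim=-1)
    q = F.log_softmax(noisy.logits, dim=-1)
    sym_kl = (F.kl_div(p, q, log_target=True, reduction="batchmean")
              + F.kl_div(q, p, log_target=True, reduction="batchmean"))

    return clean.loss + lam * sym_kl


# Example batch: a single document/summary pair.
enc = tokenizer(["Text of a long source document ..."],
                return_tensors="pt", truncation=True)
labels = tokenizer(["A short summary."],
                   return_tensors="pt", truncation=True).input_ids
loss = r3f_loss({"input_ids": enc.input_ids,
                 "attention_mask": enc.attention_mask,
                 "labels": labels})
loss.backward()
```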
Results In the low-resource environments on Xsum and CNNDM, the +CS+Rval method performed best on both datasets, improving the average (R1, R2, R-L) scores over the baseline by (+4.95, +2.65, +4.38) and (+1.7, +0.42, +1.2), respectively. For the refinement process, +Rval+Midval and +Rval gave the best results among the methods, increasing the ROUGE scores from the initial point by (+0.11, -0.04, +0.13) and (+0.30, +0.67, +0.47), respectively. In the Xsum full-data environment, the +CS method showed the largest improvement over the baseline, (+0.32, +0.36, +0.27). Finally, for full-data refinement, the +CS+R3F method achieved the best scores, with an improvement of (+0.95, +0.95, +1.17) after refinement.
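For reference, the (R1, R2, R-L) figures above are ROUGE F1-style scores; the snippet below, using the `rouge_score` package (one possible implementation, not necessarily the one used in the thesis), shows how such scores are computed for a candidate summary against a reference.

```python
# Illustrative ROUGE-1/2/L computation with Google's rouge_score package;
# the thesis may have used a different ROUGE implementation.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
reference = "the quick brown fox jumps over the lazy dog"
candidate = "a quick brown fox jumped over a lazy dog"

scores = scorer.score(reference, candidate)
for name, score in scores.items():
    print(f"{name}: F1 = {score.fmeasure:.4f}")
```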
- URI
- https://scholar.gist.ac.kr/handle/local/33342
- Fulltext
- http://gist.dcollection.net/common/orgView/200000905835