OAK

Exploration for Combining Fine-tuning Methods in Abstractive Summarization

Metadata
Author(s)
UkDong Gim
Type
Thesis
Degree
Master
Department
School of Electrical Engineering and Computer Science (Graduate School)
Advisor
Lee, Hyunju
Abstract
Motivation With the advent of the big data era, people can consume only a small fraction of the information available to them, and text accounts for a considerable portion of big data. If a machine can automatically extract the important information from a source text and provide a compressed, high-quality summary, readers can obtain highly concentrated information in a short time.
Methods Five methods were used to fine-tune the BART model for abstractive text summarization. The R3F method, proposed by Aghajanyan et al., was applied without modification. The concepts of ROUGE-based reinforcement learning, ROUGE-based validation, and cosine-similarity loss, established in other studies, were applied with slight modifications, and we proposed a mid-epoch validation method. Ablation studies were conducted in low-resource and full-data environments with various combinations of these five methods.
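The abstract does not give implementation details, but the named objectives can be illustrated. The following is a minimal, hedged sketch (not the thesis code) of how a cosine-similarity term and an R3F-style noise-consistency term might be added to BART's standard cross-entropy fine-tuning loss with PyTorch and Hugging Face Transformers; the weights cs_weight and r3f_weight, the mean-pooling choice, and the checkpoint facebook/bart-base are illustrative assumptions, not values from the thesis.

```python
# Illustrative sketch only: combines cross-entropy, a cosine-similarity term,
# and an R3F-style symmetric-KL consistency term for BART fine-tuning.
import torch
import torch.nn.functional as F
from transformers import BartForConditionalGeneration, BartTokenizer

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")

def combined_loss(batch, cs_weight=0.1, r3f_weight=1.0, eps=1e-5):
    inputs = tokenizer(batch["document"], return_tensors="pt",
                       truncation=True, padding=True)
    labels = tokenizer(batch["summary"], return_tensors="pt",
                       truncation=True, padding=True).input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

    # Standard maximum-likelihood fine-tuning loss.
    out = model(**inputs, labels=labels, output_hidden_states=True)
    ce_loss = out.loss

    # Cosine-similarity term (assumed form): keep mean-pooled encoder and
    # decoder representations close, i.e. minimize 1 - cos(source, summary).
    enc = out.encoder_last_hidden_state.mean(dim=1)
    dec = out.decoder_hidden_states[-1].mean(dim=1)
    cs_loss = 1.0 - F.cosine_similarity(enc, dec, dim=-1).mean()

    # R3F-style term: perturb input embeddings with small uniform noise and
    # penalize the symmetric KL divergence between clean and noisy outputs.
    embeds = model.get_input_embeddings()(inputs["input_ids"])
    noisy = embeds + torch.empty_like(embeds).uniform_(-eps, eps)
    noisy_out = model(inputs_embeds=noisy,
                      attention_mask=inputs["attention_mask"], labels=labels)
    p = F.log_softmax(out.logits, dim=-1)
    q = F.log_softmax(noisy_out.logits, dim=-1)
    sym_kl = (F.kl_div(p, q, log_target=True, reduction="batchmean")
              + F.kl_div(q, p, log_target=True, reduction="batchmean"))

    return ce_loss + cs_weight * cs_loss + r3f_weight * sym_kl
```

In such a setup, the combined loss would replace the default loss inside the training loop, while ROUGE-based validation and mid-epoch validation would presumably govern checkpoint selection rather than the loss itself.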
Results In the low-resource environments on Xsum and CNNDM, the +CS+Rval method performed best on both datasets, improving the average (R1, R2, R-L) scores over the baseline by (+4.95, +2.65, +4.38) and (+1.7, +0.42, +1.2), respectively. For the refinement process, +Rval+Midval and +Rval showed the best results among the methods, increasing the ROUGE scores by (+0.11, -0.04, +0.13) and (+0.30, +0.67, +0.47) from the initial point, respectively. In the Xsum full-data environment, the +CS method showed the largest improvement over the baseline, (+0.32, +0.36, +0.27). Finally, for full-data refinement, the +CS+R3F method achieved the best scores, with an improvement of (+0.95, +0.95, +1.17) after refinement.
URI
https://scholar.gist.ac.kr/handle/local/33342
Fulltext
http://gist.dcollection.net/common/orgView/200000905835
Access and License
  • Access type: Public
File List
  • No related files available.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.