OAK

Generative Data Augmentation Strategy Leveraging External Data for Abstractive Dialogue Summarization

Abstract
With the proliferation of digital communication, dialogue summarization has become increasingly important, yet the task still suffers from a shortage of training data. To address this issue, we developed the Generative Data Augmentation Strategy Leveraging External Data for Abstractive Dialogue Summarization (GENDEX), which rests on the hypothesis that texts describing people and their interpersonal interactions can serve as summaries of corresponding dialogues. We first select short texts that mention people and resolve coreferences to support better contextual analysis. We then identify the semantic roles of words within these texts and filter the texts based on patterns observed in dialogue summarization datasets. From the remaining texts, we generate synthetic dialogues through a controlled generation method. To make better use of the augmented data, we fine-tune the summarization model with noise-tolerant training. The experimental results demonstrate the effectiveness of the proposed method, showing robust performance, generalizability, and scalability. Moreover, GENDEX improved performance regardless of dialogue complexity.
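The first step described in the abstract, keeping short texts that mention people, can be illustrated with a minimal sketch. This is not the thesis implementation: the name list `KNOWN_NAMES` is a hypothetical stand-in for a real named-entity recognizer, and the cutoffs `MAX_WORDS` and `min_people` are assumed values chosen only for illustration.

```python
import re

# Hypothetical stand-in for a named-entity recognizer: a small set of known names.
KNOWN_NAMES = {"Amanda", "Jerry", "Tom", "Lisa"}
MAX_WORDS = 40  # assumed length cutoff, not a value from the thesis


def mentioned_people(text):
    """Return the known names that appear as whole alphabetic tokens in the text."""
    tokens = set(re.findall(r"[A-Za-z]+", text))
    return tokens & KNOWN_NAMES


def is_candidate_summary(text, min_people=2, max_words=MAX_WORDS):
    """Keep short texts that mention at least `min_people` distinct people.

    Requiring two people is an assumption meant to approximate
    "interpersonal interactions"; the source does not state a threshold.
    """
    if len(text.split()) > max_words:
        return False
    return len(mentioned_people(text)) >= min_people


def filter_candidates(texts):
    """Return the texts that pass the length and people checks, in order."""
    return [t for t in texts if is_candidate_summary(t)]
```

Under these assumptions, a sentence such as "Amanda baked cookies and will bring Jerry some tomorrow." would pass the filter, while a text with no recognized names, or one longer than the cutoff, would be dropped before the later coreference and semantic-role steps.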
Author(s)
Sangwon Park
Issued Date
2024
Type
Thesis
URI
https://scholar.gist.ac.kr/handle/local/19328
Alternative Author(s)
박상원
Department
Graduate School, AI Graduate School
Advisor
Lee, Hyunju
Degree
Master
Appears in Collections:
Department of AI Convergence > 3. Theses(Master)
Access and License
  • Access type: Public
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.