Contextual Data Augmentation with Label-Masked Language Model

Author(s)
Dongju Park
Type
Thesis
Degree
Master
Department
Graduate School, School of Electrical Engineering and Computer Science
Advisor
Ahn, Chang Wook
Abstract
Recently, Natural Language Processing (NLP) has been advancing rapidly with deep learning and outperforms humans in various fields. In most deep learning methodologies, large amounts of data help generalize the model and improve performance. In many cases, however, the available data are insufficient, resulting in incomplete learning or overfitting. Much research uses data augmentation techniques to alleviate this problem. In this thesis, we propose a self-supervised data augmentation method that takes the context of the data into account. The proposed method looks for words that can take the place of the original words by considering the context. For this, we use a Masked Language Model (MLM), which learns by hiding certain words within sentences and predicting the original words. However, since the MLM cannot incorporate the label information of the data, we propose a Label-Masked Language Model (LMLM) that can. The proposed method uses LMLM to perform self-supervised learning, then performs data augmentation via the trained model. Experimental results on text classification benchmark datasets show that the proposed method enhances the performance of recurrent neural network- and convolutional neural network-based classifiers.
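One plausible reading of the abstract's LMLM idea can be sketched in plain Python: prepend the class label as an extra token so that masked-word prediction is conditioned on the label, then replace the masked positions with model predictions to produce an augmented sentence. The function names (`make_lmlm_examples`, `augment`), the `[MASK]` token, the masking rate, and the label-as-prefix-token design are all illustrative assumptions, not details taken from the thesis itself.

```python
import random

MASK = "[MASK]"  # placeholder mask token (assumed, BERT-style)

def make_lmlm_examples(sentence, label, mask_prob=0.15, seed=0):
    """Build one label-conditioned masked-LM training example.

    Hypothetical sketch: the label is prepended as a token so a model
    trained on these inputs can condition word predictions on it.
    Returns the masked token sequence and (position, original word)
    targets for the masked slots.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append((i, tok))
        else:
            masked.append(tok)
    if not targets:  # guarantee at least one masked position
        i = rng.randrange(len(tokens))
        targets.append((i, tokens[i]))
        masked[i] = MASK
    return [label] + masked, targets

def augment(sentence, label, predict):
    """Produce an augmented sentence by filling masks with predictions.

    `predict(tokens, pos)` stands in for the trained LMLM: it returns a
    replacement word for position `pos` given the label-prefixed input.
    """
    inp, targets = make_lmlm_examples(sentence, label)
    out = inp[1:]  # drop the label prefix from the output sentence
    for i, _orig in targets:
        out[i] = predict(inp, i + 1)  # +1 accounts for the label token
    return " ".join(out)
```

In practice the `predict` callback would be a masked language model fine-tuned on such label-prefixed inputs; here it is left abstract so the data-construction step stays self-contained.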
URI
https://scholar.gist.ac.kr/handle/local/32840
Fulltext
http://gist.dcollection.net/common/orgView/200000908280
Access and License
  • Access type: Open
File List
  • No related files available.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.