Contextual Data Augmentation with Label-Masked Language Model

Author(s)
Dongju Park
Type
Thesis
Degree
Master
Department
Graduate School, School of Electrical Engineering and Computer Science
Advisor
Ahn, Chang Wook
Abstract
Recently, Natural Language Processing (NLP) has been advancing rapidly with deep learning and outperforms humans in various fields. In most deep learning methodologies, large amounts of data help generalize the model and improve performance. In many cases, however, the available data are insufficient, resulting in incomplete learning or overfitting. Much research uses data augmentation techniques to alleviate this problem. In this thesis, we propose a self-supervised data augmentation method that takes the context of the data into account. The proposed method looks for words that can take the place of the original words by considering the context. For this, we use a Masked Language Model (MLM), which learns by hiding certain words within sentences and predicting the original words. However, since the MLM cannot incorporate the label information of the data, we propose a Label-Masked Language Model (LMLM) that can. The proposed method uses LMLM to perform self-supervised learning, then performs data augmentation via the trained model. Experimental results on text classification benchmark datasets show that the proposed method enhances the performance of recurrent neural network- and convolutional neural network-based classifiers.
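One plausible reading of the abstract's LMLM idea can be sketched in plain Python: prepend the class label as an extra token so that masked-word prediction is conditioned on the label, then replace the masked positions with model predictions to produce an augmented sentence. The function names (`make_lmlm_examples`, `augment`), the `[MASK]` token, the masking rate, and the label-as-prefix-token design are all illustrative assumptions, not details taken from the thesis itself.

```python
import random

MASK = "[MASK]"  # placeholder mask token (assumed, BERT-style)

def make_lmlm_examples(sentence, label, mask_prob=0.15, seed=0):
    """Build one label-conditioned masked-LM training example.

    Hypothetical sketch: the label is prepended as a token so a model
    trained on these inputs can condition word predictions on it.
    Returns the masked token sequence and (position, original word)
    targets for the masked slots.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append((i, tok))
        else:
            masked.append(tok)
    if not targets:  # guarantee at least one masked position
        i = rng.randrange(len(tokens))
        targets.append((i, tokens[i]))
        masked[i] = MASK
    return [label] + masked, targets

def augment(sentence, label, predict):
    """Produce an augmented sentence by filling masks with predictions.

    `predict(tokens, pos)` stands in for the trained LMLM: it returns a
    replacement word for position `pos` given the label-prefixed input.
    """
    inp, targets = make_lmlm_examples(sentence, label)
    out = inp[1:]  # drop the label prefix from the output sentence
    for i, _orig in targets:
        out[i] = predict(inp, i + 1)  # +1 accounts for the label token
    return " ".join(out)
```

In practice the `predict` callback would be a masked language model fine-tuned on such label-prefixed inputs; here it is left abstract so the data-construction step stays self-contained.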
URI
https://scholar.gist.ac.kr/handle/local/32840
Fulltext
http://gist.dcollection.net/common/orgView/200000908280
Access and License
  • Access type: Open
File List
  • No related files available.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.