Development of Various Deep Learning Approaches for Low-Resource Text Classification
- Abstract
- Recent advances in deep learning have demonstrated the ability to encode large and complex datasets. However, when data is limited, the potential of deep learning is restricted, and models are often prone to overfitting. Data scarcity is a long-standing problem in artificial intelligence research and industry, and low-resource settings have recently attracted increasing attention. In this thesis, we explore various deep learning approaches for low-resource text classification. First, we propose transfer learning and multi-task learning approaches that use external data, applied to tasks of understanding the relationship between two sentences. Second, we propose an early stopping method that leverages unlabeled data. This allows all of the labeled data to be used for training, thereby improving prediction performance, particularly in low-resource settings. Third, we propose an initialization and early stopping method that uses only a small amount of labeled data, which is useful when no external data is available. We also identify several challenges that can arise in low-resource settings and provide in-depth discussions. We conduct extensive experiments on 16 public datasets covering tasks such as sentence and document classification and sentence-pair relationship classification. We compare our proposed approaches with various competitive methods, and the results show that our approaches achieve state-of-the-art performance on their respective tasks and mitigate the challenges of low-resource settings. This study provides empirical knowledge that will be useful for future work on low-resource data.
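The abstract's second contribution (early stopping without a held-out labeled validation set) can be illustrated with a minimal sketch. This is a hypothetical construction, not the thesis's actual method: it assumes the stopping signal is prediction stability on an unlabeled set, i.e., training stops once the model's predicted labels on unlabeled data stop changing between epochs, which frees all labeled examples for training.

```python
# Hypothetical illustration (not the thesis's actual criterion): early stopping
# driven by prediction stability on unlabeled data, so no labeled examples
# need to be held out for validation.

def agreement(prev, curr):
    """Fraction of unlabeled examples whose predicted labels match."""
    return sum(p == c for p, c in zip(prev, curr)) / len(prev)

def early_stop_epoch(pred_history, patience=2, threshold=0.99):
    """Return the epoch index at which to stop training.

    pred_history: list of per-epoch predicted labels on the unlabeled set.
    Stop once agreement with the previous epoch stays >= threshold for
    `patience` consecutive epochs; otherwise train to the last epoch.
    """
    streak = 0
    for t in range(1, len(pred_history)):
        if agreement(pred_history[t - 1], pred_history[t]) >= threshold:
            streak += 1
            if streak >= patience:
                return t
        else:
            streak = 0
    return len(pred_history) - 1  # predictions never stabilized
```

For example, if predictions on four unlabeled items evolve as `[0,1,1,0] → [0,1,1,1] → [0,1,1,1] → [0,1,1,1]`, the criterion fires at epoch 3, after two consecutive epochs of full agreement.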
- Author(s)
- Choi, HongSeok
- Issued Date
- 2023
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/19175
- Access and License
-
- File List
-
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.