
Relation-level Regularization for Recurrent Neural Networks

Abstract
In modern deep learning, two mainstream architectures handle sequential data: Transformers and Recurrent Neural Networks (RNNs). Despite their remarkable success, Transformers struggle with long-distance forecasting, partly due to the complex, non-deterministic temporal dynamics across time steps. This limitation is critical because sequential data is characterized by various relations. Here, we use the term "relation" for a set of input-output pairs that share a similar relationship across time steps, and such relations occur repeatedly in sequential data.
In contrast, RNNs can handle longer sequences without growing the model size, owing to their recursive nature. Intuitively, this recurrence should also let RNNs learn such relations better than Transformers. It would therefore be reasonable to prefer RNNs over Transformers on sequential data, yet in practice RNNs perform worse.
In this work, we point out that regularization for RNNs should better exploit these advantages so that RNNs can learn recurring relations in sequential data. We aim to strengthen these advantages while alleviating the disadvantages of RNNs, and we hypothesize that the underestimated performance of RNNs is partly due to their simple architecture and regularization methods.
We propose novel global and local regularization methods that separately regularize the cardinality of relations and the complexity of each relation. These regularization methods can be applied to any RNN-based model through a simple extension of its architecture. The extended model explicitly distinguishes recursively occurring relations and learns to choose a relation adaptively at every inference time step.
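The abstract does not give the exact formulation, so the following is only a minimal sketch of the described idea under assumptions: a bank of proto-LSTM cells with a relation loader that softly selects among them at each time step, plus two stand-in penalties for the global (cardinality of relations) and local (per-relation complexity) regularizers. All names (ProtoLSTMWithRelationLoader, relation_regularizers) and the specific loss forms are illustrative assumptions, not the thesis' definitions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProtoLSTMWithRelationLoader(nn.Module):
    """A bank of proto-LSTM cells; a relation loader softly selects among them
    at every time step, so recurring relations can be routed to the same cell."""
    def __init__(self, input_size, hidden_size, num_relations=4):
        super().__init__()
        self.cells = nn.ModuleList(
            [nn.LSTMCell(input_size, hidden_size) for _ in range(num_relations)]
        )
        # relation loader: scores each proto-cell from the current input and hidden state
        self.loader = nn.Linear(input_size + hidden_size, num_relations)

    def forward(self, x_seq):                      # x_seq: (batch, time, input_size)
        B, T, _ = x_seq.shape
        H = self.cells[0].hidden_size
        h = x_seq.new_zeros(B, H)
        c = x_seq.new_zeros(B, H)
        outputs, selections = [], []
        for t in range(T):
            x_t = x_seq[:, t]
            # soft selection over relations at this time step
            probs = F.softmax(self.loader(torch.cat([x_t, h], dim=-1)), dim=-1)
            cands = [cell(x_t, (h, c)) for cell in self.cells]
            h_cands = torch.stack([hc for hc, _ in cands], dim=1)   # (B, K, H)
            c_cands = torch.stack([cc for _, cc in cands], dim=1)   # (B, K, H)
            h = (probs.unsqueeze(-1) * h_cands).sum(dim=1)
            c = (probs.unsqueeze(-1) * c_cands).sum(dim=1)
            outputs.append(h)
            selections.append(probs)
        return torch.stack(outputs, dim=1), torch.stack(selections, dim=1)

def relation_regularizers(model, selections):
    """Assumed stand-ins: a global term tied to how many relations are used overall
    (entropy of the average selection distribution) and a local term tied to each
    proto-cell's own parameter norm."""
    avg_use = selections.mean(dim=(0, 1))                       # (num_relations,)
    cardinality = -(avg_use * (avg_use + 1e-8).log()).sum()     # global: relation cardinality
    complexity = sum(p.pow(2).sum() for cell in model.cells for p in cell.parameters())
    return cardinality, complexity

In this reading, the loader is what lets the model reuse the same proto-cell whenever a relation recurs, while the two penalties control, respectively, how many relations are effectively active and how complex each one is allowed to be.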
To analyze the proposed extended model and regularization methods, we designed a toy task based on Binary Counter datasets. The experimental results show that the extended architecture alone, proto-LSTM cells with a relation loader, improves robustness across hyper-parameter sweeps on longer predictions. When trained with the proposed regularization methods, the proto-LSTMs with a relation loader outperformed Transformer and other RNN-based baselines on longer predictions in both accuracy and robustness.
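For context, a rough sketch of what a Binary Counter toy task could look like; the abstract only names the dataset, so this particular construction (encode consecutive integers as fixed-width bit vectors and predict the next one) is an assumed interpretation for illustration, not the thesis' actual data generator.

import torch

def binary_counter_batch(batch_size=32, seq_len=16, num_bits=8, start_max=100):
    """Each sample is a run of consecutive integers as fixed-width bit vectors;
    the target at step t is the bit vector of the next integer."""
    starts = torch.randint(0, start_max, (batch_size, 1))
    counts = starts + torch.arange(seq_len + 1)                              # (B, T+1) consecutive ints
    bits = ((counts.unsqueeze(-1) >> torch.arange(num_bits)) & 1).float()    # (B, T+1, num_bits)
    return bits[:, :-1], bits[:, 1:]                                         # inputs, next-step targets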
Author(s)
Juhyeon Nam
Issued Date
2023
Type
Thesis
URI
https://scholar.gist.ac.kr/handle/local/19635
Alternative Author(s)
남주현
Department
Graduate School, AI Graduate School
Advisor
Kim, Kangil
Degree
Master
Appears in Collections:
Department of AI Convergence > 3. Theses(Master)
Access and License
  • Access type: Open
