
Relation-level Regularization for Recurrent Neural Networks

Abstract
In modern deep learning, two mainstream architectures handle sequential data: Transformers and Recurrent Neural Networks (RNNs). Despite their remarkable success, Transformers struggle with long-distance forecasting, partly due to the complex, non-deterministic temporal dynamics across time steps. This limitation is critical because sequential data is characterized by various relations. Here, we use the term "relation" for a set of input-output pairs that share a similar relationship across time steps, and such relations occur repeatedly in sequential data.
In contrast, RNNs can handle longer sequences without growing the model size, owing to their recursive nature. Intuitively, this recurrence should also let RNNs learn such relations better than Transformers. It would therefore be reasonable to prefer RNNs over Transformers on sequential data, yet in practice RNNs perform worse.
In this work, we point out that regularization for RNNs should better exploit these advantages so that RNNs can learn recurring relations in sequential data. We aim to strengthen these advantages while alleviating the disadvantages of RNNs, and we hypothesize that the underestimated performance of RNNs is partly due to their simple architecture and regularization methods.
We propose novel global and local regularization methods that separately regularize the cardinality of relations and the complexity of each relation. These regularization methods can be applied to any RNN-based model through a simple extension of its architecture. The extended model explicitly distinguishes recursively occurring relations and learns to choose a relation adaptively at every inference time step.
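The abstract does not give the exact formulation, so the following is only a minimal sketch of the described idea under assumptions: a bank of proto-LSTM cells with a relation loader that softly selects among them at each time step, plus two stand-in penalties for the global (cardinality of relations) and local (per-relation complexity) regularizers. All names (ProtoLSTMWithRelationLoader, relation_regularizers) and the specific loss forms are illustrative assumptions, not the thesis' definitions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProtoLSTMWithRelationLoader(nn.Module):
    """A bank of proto-LSTM cells; a relation loader softly selects among them
    at every time step, so recurring relations can be routed to the same cell."""
    def __init__(self, input_size, hidden_size, num_relations=4):
        super().__init__()
        self.cells = nn.ModuleList(
            [nn.LSTMCell(input_size, hidden_size) for _ in range(num_relations)]
        )
        # relation loader: scores each proto-cell from the current input and hidden state
        self.loader = nn.Linear(input_size + hidden_size, num_relations)

    def forward(self, x_seq):                      # x_seq: (batch, time, input_size)
        B, T, _ = x_seq.shape
        H = self.cells[0].hidden_size
        h = x_seq.new_zeros(B, H)
        c = x_seq.new_zeros(B, H)
        outputs, selections = [], []
        for t in range(T):
            x_t = x_seq[:, t]
            # soft selection over relations at this time step
            probs = F.softmax(self.loader(torch.cat([x_t, h], dim=-1)), dim=-1)
            cands = [cell(x_t, (h, c)) for cell in self.cells]
            h_cands = torch.stack([hc for hc, _ in cands], dim=1)   # (B, K, H)
            c_cands = torch.stack([cc for _, cc in cands], dim=1)   # (B, K, H)
            h = (probs.unsqueeze(-1) * h_cands).sum(dim=1)
            c = (probs.unsqueeze(-1) * c_cands).sum(dim=1)
            outputs.append(h)
            selections.append(probs)
        return torch.stack(outputs, dim=1), torch.stack(selections, dim=1)

def relation_regularizers(model, selections):
    """Assumed stand-ins: a global term tied to how many relations are used overall
    (entropy of the average selection distribution) and a local term tied to each
    proto-cell's own parameter norm."""
    avg_use = selections.mean(dim=(0, 1))                       # (num_relations,)
    cardinality = -(avg_use * (avg_use + 1e-8).log()).sum()     # global: relation cardinality
    complexity = sum(p.pow(2).sum() for cell in model.cells for p in cell.parameters())
    return cardinality, complexity

In this reading, the loader is what lets the model reuse the same proto-cell whenever a relation recurs, while the two penalties control, respectively, how many relations are effectively active and how complex each one is allowed to be.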
To analyze the proposed extended model and regularization methods, we designed a toy task based on Binary Counter datasets. The experimental results show that the extended architecture alone, proto-LSTM cells with a relation loader, improves robustness across hyper-parameter sweeps on longer predictions. When trained with the proposed regularization methods, the proto-LSTMs with a relation loader outperformed Transformer and other RNN-based baselines on longer predictions in both accuracy and robustness.
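For context, a rough sketch of what a Binary Counter toy task could look like; the abstract only names the dataset, so this particular construction (encode consecutive integers as fixed-width bit vectors and predict the next one) is an assumed interpretation for illustration, not the thesis' actual data generator.

import torch

def binary_counter_batch(batch_size=32, seq_len=16, num_bits=8, start_max=100):
    """Each sample is a run of consecutive integers as fixed-width bit vectors;
    the target at step t is the bit vector of the next integer."""
    starts = torch.randint(0, start_max, (batch_size, 1))
    counts = starts + torch.arange(seq_len + 1)                              # (B, T+1) consecutive ints
    bits = ((counts.unsqueeze(-1) >> torch.arange(num_bits)) & 1).float()    # (B, T+1, num_bits)
    return bits[:, :-1], bits[:, 1:]                                         # inputs, next-step targets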
Author(s)
Juhyeon Nam
Issued Date
2023
Type
Thesis
URI
https://scholar.gist.ac.kr/handle/local/19635
Alternative Author(s)
남주현
Department
Graduate School, AI Graduate School
Advisor
Kim, Kangil
Degree
Master
Appears in Collections:
Department of AI Convergence > 3. Theses(Master)
Access and License
  • Access type: Open
