3D Convolutional Network with Attention Gate for Action Recognition in Videos
- Abstract
- Human action recognition is an active field in computer vision. Most approaches build on well-developed image recognition algorithms that use convolutional neural networks (CNNs) or recurrent neural networks (RNNs). However, these methods are computationally expensive and complex to build when they must learn spatio-temporal dependencies. Action recognition is considered more challenging than image recognition because a video is a sequence of images that changes from frame to frame, so the model has to handle spatial and temporal information simultaneously. Recently proposed methods based on two-stream fusion perform well on such tasks. This thesis proposes a simple yet efficient deep neural network architecture, Gated 3D-CNN, which consists of 3D convolutional layers and gating modules that act like an LSTM model to learn spatial and temporal dependencies and to attend to essential features. The proposed method first learns spatial and temporal features of actions through a 3D-CNN; the sigmoid-gated 3D convolution layers then help direct attention to the essential features of the action. The proposed architecture is comparatively simple to implement and gives competitive performance on the UCF-101 dataset.
- Author(s)
- Labina Shrestha
- Issued Date
- 2022
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/18810
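The sigmoid-gated 3D convolution mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact gating formulation, so the code below assumes a common form in which one 3D convolution produces features and a second 3D convolution, passed through a sigmoid, produces a gate in (0, 1) that scales those features element-wise. The function names (`conv3d`, `gated_conv3d`), the single-channel input, and the unpadded stride-1 convolution are all illustrative assumptions, not the thesis's implementation.

```python
import math


def conv3d(x, w):
    """Stride-1, unpadded 3D cross-correlation of a single-channel clip.

    x is a nested list indexed [time][height][width];
    w is a nested-list kernel indexed the same way.
    """
    T, H, W = len(x), len(x[0]), len(x[0][0])
    kt, kh, kw = len(w), len(w[0]), len(w[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                s = 0.0
                for a in range(kt):
                    for b in range(kh):
                        for c in range(kw):
                            s += x[t + a][i + b][j + c] * w[a][b][c]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gated_conv3d(x, w_feat, w_gate):
    """Assumed gating form: conv3d(x, w_feat) * sigmoid(conv3d(x, w_gate))."""
    feat = conv3d(x, w_feat)
    gate = conv3d(x, w_gate)
    return [
        [[f * sigmoid(g) for f, g in zip(f_row, g_row)]
         for f_row, g_row in zip(f_plane, g_plane)]
        for f_plane, g_plane in zip(feat, gate)
    ]


# Toy clip: 3 frames of 3x3 pixels, all ones.
clip = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
w_feat = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]  # sums a 2x2x2 window
w_gate = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]  # gate fixed at sigmoid(0) = 0.5
out = gated_conv3d(clip, w_feat, w_gate)
```

In this toy case every feature value is 8.0 (the sum over a 2x2x2 window of ones) and every gate value is 0.5, so each output is 4.0. In a trained network the gate weights would be learned, letting the gate suppress or pass features at each spatio-temporal position.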
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.