Abstract: Human action recognition is an active research area in computer vision. Most approaches build on well-established image recognition methods based on convolutional neural networks (CNNs) or recurrent neural networks (RNNs). However, these methods are computationally expensive, and models that learn spatio-temporal dependencies are complex to build. Action recognition is considered more challenging than image recognition because a video is a sequence of images that changes from frame to frame, so the model must handle spatial and temporal information simultaneously. Recently proposed methods based on two-stream fusion perform well on such tasks. This thesis proposes a simple yet efficient deep neural network architecture, Gated 3D-CNN, which combines 3D convolutional layers with gating modules that act like an LSTM to learn spatial and temporal dependencies and to give attention to essential features. The proposed method first learns spatial and temporal features of actions through a 3D-CNN; sigmoid-gated 3D convolution layers then focus attention on the essential features of the action. The proposed architecture is comparatively simple to implement and achieves competitive performance on the UCF-101 dataset.
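The abstract describes the sigmoid-gated 3D convolution only at a high level. The sketch below is a minimal, hypothetical illustration of one common way to build such a layer (a GLU-style gate): one 3D convolution produces features, a second 3D convolution produces gate logits, and the sigmoid of each logit scales the matching feature between 0 and 1, much like an LSTM gate. The function names `conv3d` and `gated_conv3d`, the single channel, the stride of 1, the absence of padding, and the multiplicative form of the gate are all assumptions for illustration, not details taken from the thesis.

```python
import math
from typing import List

# A clip as a (frames, height, width) nested list; single channel (assumption).
Tensor3 = List[List[List[float]]]


def conv3d(x: Tensor3, k: Tensor3, bias: float = 0.0) -> Tensor3:
    """Valid (no padding), stride-1 3D cross-correlation over time, height, width."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    kt, kh, kw = len(k), len(k[0]), len(k[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                s = bias
                # Sum over a kt x kh x kw window spanning several frames,
                # so the output mixes spatial and temporal information.
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            s += x[t + dt][i + di][j + dj] * k[dt][di][dj]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gated_conv3d(x: Tensor3, k_feat: Tensor3, k_gate: Tensor3,
                 b_feat: float = 0.0, b_gate: float = 0.0) -> Tensor3:
    """Hypothetical gated layer: features scaled elementwise by sigmoid(gate)."""
    feat = conv3d(x, k_feat, b_feat)
    gate = conv3d(x, k_gate, b_gate)
    return [[[f * sigmoid(g) for f, g in zip(fr, gr)]
             for fr, gr in zip(fp, gp)]
            for fp, gp in zip(feat, gate)]


# Toy clip: 4 frames of 3x3 pixels; every pixel in frame t has value t.
clip = [[[float(t)] * 3 for _ in range(3)] for t in range(4)]
# Feature kernel responds to change between consecutive frames (temporal difference).
k_feat = [[[-1.0, -1.0], [-1.0, -1.0]], [[1.0, 1.0], [1.0, 1.0]]]
# Zero gate kernel: the gate then depends only on its bias.
k_gate = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]

y = gated_conv3d(clip, k_feat, k_gate)  # shape (3, 2, 2); each value is 4 * sigmoid(0) = 2.0
y_closed = gated_conv3d(clip, k_feat, k_gate, b_gate=-10.0)  # gate nearly closed
```

With a gate bias of 0 the layer passes half of each feature response through; a strongly negative gate bias suppresses the response almost entirely. In a trained network the gate kernel would be learned, letting the layer emphasize informative spatio-temporal regions and damp the rest.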

Appears in Collections: Department of Electrical Engineering and Computer Science > 3. Theses (Master)

OAK GIST Scholar was built as part of the National Library of Korea's OAK Repository dissemination project.