Action Recognition in Unconstrained Condition and Its Application
- Abstract
- Human action recognition, which aims to classify human actions from data, has become one of the most popular research topics in recent years. Having achieved promising progress in computer vision, it plays an increasingly crucial role in many potential applications that affect all areas of human life, such as video surveillance, human biometrics, human-computer interaction, and video retrieval. Despite these ongoing successes, however, most existing methods can classify only the domains on which they were trained. For instance, a model will not be able to classify actions that were not seen in the training step. A model is also likely to confuse actions when the given data is noisy, leading to inaccuracies. To address these issues, we propose the following three approaches: 1) feature extraction and distance metric learning, 2) Spatio-Temporal Representation Matching (STRM) for joint learning of appearance and motion, and 3) a Predictively Encoded Graph Convolutional Network (PeGCN) based on shared information. The first proposed framework consists of two models: 1) an ST-feature extraction model and 2) a verification model. The ST-feature extraction model extracts discriminative ST features from a given video clip. With these features, the verification model computes the similarity between them to examine their class identities and determine whether their classes are identical. The experimental results show that the proposed framework outperforms other action recognition methods under the unconstrained condition. The STRM extracts spatio-temporal representations from video clips through a joint learning pipeline that uses both motion and appearance information. The STRM then computes the similarities between the ST representations to find the one with the highest similarity. We set an experimental protocol for open-set action recognition and carried out experiments on UCF101 and HMDB51 to evaluate the STRM.
The experimental results showed that the proposed method not only outperformed existing methods under the open-set condition, but also provided performance comparable to state-of-the-art methods under the closed-set condition. The PeGCN learns to improve its representation of noisy skeletons by predicting complete samples from noisy samples in latent space during training. The key insight of our approach is to train a model by maximizing the mutual information between normal and noisy skeletons using predictive coding in the latent space. The PeGCN increases the flexibility of GCNs and is better suited to action recognition tasks that use skeleton features. We conducted comprehensive skeleton-based action recognition experiments with defective skeletons using the NTU-RGB+D and Kinetics-Skeleton datasets. The experimental results demonstrate that when the skeleton samples are noisy, our approach achieves outstanding performance compared with the existing state-of-the-art methods.
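The abstract's verification model decides whether two clips share a class by comparing their extracted ST features. The thesis does not specify the similarity function here, so the sketch below is a minimal illustration using cosine similarity with a hypothetical decision threshold; the function names `cosine_similarity` and `verify` are assumptions for illustration, not the author's implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(feat_a: np.ndarray, feat_b: np.ndarray, threshold: float = 0.5) -> bool:
    """Declare two clips same-class if their ST features are similar enough.

    `threshold` is a hypothetical value; in practice it would be tuned
    on a validation split.
    """
    return cosine_similarity(feat_a, feat_b) >= threshold
```

Because verification only asks "same class or not?", it can match a query clip against reference clips of classes never seen during training, which is what enables recognition under the unconstrained (open-set) condition.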
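The PeGCN objective described above, maximizing mutual information between normal and noisy skeletons via predictive coding in latent space, is commonly realized with an InfoNCE-style contrastive bound. The sketch below is an assumed NumPy illustration of that general technique, not the thesis's actual loss: each clean embedding is paired with its noisy counterpart as a positive, with the other samples in the batch as negatives.

```python
import numpy as np

def info_nce_loss(clean_emb: np.ndarray, noisy_emb: np.ndarray,
                  temperature: float = 0.1) -> float:
    """InfoNCE lower bound on mutual information between paired embeddings.

    clean_emb, noisy_emb: (N, D) arrays; row i of each forms a positive
    pair, and all other rows act as negatives. `temperature` is a
    hypothetical hyperparameter.
    """
    # L2-normalize so dot products become cosine similarities
    c = clean_emb / np.linalg.norm(clean_emb, axis=1, keepdims=True)
    n = noisy_emb / np.linalg.norm(noisy_emb, axis=1, keepdims=True)
    logits = c @ n.T / temperature               # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives lie on the diagonal; minimizing this loss maximizes the bound
    return float(-np.mean(np.diag(log_prob)))
```

Minimizing this loss pushes the encoder to predict the clean latent from the noisy one, which is why the learned representation stays discriminative when skeleton joints are missing or corrupted.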
- Author(s)
- Yongsang Yoon
- Issued Date
- 2022
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/18829
- Access and License
- File List
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.