Adversarial Continual Learning to Transfer Self-Supervised Speech Representations for Voice Pathology Detection

Abstract
In recent years, voice pathology detection (VPD) has received considerable attention because of the increasing risk of voice problems. Several methods, such as support vector machine and convolutional neural network-based models, achieve good VPD performance. To further improve the performance, we use a self-supervised pretrained model as feature representation instead of explicit speech features. When the pretrained model is fine-tuned for VPD, an overfitting problem occurs due to a domain shift from conversation speech to the VPD task. To mitigate this problem, we propose an adversarial task adaptive pretraining (A-TAPT) approach by incorporating adversarial regularization during the continual learning process. Experiments on VPD using the Saarbrucken Voice Database show that the proposed A-TAPT improves the unweighted average recall (UAR) by an absolute increase of 12.36% and 15.38% compared with SVM and ResNet50, respectively. It is also shown that the proposed A-TAPT achieves a UAR that is 2.77% higher than that of conventional TAPT learning.
Author(s)
Park, Dongkeon; Yu, Yechan; Katabi, Dina; Kim, Hong Kook
Issued Date
2023-07
Type
Article
DOI
10.1109/LSP.2023.3298532
URI
https://scholar.gist.ac.kr/handle/local/10111
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Citation
IEEE SIGNAL PROCESSING LETTERS, v.30, pp.932 - 936
ISSN
1070-9908
Appears in Collections:
Department of Electrical Engineering and Computer Science > 1. Journal Articles
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.