<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="https://scholar.gist.ac.kr/handle/local/7917">
    <title>Repository Collection:</title>
    <link>https://scholar.gist.ac.kr/handle/local/7917</link>
    <description />
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="https://scholar.gist.ac.kr/handle/local/19852" />
        <rdf:li rdf:resource="https://scholar.gist.ac.kr/handle/local/19758" />
        <rdf:li rdf:resource="https://scholar.gist.ac.kr/handle/local/19750" />
        <rdf:li rdf:resource="https://scholar.gist.ac.kr/handle/local/19665" />
      </rdf:Seq>
    </items>
    <dc:date>2025-12-08T07:47:04Z</dc:date>
  </channel>
  <item rdf:about="https://scholar.gist.ac.kr/handle/local/19852">
    <title>Training Strategies for End-to-End Noise-Robust Speech Recognition</title>
    <link>https://scholar.gist.ac.kr/handle/local/19852</link>
    <description>Title: Training Strategies for End-to-End Noise-Robust Speech Recognition
Author(s): Geon Woo Lee
Abstract: Automatic speech recognition (ASR) systems convert speech audio signals into text and are widely used in various applications. Traditional ASR consists of an acoustic model (AM) for extracting speech features and a language model (LM) for grammar and lexicon information. Recently, end-to-end (E2E) ASR models using neural networks (NN) have outperformed modular-based architectures. However, these models often perform poorly in low signal-to-noise ratio (SNR) conditions, as they are typically developed in high SNR environments. Speech enhancement (SE) or feature enhancement modules have been studied to improve low SNR performance, but they can introduce artifacts that increase error rates. Alternatively, multi-condition training (MCT) and noise-aware training (NAT) use acoustic noise as a model condition. While MCT is simple and efficient, it has limitations in low SNR conditions. Joint training of SE and ASR models has been proposed to address these issues, but conflicting gradients and frame mismatch problems make performance improvement challenging. This dissertation proposes training approaches to mitigate these joint training problems and enhance ASR performance.

First, to prevent the distinct tasks of the SE and ASR models from conflicting with each other, a training approach that separates the training procedure is proposed. The proposed approach consists of two steps. In the first step, with the parameters of the ASR model frozen, only the parameters of the SE model are updated using an objective function for speech quality. During this step, a regularization term is applied using feature vectors extracted from the ASR encoder. In the second step, the parameters of both models are updated using the objective functions of SE and ASR.
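The two-step schedule can be sketched as follows; the parameter-group names, the switch point, and the regularization weight `lam` are illustrative assumptions, not values from the dissertation:

```python
# Minimal sketch of the two-step separated training schedule; the group names,
# `switch_step`, and the weight `lam` are illustrative assumptions.

def trainable_groups(step, switch_step):
    """Which parameter groups receive gradient updates at a given step."""
    if step >= switch_step:
        return {"se": True, "asr": True}   # step 2: joint SE/ASR fine-tuning
    return {"se": True, "asr": False}      # step 1: ASR frozen, SE only

def total_loss(se_loss, asr_loss, reg_loss, step, switch_step, lam=0.1):
    """Objective per step: speech-quality loss plus an ASR-encoder feature
    regularizer in step 1, then the joint SE and ASR objectives in step 2."""
    if step >= switch_step:
        return se_loss + asr_loss
    return se_loss + lam * reg_loss
```

Separating the schedule this way means the ASR gradients never flow while the SE model is still adapting, which is the mechanism the abstract credits for avoiding task conflict.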

Second, to address the conflicting gradient and frame mismatch problems, interpreting the pipeline of SE and ASR models as a teacher-student model is proposed. That is, the ASR model serves as the teacher model to leverage its linguistic knowledge, and the SE model, as the student, is trained using that fine-grained information. In addition, to transfer frame-wise linguistic information, an acoustic tokenizer is employed as a surrogate model. The acoustic tokenizer is optimized to predict cluster assignments obtained by k-means clustering of the latent vectors of the ASR encoder. The optimized acoustic tokenizer and the ASR encoder, acting as teacher models, transfer linguistic information to the SE model and update its parameters.
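A minimal sketch of the token-assignment step, assuming centroids have already been obtained by k-means over ASR-encoder latents (the dissertation trains a surrogate tokenizer to predict these assignments):

```python
# Sketch of acoustic-token assignment: frame-level ASR-encoder latents are
# mapped to discrete tokens by nearest k-means centroid. The centroids here
# are illustrative; in the dissertation a trained tokenizer predicts labels.

def assign_tokens(latents, centroids):
    """Assign each latent vector to the index of its nearest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in latents]
```

With centroids [[0, 0], [1, 1]], the latents [[0.1, 0.2], [0.9, 1.1]] map to tokens [0, 1]; per-frame token targets of this kind are what the tokenizer, and in turn the SE student, learn to predict.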

Finally, to mitigate the limitations of the cross-entropy loss used in the acoustic tokenizer, a pairwise distance-based loss function is proposed. In addition, to enhance the contextual representation, a contrastive learning-based relational representation between acoustic tokens and their sequences is proposed. First, samples in the same/different clusters of the acoustic tokenizer are defined as positive/negative samples, and a cluster-based pairwise distance loss is applied to optimize the tokenizer. Then, for contextual representation, contrastive learning is utilized to match the relationship between acoustic tokens and the token sequences extracted from the acoustic tokenizer.
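One plausible instantiation of such a cluster-based pairwise distance loss pulls same-cluster pairs together and pushes different-cluster pairs apart up to a margin; the margin value and exact functional form are assumptions for illustration, not the dissertation's formulation:

```python
import math

# Illustrative cluster-based pairwise distance loss; margin and form assumed.

def pairwise_cluster_loss(embeddings, clusters, margin=1.0):
    """Average over all pairs: same-cluster (positive) pairs contribute their
    squared distance; different-cluster (negative) pairs a squared hinge that
    pushes them at least `margin` apart."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    loss, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = dist(embeddings[i], embeddings[j])
            if clusters[i] == clusters[j]:
                loss += d ** 2                      # pull positives together
            else:
                loss += max(0.0, margin - d) ** 2   # push negatives apart
            pairs += 1
    return loss / pairs
```

Unlike cross-entropy over hard cluster labels, a pairwise formulation of this kind is sensitive to distances in the embedding space itself.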

The proposed training approaches were evaluated for ASR and SE performance using simulated noisy environments and a real-world audio dataset. For ASR, the proposed training approaches for noise-robust ASR achieved lower word error rates (WER) than conventional training approaches. Moreover, for SE, the proposed training approaches that interpret the pipeline as a teacher-student model achieved improved results in speech quality-related metrics compared to a separately trained SE model. These results indicate that the conflicting gradient and frame mismatch problems were addressed. Furthermore, comprehensive performance evaluations were conducted to verify the effectiveness of the proposed training approaches across different SE and ASR model architectures, where they consistently outperformed the conventional training approach as well.</description>
    <dc:date>2023-12-31T15:00:00Z</dc:date>
  </item>
  <item rdf:about="https://scholar.gist.ac.kr/handle/local/19758">
    <title>Study on Unsupervised Learning and Cyber Threat Detection in Industrial Control Systems</title>
    <link>https://scholar.gist.ac.kr/handle/local/19758</link>
    <description>Title: Study on Unsupervised Learning and Cyber Threat Detection in Industrial Control Systems
Author(s): Woo-Hyun Choi
Abstract: Industrial Control Systems (ICS) are essential for managing critical infrastructure in sectors such as manufacturing and energy. The increasing connectivity of these systems to networks has heightened their vulnerability to cybersecurity risks. Originally designed for isolated environments, ICS are now exposed to significant threats, as evidenced by incidents like the Stuxnet attack, which resulted in physical damage to critical infrastructure. Unlike traditional IT systems, ICS control physical processes, meaning cyberattacks can have tangible, real-world impacts. A significant challenge in ICS security is the complexity and heterogeneity of these systems, which comprise diverse devices and proprietary protocols. The requirement for continuous operation limits opportunities for security updates, reducing the effectiveness of conventional IT security measures.
As a result, there is growing interest in exploring advanced approaches, including machine learning and unsupervised learning techniques, for real-time detection of unknown threats and anomalies. This dissertation examines two key topics. The first examines anomaly detection in ICS using unsupervised machine learning. The study investigates a composite autoencoder model for identifying anomalous behavior without pre-labeled data. Utilizing a dataset from HIL-based Augmented ICS (HAI), the research analyzes the model's capacity to detect anomalies related to both value and time. This approach aims to contribute to the ongoing efforts to enhance system reliability and operational efficiency in ICS environments. The second focuses on a Zero-Inflated Poisson (ZIP) based GRU learning model for anomaly detection in ICS traffic, in conjunction with the MITRE ATT&amp;CK framework. The model's performance was evaluated through simulations of two major cyberattack scenarios, Stuxnet and Industroyer. By mapping detected anomalies to the MITRE ATT&amp;CK framework, the study seeks to contribute to the development of more informed response strategies for such attacks. This research addresses the ongoing challenges in ICS security and studies potential approaches to enhance the protection of these critical systems against evolving cyber threats.</description>
    <dc:date>2024-12-31T15:00:00Z</dc:date>
  </item>
  <item rdf:about="https://scholar.gist.ac.kr/handle/local/19750">
    <title>Study on Probabilistic Graphical Modeling and Continual Learning for Cyber Security Defense</title>
    <link>https://scholar.gist.ac.kr/handle/local/19750</link>
    <description>Title: Study on Probabilistic Graphical Modeling and Continual Learning for Cyber Security Defense
Author(s): Hyejin Kim
Abstract: With the development of information and communication technology, modern society is hyper-connected through the Internet. Although this hyper-connected society makes our lives more convenient and efficient, the applications and data we use contain faults or vulnerabilities that can be exploited for cyberattacks. In addition, recent cybersecurity threats have advanced significantly to bypass or nullify conventional security defense systems. To counter advanced cyberattacks, proactive defense techniques and artificial intelligence (AI) based cybersecurity applications are considered promising solutions. In particular, AI-based cybersecurity applications are required to properly update and retain threat knowledge, as new types of vulnerabilities and cyber threats continue to emerge over time. In this dissertation, I propose an efficient proactive defense technique, a novel continual learning (CL) method, and CL-based cybersecurity applications for cybersecurity defense.
In the first part of this dissertation, I develop a time-based moving target defense (MTD) using Bayesian attack graph (BAG) analysis. The MTD is a proactive cybersecurity defense technique that constantly changes potentially vulnerable attack points, making it difficult for attackers to infer the system configuration and nullifying reconnaissance activities against a victim system. I consider an MTD strategy for a software-defined networking (SDN) environment where every SDN switch is controlled by a central SDN controller. As the MTD may incur excessive usage of the network/system resources for cybersecurity purposes, I propose to perform the MTD operations adaptively according to the security risk assessment based on a BAG analysis. For accurate BAG analysis, I model random and weakest-first attack behaviors and incorporate the derived analytical models into the BAG analysis. Using the BAG analysis result, I formulate a knapsack problem to determine the optimal set of vulnerabilities to be reconfigured under a constraint of SDN reconfiguration overhead. The experimental results show that the proposed MTD strategy outperforms the full MTD and random MTD counterparts in terms of the maximum/average of attack success probabilities and the number of SDN reconfiguration updates.
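The vulnerability-selection step can be sketched as a standard 0/1 knapsack; the risk-reduction values and costs below are illustrative, standing in for quantities derived from the BAG analysis:

```python
# Illustrative 0/1 knapsack for choosing which vulnerabilities to reconfigure.
# risk_reduction[i] stands in for the attack-success-probability reduction
# from the BAG analysis; reconfig_cost[i] for SDN reconfiguration overhead.

def select_vulnerabilities(risk_reduction, reconfig_cost, budget):
    """Return (best total risk reduction, chosen vulnerability indices)
    under the reconfiguration-overhead budget."""
    # dp[w]: best (value, chosen set) achievable with capacity w
    dp = [(0, frozenset()) for _ in range(budget + 1)]
    for i in range(len(risk_reduction)):
        # descending capacity loop so each vulnerability is used at most once
        for w in range(budget, reconfig_cost[i] - 1, -1):
            cand = dp[w - reconfig_cost[i]][0] + risk_reduction[i]
            if cand > dp[w][0]:
                dp[w] = (cand, dp[w - reconfig_cost[i]][1] | {i})
    return dp[budget]
```

For example, with risk reductions [6, 5, 4], costs [3, 2, 2], and budget 4, the optimal choice is vulnerabilities {1, 2} with total reduction 9.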
In the second part of this dissertation, I develop a CL method (called CLiCK), a hybrid of an architecture-based approach that increments a model when it detects that the dataset characteristics have changed significantly, and a rehearsal-based approach that exploits an episodic memory to store past dataset samples. The proposed CLiCK makes the final decision by taking an ensemble over the inference results of the current model and a series of past models. A novelty of CLiCK is the concept of a slack class, an auxiliary class representing unseen or undetermined classes that do not belong to the current dataset. Because models trained with a slack class can differentiate between the classes they were trained on and unseen classes, the inference results of models that have no knowledge about an input can be automatically neglected in the final decision. Our experiments show that the proposed CLiCK achieves performance comparable to joint learning, which uses the entire dataset for each task, in domain-incremental scenarios on the MNIST dataset. In class-incremental scenarios on the MNIST and CIFAR-100 datasets, CLiCK significantly outperforms other existing CL methods.
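The slack-class ensemble rule can be sketched as follows; the (label, confidence) tuple format and confidence-based selection are assumptions for illustration, not the dissertation's exact design:

```python
# Sketch of the slack-class ensemble decision rule.

SLACK = "slack"  # auxiliary class for unseen or undetermined inputs

def ensemble_decision(model_predictions):
    """Each prediction is a (label, confidence) pair from one model in the
    series. A model that predicts the slack class is saying the input lies
    outside its training data, so its vote is neglected; the final label is
    the most confident vote among the remaining models."""
    votes = [(label, conf) for label, conf in model_predictions if label != SLACK]
    if not votes:
        return SLACK  # no model in the series recognizes this input
    return max(votes, key=lambda v: v[1])[0]
```

This is the mechanism by which models without knowledge of an input are automatically excluded from the final decision.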
In the third part of this dissertation, I focus on a CL-based intrusion detection method for cyber security. Deep learning-based cybersecurity applications should be able to continually accumulate threat knowledge for new types of threats over time while maintaining the knowledge of threats already exposed to the application. Therefore, I propose episodic memory management for continual learning with network intrusion datasets. For new attacks, the number of samples may not be sufficiently large for training, and thus the memory management algorithm should retain as many samples as possible instead of random sampling in the episodic memory for continual learning. The experiment results indicated that the proposed algorithm outperforms offline learning in terms of average per-class accuracy in a continual scenario with a network intrusion dataset.</description>
    <dc:date>2023-12-31T15:00:00Z</dc:date>
  </item>
  <item rdf:about="https://scholar.gist.ac.kr/handle/local/19665">
    <title>Similarity-based Deep Learning and Computer Vision Methods for Registration of Coronary Angiogram-Fluoroscopy</title>
    <link>https://scholar.gist.ac.kr/handle/local/19665</link>
    <description>Title: Similarity-based Deep Learning and Computer Vision Methods for Registration of Coronary Angiogram-Fluoroscopy
Author(s): Changhyeon Kim
Abstract: Cardiovascular disease (CVD) is the second leading cause of death in Korea and the leading cause of death worldwide. CVD primarily affects the heart and major arteries, including the coronary arteries, which supply blood to the heart muscle. Coronary artery disease, a type of CVD, involves the buildup of cholesterol and fatty deposits in artery walls, leading to narrowed or blocked arteries that hinder blood flow. This condition can result in myocardial infarction, where heart muscle cells die due to lack of blood, or angina pectoris, characterized by arteries narrowed by over 70% without tissue death. In severe cases, these conditions can cause sudden death from a heart attack. Percutaneous Coronary Intervention (PCI) is a crucial treatment and diagnostic method for this disease. PCI involves using guidewires and stents to reopen blood vessels and restore normal blood flow, requiring high precision and expertise. During PCI, practitioners must manage dual-monitor imaging of angiography and real-time fluoroscopy, which involves significant radiation exposure due to contrast agents. To address these challenges, the dynamic coronary roadmap (DCR) has been developed. By using ECG-gating, DCR matches angiographic images with real-time fluoroscopic images to assist surgeons. Recently, deep learning-based registration methods have been developed to cover patients with irregular heartbeats without using electrocardiograms. These methods use transfer learning to match images, tailored specifically to each patient. However, they require manual labeling of blood vessels before live fluoroscopy. Moreover, the transfer learning takes about ten minutes, which may not be practical in a real clinical setting. To reduce the transfer learning time, this thesis proposes a new efficient transfer learning method that utilizes blood vessel similarities between the angiograms of a pre-trained model and those of the patient of interest.
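The similarity computation can be sketched as a PCA projection followed by cosine similarity, assuming vessel features and principal components are already available as plain vectors (feature extraction and the PCA fit itself are omitted):

```python
import math

# Minimal sketch: vessel feature vectors are projected onto precomputed
# principal components, then compared by cosine similarity. All vectors
# here are illustrative placeholders.

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def project(feat, components):
    """Project a feature vector onto each principal component."""
    return [sum(f * c for f, c in zip(feat, comp)) for comp in components]

def vessel_similarity(feat_a, feat_b, components):
    """Cosine similarity of two vessel feature vectors in PCA space."""
    return cosine_similarity(project(feat_a, components), project(feat_b, components))
```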
Similarities are computed by principal component analysis (PCA) and cosine similarity. In addition, a residual U-Net architecture, employing residual blocks and leaky ReLU activation, is proposed to accelerate the learning process. Together, these roughly halve the transfer learning time (from about 10 to 5 minutes). To achieve even faster registration, this thesis also proposes an alternative, traditional computer vision method. In this method, blood vessels are segmented by conventional computer vision techniques such as Frangi filtering, combined with careful inlet matching between the diagnostic catheter in the angiogram and the guide catheter in the fluoroscopy. An overlap similarity is then used to best match the guidewire to the segmented blood vessel, with a final refinement of the two registered images by the RANSAC algorithm. This conventional method can register two X-ray images within one minute, with registration errors comparable to previous methods across diverse clinical data. These technologies could serve as appropriate visual guidance during coronary interventions.</description>
    <dc:date>2023-12-31T15:00:00Z</dc:date>
  </item>
</rdf:RDF>

