Development of feature selection methods for cancer diagnosis and prognosis using omics data

Euiyoung Oh

School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology

Author(s): 오의영 (Euiyoung Oh)

Type: Thesis

Degree: Doctor

Department: Graduate School, School of Electrical Engineering and Computer Science

Advisor: Lee, Hyunju

Abstract: In cancer diagnosis and prognosis, identifying relevant feature subsets is essential both for improving the performance of prediction models and for determining biomarkers. In this dissertation, we take two approaches to identifying significant genes associated with various types of cancer: (1) exploring the prognostic efficacy of both tumor and tumor-adjacent normal tissues, and (2) developing a deep neural network architecture for feature selection. In the first part of the dissertation, we examine the prognostic efficacy of transcriptomic data from both tumor and adjacent normal tissues, using The Cancer Genome Atlas (TCGA) dataset. Applying Cox regression models for prognostic analysis and machine learning models for survival prediction, the study demonstrates that for kidney, liver, and head and neck cancers, adjacent normal tissues exhibit a higher proportion of prognostic genes and outperform tumor tissues in survival prediction accuracy. Moreover, a distance correlation-based feature selection method was applied to external datasets for kidney and liver cancer, further confirming that genes selected from adjacent normal tissues consistently yielded larger improvements in prediction performance than those selected from tumor tissues. These findings suggest that adjacent normal tissues may provide valuable insights into cancer prognosis, positioning them as potential targets for biomarker discovery.

In the second part of the dissertation, we present a novel machine learning-based feature selection method called "Deep neural network with PaIrwise connected layers integrated with stochastic Gates" (DeepPIG), which addresses the challenge of selecting relevant features from complex omics data, particularly when feature signals are weak. Built upon the knockoff filter framework, DeepPIG is designed to increase the power to detect relevant features without exceeding the target false discovery rate (FDR). In comparison with baseline and recent methods, such as Deep feature selection using Paired-Input Nonlinear Knockoffs (DeepPINK) and SHapley Additive exPlanations (SHAP), DeepPIG demonstrated superior detection power on synthetic datasets, particularly in cases where feature signals were subtle. Furthermore, in real-world applications, including cancer prognosis prediction and microbiome and single-cell data classification tasks, DeepPIG consistently outperformed traditional models in selecting relevant features and improving classification performance. The model's robustness, especially when feature signals are weak, highlights its potential utility in a variety of high-dimensional biological data analyses.

This dissertation highlights the potential of both tumor-adjacent normal tissues and the novel DeepPIG model as valuable tools in the field of cancer diagnosis and prognosis. These findings can enhance prognostic insights from high-dimensional biological data, improving model accuracy and supporting more precise biomarker discovery.
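The abstract states that DeepPIG builds on the knockoff filter framework to control the FDR. As a rough illustration of that framework only, the sketch below applies the standard knockoff+ selection rule to a vector of feature statistics W, where a large positive W_j suggests that the original feature matters more than its knockoff copy. The function names and the example values of W are hypothetical, and how DeepPIG actually computes its statistics is not described in this record, so this should not be read as the thesis's implementation.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns np.inf when no candidate satisfies the bound.
    """
    w = np.asarray(w, dtype=float)
    # np.unique returns the candidate thresholds in ascending order.
    candidates = np.unique(np.abs(w[w != 0]))
    for t in candidates:
        estimated_fdp = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if estimated_fdp <= q:
            return float(t)
    return np.inf


def select_features(w, q):
    """Return the indices of features whose statistic reaches the threshold."""
    w = np.asarray(w, dtype=float)
    return np.flatnonzero(w >= knockoff_plus_threshold(w, q))


# Hypothetical statistics for ten features (illustrative values only).
W = [5.0, 4.0, 3.0, 2.5, 2.0, -1.0, 0.5, 1.5, -0.2, 0.1]
selected = select_features(W, q=0.2)
```

With only ten features, a stricter target such as q = 0.1 selects nothing in this example, because the "+1" in the numerator keeps the estimated false discovery proportion above the target at every candidate threshold. This conservativeness is one reason detection power under weak signals is a concern for knockoff-based methods.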

URI: https://scholar.gist.ac.kr/handle/local/19137

Fulltext: http://gist.dcollection.net/common/orgView/200000826620

Alternative Author(s): Euiyoung Oh

Appears in Collections: Department of Electrical Engineering and Computer Science > 4. Theses(Ph.D)

Access and License

Access status: Public
