Development of feature selection methods for cancer diagnosis and prognosis using omics data
Euiyoung Oh, School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology
- Author(s)
- 오의영 (Euiyoung Oh)
- Type
- Thesis
- Degree
- Doctor
- Department
- Graduate School, School of Electrical Engineering and Computer Science
- Advisor
- Lee, Hyunju
- Abstract
- In cancer diagnosis and prognosis, identifying relevant feature subsets is essential for improving prediction model performance and identifying biomarkers. In this dissertation, we propose two approaches for identifying significant genes associated with various types of cancer: (1) exploring the prognostic efficacy of both tumor and tumor-adjacent normal tissues, and (2) developing a deep neural network architecture for feature selection. In the first part of the dissertation, we evaluate the prognostic efficacy of transcriptomic data from both tumor and adjacent normal tissues using The Cancer Genome Atlas (TCGA) dataset. By applying Cox regression models for prognostic analysis and machine learning models for survival prediction, we show that for cancers such as kidney, liver, and head and neck cancer, adjacent normal tissues contain a higher proportion of prognostic genes and outperform tumor tissues in survival prediction accuracy. Moreover, applying a distance correlation-based feature selection method to external kidney and liver cancer datasets confirmed that genes selected from adjacent normal tissues consistently yielded greater improvements in prediction performance than those selected from tumor tissues. These findings suggest that adjacent normal tissues may provide valuable insights into cancer prognosis, positioning them as potential targets for biomarker discovery. In the second part of the dissertation, we present a novel machine learning-based feature selection method called “Deep neural network with PaIrwise connected layers integrated with stochastic Gates” (DeepPIG) to address the challenge of selecting relevant features from complex omics data, particularly when feature signals are weak. Built on the knockoff filter framework, DeepPIG is designed to increase the power to detect relevant features while keeping the false discovery rate (FDR) at or below the target threshold.
Compared with baseline and recent methods, such as Deep feature selection using Paired-Input Nonlinear Knockoffs (DeepPINK) and SHapley Additive exPlanations (SHAP), DeepPIG achieved higher detection power on synthetic datasets, particularly when feature signals were subtle. Furthermore, in real-world applications, including cancer prognosis prediction and the classification of microbiome and single-cell data, DeepPIG consistently outperformed traditional models in selecting relevant features and improving classification performance. Its robustness when feature signals are weak suggests potential utility in a variety of high-dimensional biological data analyses. Overall, this dissertation underscores the potential of both tumor-adjacent normal tissues and the DeepPIG model as valuable tools for cancer diagnosis and prognosis. These findings can strengthen the prognostic insights drawn from high-dimensional biological data, improving model accuracy and supporting more precise biomarker discovery.
- URI
- https://scholar.gist.ac.kr/handle/local/19137
- Fulltext
- http://gist.dcollection.net/common/orgView/200000826620
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
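The knockoff filter that DeepPIG builds on compares each original feature with a synthetic "knockoff" copy and keeps only features that clearly beat their copies. The minimal sketch below shows just the final thresholding step of the standard knockoff+ procedure (Barber and Candès). It assumes the feature statistics `W` have already been computed, with large positive values meaning the original feature looks more important than its knockoff. How DeepPIG computes `W` is not described in this record, and the function names `knockoff_threshold` and `knockoff_select` are hypothetical.

```python
import numpy as np


def knockoff_threshold(W, q=0.1, offset=1):
    """Return the knockoff+ (offset=1) or knockoff (offset=0) threshold.

    W: feature statistics; W_j > 0 favors the original feature j over its knockoff.
    q: target false discovery rate.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the nonzero magnitudes |W_j|, scanned from small to large.
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        # Estimated false discovery proportion at threshold t:
        # (offset + #{j: W_j <= -t}) / max(1, #{j: W_j >= t})
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold meets the target, so nothing is selected


def knockoff_select(W, q=0.1, offset=1):
    """Indices of features whose statistic reaches the knockoff threshold."""
    T = knockoff_threshold(W, q, offset)
    return np.flatnonzero(np.asarray(W) >= T)


W = np.array([5.0, 4.0, 3.0, 2.0, 1.0, -0.5])
print(knockoff_select(W, q=0.2))  # [0 1 2 3 4]
```

With a target FDR of q = 0.2, the threshold here is 1.0 and the first five features are selected. Setting `offset=0` gives the slightly less conservative knockoff threshold, which controls a modified FDR rather than the FDR itself.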
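Distance correlation (Székely, Rizzo and Bakirov) captures both linear and nonlinear dependence, and its population value is zero only when the two variables are independent. The sketch below computes the sample statistic from double-centered pairwise distance matrices and ranks features by their distance correlation with an outcome. It is an illustration under assumptions. The exact selection criterion used in the dissertation, such as the outcome variable and the number of genes kept, is not given in this record, and `distance_correlation` and `top_k_by_dcor` are hypothetical names.

```python
import numpy as np


def _double_centered_distances(x):
    """Pairwise Euclidean distances, double-centered (row, column, and grand means removed)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)  # 1-D input becomes shape (n, 1)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (biased V-statistic version); O(n^2) memory."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()                           # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())  # sqrt of product of distance variances
    if denom == 0:
        return 0.0  # a constant variable carries no information about the other
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def top_k_by_dcor(X, y, k):
    """Rank the columns of X by distance correlation with y; return top-k indices and all scores."""
    scores = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-scores)[:k]
    return order, scores


x = np.linspace(-1.0, 1.0, 21)
dcor_quadratic = distance_correlation(x, x ** 2)  # positive, although Pearson r is ~0 here
```

Ranking each feature independently like this is a simple filter; it ignores redundancy between selected genes, which a full pipeline may handle differently.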
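The "G" in DeepPIG refers to stochastic gates. We assume this follows the stochastic gates formulation of Yamada et al., in which each feature d is multiplied by a gate z_d = clip(mu_d + eps_d, 0, 1) with eps_d ~ N(0, sigma^2), and the expected number of open gates, the sum over d of Phi(mu_d / sigma), serves as a differentiable surrogate for an L0 penalty. The NumPy sketch below only samples gates and evaluates that penalty. It does not train a network, how DeepPIG combines the gates with its pairwise connected layers is not described in this record, and `sample_gates` and `expected_open_gates` are hypothetical names.

```python
import math

import numpy as np


def sample_gates(mu, sigma=0.5, rng=None):
    """Draw one relaxed gate per feature: z_d = clip(mu_d + eps_d, 0, 1), eps_d ~ N(0, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    eps = rng.normal(0.0, sigma, size=mu.shape)
    return np.clip(mu + eps, 0.0, 1.0)


def expected_open_gates(mu, sigma=0.5):
    """Sum of P(z_d > 0) = Phi(mu_d / sigma), the differentiable surrogate for the L0 norm."""
    mu = np.asarray(mu, dtype=float)
    # Standard normal CDF written with math.erf to avoid a SciPy dependency.
    probs = [0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0)))) for m in mu]
    return float(np.sum(probs))


mu = np.array([10.0, 0.0, -10.0])      # strongly open, undecided, strongly closed
gates = sample_gates(mu, rng=np.random.default_rng(0))
x = np.array([3.0, 2.0, 1.0])
gated_x = x * gates                    # features whose gate is 0 are masked out
penalty = expected_open_gates(mu)      # about 1 + 0.5 + 0 = 1.5
```

During training, a penalty of this form would be added to the prediction loss so that gradient descent pushes the mu of uninformative features toward negative values, closing their gates.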