OAK

Development of feature selection methods for cancer diagnosis and prognosis using omics data
Euiyoung Oh
School of Electrical Engineering and Computer Science
Gwangju Institute of Science and Technology

Abstract
In cancer diagnosis and prognosis, identifying relevant feature subsets is essential for improving prediction model performance and discovering biomarkers. In this dissertation, we propose two approaches to identifying genes significantly associated with various types of cancer: (1) exploring the prognostic efficacy of both tumor and tumor-adjacent normal tissues, and (2) developing a deep neural network architecture for feature selection. In the first part of the dissertation, we evaluate the prognostic efficacy of transcriptomic data from both tumor and adjacent normal tissues using The Cancer Genome Atlas (TCGA) dataset. By applying Cox regression models for prognostic analysis and machine learning models for survival prediction, the study demonstrates that for cancers such as kidney, liver, and head and neck cancer, adjacent normal tissues contain a higher proportion of prognostic genes and yield more accurate survival predictions than tumor tissues. Moreover, a distance correlation-based feature selection method was applied to external kidney and liver cancer datasets, further confirming that genes selected from adjacent normal tissues consistently yielded larger improvements in prediction performance than those selected from tumor tissues. These findings suggest that adjacent normal tissues may provide valuable insights into cancer prognosis, positioning them as potential targets for biomarker discovery.
In the second part of the dissertation, we present a novel machine learning-based feature selection method called “Deep neural network with PaIrwise connected layers integrated with stochastic Gates” (DeepPIG) to address the challenge of selecting relevant features from complex omics data, particularly when feature signals are weak. Built upon the knockoff filter framework, DeepPIG is designed to increase the power to detect relevant features while keeping the false discovery rate (FDR) at or below a target threshold.
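The knockoff filter selects features by comparing each original feature with a synthetic "knockoff" copy that mimics its correlation structure but carries no information about the outcome. As a minimal sketch, assuming importance scores for every feature and its knockoff are already available (the hypothetical inputs `z` and `z_knock`; DeepPIG derives its own scores from its network, which is not reproduced here), the generic selection step computes antisymmetric statistics W_j = Z_j - Z~_j and keeps the features whose W_j clears the data-dependent knockoff+ threshold for a target FDR q. The function names are illustrative and not taken from the dissertation or any published package.

```python
from typing import List, Sequence


def knockoff_statistics(z: Sequence[float], z_knock: Sequence[float]) -> List[float]:
    """Antisymmetric feature statistics W_j = Z_j - Z~_j.

    z[j] and z_knock[j] are (hypothetical) importance scores for original
    feature j and its knockoff copy. A large positive W_j means the original
    feature looks more important than its knockoff.
    """
    if len(z) != len(z_knock):
        raise ValueError("z and z_knock must have the same length")
    return [a - b for a, b in zip(z, z_knock)]


def knockoff_threshold(w: Sequence[float], q: float = 0.1, offset: int = 1) -> float:
    """Smallest t among the nonzero |W_j| whose estimated FDR is <= q.

    offset=1 gives the knockoff+ rule, which controls the FDR; offset=0 gives
    the original knockoff rule, which controls a modified FDR. Returns inf
    when no t qualifies, meaning nothing is selected.
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        # Negative statistics at least as extreme as -t estimate false discoveries.
        est_false = offset + sum(1 for x in w if x <= -t)
        n_selected = max(1, sum(1 for x in w if x >= t))
        if est_false / n_selected <= q:
            return t
    return float("inf")


def knockoff_select(w: Sequence[float], q: float = 0.1, offset: int = 1) -> List[int]:
    """Indices of features whose statistic reaches the knockoff threshold."""
    t = knockoff_threshold(w, q, offset)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    w = [9, 8, 7, 6, 5, 4, 3, 2, 1, -1.5, 0.5, -0.2]
    print(knockoff_select(w, q=0.25))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 10]
```

In the example, the threshold settles at t = 0.5, where one negative statistic (-1.5) plus the offset, divided by ten positive statistics, gives an estimated FDR of 0.2. The sign symmetry of null W_j is what makes the count of negative statistics a usable estimate of false discoveries; a method that raises detection power, as DeepPIG aims to, does so by pushing W_j of truly relevant features further above zero.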
In comparison with baseline and recent models, such as Deep feature selection using Paired-Input Nonlinear Knockoffs (DeepPINK) and SHapley Additive exPlanations (SHAP), DeepPIG demonstrated superior detection power on synthetic datasets, particularly when feature signals were subtle. Furthermore, in real-world applications, including cancer prognosis prediction and microbiome and single-cell data classification tasks, DeepPIG consistently outperformed traditional models in selecting relevant features and improving classification performance. The model’s robustness, especially when feature signals are weak, suggests its potential utility in a variety of high-dimensional biological data analyses. This dissertation highlights the potential of both tumor-adjacent normal tissues and the novel DeepPIG model as valuable tools in cancer diagnosis and prognosis. These findings can enhance the prognostic insights drawn from high-dimensional biological data, improving model accuracy and supporting more precise biomarker discovery.
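The first part of the abstract mentions a distance correlation-based feature selection step. In the population, distance correlation is zero only when two variables are independent, so unlike Pearson correlation it also captures nonlinear associations between a gene and the outcome. The sketch below is a minimal pure-Python version that assumes a single numeric outcome per sample (for example, survival time with censoring ignored, a simplification that is not necessarily how the dissertation handles survival data). It computes the sample distance correlation and ranks genes by it; `dcor_screen` and its inputs are hypothetical names.

```python
import math
from typing import List, Mapping, Sequence


def _double_centered(v: Sequence[float]) -> List[List[float]]:
    """Pairwise |v_i - v_j| distances, double-centered by row, column, and grand means."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(r) / n for r in d]  # d is symmetric, so column means are identical
    grand = sum(row_means) / n
    return [[d[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation (V-statistic form) of two 1-D samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    a, b = _double_centered(x), _double_centered(y)

    def mean_product(p: List[List[float]], q: List[List[float]]) -> float:
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:  # a constant variable: define the distance correlation as 0
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)  # clip tiny negative rounding error


def dcor_screen(features: Mapping[str, Sequence[float]],
                outcome: Sequence[float], k: int) -> List[str]:
    """Names of the k features with the largest distance correlation to the outcome."""
    scores = {name: distance_correlation(vals, outcome) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    x = [-2, -1, 0, 1, 2]
    # Pearson correlation between x and x**2 is 0 here; distance correlation is about 0.52.
    print(distance_correlation(x, [v * v for v in x]))
    print(dcor_screen({"linear": [2, 4, 6, 8, 10], "noise": [3, 1, 4, 1, 5]},
                      [1, 2, 3, 4, 5], k=1))
```

The O(n^2) distance matrices are fine for cohort-sized samples; screening thousands of genes simply repeats the per-gene computation, which is why a dedicated package would be the practical choice at TCGA scale.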
Author(s)
오의영
Issued Date
2025
Type
Thesis
URI
https://scholar.gist.ac.kr/handle/local/19137
Alternative Author(s)
Euiyoung Oh
Department
Graduate School, School of Electrical Engineering and Computer Science
Advisor
Lee, Hyunju
Table Of Contents
Abstract i
Acknowledgments iv
List of Contents vi
List of Tables viii
List of Figures ix
List of Algorithms x
1 Introduction 1
1.1 Introduction 1
1.2 Problem Statement 2
1.3 Proposed Approach 3
2 Background and Related Works 4
2.1 Tumor-adjacent normal tissues as cancer prognostic markers 4
2.2 Feature selection based on knockoff framework 5
3 Survival analysis using transcriptomic data revealed that tumor-adjacent normal tissues harbor prognostic information on multiple cancer types 6
3.1 Materials and Methods 6
3.1.1 Study design 6
3.1.2 Datasets and preprocessing 8
3.1.3 Identification of differentially expressed genes and their expression ratio 9
3.1.4 Data screening via distance correlation 10
3.1.5 Survival prediction model and evaluation 10
3.1.6 Functional annotation 12
3.2 Results 12
3.2.1 Survival analysis with clinical data, gene expression data of tumor and normal tissues, and expression ratio of DEGs 12
3.2.2 Prognostic values of selected features for kidney and liver cancer 16
3.2.3 Functional annotation of survival-related genes 19
3.3 Discussion 21
4 DeepPIG: deep neural network architecture with pairwise connected layers and stochastic gates using knockoff frameworks for feature selection 24
4.1 Methods 24
4.1.1 Knockoff framework 24
4.1.2 Proposed Model 27
4.2 Simulation Studies 31
4.2.1 Synthetic data 31
4.2.2 Simulation results 32
4.3 Real Data Analysis 34
4.3.1 Transcriptomic Markers of Cancer Prognosis 34
4.3.2 Microbiome and single-cell datasets 38
4.4 Discussion 40
5 Supplementary Information 43
Summary 56
References 58
Degree
Doctor
Appears in Collections:
Department of Electrical Engineering and Computer Science > 4. Theses(Ph.D)
Access and License
  • Access type: Open
Files
  • No related files are available.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.