
Integrative algorithms for multi-modal biological data for classification and interpretation

Metadata
Author(s)
Sehwan Moon
Type
Thesis
Degree
Doctor
Department
Graduate School, School of Electrical Engineering and Computer Science
Advisor
Lee, Hyunju
Abstract
Accurate diagnostic classification and biological interpretation are both essential
in biology and medicine, which are data-rich sciences. Integrating different data
types to achieve high predictive accuracy for clinical phenotypes and to discover
complex interactions remains challenging. In this dissertation, we propose an
unsupervised model, a supervised model, and a loss function for integrating
multi-modal biological data.
In the first part of this dissertation, we propose a joint deep semi-non-negative matrix
factorization (JDSNMF) model, which uses a hierarchical non-linear feature extraction
approach to capture shared latent features from complex multi-omics data.
The latent features extracted by JDSNMF enable a variety of downstream
tasks, including disease prediction and module analysis. The proposed model is
applicable not only to sample-matched but also to feature-matched data, so it can
be used flexibly in various settings. We demonstrate the capabilities of JDSNMF using
simulated data and multi-omics datasets from Alzheimer’s disease (AD) cohorts, and
evaluate its feature extraction performance in the context of classification. We also identify
AD- and age-related modules from the latent matrices using explainable artificial
intelligence and a regression model.
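
To make the factorization concrete, the following is a minimal PyTorch sketch of a JDSNMF-style joint deep factorization; the two-level hierarchy, layer sizes, and sigmoid non-linearity are illustrative assumptions rather than the dissertation's exact formulation. Two sample-matched omics matrices share one non-negative latent matrix H, while each view keeps its own basis hierarchy.

    import torch

    n, d1, d2, k1, k2 = 100, 500, 300, 50, 10          # sample count and feature/layer sizes (assumed)
    X1, X2 = torch.rand(n, d1), torch.rand(n, d2)      # toy sample-matched omics views

    H = torch.rand(n, k2, requires_grad=True)          # shared latent feature matrix
    bases = [(torch.randn(k2, k1, requires_grad=True), # per-view two-level basis hierarchy
              torch.randn(k1, d, requires_grad=True)) for d in (d1, d2)]

    opt = torch.optim.Adam([H] + [p for pair in bases for p in pair], lr=1e-2)
    for step in range(500):
        opt.zero_grad()
        Hpos = torch.relu(H)                           # semi-NMF: keep shared latents non-negative
        loss = sum(((X - torch.sigmoid(Hpos @ Z1) @ Z2) ** 2).mean()
                   for X, (Z1, Z2) in zip((X1, X2), bases))
        loss.backward()
        opt.step()

After training, rows of Hpos serve as shared sample embeddings for downstream classification or module analysis.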
In the second part of this dissertation, we propose MOMA, a novel multi-task attention
learning algorithm for multi-omics data that captures important
biological processes to achieve high diagnostic performance and interpretability. MOMA vectorizes
features and modules using a novel geometric approach and focuses on important
modules in multi-omics data via an attention mechanism. Experiments on public
AD and cancer datasets with various classification tasks demonstrate the superior
performance of this approach. The utility of MOMA is further verified by an ablation
experiment with the attention mechanism turned on or off and by biological analysis.
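
As a rough illustration of module-level attention across omics views, here is a hedged PyTorch sketch; the module count, embedding size, and dot-product cross-attention are assumptions, and MOMA's actual geometric vectorization of features and modules differs.

    import torch
    import torch.nn as nn

    class ModuleAttentionNet(nn.Module):
        def __init__(self, d1, d2, n_modules=32, dim=64, n_classes=2):
            super().__init__()
            # project each omics profile into a set of module vectors
            self.enc1 = nn.Linear(d1, n_modules * dim)
            self.enc2 = nn.Linear(d2, n_modules * dim)
            self.cls = nn.Linear(dim, n_classes)
            self.n_modules, self.dim = n_modules, dim

        def forward(self, x1, x2):
            m1 = self.enc1(x1).view(-1, self.n_modules, self.dim)
            m2 = self.enc2(x2).view(-1, self.n_modules, self.dim)
            # attend from view-1 modules to view-2 modules (cross-modal)
            att = torch.softmax(m1 @ m2.transpose(1, 2) / self.dim ** 0.5, dim=-1)
            fused = (att @ m2 + m1).mean(dim=1)    # attended + residual, pooled over modules
            return self.cls(fused), att            # logits and module attention weights

    net = ModuleAttentionNet(d1=500, d2=300)
    logits, att = net(torch.rand(8, 500), torch.rand(8, 300))

The returned attention matrix exposes which module pairs drive a prediction, which is one way such a model can be inspected for interpretability.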
Multi-modal learning often outperforms its unimodal counterparts by exploiting both unimodal
contributions and cross-modal interactions. However, focusing solely on integrating
multi-modally learned instances into a unified, comprehensive representation often
overlooks unimodal characteristics. In real data, the contributions of the modalities can
vary from instance to instance, and they often reinforce or conflict with each other. In
the third part of this dissertation, we therefore introduce a novel MultiModal loss for multi-modal
learning that subgroups instances according to their unimodal contributions. This loss
is empirically shown to perform better on one synthetic and four real datasets, and
we validate that it accelerates convergence. Furthermore, we show that it generates a
reliable prediction score for each modality, which is essential for the subgrouping.
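
The following is a hedged sketch of how such a subgrouping loss could look; the reinforce/conflict split and the specific weighting below are illustrative assumptions, not the loss defined in the dissertation.

    import torch
    import torch.nn.functional as F

    def multimodal_loss(logits_a, logits_b, logits_fused, y):
        # per-instance cross-entropy for each unimodal head and the fused head
        ce = lambda z: F.cross_entropy(z, y, reduction="none")
        loss_a, loss_b, loss_f = ce(logits_a), ce(logits_b), ce(logits_fused)
        correct_a = logits_a.argmax(1) == y
        correct_b = logits_b.argmax(1) == y
        # subgroups: modalities agree (reinforce) vs. disagree (conflict)
        conflict = (correct_a ^ correct_b).float()
        # up-weight the fused loss on conflicting instances, where
        # cross-modal arbitration matters most (illustrative choice)
        weights = 1.0 + conflict
        return (weights * loss_f).mean() + 0.5 * (loss_a + loss_b).mean()

    # usage with random toy tensors
    y = torch.randint(0, 2, (8,))
    la, lb, lf = (torch.randn(8, 2) for _ in range(3))
    print(multimodal_loss(la, lb, lf, y))

Keeping per-modality heads in the loss is what yields the per-modality prediction scores on which the subgrouping relies.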
URI
https://scholar.gist.ac.kr/handle/local/19411
Fulltext
http://gist.dcollection.net/common/orgView/200000883798
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.