
Integrative algorithms for multi-modal biological data for classification and interpretation

Metadata
Author(s)
Sehwan Moon
Type
Thesis
Degree
Doctor
Department
Graduate School, School of Electrical Engineering and Computer Science
Advisor
Lee, Hyunju
Abstract
Accurate diagnostic classification and biological interpretation are both essential
in biology and medicine, which are data-rich sciences. Integrating different data
types to achieve high predictive accuracy for clinical phenotypes and to discover
complex interactions remains challenging. In this dissertation, we propose an
unsupervised model, a supervised model, and a loss function for integrating
multi-modal biological data.
In the first part of this dissertation, we propose a joint deep semi-non-negative matrix
factorization (JDSNMF) model, which uses a hierarchical non-linear feature extraction
approach to capture shared latent features from complex multi-omics data.
The latent features extracted by JDSNMF enable a variety of downstream
tasks, including disease prediction and module analysis. The proposed model is
applicable not only to sample-matched but also to feature-matched data, so it can
be used flexibly in various settings. We demonstrate the capabilities of JDSNMF using
simulated data and multi-omics datasets from Alzheimer’s disease (AD) cohorts, and
evaluate its feature extraction performance in the context of classification. We also identify
AD- and age-related modules from the latent matrices using explainable artificial
intelligence and a regression model.
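
To make the factorization concrete, the following is a minimal PyTorch sketch of a JDSNMF-style joint deep factorization; the two-level hierarchy, layer sizes, and sigmoid non-linearity are illustrative assumptions rather than the dissertation's exact formulation. Two sample-matched omics matrices share one non-negative latent matrix H, while each view keeps its own basis hierarchy.

    import torch

    n, d1, d2, k1, k2 = 100, 500, 300, 50, 10          # sample count and feature/layer sizes (assumed)
    X1, X2 = torch.rand(n, d1), torch.rand(n, d2)      # toy sample-matched omics views

    H = torch.rand(n, k2, requires_grad=True)          # shared latent feature matrix
    bases = [(torch.randn(k2, k1, requires_grad=True), # per-view two-level basis hierarchy
              torch.randn(k1, d, requires_grad=True)) for d in (d1, d2)]

    opt = torch.optim.Adam([H] + [p for pair in bases for p in pair], lr=1e-2)
    for step in range(500):
        opt.zero_grad()
        Hpos = torch.relu(H)                           # semi-NMF: keep shared latents non-negative
        loss = sum(((X - torch.sigmoid(Hpos @ Z1) @ Z2) ** 2).mean()
                   for X, (Z1, Z2) in zip((X1, X2), bases))
        loss.backward()
        opt.step()

After training, rows of Hpos serve as shared sample embeddings for downstream classification or module analysis.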
In the second part of this dissertation, we propose MOMA, a novel multi-task attention
learning algorithm for multi-omics data that captures important
biological processes to achieve high diagnostic performance and interpretability. MOMA vectorizes
features and modules using a novel geometric approach and focuses on important
modules in multi-omics data via an attention mechanism. Experiments on public
AD and cancer datasets with various classification tasks demonstrate the superior
performance of this approach. The utility of MOMA is further verified by an ablation
experiment with the attention mechanism turned on or off and by biological analysis.
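
As a rough illustration of module-level attention across omics views, here is a hedged PyTorch sketch; the module count, embedding size, and dot-product cross-attention are assumptions, and MOMA's actual geometric vectorization of features and modules differs.

    import torch
    import torch.nn as nn

    class ModuleAttentionNet(nn.Module):
        def __init__(self, d1, d2, n_modules=32, dim=64, n_classes=2):
            super().__init__()
            # project each omics profile into a set of module vectors
            self.enc1 = nn.Linear(d1, n_modules * dim)
            self.enc2 = nn.Linear(d2, n_modules * dim)
            self.cls = nn.Linear(dim, n_classes)
            self.n_modules, self.dim = n_modules, dim

        def forward(self, x1, x2):
            m1 = self.enc1(x1).view(-1, self.n_modules, self.dim)
            m2 = self.enc2(x2).view(-1, self.n_modules, self.dim)
            # attend from view-1 modules to view-2 modules (cross-modal)
            att = torch.softmax(m1 @ m2.transpose(1, 2) / self.dim ** 0.5, dim=-1)
            fused = (att @ m2 + m1).mean(dim=1)    # attended + residual, pooled over modules
            return self.cls(fused), att            # logits and module attention weights

    net = ModuleAttentionNet(d1=500, d2=300)
    logits, att = net(torch.rand(8, 500), torch.rand(8, 300))

The returned attention matrix exposes which module pairs drive a prediction, which is one way such a model can be inspected for interpretability.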
Multi-modal learning often outperforms its unimodal counterparts by exploiting both unimodal
contributions and cross-modal interactions. However, focusing solely on integrating
multi-modally learned instances into a unified, comprehensive representation often
overlooks unimodal characteristics. In real data, the contributions of the modalities can
vary from instance to instance, and they often reinforce or conflict with each other. In
the third part of this dissertation, we therefore introduce a novel MultiModal loss for multi-modal
learning that subgroups instances according to their unimodal contributions. This loss
is empirically shown to perform better on one synthetic and four real datasets, and
we validate that it accelerates convergence. Furthermore, we show that it generates a
reliable prediction score for each modality, which is essential for the subgrouping.
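
The following is a hedged sketch of how such a subgrouping loss could look; the reinforce/conflict split and the specific weighting below are illustrative assumptions, not the loss defined in the dissertation.

    import torch
    import torch.nn.functional as F

    def multimodal_loss(logits_a, logits_b, logits_fused, y):
        # per-instance cross-entropy for each unimodal head and the fused head
        ce = lambda z: F.cross_entropy(z, y, reduction="none")
        loss_a, loss_b, loss_f = ce(logits_a), ce(logits_b), ce(logits_fused)
        correct_a = logits_a.argmax(1) == y
        correct_b = logits_b.argmax(1) == y
        # subgroups: modalities agree (reinforce) vs. disagree (conflict)
        conflict = (correct_a ^ correct_b).float()
        # up-weight the fused loss on conflicting instances, where
        # cross-modal arbitration matters most (illustrative choice)
        weights = 1.0 + conflict
        return (weights * loss_f).mean() + 0.5 * (loss_a + loss_b).mean()

    # usage with random toy tensors
    y = torch.randint(0, 2, (8,))
    la, lb, lf = (torch.randn(8, 2) for _ in range(3))
    print(multimodal_loss(la, lb, lf, y))

Keeping per-modality heads in the loss is what yields the per-modality prediction scores on which the subgrouping relies.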
URI
https://scholar.gist.ac.kr/handle/local/19411
Fulltext
http://gist.dcollection.net/common/orgView/200000883798
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.