OAK

Representation learning methodologies for bulk and single-cell RNA sequence and chemical data

Author(s)
Sejin Park
Type
Thesis
Degree
Doctor
Department
College of Information and Computing, Department of Electrical Engineering and Computer Science
Advisor
Lee, Hyunju
Abstract
The methodologies used to handle biological data have a significant impact on how well diverse biological problems can be addressed. In this thesis, I propose three machine learning approaches tailored to different biological data modalities: bulk RNA sequencing, single-cell RNA sequencing (scRNA-seq), and molecular structures. First, I developed a cancer drug response prediction model based on gene embeddings: gene embedding-based fully connected neural networks (GEN). Unlike conventional one-hot encoding approaches, this method employs gene embeddings within fully connected neural networks, allowing the use of sample-specific gene sets. This representation enhances model flexibility and improves predictive performance, as validated across multiple cancer drug response datasets. Second, I present scRobust, a self-supervised learning framework designed for scRNA-seq data. By combining contrastive learning with gene expression prediction tasks within a Transformer architecture, scRobust mitigates the inherent sparsity of scRNA-seq data. The model demonstrates superior performance in cell type annotation and generates informative embeddings for clustering and biomarker detection. Finally, I developed GlintDM, a diffusion-based generative model for multi-objective drug design. GlintDM incorporates a novel skip-transition denoising strategy that integrates global and local gradients, substantially reducing computational costs. This approach enables efficient molecule generation with refined binding poses and improved satisfaction of pharmacological objectives. Experimental evaluations on the CrossDocked and Binding MOAD datasets, as well as pose quality and molecular property assessments, confirm the superior performance of GlintDM over existing methods. Collectively, these contributions advance representation learning and generative modeling for biological data, offering powerful tools for precision medicine and rational drug discovery.
URI
https://scholar.gist.ac.kr/handle/local/33818
Fulltext
http://gist.dcollection.net/common/orgView/200000938135
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.