Development of embedding-based deep learning models for predicting drug responses Yewon Kim Gwangju Institute of Science and Technology
- Abstract
- With the growing interest in precision medicine, cancer drug response (CDR) prediction studies aim to provide personalized drug responses based on individual genetic and omics information. However, conventional deep learning models rely on predictions for the "seen drugs" they were trained on, which limits their ability to adapt to novel or unseen drugs. To address this limitation, this study focuses on unseen drugs and develops a model that predicts their responses from drug molecular features and omics data. The proposed model, the Unified Embedding-based Neural Network for unseen drugs (UniEN), predicts the response of drugs that were not included in the training process. UniEN selects an individual gene set for each cell line and enhances the features of cell lines and drugs by employing embedding vectors from high-performing existing models. To capture the complex interactions between drugs and omics, UniEN integrates gene, drug, and cell embeddings through attention mechanisms.
Experimental results demonstrated that UniEN achieved higher overall performance, including accuracy, recall, precision, and F1 score, than baseline models such as gene embedding-based fully connected neural networks (GEN), DeepCDR, and DeepTTA. Notably, the choice of embedding combination (gene, drug, and cell embeddings) played a crucial role in enhancing prediction performance. In particular, gene embeddings and drug embeddings significantly contributed to improving CDR prediction for unseen drugs, highlighting their importance in modeling complex interactions. These findings underscore the potential of embedding-based approaches to advance personalized precision medicine, facilitate novel drug development, and optimize drug therapy strategies. ©2025 Yewon Kim ALL RIGHTS RESERVED
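As an illustration of the evaluation setting the abstract describes, the following is a minimal sketch, not the thesis's actual pipeline. It assumes binary response labels and a record layout of `(drug, cell_line, label)`; the function names `split_by_drug` and `classification_metrics`, the 20% test fraction, and the seed are all hypothetical. It shows how a split can hold out whole drugs so that test drugs are never seen in training, and how the four reported metrics are computed.

```python
import random


def split_by_drug(records, test_fraction=0.2, seed=0):
    """Split (drug, cell_line, label) records so that no test drug appears in training.

    Hypothetical helper: the record layout and default fraction are assumptions.
    """
    drugs = sorted({drug for drug, _, _ in records})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    n_test = max(1, round(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    train = [r for r in records if r[0] not in test_drugs]
    test = [r for r in records if r[0] in test_drugs]
    return train, test


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy records with made-up drug and cell line names, for illustration only.
    toy = [(f"drug{d}", f"cell{c}", (d + c) % 2) for d in range(5) for c in range(2)]
    train, test = split_by_drug(toy)
    print(sorted({r[0] for r in test}), len(train), len(test))
    print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

Splitting at the drug level rather than at the level of individual drug and cell line pairs is what separates an unseen-drug evaluation from a seen-drug one: a random pair-level split would let the same drug appear on both sides.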
- Author(s)
- 김예원
- Issued Date
- 2025
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/19136
- Alternative Author(s)
- Yewon Kim
- Department
- Graduate School, AI Graduate School
- Advisor
- Lee, Hyunju
- Table Of Contents
- Abstract (English)
Abstract (Korean)
List of Contents
List of Tables
List of Figures
1 Introduction
1.1 Introduction
1.2 Related work
2 Methods
2.1 Data description
2.1.1 GDSC
2.1.2 Drug response and toxicity
2.2 Feature selection
2.3 Gene embeddings
2.3.1 Gene2vec
2.4 Drug embeddings
2.4.1 GIN
2.4.2 Chemformer
2.5 Cell embeddings
2.5.1 scFoundation
2.6 UniEN: Unified embeddings network for unseen drug
2.7 Additional models
2.8 Evaluation
3 Results
3.1 UniEN outperforms in classification tasks
3.2 Performance improvements using Chemformer drug embeddings
3.3 Random gene embeddings and Gene2vec gene embeddings
3.4 Adapting scRNA-seq weight-based cell embeddings
3.5 Ablation study
4 Discussion and Conclusion
References
A Supplementary Figure
B Supplementary Table
- Degree
- Master
Appears in Collections:
- Department of AI Convergence > 3. Theses(Master)
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.