Machine and Deep learning models to predict anticancer response prediction and generate cancer cell-specific drugs
- Author(s)
- Sejin Park
- Type
- Thesis
- Degree
- Master
- Department
- Graduate School, School of Electrical Engineering and Computer Science
- Advisor
- Lee, Hyunju
- Abstract
- Personalized medicine is expected to maximize intended drug effects and minimize side effects by treating patients according to their genetic profiles. For precision oncology, and especially for anticancer drug discovery, it is important both to predict a patient's drug response and to generate drugs based on the genetic profile of the disease. Recent studies have used multi-omics data to improve accuracy in drug discovery. Although multi-omics data are a valuable resource, their high dimensionality tends to hinder performance improvement, and the number of available samples is too small relative to the number of features. In addition, generating target molecules quickly is challenging because the vast chemical space and the variation in cancer cell properties make the search for appropriate molecules time-consuming. Therefore, drug response prediction requires a method that effectively reduces the dimensionality of the data, and de novo molecular design of anticancer drugs requires an efficient, fast search method that considers genetic profiles. To address these issues, we propose two methods: one for anticancer drug response prediction and a generative model for cancer sample-specific drugs. Specifically, supervised feature extraction learning using triplet loss (Super.FELT) predicts drug response, and a faster molecular generative model with genetic algorithm and tree search for cancer samples (FasterGTS) generates cancer sample-specific drugs.
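The abstract names triplet loss but does not give its form. The sketch below assumes the standard hinge formulation, L = max(0, d(a, p) - d(a, n) + m), with Euclidean distance d, an anchor a, a positive p with the same drug-response label, a negative n with the opposite label, and a margin m. The margin value, the all-triplets averaging, and the function names are illustrative assumptions, not taken from the Super.FELT code.

```python
import math
from typing import List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_triplet_loss(embeddings: List[List[float]], labels: List[int],
                       margin: float = 1.0) -> float:
    """Mean loss over every (anchor, positive, negative) triplet in a batch.

    labels: drug response per sample (assumed encoding: 1 = sensitive, 0 = resistant).
    A positive shares the anchor's label; a negative has the other label.
    """
    total, count = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(n):
            if i == j or labels[j] != labels[i]:
                continue
            for k in range(n):
                if labels[k] == labels[i]:
                    continue
                total += triplet_loss(embeddings[i], embeddings[j], embeddings[k], margin)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    z = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    print(batch_triplet_loss(z, [1, 1, 0, 0]))  # → 0.0 (clusters match labels)
    print(batch_triplet_loss(z, [1, 0, 1, 0]))  # positive: labels cut across clusters
```

Minimizing this loss pulls same-response samples together and pushes opposite-response samples apart in the encoded space, which is why a supervised loss of this kind can yield compact, label-aware features from high-dimensional omics input.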
Super.FELT consists of three stages: feature selection, feature encoding using a supervised method, and binary classification of drug response (sensitive or resistant). We used multi-omics data, including mutation, copy number aberration, and gene expression, obtained from cell lines [Genomics of Drug Sensitivity in Cancer (GDSC), Cancer Cell Line Encyclopedia (CCLE), and Cancer Therapeutics Response Portal (CTRP)], patient-derived tumor xenografts (PDX), and The Cancer Genome Atlas (TCGA). GDSC was used for training and cross-validation, and CCLE, CTRP, PDX, and TCGA were used for external validation. We performed ablation studies on the three stages and verified that using multi-omics data leads to better drug response prediction performance. Super.FELT outperformed the other methods in external validation on PDX and TCGA, and performed well in cross-validation on GDSC and in external validation on CCLE and CTRP. Our experiments also confirmed that multi-omics data are useful for external non-cell-line data. Super.FELT has been published in BMC Bioinformatics, and the source code is available at https://github.com/DMCB-GIST/Super.FELT.
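The three stages can be sketched end to end as below. This is a toy illustration under several assumptions not stated in the abstract: low-variance features are dropped in stage 1, each omics type is encoded separately and the encodings are concatenated, and a logistic classifier makes the final call. The trained neural encoder and classifier are replaced by fixed, hypothetical weights, and all function names are illustrative.

```python
import math
from statistics import pvariance
from typing import Dict, List

Matrix = List[List[float]]  # rows = samples, columns = features
OMICS = ("mutation", "cna", "expression")  # cna = copy number aberration


def select_by_variance(x: Matrix, threshold: float) -> List[int]:
    """Stage 1 (assumed criterion): indices of features whose variance
    across training samples exceeds `threshold`."""
    return [j for j in range(len(x[0]))
            if pvariance([row[j] for row in x]) > threshold]


def encode(row: List[float], kept: List[int], weights: Matrix) -> List[float]:
    """Stage 2 stand-in: a fixed linear projection of the selected features.
    The real encoder is trained with triplet loss; `weights` is hypothetical."""
    selected = [row[j] for j in kept]
    return [sum(w * v for w, v in zip(w_row, selected)) for w_row in weights]


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def classify(z: List[float], coef: List[float], bias: float) -> str:
    """Stage 3 stand-in: logistic regression on the concatenated encodings."""
    p = sigmoid(sum(c * v for c, v in zip(coef, z)) + bias)
    return "sensitive" if p >= 0.5 else "resistant"


def predict(sample: Dict[str, List[float]], kept: Dict[str, List[int]],
            encoders: Dict[str, Matrix], coef: List[float], bias: float) -> str:
    """Encode each omics type separately, concatenate, then classify."""
    z: List[float] = []
    for omics in OMICS:
        z.extend(encode(sample[omics], kept[omics], encoders[omics]))
    return classify(z, coef, bias)
```

Keeping feature selection per omics type means the dimensionality of each data source is reduced before encoding, which addresses the features-versus-samples imbalance the abstract describes.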
FasterGTS is constructed using a genetic algorithm and a Monte Carlo tree search with three deep neural networks (a supervised learning network, a self-trained network, and a value network), and it generates anticancer molecules based on the genetic profile of a tumor sample. Compared with other methods, FasterGTS generated tumor sample-specific molecules with the characteristic chemical properties of cancer drugs within a limited number of samplings. We expect that FasterGTS will contribute to anticancer drug generation for personalized medicine. The source code is available at https://github.com/DMCB-GIST/FasterGTS.
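The abstract does not describe how the tree search and the genetic algorithm interact, so the sketch below only illustrates their generic building blocks. It assumes an AlphaGo-style PUCT selection rule in which a policy network supplies the prior P for each next token and a reward (for example, a predicted response of the tumor sample) is backed up along the selected path, plus one-point crossover and random token mutation on SMILES-like token lists. `Node`, `puct_select`, `backup`, `crossover`, `mutate`, and `c_puct` are hypothetical names; this is not the FasterGTS implementation.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    prior: float                # P(s, a), assumed to come from a policy network
    visits: int = 0             # N(s, a)
    value_sum: float = 0.0      # W(s, a), sum of backed-up rewards
    children: Dict[str, "Node"] = field(default_factory=dict)

    def q(self) -> float:
        """Mean backed-up reward Q(s, a); 0 for an unvisited node."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(node: Node, c_puct: float = 1.0) -> str:
    """Return the child token maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    n_parent = sum(child.visits for child in node.children.values())

    def score(child: Node) -> float:
        return child.q() + c_puct * child.prior * math.sqrt(n_parent) / (1 + child.visits)

    return max(node.children, key=lambda tok: score(node.children[tok]))


def backup(path: List[Node], reward: float) -> None:
    """Propagate a rollout reward to every node on the selected path."""
    for n in path:
        n.visits += 1
        n.value_sum += reward


def crossover(a: List[str], b: List[str], rng: random.Random) -> List[str]:
    """One-point crossover of two token sequences (child keeps a's prefix, b's suffix)."""
    shortest = min(len(a), len(b))
    if shortest < 2:
        return list(a)
    cut = rng.randint(1, shortest - 1)
    return a[:cut] + b[cut:]


def mutate(seq: List[str], vocab: List[str], rate: float, rng: random.Random) -> List[str]:
    """Replace each token with a random vocabulary token with probability `rate`."""
    return [rng.choice(vocab) if rng.random() < rate else tok for tok in seq]
```

The PUCT term lets a learned prior steer early expansions toward promising tokens, so fewer samplings are wasted, while the GA operators recombine good candidates found so far. Note that string-level crossover and mutation can produce invalid SMILES, so a practical system would need a validity check or graph-based operators.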
- URI
- https://scholar.gist.ac.kr/handle/local/19463
- Fulltext
- http://gist.dcollection.net/common/orgView/200000884879
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.