OAK

Machine learning and multi-omics based drug candidates prediction modeling

Author(s)
Eunyoung Kim
Type
Thesis
Degree
Doctor
Department
Graduate School, School of Electrical Engineering and Computer Science
Advisor
Nam, Hojung
Abstract
Undesired drug effects, including drug toxicity and side effects, are among the major causes of failure in drug discovery and development and of withdrawal from the market. Because identifying them experimentally requires tremendous time and cost, in silico models have been developed for drug safety screening.
In this study, I first constructed a machine learning model to predict drug-induced liver injury (DILI), one of the most frequent and concerning issues in drug toxicology. Because DILI occurs rarely, I used two machine learning algorithms with weighted molecular fingerprints as features to develop reliable prediction models. The Bayesian probability of each substructure was calculated to weight substructures that are frequent in DILI-positive compounds. The resulting Random Forest and Support Vector Machine models achieved accuracies of 73.8% and 72.6% in internal validation and 60.1% and 61.1% in external validation, respectively. The weighted fingerprints contributed to the performance increase, and significant substructures were reported. Finally, I applied the constructed models to predict the hepatotoxic potential of compounds in natural products.
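The abstract does not give the exact weighting formula, so the following is only a minimal sketch of one way such Bayesian substructure weights could be computed. It assumes binary fingerprints, a Laplace-smoothed estimate of P(DILI-positive | substructure present), and simple multiplication of each bit by its weight. The function names, the smoothing parameter alpha, and the toy data are all hypothetical and are not taken from the thesis.

```python
def substructure_weights(fingerprints, labels, alpha=1.0):
    """Laplace-smoothed estimate of P(DILI-positive | bit set), one value per bit.

    Assumption: this smoothing scheme is illustrative, not the thesis's formula.
    """
    n_bits = len(fingerprints[0])
    present = [0] * n_bits      # compounds in which the substructure occurs
    present_pos = [0] * n_bits  # of those, how many are DILI-positive
    for fp, y in zip(fingerprints, labels):
        for j, bit in enumerate(fp):
            if bit:
                present[j] += 1
                present_pos[j] += y
    return [(present_pos[j] + alpha) / (present[j] + 2 * alpha)
            for j in range(n_bits)]


def weight_fingerprint(fp, weights):
    """Scale each set bit by its substructure weight; unset bits stay 0."""
    return [bit * w for bit, w in zip(fp, weights)]


# Hypothetical toy data: 4 compounds, 3 substructure bits, label 1 = DILI-positive.
toy_fingerprints = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
toy_labels = [1, 1, 0, 0]

toy_weights = substructure_weights(toy_fingerprints, toy_labels)
toy_weighted = weight_fingerprint(toy_fingerprints[0], toy_weights)
print(toy_weights)   # [0.75, 0.5, 0.4]
print(toy_weighted)  # [0.75, 0.0, 0.4]
```

In this toy set, bit 0 occurs only in positive compounds and so receives the largest weight. The weighted vectors could then be passed to any classifier, such as a Random Forest or a Support Vector Machine.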
Second, I considered the prediction of undesired drug effects arising from drug-drug interactions (DDIs). DDIs can cause various side effects even when neither drug is harmful on its own. They have become a significant concern because polypharmacy is commonly used to treat complex diseases. I developed a deep learning-based framework for DDI prediction that uses drug-induced gene expression signatures. Feature engineering with a gating mechanism was used to mimic the effects of co-administration on each paired drug and to analyze the genes that are significant when interactions occur. Moreover, a translational embedding method and a margin-based loss were adopted to learn a distance-based scoring function for classification labels. As a result, the model achieved an AUC of 0.889 and an AUPR of 0.915 in predicting unseen interactions, outperforming previous methods. The model could also predict potential interactions involving new compounds, and feature analysis demonstrated its interpretability.
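The abstract names a translational embedding and a margin-based loss but does not specify their form. The sketch below therefore assumes a TransE-style setup: a drug pair with a given interaction label is scored by the Euclidean distance between (drug A embedding + interaction embedding) and the drug B embedding, and a hinge loss pushes true pairs closer than corrupted ones by at least a margin. All names and vectors are hypothetical, and the gating mechanism and gene expression encoder are not modeled here.

```python
import math


def translational_distance(head, relation, tail):
    """Euclidean distance ||head + relation - tail||; smaller means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2
                         for h, r, t in zip(head, relation, tail)))


def margin_ranking_loss(pos_distance, neg_distance, margin=1.0):
    """Hinge loss: zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, margin + pos_distance - neg_distance)


# Hypothetical 2-dimensional embeddings, chosen for easy arithmetic.
drug_a = [1.0, 0.0]
interaction = [0.0, 1.0]
drug_b = [1.0, 1.0]        # true partner: drug_a + interaction lands exactly here
far_negative = [4.0, 5.0]  # corrupted partner, far from drug_a + interaction
near_negative = [1.0, 1.5] # corrupted partner, still inside the margin

d_pos = translational_distance(drug_a, interaction, drug_b)
d_far = translational_distance(drug_a, interaction, far_negative)
d_near = translational_distance(drug_a, interaction, near_negative)

loss_far = margin_ranking_loss(d_pos, d_far)
loss_near = margin_ranking_loss(d_pos, d_near)
print(d_pos, d_far, d_near)  # 0.0 5.0 0.5
print(loss_far, loss_near)   # 0.0 0.5
```

Only the near negative contributes to the loss, which is how a margin-based objective concentrates training on pairs the model cannot yet separate.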
URI
https://scholar.gist.ac.kr/handle/local/19466
Fulltext
http://gist.dcollection.net/common/orgView/200000883262
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.