OAK

Data-driven artificial intelligence models for efficient hit identification

Abstract
Hit identification, which screens chemical compounds for bioactivity against target proteins, plays a crucial role in early drug discovery. However, while successful hit identification benefits subsequent drug development, screening massive chemical libraries remains exhaustive and costly. Thanks to the large volume of pharmaceutical data and advances in computational techniques, in silico models have been developed to aid hit identification.
In this study, I first built a deep learning model, DeepConv-DTI, to predict drug-target interactions (DTIs) from chemical fingerprints and protein sequences. I applied convolutional neural networks (CNNs) to the protein sequence to capture sequence motifs that interact with ligands. As a result, I showed that a CNN on protein sequences outperforms previous machine learning models. Moreover, DeepConv-DTI can learn significant binding motifs on its own.
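The idea of running convolutional filters over a protein sequence and combining the result with a chemical fingerprint can be sketched as follows. This is a minimal illustration only, not the DeepConv-DTI implementation: the one-hot encoding, the global max pooling step, the single logistic output, the random weights, and every function name here are assumptions made for the example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-dimensional one-hot vectors."""
    return [[1.0 if aa == ch else 0.0 for aa in AMINO_ACIDS] for ch in sequence]


def conv1d_global_max(encoded, filt):
    """Slide one filter (width x 20 weights) along the sequence, apply ReLU,
    and keep the strongest response (global max pooling, an assumption here)."""
    width = len(filt)
    best = 0.0  # ReLU floor: negative responses are clipped to zero
    for start in range(len(encoded) - width + 1):
        score = sum(
            filt[i][j] * encoded[start + i][j]
            for i in range(width)
            for j in range(len(AMINO_ACIDS))
        )
        best = max(best, score)
    return best


def predict_interaction(sequence, fingerprint, filters, weights, bias=0.0):
    """Concatenate pooled motif responses with fingerprint bits and map the
    result to a score in (0, 1) with a logistic unit (hypothetical head)."""
    encoded = one_hot(sequence)
    protein_features = [conv1d_global_max(encoded, f) for f in filters]
    features = protein_features + [float(bit) for bit in fingerprint]
    logit = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs: untrained random filters and weights, a made-up sequence and fingerprint.
rng = random.Random(0)
filters = [
    [[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(3)]  # width-3 filter
    for _ in range(4)
]
fingerprint = [1, 0, 1, 1, 0, 0, 1, 0]
weights = [rng.uniform(-1, 1) for _ in range(len(filters) + len(fingerprint))]
score = predict_interaction("MKTAYIAKQR", fingerprint, filters, weights)
```

Because each filter keeps only its strongest response along the sequence, a trained filter of this kind acts as a position-independent motif detector, which is one way to read the claim that the model captures local binding motifs.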
Secondly, I explicitly trained a model to predict binding motifs and DTIs together. I parsed 3D complexes to extract binding motifs on the protein sequence. The resulting model (HoTS) uses CNNs and transformers to highlight binding regions on the target sequence for DTI prediction. HoTS outperforms DeepConv-DTI and previous deep learning models in DTI prediction. In addition, HoTS achieves comparable performance in binding-site prediction, although it does not use any 3D complex. I then applied HoTS to hit identification for the P2X3 receptor. HoTS successfully identified binding sites of the P2X3 receptor and 11 hits among 150 thousand compounds. It found more hits with different structures than the previous computer-aided drug discovery process.
Finally, hits can also be identified from the genotype and corresponding drug response (phenotype) of a cell line. However, for more elaborate modeling of bioactivity from the genotype, computational models of cell lines should consider systematic interactions among genes. I constructed a deep learning model (NeST-E) that accounts for hierarchical interactions between biological systems. The biological systems, genes, and compounds are embedded into the NeST embedding space, and attention mechanisms between embeddings represent the interactions between biological entities. The model outperforms the previous drug-response prediction model and successfully recovers the mechanism of erlotinib, a drug for non-small cell lung cancer and pancreatic cancer.
Author(s)
Ingoo Lee
Issued Date
2023
Type
Thesis
URI
https://scholar.gist.ac.kr/handle/local/19049
Access and License
  • Access type: Public

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.