OAK

Protein Sequence Embedding Using a Pre-training Deep Learning Model

Metadata
Author(s)
Kibeob Kim
Type
Thesis
Degree
Master
Department
School of Electrical Engineering and Computer Science (Graduate School)
Advisor
Nam, Hojung
Abstract
Advances in Natural Language Processing (NLP) have shown that pre-trained word embeddings can capture information from sequence data and transform it into a meaningful vector representation. Several attempts have been made to apply pre-trained embeddings to protein sequences, but these attempts have focused on the properties of proteins in general rather than on drug targets.
We therefore propose a drug-target-specific embedding model pre-trained on human proteins from the UniProt database. We then evaluated the suitability of this model for Drug Target Binding Affinity (DTA) prediction and Drug Target Interaction (DTI) prediction tasks. For each task, a new model was created by replacing the protein representation component of an existing DTI prediction model with our proposed model, and performance was evaluated with this modified model. In addition, ablation studies were conducted to assess the importance of the factors that influenced the pre-training process.
As a result, performance improved in both the classification and regression settings, especially on the unseen-target dataset composed of proteins not included in training. This study therefore confirms that pre-training the embedding model on human proteins without DTI labels can improve DTI prediction performance.
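To make the idea concrete, the sketch below shows, in a hypothetical and simplified form, how a learned amino-acid embedding can turn a variable-length protein sequence into a fixed-size vector via mean pooling. The embedding weights here are random placeholders: in the thesis they would come from pre-training on human proteins from UniProt, and the pooled vector would feed the protein-representation branch of a DTI/DTA model. All names and dimensions are illustrative assumptions, not the author's actual architecture.

```python
import random

# The 20 standard amino acids; each gets an index into the embedding table.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
EMB_DIM = 8  # illustrative embedding size, not the thesis value

random.seed(0)
# Placeholder weights: pre-training would learn these from unlabeled
# human protein sequences instead of drawing them at random.
embedding = [[random.gauss(0.0, 1.0) for _ in range(EMB_DIM)]
             for _ in range(len(AMINO_ACIDS))]

def embed_sequence(seq: str) -> list[float]:
    """Map a protein sequence to a fixed-size vector by mean-pooling
    the per-residue embeddings; non-standard residues are skipped."""
    vectors = [embedding[VOCAB[aa]] for aa in seq if aa in VOCAB]
    if not vectors:
        return [0.0] * EMB_DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Sequences of any length map to the same fixed dimensionality,
# which is what lets the vector plug into a downstream DTI/DTA model.
vec = embed_sequence("MKTAYIAKQR")
print(len(vec))
```

Because the output dimensionality is independent of sequence length, such a pooled embedding can be swapped into the protein branch of an existing DTI prediction model, which is the replacement strategy the abstract describes.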
URI
https://scholar.gist.ac.kr/handle/local/33066
Fulltext
http://gist.dcollection.net/common/orgView/200000909022
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.