OAK

Protein Sequence Embedding Using a Pre-training Deep Learning Model

Metadata
Author(s)
Kibeob Kim
Type
Thesis
Degree
Master
Department
School of Electrical Engineering and Computer Science (Graduate School)
Advisor
Nam, Hojung
Abstract
Advances in Natural Language Processing (NLP) have shown that pre-trained word embeddings can capture information from sequence data and transform it into a meaningful vector representation. Several attempts have been made to apply pre-trained embeddings to protein sequences, but these attempts have focused on the properties of proteins in general rather than on drug targets.
We therefore propose a drug-target-specific embedding model pre-trained on human proteins from the UniProt database. We then evaluated the suitability of this model for Drug Target Binding Affinity (DTA) prediction and Drug Target Interaction (DTI) prediction tasks. For each task, a new model was created by replacing the protein representation component of an existing DTI prediction model with our proposed model, and performance was evaluated with this modified model. In addition, ablation studies were conducted to assess the importance of the factors that influenced the pre-training process.
As a result, performance improved in both the classification and regression settings, especially on the unseen-target dataset composed of proteins not included in training. This study therefore confirms that pre-training the embedding model on human proteins without DTI labels can improve DTI prediction performance.
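To make the idea concrete, the sketch below shows, in a hypothetical and simplified form, how a learned amino-acid embedding can turn a variable-length protein sequence into a fixed-size vector via mean pooling. The embedding weights here are random placeholders: in the thesis they would come from pre-training on human proteins from UniProt, and the pooled vector would feed the protein-representation branch of a DTI/DTA model. All names and dimensions are illustrative assumptions, not the author's actual architecture.

```python
import random

# The 20 standard amino acids; each gets an index into the embedding table.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
EMB_DIM = 8  # illustrative embedding size, not the thesis value

random.seed(0)
# Placeholder weights: pre-training would learn these from unlabeled
# human protein sequences instead of drawing them at random.
embedding = [[random.gauss(0.0, 1.0) for _ in range(EMB_DIM)]
             for _ in range(len(AMINO_ACIDS))]

def embed_sequence(seq: str) -> list[float]:
    """Map a protein sequence to a fixed-size vector by mean-pooling
    the per-residue embeddings; non-standard residues are skipped."""
    vectors = [embedding[VOCAB[aa]] for aa in seq if aa in VOCAB]
    if not vectors:
        return [0.0] * EMB_DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Sequences of any length map to the same fixed dimensionality,
# which is what lets the vector plug into a downstream DTI/DTA model.
vec = embed_sequence("MKTAYIAKQR")
print(len(vec))
```

Because the output dimensionality is independent of sequence length, such a pooled embedding can be swapped into the protein branch of an existing DTI prediction model, which is the replacement strategy the abstract describes.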
URI
https://scholar.gist.ac.kr/handle/local/33066
Fulltext
http://gist.dcollection.net/common/orgView/200000909022
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.