OAK

Host-pathogen protein-protein interaction prediction with protein language models

Abstract
In recent years, infections caused by viruses such as SARS-CoV-2 have become a global public health concern. Host-pathogen interactions, which arise when organisms are infected by pathogens such as viruses and bacteria, play a crucial role in these infection processes. Viruses in particular require protein-protein interactions (PPIs) with the host to survive. In PPIs, beneficial or harmful protein mutations occur at interfaces: three-dimensional structures composed of amino acid residues that also serve as binding sites. Therefore, identifying PPIs between hosts and viruses and understanding their interfaces are vital for developing new antiviral therapies.
While traditional experimental methods have been widely used to identify PPIs, they are time-consuming and costly, so computational methods have become increasingly common. These methods have reduced time and cost and improved PPI prediction performance, but they have lacked analysis that reveals the parts of the proteins that actually interact, namely the interfaces. We therefore propose a human-virus PPI prediction model based on a protein language model that takes two sequences as input. The two input protein sequences are represented as tokens by a pre-trained RoBERTa model, whose attention mechanism reflects the mutual information between the two sequences.
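The paired input and the attention readout described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes one token per amino acid, uses the standard RoBERTa two-segment layout `<s> A </s></s> B </s>`, and replaces the trained model with a random, row-normalized attention matrix. The sequences and all function names (`build_pair_tokens`, `toy_attention`, `cross_attention_scores`) are hypothetical.

```python
import math
import random

# RoBERTa-style special tokens; <s> A </s></s> B </s> is the standard
# RoBERTa layout for an input made of two segments.
BOS, SEP = "<s>", "</s>"


def build_pair_tokens(human_seq, virus_seq):
    """Concatenate two protein sequences into one token list.

    Assumption: one token per amino acid. Returns the tokens plus the
    index ranges that each protein occupies in the joint sequence.
    """
    tokens = [BOS] + list(human_seq) + [SEP, SEP] + list(virus_seq) + [SEP]
    human_idx = range(1, 1 + len(human_seq))
    virus_start = 1 + len(human_seq) + 2  # skip <s>, protein A, </s></s>
    virus_idx = range(virus_start, virus_start + len(virus_seq))
    return tokens, human_idx, virus_idx


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def toy_attention(n, seed=0):
    """Stand-in for one attention head: random scores, row-wise softmax."""
    rng = random.Random(seed)
    return [softmax([rng.gauss(0, 1) for _ in range(n)]) for _ in range(n)]


def cross_attention_scores(attn, query_idx, key_idx):
    """For each query residue, sum the attention it pays to the other protein."""
    return [sum(attn[i][j] for j in key_idx) for i in query_idx]


# Toy sequences, chosen only for illustration.
tokens, human_idx, virus_idx = build_pair_tokens("MKTAYIAK", "MFVFLVLL")
attn = toy_attention(len(tokens))
human_scores = cross_attention_scores(attn, human_idx, virus_idx)
top_human = max(range(len(human_scores)), key=human_scores.__getitem__)
print(tokens)
print("human residue attending most to the virus protein:", top_human)
```

With a trained model, the random matrix would be replaced by real attention weights, and residues with high cross-protein scores would be candidates for interface positions. How heads and layers are aggregated is a design choice this sketch leaves open.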
We compared our model with Cross-attention PHV and STEP on our dataset, a dataset from another study, and a SARS-CoV-2 PPI dataset. Our model not only achieved better prediction performance than these existing human-virus PPI prediction models, but also provided interpretability by identifying PPI interfaces through attention analysis.
Author(s)
Seungwoo Baek
Issued Date
2024
Type
Thesis
URI
https://scholar.gist.ac.kr/handle/local/19348
Alternative Author(s)
백승우
Department
Graduate School, AI Graduate School
Advisor
Nam, Hojung
Degree
Master
Appears in Collections:
Department of AI Convergence > 3. Theses(Master)
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.