Development of Data-driven Atomic Features as Basis Sets for Molecular Machine Learning

Abstract
An important concept in chemistry is that molecular properties can be explained from a more microscopic view, mostly at the atomic or electronic level. In this sense, electronic structure theories are developed on the basis of molecular orbital theory, which generally begins with the linear combination of atomic orbitals. Likewise, typical machine learning (ML) models for predicting molecular properties are built on the basis of atomic features. However, although some molecular properties are well explained by the properties of the constituent atoms, there is no clear evidence that tabulated atomic properties correlate well with all molecular properties. Here, we present our data-driven feature generation method for predicting molecular properties that cannot be easily decomposed into atomic properties.
Recently, ML has become a popular computational tool for establishing structure-property relationships. To predict molecular properties using ML, molecules must be represented as input variables to the ML model while retaining their structural characteristics. To build a meaningful representation, graph neural networks (GNNs), which treat a molecular structure as a molecular graph, have been widely adopted because of their remarkable performance. GNNs can process both 2D and 3D molecular representations, which can be given as atomic coordinates, interatomic distances, or strings such as the simplified molecular-input line-entry system (SMILES). By combining the graph structure with atomic feature matrices, advanced GNN models have shown state-of-the-art performance. Nevertheless, these methods rely on conventional atomic features such as electronegativity, atomic weight, and van der Waals radius. Herein, we propose a method for generating fully data-driven atomic features that can be used as atomic properties in predicting molecular properties. In specific circumstances, the new data-driven atomic features showed considerable prediction performance even with less information than classical atomic properties, and when they are used in conjunction with conventional atomic properties, a performance improvement can be expected in several cases. Notably, our generation method is based on a metaheuristic algorithm, so it can be applied to various chemical fields. This research is expected to help those who want to create node features for GNNs to improve model performance in various fields.
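The idea of learning a per-element feature table instead of looking up tabulated atomic properties can be illustrated with a small sketch. This is not the method of the thesis: the abstract does not name the metaheuristic or the GNN architecture, so the sketch assumes random-mutation hill climbing and a single neighbor-sum message-passing step with a sum readout. The molecules, the targets, and all names (MOLECULES, TARGETS, predict, hill_climb) are hypothetical and exist only for illustration.

```python
import random

# Toy molecular graphs: (element symbols, bonds as index pairs); hydrogens omitted.
MOLECULES = [
    (["C", "O"], [(0, 1)]),
    (["C", "C", "O"], [(0, 1), (1, 2)]),
    (["C", "N"], [(0, 1)]),
    (["C", "C", "N"], [(0, 1), (1, 2)]),
]
# Arbitrary toy targets, NOT measured data.
TARGETS = [1.0, 1.5, 0.4, 0.9]

ELEMENTS = ["C", "N", "O"]
DIM = 2  # length of each data-driven atomic feature vector (assumed)


def predict(features, molecule):
    """One neighbor-sum message-passing step followed by a sum readout."""
    symbols, bonds = molecule
    x = [list(features[s]) for s in symbols]  # atomic feature matrix
    h = [list(v) for v in x]                  # updated node states
    for i, j in bonds:
        for k in range(DIM):
            h[i][k] += x[j][k]
            h[j][k] += x[i][k]
    return sum(sum(v) for v in h)


def loss(features):
    """Mean squared error over the toy data set."""
    errors = [predict(features, m) - t for m, t in zip(MOLECULES, TARGETS)]
    return sum(e * e for e in errors) / len(errors)


def hill_climb(steps=2000, sigma=0.1, seed=0):
    """Random-mutation hill climbing over the per-element feature table."""
    rng = random.Random(seed)
    best = {e: [0.0] * DIM for e in ELEMENTS}
    initial_loss = best_loss = loss(best)
    for _ in range(steps):
        # Perturb every feature with Gaussian noise; keep the candidate only if it improves.
        cand = {e: [v + rng.gauss(0.0, sigma) for v in vec] for e, vec in best.items()}
        cand_loss = loss(cand)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best, initial_loss, best_loss
```

Calling hill_climb() returns the optimized feature table together with the loss before and after the search. In a realistic setting, the fixed readout would presumably be replaced by a trained GNN and the toy targets by a molecular property data set.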
Author(s)
Dabean Han
Issued Date
2024
Type
Thesis
URI
https://scholar.gist.ac.kr/handle/local/19129
Alternative Author(s)
한다빈
Department
Graduate School, Department of Chemistry
Advisor
Kim, Hyun Woo
Degree
Master
Appears in Collections:
Department of Chemistry > 3. Theses(Master)
Access and License
  • Access type: Public
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.