Biomedical Named Entity Recognition and Normalization using Deep Neural Networks
- Author(s)
- Hyejin Cho
- Type
- Thesis
- Degree
- Doctor
- Department
- Graduate School, School of Electrical Engineering and Computer Science
- Advisor
- Lee, Hyunju
- Abstract
- With the rapid development of biomedical technology, the literature covering the biomedical domain has grown at an accelerating pace in recent years. Because newly discovered biomedical findings are reported in natural language, extracting key information from articles and converting it into structured knowledge can improve research efficiency and productivity. Many researchers in the biomedical domain focus on the biological information in articles and have carried out studies that transform this raw text into structured information. Accurate knowledge retrieval from a huge amount of literature and effective management of numerous information sources are therefore becoming more important for computational data analysis in biomedical research. In this study, we aim to automatically structure information from unstructured biomedical text so that it can be readily understood. This dissertation consists of two parts, which address automatically recognizing and normalizing biomedical named entities (diseases, symptoms, chemicals, genes, and plants) in existing biological literature.
In the first part of this dissertation, we developed a named entity recognition (NER) system based on bi-directional long short-term memory (BiLSTM) to recognize named entities in biomedical literature. In biomedical text mining, NER is an important task that automatically identifies meaningful terms or phrases in text and classifies them into pre-defined entity types. Most previous NER methods have been based on traditional machine learning, but these approaches depend heavily on large-scale dictionaries, target-specific rules, or well-defined corpora. We therefore developed a BiLSTM-based model with contextual information and conditional random fields (CRF), and assessed it on three kinds of biomedical corpora. Our proposed system outperformed several other NER approaches and performed similarly to the transfer learning approach.
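As a rough illustration of what a CRF layer adds on top of per-token scores, the sketch below runs Viterbi decoding over a three-token phrase. It is not the dissertation's implementation: the BIO tag set, the `Disease` label, and every score in `EMISSIONS` and `TRANSITIONS` are made-up assumptions, and in a real model the emission scores would come from the BiLSTM rather than being written by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical tag set and scores, chosen only for illustration.
TAGS = ["O", "B-Disease", "I-Disease"]
TOKENS = ["breast", "cancer", "patients"]

# Per-token tag scores (stand-ins for BiLSTM outputs).
EMISSIONS: List[Dict[str, float]] = [
    {"O": 0.5, "B-Disease": 1.0, "I-Disease": 0.2},
    {"O": 1.0, "B-Disease": 0.1, "I-Disease": 0.9},
    {"O": 2.0, "B-Disease": 0.0, "I-Disease": 0.1},
]

# Tag-to-tag scores; pairs that are not listed score 0.0.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("B-Disease", "I-Disease"): 1.0,   # continuing an entity is encouraged
    ("O", "I-Disease"): -10.0,         # an entity cannot start with I-
}


def viterbi_decode(emissions, transitions, tags) -> List[str]:
    """Return the highest-scoring tag sequence under emission + transition scores."""
    # best[t][tag] is the best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag] is the previous tag on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Picking the best tag per token independently ignores tag-to-tag structure.
greedy = [max(TAGS, key=lambda tag: scores[tag]) for scores in EMISSIONS]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)

print(list(zip(TOKENS, greedy)))
print(list(zip(TOKENS, decoded)))
```

With these toy scores, choosing each tag independently labels "cancer" as `O`, while the sequence-level decoding keeps "breast cancer" together as a single two-token entity because the transition scores reward a `B-` tag followed by an `I-` tag.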
In the second part of this dissertation, we proposed a biomedical entity name normalization model based on deep neural networks to normalize the entity names recognized in the literature. After single words or multi-word phrases in text have been recognized, the next step is named entity normalization, which assigns each recognized entity to a suitable identifier in a knowledge base. Normalization is necessary because many biological terms have multiple synonyms, acronyms, and term variations. Although machine-learning approaches have been widely used for normalization, most normalization tools rely on the accuracy of well-constructed dictionaries or domain-specific rules. Therefore, moving beyond methods that compare word similarity by representing biological mentions in continuous vector spaces, we recast normalization as a ranking problem solved through binary classification. Our model outperformed existing models on disease, symptom, chemical, gene, and plant entity names. The plant corpus used in this study was constructed by experts with biomedical knowledge following corpus construction guidelines, which suggests that it can be applied reliably in biomedical natural language processing tasks.
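The idea of treating normalization as ranking through pairwise scoring can be sketched as follows: each (mention, candidate name) pair receives a match score, and knowledge-base identifiers are ranked by their best-scoring name. This is a minimal sketch under stated assumptions, not the dissertation's model. The identifiers in `KB` are hypothetical, and `match_score` is a character-trigram Jaccard overlap standing in for the output of a trained binary classifier.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge base: identifier -> known names and synonyms.
KB: Dict[str, List[str]] = {
    "KB:0001": ["breast cancer", "breast carcinoma"],
    "KB:0002": ["lung cancer"],
    "KB:0003": ["aspirin", "acetylsalicylic acid"],
}


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of the lowercased text, padded with one space per side."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def match_score(mention: str, name: str) -> float:
    """Stand-in for a binary classifier's match probability (Jaccard overlap)."""
    a, b = char_ngrams(mention), char_ngrams(name)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(mention: str, kb: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Score every (mention, name) pair and rank identifiers by their best name."""
    best: Dict[str, float] = {}
    for concept_id, names in kb.items():
        for name in names:
            score = match_score(mention, name)
            best[concept_id] = max(score, best.get(concept_id, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


ranking = rank_candidates("breast cancers", KB)
for concept_id, score in ranking:
    print(f"{concept_id}\t{score:.2f}")
```

The top-ranked identifier is taken as the normalized concept. Swapping `match_score` for a learned classifier over mention and candidate representations would leave the ranking step unchanged.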
The purpose of this dissertation is to automatically extract biomedical information from the literature and structure it by recognizing and normalizing specific entity names and extracting meaningful information from a given text. We hope that this research contributes a new approach to the recognition and normalization of entity names in various fields.
- URI
- https://scholar.gist.ac.kr/handle/local/33136
- Fulltext
- http://gist.dcollection.net/common/orgView/200000907247
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.