Triage of documents containing protein interactions affected by mutations using an NLP based machine learning approach

Published:2020-11-10 Issue:1 Volume:21 Page:
ISSN:1471-2164
Container-title:BMC Genomics
language:en
Short-container-title:BMC Genomics

Author:

Qu Jinchan, Steppi Albert, Zhong Dongrui, Hao Jie, Wang Jian, Lung Pei-Yau, Zhao Tingting, He Zhe, Zhang Jinfeng

Abstract

Background: Information on protein-protein interactions affected by mutations is very useful for understanding the biological effect of mutations and for developing treatments targeting the interactions. In this study, we developed a natural language processing (NLP) based machine learning approach for extracting such information from the literature. Our aim is to identify journal abstracts or paragraphs in full-text articles that contain at least one occurrence of a protein-protein interaction (PPI) affected by a mutation.

Results: Our system makes use of the latest NLP methods with a large number of engineered features, including some based on pre-trained word embeddings. Our final model achieved satisfactory performance in the Document Triage Task of the BioCreative VI Precision Medicine Track, with the highest recall and a comparable F1-score.

Conclusions: The performance of our method indicates that it is ideally suited for being combined with manual annotations. Our machine learning framework and engineered features will also be very helpful for other researchers to further improve this and other related biological text mining tasks using either traditional machine learning or deep learning based methods.

Funder

National Institute of General Medical Sciences

Publisher

Springer Science and Business Media LLC

Subject

Genetics,Biotechnology

Link

http://link.springer.com/content/pdf/10.1186/s12864-020-07185-7.pdf

Reference: 76 articles.

1. Bakail M, Ochsenbein F. Targeting protein–protein interactions, a wide open field for drug design. Comptes Rendus Chimie. 2016;19(1):19–27.

2. Feng Y, Wang Q, Wang T. Drug target protein-protein interaction networks: a systematic perspective. Biomed Res Int. 2017;2017:1289259.

3. Berggård T, Linse S, James P. Methods for the detection and analysis of protein–protein interactions. Proteomics. 2007;7(16):2833–42.

4. Rao VS, et al. Protein-protein interaction detection: methods and analysis. Int J Proteomics. 2014;2014:12.

5. Free RB, Hazelwood LA, Sibley DR. Identifying novel protein-protein interactions using co-immunoprecipitation and mass spectroscopy. Curr Protoc Neurosci. 2009;Chapter 5:Unit 5.28.

Cited by 6 articles.

1. BioKG: a comprehensive, large-scale biomedical knowledge graph for AI-powered, data-driven biomedical research;2023-10-17

2. An integrated strategy to explore the wine-processed mechanism of Corni Fructus on chronic renal failure based on metabolomics, network analysis and bioinformatics approaches;Journal of Pharmacy and Pharmacology;2023-02-23

3. Logistic Regression-Based Machine Learning Model for Mutation Classification in the Discovery of Precision Medicine;Translating Healthcare Through Intelligent Computational Methods;2023

4. Pre-trained models, data augmentation, and ensemble learning for biomedical information extraction and document classification;Database;2022-01-01

5. NLP-Based Tools for Decoding the Language of Life;Proceedings of Emerging Trends and Technologies on Intelligent Systems;2021-10-02