Text mining of 15 million full-text scientific articles

Published: 2017-07-11

Authors:

Westergaard David, Stærfeldt Hans-Henrik, Tønsberg Christian, Jensen Lars Juhl, Brunak Søren

Abstract

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.

Publisher

Cold Spring Harbor Laboratory

References: 47 articles.

1. Azevedo A. Integration of Data Mining in Business Intelligence Systems. 1st Edition. Azevedo A, Santos MF, editors. Integration of Data Mining in Business Intelligence Systems. IGI Publishing Hershey, PA, USA; 2014. 314 p.

2. Application of text mining in the biomedical domain

3. Text Mining in Cancer Gene and Pathway Prioritization. Cancer Informatics, Vol. 13; 2014

4. Event-based text mining for biology and functional genomics

Cited by 8 articles.

1. Mining Software Entities in Scientific Literature;Proceedings of the 30th ACM International Conference on Information & Knowledge Management;2021-10-26

2. Bioinformatics in Disease Classification;Encyclopedia of Biomedical Engineering;2019

3. A Guide to Dictionary-Based Text Mining;Methods in Molecular Biology;2019

4. SciRide Finder: a citation-based paradigm in biomedical literature search;Scientific Reports;2018-04-18

5. Sentence-based undersampling for named entity recognition using genetic algorithm;Iran Journal of Computer Science;2018-03-06