A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment-Reference-Cited by-同舟云学术

A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment

Published:2011-03-08 Issue:1 Volume:12 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Guo Yufan,Korhonen Anna,Liakata Maria,Silins Ilona,Hogberg Johan,Stenius Ulla

Abstract

Abstract Background Many practical tasks in biomedicine require accessing specific types of information in scientific literature; e.g. information about the results or conclusions of the study in question. Several schemes have been developed to characterize such information in scientific journal articles. For example, a simple section-based scheme assigns individual sentences in abstracts under sections such as Objective, Methods, Results and Conclusions. Some schemes of textual information structure have proved useful for biomedical text mining (BIO-TM) tasks (e.g. automatic summarization). However, user-centered evaluation in the context of real-life tasks has been lacking. Methods We take three schemes of different type and granularity - those based on section names, Argumentative Zones (AZ) and Core Scientific Concepts (CoreSC) - and evaluate their usefulness for a real-life task which focuses on biomedical abstracts: Cancer Risk Assessment (CRA). We annotate a corpus of CRA abstracts according to each scheme, develop classifiers for automatic identification of the schemes in abstracts, and evaluate both the manual and automatic classifications directly as well as in the context of CRA. Results Our results show that for each scheme, the majority of categories appear in abstracts, although two of the schemes (AZ and CoreSC) were developed originally for full journal articles. All the schemes can be identified in abstracts relatively reliably using machine learning. Moreover, when cancer risk assessors are presented with scheme annotated abstracts, they find relevant information significantly faster than when presented with unannotated abstracts, even when the annotations are produced using an automatic classifier. Interestingly, in this user-based evaluation the coarse-grained scheme based on section names proved nearly as useful for CRA as the finest-grained CoreSC scheme. Conclusions We have shown that existing schemes aimed at capturing information structure of scientific documents can be applied to biomedical abstracts and can be identified in them automatically with an accuracy which is high enough to benefit a real-life task in biomedicine.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-12-69.pdf

Reference43 articles.

1. PubMed[http://www.ncbi.nlm.nih.gov/pubmed]

2. Cohen A, Hersh W: A survey of current work in biomedical text mining. Briefings in Bioinformatics 2005, 6: 57–71. 10.1093/bib/6.1.57

3. Ananiadou S, Mcnaught J: Text Mining for Biology And Biomedicine. Norwood, MA, USA: Artech House, Inc; 2005.

4. Hunter L, Cohen KB: Biomedical Language Processing: What's Beyond PubMed? Mol Cell 2006, 21(5):589–594. 10.1016/j.molcel.2006.02.012

5. Ananiadou S, Kell D, Tsujii J: Text mining and its potential applications in systems biology. Trends in Biotechnology 2006, 24(12):571–579. 10.1016/j.tibtech.2006.10.002

Cited by 10 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. A survey on clinical natural language processing in the United Kingdom from 2007 to 2022;npj Digital Medicine;2022-12-21

2. A hybrid approach to recognize generic sections in scholarly documents;International Journal on Document Analysis and Recognition (IJDAR);2021-06-21

3. A Systematic Approach to Map the Research Articles’ Sections to IMRAD;IEEE Access;2020

4. Information extraction from scientific articles: a survey;Scientometrics;2018-09-29

5. Protein interaction network constructing based on text mining and reinforcement learning with application to prostate cancer;IET Systems Biology;2015-08