Flagging incorrect nucleotide sequence reagents in biomedical papers: To what extent does the leading publication format impede automatic error detection?

Published:2020-05-22 Issue:2 Volume:124 Page:1139-1156
ISSN:0138-9130
Container-title:Scientometrics
language:en
Short-container-title:Scientometrics

Author:

Labbé, Cyril; Cabanac, Guillaume; West, Rachael A.; Gautier, Thierry; Favier, Bertrand; Byrne, Jennifer A.

Abstract

In an idealised vision of science, the scientific literature is error-free. Errors reported during peer review are supposed to be corrected prior to publication, as further research establishes new knowledge based on the body of literature. It happens, however, that errors pass through peer review, and in a minority of cases errata and retractions follow. Automated screening software can be applied to detect errors in manuscripts and publications. The contribution of this paper is twofold. First, we designed the erroneous reagent checking benchmark to assess the accuracy of fact-checkers screening biomedical publications for dubious mentions of nucleotide sequence reagents. It comes with a test collection comprising 1679 nucleotide sequence reagents that were curated by biomedical experts. Second, we benchmarked our own screening software, called Seek&Blastn, with three input formats (XML, HTML and PDF) to assess the extent of performance loss when operating on various publication formats. Our findings stress the superiority of markup formats (a 79% detection rate on XML and HTML) over the prominent PDF format (a 69% detection rate at most) in an error-flagging task. This is the first published baseline on error detection involving reagents reported in biomedical scientific publications. The benchmark is designed to facilitate the development and validation of software components that enhance the reliability of the peer review process.
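To make the format effect concrete, here is a minimal, hypothetical sketch. It is not Seek&Blastn and does not model any sequence-verification step (such as a BLAST search). It only shows how a sequence that survives intact in markup text can be split by a PDF-to-text line break, so that a simple extractor never hands it to a checker, and how a detection rate could then be computed against a curated list of known-erroneous reagents. The length threshold `MIN_LEN`, the regular expressions, the helper names (`extract_sequences`, `repair_line_breaks`, `detection_rate`), the example sentence and the 20-mer sequence are all illustrative assumptions.

```python
import re

# Hypothetical threshold: shortest run of nucleotide letters treated as a reagent.
MIN_LEN = 15

# A candidate sequence is a whole "word" made only of A, C, G, T or U.
SEQ_RE = re.compile(r"\b[ACGTU]{%d,}\b" % MIN_LEN, re.IGNORECASE)

# Heuristic repair for PDF-extracted text: delete a line break that sits between
# two nucleotide letters. It can also glue ordinary words together
# (e.g. "data\ncells"), so it trades precision for recall.
SPLIT_RE = re.compile(r"(?<=[ACGTU])[ \t]*\n[ \t]*(?=[ACGTU])", re.IGNORECASE)


def extract_sequences(text: str) -> set[str]:
    """Return the candidate nucleotide sequences found in text, upper-cased."""
    return {m.group(0).upper() for m in SEQ_RE.finditer(text)}


def repair_line_breaks(text: str) -> str:
    """Join nucleotide runs that a PDF-to-text step split across lines."""
    return SPLIT_RE.sub("", text)


def detection_rate(found: set[str], erroneous: set[str]) -> float:
    """Fraction of known-erroneous reagents that reach the checker."""
    if not erroneous:
        raise ValueError("the benchmark lists no erroneous reagents")
    return len(found & erroneous) / len(erroneous)


# Hypothetical benchmark entry: one made-up reagent curated as incorrect.
ERRONEOUS = {"GATCCGTTAGCAATGCCTTA"}

# The same made-up sentence as markup text and as PDF-extracted text.
MARKUP_TEXT = "Cells were transfected with siRNA 5'-GATCCGTTAGCAATGCCTTA-3' targeting gene X."
PDF_TEXT = "Cells were transfected with siRNA 5'-GATCCGTTAGC\nAATGCCTTA-3' targeting gene X."

if __name__ == "__main__":
    for label, text in [
        ("markup", MARKUP_TEXT),
        ("pdf", PDF_TEXT),
        ("pdf+repair", repair_line_breaks(PDF_TEXT)),
    ]:
        rate = detection_rate(extract_sequences(text), ERRONEOUS)
        print(f"{label}: {rate:.0%}")
```

Run as a script, this prints `markup: 100%`, `pdf: 0%` and `pdf+repair: 100%`: the split halves (11 and 9 letters) each fall below the threshold, so nothing is extracted until the line break is removed. The repair heuristic is deliberately crude; in markup formats the sequence typically arrives as one contiguous string, so no such guesswork is needed.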

Funder

Office of Research Integrity

Post-Truth Initiative, a Sydney University Research Excellence Initiative

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences, Computer Science Applications, General Social Sciences

Link

https://link.springer.com/content/pdf/10.1007/s11192-020-03463-z.pdf
