Improving Publication Pipeline with Automated Biological Entity Detection and Validation Service
Author:
Xu Weijia (1), Gupta Amit (1), Jaiswal Pankaj (2), Taylor Crispin (3), Lockhart Patti (3), Regala Jennifer (3)
Affiliation:
1. Texas Advanced Computing Center, University of Texas, Austin, USA
2. Oregon State University, Corvallis, Oregon, USA
3. American Society of Plant Biologists, Rockville, Maryland, USA
Abstract
With the increasing volume of digital journal submissions, there is a need for new, scalable computational methods that improve information accessibility. One common task is to identify useful information and named entities in text documents such as journal article submissions. However, many technical challenges limit the applicability of general methods, and general-purpose tools are lacking. In this paper, we present the domain informational vocabulary extraction (DIVE) project, which aims to enrich digital publications by detecting entities and key informational words and by adding annotations. To our knowledge, our system is the first of its kind to engage the authors of peer-reviewed articles and the journal publishers by integrating the DIVE implementation into the manuscript proofing and publication process. The system implements multiple strategies for biological entity detection, including regular expression rules, ontologies, and a keyword dictionary. The extracted entities are stored in a database and made accessible through an interactive web application for curation and evaluation by authors. Through the web interface, authors can add annotations and correct the current results. These updates can then be used to improve entity detection in subsequently processed articles. We describe our framework and deployment in detail. In a pilot program, we have deployed the first phase of development as a service integrated with the journals Plant Physiology and The Plant Cell, published by the American Society of Plant Biologists (ASPB). We present usage statistics since the service entered production in April 2018. We compare the automated recognition results from DIVE with the results of author curation and show that the service achieved, on average, 80% recall and 70% precision per article. In contrast, an existing biological entity extraction tool, A Biomedical Named Entity Recognizer (ABNER), achieved only 47% recall and returned a much larger candidate set.
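As an illustration only, the sketch below shows how two of the strategies named in the abstract (regular expression rules and a keyword dictionary) could be combined, and how one article's detections could be scored against an author-curated set. It is not the DIVE implementation: the locus-identifier pattern, the dictionary entries, the sample sentence, the curated set, and the names `detect_entities` and `precision_recall` are all hypothetical assumptions, and the ontology-based strategy is omitted.

```python
import re

# Hypothetical rule: Arabidopsis-style locus identifiers such as AT1G01010.
# This pattern is an illustrative assumption, not one of DIVE's actual rules.
LOCUS_RULE = re.compile(r"\bAT[1-5CM]G\d{5}\b")

# Hypothetical keyword dictionary mapping lowercase terms to entity types.
KEYWORDS = {
    "arabidopsis thaliana": "species",
    "abscisic acid": "chemical",
}


def detect_entities(text):
    """Return the set of (entity, type) pairs found by the rule and the dictionary."""
    found = {(m.group(0), "gene") for m in LOCUS_RULE.finditer(text)}
    lowered = text.lower()
    for term, kind in KEYWORDS.items():
        # Whole-word match so a term is not detected inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add((term, kind))
    return found


def precision_recall(detected, curated):
    """Score one article's detections against its author-curated entity set."""
    true_positives = len(detected & curated)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall


# Made-up article text and curation result, for demonstration only.
article = (
    "AT1G01010 expression in Arabidopsis thaliana rises under abscisic acid; "
    "AT9G99999 is not a valid locus."
)
detected = detect_entities(article)
curated = {
    ("AT1G01010", "gene"),
    ("arabidopsis thaliana", "species"),
    ("abscisic acid", "chemical"),
    ("NAC001", "gene"),  # entity the author added by hand during curation
}
precision, recall = precision_recall(detected, curated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case all three detections are confirmed by the curator and one curated entity is missed. The per-article averages quoted in the abstract would correspond to computing such a pair for each article and then averaging across articles.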
Publisher
Walter de Gruyter GmbH
Subject
Geology, Ocean Engineering, Water Science and Technology
Cited by
1 article.