Automating literature screening and curation with applications to computational neuroscience

Published: 2024-05-09 Issue: 7 Volume: 31 Page: 1463-1470
ISSN: 1067-5027
Container-title: Journal of the American Medical Informatics Association
Language: en

Author:

Ji Ziqing¹, Guo Siyan¹, Qiao Yujie¹,², McDougal Robert A¹,³,⁴,⁵

Affiliation:

1. Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, United States

2. Integrative Genomics, Princeton University, Princeton, NJ 08540, United States

3. Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT 06510, United States

4. Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06510, United States

5. Wu Tsai Institute, Yale University, New Haven, CT 06510, United States

Abstract

Objective: ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes were mainly supplied from unsolicited model author submissions, but this approach is inherently limited. For example, we estimate we have captured only around one-third of NEURON models, the most common type of models in ModelDB. To more completely characterize the state of computational neuroscience modeling work, we aim to identify works containing results derived from computational neuroscience approaches and their standardized associated metadata (eg, cell types, research topics).

Materials and Methods: Known computational neuroscience work from ModelDB and identified neuroscience work queried from PubMed were included in our study. After pre-screening with SPECTER2 (a free document embedding method), GPT-3.5 and GPT-4 were used to identify likely computational neuroscience work and relevant metadata.

Results: SPECTER2, GPT-4, and GPT-3.5 demonstrated varied but high abilities in identification of computational neuroscience work. GPT-4 achieved 96.9% accuracy and GPT-3.5 improved from 54.2% to 85.5% through instruction-tuning and Chain of Thought. GPT-4 also showed high potential in identifying relevant metadata annotations.

Discussion: Accuracy in identification and extraction might further be improved by dealing with ambiguity of what are computational elements, including more information from papers (eg, Methods section), improving prompts, etc.

Conclusion: Natural language processing and large language model techniques can be added to ModelDB to facilitate further model discovery, and will contribute to a more standardized and comprehensive framework for establishing domain-specific resources.

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/jamia/article-pdf/31/7/1463/58243708/ocae097.pdf

References: 26 articles.

1. Wikidata: a free collaborative knowledgebase;Vrandečić;Commun ACM,2014

2. GenBank;Benson;Nucleic Acids Res,2013

3. NeuroMorpho.Org: a central resource for neuronal morphologies;Ascoli;J Neurosci,2007

4. Sharing neuron data: carrots, sticks, and digital records;Ascoli;PLoS Biol,2015

5. Reproducibility in computational neuroscience models and simulations;McDougal;IEEE Trans Biomed Eng,2016