Automating literature screening and curation with applications to computational neuroscience

Author:

Ji Ziqing (1), Guo Siyan (1), Qiao Yujie (1, 2), McDougal Robert A (1, 3, 4, 5)

Affiliation:

1. Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, United States

2. Integrative Genomics, Princeton University, Princeton, NJ 08540, United States

3. Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT 06510, United States

4. Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06510, United States

5. Wu Tsai Institute, Yale University, New Haven, CT 06510, United States

Abstract

Objective: ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes have mainly been supplied through unsolicited submissions from model authors, but this approach is inherently limited: for example, we estimate that we have captured only around one-third of NEURON models, the most common type of model in ModelDB. To characterize the state of computational neuroscience modeling work more completely, we aim to identify works containing results derived from computational neuroscience approaches, along with their associated standardized metadata (e.g., cell types, research topics).

Materials and Methods: Known computational neuroscience work from ModelDB and neuroscience work identified through PubMed queries were included in our study. After pre-screening with SPECTER2 (a free document embedding method), GPT-3.5 and GPT-4 were used to identify likely computational neuroscience work and relevant metadata.

Results: SPECTER2, GPT-4, and GPT-3.5 demonstrated varied but high abilities in identifying computational neuroscience work. GPT-4 achieved 96.9% accuracy, and GPT-3.5 improved from 54.2% to 85.5% accuracy through instruction-tuning and Chain of Thought prompting. GPT-4 also showed high potential for identifying relevant metadata annotations.

Discussion: Accuracy in identification and extraction might be further improved by resolving ambiguity about what counts as a computational element, by including more information from each paper (e.g., the Methods section), and by improving prompts.

Conclusion: Natural language processing and large language model techniques can be added to ModelDB to facilitate further model discovery, and will contribute to a more standardized and comprehensive framework for establishing domain-specific resources.
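As a rough illustration of the two-stage screening described above (embedding-based pre-screening followed by LLM classification), the sketch below assumes each paper has already been embedded into a fixed-length vector (for example with SPECTER2). It keeps candidates whose cosine similarity to the centroid of known ModelDB papers meets a threshold, then passes the survivors to a classifier and scores the labels against a reference set. The centroid rule, the threshold, and the helper names (`prescreen`, `classify`, `accuracy`) are hypothetical: the abstract does not state how SPECTER2 scores were used, and the `llm_label` callable is a placeholder for a GPT-3.5 or GPT-4 prompt rather than a call to any real API.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def prescreen(candidates: Dict[str, Vector],
              known_positives: Sequence[Vector],
              threshold: float) -> List[str]:
    """Hypothetical stage 1: keep papers close to the centroid of known positives."""
    center = centroid(known_positives)
    return sorted(pid for pid, vec in candidates.items()
                  if cosine(vec, center) >= threshold)


def classify(paper_ids: Sequence[str],
             texts: Dict[str, str],
             llm_label: Callable[[str], bool]) -> Dict[str, bool]:
    """Hypothetical stage 2: label each surviving paper.

    llm_label is a placeholder for an LLM prompt (e.g. one with
    chain-of-thought instructions) that returns True for computational work.
    """
    return {pid: llm_label(texts[pid]) for pid in paper_ids}


def accuracy(predicted: Dict[str, bool], gold: Dict[str, bool]) -> float:
    """Fraction of predicted labels that match the reference labels."""
    correct = sum(predicted[pid] == gold[pid] for pid in predicted)
    return correct / len(predicted)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real document embeddings are much longer.
    candidates = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [0.9, 0.2]}
    known = [[1.0, 0.0], [0.8, 0.2]]
    kept = prescreen(candidates, known, threshold=0.9)
    print(kept)  # ['a', 'c']

    texts = {"a": "A NEURON simulation of CA1 pyramidal cells",
             "c": "Patch-clamp recordings in rat cortex"}
    # Keyword stand-in for an LLM call, for illustration only.
    labels = classify(kept, texts, lambda t: "simulation" in t.lower())
    print(labels)  # {'a': True, 'c': False}
    print(accuracy(labels, {"a": True, "c": False}))  # 1.0
```

The cheap similarity filter runs first so that the more expensive model calls are spent only on papers that already resemble known computational neuroscience work; in practice the threshold would need to be tuned against labeled data.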

Publisher

Oxford University Press (OUP)
