Structured information extraction from scientific text with large language models

Published:2024-02-15 Issue:1 Volume:15 Page:
ISSN:2041-1723
Container-title:Nature Communications
language:en
Short-container-title:Nat Commun

Author:

Dagdelen John, Dunn Alexander, Lee Sanghoon, Walker Nicholas, Rosen Andrew S., Ceder Gerbrand, Persson Kristin A., Jain Anubhav

Abstract

Extracting structured knowledge from scientific text remains a challenging task for machine learning models. Here, we present a simple approach to joint named entity recognition and relation extraction and demonstrate how pretrained large language models (GPT-3, Llama-2) can be fine-tuned to extract useful records of complex scientific knowledge. We test three representative tasks in materials chemistry: linking dopants and host materials, cataloging metal-organic frameworks, and general composition/phase/morphology/application information extraction. Records are extracted from single sentences or entire paragraphs, and the output can be returned as simple English sentences or a more structured format such as a list of JSON objects. This approach represents a simple, accessible, and highly flexible route to obtaining large databases of structured specialized scientific knowledge extracted from research papers.
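As a rough illustration of the "list of JSON objects" output format mentioned in the abstract, the sketch below parses a model completion into records for the dopant/host linking task. It is a minimal sketch under stated assumptions, not the authors' implementation: the completion string, the `host` and `dopants` keys, and the helper names `parse_records` and `host_dopant_pairs` are all hypothetical and do not come from the paper.

```python
import json

# Hypothetical completion text, invented for illustration only.
# The key names "host" and "dopants" are assumptions, not the paper's schema.
completion = (
    '[{"host": "ZnO", "dopants": ["Al", "Ga"]},'
    ' {"host": "TiO2", "dopants": ["Nb"]}]'
)


def parse_records(text, required_keys=("host", "dopants")):
    """Parse a completion into a list of dict records.

    Returns an empty list if the text is not valid JSON or is not a JSON
    list; individual items that are malformed are skipped.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    records = []
    for item in data:
        if not isinstance(item, dict):
            continue
        if not all(key in item for key in required_keys):
            continue
        if not isinstance(item["dopants"], list):
            continue
        records.append(item)
    return records


def host_dopant_pairs(records):
    """Flatten records into (host, dopant) relation pairs."""
    return [(rec["host"], dopant) for rec in records for dopant in rec["dopants"]]


records = parse_records(completion)
pairs = host_dopant_pairs(records)
```

Validating the parsed output before storing it is one simple design choice here: a generative model can return text that is not valid JSON, so a pipeline of this kind would likely need to discard or flag such completions rather than trust them.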

Publisher

Springer Science and Business Media LLC

Link

https://www.nature.com/articles/s41467-024-45563-x.pdf

Reference: 67 articles.

1. Saal, J. E., Oliynyk, A. O. & Meredig, B. Machine learning in materials discovery: confirmed predictions and their underlying approaches. Annu. Rev. Mater. Res. 50, 49–69 (2020).

2. Choudhary, K. et al. Recent advances and applications of deep learning methods in materials science. npj Comput. Mater. 8, 59 (2022).

3. Oliveira, O. N. & Oliveira, M. C. F. Materials discovery with machine learning and knowledge discovery. Front. Chem. 10, 930369 (2022).

4. Weston, L. et al. Named entity recognition and normalization applied to large-scale information extraction from the materials science literature. J. Chem. Inf. Model. 59, 3692–3702 (2019).

5. Trewartha, A. et al. Quantifying the advantage of domain-specific pre-training on named entity recognition tasks in materials science. Patterns 3, 100488 (2022).

Cited by 32 articles.

1. Machine learning in materials research: Developments over the last decade and challenges for the future;Current Opinion in Solid State and Materials Science;2024-12

2. Enhancing knowledge tracing with concept map and response disentanglement;Knowledge-Based Systems;2024-10

3. Review of External Field Effects on Electrocatalysis: Machine Learning Guided Design;Advanced Functional Materials;2024-09-10

4. LLM-AIx: An open source pipeline for Information Extraction from unstructured medical text based on privacy preserving Large Language Models;2024-09-03

5. A large language model-powered literature review for high-angle annular dark field imaging;Chinese Physics B;2024-09-01