Abstract
A large amount of materials science knowledge is generated and stored as text in peer-reviewed scientific literature. Recent developments in natural language processing, such as Bidirectional Encoder Representations from Transformers (BERT) models, provide promising information extraction tools, but these models may yield suboptimal results when applied to the materials domain because they are not trained on materials-science-specific notation and jargon. Here, we present a materials-aware language model, MatSciBERT, trained on a large corpus of peer-reviewed materials science publications. We show that MatSciBERT outperforms SciBERT, a language model trained on a scientific corpus, and establishes state-of-the-art results on three downstream tasks: named entity recognition, relation classification, and abstract classification. We make the pre-trained weights of MatSciBERT publicly accessible to accelerate materials discovery and information extraction from materials science texts.
Funder
DST | Science and Engineering Research Board
DAE | Board of Research in Nuclear Sciences
Indian Space Research Organisation
Ministry of Human Resource Development
Indian Institute of Technology Delhi
Google
International Business Machines Corporation
Bloomberg L.P.
Publisher
Springer Science and Business Media LLC
Subject
Computer Science Applications, Mechanics of Materials, General Materials Science, Modeling and Simulation
Cited by
90 articles.