Abstract
Purpose
This paper presents preliminary work on extracting band gap information of materials from academic papers. With increasing demand for renewable energy, band gap information will help materials scientists design and implement novel photovoltaic (PV) cells.
Design/methodology/approach
The authors collected 1.44 million titles and abstracts of scholarly articles related to materials science, and then filtered the collection to 11,939 articles that potentially contain relevant information about materials and their band gap values. ChemDataExtractor was extended to extract information about PV materials and their band gap values. Evaluation was performed on randomly sampled information records from 415 papers.
Findings
The current system correctly extracts information for 51.32% of articles, with partially correct extraction for 36.62% of articles and incorrect extraction for 12.04%. The authors also grouped the errors into three main categories: chemical entity identification, band gap information and interdependency resolution. Future work will focus on addressing these errors to improve the performance of the system.
Originality/value
The authors did not find any literature to date on extracting band gap information from academic text using automated methods, so they consider this work original. Band gap information is important to materials scientists in applications such as solar cells, light emitting diodes and laser diodes.
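The filtering and extraction steps summarised under Design/methodology/approach can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use ChemDataExtractor: the keyword pattern, the value pattern and all function names below are assumptions made for illustration, since the abstract does not state the actual filtering terms or parsing rules. The sketch keeps only texts that mention a band gap and contain a number followed by the unit eV.

```python
import re

# Hypothetical keyword filter; the paper's real filtering terms are not given.
# Matches "band gap", "band-gap" and "bandgap", case-insensitively.
BAND_GAP_TERMS = re.compile(r"\bband[\s-]?gap\b", re.IGNORECASE)

# Hypothetical value pattern: a decimal number followed by the unit eV,
# e.g. "1.55 eV". Units such as meV or keV are deliberately not matched.
BAND_GAP_VALUE = re.compile(r"(\d+(?:\.\d+)?)\s*eV\b")


def mentions_band_gap(text: str) -> bool:
    """Return True if the text contains a band gap keyword."""
    return bool(BAND_GAP_TERMS.search(text))


def candidate_values(text: str) -> list[float]:
    """Return every number in the text that is followed by 'eV'."""
    return [float(value) for value in BAND_GAP_VALUE.findall(text)]


def filter_and_extract(abstracts: list[str]) -> list[tuple[str, list[float]]]:
    """Keep abstracts that mention a band gap and contain an eV value."""
    results = []
    for text in abstracts:
        if mentions_band_gap(text):
            values = candidate_values(text)
            if values:
                results.append((text, values))
    return results
```

A sketch like this only finds candidate values. It does not link a value to the material it describes, which corresponds to the interdependency resolution category of errors named in the findings.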
Subject
Library and Information Sciences, Information Systems