Affiliation:
1. School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
2. Institute of Geology, Chinese Academy of Geological Sciences, Beijing 100037, China
3. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
Abstract
With the advent of big data science, the field of geoscience has undergone a paradigm shift toward data-driven scientific discovery. However, the abundance of geoscience data distributed across multiple sources poses significant challenges to researchers in terms of data compilation, which includes data collection, collation, and database construction. To streamline the data compilation process, we present GeoKnowledgeFusion, a publicly accessible platform for the fusion of text, visual, and tabular knowledge extracted from the geoscience literature. GeoKnowledgeFusion leverages a powerful network of models that provide a joint multimodal understanding of text, image, and tabular data, enabling researchers to efficiently curate and continuously update their databases. To demonstrate the practical applications of GeoKnowledgeFusion, we present two scenarios: the compilation of Sm-Nd isotope data for constructing a domain-specific database and geographic analysis, and the data extraction process for debris flow disasters. The data compilation process for these use cases encompasses various tasks, including PDF pre-processing, target element recognition, human-in-the-loop annotation, and joint multimodal knowledge understanding. The findings consistently reveal patterns that align with manually compiled data, thus affirming the credibility and dependability of our automated data processing tool. To date, GeoKnowledgeFusion has supported forty geoscience research teams within the program by processing over 40,000 documents uploaded by geoscientists.
Funder
NSF China
National Key R&D Program of China