GeoKnowledgeFusion: A Platform for Multimodal Data Compilation from Geoscience Literature

Author:

Guo Zhixin [1], Wang Chaoyang [2], Zhou Jianping [1], Zheng Guanjie [1], Wang Xinbing [1], Zhou Chenghu [1,3]

Affiliation:

1. School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

2. Institute of Geology, Chinese Academy of Geological Sciences, Beijing 100037, China

3. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

Abstract

With the advent of big data science, the field of geoscience has undergone a paradigm shift toward data-driven scientific discovery. However, the abundance of geoscience data distributed across multiple sources poses significant challenges to researchers in terms of data compilation, which includes data collection, collation, and database construction. To streamline the data compilation process, we present GeoKnowledgeFusion, a publicly accessible platform for the fusion of text, visual, and tabular knowledge extracted from the geoscience literature. GeoKnowledgeFusion leverages a powerful network of models that provide a joint multimodal understanding of text, image, and tabular data, enabling researchers to efficiently curate and continuously update their databases. To demonstrate the practical applications of GeoKnowledgeFusion, we present two scenarios: the compilation of Sm-Nd isotope data for constructing a domain-specific database and geographic analysis, and the data extraction process for debris flow disasters. The data compilation process for these use cases encompasses various tasks, including PDF pre-processing, target element recognition, human-in-the-loop annotation, and joint multimodal knowledge understanding. The findings consistently reveal patterns that align with manually compiled data, thus affirming the credibility and dependability of our automated data processing tool. To date, GeoKnowledgeFusion has supported forty geoscience research teams within the program by processing over 40,000 documents uploaded by geoscientists.

Funder

NSF China

National Key R&D Program of China

Publisher

MDPI AG

