Affiliation:
1. Tsinghua University, Beijing, China
2. Peking University, Beijing, China
Abstract
In recent years, knowledge graphs (KGs) have attracted significant attention from academia and industry, resulting in numerous technologies for KG construction, completion, and application. XLORE is one of the largest multilingual KGs, built from Baidu Baike and Wikipedia through a series of knowledge modeling and acquisition methods. In this article, we apply systematic methods to improve XLORE's data quality and present its latest version, XLORE 3, which enables the effective integration and management of heterogeneous knowledge from diverse resources. Compared with previous versions, XLORE 3 has three major advantages: (1) We design a comprehensive and reasonable schema, namely the XLORE ontology, which effectively organizes and manages entities from various resources. (2) We merge equivalent entities in different languages to facilitate knowledge sharing, and we provide a large-scale entity linking system that associates unstructured text with the structured KG. (3) We design a multi-strategy knowledge completion framework that leverages pre-trained language models and vast amounts of unstructured text to discover missing and new facts. The resulting KG contains 446 concepts, 2,608 properties, 66 million entities, and more than 2 billion facts. It is available for download at https://www.xlore.cn/, providing a valuable resource for researchers and practitioners in various fields.
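To illustrate what merging equivalent entities across languages can mean in practice, the following is a minimal sketch, not the XLORE 3 method. It assumes that equivalence evidence is already available as pairs of entity IDs (for example, interlanguage links) and groups the entities into clusters with a union-find structure. The ID scheme (`enwiki:`, `zhwiki:`, `baike:`) and the names `UnionFind` and `merge_entities` are hypothetical, chosen only for illustration.

```python
class UnionFind:
    """Disjoint-set forest with path halving (hypothetical helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen IDs as their own singleton set.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to flatten the tree.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_entities(entities, equivalence_links):
    """Group entity IDs into clusters of (assumed) equivalent entities.

    `equivalence_links` is an assumed input: pairs of IDs believed to
    denote the same real-world entity, e.g. from interlanguage links.
    """
    uf = UnionFind()
    for e in entities:
        uf.find(e)
    for a, b in equivalence_links:
        uf.union(a, b)
    clusters = {}
    for e in entities:
        clusters.setdefault(uf.find(e), []).append(e)
    # Sort for a deterministic output order.
    return sorted(sorted(c) for c in clusters.values())


entities = ["enwiki:Beijing", "zhwiki:北京", "baike:北京", "enwiki:Shanghai"]
links = [("enwiki:Beijing", "zhwiki:北京"), ("zhwiki:北京", "baike:北京")]
print(merge_entities(entities, links))
# [['baike:北京', 'enwiki:Beijing', 'zhwiki:北京'], ['enwiki:Shanghai']]
```

Union-find is used here because equivalence is transitive: if two pages are each linked to a third, all three end up in one merged entity even without a direct link between them. A real system would also need to handle noisy or conflicting links, which this sketch ignores.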
Funder
Institute for Guo Qiang, Tsinghua University
Publisher
Association for Computing Machinery (ACM)