Algorithms and methods for automated construction of knowledge graphs based on text sources

Published: 2024 Volume: 531 Page: 03017
ISSN: 2267-1242
Container-title: E3S Web of Conferences
Short-container-title: E3S Web of Conf.

Authors:

Filippov Victor, Ayusheeva Natalya, Kusheeva Maria

Abstract

In this article, we describe our work towards building knowledge graphs automatically from Russian texts. We explore several methodologies and libraries for extracting triples, the fundamental building blocks of knowledge graphs. Our approach uses libraries that analyze the morphological characteristics of words, such as PyMorphy and Yandex Mystem, to construct triples. We also use the NLP library spaCy to analyze text and build triples from the semantic relationships it recognizes. In some cases no relationship could be extracted from the text, which led us to try word2vec for defining relationships; the results were unsatisfactory and could not be used as relationships. Pronouns posed a further obstacle to building triples from text. To address this, we explored coreference resolution libraries, but found no working libraries for the Russian language at the time of writing. Our results report both the positive and the negative outcomes of applying these methodologies and libraries, and give insight into the challenges and opportunities of building knowledge graphs automatically from Russian texts.
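As a rough illustration of the triple-extraction idea in the abstract, the following minimal sketch builds (subject, predicate, object) triples from an already parsed sentence. It is not the authors' implementation. The `Token` structure, the `extract_triples` function, and the hand-annotated example sentences are hypothetical; the labels (`nsubj`, `obj`, `iobj`, `VERB`, `PRON`) are assumed to follow the Universal Dependencies scheme, and the lemmas stand in for what a morphological analyzer would return. Pronoun subjects are skipped here because, without coreference resolution, they do not name an entity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    idx: int    # position of the token in the sentence
    lemma: str  # normal form, as a morphological analyzer would return it
    pos: str    # coarse part-of-speech tag (Universal Dependencies style)
    dep: str    # dependency label (Universal Dependencies style)
    head: int   # idx of the syntactic head; the root points to itself


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) triples for every verb that
    has at least one non-pronoun subject and at least one object."""
    triples = []
    for verb in tokens:
        if verb.pos != "VERB":
            continue
        children = [t for t in tokens if t.head == verb.idx and t.idx != verb.idx]
        # Unresolved pronouns are dropped: they carry no entity name.
        subjects = [t for t in children if t.dep == "nsubj" and t.pos != "PRON"]
        objects = [t for t in children if t.dep in ("obj", "iobj")]
        for subj in subjects:
            for obj in objects:
                triples.append((subj.lemma, verb.lemma, obj.lemma))
    return triples


# Hand-annotated parse of "Пушкин написал роман" ("Pushkin wrote a novel").
named_subject = [
    Token(0, "пушкин", "PROPN", "nsubj", 1),
    Token(1, "написать", "VERB", "ROOT", 1),
    Token(2, "роман", "NOUN", "obj", 1),
]

# Hand-annotated parse of "Он написал роман" ("He wrote a novel").
pronoun_subject = [
    Token(0, "он", "PRON", "nsubj", 1),
    Token(1, "написать", "VERB", "ROOT", 1),
    Token(2, "роман", "NOUN", "obj", 1),
]

print(extract_triples(named_subject))
print(extract_triples(pronoun_subject))
```

The first sentence yields one triple, while the second yields none, which mirrors the pronoun problem the abstract reports. In a real pipeline the tokens would come from a dependency parser rather than from hand annotation.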

Publisher

EDP Sciences

Link

https://www.e3s-conferences.org/10.1051/e3sconf/202453103017/pdf

References (17 articles)

1. Abián D., et al., Wikidata and DBpedia: a comparative study, Springer International Publishing, 142-154 (2018)

2. Pellissier T. T., et al., From Freebase to Wikidata: The great migration, Proceedings of the 25th International Conference on World Wide Web, 1419-1428 (2016)