Building knowledge graphs from technical documents using named entity recognition and edge weight updating neural network with triplet loss for entity normalization

Published: 2023-11-30
Pages: 1-25
ISSN: 1088-467X
Journal: Intelligent Data Analysis (IDA)

Authors:

Jeon Sung Hwan¹, Lee Hye Jin¹, Park Jihye¹, Cho Sungzoon¹,²

Affiliations:

1. Department of Industrial Engineering, Seoul National University, Gwanak-ro, Gwanak-gu, Seoul, Korea

2. Institute for Industrial Systems Innovation, Seoul National University, Gwanak-ro, Gwanak-gu, Seoul, Korea

Abstract

Attempts to express information from various documents in graph form are rapidly increasing. The speed and volume at which these documents are generated call for an automated process, based on machine learning techniques, for cost-effective and timely analysis. Past studies responded to this need by building knowledge graphs or technology trees from the bibliographic information of documents, or by relying on text mining techniques to extract keywords and/or phrases. While these approaches provide an intuitive glance into the technological hotspots or key features of the selected field, there is still room for improvement, especially in recognizing the same entity when it appears in different forms, so that closely related technological concepts are properly interconnected. In this paper, we propose to build a patent knowledge network from United States Patent and Trademark Office (USPTO) patent filings in the semiconductor device sector by fine-tuning Hugging Face’s named entity recognition (NER) model with our novel edge weight updating neural network. For named entity normalization (NEN), we employ the edge weight updating neural network with positive and negative candidates chosen by substring matching techniques. Experimental results show that our proposed approach performs very competitively against the conventional keyword extraction models frequently employed in patent analysis, especially on the NEN and document retrieval tasks. Grouping entities with the NEN model yields a knowledge graph that achieves higher scores in retrieval tasks. We also show that our model is robust to the out-of-vocabulary problem by employing the fine-tuned BERT NER model.
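The abstract mentions choosing positive and negative candidates by substring matching, and the title names a triplet loss for entity normalization. The following is only a minimal sketch of those two generic ideas under stated assumptions, not the authors' edge weight updating network (which this record does not describe). Assumed here: an entity counts as a positive candidate when one lowercased string contains the other; embeddings are plain lists of floats; distance is Euclidean. The function names `substring_candidates` and `triplet_loss`, the example entity strings, and the default margin of 1.0 are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def substring_candidates(mention: str, entities: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split known entity strings into positive and negative candidates for `mention`.

    Hypothetical rule: an entity is a positive candidate if either lowercased
    string is a substring of the other; every other entity is a negative.
    """
    m = mention.lower().strip()
    positives: List[str] = []
    negatives: List[str] = []
    for entity in entities:
        e = entity.lower().strip()
        if e == m:
            continue  # skip the mention itself (case-insensitive)
        if m in e or e in m:
            positives.append(entity)
        else:
            negatives.append(entity)
    return positives, negatives


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Standard triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    # Hypothetical semiconductor-style entity strings, not data from the paper.
    entities = ["gate electrode", "electrode", "metal gate electrode", "source region", "drain region"]
    pos, neg = substring_candidates("gate electrode", entities)
    print(pos)  # ['electrode', 'metal gate electrode']
    print(neg)  # ['source region', 'drain region']

    # Positive already closer than negative by more than the margin: zero loss.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0
    # Positive farther than negative: loss = 5.0 - 1.0 + 1.0.
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))  # 5.0
```

Substring matching is cheap but noisy: here "electrode" becomes a positive for "gate electrode" even though the two may denote different concepts. In a setup like this, such candidates would serve only as a training signal for a learned similarity model, with the triplet loss pulling matched surface forms together in embedding space and pushing unmatched ones at least `margin` apart.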

Publisher

IOS Press

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Theoretical Computer Science
