Accuracy Analysis of the End-to-End Extraction of Related Named Entities from Russian Drug Review Texts by Modern Approaches Validated on English Biomedical Corpora

Published: 2023-01-09 Issue: 2 Volume: 11 Page: 354
ISSN: 2227-7390
Container-title: Mathematics
Language: en
Short-container-title: Mathematics

Author:

Sboev Alexander, Rybka Roman, Selivanov Anton, Moloshnikov Ivan, Gryaznov Artem, Naumov Alexander, Sboeva Sanna, Rylkov Gleb, Zakirova Soyora

Abstract

Extracting significant information from Internet sources is an important pharmacovigilance task owing to the need for post-clinical drug monitoring. This research considers the end-to-end recognition of pharmaceutically significant named entities and their relations in natural-language texts. Here, “end-to-end” means that both tasks are performed within a single process on the “raw” text, without prior annotation. The study is based on the current version of the Russian Drug Review Corpus, a dataset of 3800 review texts from the Russian segment of the Internet. Currently, this is the only Russian-language corpus suitable for research of this type. We estimated the accuracy of recognizing pharmaceutically significant entities and their relations with two approaches based on neural-network language models. The first approach solves named-entity recognition and relation extraction one after the other (the sequential approach). The second solves both tasks simultaneously with a single neural network (the joint approach). The study compares the two approaches and selects hyperparameters to maximize the resulting accuracy. Both approaches are shown to solve the target task at the same level of accuracy: a 52–53% macro-averaged F1-score, which is the current level of accuracy for “end-to-end” tasks in the Russian language. Additionally, the paper presents results of the joint approach on the open English datasets ADE and DDI, with hyperparameter selection for modern domain-specific language models. The achieved accuracies of 84.2% (ADE) and 73.3% (DDI) are comparable to or better than other published results for these datasets.
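As an illustration of the metric named in the abstract, the following is a minimal sketch of how a macro-averaged F1-score over relation types might be computed for end-to-end extraction. It assumes exact matching of both entity spans and the relation label; the record does not state the paper's actual matching criteria. The function name `macro_f1`, the label names, and the character offsets are hypothetical and chosen only for the example.

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 over relation labels.

    gold, pred: sets of (head, tail, label) triples, where head and tail
    are (start, end, entity_type) tuples. A predicted triple counts as
    correct only if it matches a gold triple exactly (an assumption of
    this sketch, not necessarily the paper's criterion).
    """
    labels = sorted({t[2] for t in gold} | {t[2] for t in pred})
    per_label = {}
    for label in labels:
        g = {t for t in gold if t[2] == label}
        p = {t for t in pred if t[2] == label}
        tp = len(g & p)  # exact-match true positives for this label
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        per_label[label] = 2 * precision * recall / denom if denom else 0.0
    # Macro average: every label contributes equally, whatever its frequency.
    macro = sum(per_label.values()) / len(per_label) if per_label else 0.0
    return macro, per_label


# Hypothetical gold and predicted relations for one review text.
gold = {
    ((0, 7, "Drugname"), (20, 28, "ADR"), "ADR_Drugname"),
    ((0, 7, "Drugname"), (40, 48, "Diseasename"), "Drugname_Diseasename"),
    ((50, 57, "Drugname"), (60, 68, "ADR"), "ADR_Drugname"),
}
pred = {
    ((0, 7, "Drugname"), (20, 28, "ADR"), "ADR_Drugname"),
    # Entity boundary is off by one character, so this triple is not a match.
    ((0, 7, "Drugname"), (40, 47, "Diseasename"), "Drugname_Diseasename"),
}

macro, per_label = macro_f1(gold, pred)
print(f"macro F1 = {macro:.3f}")
for label, score in per_label.items():
    print(f"{label}: {score:.3f}")
```

In an end-to-end setting, an error in entity recognition (such as the shifted boundary above) also invalidates every relation that uses that entity, which is one reason end-to-end scores tend to be lower than scores measured on gold entity annotations.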

Funder

Russian Science Foundation

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/11/2/354/pdf

References: 86 articles.

Cited by 3 articles.

1. SCREENER: Streamlined collaborative learning of NER and RE model for discovering gene-disease relations;PLOS ONE;2023-11-27

2. Prediction Of User Ratings For Drug Side Effects Using Deep Neural Network With Contextual Co-occurrence Based Word-Embedding Vector;2023 13th International Conference on Dependable Systems, Services and Technologies (DESSERT);2023-10-13

3. A Concise Relation Extraction Method Based on the Fusion of Sequential and Structural Features Using ERNIE;Mathematics;2023-03-16