Model of the Text of a Scientific and Technical Article for Markup in the Corpus of Scientific and Technical Texts

Published:2023-02-12 Issue:3 Volume:20 Page:5-13
ISSN:2410-0420
Container-title:Vestnik NSU. Series: Information Technologies
Short-container-title:jour

Author:

Butenko Yu. I.¹

Affiliation:

1. Bauman Moscow State Technical University

Abstract

The paper proposes a model of the text of a scientific and technical article for automating markup in a corpus of scientific and technical texts. It is proved that, when creating such a corpus, the structural features of the texts of scientific and technical articles must be taken into account. The necessity of adding structural markup to the corpus of scientific and technical texts is shown. It is noted that all texts in this class share the same narrative structure and contain a limited set of structural elements. The features of the compositional organization of the texts of scientific and technical articles are analyzed. The approximate content of each element of the article structure is described. The compositional structure of the texts of scientific and technical articles is presented in Backus-Naur notation. A model of the text of a scientific and technical article in the form of a graph, the vertices and edges of which are the full-fledged structural elements of a scientific and technical article, is proposed. It is proved that representing the text of a scientific and technical article as a graph makes it possible to determine the type of a structural element and its degree of nesting during computer analysis of the text, by presenting the article as a finite set of its constituent parts. It is proved that the presence of structural markup in the corpus of scientific and technical texts significantly expands its research potential and serves as the basis for tasks of automatic processing of scientific and technical texts.
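
The idea of reading an element's type and degree of nesting off a graph of the article can be illustrated with a minimal sketch. It is not the model from the paper: the abstract does not list the structural elements, so the element names used here (`article`, `title`, `body`, `section`, and so on), the `Element` class, and the `walk` function are all hypothetical, and the graph is simplified to a tree.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Element:
    """One structural element of an article (hypothetical representation)."""
    kind: str
    children: List["Element"] = field(default_factory=list)


def walk(node: Element, depth: int = 0) -> Iterator[Tuple[str, int]]:
    """Yield (element type, nesting depth) pairs in document order."""
    yield node.kind, depth
    for child in node.children:
        yield from walk(child, depth + 1)


# Assumed element inventory, chosen only for illustration.
article = Element("article", [
    Element("title"),
    Element("abstract"),
    Element("body", [
        Element("introduction"),
        Element("section", [Element("subsection")]),
        Element("conclusion"),
    ]),
    Element("references"),
])

# Print an indented outline: indentation reflects the degree of nesting.
for kind, depth in walk(article):
    print("  " * depth + kind)
```

Because the article is treated as a finite set of nested parts, a single traversal is enough to label every part with its type and depth, which is the information structural markup would record.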

Publisher

Novosibirsk State University (NSU)

Subject

Pharmacology (medical)
