Markup of scientific and technical texts in the aspect of developing of the corpus
Published: 2022-01
Issue: 1
Page: 14-20
ISSN: 2310-4287
Container-title: Philological Sciences. Scientific Essays of Higher Education
Short-container-title: Filol. nauki, Naučh. dokl. vysš. šk.
Author:
Butenko Iulia I., Lukyanova Galina O.
Abstract
The article examines the specifics of marking up scientific and technical texts when building a corpus of highly specialized texts. It lists the types of scientific and technical texts that serve as sources for the corpus and analyzes them with respect to the markup of textual elements at different levels. The need to introduce interlevel types of markup is substantiated, and the importance of structural markup in creating such a corpus is emphasized. The structural elements of scientific and technical texts to be included in the corpus are listed. The current state of automatic term extraction from scientific and technical texts is analyzed, and the marking of multicomponent terminological units is shown to be the greatest difficulty. We identify literary terms, which may include various letters, symbols, numbers, or their combinations, as objects that require the development of additional processing tools. References are analyzed as a factor influencing the classification and rubrication of scientific and technical texts. The need to study the types of references and the ways of marking them automatically in the corpus is substantiated, as is the need for separate markup of examples in scientific and technical texts.
Publisher
INOITs ALMAVEST Ltd.
Subject
General Arts and Humanities