Affiliation:
1. Moscow State Linguistic University
Abstract
The paper aims to build a model of a linguistic corpus generated according to the rules of the spaCy natural language processing library. Its scientific novelty lies in applying the modelling method within humanities research, combining it with a corpus approach and taking the technological (software) component into account from the very stage of goal setting. The research proceeded in three steps. First, a general structural model of a linguistic corpus as a sequence of blocks was defined, and standard database queries were formulated. Second, a model of a corpus manager interface capable of executing these standard queries was built. Third, the proposed model was analysed with mini-programs that assess how technically feasible the queries are and how much practical value they have. At this stage, the linguistic material consisted of text arrays of fiction by German-speaking (F. Kafka, E. M. Remarque) and English-speaking (A. C. Doyle, G. Orwell) writers. The results show that the constructed model has a number of advantages and only a few disadvantages, is flexible with respect to further development, and can be implemented programmatically in the short term.
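To make the idea of "a corpus as a sequence of blocks" with "standard queries" more concrete, here is a minimal sketch under stated assumptions. It is not the authors' model. It assumes, hypothetically, that each block is one sentence of one source text and that each token carries spaCy-style annotations (`text`, `lemma_`, `pos_`, `dep_`). The names `TokenRecord`, `Block`, `find_lemma`, `kwic` and `pos_frequencies` are illustrative. The annotations below are written by hand so the sketch runs without spaCy or a downloaded model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterator, List, Tuple


# Hypothetical token record: field names mirror spaCy Token attributes,
# but the values in this sketch are filled in by hand.
@dataclass(frozen=True)
class TokenRecord:
    text: str
    lemma_: str
    pos_: str
    dep_: str


# Hypothetical block: one sentence taken from one source text.
@dataclass
class Block:
    source: str
    tokens: List[TokenRecord]


def find_lemma(corpus: List[Block], lemma: str) -> Iterator[Tuple[str, int, TokenRecord]]:
    """Query 1: every occurrence of a lemma, with source name and block index."""
    for block_idx, block in enumerate(corpus):
        for tok in block.tokens:
            if tok.lemma_ == lemma:
                yield block.source, block_idx, tok


def kwic(corpus: List[Block], lemma: str, width: int = 2) -> List[str]:
    """Query 2: keyword-in-context lines, `width` tokens on each side."""
    lines = []
    for block in corpus:
        words = [t.text for t in block.tokens]
        for i, tok in enumerate(block.tokens):
            if tok.lemma_ == lemma:
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                lines.append(f"{left} [{tok.text}] {right}".strip())
    return lines


def pos_frequencies(corpus: List[Block]) -> Counter:
    """Query 3: part-of-speech distribution over the whole corpus."""
    return Counter(tok.pos_ for block in corpus for tok in block.tokens)


# Tiny hand-annotated example corpus (illustrative, not the study's data).
corpus = [
    Block("text_a", [
        TokenRecord("The", "the", "DET", "det"),
        TokenRecord("dog", "dog", "NOUN", "nsubj"),
        TokenRecord("barks", "bark", "VERB", "ROOT"),
        TokenRecord(".", ".", "PUNCT", "punct"),
    ]),
    Block("text_b", [
        TokenRecord("Dogs", "dog", "NOUN", "nsubj"),
        TokenRecord("bark", "bark", "VERB", "ROOT"),
        TokenRecord("loudly", "loudly", "ADV", "advmod"),
        TokenRecord(".", ".", "PUNCT", "punct"),
    ]),
]

for line in kwic(corpus, "dog"):
    print(line)
```

With spaCy installed, the records could instead be filled from a loaded pipeline (for example `en_core_web_sm` or `de_core_news_sm`) by iterating over `doc.sents` and reading `token.text`, `token.lemma_`, `token.pos_` and `token.dep_`. Keeping the queries independent of spaCy objects is one possible design choice. It would let a corpus manager interface run the same queries over stored records.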
Subject
General Earth and Planetary Sciences, General Environmental Science
Cited by
1 article.