Authors
Nguyen T. V., Duong Q. H. T., Kravets A. G.
Abstract
The widespread use of information and communication technologies, database technologies, and the Internet has led to the development of specialized digital libraries. These digital libraries serve a very large number of different users and play an important role as repositories and providers of information and knowledge. The automatic extraction of useful information from texts stored in digital libraries is therefore becoming an increasingly important research topic in data mining. This article presents a statistical analysis of texts in the digital library arXiv.org that identifies the most common terms, bigrams, and trigrams. After hyperparameter optimization of the neural network models, it presents predictions of trends in term usage in the field of computer science. By analyzing statistics and predicting the usage frequency of bigram and trigram terms, our findings provide evidence that papers concerned with machine learning, reinforcement learning, generative adversarial networks, convolutional neural networks, and recurrent neural networks can be seen as the main future research trends in computer science over the next 3 years. Moreover, topics related to will experience a sudden increase in usage frequency. Being able to predict scientific trends in advance could change the way science is done, for instance by enabling funding agencies to optimize the allocation of resources towards promising research areas.
Publisher
Izdatel'skii dom Spektr, LLC