The problem of analysis of big web data and the use of data mining technology for processing and searching patterns in big web data on a practical example

Author:

Mulyukova K. V.1, Kureichik V. M.1

Affiliation:

1. Engineering-Technological Academy of SFU

Abstract

The purpose of the work is to study current problems in, and prospects for, processing big data received from or stored on the Internet (web data), and to assess whether Data Mining technology can be applied to big web data in practice, using a worked example.

Materials and methods. The study included a review of the literature on big data analysis problems. Data Mining technology was used to analyze big web data, together with computer modeling of a practical problem in the C# programming language and a DDL script defining a database structure for accumulating web data.

Results. The specifics of big data were described, its main characteristics were highlighted, and modern approaches to processing it were analyzed. A brief description of a horizontally scalable architecture and of a BI-solution architecture for big data processing is given. The problems of processing big web data are formulated: limited data access speed and access over network protocols through general-purpose networks. An example demonstrating the approach to processing big web data was also implemented. Based on the concept of big data, the described difficulties of web data processing, and Data Mining methods, techniques were proposed for solving the practical problem of processing a large data array and searching it for patterns. The following classes were developed in C#: a class for receiving web data via the Internet, a data conversion class, and an intelligent data processing class. A DDL script was written that creates a structure for accumulating web data, and a single UML class diagram was developed. The resulting system of data and classes addresses most of the problems of processing big web data and performs intelligent processing using Data Mining technology to solve the stated problem of identifying particular records in a large array. Combining an object-oriented approach, neural networks, and BI analysis for data filtering will speed up data processing and obtaining the study result.

Conclusion. The results of the study suggest that the current state of technology for analyzing big web data makes it possible to process data objects efficiently, identify patterns, extract hidden data, and obtain complete statistics. The results can be used both for an initial study of big data processing technologies and as a basis for developing a real application for analyzing web data. The use of neural networks and the universal handler classes makes the architecture flexible and self-learning, and the class declarations and the base DDL structure will greatly simplify development of the program code.
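
As an illustration of the three-class structure and the accumulation table described in the abstract, the following is a minimal sketch only. It is written in Python rather than the C# used in the study, the network call is replaced by canned data, a simple keyword rule stands in for the neural-network processing, and every class name, the table schema, and the URL are hypothetical rather than taken from the paper.

```python
import json
import sqlite3
from dataclasses import dataclass

# Hypothetical DDL: the abstract mentions a DDL script but does not give its schema.
DDL = """
CREATE TABLE IF NOT EXISTS web_record (
    id      INTEGER PRIMARY KEY,
    source  TEXT NOT NULL,
    payload TEXT NOT NULL,
    flagged INTEGER NOT NULL DEFAULT 0
);
"""


@dataclass
class Record:
    source: str
    payload: str


class WebDataReceiver:
    """Stand-in for the class that receives web data (assumed name).

    Returns canned JSON instead of performing a real network request.
    """

    def fetch(self, url: str) -> str:
        return json.dumps([{"text": "order shipped"}, {"text": "payment error 500"}])


class DataConverter:
    """Converts raw JSON text into Record objects (assumed name)."""

    def convert(self, url: str, raw: str) -> list:
        return [Record(url, item["text"]) for item in json.loads(raw)]


class PatternProcessor:
    """Placeholder for the intelligent processing class.

    Uses a keyword rule, not the neural network mentioned in the abstract.
    """

    def __init__(self, keywords):
        self.keywords = tuple(k.lower() for k in keywords)

    def matches(self, record: Record) -> bool:
        text = record.payload.lower()
        return any(k in text for k in self.keywords)


def run_pipeline(url: str, conn: sqlite3.Connection) -> list:
    """Receive, convert, store, and flag records; return the flagged payloads."""
    conn.executescript(DDL)
    receiver, converter = WebDataReceiver(), DataConverter()
    processor = PatternProcessor(["error"])
    records = converter.convert(url, receiver.fetch(url))
    conn.executemany(
        "INSERT INTO web_record (source, payload, flagged) VALUES (?, ?, ?)",
        [(r.source, r.payload, int(processor.matches(r))) for r in records],
    )
    conn.commit()
    rows = conn.execute("SELECT payload FROM web_record WHERE flagged = 1").fetchall()
    return [row[0] for row in rows]
```

Calling `run_pipeline("https://example.com/feed", sqlite3.connect(":memory:"))` stores both canned records and returns only those matching the keyword rule. Keeping receiving, conversion, and processing in separate classes mirrors the division of responsibilities the abstract describes, so any one stage could be swapped (for example, replacing the keyword rule with a trained model) without touching the others.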

Publisher

Plekhanov Russian University of Economics (PRUE)

Subject

General Earth and Planetary Sciences, General Environmental Science

Cited by 4 articles.