Research of the methods of creating content aggregation systems

Published: 2022-01 Issue: 1 Page: 9-31
ISSN:2454-0714
Container-title:Программные системы и вычислительные методы
language:en

Author:

Kiryanov Denis Aleksandrovich

Abstract

The subject of this research is the key methods for creating the architecture of information aggregators, methods for increasing the scalability and effectiveness of such systems, and methods for reducing the delay between the publication of new content by a source and the appearance of its copy in the information aggregator. In this research, a content aggregator is understood as a distributed high-load information system that automatically collects information from various sources, processes it, and displays it on a dedicated website or in a mobile application. Particular attention is given to the basic principles of content aggregation: key stages of aggregation and criteria for data sampling, automation of aggregation processes, content copy strategies, and content aggregation approaches. The author's contribution consists in providing a detailed description of web crawling and fuzzy duplicate detection systems. The main research result is the development of a high-level architecture for the content aggregation system. Recommendations are given on the selection of architectural styles and of specialized software, such as distributed database management systems and message brokers. The presented architecture aims to provide high availability, scalability for high query volumes, and big data performance. To increase the performance of the proposed system, various caching methods, load balancers, and message queues should be actively used. For the storage layer of the content aggregation system, replication and partitioning must be used to improve availability, latency, and scalability. In terms of architectural styles, microservice architecture, event-driven architecture, and service-based architecture are the most preferred architectural approaches for such a system.
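
The abstract mentions fuzzy duplicate detection without saying which algorithm the paper uses. As an illustration only, one common approach is to compare the sets of word shingles of two texts with the Jaccard coefficient. The sketch below assumes that approach; the function names, the shingle size `k`, and the `threshold` value are hypothetical choices and are not taken from the paper.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of a lowercased text (punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_fuzzy_duplicate(doc_a: str, doc_b: str,
                       threshold: float = 0.5, k: int = 3) -> bool:
    """Treat two texts as fuzzy duplicates if their shingle sets overlap enough.

    The threshold is an assumed value for illustration, not one from the paper.
    """
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold
```

A pairwise comparison like this does not scale to a large corpus on its own; a production aggregator would typically pair it with a sketching scheme such as MinHash or SimHash, which is outside the scope of this sketch.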

Publisher

Aurora Group, s.r.o

References: 95 articles.

1. August 2021 Web Server Survey // Netcraft News [Website]. 2021. URL: https://news.netcraft.com/archives/2021/08/25/august-2021-web-server-survey.html (last accessed: 18.01.2022).

2. Maurice de Kunder. The size of the World Wide Web (The Internet) // WorldWideWebSize.com. Daily Estimated Size of The World Wide Web [Website]. 2021. URL: https://www.worldwidewebsize.com (last accessed: 18.01.2022).

3. World Internet Users and 2021 Population Stats // Internet World Stats [Website]. 2021. URL: https://www.internetworldstats.com/stats.htm (last accessed: 18.01.2022).

4. G. Paliouras, A. Mouzakidis, C. Skourlas, M. Virvou, C. L. Jain. PNS: A Personalized News Aggregator on the Web // Intelligent Interactive Systems in Knowledge-Based Environments. 2008. URL: https://doi.org/10.1007/978-3-540-77471-6_10 (last accessed: 18.01.2022).

5. David Reinsel, John Gantz, John Rydning. The Digitization of the World – From Edge to Core // An IDC White Paper. 2018. 28p. URL: https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf (last accessed: 18.01.2022).

Cited by 1 article.

1. A Scalable Aggregation System Designed to Process 50,000 RSS Feeds; Программные системы и вычислительные методы; 2022-04