A Scalable Aggregation System Designed to Process 50,000 RSS Feeds

Published: 2022-04 Issue: 4 Pages: 20-38
ISSN:2454-0714
Container-title:Программные системы и вычислительные методы
language:en

Author:

Kiryanov Denis Aleksandrovich

Abstract

The subject of this study is the architecture of an RSS feed aggregation system. The author examines in detail the choice of a data aggregation strategy, an approach to scaling a distributed system, and the design and implementation of the system's main modules: the aggregation strategy definition module, the content aggregation module, the data processing module, and the search module. Particular attention is given to the libraries, frameworks, and databases chosen for the implementation. The main part of the system is implemented in C# (.NET Core) and is cross-platform. The study describes the interaction with the system's main data stores, PostgreSQL and Elasticsearch. The main conclusion is that the publication activity of the data sources should be analyzed before an aggregation system is developed; this analysis makes it possible to form an acceptable strategy for updating the search index and to save a significant amount of computing resources. Content aggregation systems such as the one considered here should be distributed and built on event-driven and microservice architectures. This approach makes the system resistant to high loads and failures, as well as easy to extend. The author's particular contribution is a detailed description of the high-level architecture of an RSS aggregator designed to process 50,000 channels.

Publisher

Aurora Group, s.r.o

Subject

Media Technology

References: 66 articles.

1. IT v Rossii [Electronic resource]. URL: https://devsday.ru/ (accessed: 07.11.2022).

2. Kir'yanov D. A. Issledovanie metodov postroeniya sistem agregatsii kontenta // Programmnye sistemy i vychislitel'nye metody. 2022. No. 1. URL: https://doi.org/10.7256/2454-0714.2022.1.37341 (accessed: 07.11.2022).

3. PostgreSQL: Documentation. Chapter 12. Full Text Search [Electronic resource]. URL: https://www.postgresql.org/docs/current/textsearch-intro.html (accessed: 07.11.2022).

4. Elasticsearch: The Official Distributed Search & Analytics Engine [Electronic resource]. URL: https://www.elastic.co/elasticsearch/ (accessed: 07.11.2022).

5. Christopher Olston, Marc Najork. Web Crawling // Foundations and Trends. 2010. No. 3. URL: http://dx.doi.org/10.1561/1500000017 (accessed: 07.11.2022).