Affiliation:
1. North China Institute of Science and Technology
2. Central University of Finance and Economics
Abstract
With advances in data collection and generation technologies, more and more environments produce data streams. In recent years, as network applications have become more widespread, applications have shifted from a single data stream toward multi-node distributed data streams, as in sensor networks, network monitoring, web log analysis, and credit card transaction data from multiple sites. These data are not only real-time, continuous, and large in scale, but also distributed. How to manage and analyze such large dynamic datasets is an important problem facing researchers. In view of this situation, this paper presents a formal description of homogeneous and heterogeneous distributed data streams, analyzes the advantages and disadvantages of the centralized and distributed stream processing architectures, discusses recent progress in distributed data stream classification algorithms, and summarizes the problems and challenges faced by distributed data stream mining, along with possible future research directions.
Publisher
Trans Tech Publications, Ltd.