Affiliation:
1. Department of Mathematics and Computer Science, University of Sciences and Technology-Mohammed Boudiaf USTO, Oran, Algeria
2. Department of Computer Science, Faculty of Exact and Applied Sciences, University of Oran 1, Ahmed Ben Bella, Oran, Algeria
Abstract
The volume of business data is growing rapidly, and most of it is relational. Extracting knowledge through data mining requires keeping all historical data, which increasingly complicates data processing and storage and demands computing power and capacity beyond what any single machine can provide. Distributed environments such as cloud computing therefore become very useful for sharing storage and processing across multiple nodes. Unfortunately, data based on the relational model cannot easily be used in the cloud because of the model's rigidity and limited elasticity in such environments. To address this issue, new big data systems such as NoSQL have appeared that make data easier to share and distribute in cloud environments. In theory, this benefits the data mining use case; in practice, it must be demonstrated by evaluating the performance of both multi-node NoSQL and single-node relational systems. In the cloud case, it is also of interest to know whether performance keeps increasing in proportion to the number of nodes, and whether there is an optimum number of nodes at which performance becomes nearly steady or starts to drop off. Motivated by these questions, we propose in this paper an approach for migrating relational data to an appropriate NoSQL system in a cloud environment and then evaluating performance to obtain results relevant to data mining. For the experiments, we use industrial data deployed in a data mining process of an oil and gas company. After migrating these data, we run experiments to compare and evaluate storage, processing, and execution time. Our objectives are to verify data elasticity, assess run-time performance, and try to find the optimum number of nodes.
Publisher
World Scientific Pub Co Pte Lt
Subject
Library and Information Sciences, Computer Networks and Communications, Computer Science Applications
Cited by
6 articles.