Authors:
Shinghal Deepti, Shinghal Kshitij, Saxena Amit, Saxena Shuchita, Misra Rajul
Abstract
Certain applications require a scalable, cost-effective storage and execution system that can store data and later analyze it at its finest level of granularity, which improves the quality and accuracy of the resulting analysis. Wireless sensor network (WSN) nodes deployed for data-intensive applications such as surveillance and war zone monitoring generate massive amounts of raw data. This data must be stored in its native format so that it remains available for analytics as future requirements emerge. In the present work, a data lake implemented on Amazon AWS is presented for storing data in its original form for future reference. The data lake service is used to store data generated in large volume, at high speed and in wide variety. Data in the data lake is stored in three zones: raw, reformed and curated. This paper proposes an efficient method of storing structured, semi-structured and unstructured data in a data lake for future retrieval and analytics. The results are presented comprehensively, highlighting the advantages of using a data lake in place of a data warehouse.
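
As an illustration of the three-zone organization, the following minimal sketch builds zone-prefixed object keys of the kind commonly used with object storage. Only the zone names (raw, reformed, curated) come from this paper; the key layout, the function name `object_key`, the node identifier and the file name are hypothetical and are not taken from the authors' implementation. The sketch makes no AWS calls.

```python
from datetime import datetime, timezone

# Zone names come from the abstract; everything else here is an assumption.
ZONES = ("raw", "reformed", "curated")


def object_key(zone: str, node_id: str, captured_at: datetime, filename: str) -> str:
    """Build a zone-prefixed, date-partitioned object key (hypothetical layout)."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    # Normalize to UTC so keys from different nodes partition consistently.
    ts = captured_at.astimezone(timezone.utc)
    return (
        f"{zone}/node={node_id}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/{filename}"
    )


if __name__ == "__main__":
    # Hypothetical sensor node and capture time, for illustration only.
    when = datetime(2021, 3, 5, 14, 30, tzinfo=timezone.utc)
    print(object_key("raw", "wsn-017", when, "frame-0001.bin"))
```

Under this assumed layout, an object keeps the same node and date partition in every zone, so a curated record can be traced back to the raw object it was derived from by changing only the zone prefix.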