1. Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters, vol 51. pp 137–150. https://doi.org/10.1145/1327452.1327492
2. Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). https://doi.org/10.1109/MSST.2010.5496972
3. Zaharia M, Chowdhury NMM, Franklin M, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. Technical Report UCB/EECS-2010-53, EECS Department, University of California, Berkeley. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-53.html
4. Shoro AG, Soomro TR (2015) Big data analysis: Apache Spark perspective. Glob J Comput Sci Technol
5. Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI, pp 15–28