1. Dean, Jeffrey and Ghemawat, Sanjay: MapReduce: simplified data processing on large clusters. Communications of the ACM. 51(1), 107–113 (2008)
2. Apache Hadoop, https://hadoop.apache.org/
3. Zaharia, Matei and Chowdhury, Mosharaf and Das, Tathagata and Dave, Ankur and Ma, Justin and McCauley, Murphy and Franklin, Michael J and Shenker, Scott and Stoica, Ion: Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. Presented as part of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12). 15–28 (2012)
4. Zaharia, Matei and Xin, Reynold S and Wendell, Patrick and Das, Tathagata and Armbrust, Michael and Dave, Ankur and Meng, Xiangrui and Rosen, Josh and Venkataraman, Shivaram and Franklin, Michael J and others: Apache Spark: a unified engine for big data processing. Communications of the ACM. 59(11), 56–65 (2016)
5. Apache Mahout, http://mahout.apache.org/