1. Mostafaeipour A, Rafsanjani AJ, Ahmadi M, Dhanraj JA (2020) Investigating the performance of Hadoop and Spark platforms on machine learning algorithms. J Supercomput 77(2):1–28
2. Apache. Apache Spark. https://spark.apache.org. Accessed 24 Oct 2021
3. Karau H, Konwinski A, Wendell P, Zaharia M (2015) Learning Spark. O’Reilly Media Inc., Sebastopol, pp 1–30
4. Zhang XW, Li ZH, Liu GS, Xu JJ, Xie TK (2018) A Spark scheduling strategy for heterogeneous cluster. Comput Mater Continua 55(3):405–417
5. Ahmed N, Barczak A, Susnjak T et al (2020) A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench. J Big Data 7(110):1–18