1. Agarwal S, Kandula S, Bruno N, Wu M-C, Stoica I, Zhou J (2012) Reoptimizing data parallel computing. In: 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), pp 281–294
2. Al-Sayeh H, Hagedorn S, Sattler K (2020) A gray-box modeling methodology for runtime prediction of apache spark jobs. Distribut Parall Databases 38(4):819–839. https://doi.org/10.1007/s10619-020-07286-y
3. Al-Sayeh H, Memishi B, Jibril MA, Paradies M, Sattler K (2022) Juggler: Autonomous cost optimization and performance prediction of big data applications. In: SIGMOD ’22: International Conference on Management of Data, Philadelphia, PA, USA, 12–17 June 2022, pp 1840–1854. https://doi.org/10.1145/3514221.3517892
4. Al-Sayeh H, Memishi B, Paradies M, Sattler K-U (2020) Masha: sampling-based performance prediction of big data applications in resource-constrained clusters. In: The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA). Very Large Data Base Endowment Inc.(VLDB Endowment)
5. Carbone P, Katsifodimos A, Ewen S, Markl V, Haridi S, Tzoumas K (2015) Apache flink™: stream and batch processing in a single engine. IEEE Data Eng Bull 38(4):28–38