1. Sara Alspaugh, Nava Zokaei, Andrea Liu, Cindy Jin, and Marti A. Hearst. 2018. Futzing and moseying: Interviews with professional data analysts on exploration practices. IEEE Transactions on Visualization and Computer Graphics 25, 1 (2018), 22--31.
2. Sumon Biswas, Mohammad Wardat, and Hridesh Rajan. 2022. The art and practice of data science pipelines: A comprehensive study of data science pipelines in theory, in-the-small, and in-the-large. In Proceedings of the 44th International Conference on Software Engineering. 2091--2103.
3. A reflexive exploration of two qualitative data coding techniques
4. Chengliang Chai et al. 2022. Data management for machine learning: A survey. IEEE Transactions on Knowledge and Data Engineering (2022).
5. Nadiia Chepurko, Ryan Marcus, Emanuel Zgraggen, Raul Castro Fernandez, Tim Kraska, and David Karger. 2020. ARDA: automatic relational data augmentation for machine learning. PVLDB (2020), 1373--1387.