1. S. E. Sim, S. Easterbrook, and R. C. Holt, "Using benchmarking to advance research: A challenge to software engineering," in Proceedings of the 25th International Conference on Software Engineering. IEEE, 2003, pp. 74--83.
2. H. K. Wright, M. Kim, and D. E. Perry, "Validity concerns in software engineering research," in Proceedings of the FSE/SDP workshop on Future of software engineering research, 2010, pp. 411--414.
3. R. Just, D. Jalali, and M. D. Ernst, "Defects4J: A database of existing faults to enable controlled testing studies for Java programs," in Proceedings of the 2014 International Symposium on Software Testing and Analysis, 2014, pp. 437--440.
4. A. Jacovi, A. Caciularu, O. Goldman, and Y. Goldberg, "Stop uploading test data in plain text: Practical strategies for mitigating data contamination by evaluation benchmarks," arXiv preprint arXiv:2305.10160, 2023.
5. Q. Zhang, T. Zhang, J. Zhai, C. Fang, B. Yu, W. Sun, and Z. Chen, "A critical review of large language model on software engineering: An example from ChatGPT and automated program repair," 2023.