Affiliation:
1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
Abstract
Machine learning (ML) technology is evolving rapidly, and the quality of ML systems is drawing increasing attention. Because an ML system is shaped by the dataset it learns from, its quality depends largely on the quality of that dataset. However, datasets are often collected through non-standardized processes, and few requirements or analysis methods exist to help identify the data that is needed. As a result, dataset quality cannot be guaranteed, which impairs the model's generalization ability and lowers training efficiency. To address these issues, this paper proposes an ontology-based requirements analysis method in which an ontology integrates domain knowledge into the data requirements analysis process, and coverage criteria defined on the ontology are used to specify data requirements that can later guide the construction of a high-quality dataset. We conducted an experiment on an image recognition system in the field of autonomous driving to validate our approach. The results show that the ML system trained on the dataset constructed through our data requirements analysis method performs better.
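The abstract does not spell out the coverage criteria, so the following is only a minimal sketch of one plausible reading: treat the ontology as a set of attributes with allowed values, and measure how much of it a dataset's annotations cover. The attribute names, the sample annotations, and the functions `value_coverage` and `combination_gaps` are all hypothetical illustrations, not the paper's definitions.

```python
from itertools import product

# Hypothetical mini-ontology for driving scenes (an assumption, not from the paper):
# each attribute maps to the values a dataset is expected to cover.
ontology = {
    "weather": ["sunny", "rainy", "foggy"],
    "time_of_day": ["day", "night"],
}

def value_coverage(ontology, samples):
    """Return the fraction of (attribute, value) pairs seen in at least one
    sample, plus the sorted list of pairs that no sample covers."""
    required = {(a, v) for a, vals in ontology.items() for v in vals}
    seen = {(a, s[a]) for s in samples for a in ontology if a in s}
    return len(required & seen) / len(required), sorted(required - seen)

def combination_gaps(ontology, samples):
    """Return value combinations across all attributes that no sample covers.
    Tuples follow the alphabetical order of attribute names."""
    attrs = sorted(ontology)
    required = set(product(*(ontology[a] for a in attrs)))
    seen = {tuple(s.get(a) for a in attrs) for s in samples}
    return sorted(required - seen)

# Hypothetical per-image annotations.
samples = [
    {"weather": "sunny", "time_of_day": "day"},
    {"weather": "rainy", "time_of_day": "day"},
    {"weather": "sunny", "time_of_day": "night"},
]

coverage, missing = value_coverage(ontology, samples)
gaps = combination_gaps(ontology, samples)
print(coverage, missing, gaps)
```

Under this reading, the uncovered values and combinations are the data requirements: they name the kinds of samples that would still need to be collected before the dataset meets the chosen criterion.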
Funder
school-level project of Shanghai University