Affiliations
1. University of Massachusetts, Boston
2. University of Rhode Island
Abstract
Data quality remains a persistent problem in practice and a challenge for research. In this study we focus on the four dimensions of data quality noted as the most important to information consumers, namely accuracy, completeness, consistency, and timeliness. These dimensions are of particular concern for operational systems, and most importantly for data warehouses, which are often used as the primary data source for analyses such as classification, a general type of data mining. However, the definitions and conceptual models of these dimensions have not been collectively considered with respect to data mining in general or classification in particular. Nor have they been considered for problem complexity. Conversely, these four dimensions of data quality have only been indirectly addressed by data mining research. Using definitions and constructs of data quality dimensions, our research evaluates the effects of both data quality and problem complexity on generated data and tests the results in a real-world case. Six different classification outcomes selected from the spectrum of classification algorithms show that data quality and problem complexity have significant main and interaction effects. From the findings of significant effects, the economics of higher data quality are evaluated for a frequent application of classification and illustrated by the real-world case.
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems and Management, Information Systems
Cited by
77 articles.
1. The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review;npj Digital Medicine;2024-08-03
2. AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI;Proceedings of the 36th International Conference on Scientific and Statistical Database Management;2024-07-10
3. Towards an End-to-End Data Quality Optimizer;2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW);2024-05-13
4. Unified Data Framework for Enhanced Data Management, Consumption, Provisioning, Processing and Movement;Proceedings of the 7th International Conference on Networking, Intelligent Systems and Security;2024-04-18
5. Software Engineering Approach for Designing Apparel Business Data Analytics;Advances in Business Information Systems and Analytics;2024-02-23