Context-aware Big Data Quality Assessment: A Scoping Review

Published:2023-08-22 Issue:3 Volume:15 Page:1-33
ISSN:1936-1955
Container-title:Journal of Data and Information Quality
language:en
Short-container-title:J. Data and Information Quality

Author:

Fadlallah Hadi¹, Kilany Rima¹, Dhayne Houssein¹, El Haddad Rami¹, Haque Rafiqul², Taher Yehia³, Jaber Ali⁴

Affiliation:

1. Saint-Joseph University, Lebanon

2. Intelligencia R & D, France

3. University of Versailles Saint-Quentin-en-Yvelines (UVSQ), France

4. Lebanese University, Lebanon

Abstract

The term data quality refers to measuring the fitness of data for its intended use. Poor data quality leads to inadequate, inconsistent, and erroneous decisions that could escalate the computational cost, cause a decline in profits, and cause customer churn. Thus, data quality is crucial for researchers and industry practitioners. Different factors drive the assessment of data quality. Data context is deemed one of the key factors due to the contextual diversity of real-world use cases of various entities such as people and organizations. Data used in a specific context (e.g., an organization policy) may not be sufficiently effective in another context. Hence, implementing a data quality assessment solution in different contexts is challenging. Traditional technologies for data quality assessment have reached the pinnacle of maturity. Existing solutions can solve most of the quality issues. The data context in these solutions is defined as validation rules applied within the ETL (extract, transform, load) process, i.e., the data warehousing process. In contrast to traditional data quality management, it is impossible to specify all the data semantics beforehand for big data. We need context-aware data quality rules to detect semantic errors in a massive amount of heterogeneous data generated at high speed. While many researchers tackle the quality issues of big data, they define the data context from a specific standpoint. Although data quality is a longstanding research issue in academia and industry, it remains an open issue, especially with the advent of big data, which has intensified the challenge of data quality assessment more than ever. This article provides a scoping review to study the existing context-aware data quality assessment solutions, starting with the existing big data quality solutions in general and then covering context-aware solutions. The strengths and weaknesses of such solutions are outlined and discussed.
The survey showed that none of the existing data quality assessment solutions could guarantee context awareness with the ability to handle big data. Notably, each solution dealt only with a partial view of the context. We compared the existing quality models and solutions to reach a comprehensive view covering the aspects of context awareness when assessing data quality. This led us to a set of recommendations framed in a methodological framework shaping the design and implementation of any context-aware data quality service for big data. Open challenges are then identified and discussed.

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems and Management, Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3603707

Reference: 143 articles.

1. Ziawasch Abedjan, Lukasz Golab, and Felix Naumann. 2017. Data profiling: A tutorial. In Proceedings of the 2017 ACM International Conference on Management of Data (2017), 1747–1751.

2. Data profiling; Abedjan Ziawasch; Synthes. Lect. Data Manag., 2018

3. Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Sören Auer, and Jens Lehmann. 2013. Crowdsourcing linked data quality assessment. In The Semantic Web–ISWC 2013: 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21–25, 2013, Proceedings, Part II 12. Springer, 260–276.

4. Divyakant Agrawal, Philip Bernstein, Elisa Bertino, Susan Davidson, Umeshwar Dayal, Michael Franklin, Johannes Gehrke, Laura Haas, Alon Halevy, Jiawei Han et al. 2011. Challenges and Opportunities with Big Data [White Paper]. Technical Report. Computing Research Association. Retrieved from http://cra.org/ccc/docs/init/bigdatawhitepaper.pdf.

5. Jameela Al-Jaroodi and Nader Mohamed. 2018. Service-oriented architecture for big data analytics in smart cities. In 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID’18). 633–640.

Cited by 8 articles.

1. cuallee: A Python package for data quality checks across multiple DataFrame APIs; Journal of Open Source Software; 2024-06-23

2. AI for science: Predicting infectious diseases; Journal of Safety Science and Resilience; 2024-06

3. Current Challenges of Big Data Quality Management in Big Data Governance: A Literature Review; Lecture Notes on Data Engineering and Communications Technologies; 2024

4. Addressing the Velocity Challenge of Big Data in Radiation Pollution Monitoring: Implementation and Demonstration; 2023 IEEE 4th International Multidisciplinary Conference on Engineering Technology (IMCET); 2023-12-12

5. CTXDQ: An Automated Context-Driven Data Quality Assessment; 2023 IEEE 4th International Multidisciplinary Conference on Engineering Technology (IMCET); 2023-12-12