Context-aware Big Data Quality Assessment: A Scoping Review

Author:

Fadlallah Hadi (1), Kilany Rima (1), Dhayne Houssein (1), El Haddad Rami (1), Haque Rafiqul (2), Taher Yehia (3), Jaber Ali (4)

Affiliation:

1. Saint-Joseph University, Lebanon

2. Intelligencia R & D, France

3. University of Versailles Saint-Quentin-en-Yvelines (UVSQ), France

4. Lebanese University, Lebanon

Abstract

The term data quality refers to measuring the fitness of data for its intended use. Poor data quality leads to inadequate, inconsistent, and erroneous decisions that can increase computational cost, reduce profits, and drive customer churn. Data quality is therefore crucial for both researchers and industry practitioners. Several factors drive the assessment of data quality. Data context is regarded as one of the key factors because real-world use cases vary widely across entities such as people and organizations. Data that is fit for use in one context (e.g., an organization's policy) may not be fit for use in another. Hence, implementing a data quality assessment solution across different contexts is challenging. Traditional technologies for data quality assessment have reached a high level of maturity, and existing solutions can solve most quality issues. In these solutions, the data context is defined as validation rules applied within the ETL (extract, transform, load) process, i.e., the data warehousing process. For big data, in contrast to traditional data quality management, it is impossible to specify all the data semantics beforehand. Context-aware data quality rules are needed to detect semantic errors in massive amounts of heterogeneous data generated at high speed. While many researchers tackle the quality issues of big data, each defines the data context from a specific standpoint. Although data quality is a longstanding research topic in academia and industry, it remains an open issue, especially with the advent of big data, which has made data quality assessment more challenging than ever. This article provides a scoping review of existing context-aware data quality assessment solutions, starting with big data quality solutions in general and then covering context-aware solutions. The strengths and weaknesses of these solutions are outlined and discussed.
The survey showed that none of the existing data quality assessment solutions could guarantee context awareness with the ability to handle big data. Notably, each solution dealt only with a partial view of the context. We compared the existing quality models and solutions to reach a comprehensive view covering the aspects of context awareness when assessing data quality. This led us to a set of recommendations framed in a methodological framework shaping the design and implementation of any context-aware data quality service for big data. Open challenges are then identified and discussed.

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems and Management, Information Systems

Cited by 8 articles.

1. cuallee: A Python package for data quality checks across multiple DataFrame APIs;Journal of Open Source Software;2024-06-23

2. AI for science: Predicting infectious diseases;Journal of Safety Science and Resilience;2024-06

3. Current Challenges of Big Data Quality Management in Big Data Governance: A Literature Review;Lecture Notes on Data Engineering and Communications Technologies;2024

4. Addressing the Velocity Challenge of Big Data in Radiation Pollution Monitoring: Implementation and Demonstration;2023 IEEE 4th International Multidisciplinary Conference on Engineering Technology (IMCET);2023-12-12

5. CTXDQ: An Automated Context-Driven Data Quality Assessment;2023 IEEE 4th International Multidisciplinary Conference on Engineering Technology (IMCET);2023-12-12
