Affiliation:
1. University of Potsdam, Potsdam, Germany
Abstract
Raw data are often messy: they follow different encodings, records are not well structured, values do not adhere to patterns, etc. Such data are in general not fit to be ingested by downstream applications, such as data analytics tools, or even by data management systems. The act of obtaining information from raw data relies on some data preparation process. Data preparation is integral to advanced data analysis and data management, not only for data science but for any data-driven application. Existing data preparation tools are operational and useful, but there is still room for improvement and optimization. With increasing data volume and its messy nature, the demand for prepared data increases day by day.
To cater to this demand, companies and researchers are developing techniques and tools for data preparation. To better understand the available data preparation systems, we have conducted a survey to investigate (1) prominent data preparation tools, (2) distinctive tool features, (3) the need for preliminary data processing even for these tools, and (4) features and abilities that are still lacking. We conclude with an argument in support of automatic and intelligent data preparation beyond traditional and simplistic techniques.
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems, Software
Cited by
27 articles.