Affiliation:
1. Corning Incorporated, New York, USA
Abstract
This chapter analyzes the benefits of using existing data formats for industrial IoT (IIoT) and factory of the future (FoF) applications. For factory floor automation, it provides an in-depth performance evaluation, in terms of storage memory footprint, along with the usage advantages and disadvantages of various traditional and state-of-the-art data formats, including YAML, Feather, JSON, XML, Parquet, CSV, TXT, and Msgpack. It also discusses the benefits and drawbacks of using these data formats for cloud-based FoF applications, including setting up robust Delta Lakes with highly reactive bronze, silver, and gold data tables. Based on an extensive literature survey, this chapter provides the most comprehensive data storage performance evaluation of different data formats for IIoT and FoF applications. The companion chapter, Part II, provides an extensive set of Python libraries and examples that are useful for converting data from one format to another.
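As a rough illustration of what a storage footprint comparison involves, the following minimal sketch serializes the same records to three of the text formats named above (JSON, CSV, XML) and reports the byte size of each. It is not the chapter's evaluation method: the sensor records, field names, and helper functions are hypothetical, and only the Python standard library is used. The binary formats (Parquet, Feather, Msgpack) and YAML would need third-party packages and are left out here.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical factory-floor sensor readings (illustrative data only).
records = [
    {"sensor_id": f"S{i:03d}", "temperature_c": 20.0 + i * 0.5, "status": "ok"}
    for i in range(100)
]


def to_json_bytes(rows):
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(rows).encode("utf-8")


def to_csv_bytes(rows):
    """Serialize the rows as CSV with a single header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def to_xml_bytes(rows):
    """Serialize the rows as XML, one <record> element per row."""
    root = ET.Element("records")
    for row in rows:
        item = ET.SubElement(root, "record")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


# Byte size of the same data in each format.
sizes = {
    "JSON": len(to_json_bytes(records)),
    "CSV": len(to_csv_bytes(records)),
    "XML": len(to_xml_bytes(records)),
}

# Print the formats from smallest to largest footprint.
for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
    print(f"{name}: {size} bytes")
```

For flat, tabular records like these, CSV states the field names once in the header, whereas JSON and XML repeat them for every record, so the relative sizes depend heavily on the shape of the data. A realistic evaluation would also measure on-disk size with real factory data and include the binary formats.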