Abstract
The advent of the Big Data era has necessitated a transformational shift in statistical research in response to the novel demands of data science. Although statistical communities have discussed extensively how to confront these emerging challenges, we offer our own perspectives, underscoring the extended responsibilities of statisticians in pre-analysis and post-analysis tasks. Moreover, we propose a new definition and classification of Big Data based on data sources: Type I Big Data, which results from aggregating a large number of small datasets via data sharing and curation, and Type II Big Data, which is the Real-World Data (RWD) amassed from business operations and practices. Each category necessitates distinct data preprocessing and preparation (DPP) methods, and the objectives of analysis, as well as the interpretation of results, can diverge significantly between the two types. We further suggest that the statistical communities should consider adopting and rapidly incorporating new paradigms and cultures by learning from other disciplines. In particular, beyond Breiman’s (Stat Sci 16(3):199–231, 2001) two modeling cultures, statisticians may need to pay more attention to a newly emerging third culture: the integration of algorithmic modeling with multi-scale dynamic modeling based on the fundamental physical laws or mechanisms that generate the data. We draw on our experience in numerous related research projects to elucidate these novel concepts and perspectives.
Publisher
Springer Science and Business Media LLC