Affiliation:
1. Statistical Research and Training Center, Tehran, Iran
2. Statistical Center of Iran, Tehran, Iran
Abstract
The advent of technologies such as the Internet and the World Wide Web has made data pervasive in nearly every human and non-human activity. The result has been an exponential increase in data generation, a data explosion commonly referred to as Big Data. Big Data sources can help reduce the response burden, or they can be used simply to study economic or social phenomena before designing a statistical survey, which is inherently expensive to pilot. Incorporating Big Data sources into official statistics also helps official statistics products remain competitive and relevant compared with those offered by the many commercial players, notably large corporations active in information technology. In this paper, web scraping was used to extract the daily prices of food and drink products in order to replace the conventionally collected prices that had been used for price indices. Moreover, such datasets make it possible to calculate the indices at finer time scales, such as weekly or daily, whereas the conventional approach allows only monthly calculation. Although web scraping has its own problems, it is more economical, more accurate, and less time-consuming, especially in urban areas. The findings show that web scraping can serve as an effective alternative to conventional methods for the consumer price index (CPI). The technique can also be used for other price statistics.
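To make the idea of a daily index concrete, the following is a minimal sketch of how scraped prices could be turned into a daily elementary price index. It assumes a Jevons-type index (the geometric mean of price relatives against a base day); the abstract does not state which index formula the authors used, and the product names, dates, prices, and function names below are all hypothetical.

```python
from collections import defaultdict
from math import exp, log


def jevons_index(base_prices, current_prices):
    """Jevons index (base = 100): geometric mean of price relatives.

    Only products observed in both periods are used (matched-model approach).
    """
    matched = [p for p in base_prices if p in current_prices]
    if not matched:
        raise ValueError("no products observed in both periods")
    mean_log_relative = sum(
        log(current_prices[p] / base_prices[p]) for p in matched
    ) / len(matched)
    return 100.0 * exp(mean_log_relative)


def daily_indices(observations, base_date):
    """Group (date, product, price) records by day and index each day to base_date."""
    prices_by_day = defaultdict(dict)
    for day, product, price in observations:
        prices_by_day[day][product] = price
    base = prices_by_day[base_date]
    return {
        day: jevons_index(base, prices)
        for day, prices in sorted(prices_by_day.items())
    }


# Hypothetical scraped records: (date, product, price). Not real data.
observations = [
    ("2020-01-01", "rice_1kg", 100.0),
    ("2020-01-01", "milk_1l", 50.0),
    ("2020-01-02", "rice_1kg", 110.0),
    ("2020-01-02", "milk_1l", 50.0),
    ("2020-01-03", "rice_1kg", 121.0),
    ("2020-01-03", "milk_1l", 50.0),
]

indices = daily_indices(observations, base_date="2020-01-01")
for day, value in indices.items():
    print(day, round(value, 2))
```

Because each scraped record carries its own date, the same grouping step could be applied per week or per month instead of per day; only the grouping key would change.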
Subject
Statistics, Probability and Uncertainty; Economics and Econometrics; Management Information Systems