Abstract
In the Brazilian state of São Paulo, contaminated sites (CSs) threaten the health, environment and socioeconomic conditions of local populations. Over the past two decades, the Environmental Agency of São Paulo (CETESB) has monitored the known CSs. This paper presents the dataset produced by digitising the CETESB reports and making them publicly accessible in English. The dataset reports on qualitative aspects of contamination within the registered sites (e.g., contamination type and spread) and on their management status. The data was extracted from the CETESB reports using a machine-learning computer vision algorithm with two components: an optical character recognition (OCR) engine for text extraction and a convolutional neural network (CNN) image classifier to identify checked boxes. Digitisation was followed by harmonisation and quality assurance processes to ensure the consistency and validity of the data. Making this dataset accessible will enable future work on predictive analysis and decision-making, and will inform the policy-making needed to improve the management of CSs in Brazil.
Publisher
Springer Science and Business Media LLC