Affiliation:
1. Information Management Systems Institute, R.C. Athena, Marousi, Greece
2. Department of Informatics & Telecommunications, University of Athens, Panepistimiopolis, Ilisia, Athens, Greece
Abstract
Data analytics has an ever-increasing impact on tackling various societal challenges. In this article, we investigate how data from several heterogeneous online sources can be used to discover insights and make predictions about the spatial distribution of crime in large urban environments. A series of important research questions is addressed, following a purely data-driven approach and methodology. First, we examine how useful different types of data are for the task of predicting crime levels, focusing especially on how prediction accuracy can be improved by combining data from multiple information sources. To that end, we not only investigate prediction accuracy across all individual areas studied, but also examine how these predictions affect the accuracy of identified crime hotspots. Then, we look into individual features, aiming to identify and quantify the most important factors. Finally, we drill down to different crime types, elaborating on how the prediction accuracy and the importance of individual features vary across them. Our analysis involves six different datasets, from which more than 3,000 features are extracted, filtered, and used to learn models for predicting crime rates across 14 different crime categories. Our results indicate that combining data from multiple information sources can significantly improve prediction accuracy. They also highlight which features affect prediction accuracy the most, as well as the crime categories for which the predictions are more accurate.
Publisher
Association for Computing Machinery (ACM)
Subject
Discrete Mathematics and Combinatorics, Geometry and Topology, Computer Science Applications, Modelling and Simulation, Information Systems, Signal Processing
Cited by
20 articles.