Abstract
Racial identification is a critical factor in understanding a multitude of important outcomes in many fields. However, inferring an individual’s race from ecological data is prone to bias and error. This process was only recently improved via Bayesian improved surname geocoding (BISG). With surname and geographic-based demographic data, it is possible to more accurately estimate individual racial identification than ever before. However, the level of geography used in this process varies widely. Whereas some existing work makes use of geocoding to place individuals in precise census blocks, a substantial portion either skips geocoding altogether or relies on estimation using surname or county-level analyses. Presently, the trade-offs of such variation are unknown. In this letter, we quantify those trade-offs through a validation of BISG on Georgia’s voter file using both geocoded and nongeocoded processes and introduce a new level of geography—ZIP codes—to this method. We find that when estimating the racial identification of White and Black voters, nongeocoded ZIP code-based estimates are acceptable alternatives. However, census blocks provide the most accurate estimations when imputing racial identification for Asian and Hispanic voters. Our results document the most efficient means to sequentially conduct BISG analysis to maximize racial identification estimation while simultaneously minimizing data missingness and bias.
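As a rough illustration of the Bayesian update that BISG relies on, the sketch below combines a surname-based prior with a geography-based likelihood and normalizes the result. It assumes the standard formulation P(race | surname, geography) ∝ P(race | surname) × P(geography | race). The function name `bisg_posterior`, the category labels, and all of the numbers are hypothetical and chosen only for illustration; they are not taken from the letter, the Georgia voter file, or any census table.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine a surname prior with a geographic likelihood via Bayes' rule.

    Both arguments are dicts keyed by the same racial categories.
    Returns a dict of posterior probabilities that sums to 1.
    """
    # Unnormalized posterior: prior times likelihood for each category.
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no racial group has nonzero probability")
    # Normalize so the probabilities sum to 1.
    return {race: value / total for race, value in unnormalized.items()}


# Made-up illustrative inputs (assumptions, not real census figures).
# P(race | surname): share of people with this surname in each group.
surname_prior = {"white": 0.60, "black": 0.30, "hispanic": 0.05, "asian": 0.05}
# P(geography | race): share of each group living in this geographic unit.
geo_likelihood = {"white": 0.001, "black": 0.004, "hispanic": 0.002, "asian": 0.001}

posterior = bisg_posterior(surname_prior, geo_likelihood)
for race, prob in posterior.items():
    print(f"{race}: {prob:.3f}")
```

The geographic unit behind `geo_likelihood` could be a census block, a ZIP code, or a county; under this sketch, the choice changes only which table supplies the likelihood, which is the trade-off the letter evaluates.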
Publisher
Cambridge University Press (CUP)
Subject
Political Science and International Relations, Sociology and Political Science
Cited by
5 articles.