Race and ethnicity data for first, middle, and surnames

Published: 2023-05-19; Volume: 10; Issue: 1
ISSN: 2052-4463
Journal: Scientific Data (Sci Data)
Language: English

Authors:

Rosenman, Evan T. R.; Olivella, Santiago; Imai, Kosuke

Abstract

We provide the largest compiled, publicly available dictionaries of first, middle, and surnames for the purpose of imputing race and ethnicity using, for example, Bayesian Improved Surname Geocoding (BISG). The dictionaries are based on the voter files of six U.S. Southern states that collect self-reported racial data upon voter registration. Our data cover the racial make-up of a larger set of names than any comparable dataset, containing 136 thousand first names, 125 thousand middle names, and 338 thousand surnames. Individuals are categorized into five mutually exclusive racial and ethnic groups: White, Black, Hispanic, Asian, and Other. Racial/ethnic probabilities are provided for every name in each dictionary. We provide probabilities of both forms, ℙ(race | name) and ℙ(name | race), together with conditions under which they can be assumed to be representative of a given target population. These conditional probabilities can then be used for imputation in data analyses for which self-reported racial and ethnic data are not available.
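The two kinds of conditional probability described in the abstract are linked by Bayes' rule, and that link is what BISG-style imputation relies on. The sketch below is illustrative only and rests on stated assumptions: it derives ℙ(race | name) and ℙ(name | race) from hypothetical name-by-race counts, then combines a surname prior ℙ(race | surname) with likelihoods such as ℙ(first name | race) under the assumption that the pieces of evidence are conditionally independent given race. The function names, toy counts, and probabilities are hypothetical; they are not the authors' code, values from the published dictionaries, or the API of the wru package.

```python
from typing import Dict, Mapping, Sequence, Tuple

# The five mutually exclusive categories used by the dictionaries.
RACES = ("white", "black", "hispanic", "asian", "other")

RaceDist = Dict[str, float]


def conditionals_from_counts(
    counts: Mapping[str, Mapping[str, int]],
) -> Tuple[Dict[str, RaceDist], Dict[str, RaceDist]]:
    """Turn name-by-race counts into P(race | name) and P(name | race).

    counts[name][race] is a hypothetical number of registrants with that
    name and self-reported race; missing races count as zero.
    """
    race_totals = {r: sum(c.get(r, 0) for c in counts.values()) for r in RACES}
    p_race_given_name: Dict[str, RaceDist] = {}
    p_name_given_race: Dict[str, RaceDist] = {}
    for name, c in counts.items():
        name_total = sum(c.get(r, 0) for r in RACES)
        if name_total == 0:
            continue  # no registrants: neither conditional is defined
        # Normalize across races within one name.
        p_race_given_name[name] = {r: c.get(r, 0) / name_total for r in RACES}
        # Normalize across names within one race (0.0 if the race is absent).
        p_name_given_race[name] = {
            r: c.get(r, 0) / race_totals[r] if race_totals[r] else 0.0
            for r in RACES
        }
    return p_race_given_name, p_name_given_race


def bisg_posterior(
    prior: Mapping[str, float],
    likelihoods: Sequence[Mapping[str, float]],
) -> RaceDist:
    """Posterior P(race | surname, other evidence) via Bayes' rule.

    prior is P(race | surname); each likelihood is P(evidence | race), e.g.
    P(first name | race) or P(geography | race). Assumes the evidence is
    conditionally independent given race (a BISG-style model).
    """
    scores = {}
    for r in RACES:
        s = prior[r]
        for lik in likelihoods:
            s *= lik[r]
        scores[r] = s
    total = sum(scores.values())
    if total == 0:
        raise ValueError("evidence has zero probability under every race")
    return {r: s / total for r, s in scores.items()}


# Hypothetical inputs, not values from the published dictionaries.
surname_prior = {"white": 0.05, "black": 0.02, "hispanic": 0.90, "asian": 0.01, "other": 0.02}
first_lik = {"white": 1e-4, "black": 1e-4, "hispanic": 2e-3, "asian": 1e-4, "other": 1e-4}
geo_lik = {"white": 1e-5, "black": 2e-5, "hispanic": 3e-5, "asian": 1e-5, "other": 1e-5}
posterior = bisg_posterior(surname_prior, [first_lik, geo_lik])
```

Using ℙ(name | race) rather than ℙ(race | name) for first and middle names is what allows them to enter as likelihoods that update a surname-based prior. Whether probabilities estimated from these voter files transfer to another population is the representativeness question the abstract raises.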

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences, Statistics, Probability and Uncertainty, Computer Science Applications, Education, Information Systems, Statistics and Probability

Link

https://www.nature.com/articles/s41597-023-02202-2.pdf

References (22 articles)

1. Elliott, M. N., Fremont, A., Morrison, P. A., Pantoja, P. & Lurie, N. A new method for estimating race/ethnicity and associated disparities where administrative records lack self-reported race/ethnicity. Health Services Research 43, 1722–1736 (2008).

2. Elliott, M. N. et al. Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities. Health Services and Outcomes Research Methodology 9, 69–83 (2009).

3. Fiscella, K. & Fremont, A. M. Use of geocoding and surname analysis to estimate race and ethnicity. Health Services Research 41, 1482–1500 (2006).

4. Imai, K. & Khanna, K. Improving ecological inference by predicting individual ethnicity from voter registration records. Political Analysis 24, 263–272 (2016).

5. McCartan, C., Goldin, J., Ho, D. E. & Imai, K. Estimating racial disparities when race is not observed. arXiv:2303.02580 (2023).

Cited by 4 articles

1. Crime-First Labels and Public Attitudes Toward Adolescent Girls in the Juvenile Legal System. Research on Social Work Practice (2024-08-23).

2. Pronoun usage and gender identity's effects on market outcomes: Evidence from a preregistered field experiment. Economics Letters (2024-03).

3. Comparing Methods for Estimating Demographics in Racially Polarized Voting Analyses. Sociological Methods & Research (2023-08-28).

4. wru: Who are You? Bayesian Prediction of Racial Category Using Surname, First Name, Middle Name, and Geolocation. CRAN: Contributed Packages (2015-12-09).