Using a Probabilistic Model to Assist Merging of Large-Scale Administrative Records

Published: 2019-01-02; Volume: 113; Issue: 2; Pages: 353-371
ISSN: 0003-0554
Container title: American Political Science Review
Language: en
Short container title: Am Polit Sci Rev

Authors:

ENAMORADO TED, FIFIELD BENJAMIN, IMAI KOSUKE

Abstract

Since most social science research relies on multiple data sources, merging data sets is an essential part of researchers’ workflow. Unfortunately, a unique identifier that unambiguously links records is often unavailable, and data may contain missing and inaccurate information. These problems are severe especially when merging large-scale administrative records. We develop a fast and scalable algorithm to implement a canonical model of probabilistic record linkage that has many advantages over deterministic methods frequently used by social scientists. The proposed methodology efficiently handles millions of observations while accounting for missing data and measurement error, incorporating auxiliary information, and adjusting for uncertainty about merging in post-merge analyses. We conduct comprehensive simulation studies to evaluate the performance of our algorithm in realistic scenarios. We also apply our methodology to merging campaign contribution records, survey data, and nationwide voter files. An open-source software package is available for implementing the proposed methodology.
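The abstract describes probabilistic record linkage only at a high level. As a rough illustration of the kind of calculation such a model performs, the minimal Python sketch below computes the posterior probability that a pair of records refers to the same person, assuming field-level agreements are conditionally independent given match status and that missing comparisons are simply skipped. The function name `match_probability`, the field names, and all parameter values are hypothetical and chosen for illustration; this is not the authors' algorithm or their software.

```python
def match_probability(agreements, m, u, prior):
    """Posterior probability that a record pair is a true match.

    agreements: dict mapping field -> True (agree), False (disagree),
                or None (comparison missing, skipped)
    m[field]:   assumed P(field agrees | pair is a match)
    u[field]:   assumed P(field agrees | pair is a non-match)
    prior:      assumed P(match) before looking at any field
    """
    like_match = prior
    like_non_match = 1.0 - prior
    for field, agree in agreements.items():
        if agree is None:
            # Missing comparison: contributes nothing to either likelihood.
            continue
        like_match *= m[field] if agree else 1.0 - m[field]
        like_non_match *= u[field] if agree else 1.0 - u[field]
    # Bayes' rule over the two possible states (match, non-match).
    return like_match / (like_match + like_non_match)


# Hypothetical parameters, for illustration only.
m = {"first_name": 0.95, "last_name": 0.97, "birth_year": 0.98}
u = {"first_name": 0.02, "last_name": 0.01, "birth_year": 0.05}
prior = 0.001

full = match_probability(
    {"first_name": True, "last_name": True, "birth_year": True}, m, u, prior
)
partial = match_probability(
    {"first_name": True, "last_name": True, "birth_year": None}, m, u, prior
)
mismatch = match_probability(
    {"first_name": True, "last_name": False, "birth_year": True}, m, u, prior
)
print(full, partial, mismatch)
```

In practice the agreement probabilities and the prior would be estimated from the data rather than fixed by hand; the sketch only shows how agreement patterns, including ones with missing fields, translate into a match probability that can be carried into later analysis.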

Publisher

Cambridge University Press (CUP)

Subject

Political Science and International Relations, Sociology and Political Science

References: 66 articles.

1. Enamorado, Ted. 2018. “Active Learning for Probabilistic Record Linkage.” Social Science Research Network (SSRN). URL: https://ssrn.com/abstract=3257638.

2. de Bruin, Jonathan. 2017. “Record Linkage. Python library. Version 0.8.1.” https://recordlinkage.readthedocs.io/.

3. Methodological Developments in Data Linkage

4. Why Partisans Do Not Sort: The Constraints on Political Segregation

5. A Statistical Method for Empirical Testing of Competing Theories

Cited by 101 articles.

1. Countering Capture in Local Politics: Evidence from Eight Field Experiments; The Journal of Politics; 2024-10-01

2. Evaluating the risk of SARS-CoV-2 reinfection with the Omicron or Delta variant in Wales, UK; PLOS ONE; 2024-09-06

3. The partisanship of American inventors; Research Policy; 2024-09

4. Improved energy retrofit decision making through enhanced bottom-up building stock modelling; Energy and Buildings; 2024-09

5. Party affiliation predicts homeowners’ decisions to install solar PV, but partisan gap wanes with improved economics of solar; Proceedings of the National Academy of Sciences; 2024-07-08