Abstract
This precision report documents the specificity and sensitivity of regular expressions used to retrieve identifying information. Overall, regular expressions for technical identifiers (e.g., email addresses, IP addresses) are highly specific and sensitive, with the exception of IMEI numbers (hardware identifiers for phones). Phone numbers can be detected when they are reported in international form (e.g., +1 555 55 555). For location information, regular expressions are highly precise for latitude/longitude combinations but not for street addresses, where they are sensitive but not specific. Among direct identifiers, gender is the hardest to detect with high specificity, and specificity worsens as dataset size grows; given the risk of disclosing marginalized gender identities, we nonetheless consider it important to check for. Similar issues arise for credit card information, where the risk is high if disclosed. We therefore recommend scanning for all types of identifying information examined in this report except IMEI numbers and street addresses, for which the regular expressions underperform.
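As an illustration of how sensitivity and specificity can be computed for a single regular expression, the following minimal sketch scores a pattern against hand-labeled strings. The patterns `EMAIL_RE` and `INTL_PHONE_RE`, the helper `evaluate`, and the sample strings are assumptions made for this example; they are not the expressions or data evaluated in the report.

```python
import re

# Hypothetical, deliberately simple patterns (not the ones evaluated in the report).
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# International form only: a leading "+", a country code, then 2-4 digit groups.
INTL_PHONE_RE = re.compile(r"\+\d{1,3}(?:[ -]?\d{2,4}){2,4}")


def evaluate(pattern, labeled):
    """Return (sensitivity, specificity) of `pattern` on (text, has_identifier) pairs."""
    tp = fp = tn = fn = 0
    for text, has_identifier in labeled:
        hit = pattern.search(text) is not None
        if hit and has_identifier:
            tp += 1  # identifier present and detected
        elif hit and not has_identifier:
            fp += 1  # pattern fired on text without an identifier
        elif not hit and has_identifier:
            fn += 1  # identifier present but missed
        else:
            tn += 1  # correctly ignored
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up labeled samples: (text, does it truly contain an email address?)
samples = [
    ("contact me at jane.doe@example.org", True),
    ("user at example dot org", True),          # obfuscated, so the pattern misses it
    ("no identifiers here", False),
    ("build tag v2@release.candidate", False),  # looks like an email but is not one
]

sens, spec = evaluate(EMAIL_RE, samples)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A pattern that is sensitive but not specific, as reported for street addresses, would score high on the first value and low on the second. The same helper can be applied to `INTL_PHONE_RE`, which only matches numbers written with a leading `+`.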