The use of surnames to impute missing ethnicity data in the South African National Cancer Registry database

Authors:

Chen Wenlong Carl¹, Kellett Patricia¹, Greyling Mike², Singh Elvira¹, Sengayi-Muchengeti Mazvita¹

Affiliation:

1. National Cancer Registry, National Health Laboratory Service

2. University of the Witwatersrand

Abstract

The National Cancer Registry (NCR) of South Africa (SA) calculates cancer incidence rates from full pathology reports submitted by South African private and public healthcare laboratories, and presents cancer incidence data by ethnic group. Because collecting ethnicity data is sensitive in post-apartheid South Africa, reporting sources have submitted a large proportion of cancer cases without population group (ethnicity) information. This missing ethnicity data is a significant challenge for cancer incidence reporting. An imputation method was developed that imputes missing ethnicities from surnames with known, patient-reported ethnicities. The method was evaluated with a hold-out test: the ethnicities of 50% (n = 332,232) of the NCR records with known ethnicity, from 1986 to 2014, were masked, imputed, and then compared with the patient-reported ethnicities. Overall, 94.31% of ethnicities were correctly classified. Sensitivities and specificities were calculated per ethnic group (Asian, Black, Coloured, White). The method performed well for the Asian, Black and White groups but poorly for the Coloured group. The strong relationship between surnames and ethnic groups shown by these results mitigates the significant concern of whether surname itself is predictive of ethnicity. Although the proportion of missing data increased over the years, the percentage of correctly classified individuals remained high across the test dataset. This study demonstrates the strength of the imputation method; however, given the large disparities between the private and public healthcare sectors in SA, all sources should report every cancer case with complete information, so that cancer incidence can be reported accurately without the need to impute missing data. Challenges around collecting sensitive data such as ethnicity in SA remain and warrant further discussion.
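The abstract does not spell out the exact assignment rule, so the following is only a minimal sketch under stated assumptions. It assumes the simplest surname-based variant: each surname is assigned the ethnicity most often reported for it among records with a known ethnicity. It also assumes a hold-out test that masks a random 50% of known records, imputes them from the remaining half, and reports overall accuracy plus one-vs-rest sensitivity and specificity per group. All function names, the surname normalisation, and the tie-breaking rule are hypothetical illustrations, not the NCR's implementation.

```python
import random
from collections import Counter, defaultdict

# Group labels taken from the four ethnic groups named in the abstract.
GROUPS = ("Asian", "Black", "Coloured", "White")


def normalise(surname):
    """Hypothetical normalisation: trim whitespace and upper-case."""
    return surname.strip().upper()


def build_surname_lookup(records):
    """Map each surname to the ethnicity most often reported for it.

    `records` is an iterable of (surname, ethnicity) pairs; pairs whose
    ethnicity is None (missing) are ignored. Ties go to whichever ethnicity
    was counted first (Counter.most_common ordering), an assumed rule.
    """
    counts = defaultdict(Counter)
    for surname, ethnicity in records:
        if ethnicity is not None:
            counts[normalise(surname)][ethnicity] += 1
    return {name: c.most_common(1)[0][0] for name, c in counts.items()}


def impute(records, lookup):
    """Keep reported ethnicities; fill missing ones from the surname lookup.

    Surnames absent from the lookup stay None (still missing).
    """
    return [eth if eth is not None else lookup.get(normalise(surname))
            for surname, eth in records]


def holdout_split(records, frac=0.5, seed=0):
    """Split records with a known ethnicity into (reference, masked) parts."""
    known = [r for r in records if r[1] is not None]
    rng = random.Random(seed)
    rng.shuffle(known)
    n_masked = int(len(known) * frac)
    return known[n_masked:], known[:n_masked]


def accuracy(truth, predicted):
    """Share of masked records whose imputed ethnicity matches the reported one."""
    if not truth:
        return float("nan")
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def per_group_metrics(truth, predicted, groups=GROUPS):
    """One-vs-rest sensitivity and specificity for each ethnic group."""
    pairs = list(zip(truth, predicted))
    metrics = {}
    for g in groups:
        tp = sum(t == g and p == g for t, p in pairs)
        fn = sum(t == g and p != g for t, p in pairs)
        fp = sum(t != g and p == g for t, p in pairs)
        tn = sum(t != g and p != g for t, p in pairs)
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        metrics[g] = {"sensitivity": sens, "specificity": spec}
    return metrics


def evaluate_holdout(records, frac=0.5, seed=0):
    """Mask `frac` of the known ethnicities, impute them, and score the result."""
    reference, masked = holdout_split(records, frac, seed)
    lookup = build_surname_lookup(reference)
    truth = [eth for _, eth in masked]
    predicted = impute([(s, None) for s, _ in masked], lookup)
    return accuracy(truth, predicted), per_group_metrics(truth, predicted)


if __name__ == "__main__":
    # Hypothetical toy records, not NCR data.
    toy = [("Surname_A", "Black")] * 6 + [("Surname_B", "White")] * 6
    acc, per_group = evaluate_holdout(toy)
    print(f"correctly classified: {acc:.2%}")
    print(per_group)
```

In this sketch, masked records whose surname never appears in the reference half remain unimputed (None) and count as misclassified; the study may have handled such unmatched surnames differently. Per-group sensitivity and specificity are computed one-vs-rest, which is one common way to summarise a four-class classification like this.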

Publisher:

Research Square Platform LLC

