Evaluation of the updated SOCcer v2 algorithm for coding free-text job descriptions in three epidemiologic studies

Author:

Russ Daniel E (1, 2), Josse Pabitra (1), Remen Thomas (3), Hofmann Jonathan N (1), Purdue Mark P (1), Siemiatycki Jack (3), Silverman Debra T (1), Zhang Yawei (4), Lavoué Jerome (3), Friesen Melissa C (1)

Affiliation:

1. Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, United States

2. Data Science and Engineering Research Group, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, United States

3. CHUM Research Center, Université de Montréal, Montréal, QC, Canada

4. Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, United States

Abstract

Objectives: Computer-assisted coding of job descriptions to standardized occupational classification codes facilitates evaluating occupational risk factors in epidemiologic studies by reducing the number of jobs needing expert coding. We evaluated the accuracy of the second version of SOCcer, a computerized algorithm designed to code free-text job descriptions to the US SOC-2010 system based on free-text job titles and work tasks.

Methods: SOCcer was updated to v2 by expanding the training data to include jobs from several epidemiologic studies and revising the algorithm to account for nonlinearity and incorporate interactions. We evaluated the agreement between codes assigned by experts and the highest-scoring code from SOCcer v1 and v2 (the score is a measure of confidence in the algorithm-predicted assignment) in 14,714 jobs from three epidemiologic studies. We also linked exposure estimates for 258 agents in the job-exposure matrix CANJEM to the expert-assigned and SOCcer v2-assigned codes and compared those estimates using kappa and intraclass correlation coefficients (ICCs). Analyses were stratified by SOCcer score, score distance between the top two scoring codes from SOCcer, and features from CANJEM.

Results: SOCcer v2’s agreement at the 6-digit level was 50%, compared to 44% in v1, and was similar for the three studies (38%–45%). Overall agreement for v2 at the 2-, 3-, and 5-digit levels was 73%, 63%, and 56%, respectively. For v2, median ICCs for the probability and intensity metrics were 0.67 (IQR 0.59–0.74) and 0.56 (IQR 0.50–0.60), respectively. The agreement between the expert-assigned and SOCcer-assigned codes increased linearly with SOCcer score. The agreement also improved when the top two scoring codes had larger differences in score.

Conclusions: Overall agreement with SOCcer v2 applied to job descriptions from North American epidemiologic studies was similar to the agreement usually observed between two experts. SOCcer’s score predicted agreement with experts and can be used to prioritize jobs for expert review.
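
As an illustration of the agreement metric reported above, the following minimal Python sketch computes the proportion of jobs whose expert-assigned and algorithm-assigned codes match at the 2-, 3-, 5-, and 6-digit levels, and then lists low-scoring jobs for expert review. It is not the authors' implementation. It assumes that a level is defined by the leading digits of a SOC-2010 code written as `XX-XXXX`; the function names, the example codes, the scores, and the 0.3 threshold are all hypothetical.

```python
from typing import Dict, List, Sequence


def soc_prefix(code: str, digits: int) -> str:
    """Return the first `digits` digits of a SOC code such as '29-1062'.

    Assumption: an n-digit level is the n leading digits of the code.
    """
    return code.replace("-", "")[:digits]


def agreement_by_level(
    expert_codes: Sequence[str],
    predicted_codes: Sequence[str],
    levels: Sequence[int] = (2, 3, 5, 6),
) -> Dict[int, float]:
    """Proportion of jobs whose two codes share the same prefix at each level."""
    if len(expert_codes) != len(predicted_codes):
        raise ValueError("Both code lists must have the same length.")
    if not expert_codes:
        raise ValueError("At least one job is required.")
    n_jobs = len(expert_codes)
    result = {}
    for digits in levels:
        matches = sum(
            soc_prefix(expert, digits) == soc_prefix(predicted, digits)
            for expert, predicted in zip(expert_codes, predicted_codes)
        )
        result[digits] = matches / n_jobs
    return result


def review_queue(scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of jobs scoring below `threshold`, lowest score first."""
    low = [i for i, score in enumerate(scores) if score < threshold]
    return sorted(low, key=lambda i: scores[i])


# Hypothetical data: four jobs coded by an expert and by an algorithm.
expert = ["29-1062", "47-2031", "11-1021", "53-3032"]
predicted = ["29-1062", "47-2061", "13-1111", "53-3033"]
scores = [0.9, 0.2, 0.05, 0.4]  # hypothetical top-code scores

print(agreement_by_level(expert, predicted))
print(review_queue(scores, threshold=0.3))  # threshold is illustrative only
```

Agreement can only stay the same or rise as the level becomes coarser, because two codes that share a 6-digit prefix necessarily share every shorter prefix. The review queue reflects the idea that lower-scoring assignments are less likely to agree with experts, so they are the first candidates for manual coding.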

Funder

Intramural Research Programs Center for Information Technology

NIH

National Cancer Institute

Division of Cancer Epidemiology and Genetics

Publisher

Oxford University Press (OUP)

Subject

Public Health, Environmental and Occupational Health

Cited by 1 article.