Fast and optimal algorithm for case-control matching using registry data: application on the antibiotics use of colorectal cancer patients

Author:

Mamouris Pavlos, Nassiri Vahid, Molenberghs Geert, van den Akker Marjan, van der Meer Joep, Vaes Bert

Abstract

Background: In case-control studies, most matching algorithms allow controls to be sampled several times, which is not always optimal. If many controls are available and adjustment for several covariates is necessary, matching without replacement might increase statistical efficiency. Comparing similar units is of utmost importance in observational data, since confounding and selection bias are present. The aim was twofold: first, to create a method that accommodates the option that a control is not resampled, and second, to display several scenarios that identify changes in odds ratios (ORs) as the balance of the matched sample increases.

Methods: The algorithm was developed iteratively, from the pre-processing steps used to derive the data through to its application in a study investigating the effect of antibiotic use on the risk of colorectal cancer in the INTEGO registry (Flanders, Belgium). Different scenarios were developed to investigate the fluctuation of ORs using combinations of exact and varying matching variables, with or without replacement of controls. To achieve balance in the population, we introduced the Comorbidity Index (CI), defined as the sum of a patient's chronic diseases, as a means of obtaining comparable units from which to draw valid associations.

Results: The algorithm is fast and optimal. It is fast because, on simulated data, the run-time of matching was minimal even with millions of patients. It is optimal because the closest control is always captured (through appropriate ordering and the creation of auxiliary variables) and because, when a case has only one available control, that control is guaranteed to be matched to that case, thus maximizing the number of cases used in the analysis. In total, 72 scenarios were displayed, showing the fluctuation of ORs and revealing patterns, in particular a drop in ORs when the population was balanced.

Conclusions: We created an optimal and computationally efficient algorithm to derive a matched case-control sample with or without replacement of controls. The code and functions are publicly available as open source in an R package. Finally, we emphasize the importance of displaying several scenarios and assessing the differences in ORs when an index is used to balance the population in observational data.
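The matching idea described in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R package. All names here (`Person`, `comorbidity_index`, `match_without_replacement`, `age_caliper`) are hypothetical, and the choice of matching variables (exact on sex and CI, age within a caliper) is an assumption, not taken from the paper. The sketch processes the cases with the fewest eligible controls first, which is one simple way to keep a case with a single candidate from losing that control to another case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str
    sex: str
    age: int
    chronic: tuple  # 0/1 flags, one per chronic disease (hypothetical encoding)


def comorbidity_index(p):
    """CI as described in the abstract: the sum of chronic diseases."""
    return sum(p.chronic)


def eligible(case, control, age_caliper):
    # Assumed rules: exact match on sex and CI, age within a caliper.
    return (case.sex == control.sex
            and comorbidity_index(case) == comorbidity_index(control)
            and abs(case.age - control.age) <= age_caliper)


def match_without_replacement(cases, controls, age_caliper=2, ratio=1):
    """Greedy matching; each control is used at most once."""
    candidates = {
        c.pid: [k for k in controls if eligible(c, k, age_caliper)]
        for c in cases
    }
    # Cases with the fewest candidates go first (a static heuristic).
    order = sorted(cases, key=lambda c: len(candidates[c.pid]))
    used, matches = set(), {}
    for case in order:
        free = [k for k in candidates[case.pid] if k.pid not in used]
        # Closest control in age first; pid breaks ties deterministically.
        free.sort(key=lambda k: (abs(k.age - case.age), k.pid))
        chosen = free[:ratio]
        if chosen:
            matches[case.pid] = [k.pid for k in chosen]
            used.update(k.pid for k in chosen)
    return matches


# Case B is listed first and is closer in age to X than to Y,
# but X is the only eligible control for case A.
cases = [Person("B", "M", 61, (1, 0)), Person("A", "M", 60, (1, 0))]
controls = [Person("X", "M", 60, (0, 1)), Person("Y", "M", 63, (1, 0))]
matches = match_without_replacement(cases, controls)
print(matches)
```

In this toy example a first-come matching in input order would give X to case B and leave case A unmatched, whereas the ordering above keeps both cases. The ordering is computed once and is only a heuristic: if two cases share the same single candidate, one of them still goes unmatched.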

Publisher

Springer Science and Business Media LLC

Subject

Health Informatics, Epidemiology
