Characterizing subgroup performance of probabilistic phenotype algorithms within older adults: a case study for dementia, mild cognitive impairment, and Alzheimer’s and Parkinson’s diseases

Published: 2023-04-06 Issue: 2 Volume: 6
ISSN: 2574-2531
Container-title: JAMIA Open
Language: en

Author:

Banda Juan M¹, Shah Nigam H², Periyakoil Vyjeyanthi S³⁴

Affiliation:

1. Department of Computer Science, College of Arts and Sciences, Georgia State University, Atlanta, Georgia, USA

2. Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California, USA

3. Stanford Department of Medicine, Palo Alto, California, USA

4. VA Palo Alto Health Care System, Palo Alto, California, USA

Abstract

Objective: Biases within probabilistic electronic phenotyping algorithms are largely unexplored. In this work, we characterize differences in subgroup performance of phenotyping algorithms for Alzheimer’s disease and related dementias (ADRD) in older adults.

Materials and methods: We created an experimental framework to characterize the performance of probabilistic phenotyping algorithms under different racial distributions, allowing us to identify which algorithms may have differential performance, by how much, and under what conditions. We relied on rule-based phenotype definitions as reference to evaluate probabilistic phenotype algorithms created using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation framework.

Results: We demonstrate that some algorithms have performance variations anywhere from 3% to 30% for different populations, even when not using race as an input variable. We show that while performance differences in subgroups are not present for all phenotypes, they do affect some phenotypes and groups more disproportionately than others.

Discussion: Our analysis establishes the need for a robust evaluation framework for subgroup differences. The underlying patient populations for the algorithms showing subgroup performance differences have great variance between model features when compared with the phenotypes with little to no differences.

Conclusion: We have created a framework to identify systematic differences in the performance of probabilistic phenotyping algorithms specifically in the context of ADRD as a use case. Differences in subgroup performance of probabilistic phenotyping algorithms are not widespread nor do they occur consistently. This highlights the great need for careful ongoing monitoring to evaluate, measure, and try to mitigate such differences.
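
The kind of subgroup comparison described in the materials and methods can be illustrated with a minimal sketch. This is not the authors' framework or code: the function names, the 0.5 decision threshold, the choice of sensitivity and positive predictive value as metrics, and the toy records are all assumptions made for illustration. It treats a rule-based label as the reference, thresholds the probabilistic score, and reports per-subgroup metrics plus the largest gap between subgroups.

```python
from collections import defaultdict


def subgroup_metrics(records, threshold=0.5):
    """Compute per-subgroup sensitivity and PPV.

    records: iterable of (subgroup, reference_label, predicted_probability),
    where reference_label is 1/0 from a rule-based definition (assumed here).
    The threshold is an illustrative assumption, not a value from the paper.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, reference, probability in records:
        predicted = probability >= threshold
        if predicted and reference:
            key = "tp"
        elif predicted:
            key = "fp"
        elif reference:
            key = "fn"
        else:
            key = "tn"
        counts[group][key] += 1

    metrics = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]  # reference-positive patients
        flagged = c["tp"] + c["fp"]    # patients the model flags
        metrics[group] = {
            "sensitivity": c["tp"] / positives if positives else None,
            "ppv": c["tp"] / flagged if flagged else None,
        }
    return metrics


def max_gap(metrics, name):
    """Largest difference in one metric across subgroups (None if undefined)."""
    values = [m[name] for m in metrics.values() if m[name] is not None]
    return max(values) - min(values) if values else None


# Synthetic toy records for illustration only; not data from the study.
toy_records = [
    ("A", 1, 0.9), ("A", 1, 0.8), ("A", 0, 0.2), ("A", 0, 0.6),
    ("B", 1, 0.9), ("B", 1, 0.3), ("B", 0, 0.1), ("B", 0, 0.2),
]
toy_metrics = subgroup_metrics(toy_records)
sensitivity_gap = max_gap(toy_metrics, "sensitivity")
```

A gap computed this way depends on the threshold and on subgroup sample sizes, so in practice it would need confidence intervals before being read as a real performance difference.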

Funder

National Institute on Aging of the National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

Link

https://academic.oup.com/jamiaopen/article-pdf/6/2/ooad043/50735997/ooad043.pdf

Reference: 63 articles.

1. Can AI help reduce disparities in general medical and mental health care?;Chen;AMA J Ethics,2019

2. Dissecting racial bias in an algorithm used to manage the health of populations;Obermeyer;Science,2019

3. Addressing artificial intelligence bias in retinal diagnostics;Burlina;Transl Vis Sci Technol,2021

4. Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups;Thompson;J Am Med Inform Assoc,2021