Abstract
Context: Patient stratification is the cornerstone of numerous health studies, serving to improve the estimation of medicine efficacy and to facilitate patient matching. To stratify patients, similarity measures between patients can be computed from medical health record databases, such as medico-administrative databases. Importantly, the variables included in medico-administrative databases can be associated with labels, which can be organized in ontologies or other classification systems. However, to the best of our knowledge, the relevance of considering such label classifications when computing patient similarity measures has received little attention.

Objective: We propose and evaluate several weighted versions of the Cosine similarity that consider structured label relationships to compute patient similarities from a medico-administrative database.

Material and Methods: As a use case, we analyze medicine reimbursements contained in the Échantillon Généraliste des Bénéficiaires, a French medico-administrative database. We compute the standard Cosine similarity between patients based on their medicine reimbursements. In addition, we compute a weighted Cosine similarity measure that includes variable frequencies and two weighted Cosine similarity measures that consider label relationships. We construct patient networks from each similarity measure and identify clusters of patients. We evaluate the performance of the different similarity measures with enrichment tests using information on chronic diseases.

Results: The similarity measures that include label relationships perform better at identifying similar patients. Using these weighted measures, we identify distinct patient clusters with a higher number of chronic disease enrichments than with the other measures. Importantly, the enrichment tests provide clinically interpretable insights into these patient clusters.

Conclusion: Considering label relationships when computing patient similarities improves the stratification of patients with respect to their health status.
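
To make the three kinds of measure named in the Material and Methods concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give the weighting formulas, so everything here is an assumption: the inverse-frequency weighting, the soft-cosine form used to account for label relationships, the prefix-based label similarity, and the ATC-style codes used as illustrative labels. All function names are hypothetical.

```python
import math


def weighted_cosine(x, y, w=None):
    """Cosine similarity with optional non-negative per-variable weights.

    With w=None this reduces to the standard Cosine similarity.
    """
    if w is None:
        w = [1.0] * len(x)
    dot = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    norm_x = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    norm_y = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def inverse_frequency_weights(patients):
    """Assumed frequency weighting: rarer variables get larger weights."""
    n = len(patients)
    weights = []
    for j in range(len(patients[0])):
        count = sum(1 for p in patients if p[j] > 0)
        weights.append(math.log((1 + n) / (1 + count)) + 1.0)
    return weights


def prefix_similarity(code_a, code_b):
    """Assumed label relationship: shared-prefix length of hierarchical codes."""
    shared = 0
    for a, b in zip(code_a, code_b):
        if a != b:
            break
        shared += 1
    return shared / max(len(code_a), len(code_b))


def soft_cosine(x, y, sim):
    """Cosine similarity in which related variables contribute to the match.

    sim[i][j] is the similarity between variables i and j (1.0 on the diagonal).
    """
    def bilinear(u, v):
        return sum(
            sim[i][j] * u[i] * v[j]
            for i in range(len(u))
            for j in range(len(v))
        )

    denom = math.sqrt(bilinear(x, x) * bilinear(y, y))
    return bilinear(x, y) / denom if denom > 0 else 0.0


# Illustrative labels only (ATC-style codes); not taken from the study data.
labels = ["A10BA02", "A10BB01", "C07AB02"]
label_sim = [[prefix_similarity(a, b) for b in labels] for a in labels]

# Each patient is a vector of reimbursement counts, one entry per label.
patients = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
freq_weights = inverse_frequency_weights(patients)

standard = weighted_cosine(patients[0], patients[1])
frequency_weighted = weighted_cosine(patients[0], patients[1], freq_weights)
label_aware = soft_cosine(patients[0], patients[1], label_sim)
```

In this toy example the first two patients share no reimbursed medicine, so the standard and frequency-weighted similarities are both zero, while the label-aware variant gives them a non-zero similarity because their two codes share a prefix. This is one way to see why label relationships could change the resulting patient network.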
Publisher
Cold Spring Harbor Laboratory