Abstract
Clustering methods are often applied to electronic medical records (EMR) data for various objectives, including the discovery of previously unrecognized disease subtypes. The abundance and redundancy of information in EMR data raise the need to identify and rank the features that are most relevant for clustering. Here we propose FRIGATE, an ensemble feature ranking algorithm for clustering that uses game-theoretic concepts. FRIGATE derives the importance of features by solving multiple clustering problems on subgroups of features. In each such problem, a Shapley-like framework is used to rank a selected set of features, and multiplicative weights are employed to reduce the randomness in their selection. FRIGATE outperforms extant ensemble ranking algorithms in both solution quality and speed. By enabling better subtype discovery from EMR data, FRIGATE can improve disease understanding.
Publisher
Cold Spring Harbor Laboratory