Correcting machine learning models using calibrated ensembles with ‘mlensemble’

Published: 2021-07-26

Author:

Tomasz Konopka

Abstract

Machine learning models in bioinformatics are often trained and used within the scope of a single project, but some models are also reused across projects and deployed in translational settings. Over time, trained models may turn out to be maladjusted to the properties of new data, which creates a need to improve their performance under various constraints. This work explores correcting models without retraining them from scratch and without access to the original training data. A taxonomy of correction strategies guides the development of a software package, ‘mlensemble’. Key features include joining heterogeneous models into ensembles and calibrating ensembles to the properties of new data. These techniques are well established but are often hidden within more complex tools. By exposing them at the application level, the package enables analysts to apply expert knowledge to adjust models whenever needed. Calculations with imaging data show benefits when the noise characteristics of the training and application datasets differ. An example using genomic single-cell data demonstrates model portability despite batch effects. The generality of the framework also makes it applicable in other subject domains.
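The abstract does not describe the interface of ‘mlensemble’, and nothing below reproduces it. As a generic illustration of the two ideas it names, joining heterogeneous models into an ensemble and calibrating that ensemble to new data, the following Python sketch treats each member model as a black-box function returning class probabilities and grid-searches ensemble weights that minimize log loss on a small labeled sample of the new data. The function names (`ensemble_predict`, `log_loss`, `calibrate_weights`), the weight grid, and the toy models are all hypothetical choices made for illustration.

```python
# Hypothetical sketch (not the mlensemble API): combine black-box classifiers
# into a weighted ensemble and recalibrate only the weights on new labeled data.
import math
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Any function mapping a feature vector to a dict of class probabilities.
Model = Callable[[Sequence[float]], Dict[str, float]]
Sample = Tuple[Sequence[float], str]


def ensemble_predict(models: List[Model], weights: Sequence[float],
                     x: Sequence[float]) -> Dict[str, float]:
    """Weighted average of member predictions, renormalized to sum to 1."""
    total: Dict[str, float] = {}
    for model, w in zip(models, weights):
        for label, p in model(x).items():
            total[label] = total.get(label, 0.0) + w * p
    norm = sum(total.values())
    return {label: p / norm for label, p in total.items()}


def log_loss(models: List[Model], weights: Sequence[float],
             data: List[Sample]) -> float:
    """Mean negative log-probability the ensemble assigns to the true labels."""
    eps = 1e-12  # guards against log(0)
    return -sum(
        math.log(max(ensemble_predict(models, weights, x).get(y, 0.0), eps))
        for x, y in data
    ) / len(data)


def calibrate_weights(models: List[Model], data: List[Sample],
                      grid: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0)):
    """Exhaustive grid search over non-negative weights minimizing log loss."""
    best, best_loss = None, math.inf
    for weights in product(grid, repeat=len(models)):
        if sum(weights) == 0:
            continue  # an all-zero ensemble is undefined
        loss = log_loss(models, weights, data)
        if loss < best_loss:
            best, best_loss = weights, loss
    return best, best_loss


# Toy members (hypothetical): each thresholds a different input feature.
def model_a(x: Sequence[float]) -> Dict[str, float]:
    return {"pos": 0.9, "neg": 0.1} if x[0] > 0 else {"pos": 0.1, "neg": 0.9}


def model_b(x: Sequence[float]) -> Dict[str, float]:
    return {"pos": 0.9, "neg": 0.1} if x[1] > 0 else {"pos": 0.1, "neg": 0.9}


# New data in which feature 0 no longer tracks the label (a simulated shift).
new_data = [((1.0, 1.0), "pos"), ((-1.0, 1.0), "pos"),
            ((1.0, -1.0), "neg"), ((-1.0, -1.0), "neg")]

weights, loss = calibrate_weights([model_a, model_b], new_data)
# model_a is wrong on half of the new samples, so its weight comes out as 0.0.
print(weights, round(loss, 4))
```

Only the ensemble weights are fitted here; the member models are used as-is and their training data is never consulted, which mirrors the constraint stated in the abstract. An exhaustive grid keeps the sketch dependency-free but scales exponentially with the number of members, so a continuous optimizer would be the more practical choice for larger ensembles.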

Publisher

Cold Spring Harbor Laboratory
