A review of model evaluation metrics for machine learning in genetics and genomics

Published:2024-09-10 Volume:4
ISSN:2673-7647
Container-title:Frontiers in Bioinformatics
Short-container-title:Front. Bioinform.

Author:

Miller Catriona, Portlock Theo, Nyaga Denis M., O’Sullivan Justin M.

Abstract

Machine learning (ML) has shown great promise in genetics and genomics where large and complex datasets have the potential to provide insight into many aspects of disease risk, pathogenesis of genetic disorders, and prediction of health and wellbeing. However, with this possibility there is a responsibility to exercise caution against biases and inflation of results that can have harmful unintended impacts. Therefore, researchers must understand the metrics used to evaluate ML models which can influence the critical interpretation of results. In this review we provide an overview of ML metrics for clustering, classification, and regression and highlight the advantages and disadvantages of each. We also detail common pitfalls that occur during model evaluation. Finally, we provide examples of how researchers can assess and utilise the results of ML models, specifically from a genomics perspective.

Publisher

Frontiers Media SA

References: 121 articles.
