BiGAMi: Bi-Objective Genetic Algorithm Fitness Function for Feature Selection on Microbiome Datasets

Published: 2022-05-23 Issue: 3 Volume: 5 Page: 42
ISSN: 2409-9279
Container-title: Methods and Protocols
Language: en
Short-container-title: MPs

Author:

Leske Mike, Bottacini Francesca, Afli Haithem, Andrade Bruno G. N.

Abstract

The relationship between the host and its microbiome, the assemblage of microorganisms (including bacteria, archaea, fungi, and viruses), has been shown to be crucial for host health and disease development. The high dimensionality of microbiome datasets is often cited as a major difficulty for data analysis, including the use of machine-learning (ML) and deep-learning (DL) models. Here, we present BiGAMi, a bi-objective genetic algorithm (GA) fitness function for feature selection in microbial datasets, used to train high-performing phenotype classifiers. The proposed fitness function allowed us to build classifiers that outperformed the baseline performance estimated in the original studies while using as few as 0.04% to 2.32% of the features of the original dataset. In 35 out of 42 performance comparisons between BiGAMi and the other feature selection methods evaluated here (sequential forward selection, SelectKBest, and GARS), BiGAMi achieved its results by selecting 6–93% fewer features. This study showed that applying a bi-objective GA fitness function to microbiome datasets succeeded in selecting small subsets of bacteria whose contribution to well-understood diseases and the host state had already been experimentally proven. Applying this feature selection approach to novel diseases is expected to quickly reveal the microbes most relevant to a specific condition.
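
The idea of a bi-objective fitness function for feature selection can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the synthetic data, and the names `bi_objective_fitness` and `dominates` are assumptions chosen only to keep the example self-contained. The sketch scores a candidate feature subset (a boolean mask) on two objectives, classification accuracy (to maximise) and number of selected features (to minimise), and compares candidates by Pareto dominance.

```python
import numpy as np


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a nearest-centroid classifier (a stand-in for any classifier)."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test sample to every class centroid.
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predictions = classes[dists.argmin(axis=1)]
    return float((predictions == y_test).mean())


def bi_objective_fitness(mask, X_train, y_train, X_test, y_test):
    """Return (accuracy, n_features): maximise the first, minimise the second."""
    mask = np.asarray(mask, dtype=bool)
    n_selected = int(mask.sum())
    if n_selected == 0:
        # An empty subset cannot train a classifier, so give it the worst fitness.
        return 0.0, X_train.shape[1] + 1
    accuracy = nearest_centroid_accuracy(
        X_train[:, mask], y_train, X_test[:, mask], y_test
    )
    return accuracy, n_selected


def dominates(a, b):
    """True if fitness a Pareto-dominates fitness b."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


# Synthetic data (assumption): 200 noise features, only feature 0 is informative.
rng = np.random.default_rng(0)
y = np.arange(100) % 2
X = rng.normal(size=(100, 200))
X[:, 0] += 20 * y

X_train, y_train = X[:60], y[:60]
X_test, y_test = X[60:], y[60:]

full_mask = np.ones(200, dtype=bool)
single_mask = np.zeros(200, dtype=bool)
single_mask[0] = True

full_fitness = bi_objective_fitness(full_mask, X_train, y_train, X_test, y_test)
single_fitness = bi_objective_fitness(single_mask, X_train, y_train, X_test, y_test)

print("all features:", full_fitness)
print("feature 0 only:", single_fitness)
print("small subset dominates:", dominates(single_fitness, full_fitness))
```

In a genetic algorithm, masks like these would be the individuals, and a multi-objective selection scheme would favour non-dominated candidates, pushing the population toward small subsets that still classify well.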

Funder

Horizon 2020

Marie Skłodowska-Curie

Publisher

MDPI AG

Subject

Biochemistry, Genetics and Molecular Biology (miscellaneous), Structural Biology, Biotechnology

Link

https://www.mdpi.com/2409-9279/5/3/42/pdf
