High throughput biological sequence analysis using machine learning-based integrative pipeline for extracting functional annotation and visualization

Published: 2024-03-07 Volume: 13 Page: 161
ISSN: 2046-1402
Container-title: F1000Research
Language: en
Short-container-title: F1000Res

Author:

Al Amin Md, Naznin Feroza, Yeasmin Most Nilufa, Sarkar Md Sumon, Misor Mia Md, Chowdhury Abdullahi, Islam Md Zahidul

Abstract

Differential Gene Expression (DGE) analysis identifies expressed genes using measures such as log-fold change and adjusted p-values. Fold change is widely used in gene expression studies, especially in microarray and RNA sequencing experiments, to quantify changes in a gene's expression level, but relying on it carries an inherent bias. Genes that show significant differences but minor ratios may be categorized incorrectly, which results in poor detection of mutations in genes with high expression levels. Machine learning, in contrast, offers a more comprehensive view: it can capture the non-linear complexity of gene expression data and is robust to noise. This motivated us to use machine learning models to explore differential gene expression based on feature importance in Type 2 Diabetes (T2D), a significant global health concern. We validated the biomarkers derived from the genes we identified against previous studies to confirm the effectiveness of our ML models, and then analyzed the pathways, gene ontologies, protein-protein interactions, transcription factors, miRNAs, and drug predictions relevant to T2D. This study aims to establish machine learning as a sound way to study expressed genes in depth without relying on the DGE approach, and to help control or reduce the risk for T2D patients by supporting drug development researchers.
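
The fold-change limitation described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the two gene names, their expression values, the pseudocount of 1, and the |log2FC| >= 1 cutoff are hypothetical and are not taken from the study.

```python
import math


def log2_fold_change(control_mean, treated_mean, pseudocount=1.0):
    """Return log2(treated / control), with a pseudocount to avoid log(0)."""
    return math.log2((treated_mean + pseudocount) / (control_mean + pseudocount))


# Hypothetical mean expression values (control, treated); not data from the study.
genes = {
    "low_expr_gene": (10.0, 40.0),
    "high_expr_gene": (10000.0, 13000.0),
}

# Assumed cutoff: |log2FC| >= 1, i.e. at least a two-fold change.
LFC_CUTOFF = 1.0


def summarize(genes, cutoff=LFC_CUTOFF):
    """Compute absolute difference, log2 fold change, and cutoff status per gene."""
    rows = {}
    for name, (control, treated) in genes.items():
        lfc = log2_fold_change(control, treated)
        rows[name] = {
            "abs_diff": treated - control,
            "log2fc": lfc,
            "passes_cutoff": abs(lfc) >= cutoff,
        }
    return rows


summary = summarize(genes)
for name, row in summary.items():
    print(
        f"{name}: diff={row['abs_diff']:.0f}, "
        f"log2FC={row['log2fc']:.2f}, passes={row['passes_cutoff']}"
    )
```

In this sketch the highly expressed gene changes by 3000 units yet falls below the ratio cutoff, while the weakly expressed gene changes by only 30 units and passes it. This is the kind of case in which a ratio-only filter can miss a large absolute difference.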

Publisher

F1000 Research Ltd

Link

https://f1000research.com/articles/13-161/v1/pdf

References: 75 articles.

1. Differential analysis of count data – the DESeq2 package; M Love; Genome Biol., 2014

2. Interpretation of differential gene expression results of RNA-seq data: review and integration; A McDermaid; Brief. Bioinform., 2019

3. Fold change.

4. Machine learning in bioinformatics; I Kumar; Bioinformatics, 2022

5. Integrating oversampling and ensemble-based machine learning techniques for an imbalanced dataset in dyslexia screening tests; S Kaisar; ICT Express, 2022