Authors:
Huang Kerui, Tian Jianhong, Sun Lei, Xie Peng, Zhou Shiqi, Deng Aihua, Mo Ping, Zhou Zhibo, Jiang Ming, Li Guiwu, Wang Yun, Jiang Xiaocheng
Abstract
Gene mining, particularly from small sample sizes such as in plants, remains a challenge in life sciences. Traditional methods often omit significant genes, while deep learning techniques are hindered by small sample constraints and lack specialized gene mining approaches. This paper presents TransGeneSelector, the first deep learning method tailored for key gene mining in small transcriptomic datasets, integrating data augmentation, sample filtering, and a Transformer-based classifier. Tested on Arabidopsis thaliana seeds’ germination classification using just 79 samples, it not only achieves classification performance on par with, if not superior to, Random Forest and SVM but also excels in identifying upstream regulatory genes that Random Forest might miss, and these pinpointed genes more accurately reflect the metabolic processes inherent in seed germination. TransGeneSelector’s ability to mine vital genes from limited datasets signifies its potential as the current state-of-the-art in gene mining in small sample scenarios, providing an efficient and versatile solution for this critical research area.
Publisher
Cold Spring Harbor Laboratory