GSCNN: A genomic selection convolutional neural network model based on SNP genotype and physical distance features and data augmentation strategy

Published: 2024-03-07

Authors:

Ji Lu¹, Hou Wei¹, Xiong Liwen², Zhou Heng¹, Liu Chunhai¹, Li Lanzhi¹, Yuan Zheming¹

Affiliation:

1. Hunan Agricultural University

2. University of Chinese Academy of Sciences

Abstract

Background: Genomic selection (GS) has proven to be an effective way to improve breeding efficiency in plants and animals. Deep learning offers great flexibility and representational capacity, which lets it capture complex associations, and it is regarded as one of the most promising model families for GS.

Methods: This study proposes a deep-learning model, the genomic selection convolutional neural network (GSCNN), which innovates in three respects. First, GSCNN encodes adjacent single nucleotide polymorphisms (SNPs) using both their genotypes and the physical distance (PD) between them, so that complex associations among SNPs can be characterized more accurately. Second, new training samples are generated by chromosome-based perturbation of SNP sequences, which alleviates data scarcity and improves the performance of the deep-learning GS model. Third, GSCNN interprets biological sequence information with two advanced deep-learning techniques: Bidirectional Encoder Representations from Transformers (BERT) embedding and attention pooling.

Results: GSCNN outperformed widely used GS models, such as genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space (RKHS), Bayes B, Bayesian lasso, and deep learning genome-wide association study, on six prediction tasks.

Conclusion: GSCNN is a promising model for GS and offers a reference for applying deep learning in other life-science fields.
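The abstract names two data-side ideas, encoding adjacent SNPs by genotype plus physical distance and chromosome-based perturbation for augmentation, but gives no implementation details. The sketch below shows one way such steps could look; it is not the GSCNN code. Assumptions (all hypothetical): genotypes are coded as allele dosages 0/1/2, distances are binned by order of magnitude into `N_DIST_BINS = 8` bins, and "perturbation" is interpreted as shuffling the order of whole chromosomes while keeping the phenotype label. All function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical hyperparameter: number of physical-distance bins (not from the paper).
N_DIST_BINS = 8


def distance_bin(bp: int, n_bins: int = N_DIST_BINS) -> int:
    """Bin a physical distance (base pairs) by order of magnitude.

    For bp >= 1, len(str(bp)) - 1 equals floor(log10(bp)) exactly, avoiding float error.
    """
    if bp < 1:
        return 0
    return min(len(str(bp)) - 1, n_bins - 1)


def encode_chromosome(genotypes: Sequence[int], positions: Sequence[int]) -> List[int]:
    """Encode each adjacent SNP pair on one chromosome as a single token id.

    Assumes genotypes are allele dosages in {0, 1, 2}, so a pair has 3 * 3 = 9
    combinations; each combination is crossed with the distance bin, giving
    9 * N_DIST_BINS possible tokens (e.g. rows of a BERT-style embedding table).
    """
    if len(genotypes) != len(positions):
        raise ValueError("genotypes and positions must have equal length")
    tokens = []
    for i in range(len(genotypes) - 1):
        g1, g2 = genotypes[i], genotypes[i + 1]
        if g1 not in (0, 1, 2) or g2 not in (0, 1, 2):
            raise ValueError("genotypes must be coded 0, 1 or 2")
        gap = positions[i + 1] - positions[i]
        if gap <= 0:
            raise ValueError("positions must be strictly increasing within a chromosome")
        pair_id = g1 * 3 + g2  # 0..8
        tokens.append(pair_id * N_DIST_BINS + distance_bin(gap))
    return tokens


def augment_by_chromosome_order(chrom_tokens: List[List[int]], rng: random.Random) -> List[int]:
    """Hypothetical augmentation: shuffle whole chromosome blocks, then flatten.

    SNP order inside each chromosome is preserved, so every adjacent-pair token
    is unchanged; only the arrangement of chromosomes in the sequence differs.
    """
    order = list(range(len(chrom_tokens)))
    rng.shuffle(order)
    return [tok for k in order for tok in chrom_tokens[k]]


def make_augmented_samples(
    chrom_tokens: List[List[int]], phenotype: float, n_copies: int, seed: int = 0
) -> List[Tuple[List[int], float]]:
    """Create n_copies perturbed sequences that all keep the original phenotype label."""
    rng = random.Random(seed)
    return [(augment_by_chromosome_order(chrom_tokens, rng), phenotype) for _ in range(n_copies)]


if __name__ == "__main__":
    # Toy individual with two chromosomes: (genotypes, physical positions in bp).
    chromosomes = [([0, 1, 2], [100, 1600, 251600]), ([2, 2], [500, 30500])]
    tokens = [encode_chromosome(g, p) for g, p in chromosomes]
    print(tokens)  # [[11, 45], [68]]
    for seq, y in make_augmented_samples(tokens, phenotype=12.3, n_copies=3):
        print(seq, y)
```

Because each token depends only on a within-chromosome SNP pair, the shuffle changes the sequence that a position-aware model sees while leaving the genotype-distance content intact. Whether this matches the authors' perturbation scheme cannot be confirmed from the abstract alone.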

Publisher

Research Square Platform LLC
