Abstract
Background
Genomic selection (GS) is an effective method for improving the efficiency of plant and animal breeding. Deep learning offers great flexibility and representational capacity, which enables it to capture complex associations, and it is considered one of the most promising approaches for GS.
Methods
This study proposes a deep learning method named genomic selection convolutional neural network (GSCNN), which introduces three innovations. First, GSCNN encodes adjacent single nucleotide polymorphisms (SNPs) using both their genotypes and the physical distance (PD) between them, allowing the complex associations among SNPs to be determined more accurately. Second, we generate new samples by perturbing SNP sequences based on chromosomes to address data scarcity and improve the performance of the GS deep learning model. Third, GSCNN uses advanced deep learning techniques, namely Bidirectional Encoder Representations from Transformers (BERT) embedding and attention pooling, to interpret biosequence information.
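As a rough illustration of two of these ideas, the sketch below shows one possible reading of chromosome-based perturbation (reordering whole chromosome blocks while keeping SNP order inside each block) and a generic softmax attention pooling step. The abstract does not specify the exact perturbation scheme or network layers, so the function names `shuffle_chromosome_blocks` and `attention_pool`, the block-reordering interpretation, and the externally supplied `scores` are all assumptions made for illustration, not the GSCNN implementation.

```python
import math
import random


def shuffle_chromosome_blocks(genotypes, chrom_ids, rng):
    """Return a new sample with whole chromosome blocks reordered.

    Assumed reading of "perturbing SNP sequences based on chromosomes":
    SNP order inside each chromosome is preserved, and only the order
    of the chromosome blocks changes.
    """
    blocks = {}
    for genotype, chrom in zip(genotypes, chrom_ids):
        blocks.setdefault(chrom, []).append(genotype)
    order = list(blocks)
    rng.shuffle(order)
    return [g for chrom in order for g in blocks[chrom]]


def attention_pool(features, scores):
    """Softmax-weighted average of per-position feature vectors.

    `scores` stands in for the attention logits that a trained model
    would compute from the features (hypothetical input here).
    """
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


# Toy sample: six SNPs coded 0/1/2 on three chromosomes.
genotypes = [0, 1, 2, 1, 0, 2]
chrom_ids = [1, 1, 2, 2, 3, 3]
augmented = shuffle_chromosome_blocks(genotypes, chrom_ids, random.Random(0))
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With equal scores, the pooling reduces to a plain average of the feature vectors; in a trained model the scores would differ per position, so that informative SNP regions contribute more to the pooled representation.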
Results
In six prediction tasks, the GSCNN outperformed widely used GS models, including genomic best linear unbiased prediction, reproducing kernel Hilbert space, Bayes B, Bayesian lasso, and deep learning genome-wide association study.
Conclusion
The GSCNN is a promising model for GS and provides a reference for applying deep learning to other life science fields.