A Deep Learning Approach to Population Structure Inference in Inbred Lines of Maize

Published: 2020-11-24 Volume: 11
ISSN:1664-8021
Container-title:Frontiers in Genetics
Short-container-title:Front. Genet.

Author:

López-Cortés Xaviera Alejandra, Matamala Felipe, Maldonado Carlos, Mora-Poblete Freddy, Scapim Carlos Alberto

Abstract

Analysis of population genetic variation and structure is a common practice in genome-wide studies, including association mapping, ecology, and evolution studies in several crop species. In this study, two machine learning (ML) clustering methods, K-means (KM) and hierarchical clustering (HC), were combined with a non-linear and a linear dimensionality reduction technique, deep autoencoder (DeepAE) and principal component analysis (PCA), respectively, to infer population structure and individual assignment of maize inbred lines, i.e., dent field corn (n = 97) and popcorn (n = 86). The results revealed that the HC method in combination with DeepAE-based data preprocessing (DeepAE-HC) was the most effective method for assigning individuals to clusters (96% correct individual assignments), whereas DeepAE-KM, PCA-HC, and PCA-KM correctly assigned 92, 89, and 81% of the lines, respectively. These findings were consistent with both the Silhouette Coefficient (SC) and the Davies–Bouldin validation indexes. Notably, DeepAE-HC was also more accurate than the Bayesian clustering method implemented in InStruct. The results of this study showed that deep learning (DL)-based dimensionality reduction combined with ML clustering methods is a useful tool for determining genetically differentiated groups and for assigning individuals to subpopulations in genome-wide studies without relying on prior genetic assumptions.
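
The evaluation described in the abstract (cluster the lines, then score the result by assignment accuracy, Silhouette Coefficient, and Davies–Bouldin index) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' pipeline: the genotypes are simulated, the allele frequencies, sample sizes, and all function names are assumptions made for illustration, and the dimensionality reduction step (PCA or DeepAE) and hierarchical clustering are left out, so K-means runs directly on the raw genotype vectors.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two genotype vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(rows):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def simulate(n_a=30, n_b=30, n_snps=50, seed=0):
    """Hypothetical data: fully homozygous lines coded 0/2, two groups
    that differ in allele frequency (0.1 vs 0.9, an assumption)."""
    rng = random.Random(seed)

    def line(freq):
        return [2 if rng.random() < freq else 0 for _ in range(n_snps)]

    data = [line(0.1) for _ in range(n_a)] + [line(0.9) for _ in range(n_b)]
    return data, [0] * n_a + [1] * n_b


def kmeans2(data, n_iter=20):
    """Two-cluster K-means with a deterministic farthest-point start."""
    centers = [data[0], max(data, key=lambda row: dist(row, data[0]))]
    labels = []
    for _ in range(n_iter):
        labels = [min((0, 1), key=lambda k: dist(row, centers[k])) for row in data]
        for k in (0, 1):
            members = [row for row, lab in zip(data, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = centroid(members)
    return labels


def silhouette(data, labels):
    """Mean silhouette coefficient in [-1, 1]; higher is better."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, row in enumerate(data):
        mean_to = {}
        for k in clusters:
            d = [dist(row, data[j]) for j in range(len(data))
                 if labels[j] == k and j != i]
            mean_to[k] = sum(d) / len(d) if d else None
        a = mean_to[labels[i]]
        if a is None:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        b = min(mean_to[k] for k in clusters if k != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(data, labels):
    """Davies-Bouldin index (>= 0); lower is better."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    groups = {k: [r for r, lab in zip(data, labels) if lab == k] for k in clusters}
    cents = {k: centroid(g) for k, g in groups.items()}
    scatter = {k: sum(dist(r, cents[k]) for r in groups[k]) / len(groups[k])
               for k in clusters}
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in clusters if j != i) for i in clusters]
    return sum(worst) / len(worst)


def assignment_accuracy(truth, labels):
    """Fraction of correct assignments for two groups, allowing the
    arbitrary cluster labels to be swapped."""
    same = sum(t == lab for t, lab in zip(truth, labels)) / len(truth)
    return max(same, 1.0 - same)


data, truth = simulate()
labels = kmeans2(data)
print("accuracy:", assignment_accuracy(truth, labels))
print("silhouette:", round(silhouette(data, labels), 3))
print("Davies-Bouldin:", round(davies_bouldin(data, labels), 3))
```

The two validation indexes move in opposite directions: a better clustering has a higher silhouette and a lower Davies–Bouldin value. Unlike assignment accuracy, neither index needs the true group labels, so both can be computed on real data where the groups are unknown.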

Publisher

Frontiers Media SA

Subject

Genetics (clinical), Genetics, Molecular Medicine

Cited by 15 articles.

1. Tracing the genealogy origin of geographic populations based on genomic variation and deep learning;Molecular Phylogenetics and Evolution;2024-09

2. Genetic Diversity within a Collection of Italian Maize Inbred Lines: A Resource for Maize Genomics and Breeding;Plants;2024-01-23

3. Enhancing Biogeographical Ancestry Prediction with Deep Learning: A Long Short-Term Memory Approach;Lecture Notes in Networks and Systems;2024

4. Post-GWAS Prioritization of Genome-Phenome Associations in Sorghum;2023-12-07

5. Genetic Diversity and Population Structure Analyses in Bitter Gourd (Momordica charantia L.) Based on Agro-Morphological and Microsatellite Markers;Plants;2023-10-09