Identification of population‐informative markers from high‐density genotyping data through combined feature selection and machine learning algorithms: Application to European autochthonous and cosmopolitan pig breeds

Authors:

Schiavo Giuseppina¹, Bertolini Francesca¹, Bovo Samuele¹, Galimberti Giuliano², Muñoz María³, Bozzi Riccardo⁴, Čandek‐Potokar Marjeta⁵, Óvilo Cristina³, Fontanesi Luca¹

Affiliations:

1. Animal and Food Genomics Group, Division of Animal Sciences, Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy

2. Department of Statistical Sciences ‘Paolo Fortunati’, University of Bologna, Bologna, Italy

3. Departamento Mejora Genética Animal, INIA‐CSIC, Madrid, Spain

4. Animal Science Division, Dipartimento di Scienze e Tecnologie Agrarie, Alimentari, Ambientali e Forestali, Università di Firenze, Firenze, Italy

5. Kmetijski Inštitut Slovenije, Ljubljana, Slovenia

Abstract

Large genotyping datasets obtained from the high‐density single nucleotide polymorphism (SNP) arrays developed for different livestock species can be used to describe and differentiate breeds or populations. A few statistical approaches have been proposed to identify the most discriminating markers among the thousands of genotyped SNPs. In this study, we applied the Boruta algorithm, a wrapper around the random forest machine learning algorithm, to pre‐select informative SNPs in a database of 23 European pig breeds (20 autochthonous and three cosmopolitan breeds) genotyped with a 70k SNP chip. The pre‐selected markers were then ranked with random forest according to their mean decrease accuracy and mean decrease Gini indexes to obtain different SNP subsets. We evaluated how efficiently these subsets classify breeds. We also evaluated how useful this approach is for detecting candidate genes that affect breed‐specific phenotypes and relevant production traits that might differ among breeds. The lowest overall classification error (2.3%) was obtained with a subpanel of only 398 SNPs (ranked by mean decrease accuracy). Seven breeds were classified without error using up to 49 SNPs. Several SNPs in these subpanels were located in genomic regions where previous studies had identified signatures of selection, or genes associated with morphological or production traits that distinguish the analysed breeds. Therefore, although these approaches were not originally designed to identify signatures of selection, the results show that they could be useful for this purpose.
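The two-step pipeline summarized above can be illustrated with a dependency-free sketch. Step one is Boruta-style pre-selection, which compares each SNP's importance with shuffled "shadow" copies. Step two ranks the retained SNPs by an accuracy-drop importance. This is a simplified illustration, not the authors' implementation. Several assumptions apply:

- A nearest-centroid classifier stands in for random forest.
- Permutation importance (shuffle one SNP and measure the drop in accuracy) stands in for random forest's mean decrease accuracy. It is computed here on the training animals rather than on out-of-bag samples.
- A fixed hit-rate threshold replaces Boruta's statistical test.
- Genotypes are coded 0/1/2.

All function names and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Sequence

# Assumption: genotypes coded 0/1/2 (allele copies); rows = animals, columns = SNPs.
Genotypes = List[List[int]]


def fit_centroids(X: Genotypes, y: Sequence[str]) -> Dict[str, List[float]]:
    """Mean genotype per SNP for each breed (nearest-centroid stand-in for random forest)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, breed in zip(X, y):
        acc = sums.setdefault(breed, [0.0] * len(row))
        for j, g in enumerate(row):
            acc[j] += g
        counts[breed] = counts.get(breed, 0) + 1
    return {b: [s / counts[b] for s in acc] for b, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], row: List[int]) -> str:
    """Assign the breed whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda b: sum((g - c) ** 2 for g, c in zip(row, centroids[b])))


def accuracy(centroids: Dict[str, List[float]], X: Genotypes, y: Sequence[str]) -> float:
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)


def permutation_importance(X: Genotypes, y: Sequence[str], rng: random.Random,
                           n_repeats: int = 5) -> List[float]:
    """Mean drop in accuracy when one SNP column is shuffled (the idea behind MDA).

    Simplification: evaluated on the training animals, not on out-of-bag samples.
    """
    centroids = fit_centroids(X, y)
    base = accuracy(centroids, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [r[j] for r in X]
            rng.shuffle(col)
            Xp = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
            drops.append(base - accuracy(centroids, Xp, y))
        scores.append(sum(drops) / n_repeats)
    return scores


def boruta_like_select(X: Genotypes, y: Sequence[str], n_iter: int = 20,
                       min_hit_rate: float = 0.8, seed: int = 0) -> List[int]:
    """Keep SNPs whose importance beats the best shadow SNP in most iterations.

    Simplification (assumption): a fixed hit-rate threshold replaces Boruta's test.
    """
    rng = random.Random(seed)
    p = len(X[0])
    hits = [0] * p
    for _ in range(n_iter):
        # Shadow SNPs: shuffled copies that keep allele frequencies but lose any link to breed.
        shadows = [[r[j] for r in X] for j in range(p)]
        for col in shadows:
            rng.shuffle(col)
        Xs = [row + [shadows[j][i] for j in range(p)] for i, row in enumerate(X)]
        imp = permutation_importance(Xs, y, rng)
        max_shadow = max(imp[p:])
        for j in range(p):
            if imp[j] > max_shadow:
                hits[j] += 1
    return [j for j in range(p) if hits[j] / n_iter >= min_hit_rate]


def rank_snps(X: Genotypes, y: Sequence[str], candidates: List[int], seed: int = 0) -> List[int]:
    """Order pre-selected SNPs by permutation importance, highest first."""
    rng = random.Random(seed)
    sub = [[row[j] for j in candidates] for row in X]
    imp = permutation_importance(sub, y, rng)
    return [c for _, c in sorted(zip(imp, candidates), reverse=True)]


# Hypothetical toy data: SNP 0 separates two breeds, SNPs 1-3 are monomorphic.
X = [[0, 1, 1, 1] for _ in range(10)] + [[2, 1, 1, 1] for _ in range(10)]
y = ["BreedA"] * 10 + ["BreedB"] * 10
selected = boruta_like_select(X, y)
print(selected, rank_snps(X, y, selected))
```

Shadow copies give the screening step a data-driven threshold: a SNP is kept only if it is more useful than the best marker that has the same allele frequencies but carries no breed information by construction. Subpanels of the top-ranked SNPs would then be evaluated for classification error. With real data, that evaluation should use held-out or out-of-bag animals rather than the training set.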

Funders

Javna Agencija za Raziskovalno Dejavnost RS

Horizon 2020 Framework Programme

Università di Bologna

Publisher

Wiley
