Mycorrhiza: genotype assignment using phylogenetic networks

Author:

Georges-Filteau Jeremy1ORCID,Hamelin Richard C23ORCID,Blanchette Mathieu1ORCID

Affiliation:

1. School of Computer Science, McGill University, Montreal, QC, Canada

2. Department of Forest and Conservation Sciences, The University of British Columbia, Vancouver, BC, Canada

3. Département des sciences du bois et de la forêt, Université Laval, Québec, Canada

Abstract

Abstract Motivation The genotype assignment problem consists of predicting, from the genotype of an individual, which of a known set of populations it originated from. The problem arises in a variety of contexts, including wildlife forensics, invasive species detection and biodiversity monitoring. Existing approaches perform well under ideal conditions but are sensitive to a variety of common violations of the assumptions they rely on. Results In this article, we introduce Mycorrhiza, a machine learning approach for the genotype assignment problem. Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples. Those features are then used as input to a Random Forests classifier. The classification accuracy was assessed on multiple published empirical SNP, microsatellite or consensus sequence datasets with wide ranges of size, geographical distribution and population structure and on simulated datasets. It compared favorably against widely used assessment tests or mixture analysis methods such as STRUCTURE and Admixture, and against another machine-learning based approach using principal component analysis for dimensionality reduction. Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium. Moreover, the phylogenetic network approach estimates mixture proportions with good accuracy. Availability and implementation Mycorrhiza is released as an easy to use open-source python package at github.com/jgeofil/mycorrhiza. Supplementary information Supplementary data are available at Bioinformatics online.

Funder

Genome Canada, Genome British Columbia, Genome Quebec

Large-Scale Applied Research

Resources and the Environment BioSurveillance of Forest Alien Enemies

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability

Reference71 articles.

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3