Abstract
AbstractWe present a geometry-based interpretation of thef-statistics framework, commonly used to determine phylogenetic relationships from genetic data. The focus is on the determination of the mixing coefficients in population admixture events subject to post-admixture drift. The interpretation takes advantage of the high dimension of the dataset and analyzes the problem as a dimensional reduction issue. We show that it is possible to think of thef-statistics technique as an implicit transformation of the genetic data from a phase space into a subspace where the mapped data structure is more similar to the ancestral admixture configuration. The positive effect of the map can be explicitly assessed. The overarching geometric framework provides slightly more general formulas than thef-formalism by using a different rationale as a starting point. Explicitly addressed are two- and three-way admixtures. The mixture proportions are provided by suitable linear fits in two or three dimensions that can be easily visualized. The developments and findings are illustrated with numerical simulations from real world datasets.
Publisher
Cold Spring Harbor Laboratory