Author:
Pelizzola Marta, Behr Merle, Li Housen, Munk Axel, Futschik Andreas
Abstract
Since haplotype information is of widespread interest in biomedical applications, considerable effort has been put into haplotype reconstruction. Here, we propose a new, computationally efficient method, called haploSep, that accurately infers major haplotypes and their frequencies solely from multiple samples of allele frequency data. Our approach seems to be the first that is able to estimate more than one haplotype given such data. Even the accuracy of experimentally obtained allele frequencies can be improved by re-estimating them from our reconstructed haplotypes. From a methodological point of view, we model our problem as a multivariate regression problem in which both the design matrix and the coefficient matrix are unknown. The design matrix, with 0/1 entries, encodes the haplotypes, and the columns of the coefficient matrix represent the haplotype frequencies, which are non-negative and sum to one. We illustrate our method on simulated and real data, focusing on experimental evolution and microbial data.
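The structure of the regression model described above can be illustrated with a minimal sketch. It only simulates the forward model (allele frequencies as a 0/1 haplotype matrix times a frequency matrix) and is not the haploSep estimation procedure. All names (`S`, `Omega`, `Y`, `random_frequencies`) and the problem sizes are assumptions chosen for illustration.

```python
import random

random.seed(0)

# Illustrative sizes (assumed, not taken from the paper).
n_snps, n_haplotypes, n_samples = 8, 3, 4

# Design matrix S (n_snps x n_haplotypes): the 0/1 allele carried by
# each haplotype at each SNP position.
S = [[random.randint(0, 1) for _ in range(n_haplotypes)]
     for _ in range(n_snps)]


def random_frequencies(k):
    """Draw k positive weights and normalise them so they sum to one."""
    weights = [random.random() + 1e-9 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


# Coefficient matrix Omega (n_haplotypes x n_samples): column j holds
# the haplotype frequencies in sample j (non-negative, summing to one).
columns = [random_frequencies(n_haplotypes) for _ in range(n_samples)]
Omega = [[columns[j][h] for j in range(n_samples)]
         for h in range(n_haplotypes)]

# Noise-free allele frequencies implied by the model: Y = S * Omega.
Y = [[sum(S[i][h] * Omega[h][j] for h in range(n_haplotypes))
      for j in range(n_samples)]
     for i in range(n_snps)]
```

Because each column of `Omega` is a set of convex weights and `S` has 0/1 entries, every entry of `Y` lies between 0 and 1, as an allele frequency should. In the setting described in the abstract, only a noisy version of `Y` would be observed, and both `S` and `Omega` would have to be estimated.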
Publisher
Cold Spring Harbor Laboratory