Author:
Malikic Salem, Mehrabadi Farid Rashidi, Ciccolella Simone, Rahman Md. Khaledur, Ricketts Camir, Haghshenas Ehsan, Seidman Daniel, Hach Faraz, Hajirasouliha Iman, Sahinalp S. Cenk
Abstract
Available computational methods for tumor phylogeny inference via single-cell sequencing (SCS) data typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, the limitations of SCS technologies, including frequent allele dropout and variable sequence coverage, may prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to loss of heterozygosity, deletions, and convergent evolution. To address these limitations, we introduce the optimal subperfect phylogeny problem, which asks to integrate SCS data with matching bulk sequencing data by minimizing a linear combination of three terms: potential false negatives among mutation calls (due to allele dropout or variance in sequence coverage), false positives among mutation calls (due to read errors), and the number of mutations that violate ISA (real or because of incorrect copy number estimation). We then describe a combinatorial formulation to solve this problem, which ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both as an integer linear program (ILP) and, as a first in tumor phylogeny reconstruction, as a Boolean constraint satisfaction problem (CSP), and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data while accounting for ISA-violating mutations. In contrast to the alternative methods, typically based on probabilistic approaches, PhISCS provides a guarantee of optimality in reported solutions. Using simulated and real data sets, we demonstrate that PhISCS is more general and accurate than all available approaches.
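To make the objective described in the abstract concrete, the following is a minimal Python sketch, not the PhISCS implementation. It assumes a binary cell-by-mutation matrix with no missing entries, tests for a perfect phylogeny with the standard three-gamete condition, and scores a candidate corrected matrix as a weighted sum of assumed false negatives, assumed false positives, and mutations set aside as ISA-violating. The function names and the weights `w_fn`, `w_fp`, and `w_isa` are hypothetical, and the VAF-derived lineage constraints from bulk data are omitted entirely.

```python
from itertools import combinations


def has_conflict(matrix):
    """Return True if some pair of mutation columns shows all three
    gametes (1,1), (1,0), (0,1), i.e. no perfect phylogeny exists."""
    if not matrix:
        return False
    n_mut = len(matrix[0])
    for a, b in combinations(range(n_mut), 2):
        seen = {(row[a], row[b]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= seen:
            return True
    return False


def objective(observed, corrected, removed, w_fn, w_fp, w_isa):
    """Score a candidate solution (hypothetical weights, illustration only).

    observed  -- binary matrix of mutation calls (rows: cells, cols: mutations)
    corrected -- proposed error-corrected matrix of the same shape
    removed   -- set of column indices treated as ISA-violating mutations
    """
    n_mut = len(observed[0])
    kept = [j for j in range(n_mut) if j not in removed]
    # The corrected matrix, restricted to kept mutations, must be conflict-free.
    restricted = [[row[j] for j in kept] for row in corrected]
    if has_conflict(restricted):
        raise ValueError("corrected matrix still violates perfect phylogeny")
    cost = 0.0
    for obs_row, cor_row in zip(observed, corrected):
        for j in kept:
            if obs_row[j] == 0 and cor_row[j] == 1:
                cost += w_fn  # 0 -> 1 flip: assumed false negative
            elif obs_row[j] == 1 and cor_row[j] == 0:
                cost += w_fp  # 1 -> 0 flip: assumed false positive
    return cost + w_isa * len(removed)


if __name__ == "__main__":
    observed = [[1, 1], [1, 0], [0, 1]]   # three cells, two mutations
    fix_by_flip = [[1, 1], [1, 0], [1, 1]]  # treat cell 3, mutation 1 as dropout
    print(has_conflict(observed))
    print(objective(observed, fix_by_flip, set(), 1.0, 10.0, 5.0))
    print(objective(observed, observed, {1}, 1.0, 10.0, 5.0))
```

An exact method in the spirit of the abstract would search over all corrected matrices and removal sets with an ILP or CSP solver; this sketch only evaluates candidates that are supplied by hand.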
Funder
Indiana University Grand Challenges Program Precision Health Initiative
NSERC Discovery Frontiers Grant
Vanier Canada Graduate Scholarship
Mobility Exchange Fellowship
NIH
National Science Foundation
NSF
Publisher
Cold Spring Harbor Laboratory
Subject
Genetics (clinical), Genetics
Reference
41 articles.
Cited by
72 articles.