Affiliation:
1. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Dr, Rockville MD 20850 USA
Abstract
Summary
Allele-specific copy number alteration (ASCNA) analysis is used to identify copy number abnormalities in tumor cells. Unlike normal cells, tumor cells are heterogeneous, comprising a dominant clone and minor subclones with distinct copy number profiles. Estimating the clonal proportion and identifying main-clone and subclone genotypes across the genome are important for understanding tumor progression. Several ASCNA tools have recently been developed, but they identify only subclone regions, not subclone genotypes. In this article, we propose subHMM, a hidden Markov model-based approach that estimates the subclone regions together with the region-specific subclone genotype and clonal proportion. We specify a hidden state variable representing the conglomeration of clonal genotype and subclone status. We propose a two-step algorithm for parameter estimation. In the first step, a standard hidden Markov model with this conglomerated state variable is fit. In the second step, region-specific estimates of the clonal proportions are obtained by maximizing region-specific pseudo-likelihoods. We apply subHMM to renal cell carcinoma datasets from The Cancer Genome Atlas. In addition, we conduct simulation studies showing that the proposed approach performs well. The R source code is available online at https://dceg.cancer.gov/tools/analysis/subhmm.
Keywords
Expectation–Maximization algorithm; Forward–backward algorithm; Somatic copy number alteration; Tumor subclones.
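As an illustration of the forward–backward algorithm named in the keywords, the following is a minimal sketch of posterior state computation for a generic discrete-state hidden Markov model. It is not the subHMM implementation (which is distributed as R code at the URL above). The function name `forward_backward` and the arguments `init`, `trans`, and `emit_lik` are hypothetical, and the sketch assumes the emission likelihoods have already been evaluated for each position and state. In a model like the one described, each state would presumably correspond to one value of the conglomerated genotype and subclone-status variable.

```python
def forward_backward(init, trans, emit_lik):
    """Posterior state probabilities for a generic discrete-state HMM.

    init[k]        : probability of starting in state k
    trans[j][k]    : probability of moving from state j to state k
    emit_lik[t][k] : likelihood of the observation at position t under state k
    (All names are illustrative assumptions, not taken from subHMM.)
    """
    n_obs, n_states = len(emit_lik), len(init)

    # Forward pass, rescaled at each position to avoid numerical underflow.
    alpha = []
    prev = None
    for t in range(n_obs):
        if t == 0:
            row = [init[k] * emit_lik[0][k] for k in range(n_states)]
        else:
            row = [
                emit_lik[t][k] * sum(prev[j] * trans[j][k] for j in range(n_states))
                for k in range(n_states)
            ]
        scale = sum(row)
        prev = [v / scale for v in row]
        alpha.append(prev)

    # Backward pass, rescaled in the same way.
    beta = [[1.0] * n_states for _ in range(n_obs)]
    for t in range(n_obs - 2, -1, -1):
        row = [
            sum(trans[j][k] * emit_lik[t + 1][k] * beta[t + 1][k]
                for k in range(n_states))
            for j in range(n_states)
        ]
        scale = sum(row)
        beta[t] = [v / scale for v in row]

    # Posterior at each position: normalized product of forward and backward terms.
    posterior = []
    for t in range(n_obs):
        row = [alpha[t][k] * beta[t][k] for k in range(n_states)]
        total = sum(row)
        posterior.append([v / total for v in row])
    return posterior


# Toy two-state example with made-up numbers.
toy_posterior = forward_backward(
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    emit_lik=[[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]],
)
```

In an Expectation–Maximization fit, posteriors of this kind would feed the E-step. The second, region-specific pseudo-likelihood step described in the abstract is not sketched here.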
Funder
Intramural Research Program of the National Cancer Institute
Publisher
Oxford University Press (OUP)
Subject
Statistics, Probability and Uncertainty; General Medicine; Statistics and Probability