Affiliation:
1. Carnegie Mellon University
2. University of Pittsburgh
Abstract
The development of cancer is largely driven by the gain or loss of subsets of the genome, promoting uncontrolled growth or disabling defenses against it. Denoising array-based Comparative Genome Hybridization (aCGH) data is an important computational problem central to understanding cancer evolution. In this article, we propose a new formulation of the denoising problem that we solve with a “vanilla” dynamic programming algorithm, which runs in $O(n^2)$ units of time. Then, we propose two approximation techniques. Our first algorithm reduces the problem into a well-studied geometric problem, namely halfspace emptiness queries, and provides an $\epsilon$ additive approximation to the optimal objective value in $\tilde{O}(n^{4/3+\Delta} \log(U/\epsilon))$ time, where $\Delta$ is an arbitrarily small positive constant and $U = \max\{\sqrt{C}, (|P_i|)_{i=1,\ldots,n}\}$ ($P = (P_1, P_2, \ldots, P_n)$, $P_i \in \mathbb{R}$, is the vector of the noisy aCGH measurements, $C$ a normalization constant). The second algorithm provides a $(1 \pm \epsilon)$ approximation (multiplicative error) and runs in $O(n \log n / \epsilon)$ time. The algorithm decomposes the initial problem into a small (logarithmic) number of Monge optimization subproblems that we can solve in linear time using existing techniques.
Finally, we validate our model on synthetic and real cancer datasets. Our method consistently achieves superior precision and recall to leading competitors on the data with ground truth. In addition, it finds several novel markers not recorded in the benchmarks but supported in the oncology literature.
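As an illustration of the quadratic-time dynamic program mentioned in the abstract, the sketch below fits a piecewise-constant signal to noisy measurements. The abstract does not state the objective, so this is an assumption made only for illustration: the cost of a fit is taken to be the sum of squared errors plus a penalty `c` for each constant segment. The names `denoise_segments`, `p`, and `c` are hypothetical and do not come from the article.

```python
from itertools import accumulate


def denoise_segments(p, c):
    """Return (cost, fit) for a piecewise-constant fit of p.

    Assumed objective (not taken from the article): squared error of the
    fit plus a penalty c for every constant segment. Runs in O(n^2) time.
    """
    n = len(p)
    # Prefix sums of values and of squares give O(1) segment costs.
    s = [0.0] + list(accumulate(p))
    q = [0.0] + list(accumulate(x * x for x in p))

    def seg_cost(j, i):
        # Squared error of fitting p[j:i] by its mean.
        total = s[i] - s[j]
        return (q[i] - q[j]) - total * total / (i - j)

    opt = [0.0] * (n + 1)   # opt[i]: best cost for the first i points
    back = [0] * (n + 1)    # back[i]: start index of the last segment
    for i in range(1, n + 1):
        best, arg = float("inf"), 0
        for j in range(i):
            cand = opt[j] + seg_cost(j, i) + c
            if cand < best:
                best, arg = cand, j
        opt[i], back[i] = best, arg

    # Walk the back-pointers to recover the fitted values.
    fit = [0.0] * n
    i = n
    while i > 0:
        j = back[i]
        mean = (s[i] - s[j]) / (i - j)
        fit[j:i] = [mean] * (i - j)
        i = j
    return opt[n], fit
```

With a small penalty the fit follows the jump in the data, while a large penalty collapses the signal to a single mean. The inner loop over `j` is what makes the running time quadratic, and it is this minimization that the two approximation techniques in the abstract aim to speed up.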
Funder
Division of Computing and Communication Foundations
Natural Sciences and Engineering Research Council of Canada
Publisher
Association for Computing Machinery (ACM)
Subject
Theoretical Computer Science
Cited by
2 articles.