Abstract
Genetic ancestry-oriented cancer research requires the ability to perform accurate and robust genetic ancestry inference from existing cancer-derived data, including whole exomes, transcriptomes and targeted gene panels, often in the absence of matching cancer-free genomic data. Here we examine the feasibility and accuracy of such computation. To optimize and assess the performance of ancestry inference for any given input cancer-derived molecular profile, we have developed a data synthesis framework. In its core procedure, the ancestral background of the profiled patient is replaced with that of one of any number of individuals with known ancestry. Data synthesis is applicable to multiple profiling platforms and makes it possible to assess the performance of inference specifically for a given molecular profile, and separately for each continental-level ancestry. This ability extends to all ancestries, including those without statistically sufficient representation in the existing cancer data. We further show that our inference procedure is accurate and robust across a wide range of sequencing depths. Testing our approach for three representative cancer types, and across three molecular profiling modalities, we demonstrate that the global, continental-level ancestry of the patient can be inferred with high accuracy, as quantified by its agreement with the gold standard of ancestry derived from matching cancer-free molecular data. Our study demonstrates that vast amounts of existing cancer-derived molecular data are potentially amenable to ancestry-oriented studies of the disease, without recourse to matching cancer-free genomes or patients’ self-identification by ancestry.
Publisher
Cold Spring Harbor Laboratory