SAVANA: reliable analysis of somatic structural variants and copy number aberrations in clinical samples using long-read sequencing

Author:

Cortés-Ciriano Isidro (1), Elrick Hillary (1), Sauer Carolin (2), Valle-Inclan Jose Espejo (3), Trevers Katherine (4), Tanguy Melanie (5), Zumalave Sonia (1), De Noon Solange (4), Muyas Francesc (1), Cascao Rita (6), Afonso Angela (6), Amary Fernanda (4), Tirabosco Roberto (4), Giess Adam (5), Freeman Timothy (5), Sosinsky Alona (5), Piculell Katherine (7), Miller David (7), Faria Claudia (8), Elgar Greg (5), Flanagan Adrienne (4)

Affiliation:

1. European Bioinformatics Institute

2. European Molecular Biology Laboratory

3. European Molecular Biology Laboratory, European Bioinformatics Institute

4. Department of Histopathology, Royal National Orthopaedic Hospital, Stanmore, UK

5. Genomics England

6. Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal

7. Division of Genetics and Genomics, Boston Children's Hospital, Boston, Massachusetts, US

8. Department of Neurosurgery, Hospital de Santa Maria, Centro Hospitalar Universitário Lisboa Norte (CHULN), Lisboa, Portugal

Abstract

Accurate detection of somatic structural variants (SVs) and copy number aberrations (SCNAs) is critical to inform the diagnosis and treatment of human cancers. Here, we describe SAVANA, a computationally efficient algorithm designed for the joint analysis of somatic SVs, SCNAs, tumour purity and ploidy using long-read sequencing data. SAVANA relies on machine learning to distinguish true somatic SVs from artefacts and provide prediction errors for individual SVs. Using high-depth Illumina and nanopore whole-genome sequencing data for 99 human tumours and matched normal samples, we establish best practices for benchmarking SV detection algorithms across the entire genome in an unbiased and data-driven manner using simulated and sequencing replicates of tumour and matched normal samples. SAVANA shows significantly higher sensitivity, and 9- and 59-times higher specificity than the second and third-best performing algorithms, yielding orders of magnitude fewer false positives in comparison to existing long-read sequencing tools across various clonality levels, genomic regions, SV types and SV sizes. In addition, SAVANA harnesses long-range phasing information to detect somatic SVs and SCNAs at single-haplotype resolution. SVs reported by SAVANA are highly consistent with those detected using short-read sequencing, including complex events causing oncogene amplification and tumour suppressor gene inactivation. In summary, SAVANA enables the application of long-read sequencing to detect SVs and SCNAs reliably in clinical samples.

Publisher

Springer Science and Business Media LLC
