GCAT|Panel, a comprehensive structural variant haplotype map of the Iberian population from high-coverage whole-genome sequencing


Valls-Margarit Jordi (1), Galván-Femenía Iván (2), Matías-Sánchez Daniel (1), Blay Natalia (2), Puiggròs Montserrat (1), Carreras Anna (2), Salvoro Cecilia (1), Cortés Beatriz (2), Amela Ramon (1), Farre Xavier (2), Lerga-Jaso Jon (3), Puig Marta (3), Sánchez-Herrero Jose Francisco (4), Moreno Victor (5, 6, 7, 8), Perucho Manuel (9, 10), Sumoy Lauro (4), Armengol Lluís (11), Delaneau Olivier (12, 13), Cáceres Mario (3, 14), de Cid Rafael (2), Torrents David (1, 14)


1. Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain

2. Genomes for Life-GCAT lab Group, Institute for Health Science Research Germans Trias i Pujol (IGTP), Badalona 08916, Spain

3. Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain

4. High Content Genomics and Bioinformatics Unit, Institute for Health Science Research Germans Trias i Pujol (IGTP), 08916 Badalona, Spain

5. Catalan Institute of Oncology, Hospitalet del Llobregat, 08908, Spain

6. Bellvitge Biomedical Research Institute (IDIBELL), Hospitalet del Llobregat, 08908, Spain

7. CIBER Epidemiología y Salud Pública (CIBERESP), Madrid 28029, Spain

8. Universitat de Barcelona (UB), Barcelona 08007, Spain

9. Sanford Burnham Prebys Medical Discovery Institute (SBP), La Jolla, CA 92037, USA

10. Cancer Genetics and Epigenetics, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Health Science Research Institute Germans Trias i Pujol (IGTP), Badalona 08916, Spain

11. Quantitative Genomic Medicine Laboratories (qGenomics), Esplugues del Llobregat, 08950, Spain

12. Department of Computational Biology, University of Lausanne, Génopode, 1015 Lausanne, Switzerland

13. Swiss Institute of Bioinformatics (SIB), University of Lausanne, Quartier Sorge – Batiment Amphipole, 1015 Lausanne, Switzerland

14. ICREA, Barcelona 08010, Spain


Abstract

The combined analysis of haplotype panels with phenotyped clinical cohorts is a common approach to exploring the genetic architecture of human diseases. However, genetic studies are mainly based on single nucleotide variants (SNVs) and small insertions and deletions (indels). Here, we help to fill this gap by generating a dense haplotype map focused on the identification, characterization, and phasing of structural variants (SVs). By integrating multiple variant identification methods and Logistic Regression Models (LRMs), we present a catalogue of 35 431 441 variants, including 89 178 SVs (≥50 bp), 30 325 064 SNVs and 5 017 199 indels, across 785 Illumina high-coverage (30x) whole genomes from the Iberian GCAT Cohort, with a median of 3.52M SNVs, 606 336 indels and 6393 SVs per individual. The haplotype panel can impute up to 14 360 728 SNVs/indels and 23 179 SVs, a 2.7-fold increase for SVs compared with available genetic variation panels. The value of this panel for SV analysis is shown through an imputed rare Alu element located in a new locus associated with Mononeuritis of lower limb, a rare neuromuscular disease. This study represents the first deep characterization of genetic variation within the Iberian population and the first operational haplotype panel to systematically include SVs in genome-wide genetic studies.


Genomes of Catalonia

Fundació Institut Germans Trias i Pujol

Acción de Dinamización del ISCIII-MINECO

Ministry of Health of the Generalitat of Catalunya

Agència de Gestió d’Ajuts Universitaris i de Recerca



European Regional Development Fund

Spanish Government

Generalitat de Catalunya

Agencia Estatal de Investigación

Spanish Ministry of Science and Innovation

European Union's Horizon 2020


Barcelona Supercomputing Center

Netherlands Organization for Scientific Research


Oxford University Press (OUP)









