Abstract
HLA (Human Leukocyte Antigens) is a highly polymorphic locus in the human genome with high clinical significance. New alleles of HLA genes are constantly being discovered, but mostly through the efforts of laboratories that focus primarily on HLA typing and use field-specific experimental and data-processing techniques, such as enrichment of the HLA region in high-throughput sequencing data. Meanwhile, a vast amount of whole-genome sequencing (WGS) data has accumulated over the past years and continues to expand rapidly. It is therefore appealing to identify new HLA alleles, and to refine the information on known alleles, from already available WGS data. Many tools currently exist for HLA typing, i.e. assigning known alleles, from non-HLA-enriched WGS data, but none of them is specifically tailored to the identification and immediate, thorough description of new HLA alleles. Here we present HLAchecker, a pipeline specifically designed to identify potentially new HLA alleles based on discrepancies between the HLA types predicted by any other dedicated tool and the underlying raw 30x WGS data. HLAchecker reports are structured in a way that simplifies further validation of potentially new HLA alleles and streamlines their submission to the appropriate databases. We validated this tool on 4195 30x WGS samples typed by HLA-HD, discovered 17 potentially new HLA alleles with substitutions in exonic regions, and validated five randomly chosen alleles by Sanger sequencing.
Publisher
Cold Spring Harbor Laboratory