Iterative Variable Gene Discovery from Whole Genome Sequencing with a Bootstrapped Multiresolution Algorithm

Published:2019-02-11 Issue: Volume:2019 Page:1-13
ISSN:1748-670X
Container-title:Computational and Mathematical Methods in Medicine
language:en

Author:

Olivieri David N.¹, Gambón-Deza Francisco²

Affiliation:

1. Department of Computer Science, University of Vigo, Ourense 32004, Spain

2. Department of Immunology, Hospital of Meixoeiro, Vigo, Spain

Abstract

In jawed vertebrates, variable (V) genes code for the antigen-binding regions of B and T lymphocyte receptors, which generate a specific response to foreign pathogens. Obtaining the detailed repertoire of these genes across jawed vertebrates would help to understand their evolution and function. However, V-gene annotations are available for only a few model species, since their extraction is not amenable to standard gene-finding algorithms. Moreover, the more evolutionarily distant a taxon is from such model species, the less homology there is between their V-gene sequences. Here, we present an iterative supervised machine learning algorithm that begins by training on a small set of known and verified V-gene sequences. The algorithm successively discovers homologous unaligned V-exons from a larger set of whole genome shotgun (WGS) datasets from many taxa. Upon each iteration, newly uncovered V-genes are added to the training set for the next round of predictions. This iterative learning/discovery process terminates when the number of newly discovered sequences is negligible. The process is akin to “online” or reinforcement learning and proves useful for discovering homologous V-genes from taxa successively more distant from the original set. Results are demonstrated for 14 primate WGS datasets and validated against Ensembl annotations. The algorithm is implemented in the Python programming language and is freely available at http://vgenerepertoire.org.
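
The iterative loop described in the abstract (train on a seed set, predict, add the new hits to the training set, stop when few new sequences appear) can be sketched in Python as follows. This is only an illustration under stated assumptions and not the authors' implementation: the k-mer Jaccard similarity stands in for the supervised classifier, and every name and parameter (`kmer_set`, `jaccard_score`, `iterative_discovery`, `threshold`, `min_new`, `max_iter`) is hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_score(candidate: str, training: Iterable[str], k: int = 3) -> float:
    """Best Jaccard k-mer similarity between a candidate and any training sequence.

    Assumption: this simple similarity replaces the trained classifier
    that the abstract refers to.
    """
    cand = kmer_set(candidate, k)
    best = 0.0
    for ref in training:
        ref_kmers = kmer_set(ref, k)
        union = cand | ref_kmers
        if union:
            best = max(best, len(cand & ref_kmers) / len(union))
    return best


def iterative_discovery(
    seed: List[str],
    candidates: List[str],
    threshold: float = 0.5,
    min_new: int = 1,
    max_iter: int = 20,
) -> Tuple[List[str], List[int]]:
    """Grow a training set by repeatedly accepting candidates that score well.

    Returns the final training set and the number of sequences accepted
    in each iteration.
    """
    training = list(seed)
    seen = set(training)
    remaining = [c for c in candidates if c not in seen]
    history: List[int] = []
    for _ in range(max_iter):
        # "Predict" against the current training set.
        accepted = [c for c in remaining if jaccard_score(c, training) >= threshold]
        history.append(len(accepted))
        # Terminate when the number of newly discovered sequences is negligible.
        if len(accepted) < min_new:
            break
        # Newly uncovered sequences join the training set for the next round.
        training.extend(accepted)
        accepted_set = set(accepted)
        remaining = [c for c in remaining if c not in accepted_set]
    return training, history
```

In this toy setting, a candidate that is too dissimilar from the seed can still be accepted in a later round once an intermediate sequence has joined the training set, which mirrors the idea of reaching successively more distant taxa.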

Publisher

Hindawi Limited

Subject

Applied Mathematics; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; Modeling and Simulation; General Medicine

Link

http://downloads.hindawi.com/journals/cmmm/2019/3780245.pdf

References: 29 articles.

1. An automated algorithm for extracting functional immunologic V-genes from genomes in jawed vertebrates

Cited by 3 articles.

1. Distinct evolution at TCRα and TCRβ loci in the genus Mus;2024-09-07

2. Conceptual Modeling of the V Gene Annotation Process in Antibodies;2023 11th International Conference in Software Engineering Research and Innovation (CONISOFT);2023-11-06

3. MHC class I and II genes in Serpentes;2020-06-12