Anchor Clustering for million-scale immune repertoire sequencing data

Published: 2024-01-25
Volume: 25
Issue: 1
ISSN: 1471-2105
Journal: BMC Bioinformatics
Language: en

Authors:

Chang Haiyang, Ashlock Daniel A., Graether Steffen P., Keller Stefan M.

Abstract

Background: The clustering of immune repertoire data is challenging due to the computational cost associated with a very large number of pairwise sequence comparisons. To overcome this limitation, we developed Anchor Clustering, an unsupervised clustering method designed to identify similar sequences from millions of antigen receptor gene sequences. First, a Point Packing algorithm is used to identify a set of maximally spaced anchor sequences. Then, the genetic distance of the remaining sequences to all anchor sequences is calculated and transformed into distance vectors. Finally, distance vectors are clustered using unsupervised clustering. This process is repeated iteratively until the resulting clusters are small enough so that pairwise distance comparisons can be performed.

Results: Our results demonstrate that Anchor Clustering is faster than existing pairwise comparison clustering methods while providing similar clustering quality. With its flexible, memory-saving strategy, Anchor Clustering is capable of clustering millions of antigen receptor gene sequences in just a few minutes.

Conclusions: This method enables the meta-analysis of immune-repertoire data from different studies and could contribute to a more comprehensive understanding of the immune repertoire data space.
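The three steps in the abstract (anchor selection, distance vectors, clustering, repeated until groups are small) can be illustrated with a toy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: Levenshtein distance stands in for the genetic distance, greedy farthest-point selection stands in for the Point Packing algorithm, nearest-anchor assignment stands in for the unsupervised clustering of distance vectors, and the names `pick_anchors`, `pairwise_clusters`, `anchor_cluster` and the parameters `k`, `max_size`, `threshold` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def pick_anchors(seqs, k):
    """Greedy farthest-point selection, a stand-in for Point Packing."""
    anchors = [seqs[0]]
    nearest = [edit_distance(s, seqs[0]) for s in seqs]
    while len(anchors) < k:
        idx = max(range(len(seqs)), key=nearest.__getitem__)
        if nearest[idx] == 0:  # every sequence already equals some anchor
            break
        anchors.append(seqs[idx])
        nearest = [min(d, edit_distance(s, seqs[idx])) for d, s in zip(nearest, seqs)]
    return anchors


def pairwise_clusters(seqs, threshold):
    """Single-linkage clustering of a small group using union-find."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if edit_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i, s in enumerate(seqs):
        groups[find(i)].append(s)
    return list(groups.values())


def anchor_cluster(seqs, k=3, max_size=4, threshold=1):
    """Split by anchors until groups are small, then compare pairwise."""
    if len(seqs) <= max_size:
        return pairwise_clusters(seqs, threshold)
    anchors = pick_anchors(seqs, k)
    groups = defaultdict(list)
    for s in seqs:
        vec = [edit_distance(s, a) for a in anchors]  # distance vector
        groups[vec.index(min(vec))].append(s)  # stand-in for vector clustering
    if len(groups) == 1:  # no split possible, fall back to pairwise
        return pairwise_clusters(seqs, threshold)
    clusters = []
    for group in groups.values():
        clusters.extend(anchor_cluster(group, k, max_size, threshold))
    return clusters


if __name__ == "__main__":
    toy = ["AAAAAA", "AAAAAT", "CCCCCC", "CCCCCG", "GGGGGG", "GGGGGT", "TTTTTT"]
    for cluster in anchor_cluster(toy):
        print(cluster)
```

The point of the sketch is the cost structure: each sequence is compared to only `k` anchors per round instead of to every other sequence, and full pairwise comparison is deferred to groups of at most `max_size`. One caveat of this simplified split is that two similar sequences near a boundary can be assigned to different anchors and so never be compared.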

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s12859-024-05659-z.pdf

Cited by 3 articles.

1. Estimates of Sequences with Ultralong and Short CDR3s in the Bovine IgM B Cell Receptor Repertoire Using the Long-read Oxford Nanopore MinION Platform; ImmunoHorizons; 2024-09-01

2. Single cell RNA-sequencing of feline peripheral immune cells with V(D)J repertoire and cross species analysis of T lymphocytes; 2024-05-21

3. Anchor-based scalable multi-view subspace clustering; Information Sciences; 2024-05