Metagenomic Geolocation Using Read Signatures

Published: 2022-02-28
Volume: 13
ISSN: 1664-8021
Journal: Frontiers in Genetics (Front. Genet.)

Authors:

Timothy Chappell, Shlomo Geva, James M. Hogan, David Lovell, Andrew Trotman, Dimitri Perrin

Abstract

We present a novel approach to the Metagenomic Geolocation Challenge based on random projection of the sample reads from each location. The approach characterises samples directly by their k-mer composition, avoiding the computationally demanding step of aligning reads to available microbial reference sequences. Each variable-length read is converted into a fixed-length, k-mer-based read signature. Read signatures are then clustered into location signatures, which provide a more compact characterisation of the reads at each location. Classification is treated as a ranked-retrieval problem over locations, in which signature similarity serves as a measure of similarity in microbial composition. We evaluate the approach on the CAMDA 2020 Challenge dataset and obtain promising results with nearest-neighbour classification. The main findings of this study are that k-mer representations carry sufficient information to reveal the origin of many of the CAMDA 2020 Challenge metagenomic samples, and that this reference-free approach requires much less computation than methods that need reads to be assigned to operational taxonomic units. These advantages become clear through comparison to previously published work on the CAMDA 2019 Challenge data.
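The abstract names the pipeline stages but not their exact construction. The following is a minimal Python sketch of those stages under stated assumptions, not the authors' implementation: k = 5, 64-bit signatures, a seeded table of pseudo-random ±1 vectors as the random projection, a bitwise-majority k-means as the clustering step, and mean nearest-centroid Hamming distance as the ranking score. All function names (`read_signature`, `location_signature`, `rank_locations`, `classify`) and parameter values are hypothetical.

```python
import random
from itertools import product

# Assumed parameters; the abstract does not state the values used.
K = 5        # k-mer length
BITS = 64    # signature width in bits
SEED = 42    # fixes the random projection so signatures are reproducible

# One pseudo-random +/-1 vector per possible k-mer (4**K rows of BITS entries).
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}
_rng = random.Random(SEED)
PROJECTION = [[_rng.choice((-1, 1)) for _ in range(BITS)] for _ in KMER_INDEX]


def read_signature(read):
    """Map a read of any length to a fixed BITS-bit signature.

    Summing the projection rows of every k-mer in the read equals projecting
    its k-mer count vector; the sign of each coordinate becomes one bit.
    """
    totals = [0] * BITS
    for i in range(len(read) - K + 1):
        idx = KMER_INDEX.get(read[i:i + K])
        if idx is None:  # skip k-mers containing N or other non-ACGT symbols
            continue
        for b, v in enumerate(PROJECTION[idx]):
            totals[b] += v
    return sum(1 << b for b, t in enumerate(totals) if t > 0)


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def majority(sigs):
    """Bitwise majority vote: a binary stand-in for a cluster mean."""
    return sum(1 << b for b in range(BITS)
               if 2 * sum((s >> b) & 1 for s in sigs) > len(sigs))


def location_signature(read_sigs, n_clusters=4, iters=10, seed=0):
    """Cluster a location's read signatures with a bitwise k-means.

    The returned centroids act as the compact location signature.
    Assumes read_sigs is a non-empty list.
    """
    rng = random.Random(seed)
    centroids = rng.sample(read_sigs, min(n_clusters, len(read_sigs)))
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for s in read_sigs:
            nearest = min(range(len(centroids)),
                          key=lambda j: hamming(s, centroids[j]))
            groups[nearest].append(s)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [majority(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def rank_locations(query_sigs, locations):
    """Rank locations by mean distance from each query read to its nearest centroid."""
    def score(name):
        cents = locations[name]
        total = sum(min(hamming(q, c) for c in cents) for q in query_sigs)
        return total / len(query_sigs)
    return sorted(locations, key=score)


def classify(query_sigs, locations):
    """Nearest-neighbour decision: predict the top-ranked location."""
    return rank_locations(query_sigs, locations)[0]
```

Hamming distance stands in for signature similarity here. Because each bit is the sign of a random projection, the fraction of differing bits approximately tracks the angle between the underlying k-mer count vectors, so reads with similar composition receive nearby signatures without any alignment to reference sequences.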

Publisher

Frontiers Media SA

Subject

Genetics (clinical), Genetics, Molecular Medicine

Cited by 1 article

1. Multiblock partial least squares and rank aggregation: Applications to detection of bacteriophages associated with antimicrobial resistance in the presence of potential confounding factors. Statistics in Medicine, 2024-04-15.