Correcting the Estimation of Viral Taxa Distributions in Next-Generation Sequencing Data after Applying Artificial Neural Networks

Published: 2021-10-31 | Volume: 12 | Issue: 11 | Page: 1755
ISSN: 2073-4425
Journal: Genes
Language: English

Authors:

Moritz Kohls, Magdalena Kircher, Jessica Krepel, Pamela Liebig, Klaus Jung

Abstract

Estimating the taxonomic composition of viral sequences in biological samples processed by next-generation sequencing is an important step in comparative metagenomics. Mapping sequencing reads against a database of known viral reference genomes, however, fails to classify reads from novel viruses whose reference sequences are not yet available in public databases. As an alternative to mapping, and in order to classify sequencing reads at least to some taxonomic level, the performance of artificial neural networks and other machine learning models was studied. Taxonomic and genomic data from the NCBI database were used to sample labelled sequencing reads as training data. The fitted neural network was then applied to classify unlabelled reads in simulated and real-world test sets. Additional auxiliary test sets of labelled reads were used to estimate the conditional class probabilities and to correct the initial (prior) estimate of the taxonomic distribution in the actual test set. Among the taxonomic levels, the order level provided the most comprehensive database for generating training data. The prediction accuracy of the artificial neural network in classifying test reads to their viral order was considerably higher than that of random classification. Posterior estimation of taxon frequencies could correct the primary classification results.
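The abstract does not give the correction formula. One common way to use an auxiliary labelled set for this purpose is the confusion-matrix adjustment (often called "adjusted classify and count"). The auxiliary set estimates C[i, j] = P(predicted class i | true class j). The naive proportions q of predicted classes in the actual test set then satisfy q = C p, where p is the true class distribution, so solving for p corrects the naive estimate. The sketch below assumes that approach and is not necessarily the authors' exact procedure. The function names, the three-class toy data and the use of NumPy least squares are illustrative assumptions.

```python
import numpy as np


def estimate_confusion(true_labels, predicted_labels, n_classes):
    """Estimate C[i, j] = P(predicted = i | true = j) from an auxiliary labelled set.

    Column j is the distribution of predictions for reads whose true class is j.
    (Hypothetical helper for illustration.)
    """
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, predicted_labels):
        counts[p, t] += 1
    col_sums = counts.sum(axis=0)
    # Guard against division by zero for classes absent from the auxiliary set.
    return counts / np.where(col_sums == 0, 1, col_sums)


def naive_distribution(predicted_labels, n_classes):
    """Classify-and-count: fraction of test reads assigned to each class."""
    counts = np.bincount(predicted_labels, minlength=n_classes)
    return counts / counts.sum()


def correct_distribution(predicted_labels, confusion, n_classes):
    """Solve q = C p for the true class proportions p (least squares)."""
    q = naive_distribution(predicted_labels, n_classes)
    p, *_ = np.linalg.lstsq(confusion, q, rcond=None)
    # Least squares may return small negative values; clip and renormalise.
    p = np.clip(p, 0.0, None)
    return p / p.sum()


# Toy example with three hypothetical classes (e.g. three viral orders).
# Auxiliary labelled set: 100 reads per true class with known predictions.
aux_true = [0] * 100 + [1] * 100 + [2] * 100
aux_pred = ([0] * 80 + [1] * 10 + [2] * 10      # true class 0
            + [0] * 20 + [1] * 70 + [2] * 10    # true class 1
            + [0] * 10 + [1] * 10 + [2] * 80)   # true class 2

# Predictions for 1000 unlabelled test reads.
test_pred = np.array([0] * 480 + [1] * 280 + [2] * 240)

C = estimate_confusion(aux_true, aux_pred, n_classes=3)
naive = naive_distribution(test_pred, n_classes=3)
corrected = correct_distribution(test_pred, C, n_classes=3)

print("naive:    ", np.round(naive, 3))
print("corrected:", np.round(corrected, 3))
```

In this constructed example the naive estimate is about [0.48, 0.28, 0.24], while the corrected estimate recovers the class distribution [0.5, 0.3, 0.2] that generated the predictions. Least squares is used instead of a plain matrix inverse so that the same code still runs when C is ill-conditioned. With real data, the corrected values are only as reliable as the auxiliary set is representative of the test reads.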

Funder

Deutsche Forschungsgemeinschaft

Publisher

MDPI AG

Subject

Genetics (clinical), Genetics

Link

https://www.mdpi.com/2073-4425/12/11/1755/pdf
