Improving viral annotation with artificial intelligence

Published: 2024-09-04
ISSN: 2150-7511
Container-title: mBio
Language: en

Author:

Flamholz Zachary N.¹, Li Charlotte¹, Kelly Libusha¹,²

Affiliation:

1. Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA

2. Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, New York, USA

Abstract

Viruses of bacteria, “phages,” are fundamental, poorly understood components of microbial community structure and function. Additionally, their dependence on hosts for replication positions phages as unique sensors of ecosystem features and environmental pressures. High-throughput sequencing approaches have begun to give us access to the diversity and range of phage populations in complex microbial community samples, and metagenomics is currently the primary tool with which we study phage populations. The study of phages by metagenomic sequencing, however, is fundamentally limited by viral diversity, which results in the vast majority of viral genomes and metagenome-annotated genomes lacking annotation. To harness bacteriophages for applications in human and environmental health and disease, we need new methods to organize and annotate viral sequence diversity. We recently demonstrated that methods that leverage self-supervised representation learning can supplement statistical sequence representations for remote viral protein homology detection in the ocean virome and propose that consideration of the functional content of viral sequences allows for the identification of similarity in otherwise sequence-diverse viruses and viral-like elements for biological discovery. In this review, we describe the potential and pitfalls of large language models for viral annotation. We describe the need for new approaches to annotate viral sequences in metagenomes, the fundamentals of what protein language models are and how one can use them for sequence annotation, the strengths and weaknesses of these models, and future directions toward developing better models for viral annotation more broadly.

Funder

HHS | National Institutes of Health

Publisher

American Society for Microbiology

Link

https://journals.asm.org/doi/pdf/10.1128/mbio.03206-23
