Abstract
Discovery of microbial hydrocarbon degradation pathways has traditionally relied on laboratory isolation and characterization of microorganisms. Although many metabolic pathways for hydrocarbon degradation have been discovered, the lack of tools dedicated to annotating them makes it difficult to identify the relevant genes and to predict the hydrocarbon degradation potential of microbial genomes and metagenomes. Furthermore, sequence homology between hydrocarbon degradation genes and genes with other functions often leads to misannotation. A tool that systematically identifies hydrocarbon metabolic potential is therefore needed. We present the Calgary approach to ANnoTating HYDrocarbon degradation genes (CANT-HYD), a database of hidden Markov models (HMMs) for 37 marker genes involved in the anaerobic and aerobic degradation pathways of aliphatic and aromatic hydrocarbons. Using this database, we show that hydrocarbon metabolic potential is widespread across the tree of life, and we identify understudied or overlooked hydrocarbon degradation potential in many phyla. We also demonstrate scalability by analyzing large metagenomic datasets to predict hydrocarbon utilization in diverse environments. To the best of our knowledge, CANT-HYD is the first comprehensive tool for robust and accurate identification of marker genes associated with aerobic and anaerobic hydrocarbon degradation.
Publisher
Cold Spring Harbor Laboratory