Affiliation:
1. Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China
2. Key Laboratory of South China Sea Fishery Resources Exploitation and Utilization, Ministry of Agriculture, South China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou, Guangdong Province, China
Abstract
Abstract
Motivation
Bacteriophages (aka phages), which mainly infect bacteria, play key roles in the biology of microbes. As the most abundant biological entities on the planet, the number of discovered phages is only the tip of the iceberg. Recently, many new phages have been revealed using high-throughput sequencing, particularly metagenomic sequencing. Compared to the fast accumulation of phage-like sequences, there is a serious lag in taxonomic classification of phages. High diversity, abundance and limited known phages pose great challenges for taxonomic analysis. In particular, alignment-based tools have difficulty in classifying fast accumulating contigs assembled from metagenomic data.
Results
In this work, we present a novel semi-supervised learning model, named PhaGCN, to conduct taxonomic classification for phage contigs. In this learning model, we construct a knowledge graph by combining the DNA sequence features learned by convolutional neural network and protein sequence similarity gained from gene-sharing network. Then we apply graph convolutional network to utilize both the labeled and unlabeled samples in training to enhance the learning ability. We tested PhaGCN on both simulated and real sequencing data. The results clearly show that our method competes favorably against available phage classification tools.
Availability and implementation
The source code of PhaGCN is available via: https://github.com/KennthShang/PhaGCN.
Funder
Research Grants Council of the Hong Kong Special Administrative Region
HKIDS
NSF of China
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability
Cited by
47 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献