Abstract
Background: Recent studies have demonstrated that phylogenomics is an important basis for answering many fundamental evolutionary questions. As more high-quality whole-genome sequences are published, more efficient phylogenomic analysis workflows are urgently needed.

Results: To this end, and to capture putative differences between the evolutionary histories of gene families and species, we developed a phylogenomics workflow for gene family classification, gene family tree inference, species tree inference, and dating of duplication/loss events. The framework rests on two guiding ideas: 1) gene trees tend to differ from species trees, but the two influence each other during evolution; 2) different gene families have undergone different evolutionary mechanisms. We applied the workflow to genomic data from 64 vertebrates and 5 outgroup species. The results showed high accuracy in species tree inference and few false positives in the dating of duplication events.

Conclusions: Based on the inferred gene duplication and loss events, only 9∼16% of gene families retained duplicates after a whole genome duplication (WGD) event. A large proportion of these families have ohnologs from two or three WGDs. Consistent with previous studies, the genes in these families are mainly involved in biological processes related to the nervous system and signal transduction. Specifically, we found that gene families with ohnologs from the teleost-specific (TS) WGD are enriched in fat metabolism, implying that the retention of such ohnologs might be associated with the high environmental oxygen concentration during that period.
Publisher
Cold Spring Harbor Laboratory