Abstract
Genome sequencing of the human parasite Schistosoma mansoni revealed an interesting gene superfamily called micro-exon gene (MEG) that encodes secreted MEG proteins. The genes are composed of short exons (3-81 base pairs) with symmetrically inserted long introns (up to 5 kbp). This article compiles 35 S. mansoni-specific meg genes that are distributed over 7 autosomes and one pair of sex chromosomes and that code for at least 87 verified MEG proteins. We used various bioinformatics tools to produce an optimal alignment, perform a phylogenetic analysis and highlight intriguing conserved patterns/motifs in the sequences of these MEG proteins. Based on these analyses, we were able to classify the MEG proteins into two subfamilies and to hypothesize how they duplicated and colonized all the chromosomes. Together with motif identification, we also propose revising the common names and annotation of MEGs in order to avoid duplication, to support the reproducibility of research results and to avoid possible misunderstandings.

Author Abstract
Schistosoma mansoni is a parasitic worm, the etiological agent of schistosomiasis or bilharzia, a chronic tropical disease. It is a vector-borne parasite with a complex life cycle and an equally complex genome, assembled in 7 autosomes and a pair of sex chromosomes. Among the gene products, one superfamily is particularly interesting, since it is specific to Schistosomatidae, highly variable and redundant: the micro-exon gene (MEG) family. As the name implies, these genes are made up of short coding exons (3 to 81 base pairs), symmetrically interspersed with long introns (from 0.2 to 5 kbp). There are 35 megs spread across all the chromosomes, which code for at least 87 MEG proteins. We have aligned all of them, constructed a phylogenetic tree and proposed a theory for their duplication and genome colonization. On that basis, we propose a rational nomenclature to help the community study the elusive role of MEGs.
We also propose to help WormBase ParaSite adopt this new nomenclature, so that the same acronym is not assigned to different protein sequences.
Publisher
Cold Spring Harbor Laboratory