Abstract
Although it was previously thought to be unlikely, recent studies have shown that de novo gene origination from previously non-genic sequences is a relatively common mechanism for gene innovation in many species and taxa. These young genes provide a unique set of candidates to study the structural and functional origination of proteins. However, our understanding of their protein structures, and of how these structures originate and evolve, is still limited due to a lack of systematic studies. Here, we combined high-quality base-level whole genome alignments, bioinformatic analysis, and computational structure modeling to study the origination, evolution, and protein structure of lineage-specific de novo genes. We identified 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. We found a gradual shift in sequence composition, evolutionary rates, and expression patterns with gene age, which indicates possible gradual shifts or adaptations of their functions. Surprisingly, we found little overall protein structural change for de novo genes in the Drosophilinae lineage. Using AlphaFold2, ESMFold, and molecular dynamics, we identified a number of de novo gene candidates with protein products that are potentially well-folded, many of which are more likely to contain transmembrane and signal proteins compared to other annotated protein-coding genes. Using ancestral sequence reconstruction, we found that most potentially well-folded proteins are born folded. Interestingly, we observed one case where disordered ancestral proteins became ordered within a relatively short evolutionary time. Single-cell RNA-seq analysis in testis showed that although most de novo genes are enriched in spermatocytes, several young de novo genes are biased toward the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in de novo gene origination in the testis.
This study provides a systematic overview of the origin, evolution, and structural changes of Drosophilinae-specific de novo genes.
Publisher
Cold Spring Harbor Laboratory