A two-sequence motif-based method for the inventory of gene families in fragmented and poorly annotated genome sequences

Published:2024-01-03 Issue:1 Volume:25 Page:
ISSN:1471-2164
Container-title:BMC Genomics
language:en
Short-container-title:BMC Genomics

Author:

Nørrevang Anton Frisgaard, Shabala Sergey, Palmgren Michael

Abstract

Databases of genome sequences are growing exponentially, but, in some cases, assembly is incomplete and genes are poorly annotated. For evolutionary studies, it is important to identify all members of a given gene family in a genome. We developed a method for identifying most, if not all, members of a gene family from raw genomes in which assembly is of low quality, using the P-type ATPase superfamily as an example. The method is based on the translation of an entire genome in all six reading frames and the co-occurrence of two family-specific sequence motifs that are in close proximity to each other. To test the method’s usability, we first used it to identify P-type ATPase members in the high-quality annotated genome of barley (Hordeum vulgare). Subsequently, after successfully identifying plasma membrane H⁺-ATPase family members (P3A ATPases) in various plant genomes of varying quality, we tested the hypothesis that the number of P3A ATPases correlates with the ability of the plant to tolerate saline conditions. In 19 genomes of glycophytes and halophytes, the total number of P3A ATPase genes was found to vary from 7 to 22, but no significant difference was found between the two groups. The method successfully identified P-type ATPase family members in raw genomes that are poorly assembled.
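
The core idea in the abstract (six-frame translation followed by a search for two motifs lying close together) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the `max_gap` parameter, and the example motifs are assumptions chosen for illustration; the abstract does not state which motifs or distance threshold were used. The sketch also assumes both motifs fall in the same reading frame with no intron between them, and it does not check for stop codons between the two motifs, which a real pipeline would have to handle.

```python
import re
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Yield (strand, offset, protein) for all six reading frames."""
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def find_motif_pairs(dna, motif_a, motif_b, max_gap):
    """Report frames where motif_b starts at most max_gap residues after motif_a ends.

    motif_a and motif_b are regular expressions over amino-acid letters.
    Returns a list of (strand, offset, start_a, start_b) tuples, with
    positions given in residues within the translated frame.
    """
    hits = []
    for strand, offset, protein in six_frame_translations(dna):
        starts_b = [m.start() for m in re.finditer(motif_b, protein)]
        for m in re.finditer(motif_a, protein):
            for b in starts_b:
                if 0 <= b - m.end() <= max_gap:
                    hits.append((strand, offset, m.start(), b))
    return hits


if __name__ == "__main__":
    # Toy contig: one leading base, then codons for DKTGT, AAA, GDGTND.
    # The motifs are illustrative placeholders, not those used in the paper.
    contig = "C" + "GATAAAACTGGTACT" + "GCTGCTGCT" + "GGTGATGGTACTAATGAT"
    print(find_motif_pairs(contig, "DKTGT", "GDG.ND", max_gap=10))
```

Scanning every contig of a fragmented assembly with such a function would yield candidate loci without relying on existing gene annotation, which is the property the abstract emphasizes.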

Funder

Australian Research Council

National Natural Science Foundation of China

Novo Nordisk Fonden

Carlsbergfondet

Copenhagen University

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s12864-023-09859-4.pdf
