kmerDB: A Database Encompassing the Set of Genomic and Proteomic Sequence Information for Each Species

Published: 2023-11-16

Authors:

Mouratidis Ioannis, Baltoumas Fotis A., Chantzi Nikol, Chan Candace S.Y., Montgomery Austin, Konnaris Maxwell A., Georgakopoulos George C., Das Anshu, Chartoumpekis Dionysios, Kovac Jasna, Pavlopoulos Georgios A., Georgakopoulos-Soares Ilias

Abstract

The rapid decline in sequencing cost has enabled the generation of reference genomes and proteomes for a growing number of organisms. However, there is currently no established repository that provides information about organism-specific genomic and proteomic sequences of a given length, known as kmers, that are either present in or absent from each genome or proteome. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 45,785 reference genomes and 22,386 reference proteomes, respectively, as well as 14,658,776 genomic and 149,264,442 proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences that are absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
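To make the terminology in the abstract concrete, the following is a minimal Python sketch of the three notions it uses: kmers, quasi-primes (kmers found in exactly one species), and primes (kmers absent from every species). It is an illustration only, not kmerDB's actual pipeline. The function names, the toy sequences, and the choice of k = 2 are all assumptions made for this example, and the sketch ignores practical details such as reverse-complement strands, which a real genomic analysis may need to consider.

```python
from itertools import product


def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def quasi_primes(kmer_sets):
    """Map each species to the kmers found in that species and in no other."""
    result = {}
    for species, own in kmer_sets.items():
        # Union of the kmer sets of every other species.
        others = set().union(
            *(s for name, s in kmer_sets.items() if name != species)
        )
        result[species] = own - others
    return result


def primes(kmer_sets, alphabet, k):
    """Return the kmers over the alphabet that are absent from every species."""
    observed = set().union(*kmer_sets.values())
    universe = {"".join(p) for p in product(alphabet, repeat=k)}
    return universe - observed


# Hypothetical toy "genomes"; real reference genomes are vastly longer.
toy_genomes = {
    "species_a": "ACGTAC",
    "species_b": "ACGGTT",
}
k = 2

kmer_sets = {name: kmers(seq, k) for name, seq in toy_genomes.items()}
unique = quasi_primes(kmer_sets)
absent = primes(kmer_sets, "ACGT", k)

print(unique)
print(sorted(absent))
```

Enumerating the full universe of kmers is only feasible for small k, since it grows as 4^k for nucleotides and 20^k for amino acids; a database at the scale described above would need a far more compact representation than Python sets.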

Publisher

Cold Spring Harbor Laboratory

Cited by 2 articles.

1. A survey of k-mer methods and applications in bioinformatics;Computational and Structural Biotechnology Journal;2024-12

2. The determinants of the rarity of nucleic and peptide short sequences in nature;NAR Genomics and Bioinformatics;2024-04-04