Abstract
AbstractK-mer can be used for the description of biological sequences and k-mer distribution is a tool for solving sequences analysis problems in bioinformatics. We can use k-mer vector as a representation method of the k-mer distribution of the biological sequence. Problems, such as similarity calculations or sequence assembly, can be described in the k-mer vector space. It helps us to identify new features of an old sequence-based problem in bioinformatics and develop new algorithms using the concepts and methods from linear space theory. In this study, we defined the k-mer vector space for the generalized biological sequences. The meaning of corresponding vector operations is explained in the biological context. We presented the vector/matrix form of several widely seen sequence-based problems, including read quantification, sequence assembly, and pattern detection problem. Its advantages and disadvantages are discussed. Also, we implement a tool for the sequence assembly problem based on the concepts of k-mer vector methods. It shows the practicability and convenience of this algorithm design strategy.
Publisher
Springer Science and Business Media LLC
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献