A Character Frequency based Approach to Search for Substrings of a Circular Pattern and its Conjugates in an Online Text
Published:2021-04-15
Issue:2
Volume:22
ISSN:2300-7036
Container-title:Computer Science
Short-container-title:csci
Abstract
A fundamental problem in computational biology is dealing with circular patterns. The problem consists of finding, in a database, substrings of at least a certain length of a pattern and its rotations. In this paper, a novel method for handling circular patterns is presented. The problem is solved in two incremental steps. First, an algorithm is provided that reports all substrings of a given linear pattern in an online text. Next, without loss of efficiency, the algorithm is extended to process all circular rotations of the pattern. For a given pattern P of size M and a text T of size N, the algorithm reports all locations in the text where a substring of Pc is found, where Pc is one of the rotations of P. For an alphabet of size σ, using O(M) space, these goals are achieved in O(MN/σ) time on average, which is O(N) for all patterns of length M ≤ σ. Traditional string processing algorithms make use of advanced data structures such as suffix trees and automata. We show that basic data structures such as arrays can be used in text processing algorithms without compromising efficiency.
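To make the problem statement concrete, the following is a minimal baseline sketch, not the character-frequency algorithm of the paper, whose details the abstract does not give. It assumes a fixed minimum substring length `k` and relies on the observation that every length-`k` substring of any rotation of P is a circular substring of P, so all of them can be read off P followed by its first `k - 1` characters. The names `circular_kmers`, `report_matches`, and `k` are hypothetical, and the sketch uses a hash set rather than the plain arrays the paper advocates.

```python
from collections import deque
from typing import Iterable, Iterator, Set


def circular_kmers(pattern: str, k: int) -> Set[str]:
    """Return every length-k substring of any rotation of `pattern`."""
    m = len(pattern)
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= len(pattern)")
    # Appending the first k-1 characters lets windows wrap around the end,
    # so the m windows below cover all circular substrings of length k.
    doubled = pattern + pattern[: k - 1]
    return {doubled[i : i + k] for i in range(m)}


def report_matches(text: Iterable[str], pattern: str, k: int) -> Iterator[int]:
    """Yield start positions in `text` where a length-k substring of some
    rotation of `pattern` occurs. The text is consumed one character at a
    time, so it may be an online stream."""
    kmers = circular_kmers(pattern, k)
    window = deque(maxlen=k)  # holds the last k characters seen so far
    for pos, ch in enumerate(text):
        window.append(ch)
        if len(window) == k and "".join(window) in kmers:
            yield pos - k + 1


if __name__ == "__main__":
    # "dab" is a substring of the rotation "dabc", not of "abcd" itself.
    print(list(report_matches("xxdabcxx", "abcd", 3)))  # [2, 3]
```

This baseline stores up to M strings of length `k` and does O(k) work per text character, so it does not match the O(M) space and average O(MN/σ) time stated above; it only illustrates what is being reported.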
Publisher
AGHU University of Science and Technology Press
Subject
Artificial Intelligence,Computational Theory and Mathematics,Computer Graphics and Computer-Aided Design,Computer Networks and Communications,Computer Vision and Pattern Recognition,Modelling and Simulation,Computer Science (miscellaneous)