A Character Frequency based Approach to Search for Substrings of a Circular Pattern and its Conjugates in an Online Text

Authors

  • Vinod Prasad

DOI:

https://doi.org/10.7494/csci.2021.22.2.3401

Keywords:

Pattern matching in Strings, Circular Pattern Matching, Similarity Search

Abstract

A fundamental problem in computational biology is to deal with circular patterns. The problem consists of finding the least certain length substrings of a pattern and its rotations in the database. In this paper, a novel method is presented to deal with circular patterns. The problem is solved using two incremental steps. First, an algorithm is provided that reports all substrings of a given linear pattern in an online text. Next, without losing efficiency, the algorithm is extended to process all circular rotations of the pattern. For a given pattern P of size M, and a text T of size N, the algorithm reports all locations in the text where a substring of Pc is found, where Pc is one of the rotations of P. For an alphabet size σ, using O(M) space, desired goals are achieved in an average O(MN/σ) time, which is O(N) for all patterns of length M ≤ σ. Traditional string processing algorithms make use of advanced data structures such as suffix trees and automaton. We show that basic data structures such as arrays can be used in the text processing algorithms without compromising the efficiency.

Downloads

Download data is not yet available.

References

M. Crochemore and W. Rytter, Text Algorithms. Oxford University Press, 1994.

A. Apostolico and Z. Galil, Pattern Matching Algorithms. Oxford University Press, 1997.

E. McCreight, A space-economical suffix tree construction algorithm, Journal of ACM 23, 1976, pp. 262-272.

E. Ukkonen, On-line construction of suffix trees. Algorithmica, 41, 1995, pp. 249-260.

D. Gusfield , Algorithms on strings, trees and sequences. Computer Science and Computational Biology, Cambridge University Press, 1999, ISBN 0-521-58519-8.

C. Iliopoulos and M. Rahman, Indexing circular patterns, WALCOM’08, Berlin, Springer-Verlag, 2008, pp. 46–57.

F. Fernandes, L. Pereira, and A Freitas, CSA: An efficient algorithm to improve circular DNA multiple alignments. BMC Bioinformatics, 2009, 10:1–13.

G.Navarro and M. Raffinot, , Fast and Flexible String Matching by Combining bit-parallelism and Suffix Automata. ACM Journal of Experimental Algorithmics 2000, 5(4).

K. Chen, G. Huang, and R. Lee: Exact circular pattern matching using the BNDM algorithm. In Proceedings of the 28th Workshop on Combinatorial Mathematics and Computation Theory, 2011, pp. 152-161.

T. Hirvola and J. Tarhio, Approximate Online Matching of Circular Strings. Lecture Notes in Computer Science, Vol 8504, 2014, pp 315-325.

K. Chen, G. Huang, and R. Lee, Bit-parallel algorithms for exact circular string matching, Computer Journal. 57 (5), 2014, pp. 731-743.

J. Lin and D. Adjeroh, All-against-all circular pattern matching. Computer Journal 55(7), 2012, pp. 897–906.

M. Lothaire, Applied Combinatorics on Words, Cambridge University Press, 2005.

C. Barton, C Iliopoulos, and S Pissis, Fast Algorithms for approximate circular matching. Algorithms for Molecular Biology, 9:9, 2014.

P. Vinodprasad, A Novel Algorithm for String Matching with Mismatches, 5th international conference of Pattern Recognitions Applications and Methods Rome, Feb 2016, pp. 638-644.

http://www.ncbi.nlm.nih.gov/protein/

Downloads

Published

2021-04-15

Issue

Section

Articles

How to Cite

Prasad, V. (2021). A Character Frequency based Approach to Search for Substrings of a Circular Pattern and its Conjugates in an Online Text. Computer Science, 22(2). https://doi.org/10.7494/csci.2021.22.2.3401