Previous |  Up |  Next

Article

Keywords:
exact pattern matching; approximate pattern matching; finite automata; dynamic programming; bitwise parallelism; suffix automaton; border array; degenerate symbol
Summary:
We present an overview of four approaches of the finite automata use in stringology: deterministic finite automaton, deterministic simulation of nondeterministic finite automaton, finite automaton as a model of computation, and compositions of finite automata solutions. We also show how the finite automata can process strings build over more complex alphabet than just single symbols (degenerate symbols, strings, variables).
References:
[1] Abrahamson, K.: Generalized string matching. SIAM J. Comput. 16 (1987), 6, 1039–1051. DOI 10.1137/0216067 | MR 0917040 | Zbl 0646.68079
[2] Aho, A. V., Corasick, M. J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18 (1975), 6, 333–340. DOI 10.1145/360825.360855 | MR 0371172 | Zbl 0301.68048
[3] Allauzen, C., Crochemore, M., Raffinot, M.: Factor oracle: A new structure for pattern matching. In: SOFSEM'99, Theory and Practice of Informatics (J. Pavelka, G. Tel, and M. Bartošek, eds.), Milovy 1999, Lect. Notes Comput. Sci. 1725 (1999), Springer-Verlag, Berlin, pp. 295–310. MR 1784520 | Zbl 0964.68078
[4] Antoniou, P., Holub, J., Iliopoulos, C. S., Melichar, B., Peterlongo, P.: Finding common motifs with gaps using finite automata. In: Implementation and Application of Automata (O. H. Ibarra and H.-C. Yen, eds.), Lect. Notes Comput. Sci. 4094 (2006), Springer-Verlag, Heidelberg, pp. 69–77. MR 2296448 | Zbl 1160.68683
[5] Baeza-Yates, R. A., Gonnet, G. H.: A new approach to text searching. Commun. ACM 35 (1992), 10, 74–82. DOI 10.1145/135239.135243
[6] Baker, B. S.: Parameterized duplication in strings: algorithms and an application to software maintenance. SIAM J. Comput. 26 (1997), 5, 1343–1362. DOI 10.1137/S0097539793246707 | MR 1471985 | Zbl 0885.68085
[7] Blumer, A., Blumer, J., Ehrenfeucht, A., Haussler, D., Chen, M. T., Seiferas, J.: The smallest automaton recognizing the subwords of a text. Theor. Comput. Sci. 40 (1985), 1, 31–44. DOI 10.1016/0304-3975(85)90157-4 | MR 0828515 | Zbl 0574.68070
[8] Blumer, A., Blumer, J., Ehrenfeucht, A., Haussler, D., McConnel, R.: Complete inverted files for efficient text retrieval and analysis. J. Assoc. Comput. Mach. 34 (1987), 3, 578–595. DOI 10.1145/28869.28873 | MR 0904194
[9] Boyer, R. S., Moore, J. S.: A fast string searching algorithm. Commun. ACM 20 (1977), 10, 762–772. DOI 10.1145/359842.359859 | Zbl 1219.68165
[10] Cambouropoulos, E., Crochemore, M., Iliopoulos, C. S., Mouchard, L., Pinzon, Y. J.: Algorithms for computing approximate repetitions in musical sequences. In: Proc. Tenth Australian Workshop on Combinatorial Algorithms (R. Raman and J. Simpson, eds.), AWOCA'99, School of Computing, Curtin University of Technology, Perth 1999, pp. 129–144. Zbl 1008.68043
[11] Commentz-Walter, B.: A string matching algorithm fast on the average. In: Proc. 6th International Colloquium on Automata, Languages and Programming (H. A. Maurer, ed.), Graz 1979, Lect. Notes Comput. Sci. 71 (1979), Springer-Verlag, Berlin, pp. 118–132. Zbl 0407.68092
[12] Crochemore, M.: Transducers and repetitions. Theor. Comput. Sci. 45 (1986), 1, 63–86. DOI 10.1016/0304-3975(86)90041-1 | MR 0865967 | Zbl 0615.68053
[13] Damerau, F.: A technique for computer detection and correction of spelling errors. Commun. ACM 7 (1964), 3, 171–176. DOI 10.1145/363958.363994
[14] Dömölki, B.: An algorithm for syntactical analysis. Comput. Linguistics 3 (1964), 29–46, 1964.
[15] Fischer, M. J., Paterson, M.: String matching and other products. In: Proc. SIAM-AMS Complexity of Computation (R. M. Karp, ed.), Providence 1974, pp. 113–125. MR 0400782 | Zbl 0301.68027
[16] Fredriksson, K., Mozgovoy, M.: Efficient parameterized string matching. Inf. Process. Lett. 100 (2006), 3, 91–96. DOI 10.1016/j.ipl.2006.06.009 | MR 2254198 | Zbl 1185.68282
[17] Galil, Z.: Open problems in stringology. In: Combinatorial Algorithms on Words (A. Apostolico and Z. Galil, eds.), NATO ASI Series F 12 (1985), Springer-Verlag, Berlin, pp. 1–8. MR 0815328 | Zbl 0607.68054
[18] Hamming, R. W.: Error detecting and error correcting codes. The Bell System Techn. J. 29 (1950), 2, 147–160. MR 0035935
[19] Holub, J.: Bit parallelism-NFA simulation. In: Implementation and Application of Automata (B. W. Watson and D. Wood, eds.), Lect. Notes in Comput. Sci. 2494 (2002), Springer-Verlag, Heidelberg, pp. 149–160. Zbl 1015.68524
[20] Holub, J.: Finite automata in pattern matching. In: Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications (M. Elloumi and A. Y. Zomaya, eds.), John Wiley and Sons, 2011, pp. 51–71.
[21] Holub, J., Kadlec, T.: NFA simulation using deterministic state cache. In: London Algorithmics 2008: Theory and Practice (J. Chan, J. W. Daykin, and M. S. Rahman, eds.), College Publications 2009, pp. 152–166.
[22] Holub, J., Melichar, B.: Implementation of nondeterministic finite automata for approximate pattern matching. In: Proc. 3rd International Workshop on Implementing Automata, Rouen 1998, pp. 74–81.
[23] Holub, J., Melichar, B.: Approximate string matching using factor automata. Theor. Comput. Sci. 249 (2000), 2, 30–311. DOI 10.1016/S0304-3975(00)00064-5 | MR 1798315 | Zbl 0949.68088
[24] Holub, J., Smyth, W. F.: Algorithms on indeterminate strings. In: Proc. 14th Australasian Workshop On Combinatorial Algorithms (M. Miller and K. Park, eds.), Seoul National University, Seoul 2003, pp. 36–45.
[25] Holub, J., Smyth, W. F., Wang, S.: Hybrid pattern-matching algorithms on indeterminate strings. In: London Stringology Day and London Algorithmic Workshop 2006 (J. Daykin, M. Mohamed, and K. Steinhoefel, eds.) King's College London Series Texts in Algorithmics 2007, pp. 115–133.
[26] Holub, J., Smyth, W. F., Wang, S.: Fast pattern-matching on indeterminate strings. J. Discret. Algorithms 6 (2008), 1, 37–50. DOI 10.1016/j.jda.2006.10.003 | MR 2397082 | Zbl 1162.68808
[27] Holub, J., Špiller, P.: Practical experiments with NFA simulation. In: Proc. Eindhoven FASTAR Days 2004 (L. Cleophas and B. W. Watson, eds.), TU Eindhoven 2004, The Finite Automata Approaches in Stringology 15, pp. 73–95.
[28] Hopcroft, J. E., Ullman, J. D.: Introduction to automata, languages and computations. Addison-Wesley, Reading 1979. MR 0645539
[29] Huffman, D. A.: The synthesis of sequential switching circuits. J. Franklin Inst. 257 (1954), 161–190, 275–303. DOI 10.1016/0016-0032(54)90574-8 | MR 0062648 | Zbl 0166.27201
[30] Huffman, D. A.: A method for the construction of minimum-redundancy codes. Proc. Inst. Radio Engrs 40 (1952), 9, 1098–1101. Zbl 0137.13605
[31] Iliopoulos, C. S., Jayasekera, I., Melichar, B., Šupol, J.: Weighted degenerated approximate pattern matching. In: Proc. 1st International Conference on Language and Automata Theory and Applications, Tarragona 2007.
[32] Kleene, S. C.: Representation of events in nerve nets and finite automata. Automata Studies (1956), 3–42. MR 0077478
[33] Knuth, D. E., Morris, J. H., Jr, Pratt, V. R.: Fast pattern matching in strings. SIAM J. Comput. 6 (1977), 2, 323–350. DOI 10.1137/0206024 | MR 0451916 | Zbl 0372.68005
[34] Kurtz, S.: Reducing the space requirements of suffix trees. Softw. Pract. Exp. 29 (1999), 13, 1149–1171. DOI 10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O
[35] Lahoda, J., Melichar, B.: Pattern matching in text coded by finite translation automaton. In: Proc. 7th International Multiconference Information Society (M. Heričko et al., eds.), Institut Jožef Stefan, Ljubljana 2004, pp. 212–214.
[36] Lahoda, J., Melichar, B.: General pattern matching on regular collage system. In: Proc. Prague Stringology Conference'05 (J. Holub and M. Šimánek, eds.), Czech Technical University in Prague 2005, pp. 153–162.
[37] Levenshtein, V. I.: Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl. 6 (1966), 707–710. MR 0189928
[38] McCulloch, W. S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5 (1943), 115–133. DOI 10.1007/BF02478259 | MR 0010388 | Zbl 0063.03860
[39] Mealy, G. H.: A method for synthetizing sequential circuits. Bell System Techn. J. 34 (1955), 5, 1045–1079. DOI 10.1002/j.1538-7305.1955.tb03788.x | MR 0073450
[40] Melichar, B.: Repetitions in text and finite automata. In: Proc. Eindhoven FASTAR Days 2004 (L. Cleophas and B. W. Watson, eds.), TU Eindhoven 2004, pp. 1–46.
[41] Melichar, B., Holub, J.: 6D classification of pattern matching problems. In: Proceedings of the Prague Stringology Club Workshop'97 (J. Holub, ed.), Czech Technical University in Prague, 1997, pp. 24–32. Collaborative Report DC-97-03.
[42] Melichar, B., Holub, J., Polcar, T.: Text searching algorithms. Vol. I: Forward string matching. Textbook for course Text Searching Algorithms, 2005.
[43] Moore, E. F.: Gedanken experiments on sequential machines. Automata Studies (1965), 129–153. MR 0078059
[44] Morris, J. H., Jr, Pratt, V. R.: A Linear Pattern-Matching Algorithm. Report 40, University of California, Berkeley 1970.
[45] Navarro, G., Raffinot, M.: A bit-parallel approach to suffix automata: Fast extended string matching. In: Proc. 9th Annual Symposium on Combinatorial Pattern Matching (M. Farach-Colton, ed.), Lect. Notes in Comput. Sci. 1448, Piscataway, NJ, 1998. Springer-Verlag, Berlin, pp. 14–33. MR 1672618
[46] Rabin, M. O., Scott, D.: Finite automata and their decision problems. IBM J. Res. Dev. 3 (1959) 114–125. DOI 10.1147/rd.32.0114 | MR 0103795 | Zbl 0158.25404
[47] Sellers, P. H.: The theory and computation of evolutionary distances: Pattern recognition. J. Algorithms 1 (1980), 4, 359–373. DOI 10.1016/0196-6774(80)90016-4 | MR 0604870 | Zbl 0454.68110
[48] Shyamasundar, R. K.: A Simple String Matching Algorithm. Technical Report, Tata Institute of Fundamental Research 1976.
[49] M. Šimůnek, Melichar, B.: Borders and finite automata. In: Implementation and Application of Automata (O. H. Ibarra and H.-C. Yen, eds.), Lecture Notes in Comput. Sci. 4094, Springer-Verlag, Heidelberg 2006, pp. 58–68. MR 2296447 | Zbl 1142.68416
[50] Ukkonen, E.: Finding approximate patterns in strings. J. Algorithms 6 (1985), 1–3, 132–137. DOI 10.1016/0196-6774(85)90023-9 | MR 0780855 | Zbl 0566.68072
[51] Voráček, M., Vagner, L., Rahman, S., Iliopoulos, C. S.: The constrained longest common subsequence problem for degenerate strings. In: Implementation and Application of Automata (J. Holub and J. Žďárek, eds.), Lecture Notes in Comput. Sci. 4783, Springer-Verlag, Heidelberg 2007, pp. 309–311. Zbl 1137.68621
[52] Wu, S., Manber, U.: Fast text searching allowing errors. Commun. ACM 35 (1992), 10, 83–91. DOI 10.1145/135239.135244
[53] Ziv, J., Lempel, A.: Compression of individual sequences via variable length coding. IEEE Trans. Inf. Theory 24 (1978), 530–536. DOI 10.1109/TIT.1978.1055934 | MR 0507465
Partner of
EuDML logo