ENHANCING REGULAR EXPRESSIONS FOR POLISH TEXT PROCESSING
DOI:
https://doi.org/10.7494/csci.2009.10.3.19
Keywords:
regular expressions, regex, natural language, Polish language processing, CLP library
Abstract
The paper proposes a regular expression engine, based on a modified Thompson's algorithm, dedicated to Polish language processing. A Polish inflectional dictionary is used to enhance both the engine and its syntax. Instead of using characters as the basic elements of regular expression patterns (as in the BRE and ERE standards), the presented tool makes it possible to use natural-language words, or labels describing the grammatical properties of words, in regex syntax.
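The idea of matching over words rather than characters can be illustrated with a minimal sketch. It is not the paper's engine and does not use the CLP library: `TOY_DICT`, `lemma`, `label`, and the combinators `tok`, `cat`, `alt`, and `star` are hypothetical names introduced here, and the dictionary holds a few hand-picked entries. The sketch assumes already tokenized input and uses a plain Thompson-style NFA whose transitions test a whole token (by lemma or by grammatical label) instead of a single character.

```python
# Toy stand-in for an inflectional dictionary (hypothetical data, not the CLP API):
# word form -> (lemma, set of grammatical labels).
TOY_DICT = {
    "kot": ("kot", {"noun"}),
    "kota": ("kot", {"noun"}),
    "koty": ("kot", {"noun"}),
    "czarny": ("czarny", {"adj"}),
    "czarnego": ("czarny", {"adj"}),
    "widzi": ("widzieć", {"verb"}),
}

def lemma(base):
    """Predicate: the token is any inflected form of `base`."""
    return lambda token: TOY_DICT.get(token, (None, set()))[0] == base

def label(name):
    """Predicate: the token carries the grammatical label `name`."""
    return lambda token: name in TOY_DICT.get(token, (None, set()))[1]

class State:
    def __init__(self):
        self.eps = []    # epsilon transitions (consume no token)
        self.edges = []  # (predicate, target) transitions consuming one token

# Thompson-style construction: every fragment is a (start, accept) pair.
# Combining fragments mutates them, so build each sub-pattern once.
def tok(pred):
    start, accept = State(), State()
    start.edges.append((pred, accept))
    return start, accept

def cat(a, b):
    a[1].eps.append(b[0])
    return a[0], b[1]

def alt(a, b):
    start, accept = State(), State()
    start.eps += [a[0], b[0]]
    a[1].eps.append(accept)
    b[1].eps.append(accept)
    return start, accept

def star(a):
    start, accept = State(), State()
    start.eps += [a[0], accept]
    a[1].eps += [a[0], accept]
    return start, accept

def closure(states):
    """All states reachable from `states` through epsilon transitions."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for nxt in state.eps:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def matches(fragment, tokens):
    """Simulate the NFA over the whole token sequence, one state set per token."""
    start, accept = fragment
    current = closure({start})
    for token in tokens:
        current = closure({dst for st in current
                           for pred, dst in st.edges if pred(token)})
        if not current:
            return False
    return accept in current

# Zero or more adjectives followed by any form of the noun "kot".
pattern = cat(star(tok(label("adj"))), tok(lemma("kot")))
print(matches(pattern, ["czarnego", "kota"]))  # True
print(matches(pattern, ["widzi", "kota"]))     # False
```

Because the state set is advanced once per token, the running time grows linearly with the number of tokens for a fixed pattern, which is the property that makes Thompson-style simulation attractive over backtracking.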