ENHANCING REGULAR EXPRESSIONS FOR POLISH TEXT PROCESSING

Krzysztof Dorosz, Anna Szczerbińska

Abstract


This paper proposes a regular expression engine for Polish language processing, based on a modified version of Thompson's algorithm. A Polish inflectional dictionary is used to extend both the engine and the regular expression syntax. Instead of using characters as the basic elements of a pattern (as in the BRE and ERE standards), the presented tool allows patterns to be built from natural-language words or from labels describing the grammatical properties of words.
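The idea of matching over words and grammatical labels rather than characters can be illustrated with a minimal Python sketch. It is not the authors' engine or syntax: the tiny LEXICON stands in for a real inflectional dictionary, the label names (noun, adj, gen, ...) are made up for illustration, and the helpers form, lemma, label, Atom and matches are hypothetical. Patterns are restricted to a flat sequence of atoms with optional ?, * and + quantifiers; only the set-of-states simulation follows the spirit of Thompson's algorithm.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# Toy stand-in for an inflectional dictionary (assumption, not real CLP data):
# word form -> (lemma, set of grammatical labels).
LEXICON = {
    "kot": ("kot", {"noun", "nom", "sg"}),
    "kota": ("kot", {"noun", "gen", "sg"}),
    "czarny": ("czarny", {"adj", "nom", "sg"}),
    "czarnego": ("czarny", {"adj", "gen", "sg"}),
    "widzi": ("widzieć", {"verb", "sg"}),
}

Predicate = Callable[[str], bool]


def form(text: str) -> Predicate:
    """Match one exact word form."""
    return lambda tok: tok == text


def lemma(base: str) -> Predicate:
    """Match any inflected form whose dictionary lemma is `base`."""
    return lambda tok: LEXICON.get(tok, (None, set()))[0] == base


def label(tag: str) -> Predicate:
    """Match any word that carries the grammatical label `tag`."""
    return lambda tok: tag in LEXICON.get(tok, (None, set()))[1]


@dataclass
class Atom:
    test: Predicate
    quant: str = "1"  # one of "1", "?", "*", "+"


def _expand(atoms: List[Atom]) -> List[Atom]:
    # Rewrite x+ as x x* so only "1", "?" and "*" remain.
    prog: List[Atom] = []
    for a in atoms:
        if a.quant == "+":
            prog += [Atom(a.test, "1"), Atom(a.test, "*")]
        else:
            prog.append(a)
    return prog


def _closure(states: Set[int], prog: List[Atom]) -> Set[int]:
    # State i means "about to match atom i"; optional atoms may be skipped.
    seen, stack = set(states), list(states)
    while stack:
        i = stack.pop()
        if i < len(prog) and prog[i].quant in ("?", "*") and i + 1 not in seen:
            seen.add(i + 1)
            stack.append(i + 1)
    return seen


def matches(atoms: List[Atom], tokens: List[str]) -> bool:
    """Return True if the whole token sequence matches the pattern."""
    prog = _expand(atoms)
    current = _closure({0}, prog)
    for tok in tokens:
        nxt = set()
        for i in current:
            if i < len(prog) and prog[i].test(tok):
                # A starred atom stays in place; any other atom advances.
                nxt.add(i if prog[i].quant == "*" else i + 1)
        current = _closure(nxt, prog)
        if not current:
            return False
    return len(prog) in current
```

For example, the pattern [Atom(label("verb")), Atom(label("adj"), "*"), Atom(lemma("kot"))] accepts the token sequence "widzi czarnego kota" as well as "widzi kota", because the adjective atom may repeat zero or more times and the last atom accepts any inflected form of "kot". All active states advance in lockstep over the tokens, so no backtracking is needed.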

Keywords


regular expressions; regex; natural language; Polish language processing; CLP library


DOI: https://doi.org/10.7494/csci.2009.10.3.19
