ENHANCING REGULAR EXPRESSIONS FOR POLISH TEXT PROCESSING

Authors

  • Krzysztof Dorosz AGH University of Science and Technology, Jagiellonian University, Krakow
  • Anna Szczerbińska AGH University of Science and Technology

DOI:

https://doi.org/10.7494/csci.2009.10.3.19

Keywords:

regular expressions, regex, natural language, Polish language processing, CLP library

Abstract

The paper proposes a regular expression engine, based on a modified Thompson’s algorithm, dedicated to Polish language processing. The Polish inflectional dictionary has been used to enhance the regular expression engine and its syntax. Instead of using characters as the basic element of regular expression patterns (as in the BRE and ERE standards), the presented tool makes it possible to use words from a natural language, or labels describing the grammatical properties of words, in regex syntax.
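To illustrate the core idea of matching over words and grammatical labels rather than characters, here is a minimal sketch of Thompson's construction and its set-of-states simulation where each atomic transition consumes a whole word. It is not the authors' implementation: the toy lexicon, the label names (ADJ, NOUN, NOM, GEN, VERB) and the functions `lemma`, `labels`, `atom`, `seq`, `alt`, `star` and `fullmatch` are hypothetical, patterns are built with combinators instead of being parsed from a regex syntax, and the CLP library's actual API is not used.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical toy stand-in for an inflectional dictionary (simplified
# analyses): word form -> (base form, set of grammatical labels).
# A real system would query a dictionary such as CLP instead.
LEXICON = {
    "dobry": ("dobry", {"ADJ", "NOM"}),
    "dobrego": ("dobry", {"ADJ", "GEN"}),
    "kot": ("kot", {"NOUN", "NOM"}),
    "kota": ("kot", {"NOUN", "GEN"}),
    "widzę": ("widzieć", {"VERB"}),
}

Pred = Callable[[str], bool]


def lemma(base: str) -> Pred:
    """Atom: matches any inflected form of the base form `base`."""
    return lambda w: w in LEXICON and LEXICON[w][0] == base


def labels(*tags: str) -> Pred:
    """Atom: matches any word whose analysis carries all of `tags`."""
    return lambda w: w in LEXICON and set(tags) <= LEXICON[w][1]


class State:
    def __init__(self) -> None:
        self.pred: Optional[Pred] = None     # word-consuming edge, if any
        self.next: Optional["State"] = None  # target of that edge
        self.eps: List["State"] = []         # epsilon edges


Frag = Tuple[State, State]  # (start, end) of a Thompson sub-automaton


def atom(p: Pred) -> Frag:
    s, e = State(), State()
    s.pred, s.next = p, e
    return s, e


def seq(*frags: Frag) -> Frag:
    # Link the end of each fragment to the start of the next one.
    for (_, end), (start, _) in zip(frags, frags[1:]):
        end.eps.append(start)
    return frags[0][0], frags[-1][1]


def alt(a: Frag, b: Frag) -> Frag:
    s, e = State(), State()
    s.eps += [a[0], b[0]]
    a[1].eps.append(e)
    b[1].eps.append(e)
    return s, e


def star(a: Frag) -> Frag:
    s, e = State(), State()
    s.eps += [a[0], e]      # enter the loop or skip it
    a[1].eps += [a[0], e]   # repeat or leave
    return s, e


def closure(states: Set[State]) -> Set[State]:
    """All states reachable through epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        for t in stack.pop().eps:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def fullmatch(frag: Frag, words: List[str]) -> bool:
    """Simulate the NFA over a word sequence, tracking all active states."""
    start, end = frag
    current = closure({start})
    for w in words:
        current = closure({s.next for s in current
                           if s.pred is not None and s.pred(w)})
    return end in current


if __name__ == "__main__":
    # Zero or more adjectives followed by any form of "kot".
    pattern = seq(star(atom(labels("ADJ"))), atom(lemma("kot")))
    print(fullmatch(pattern, ["dobrego", "kota"]))  # True
    print(fullmatch(pattern, ["kot", "dobry"]))     # False
```

Because every transition tests a predicate on a token, a word atom matches all inflected forms of a base form and a label atom matches any word with the required grammatical properties, while the NFA simulation itself is unchanged from the character-level case.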

Author Biographies

Krzysztof Dorosz, AGH University of Science and Technology, Jagiellonian University, Krakow

PhD Student, Institute of Computer Science, Computational Linguistics Department

Anna Szczerbińska, AGH University of Science and Technology

MSc student, Institute of Computer Science

Published

2013-03-20

How to Cite

Dorosz, K., & Szczerbińska, A. (2013). ENHANCING REGULAR EXPRESSIONS FOR POLISH TEXT PROCESSING. Computer Science, 10(3), 19. https://doi.org/10.7494/csci.2009.10.3.19

Section

Articles