Adapting a Constituency Parser to User-Generated Content in Polish Opinion Mining

Agnieszka Pluwak, Wojciech Korczynski, Marek Kisiel-Dorohinicki


The paper focuses on adapting NLP tools for Polish, such as morphological analyzers and parsers, to user-generated content (UGC). The authors discuss two rule-based techniques applied to improve the efficiency of these tools: pre-processing (text normalization) and parser adaptation (modified segmentation and parsing rules). A new approach to handling out-of-vocabulary words (OOVs), based on inflectional translation, is also proposed.


user-generated content; text normalization; parsing; sentiment analysis

