BENCHMARKING HIGH PERFORMANCE ARCHITECTURES WITH NATURAL LANGUAGE PROCESSING ALGORITHMS

Marcin Kuta; Jacek Kitowski

doi:10.7494/csci.2011.12.0.19

Authors

Marcin Kuta, AGH University of Science and Technology
Jacek Kitowski, AGH University of Science and Technology

DOI:

https://doi.org/10.7494/csci.2011.12.0.19

Keywords:

benchmarking, part-of-speech tagging, document clustering, natural language processing, high performance architectures

Abstract

Natural Language Processing algorithms are resource demanding, especially when they must be tuned to an inflective language such as Polish. The paper presents the time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high performance platforms of different architectures. Additionally, sequential versions and OpenMP implementations of the clustering algorithms are compared.


Author Biographies

Marcin Kuta, AGH University of Science and Technology

Faculty of Electrical Engineering, Automatics, IT and Electronics, Department of Computer Science
Jacek Kitowski, AGH University of Science and Technology

Faculty of Electrical Engineering, Automatics, IT and Electronics, Department of Computer Science, ACC CYFRONET AGH

