BENCHMARKING HIGH PERFORMANCE ARCHITECTURES WITH NATURAL LANGUAGE PROCESSING ALGORITHMS

Authors

  • Marcin Kuta AGH University of Science and Technology
  • Jacek Kitowski AGH University of Science and Technology

DOI:

https://doi.org/10.7494/csci.2011.12.0.19

Keywords:

benchmarking, part-of-speech tagging, document clustering, natural language processing, high performance architectures

Abstract

Natural Language Processing algorithms are resource demanding, especially when they must be tuned to an inflective language such as Polish. The paper presents the time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high performance platforms of different architectures. Additionally, sequential versions and OpenMP implementations of the clustering algorithms are compared.

Author Biographies

  • Marcin Kuta, AGH University of Science and Technology
    Faculty of Electrical Engineering, Automatics, IT and Electronics, Department of Computer Science
  • Jacek Kitowski, AGH University of Science and Technology
    Faculty of Electrical Engineering, Automatics, IT and Electronics, Department of Computer Science, ACC CYFRONET AGH

Published

2013-03-10

Section

Articles

How to Cite

BENCHMARKING HIGH PERFORMANCE ARCHITECTURES WITH NATURAL LANGUAGE PROCESSING ALGORITHMS. (2013). Computer Science, 12, 19. https://doi.org/10.7494/csci.2011.12.0.19