BENCHMARKING HIGH PERFORMANCE ARCHITECTURES WITH NATURAL LANGUAGE PROCESSING ALGORITHMS
Keywords: benchmarking, part-of-speech tagging, document clustering, natural language processing, high performance architectures
Abstract
Natural Language Processing algorithms are resource demanding, especially when they must be tuned to an inflective language such as Polish. The paper presents the time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high performance platforms of different architectures. Additionally, sequential versions and OpenMP implementations of the clustering algorithms are compared.