• Bartosz Ziółko, AGH University of Science and Technology
• Jakub Gałka, AGH University of Science and Technology
• Mariusz Ziółko, AGH University of Science and Technology




NLP, triphone statistics, speech processing, Polish


The phonetic statistics were collected from several Polish corpora. The paper summarises the data, which are phoneme n-grams, and some phenomena observed in the statistics. Triphone statistics describe context-dependent speech units, which play an important role in speech recognition systems and had never been calculated for a large set of Polish written texts. The standard phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described.
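The counting step behind phoneme n-gram and triphone statistics can be illustrated with a short sketch. It assumes each utterance has already been transcribed and split into a list of SAMPA symbols (the toy words below are hand-transcribed for illustration only), and it treats each sequence independently, so no context is counted across sequence boundaries. The function names are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

NGram = Tuple[str, ...]


def phoneme_ngrams(phones: List[str], n: int) -> List[NGram]:
    """Return all contiguous n-grams of one phoneme sequence (n=3 gives triphones)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def ngram_statistics(utterances: Iterable[List[str]], n: int) -> Counter:
    """Accumulate n-gram counts over many transcribed utterances."""
    counts: Counter = Counter()
    for phones in utterances:
        counts.update(phoneme_ngrams(phones, n))
    return counts


def relative_frequencies(counts: Counter) -> Dict[NGram, float]:
    """Normalise raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}


# Toy input: "mama" and "kot" as hand-made SAMPA symbol lists (illustrative only).
utterances = [["m", "a", "m", "a"], ["k", "o", "t"]]

triphones = ngram_statistics(utterances, 3)
for gram, count in triphones.most_common():
    print(" ".join(gram), count)
```

Running the same function with `n=1` and `n=2` gives phoneme and diphone statistics; whether triphone contexts should span word boundaries is a design choice this sketch does not settle.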



Author Biographies

Bartosz Ziółko, AGH University of Science and Technology

Department of Electronics

Jakub Gałka, AGH University of Science and Technology

Department of Electronics

Mariusz Ziółko, AGH University of Science and Technology

Department of Electronics






How to Cite

Ziółko, B., Gałka, J., & Ziółko, M. (2013). Polish phoneme statistics obtained on large set of written texts. Computer Science, 10(3), 97. https://doi.org/10.7494/csci.2009.10.3.97



