POLISH PHONEME STATISTICS OBTAINED ON LARGE SET OF WRITTEN TEXTS

Bartosz Ziółko, Jakub Gałka, Mariusz Ziółko

Abstract


Phonetic statistics were collected from several Polish corpora. The paper summarizes the data, namely phoneme n-grams, and some phenomena observed in the statistics. Triphone statistics apply to context-dependent speech units, which play an important role in speech recognition systems and had never been calculated for a large set of Polish written texts. The standard phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described.
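
As a rough illustration of what collecting phoneme n-gram statistics involves, the following minimal sketch counts n-grams over texts that have already been transcribed into phoneme symbols. It is not the authors' implementation: the function names, the within-word counting, and the two sample transcriptions are assumptions made for illustration only.

```python
from collections import Counter


def phoneme_ngrams(phonemes, n):
    """Return every run of n consecutive phonemes as a tuple."""
    return zip(*(phonemes[i:] for i in range(n)))


def ngram_counts(transcriptions, n):
    """Count phoneme n-grams over a collection of transcribed units.

    Assumption: each item is a list of phoneme symbols, and n-grams are
    counted inside each item only, never across item boundaries.
    """
    counts = Counter()
    for phonemes in transcriptions:
        counts.update(phoneme_ngrams(phonemes, n))
    return counts


def relative_frequencies(counts):
    """Convert raw counts to relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {ngram: c / total for ngram, c in counts.items()}


# Hypothetical SAMPA-style transcriptions, already split into symbols.
sample = [["k", "o", "t"], ["k", "o", "t", "a"]]

triphones = ngram_counts(sample, 3)
diphones = ngram_counts(sample, 2)
phones = ngram_counts(sample, 1)
```

Setting `n` to 1, 2, or 3 gives phoneme, diphone, and triphone counts respectively. A study on large corpora would additionally need a grapheme-to-phoneme step and a decision on whether n-grams may span word boundaries, neither of which is covered here.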

Keywords


NLP; triphone statistics; speech processing; Polish



DOI: https://doi.org/10.7494/csci.2009.10.3.97
