POLISH PHONEME STATISTICS OBTAINED ON LARGE SET OF WRITTEN TEXTS
DOI: https://doi.org/10.7494/csci.2009.10.3.97
Keywords: NLP, triphone statistics, speech processing, Polish
Abstract
The phonetic statistics were collected from several Polish corpora. The paper summarizes the data, which are phoneme n-grams, and describes some phenomena observed in the statistics. Triphone statistics describe context-dependent speech units, which play an important role in speech recognition systems and had never been calculated for a large set of Polish written texts. The standard phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described.
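As a rough illustration of what collecting phoneme n-gram statistics involves, the sketch below counts triphones over already-transcribed text. It is not the authors' implementation: the function names `phoneme_ngrams` and `ngram_statistics` are hypothetical, the two-word corpus is invented for the example, and the input is assumed to be lists of SAMPA-like phoneme symbols produced by some separate transcription step.

```python
from collections import Counter


def phoneme_ngrams(phonemes, n):
    """Return an iterator over every run of n consecutive phonemes."""
    # zip over n shifted views of the sequence; stops at the shortest view,
    # so sequences shorter than n yield nothing.
    return zip(*(phonemes[i:] for i in range(n)))


def ngram_statistics(utterances, n=3):
    """Count n-grams over all utterances and derive relative frequencies."""
    counts = Counter()
    for phonemes in utterances:
        counts.update(phoneme_ngrams(phonemes, n))
    total = sum(counts.values())
    freqs = {gram: c / total for gram, c in counts.items()} if total else {}
    return counts, freqs


# Hypothetical, hand-written transcriptions (assumed, not taken from the paper).
corpus = [
    ["m", "a", "m", "a"],  # "mama"
    ["t", "a", "m", "a"],  # "tama"
]

counts, freqs = ngram_statistics(corpus, n=3)
for gram, count in counts.most_common():
    print("-".join(gram), count, freqs[gram])
```

Setting `n=1` or `n=2` gives the corresponding phoneme and diphone counts. Handling of word boundaries and silence is left out here and would have to follow whatever convention the transcription step uses.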