Towards a new Approach for Arabic root extraction: Exploit relations between the word letters and their placement in the word for Arabic root extraction

Fatma Abu Hawas

doi:10.7494/csci.2013.14.2.327

Authors

Fatma Abu Hawas Yarmouk University

DOI:

https://doi.org/10.7494/csci.2013.14.2.327

Keywords:

Rule-based stemmer, word root, suffixes, prefixes, word patterns

Abstract

In this paper we present a new root-extraction approach for Arabic words. The approach tries to assign a unique root to each Arabic word without relying on a database of word roots, a list of word patterns, or even a list of all the prefixes and suffixes of Arabic words. Unlike most Arabic rule-based stemmers, it tries to predict, one by one, the letter positions that may form the word root, using rules based on the relations among the letters of the word and their placement in it. This paper focuses on two parts of the approach. The first deals with the rules that distinguish between the Arabic definite article “ال -AL” and the permanent component “ال -AL” that may be found in any Arabic word. The second adopts the segmentation of the word into three parts and classifies Arabic letters into groups according to their positions in each segment. The proposed approach is a system composed of several modules that cooperate to extract the word root. The approach has been tested and evaluated using the words of the Holy Quran. The evaluation results indicate that the root-extraction algorithm is promising.
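The segmentation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say where the segment boundaries fall or how the letter groups are defined, so everything below is an assumption made for illustration: the near-equal split, the function names `split_into_three` and `letters_by_segment`, and the segment labels are hypothetical and are not the author's rules.

```python
from collections import defaultdict


def split_into_three(word):
    """Split a word into three contiguous segments of near-equal length.

    Assumption: equal thirds, with any leftover letters given to the
    earlier segments. The paper's actual boundaries are not stated
    in the abstract.
    """
    base, rem = divmod(len(word), 3)
    sizes = [base + (1 if i < rem else 0) for i in range(3)]
    a = sizes[0]
    b = a + sizes[1]
    return word[:a], word[a:b], word[b:]


def letters_by_segment(words):
    """Record which letters were observed in each segment over a word list.

    This is a plain tally, used here as a stand-in for the paper's
    classification of letters by position.
    """
    groups = defaultdict(set)
    for w in words:
        for name, seg in zip(("begin", "middle", "end"), split_into_three(w)):
            groups[name].update(seg)
    return dict(groups)


if __name__ == "__main__":
    print(split_into_three("الكتاب"))
    print(letters_by_segment(["كتب", "الكتاب"]))
```

A rule-based extractor along the lines the abstract describes would then consult such position groups to decide which letters are likely root letters; those decision rules are specific to the paper and are not reproduced here.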


Author Biography

Fatma Abu Hawas, Yarmouk University

Instructor and lecturer in CS Department at Yarmouk University.

