
Procedural File: Stemmer.php

Source Location: /Arabic/Stemmer.php


This PHP class gets the stem of an Arabic word.

Page Details:


Copyright (c) 2006-2013 Khaled Al-Sham'aa.


PHP Version 5



This program is an open source product; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License (LGPL) as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see <http://www.gnu.org/licenses/lgpl.txt>.


Class Name: Arabic Text ArStemmer Class

Filename: Stemmer.php

Original Author(s): Khaled Al-Sham'aa <khaled@ar-php.org>

Purpose: Get the stem of an Arabic word


Source: http://arabtechies.net/node/83 By: Taha Zerrouki <taha.zerrouki@gmail.com>


Arabic Word Stemmer Class

PHP class to get the stem of an Arabic word

Stemming is an automatic process in which morphological variants of a term are mapped to a single representative string called a stem. Arabic belongs to the Semitic family of languages, which also includes Hebrew and Aramaic. Because morphological change in Arabic results from the addition of prefixes and infixes as well as suffixes, simply removing suffixes is not as effective for Arabic as it is for English.

Arabic has a much richer morphology than English. It has two genders, feminine and masculine; three numbers, singular, dual, and plural; and three grammatical cases, nominative, genitive, and accusative. A noun takes the nominative case when it is a subject, the accusative when it is the object of a verb, and the genitive when it is the object of a preposition. The form of an Arabic noun is determined by its gender, number, and grammatical case. Definite nouns are formed by attaching the Arabic article "AL" directly to the front of the noun. Besides prefixes, a noun can also carry a suffix, which is often a possessive pronoun. In Arabic, the conjunction "WA" (and) is often attached to the following word.
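The affixes described above (the attached conjunction "WA", the article "AL", and possessive pronoun suffixes) can be illustrated with a minimal light-stemming sketch. This is not the algorithm used by the Stemmer class; the function names stripPrefix, stripSuffix, and lightStem, the short suffix list, and the minimum remaining length of three letters are all assumptions made for illustration, and the input is assumed to be UTF-8 text without diacritics.

```php
<?php
// Illustrative sketch only: NOT the I18N_Arabic Stemmer algorithm.
// All names, the suffix list, and the minimum length are assumptions.
mb_internal_encoding('UTF-8');

// Remove $prefix from the start of $word if at least $minLen letters remain.
function stripPrefix($word, $prefix, $minLen = 3)
{
    $pLen = mb_strlen($prefix);
    if (mb_substr($word, 0, $pLen) === $prefix
        && mb_strlen($word) - $pLen >= $minLen) {
        return mb_substr($word, $pLen);
    }
    return $word;
}

// Remove $suffix from the end of $word if at least $minLen letters remain.
function stripSuffix($word, $suffix, $minLen = 3)
{
    $sLen = mb_strlen($suffix);
    $wLen = mb_strlen($word);
    if ($wLen - $sLen >= $minLen
        && mb_substr($word, $wLen - $sLen) === $suffix) {
        return mb_substr($word, 0, $wLen - $sLen);
    }
    return $word;
}

// Strip the conjunction, then the article, then at most one pronoun suffix.
function lightStem($word)
{
    $word = stripPrefix($word, 'و');  // conjunction "WA" (and)
    $word = stripPrefix($word, 'ال'); // definite article "AL"

    // A small sample of possessive pronoun suffixes, longest first.
    foreach (array('هما', 'هم', 'ها', 'نا', 'ه', 'ي', 'ك') as $suffix) {
        $stripped = stripSuffix($word, $suffix);
        if ($stripped !== $word) {
            return $stripped;
        }
    }
    return $word;
}

echo lightStem('والكتاب'), "\n"; // "and the book"
echo lightStem('كتابها'), "\n";  // "her book"
```

Both sample words reduce to the same string, كتاب ("book"). The minimum-length check keeps a short word such as ولد ("boy") intact even though it begins with the letter used for "WA". A sketch like this does nothing about infixes, so it would still miss many of the variants a fuller stemmer has to handle.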

Like nouns, an Arabic adjective can also have many variants. When an adjective modifies a noun in a noun phrase, the adjective agrees with the noun in gender, number, case, and definiteness. Arabic verbs have two tenses: perfect and imperfect. The perfect tense denotes completed actions, while the imperfect denotes uncompleted actions. The imperfect tense has four moods: indicative, subjunctive, jussive, and imperative. Arabic verbs in the perfect tense consist of a stem and a subject marker. The subject marker indicates the person, gender, and number of the subject, and its form is determined by those three properties together. A verb in the perfect tense can carry both a subject marker and a pronoun suffix.

Example:

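  // Load the library and request the Stemmer component.
  include('./I18N/Arabic.php');
  $obj = new I18N_Arabic('Stemmer');

  // $word is expected to already hold the Arabic word to stem.
  echo $obj->stem($word);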


author:  Khaled Al-Sham'aa <khaled@ar-php.org>
copyright:  2006-2013 Khaled Al-Sham'aa
link:  http://www.ar-php.org
filesource:  Source Code for this file
license:  LGPL

Documentation generated on Mon, 14 Jan 2013 17:49:07 +0100 by phpDocumentor 1.4.0