Procedural File: WordTag.php

Source Location: /Arabic/WordTag.php


This PHP class tags Arabic words.

Page Details:


Copyright (c) 2006-2013 Khaled Al-Sham'aa.


PHP Version 5



This program is an open source product; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License (LGPL) as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see <http://www.gnu.org/licenses/lgpl.txt>.


Class Name: Tagging Arabic Word Class

Filename: WordTag.php

Original Author(s): Khaled Al-Sham'aa <khaled@ar-php.org>

Purpose: Arabic grammarians describe Arabic as being derived from three main categories: noun, verb and particle. This class is built to recognize the class of a given Arabic word.


Tagging Arabic Word

This PHP class can identify names, places, dates, and other noun words and phrases in Arabic text that establish the meaning of a body of text.

This process of identifying names, places, dates, and other noun words and phrases is critical to software systems that process large amounts of unstructured data coming from sources such as email, document files, and the Web.

Arabic words are classified into three main classes, namely verb, noun and particle. Verbs are subclassified into three subclasses (past verbs, present verbs, etc.); nouns into forty-six subclasses (e.g. active participle, passive participle, exaggeration pattern, adjectival noun, adverbial noun, infinitive noun, common noun, pronoun, quantifier, etc.); and particles into twenty-three subclasses (e.g. additional, resumption, indefinite, conditional, conformational, prohibition, imperative, optative, reasonal, dubious, etc.). It is from these three main classes that the rest of the language is derived.

The most important aspect of this system of describing Arabic is that all the subclasses of these three main classes inherit properties from the parent classes.

Arabic is very rich in categorising words, and contains classes for almost every form of word imaginable. For example, there are classes for nouns of instruments, nouns of place and time, nouns of activity and so on. If we tried to use all the subclasses described by Arabic grammarians, the size of the tagset would soon reach more than two or three hundred tags. For this reason, we have chosen only the main classes. But because of the way all the classes inherit from others, it would be quite simple to extend this tagset to include more subclasses.
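
The inheritance idea described above can be sketched as a small map from each tag to its parent class. This is only an illustration under stated assumptions: the tag names, the `$tagParents` array and the `mainClassOf()` helper are hypothetical and are not taken from WordTag.php.

```php
<?php
// Hypothetical tagset: each tag names its parent class (null for a main class).
// Tag names are illustrative and are NOT the tagset used by WordTag.php.
$tagParents = array(
    'noun'              => null,
    'verb'              => null,
    'particle'          => null,
    'active_participle' => 'noun',
    'common_noun'       => 'noun',
    'past_verb'         => 'verb',
    'conditional'       => 'particle',
);

/**
 * Walk up the parent chain and return the main class of a tag,
 * or null when the tag is not part of the tagset.
 */
function mainClassOf($tag, array $parents)
{
    if (!array_key_exists($tag, $parents)) {
        return null; // unknown tag
    }
    while ($parents[$tag] !== null) {
        $tag = $parents[$tag];
    }
    return $tag;
}

// Extending the tagset only needs one new entry pointing at its parent.
$tagParents['noun_of_instrument'] = 'noun';

echo mainClassOf('noun_of_instrument', $tagParents), "\n"; // noun
echo mainClassOf('past_verb', $tagParents), "\n";          // verb
```

Because every subclass resolves to one of the three main classes, code that only distinguishes noun, verb and particle would keep working after finer-grained tags are added.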


    include('./I18N/Arabic.php');
    $obj = new I18N_Arabic('WordTag');

    // $str is assumed to already hold the Arabic text to analyse;
    // its definition is not part of this listing.
    $hStr = $obj->highlightText($str, '#80B020');

    echo $str . '<hr />' . $hStr . '<hr />';

    // Each element of the result is a (word, tag) pair.
    $taggedText = $obj->tagText($str);

    foreach ($taggedText as $wordTag) {
        list($word, $tag) = $wordTag;

        if ($tag == 1) {
            echo "<font color=#DBEC21>$word is Noun</font>, ";
        }

        if ($tag == 0) {
            echo "$word is not Noun, ";
        }
    }
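
As a follow-up to the example above, the (word, tag) pairs can be filtered to collect only the nouns. The sketch below assumes the pair format used in that example (tag 1 for a noun, 0 otherwise). To stay self-contained it uses a hand-written array in place of a real `tagText()` call, so the tags shown are assigned by hand and are not output from the library. The `extractNouns()` helper is hypothetical.

```php
<?php
// Hand-written stand-in for the result of $obj->tagText($str).
// The tags are assigned by hand for illustration, not produced by the library.
$taggedText = array(
    array('ذهب', 0),
    array('الولد', 1),
    array('إلى', 0),
    array('المدرسة', 1),
);

/**
 * Return only the words whose tag marks them as nouns (tag == 1).
 */
function extractNouns(array $taggedText)
{
    $nouns = array();
    foreach ($taggedText as $wordTag) {
        list($word, $tag) = $wordTag;
        if ($tag == 1) {
            $nouns[] = $word;
        }
    }
    return $nouns;
}

$nouns = extractNouns($taggedText);
echo count($nouns), " noun(s): ", implode(', ', $nouns), "\n";
```

With the real class, the stand-in array would be replaced by the value returned from `$obj->tagText($str)`.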


author:  Khaled Al-Sham'aa <khaled@ar-php.org>
copyright:  2006-2013 Khaled Al-Sham'aa
link:  http://www.ar-php.org
filesource:  Source Code for this file
license:  LGPL

Documentation generated on Mon, 14 Jan 2013 17:49:09 +0100 by phpDocumentor 1.4.0