I18N_Arabic

Class: I18N_Arabic_Normalise

Source Location: /Arabic/Normalise.php

Class Overview


This class provides functions to manipulate Arabic text and normalise it by applying filters: for example, stripping tatweel and tashkeel, normalising hamza and lamaleph forms, and unshaping joined Arabic text back into its normalised form.


Author(s):

Copyright:

  • 2006-2016 Khaled Al-Sham'aa

Methods



Class Details

[line 93]
This class provides functions to manipulate Arabic text and normalise it by applying filters: for example, stripping tatweel and tashkeel, normalising hamza and lamaleph forms, and unshaping joined Arabic text back into its normalised form.

The functions are helpful for searching, indexing and similar tasks.




Tags:

author:  Djihed Afifi <djihed@gmail.com>
copyright:  2006-2016 Khaled Al-Sham'aa
link:  http://www.ar-php.org
license:  LGPL




Class Methods


method charName [line 616]

string charName( string $archar)

Returns the name of an Arabic letter, in Arabic.



Tags:

return:  Arabic letter name in Arabic
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method isAlef [line 448]

boolean isAlef( string $archar)

Checks for Arabic Alef forms (i.e. ALEF, ALEF MADDA, ALEF HAMZA ABOVE, ALEF HAMZA BELOW, ALEF WASLA, ALEF MAKSURA).



Tags:

return:  True if it is Arabic Alef form
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method isHamza [line 426]

boolean isHamza( string $archar)

Checks for Arabic Hamza forms (i.e. HAMZA, WAW HAMZA, YEH HAMZA, HAMZA ABOVE, HAMZA BELOW, ALEF HAMZA BELOW, ALEF HAMZA ABOVE).



Tags:

return:  True if it is Arabic Hamza form
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method isHaraka [line 340]

boolean isHaraka( string $archar)

Checks for Arabic Harakat marks (i.e. FATHA, DAMMA, KASRA, SUKUN, TANWIN).



Tags:

return:  True if it is Arabic Harakat mark
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method isLigature [line 404]

boolean isLigature( string $archar)

Checks for Arabic Ligatures like LamAlef (i.e. LAM ALEF, LAM ALEF HAMZA ABOVE, LAM ALEF HAMZA BELOW, LAM ALEF MADDA ABOVE).



Tags:

return:  True if it is an Arabic ligature such as LamAlef
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method isMoon [line 574]

boolean isMoon( string $archar)

Checks for Arabic Moon letters.



Tags:

return:  True if it is Arabic Moon letter
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method isShortharaka [line 361]

boolean isShortharaka( string $archar)

Checks for Arabic short Harakat marks (i.e. FATHA, DAMMA, KASRA, SUKUN).



Tags:

return:  True if it is Arabic short Harakat mark
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method isSmall [line 553]

boolean isSmall( string $archar)

Checks for Arabic Small letters (i.e. SMALL ALEF, SMALL WAW, SMALL YEH).



Tags:

return:  True if it is Arabic Small letter
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method isSun [line 595]

boolean isSun( string $archar)

Checks for Arabic Sun letters.



Tags:

return:  True if it is Arabic Sun letter
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character
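
As a minimal sketch of how a sun-letter check can work, the function below tests membership in the conventional set of 14 sun letters. The name `sketchIsSun` is hypothetical, and the letter set is the traditional grammatical one, which may differ from the set the library actually uses. A moon-letter check could be written the same way with the complementary set.

```php
<?php
// Hypothetical helper for illustration; not the library's implementation.
function sketchIsSun(string $archar): bool
{
    // The 14 traditional sun letters, written as Unicode escapes (PHP 7+).
    static $sunLetters = [
        "\u{062A}", // TEH
        "\u{062B}", // THEH
        "\u{062F}", // DAL
        "\u{0630}", // THAL
        "\u{0631}", // REH
        "\u{0632}", // ZAIN
        "\u{0633}", // SEEN
        "\u{0634}", // SHEEN
        "\u{0635}", // SAD
        "\u{0636}", // DAD
        "\u{0637}", // TAH
        "\u{0638}", // ZAH
        "\u{0644}", // LAM
        "\u{0646}", // NOON
    ];

    // Strict comparison so that only an exact single-letter match counts.
    return in_array($archar, $sunLetters, true);
}
```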


method isTanwin [line 382]

boolean isTanwin( string $archar)

Checks for Arabic Tanwin marks (i.e. FATHATAN, DAMMATAN, KASRATAN).



Tags:

return:  True if it is Arabic Tanwin mark
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method isTashkeel [line 319]

boolean isTashkeel( string $archar)

Checks for Arabic Tashkeel marks (i.e. FATHA, DAMMA, KASRA, SUKUN, SHADDA, FATHATAN, DAMMATAN, KASRATAN).



Tags:

return:  True if it is Arabic Tashkeel mark
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character
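
The mark categories listed for isTashkeel, isTanwin and isShortharaka nest inside one another, which the following sketch tries to make explicit. All names here are hypothetical, and the code points are taken from the Unicode Arabic block rather than from the library's source, so treat it as an illustration of the grouping only.

```php
<?php
// Hypothetical constants and helpers; not the library's implementation.
const SKETCH_TANWIN = ["\u{064B}", "\u{064C}", "\u{064D}"];                 // FATHATAN, DAMMATAN, KASRATAN
const SKETCH_SHORT_HARAKAT = ["\u{064E}", "\u{064F}", "\u{0650}", "\u{0652}"]; // FATHA, DAMMA, KASRA, SUKUN
const SKETCH_SHADDA = "\u{0651}";                                           // SHADDA

function sketchIsTanwin(string $archar): bool
{
    return in_array($archar, SKETCH_TANWIN, true);
}

function sketchIsShortHaraka(string $archar): bool
{
    return in_array($archar, SKETCH_SHORT_HARAKAT, true);
}

function sketchIsTashkeel(string $archar): bool
{
    // Tashkeel = short harakat + tanwin + shadda, per the list above.
    return $archar === SKETCH_SHADDA
        || sketchIsTanwin($archar)
        || sketchIsShortHaraka($archar);
}
```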


method isTehlike [line 532]

boolean isTehlike( string $archar)

Checks for Arabic Teh forms (i.e. TEH, TEH MARBUTA).



Tags:

return:  True if it is Arabic Teh form
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method isWawlike [line 511]

boolean isWawlike( string $archar)

Checks for Arabic Waw-like forms (i.e. WAW, WAW HAMZA, SMALL WAW).



Tags:

return:  True if it is Arabic Waw-like form
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method isWeak [line 469]

boolean isWeak( string $archar)

Checks for Arabic Weak letters (i.e. ALEF, WAW, YEH, ALEF_MAKSURA).



Tags:

return:  True if it is Arabic Weak letter
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method isYehlike [line 490]

boolean isYehlike( string $archar)

Checks for Arabic Yeh forms (i.e. YEH, YEH HAMZA, SMALL YEH, ALEF MAKSURA).



Tags:

return:  True if it is Arabic Yeh form
author:  Khaled Al-Sham'aa <khaled@ar-php.org>
access:  public


Parameters:

string   $archar   Arabic Unicode character


method normalise [line 241]

string normalise( string $text)

Takes a string and applies the various filters in this class to return a Unicode-normalised string suitable for activities such as searching and indexing.



Tags:

return:  the resulting normalised string.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

string   $text   the text to be normalised.
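
To illustrate the idea of chaining filters, here is a rough, self-contained sketch of a normalisation pipeline. The function name `sketchNormalise`, the choice of filters and their order are all assumptions made for this example. The library's own normalise method may apply different steps or a different order.

```php
<?php
// Hypothetical pipeline for illustration; filter set and order are assumptions.
function sketchNormalise(string $text): string
{
    $filters = [
        // 1. Remove TATWEEL (U+0640).
        static function (string $t): string {
            return str_replace("\u{0640}", '', $t);
        },
        // 2. Remove tashkeel marks U+064B..U+0652; keep the input if the regex fails.
        static function (string $t): string {
            return preg_replace('/[\x{064B}-\x{0652}]/u', '', $t) ?? $t;
        },
        // 3. Expand the plain LAM-ALEF presentation ligatures (isolated and final).
        static function (string $t): string {
            return str_replace(["\u{FEFB}", "\u{FEFC}"], "\u{0644}\u{0627}", $t);
        },
        // 4. Fold ALEF WITH MADDA / HAMZA ABOVE / HAMZA BELOW into bare ALEF.
        static function (string $t): string {
            return str_replace(["\u{0622}", "\u{0623}", "\u{0625}"], "\u{0627}", $t);
        },
    ];

    // Apply each filter in turn to the output of the previous one.
    foreach ($filters as $filter) {
        $text = $filter($text);
    }

    return $text;
}
```

Keeping each filter as a separate callable makes it easy to reorder or drop steps, which matters because search and indexing use cases often disagree on how aggressive normalisation should be.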


method normaliseHamza [line 165]

string normaliseHamza( string $text)

Normalise all Hamza characters to their corresponding aleph character in an Arabic text.



Tags:

return:  the normalised text.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

string   $text   The text to be normalised.
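
A minimal sketch of this kind of folding, assuming only the alef-carried forms are mapped, is shown below. `sketchNormaliseHamza` is a hypothetical name, and the exact set of characters the library rewrites (for example whether WAW HAMZA or YEH HAMZA are included) is not stated here, so the mapping table is an assumption.

```php
<?php
// Hypothetical helper for illustration; the mapping table is an assumption.
function sketchNormaliseHamza(string $text): string
{
    $map = [
        "\u{0622}" => "\u{0627}", // ALEF WITH MADDA ABOVE -> ALEF
        "\u{0623}" => "\u{0627}", // ALEF WITH HAMZA ABOVE -> ALEF
        "\u{0625}" => "\u{0627}", // ALEF WITH HAMZA BELOW -> ALEF
    ];

    // strtr() with an array replaces every key in a single pass; UTF-8 is
    // self-synchronising, so byte-level matching cannot split a character.
    return strtr($text, $map);
}
```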


method normaliseLamaleph [line 193]

string normaliseLamaleph( string $text)

Unicode includes some special characters in which the lamaleph and any hamza above it are combined into one code point. Some input systems use them. This function expands these characters.



Tags:

return:  the normalised text.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

string   $text   The text to be normalised.
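
The expansion can be sketched as a lookup table from the lam-alef ligatures in the Arabic Presentation Forms-B block to a LAM followed by the matching alef form. The name `sketchNormaliseLamaleph` is hypothetical and the table below is built from the Unicode code charts, not copied from the library.

```php
<?php
// Hypothetical helper for illustration; not the library's implementation.
function sketchNormaliseLamaleph(string $text): string
{
    $lam = "\u{0644}"; // LAM

    // Each ligature has an isolated and a final presentation form.
    $map = [
        "\u{FEF5}" => $lam . "\u{0622}", // LAM WITH ALEF WITH MADDA ABOVE (isolated)
        "\u{FEF6}" => $lam . "\u{0622}", // LAM WITH ALEF WITH MADDA ABOVE (final)
        "\u{FEF7}" => $lam . "\u{0623}", // LAM WITH ALEF WITH HAMZA ABOVE (isolated)
        "\u{FEF8}" => $lam . "\u{0623}", // LAM WITH ALEF WITH HAMZA ABOVE (final)
        "\u{FEF9}" => $lam . "\u{0625}", // LAM WITH ALEF WITH HAMZA BELOW (isolated)
        "\u{FEFA}" => $lam . "\u{0625}", // LAM WITH ALEF WITH HAMZA BELOW (final)
        "\u{FEFB}" => $lam . "\u{0627}", // LAM WITH ALEF (isolated)
        "\u{FEFC}" => $lam . "\u{0627}", // LAM WITH ALEF (final)
    ];

    return strtr($text, $map);
}
```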


method stripTashkeel [line 141]

string stripTashkeel( string $text)

Strip all tashkeel characters from an Arabic text.



Tags:

return:  the stripped text.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

string   $text   The text to be stripped.
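
One way to sketch this filter is a single UTF-8 aware regular expression over the range U+064B to U+0652, which covers the eight marks listed under isTashkeel. `sketchStripTashkeel` is a hypothetical name, and the range is an assumption based on the Unicode Arabic block, not on the library's source.

```php
<?php
// Hypothetical helper for illustration; not the library's implementation.
function sketchStripTashkeel(string $text): string
{
    // U+064B..U+0652: FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA,
    // SHADDA, SUKUN. The /u modifier makes the pattern work on code points.
    $stripped = preg_replace('/[\x{064B}-\x{0652}]/u', '', $text);

    // preg_replace() returns null on error (e.g. invalid UTF-8 input);
    // in that case this sketch simply hands back the original text.
    return $stripped === null ? $text : $stripped;
}
```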


method stripTatweel [line 128]

string stripTatweel( string $text)

Strip all tatweel characters from an Arabic text.



Tags:

return:  the stripped text.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

string   $text   The text to be stripped.
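
Tatweel is a single code point (U+0640) used only to stretch words visually, so a plain string replacement is enough to sketch the idea. The function name `sketchStripTatweel` is hypothetical and the code is not taken from the library.

```php
<?php
// Hypothetical helper for illustration; not the library's implementation.
function sketchStripTatweel(string $text): string
{
    // U+0640 ARABIC TATWEEL carries no linguistic content, so removing it
    // leaves the spelling of the word unchanged.
    return str_replace("\u{0640}", '', $text);
}
```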


method unichr [line 226]

string unichr( char $u)

Returns the Unicode character for the given code point.



Tags:

return:  the result character.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

char   $u   code point
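
For readers unfamiliar with the conversion, the sketch below encodes an integer code point as a UTF-8 string by hand. `sketchUnichr` is a hypothetical name, and taking an integer argument is an assumption of this example. On PHP 7.2 and later with the mbstring extension, `mb_chr()` offers the same conversion.

```php
<?php
// Hypothetical helper for illustration; not the library's implementation.
function sketchUnichr(int $cp): string
{
    // Reject values outside the Unicode range and the surrogate block.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException("Invalid Unicode code point: $cp");
    }

    if ($cp < 0x80) {            // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}
```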


method unshape [line 271]

string unshape( string $text)

Takes Arabic text in its joined form, untangles the characters and unshapes them.

This can be used to process text that was processed through OCR or by extracting text from a PDF document.

Note that the resulting text may need further processing. In most cases, you will want to use the utf8Strrev function from this class to reverse the string.

Most of the work of setting up the characters for this function is done through the ArUnicode.constants.php constants and the constructor loading.




Tags:

return:  the resulting normalised string.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

string   $text   the text to be unshaped.
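
The core of unshaping is a lookup from positional presentation forms back to the base letter. The sketch below shows that idea for four letters only. `sketchUnshape` is a hypothetical name, and the table is a small excerpt from the Arabic Presentation Forms-B code chart, not the library's constants file, which covers far more characters. It also does not reverse the text.

```php
<?php
// Hypothetical helper for illustration; covers only ALEF, BEH, LAM and MEEM.
function sketchUnshape(string $text): string
{
    $alef = "\u{0627}";
    $beh  = "\u{0628}";
    $lam  = "\u{0644}";
    $meem = "\u{0645}";

    // Presentation form => base letter.
    $map = [
        "\u{FE8D}" => $alef, "\u{FE8E}" => $alef,   // isolated, final
        "\u{FE8F}" => $beh,  "\u{FE90}" => $beh,    // isolated, final
        "\u{FE91}" => $beh,  "\u{FE92}" => $beh,    // initial, medial
        "\u{FEDD}" => $lam,  "\u{FEDE}" => $lam,    // isolated, final
        "\u{FEDF}" => $lam,  "\u{FEE0}" => $lam,    // initial, medial
        "\u{FEE1}" => $meem, "\u{FEE2}" => $meem,   // isolated, final
        "\u{FEE3}" => $meem, "\u{FEE4}" => $meem,   // initial, medial
    ];

    // Characters without an entry in the table pass through unchanged.
    return strtr($text, $map);
}
```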


method utf8Strrev [line 284]

string utf8Strrev( string $str, [boolean $reverse_numbers = false])

Takes a UTF-8 string and reverses it.



Tags:

return:  The reversed string.
access:  public


Parameters:

string   $str   the string to be reversed.
boolean   $reverse_numbers   whether to reverse numbers.
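
A byte-wise `strrev()` would corrupt multi-byte characters, so a reversal has to work on whole code points. The sketch below shows one way to do that. The name `sketchUtf8Strrev` is hypothetical, and it assumes that a false `$reverseNumbers` means runs of ASCII digits keep their original left-to-right order, which is an interpretation of the parameter description and not confirmed behaviour.

```php
<?php
// Hypothetical helper for illustration; not the library's implementation.
function sketchUtf8Strrev(string $str, bool $reverseNumbers = false): string
{
    // Split into code points; the /u modifier makes the empty pattern UTF-8 aware.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $str; // invalid UTF-8: hand the input back unchanged
    }

    if ($reverseNumbers) {
        return implode('', array_reverse($chars));
    }

    // True when $s is a non-empty run of ASCII digits.
    $isDigits = static function (string $s): bool {
        return $s !== '' && strspn($s, '0123456789') === strlen($s);
    };

    $out = [];
    foreach ($chars as $ch) {
        if ($isDigits($ch) && $out !== [] && $isDigits($out[0])) {
            $out[0] .= $ch;            // extend the digit run, keeping digit order
        } else {
            array_unshift($out, $ch);  // everything else is prepended (reversed)
        }
    }

    return implode('', $out);
}
```

With this sketch, `sketchUtf8Strrev('ab12')` gives `12ba`, while `sketchUtf8Strrev('ab12', true)` gives `21ba`.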



Documentation generated on Fri, 01 Jan 2016 10:26:11 +0200 by phpDocumentor 1.4.0