Arabic

Class: ArAutoSummarize

Source Location: /sub/ArAutoSummarize.class.php

Class Overview


This PHP class performs automatic keyphrase extraction to provide a quick mini-summary of a long Arabic document


Author(s):

Copyright:

  • 2006-2010 Khaled Al-Shamaa



Class Details

[line 146]
This PHP class performs automatic keyphrase extraction to provide a quick mini-summary of a long Arabic document



Tags:

author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
copyright:  2006-2010 Khaled Al-Shamaa
link:  http://www.ar-php.org
license:  LGPL




Class Variables

$cleanCommonInput =  'windows-1256'

[line 277]

"cleanCommon" method input charset



Tags:

access:  public

Type:   String
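
The charset properties in this class all default to 'windows-1256', so a caller holding UTF-8 text may need to convert it first. The following is a minimal sketch of that conversion using PHP's iconv extension; the helper names are hypothetical and are not part of ArAutoSummarize.

```php
<?php
// Hypothetical helpers (not part of the library): convert between UTF-8 and
// the windows-1256 default named by the *Input / *Output properties.
function sketchToWindows1256(string $utf8): string
{
    $bytes = iconv('UTF-8', 'WINDOWS-1256', $utf8);
    if ($bytes === false) {
        throw new RuntimeException('Text cannot be represented in windows-1256');
    }
    return $bytes;
}

function sketchFromWindows1256(string $bytes): string
{
    $utf8 = iconv('WINDOWS-1256', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('Input is not valid windows-1256');
    }
    return $utf8;
}
```

Alternatively, the public properties can be set to another charset name, assuming the surrounding library honours them when the method is called.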



$cleanCommonOutput =  'windows-1256'

[line 271]

"cleanCommon" method output charset



Tags:

access:  public

Type:   String



$cleanCommonVars = array('str')

[line 283]

Names of the textual parameters of the "cleanCommon" method



Tags:

access:  public

Type:   Array



$commonChars = array('ة','ه','ي','ن','و','ت','ل','ا','س','م', 
                                   'e', 't', 'a', 'o', 'i', 'n', 's')

[line 151]



Tags:

access:  protected

Type:   mixed



$commonWords = array()

[line 156]



Tags:

access:  protected

Type:   mixed



$doRateSummarizeInput =  'windows-1256'

[line 205]

"doRateSummarize" method input charset



Tags:

access:  public

Type:   String



$doRateSummarizeOutput =  'windows-1256'

[line 199]

"doRateSummarize" method output charset



Tags:

access:  public

Type:   String



$doRateSummarizeVars = array('str', 'keywords')

[line 211]

Names of the textual parameters of the "doRateSummarize" method



Tags:

access:  public

Type:   Array



$doSummarizeInput =  'windows-1256'

[line 187]

"doSummarize" method input charset



Tags:

access:  public

Type:   String



$doSummarizeOutput =  'windows-1256'

[line 181]

"doSummarize" method output charset



Tags:

access:  public

Type:   String



$doSummarizeVars = array('str', 'keywords')

[line 193]

Names of the textual parameters of the "doSummarize" method



Tags:

access:  public

Type:   Array



$getMetaKeywordsInput =  'windows-1256'

[line 259]

"getMetaKeywords" method input charset



Tags:

access:  public

Type:   String



$getMetaKeywordsOutput =  'windows-1256'

[line 253]

"getMetaKeywords" method output charset



Tags:

access:  public

Type:   String



$getMetaKeywordsVars = array('str')

[line 265]

Names of the textual parameters of the "getMetaKeywords" method



Tags:

access:  public

Type:   Array



$highlightRateSummaryInput =  'windows-1256'

[line 241]

"highlightRateSummary" method input charset



Tags:

access:  public

Type:   String



$highlightRateSummaryOutput =  'windows-1256'

[line 235]

"highlightRateSummary" method output charset



Tags:

access:  public

Type:   String



$highlightRateSummaryVars = array('str', 'keywords')

[line 247]

Names of the textual parameters of the "highlightRateSummary" method



Tags:

access:  public

Type:   Array



$highlightSummaryInput =  'windows-1256'

[line 223]

"highlightSummary" method input charset



Tags:

access:  public

Type:   String



$highlightSummaryOutput =  'windows-1256'

[line 217]

"highlightSummary" method output charset



Tags:

access:  public

Type:   String



$highlightSummaryVars = array('str', 'keywords')

[line 229]

Names of the textual parameters of the "highlightSummary" method



Tags:

access:  public

Type:   Array



$importantWords = array()

[line 157]



Tags:

access:  protected

Type:   mixed



$normalizeAlef = array('أ','إ','آ')

[line 148]



Tags:

access:  protected

Type:   mixed



$normalizeDiacritics = array('َ','ً','ُ','ٌ','ِ','ٍ','ْ','ّ')

[line 149]



Tags:

access:  protected

Type:   mixed



$separators = array('.',"\n",'،','؛','(','[','{',')',']','}',',',';')

[line 154]



Tags:

access:  protected

Type:   mixed
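
A separator list like this one is presumably used to cut the document into sentences. The following is a rough sketch of such a split on UTF-8 input; the constant, the function name, and the trimming behaviour are assumptions for illustration, not the class's actual code.

```php
<?php
// Assumed UTF-8 equivalents of the separators held in $separators.
const SKETCH_SEPARATORS = array('.', "\n", '،', '؛', '(', '[', '{', ')', ']', '}', ',', ';');

// Hypothetical helper: split text on any run of separators.
function sketchSplitSentences(string $text): array
{
    // Escape each separator and join them into one alternation pattern.
    $quoted = array();
    foreach (SKETCH_SEPARATORS as $separator) {
        $quoted[] = preg_quote($separator, '/');
    }
    $pattern = '/(?:' . implode('|', $quoted) . ')+/u';

    // Trim each fragment and drop the ones that end up empty.
    $sentences = array();
    foreach (preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY) as $part) {
        $part = trim($part);
        if ($part !== '') {
            $sentences[] = $part;
        }
    }
    return $sentences;
}
```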



$summarizeInput =  'windows-1256'

[line 169]

"summarize" method input charset



Tags:

access:  public

Type:   String



$summarizeOutput =  'windows-1256'

[line 163]

"summarize" method output charset



Tags:

access:  public

Type:   String



$summarizeVars = array('str', 'keywords')

[line 175]

Names of the textual parameters of the "summarize" method



Tags:

access:  public

Type:   Array





Class Methods


constructor __construct [line 288]

ArAutoSummarize __construct( )

Loads initial values



Tags:

access:  public



method acceptedWord [line 753]

boolean acceptedWord( string $word)

Checks whether a given string qualifies as a valid word



Tags:

return:  True if the passed string is accepted as a valid word, otherwise False
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

string   $word   String to be checked if it is a valid word or not


method cleanCommon [line 576]

string cleanCommon( string $str)

Removes common Arabic words (roughly) from the input Arabic string (document content)



Tags:

return:  Arabic document as a string free of common words (roughly)
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input normalized Arabic document as a string
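
As an illustration of the idea only, stop-word removal on whitespace-separated UTF-8 text might look like the sketch below. The function name and the explicit stop-word parameter are hypothetical; the class keeps its own word list internally.

```php
<?php
// Hypothetical stop-word remover; not the library's implementation.
function sketchRemoveCommonWords(string $text, array $stopWords): string
{
    // Flip the list so membership checks are a key lookup.
    $lookup = array_flip($stopWords);

    $kept = array();
    foreach (preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        if (!isset($lookup[$word])) {
            $kept[] = $word;
        }
    }
    return implode(' ', $kept);
}
```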


method doNormalize [line 556]

string doNormalize( string $str)

Normalizes the Arabic document



Tags:

return:  Normalized Arabic document
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

string   $str   Input Arabic document as a string
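
The page does not spell out what normalization involves. Judging by the $normalizeAlef and $normalizeDiacritics variables, it plausibly unifies Alef variants and strips diacritics. A minimal sketch under that assumption, for UTF-8 input and with a hypothetical function name:

```php
<?php
// Hypothetical normalizer; the exact rules of doNormalize are an assumption.
function sketchNormalize(string $text): string
{
    // Alef with hamza above, hamza below, or madda becomes a bare Alef.
    $text = str_replace(array("\u{0623}", "\u{0625}", "\u{0622}"), "\u{0627}", $text);

    // Remove tanween, short vowels, shadda and sukun (U+064B to U+0652).
    $stripped = preg_replace('/[\x{064B}-\x{0652}]/u', '', $text);

    // preg_replace returns null on invalid UTF-8; fall back to the input.
    return $stripped === null ? $text : $stripped;
}
```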


method doRateSummarize [line 428]

string doRateSummarize( string $str, integer $rate, string $keywords)

Summarizes the input Arabic string (document content) down to a given percentage of its sentences



Tags:

return:  Output summary requested
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input Arabic document as a string
integer   $rate   Number of sentences in the output summary, as a percentage of the sentences in the input Arabic string (document content)
string   $keywords   List of keywords highlighted by search process
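
How a percentage rate maps to a sentence count is not stated here. One plausible reading, sketched below, clamps the rate to 0..100 and rounds up; the function name and the rounding choice are assumptions.

```php
<?php
// Hypothetical helper: turn a percentage rate into a number of sentences.
function sketchRateToCount(int $sentenceCount, int $rate): int
{
    // Clamp the rate so the result never exceeds the document length.
    $rate = max(0, min(100, $rate));

    // Round up so that a small non-zero rate still yields one sentence.
    return (int) ceil($sentenceCount * $rate / 100);
}
```

Under this reading, a 25% rate on a 20-sentence document asks for 5 sentences.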


method doSummarize [line 408]

string doSummarize( string $str, integer $int, string $keywords)

Summarizes the input Arabic string (document content) into a specified number of sentences



Tags:

return:  Output summary requested
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input Arabic document as a string
integer   $int   Number of sentences required in output summary
string   $keywords   List of keywords highlighted by search process


method draftStem [line 593]

string draftStem( string $str)

Removes less significant Arabic letters from the given string (document content).

Please note that output will not be human readable.




Tags:

return:  Output string after removing less significant Arabic letters (not human readable output)
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

string   $str   Input Arabic document as a string
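
A crude stemmer of this kind can be sketched by deleting a fixed set of very frequent letters. The set below is assumed to be the UTF-8 equivalent of $commonChars; that draftStem uses exactly this list is an assumption, and the names are hypothetical.

```php
<?php
// Assumed UTF-8 equivalent of the $commonChars list.
const SKETCH_COMMON_CHARS = array(
    'ة', 'ه', 'ي', 'ن', 'و', 'ت', 'ل', 'ا', 'س', 'م',
    'e', 't', 'a', 'o', 'i', 'n', 's',
);

// Hypothetical draft stemmer: drop every occurrence of the common letters.
// The result is a comparison key, not readable text.
function sketchDraftStem(string $text): string
{
    return str_replace(SKETCH_COMMON_CHARS, '', $text);
}
```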


method getMetaKeywords [line 491]

string getMetaKeywords( string $str, integer $int)

Extract keywords from a given Arabic string (document content)



Tags:

return:  List of the keywords extracted from the input Arabic string (document content)
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input Arabic document as a string
integer   $int   Number of keywords to extract from the input string (document content)
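
As a simplified illustration, keywords can be chosen as the most frequent words of the text. The sketch below does only that and ignores stop words and stemming, which the real method may handle; the function name and the ", " delimiter are assumptions.

```php
<?php
// Hypothetical keyword picker: the $count most frequent words, joined by ", ".
function sketchTopKeywords(string $text, int $count): string
{
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Count occurrences, then sort by frequency, highest first.
    $frequency = array_count_values($words);
    arsort($frequency);

    $top = array_slice(array_keys($frequency), 0, max(0, $count));
    return implode(', ', $top);
}
```

The order of words with equal frequency is not guaranteed on PHP versions before 8.0, where sorting is not stable.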


method highlightRateSummary [line 472]

string highlightRateSummary( string $str, integer $rate, string $keywords, string $style)

Highlights key sentences (summary), selected as a percentage of the input string (document content), using CSS and returns the result.



Tags:

return:  Output highlighted key sentences summary (using CSS)
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input Arabic document as a string
integer   $rate   Number of highlighted key sentences, as a percentage of the sentences in the input Arabic string (document content)
string   $keywords   List of keywords highlighted by search process
string   $style   Name of the CSS class you would like to apply


method highlightSummary [line 450]

string highlightSummary( string $str, integer $int, string $keywords, string $style)

Highlight key sentences (summary) of the input string (document content) using CSS and send the result back as an output



Tags:

return:  Output highlighted key sentences summary (using CSS)
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  public


Parameters:

string   $str   Input Arabic document as a string
integer   $int   Number of key sentences required to be highlighted in the input string (document content)
string   $keywords   List of keywords highlighted by search process
string   $style   Name of the CSS class you would like to apply
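
The markup this method emits is not shown on this page. One way to highlight with a CSS class is to wrap the chosen sentences in a span, as sketched below. The function, its parameters, and the HTML escaping are assumptions rather than documented behaviour.

```php
<?php
// Hypothetical highlighter: wrap the sentences at $keyIndexes in a span.
function sketchHighlight(array $sentences, array $keyIndexes, string $style): string
{
    $class = htmlspecialchars($style, ENT_QUOTES, 'UTF-8');

    $out = array();
    foreach ($sentences as $i => $sentence) {
        // Escaping is a choice made by this sketch only.
        $safe = htmlspecialchars($sentence, ENT_QUOTES, 'UTF-8');
        $out[] = in_array($i, $keyIndexes, true)
            ? '<span class="' . $class . '">' . $safe . '</span>'
            : $safe;
    }
    return implode(' ', $out);
}
```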


method loadExtra [line 311]

void loadExtra( )

Load enhanced Arabic stop words list



Tags:

access:  public



method minAcceptedRank [line 731]

integer minAcceptedRank( array $arr, integer $int)

Calculates the minimum rank for sentences to be included in the summary



Tags:

return:  Minimum accepted sentence rank (sentences with a rank higher than this will be listed in the document summary)
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

array   $arr   Sentences ranks
integer   $int   Number of sentences you need to include in your summary
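
A threshold of this kind can be found by sorting the ranks in descending order and reading the rank of the last sentence that still fits. The sketch below treats the threshold as inclusive, which is an assumption; the function name is hypothetical.

```php
<?php
// Hypothetical threshold finder: rank of the $count-th best sentence.
function sketchMinAcceptedRank(array $ranks, int $count)
{
    // Nothing requested or nothing ranked: no sentence can qualify.
    if ($count <= 0 || count($ranks) === 0) {
        return PHP_INT_MAX;
    }

    // Sort a copy of the ranks from highest to lowest.
    rsort($ranks);

    // Asking for more sentences than exist accepts the lowest rank.
    $index = min($count, count($ranks)) - 1;
    return $ranks[$index];
}
```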


method rankSentences [line 652]

array rankSentences( array $sentences, array $stemmedSentences, array $arr)

Ranks sentences in a given Arabic string (document content).



Tags:

return:  Two-dimensional array: the first item is an array of the document sentences, the second item is an array of their ranks.
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

array   $sentences   Sentences of the input Arabic document as an array
array   $stemmedSentences   Stemmed sentences of the input Arabic document as an array
array   $arr   Word ranks array (each word is a key and its value is the word frequency)
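
The scoring formula is not given on this page. A common and simple choice is to score each sentence as the sum of the ranks of its stemmed words, sketched below with the same parameter and return shapes. The formula and the function name are assumptions.

```php
<?php
// Hypothetical scorer: sentence rank = sum of the ranks of its stemmed words.
function sketchRankSentences(array $sentences, array $stemmedSentences, array $wordRanks): array
{
    $ranks = array();
    foreach ($stemmedSentences as $i => $stemmed) {
        $score = 0;
        foreach (preg_split('/\s+/u', $stemmed, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            // Words missing from the rank table contribute nothing.
            $score += isset($wordRanks[$word]) ? $wordRanks[$word] : 0;
        }
        $ranks[$i] = $score;
    }

    // First item: original sentences; second item: their ranks.
    return array($sentences, $ranks);
}
```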


method rankWords [line 609]

hash rankWords( string $str)

Ranks words in a given Arabic string (document content). The rank of a word is the number of times it appears in the document.



Tags:

return:  Associative array whose keys are the document words and whose values are the ranks of those words.
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

string   $str   Input Arabic document as a string
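
Frequency ranking of this kind can be sketched with PHP's built-in array_count_values. The function name is hypothetical, and splitting on whitespace only is a simplification of whatever tokenization the class performs.

```php
<?php
// Hypothetical word ranker: map each word to the number of times it occurs.
function sketchRankWords(string $text): array
{
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Keys are words in order of first appearance, values are counts.
    return array_count_values($words);
}
```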


method summarize [line 332]

string summarize( string $str, string $keywords, integer $int, string $mode, string $output, [string $style = null])

Core summarize function that implements the required steps of the algorithm



Tags:

return:  Output summary requested
author:  Khaled Al-Shamaa <khaled.alshamaa@gmail.com>
access:  protected


Parameters:

string   $str   Input Arabic document as a string
string   $keywords   List of keywords highlighted by search process
integer   $int   Sentences value (see $mode effect also)
string   $mode   Mode of sentences count [number|rate]
string   $output   Output mode [summary|highlight]
string   $style   Name of the CSS class you would like to apply
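
The individual steps of the algorithm are not listed on this page. The sketch below shows one plausible frequency-based pipeline for UTF-8 text covering the [number|rate] modes. It omits normalization, stemming, stop-word removal, keyword weighting and the highlight output, and every name in it is hypothetical.

```php
<?php
// Hypothetical end-to-end sketch; not the library's algorithm.
function sketchSummarize(string $text, int $int, string $mode = 'number'): string
{
    // 1. Split into trimmed, non-empty sentences (reduced separator set).
    $sentences = array();
    foreach (preg_split('/[.\n،؛]+/u', $text, -1, PREG_SPLIT_NO_EMPTY) as $part) {
        $part = trim($part);
        if ($part !== '') {
            $sentences[] = $part;
        }
    }
    if (count($sentences) === 0) {
        return '';
    }

    // 2. Word rank = frequency of the word in the whole document.
    $allWords = preg_split('/\s+/u', implode(' ', $sentences), -1, PREG_SPLIT_NO_EMPTY);
    $wordRanks = array_count_values($allWords);

    // 3. Sentence rank = sum of the ranks of its words.
    $ranks = array();
    foreach ($sentences as $i => $sentence) {
        $ranks[$i] = 0;
        foreach (preg_split('/\s+/u', $sentence, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            $ranks[$i] += $wordRanks[$word];
        }
    }

    // 4. Interpret $int as a sentence count or as a percentage.
    $count = ($mode === 'rate') ? (int) ceil(count($sentences) * $int / 100) : $int;
    $count = max(0, min(count($sentences), $count));

    // 5. Keep the top-ranked sentences, restored to document order.
    arsort($ranks);
    $keep = array_slice(array_keys($ranks), 0, $count);
    sort($keep);
    $summary = array();
    foreach ($keep as $i) {
        $summary[] = $sentences[$i];
    }
    return implode('. ', $summary);
}
```

Restoring document order in step 5 keeps the summary readable; whether the class does the same is not stated here.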



Documentation generated on Sat, 14 Aug 2010 13:23:52 -0700 by phpDocumentor 1.4.0