I18N_Arabic

Procedural File: AutoSummarize.php

Source Location: /Arabic/AutoSummarize.php

Classes:

I18N_Arabic_AutoSummarize
This PHP class does automatic keyphrase extraction to provide a quick mini-summary for a long Arabic document.


Page Details:

----------------------------------------------------------------------

Copyright (c) 2006-2016 Khaled Al-Sham'aa.

http://www.ar-php.org

PHP Version 5

----------------------------------------------------------------------

LICENSE

This program is an open source product; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License (LGPL) as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see <http://www.gnu.org/licenses/lgpl.txt>.

----------------------------------------------------------------------

Class Name: Arabic Auto Summarize Class

Filename: AutoSummarize.php

Original Author(s): Khaled Al-Sham'aa <khaled@ar-php.org>

Purpose: Automatic keyphrase extraction to provide a quick mini-summary for a long Arabic document.

----------------------------------------------------------------------

Arabic Auto Summarize

This class identifies the key points in an Arabic document for you to share with others or quickly scan. The class determines key points by analyzing an Arabic document and assigning a score to each sentence. Sentences that contain words used frequently in the document are given a higher score. You can then choose a percentage of the highest-scoring sentences to display in the summary. The "ArAutoSummarize" class works best on well-structured documents such as reports, articles, and scientific papers.
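The final selection step described above (keeping a percentage of the highest-scoring sentences) can be sketched in PHP as follows. This is a minimal illustration under stated assumptions, not the class's own code: the helper name pickByRate() is hypothetical, the sentence scores are assumed to be already computed, and the number of sentences to keep is assumed to be rounded up. The chosen sentences are emitted in their original order so that the summary still reads like the source document.

```php
<?php
// Hypothetical helper: keep the top $rate percent of sentences by score,
// then return them in their original document order.
// Assumptions: $scores maps sentence index => score (higher is better);
// the keep count is rounded up, so any non-zero rate keeps at least one sentence.
function pickByRate(array $sentences, array $scores, $rate)
{
    $keep = (int) ceil(count($sentences) * $rate / 100);
    arsort($scores);                                   // highest score first
    $top = array_slice(array_keys($scores), 0, $keep); // indexes of the best sentences
    sort($top);                                        // restore document order
    $summary = array();
    foreach ($top as $i) {
        $summary[] = $sentences[$i];
    }
    return implode(' ', $summary);
}

$sentences = array('First.', 'Second.', 'Third.', 'Fourth.', 'Fifth.');
$scores = array(0 => 0.4, 1 => 2.5, 2 => 0.9, 3 => 1.7, 4 => 0.2);
echo pickByRate($sentences, $scores, 40), "\n"; // 40% of 5 sentences = 2 sentences
```

With a 40 percent rate, two of the five sentences are kept (indexes 1 and 3), so the call prints `Second. Fourth.`.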

The "ArAutoSummarize" class cuts wordy copy to the bone by counting words and ranking sentences. First, it identifies the most common words in the document and assigns a "score" to each word: the more frequently a word is used, the higher the score.
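Counting words to produce these scores might look like the sketch below. It is an illustration built on assumptions, not the library's implementation: words are taken to be runs of Unicode letters and combining marks (so Arabic diacritics stay attached to their word), no stemming is done, and $stopWords is a hypothetical parameter for common function words that would otherwise dominate the counts.

```php
<?php
// Hypothetical helper: score each word by how often it occurs in the text.
// Assumptions: a word is a run of Unicode letters (\p{L}) and marks (\p{M});
// words listed in $stopWords are ignored; there is no stemming.
function wordScores($text, array $stopWords = array())
{
    $words = preg_split('/[^\p{L}\p{M}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $skip = array_flip($stopWords);
    $scores = array();
    foreach ($words as $w) {
        if (isset($skip[$w])) {
            continue;
        }
        $scores[$w] = isset($scores[$w]) ? $scores[$w] + 1 : 1;
    }
    arsort($scores); // most frequent words first
    return $scores;
}

$doc = 'الشمس نجم. الشمس مصدر الضوء. القمر يعكس ضوء الشمس.';
print_r(wordScores($doc));
```

In the sample, 'الشمس' occurs three times and every other word once, so it receives the highest score. Note that 'الضوء' and 'ضوء' are counted separately because the sketch does not strip the definite article.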

Then it "averages" each sentence by adding the scores of its words and dividing the sum by the number of words in the sentence; the higher the average, the higher the rank of the sentence. The "ArAutoSummarize" class can summarize a text to a specific number of sentences or to a percentage of the original copy.
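The averaging step can be sketched the same way. In this hypothetical rankSentences() helper, a sentence is assumed to end at '.', '!', '?' or the Arabic question mark '؟' followed by whitespace, and a word's score is assumed to be its plain occurrence count over the whole text. It returns the sentences together with their average scores, highest first; taking the first N indexes gives a summary of a fixed number of sentences.

```php
<?php
// Hypothetical helper: rank sentences by the average frequency score of their words.
// Assumptions: sentences end at '.', '!', '?' or '؟' followed by whitespace;
// word scores are plain counts over the whole text (no stop words, no stemming).
function rankSentences($text)
{
    $sentences = preg_split('/(?<=[.!?؟])\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $freq = array_count_values(
        preg_split('/[^\p{L}\p{M}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY)
    );
    $ranked = array();
    foreach ($sentences as $i => $sentence) {
        $words = preg_split('/[^\p{L}\p{M}]+/u', $sentence, -1, PREG_SPLIT_NO_EMPTY);
        if (count($words) === 0) {
            continue; // nothing to score
        }
        $sum = 0;
        foreach ($words as $w) {
            $sum += $freq[$w];
        }
        // average = sum of the word scores / number of words in the sentence
        $ranked[$i] = $sum / count($words);
    }
    arsort($ranked); // highest average first
    return array($sentences, $ranked);
}

$doc = 'الشمس نجم. الشمس مصدر الضوء. القمر يعكس ضوء الشمس.';
list($sentences, $ranked) = rankSentences($doc);
foreach (array_slice(array_keys($ranked), 0, 2) as $i) {
    echo $sentences[$i], "\n"; // the two highest-ranked sentences
}
```

For the three-sentence sample the averages are 2, 5/3 and 1.5 (because 'الشمس' occurs three times), so the first two sentences are printed. Dividing by the sentence length keeps long sentences from winning merely because they contain more words.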

We use a statistical approach, with some attention also paid to:

  • Location: leading sentences of paragraphs, title, introduction, and conclusion.
  • Fixed phrases: in-text summaries.
  • Frequencies of words, phrases, and proper names.
  • Contextual material: query, title, headline, and initial paragraph.
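The text above does not say how location is weighted. One possible reading, sketched below purely as an assumption, is to find the first sentence of every paragraph (paragraphs separated by blank lines) and multiply its score by a fixed factor. The functions leadSentenceIndexes() and applyLocationBonus() and the default factor of 1.5 are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: return the index of the first sentence of each paragraph.
// Assumptions: paragraphs are separated by blank lines; sentences end at
// '.', '!', '?' or '؟' followed by whitespace. Indexes count sentences
// across the whole text, matching a flat list of sentence scores.
function leadSentenceIndexes($text)
{
    $paragraphs = preg_split('/\R\s*\R/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $leads = array();
    $index = 0;
    foreach ($paragraphs as $p) {
        $sentences = preg_split('/(?<=[.!?؟])\s+/u', trim($p), -1, PREG_SPLIT_NO_EMPTY);
        $leads[] = $index;           // first sentence of this paragraph
        $index += count($sentences); // skip to the next paragraph's first sentence
    }
    return $leads;
}

// Hypothetical helper: boost the scores of paragraph-leading sentences.
function applyLocationBonus(array $scores, array $leads, $factor = 1.5)
{
    foreach ($leads as $i) {
        if (isset($scores[$i])) {
            $scores[$i] *= $factor;
        }
    }
    return $scores;
}

$text = "الشمس نجم. الشمس مصدر الضوء.\n\nالقمر يعكس ضوء الشمس.";
$scores = array(0 => 2, 1 => 1.0, 2 => 1.5); // e.g. per-sentence averages
print_r(applyLocationBonus($scores, leadSentenceIndexes($text)));
```

Applying the bonus as a multiplier keeps it proportional to the frequency score, so a leading sentence with no frequent words is still not promoted very far.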
The motivation for this class is the range of applications for key phrases:

  • Mini-summary: Automatic key phrase extraction can provide a quick mini-summary for a long document. For example, it could be a feature of a web site: just click the summarize button when browsing a long web page.
  • Highlights: It can highlight key phrases in a long document, to facilitate skimming the document.
  • Author Assistance: Automatic key phrase extraction can help an author or editor who wants to supply a list of key phrases for a document. For example, the administrator of a web site might want to have a key phrase list at the top of each web page. The automatically extracted phrases can be a starting point for further manual refinement by the author or editor.
  • Text Compression: On a device with limited display capacity or limited bandwidth, key phrases can be a substitute for the full text. For example, an email message could be reduced to a set of key phrases for display on a pager; a web page could be reduced for display on a portable wireless web browser.
This list is not intended to be exhaustive, and there may be some overlap in the items.

Example:

  include('./I18N/Arabic.php');
  $obj = new I18N_Arabic('AutoSummarize');

  $file = 'Examples/Articles/Ajax.txt';
  // 20: keyword count for getMetaKeywords() and
  // summary rate (percent) for doRateSummarize()
  $r = 20;

  // get contents of a file into a string
  $fhandle = fopen($file, "r");
  $c = fread($fhandle, filesize($file));
  fclose($fhandle);

  // extract the document keywords
  $k = $obj->getMetaKeywords($c, $r);
  echo '<b><font color=#FFFF00>';
  echo 'Keywords:</font></b>';
  echo '<p dir="rtl" align="justify">';
  echo $k . '</p>';

  // summarize the document to $r percent of its sentences
  $s = $obj->doRateSummarize($c, $r);
  echo '<b><font color=#FFFF00>';
  echo 'Summary:</font></b>';
  echo '<p dir="rtl" align="justify">';
  echo $s . '</p>';

  // link to the full source text
  echo '<b><font color=#FFFF00>';
  echo 'Full Text:</font></b>';
  echo '<p><a class=ar_link target=_blank ';
  echo 'href=' . $file . '>Source File</a></p>';

Tags:

author:  Khaled Al-Sham'aa <khaled@ar-php.org>
copyright:  2006-2016 Khaled Al-Sham'aa
link:  http://www.ar-php.org
license:  LGPL

Documentation generated on Fri, 01 Jan 2016 10:25:50 +0200 by phpDocumentor 1.4.0