NLP Toolkit for Arabic Language

  • Detect Dialect (e.g., 'انا هاخد ده لو سمحت' -> ['Egypt', 0.984]).

  • Detect Emotion (e.g., 'الله عليكي و انتي دائما مفرحانا' -> ['happy', 0.879]).

  • Detect Gender (e.g., 'الله عليكي و انتي دائما مفرحانا' -> ['female', 0.926]).

  • Named-Entity Recognition: recognise whether a word in the text represents a person, a location, or another kind of name.

  • Text Classification / Topic Categories: classify text according to given criteria (e.g., toxic-word and hate-speech filtering).

To-Do List

  • Develop an Arabic version of the PHP similar_text function that handles the Harakat (diacritics) issue properly.

  • Implement ALA-LC Arabic Romanization and test it via http://romanize-arabic.camel-lab.com/.

  • Set up PHP in GitHub Actions for CI/CD (e.g. php-actions/phpunit).

  • Ensure coding standards in the documentation (PSR-5).

  • Improve error handling by using exceptions, as @atmonshi suggested in this pull request, and update the related phpdoc by adding the @throws tag. We may extend the general exception class like this: class ArphpException extends Exception { }, then throw an exception like this: throw new ArphpException('Customized error message');

  • Enhance the example scripts by calling the following methods: arSummaryLoadExtra, setQueryArrFields, swapAf, arabizi, dms2dd, dd2dms, dd2olc, olc2dd.

  • Use degree modifiers to alter sentiment intensity (e.g., intensity boosters such as "very" and intensity dampeners such as "kind of").
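The exception-handling item above can be sketched as follows. This is a minimal illustration, not the library's actual code: ArphpException is the class name proposed in the to-do item, while checkLatitude() is a hypothetical helper invented here only to show where the @throws tag goes. The type declarations assume PHP 7.0 or later.

```php
<?php
// Hypothetical sketch: checkLatitude() is NOT part of Ar-PHP; it only
// demonstrates the throw / @throws / catch pattern described above.

class ArphpException extends Exception
{
}

/**
 * Validate a latitude given in decimal degrees.
 *
 * @param float $lat Latitude in decimal degrees
 *
 * @return float The same value when it is valid
 * @throws ArphpException When the value is outside the range -90..90
 */
function checkLatitude(float $lat): float
{
    if ($lat < -90.0 || $lat > 90.0) {
        throw new ArphpException('Latitude must be between -90 and 90 degrees');
    }

    return $lat;
}

try {
    checkLatitude(123.0);
} catch (ArphpException $e) {
    // Library-specific errors can be caught separately from other exceptions.
    echo 'Caught: ' . $e->getMessage() . PHP_EOL;
}
```

Extending the base Exception class lets callers catch library errors specifically while still allowing a generic catch (Exception $e) to handle them.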

Performance Improvement Tips (lessons learned)

  • The json_decode parser is faster than SimpleXML because JSON only describes nested sequences of strings, with no need to provide a DOM interface or parse attributes.

  • If you use array_push() to add a single element to an array, it is better to use $array[] = instead, because that avoids the overhead of a function call.

  • Setting the internal character encoding once before calling any mbstring functions is much faster than passing the encoding parameter on every call if you are using a PHP version < 7.3! (bug report)

  • Replace foreach loops with array functions (array_map, array_filter, array_walk, etc.) whenever possible.

  • Writing $row['id'] processes 7 times faster than $row[id] ;-)

  • While str_replace is faster than preg_replace, the strtr function is four times faster than str_replace.
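The tips above are easy to check on your own setup. The following is a rough micro-benchmark sketch, not a rigorous measurement: timeIt() is a hypothetical helper written for this example, the sample text and replacement map are invented, and the timings will vary with PHP version and hardware, so no particular ratio is claimed here.

```php
<?php
// Illustrative micro-benchmark; timeIt() is a hypothetical helper.

/**
 * Run a callable $iterations times and return the elapsed seconds.
 */
function timeIt(callable $fn, int $iterations = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }

    return microtime(true) - $start;
}

$text    = 'The quick brown fox jumps over the lazy dog';
$map     = ['quick' => 'slow', 'brown' => 'white', 'lazy' => 'busy'];
$search  = array_keys($map);
$replace = array_values($map);

// For this non-overlapping map both functions produce the same string.
$viaStrtr      = strtr($text, $map);
$viaStrReplace = str_replace($search, $replace, $text);

$tStrtr = timeIt(function () use ($text, $map) {
    strtr($text, $map);
});
$tStrReplace = timeIt(function () use ($text, $search, $replace) {
    str_replace($search, $replace, $text);
});

// Appending one element with [] versus array_push().
$tBracket = timeIt(function () {
    $a = [];
    $a[] = 1;
});
$tPush = timeIt(function () {
    $a = [];
    array_push($a, 1);
});

printf("strtr:        %.4f s\n", $tStrtr);
printf("str_replace:  %.4f s\n", $tStrReplace);
printf("\$array[] =:   %.4f s\n", $tBracket);
printf("array_push(): %.4f s\n", $tPush);
```

Note that strtr with an array replaces all keys in a single pass (longest match first), whereas str_replace applies its replacements one after another, so the two can give different results when one replacement produces text that matches another search key.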

Good Resources

Logistics

Git and GitHub

Download and install Git from git-scm.com, then right-click inside your project folder and choose "Git Bash Here".

We started using GitHub Desktop to make interactions with Git and GitHub easier and more productive. We made this decision because of the token authentication requirements for Git operations that GitHub.com announced in July 2020 (please note that GitHub.com stopped accepting account passwords for Git operations on August 13, 2021).

Import a new project repository hosted on GitHub.com (e.g. owner/repository):

git init
git config --global user.name "Your Name"
git config --global user.email "email@example.com"

git remote add origin https://github.com/owner/repository
git pull origin master

Create and push a new commit:

git add .
git commit -m "modification message"
git pull origin master
git push origin master

You can include #xxx in your commit message to reference/link it to the issue number on GitHub.

Composer and Packagist

Composer: A Dependency Manager for PHP. Download and install the Composer-Setup.exe from here.

Packagist: The PHP Package Repository.

PHP Code Sniffer

Check for standards and compatibility using PHP Code Sniffer.

composer global require squizlabs/php_codesniffer --dev

phpcs Arabic.php --standard=PSR1
phpcs Arabic.php --standard=PSR12

Note: You can use the phpcbf command to automatically correct coding standard violations when possible.

phpcbf Arabic.php --standard=PSR1
phpcbf Arabic.php --standard=PSR12

Get the PHPCompatibility coding standard for PHP CodeSniffer by downloading the latest release from here, then unzip it into an arbitrary directory (e.g. inside c:\XAMPP).

phpcs --config-set installed_paths C:\xampp\PHPCompatibility

phpcs -p Arabic.php --standard=PHPCompatibility --runtime-set testVersion 5.6-

PHPUnit Testing Framework

PHPUnit is a programmer-oriented testing framework for PHP. It is an instance of the xUnit architecture for unit testing frameworks.

Simply download the PHAR distribution of PHPUnit 9 from here, then copy it inside the root directory of the library.

The following command line will execute all the automated tests:

php phpunit.phar --bootstrap ./src/Arabic.php --testdox tests

Xdebug: Debugging tool for PHP

Xdebug is an extension for PHP to assist with debugging and development. It contains a profiler and provides code coverage functionality for use with PHPUnit. Follow these instructions to get Xdebug installed.

The following command line tells PHPUnit to include the code coverage report (more info):

php phpunit.phar --bootstrap ./src/Arabic.php --testdox tests --coverage-filter ./src/Arabic.php --coverage-html coverage

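Set up the Xdebug profiler by adding the following lines to the php.ini file (these are the Xdebug 2 setting names; Xdebug 3 replaces the first two with xdebug.mode = profile, xdebug.start_with_request = trigger and xdebug.output_dir):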

xdebug.profiler_enable_trigger = 1
xdebug.profiler_output_dir = \path\to\save\profiles
xdebug.profiler_output_name = callgrind.out.%u.%H_%R

You can then selectively enable the profiler by adding "XDEBUG_PROFILE=1" to the example URL, for example:

http://localhost/ar-php/examples/strtotime.php?XDEBUG_PROFILE=1

After a profile information file has been generated you can open it with the KCacheGrind tool for Linux users or QCacheGrind for Windows users.

Note: You can find some nice video tutorials by Derick Rethans (the Xdebug author), available here (Xdebug 3 Profiling and Analysing Xdebug 3 Profiling Data).

PHPStan Static Analysis Tool

PHPStan is a PHP static analysis tool that finds bugs in your code without writing tests. It catches whole classes of bugs even before you write tests for the code. You can download the latest PHAR distribution of PHPStan from here, then copy it inside the root directory of the library.

To let PHPStan analyse your codebase, you have to use the analyse command and point it to the right directories. For example, you can run the following command line that checks scripts inside the src directory up to rule level 6 (more about rule levels):

php phpstan.phar analyse -l 6 src

Extra Utilities for Code Reviews

  • Insphpect is an automated code review tool which identifies inflexibilities in PHP code and helps you write better software.

PHP Archive (phar)

The phar extension provides a way to put entire PHP applications into a single file called a "phar" (PHP Archive) for easy distribution and installation.

In order to create and modify Phar files, the php.ini setting phar.readonly must be set to Off, then we can create the "ArPHP.phar" file using the following code:

$p = new Phar('ArPHP.phar', 0, 'ArPHP.phar');

$p->startBuffering();

$p->buildFromDirectory('\path\to\ArPHP\src');

$p->stopBuffering();

Finally, you can include this library into your script like this:

require 'phar://path/to/ArPHP.phar/Arabic.php';

$obj = new \ArPHP\I18N\Arabic();

echo $obj->version;
echo $obj->int2str(1975);
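Because creating a Phar fails when phar.readonly is On, it can help to check that setting before building. The following is a minimal sketch under stated assumptions: buildPhar() is a hypothetical helper, and the demo packs a throwaway one-file directory from the system temp folder rather than the real ArPHP sources.

```php
<?php
// Sketch only: buildPhar(), pharHello() and the temporary demo files are
// hypothetical names used for illustration.

/**
 * Build a Phar archive from a directory.
 * Returns false without writing anything when phar.readonly is On.
 */
function buildPhar(string $srcDir, string $pharPath): bool
{
    if (!Phar::canWrite()) {
        return false;
    }

    $phar = new Phar($pharPath, 0, basename($pharPath));
    $phar->startBuffering();
    $phar->buildFromDirectory($srcDir);
    $phar->stopBuffering();

    return true;
}

// Demo: create a tiny "library" in a temporary directory.
$srcDir = sys_get_temp_dir() . '/phar_demo_src_' . uniqid();
mkdir($srcDir);
file_put_contents(
    $srcDir . '/Hello.php',
    '<?php function pharHello() { return "hello from phar"; }'
);

$pharPath = sys_get_temp_dir() . '/phar_demo_' . uniqid() . '.phar';
$built    = buildPhar($srcDir, $pharPath);

if ($built) {
    // Files inside the archive are addressed through the phar:// wrapper.
    require 'phar://' . $pharPath . '/Hello.php';
    echo pharHello(), PHP_EOL;
} else {
    echo 'phar.readonly is On; try: php -d phar.readonly=0 script.php', PHP_EOL;
}
```

Passing -d phar.readonly=0 on the command line is a convenient way to build the archive without editing php.ini permanently.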

Simple PHP Minifier

Strip comments and whitespace while preserving newlines. A compressed library file is ideal for production environments, since this typically reduces the size of the file by ~50%.

You can use the following sed (Linux stream editor) command to create a minified version of the main Arabic.php script:

sed "/^\s*\*/d" Arabic.php | sed "/^\s*\/\//d" | sed "/^\s*\/\*/d" | sed "/^\s*$/d" | sed -e "s/\s*=\s*/=/g" | sed -e "s/^\s*//g" > Arabic.min.php
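Where sed is not available (for example on a plain Windows setup), PHP's built-in php_strip_whitespace() function offers a comparable result. The sketch below is an alternative technique, not the sed pipeline above: it removes comments but also collapses newlines, so the output is not line-for-line identical. minifyFile() and the temporary sample file are hypothetical names used only for this illustration.

```php
<?php
// Sketch only: minifyFile() is a hypothetical helper built on the
// php_strip_whitespace() built-in.

/**
 * Write a comment-free, whitespace-collapsed copy of $source to $target
 * and return the minified code.
 */
function minifyFile(string $source, string $target): string
{
    $minified = php_strip_whitespace($source);
    file_put_contents($target, $minified);

    return $minified;
}

// Demo on a small temporary file instead of the real Arabic.php.
$source = tempnam(sys_get_temp_dir(), 'src');
$target = tempnam(sys_get_temp_dir(), 'min');

$original = <<<'PHP'
<?php
/**
 * Doc comment that should be removed.
 */
function add($a, $b)
{
    // Inline comment that should be removed.
    return $a + $b;
}
PHP;

file_put_contents($source, $original);
$minified = minifyFile($source, $target);

printf("Original: %d bytes, minified: %d bytes\n", strlen($original), strlen($minified));

// Remove the temporary files.
unlink($source);
unlink($target);
```

The same function is exposed on the command line as php -w Arabic.php > Arabic.min.php.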

phpDocumentor

phpDocumentor analyzes your code to create great documentation. To install it as a PHAR file, download the phar binary from here, then save it in an arbitrary directory (e.g. inside c:\XAMPP).

php C:\xampp\phpDocumentor.phar -f Arabic.php -t ../docs/ --visibility="public" --title="Ar-PHP"

Benchmarking Tool

ab is a tool for benchmarking your Apache Hyper-Text Transfer Protocol (HTTP) server. In particular, it shows how many requests per second your current Apache installation can serve for a given script.

The following command line sends 1000 requests to the numbers example script (50 requests at a time) and reports the related statistics:

\path\to\apache\bin\ab -n 1000 -c 50 http://localhost/ar-php/examples/numbers.php

Test Against QA Releases (e.g., PHP 8.0 RC2)

  • Get the binary build of PHP (e.g. for Windows: https://windows.php.net/qa), then unzip it into the directory of your choice.

  • Rename the "php.ini-development" file to "php.ini", then edit it to enable the "mbstring" extension:

extension=./ext/php_mbstring.dll

  • Open your shell (e.g. CMD or PowerShell), change into your unzipped PHP folder, then start the PHP built-in web server (the -t parameter specifies the document root directory):

php -S localhost:8000 -t C:\xampp\htdocs\

Note: If you get an error message telling you that the VC runtime is not compatible with this PHP build, make sure to install the required version of the Microsoft Visual C++ Redistributable package (e.g. for PHP 8.0 you need the Visual Studio 2019 package, which can be downloaded from here).

Carbon: Sharing code examples

The Carbon website/service can be used to create and share beautiful images of your source code. Start typing or drop a file into the text area to get started.

You can also use NoPaste, an open-source website similar to Pastebin where you can store any piece of code and generate links for easy sharing.

A Browser-Based VSCode Project Viewer

There are two options:

  1. GitHub is making Codespaces available to Team and Enterprise Cloud plans on github.com. Codespaces provides software teams with a faster, more collaborative development environment in the cloud. Have you tried github.dev yet (i.e., change the GitHub URL from ".com" to ".dev")? Just press the dot key ;-)

  2. Github1s lets you explore GitHub source code in a web version of VSCode simply by adding 1s after "github" in the URL, for example: https://github1s.com/khaled-alshamaa/ar-php/blob/HEAD/src/Arabic.php