Al-Kashi Project
This is one of the experimental products developed in the Ar-PHP project labs.
Al-Kashi was one of the greatest mathematicians of the Islamic world.

In French, the law of cosines is called Théorème d'Al-Kashi (Theorem of Al-Kashi), because al-Kashi was the first to give an explicit statement of the law of cosines in a form suitable for triangulation. In one of his numerical approximations of π, he correctly computed 2π to 16 decimal places, far more accurately than any earlier estimate.


In the Al-Kashi project we aim to provide a rich PHP package of statistical functions for online business intelligence and data mining. Possible applications include online log file analysis, ad and campaign statistics, and on-the-fly analysis of survey or voting results. It is published under the GPL license; you can download it from the PHPClasses.org website, and you can check the change log here.

Khaled Al-Sham'aa

Would you like to know more about statistical concepts and procedures implemented in this project? Please download this free electronic book assembled from Wikipedia articles to get detailed background information.

For more information about this project in Arabic, please see these blog posts.

Example Data

The data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). You can download the example data file from here.

    // The example file is tab-separated, with one header row of column names
    $sep = "\t";
    $nl  = "\n";

    $content = file_get_contents('data.txt');

    $records = explode($nl, $content);
    $header  = explode($sep, trim(array_shift($records)));
    $data    = array_fill_keys($header, array());

    // Build one array of values per column, e.g. $data['wt'] and $data['mpg']
    foreach ($records as $record) {
        $record = trim($record);
        if ($record == '') continue;   // skip blank lines

        $fields = explode($sep, $record);
        $titles = $header;

        foreach ($fields as $field) {
            $title = array_shift($titles);
            $data[$title][] = $field;
        }
    }

    // Car weight (x) and fuel consumption (y) are used in most examples below
    $x = $data['wt'];
    $y = $data['mpg'];

    require('kashi.php');

    $kashi = new Kashi();

Summary Statistics:

Mean (x): 3.21725
Mean (x, "geometric"): 3.0701885671208
Mean (x, "harmonic"): 2.9182632148104
Median (x): 3.325
Mode (x): Array ( [0] => 3.44 )
Variance (x): 0.95737896774194
SD (x): 0.9784574429897
%CV (x): 30.412850819479
Skewness (x): 0.46591610679299
Is it significant (i.e. test it against 0)? bool(false)
Kurtosis (x): 0.41659466963493
Is it significant (i.e. test it against 0)? bool(false)

Rank (x)
9, 12, 7, 16, 18, 21, 23, 15, 13, 18, 18, 29, 25, 26, 30, 32, 31, 6, 2, 3, 8, 22, 17, 27, 28, 4, 5, 1, 14, 10, 23, 11

    // $x is an array of values (here, the car weights loaded above)
    echo 'Arithmetic Mean: ' . $kashi->mean($x) . PHP_EOL;
    echo 'Geometric Mean: ' . $kashi->mean($x, "geometric") . PHP_EOL;
    echo 'Harmonic Mean: ' . $kashi->mean($x, "harmonic") . PHP_EOL;
    echo 'Median: ' . $kashi->median($x) . PHP_EOL;
    // mode() returns an array; pass true so print_r() returns the text instead of printing it
    echo 'Mode: ' . print_r($kashi->mode($x), true) . PHP_EOL;
    echo 'Variance: ' . $kashi->variance($x) . PHP_EOL;
    echo 'SD: ' . $kashi->sd($x) . PHP_EOL;
    echo '%CV: ' . $kashi->cv($x) . PHP_EOL;
    echo 'Skewness: ' . $kashi->skew($x) . PHP_EOL;
    echo 'Is it significant (i.e. test it against 0)? ';
    var_dump($kashi->isSkew($x));
    echo 'Kurtosis: ' . $kashi->kurt($x) . PHP_EOL;
    echo 'Is it significant (i.e. test it against 0)? ';
    var_dump($kashi->isKurt($x));
    echo 'Rank (x): ' . implode(', ', $kashi->rank($x)) . PHP_EOL;
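The definitions behind these summary statistics can be illustrated in plain PHP. The sketch below is not Kashi's source code: the function names (`sketchMean`, `sampleVariance`, `coefficientOfVariation`, `minRank`) are hypothetical, and it assumes the n - 1 (sample) denominator for the variance and "min" ranking for ties. Both assumptions reproduce the output above; for example, the three cars weighing 3.44 all receive rank 18.

```php
<?php
// Illustrative sketch with hypothetical names; not the Kashi implementation.
// Assumptions: sample (n - 1) variance, ties share the lowest rank.

// Arithmetic, geometric or harmonic mean of a list of positive numbers.
function sketchMean(array $x, string $type = 'arithmetic'): float
{
    $n = count($x);
    switch ($type) {
        case 'geometric':
            // exp(mean of logs) avoids overflowing a long product
            return exp(array_sum(array_map('log', $x)) / $n);
        case 'harmonic':
            return $n / array_sum(array_map(fn($v) => 1 / $v, $x));
        default:
            return array_sum($x) / $n;
    }
}

// Sample variance: sum of squared deviations divided by n - 1.
function sampleVariance(array $x): float
{
    $mean = sketchMean($x);
    $ss = 0.0;
    foreach ($x as $v) {
        $ss += ($v - $mean) ** 2;
    }
    return $ss / (count($x) - 1);
}

// Coefficient of variation in percent: 100 * SD / mean.
function coefficientOfVariation(array $x): float
{
    return 100 * sqrt(sampleVariance($x)) / sketchMean($x);
}

// "Min" ranking: 1 + number of strictly smaller values, so ties share the lowest rank.
function minRank(array $x): array
{
    $ranks = [];
    foreach ($x as $v) {
        $smaller = 0;
        foreach ($x as $w) {
            if ($w < $v) {
                $smaller++;
            }
        }
        $ranks[] = $smaller + 1;
    }
    return $ranks;
}

echo implode(', ', minRank([3.44, 2.62, 3.44, 1.513])), PHP_EOL; // 3, 2, 3, 1
```

The double loop in `minRank()` is O(n²), which is fine for 32 cars; sorting first would scale better for large samples.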

Statistical Graphics:

Boxplot
Array
(
    [min] => 1.513
    [q1] => 2.62
    [median] => 3.325
    [q3] => 3.73
    [max] => 5.282
    [outliers] => Array
        (
            [0] => 5.345
            [1] => 5.424
        )

)
Histogram
Array
(
    [1.513-2.002] => 4
    [2.002-2.491] => 4
    [2.491-2.98] => 4
    [2.98-3.469] => 9
    [3.469-3.957] => 7
    [3.957-4.446] => 1
    [4.446-4.935] => 0
    [4.935-5.424] => 3
)
Normal Q-Q Plot
x = -0.62609901275838, -0.36012989155586, -0.83051087731871, -0.039176085543034, 0.27769043950814, 0.36012989155586, 0.62609901275838, -0.11776987461046, -0.27769043950814, 0.19709908415753, 0.11776987461046, 1.2298587580185, 0.72451438304624, 0.83051087731871, 1.417797139161, 2.1538746917937, 1.6759397215193, -0.94678175657479, -1.6759397215193, -1.417797139161, -0.72451438304624, 0.44509652516901, 0.039176085543034, 0.94678175657479, 1.0775155681381, -1.2298587580185, -1.0775155681381, -2.1538746917937, -0.19709908415753, -0.53340970683585, 0.53340970683585, -0.44509652516901

y = 2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2, 1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78
Ternary Plot
x = 0.729, 0.722, 0.734, 0.706, 0.695, 0.675, 0.659, 0.723, 0.701, 0.692, 0.679, 0.663, 0.676, 0.654, 0.577, 0.574, 0.625, 0.779, 0.785, 0.788, 0.716, 0.667, 0.664, 0.645, 0.691, 0.763, 0.766, 0.796, 0.689, 0.723, 0.672, 0.718

y = 0.356, 0.36, 0.369, 0.382, 0.376, 0.419, 0.407, 0.364, 0.406, 0.387, 0.408, 0.398, 0.395, 0.422, 0.463, 0.459, 0.403, 0.312, 0.317, 0.31, 0.394, 0.407, 0.417, 0.41, 0.368, 0.34, 0.323, 0.3, 0.375, 0.354, 0.381, 0.377

    echo 'Boxplot:' . PHP_EOL;
    print_r($kashi->boxplot($x));
    echo PHP_EOL . PHP_EOL;

    echo 'Histogram:' . PHP_EOL;
    print_r($kashi->hist($x, 8));   // split x into 8 bins
    echo PHP_EOL . PHP_EOL;

    // x: theoretical normal quantiles, y: the observed values
    echo 'Normal Q-Q Plot:' . PHP_EOL;
    $qq = $kashi->qqnorm($x);
    echo 'x = ' . implode(', ', $qq['x']) . PHP_EOL;
    echo 'y = ' . implode(', ', $qq['y']) . PHP_EOL;

    echo 'Ternary Plot:' . PHP_EOL;
    $xy = $kashi->ternary($data['wt'], $data['mpg'], $data['qsec']);
    echo 'x = ' . implode(', ', $xy['x']) . PHP_EOL;
    echo 'y = ' . implode(', ', $xy['y']) . PHP_EOL;
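Two of these graphics reduce to simple arithmetic. The sketch below works under assumptions inferred from the output above, not from Kashi's code. The histogram uses equal-width bins from min(x) to max(x), with the maximum counted in the last bin. The ternary plot maps a composition (a, b, c) = (wt, mpg, qsec) to x = (b + c/2) / s and y = (√3/2) · c / s, where s = a + b + c. Under these assumptions the first car (wt 2.62, mpg 21, qsec 16.46) lands at about (0.729, 0.356), which is the first point listed above. The function names are hypothetical.

```php
<?php
// Sketch with hypothetical names; bin and projection rules are inferred
// from the example output, not taken from the Kashi source.

// Equal-width histogram: returns "low-high" => count.
// Assumes at least two distinct values (otherwise the bin width is 0).
function sketchHist(array $x, int $bins): array
{
    $min = min($x);
    $max = max($x);
    $width = ($max - $min) / $bins;
    $counts = array_fill(0, $bins, 0);
    foreach ($x as $v) {
        // The maximum would fall into bin $bins, so clamp it into the last bin
        $i = min((int) floor(($v - $min) / $width), $bins - 1);
        $counts[$i]++;
    }
    $result = [];
    foreach ($counts as $i => $count) {
        $low  = round($min + $i * $width, 3);
        $high = round($min + ($i + 1) * $width, 3);
        $result["$low-$high"] = $count;
    }
    return $result;
}

// Project one composition (a, b, c) onto 2-D ternary-plot coordinates.
function sketchTernary(float $a, float $b, float $c): array
{
    $s = $a + $b + $c;
    return [
        'x' => ($b + $c / 2) / $s,
        'y' => (sqrt(3) / 2) * $c / $s,
    ];
}

print_r(sketchHist([1, 2, 2, 3, 4], 3));
print_r(sketchTernary(2.62, 21, 16.46));
```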

Correlation, Regression, and t-Test:

Covariance (x, y): -5.1166846774194
Correlation (x, y): -0.86765937651723
Significance of correlation: 1.2939593840855E-10
Path Analysis
Array
(
    [1] => -0.70763801614376
    [2] => -0.20274707094052
    [3] => 0.15145821845688
)
Regression (y = a + b*x)
Array
(
    [intercept] => 37.285126167342
    [slope] => -5.3444715727227
    [r-square] => 0.75283279365826
    [adj-r-square] => 0.74459388678021
    [intercept-se] => 1.8776273372559
    [intercept-2.5%] => 33.450499570026
    [intercept-97.5%] => 41.119752764658
    [slope-se] => 0.55910104509932
    [slope-2.5%] => -6.486308238383
    [slope-97.5%] => -4.2026349070623
    [F-statistic] => 91.375325003762
    [p-value] => 1.2939604943085E-10
)
Multiple Regression (y = a + b1*x1 + b2*x2)
Array
(
    [intercept] => 37.227270116447
    [b1] => -3.8778307424046
    [b2] => -0.031772946982161
    [r-square] => 0.82678545188279
    [adj-r-square] => 0.81483962097816
    [intercept-se] => 0
    [intercept-2.5%] => 37.227270116447
    [intercept-97.5%] => 37.227270116447
    [b1-se] => 0
    [b1-2.5%] => -3.8778307424046
    [b1-97.5%] => -3.8778307424046
    [b2-se] => 0
    [b2-2.5%] => -0.031772946982161
    [b2-97.5%] => -0.031772946982161
    [F-statistic] => 69.211213391777
    [p-value] => 9.1090543852236E-12
)
t-Test unpaired: -15.632569384303
Test of null hypothesis that mean of x = mean of y: probability is 5.5511151231258E-16
t-Test paired: -13.847209446072
Test of null hypothesis that mean of x-y = 0: probability is 8.1046280797636E-15

    echo 'Covariance: '  . $kashi->cov($x, $y) . PHP_EOL;
    echo 'Correlation: ' . $kashi->cor($x, $y) . PHP_EOL;

    // Significance of the correlation coefficient r for a sample of n pairs
    $r = $kashi->cor($x, $y);
    $n = count($x);
    echo 'Significance of correlation: ' . $kashi->corTest($r, $n) . PHP_EOL;

    echo 'Path Analysis: ' . print_r($kashi->path($y, array(1=>$x, $data['hp'], $data['qsec'])), true) . PHP_EOL;
    echo 'Regression: ' . print_r($kashi->lm($y, $x), true) . PHP_EOL;
    echo 'Multiple Regression: ' . print_r($kashi->lm($data['mpg'], $data['wt'], $data['hp']), true) . PHP_EOL;

    echo 't-Test unpaired: ' . $kashi->tTest($x, $y, false) . PHP_EOL;
    // Two independent samples (Student's pooled t-test): n1 + n2 - 2 degrees of freedom
    echo 'Test: ' . $kashi->tDist($kashi->tTest($x, $y, false), count($x) + count($y) - 2) . PHP_EOL;
    echo 't-Test paired: ' . $kashi->tTest($x, $y, true) . PHP_EOL;
    // Paired samples: n - 1 degrees of freedom, where n is the number of pairs
    echo 'Test: ' . $kashi->tDist($kashi->tTest($x, $y, true), count($x) - 1) . PHP_EOL;
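With a single predictor, the covariance, correlation and least-squares line all come from three sums of squares and cross-products. The plain-PHP sketch below illustrates this. It is not Kashi's `lm()`, which also reports standard errors, confidence limits and an F test. The function name is hypothetical, and the sketch assumes the sample (n - 1) covariance, which matches the Covariance (x, y) value above.

```php
<?php
// Sketch with a hypothetical name; not Kashi's lm(). Assumes 0-indexed,
// equal-length lists and the sample (n - 1) covariance.
function sketchSimpleRegression(array $x, array $y): array
{
    $n  = count($x);
    $mx = array_sum($x) / $n;
    $my = array_sum($y) / $n;
    $sxx = $syy = $sxy = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dx = $x[$i] - $mx;
        $dy = $y[$i] - $my;
        $sxx += $dx * $dx;   // sum of squares of x
        $syy += $dy * $dy;   // sum of squares of y
        $sxy += $dx * $dy;   // sum of cross-products
    }
    $slope = $sxy / $sxx;
    $r = $sxy / sqrt($sxx * $syy);
    return [
        'cov'       => $sxy / ($n - 1),
        'cor'       => $r,
        'intercept' => $my - $slope * $mx,   // the line passes through (mean x, mean y)
        'slope'     => $slope,
        'r-square'  => $r * $r,              // with one predictor, R^2 = r^2
    ];
}

print_r(sketchSimpleRegression([1, 2, 3, 4], [3, 5, 7, 9]));
```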

Distributions:

Normal distribution (x=0.5, mean=0, sd=1): 0.3520653267643
Probability for the Student t-distribution (t=3, n=10) one-tailed: 0.01334365502257
Probability for the Student t-distribution (t=3, n=10) two-tailed: 0.0066718275112848
Probability for F distribution (f=2, df1=12, df2=15): 0.10268840717083
Inverse of the standard normal cumulative distribution (p=0.95): 1.6448536251337
t-value of the Student's t-distribution for probability p and n degrees of freedom (p=0.05, n=29): 2.0452296438589

Standardize (x)
(mean=0 & variance=1)
-0.61039956748153, -0.34978526910097, -0.91700462439985, -0.002299537926887, 0.22765425476185, 0.24809459188973, 0.36051644609311, -0.027849959336746, -0.068730633592521, 0.22765425476185, 0.22765425476185, 0.8715248742903, 0.52403914311621, 0.57513998593593, 2.0775047648356, 2.2553356978483, 2.1745963661931, -1.0396466471672, -1.6375265081579, -1.4126827997511, -0.76881218022266, 0.3094156032734, 0.22254417047987, 0.63646099731959, 0.64157108160156, -1.3104811141117, -1.1009676585508, -1.7417722275101, -0.048290296464633, -0.45709703902238, 0.36051644609311, -0.44687687045844

    echo 'Normal distribution (x=0.5, mean=0, sd=1): ' . $kashi->norm(0.5, 0, 1) . PHP_EOL;
    echo 'Probability for the Student t-distribution (t=3, n=10) one-tailed: ' . $kashi->tDist(3, 10, 1) . PHP_EOL;
    echo 'Probability for the Student t-distribution (t=3, n=10) two-tailed: ' . $kashi->tDist(3, 10, 2) . PHP_EOL;
    echo 'F probability distribution (f=2, df1=12, df2=15): ' . $kashi->fDist(2, 12, 15) . PHP_EOL;
    echo 'Inverse of the standard normal cumulative distribution (p=0.95): ' . $kashi->inverseNormCDF(0.95) . PHP_EOL;
    echo 't-value of the Student\'s t-distribution (p=0.05, n=29): ' . $kashi->inverseTCDF(0.05, 29) . PHP_EOL;
    echo 'Standardize (x) (i.e. mean=0 & variance=1): ' . implode(', ', $kashi->standardize($x)) . PHP_EOL;
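PHP has no built-in error function, so normal-distribution values have to be approximated. The sketch below shows one way to do this and is not necessarily what Kashi does. It uses the Abramowitz and Stegun formula 7.1.26 for erf (absolute error below about 1.5e-7) and inverts the resulting CDF by bisection. It standardizes with the sample (n - 1) standard deviation, which matches the Standardize (x) output above. All function names are hypothetical.

```php
<?php
// Sketch with hypothetical names; an approximation, not the Kashi code.

// Density of N(mean, sd^2) at x.
function sketchNormPdf(float $x, float $mean = 0.0, float $sd = 1.0): float
{
    $z = ($x - $mean) / $sd;
    return exp(-0.5 * $z * $z) / ($sd * sqrt(2 * M_PI));
}

// erf(x) via Abramowitz & Stegun 7.1.26 (absolute error < 1.5e-7).
function sketchErf(float $x): float
{
    $sign = $x < 0 ? -1.0 : 1.0;
    $x = abs($x);
    $t = 1.0 / (1.0 + 0.3275911 * $x);
    $poly = $t * (0.254829592 + $t * (-0.284496736 + $t * (1.421413741
          + $t * (-1.453152027 + $t * 1.061405429))));
    return $sign * (1.0 - $poly * exp(-$x * $x));
}

// Standard normal CDF built on the approximate erf.
function sketchNormCdf(float $z): float
{
    return 0.5 * (1.0 + sketchErf($z / M_SQRT2));
}

// Inverse standard normal CDF by bisection on the (increasing) CDF.
function sketchInverseNormCdf(float $p): float
{
    $lo = -10.0;
    $hi = 10.0;
    for ($i = 0; $i < 100; $i++) {
        $mid = ($lo + $hi) / 2;
        if (sketchNormCdf($mid) < $p) {
            $lo = $mid;
        } else {
            $hi = $mid;
        }
    }
    return ($lo + $hi) / 2;
}

// Rescale to mean 0 and variance 1 using the sample (n - 1) SD.
function sketchStandardize(array $x): array
{
    $n = count($x);
    $mean = array_sum($x) / $n;
    $ss = 0.0;
    foreach ($x as $v) {
        $ss += ($v - $mean) ** 2;
    }
    $sd = sqrt($ss / ($n - 1));
    return array_map(fn($v) => ($v - $mean) / $sd, $x);
}

echo sketchNormPdf(0.5), PHP_EOL;
echo sketchInverseNormCdf(0.95), PHP_EOL;
```

Bisection is slow but simple and robust. A rational approximation of the inverse CDF would be faster if many quantiles are needed.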

Chi-square test or Contingency tables (A/B testing):

p-value of the chi-square test that the distribution of the number of cylinders is the same for automatic and manual transmission cars: 0.012646605046107

    // Observed counts: rows are transmission types, columns are cylinder counts
    $table = array();
    $table['Automatic'] = array('4 Cylinders' => 3, '6 Cylinders' => 4, '8 Cylinders' => 12);
    $table['Manual']    = array('4 Cylinders' => 8, '6 Cylinders' => 3, '8 Cylinders' => 2);

    $result      = $kashi->chiTest($table);
    $probability = $kashi->chiDist($result['chi'], $result['df']);
    echo 'Chi-square test probability: ' . $probability . PHP_EOL;
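Pearson's chi-square statistic compares each observed count with the count expected if rows and columns were independent, (row total × column total) / grand total. It has (rows - 1) × (columns - 1) degrees of freedom. The sketch below computes it for the same table; the function names are hypothetical. Its p-value helper uses the closed form that exists only for an even number of degrees of freedom, which covers this 2 × 3 table (df = 2). It gives a statistic of about 8.74 and a p-value of about 0.01265, in line with the result above. That match suggests no continuity correction is applied, but this is an inference, not something the Kashi documentation states.

```php
<?php
// Sketch with hypothetical names; not Kashi's chiTest()/chiDist().

// Pearson chi-square statistic and degrees of freedom for $table[row][column] => count.
function sketchChiTest(array $table): array
{
    $rowTotals = [];
    $colTotals = [];
    $total = 0;
    foreach ($table as $r => $row) {
        foreach ($row as $c => $count) {
            $rowTotals[$r] = ($rowTotals[$r] ?? 0) + $count;
            $colTotals[$c] = ($colTotals[$c] ?? 0) + $count;
            $total += $count;
        }
    }
    $chi = 0.0;
    foreach ($table as $r => $row) {
        foreach ($row as $c => $observed) {
            // Expected count under independence of rows and columns
            $expected = $rowTotals[$r] * $colTotals[$c] / $total;
            $chi += ($observed - $expected) ** 2 / $expected;
        }
    }
    $df = (count($rowTotals) - 1) * (count($colTotals) - 1);
    return ['chi' => $chi, 'df' => $df];
}

// Upper-tail probability P(X > x) for a chi-square variable with an EVEN
// number of degrees of freedom: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
function sketchChiDistEvenDf(float $x, int $df): float
{
    if ($df <= 0 || $df % 2 !== 0) {
        throw new InvalidArgumentException('This sketch handles even df only');
    }
    $half = $x / 2;
    $term = 1.0;
    $sum = 1.0;
    for ($k = 1; $k < $df / 2; $k++) {
        $term *= $half / $k;
        $sum += $term;
    }
    return exp(-$half) * $sum;
}

$table = [
    'Automatic' => ['4 Cylinders' => 3, '6 Cylinders' => 4, '8 Cylinders' => 12],
    'Manual'    => ['4 Cylinders' => 8, '6 Cylinders' => 3, '8 Cylinders' => 2],
];
$result = sketchChiTest($table);
echo sketchChiDistEvenDf($result['chi'], $result['df']), PHP_EOL;
```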

Diversity index:

Shannon index for number of forward gears: 1.0130227035447
Simpson index for number of cylinders: 0.357421875

    $gear = array('3' => 15, '4' => 12, '5' => 5);
    $cyl  = array('4' => 11, '6' => 7, '8' => 14);

    echo 'Shannon index for gear: ' . $kashi->diversity($gear) . PHP_EOL;
    echo 'Simpson index for cyl: ' . $kashi->diversity($cyl, 'simpson') . PHP_EOL;
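Both indices are built from the category proportions p_i = n_i / N. The values above match the Shannon index H = -Σ p_i ln p_i (natural logarithm). They also match the Simpson index in its Σ p_i² form rather than the 1 - Σ p_i² variant: for cylinders, (11² + 7² + 14²) / 32² = 366/1024 = 0.357421875. Here is a sketch under those assumed definitions, with a hypothetical function name:

```php
<?php
// Sketch with a hypothetical name. Assumed definitions (inferred from the
// example output): Shannon = -sum(p ln p), Simpson = sum(p^2).
function sketchDiversity(array $counts, string $index = 'shannon'): float
{
    $total = array_sum($counts);
    $result = 0.0;
    foreach ($counts as $n) {
        if ($n == 0) {
            continue; // empty categories contribute nothing
        }
        $p = $n / $total;
        $result += ($index === 'simpson') ? $p * $p : -$p * log($p);
    }
    return $result;
}

echo sketchDiversity(['3' => 15, '4' => 12, '5' => 5]), PHP_EOL;
echo sketchDiversity(['4' => 11, '6' => 7, '8' => 14], 'simpson'), PHP_EOL;
```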

Analysis of Variance (ANOVA):

Analysis of variance procedure (ANOVA)

Typical ANOVA example output (mpg ~ cyl):
ANOVA table
 
Variate: mpg

Source of 
variation   d.f.    s.s.        m.s.    v.r.    F pr.
cyl         2       824.78      412.39  39.70   <.001
Residual    29      301.26      10.39	 	 
Total       31      1126.05	 	 	 

 
Tables of means
 
Grand mean  20.09 
 
cyl     4       6       8
        26.66   19.74   15.10
rep.    11      7       14
 
Standard errors of means
 
e.s.e.  1.218	 min.rep
        0.861	 max.rep
 
Standard errors of differences of means
 
s.e.d.  1.723X	 min.rep
        1.218X	 max.rep

Least significant differences of means (5% level)

l.s.d.  3.524X	 min.rep
        2.492X	 max.rep
 
Stratum standard errors and coefficients of variation
 
d.f.    s.e.    cv%
29      3.223   16.0
 
Array
(
    [TDF] => 2
    [EDF] => 29
    [TotDF] => 31
    [SST] => 824.7845900974
    [SSE] => 301.2625974026
    [SSTot] => 1126.0471875
    [MST] => 412.3922950487
    [MSE] => 10.388365427676
    [VRT] => 39.697515255869
    [F] => 4.9789191744003E-9
    [Mean] => 20.090625
    [Means] => Array
        (
            [4] => 26.6636364
            [6] => 19.7428571
            [8] => 15.1000000
        )

    [Reps] => Array
        (
            [4] => 11
            [6] => 7
            [8] => 14
        )

    [SE] => Array
        (
            [min] => 1.2182168131961
            [max] => 0.86140936956643
        )

    [SED] => Array
        (
            [min] => 1.7228187391329
            [max] => 1.2182168131961
        )

    [LSD] => Array
        (
            [min] => 3.5235599562701
            [max] => 2.491533138996
        )

    [CV] => 16.042799717154
)

    require('kashi_anova.php');

    // Constructor signature: new KashiANOVA($dbname, $dbuser, $dbpass, $dbhost)
    $obj = new KashiANOVA('test', 'root', '', 'localhost');

    // Load the data from a text file
    $str = file_get_contents('anova_data.txt');
    $obj->loadString($str);

    // mpg ~ cyl: one-way ANOVA of mpg with cyl as the treatment factor
    $result = $obj->anova('cyl', 'mpg');
    print_r($result);
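The one-way ANOVA table above splits the total sum of squares into a between-groups part (the cyl row, SST) and a within-groups part (the Residual row, SSE). The variance ratio v.r. is MST / MSE. The sketch below computes those quantities from groups held in a PHP array. It is an illustration, not `KashiANOVA`, and it needs no database. The function name and input layout are assumptions, and the result keys only mirror the names in the output above. The F probability is left out because it requires the F distribution.

```php
<?php
// Sketch with a hypothetical name and input layout: $groups = [level => [observations...]].
// Not the KashiANOVA class; omits the F probability.
function sketchOneWayAnova(array $groups): array
{
    $all = array_merge(...array_values($groups));
    $grandMean = array_sum($all) / count($all);
    $sst = 0.0; // between groups (treatment)
    $sse = 0.0; // within groups (residual)
    $means = [];
    foreach ($groups as $level => $values) {
        $mean = array_sum($values) / count($values);
        $means[$level] = $mean;
        $sst += count($values) * ($mean - $grandMean) ** 2;
        foreach ($values as $v) {
            $sse += ($v - $mean) ** 2;
        }
    }
    $tdf = count($groups) - 1;             // treatment degrees of freedom
    $edf = count($all) - count($groups);   // residual degrees of freedom
    $mst = $sst / $tdf;
    $mse = $sse / $edf;
    return [
        'TDF' => $tdf, 'EDF' => $edf,
        'SST' => $sst, 'SSE' => $sse, 'SSTot' => $sst + $sse,
        'MST' => $mst, 'MSE' => $mse, 'VRT' => $mst / $mse,
        'Mean' => $grandMean, 'Means' => $means,
    ];
}

print_r(sketchOneWayAnova(['a' => [1, 2, 3], 'b' => [4, 5, 6]]));
```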

Cluster Analysis:

K-Means Clustering
Array
(
    [Dodge Challenger] => 0
    [Chrysler Imperial] => 0
    [Cadillac Fleetwood] => 0
    [Merc 450SLC] => 0
    [AMC Javelin] => 0
    [Camaro Z28] => 0
    [Maserati Bora] => 0
    [Ferrari Dino] => 0
    [Ford Pantera L] => 0
    [Pontiac Firebird] => 0
    [Merc 450SL] => 0
    [Lincoln Continental] => 0
    [Hornet Sportabout] => 0
    [Merc 450SE] => 0
    [Duster 360] => 0
    [Fiat X1-9] => 1
    [Porsche 914-2] => 1
    [Lotus Europa] => 1
    [Volvo 142E] => 1
    [Mazda RX4 Wag] => 1
    [Datsun 710] => 1
    [Hornet 4 Drive] => 1
    [Valiant] => 1
    [Merc 240D] => 1
    [Mazda RX4] => 1
    [Merc 280] => 1
    [Merc 280C] => 1
    [Merc 230] => 1
    [Fiat 128] => 1
    [Toyota Corolla] => 1
    [Honda Civic] => 1
    [Toyota Corona] => 1
)
Hierarchical Clustering
32	15	14	0.034867528963888
33	12	11	0.046511652279906
34	1	0	0.048063902847295
35	10	9	0.048146270217687
36	33	13	0.048374485470338
37	24	4	0.06456633193609
38	19	17	0.067898627038737
39	22	21	0.092305891561629
40	39	37	0.11301195978463
41	32	16	0.11529825256692
42	31	2	0.1155541020107
43	5	3	0.11717892926293
44	40	36	0.11995870908923
45	23	6	0.12445889917409
46	38	25	0.12703468709516
47	46	42	0.19819935352147
48	8	7	0.20845446781686
49	48	20	0.22553907135502
50	45	44	0.23476357897562
51	47	18	0.24068916220486
52	50	41	0.25528946686225
53	34	29	0.26595333894602
54	51	27	0.27674027068183
55	54	26	0.28056404941297
56	49	43	0.28521660028422
57	56	35	0.30779338554525
58	30	28	0.35715746216011
59	55	53	0.37801491177356
60	59	57	0.42234403985919
61	60	52	0.52592878486916
62	61	58	0.49319668374021

    require('kashi_cluster.php');
    $obj = new KashiCluster();
    $obj->dataLoad($data);

    // K-means with k = 2: returns a cluster number (0 or 1) for each car model
    $result = $obj->kMean(2);
    print_r($result);

    // The hierarchical tree output has no header and consists of four columns. For each
    // row, the first column is the identifier of the node, the second and third columns
    // are the identifiers of its two child nodes, and the fourth column determines the
    // height of the node when rendering the tree. In the example output, the 32 cars are
    // leaves 0-31 and merged nodes are numbered from 32 upward.
    $tree = $obj->hClust();
    echo PHP_EOL . $tree . PHP_EOL;
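K-means alternates two steps until the assignments stop changing: assign every observation to its nearest centroid, then move each centroid to the mean of its members. The sketch below is a minimal version of that procedure (Lloyd's algorithm) for numeric rows. It is not `KashiCluster::kMean()`, and the function name is hypothetical. It seeds the centroids with the first k rows, which is an assumption; many implementations seed randomly or with k-means++. As in the output above, the cluster numbers are arbitrary labels.

```php
<?php
// Sketch with a hypothetical name; not KashiCluster::kMean().
// Assumption: centroids are seeded with the first $k rows.
function sketchKMeans(array $points, int $k, int $maxIter = 100): array
{
    $points = array_values($points);
    $centroids = array_slice($points, 0, $k);
    $labels = array_fill(0, count($points), -1);

    for ($iter = 0; $iter < $maxIter; $iter++) {
        // Assignment step: nearest centroid by squared Euclidean distance
        $changed = false;
        foreach ($points as $i => $p) {
            $best = 0;
            $bestDist = INF;
            foreach ($centroids as $j => $c) {
                $d = 0.0;
                foreach ($p as $f => $v) {
                    $d += ($v - $c[$f]) ** 2;
                }
                if ($d < $bestDist) {
                    $bestDist = $d;
                    $best = $j;
                }
            }
            if ($labels[$i] !== $best) {
                $labels[$i] = $best;
                $changed = true;
            }
        }
        if (!$changed) {
            break; // converged
        }
        // Update step: move each centroid to the mean of its members
        foreach ($centroids as $j => $c) {
            $members = array_keys($labels, $j, true);
            if (count($members) === 0) {
                continue; // leave an empty cluster's centroid where it is
            }
            foreach ($c as $f => $unused) {
                $sum = 0.0;
                foreach ($members as $i) {
                    $sum += $points[$i][$f];
                }
                $centroids[$j][$f] = $sum / count($members);
            }
        }
    }
    return $labels;
}

print_r(sketchKMeans([[1.0], [1.2], [0.8], [10.0], [10.2], [9.8]], 2));
```

Because k-means only finds a local optimum, results can depend on the seeding; running it several times with different seeds is a common safeguard.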

Time Series Analysis:

Moving Average: 2.894, 3.062, 3.201, 3.375, 3.362, 3.362, 3.358, 3.458, 3.566, 3.692, 4.054, 4.4508, 4.7058, 4.3998, 3.9668, 3.2838, 2.692, 2.327, 2.574, 3.019, 3.421, 3.315, 3.039, 2.6546, 2.5206, 2.3056, 2.6326, 2.7606

    // Average of each window of 5 consecutive values
    echo 'Moving Average for x: ' . implode(', ', $kashi->movingAvg($x, 5)) . PHP_EOL;
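The output above is consistent with a simple trailing moving average. Each value is the mean of 5 consecutive observations, so 32 weights give 28 averages; the first is (2.62 + 2.875 + 2.32 + 3.215 + 3.44) / 5 = 2.894. Here is a running-sum sketch under that assumption (hypothetical function name):

```php
<?php
// Sketch with a hypothetical name: simple trailing moving average
// maintained with a running sum (assumed definition, inferred from the output).
function sketchMovingAvg(array $x, int $window): array
{
    $x = array_values($x);
    $result = [];
    $sum = 0.0;
    foreach ($x as $i => $v) {
        $sum += $v;
        if ($i >= $window) {
            $sum -= $x[$i - $window]; // drop the value that left the window
        }
        if ($i >= $window - 1) {
            $result[] = $sum / $window;
        }
    }
    return $result;
}

echo implode(', ', sketchMovingAvg([2.62, 2.875, 2.32, 3.215, 3.44, 3.46], 5)), PHP_EOL; // 2.894, 3.062
```

The running sum makes each step O(1) instead of re-adding the whole window; for very long series, periodic recomputation guards against accumulated rounding error.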

Matrix Functions:

     | 1   2 |         | 5   7 | 
 A = | 3   4 |  ,  B = | 6   8 |
A + B
 | 6     9 |
 | 9    12 |
B - A
 | 4     5 |
 | 3     4 |
A * 2
 | 2     4 |
 | 6     8 |
A * B
 | 17   23 |
 | 39   53 |
Transpose of B, t(B)
 | 5     6 |
 | 7     8 |
Determinant of A, |A| = -2
Cofactor of A
 |  4   -3 |
 | -2    1 |
Adjoint of A
 |  4   -2 |
 | -3    1 |
Inverse of A
 |  -2     1 |
 | 1.5  -0.5 |

    // Matrices are 1-based nested arrays: $A[row][column]
    $A[1][1] = 1;
    $A[1][2] = 2;
    $A[2][1] = 3;
    $A[2][2] = 4;

    $B[1][1] = 5;
    $B[1][2] = 7;
    $B[2][1] = 6;
    $B[2][2] = 8;

    echo 'A + B = ', print_r($kashi->mAddition($A, $B), true), PHP_EOL;
    echo 'B - A = ', print_r($kashi->mSubtraction($B, $A), true), PHP_EOL;
    echo 'A * 2 = ', print_r($kashi->mMultiplication($A, 2), true), PHP_EOL;
    echo 'A * B = ', print_r($kashi->mMultiplication($A, $B), true), PHP_EOL;
    echo 'Transpose of B, t(B) = ', print_r($kashi->mTranspose($B), true), PHP_EOL;
    echo 'Determinant of A, |A| = ', print_r($kashi->mDeterminant($A), true), PHP_EOL;
    echo 'Cofactor of A = ', print_r($kashi->mCofactor($A), true), PHP_EOL;
    echo 'Adjoint of A = ', print_r($kashi->mAdjoint($A), true), PHP_EOL;
    echo 'Inverse of A = ', print_r($kashi->mInverse($A), true), PHP_EOL;
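The last four results are linked: the adjoint is the transpose of the cofactor matrix, and the inverse is the adjoint divided by the determinant (-2 here). The sketch below derives them by cofactor expansion, using the same 1-based `$M[row][column]` layout as the example. The function names are hypothetical, and the sketch expects a square matrix of size 2 or more. Cofactor expansion takes O(n!) time, so it only suits small matrices like this 2 × 2 one.

```php
<?php
// Sketch with hypothetical names; not Kashi's mDeterminant()/mInverse().
// Matrices are 1-based: $M[row][column]; size must be at least 2 for sketchInverse().

// Minor: $M without row $r and column $c, re-indexed from 1.
function sketchMinor(array $M, int $r, int $c): array
{
    $minor = [];
    $i = 1;
    foreach ($M as $row => $cols) {
        if ($row === $r) {
            continue;
        }
        $j = 1;
        foreach ($cols as $col => $value) {
            if ($col !== $c) {
                $minor[$i][$j++] = $value;
            }
        }
        $i++;
    }
    return $minor;
}

// Determinant by Laplace (cofactor) expansion along the first row.
function sketchDeterminant(array $M): float
{
    $n = count($M);
    if ($n === 1) {
        return $M[1][1];
    }
    $det = 0.0;
    for ($c = 1; $c <= $n; $c++) {
        $det += (-1) ** (1 + $c) * $M[1][$c] * sketchDeterminant(sketchMinor($M, 1, $c));
    }
    return $det;
}

// Inverse = adjoint / determinant, where adjoint[j][i] = cofactor[i][j].
function sketchInverse(array $M): array
{
    $n = count($M);
    $det = sketchDeterminant($M);
    if ($det == 0.0) {
        throw new InvalidArgumentException('Matrix is singular');
    }
    $inv = [];
    for ($i = 1; $i <= $n; $i++) {
        for ($j = 1; $j <= $n; $j++) {
            $cofactor = (-1) ** ($i + $j) * sketchDeterminant(sketchMinor($M, $i, $j));
            $inv[$j][$i] = $cofactor / $det; // transpose while dividing
        }
    }
    return $inv;
}

$A = [1 => [1 => 1, 2 => 2], 2 => [1 => 3, 2 => 4]];
echo sketchDeterminant($A), PHP_EOL; // -2
print_r(sketchInverse($A));
```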

Solve System of Linear Equations:

System of Linear Equations

       2*X2 + X3 = 9
  X1 + 4*X2 - X3 = 23
 -X1 +   X2 + X3 = 2
Array
(
    [1] => 2
    [2] => 5
    [3] => -1
)

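	// Coefficients: one row per equation, 1-based keys for X1, X2, X3
	$X[1] = array(1=>0, 2, 1);
	$X[2] = array(1=>1, 4, -1);
	$X[3] = array(1=>-1, 1, 1);

	// Right-hand side of each equation
	$Y = array(1=>9, 23, 2);

	print_r($kashi->solve($X, $Y));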
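A standard way to solve such a system is Gaussian elimination with partial pivoting, followed by back substitution. The sketch below uses the same 1-based `$X`/`$Y` layout as the example. The function name is hypothetical, and this is not necessarily the method `solve()` uses internally.

```php
<?php
// Sketch with a hypothetical name; not necessarily how Kashi's solve() works.
// Solves X * b = Y; $X is n x n and $Y has n entries, both 1-based.
function sketchSolve(array $X, array $Y): array
{
    $n = count($Y);
    for ($k = 1; $k <= $n; $k++) {
        // Partial pivoting: use the row with the largest |X[i][k]| as the pivot
        $p = $k;
        for ($i = $k + 1; $i <= $n; $i++) {
            if (abs($X[$i][$k]) > abs($X[$p][$k])) {
                $p = $i;
            }
        }
        if ($X[$p][$k] == 0) {
            throw new InvalidArgumentException('Matrix is singular');
        }
        [$X[$k], $X[$p]] = [$X[$p], $X[$k]];
        [$Y[$k], $Y[$p]] = [$Y[$p], $Y[$k]];

        // Eliminate column k from the rows below the pivot
        for ($i = $k + 1; $i <= $n; $i++) {
            $factor = $X[$i][$k] / $X[$k][$k];
            for ($j = $k; $j <= $n; $j++) {
                $X[$i][$j] -= $factor * $X[$k][$j];
            }
            $Y[$i] -= $factor * $Y[$k];
        }
    }
    // Back substitution on the upper-triangular system
    $b = [];
    for ($i = $n; $i >= 1; $i--) {
        $sum = $Y[$i];
        for ($j = $i + 1; $j <= $n; $j++) {
            $sum -= $X[$i][$j] * $b[$j];
        }
        $b[$i] = $sum / $X[$i][$i];
    }
    ksort($b); // keys were filled from n down to 1
    return $b;
}

$X = [1 => [1 => 0, 2 => 2, 3 => 1], 2 => [1 => 1, 2 => 4, 3 => -1], 3 => [1 => -1, 2 => 1, 3 => 1]];
$Y = [1 => 9, 2 => 23, 3 => 2];
print_r(sketchSolve($X, $Y));
```

Pivoting matters here: the first equation has a zero coefficient for X1, so elimination without row swaps would divide by zero.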

To-do list:

Principal Component Analysis (PCA)
Multiple Linear Regression and Relative Weights
Analysis of Covariance
Extra Clustering Methods (i.e. Linkage Criteria)
Eigenvalues and Eigenvectors of Matrices
Export Graphics in SVG Format
 

Example Data Description (Motor Trend Car Road Tests):

Format: A data frame with 32 observations on 12 variables.

ID  Title  Description
1   model  Car model
2   mpg    Miles/(US) gallon
3   cyl    Number of cylinders
4   disp   Displacement (cu.in.)
5   hp     Gross horsepower
6   drat   Rear axle ratio
7   wt     Weight (lb/1000)
8   qsec   1/4 mile time
9   vs     Engine shape (0 = V-shaped, 1 = straight)
10  am     Transmission (0 = automatic, 1 = manual)
11  gear   Number of forward gears
12  carb   Number of carburetors

