Convert Microsoft to LINUX: Perl & Basic Statistics

This page serves as an introduction to some basic statistics and shows how Perl can be used to compute them, along with a general overview of Perl.

Much of basic statistics rests on the normal distribution, which is described by the Gaussian function, or bell curve. This is the distribution you would see if you dropped thousands of balls from the same spot and let them cascade through a lattice of square pegs. For a perfect, infinite, continuous distribution you can define concepts such as the mean, median, mode and standard deviation. In reality our data sets are always finite, so it is necessary to make some slight modifications to the definitions.
Mean = Sum of all values / N (number of Values)
Median = Value point where 50 % of the values are above and 50 % below
For a symmetric curve such as the bell curve, this is also the knife edge that you could balance the curve on. (In general the balance point is the Mean.)
Mode = The most frequently occurring value.
Sample Standard Deviation = SQRT( Sum_of( (X - Mean)**2 ) / (N - 1) )
So, putting this in words: take the difference between each value and the Mean and square it. Now sum this over all values. Now divide this result by N - 1, the number of values minus one, and take the square root. Note we must use N - 1 here because the Mean itself was estimated from the same finite sample, which uses up one degree of freedom. This is called Bessel's correction. If you had the entire population rather than a sample, you would divide by N, but we are finite beings, remember!

Here is the perlstat script to do everything in the last paragraph. Note that it contains one fancy construct, a hash built from an array. Other than that, all the routines are very straightforward. You can review how the map command generates a hash table from an array in the Perl tutorials referenced in the next section. It is under:
Data Structures: Scalars, Array, Hashes
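
The definitions above can be sketched in a few lines of Perl. This is a minimal illustration and not the perlstat script itself; the subroutine names and the sample data are my own assumptions, and the mode is found by counting values in a hash rather than with map.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mean: sum of all values divided by N.
sub mean {
    my @values = @_;
    my $sum = 0;
    $sum += $_ for @values;
    return $sum / @values;
}

# Median: middle value of the sorted list, or the average of the
# two middle values when N is even.
sub median {
    my @sorted = sort { $a <=> $b } @_;
    my $mid = int(@sorted / 2);
    return @sorted % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

# Mode: the most frequent value. Ties go to the smallest value
# (an arbitrary choice made for this sketch).
sub mode {
    my %count;
    $count{$_}++ for @_;
    my ($mode) = sort { $count{$b} <=> $count{$a} || $a <=> $b } keys %count;
    return $mode;
}

# Sample standard deviation, dividing by N - 1 (Bessel's correction).
sub sample_std_dev {
    my @values = @_;
    my $mean   = mean(@values);
    my $sum_sq = 0;
    $sum_sq += ($_ - $mean) ** 2 for @values;
    return sqrt($sum_sq / (@values - 1));
}

my @data = (2, 4, 4, 4, 5, 5, 7, 9);    # hypothetical sample data
printf "Mean:    %.3f\n", mean(@data);
printf "Median:  %.3f\n", median(@data);
printf "Mode:    %s\n",   mode(@data);
printf "Std dev: %.3f\n", sample_std_dev(@data);
```

The sketch assumes at least two numeric values; with fewer, the N - 1 divisor would be zero.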

General Overview of Perl


There are numerous good Perl tutorials on the web. Please look at this one for starters:   Perl Tutorial
Here are some of my favorite scripts which I use:

  1. This one is for archiving, that is, backing up important directories in your Desktop folder. It assumes that all these directories are contained under /Users/victor/Desktop , with one additional directory, /Users/victor/Sites . Please download the file do_tar_backup and use it to back up your favorite Desktop directories. Note that it uses an input file called files_to_archive that looks like this:
    /Users/victor/Desktop/SBC
    /Users/victor/Desktop/Deutsch
    /Users/victor/Desktop/Jobs
    /Users/victor/Desktop/Muttie
    /Users/victor/Desktop/VOCS
    /Users/victor/Desktop/email
    /Users/victor/Desktop/si
    /Users/victor/Sites

    To run this simply type do_tar_backup files_to_archive

    So how do I untar these files, that is, restore them? That is simple: just double-click the Sites.tar file on your Mac OS X or Linux system. Then drag out each individual file or the top level folder to your Desktop and you have all the files restored. Be aware that you may be overwriting current directories/folders when you do this.

  2. Here is a simpler one which just shows you how to run a standard UNIX/Linux command such as pwd to find out what directory you are in and all the names of the directories above you. It is available at: pwd_perl . Run it by typing pwd_perl

  3. Here is a sample which shows you how to do arithmetic functions such as addition and multiplication. It is available at: perlarith . Run it by typing perlarith 1 2 3 4 5

  4. Here is a teaching sample showing how the Perl sort function works. Additionally, I have included the code for the bubble sort. If you run it you will see the successive iterations work to create the final result. It is available at: perlbubble . Run it by typing perlbubble

  5. Here is a sample which shows you how to do some basic file processing and call subroutines. It is available at: perlprocfile . Run it by typing perlprocfile test_file . Here is the input file I used: test_file .
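
The bubble sort mentioned in item 4 can be sketched roughly as follows. This is not the perlbubble script; the subroutine name and the sample numbers are assumptions made for illustration. It prints the list after every pass so the successive iterations are visible, then compares the result with Perl's built-in sort.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bubble sort: repeatedly swap adjacent values that are out of order.
# After each pass the largest remaining value has "bubbled" to the end.
sub bubble_sort {
    my @list = @_;
    my $n    = @list;
    for my $pass (1 .. $n - 1) {
        my $swapped = 0;
        for my $i (0 .. $n - $pass - 1) {
            if ($list[$i] > $list[$i + 1]) {
                @list[$i, $i + 1] = @list[$i + 1, $i];    # swap via slice
                $swapped = 1;
            }
        }
        print "Pass $pass: @list\n";
        last unless $swapped;    # already sorted, stop early
    }
    return @list;
}

my @numbers   = (5, 3, 8, 1, 9, 2);              # hypothetical input
my @by_bubble = bubble_sort(@numbers);
my @by_sort   = sort { $a <=> $b } @numbers;     # built-in numeric sort
print "bubble_sort: @by_bubble\n";
print "sort:        @by_sort\n";
```

Note the `{ $a <=> $b }` block: without it, sort compares values as strings, so 10 would sort before 9.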

One can additionally run Perl scripts from an HTML web page, with the web server executing the script and returning its output. This is called CGI (Common Gateway Interface). To enable CGI you must have Perl installed on your Linux box or Mac OS X system. See Apache Setup

Note that on Mac OS X you must additionally edit a file called /etc/httpd/users/victor.conf . This assumes your username is victor; yours may be different. Here is a sample:   victor.conf   If you would rather skip the configuration and just try my global httpd.conf file from the /etc/httpd directory, here it is:   httpd.conf

The point is that there are two levels to running CGI Perl scripts. The Apache Setup tells you how to modify /etc/httpd/httpd.conf , which controls how your Apache web server works on a global level. Secondly, each user has their own configuration file, called username.conf , which is read in addition to the global one. This is where we allow victor to run CGI scripts in the directory /Users/victor/Sites .

Following are some sample CGI scripts to try:

  1. hello.cgi   This uses standard html inside of Perl.
    source
  2. hello2.cgi   This uses the CGI.pm module inside of Perl.
    source
  3. colors.cgi   This uses the CGI.pm module inside of Perl to view the colors.
    source
  4. env.cgi   This uses the CGI.pm module inside of Perl to view the environmental variables.
    source
    source
  5. status.cgi   This shows the status of users on your Web Server.
    source

Is there a way to test these scripts on my own localhost, that is without being connected to the Internet?

Of course there is. Remember that the built-in loopback address 127.0.0.1 is also called localhost. So on Mac OS X just type in your browser:
http://localhost/~victor/hello.cgi
Note, this assumes the username is victor and the *.cgi files live in /Users/victor/Sites

or on your Linux host :

http://localhost/cgi-bin/hello.cgi
Note, this assumes you have copied the *.cgi files to the default directory of /var/www/cgi-bin

For the Linux host, all you have to do is install Apache. You do not have to edit any httpd.conf files! To install Apache just type: urpmi apache in the Terminal Konsole, which you should be very familiar with by now. Feed it the CDs and let it update, and you are done.

Let me finally leave you with one advanced Perl gem. Suppose you want to change all the *.html files in a directory, replacing "converttolinux.com" with "localhost". That is, you are testing your web page on localhost before uploading it with the new changes. Click here to see the Perl file. Note that it is based on the sed command for globally editing masses of files. It uses the command:
sed -e 's/foo/bar/g' myfile.txt which you probably already know. Note that you have to have the g in there to make it global; otherwise it will only replace the first occurrence on each line.
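
The same global substitution can be done in Perl itself. The sketch below is not the linked script; the subroutine name and the sample HTML line are assumptions. It shows the s///g operator, with \Q...\E so that the dot in the host name is matched literally rather than as "any character".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace every occurrence of $old with $new in a piece of text.
sub replace_host {
    my ($text, $old, $new) = @_;
    $text =~ s/\Q$old\E/$new/g;    # g = all occurrences, not just the first
    return $text;
}

my $line = qq{<a href="http://converttolinux.com/perl.html">Perl</a>\n};
print replace_host($line, 'converttolinux.com', 'localhost');
```

To edit the files in place from the command line, the usual Perl idiom is: perl -pi.bak -e 's/converttolinux\.com/localhost/g' *.html which keeps a .bak copy of each original file.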

A Perl script can also be used to aid in deciphering the output from tcpdump for troubleshooting network problems. Suppose you wanted to see all the HTTP traffic on port 80 between you and a certain host, for example:
tcpdump -xls 1500 port 80 and host converttolinux.com | ./tcpdump-data-filter.pl > tcpdump.out
The tcpdump-data-filter.pl Perl script prints the ASCII values at the end of each output line so you can make sense of the hexadecimal values that are generated. Here is the source for the Perl file.
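
The core idea of such a filter, turning a line of hexadecimal bytes into readable text, can be sketched as below. This is not the tcpdump-data-filter.pl source; the subroutine name and the sample lines are assumptions, and the exact layout of tcpdump -x output varies between versions, so the offset handling is only a guess at one common format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a line of hex bytes (e.g. "4745 5420 2f20") to ASCII.
# Non-printable bytes are shown as dots. Returns nothing for lines
# that are not pure hex data.
sub hex_to_ascii {
    my ($line) = @_;
    $line =~ s/^\s*0x[0-9a-fA-F]+:\s*//;            # drop an offset like "0x0010:"
    return unless $line =~ /^[0-9a-fA-F\s]+$/;      # hex digits and spaces only
    (my $hex = $line) =~ s/\s+//g;
    my $ascii = '';
    while ($hex =~ /([0-9a-fA-F]{2})/g) {
        my $byte = hex($1);
        $ascii .= ($byte >= 32 && $byte <= 126) ? chr($byte) : '.';
    }
    return $ascii;
}

# Hypothetical sample lines standing in for tcpdump output.
for my $line ("0x0000:  4745 5420 2f20", "4854 5450 2f31 2e31 0d0a") {
    my $ascii = hex_to_ascii($line);
    print "$line   $ascii\n" if defined $ascii;
}
```

A real filter would read the lines from standard input instead of a fixed list and pass non-hex lines (the packet headers) through unchanged.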

Now that you know quite a bit about Perl, you may be tempted to write really fancy code. However, you need to think about the person who has to follow you. Here is an example of structured Perl. Note that it uses the -w switch with the initial invocation of perl, as in:
#!/usr/bin/perl -w
This will generate warnings. Additionally, we use the pragma:
use strict;
Note that all subroutines and functions are declared in advance, as are all variables and arrays. This may seem tedious but will really pay off in the long run. Click here to see the text.
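
As a small illustration of this style (not the linked example; the subroutine and variable names are assumptions), a structured script might be laid out like this:

```perl
#!/usr/bin/perl -w
use strict;

# Subroutines are declared in advance ...
sub average;

# ... and every variable is declared with "my" before it is used.
my @scores = (90, 85, 77);
my $result = average(@scores);
print "Average: $result\n";

# Return the arithmetic mean of the arguments, or 0 for an empty list.
sub average {
    my @values = @_;
    return 0 unless @values;
    my $total = 0;
    $total += $_ for @values;
    return $total / @values;
}
```

With use strict in effect, a misspelled variable name such as $reslt stops the script at compile time instead of silently creating a new, empty variable.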

For a second structured example, click here to see the text. This program filters a list that was downloaded from a website to turn it into a US Postal Service mailing list. Here are the input and the output files.

Making use of some of the functions in the previous script, you can use this Perlstopproc.txt script to kill processes stuck in memory on a Mac, based upon a search pattern. Use with caution. Type ./perlstopproc.txt with no parameters and read the caution statement. I use it every day to stop the HPscanjet program, using ./perlstopproc HP to kill the process. Note that if you are using Ubuntu, use this script instead.

For a third structured example, click here to see the text. This program converts a Roman Numeral under MMMM to its Arabic Value.

For a fourth structured example, click here to see the text. This program converts an Arabic Number less than 4000 to a Roman Numeral. Note, both this and the previous example are based on Ozawa Sakuro's program available on the web. See the comments of this last example for more details.
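
Both conversions can be sketched compactly. The code below is my own minimal version, not the linked programs and not Ozawa Sakuro's code; the subroutine names are assumptions. Going to Roman, it repeatedly subtracts the largest value that fits. Going to Arabic, it subtracts a symbol that stands before a larger one (the I in IV) and adds all others.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Value/symbol pairs from largest to smallest, including the
# subtractive forms such as CM (900) and IV (4).
my @ROMAN = (
    [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
    [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
    [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'], [1, 'I'],
);

# Arabic number (1 .. 3999) to Roman numeral; returns nothing otherwise.
sub arabic_to_roman {
    my ($n) = @_;
    return unless defined $n && $n =~ /^\d+$/ && $n >= 1 && $n < 4000;
    my $roman = '';
    for my $pair (@ROMAN) {
        my ($value, $symbol) = @$pair;
        while ($n >= $value) {
            $roman .= $symbol;
            $n     -= $value;
        }
    }
    return $roman;
}

# Roman numeral to Arabic number; returns nothing on unknown letters.
# Note: this sketch does not reject ill-formed numerals such as IIII.
sub roman_to_arabic {
    my ($roman) = @_;
    my %value  = (I => 1, V => 5, X => 10, L => 50, C => 100, D => 500, M => 1000);
    my @digits = map { $value{$_} } split //, uc $roman;
    return if !@digits || grep { !defined $_ } @digits;
    my $total = 0;
    for my $i (0 .. $#digits) {
        if ($i < $#digits && $digits[$i] < $digits[$i + 1]) {
            $total -= $digits[$i];    # smaller before larger: subtract
        }
        else {
            $total += $digits[$i];
        }
    }
    return $total;
}

print arabic_to_roman(1994), "\n";
print roman_to_arabic('MCMXCIV'), "\n";
```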

For a fifth structured example, click here to see the text. This program takes a time string and adds minutes to it. The result is output in the original format. For example:
perltimadd "9:13 AM"  ,  20
will yield "9:33 AM" as the output. Note, there needs to be a space on both sides of the comma.
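
The arithmetic behind such a program can be sketched as follows. This is not the perltimadd source, and it takes its two inputs as subroutine arguments rather than parsing the comma-separated command line; the subroutine name is an assumption. The approach is to convert the time to minutes since midnight, add, wrap around at 24 hours, and convert back to 12-hour form.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Add minutes to a time such as "9:13 AM" and return the new time
# in the same h:mm AM/PM form. Returns nothing if the input is malformed.
sub add_minutes {
    my ($time, $minutes) = @_;
    my ($hour, $min, $ampm) = $time =~ /^\s*(\d{1,2}):(\d{2})\s*([AP]M)\s*$/i
        or return;

    # Convert to 24-hour form: 12 AM is hour 0, 12 PM is hour 12.
    $hour %= 12;
    $hour += 12 if uc $ampm eq 'PM';

    # Minutes since midnight, wrapped around at 24 hours.
    my $total = ($hour * 60 + $min + $minutes) % (24 * 60);

    my $h24    = int($total / 60);
    my $suffix = $h24 < 12 ? 'AM' : 'PM';
    my $h12    = $h24 % 12 || 12;    # hour 0 is shown as 12
    return sprintf '%d:%02d %s', $h12, $total % 60, $suffix;
}

print add_minutes('9:13 AM', 20), "\n";
print add_minutes('11:50 PM', 20), "\n";
```

One difference from the description above: this sketch always prints AM/PM in upper case, even if the input used lower case.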

Now that you are in a mathematical vein, here is a program using recursion to calculate the Fibonacci Series, i.e. [0, 1, 1, 2, 3, 5, 8, 13 ...] Click here to see the text. Note that I have based it loosely on this web tutorial. I tightened up the code and added lots of comments.
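
In its simplest form the recursive definition looks like the sketch below. This is not the linked program; the subroutine name fib is an assumption. Each call asks for the two previous terms until it reaches the base cases 0 and 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Recursive Fibonacci: fib(0) = 0, fib(1) = 1,
# fib(n) = fib(n - 1) + fib(n - 2) for n >= 2.
sub fib {
    my ($n) = @_;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

print join(', ', map { fib($_) } 0 .. 10), "\n";
```

The number of calls grows exponentially with n because the same terms are recomputed over and over, so this form is only practical for small n.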

Here is a non-recursive way to calculate the Fibonacci Series. I based it on a well-known Perl one-liner, but made it intelligible. That is, I turned it into a Fibonacci function call.
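
A loop-based version can be sketched as follows. It is my own illustration rather than the linked function, and the subroutine name is an assumption. It keeps only the current term and the next one, and steps the pair forward n times.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Iterative Fibonacci: step the pair (current, next) forward $n times.
sub fibonacci {
    my ($n) = @_;
    my ($current, $next) = (0, 1);
    ($current, $next) = ($next, $current + $next) for 1 .. $n;
    return $current;
}

print join(', ', map { fibonacci($_) } 0 .. 10), "\n";
```

Each term costs a single addition, so even fibonacci(50) returns immediately.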

Here is a non-recursive way to approximate pi. It is based on the well-known Gregory-Leibniz series. I have added a timer so you can go up to a billion iterations if you like. This took 10 minutes on my Linux host. Click here to see it.
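
The Gregory-Leibniz series is pi/4 = 1 - 1/3 + 1/5 - 1/7 + ... A minimal sketch of summing it with a timer follows; it is not the linked program, and the subroutine name and the choice of one million terms are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second timer from the Perl core

# Sum the first $terms terms of 1 - 1/3 + 1/5 - 1/7 + ... and multiply by 4.
sub leibniz_pi {
    my ($terms) = @_;
    my $sum  = 0;
    my $sign = 1;
    for my $k (0 .. $terms - 1) {
        $sum += $sign / (2 * $k + 1);
        $sign = -$sign;
    }
    return 4 * $sum;
}

my $start = time();
my $pi    = leibniz_pi(1_000_000);
printf "pi is approximately %.10f (%.2f seconds)\n", $pi, time() - $start;
```

The series converges slowly: the error after N terms is roughly 1/N, which is why so many iterations are needed for just a few more correct digits.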

Here is a non-recursive way to calculate the beautiful number e, discovered and used by Euler and de Moivre among others. This is an original derivation. Please see:   this.
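
For comparison only, here is a sketch that uses the classical series e = 1/0! + 1/1! + 1/2! + ... This is a swapped-in textbook method, not the original derivation linked above; the subroutine name and the choice of 20 terms are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum the first $terms terms of 1/0! + 1/1! + 1/2! + ...
# Each term is the previous one divided by n, so no factorial
# ever has to be computed on its own.
sub euler_e {
    my ($terms) = @_;
    my $sum  = 0;
    my $term = 1;    # 1/0!
    for my $n (1 .. $terms) {
        $sum  += $term;
        $term /= $n;
    }
    return $sum;
}

printf "e is approximately %.15f\n", euler_e(20);
printf "Perl's exp(1) gives %.15f\n", exp(1);
```

In contrast to the series for pi, this one converges very quickly, and about 20 terms already exhaust the precision of a double.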