Convert HTML table to CSV

Just a quick one – I needed a script to convert an HTML table to a CSV, so this is what I came up with. See the annotations for notes:

//html table is in variable $report
$report = str_replace(array("\n", "\r"), '', $report); //remove existing line breaks
$report = str_replace('"', '""', $report); //escape existing quote marks (CSV doubles them rather than backslash-escaping)
$csv_lines = explode('</tr>', $report); //explode by end of table row
$csv_report = ''; //define output
foreach ($csv_lines as $this_line) { //each row
  $csv_cells = explode('</td>', $this_line); //explode by end of table cell
  $csv_newcells = array(); //define new cells
  foreach ($csv_cells as $this_cell) { //each cell
    $this_cell = strip_tags($this_cell); //remove any html tags
    $this_cell = html_entity_decode($this_cell); //decode any html entities
    $this_cell = trim($this_cell); //trim any whitespace
    if (!is_numeric($this_cell)) $this_cell = '"'.$this_cell.'"'; //encapsulate in quotes if it is not a number
    $csv_newcells[] = $this_cell; //add it to the new cell array
  } //foreach cell
  $csv_report .= implode(',', $csv_newcells)."\r\n"; //add the new cell line to the output
} //foreach line
echo $csv_report;
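
If you want the browser to download the result as a CSV file rather than just print it, a minimal sketch (the report.csv filename is only an example) is to send CSV headers before the output:

//assuming $csv_report has been built as above
header('Content-Type: text/csv; charset=UTF-8');
header('Content-Disposition: attachment; filename="report.csv"'); //example filename
echo $csv_report;
exit();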

Process email bounces with PHP

This is a quick script to process email bounces, for example from a mailing list so that users can be flagged up or unsubscribed when they have too many failures.

The actual bounce identification will be done by Chris Fortune’s Bounce Handler, which you can download from:
http://anti-spam-man.com/php_bouncehandler/

We require 3 files from that package:
bounce_driver.class.php
bounce_responses.php
rfc1893.error.codes.php

What this script does is fetch the bounced emails from a specified mailbox and count how many failed emails there are per email address. If the count is at least as high as your threshold value (called $delete), you run whatever code you need (unsubscribe the email address, flag the user, etc.) and the bounced emails are then deleted. You can run the script as a cronjob or call it from your mailing list script to tidy up subscriptions.

<?php

# define variables
$mail_box = '{mail.domain.com:143/novalidate-cert}'; //imap example
$mail_user = 'username'; //mail username
$mail_pass = 'password'; //mail password
$delete = 5; //deletes emails with at least this number of failures

# connect to mailbox
$conn = imap_open($mail_box, $mail_user, $mail_pass) or die(imap_last_error());
$num_msgs = imap_num_msg($conn);

# start bounce class
require_once('bounce_driver.class.php');
$bouncehandler = new Bouncehandler();

# get the failures
$email_addresses = array();
$delete_addresses = array();
for ($n = 1; $n <= $num_msgs; $n++) {
  $bounce = imap_fetchheader($conn, $n).imap_body($conn, $n); //entire message
  $multiArray = $bouncehandler->get_the_facts($bounce);
  if (!empty($multiArray[0]['action']) && !empty($multiArray[0]['status']) && !empty($multiArray[0]['recipient'])) {
    if ($multiArray[0]['action'] == 'failed') {
      $recipient = trim($multiArray[0]['recipient']); //email address
      if (!isset($email_addresses[$recipient])) $email_addresses[$recipient] = 0; //initialise the counter
      $email_addresses[$recipient]++; //increment number of failures
      $delete_addresses[$recipient][] = $n; //add message to delete array
    } //if delivery failed
  } //if passed parsing as bounce
} //for loop

# process the failures
foreach ($email_addresses as $key => $value) { //$key is the email address, $value is the number of failures
  if ($value >= $delete) {
    /*
    do whatever you need to do here, e.g. unsubscribe the email address
    */
    # mark for deletion
    foreach ($delete_addresses[$key] as $delnum) imap_delete($conn, $delnum);
  } //if failed at least $delete times
} //foreach

# delete messages
imap_expunge($conn);

# close
imap_close($conn);

?>
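
The unsubscribe step depends entirely on your mailing list setup, but as a hypothetical illustration (the subscribers table, its columns and the connection details are all made up), flagging the address in a MySQL table from inside that foreach loop could look like:

//hypothetical sketch: the subscribers table and column names are assumptions
$db = new mysqli('localhost', 'db_user', 'db_pass', 'db_name');
$stmt = $db->prepare('UPDATE subscribers SET bounced = 1 WHERE email = ?');
$stmt->bind_param('s', $key); //$key is the email address from the loop above
$stmt->execute();
$stmt->close();
$db->close();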

IP Failover (proof of concept)

Essentially the idea is that if your primary web server goes down, your backup server automatically takes over (failover). When your primary server is back online, it takes over again (failback). It sounds simple, but it can be very complicated and expensive to eliminate all the single points of failure – you could need multiple redundant routers, switches, servers, power supplies, UPS and storage all on separate networks.

The simple setup I am proposing uses a minimum of two servers; the primary and the backup, both on separate networks.

An alternative to IP failover (where the IP is changed) is IP takeover (where the backup server actually takes on the IP address of the failed primary), but for this to work the IP addresses need to sit behind the same router, so the servers would need to be at the same web host. That’s fine if the primary server fails because of local hardware, but if it’s a power failure in the datacentre, or a problem with the network, a switch, a router, transit or peering, then both servers would be affected by the same problem, so it makes more sense to make them geographically disparate on entirely separate networks.

It should be noted that IP failover services are provided by sites like dnsmadeeasy.com and zoneedit.com – this would be much simpler to set up, but of course you pay for the privilege.

There are two options for the backup server. Either it just displays a “service currently not available” type page (simple to set up) or it is a complete mirror of the primary server. For my purposes it is a status page, but it is perfectly possible to replicate the primary server if necessary – you could run rsync (preferably through an SSH tunnel with a key pair), something along the lines of rsync -avz --delete -e "ssh -i /root/rsync/mirror-rsync-key" /home/website user@server.com:/home/website run every few minutes by cron, with MySQL replication for the databases, for example. Google has multiple articles and how-tos for both scenarios.

A possible issue with a replicated server is that if users change files via FTP or alter the database through a script while the backup server is active, you would either have to prevent them from doing so or make sure you mirror those changes back to the primary server when it comes back online. Unless, of course, you are using a separate database server and shared storage for the web files.

This is how I propose to do it: there’s a script on the backup server that monitors the primary server (a heartbeat-type service, run as a PHP script every few minutes by cron). The script can optionally check with any other spare servers to see if they can contact the primary server (to make sure it is definitely down). If the primary server is definitely down, the backup server updates the DNS entries for the server domain(s) to point to itself (with a low TTL in case the primary comes back online).

It’s simplest if the primary server is also the primary DNS server and the backup server is the secondary DNS server; then if a user cannot connect to the primary server, they also cannot get the incorrect DNS records (assuming the whole server is down, not just the web service, which the heartbeat script should check).

Major caveat: some ISPs may ignore the TTL of DNS records and cache the wrong results for too long, but nothing can be done about that.

Updating the DNS records could be done using the nsupdate command, but there may be issues with permissions, both to run the program and for the server to update the DNS records, so it’s simpler if the backup server is running Virtualmin, which comes with a remote API that can be called from a PHP script.

This is the flowchart of what happens:

BACKUP checks PRIMARY is up every x mins and loads previous state from database:
     > PRIMARY was up and is still up – do nothing
     > PRIMARY was up and is now down:
          > check with SPARE servers:
               > cannot contact SPARES – internet probably down at BACKUP – do nothing
               > at least 1 SPARE reports PRIMARY up – network issues – do nothing
               > otherwise – update DNS and log to database
     > PRIMARY was down and is still down – do nothing
     > PRIMARY was down and is back up – update DNS and log to database
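
As a rough illustration of that flowchart, the cron-run check on the backup server might look something like the sketch below. This is a minimal sketch only: the state file (used instead of a database for brevity), the check URLs and the update_dns() helper are all assumptions, with update_dns() standing in for the Virtualmin calls shown further down.

<?php
//minimal heartbeat sketch: file paths, URLs and update_dns() are assumptions
$state_file = '/var/run/primary.state'; //previous state: 'up' or 'down'
$was_up = (@file_get_contents($state_file) !== 'down');
$is_up = (@fopen('http://www.primarydomain.com/', 'r') !== false); //can we reach the primary?

if ($was_up && !$is_up) {
  //primary looks down: confirm with a spare server before failing over
  $spare = @file_get_contents('http://spare.example.com/check-primary.php');
  if ($spare === false) exit; //cannot contact the spare, so our own connection is probably down
  if (trim($spare) == 'up') exit; //the spare can see the primary, so it is just network issues
  update_dns('backup'); //point the DNS records at the backup (see the Virtualmin calls below)
  file_put_contents($state_file, 'down'); //log the new state
} elseif (!$was_up && $is_up) {
  update_dns('primary'); //primary is back, so fail back
  file_put_contents($state_file, 'up'); //log the new state
}
?>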

This is how the PHP script could update the DNS records:

<?php

//if the primary is down, fail over the IP address: remove the www record, then add a new one:

$result1 = shell_exec("wget -O - --quiet --http-user=root --http-passwd=root_pass --no-check-certificate 'https://www.backupdomain.com:10000/virtual-server/remote.cgi?program=modify-dns&domain=primarydomain.com&remove-record=www.primarydomain.com. A'");

$result2 = shell_exec("wget -O - --quiet --http-user=root --http-passwd=root_pass --no-check-certificate 'https://www.backupdomain.com:10000/virtual-server/remote.cgi?program=modify-dns&domain=primarydomain.com&ttl=60&add-record=www.primarydomain.com. A 1.2.3.4'");

//echo $result2; //should end with: Exit status: 0 if successful

// if the primary comes back online, reverse the process (though the primary, as the primary DNS server, should eventually update the record on the secondary DNS server anyway)

?>

I’ll post more actual examples when I get around to implementing this 🙂

PHP Recursive File Copy Function

I couldn’t find a function online that copies folders recursively in PHP and actually works, so I wrote my own:

function recursiveCopy($src, $dest) {
  if (!is_dir($src)) return; //nothing to do if the source is not a folder
  @mkdir($dest, 0750); //make sure the destination folder exists
  $dir = opendir($src);
  while (($file = readdir($dir)) !== false) { //!== false so a file named "0" doesn't end the loop
    if ($file != '.' && $file != '..') {
      if (!is_dir($src.'/'.$file)) copy($src.'/'.$file, $dest.'/'.$file); //copy file
      else recursiveCopy($src.'/'.$file, $dest.'/'.$file); //recurse into subfolder
    } //if
  } //while
  closedir($dir);
} //function

To summarise: if the source is a folder, make sure the matching destination folder exists, then open the source and start reading the entries. Files are copied straight to the destination; folders cause the function to run again (within itself), which creates and fills the new folder at the destination.

Usage:

recursiveCopy('/home/site/public_html/folder','/home/othersite/public_html/folder');

Javascript (JQuery): Social networking feeds – new Facebook authentication

After my previous post, Javascript (JQuery): Social networking feeds all in one place, Facebook went and added authentication to the feed retrieval. After much head-scratching, this is how to enable the Facebook feed under the new OAuth system.

You need an access token to get to the data, so what we are going to do is create a Facebook App which the user then permits to access their information and that will give us the token we need.

So first you need to create a Facebook App. This is simpler than it sounds: we don’t need to create an App that actually does anything or even exists, we just need it for authentication. So, install the Developer App on Facebook and then go to that App and select Set Up New App. Enter the details of the App and be sure to give it a URL and domain – e.g. http://www.cheesefather.com as URL and cheesefather.com as domain. What you put in doesn’t matter that much.

The new App will have an Application id – a load of numbers. Now, this is the method to get the access token. Log in as the user you want the feed for (I am assuming you are using this to retrieve your own feed) and then go to the page:

https://www.facebook.com/dialog/oauth?client_id=YOUR_APP_ID&redirect_uri=http://www.YOUR_URL.com&scope=read_stream,offline_access

Replace your App id and URL in the example. What we are doing here is creating an App request to the user to access data, including the feed (read_stream) and to access the data when they are offline (offline_access) with a token that does not expire (ever, if I’m reading the docs correctly, even if they uninstall the App).

Once you have accepted, the script continues to your URL, passing a very long code to it (you can just copy it from the address bar) – copy this code, the part after ?code=

Then we can finally request the access token. As well as the two codes we just used we also need your App Secret from your Facebook App page. Get the secret and then go to the following page:

https://graph.facebook.com/oauth/access_token?client_id=YOUR_APP_ID&redirect_uri=http://www.YOUR_URL.com&client_secret=YOUR_APP_SECRET&code=THAT_LONG_CODE&type=client_cred

Obviously replace your App id, URL, App secret and the long code with your own values. The script passes back an access token (check the source code if your browser isn’t displaying it).
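
If you prefer, that last request can be made from PHP rather than in the browser. A minimal sketch using the same placeholders as above (this assumes allow_url_fopen is enabled and PHP has the openssl extension for https):

<?php
//same placeholders as the URL above: replace with your own values
$url = 'https://graph.facebook.com/oauth/access_token'
     .'?client_id=YOUR_APP_ID'
     .'&redirect_uri=http://www.YOUR_URL.com'
     .'&client_secret=YOUR_APP_SECRET'
     .'&code=THAT_LONG_CODE'
     .'&type=client_cred';
echo file_get_contents($url); //the response contains access_token=...
?>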

All you need to do now is add that access token to the feed request (see the previous post for the rest of the scripts):

$.getJSON("https://graph.facebook.com/USER_ID/posts?access_token=ACCESS_TOKEN&limit=5&callback=?",

…replacing the user id of the feed you want to retrieve.

BUT WAIT!…

You don’t want people who check your source code to have access to your Facebook account, so we need to hide that token. This is how I did it: call a PHP proxy script from the Javascript and have the PHP return the content minus the access token. So you change that line to:

$.getJSON("facebook.inc.php?callback=?",

Or whatever the name of your new PHP file is. And then the contents of that new file are:

<?php
$access_token = 'YOUR_ACCESS_TOKEN';
header('Content-Type: text/javascript; charset=UTF-8');
ini_set('user_agent', $_SERVER['HTTP_USER_AGENT']); //spoof browser information so Facebook returns the data
$handle = fopen('https://graph.facebook.com/USER_ID/posts?access_token='.$access_token.'&limit=5&callback='.urlencode($_GET['callback']), 'rb');
$contents = '';
if ($handle) {
  while (!feof($handle)) {
    $contents .= fread($handle, 8192);
  }
  fclose($handle); //only close the handle if it was opened
}
$contents = str_replace($access_token, '', $contents); //strip the token from any links in the output
echo $contents;
exit();
?>

Replace your access token and user id in the above example.

What this code does is as follows: you define your access token; you tell the browser that the script is outputting Javascript (so that JQuery interprets the results properly); you spoof some browser information so that Facebook returns the data correctly; then you open a connection to the JSON page using your access token and the callback reference that JQuery has assigned. Finally, we remove all references to your access token from the output (it appears in links that are returned) and print the output so that it can be interpreted by the original JQuery function. Voila! What was so simple just a week ago is now quite a bit more complicated…

CentOS: Install PHP 5.2 with t1lib support

The first step is to do a vanilla install of PHP 5.2 (to handle any dependency issues) and then recompile it with the t1lib option. So, enable the testing repo of CentOS 5: change to the root user first, then create the repo file:

su -
vi /etc/yum.repos.d/CentOS-Testing.repo

Enter insert mode (hit i) and paste the following into the new file:

# CentOS-Testing:
# !!!! CAUTION !!!!
# This repository is a proving grounds for packages on their way to CentOSPlus and CentOS Extras.
# They may or may not replace core CentOS packages, and are not guaranteed to function properly.
# These packages build and install, but are waiting for feedback from testers as to
# functionality and stability. Packages in this repository will come and go during the
# development period, so it should not be left enabled or used on production systems without due
# consideration.
[c5-testing]
name=CentOS-5 Testing
baseurl=http://dev.centos.org/centos/$releasever/testing/$basearch/
enabled=1
gpgcheck=1
gpgkey=http://dev.centos.org/centos/RPM-GPG-KEY-CentOS-testing
includepkgs=php*

Then update PHP and restart Apache (yum will double-check you want to go ahead):

yum update php*
service httpd restart

PHP is now updated, but t1lib is not installed or compiled into PHP. So let’s download and install it (you’ll need make and gcc installed):

cd ~admin/software
wget ftp://sunsite.unc.edu/pub/Linux/libs/graphics/t1lib-5.1.2.tar.gz
tar zxfv t1lib-5.1.2.tar.gz
cd t1lib-5.1.2
./configure
make && make install

If it exits with a latex error, install latex:

yum -y install tetex-latex

Installing t1lib can also be accomplished, if you have the rpmforge repo installed (see step 6 of the previous post), with: yum --enablerepo=rpmforge install t1lib
If you upgrade your software in the future and get an error about libt1.so.5()(64bit), install t1lib again using this method and then run service httpd restart

Then run the make commands again. T1lib is now installed. Next step is to recompile PHP. Firstly, set up a build environment (still as root) and install some software that we’ll need to compile:

mkdir -p /usr/src/redhat/{SRPMS,RPMS,SPECS,BUILD,SOURCES}
chmod 777 /usr/src/redhat/{SRPMS,RPMS,SPECS,BUILD,SOURCES}
yum -y install rpm-build re2c bison flex

Now, we need to lose our root privileges to compile the software, so run exit or logout to drop back to the admin user. (Make sure the source RPM below matches the version of PHP you have just installed – use rpm -q php to check.)

exit
cd ~admin/software
wget http://dev.centos.org/centos/5/testing/SRPMS/php-5.2.10-1.el5.centos.src.rpm
rpm --install php-5.2.10-1.el5.centos.src.rpm
vi /usr/src/redhat/SPECS/php.spec

Technically, we should edit the release line to reflect the changes we are making, but that creates dependency issues, so we’ll ignore that and edit the configure lines instead. Scroll to where it says %configure, with various includes after the line. Remove the line that says --disable-rpath \ (it would stop the compile from working – this is PHP bug #48172) and add at the end: --with-t1lib \

Exit insert mode, save and exit (hit Esc, then ZZ). Now rebuild the RPM files:

rpmbuild -bb /usr/src/redhat/SPECS/php.spec

It’s highly likely that you will now get a list of failed dependencies, all of which need to be installed. The following is my list – yours may be different. Su to the root user to install them, then log back out to the admin user:

su -
yum -y --skip-broken install bzip2-devel curl-devel db4-devel expat-devel gmp-devel aspell-devel httpd-devel libjpeg-devel libpng-devel pam-devel libstdc++-devel sqlite-devel pcre-devel readline-devel libtool gcc-c++ libc-client-devel cyrus-sasl-devel openldap-devel postgresql-devel unixODBC-devel libxml2-devel net-snmp-devel libxslt-devel libxml2-devel ncurses-devel gd-devel freetype-devel
exit

Then run the rpmbuild command again. If you get a GD error after the T1_StrError line, try running this command as root:

su -
ldconfig /usr/local/lib
exit

Run the rpmbuild command again (as non-root). When it finishes (it will take a while), install the resultant RPM files as the root user:

su -
cd /usr/src/redhat/RPMS/x86_64/
rpm -Uhv --nodeps --force *.rpm
service httpd restart
exit

Your path to the RPMs may be different depending on your architecture.
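
As a quick sanity check that the rebuilt PHP really has t1lib support, you can test for one of the PostScript Type 1 font functions that t1lib enables (a minimal check, run through the rebuilt PHP):

<?php
//imagepsloadfont() is only available when PHP is compiled --with-t1lib
var_dump(function_exists('imagepsloadfont')); //should print bool(true)
?>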