Monitor server cpu resources with email notification

I thought I’d write a quick script to keep an eye on which processes/users are using too many cpu cycles on my CentOS server. This checks the usage over the previous 5 minutes and emails a detailed list of cpu-hungry processes if it’s over the defined limit. Run it from cron to keep an eye on those resources:

#!/bin/bash
CPU_LIMIT="10" # relevant to number of cores, so quad-core at capacity is 4
EMAIL="your@email.com"
  if [ $(echo "$(cat /proc/loadavg | cut -d " " -f 2) >= $CPU_LIMIT" | bc) = 1 ]; then
ps ax --sort=-pcpu o user,pid,pcpu,pmem,vsz,rss,stat,time,comm | mail -s "CPU OVER LIMIT ON `hostname`" $EMAIL
  fi

That’s all folks!

RAID error email reporting with the 3ware 9550SXU-8L and tw_cli

This is a follow up to my previous post dmraid error reporting by email, this time for a hardware raid controller, this one is the 3ware/AMCC 9550SXU-8L. The concept is exactly the same, the only thing different is the script that checks the raid array status.

We will be using the command line tool tw_cli – which can be downloaded from www.3ware.com (now LSI) – go to support and select your product, then download the command line tools zip (which is currently CLI Linux – 10.2 code set). In this example, the tw_cli file has been extracted to / and chmodded 755.

Follow the other post and instead of inserting the other script insert this:

#!/bin/sh
# check raid status and email if not ok
STATUS=`/tw_cli info c0 | grep "RAID"`
OK=`echo "$STATUS" | grep "OK"`
if [ "$STATUS" != "$OK" ]
then
/tw_cli info c0 | mail -s "RAID ERROR ON `hostname`" your@email.com
fi

This works for my setup using the 10.2 version of the command line tools. I have 2 separate raid arrays on the card (which is why I queried all lines that have the word “RAID” on them and then check they also have the word “OK” on them). If your card is positioned differently or you have multiple cards you may need to change the command line options.

dmraid error reporting by email

dmraid is a software raid/fakeraid/onboard raid tool.

As far as I can tell, the only error reporting that dmraid does is by hooking into logwatch – which emails me a very long file I don’t often read and I would like to know immediately if my raid array is degraded.

This works for me on CentOS 5.6 with dmraid installed – no guarantees on other flavours/combinations. My dmraid version is dmraid-1.0.0.rc13-63.el5. I haven’t tested the output from other versions of dmraid, but it would be pretty trivial to update the script if they are different (or your path is different).

So what we are going to do is write a simple shell script that checks the array status and emails if there is a problem. Then we run the script every 15 (or whatever) minutes.

The dmraid needs to be run as root, so you might as well su - for the whole of this.

To create the file just vi /raid_status.sh, hit i to insert and paste this:

#!/bin/sh
# check raid status and email if not ok
STATUS=`/sbin/dmraid -s | grep "status"`
if [ "$STATUS" != "status : ok" ]
then
/sbin/dmraid -s | mail -s "RAID ERROR ON `hostname`: $STATUS" your@email.com
fi

Hit Esc, ZZ to save, then make the file executable:

chmod 755 /raid_status.sh

Now add it to cron so that it runs regularly:

crontab -e

…and insert the line (i to insert – Esc, ZZ to save):

00,15,30,45 * * * * /raid_status.sh

Voila! dmraid with email error reporting.