dmraid error reporting by email

dmraid is a software raid/fakeraid/onboard raid tool.

As far as I can tell, the only error reporting dmraid does is by hooking into logwatch, which emails me a very long file I don't often read. I would like to know immediately if my raid array is degraded.

This works for me on CentOS 5.6 with dmraid installed; no guarantees on other flavours/combinations. My dmraid version is dmraid-1.0.0.rc13-63.el5. I haven't tested the output from other versions of dmraid, but it would be pretty trivial to update the script if the output (or your path) is different.

So what we are going to do is write a simple shell script that checks the array status and emails if there is a problem, then run the script from cron every 15 minutes (or whatever interval suits you).

dmraid needs to be run as root, so you might as well su - for the whole of this.

To create the file, just vi /raid_status.sh, hit i to insert, and paste this:

#!/bin/sh
# check raid status and email if not ok
STATUS=`/sbin/dmraid -s | grep "status"`
if [ "$STATUS" != "status : ok" ]
then
    # mail the full dmraid -s output, with the offending status line in the subject
    /sbin/dmraid -s | mail -s "RAID ERROR ON `hostname`: $STATUS" your@email.com
fi
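The comparison fires whenever the grepped line is anything other than exactly "status : ok", which also covers the case where dmraid prints nothing at all. One caveat: if you have more than one array, dmraid -s will most likely print a status line per set, so the grep will return several lines and the script will mail on every run; in that case you would need to tighten the grep or loop over the sets.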

Hit Esc, ZZ to save, then make the file executable:

chmod 755 /raid_status.sh
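Before handing it over to cron, it is worth checking the mail path once by hand. This assumes a working local MTA and that you have replaced your@email.com with your own address:

/sbin/dmraid -s | mail -s "RAID mail test from `hostname`" your@email.com
sh /raid_status.sh

If the test message arrives and the script itself stays silent while the array is healthy, you are good to go.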

Now add it to cron so that it runs regularly:

crontab -e

…and insert the line (i to insert – Esc, ZZ to save):

00,15,30,45 * * * * /raid_status.sh
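Vixie cron (the default on CentOS) also understands step values, so if you prefer, the same schedule can be written as:

*/15 * * * * /raid_status.sh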

Voila! dmraid with email error reporting.


4 Responses to dmraid error reporting by email

  1. Pingback: RAID error email reporting with the 3ware 9550SXU-8L and tw_cli | Oracle of Geek

  2. catalin says:

    Very cool script, it helped me.

  3. gnjavator says:

    Cool script, but when I type this command I get this output:
    *** Group superset .ddf1_disks
    --> Active Subset
    name : ddf1_4035305a8680222820202020202020209b7dac183a354a45
    size : 312237824
    stride : 128
    type : mirror
    status : ok
    subsets: 0
    devs : 2
    spares : 0

    OK, but when I remove one drive to simulate its failure I still get status ok, and the only difference is that it says "devs : 1"… I don't have a genuinely faulty drive to check what happens in that case, but when a drive is totally missing it says the status is OK, which is not good. 🙁

  4. Eike says:

    If you are only interested in status, you might use “dmraid -s -csta” which returns the status only: “ok” for example.

    Any updates on the status staying "ok" even though the number of devices dropped?
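Following up on comments 3 and 4 above: a missing drive can apparently leave "status : ok" in place and only lower the "devs" count, so a more defensive variant of the script could also alert when the device count changes. This is only a sketch under that assumption; EXPECTED_DEVS is a value you must set yourself to match your array, and the field positions are taken from the dmraid -s output shown in comment 3.

#!/bin/sh
# Sketch: alert on a bad status OR a drop in the number of devices,
# since a missing drive may still report "status : ok" (see comment 3).
EXPECTED_DEVS=2   # set this to the number of drives in your array
STATUS=`/sbin/dmraid -s | grep "status"`
DEVS=`/sbin/dmraid -s | grep "devs" | awk '{print $3}'`
if [ "$STATUS" != "status : ok" ] || [ "$DEVS" != "$EXPECTED_DEVS" ]
then
    /sbin/dmraid -s | mail -s "RAID ERROR ON `hostname`: $STATUS devs=$DEVS" your@email.com
fi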
