RAID error email reporting with the 3ware 9550SXU-8L and tw_cli

This is a follow-up to my previous post, dmraid error reporting by email, this time for a hardware RAID controller: the 3ware/AMCC 9550SXU-8L. The concept is exactly the same; the only difference is the script that checks the RAID array status.

We will be using the command line tool tw_cli, which can be downloaded from www.3ware.com (now LSI): go to Support, select your product, then download the command line tools zip (currently CLI Linux, 10.2 code set). In this example, the tw_cli file has been extracted to / and chmodded 755.
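Before wiring anything into cron it is worth confirming that the binary is really where the script expects it. This is only a minimal sketch: it assumes tw_cli was extracted to /tw_cli as in this example, and the TW_CLI variable and the tw_cli_ready function are names I made up for illustration.

```shell
#!/bin/sh
# Assumption: tw_cli lives at /tw_cli as in this post.
# Override with TW_CLI=/other/path if yours is elsewhere.
TW_CLI="${TW_CLI:-/tw_cli}"

# Succeeds only if the given path is a regular file that the
# current user is allowed to execute.
tw_cli_ready() {
    [ -f "$1" ] && [ -x "$1" ]
}

if tw_cli_ready "$TW_CLI"; then
    echo "$TW_CLI is ready"
else
    echo "$TW_CLI is missing or not executable; try: chmod 755 $TW_CLI" >&2
fi
```

If the check fails, the usual cause is that the file was extracted somewhere else or the chmod step was skipped.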

Follow the other post, but in place of the dmraid check script insert this one:

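#!/bin/sh
# Check the RAID status and send an email if any unit is not OK.

# Every line of the controller summary that mentions RAID (one per array).
STATUS=$(/tw_cli info c0 | grep "RAID")
# The subset of those lines that also report OK.
OK=$(echo "$STATUS" | grep "OK")

# If the two differ, at least one array is not OK, so mail the full summary.
if [ "$STATUS" != "$OK" ]
then
    /tw_cli info c0 | mail -s "RAID ERROR ON $(hostname)" your@email.com
fi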

This works for my setup with version 10.2 of the command line tools. I have two separate RAID arrays on the card, which is why the script collects every line containing the word “RAID” and then checks that each of those lines also contains the word “OK”. If your card is not controller c0 or you have multiple cards, you may need to change the command line options.
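If you do have more than one card, the same check can be wrapped in a loop. What follows is a rough sketch rather than something I run in production: the function names raid_lines_ok and check_controllers, and the TW_CLI, CONTROLLERS and MAILTO variables, are all made up for illustration. It also treats "no RAID lines at all" as an error, on the assumption that empty output means tw_cli itself failed.

```shell
#!/bin/sh
# Hypothetical settings; adjust them to your own setup.
TW_CLI="${TW_CLI:-/tw_cli}"
CONTROLLERS="${CONTROLLERS:-c0}"      # e.g. "c0 c1" for two cards
MAILTO="${MAILTO:-your@email.com}"

# Reads "tw_cli info cN" output on stdin.
# Returns 0 only if there is at least one RAID line and every
# RAID line also contains the word OK.
raid_lines_ok() {
    raid_lines=$(grep "RAID")
    # No RAID lines usually means the command failed: report it.
    [ -n "$raid_lines" ] || return 1
    # Any RAID line without OK means an array needs attention.
    if printf '%s\n' "$raid_lines" | grep -v "OK" >/dev/null; then
        return 1
    fi
    return 0
}

# Checks every controller in CONTROLLERS and mails the full
# summary of each one that does not pass raid_lines_ok.
check_controllers() {
    for c in $CONTROLLERS; do
        output=$("$TW_CLI" info "$c" 2>&1)
        if ! printf '%s\n' "$output" | raid_lines_ok; then
            printf '%s\n' "$output" |
                mail -s "RAID ERROR ON $(hostname) ($c)" "$MAILTO"
        fi
    done
}
```

The block only defines the functions; add a final line calling check_controllers to the script that cron runs, for example with CONTROLLERS="c0 c1" set in the environment.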


6 Responses to RAID error email reporting with the 3ware 9550SXU-8L and tw_cli

  1. catalin says:

    very cool script it helped me
    Thanks a lot

  2. Andrea says:

    Thanks 1k

  3. Pingback: URL

  4. Donavan says:

    Here is some code I wrote. It provides a bit more active monitoring and will only send one email for each event.

    #!/usr/bin/perl -w
    #
    # tw_cli_raid_monitor.pl
    # Perl v5.8.8
    # Tested under RHEL5, should work fine under RHEL6
    #
    # This program uses tw_cli and smartctl to monitor the condition of 3ware RAID
    # controllers. It will send notification email out if the raid state changes.
    # Raid states OK and VERIFYING are considered good/optimal states. This program
    # is only set up to monitor one controller. If you have multiple raid arrays
    # on the same system you will need to modify the code, though it's probably
    # easier to rename the program and run multiple instances of the same code
    # with different config settings.
    #
    # This program uses an xml config file that must have the same name as the
    # program and is expected to be in the same directory.
    #
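    #tw_cli_raid_monitor.xml
    #-
    # (Element names match the keys read in Start. The root element name
    # is a placeholder; XMLin discards it by default.)
    # <config>
    #   <command_tw_cli>/bin/tw_cli</command_tw_cli>
    #   <command_smartctl>/usr/sbin/smartctl</command_smartctl>
    #   <command_date>/bin/date</command_date>
    #   <command_sendmail>/usr/sbin/sendmail</command_sendmail>
    #   <controller>c0</controller>
    #   <unit>u0</unit>
    #   <number_of_seconds_between_raid_checks>10</number_of_seconds_between_raid_checks>
    #   <mail_to>admin.notice@mail_account.com</mail_to>
    #   <mail_from>Server_Name 3ware RAID &lt;root@Server_Name.com&gt;</mail_from>
    #   <mail_subject_prefix>Server_Name 3ware RAID Status:</mail_subject_prefix>
    #   <mail_subject_postfix></mail_subject_postfix>
    #   <mail_x_mailer>Server_Name 3ware RAID</mail_x_mailer>
    #   <mail_return_path>root@Server_Name.com</mail_return_path>
    # </config>
    #-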
    #
    # This program runs continually, it is recommended that you run it in a session
    # preserving shell such as tmux or screen. Make sure the account you run from
    # has permissions to run tw_cli and smartctl, you may need to modify your
    # /etc/sudoers file to give the designated account permissions to run the listed
    # commands. You may need to modify the config as follows:
    # <command_tw_cli>/usr/bin/sudo /bin/tw_cli</command_tw_cli>
    # <command_smartctl>/usr/bin/sudo /usr/sbin/smartctl</command_smartctl>
    #
    # This script can be started at boot via crontab:
    #
    # @reboot /path/to/cron/script/tw_cli_raid_monitor.cron
    #
    #tw_cli_raid_monitor.cron
    #-
    ##!/bin/bash
    #
    ## To list tmux sessions use:
    ## tmux ls
    ## To Connect to tmux sessions use:
    ## tmux attach -t tw_cli_raid_monitor
    ## To detach from the tmux session use:
    ## (ctrl b, the default tmux prefix, or ctrl a if you have remapped it) then press the d key
    #
    ## Set up the paths and environmental variables to run tmux.
    #source /home/user/.bashrc
    #
    ## Start the tmux session
    #/usr/bin/tmux new-session -d -s tw_cli_raid_monitor
    #
    ## Change to the working directory.
    #/usr/bin/tmux send-keys -t tw_cli_raid_monitor "cd /path/to/cron/script/" C-m
    #
    ## Start the tw_cli_raid_monitor.pl process up in the tmux session.
    #/usr/bin/tmux send-keys -t tw_cli_raid_monitor "/path/to/cron/script/tw_cli_raid_monitor.pl" C-m
    #-
    #
    # History:
    # ---------------------------------------------------------------------------
    # 2013-12-12 dkienenberger Created.
    #
    #############################################################################

    use XML::Simple;
    use File::Basename;
    #use Data::Dumper;

    #>>>>>>>>>>>>>>>>>>>>>>>
    # Check the raid status.
    #>>>>>>>>>>>>>>>>>>>>>>>

    sub get_raid_status {
        my $data = `$command_tw_cli info $controller $unit status | awk '{print \$4}'`;
        chomp $data;
        chomp $data;
        return $data;
    }

    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # Determine what we do depending on the condition of the raid state.
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    sub process_raid_state {
        my ($raid_status) = @_;

        # Check if the last raid state is not optimal.
        if ( $Last_raid_state ne "OK" and $Last_raid_state ne "VERIFYING" ) {

            # If the raid state has not changed.
            if ( $Last_raid_state eq $raid_status ) {

                # Return to the main loop for the next check.
                return;
            }

            # If the raid state has changed.
            else {

                # If the raid state is optimal again.
                if ( $raid_status eq "OK" or $raid_status eq "VERIFYING" ) {

                    # Notify the recipients that the raid state has returned to optimal.
                    &notify_raid_state_change("short");

                    # Record the changed raid state.
                    $Last_raid_state = $raid_status;

                    # Return to the main loop for the next check.
                    return;
                }

                # If the raid state is still not optimal.
                else {

                    # Notify the recipients that the raid state has changed and is still not optimal.
                    &notify_raid_state_change("full");

                    # Record the changed raid state.
                    $Last_raid_state = $raid_status;

                    # Return to the main loop for the next check.
                    return;
                }
            }
        }

        # If the last raid state is optimal.
        else {

            # If the raid state has not changed.
            if ( $Last_raid_state eq $raid_status ) {

                # Return to the main loop for the next check.
                return;
            }

            # If the raid state has changed.
            else {

                # Check if the raid state is still optimal.
                if ( $raid_status eq "OK" or $raid_status eq "VERIFYING" ) {

                    # Record the changed raid state.
                    $Last_raid_state = $raid_status;

                    # Return to the main loop for the next check.
                    return;
                }

                # If the raid state is not optimal.
                else {

                    # Notify the recipients that the raid state has changed.
                    &notify_raid_state_change("full");

                    # Record the changed raid state.
                    $Last_raid_state = $raid_status;

                    # Return to the main loop for the next check.
                    return;
                }
            }
        }
    }

    #>>>>>>>>>>>>>>>>>>>>>>>
    # Notify the recipients of a raid state change.
    #>>>>>>>>>>>>>>>>>>>>>>>

    # takes argument "short" to send a short report.
    sub notify_raid_state_change {
        my ($message_type) = @_;
        my ($report,$smart_data);
        undef %{$port_and_serials};

        if ( $message_type eq "short" ) {

            # Send a short email to recipients
            &send_email_notice();
        }

        # Send a full email report to recipients
        else {

            # collect tw_cli logs and data.
            $report = &gather_tw_cli_report_data;

            # Get a list of bad devices port numbers and serial numbers.
            $port_and_serials = &identify_bad_drive;

            # Check and see if there are returned ports.
            if ($port_and_serials) {

                # collect smart data.
                $smart_data = &check_smart_data($port_and_serials);

                # Add the smart data to the report
                $report .= $smart_data;

                # The original appended a heredoc here; its text was stripped
                # by the blog and is not reconstructed.
            }

            # Send the full report to recipients (inferred from send_email_notice).
            &send_email_notice($report);
        }
    }

    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # Get information for a full email report
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    sub gather_tw_cli_report_data {
        my $data_status = `$command_tw_cli info $controller`;
        my $data_alarms = `$command_tw_cli alarms`;

        return $data_status . "\n\n" . $data_alarms . "\n";
    }

    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # Identify the bad drive(s).
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    sub identify_bad_drive {
        my $data = `$command_tw_cli info $controller drivestatus`;
        my @drivestatus_output = split('\n', $data);
        my $TextLine;
        my ($junk,$port,$status,$serial,$port_number);
        my %Port = ();
        no warnings 'once';

        # process each line of the drive status output
        foreach $TextLine (@drivestatus_output) {

            # process only port lines that start with p#.
            if ($TextLine =~ m#^p\d.+#) {

                # Break the line down into variables.
                ($port,$status,$junk,$junk,$junk,$serial) = split(' ', $TextLine);
                chomp $serial;

                # Check if the status of the current port is not optimal.
                if ( $status ne "OK" ) {

                    # Remove the p prefix.
                    ($junk,$port_number) = split(/p/, $port);

                    # Store the port number and serial number of the bad device.
                    $Port{ $port_number } = $serial;
                }
            }
        }

        # Return a reference to the port hash.
        return \%Port;
    }

    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # Retrieve smart data from the device.
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    sub check_smart_data {
        my $port = shift;
        my $data;

        # go through the bad ports.
        while ( my ($key, $value) = each(%{$port}) ) {

            # Retrieve smart data.
            $data .= " Port: $key\n Device Serial Number: $value\n";
            $data .= `$command_smartctl -A -d 3ware,$key /dev/twa0`;
            $data .= "\n\n";
        }

        return $data;
    }

    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # Send out the email to the recipients.
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    sub send_email_notice {
        my ($report) = @_;
        no warnings 'uninitialized';

        # With a report, send it as the message body (full report).
        # Without one, send a short notice with an empty body.
        my $body = $report ? "${report}\n\n" : "\n";

        open (MAIL, "|$command_sendmail -t") || die "Can't open the mail binary: ${command_sendmail}!\n";
        print MAIL "X-Mailer: ${mail_x_mailer}\n";
        print MAIL "Return-Path: ${mail_return_path}\n";
        print MAIL "From: ${mail_from}\n";
        print MAIL "To: ${mail_to}\n";
        print MAIL "Subject: ${mail_subject_prefix}${Current_raid_state}${mail_subject_postfix}\n\n";
        print MAIL $body;
        close (MAIL);
    }

    # Run the main routine.
    &Start;

    #>>>>>>>>>>>>>
    #> Main Start
    #>>>>>>>>>>>>>

    sub Start {

        # declare all global variables used in this subroutine.
        local ($TimeStamp, $Last_raid_state, $Current_raid_state, $command_tw_cli ,$command_smartctl ,$command_date ,$command_sendmail ,$controller ,$unit ,$check_time ,$mail_to ,$mail_from ,$mail_subject_prefix ,$mail_subject_postfix ,$mail_x_mailer ,$mail_return_path);

        # Obtain configuration from a file with a name inferred from this
        # script's name.
        my ($program_name) = split(/\./, basename($0));
        my $program_full_name = basename($0);

        # Load the config file using XML::Simple.
        $config = XMLin("${program_name}.xml", SuppressEmpty => 'undef');

        $command_tw_cli = $config->{'command_tw_cli'};
        $command_smartctl = $config->{'command_smartctl'};
        $command_date = $config->{'command_date'};
        $command_sendmail = $config->{'command_sendmail'};
        $controller = $config->{'controller'};
        $unit = $config->{'unit'};
        $check_time = $config->{'number_of_seconds_between_raid_checks'};
        $mail_to = $config->{'mail_to'};
        $mail_from = $config->{'mail_from'};
        $mail_subject_prefix = $config->{'mail_subject_prefix'};
        $mail_subject_postfix = $config->{'mail_subject_postfix'};
        $mail_x_mailer = $config->{'mail_x_mailer'};
        $mail_return_path = $config->{'mail_return_path'};

        # Get the date timestamp
        $TimeStamp = `$command_date`;
        chomp $TimeStamp;

        # Print other start info:
        print "-- Starting $program_full_name $TimeStamp --\n";
        print "Loaded config file ${program_name}.xml\n";

        # Run till killed.
        while (1) {

            # Sleep in seconds before running the check again.
            sleep ($check_time);

            # Get the status of the 3ware RAID.
            $Current_raid_state = &get_raid_status;

            # Define the first run of Last_raid_state.
            if (!$Last_raid_state) {
                $Last_raid_state = $Current_raid_state;
            }

            # Get the date timestamp
            $TimeStamp = `$command_date "+%Y%m%d %T"`;
            chomp $TimeStamp;

            # Output the current status to standard out.
            print "$TimeStamp: Raid Status: $Current_raid_state\n";

            # Process the raid state.
            &process_raid_state($Current_raid_state);

        }

        # End the program
        exit;
    }

    • Donavan says:

      Sorry, it looks like the HTML cleaner of this blog stripped a lot of my “>” and “<” characters from the code block. I can't guarantee it will work properly, and I can't post the XML config example.

  5. Maxim Makarenko says:

    Thanks a lot
