Simon
Flexible server monitoring

Ping test: include ping and traceroute results to make it really useful

Hi all,

I would like for the Ping test to be able to include ping and traceroute results in an email notifier. This would make the ping test way more useful.

I've been corresponding with Simon's author David via email and he said this needs to be done using a custom (Shell) Script-based service. The problem is I'm not a shell script expert, and I think writing the script would basically require one.

I'm not a programmer but it would seem to me the script would need to (thanks to David for helping me understand a bit more about the exit values):

1. run a ping command (e.g. ping -c 10 {ServerName})
2. find the average ping time value (from ping results using probably grep) and compare against {TimeThreshold}
3. if average ping time value is greater than {TimeThreshold}, then run traceroute command (traceroute {ServerName}), "mark" test as "failed" (exit result value 1) and end script, otherwise continue script...
4. find packet loss value (from ping results above) and compare against {PacketLossThreshold}
5. if packet loss value is greater than {PacketLossThreshold}, then run traceroute command (traceroute {ServerName}), "mark" test as "failed" (exit result value 1) and end script, otherwise "mark" test as "success" (exit result value 0)
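
The five steps above could be sketched as a POSIX shell script along the following lines. This is only a sketch, not a tested Simon service: the function names (parse_loss, parse_avg, greater_than, network_test) are made up for illustration, it assumes ping prints the "% packet loss" and "min/avg/max" summary lines shown in the sample output further down, and it checks both thresholds together rather than one after the other.

```shell
#!/bin/sh
# Sketch only: relies on the summary lines that ping prints at the end of a run.

# Print the packet loss percentage found in ping output read from stdin.
parse_loss() {
  sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Print the average round-trip time (ms) found in ping output read from stdin.
parse_avg() {
  sed -n 's|.*min/avg/max[^=]*= *[0-9.]*/\([0-9.]*\)/.*|\1|p'
}

# Succeed (status 0) if $1 is numerically greater than $2; awk handles decimals.
greater_than() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a > b) }'
}

# Return 0 for success, 1 for failure; on failure the ping and traceroute
# results are written to standard output.
network_test() {
  server=$1 time_threshold=$2 loss_threshold=$3
  results=$(ping -c 10 "$server" 2>&1)
  avg=$(printf '%s\n' "$results" | parse_avg)
  loss=$(printf '%s\n' "$results" | parse_loss)
  # No summary at all (e.g. unknown host) is treated as total loss.
  : "${avg:=0}" "${loss:=100}"
  if greater_than "$avg" "$time_threshold" || greater_than "$loss" "$loss_threshold"; then
    printf '%s\n\n' "$results"
    traceroute "$server" 2>&1
    return 1
  fi
  return 0
}

# Example: network_test sample.server.com 80 10
```

The return value of network_test follows the exit values David described: 0 marks the test as a success and 1 marks it as failed.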

If the test fails, I basically would like to get the following automatically emailed to me by using an email notifier (and to use variables to include the date/time the test was done):

sh:~ admin$ ping sample.server.com
PING sample.server.com (202.64.69.151): 56 data bytes
64 bytes from 202.64.69.151: icmp_seq=0 ttl=111 time=244.432 ms
64 bytes from 202.64.69.151: icmp_seq=1 ttl=111 time=221.971 ms
64 bytes from 202.64.69.151: icmp_seq=2 ttl=111 time=243.018 ms
64 bytes from 202.64.69.151: icmp_seq=3 ttl=111 time=220.99 ms
64 bytes from 202.64.69.151: icmp_seq=4 ttl=111 time=232.624 ms
64 bytes from 202.64.69.151: icmp_seq=5 ttl=111 time=243.936 ms
64 bytes from 202.64.69.151: icmp_seq=6 ttl=111 time=215.623 ms
64 bytes from 202.64.69.151: icmp_seq=7 ttl=111 time=208.475 ms
64 bytes from 202.64.69.151: icmp_seq=8 ttl=111 time=244.549 ms
64 bytes from 202.64.69.151: icmp_seq=9 ttl=111 time=236.922 ms
64 bytes from 202.64.69.151: icmp_seq=10 ttl=111 time=268.634 ms
64 bytes from 202.64.69.151: icmp_seq=11 ttl=111 time=236.311 ms
64 bytes from 202.64.69.151: icmp_seq=12 ttl=111 time=231.318 ms
64 bytes from 202.64.69.151: icmp_seq=13 ttl=111 time=250.657 ms
64 bytes from 202.64.69.151: icmp_seq=14 ttl=111 time=176.762 ms
64 bytes from 202.64.69.151: icmp_seq=15 ttl=111 time=243.854 ms
64 bytes from 202.64.69.151: icmp_seq=17 ttl=111 time=244.334 ms
64 bytes from 202.64.69.151: icmp_seq=18 ttl=111 time=196.518 ms
64 bytes from 202.64.69.151: icmp_seq=19 ttl=111 time=244.092 ms
64 bytes from 202.64.69.151: icmp_seq=20 ttl=111 time=264.856 ms
64 bytes from 202.64.69.151: icmp_seq=21 ttl=111 time=244.243 ms
64 bytes from 202.64.69.151: icmp_seq=22 ttl=111 time=243.095 ms
^C
--- sample.server.com ping statistics ---
23 packets transmitted, 22 packets received, 4% packet loss
round-trip min/avg/max = 176.762/234.418/268.634 ms
sh:~ admin$
sh:~ admin$
sh:~ admin$ traceroute sample.server.com
traceroute to sample.server.com (202.64.69.151), 30 hops max, 40 byte packets
1 172.20.232.1 (172.20.232.1) 1.01 ms 0.342 ms 0.345 ms
2 58.246.73.49 (58.246.73.49) 1.606 ms 1.334 ms 1.372 ms
3 210.22.67.61 (210.22.67.61) 1.304 ms 1.331 ms 1.647 ms
4 8ge0-gsr1-sh2-chj1.sh.cncnet.net (210.22.66.81) 1.927 ms 1.739 ms 2.054 ms
5 4p0-gsr1-sh2-sb1.sh.cncnet.net (210.22.66.37) 1.794 ms 1.745 ms 1.744 ms
6 210.22.66.145 (210.22.66.145) 2.059 ms 1.849 ms 1.914 ms
7 219.158.21.249 (219.158.21.249) 12.038 ms 2.008 ms 2.154 ms
8 219.158.3.249 (219.158.3.249) 39.635 ms 28.577 ms 28.331 ms
9 219.158.3.250 (219.158.3.250) 237.894 ms 226.613 ms 196.421 ms
10 219.158.3.238 (219.158.3.238) 179.074 ms 157.957 ms 151.504 ms
11 219.158.3.98 (219.158.3.98) 145.615 ms 148.245 ms 159.414 ms
12 i-6-4.iadv02.net.reach.com (134.159.100.233) 172.865 ms 188.722 ms 203.844 ms
13 i-4-0.tmh-core04.net.reach.com (202.84.153.225) 199.425 ms 202.71 ms 194.779 ms
14 unknown.net.reach.com (134.159.161.82) 185.01 ms 223.107 ms 190.598 ms
15 v152.tmhc1.pacific.net.hk (202.64.4.1) 180.6 ms 184.963 ms 200.309 ms
16 v203.tmhs3503.pacific.net.hk (202.64.147.234) 207.249 ms 207.062 ms 212.337 ms
17 * 220.232.190.178 (220.232.190.178) 183.834 ms 156.79 ms
18 220.232.190.212 (220.232.190.212) 162.474 ms 166.743 ms 176.873 ms
19 * * *
20 * * *
21 * * *
^C

Would very much appreciate it if any shell script gurus could help to create such a script. If the 2 tests (ping time and packet loss) are difficult to do, then if just packet loss could be done, that would be great!

Thanks in advance for any feedback.

Have a great weekend y'all!

Cheers,
Derek

derektom's picture

Now have Perl script and command that work from Terminal but...

With the help of a colleague (Brian Smith), I now have a Perl script (separate file) that is called from a single shell command and does what I want (see initial post above). When I try to get it to work within Simon as a Script-based service, however, it does not work.

The Perl script is below. It should be saved with the filename networkTest.pl, copied into one of your PATH directories (check using the printenv PATH command in Terminal), and made executable with chmod 755 (using sudo if the directory requires it).

#!/usr/bin/perl
use Getopt::Long;

$packetLoss = -1;
$count=0;
$timeSum=0;
$ServerName = "";
$TimeThreshold = 10.0;
$PacketLossThreshold = 1;
$result = GetOptions("server=s" => \$ServerName, "time=f" => \$TimeThreshold, "loss=i" => \$PacketLossThreshold);
if (length($ServerName)==0 || $result == 0) {
&Usage;
exit -1;
}
open(PING,"/sbin/ping -c 10 $ServerName|");
while (<PING>) {
if (/time=(\S+)\s/) {
$count++;
$timeSum += $1;
} elsif (/\s(\S+)%\spacket\sloss/) {
$packetLoss = $1;
}
}
close(PING);
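# No replies at all: treat as 100% loss instead of dividing by zero
if ($count == 0) { $average = 0; $packetLoss = 100; } else { $average = $timeSum/$count; }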
if ($average > $TimeThreshold || $packetLoss > $PacketLossThreshold) {
print STDERR "\n$average Average Ping Time\n";
print STDERR "$packetLoss% Packet Loss\n\n";
open(TRACEROUTE,"/usr/sbin/traceroute -m 20 $ServerName|");
while(<TRACEROUTE>) {
print STDERR $_;
}
close(TRACEROUTE);
exit 1;
} else {
exit 0;
}

sub Usage {
print STDERR "Usage:  $0 --server=SERVERNAME --time=TIMETHRESHOLD --loss=LOSSTHRESHOLD\n";
print STDERR "where:\n\tSERVERNAME = fully qualified server name\n";
print STDERR "\tTIMETHRESHOLD = Time Threshold in ms (default 10ms)\n";
print STDERR "\tLOSSTHRESHOLD = Packet Loss Threshold Percentage (default 1%)\n";
}

The following command (in Terminal) will produce the desired result with the server set as sample.server.com, the roundtrip time threshold set at 80ms, and packet loss threshold set at 10%.

networkTest.pl --server=sample.server.com --time=80 --loss=10

The command to issue using custom variables in Simon (I think) would be:

networkTest.pl --server={ServerName} --time={TimeThreshold} --loss={PacketLossThreshold}

This setup works fine when the command is issued in Terminal but I have not been able to get it to work within Simon.

Any suggestions?

Thanks,
Derek

derektom's picture

Sample networkTest results

Sample networkTest results:

$ networkTest.pl --server=sample.server.com --time=10 --loss=10

36.0241 Average Ping Time
0% Packet Loss

traceroute to sample.server.com (202.64.69.151), 20 hops max, 40 byte packets
1  172.20.232.1 (172.20.232.1)  28.003 ms  0.297 ms  0.318 ms
2  58.246.73.49 (58.246.73.49)  1.962 ms  1.252 ms  1.207 ms
3  210.22.67.61 (210.22.67.61)  21.015 ms  1.285 ms  1.152 ms
4  8ge0-gsr1-sh2-chj1.sh.cncnet.net (210.22.66.81)  1.777 ms  1.789 ms  1.692 ms
5  4p0-gsr1-sh2-sb1.sh.cncnet.net (210.22.66.37)  116.508 ms  25.278 ms  197.816 ms
6  8ge0-gsr1-sh2-lk1.sh.cncnet.net (210.22.66.145)  2.859 ms  1.607 ms  1.544 ms
7  219.158.21.245 (219.158.21.245)  1.778 ms  1.672 ms  1.541 ms
8  219.158.4.109 (219.158.4.109)  31.937 ms  31.929 ms  31.818 ms
9  219.158.3.182 (219.158.3.182)  29.654 ms  29.613 ms  29.722 ms
10  219.158.3.130 (219.158.3.130)  35.143 ms  34.732 ms  34.685 ms
11  219.158.3.94 (219.158.3.94)  37.148 ms  37.348 ms  36.891 ms
12  i-6-4.iadv02.net.reach.com (134.159.100.233)  37.143 ms  37.213 ms  97.625 ms
13  i-4-0.tmh-core04.net.reach.com (202.84.153.225)  76.776 ms  35.447 ms  35.759 ms
14  unknown.net.reach.com (134.159.161.82)  129.643 ms  35.316 ms  35.351 ms
15  v152.tmhc1.pacific.net.hk (202.64.4.1)  97.691 ms  37.436 ms  37.433 ms
16  v203.tmhs3503.pacific.net.hk (202.64.147.234)  107.094 ms  37.832 ms  52.071 ms
17  220.232.190.178 (220.232.190.178)  100.77 ms  38.066 ms  38.232 ms
18  220.232.190.212 (220.232.190.212)  43.61 ms  36.175 ms  39.458 ms
19  * * *
20  * * *

David Sinclair's picture

Not working for me

The script doesn't appear to work for me even in Terminal. Not sure what I did wrong.

As for calling from Simon, try using the full path to the script, rather than just the script name. Simon doesn't set up any environment variables, other than the default ones, so that may make a difference.
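
One way to check whether the PATH is the culprit is to find the script's full path and then run the command under a stripped-down environment in Terminal. This is a rough sketch: full_path and run_bare are made-up helper names, and the PATH of /usr/bin:/bin is only an assumption about what a minimal environment looks like, not a statement of what Simon actually sets.

```shell
# Print the full path of a command as resolved through the current PATH.
full_path() {
  command -v "$1"
}

# Run a command line under an almost empty environment, so that failures caused
# by a missing PATH entry show up in Terminal too.
# ASSUMPTION: PATH=/usr/bin:/bin is a guess at a minimal environment.
run_bare() {
  env -i PATH=/usr/bin:/bin /bin/sh -c "$1"
}

# Examples (the script location is whatever full_path reports on your machine):
#   full_path networkTest.pl
#   run_bare 'networkTest.pl --server=sample.server.com --time=80 --loss=10'
#   run_bare "$(full_path networkTest.pl) --server=sample.server.com --time=80 --loss=10"
```

If the second example reports that the command is not found while the third one runs, the full path is what needs to go into the Simon service.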

derektom's picture

Please try this

Please try this:

1. Copy the networkTest.pl file to your home directory
2. Open Terminal and issue these 3 commands:

cd ~/
chmod 755 networkTest.pl
ls -al | grep networkTest

3. Result should be something like this to indicate file is executable:

-rwxr-xr-x   1  <username> <username> 1194 Dec 1 23:53 networkTest.pl

4. Then just issue this command:

perl networkTest.pl --server=bbdo.com.hk --time=5 --loss=10

You can substitute your own values for server (IP address or hostname of the server to ping; bbdo.com.hk in the above command), time (roundtrip time threshold in milliseconds; 5ms in the above command) and loss (packet loss threshold as a percentage; 10% in the above command). You want the test to fail so that you can see the output, which includes the traceroute results. The above test will fail if the average ping roundtrip time from your machine to server 'bbdo.com.hk' is greater than 5ms or packet loss is greater than 10%. Nothing will show on your screen for the first 10-15 seconds, because the script is pinging without printing anything.

Please let me know if that works for you. If it doesn't work, please let me know what errors you see.

If it works, you should see output similar to that in my previous post (above).

In Simon, I tried putting the full path to the script but still no luck!

Thanks,
Derek

David Sinclair's picture

Nothing is output

When I run it, it seems to go into an infinite loop; nothing is output, no matter how long I wait.

Looking at the script code, it seems to go into an infinite while() loop, presumably until it receives no more data, or something, but it seems to get stuck.

Looking in Activity Monitor, perl is running non-stop, using 99% CPU.

derektom's picture

When you run the command,

When you run the command, after around 10 seconds does it take you back to the command prompt?

If so, it could be that the test is succeeding so nothing is output.

Please try this command which should fail and output the traceroute results:

perl networkTest.pl --server=sample.server.com --time=0 --loss=0

I just tried this on 2 machines now - a PowerBook G4 and a MacBook Pro both running Tiger - and it worked just fine on both machines.

Thanks,
Derek

David Sinclair's picture

Ah, figured it out. For some

Ah, figured it out.

For some reason, the script I was using was missing the <PING> between the brackets in the while statement, hence the infinite loop.

Using the correct script, it works in the Terminal, and I am able to call it from Simon... but it always seems to fail there, even if I use a site that succeeds via the Terminal. Still, a step in the right direction!

One modification to the script: it seems to want quotes around the {ServerName} variable.

derektom's picture

Anything I can do on my end to further this?

Thanks, David. Anything I can do on my end to help to further this? I can ask the script's author Brian Smith to help if there's anything "broken". On another forum I've asked for a script to be written as a bash shell script but don't know when or if I'll get any help. We really need this functionality.

I've never used quotes around the {ServerName} variable in Terminal but do you mean for use in Simon?

Thanks!

Cheers,
Derek

David Sinclair's picture

Re quotes

Re quotes, yes, I meant when calling the script from Simon.

I did try embedding the entire script in Simon, but didn't have any luck with that, either. I do wonder what the missing piece is... perhaps some environment variable is needed or something.

I think I figured this out...

I've been able to call the script from bash and successfully report success or failure. I have not been able to get it to work as a perl script yet (which would be ideal).

Basically, the perl script was outputting to standard error. After changing the STDERR to STDOUT it worked better.
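
For anyone who would rather not edit the Perl source, much the same effect can probably be had by folding standard error into standard output at the point where the script is called. A minimal sketch, assuming the service is run through a shell; merge_streams is a made-up helper name and the script path is a placeholder:

```shell
# Run any command with stderr folded into stdout; the command's own exit
# status is passed through unchanged.
merge_streams() {
  "$@" 2>&1
}

# Example (placeholder path):
#   merge_streams /full/path/to/networkTest.pl --server=sample.server.com --time=80 --loss=10
```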

However, the problem is, the output is not accessible to be emailed.

Test Ping Enhanced <n/a> has just failed at 17:02:11 on 2008-01-08.
NotificationDate: 2008-01-08
NotificationTime: 17:02:11
NotifierAfterThisManyChanges: 1
NotifierAfterThisManyErrors: 1
NotifierAfterThisManyRecoveries: 1
NotifierFailureForRecovery: yes
NotifierForChange: yes
NotifierForFailure: yes
NotifierForRecovery: yes
NotifierKind: Script
NotifierName: Log Event Debug
NotifierOnlyOncePerFailure: yes
TestChangeDifferenceWithHTML:
TestChangeDifferenceWithoutHTML:
TestChangeText:
TestIdentifier: 4D70ED51-6BBE-4191-9BF7-F0A449A82322
TestLastChangeDate: 1-01-01
TestLastChangeTime: 19:00:00
TestLastCheckDate: 2008-01-08
TestLastCheckTime: 17:00:35
TestLastError: Failure
TestLastEventDate: 2008-01-08
TestLastEventTime: 17:02:11
TestLastFailureDate: 2008-01-08
TestLastFailureTime: 17:02:11
TestLastRecoveryTime: 16:50:44
TestName: Test Ping Enhanced
TestNextCheckDate: 2008-01-08
TestNextCheckTime: 17:05:35
TestPassword:
TestPlugin: Script
TestService: Ping Enhanced
TestStatusExact: BadNow
TestStatusPhrase: just failed
TestStatusType: Bad
TestURL: n/a
TestUsername:

Picture of the Service setup:

Picture of a Failure:

Picture of a Success:

Picture of the Test setup:

Here is the updated script:

#!/usr/bin/perl
#Original script By Brian Smith and Derek
#
#Modified STDERR to STDOUT for use with Simon, www.dejal.com, Justin Miller, RightMinds
use Getopt::Long;

$packetLoss = -1;
$count=0;
$timeSum=0;
$ServerName = "";
$TimeThreshold = 10.0;
$PacketLossThreshold = 1;
$result = GetOptions("server=s" => \$ServerName, "time=f" => \$TimeThreshold, "loss=i" => \$PacketLossThreshold);
if (length($ServerName)==0 || $result == 0) {
&Usage;
exit -1;
}
open(PING,"/sbin/ping -c 10 $ServerName|");
while (<PING>) {
if (/time=(\S+)\s/) {
$count++;
$timeSum += $1;
} elsif (/\s(\S+)%\spacket\sloss/) {
$packetLoss = $1;
}
}
close(PING);
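# No replies at all: treat as 100% loss instead of dividing by zero
if ($count == 0) { $average = 0; $packetLoss = 100; } else { $average = $timeSum/$count; }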
if ($average > $TimeThreshold || $packetLoss > $PacketLossThreshold) {
print STDOUT "\n$average Average Ping Time\n";
print STDOUT "$packetLoss% Packet Loss\n\n";
open(TRACEROUTE,"/usr/sbin/traceroute -m 20 $ServerName|");
while(<TRACEROUTE>) {
print STDOUT $_;
}
close(TRACEROUTE);
exit 1;
} else {
exit 0;
}

sub Usage {
print STDERR "Usage:  $0 --server=SERVERNAME --time=TIMETHRESHOLD --loss=LOSSTHRESHOLD\n";
print STDERR "where:\n\tSERVERNAME = fully qualified server name\n";
print STDERR "\tTIMETHRESHOLD = Time Threshold in ms (default 10ms)\n";
print STDERR "\tLOSSTHRESHOLD = Packet Loss Threshold Percentage (default 1%)\n";
}

David Sinclair's picture

Re: I think I figured this out...

Do you have anything in the Smart Change Detection section of the Edit Test window? That could explain no output for {TestChangeText}.

No, I don't have anything in

No, I don't have anything in there. Because the success output is always null, wouldn't anything in there cause constant failure? I've tried clicking the "Look for changes in output" option and leaving the start and end blank.

Ideas?

David Sinclair's picture

Re: No, I don't have anything in

Ah, sorry, I don't know what I was thinking with my previous reply.

If it's a failure, the change text is blank, as that only applies to successful changes.

For failures, the only relevant output is the error message.

I am thinking about changing this in a future version, to have a new output variable that has the output of the check, whether a success or failure; this would have to be distinct from the change text, as that is used to detect a difference from the previous successful check.

So output from the perl

So output from the perl script executed via bash shell is not captured? So it would have to be embedded as a perl script in order to function?

David Sinclair's picture

Re: So output from the perl

The script output is captured for successes. It is currently not captured for failures; in that case, the error is captured instead.

derektom's picture

Perl script to do job but without Simon

My colleague Brian Smith was able to modify the Perl script to work without using Simon. It mails out using the Unix mail command. I'm using cron to schedule the checks.

David, I'm not sure how you feel about this... If you're OK with it, I can post the code here.

Thanks,
Derek

David Sinclair's picture

Re: Perl script to do job but without Simon

Obviously I'd prefer the script used Simon, but yours is a fairly specialized need, so making convenience compromises makes sense.

You could of course bypass the need for cron by having Simon run the script as you did before, and just bypass Simon for the notification.

Sure, go ahead and post the script. Others may find it useful.

derektom's picture

Updated Perl script

OK, first, here is a sample of what the email alert would look like (in the middle is a helpful summary section):

From: Username_on_HOST1
Sent: Sunday, January 06, 2008 10:16 PM
To: Recipients
Subject: ALERT: HOST1=>HOST2 connection poor

PING HOST2 (XXX.XXX.XXX.XXX): 56 data bytes
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=0 ttl=113 time=39.885 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=1 ttl=113 time=39.123 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=2 ttl=113 time=39.107 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=4 ttl=113 time=38.78 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=5 ttl=113 time=39.244 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=6 ttl=113 time=39.354 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=7 ttl=113 time=41.027 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=8 ttl=113 time=38.58 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=9 ttl=113 time=39.404 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=10 ttl=113 time=38.97 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=11 ttl=113 time=39.088 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=12 ttl=113 time=44.176 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=13 ttl=113 time=39.445 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=15 ttl=113 time=44.502 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=16 ttl=113 time=38.743 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=17 ttl=113 time=39.12 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=18 ttl=113 time=39.039 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=19 ttl=113 time=39.11 ms

--- HOST2 ping statistics ---
20 packets transmitted, 18 packets received, 10% packet loss
round-trip min/avg/max = 38.58/39.816/44.502 ms


HOSTS: HOST1 => HOST2
DATE: 2008-01-06
TIME: 22:15:00
LATENCY: 39 ms (threshold = 80 ms)
PACKET LOSS: 10% (threshold = 5%)

TRACEROUTE:
traceroute to HOST2 (XXX.XXX.XXX.XXX), 15 hops max, 40 byte packets
1  fw.HOST1.com (XXX.XXX.XXX.2)  1.445 ms  0.847 ms  0.51 ms
2  XXX.XXX.XXX.1 (XXX.XXX.XXX.1)  1.229 ms  1.277 ms  1.028 ms
3  atm0-0-0-27-r10.starhub.net.sg (203.117.161.85)  2.972 ms  3.115 ms  3.018 ms
4  vlan902-cat6tl1-rsm2.starhub.net.sg (203.118.2.3)  3.133 ms  3.253 ms  3.035 ms
5  203.118.0.229 (203.118.0.229)  3.302 ms  3.495 ms  3.042 ms
6  203.118.3.224 (203.118.3.224)  3.137 ms  3.293 ms  2.834 ms
7  203.98.128.65 (203.98.128.65)  34.645 ms  34.633 ms  37.842 ms
8  203.98.161.23 (203.98.161.23)  36.215 ms  35.995 ms  36.469 ms
9  * * *
10  v152.tmhc1.pacific.net.hk (202.64.4.1)  38.78 ms  38.429 ms  40.611 ms
11  v203.tmhs3503.pacific.net.hk (202.64.147.234)  38.575 ms  38.785 ms  38.557 ms
12  220.232.190.178 (220.232.190.178)  43.218 ms *  38.608 ms
13  220.232.190.212 (220.232.190.212)  40.545 ms  40.518 ms  41.007 ms
14  * * *
15  * * *

Here is the Perl script that should be named "networkTest.pl":

#!/usr/bin/perl
use Getopt::Long;

#
# Variable Initialization
#
$hostname = `hostname`; chomp $hostname; # Hostname of this server
$SUBJECT = "ALERT: HOST1=>HOST2 connection poor"; # Email Subject Line
$TimeThreshold = 10.0; # Default max average ping time for alert (ms)
$PacketLossThreshold = 10; # Default max packet loss for alert (percent)

($sec,$min,$hour,$dayOfMonth,$mon,$year) = localtime(time); # Separate current time into variables
$date = sprintf("%04.4d-%02.2d-%02.2d",$year+1900,$mon+1,$dayOfMonth); # Formatted date string YYYY-MM-DD
$time = sprintf("%02.2d:%02.2d:%02.2d",$hour,$min,$sec); # Formatted time HH:MM:SS
$packetLoss = -1; # Percent packet loss
$count=0; # Number of pings
$timeSum=0; # Total time required for all pings
$email=""; # List of emails passed to script
$ping=""; # Ping results
$distribution=""; # space separated list of email addresses
$output=""; # Diagnostic output
$ServerName = ""; # Name of server to ping
#
# Get command line options
#
$result = GetOptions("server=s" => \$ServerName, "time=f" => \$TimeThreshold, "loss=i" => \$PacketLossThreshold,
     "email=s" => \$email);
#
# Double check usage
#
if (length($ServerName)==0 || $result == 0) {
&Usage;
exit -1;
}
#
# Reformat email list
if (length($email)>0) {
$distribution = join(" ",split(/[,\s]+/,$email)); # Split on commas/whitespace only, so addresses containing "+" stay intact
}
#
# Ping Test
#
open(PING,"/sbin/ping -c 20 $ServerName|");
while (<PING>) {
if (/time=(\S+)\s/) { # Regular expression to extract ping time
$count++;
$timeSum += $1;
} elsif (/\s(\S+)%\spacket\sloss/) { # Regular expression to extract packet loss
$packetLoss = $1;
}
$ping .= $_;
}
close(PING);
if ($count == 0) { $average = 0; $packetLoss = 100; } else { $average = sprintf("%d",$timeSum/$count); } # Avg, as an integer; no replies count as 100% loss
#$average = sprintf("%6.2f",$timeSum/$count); # Avg, with 2 places after decimal
#
# Run diagnostic if necessary
#
if ($average > $TimeThreshold || $packetLoss > $PacketLossThreshold) {

$output .= "HOSTS: HOST1 => $ServerName\n";
$output .= "DATE: $date\n";
$output .= "TIME: $time\n";
$output .= "LATENCY: $average ms (threshold = ${TimeThreshold} ms)\n";
$output .= "PACKET LOSS: $packetLoss% (threshold = ${PacketLossThreshold}%)\n\n";
$output .= "TRACEROUTE:\n";
open(TRACEROUTE,"/usr/sbin/traceroute -m 15 $ServerName 2>&1 |");
while(<TRACEROUTE>) {
$output .= $_;
}
close(TRACEROUTE);
if (length($distribution)>0) {
open(MAIL,"|mail -s \"\"\"$SUBJECT\"\"\" $distribution");
print MAIL $ping . "\n\n";
print MAIL $output;
close MAIL;
} else {
print STDERR $output;
}
exit 1;
} else {
exit 0;
}

sub Usage {
print STDERR "Usage:  $0 --server=SERVERNAME --time=TIMETHRESHOLD --loss=LOSSTHRESHOLD --email=EMAILLIST\n";
print STDERR "where:\n\tSERVERNAME = fully qualified server name\n";
print STDERR "\tTIMETHRESHOLD = Time Threshold in ms (default 10ms)\n";
print STDERR "\tLOSSTHRESHOLD = Packet Loss Threshold Percentage (default 10%)\n";
print STDERR "\tEMAILLIST = List of email address to send results to\n";
}

In the above code, you'd want to edit the following:

  1. line 8: email subject
  2. line 61: name of HOST1 (source hostname)
  3. line 67: number after -m switch is the max hops

Command to run the script:
perl networkTest.pl --server=HOST2 --time=3 --loss=0 --email=name1@domain.com,name2@domain.com

HOST2 should be replaced by the hostname of the destination server (e.g. www.google.com); value after --time= is the latency threshold in milliseconds (e.g. 80 for 80ms); value after --loss is the packet loss threshold as a percentage (e.g. 5 for 5%); and addresses after --email= are the recipient email addresses (separate multiple addresses by a single comma, no spaces).

To run the script successfully, make sure the networkTest.pl file is executable (chmod 755 networkTest.pl) and either sits in a directory listed in your PATH environment variable or is called with its full path.

I use CronniX to set a System cron schedule to run this. To run this every 10 minutes, the schedule is like this:
*/10 * * * *
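
For anyone editing a crontab by hand instead of using CronniX, the complete entry might look like what the small helper below prints. This is a sketch under assumptions: /usr/local/bin/networkTest.pl is a hypothetical install location, cron_entry is a made-up helper name, and the output is in user crontab format (a system crontab such as /etc/crontab also expects a user name between the time fields and the command).

```shell
# Hypothetical install location; adjust to wherever networkTest.pl actually lives.
SCRIPT=/usr/local/bin/networkTest.pl

# Print a crontab entry that runs the check every 10 minutes.
# Arguments: server, time threshold (ms), loss threshold (%), email list.
cron_entry() {
  printf '*/10 * * * * /usr/bin/perl %s --server=%s --time=%s --loss=%s --email=%s\n' \
    "$SCRIPT" "$1" "$2" "$3" "$4"
}

cron_entry HOST2 80 5 name1@domain.com
```

Using full paths for both perl and the script avoids depending on the PATH that cron provides.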

Hope this is helpful.

Credit to my colleague Brian Smith at BBDO Detroit for coming up with this awesome Perl script!

Cheers,
Derek