Simon
Flexible server monitoring

Best Practices in monitoring

Hi folks,
Just purchased a license and am enjoying Simon's "visualness" when it comes to server monitoring.

I'm hoping someone can speak to best practices for server monitoring. I'm not sure *how* to interpret results from Simon. For example, I have created a ping monitor and an HTTP monitor for one target website, but I'm not sure how to interpret the findings.

Overall, these are the two questions I'm essentially driving at:
-- how can I properly evaluate server response time w/ Simon? (Can Simon tell me if a site/server is "slow"?)
-- how can I properly evaluate server uptime w/ Simon? (Can Simon tell me if a server is "down"?)

Here are some specific questions that are raised:
-- for a ping monitor - what's considered appropriate for a "failure if" setting?
-- how often should I schedule a ping test? How many packets should I send?
-- for an HTTP test, why is one check "yellow" with a 1 sec check duration while another server's HTTP check is "green" with a 2.2 sec check duration?
-- in evaluating server uptime, is a ping or an HTTP test more appropriate?

I checked the forums and documents and didn't come up with answers but if I missed something, please share.

Thanks,
Kristopher

David Sinclair

Re: Best Practices in monitoring

Hi Kristopher. Thanks for buying Simon.

Hopefully others will chime in to give their own thoughts on these questions, but here are my answers:

  • Tell if a server is slow: You can look at the Checks log to get a feel for this. It reports the duration of the check, so if it starts taking a longer time, you'll see. You can also use the Ping service if you want to be notified when it becomes too slow.
  • Tell if a server is down: Yes, Simon will indicate a down server with a red downward status icon, and it can notify you when this occurs, and when it recovers again, if desired.
  • Ping: I've tried to set sensible default values. You can of course change them if you have different needs.
  • Frequency of checks: Again, you can go with the defaults, or change if desired. The default frequency will alert you if things are seriously wrong, without driving too much extra traffic to your site.
  • Status icons: The yellow status icon means the test failed and then recovered. Green means a change was detected. So the icon reflects recent events rather than how long the check took. The status icons start off as grey, and change to bright green when a change occurs, then slowly fade. If a failure occurs, the icon changes to a downward red triangle; when the check recovers from the failure, it changes to an upward orange triangle, which fades (through yellow) over time. If you prefer simpler status icons that just indicate success in green and failure in red, instead of the time-sensitive ones, there's a preference for that.
  • Server uptime: HTTP is probably better for checking websites, in that it tells you that everything is working properly: people can view the pages correctly. Ping is better for other kinds of servers, or if you want to check response time and dropped packets. For important sites, using both has merit.
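
To make the status icon rules above concrete, here is a rough sketch of the transitions as described in this reply. It is only an illustration, not Simon's actual code: the names `Icon` and `next_icon` are made up, and treating a success right after a failure as a recovery even when a change was also detected is an assumption.

```python
from enum import Enum


class Icon(Enum):
    # Hypothetical names for the four icon states described in the reply.
    GREY = "grey (no events yet)"
    GREEN = "green (change detected)"
    RED_DOWN = "red downward (failure)"
    ORANGE_UP = "orange upward (recovered)"


def next_icon(current: Icon, succeeded: bool, changed: bool) -> Icon:
    """Return the icon after one check, following the rules described above."""
    if not succeeded:
        return Icon.RED_DOWN
    if current is Icon.RED_DOWN:
        # Assumption: the first success after a failure shows as a recovery,
        # even if the check also detected a change.
        return Icon.ORANGE_UP
    if changed:
        return Icon.GREEN
    # Nothing new happened; the icon keeps its state and only fades with time.
    return current
```

Note that the check duration does not appear anywhere in these rules, which is why a 1 sec check can be yellow while a 2.2 sec check is green.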

I hope this helps. Let us know if you have any further questions. And if anyone else wants to add their perspective, please do!

Wow!

Thanks Dave for all this feedback! Very helpful. I've set up both ping and HTTP checks using the defaults.

This leaves me with two follow-up questions:

1. In my HTTP and Ping checks, I have days-old checks that are still green. Given the info above about the status icons, shouldn't all green icons eventually fade (to grey)?

2. When you say "recover from failure", we can define that as the next test being successful, right?

3. Actually, one more q. Can the "check date" display be changed to display 12 hour (am/pm) time?

Thanks!
K

David Sinclair

Re: Wow!

Answers:

  1. When I said "eventually", it actually takes a year of no further events to get to fully neutral grey. Each interval of an hour, a day, a week, a month, and a year is a shade lighter than the previous.
  2. Correct. Recovery is success after a previous failure. It's useful to know as an indication that the check has failed recently.
  3. You can change the View Options to display absolute date/times instead of relative intervals. (And many more columns can be shown, too.)
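
As a rough illustration of the fading in answer 1, the sketch below counts how many of the listed intervals have passed since the last event. This is a guess at the logic, not Simon's implementation: `FADE_STEPS` and `fade_level` are hypothetical names, and a month is assumed to be 30 days and a year 365 days.

```python
from datetime import timedelta

# The intervals named in answer 1; month and year lengths are assumptions.
FADE_STEPS = [
    timedelta(hours=1),
    timedelta(days=1),
    timedelta(weeks=1),
    timedelta(days=30),
    timedelta(days=365),
]


def fade_level(age: timedelta) -> int:
    """Return how many shades lighter an icon is, from 0 (bright) to 5 (neutral grey)."""
    return sum(1 for step in FADE_STEPS if age >= step)
```

Under these assumptions, an icon whose last event was three days ago has `fade_level(timedelta(days=3)) == 2`, so it is only two shades lighter and still clearly green.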