Simon
Flexible server monitoring

Simon App Health testing (Redux 2)

Back in 2008 and again a couple of years ago I asked about a feature that would allow another service (or another Simon server) to monitor Simon to make sure that it's alive and running:

http://www.dejal.com/forums/2012/08/27/test-simon-app-health-redux

It's been a few major revisions since then and (unless I missed it) there's still nothing. Even if it's a rare occurrence, this is a big hole in Simon's feature set, one that can really bite you in the butt if Simon or OS X crashes or freezes (even from causes like bad memory or a bad HD on the box) and you're depending on it for notifications about critical infrastructure.

Pinging the box that Simon is running on is not sufficient: in addition to telling you nothing about the health or existence of the Simon app, I've seen OS X soldiering on replying to pings while the upper levels of the OS are completely hosed and unresponsive.

Any chance that we'll see something soon that allows us to monitor Simon's health externally?

Thanks,
John

David Sinclair

Re: Simon App Health testing (Redux 2)

Hi John,

I still haven't got around to splitting Simon up into multiple components, as mentioned in those previous threads. But I am inching closer to that: my Time Out 2 app does use that approach, so I should be able to use that technique in a future version of Simon. The Simon app is an order of magnitude more complex than Time Out, so I consider Time Out 2 a smaller-scale trial run for this technique.

In the meantime, I'm wondering if there is anything else that could be done. Simon does use distributed notifications, which another app could monitor. So one option is that I could write a small standalone app to do so, though I'm not sure if that'd have any benefit, since it'd still be on the same machine.

David Sinclair

Re: Simon App Health testing (Redux 2)

Oh, one thought for a way to do this now — a Simon instance on another machine could monitor a report output by Simon (either uploaded to a website, or via local web sharing).

The test on that other machine could use filters to look at the report date, and result in a failure if that doesn't change. The report would need to update itself more frequently than the test runs.

Simple and feasible now.
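
For anyone who wants the same kind of check from a machine that isn't running Simon, here is a rough Python sketch of the idea. It is not Simon's filter feature, just a standalone illustration. The "Report generated:" line, its date format, and the function names are all assumptions for the example; a real report would need its own pattern. Note that it compares the report date against the clock rather than against the previously seen value, so it assumes the two machines' clocks are roughly in sync.

```python
import re
import urllib.request
from datetime import datetime, timedelta

# Hypothetical report line, e.g. "Report generated: 2015-03-01 14:05:00".
# This is an assumed layout, not Simon's actual report format.
STAMP_RE = re.compile(r"Report generated: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def report_is_fresh(report_text: str, now: datetime, max_age: timedelta) -> bool:
    """Return True if the report carries a timestamp no older than max_age."""
    match = STAMP_RE.search(report_text)
    if match is None:
        # No recognisable date: treat as a failure rather than guess.
        return False
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return now - stamp <= max_age


def check_report(url: str, max_age: timedelta) -> bool:
    """Fetch the report and check its date; any fetch error counts as a failure."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            text = response.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers connection failures, HTTP errors and timeouts.
        return False
    return report_is_fresh(text, datetime.now(), max_age)
```

As with the Simon-based version, the allowed age should be comfortably longer than the interval at which the report regenerates, otherwise a healthy report will look stale between updates.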