Simon
Flexible server monitoring

All tests are failing but in fact, they aren't

I've got about 125 tests in Simon and last week they all decided to fail at once.

I didn't think much of it the first time, so I relaunched Simon and all the tests recovered. Two days later the same thing happened: all my tests failed again. I did the same as the first time and relaunched Simon. Another two days later (today), it happened again, and the usual fix worked.

It's getting annoying, though, because whenever a test fails more than twice, I receive both an e-mail and an SMS.

Here are 3 screenshots of my Simon window:

http://bit.ly/wOK7e
http://bit.ly/C0ezP
http://bit.ly/2fuURm

I've never heard of this bug before, so I suspect it has something to do with a plist, the preferences, or something like that. Hopefully someone has an idea or a suggestion for me to try.

I'm running Simon on Mac OS X Leopard Server and I've got the Simon Enterprise license.

Any help is appreciated.

wedix's picture

Just happened again; I have no idea why it's doing this.

David Sinclair's picture

Re: All tests are failing but in fact, they aren't

I'm sorry that you're having difficulty.

Interesting that the duration is exactly 1.0 minutes for all of them. I'm guessing that is your timeout interval for all tests? (The default is 3 minutes.)

I would guess that the application is running out of resources, and thus failing.

How frequently do you check the tests? Checking too many at once could use up the resources, since each application can only handle so much at once. I try to manage things to avoid that situation, though.

You may be able to confirm this, or provide helpful diagnostic info, by turning on the available debug logging, e.g. WebDebugMode, PortDebugMode, PingDebugMode.

Quit Simon, then enter these lines into Terminal:

defaults write com.dejal.simon2 WebDebugMode YES
defaults write com.dejal.simon2 PortDebugMode YES
defaults write com.dejal.simon2 PingDebugMode YES

(or whichever one of those services you use)

Then send me the Console log output when all the tests are failing.
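
Once the log has been captured, you will probably want to switch the debug output off again. Here is a minimal sketch, assuming Simon treats a missing key the same as NO (that behaviour is an assumption, not something confirmed in this thread). `defaults delete` is the standard macOS command for removing a preference key, and the domain and key names are the ones quoted above. The script only prints the commands, so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Preference domain and key names are the ones quoted above.
DOMAIN="com.dejal.simon2"
DEBUG_KEYS="WebDebugMode PortDebugMode PingDebugMode"

# Print the commands that would remove each debug key.
# Assumption: Simon falls back to non-debug behaviour when a key is absent.
debug_off_commands() {
    for key in $DEBUG_KEYS; do
        echo "defaults delete $DOMAIN $key"
    done
}

# Dry run: show the commands without executing them.
debug_off_commands
```

To apply them, quit Simon first and paste the printed lines into Terminal.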

In a future version, I plan to split each check out to a separate process (some already do use separate processes, e.g. Port-based tests), which should eliminate resource-based issues. I might have to consider bringing that feature forward to 2.6.

wedix's picture

Interesting

I changed the "Interval between checks" setting from 1 second to 5 seconds. Do you think that should help? Most of my tests take about a second to complete, so there is never more than one test running at a time (at least not that I can see in the Checks section).

Yes, the default timeout is 1 minute. The checks run every 10 minutes, and if a failure occurs the test runs again after 30 seconds. I wouldn't mind changing these values, although doing it for over 100 tests is a pain. I've located the Tests.simon XML file, though, and just need some details on the Interval Units variable: does 0 = seconds and 1 = minutes?
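
For a rough sense of the load, here is a back-of-the-envelope sketch using the figures from this thread (about 125 tests, a 10-minute check interval, roughly 1 second per check, a 1-minute timeout). It assumes checks run one at a time, which is a simplification for illustration and not a statement about how Simon actually schedules them.

```shell
#!/bin/sh
# Figures quoted in this thread; the one-at-a-time model is an assumption.
TESTS=125      # "about 125 tests"
INTERVAL=600   # each test is checked every 10 minutes (in seconds)
TYPICAL=1      # a normal check takes about 1 second
TIMEOUT=60     # timeout of 1 minute (in seconds)

# Seconds of checking per cycle when every check finishes normally.
normal_load=$((TESTS * TYPICAL))
# Seconds of checking per cycle when every check runs into the timeout.
stalled_load=$((TESTS * TIMEOUT))

echo "normal:  ${normal_load}s of work per ${INTERVAL}s cycle"
echo "stalled: ${stalled_load}s of work per ${INTERVAL}s cycle"
```

Under those assumptions a healthy cycle keeps the checker busy for about a fifth of the interval, while a cycle in which every check times out would need far more than the 600 seconds available.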

Thanks!

David Sinclair's picture

Re: Interesting

Yes, 0 = seconds, 1 = minutes, etc.
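
If you end up scripting against those values, a small helper along these lines may be handy. It is only a sketch: the function name is made up for illustration, and it handles just the two codes confirmed above, rejecting anything else instead of guessing.

```shell
#!/bin/sh
# Convert an interval value plus its units code to seconds.
# Only codes 0 (seconds) and 1 (minutes) are confirmed in this thread;
# any other code is reported as unknown.
interval_to_seconds() {
    value=$1
    units=$2
    case "$units" in
        0) echo "$value" ;;
        1) echo $((value * 60)) ;;
        *) echo "unknown units code: $units" >&2; return 1 ;;
    esac
}

interval_to_seconds 10 1   # the 10-minute check interval
interval_to_seconds 30 0   # the 30-second retry after a failure
```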

The "Interval between checks" preference applies to manual checks; scheduled checks occur at whatever frequency is set, counted from the previous check.

wedix's picture

Interval between checks did it.

Just wanted to let you know what happened with this issue. I changed the interval between checks 4 days ago and haven't had the problem since.

David Sinclair's picture

Re: Interval between checks did it.

Glad to hear it. Makes sense... the more tests you have, the more time you need to allow for them to avoid bogging down the OS's resources.

Thanks for the update.