Simon icon Simon
Flexible server monitoring

Bogus HTTP/HTTPS test failures

I upgraded to v3 recently from v2, and a few times in the ensuing month, I've had a few of my HTTP and HTTPS tests start timing out for no apparent reason. This is reported this as a failure and it continues for at least 30 minutes (the longest that I've let Simon run in that state, to see if it recovers). I can go into the test's Preview and sure enough it just sits there with the barber pole going until it times out.

On the same machine that's running Simon and while Simon is actively reporting the timeouts, I can fire up Safari and perform the same simple checks manually and they work fine (no timeouts, response is immediate), so it doesn't seem to be an issue with the machine, the network or the OS.

If I just quit the Simon app and relaunch it, the tests immediately start working properly, again seemingly pointing to an issue with Simon.

I'm running this on a PowerMac G5 running 10.5.8.

Has anyone else seen this?

It seems pretty intermittent and random, but is there anything that I can do to gather troubleshooting info when the situation occurs?

Thanks,
John

David Sinclair's picture

Re: Bogus HTTP/HTTPS test failures

Unfortunately I have had a few reports similar to that. It seems that in some situations it is possible for Simon to get overloaded — using up resources. I am unable to recreate this, and it doesn't affect most people, but can affect some heavy users.

I am working on version 3.1 which includes major changes to the data storage (see my recent blog post), which may help, but importantly is an essential step towards the ultimate solution of splitting the checking into a separate process.

I also plan to introduce a checking limiter feature to avoid checking too many tests at once, which may also help.

If this situation occurs again, please send me the Console log. It might provide clues as to what is going on.

JohnDCCIU's picture

Re: Bogus HTTP/HTTPS test failures

It has happened a couple of times in the past; this is just the latest incident (and the one that I got time to report ;-).

Here's the All Messages log from the time when it happened:

=============================================

5/9/11 8:00:06 AM Simon[85347] NSUncaughtSystemExceptionException: Uncaught system exception: signal 10
5/9/11 8:00:07 AM Simon[85347] An uncaught exception was raised
5/9/11 8:00:07 AM Simon[85347] Uncaught system exception: signal 10
5/9/11 8:00:07 AM Simon[85347] NSUncaughtSystemExceptionException: Uncaught system exception: signal 10
5/9/11 8:00:06 AM [0x0-0x17e07df].com.dejal.simon[85347] sh: /usr/bin/atos: No such file or directory
5/9/11 8:00:07 AM [0x0-0x17e07df].com.dejal.simon[85347] head: illegal line count -- +-2
5/9/11 8:00:07 AM [0x0-0x17e07df].com.dejal.simon[85347] sh: /usr/bin/atos: No such file or directory
5/9/11 8:00:36 AM Simon[85347] Pantomime: Connected!
5/9/11 8:00:36 AM Simon[85347] Pantomime: Connected!
5/9/11 8:00:36 AM Simon[85347] Pantomime: Sending the message...
5/9/11 8:00:36 AM Simon[85347] Pantomime: Sending the message...
5/9/11 8:00:36 AM Simon[85347] Pantomime: Sent!
Closing the connection.
5/9/11 8:00:36 AM Simon[85347] Pantomime: Sent!
Closing the connection.
5/9/11 8:01:51 AM Simon[85347] Pantomime: Connected!
5/9/11 8:01:51 AM Simon[85347] Pantomime: Sending the message...
5/9/11 8:01:51 AM Simon[85347] Pantomime: Sent!
Closing the connection.
5/9/11 8:02:02 AM Simon[85347] Pantomime: Connected!
5/9/11 8:02:02 AM Simon[85347] Pantomime: Sending the message...
5/9/11 8:02:02 AM Simon[85347] Pantomime: Sent!
Closing the connection.
5/9/11 8:02:04 AM Simon[85347] Pantomime: Connected!
5/9/11 8:02:04 AM Simon[85347] Pantomime: Sending the message...
5/9/11 8:02:04 AM Simon[85347] Pantomime: Sent!
Closing the connection.
5/9/11 8:02:11 AM Simon[85347] Pantomime: Connected!
5/9/11 8:02:11 AM Simon[85347] Pantomime: Connected!
5/9/11 8:02:11 AM Simon[85347] Pantomime: Sending the message...
5/9/11 8:02:11 AM Simon[85347] Pantomime: Sending the message...
5/9/11 8:02:11 AM Simon[85347] Pantomime: Sent!
Closing the connection.
5/9/11 8:02:11 AM Simon[85347] Pantomime: Sent!
Closing the connection.
5/9/11 8:03:52 AM Simon[85347] Pantomime: Connected!
5/9/11 8:03:53 AM Simon[85347] Pantomime: Sending the message...
5/9/11 8:03:53 AM Simon[85347] Pantomime: Sent!
Closing the connection.
5/9/11 8:04:03 AM Simon[85347] Pantomime: Connected!
5/9/11 8:04:03 AM Simon[85347] Pantomime: Sending the message...
5/9/11 8:04:03 AM Simon[85347] Pantomime: Sent!
Closing the connection.
5/9/11 8:04:37 AM /usr/sbin/ocspd[22676] starting
5/9/11 8:05:46 AM Simon[22688] Pantomime: Connected!
5/9/11 8:05:46 AM Simon[22688] Pantomime: Sending the message...
5/9/11 8:05:46 AM Simon[22688] Pantomime: Sent!
Closing the connection.
5/9/11 8:05:55 AM Simon[22688] Pantomime: Connected!
5/9/11 8:05:56 AM Simon[22688] Pantomime: Sending the message...
5/9/11 8:05:56 AM Simon[22688] Pantomime: Sent!
Closing the connection.
5/9/11 8:05:58 AM Simon[22688] Pantomime: Connected!
5/9/11 8:05:59 AM Simon[22688] Pantomime: Sending the message...
5/9/11 8:05:59 AM Simon[22688] Pantomime: Sent!
Closing the connection.
5/9/11 8:06:02 AM Simon[22688] Pantomime: Connected!
5/9/11 8:06:02 AM Simon[22688] Pantomime: Sending the message...
5/9/11 8:06:02 AM Simon[22688] Pantomime: Sent!
Closing the connection.
5/9/11 8:06:08 AM Simon[22688] Pantomime: Connected!
5/9/11 8:06:08 AM Simon[22688] Pantomime: Connected!
5/9/11 8:06:09 AM Simon[22688] Pantomime: Sending the message...
5/9/11 8:06:09 AM Simon[22688] Pantomime: Sending the message...
5/9/11 8:06:09 AM Simon[22688] Pantomime: Sent!
Closing the connection.
5/9/11 8:06:09 AM Simon[22688] Pantomime: Sent!
Closing the connection.

David Sinclair's picture

Re: Bogus HTTP/HTTPS test failures

Thanks. That log shows an uncaught exception somewhere, but doesn't identify where. It also shows Simon continuing to successfully send email notifications after that, so it must still be operating at least partially.

Pity it doesn't show where the exception occurred. I try to catch every possible place an exception could occur, but I must be missing one.

Simon tries to use the "atos" command-line tool to output a stack backtrace to identify where the exception occurred, but that is only available if you have Apple's developer tools installed. You could install them if you wish, though it isn't necessary.

The major data changes in version 3.1 will likely change things significantly, so this issue may go away as a result. And if not, the app split planned for 3.2 should definitely do so.

JohnDCCIU's picture

Re: Bogus HTTP/HTTPS test failures

Ok, I installed XCode 3; hopefully it will give more information in the log during the next incident. If not, we'll see if 3.1 or 3.2 does the trick.

JohnDCCIU's picture

Re: Bogus HTTP/HTTPS test failures

Just had another incident, now with XCode installed. Here's the Console log, minus all the Pantomime stuff.

=======================================================

5/31/11 7:13:52 AM Simon[238] NSUncaughtSystemExceptionException: Uncaught system exception: signal 10

5/31/11 7:13:52 AM [0x0-0x1b01b].com.dejal.simon[238] head: illegal line count -- +-2
5/31/11 7:13:53 AM Simon[238] An uncaught exception was raised

5/31/11 7:13:53 AM Simon[238] Uncaught system exception: signal 10
5/31/11 7:13:53 AM Simon[238] NSUncaughtSystemExceptionException: Uncaught system exception: signal 10

5/31/11 7:13:53 AM [0x0-0x1b01b].com.dejal.simon[238] 1 CFRunLoopRunSpecific (in CoreFoundation) + 3792
5/31/11 7:13:53 AM [0x0-0x1b01b].com.dejal.simon[238] 2 +[NSURLConnection(NSURLConnectionReallyInternal) _resourceLoadLoop:] (in Foundation) + 284

David Sinclair's picture

Re: Bogus HTTP/HTTPS test failures

Thanks. Wasn't there any more lines than that? I'd expect many more lines in the log after the last one pasted there. Those lines just show OS-level stuff.

It could be very helpful if you could provide any additional lines from the log.

It'd also be worthwhile to update to version 3.1, due out today (or the beta out now), which includes changes in a likely related area, to see if that helps.

JohnDCCIU's picture

Re: Bogus HTTP/HTTPS test failures

Sorry, that's pretty much all there was, except for a bunch of Pantomime stuff....I included more log lines below from the system.log, from just before and then after the problem hit.

I updated to 3.1.1 today and will let you know if it occurs again....it's very intermittent, so who knows when....

Thanks,
John

May 31 07:13:52 simon Simon[238]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 10
May 31 07:13:52 simon [0x0-0x1b01b].com.dejal.simon[238]: head: illegal line count -- +-2
May 31 07:13:53 simon Simon[238]: An uncaught exception was raised
May 31 07:13:53 simon Simon[238]: Uncaught system exception: signal 10
May 31 07:13:53 simon Simon[238]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 10
May 31 07:13:53 simon [0x0-0x1b01b].com.dejal.simon[238]: 1 CFRunLoopRunSpecific (in CoreFoundation) + 3792
May 31 07:13:53 simon [0x0-0x1b01b].com.dejal.simon[238]: 2 +[NSURLConnection(NSURLConnectionReallyInternal) _resourceLoadLoop:] (in Foundation) + 284
May 31 07:15:14 simon Simon[238]: Pantomime: Connected!
May 31 07:15:14 simon Simon[238]: Pantomime: Sending the message...
May 31 07:15:14 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:15:40 simon Simon[238]: Pantomime: Connected!
May 31 07:15:40 simon Simon[238]: Pantomime: Sending the message...
May 31 07:15:40 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:15:40 simon Simon[238]: Pantomime: Connection closed.
May 31 07:16:22 simon Simon[238]: Pantomime: Connected!
May 31 07:16:22 simon Simon[238]: Pantomime: Sending the message...
May 31 07:16:23 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:16:23 simon Simon[238]: Pantomime: Connection closed.
May 31 07:17:15 simon Simon[238]: Pantomime: Connected!
May 31 07:17:15 simon Simon[238]: Pantomime: Sending the message...
May 31 07:17:15 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:17:41 simon Simon[238]: Pantomime: Connected!
May 31 07:17:41 simon Simon[238]: Pantomime: Sending the message...
May 31 07:17:41 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:17:44 simon Simon[238]: Pantomime: Connected!
May 31 07:17:45: --- last message repeated 1 time ---
May 31 07:17:45 simon Simon[238]: Pantomime: Sending the message...
May 31 07:17:45: --- last message repeated 1 time ---
May 31 07:17:45 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:17:52: --- last message repeated 1 time ---
May 31 07:17:52 simon Simon[238]: Pantomime: Connected!
May 31 07:17:52: --- last message repeated 1 time ---
May 31 07:17:52 simon Simon[238]: Pantomime: Sending the message...
May 31 07:17:52: --- last message repeated 1 time ---
May 31 07:17:52 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:17:52: --- last message repeated 1 time ---
May 31 07:17:52 simon Simon[238]: Pantomime: Connection closed.
May 31 07:19:22 simon gssd-agent[54413]: krb5_get_default_realms: could not get default realm: Unknown Error Code: -1765328160
May 31 07:19:22 simon gssd-agent[54413]: Error returned by svc_mach_gss_init_sec_context:
May 31 07:19:22 simon gssd-agent[54413]: Major error <1> Unspecified GSS failure. Minor code may provide more information
May 31 07:19:22 simon gssd-agent[54413]: Minor error <1> Unknown Error Code: 19777
May 31 07:22:16 simon Simon[238]: Pantomime: Connected!
May 31 07:22:16 simon Simon[238]: Pantomime: Sending the message...
May 31 07:22:16 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:22:42 simon Simon[238]: Pantomime: Connected!
May 31 07:22:42 simon Simon[238]: Pantomime: Sending the message...
May 31 07:22:42 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:22:45 simon Simon[238]: Pantomime: Connected!
May 31 07:22:45: --- last message repeated 1 time ---
May 31 07:22:45 simon Simon[238]: Pantomime: Sending the message...
May 31 07:22:45: --- last message repeated 1 time ---
May 31 07:22:45 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:22:52: --- last message repeated 1 time ---
May 31 07:22:52 simon Simon[238]: Pantomime: Connected!
May 31 07:22:52 simon Simon[238]: Pantomime: Sending the message...
May 31 07:22:52 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:27:17 simon Simon[54517]: Pantomime: Connected!
May 31 07:27:18 simon Simon[54517]: Pantomime: Sending the message...
May 31 07:27:18 simon Simon[54517]: Pantomime: Sent!\nClosing the connection.
May 31 07:27:23 simon Simon[54517]: Pantomime: Connected!
May 31 07:27:23 simon Simon[54517]: Pantomime: Sending the message...
May 31 07:27:24 simon Simon[54517]: Pantomime: Sent!\nClosing the connection.
May 31 07:27:25 simon Simon[54517]: Pantomime: Connected!
May 31 07:27:25: --- last message repeated 1 time ---
May 31 07:27:25 simon Simon[54517]: Pantomime: Sending the message...
May 31 07:27:25: --- last message repeated 1 time ---
May 31 07:27:25 simon Simon[54517]: Pantomime: Sent!\nClosing the connection.
May 31 07:27:31: --- last message repeated 1 time ---
May 31 07:27:31 simon Simon[54517]: Pantomime: Connected!
May 31 07:27:32 simon Simon[54517]: Pantomime: Sending the message...
May 31 07:27:32 simon Simon[54517]: Pantomime: Sent!\nClosing the connection.
May 31 07:27:32 simon Simon[54517]: Pantomime: Connection closed.
May 31 07:27:38 simon Simon[54517]: Pantomime: Connected!
May 31 07:27:38 simon Simon[54517]: Pantomime: Sending the message...
May 31 07:27:38 simon Simon[54517]: Pantomime: Sent!\nClosing the connection.
May 31 07:13:52 simon Simon[238]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 10
May 31 07:13:52 simon [0x0-0x1b01b].com.dejal.simon[238]: head: illegal line count -- +-2
May 31 07:13:53 simon Simon[238]: An uncaught exception was raised
May 31 07:13:53 simon Simon[238]: Uncaught system exception: signal 10
May 31 07:13:53 simon Simon[238]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 10
May 31 07:13:53 simon [0x0-0x1b01b].com.dejal.simon[238]: 1 CFRunLoopRunSpecific (in CoreFoundation) + 3792
May 31 07:13:53 simon [0x0-0x1b01b].com.dejal.simon[238]: 2 +[NSURLConnection(NSURLConnectionReallyInternal) _resourceLoadLoop:] (in Foundation) + 284
May 31 07:15:14 simon Simon[238]: Pantomime: Connected!
May 31 07:15:14 simon Simon[238]: Pantomime: Sending the message...
May 31 07:15:14 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:15:40 simon Simon[238]: Pantomime: Connected!
May 31 07:15:40 simon Simon[238]: Pantomime: Sending the message...
May 31 07:15:40 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:15:40 simon Simon[238]: Pantomime: Connection closed.
May 31 07:16:22 simon Simon[238]: Pantomime: Connected!
May 31 07:16:22 simon Simon[238]: Pantomime: Sending the message...
May 31 07:16:23 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:16:23 simon Simon[238]: Pantomime: Connection closed.
May 31 07:17:15 simon Simon[238]: Pantomime: Connected!
May 31 07:17:15 simon Simon[238]: Pantomime: Sending the message...
May 31 07:17:15 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:17:41 simon Simon[238]: Pantomime: Connected!
May 31 07:17:41 simon Simon[238]: Pantomime: Sending the message...
May 31 07:17:41 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:17:44 simon Simon[238]: Pantomime: Connected!
May 31 07:17:45: --- last message repeated 1 time ---
May 31 07:17:45 simon Simon[238]: Pantomime: Sending the message...
May 31 07:17:45: --- last message repeated 1 time ---
May 31 07:17:45 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:17:52: --- last message repeated 1 time ---
May 31 07:17:52 simon Simon[238]: Pantomime: Connected!
May 31 07:17:52: --- last message repeated 1 time ---
May 31 07:17:52 simon Simon[238]: Pantomime: Sending the message...
May 31 07:17:52: --- last message repeated 1 time ---
May 31 07:17:52 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:17:52: --- last message repeated 1 time ---
May 31 07:17:52 simon Simon[238]: Pantomime: Connection closed.
May 31 07:22:16 simon Simon[238]: Pantomime: Connected!
May 31 07:22:16 simon Simon[238]: Pantomime: Sending the message...
May 31 07:22:16 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:22:42 simon Simon[238]: Pantomime: Connected!
May 31 07:22:42 simon Simon[238]: Pantomime: Sending the message...
May 31 07:22:42 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:22:45 simon Simon[238]: Pantomime: Connected!
May 31 07:22:45: --- last message repeated 1 time ---
May 31 07:22:45 simon Simon[238]: Pantomime: Sending the message...
May 31 07:22:45: --- last message repeated 1 time ---
May 31 07:22:45 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:22:52: --- last message repeated 1 time ---
May 31 07:22:52 simon Simon[238]: Pantomime: Connected!
May 31 07:22:52 simon Simon[238]: Pantomime: Sending the message...
May 31 07:22:52 simon Simon[238]: Pantomime: Sent!\nClosing the connection.
May 31 07:27:17 simon Simon[54517]: Pantomime: Connected!
May 31 07:27:18 simon Simon[54517]: Pantomime: Sending the message...
May 31 07:27:18 simon Simon[54517]: Pantomime: Sent!\nClosing the connection.
May 31 07:27:23 simon Simon[54517]: Pantomime: Connected!
May 31 07:27:23 simon Simon[54517]: Pantomime: Sending the message...
May 31 07:27:24 simon Simon[54517]: Pantomime: Sent!\nClosing the connection.
May 31 07:27:25 simon Simon[54517]: Pantomime: Connected!
May 31 07:27:25: --- last message repeated 1 time ---
May 31 07:27:25 simon Simon[54517]: Pantomime: Sending the message...
May 31 07:27:25: --- last message repeated 1 time ---
May 31 07:27:25 simon Simon[54517]: Pantomime: Sent!\nClosing the connection.
May 31 07:27:31: --- last message repeated 1 time ---
May 31 07:27:31 simon Simon[54517]: Pantomime: Connected!
May 31 07:27:32 simon Simon[54517]: Pantomime: Sending the message...
May 31 07:27:32 simon Simon[54517]: Pantomime: Sent!\nClosing the connection.
May 31 07:27:32 simon Simon[54517]: Pantomime: Connection closed.
May 31 07:27:38 simon Simon[54517]: Pantomime: Connected!
May 31 07:27:38 simon Simon[54517]: Pantomime: Sending the message...
May 31 07:27:38 simon Simon[54517]: Pantomime: Sent!\nClosing the connection.

David Sinclair's picture

Re: Bogus HTTP/HTTPS test failures

Weird that the remainder of the backtrace was missing. Hopefully the changes in 3.1 will prevent this issue; I'll be most interested in seeing a log if it occurs again under 3.1.1.

JohnDCCIU's picture

Re: Bogus HTTP/HTTPS test failures

Sorry to report that it happened again this morning, using v3.1.1; below is what was in the log. Any ideas on how I can enable the backtrace? These are getting pretty disruptive, as me and my people really jump when all these systems suddenly are reported down on our email and phones.....

Thanks,
John

6/15/11 9:52:07 AM Simon[234] NSUncaughtSystemExceptionException: Uncaught system exception: signal 10

6/15/11 9:52:07 AM [0x0-0x19019].com.dejal.simon[234] head: illegal line count -- +-2
6/15/11 9:52:07 AM Simon[234] An uncaught exception was raised

6/15/11 9:52:07 AM Simon[234] Uncaught system exception: signal 10
6/15/11 9:52:07 AM Simon[234] NSUncaughtSystemExceptionException: Uncaught system exception: signal 10

6/15/11 9:52:08 AM [0x0-0x19019].com.dejal.simon[234] 1 CFRunLoopRunSpecific (in CoreFoundation) + 3792
6/15/11 9:52:08 AM [0x0-0x19019].com.dejal.simon[234] 2 +[NSURLConnection(NSURLConnectionReallyInternal) _resourceLoadLoop:] (in Foundation) + 284

David Sinclair's picture

Re: Bogus HTTP/HTTPS test failures

I'm sorry to hear that; I'd hoped that the changes in 3.1 would fix it. It looks like the exception is occurring in an OS-generated background thread, which is why I can't catch it.

The planned app split change (now coming in a future version) will definitely fix that, since it could kill the whole process if it gets stuck. But I'll have to see if I can come up with an interim solution. Maybe I'll use a helper for the Web (HTTP) service, like I do for Port-based ones.

JohnDCCIU's picture

Re: Bogus HTTP/HTTPS test failures

Eh.....sorry to keep whining about this, but the frequency of the failures is increasing for some reason. It's now happening twice a day on some days.

Any chance that there's some progress on the interim solution that you mentioned above, or could I test an early beta of the split app?

Thanks,
John

David Sinclair's picture

Re: Bogus HTTP/HTTPS test failures

Unfortunately due to other commitments, I won't be able to work on Simon 3.2 until early July. So hopefully a beta should be available around July 15. Sorry about that... I hope you can hold on till then.