Simon
Flexible server monitoring

Simon checks result in failure, even though net access is flawless

Hi,

First let me just say how pleased I am with Simon - a really great app!

However, today I started having problems with the checks... I use about 10 tests using the "Web (HTTP)" service and it's been working flawlessly for the last 2 months or so. Then all of a sudden every test started showing up as "failures" - all of them timing out and reaching the maximum time-out limit that I set in the prefs (5 sec).

When I check the sites by double-clicking on the test in question I have no problems at all in reaching them. Neither have I experienced any other Internet problems that could be the reason for Simon to act out of the ordinary...

Any ideas on this?

Thanks for your help!

Edit: Oh yeah, I forgot to add that when I restart Simon the tests go through without any problems for about 1 minute. Then all of a sudden every one of them starts timing out, and it continues this way...

David Sinclair's picture

Re: Simon checks result in failure, even though net access...

I was going to ask if you'd tried relaunching Simon, but I see that doesn't help.

Very strange. Had you changed the tests recently? Or upgraded Simon recently, or upgraded the OS (e.g. to Mac OS X 10.5.7, which just came out)? Just trying to isolate a cause.

Try adding a new test, and see if that works or fails too.

Well I DID upgrade to 10.5.7

Well I DID upgrade to 10.5.7 today - could that be the problem? Is this a known issue?

I'll try to add a new test and get back here with the results...

Edit: The new test didn't work either - still timing out...

David Sinclair's picture

Re: Well I DID upgrade to 10.5.7

I'm not aware of any issues with 10.5.7, and haven't received any other reports of problems with it. But it's conceivable.

Can you think of anything on your machine that could be blocking Simon's communication?

If you try a different kind of service within Simon (e.g. Ping), does that work?

I can't really think of

I can't really think of anything else that could possibly block Simon's comms... Nothing "infrastructural" anyway. Now I am not really sure what 10.5.7 adds to the OS - maybe there is something in the update doing this? Have you still had no other feedback about this problem?

I tried adding a "ping-service" to www.google.com with 2 packets - and this seems to work fine.... However - the HTTP-services are still malfunctioning...

Any other ideas?

Edit: I am guessing the Ping-service is working - the triangle shows up as uncolored when the others start failing. However, since I have never used the Ping-service before I might be mistaken?

David Sinclair's picture

Re: I can't really think of

Yes, it sounds like the Ping test is working fine. It'd show a red downward triangle if failing.

I haven't received any other reports of problems with the Web service.

Another thing to try: click the Preview button in the toolbar. What happens? Does it still time out? What is the error message? Is anything displayed in the source or preview areas of the window?

Have you tried adding a new test for a common site like Apple or Google, to ensure it isn't something strange with the sites you are checking?

Nothing shows up at all in

Nothing shows up at all in the Preview and/or the Source window. The status bar in the lower right corner seems to be functioning but stuck at "searching"-status. I let the window be like this for about 15 minutes and still nothing changed.

I also tried adding a test for www.google.com as you proposed. However, this test also started failing at the same time as the other sites...

The strange thing is that when restarting Simon, as I mentioned above, the tests seem to work fine for a couple of minutes, then suddenly everything goes haywire...

Any other ideas?

PS. I also tried

PS. I also tried downgrading to version 2.3.5 but it showed the same results....

David Sinclair's picture

Re: Nothing shows up at all

Very strange. The Google test proves that it is something specific to your machine, rather than with the sites you're checking.

I'd normally suggest sending me your data to examine, but I don't think that'd help in this case.

Two other ideas:

1. Are there any errors in the Console log from Simon? (Via /Applications/Utilities/Console.)

2. If not, try moving aside your Simon data and create a new one, to see if that still has the problem. Quit Simon, rename the "~/Library/Application Support/Dejal/Simon" folder, then re-launch Simon with default data. You can later reverse that process by deleting the new data folder and renaming your normal one.
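The rename in step 2 can also be done from Terminal. The following is only a rough sketch: the function name `move_aside` and the `.aside` suffix are made up for illustration, and Simon must be quit before running it.

```shell
# Hypothetical helper: move a data folder aside so the app starts with defaults.
# Quit Simon before calling this. The ".aside" suffix is an arbitrary choice.
# Usage: move_aside "$HOME/Library/Application Support/Dejal/Simon"
move_aside() {
    dir="$1"
    backup="$dir.aside"
    if [ ! -d "$dir" ]; then
        echo "No folder at: $dir" >&2
        return 1
    fi
    if [ -e "$backup" ]; then
        echo "Backup already exists: $backup" >&2
        return 1
    fi
    mv "$dir" "$backup" && echo "Moved to: $backup"
}

# To reverse it later (again with Simon quit), delete the new folder and
# rename the backup to the original name:
#   rm -r "$dir" && mv "$dir.aside" "$dir"
```

The function refuses to overwrite an existing backup, so running it twice by mistake should not destroy the original data.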

Maybe this could be

Maybe this could be something!
It seems as though every time the tests start timing out the following lines get created in the console:

May 18 18:46:04 Adrien-iMac-24 Simon[8449]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 18 18:46:04 Adrien-iMac-24 Simon[8449]: An uncaught exception was raised
May 18 18:46:04 Adrien-iMac-24 Simon[8449]: Uncaught system exception: signal 11
May 18 18:46:04 Adrien-iMac-24 Simon[8449]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 18 18:46:04 Adrien-iMac-24 [0x0-0x1c41c4].com.dejal.simon2[8449]: sh: /usr/bin/atos: No such file or directory

Now, this area of OS X is really unfamiliar to me so I can't tell anything from the above - can you?
However, this could very possibly be connected with the timing out, since they start at the exact same time both in Simon's logs and in the console log...

Edit: I tried the exact same thing again and the exact same lines showed up just as the tests started timing out:

May 18 19:42:39 Adrien-iMac-24 Simon[8918]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 18 19:42:39 Adrien-iMac-24 Simon[8918]: An uncaught exception was raised
May 18 19:42:39 Adrien-iMac-24 Simon[8918]: Uncaught system exception: signal 11
May 18 19:42:39 Adrien-iMac-24 Simon[8918]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 18 19:42:39 Adrien-iMac-24 [0x0-0x1d21d2].com.dejal.simon2[8918]: sh: /usr/bin/atos: No such file or directory

Edit2: I also tried renaming the dir/file that you mentioned above - this didn't change anything. The tests seem to time out just as before without any noticeable differences...

David Sinclair's picture

Re: Maybe this could be

Yes, that would correspond with the issue. Simon is having an exception somewhere, which causes the check to time out and fail.

Unfortunately, those log entries don't indicate where the problem occurred. The atos unix command isn't present, so it can't output the location of the problem.
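For context, "signal 11" in those log lines is SIGSEGV (a segmentation fault) on Mac OS X, as on Linux. Assuming a standard shell, the number can be looked up like this:

```shell
# Print the symbolic name of signal number 11.
kill -l 11
# → SEGV
```

That only names the kind of crash, an invalid memory access; it says nothing about where in the code it happened.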

Probably the best way forward now would be for you to get that unix command, which is installed as part of Apple's developer tools. Or simpler, download it from this link and use Terminal to install it via the following command, assuming the decompressed file is in your Downloads folder:

sudo cp ~/Downloads/atos /usr/bin

(It will ask for your password, since it is copying it into a root-owned directory.)
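To confirm the copy worked, a quick check along these lines may help. It is a minimal sketch: `have_command` is a made-up helper name, and the check only shows that the shell can find `atos`, not that `atos` will actually run correctly.

```shell
# Hypothetical helper: succeed if the named command can be found by the shell.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command atos; then
    echo "atos found at: $(command -v atos)"
else
    echo "atos not found"
fi
```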

After doing that, do a Simon check again and copy the Console log entries; hopefully it'll include further information pinpointing the cause.

...

When you tried renaming the directory, did you quit Simon first? You need to do so, or it'll just write out the same data again (at best) or cause data loss (at worst). Relaunching Simon without that data folder there would create default data.

Here is what showed up in

Here is what showed up in the console with the atos-command installed:

May 18 21:24:54 Adrien-iMac-24 Simon[9719]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 18 21:24:55 Adrien-iMac-24 Simon[9719]: An uncaught exception was raised
May 18 21:24:55 Adrien-iMac-24 Simon[9719]: Uncaught system exception: signal 11
May 18 21:24:55 Adrien-iMac-24 Simon[9719]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 18 21:24:55 Adrien-iMac-24 [0x0-0x1e51e5].com.dejal.simon2[9719]: atos cannot examine process 9719 for unknown reasons, even though it appears to exist.

...

Since I wasn't sure exactly what I did when I renamed the folder, I redid the process as you described. The time-outs still occurred - BUT it took a lot longer before the first one. After renaming the folder it took about 20 mins before the first test timed out, compared to about 2-3 mins earlier.

The Console still showed the same message though:

May 18 21:46:38 Adrien-iMac-24 Simon[9761]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 18 21:46:39 Adrien-iMac-24 Simon[9761]: An uncaught exception was raised
May 18 21:46:39 Adrien-iMac-24 Simon[9761]: Uncaught system exception: signal 11
May 18 21:46:39 Adrien-iMac-24 Simon[9761]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 18 21:46:39 Adrien-iMac-24 [0x0-0x1e71e7].com.dejal.simon2[9761]: atos cannot examine process 9761 for unknown reasons, even though it appears to exist.

David Sinclair's picture

Re: Here is what showed up

Hmm. Looks like atos has other dependencies. Pity.

So... unless you were willing to install Apple's developer tools, which requires creating a free account at the Apple Developer site, the only way we can narrow this down further is for me to put debug logging in a special build of Simon for you. Let me know what you want to do.

I have to say that you

I have to say that you really are helpful - thanks! :)

It sounds like the smoothest way is to go with the debug logging build - is this as efficient as the developer tool option?
If yes, then I would very much appreciate it if you could set me up with such a build - that would be great!

David Sinclair's picture

Re: I have to say

I'm happy to help. If you're having a problem, it's possible that others are too, even though nobody else has reported a problem. So it's worthwhile to track it down.

Here is a special build. It adds exception handling in the Web plugin, so hopefully will catch the errors you experienced. Please run this and let me know what the Console log says. (You are welcome to continue running it.)

I don't think that did the

I don't think that did the trick. The lines in the console look the same, to me anyways...:

May 19 07:42:17 Adrien-iMac-24 Simon[10822]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 19 07:42:17 Adrien-iMac-24 Simon[10822]: An uncaught exception was raised
May 19 07:42:17 Adrien-iMac-24 Simon[10822]: Uncaught system exception: signal 11
May 19 07:42:17 Adrien-iMac-24 Simon[10822]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 19 07:42:17 Adrien-iMac-24 [0x0-0x208208].com.dejal.simon2[10822]: atos cannot examine process 10822 for unknown reasons, even though it appears to exist.

David Sinclair's picture

Re: I don't think that did the

Hmm... you're sure you used the provided special build?

That would indicate that the exception is occurring somewhere else than where I expected. I'm not sure where it could be, though.

Yup I am quite sure - I even

Yup I am quite sure - I even downloaded the build (which starts up as version 2.5.2) again just to be sure, since I thought that getting the same results in the console was kind of weird.
Here are the same lines again from when I tried just a couple of minutes ago (I added the lines that preceded the ones associated with Simon - if this is of any help):

May 19 20:26:11 Adrien-iMac-24 org.apache.httpd[17273]: (2)No such file or directory: httpd: could not open error log file /private/var/log/apache2/error_log.
May 19 20:26:11 Adrien-iMac-24 org.apache.httpd[17273]: Unable to open logs
May 19 20:26:11 Adrien-iMac-24 com.apple.launchd[1] (org.apache.httpd[17273]): Exited with exit code: 1
May 19 20:26:11 Adrien-iMac-24 com.apple.launchd[1] (org.apache.httpd): Throttling respawn: Will start in 10 seconds
May 19 20:26:19 Adrien-iMac-24 Simon[17266]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 19 20:26:19 Adrien-iMac-24 Simon[17266]: An uncaught exception was raised
May 19 20:26:19 Adrien-iMac-24 Simon[17266]: Uncaught system exception: signal 11
May 19 20:26:19 Adrien-iMac-24 Simon[17266]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 19 20:26:19 Adrien-iMac-24 [0x0-0x297297].com.dejal.simon2[17266]: atos cannot examine process 17266 for unknown reasons, even though it appears to exist.

...

I tried using Simon on my PowerBook G4 with 10.5.7 and it worked without any problems at all... So infrastructure-wise (router, internet connection etc) there shouldn't be any problems...

David Sinclair's picture

Re: Yup I am quite sure

Hmm. Interesting that it works on your other machine (assuming you left it running long enough to manifest the timeouts); that really indicates a problem specific to your iMac.

I'm at a loss as to where the exception could be occurring; Simon already had exception handlers around the plugin code, so any exception should be caught.

I normally wouldn't suggest this, but would you be willing to install Apple's developer tools, so the atos command works? That should pinpoint the point of failure. Otherwise I'd have to flail around trying to guess where it could be.

You should be able to install just the base and unix tools; you wouldn't need everything. Go to the Mac Dev Center.

Ok, I registered... However,

Ok, I registered... However, there are A LOT of downloads at the ADC member site...

Could you please tell me exactly which ones I need to download?

David Sinclair's picture

Re: Ok, I registered...

You'd want the Xcode 3.1.2 installer.

Installed ok and then

Installed ok and then restarted Simon. Below is what came out in the console - hopefully this will help:

May 20 07:18:36 Adrien-iMac-24 Simon[19600]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 20 07:18:37 Adrien-iMac-24 Simon[19600]: An uncaught exception was raised
May 20 07:18:37 Adrien-iMac-24 Simon[19600]: Uncaught system exception: signal 11
May 20 07:18:37 Adrien-iMac-24 Simon[19600]: NSUncaughtSystemExceptionException: Uncaught system exception: signal 11
May 20 07:18:37 Adrien-iMac-24 [0x0-0x2ca2ca].com.dejal.simon2[19600]: 1 0xffffffff
May 20 07:18:37 Adrien-iMac-24 [0x0-0x2ca2ca].com.dejal.simon2[19600]: 2 _CFStreamSignalEventSynch (in CoreFoundation) + 193
May 20 07:18:37 Adrien-iMac-24 [0x0-0x2ca2ca].com.dejal.simon2[19600]: 3 CFRunLoopRunSpecific (in CoreFoundation) + 3141
May 20 07:18:37 Adrien-iMac-24 [0x0-0x2ca2ca].com.dejal.simon2[19600]: 4 CFRunLoopRunInMode (in CoreFoundation) + 88
May 20 07:18:37 Adrien-iMac-24 [0x0-0x2ca2ca].com.dejal.simon2[19600]: 5 +[NSURLConnection(NSURLConnectionReallyInternal) _resourceLoadLoop:] (in Foundation) + 320
May 20 07:18:37 Adrien-iMac-24 [0x0-0x2ca2ca].com.dejal.simon2[19600]: 6 -[NSThread main] (in Foundation) + 45
May 20 07:18:37 Adrien-iMac-24 [0x0-0x2ca2ca].com.dejal.simon2[19600]: 7 __NSThread__main__ (in Foundation) + 308
May 20 07:18:37 Adrien-iMac-24 [0x0-0x2ca2ca].com.dejal.simon2[19600]: 1 CFRunLoopRunSpecific (in CoreFoundation) + 4355
May 20 07:18:37 Adrien-iMac-24 [0x0-0x2ca2ca].com.dejal.simon2[19600]: 2 CFRunLoopRunInMode (in CoreFoundation) + 88
May 20 07:18:37 Adrien-iMac-24 [0x0-0x2ca2ca].com.dejal.simon2[19600]: 3 +[NSURLConnection(NSURLConnectionReallyInternal) _resourceLoadLoop:] (in Foundation) + 320
May 20 07:18:37 Adrien-iMac-24 [0x0-0x2ca2ca].com.dejal.simon2[19600]: 4 -[NSThread main] (in Foundation) + 45
May 20 07:18:37 Adrien-iMac-24 [0x0-0x2ca2ca].com.dejal.simon2[19600]: 5 __NSThread__main__ (in Foundation) + 308

Apparently the lines don't show up in a very readable manner here in the forum - let me know if you want me to send them to you in a text file or something easier to work with.
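One way to make such a log easier to read is to strip the timestamp, host name and process tag from each line, leaving only the message. This is a rough sketch, assuming the lines follow the "Mon DD HH:MM:SS host tag: message" layout shown above; `strip_prefix` is a made-up name, and lines whose tag contains a space (such as the launchd ones) pass through unchanged.

```shell
# Hypothetical helper: drop the "Mon DD HH:MM:SS host tag: " prefix from
# each line read on standard input, keeping only the message text.
strip_prefix() {
    sed -E 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]{8} [^ ]+ [^ ]+: //'
}

# Usage, with the log lines saved to a text file first:
#   strip_prefix < console.txt
```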

...

I let Simon run for several hours on my PB G4 and still no regular time outs appeared as on my iMac.

Ulf Dunkel's picture

> May 19 20:26:11

> May 19 20:26:11 Adrien-iMac-24 org.apache.httpd[17273]: (2)No such file or directory: httpd: could not open error log file /private/var/log/apache2/error_log.
> May 19 20:26:11 Adrien-iMac-24 org.apache.httpd[17273]: Unable to open logs
> May 19 20:26:11 Adrien-iMac-24 com.apple.launchd[1] (org.apache.httpd[17273]): Exited with exit code: 1

By the way, Adrien: did you ever remove system files from your Leopard Mac? I am quite sure that /private/var/log/apache2/error_log is a file which should exist on every Mac which has been running for at least a few days.

You might want to check this in the Finder using "Go > Go to Folder": /private/var/log/apache2/

Does it exist at all?

Just my 2 cents.
--Ulf
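The same check can be done in Terminal instead of the Finder. A minimal sketch, where `check_dir` is a made-up helper name:

```shell
# Hypothetical helper: report whether the given path is an existing directory.
check_dir() {
    if [ -d "$1" ]; then
        echo "exists: $1"
    else
        echo "missing: $1"
    fi
}

check_dir /private/var/log/apache2
```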

Thanks for your cents! I

Thanks for your cents! I thought that it was weird that those lines kept showing up in the console on a regular basis.

The folder doesn't appear to exist. Is this something essential to OS X? Is there anything I can do to repair this?

Thanks again for your help!

David Sinclair's picture

Making progress

Thanks for the console log via the developer tools. That shows that the exception is in the OS-level URL connection code — no Simon code is involved at all. No wonder I couldn't find a cause.

I could probably catch the exception at the main run loop level, though I'm not sure that'd be a good idea, since it'd mask the problem.

Thanks also to Ulf — that was an excellent observation regarding the Apache log. Sounds like a symptom of a damaged Apache installation, which may relate to the problem.

At this point I'd recommend reinstalling the OS. That should hopefully solve the damage, and thus fix the underlying cause of the issue with Simon.

Ok, sound like drastic

Ok, sounds like drastic measures... But maybe it's worth it?

Forgive me if I sound like a newbie for asking this, but is there any smooth way of reinstalling the OS without having to reinstall all the applications?

David Sinclair's picture

Re: Ok, sound like drastic

I can't guarantee that reinstalling the OS would help, but it looks like it would.

You won't need to reinstall the apps. You can just do an update install, installing over the top of your existing OS. That should be enough. I believe that is the default installer mode, though check the installer options.

Well I reinstalled the OS

Well I reinstalled the OS and chose the "Archive and Install"-option.
However, the problems with Simon still occur... With the exact same lines appearing in the console!

This is getting frustrating...

Any more ideas?

I think I might be on to

I think I might be on to something! Since I installed the OS again I started doing some testing myself - to root out what might be causing this.

I turned off all the apps that I could think of that might be interfering with Simon and then paused all tests. Then I started resuming them one by one and the time-outs didn't seem to occur at all! I did this without any problems, with about 30 mins between each test, up until the 10th test - then all of a sudden the tests started timing out again.
I quit Simon and now paused all but 8 tests and then started resuming the rest as above. Shortly after resuming the 10th the time-outs started again! Thus it seems that the problem somehow has to do with the number of tests that you run simultaneously...

Does this make any sense to you? It's very strange since I've been having like 12-14 tests running since I first started using Simon, some 2-3 months ago.

I'll continue testing this and we'll see what happens...

David Sinclair's picture

Re: I think I might be on to

I'm sorry that reinstalling the OS didn't solve it. It's possible that something was left (e.g. a preference) that is behind it, but it's hard to tell.

Definitely very interesting that having 10 active tests triggers the misbehavior. You definitely should be able to have many more than that (depending on your license level).

Actually, if you had said 7 or 20, my immediate thought would be the license limit: if you exceed that, Simon sets tests to "waiting" and displays an alert about having too many active tests. But that doesn't fit with the exceptions you experienced, so obviously isn't the cause.

Please do continue testing it, and let me know what you find. Try increasing the interval between checks in Advanced Preferences, and make sure the tests aren't checking at the same time, to see if that helps.

I've been continuing the

I've been continuing the tests - however the results seem to be inconclusive... Sometimes the time-outs start at 7 or 8 tests, and sometimes at 10...

By the way, is there any way to shorten test intervals to less than 15 seconds? I now have "1 second" selected in some tests to get the quickest test rate (i.e. 15 seconds). Is there any way to get this down to 5 seconds?

David Sinclair's picture

Re: I've been continuing the

Continuing my previous (unlikely) thought: what kind of license do you have?

No, tests have a certain minimum timeframe, and doing them too often would likely cause problems anyway, bogging down the system.

Well I have a free license -

Well I have a free license - which I guess is no license at all... But this shouldn't be any problem, since it's been working fine for several months before the time-outs started...

I've continued the testing and Simon seems to be stable at around 7-8 tests. If I go above this number there is a high chance the time-outs will reoccur...

David Sinclair's picture

Re: Well I have a free license

If you've been using it for months without a license, you may be hitting against the license limitations. Doesn't sound quite like your symptoms, but might be related.

The trial period starts off allowing 20 active tests (like the Standard license), then reduces to 7 (like the Basic license) after the half-way warning. So you might be experiencing that.

Send me a screenshot of the Simon Monitor window when you next experience an issue.

Ok, but it seems strange

Ok, but it seems strange that I haven't had any warnings at all... At what time should the half-way warning show itself?

David Sinclair's picture

Re: Ok, but it seems strange

That is strange... you don't have a hacked/damaged copy of Simon by any chance?

Or a localized copy; maybe there's a localization issue?

Have a look at Simon > Licenses. What does it say at the top?

The reminders are semi-random, but you should have been reminded after 15 days of usage, and again at 30 days.

I do not have a hacked

I do not have a hacked version of Simon - there's really no point in having one, since it is possible to use the app for free anyway.

Damaged copy seems unlikely due to the fact that I have reinstalled Simon several times during these testing procedures and once even the OS.

Anyway - since I have found out that below 7-8 tests Simon doesn't time-out I will continue to use it with those limitations...

Thanks for your help - hopefully this problem will not occur for so many (if any?) more people!

David Sinclair's picture

Re: I do not have a hacked

Thanks for the reply.

I would still like to help you solve it, in case there is a bug that affects others.

You didn't mention localization — are you seeing English for Simon, or one of the other localizations?

I would like to see a screenshot of the Simon Monitor window when the timeout occurs, in case that provides any clues.

I am running Simon in

I am running Simon in English - or anyway this is the language that the app uses. However, my OS X is in Swedish - but since these problems didn't occur at all before this doesn't seem to be the root of the problem either...

It's "nice" to see someone else having the same problem! :) This way, maybe we'll get to the real problem...

jjspreij's picture

exactly the same problem here....

hi David,

first of all, Simon looks great! ;-)

I seem to have the exact same problem as described above:

- I run 10.5.7 in English (but didn't run Simon before 10.5.7 so no comparison there)

- I have a freshly downloaded version 2.5.1 and a Standard license
(registered to jj.paypal@demon.cx; bought it a long time ago, only started using it today)

- there are only 5 checks defined: 1x Incoming Mail (POP) and 4x Web (HTTP)

- checking the POP3 server with Simon works fine all the time

- checking a website with Simon, very simply, works fine initially (Check Duration between 0.1 and 0.7 secs), then starts timing out & failing after a short time (timeout is set at 30 seconds)

- all sites react fine in Safari, using the Visit Site button; but they don't load with the Preview button

- system.log shows these lines when searching for "dejal":
Jun 1 12:47:27 OctoPro [0x0-0x3fa3fa].com.dejal.simon2[96037]: sh: /usr/bin/atos: No such file or directory
Jun 1 13:47:09 OctoPro [0x0-0x3fa3fa].com.dejal.simon2[96037]: sh: /usr/bin/atos: No such file or directory
Jun 1 21:28:27 OctoPro [0x0-0x4a74a7].com.dejal.simon2[99501]: sh: /usr/bin/atos: No such file or directory

- I haven't got the Dev Tools installed; also haven't hacked anything in the system that could possibly be related...

- built-in OS X firewall is off, Little Snitch is not installed
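For anyone wanting to run the same search of system.log from Terminal, something along these lines should work. It is a sketch only: `find_dejal_lines` is a made-up name, and the `/var/log/system.log` path assumes the default Mac OS X logging setup.

```shell
# Hypothetical helper: print the lines of the given log file that mention
# "dejal", ignoring case.
find_dejal_lines() {
    grep -i 'dejal' "$1"
}

# Usage (path assumes the default system log location):
#   find_dejal_lines /var/log/system.log
```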

grtz, jj

David Sinclair's picture

Re: exactly the same problem here....

That's concerning. I have heard some mutterings of issues with URL connection bugs introduced in Mac OS X 10.5.7, so maybe you are both victims of that. Strange that it only affects a few people, though.

Please try running this special build, which will output more information to the Console log, and let me know what happens:

Simon 2.5.2e2

jjspreij's picture

Thanks David, running that

Thanks David, running that build now!

However, the problem went away before that, after restarting the machine (recent QT software update) and some fiddling with the Check-settings...

It's been running fine for hours now ;-)

Will let you know if the problem comes back!

--jj