Simon
Flexible server monitoring

Mount plug-in issues

We're seeing issues with long-term use of the new mount plug-in. We currently have it testing at least 30 servers.

Something goes wrong at the point where it should unmount shares; it seems to be failing to do so. We come in after the weekend to find dozens of shares mounted on the desktop, and dozens of shares and folders with share names in the Finder's sidebar. Examining the /Volumes directory shows multiple leftover folders where the shares would have been grafted. The shares refuse to unmount until Simon is quit, and we then have to delete the stray folders from /Volumes manually. For best results I also reboot the system that Simon runs on. This can cause a chain reaction where Simon reports phantom failures.
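In case it helps anyone trying to recreate this, here is a rough sketch of how the leftover folders could be spotted without clicking through the Finder. It lists plain directories under /Volumes that are not currently mount points. The function name and the injectable `is_mount` parameter are my own for illustration and have nothing to do with Simon itself; it also assumes that a folder in /Volumes which is not a mount point is a stray.

```python
import os


def find_stray_mount_dirs(volumes_dir="/Volumes", is_mount=os.path.ismount):
    """Return names of plain directories under volumes_dir that are not mount points.

    is_mount is injectable (hypothetical parameter) so the check can be
    exercised without real network shares.
    """
    strays = []
    for name in sorted(os.listdir(volumes_dir)):
        path = os.path.join(volumes_dir, name)
        # The boot volume usually appears in /Volumes as a symlink; skip
        # symlinks and anything that is not a directory.
        if os.path.islink(path) or not os.path.isdir(path):
            continue
        # A directory that is not a mount point is assumed to be a leftover.
        if not is_mount(path):
            strays.append(name)
    return strays
```

Calling `find_stray_mount_dirs()` with no arguments checks /Volumes and returns the candidate folder names. It only reports them; nothing is deleted.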

Anyone else seeing this or able to recreate this?

I've attached a screenshot taken earlier. It is nowhere near as bad as things were this morning.

Attachment          Size
OddSimonIssue.png   66.5 KB
CPU Usage.jpg       39.05 KB

Hmmm?

Yikes! I'm sorry to hear that. I'm looking through the code to see if I can find what might cause such an issue. Out of curiosity, does Simon have any entries in Console.log that correspond to this?

Daniel Ellis

Does this look relevant?

I've been trying to find logged messages from around the times of the issue. The following appears in the system log from around the right time:

Sep 3 01:40:01 lithium kernel[0]: smbfs_smb_qfsattr: (fyi) share 'NTFS', attr 0xb, maxfilename 255
Sep 3 01:40:01 lithium kernel[0]: smbfs_aclsflunksniff: (fyi) user sid S-1-5-21-3380867091-2989415938-698606940-3074 didnt map

This message may have occurred after the problem was already present.

Also found things like this:

2007-08-29 10:07:03.754 Simon[22040] Could not unmount volume H21AFP because it contains open files or is busy.
mount_smbfs: spnego blob2principal error 1
mount_smbfs: spnego blob2principal error 1
mount_smbfs: spnego blob2principal error 1
mount_smbfs: spnego blob2principal error 1
mount_smbfs: spnego blob2principal error 1
mount_smbfs: spnego blob2principal error 1
mount_smbfs: spnego blob2principal error 1
mount_smbfs: spnego blob2principal error 1

and things like this:

2007-08-21 13:51:18.660 Simon[22040] Unmounting H03AFP failed. Result = -35
mount_smbfs: spnego blob2principal error 1
mount_smbfs: spnego blob2principal error 1
mount_smbfs: session setup phase failed: syserr = Permission denied

and:

mount_smbfs: negotiate phase failed: syserr = Operation timed out

There were some references to a Simon crash, but I don't see a Simon crash log.

Most of our AFP servers have been set to Standard authentication to avoid Kerberos issues. Ideally the SMB functionality in Simon should avoid Kerberos too, as each of our servers currently has its own Kerberos realm.
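To see which shares fail to unmount most often, the Simon lines in Console.log could be tallied with something like the sketch below. The two patterns are copied from the messages quoted above; the function name is made up for illustration, and I'm assuming every Simon unmount failure is logged in one of those two forms.

```python
import re
from collections import Counter

# Patterns based on the two Simon messages quoted in this thread.
BUSY_RE = re.compile(r"Could not unmount volume (.+?) because")
FAILED_RE = re.compile(r"Unmounting (.+?) failed\. Result = (-?\d+)")


def tally_unmount_failures(lines):
    """Count unmount failures per (volume, reason).

    reason is the string "busy" or the integer result code from the log line.
    """
    counts = Counter()
    for line in lines:
        match = BUSY_RE.search(line)
        if match:
            counts[(match.group(1), "busy")] += 1
            continue
        match = FAILED_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Feeding it the lines of the log file (for example via `open(path)`) gives a count per share and failure type, which should show whether the same few servers are always involved.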

Increased CPU activity

See the new attachment. It shows the CPU usage of the system running Simon over the last week. Notice that within the last 24 hours Simon has been running almost constantly at over 60%. Simon does not appear to be doing anything out of the ordinary, and it maintains this rate of CPU usage even when not performing a test.

I'm going to reboot to see if this clears it... however the server was rebooted yesterday.

CPU Activity

I noticed your attachment this AM. Is it possible that this period of increased CPU load corresponds to a period of time where Simon was having trouble with your shares?

When the OS tries to talk to a share that it can no longer reach, it puts a fair amount of load on the system thrashing about. That's why I ask. Or are you seeing this increased CPU activity apart from when the Mount service is giving you trouble?

Everything should have been back to normal again

There did not appear to be any issues at the time of the increased CPU utilisation. In Activity Monitor we could see it was Simon that was using 60%+. Rebooting the server fixed that one. We could leave it as a red herring, but it is important to find out whether the mount plug-in has an issue or whether it is something at our end.

Well, those messages from Console.log...

Well, those messages from Console.log are curious. They indicate that shares aren't being unmounted either because the volume is reported as busy or having open files, or because of the -35 status, which relates to the volume not being found. All this is happening at the system level, using the OS's support for mounting remote volumes. What I can do, if you're willing, is build a version of the plugin that will give us a little more debug info to see if we can find the root cause.

If that sounds like something you're willing to try, let me know. You can find my contact info here (if not, just shoot David an email, and he'll get in touch with me):

www.dejal.com/about/daniel/contact

In the meantime, I'll keep looking.
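For reference while reading the log, here is a small lookup for the -35 result mentioned above and a few neighbouring classic Mac OS result codes. The names and numbers are the standard ones from the system's error headers; the helper itself is just an illustrative sketch, not part of Simon, and it assumes the "Result" value in the log is one of these OS result codes.

```python
# A few classic Mac OS result codes; the table is deliberately partial.
MAC_OS_ERRORS = {
    -35: ("nsvErr", "no such volume"),
    -36: ("ioErr", "I/O error"),
    -43: ("fnfErr", "file not found"),
    -47: ("fBsyErr", "file is busy"),
}


def describe_result(code):
    """Return a readable description of a result code from the log."""
    name, meaning = MAC_OS_ERRORS.get(code, ("unknown", "not in this table"))
    return f"{code} ({name}): {meaning}"
```

For example, `describe_result(-35)` gives the "no such volume" meaning discussed above.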

Any further misbehavior?

Just curious whether things are still misbehaving. If so, is there any additional info?

Daniel