Problems with NFS on an AIX Reboot? Then Go Single
A few weeks ago, I was called in to review an issue where an AIX admin was cloning an AIX image onto a new LPAR. He said the reboot was hanging during NFS startup, trying to contact the NFS server to mount the remote file systems. I told him to wait another five minutes before I came along to investigate. He replied it had been like that for more than 30 minutes! Even a keyboard interrupt didn’t cancel the NFS mount process. My first step was to disable the NFS startup, which seemed the most probable fix, even though the NFS server was up and exporting the correct file systems. (To see what file systems a server exports, use the command: showmount -e.)
The administrator wanted to find his AIX diagnostic DVD to boot from, but I stopped him. He only needed to boot AIX into single user mode. In this mode no network services are started, so you can investigate network-related issues.
Going Single User
Booting into single user is pretty simple. It goes like this:
- Boot the LPAR into SMS, select your normal boot disk to boot off, but instead of booting into Normal mode, select Service boot mode. See Figure 1.
- An informational message screen is then presented, before the diagnostic menu. Here select Single User Mode. See Figure 2.
- Once selected, the system goes into single user and you’ll be prompted for the root password before entry into single user is allowed. See Figure 3.
Once in single user mode, you can attack the problem. If you’re cloning from an old image, be sure you know the root password that was set when the clone image was taken. If not, you’ll be sitting at the password prompt for a long time.
Sorting Out NFS
I had a good guess what the issue was. The NFS mounts were probably set up as hard mounts with no intr attribute set. When doing NFS mounts, you can do either a soft or a hard mount. A soft mount keeps trying until the timeo (timeout period) is reached. Then it gives up, generally after 15 seconds unless otherwise specified. A hard mount keeps trying forever, waiting for a response from the NFS server. If the NFS server is down, you are out of luck and the client is going to hang around for a long time! The only way to cancel the mount is to set the intr attribute, which lets you cancel the mount in progress with an interrupt from the keyboard. On this occasion, I decided to comment out anything related to NFS. I just needed to get the LPAR up on the network.
So I edited the /etc/inittab file to comment out the NFS startup, which is the entry that runs rc.nfs. Remember that a comment in inittab is a colon (:) and not a hash (#). In /etc/rc.nfs, I put an ‘exit 0’ at the top of the file. At this point, I was quite satisfied that NFS wouldn’t start. I could have just run rmnfs -B, which would have more or less accomplished the same task. I also checked /etc/filesystems for any NFS mount entries that needed commenting out; there were none. When I exited single user mode, the LPAR came up within a minute.
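The inittab edit described above can be sketched as a small shell function. This is only an illustration: comment_inittab_entry is a made-up helper name, and rcnfs is assumed to be the identifier of the NFS entry (check with lsitab rcnfs on your own system). It keeps a backup copy and prefixes the matching line with a colon.

```shell
#!/bin/sh
# Sketch: comment out one entry in an inittab-style file by prefixing it
# with ':'. comment_inittab_entry is a hypothetical helper, not an AIX command.
comment_inittab_entry() {
    file=$1     # path to the inittab file (or a copy of it)
    ident=$2    # identifier in the first field; rcnfs is assumed for NFS

    # Keep a backup so the change is easy to reverse.
    cp "$file" "$file.bak" || return 1

    # Lines that are already commented start with ':' and do not match.
    sed "s/^${ident}:/:&/" "$file.bak" > "$file"
}

# Example (run as root, ideally against a copy first):
#   comment_inittab_entry /etc/inittab rcnfs
```

To undo the change, copy the .bak file back over the edited file.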
When the Unexpected Happens
So what can we learn to avoid NFS issues when rebooting a client or doing a clone? Either ensure the NFS mount is a soft mount, or set the intr attribute if it’s a hard mount.
For example, the following command creates a read-only soft mount of the file system, where the remote host and directory are uk01wrs6040:/opt/software_nfs. On the local host, the directory /opt/software_nfs has already been created as the mount point. The following mounts the NFS share:
mount -o ro,soft uk01wrs6040:/opt/software_nfs /opt/software_nfs
To do a hard mount, specify the intr option, so we can interrupt the mount if the remote server uk01wrs6040 is not responding. (Note it’s also read-only.)
mount -o ro,hard,intr uk01wrs6040:/opt/software_nfs /opt/software_nfs
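The rule of thumb here (soft, or hard with intr) can be checked mechanically before a mount goes into a boot script. The following is a minimal sketch; boot_safe_opts is a hypothetical helper name, and it only inspects the option string, not the mount itself.

```shell
#!/bin/sh
# Sketch: succeed when a comma-separated NFS option list is either a soft
# mount or carries intr. boot_safe_opts is a hypothetical helper.
boot_safe_opts() {
    # Wrap the list in commas so each option matches as a whole word.
    case ",$1," in
        *,soft,*) return 0 ;;   # soft mount: gives up after the timeout
        *,intr,*) return 0 ;;   # interruptible from the keyboard
        *)        return 1 ;;   # hard mount with no intr: can hang a boot
    esac
}

# Example:
#   boot_safe_opts "ro,hard,intr" && echo "ok to use at boot"
```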
Don’t Make It Hard
If you’re using SMIT for your NFS mounts, you can try setting the “mount automatically at system restart” option to no within SMIT and then manually mount NFS after the system is up. You could also set it to mount in the background (option: bg), but I like to see what’s going on when I bring up an AIX machine, so all my mounts are in the foreground (option: fg). That’s just me.
As a rule, I don’t bother with SMIT to create predefined NFS mounts. Instead, I create them explicitly in an rc.local script file that I’ve set up myself, which is the last entry to be executed from /etc/inittab. These NFS mounts are hard mounts but have the intr option set. So if the machine hangs during bootup with NFS issues, I just send an interrupt from the keyboard to break out of the NFS mounting. The machine then carries on with the normal boot process. No point in making it hard for yourself.
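As a rough illustration of that approach, the NFS part of an rc.local-style script might look like the sketch below. The function name mount_nfs_shares and the MOUNT_CMD variable are assumptions made for this example (MOUNT_CMD exists only so the logic can be tried with echo instead of a real mount); the share is the one used earlier in this article.

```shell
#!/bin/sh
# Sketch of the NFS section of an rc.local-style script.
# mount_nfs_shares and MOUNT_CMD are illustrative names, not AIX features.
MOUNT_CMD=${MOUNT_CMD:-mount}

mount_nfs_shares() {
    # Reads lines of the form: remote_host:/remote/dir local_dir
    while read -r remote local_dir; do
        [ -z "$remote" ] && continue
        echo "Mounting $remote on $local_dir"
        # Foreground hard mount with intr, so a hung mount can be interrupted.
        $MOUNT_CMD -o ro,hard,intr "$remote" "$local_dir" ||
            echo "WARNING: mount of $remote failed" >&2
    done
}

# Example call from the script:
#   echo "uk01wrs6040:/opt/software_nfs /opt/software_nfs" | mount_nfs_shares
```

Setting MOUNT_CMD=echo before the call prints the mount commands instead of running them, which is a cheap way to check the list before a reboot.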