Recovering an AIX system that hangs on boot (error code 554).
If an AIX system does not boot properly, possible causes include:
- corrupted file system
- corrupted Journaled File System (JFS) log device
- bad IPL-device record or bad IPL-device magic number; the magic number indicates the device type
- corrupted copy of the Object Data Manager (ODM) database on the boot logical volume
- fixed disk (hard disk) in the inactive state in the root volume group
- bad zoning or mapping (NPIV, VSCSI, FC switches, SAN storage)
Check this post http://wp.me/p5bweg-8i to see why AIX is not booting correctly.
Follow these steps to recover the system.
1. Boot AIX in maintenance mode (from DVD, mksysb, NIM, or tape).
NOTE: The bootable media must be at the same version and level as the system.
- Choose Start Maintenance Mode for System Recovery (Option 3). The next screen displays prompts for the Maintenance menu.
    Welcome to Base Operating System
    Installation and Maintenance

    Type the number of your choice and press Enter. Choice is indicated by >>>.

        1 Start Install Now with Default Settings
        2 Change/Show Installation Settings and Install
    >>> 3 Start Maintenance Mode for System Recovery
        4 Make Additional Disks Available
        5 Select Storage Adapters
- Choose Access a Root Volume Group (Option 1).
    Maintenance

    Type the number of your choice and press Enter.

    >>> 1 Access a Root Volume Group
        2 Copy a System Dump to Removable Media
        3 Access Advanced Maintenance Functions
        4 Erase Disks
        5 Configure Network Disks (iSCSI)
        6 Select Storage Adapters
        7 Install from a System Backup
- The next screen displays a warning that indicates you will not be able to return to the Base OS menu without rebooting.
Choose 0 to continue.

    Warning:

    If you choose to access a root volume group, you will not be able to return
    to the Base Operating System Installation menus without rebooting.

    Type the number of your choice and press Enter.

        0 Continue

        88 Help ?
    >>> 99 Previous Menu

    >>> Choice [99]: 0
- The next screen displays information about all volume groups on the system.
    Access a Root Volume Group

    Type the number for a volume group to display the logical volume
    information and press Enter.

       1) Volume Group 0000e4720000d9000000011d6c3294dc contains these disks:
            hdisk0  24576   C3-T1-01 500507680b2122b4//0000000000000000 001e1f00
       2) Volume Group 0000e4720000d9000000011d72e7acf8 contains these disks:
            hdisk1  130048  C3-T1-01 500507680b2122b4//0001000000000000 001e1f00
       3) Volume Group 0000e4720000d9000000011d6c3296c9 contains these disks:
            hdisk2  33792   C3-T1-01 500507680b2122b4//0002000000000000 001e1f00
       4) Volume Group 0000e4720000d9000000011d6c329781 contains these disks:
            hdisk3  132096  C3-T1-01 500507680b2122b4//0003000000000000 001e1f00
       5) Volume Group 000fd1fb0000d4000000015385553deb contains these disks:
            hdisk4  74752   C3-T1-01 500507680b2122b4//0004000000000000 001e1f00

    Choice: 5
- Select the root volume group by number. The logical volumes in rootvg will be displayed with two options below.
    Volume Group Information

    ------------------------------------------------------------------------------
    Volume Group ID 000fd1fb0000d4000000015385553deb includes the following
    logical volumes:

    hd5 hd6 hd8 log_SCCC sfs_SSS hd4 soft_lv backups_lv hd1 hd10opt hd3
    tools hd2 hd9var varsyslog fslv01 nmon_lv audit_lv cores_lv hd11admin
    logbbb
    ------------------------------------------------------------------------------
- Choose Access this volume group and start a shell before mounting the file systems (Option 2).
    Type the number of your choice and press Enter

        1) Access this Volume Group and start a shell
        2) Access this Volume Group and start a shell before mounting filesystems

        99) Previous Menu

        Choice [99]: 2
2. Run fsck to repair the file systems (do not use the -y option).
    # fsck -p /dev/hd4
    # fsck -p /dev/hd2
    # fsck -p /dev/hd9var
    # fsck -p /dev/hd3
    # fsck -p /dev/hd1
- If fsck indicates that block XX could not be read, the file system is probably unrecoverable. Stop this procedure and recover the system from a backup.
- If fsck indicates that a file system has an unknown log record type, a corruption of the JFS log logical volume has been detected. Use the logform command to reformat it.
    # /usr/sbin/logform /dev/hd8
- If the file system checks were successful, continue with the procedure.
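The five fsck calls in this step can also be run as a loop that reports which file systems failed. This is only a minimal sketch: the function name check_rootvg and the FSCK override variable are hypothetical, and it assumes fsck exits with a non-zero status when it finds a problem it cannot repair.

```shell
# Hypothetical helper: run "fsck -p" (never -y) on each rootvg file system
# in the same order as the commands above.
# FSCK is an assumption of this sketch; it lets the loop be dry-run with a stub.
FSCK=${FSCK:-fsck}

check_rootvg() {
    failed=""
    for lv in hd4 hd2 hd9var hd3 hd1; do
        echo "checking /dev/$lv"
        # Remember every logical volume whose check did not succeed.
        $FSCK -p "/dev/$lv" || failed="$failed $lv"
    done
    if [ -n "$failed" ]; then
        echo "fsck reported problems on:$failed"
        return 1
    fi
    return 0
}
```

Calling check_rootvg from the maintenance shell runs the checks; a non-zero return means at least one file system needs the follow-up described in the bullets above.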
3. Reboot the system.
    # exit
    # sync;sync;sync;reboot
4. If AIX fails to boot again, check the ODM.
If AIX still does not boot, the ODM may be corrupt. The following steps overwrite your Object Data Manager (ODM) database files, so proceed with care: you will lose important information such as network configuration, device definitions, and imported volume groups.
    # mount /dev/hd4 /mnt
    # mount /dev/hd2 /mnt/usr
    # mkdir /mnt/etc/objrepos/bak
    # cp /mnt/etc/objrepos/Cu* /mnt/etc/objrepos/bak
    # cp /etc/objrepos/Cu* /mnt/etc/objrepos
    # umount /dev/hd2
    # umount /dev/hd4
    # exit
- Determine which disk is the boot disk with the lslv command. The boot disk will be shown in the PV1 column of the lslv output.
    # lslv -m hd5
    hd5:N/A
    LP    PP1  PV1               PP2  PV2               PP3  PV3
    0001  0102 hdisk4
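If you prefer not to read the PV1 column by eye, the disk name can be pulled out of the lslv output with a few lines of shell. This is a sketch under the assumption that the output has the layout shown above (a header row followed by rows starting with the LP number); the function name boot_disk_from_lslv is hypothetical.

```shell
# Hypothetical helper: read "lslv -m hd5" output on stdin and print the
# PV1 column of the first logical-partition row.
# Intended use on the system being recovered:
#   lslv -m hd5 | boot_disk_from_lslv
boot_disk_from_lslv() {
    while read lp pp1 pv1 rest; do
        case $lp in
            # Data rows start with the LP number (e.g. 0001); the
            # "hd5:N/A" and "LP PP1 PV1 ..." header rows do not.
            [0-9]*) echo "$pv1"; return 0 ;;
        esac
    done
    return 1
}
```

With the sample output above the function prints hdisk4, which is the name to use in the savebase and bosboot commands that follow.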
5. Save the clean ODM database to the boot logical volume. (# is the number of the fixed disk, determined with the previous command.)
    # savebase -d /dev/hdisk#
6. Recreate the boot image (hdisk4 in our case):
    # bosboot -a -d /dev/hdisk4
    trustchk: /usr/sbin/cfgmgr: Verification of attributes failed: mode
    trustchk: /usr/sbin/ifconfig: Verification of attributes failed: accessauths innateprivs secflags
    trustchk: /usr/sbin/chdev: Verification of attributes failed: mode
    trustchk: /usr/sbin/mknod: Verification of attributes failed: mode
    trustchk: /usr/sbin/route: Verification of attributes failed: mode
    trustchk: /usr/sbin/mount: Verification of attributes failed: mode
    trustchk: /usr/sbin/ipl_varyon: Verification of attributes failed: mode

    bosboot: Boot image is 51228 512 byte blocks.
7. Make sure the bootlist is set correctly:
    # bootlist -m normal -o
    hdisk4 blv=hd5 pathid=0
    hdisk4 blv=hd5 pathid=1
    hdisk4 blv=hd5 pathid=4
    hdisk4 blv=hd5 pathid=5
    hdisk4 blv=hd5 pathid=2
8. Make changes, if necessary:
    # bootlist -m normal hdiskX cdX
9. Make sure that the disk drive that you have chosen as your bootable device has a yes next to it:
    # ipl_varyon -i
    [S 2359402 2490530 01/23/17-13:23:32:132 ipl_varyon.c 1312] ipl_varyon -i

    PVNAME   BOOT DEVICE   PVID                               VOLUME GROUP ID
    hdisk0   NO            000fd1eba39e6b120000000000000000   0000e4720000d900
    hdisk1   NO            000fd1eba3a3de1c0000000000000000   0000e4720000d900
    hdisk2   NO            000fd1eba3ba7f480000000000000000   0000e4720000d900
    hdisk3   NO            000fd1eba3c6b7860000000000000000   0000e4720000d900
    hdisk4   YES           000fd1fb7eaa2cf30000000000000000   000fd1fb0000d400
    [E 2359402 0:334 ipl_varyon.c 1453] ipl_varyon: exited with rc=0
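On a system with many disks, the rows flagged YES can be filtered out of the ipl_varyon listing instead of scanning it manually. The sketch below assumes the column layout shown above (disk name first, YES or NO second); the function name boot_devices is hypothetical.

```shell
# Hypothetical helper: read "ipl_varyon -i" output on stdin and print the
# name of every hdisk whose BOOT DEVICE column is YES.
# Intended use on the system being recovered:
#   ipl_varyon -i | boot_devices
boot_devices() {
    while read pv boot rest; do
        case $pv in
            # Only hdisk rows are considered; the header row and the
            # bracketed [S ...] / [E ...] trace lines are skipped.
            hdisk*)
                if [ "$boot" = "YES" ]; then
                    echo "$pv"
                fi
                ;;
        esac
    done
}
```

For the listing above this prints hdisk4, which should match the disk you passed to bosboot and bootlist.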
10. Reboot the system again.
    # sync;sync;sync;reboot