AIX Resolving "missing" or "removed" disks in AIX LVM
Technote (troubleshooting)
Problem(Abstract)
How do I get back a disk that is marked "missing" or "removed"?
What does this mean?
Symptom
AIX LVM marks a disk as "missing" when it cannot confirm that the disk belongs to the volume group it is assigned to. The steps below show how to diagnose the problem and describe possible ways to return the disk to an active state.
One main symptom is that the lsvg command shows the disk in a "missing" state:
# lsvg -p datavg
datavg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk5            active            646         486         130..00..98..129..129
hdisk6            missing           646         6           00..00..00..00..06
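As a minimal sketch, this check can be scripted. The helper name non_active_pvs is hypothetical, and the parsing assumes the lsvg -p layout shown above: a volume group line, a header line, then one row per disk with the state in the second column.

```shell
# Hypothetical helper: reads `lsvg -p VGNAME` output on stdin and prints
# the name and state of every physical volume that is not "active".
# Assumes the two leading lines are the VG name and the column header.
non_active_pvs() {
    awk 'NR > 2 && NF >= 2 && $2 != "active" { print $1, $2 }'
}

# Typical use on an AIX host (not run here):
#   lsvg -p datavg | non_active_pvs
```

An empty result means every physical volume in the volume group is reported as active.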
Diagnosing the problem
Start by gathering the following data.
All of these steps can be performed by a non-root user.
* Are there any disk errors for this physical volume in the system error report?
$ errpt | more
or, for more detailed error information:
$ errpt -a | more
* Is the disk marked in an "Available" state in lsdev output?
$ lsdev -Cc disk -l hdiskX
* Does the disk show up in lspv?
$ lspv
* Is the disk in an "active" state in lsvg?
$ lsvg -p VGNAME
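The device-state check in the list above can be sketched as a small filter. The helper name lsdev_state is hypothetical, and it assumes the usual lsdev layout in which the first column is the device name and the second is its state.

```shell
# Hypothetical helper: reads `lsdev -Cc disk` output on stdin and prints the
# state (second column, e.g. Available or Defined) of the named disk.
# Prints "not-found" if the disk does not appear in the listing at all.
lsdev_state() {
    awk -v disk="$1" '
        $1 == disk { print $2; found = 1 }
        END { if (!found) print "not-found" }'
}

# Typical use on an AIX host (not run here):
#   lsdev -Cc disk | lsdev_state hdisk6
```

Any result other than Available suggests the problem lies below LVM, at the device or path level.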
Resolving the problem
Try to read the PVID directly off the drive. This technique uses a lower-level command that bypasses the ODM and prints information recorded on the disk itself. You must be root (or have used su to become root) to run this command and many of those that follow.
# lquerypv -h /dev/hdiskX 80 10
00000080 00050A85 A9B17061 00000000 00000000 |......pa........|
In the output above, the first column is the offset and the PVID is the second and third columns combined (here 00050A85A9B17061).
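A minimal sketch of extracting and comparing the two PVIDs follows. The helper names pvid_from_lquerypv and pvid_from_lspv are hypothetical, and the sketch assumes that lspv prints the PVID as lowercase hex in its second column.

```shell
# Hypothetical helper: reads `lquerypv -h /dev/hdiskX 80 10` output on stdin
# and prints the PVID (columns 2 and 3 joined, lowercased to match lspv).
pvid_from_lquerypv() {
    awk '$1 == "00000080" { print tolower($2 $3); exit }'
}

# Hypothetical helper: reads `lspv` output on stdin and prints the PVID
# (second column) recorded for the named disk.
pvid_from_lspv() {
    awk -v disk="$1" '$1 == disk { print $2; exit }'
}

# Typical use on an AIX host, as root (not run here):
#   on_disk=$(lquerypv -h /dev/hdisk6 80 10 | pvid_from_lquerypv)
#   in_odm=$(lspv | pvid_from_lspv hdisk6)
#   [ "$on_disk" = "$in_odm" ] && echo "PVIDs match"
```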
Does the PVID returned match the output from lspv?
1. If yes, there may have been a temporary problem accessing the disk.
1a. Try running:
# varyonvg VGNAME
which should force the volume group to go out and probe all physical volumes belonging to it.
1b. If varyonvg does not find the disk, it may be necessary to force the physical volume into an active state so that LVM will check it. To do this use:
# chpv -va hdiskX
You may need to run varyonvg again after this.
2. If a PVID is returned but does not match what lspv or the ODM show, check whether it exists in the VGDA.
The PVIDs in the VGDA are most easily viewed using lqueryvg:
# lqueryvg -Ptp hdiskX
( -P will list the PVIDs only )
If the PVID on the drive is in the VGDA, but not in the ODM, the ODM can be updated by forcing a re-read of the PVID from the drive using the chdev command.
Do NOT run this command unless you have verified that there is a PVID on the drive AND that PVID is in the VGDA also on the drive.
# chdev -a pv=yes -l hdiskX
After this, check that the physical volume shows up with no errors:
# lspv
# lsvg -p VGNAME
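The precondition for the chdev command in step 2 (the PVID read from the drive must also appear in the VGDA) can be sketched as a filter. The helper name pvid_in_vgda is hypothetical. Because the exact column layout of lqueryvg output is not shown here, the sketch only assumes that each PVID appears as a whitespace-separated field, possibly padded with trailing zeroes.

```shell
# Hypothetical helper: reads `lqueryvg -Ptp hdiskX` output on stdin and
# succeeds (exit 0) if any field starts with the given PVID.
# Matching on a prefix is an assumption, made so the check works whether the
# VGDA listing shows the PVID alone or padded with trailing zeroes.
pvid_in_vgda() {
    [ -n "$1" ] || return 1          # refuse an empty PVID
    awk -v pvid="$1" '
        BEGIN { pvid = tolower(pvid) }
        {
            for (i = 1; i <= NF; i++)
                if (index(tolower($i), pvid) == 1)
                    found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Typical use on an AIX host, as root (not run here):
#   if lqueryvg -Ptp hdisk6 | pvid_in_vgda 00050a85a9b17061; then
#       echo "PVID is in the VGDA; the chdev step may be considered"
#   fi
```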
3. If the PVID on disk does not exist in the VGDA, a new PVID can be written to the drive and ODM, and the VGDA updated with that new PVID.
NOTE: You should be suspicious that this may not be the proper disk for this volume group. For example if a LUN was unmapped and then a different one remapped accidentally, that LUN may belong to a completely different volume group!
You can view the VGDA using lqueryvg:
# lqueryvg -Atp hdiskX
Run this against the disk in question and against a known good disk in the volume group. Compare PVIDs, logical volume names, etc. to ensure the disk really belongs to the same volume group. If it does not, do not proceed with the steps below.
The volume group will be removed and re-imported using recreatevg.
3a. First get a list of all disks that are part of this volume group
# lsvg -p VGNAME
3b. The next steps will require that all logical volumes in the volume group be closed, so unmount any filesystems and stop any applications that are using raw logical volumes.
3c. Now remove the volume group:
# varyoffvg VGNAME
# exportvg VGNAME
3d. And bring it back in using recreatevg. In this instance we DO NOT want recreatevg to add the default prefixes onto the logical volume names and filesystem mount points, so we add flags to prevent that.
When using recreatevg in this manner, it is IMPORTANT to list ALL disks belonging to this volume group. Unlike importvg, recreatevg needs a complete list of physical volumes in order to completely import the volume group and all logical volumes. The exception to this is when "-f" is used.
# recreatevg -L / -Y NA -y VGNAME hdiskX hdiskY hdiskZ
This will write new PVIDs to all drives listed on the command-line and update the VGDA with those PVIDs. It will also import and vary on the volume group.
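Because step 3d depends on a complete disk list, one way to avoid omissions is to build the command line from the lsvg -p output captured in step 3a. The sketch below only prints the command and does not run it. The helper names pv_list and print_recreatevg are hypothetical, and the parsing assumes the lsvg -p layout shown earlier.

```shell
# Hypothetical helper: reads `lsvg -p VGNAME` output on stdin and prints all
# physical volume names on one line, skipping the VG name and header lines.
pv_list() {
    awk 'NR > 2 && NF > 0 { printf "%s%s", sep, $1; sep = " " }
         END { print "" }'
}

# Hypothetical helper: prints (does not run) the recreatevg command from
# step 3d for the given volume group name and disk list.
print_recreatevg() {
    vg=$1
    shift
    echo "recreatevg -L / -Y NA -y $vg $*"
}

# Typical use on an AIX host (not run here):
#   disks=$(lsvg -p datavg | pv_list)    # capture BEFORE exportvg
#   print_recreatevg datavg $disks
```

The disk list has to be captured before exportvg, since lsvg no longer knows the volume group once it has been exported.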
4. If no PVID is returned at all, or the lqueryvg command hangs, then there is a disk problem. No LVM command will fix this issue. Contact the team that supports the type of disk in use and have them resolve the problem.
Even a brand-new drive, or one completely clean of any LVM information, should return either a PVID or all zeroes:
# lspv | grep hdisk7
hdisk7 none None
# lquerypv -h /dev/hdisk7 80 10
00000080 00000000 00000000 00000000 00000000 |................|
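The three possible outcomes of the lquerypv read (a PVID, all zeroes, or nothing at all) can be told apart with a small filter. The helper name classify_pvid is hypothetical. A command that hangs never reaches this filter, so that case still has to be noticed by the operator.

```shell
# Hypothetical helper: reads `lquerypv -h /dev/hdiskX 80 10` output on stdin
# and prints one of:
#   pvid <id>    a PVID is recorded on the disk
#   none         the PVID area is all zeroes (new or clean disk)
#   unreadable   no line for offset 0x80 was returned (case 4 above)
classify_pvid() {
    awk '
        $1 == "00000080" {
            id = tolower($2 $3)
            if (id ~ /^0+$/)
                print "none"
            else
                print "pvid", id
            seen = 1
            exit
        }
        END { if (!seen) print "unreadable" }'
}

# Typical use on an AIX host, as root (not run here):
#   lquerypv -h /dev/hdisk7 80 10 | classify_pvid
```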
Examples of errors that may accompany this issue.
This list is not complete, but may help you to identify the source of the problem. Many of these errors have been seen in conjunction with a disk marked as "missing".
LVM Errors you may see in errpt:
LABEL: LVM_SA_STALEPP
IDENTIFIER: EAA3D429
Description: PHYSICAL PARTITION MARKED STALE
LABEL: LVM_SA_PVMISS
IDENTIFIER: F7DDA124
Description: PHYSICAL VOLUME DECLARED MISSING
LABEL: LVM_SA_WRTERR
IDENTIFIER: 52715FA5
Description: FAILED TO WRITE VOLUME GROUP STATUS AREA
LABEL: LVM_IO_FAIL
IDENTIFIER: E86653C3
Description: I/O ERROR DETECTED BY LVM
LABEL: LVM_QUORUMNOQUORUM
IDENTIFIER: 5BEAD71B
Description: Activation of a no quorum volume group without 100% of the disks.
LABEL: LVM_MISSPVADDED
IDENTIFIER: 26120107
Description: PHYSICAL VOLUME DEFINED AS MISSING
Filesystem errors you may see in errpt:
LABEL: J2_METADATA_EIO
IDENTIFIER: 78ABDDEB
Description: META-DATA I/O ERROR
LABEL: J2_FSCK_REQUIRED
IDENTIFIER: B6DB68E0
Description: FILE SYSTEM RECOVERY REQUIRED
LABEL: J2_LOG_EIO
IDENTIFIER: C1348779
Description: LOG I/O ERROR
Disk errors that may be seen during this problem:
LABEL: SC_DISK_ERR1
IDENTIFIER: 747725D9
LABEL: SC_DISK_ERR2
IDENTIFIER: B6267342
LABEL: SC_DISK_ERR7
IDENTIFIER: DE3B8540
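To scan an error report for the entries listed above, a filter along the following lines may help. The helper name known_disk_errors is hypothetical. The identifier-to-label table is copied from this document, and the sketch assumes the errpt summary format in which the identifier is the first column.

```shell
# Hypothetical helper: reads `errpt` summary output on stdin and prints the
# lines whose identifier (first column) is one of those listed in this
# document, prefixed with the matching label.
known_disk_errors() {
    awk '
        BEGIN {
            label["EAA3D429"] = "LVM_SA_STALEPP"
            label["F7DDA124"] = "LVM_SA_PVMISS"
            label["52715FA5"] = "LVM_SA_WRTERR"
            label["E86653C3"] = "LVM_IO_FAIL"
            label["5BEAD71B"] = "LVM_QUORUMNOQUORUM"
            label["26120107"] = "LVM_MISSPVADDED"
            label["78ABDDEB"] = "J2_METADATA_EIO"
            label["B6DB68E0"] = "J2_FSCK_REQUIRED"
            label["C1348779"] = "J2_LOG_EIO"
            label["747725D9"] = "SC_DISK_ERR1"
            label["B6267342"] = "SC_DISK_ERR2"
            label["DE3B8540"] = "SC_DISK_ERR7"
        }
        {
            id = toupper($1)
            if (id in label)
                print label[id] ": " $0
        }'
}

# Typical use on an AIX host (not run here):
#   errpt | known_disk_errors
```

As noted above, the list is not complete, so an empty result does not rule out a disk problem.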