Altering LVM Configuration When a Disk is Not in ODM Anymore
If you remove a disk from the system with rmdev -dl hdiskX without first having reduced the volume group to take the disk out of LVM, the on-disk configuration (the VGDA, Volume Group Descriptor Area) is never updated and you end up with a discrepancy between the ODM and the LVM configuration. Here is how to solve the issue (without any warranty, though!).
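For reference, the clean way to remove a disk is to take it out of the volume group first and only then delete the device, along these lines (a sketch, assuming hdiskX no longer holds any allocated partitions; otherwise migrate them off first):

# reducevg rootvg hdiskX
# rmdev -dl hdiskX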
Let's look at the volume group information on the affected system:
# lsvg -p rootvg
rootvg:
PV_NAME           PV STATE    TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0            active      2157        1019        174..00..00..413..432
0516-304 : Unable to find device id 00ce4b6a01292201 in the Device Configuration Database.
00ce4b6a01292201  missing     2157        1019        174..71..00..342..432
# lspv
hdisk0          00ce4b6ade6da849    rootvg      active
hdisk2          00ce4b6a01b09b83    drakevg     active
hdisk3          00ce4b6afd175206    drakevg     active
# lsdev -Cc disk
hdisk0 Available  Virtual SCSI Disk Drive
hdisk2 Available  Virtual SCSI Disk Drive
hdisk3 Available  Virtual SCSI Disk Drive
As we can see, the disk is still in the LVM configuration but no longer shows up among the devices. To solve this, we need to trick the ODM so that the LVM commands can be used to update the LVM configuration stored on the volume group disks. The idea is to reinsert a disk into the ODM configuration, remove it from LVM, and then remove it from the ODM again. Here is how to do it. First, let's make a copy of the ODM files we are going to change:
# cd /etc/objrepos/
# cp CuAt CuAt.before_cheat
# cp CuDv CuDv.before_cheat
# cp CuPath CuPath.before_cheat
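Should anything go wrong later on, these copies can simply be put back in place (a recovery sketch, assuming no other ODM change has been made in the meantime):

# cd /etc/objrepos
# cp CuAt.before_cheat CuAt
# cp CuDv.before_cheat CuDv
# cp CuPath.before_cheat CuPath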
Now we will extract hdisk0's definition from the ODM and add it back as hdisk1's definition:
# odmget -q "name=hdisk0" CuAt CuAt: name = "hdisk0" attribute = "unique_id" value = "3520200946033223609SYMMETRIX03EMCfcp05VDASD03AIXvscsi" type = "R" generic = "" rep = "n" nls_index = 0 CuAt: name = "hdisk0" attribute = "pvid" value = "00ce4b6ade6da8490000000000000000" type = "R" generic = "D" rep = "s" nls_index = 11 # odmget -q "name=hdisk0" CuDv CuDv: name = "hdisk0" status = 1 chgstatus = 2 ddins = "scsidisk" location = "" parent = "vscsi0" connwhere = "810000000000" PdDvLn = "disk/vscsi/vdisk" # odmget -q "name=hdisk0" CuPath CuPath: name = "hdisk0" parent = "vscsi0" connection = "810000000000" alias = "" path_status = 1 path_id = 0
Basically, we need to insert new entries in the three classes CuAt, CuDv and CuPath, with hdisk0 changed to hdisk1. A few other attributes need to be changed as well. The most important one is the PVID, located in CuAt: we will use the value reported as missing by lsvg -p rootvg. The unique_id attribute also needs to be changed; you can simply alter a few characters of the existing string, it just needs to be unique on the system. The other attributes to change are connwhere in CuDv and connection in CuPath. Their value represents the LUN ID of the disk. Again, the exact value is not relevant, it just has to be unique. We can check the LUN IDs currently in use by running lscfg on all the defined disks:
# lscfg -vl hdisk*
  hdisk0           U9117.570.65E4B6A-V6-C2-T1-L810000000000  Virtual SCSI Disk Drive
  hdisk2           U9117.570.65E4B6A-V6-C3-T1-L810000000000  Virtual SCSI Disk Drive
  hdisk3           U9117.570.65E4B6A-V6-C3-T1-L820000000000  Virtual SCSI Disk Drive
LUN 81 is used on controller C2, and LUNs 81 and 82 on C3. Let's choose 85, which is certain not to collide with any other device. The following commands generate the text files that will be used to fake the ODM entries, according to what was just explained:
# mkdir /tmp/cheat
# cd /tmp/cheat
# odmget -q "name=hdisk0" CuAt | sed -e 's/hdisk0/hdisk1/g' \
      -e 's/00ce4b6ade6da849/00ce4b6a01292201/' \
      -e 's/609SYMMETRIX/719SYMMETRIX/' > hdisk1.CuAt
# odmget -q "name=hdisk0" CuDv | sed -e 's/hdisk0/hdisk1/' \
      -e 's/810000000000/850000000000/' > hdisk1.CuDv
# odmget -q "name=hdisk0" CuPath | sed -e 's/hdisk0/hdisk1/' \
      -e 's/810000000000/850000000000/' > hdisk1.CuPath
Let's look at the generated files:
# cat hdisk1.CuAt

CuAt:
        name = "hdisk1"
        attribute = "unique_id"
        value = "3520200946033223719SYMMETRIX03EMCfcp05VDASD03AIXvscsi"
        type = "R"
        generic = ""
        rep = "n"
        nls_index = 0

CuAt:
        name = "hdisk1"
        attribute = "pvid"
        value = "00ce4b6a012922010000000000000000"
        type = "R"
        generic = "D"
        rep = "s"
        nls_index = 11

# cat hdisk1.CuDv

CuDv:
        name = "hdisk1"
        status = 1
        chgstatus = 2
        ddins = "scsidisk"
        location = ""
        parent = "vscsi0"
        connwhere = "850000000000"
        PdDvLn = "disk/vscsi/vdisk"

# cat hdisk1.CuPath

CuPath:
        name = "hdisk1"
        parent = "vscsi0"
        connection = "850000000000"
        alias = ""
        path_status = 1
        path_id = 0
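Optionally, before adding anything, we can double-check that the new PVID and LUN ID are not already known to the ODM (a sketch reusing the values chosen above; both queries should return nothing):

# odmget -q "value=00ce4b6a012922010000000000000000" CuAt
# odmget -q "connwhere=850000000000" CuDv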
So, we are ready to insert the data in the ODM:
# odmadd hdisk1.CuAt
# odmadd hdisk1.CuDv
# odmadd hdisk1.CuPath
# lsvg -p rootvg
rootvg:
PV_NAME           PV STATE    TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0            active      2157        1019        174..00..00..413..432
hdisk1            missing     2157        1019        174..71..00..342..432
The disk is now back in the ODM! Now, to remove the disk from the VGDA, we use the reducevg command:
# reducevg rootvg hdisk1
0516-016 ldeletepv: Cannot delete physical volume with allocated partitions.
        Use either migratepv to move the partitions or reducevg
        with the -d option to delete the partitions.
0516-884 reducevg: Unable to remove physical volume hdisk1.
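If the disk were still present and readable, the non-destructive route suggested by the error message would be to move its partitions to another disk first, roughly as in the following sketch. That is obviously not possible for a missing disk, so we take the destructive path instead.

# migratepv hdisk1 hdisk0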
We will use the -d flag to remove the physical partitions associated with each logical volume located on hdisk1. (A few lines have been removed from the following listing to keep it short.)
# reducevg -d rootvg hdisk1
0516-914 rmlv: Warning, all data belonging to logical volume
        lv01 on physical volume hdisk1 will be destroyed.
rmlv: Do you wish to continue? y(es) n(o)? y
0516-304 putlvodm: Unable to find device id 00ce4b6a012922010000000000000000 in the Device Configuration Database.
0516-896 reducevg: Warning, cannot remove physical volume hdisk1 from Device Configuration Database.
# lsvg -l rootvg
rootvg:
LV NAME        TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5            boot       2     2     1    closed/syncd  N/A
hd6            paging     256   256   1    open/syncd    N/A
hd8            jfs2log    1     1     1    open/syncd    N/A
hd4            jfs2       7     7     1    open/syncd    /
hd2            jfs2       384   384   1    open/syncd    /usr
hd9var         jfs2       64    64    1    open/syncd    /var
hd3            jfs2       128   128   1    open/syncd    /tmp
hd1            jfs2       2     2     1    open/syncd    /home
hd10opt        jfs2       32    32    1    open/syncd    /opt
fslv04         jfs2       256   256   1    open/syncd    /usr/sys/inst.images
loglv01        jfslog     1     1     1    closed/syncd  N/A
lv01           jfs        5     5     1    closed/syncd  /mkcd/cd_images
# lsvg -p rootvg
rootvg:
PV_NAME           PV STATE    TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0            active      2157        1019        174..00..00..413..432
The disk has been deleted from the VGDA. What about ODM?
# lsdev -Cc disk
hdisk0 Available  Virtual SCSI Disk Drive
hdisk1 Available  Virtual SCSI Disk Drive
hdisk2 Available  Virtual SCSI Disk Drive
hdisk3 Available  Virtual SCSI Disk Drive
# rmdev -dl hdisk1
Method error (/etc/methods/ucfgdevice):
        0514-043 Error getting or assigning a minor number.
We probably forgot to fake one ODM class... Never mind: let's remove the entries we added to the ODM and see what happens:
# odmdelete -o CuAt -q "name=hdisk1"
2 objects deleted
# lspv
hdisk0          00ce4b6ade6da849    rootvg      active
hdisk2          00ce4b6a01b09b83    drakevg     active
hdisk1          none                None
hdisk3          00ce4b6afd175206    drakevg     active
# rmdev -dl hdisk1
Method error (/etc/methods/ucfgdevice):
        0514-043 Error getting or assigning a minor number.
# odmdelete -o CuDv -q "name=hdisk1"
1 objects deleted
# lspv
hdisk0          00ce4b6ade6da849    rootvg      active
hdisk2          00ce4b6a01b09b83    drakevg     active
hdisk3          00ce4b6afd175206    drakevg     active
# lspath
Enabled hdisk0 vscsi0
Enabled hdisk2 vscsi0
Enabled hdisk2 vscsi1
Enabled hdisk3 vscsi1
Enabled hdisk3 vscsi0
Unknown hdisk1 vscsi0
# odmdelete -o CuPath -q "name=hdisk1"
1 objects deleted
# lspath
Enabled hdisk0 vscsi0
Enabled hdisk2 vscsi0
Enabled hdisk2 vscsi1
Enabled hdisk3 vscsi1
Enabled hdisk3 vscsi0
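Once everything checks out, the temporary files created along the way can be removed (keeping the .before_cheat copies around for a while may still be prudent):

# rm /tmp/cheat/hdisk1.CuAt /tmp/cheat/hdisk1.CuDv /tmp/cheat/hdisk1.CuPath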
That's it! Use with care.
Side note: This entry was originally contributed by Patrice Lachance, who first wrote about this subject.
Comments
There is a much simpler way to do this. (Under normal circumstances you would not be able to delete a disk using rmdev: the command rmdev -dl hdiskX will be refused for an active volume group. More likely, the physical disk has been physically removed or destroyed.)
How you end up with a PVMISSING disk is irrelevant; getting the system repaired is relevant! So, here is the simpler way to correct the volume group VGDA and the AIX ODM.
A situation like this is more common:
CASE: While the volume group is offline, maintenance is performed on the disks. One disk was damaged beyond repair, or was replaced during the process. Now, back in AIX, the volumes are to be reactivated.
root@aix530:[/]lsvg -p vgExport
0516-010 : Volume group must be varied on; use varyonvg command.
root@aix530:[/]varyonvg vgExport
PV Status: hdisk1 00c39b8d69c45344 PVACTIVE
hdisk2 00c39b8d043427b6 PVMISSING
Here there is a PVMISSING. For this case, the old hdisk2 is physically destroyed. All the data is lost, but AIX ODM and the VGDA on all other disks in the volume group do not know this yet.
First document what is lost:
root@aix530:[/]lsvg -l vgExport
vgExport:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lvExport jfs2 416 416 1 closed/syncd /export
lvTest jfs 32 32 1 closed/syncd /scratch
loglv00 jfslog 1 1 1 closed/syncd N/A
Examine each logical volume for data that may be lost on hdisk2. Even a single physical partition on hdisk2 implies that the filesystem is corrupt!
root@aix530:[/]lslv -m lvExport | grep hdisk2 | tail -1
root@aix530:[/]lslv -m lvTest | grep hdisk2 | tail -1
0032 0083 hdisk2
root@aix530:[/]lslv -m loglv00 | grep hdisk2 | tail -1
0001 0084 hdisk2
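A small loop can run the same check across every logical volume of the group (a sketch, assuming the volume group is vgExport and the missing disk is hdisk2):

for lv in $(lsvg -l vgExport | awk 'NR > 2 {print $1}'); do
    # count the logical partitions of this LV that are mapped to the missing disk
    echo "$lv: $(lslv -m $lv | grep -c hdisk2) partition(s) on hdisk2"
done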
So, with this info I know that any data in /scratch is suspect, and should be restored from a backup.
To prepare for this I run the AIX command for removing a MISSING disk:
root@aix530:[/]lqueryvg -p hdisk1 -vPt
Physical: 00c39b8d69c45344 2 0
00c39b8d043427b6 1 0
VGid: 00c39b8d00004c000000011169c45a4b
root@aix530:[/]umount /scratch
umount: 0506-347 Cannot find anything to unmount.
root@aix530:[/]rmfs /scratch
rmfs: 0506-936 Cannot read superblock on /dev/lvTest.
rmfs: 0506-936 Cannot read superblock on /scratch.
rmfs: Unable to clear superblock on /scratch
rmlv: Logical volume lvTest is removed.
root@aix530:[/]rmlv loglv00
Warning, all data contained on logical volume loglv00 will be destroyed.
rmlv: Do you wish to continue? y(es) n(o)? y
rmlv: Logical volume loglv00 is removed.
root@aix530:[/]lsvg -p vgExport
vgExport:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 511 95 00..00..00..00..95
hdisk2 missing 255 222 51..18..51..51..51
root@aix530:[/]lsvg -l vgExport
vgExport:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lvExport jfs2 416 416 1 open/syncd /export
root@aix530:[/]ldeletepv -g 00c39b8d00004c000000011169c45a4b -p 00c39b8d043427b6
Note: there is no output from the above command when everything proceeds as expected.
Now the regular AIX commands confirm that the VGDA and the ODM are in order.
root@aix530:[/]lsvg -p vgExport
vgExport:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 511 95 00..00..00..00..95
root@aix530:[/]lsvg -l vgExport
vgExport:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lvExport jfs2 416 416 1 open/syncd /export
Summary:
This is much less error-prone than using ODM commands, and it has been available for AIX disk management since at least 1995, when AIX 4 first came out. (It may have been in AIX 3 as well, taking it back to 1991-1992, but I never did any system administration on AIX 3 to know for sure.)
Important commands to review:
lslv (-m)
lqueryvg
ldeletepv
First, you are right in your assumption: I have never followed an AIX administration course (advanced or not), nor any other UNIX or operating system course.
Secondly, thank you for your great input. This is pretty interesting, and the proposed follow-up fits its purpose very well. So, assuming I am sure there is nothing stored on the failed disk (since it is just stale information in the LVM), I understand I can forcibly clean things up using only the ldeletepv command.
I have cleaned up the entry a bit, and found an easier way to locate logical partitions that might still be in the VGDA and/or ODM.
http://rootvg.net/content/view/174/...
My apologies for not answering your direct question. If there was anything on the MISSING disk, the ldeletepv command will not remove it. You will need to use rmfs (or rmlv if it is not a file system) to remove the information about those logical volumes from the remaining VGDAs. Once there are no more references to data on the missing disk, the disk entry in the VGDA can be removed as well.
In your example, assuming that it was a mirror of rootvg that was lost (so only a mirror copy is missing), you could first execute:
unmirrorvg rootvg hdisk1
ldeletepv -g VGID -p PVID
and your system would be satisfied.
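The VGID and PVID to feed to ldeletepv can be read from a surviving member of the volume group with lqueryvg, as shown earlier (a sketch; hdisk0 stands in for any remaining disk of the group, and the PVIDs of every member, including the missing one, appear in the Physical: section):

lqueryvg -p hdisk0 -vPt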
Thank you, Michael. This is indeed the way I understood your first answer.