Altering LVM Configuration When a Disk is Not in ODM Anymore
If you remove a disk from the system using rmdev -dl hdiskX without having first reduced the volume group to take the disk out of LVM, the on-disk configuration (the VGDA, Volume Group Descriptor Area) is never updated, and you end up with a discrepancy between the ODM and the LVM configurations. Here is how to solve the issue (without any warranty, though!).
First, let's look at the volume group information:
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 2157 1019 174..00..00..413..432
0516-304 : Unable to find device id 00ce4b6a01292201 in the Device
Configuration Database.
00ce4b6a01292201 missing 2157 1019 174..71..00..342..432
# lspv
hdisk0 00ce4b6ade6da849 rootvg active
hdisk2 00ce4b6a01b09b83 drakevg active
hdisk3 00ce4b6afd175206 drakevg active
# lsdev -Cc disk
hdisk0 Available Virtual SCSI Disk Drive
hdisk2 Available Virtual SCSI Disk Drive
hdisk3 Available Virtual SCSI Disk Drive
As we can see, the disk is still in the LVM configuration but no longer shows up in the devices. To solve this, we need to cheat the ODM so that we can use LVM commands to change the LVM configuration stored on the volume group's disks. The idea is to reinsert a disk in the ODM configuration, remove the disk from LVM, and then remove it from the ODM. Here is how we do it. First, let's make a copy of the ODM files that we will change:
# cd /etc/objrepos/
# cp CuAt CuAt.before_cheat
# cp CuDv CuDv.before_cheat
# cp CuPath CuPath.before_cheat
Now, we will extract hdisk0's definition from the ODM and add it back as hdisk1's definition:
# odmget -q "name=hdisk0" CuAt
CuAt:
name = "hdisk0"
attribute = "unique_id"
value = "3520200946033223609SYMMETRIX03EMCfcp05VDASD03AIXvscsi"
type = "R"
generic = ""
rep = "n"
nls_index = 0
CuAt:
name = "hdisk0"
attribute = "pvid"
value = "00ce4b6ade6da8490000000000000000"
type = "R"
generic = "D"
rep = "s"
nls_index = 11
# odmget -q "name=hdisk0" CuDv
CuDv:
name = "hdisk0"
status = 1
chgstatus = 2
ddins = "scsidisk"
location = ""
parent = "vscsi0"
connwhere = "810000000000"
PdDvLn = "disk/vscsi/vdisk"
# odmget -q "name=hdisk0" CuPath
CuPath:
name = "hdisk0"
parent = "vscsi0"
connection = "810000000000"
alias = ""
path_status = 1
path_id = 0
Basically, we need to insert new entries in the three classes CuAt, CuDv and CuPath, with hdisk0 changed to hdisk1. A few other attributes need to be changed as well. The most important one is the PVID, located in CuAt: we will use the value reported as missing by lsvg -p rootvg. The unique_id attribute also needs to be changed; you can just change a few characters in the existing string, it only needs to be unique on the system. The other attributes to change are connwhere in CuDv and connection in CuPath. Their value represents the LUN ID of the disk. Again, the actual value is not relevant, it just has to be unique. We can check the LUNs currently defined by running lscfg on all the disks:
# lscfg -vl hdisk*
hdisk0 U9117.570.65E4B6A-V6-C2-T1-L810000000000 Virtual SCSI Disk Drive
hdisk2 U9117.570.65E4B6A-V6-C3-T1-L810000000000 Virtual SCSI Disk Drive
hdisk3 U9117.570.65E4B6A-V6-C3-T1-L820000000000 Virtual SCSI Disk Drive
LUN 81 is used on controller C2, and LUNs 81 and 82 on C3. Let's choose 85, which is sure not to collide with any other device. The following commands generate the text files that will be used to cheat the ODM, according to what was just explained:
# mkdir /tmp/cheat
# cd /tmp/cheat
# odmget -q "name=hdisk0" CuAt | sed -e 's/hdisk0/hdisk1/g' \
    -e 's/00ce4b6ade6da849/00ce4b6a01292201/' \
    -e 's/609SYMMETRIX/719SYMMETRIX/' > hdisk1.CuAt
# odmget -q "name=hdisk0" CuDv | sed -e 's/hdisk0/hdisk1/' \
    -e 's/810000000000/850000000000/' > hdisk1.CuDv
# odmget -q "name=hdisk0" CuPath | sed -e 's/hdisk0/hdisk1/' \
    -e 's/810000000000/850000000000/' > hdisk1.CuPath
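As an aside, when many disks are defined, the LUN IDs already in use can be pulled out of the lscfg output mechanically instead of read by eye. This is only a sketch: used_luns is a hypothetical helper, just a text filter over the location codes shown above, not an AIX command.

```shell
# Extract the LUN IDs already in use from "lscfg -vl 'hdisk*'" output,
# so the value chosen for connwhere/connection is guaranteed unique.
# Location codes end in "-L<lunid>", e.g. ...-C2-T1-L810000000000.
used_luns() {
    sed -n 's/.*-L\([0-9A-Fa-f]*\) .*/\1/p' | sort -u
}

# On a live system: lscfg -vl 'hdisk*' | used_luns
```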
Let's look at the generated files:
# cat hdisk1.CuAt
CuAt:
name = "hdisk1"
attribute = "unique_id"
value = "3520200946033223719SYMMETRIX03EMCfcp05VDASD03AIXvscsi"
type = "R"
generic = ""
rep = "n"
nls_index = 0
CuAt:
name = "hdisk1"
attribute = "pvid"
value = "00ce4b6a012922010000000000000000"
type = "R"
generic = "D"
rep = "s"
nls_index = 11
# cat hdisk1.CuDv
CuDv:
name = "hdisk1"
status = 1
chgstatus = 2
ddins = "scsidisk"
location = ""
parent = "vscsi0"
connwhere = "850000000000"
PdDvLn = "disk/vscsi/vdisk"
# cat hdisk1.CuPath
CuPath:
name = "hdisk1"
parent = "vscsi0"
connection = "850000000000"
alias = ""
path_status = 1
path_id = 0
So, we are ready to insert the data in the ODM:
# odmadd hdisk1.CuAt
# odmadd hdisk1.CuDv
# odmadd hdisk1.CuPath
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 2157 1019 174..00..00..413..432
hdisk1 missing 2157 1019 174..71..00..342..432
The disk is now back in ODM! Now, to remove the disk from the VGDA, we use the reducevg command:
# reducevg rootvg hdisk1
0516-016 ldeletepv: Cannot delete physical volume with allocated
partitions. Use either migratepv to move the partitions or
reducevg with the -d option to delete the partitions.
0516-884 reducevg: Unable to remove physical volume hdisk1.
We will use the -d flag to remove the physical partitions associated with each logical volume located on hdisk1. A few lines have been removed to simplify the listing:
# reducevg -d rootvg hdisk1
0516-914 rmlv: Warning, all data belonging to logical volume
lv01 on physical volume hdisk1 will be destroyed.
rmlv: Do you wish to continue? y(es) n(o)?
y
0516-304 putlvodm: Unable to find device id 00ce4b6a012922010000000000000000 in the
Device Configuration Database.
0516-896 reducevg: Warning, cannot remove physical volume hdisk1 from
Device Configuration Database.
# lsvg -l rootvg
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 2 2 1 closed/syncd N/A
hd6 paging 256 256 1 open/syncd N/A
hd8 jfs2log 1 1 1 open/syncd N/A
hd4 jfs2 7 7 1 open/syncd /
hd2 jfs2 384 384 1 open/syncd /usr
hd9var jfs2 64 64 1 open/syncd /var
hd3 jfs2 128 128 1 open/syncd /tmp
hd1 jfs2 2 2 1 open/syncd /home
hd10opt jfs2 32 32 1 open/syncd /opt
fslv04 jfs2 256 256 1 open/syncd /usr/sys/inst.images
loglv01 jfslog 1 1 1 closed/syncd N/A
lv01 jfs 5 5 1 closed/syncd /mkcd/cd_images
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 2157 1019 174..00..00..413..432
The disk has been deleted from the VGDA. What about ODM?
# lsdev -Cc disk
hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Available Virtual SCSI Disk Drive
hdisk2 Available Virtual SCSI Disk Drive
hdisk3 Available Virtual SCSI Disk Drive
# rmdev -dl hdisk1
Method error (/etc/methods/ucfgdevice):
0514-043 Error getting or assigning a minor number.
We probably forgot to cheat one ODM class... Never mind: let's remove the cheat entries we added to the ODM and see what happens:
# odmdelete -o CuAt -q "name=hdisk1"
2 objects deleted
# lspv
hdisk0 00ce4b6ade6da849 rootvg active
hdisk2 00ce4b6a01b09b83 drakevg active
hdisk1 none None
hdisk3 00ce4b6afd175206 drakevg active
# rmdev -dl hdisk1
Method error (/etc/methods/ucfgdevice):
0514-043 Error getting or assigning a minor number.
# odmdelete -o CuDv -q "name=hdisk1"
1 objects deleted
# lspv
hdisk0 00ce4b6ade6da849 rootvg active
hdisk2 00ce4b6a01b09b83 drakevg active
hdisk3 00ce4b6afd175206 drakevg active
# lspath
Enabled hdisk0 vscsi0
Enabled hdisk2 vscsi0
Enabled hdisk2 vscsi1
Enabled hdisk3 vscsi1
Enabled hdisk3 vscsi0
Unknown hdisk1 vscsi0
# odmdelete -o CuPath -q "name=hdisk1"
1 objects deleted
# lspath
Enabled hdisk0 vscsi0
Enabled hdisk2 vscsi0
Enabled hdisk2 vscsi1
Enabled hdisk3 vscsi1
Enabled hdisk3 vscsi0
That's it! Use with care.
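To recap, the stanza rewrite at the heart of the trick can be wrapped in a small helper. This is a sketch only, using the values from this article: rewrite_stanza is a hypothetical sed filter, and the odmget/odmadd/reducevg/odmdelete calls exist only on AIX, so they are shown commented out.

```shell
#!/bin/sh
# Values taken from the article; adjust to your system before running anything.
# NEW_PVID is the PVID reported "missing" by lsvg -p;
# NEW_LUN is a LUN ID unused on the adapter.
OLD_DISK=hdisk0;           NEW_DISK=hdisk1
OLD_PVID=00ce4b6ade6da849; NEW_PVID=00ce4b6a01292201
OLD_LUN=810000000000;      NEW_LUN=850000000000

# Pure text filter over an odmget stanza: rename the disk and swap in
# the new PVID and LUN. NOTE: also tweak a few characters of the CuAt
# unique_id value (the article changed 609SYMMETRIX to 719SYMMETRIX)
# so that it stays unique on the system.
rewrite_stanza() {
    sed -e "s/$OLD_DISK/$NEW_DISK/g" \
        -e "s/$OLD_PVID/$NEW_PVID/g" \
        -e "s/$OLD_LUN/$NEW_LUN/g"
}

# On a live AIX system (after backing up /etc/objrepos!):
# for class in CuAt CuDv CuPath; do
#     odmget -q "name=$OLD_DISK" $class | rewrite_stanza > $NEW_DISK.$class
#     odmadd $NEW_DISK.$class
# done
# reducevg -d rootvg $NEW_DISK
# odmdelete -o CuAt   -q "name=$NEW_DISK"
# odmdelete -o CuDv   -q "name=$NEW_DISK"
# odmdelete -o CuPath -q "name=$NEW_DISK"
```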
Side note: This entry was originally contributed by Patrice Lachance, who first wrote about this subject.


Comments
There is a much simpler way to do this. (Under normal circumstances you would not be able to delete a disk using rmdev: the command [b]rmdev -dl hdiskX[/b] will be refused for an active volume group. More likely, the physical disk has been physically removed or destroyed.)
How you end up with a PVMISSING disk is irrelevant; getting the system repaired is relevant! So, here is the simpler way to correct the volume group VGDA and the AIX ODM.
A situation like this is more common:
CASE: While the volume group is offline, maintenance is performed on the disks. One disk was damaged beyond repair, or replaced during the process. Back in AIX, the volumes are to be reactivated.
root@aix530:[/]lsvg -p vgExport
0516-010 : Volume group must be varied on; use varyonvg command.
root@aix530:[/]varyonvg vgExport
PV Status: hdisk1 00c39b8d69c45344 PVACTIVE
hdisk2 00c39b8d043427b6 PVMISSING
Here there is a PVMISSING. For this case, the old hdisk2 is physically destroyed. All the data is lost, but AIX ODM and the VGDA on all other disks in the volume group do not know this yet.
First document what is lost:
root@aix530:[/]lsvg -l vgExport
vgExport:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lvExport jfs2 416 416 1 closed/syncd /export
lvTest jfs 32 32 1 closed/syncd /scratch
loglv00 jfslog 1 1 1 closed/syncd N/A
Examine each logical partition for data that may be lost from hdisk2. Just one physical partition on hdisk2 implies that the filesystem is corrupt!
[b]root@aix530:[/]lslv -m lvExport | grep hdisk2 | tail -1
root@aix530:[/]lslv -m lvTest | grep hdisk2 | tail -1
0032 0083 hdisk2
root@aix530:[/]lslv -m loglv00 | grep hdisk2 | tail -1
0001 0084 hdisk2
[/b]
So, with this info I know that any data in /scratch is suspect, and should be restored from a backup.
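The per-logical-volume inspection above can be scripted. A sketch, assuming the same vgExport layout: pps_on_disk is a hypothetical grep filter over "lslv -m" output, and the lsvg/lslv loop is AIX-only, so it is shown commented out.

```shell
# Count how many physical partitions of a logical volume sit on a given
# disk, from the "lslv -m" map (lines look like: "0032 0083 hdisk2").
# pps_on_disk is a hypothetical helper, not an AIX command.
pps_on_disk() {
    grep -c " $1\$"
}

# On a live AIX system:
# MISSING=hdisk2
# for lv in $(lsvg -l vgExport | awk 'NR>2 {print $1}'); do
#     n=$(lslv -m $lv | pps_on_disk $MISSING) || n=0
#     [ "$n" -gt 0 ] && echo "$lv has $n PP(s) on $MISSING -- rmfs/rmlv first"
# done
```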
To prepare for this I run the AIX command for removing a MISSING disk:
[b]
root@aix530:[/]lqueryvg -p hdisk1 -vPt
Physical: 00c39b8d69c45344 2 0
00c39b8d043427b6 1 0
VGid: 00c39b8d00004c000000011169c45a4b
root@aix530:[/]umount /scratch
umount: 0506-347 Cannot find anything to unmount.
root@aix530:[/]rmfs /scratch
rmfs: 0506-936 Cannot read superblock on /dev/lvTest.
rmfs: 0506-936 Cannot read superblock on /scratch.
rmfs: Unable to clear superblock on /scratch
rmlv: Logical volume lvTest is removed.
root@aix530:[/]rmlv loglv00
Warning, all data contained on logical volume loglv00 will be destroyed.
rmlv: Do you wish to continue? y(es) n(o)? y
rmlv: Logical volume loglv00 is removed.
root@aix530:[/]lsvg -p vgExport
vgExport:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 511 95 00..00..00..00..95
hdisk2 missing 255 222 51..18..51..51..51
root@aix530:[/]lsvg -l vgExport
vgExport:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lvExport jfs2 416 416 1 open/syncd /export
root@aix530:[/]ldeletepv -g 00c39b8d00004c000000011169c45a4b -p 00c39b8d043427b6
[/b]
Note: there is no output for the above command when all proceeds accordingly.
Now the regular AIX commands to verify VGDA and ODM are in order.
[b]
root@aix530:[/]lsvg -p vgExport
vgExport:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 511 95 00..00..00..00..95
root@aix530:[/]lsvg -l vgExport
vgExport:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
lvExport jfs2 416 416 1 open/syncd /export
[/b]
Summary:
This is much less error prone than using ODM commands, and has been available in AIX for disk management since at least 1995 (when AIX 4 first came out). It may have been in AIX 3 as well, taking it back to 1991-1992, but I never did any system administration on AIX 3 to know for sure.
Important commands to review:
lslv (-m)
lqueryvg
ldeletepv
First, you are right in your assumption: I never followed an AIX administration course (advanced or not), nor any other UNIX or operating system course.
Secondly, thank you for your great input. This is quite interesting, and the proposed follow-up fits its purpose very well. So, assuming I am sure there is nothing stored on the faulted disk (since it is just wrong information in the LVM), I understand I can forcibly clean things up using only the `ldeletepv' command.
I have cleaned up the entry a bit, and found an easier way to locate logical partitions that might still be in the VGDA and/or ODM.
http://rootvg.net/content/view/174/...
My apologies for not answering your direct question. If there was anything on the MISSING disk, the command ldeletepv will not remove the disk. You will need to use rmfs (or rmlv if it is not a file system) to remove the information about those logical volumes from the remaining VGDAs. Once there are no more references to data on the missing disk, the disk entry in the VGDA can be removed as well.
In your example, assuming that it was a mirror of rootvg that was lost (so only a mirror is missing) you could first execute:
unmirrorvg rootvg hdisk1
ldeletepv -g VGID -p PVID
and your system would be satisfied.
Thank you, Michael. This is effectively the way I understood your first answer.