Got Duplicate PVIDs in Your User VG? Try Recreatevg!

June 2016 | by David Tansley

I recently installed a new SP (Service Pack) level on AIX, then rebooted the box, as is usual practice for the changes to take effect. Checking the mounted file-systems post reboot, I noticed one of them hadn't mounted. I looked at the VG (Volume Group) in question, called data_vg, and indeed its LVs (Logical Volumes) were closed. In fact the whole VG was offline, which I confirmed by listing only the online VGs:

# lsvg -o
rootvg

The VG data_vg was definitely offline, but it was still known to AIX:

# lsvg
rootvg
data_vg

From a configuration listing that is always generated and mailed to the team prior to a reboot, I knew what devices I should have on the box. That report showed that hdisk41 and hdisk44 had duplicate PVIDs (Physical Volume Identifiers) before the reboot, and they still had duplicate PVIDs after it:

hdisk0          00cd94b6d734dfa2            rootvg
hdisk40         00cd94b6d382cda5            data_vg
hdisk41         00cd94b6d2bc9362            data_vg
hdisk44         00cd94b6d2bc9362            data_vg
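
As an aside, that report doesn't need to be anything elaborate. Below is a minimal sketch of such a pre-reboot capture, not the actual script we run; the output path and mail recipient are purely illustrative:

#!/usr/bin/ksh
# Sketch: capture disk-to-VG mappings and mail them to the team before a reboot
OUT=/tmp/preboot_config.$(date +%Y%m%d)
{
  print "== lspv =="
  lspv
  print "== lsvg -o =="
  lsvg -o
} > $OUT
mail -s "$(hostname) pre-reboot config" unix-team@example.com < $OUT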

Listing the LVs in the VG (with lsvg -l data_vg) gave the following:

data_vg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
fslv03              jfs2log    1       1       1    closed/syncd  N/A
fslv06              jfs2       1       1       1    closed/syncd  /maps_uk

I then decided to manually vary the VG online:

# varyonvg data_vg
0516-1775 varyonvg: Physical volumes hdisk41 and hdisk44 have identical PVIDs (00cd94b6d2bc9362).

Now this was interesting, because no new disks had been imported that could have caused the issue, and as far as I was aware no disk copies had been taken via the SAN. The correct multi-path drivers were installed, so I couldn't see those as the cause either. I needed to look into it, but first I needed to get the VG online and the file-system mounted.
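
As a quick sanity check on the multi-pathing (a side check added here for completeness, not part of the original diagnosis), the path state of the suspect disks can be listed with lspath; every path should report Enabled:

# lspath -l hdisk41
# lspath -l hdisk44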

So I decided to export the VG and re-import to see what I ended up with:

# exportvg data_vg
# importvg -y data_vg hdisk44
0516-776 importvg: Cannot import hdisk44 as data_vg.

As can be seen from the above output, I couldn't import the VG data_vg: AIX was complaining about hdisk44. I needed to investigate further.
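
One further check worth knowing about at this point (not one I relied on here): lqueryvg reads the VG descriptor area straight off a disk, so you can compare what the disk itself claims against what AIX has recorded:

# lqueryvg -Atp hdisk44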

Query the ODM

The next task was to check the two disks carrying the duplicate PVID by querying the disk headers directly, to make sure there were no mismatches between the disks:

# lquerypv -h /dev/hdisk41 80 10
00000080   00CD94B6 D2BC9362 00000000 00000000  |.......b........|
# lquerypv -h /dev/hdisk44 80 10
00000080   00CD94B6 D2BC9362 00000000 00000000  |.......b........|

The second and third columns of the output report the current PVID. In this case, two different disks held the same PVID on disk. Not looking good, but fixable. So I queried the ODM (Object Data Manager) to see what AIX actually thought it had regarding the duplicate PVIDs, using the odmget command:

# odmget -q "name=hdisk44 and attribute=pvid" CuAt
CuAt:
        name = "hdisk44"
        attribute = "pvid"
        value = "00cd94b6d2bc93620000000000000000"
        type = "R"
 …

# odmget -q "name=hdisk41 and attribute=pvid" CuAt
CuAt:
        name = "hdisk41"
        attribute = "pvid"
        value = "00cd94b6d2bc93620000000000000000"
…

Looking at the above output, the ODM reported both disks with the same PVID, so there was no mismatch between disk and ODM, which is what I was expecting to find.

At this point I thought I would try to re-generate the PVID of one of the disks:

# chdev -a pv=yes -l hdisk41
hdisk41 changed

Hooray! AIX says it has changed the PVID, but let’s check that:

# lspv
hdisk0          00cd94b6d734dfa2            rootvg
hdisk40         00cd94b6d382cda5            None
hdisk41         00cd94b6d2bc9362            None
hdisk44         00cd94b6d2bc9362            None

No PVID had changed. I tried the same procedure with the other disk, with unfortunately the same result: no change to the PVID. In hindsight this makes sense, as pv=yes only assigns a PVID to a disk that does not already have one; it will not overwrite an existing PVID.

I then considered clearing the PVID, with:

chdev -a pv=clear -l hdisk41

However, this operation is permanent, and I didn't know what effect it would have on the data contained in the exported VG. As this was a production box, I decided to tread carefully; I didn't want to be put in the position of requesting a restore of the file-system from the netbackup team.
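
Had I gone down that road, I would at least have taken a raw copy of the start of the disk first, so the header area could be examined later if anything went wrong. A minimal sketch, where the output file is illustrative and 1 MB (2048 x 512-byte blocks) is a conservative guess at how much of the header area to keep:

# dd if=/dev/rhdisk41 of=/tmp/hdisk41.hdr bs=512 count=2048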

Try Redefining the VG

I couldn't import the VG or change the PVIDs of the disks. The only real option I had left, with one eye on not losing the data, was redefinevg. This command is actually called as part of importvg, so it seemed my best bet: not the only option available to me, but certainly the safest. It redefines the set of disks belonging to the volume group in the ODM. With redefinevg you only need to specify one member disk of the VG, and if all goes well it imports the VG as well. So I tried that:

# redefinevg -d hdisk41 data_vg

No output was returned, which is good. So let's see if the VG is present:

# lsvg
rootvg
data_vg

So far so good. Now I tried to vary on the VG, to bring it online:

# varyonvg data_vg
0516-1775 varyonvg: Physical volumes hdisk41 and hdisk44 have identical PVIDs (00cd94b6d2bc9362)

Same issue: duplicate PVIDs! At least I was back to where I started. At this point I decided to let AIX re-generate the PVIDs with the recreatevg command.

First I exported the VG, then removed the disks associated with it (data_vg):

# exportvg data_vg
# rmdev -dl hdisk41; rmdev -dl hdisk44; rmdev -dl hdisk40

Next I ran the cfgmgr command to rediscover the devices:

# cfgmgr
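
Before recreating anything, it's worth confirming the disks have been rediscovered. Since PVIDs live on the disks themselves, lspv should show all three disks back again, still carrying the duplicate PVIDs but with no VG assignment, for example:

# lspv | egrep "hdisk40|hdisk41|hdisk44"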

Recreate the VG with Confidence

Now I was in a strong position to bring the data back in and online, using recreatevg. This command literally recreates the VG in question: it takes the information from the ODM and assigns new PVIDs to the disks included in the recreation. Typically this command is used when you want to clone a VG from a SAN disk and import the copy back onto the same box.
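
For context, in that cloning scenario you would normally give the copy a new VG name as well, along the lines of the sketch below, where the VG name, prefixes, and disk are purely illustrative:

# recreatevg -y clone_vg -Y cl -L /clone hdisk50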

The format of the command I was going to use was:

recreatevg -Y NA <vg_name> <hdisk1> <hdisk2> ...

With recreatevg, unlike importvg, you need to specify all the disks associated with the VG; it doesn't matter what order the hdisks are in. With -Y you can specify a prefix for the LV names, and with -L a prefix for the labels, that is, the mount points written to the /etc/filesystems file. As I wasn't cloning a set of file-systems and just wanted the VG and file-systems back as they were, I specified NA, which means do not prefix:

# recreatevg -YNA data_vg hdisk40 hdisk41 hdisk44
# echo $?
0

There was no output from the above command; I had expected it to echo the LVs or file-systems it had imported. So I checked the exit status from the command line: a zero (0) means it completed with no errors. This filled me with confidence; it was all starting to look good.

# lsvg -o
rootvg
data_vg

The VG data_vg had been successfully imported and varied on. I then checked that I had no duplicate PVIDs; looking at the output below, none were found:

# lspv
hdisk0          00cd94b6d734dfa2            rootvg
hdisk40         00cd94b6d3152d26            data_vg
hdisk41         00cd94b6d3466c48            data_vg
hdisk44         00cd94b6d2bc8373            data_vg

Then I checked the LV listing contained in the VG; all looked correct. Next I checked the file-system by running fsck on it:

# fsck -y /dev/fslv06

This returned OK. Next task: mount the file-system. But first I double-checked the /etc/filesystems file to make sure AIX hadn't prefixed the mount point. It hadn't, which is what I expected:

# grep "maps_uk" /etc/filesystems
/maps_uk:

No prefixes to change, so it was OK to mount:

# mount -a
# df -g |grep maps
/dev/fslv06       40.00     32.45   19%     1880     1% /maps_uk

Lastly, I synced the ODM with any changes to the VG data_vg that might have occurred as a result of the recreatevg:

# synclvodm data_vg
synclvodm: Physical volume data updated.
synclvodm: Logical volume fslv06 updated.
synclvodm: Logical volume fslv03 updated.

Now the system was usable to the business, and I could hand it back to the application team.

Conclusion

Using recreatevg, the duplicate PVIDs disappeared and new PVIDs were generated; the VG was recreated successfully. More importantly, I didn't have to restore any data, as I had managed to recreate the VG with no data loss.

The next task was to try to find out why I had duplicate PVIDs in the first place. The file /tmp/lvmt.log was my friend here, and it pointed me in the right direction. The lvmt.log file holds the LVM commands and processes that affect an LV; think of it as a history file of everything that has been executed against an LV or VG. It looks like at some point someone had taken a SAN snapshot of one of the disks but hadn't removed it, and at some point it got pulled into the VG.
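
If you ever need to do the same detective work, grepping that log for the affected disks or the duplicated PVID is a quick way in:

# grep -i hdisk44 /tmp/lvmt.log
# grep 00cd94b6d2bc9362 /tmp/lvmt.log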


