Tips I Picked up at the Power Systems Technical University


In October, I attended and spoke at the Power Systems Technical University in New Orleans. While I was there I was able to attend some very valuable sessions and picked up some great tips. I thought this might be an opportunity to share some of those with you. Even though I have been working in the area for a very long time, I find these conferences extremely useful—both for networking and the technical content. At this conference I was focusing on several areas that I wanted to increase my knowledge on: enhanced mode HMC, networking and Spectrum Scale (formerly GPFS). Below are a series of tidbits I picked up. Hopefully you’ll find them useful as well.

Networks

One of the more interesting things I learned (from Alexander Paul) was very basic: how to quickly transfer an IP address from one interface to another without causing an outage. I’ve been moving my NIM server from fully virtualized to standalone and found this to be very useful.

In my case I was moving from en3 to en1:

ifconfig en3 192.168.65.150  transfer  en1
I then test with ping, etc., and if it is working I make it permanent as follows:
chdev -l en1 -a netaddr=192.168.65.150 -a netmask=255.255.255.128 -a state=up
Be careful to disable en3 before the next reboot, or it will come back up with the same IP address and you will get a duplicate IP address error. You can use chdev to set en3 to state=down, or (my personal favorite) rmdev the adapter and run cfgmgr so it comes back clean:
rmdev -dl et3
rmdev -dl en3
rmdev -dl ent3
cfgmgr
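
The sequence above can be collected into a small dry-run helper. This is only a sketch: the function name plan_ip_move is made up for illustration, and it assumes the et/en/ent devices share the same number as the interface being retired. It prints the commands rather than running them, so you can review them first.

```shell
#!/bin/sh
# Print (do not run) the commands for moving an IP address between interfaces.
# Usage: plan_ip_move OLD_IF NEW_IF IP NETMASK
# plan_ip_move is a hypothetical helper name, not an AIX command.
plan_ip_move() {
    old_if=$1 new_if=$2 ip=$3 mask=$4
    n=${old_if#en}    # adapter number, e.g. "3" from "en3"
    echo "ifconfig $old_if $ip transfer $new_if"
    echo "# test with ping etc. before continuing"
    echo "chdev -l $new_if -a netaddr=$ip -a netmask=$mask -a state=up"
    echo "rmdev -dl et$n"
    echo "rmdev -dl en$n"
    echo "rmdev -dl ent$n"
    echo "cfgmgr"
}

plan_ip_move en3 en1 192.168.65.150 255.255.255.128
```

Printing the plan instead of executing it is deliberate: the ping test in the middle is a manual decision point.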

I also went to a great session on Wireshark by Roy Spencer. I’ve been looking at a lot of trace output for VIO servers, and traces taken on the SEA contain duplicate packets because the SEA sees each packet come in from the LPAR and then sees it again on the transmit out. It turns out that Wireshark includes a utility to remove these duplicate packets:

editcap -F nettl -d source dest

This will help me a lot in my analysis especially when trying to produce packet counts.

Power Masters

The Tricks of the Power Masters session (Gareth Coates) is always a great place to find nuggets. In particular, he provided details on how to turn on remote execution, remote operation and remote virtual terminals for the HMC in Enhanced mode. This is not under HMC Management. Instead you’ll find it under Users and Security, then Systems and Console Security. There’s a version of this session available at the Power Virtual User Group and a demo of the Enhanced HMC at the UK PowerVM Virtual User Group. I picked up a lot of tips on the HMC in this session, including tips around RMC.

VIOS and Other Tips I Picked Up

As padmin, run export CLI_DEBUG=33 and then run a VIO command; it will show you the actual AIX commands that are executed behind the scenes.

I decided to test this on my system and here is what I saw:
$ export CLI_DEBUG=33
$ lsnports
AIX: "/usr/lib/methods/viosmgr -t npiv -f query_fc_ports >/dev/null"
name             physloc                        fabric tports aports swwpns  awwpns
fcs4             U78C9.001.WZS0234-P1-C6-T1          1     64     57   3088    3065
fcs5             U78C9.001.WZS0234-P1-C6-T2          1     64     57   3088    3065

lsnports is a useful way to check whether your fibre ports are connected to an NPIV-capable switch. If they are not, you will see the following message:
“System doesn't have NPIV capable ports or no NPIV ports with the link up.”

$ lsmap -all -vnic
AIX: "lsdev -c adapter -t IBM,vnic-server -s vdevice -F "name" | wc -l -c"
AIX: "lsdev -c adapter -t IBM,vnic-server -s vdevice -F "name""

Control channel

Another useful tip is how to check for the control channel (the new default is 4095 as of VIOS 2.2.3):

# lsdev -C | grep ent | grep Shared
ent7        Available             Shared Ethernet Adapter
# entstat -d ent7 | grep "Control Channel"

Control Channel PVID: 4095 Control Channel Adapter: ent4

As you can see, the SEA is ent7 and the virtual Ethernet adapter carrying the control channel traffic is ent4.
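
If you want to pull those two values out in a script, something like the following should work. It is a minimal sketch that assumes the field labels look exactly like the output shown above; the function name control_channel is made up for illustration.

```shell
#!/bin/sh
# Read "entstat -d <SEA>" output on stdin and print the control channel
# PVID and adapter. Copes with the two fields being on one line or on
# separate lines. control_channel is a hypothetical helper name.
control_channel() {
    awk '
        match($0, /Control Channel PVID: *[0-9]+/) {
            s = substr($0, RSTART, RLENGTH); sub(/.*: */, "", s); pvid = s
        }
        match($0, /Control Channel Adapter: *[A-Za-z0-9]+/) {
            s = substr($0, RSTART, RLENGTH); sub(/.*: */, "", s); adapter = s
        }
        END { print "pvid=" pvid, "adapter=" adapter }
    '
}

# On a VIO server (as root):
#   entstat -d ent7 | control_channel
```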

Path Issues

Recently I have worked on multiple systems where the paths were messed up for various reasons. It became necessary to remove the missing or failed paths without destroying the enabled paths. The way to do this is to use the lspath command to list all of the paths for the disk and filter on the status:

lspath -l hdisk15 -H -F "name:parent:connection:status" | grep Missing   (or grep Failed)

You can put this into a loop where it can produce the correct output for you.

As an example:

lspath -l hdisk15 -H -F "name:parent:connection:status"
name:parent:connection:status
hdisk15:fscsi4:500507680d088ef6,10000000000000:Enabled
hdisk15:fscsi4:500507680d088ef7,10000000000000:Enabled
hdisk15:fscsi5:500507680d048ef6,10000000000000:Enabled
hdisk15:fscsi5:500507680d048ef7,10000000000000:Enabled

If the first path on fscsi4 was missing, you could then remove it using:

rmpath -dl hdisk15 -p fscsi4 -w 500507680d088ef6,10000000000000

There are several examples of how to script production of the output in an article by David Tansley.
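
As a rough illustration of the loop idea, here is a minimal sketch (not taken from that article) that turns the lspath output format used above into rmpath commands. The function name gen_rmpath_cmds is made up for illustration, and it only prints the commands so you can check them before running anything.

```shell
#!/bin/sh
# Read lspath output in "name:parent:connection:status" form on stdin and
# print (not run) an rmpath command for each Missing or Failed path.
# Enabled paths and the -H header line are ignored.
# gen_rmpath_cmds is a hypothetical helper name.
gen_rmpath_cmds() {
    awk -F: '$4 == "Missing" || $4 == "Failed" {
        printf "rmpath -dl %s -p %s -w %s\n", $1, $2, $3
    }'
}

# On an AIX host (review the output before running any of it):
#   lspath -l hdisk15 -F "name:parent:connection:status" | gen_rmpath_cmds
```

The connection field contains a comma but no colon, which is why splitting on the colon is safe here.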

WWPNs

Gareth also included a script to list the WWPNs for the LPAR. I modified it slightly to add the hostname and adapter slot information:

# cat seewwpns.sh
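#!/bin/ksh
# Print hostname, adapter, slot, physical location and WWPN for each FC adapter
hostj=$(hostname)
lsdev -Ccadapter | grep fcs | awk '{print $1,$3}' | while read fcs slot
do
    # The "Network Address" line holds the WWPN
    wwpn=$(lscfg -vl $fcs | grep -i network | cut -c 37-60)
    # The "Hardware Location Code" line holds the physical location
    wwpn1=$(lscfg -vl $fcs | grep -i Hardware | cut -c 47-64)
    echo $hostj $fcs $slot $wwpn1 $wwpn
done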
# ./seewwpns.sh
vio2 fcs4 05-00 WZS0234-P1-C6-T1 10000090FA530975
vio2 fcs5 05-01 WZS0234-P1-C6-T2 10000090FA530976
vio2 fcs6 06-04 WZS0234-P1-C8-T1 10000090FA740155
vio2 fcs7 06-05 WZS0234-P1-C8-T2 10000090FA740156

This is a great way to get the WWPNs in order to give them to the storage admins for zoning.

There’s also a -client flag on fcstat (as of VIO 2.2.2.2) that reports statistics for all the WWNs and WWPNs that the VIO sees. You can run just fcstat -client to get full statistics, or you can trim the output to get only the WWPNs and WWNs. You can also use the viostat -adapter command to get statistics. To get the WWNs and WWPNs, as padmin on vio2 I ran:

fcstat -client | cut -c 1-50

Results were:

hostname   dev   wwpn

vio2       fcs4  0x10000090FA530975
aix1nim    fcs2  0xC0507607DBD8002C
gpfs1      fcs2  0xC0507607DBD80034

vio2       fcs5  0x10000090FA530976
aix1nim    fcs3  0xC0507607DBD8002E
gpfs1      fcs3  0xC0507607DBD80036

From the HMC you can also try:

lshwres -r io --rsubtype slotchildren -m Server-8286-41A-SN215D3AV -F phys_loc,description,mac_address,wwpn   (you can add a | grep Fibre)

I don’t use the grep on Fibre at the end, as I also want to see what FCoE adapters I have.

hscroot@HMC:~> lshwres -r io --rsubtype slotchildren -m Server-8286-41A-SN215D3AV -F phys_loc,description,mac_address,wwpn | grep PCI

U78C9.001.WZS0234-P1-C6-T1,PCIe2 16Gb 2-Port Fibre Channel Adapter,null,10000090fa530975
U78C9.001.WZS0234-P1-C6-T2,PCIe2 16Gb 2-Port Fibre Channel Adapter,null,10000090fa530976
U78C9.001.WZS0234-P1-C8-T2,"PCIe2 4-port (2 10GbE+2 1GbE) FCoE+NIC SRIOV Adapter, FCoE PF",null,10000090fa740156
U78C9.001.WZS0234-P1-C8-T1,"PCIe2 4-port (2 10GbE+2 1GbE) FCoE+NIC SRIOV Adapter, NIC PF",0090fa740151,null
U78C9.001.WZS0234-P1-C8-T1,"PCIe2 4-port (2 10GbE+2 1GbE) FCoE+NIC SRIOV Adapter, FCoE PF",null,10000090fa740155
U78C9.001.WZS0234-P1-C8-T4,"PCIe2 4-port (2 10GbE+2 1GbE) FCoE+NIC SRIOV Adapter, NIC PF",0090fa740154,null

As you can see there are two excellent ways to get the same information, depending on where you want to request it from and what format you need it to be in.
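
If all you need from the HMC output is a zoning list, a minimal sketch like the one below can reduce it to location and WWPN. It assumes the four -F fields are requested in the order shown above, so that wwpn is always the last comma-separated field; the function name hmc_wwpns is made up for illustration.

```shell
#!/bin/sh
# Read lshwres output requested with -F phys_loc,description,mac_address,wwpn
# on stdin and print "phys_loc wwpn" for every port that has a WWPN.
# The description may contain quoted commas, so only the first and last
# fields are used. hmc_wwpns is a hypothetical helper name.
hmc_wwpns() {
    awk -F, 'NF >= 4 && $NF != "null" { print $1, $NF }'
}

# Typical use: save the lshwres output to a file on your workstation, then
#   hmc_wwpns < lshwres.out
```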

Useful IBM Tools

I have mentioned these before, but there is a group of tools that I use all the time. They consist of nmon, nmon analyser, HMC Scanner, FLRT (Fix Level Recommendation Tool), FLRTVC (FLRT Vulnerability Checker) and fcstat -D.

nmon

nmon has been part of the operating system since AIX 6.1 and is used by admins all over the world to gather performance data. I like to run it using the following flags so that I don’t miss anything:

nmon -ft -AOPV^dMLW -s 15 -c 120

The above takes a 30-minute snapshot (120 x 15-second snaps) and includes asynchronous I/O (A), the SEA (O), paging (P), volume groups (V), fibre adapter statistics (^), disk service times (d), memory pages (M), large pages (L) and workload manager statistics (W) if you are running workload manager. I run nmon all the time, so my normal cron job runs for 24 hours using “-s 150 -c 576”. Depending on how granular you need to get, you can tweak these values.
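
The -c value is just the total run time divided by the -s interval. A small sketch of that arithmetic, with a made-up function name (nmon_count):

```shell
#!/bin/sh
# Work out the nmon -c (snapshot count) value from a total duration and
# the -s interval, both in seconds. nmon_count is a hypothetical helper name.
nmon_count() {
    echo $(( $1 / $2 ))
}

nmon_count 1800 15      # 30 minutes at 15-second intervals: prints 120
nmon_count 86400 150    # 24 hours at 150-second intervals: prints 576
```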

nmon analyser

nmon analyser goes hand in hand with nmon. It’s an Excel spreadsheet that processes nmon files and produces graphs of what is happening. It’s not a total performance monitoring solution, but it provides some valuable information for day to day performance work. I supplement this with my own data gathering scripts that gather more in depth data.

HMC Scanner

The HMC Scanner is a great tool for documenting your system in one place. The current version is 0.11.35—you actually download 0.11.24 and then replace the .jar file to get to 0.11.35. You can install it on AIX and run it from there against the HMC or you can run it from your desktop, assuming you have the correct Java installed. I normally run it from AIX using:

./hmcScanner.ksh  servername  hscroot -p password -stats

This produces a bunch of files, but the one you want is the one that ends in .xls. This is the Excel spreadsheet with all the systems connected to the HMC documented.

FLRT and FLRTVC

FLRT (Fix Level Recommendation Tool) has been around for several years now and it’s something I use whenever I am planning an upgrade or trying to determine how long I can stave off an upgrade. The most recent version provides the release date for the versions you are running and their end of service date (if announced). It also provides those dates for the recommended upgrade levels, along with links to the readme files for the recommended updates.

FLRTVC is a vulnerability checker. You download the script and run it on the system to be reviewed. It uses wget or curl to try to download a file called apar.csv from IBM and it then checks known issues against your software levels. The most common things it finds are back levels of SSH, SSL and Java. By default these are at an insecure level and new levels need to be installed. At a minimum, OpenSSH should be at 7.1.102.1100 and OpenSSL should be at 1.0.2.1100 to avoid security problems. Java should be at the August 2017 level, which for Java 7 is 7.0.0.610. I replace these on all my systems, including my VIO servers. FLRTVC identifies the efixes and ifixes that need to go on, and provides links to the readmes and to the actual downloads. You can send the output to a .txt file, download it and open it in Excel (it’s in csv format) to make it easier to view. I typically run FLRTVC on my critical systems once a month.
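
For a quick spot check between FLRTVC runs, you can compare an installed fileset level against the minimums mentioned above. The following is a minimal sketch, not part of FLRTVC: version_ge is a made-up helper, and the lslpp field position in the comment is an assumption you should confirm on your own system.

```shell
#!/bin/sh
# Return success (0) if dotted version $1 is at or above dotted version $2.
# Each dot-separated part is compared numerically, left to right.
# version_ge is a hypothetical helper name.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Example use on AIX (assumes the level is the third colon-separated field
# of lslpp -Lqc output):
#   level=$(lslpp -Lqc openssh.base.server | cut -d: -f3)
#   version_ge "$level" 7.1.102.1100 || echo "OpenSSH is back level: $level"
```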

fcstat

If you are looking at fibre adapter statistics then you will want to be familiar with the fcstat command, especially fcstat -D. If you run fcstat -D against a fibre port it will provide the WWN/WWPN as well as the firmware version and the speed at which the adapter is running:

FIBRE CHANNEL STATISTICS REPORT: fcs4
Device Type: FC Adapter (adapter/pciex/df1000e21410f10)
Serial Number: 1A34200935

ZA: 10.2.252.1919                           This is the microcode; lsmcode -A shows it as: fcs4!df1000e21410f103.00010000020025201919
World Wide Node Name: 0x20000090FA530975
World Wide Port Name: 0x10000090FA530975    WWPN for zoning
……..
Port Speed (supported): 16 GBIT             Can run up to 16Gbit
Port Speed (running):   8 GBIT              But is connected at 8Gbit
Port FC ID: 0x010300                        Adapter FCID, useful to figure out where you are connected
Port Type: Fabric
Attention Type:   Link Up
Topology:  Point to Point or Fabric

Further down you can review statistics that will provide indications of whether there is queuing at the adapter or protocol drivers.
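
A port that has negotiated a lower speed than it supports is easy to miss when you have many adapters. Here is a minimal sketch that flags it, assuming the two "Port Speed" lines appear as in the raw fcstat output (without the annotations I added above). The function name check_fc_speed is made up for illustration.

```shell
#!/bin/sh
# Read raw fcstat output on stdin and report whether the port is running
# below its supported speed. check_fc_speed is a hypothetical helper name.
check_fc_speed() {
    awk '
        /Port Speed \(supported\):/ { sup = $(NF-1) }
        /Port Speed \(running\):/   { run = $(NF-1) }
        END {
            if (sup == "" || run == "") { print "speed lines not found"; exit 2 }
            if (run + 0 < sup + 0)
                print "running " run " GBIT, below supported " sup " GBIT"
            else
                print "running at supported speed: " run " GBIT"
        }
    '
}

# On AIX or a VIO server (as root):
#   fcstat -D fcs4 | check_fc_speed
```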

Create a clone of a boot disk

alt_disk_copy -d hdisk2
bosboot -a -d hdisk2
bootlist -m normal hdisk2

The above copies the current rootvg disk to hdisk2, which will become the altinst_rootvg volume group. You can use this as a backup before making changes to the current system (a simple reboot from the new volume group is all it takes to recover), or you can then work on the new volume group using the alt_disk_install and alt_rootvg_op commands. You can also make minor changes (such as adding a host to /etc/hosts on the alternate disk) by waking up the volume group:

alt_disk_install -W hdisk2
You can now go and vi /alt_inst/etc/hosts

When you are done you put the volume group back to sleep:

alt_disk_install -S hdisk2

Whenever you use alternate disk install or any of its derivatives be sure to point the bootlist to whichever disk you want to boot from next. By default, many of the alternate disk install commands change the bootlist to point to the new volume group and this may not be what you want to do.

Summary

I found the Power Systems Technical University conference to be very valuable in terms of picking up tips to make my life easier as a sysadmin. Hopefully you will find these tips helpful as well. I highly recommend attending these conferences. There are some great sessions, but many tips get picked up during the networking discussions where you can compare notes with others. Many thanks to all those who presented and shared their ideas with me and others.
