Managing System Dump Devices sysdumpdev

 

Technote (FAQ)


Question

Managing System Dump Devices

Answer

 

This document discusses how to manage storage devices used by AIX to store a system dump in the event of a catastrophic operating system software failure.

Its intent is to help the system administrator ensure that a system dump will be complete and usable for troubleshooting purposes.

This document applies to AIX versions 5, 6, and 7.

Managing system dump devices
Determining proper size for dump device
Setting a tape drive as a dump device
Do not dump to a mirrored logical volume
Dumping outside the rootvg
Remote dumps over a network
How to create a dedicated dump device
Related documentation

Managing system dump devices

When an unexpected system halt occurs, the system dump facility automatically copies selected areas of kernel data to the primary dump device. These areas include kernel segment 0 as well as other areas registered in the Master Dump Table by kernel modules or kernel extensions.

There are two dump devices, a primary and a secondary. To view information about the current dump devices, enter:

sysdumpdev -l 

Example:

# sysdumpdev -l
primary              /dev/lg_dumplv
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     FALSE
always allow dump    TRUE
dump compression     ON
type of dump         traditional

In this example, the primary dump device is the logical volume lg_dumplv. Current versions of AIX ship with dump compression enabled. The "type of dump" field is new as of AIX 6.1; it is explained in the man page for sysdumpdev.

When the operating system is installed, the primary dump device is automatically configured.

In AIX 5L, /dev/hd6 is the default dump device unless the server has 4 GB or more of physical memory at the time of installation, in which case /dev/lg_dumplv is configured as the dump device.

The default secondary dump device is /dev/sysdumpnull. This is a null device and any dump written to this device is lost.

Note that the flags forced copy flag, always allow dump, and dump compression are set to TRUE, FALSE, and ON, respectively, by default. On a server not managed by an HMC (Hardware Management Console), the most important of these is always allow dump. When it is set to FALSE, a dump is not captured; the same is true if no dump device is set. This flag must be set to TRUE in order for AIX to successfully capture a dump. To set it to TRUE, enter:

  sysdumpdev -K

On a server managed by an HMC, the hypervisor tends to ignore this setting and will write out a dump to the dump device.
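
The flag check can be scripted. The following is a minimal sketch, not an IBM-supplied tool: the function names always_allow_dump and check_always_allow_dump are hypothetical, and the sketch assumes that the output of sysdumpdev -l keeps the layout shown in the example above.

```shell
# Hypothetical helper: read `sysdumpdev -l` output on standard input and
# print the value of the "always allow dump" line (TRUE or FALSE).
# Assumes the field layout shown in the example output in this document.
always_allow_dump() {
    awk '/^[ \t]*always allow dump/ { print $NF; exit }'
}

# Hypothetical wrapper: report the flag and fail when it is not TRUE.
check_always_allow_dump() {
    if [ "$(always_allow_dump)" = "TRUE" ]; then
        echo "always allow dump is TRUE"
    else
        echo "always allow dump is not TRUE; run: sysdumpdev -K"
        return 1
    fi
}

# Intended use on an AIX system (not run here):
#   sysdumpdev -l | check_always_allow_dump
```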

If the primary dump device is the primary paging device, the dump can be copied to the filesystem save area only if that filesystem has enough free space. Use the df command to determine the free space in the filesystem. If the free space is not at least as large as the space required for the dump (sysdumpdev -e), then either increase the size of the filesystem, remove files from it until enough free space is available, or move the save area to another filesystem with the required space. The save area can be changed with the sysdumpdev command (the -d or -D flag). The filesystem must be in the rootvg volume group.

The copy directory entry specifies a filesystem in the rootvg volume group where the dump will be copied upon reboot after a system dump. This only applies if the primary dump device is the primary paging space (hd6).

The forced copy flag entry specifies whether the system will prompt you to copy the dump to external media if there is not enough space in the specified filesystem. If this flag is set to FALSE and the system cannot copy the dump to the filesystem, then the contents of the dump are discarded.
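
The free-space comparison for the copy directory can be sketched as follows. This is an illustration under stated assumptions, not a supported utility: the function names fits_in_free_space and free_kb are hypothetical, and the sketch assumes that df accepts the POSIX -P and -k flags, in which case the fourth field of the second output line is the available space in 1024-byte blocks.

```shell
# Hypothetical helper: succeed when FREE_KB kilobytes of free space are at
# least as large as a dump of NEED_BYTES bytes. awk does the arithmetic so
# that large byte counts do not overflow a 32-bit shell integer.
fits_in_free_space() {
    awk -v free_kb="$1" -v need="$2" 'BEGIN { exit !(free_kb * 1024 >= need) }'
}

# Hypothetical wrapper: free space, in KB, of the filesystem holding DIR.
# Assumes the POSIX single-line output format selected by `df -Pk`.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Intended use (the byte count would come from `sysdumpdev -e`):
#   fits_in_free_space "$(free_kb /var/adm/ras)" 4526080 \
#       && echo "copy directory has room for the dump"
```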


Determining proper size for dump device

The default dump device created for system use may NOT be large enough for a complete dump. To determine how large the dump device is, first identify the primary dump device with sysdumpdev -l, as described in the previous section. If the dump device is not currently set to a tape drive, then this device should be a logical volume. To retrieve information about this logical volume, enter:

lslv <LOGICAL VOLUME NAME> 

Example:

lslv hd7 

This command will return a screen of information. Obtain the values for LPs and PP SIZE. Multiply these two values to get the size of the dump device in megabytes.

Next, determine how large the dump device for your machine should be.

To view an estimate of how large the dump device should be, enter:

sysdumpdev -e 

Example:

# sysdumpdev -e 
Estimated dump size in bytes: 4526080 

NOTE: This value is what the machine would require in its current state. It can change based on the activity of the machine, so it is best to run this command when the machine is under its heaviest workload.

The command returns a value in bytes. The primary dump device should be larger than the value returned.
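
Because lslv reports the device size as LPs and PP SIZE in megabytes while sysdumpdev -e reports bytes, the comparison needs a unit conversion. The sketch below shows one way to do it. The function names dump_estimate_bytes and device_holds_dump are hypothetical, and the LPs and PP SIZE values are assumed to be copied by hand from the lslv output.

```shell
# Hypothetical helper: read `sysdumpdev -e` output on standard input and
# print only the byte count (the last field of the "Estimated dump size"
# line, with or without a leading message number such as 0453-041).
dump_estimate_bytes() {
    awk '/Estimated dump size in bytes:/ { print $NF; exit }'
}

# Hypothetical helper: succeed when a device of LPS logical partitions,
# each PP_MB megabytes in size, is larger than NEED_BYTES bytes.
device_holds_dump() {
    awk -v lps="$1" -v pp_mb="$2" -v need="$3" \
        'BEGIN { exit !(lps * pp_mb * 1048576 > need) }'
}

# Intended use on an AIX system, with LPs=2 and PP SIZE=4 as sample values:
#   device_holds_dump 2 4 "$(sysdumpdev -e | dump_estimate_bytes)"
```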

If the dump device is a standard dump logical volume, such as lg_dumplv, then use the command extendlv to increase its size. If it is the primary paging space hd6, use the command chps.
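
Both extendlv and chps -s take the number of logical partitions to add, so it helps to work out that number first. The following is a sketch only: the function name extra_lps_needed is hypothetical, and the current LP count and PP size are assumed to be taken from lslv.

```shell
# Hypothetical helper: print how many logical partitions must be added to
# a device that currently has CUR_LPS partitions of PP_MB megabytes each
# so that it becomes larger than NEED_BYTES bytes. Prints 0 if the device
# is already large enough.
extra_lps_needed() {
    awk -v cur="$1" -v pp_mb="$2" -v need="$3" 'BEGIN {
        pp_bytes = pp_mb * 1048576
        total = int(need / pp_bytes) + 1   # smallest count strictly larger than need
        extra = total - cur
        if (extra < 0) extra = 0
        print extra
    }'
}

# Intended use on an AIX system (sample values, not run here):
#   n=$(extra_lps_needed 1 4 4526080)
#   [ "$n" -gt 0 ] && extendlv lg_dumplv "$n"   # dedicated dump logical volume
#   [ "$n" -gt 0 ] && chps -s "$n" hd6          # primary paging space
```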


Setting a tape drive as a dump device

If you do not have sufficient space on the system to store a dump, use a tape drive as the dump device. To accomplish this, put a blank tape in the desired tape drive and enter:

sysdumpdev -Pp /dev/rmt# 

In this case, rmt# refers to the specific tape drive you want to use (for example, rmt0, rmt1, or rmt2).

Be aware that the tape drive will not be usable by any other application until you re-assign the dump device to another location.


Do not dump to a mirrored logical volume

Mirroring a standalone dump logical volume is not recommended. It is much better practice to have a primary and a secondary dump device, each wholly contained on a separate hdisk, rather than mirroring these devices. If for some reason the primary dump device is inaccessible, the dump program will then attempt to dump to the secondary device.


Dumping outside the rootvg

It is possible to write a dump to a dump device outside of the rootvg. This can be done by temporarily setting the primary dump device to a logical volume not in the rootvg or by setting the secondary device to a logical volume not in the rootvg. For more information please consult the man page for sysdumpdev:

  man sysdumpdev 

Remote dumps over a network

Note: This only applies to uniprocessor systems.

Currently, the system dump does not handle ARP requests received from the server, or the gateway used, during the dump. If an ARP request is received while taking a dump, this causes the dump to hang. If your system takes a system dump and hangs on 0c7, this is likely the problem. At this point, power the system off and reboot.

To avoid this problem, create a permanent ARP entry for the client (the dumping machine) on the server or gateway. The machine that needs the permanent ARP entry is the machine on the same local network or ring as the client. This can be thought of as the logical server, since, if it is not the real server, the dump data must pass through it to get to the real server.

NOTE: "Real server" refers to the machine designated in the remote dump specification on the client.

Run the following steps on the real server to establish a permanent ARP entry on the server or gateway machine.

  1. Ensure an ARP entry exists by pinging the client. Example:
        ping myclient.xyz.com 
    
  2. Use arp -a to see the ARP table. Example:
     # arp -a 
    

     myclient.xyz.com (128.3.56.9) at 10:0:5a:9:e:7d [token ring]
     myserver.xyz.com (128.3.56.20) at 10:0:5a:8f:12:bf [token ring]
    
  3. Now use the arp command to make the dumping client's entry permanent. Example:
         # arp -s 802.5 myclient.xyz.com 10:0:5a:9:e:7d 
    

The 802.5 argument refers to a token-ring network. Valid network types are listed in the documentation for the arp command, and are currently ether (802.3), fddi, and 802.5.
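
Steps 2 and 3 above can be combined so that the hardware address does not have to be retyped by hand. The sketch below only prints the arp -s command for review; it does not run it. The function name permanent_arp_cmd is hypothetical, and the sketch assumes that each arp -a entry appears on a single line with the hardware address following the word "at", as in the example output.

```shell
# Hypothetical helper: read `arp -a` output on standard input and print
# (not run) the command that would make HOST's current entry permanent.
# NETTYPE is the network type, for example 802.5 for token ring.
permanent_arp_cmd() {
    awk -v host="$1" -v nettype="$2" '
        {
            sub(/^[ \t]+/, "")                 # ignore leading indentation
            if (index($0, host) != 1) next     # entry must start with HOST
            for (i = 1; i < NF; i++) {
                if ($i == "at") {              # address follows the word "at"
                    printf "arp -s %s %s %s\n", nettype, host, $(i + 1)
                    exit
                }
            }
        }'
}

# Intended use on the server or gateway (not run here):
#   arp -a | permanent_arp_cmd myclient.xyz.com 802.5
```

Review the printed command before running it, since the match is a simple prefix comparison on the host name.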

NOTE: If the dump hangs and the client must be rebooted, the partial dump on the server may still be useful.


How to create a dedicated dump device

  1. To view an estimate of the dump size, enter:
    sysdumpdev -e
    

    You should see information similar to the following:

    0453-041 Estimated dump size in bytes: 25103360
    
  2. To view the PP size, enter:
    lsvg rootvg
    

    You should see information similar to the following:

    VOLUME GROUP:   rootvg          VG IDENTIFIER:  0000003173650c77
    VG STATE:       active          PP SIZE:        4 megabyte(s)
    VG PERMISSION:  read/write      TOTAL PPs:      479 (1916 megabytes)
    MAX LVs:        256             FREE PPs:       258 (1032 megabytes)
    LVs:            11              USED PPs:       221 (884 megabytes)
    OPEN LVs:       10              QUORUM:         2
    TOTAL PVs:      1               VG DESCRIPTORS: 2
    STALE PVs:      0               STALE PPs:      0
    ACTIVE PVs:     1               AUTO ON:        yes
    
  3. Determine the necessary number of PPs (physical partitions). Divide the estimated size in bytes (from sysdumpdev -e) by the PP size in bytes, and round up, to estimate the number of PPs that the dump logical volume should have.

     

  4. To determine where you have free PPs, enter:
    lsvg -p rootvg
    

    You should see information similar to the following:

    rootvg:
    PV_NAME           PV STATE    TOTAL PPs   FREE PPs    FREE DISTRIBUTION
    hdisk1             active       479         258       78..02..00..82..96
    hdisk2             active       159          0        00..00..00..00..00
    hdisk3             active        75          8        00..00..00..00..08
    

    NOTE: You should use the hdisk with the highest number of free PPs (in this example hdisk1).

     

  5. To create the logical volume, enter:
    mklv -y dumplv -t sysdump rootvg 7 hdisk1 
    
  6. To set the logical volume as the primary dump device, enter:
    sysdumpdev -Pp /dev/dumplv
    

    You should see information similar to the following:

    primary              /dev/dumplv
    secondary            /dev/sysdumpnull
    copy directory       /var/adm/ras
    forced copy flag     TRUE
    always allow dump    FALSE
    
  7. To change always allow dump to TRUE, enter:
    sysdumpdev -K
    
  8. To verify that the flag has been changed, enter:
    sysdumpdev -l
    

    You should see information similar to the following:

    primary              /dev/dumplv
    secondary            /dev/sysdumpnull
    copy directory       /var/adm/ras
    forced copy flag     TRUE
    always allow dump    TRUE
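
The arithmetic in step 3 and the disk choice in step 4 can be sketched in shell. This is an illustration under stated assumptions, not an IBM-supplied procedure: the function names pps_needed and disk_with_most_free_pps are hypothetical, and the sketch assumes that lsvg -p output keeps the column layout shown in step 4.

```shell
# Hypothetical helper: number of physical partitions of PP_MB megabytes
# needed to hold NEED_BYTES bytes, rounded up. awk does the arithmetic so
# that large byte counts do not overflow a 32-bit shell integer.
pps_needed() {
    awk -v need="$1" -v pp_mb="$2" 'BEGIN {
        pp_bytes = pp_mb * 1048576
        n = int(need / pp_bytes)
        if (n * pp_bytes < need) n++
        print n
    }'
}

# Hypothetical helper: read `lsvg -p` output on standard input and print
# the active physical volume with the most free PPs (the fourth field).
# Prints nothing if no active disk has free PPs.
disk_with_most_free_pps() {
    awk '$2 == "active" && $4 + 0 > max { max = $4 + 0; name = $1 }
         END { if (name != "") print name }'
}

# Intended use on an AIX system (not run here):
#   lsvg -p rootvg | disk_with_most_free_pps
#   pps_needed 25103360 4
```

With the example values from steps 1 and 2, pps_needed 25103360 4 prints 6. The mklv command in step 5 allocates 7, one partition more than that minimum.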
    

Related documentation

For more in-depth coverage of this subject, the following IBM documents are recommended:

  • Common Diagnostics and Service Guide (SA23-2687)
  • Diagnostic Information for Multiple Bus Systems (SA38-0509)
  • Problem Solving Guide and Reference (SA23-2204) (SA23-2606)
  • System Management Guide, V4 (SC23-2525)

You can also visit the following URL:
http://publib16.boulder.ibm.com/pseries/
