Reviewing AIX Error and Boot Logs
AIX logs a wide range of events. Some are errors that require attention; others are simply notifications. For system administrators tasked with keeping the system running without major issues, these logs provide alerts and a record of events as they happen.
AIX offers different logs depending on the action and when it occurred. These logs hold information on the boot-up process, console, hardware and system software events. It’s up to the system admin to take action on these events, because once AIX has published the log, its job is done.
Logs, Logs, Logs
Besides the error log viewed with errpt, AIX keeps several other logs. The alog command lists them so you can pick one to view:
# alog -L
boot
bosinst
nim
console
cfg
mdmplog
lvmt
lvmcfg
dumpsymp
When issues arise during the boot-up process and you're not at the console, you can review the start-up messages afterward, particularly the boot and console logs. To display the contents of a log, pass its name to:
alog -o -t <logname>
For example, to view the console log:
alog -o -t console
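When a boot problem needs to be reviewed away from the machine, it can help to capture every log in one pass. The following is a minimal sketch that combines the two alog invocations above; the function name save_alogs and the default output directory are illustrative choices, not part of AIX.

```shell
# Sketch: save every alog log to its own text file for offline review.
# save_alogs and /tmp/alog_dump are hypothetical names chosen for illustration.
save_alogs() (
    outdir=${1:-/tmp/alog_dump}
    mkdir -p "$outdir" || exit 1
    # alog -L prints the available log types, one per line
    for log in $(alog -L); do
        # alog -o -t <type> prints the contents of that log
        alog -o -t "$log" > "$outdir/$log.txt"
    done
)
```

Calling save_alogs /tmp/boot_review would leave boot.txt, console.txt and so on in that directory.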
Logging Your Own Entries
The errpt command lists hardware and software events that have occurred in AIX. However, you might want a message generated and inserted into the error log after some user interaction, for instance when a system admin has made a change. This makes the change notification visible via errpt. Just as the logger command writes to the system log (the messages file), errlogger writes an operator notification entry to the error log. For example, having completed an AIX upgrade, you could post that to the error log so other users can see it, like so:
errlogger "AIX upgrade completed - no errors- test"
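If several admins share a system, it may be useful to record who posted the message. This is a small sketch of a wrapper around errlogger; the name log_change is hypothetical, and the "user: message" layout is only one possible convention.

```shell
# Hypothetical wrapper: prefix each operator message with the current user name.
log_change() {
    # id -un prints the effective user name; "$*" joins all arguments into one message
    errlogger "$(id -un): $*"
}
```

For example, log_change "AIX upgrade completed - no errors" would post the message with the user name in front.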
Working With errpt
The first thing AIX admins should do is arrange to receive event notifications via email. Errors and warnings will then be emailed as well as posted to the error log. First, create an email alias containing all system admins' addresses in the /etc/mail/aliases file. Then insert the email alias into the notification list, using the following smit selections: smit diag, current shell diagnostic, task selection, automatic error log notification. From then on, you'll get an email for each entry as it is posted to the error log.
The errpt list has headers in the following format:
identifier, timestamp, type, class, resource name, description.
A typical list entry could be:
A6DF45AA 0410183413 I O RMCdaemon The daemon is started.
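The timestamp column is packed as MMDDhhmmYY, so 0410183413 in the entry above means April 10, 2013 at 18:34. The sketch below turns that field into a more readable form; the function name decode_errpt_ts is hypothetical, and it assumes all years fall in the 2000s.

```shell
# Sketch: convert an errpt timestamp (MMDDhhmmYY) to "20YY-MM-DD hh:mm".
# Assumption: the two-digit year belongs to the 2000s.
decode_errpt_ts() {
    printf '%s\n' "$1" |
        sed 's/^\([0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)$/20\5-\1-\2 \3:\4/'
}
```

Running decode_errpt_ts 0410183413 prints 2013-04-10 18:34. Input that is not exactly ten digits is passed through unchanged.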
Some system admins view the summary listing, then the full listing, and clear the whole error log when done:
errpt
errpt -a
errclear 0
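Since errclear 0 discards everything, one cautious variation is to save the detailed report before clearing. This is only a sketch: archive_and_clear and the default file name under /tmp are hypothetical, and the commands it runs are the same errpt -a and errclear 0 shown above.

```shell
# Sketch: write the detailed report to a file, then clear the log.
# archive_and_clear and the default path are illustrative assumptions.
archive_and_clear() (
    archive=${1:-/tmp/errpt.$(date +%Y%m%d%H%M%S).txt}
    # errclear runs only if the report was written successfully
    errpt -a > "$archive" && errclear 0
)
```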
However, one can be more explicit. To clear errpt entries older than two days, use
errclear 2
To clear all software errors by using the error class, try:
errclear -d S 0
To clear down all ent0 entries:
errclear -N ent0 0
To clear all SYSPROC entries:
errclear -N SYSPROC 0
To clear by identifier:
errclear -j <identifier> 0
In the last example, identifier is used to locate and clear an entry. It can also be used to view entries:
errpt -j <identifier>
To view the full entries by identifier:
errpt -a -j <identifier>
Getting information from errpt by identifier works well, but sometimes you need to keep it simple. To extract all entries relating to, say, hdisk1, use the resource name:
errpt -N hdisk1
To extract all entries relating to ent0, try:
errpt -N ent0
If you want to view entries based on hardware or software, simply supply the class type. To view any hardware-related issues, for instance, use:
errpt -d H
Similarly for software, which would include core dumps and shutdowns, use:
errpt -d S
For operator, including notice events, file system space issues and services that terminate:
errpt -d O
Another class, U (undetermined), covers events that don't fall into any other category.
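When the log is long, a quick tally per resource shows where the noise is coming from. The following sketch assumes the summary layout described earlier, with a single header line and the resource name in the fifth column; count_by_resource is a hypothetical name.

```shell
# Sketch: count error log entries per resource name, most frequent first.
# Assumes errpt summary output: one header line, resource name in column 5.
count_by_resource() {
    awk 'NR > 1 { n[$5]++ } END { for (r in n) print n[r], r }' |
        sort -k1,1nr -k2,2
}
```

It reads from standard input, so typical use would be errpt | count_by_resource, or errpt -d H | count_by_resource to restrict the tally to hardware events.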
Don’t Report These Errors
There are occasions when the errpt gets filled with notifications you don’t really care about. Still, you want AIX to log them—just not report them. This could be due to a rush of notifications that you don’t want reported until a certain issue has been fixed. To view current errpt entries that have been disabled from reporting, use:
errpt -t -F Report=0
To view the current repository list containing the complete list of identifiers, labels, descriptions, etc., try:
errpt -t
Consider a scenario where you want to stop the reporting of events for a disk RAID array. The system repeatedly tries to rebuild, but you don't need AIX to keep telling you. To disable reporting of the RAID rebuild, first obtain the identifier (FE7D0EED in this example) by listing the errpt repository. To disable reporting of that identifier:
# errupdate <hit return>
=FE7D0EED: <hit return>
Report=false <hit CTRL-D>
<hit CTRL-D>
0 entries added.
0 entries deleted.
1 entries updated.
#
In the output above, the "=" sign tells errupdate to modify an existing entry. The text also shows where you should hit return and CTRL-D in the interactive errupdate utility. To confirm that reporting was disabled, use the errpt -t -F Report=0 command. At some point, you'll want to re-enable this report. To do so:
# errupdate <hit return>
=FE7D0EED: <hit return>
Report=true <hit CTRL-D>
<hit CTRL-D>
0 entries added.
0 entries deleted.
1 entries updated.
#
Again, review the repository to check identifiers that have been disabled/enabled from reporting.
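Because errupdate reads its instructions from standard input, the interactive session can probably be scripted. The sketch below pipes the same two lines typed in the sessions above; set_report is a hypothetical helper, and it is worth trying on a test system first.

```shell
# Sketch: enable or disable reporting for one identifier without typing interactively.
# set_report is a hypothetical name; $1 = identifier, $2 = true or false.
set_report() {
    # Emits "=<identifier>:" followed by "Report=<value>", as in the interactive session
    printf '=%s:\nReport=%s\n' "$1" "$2" | errupdate
}
```

For example, set_report FE7D0EED false would disable the report and set_report FE7D0EED true would re-enable it.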
If Logging Stops
If your errlog stops logging or reporting events, chances are the log is full or corrupted. A quick fix is to remove the file and start with a fresh one. First, stop the error daemon:
# /usr/lib/errstop
Next, remove the /var/adm/ras/errlog file:
# rm /var/adm/ras/errlog
Restart the error daemon:
# /usr/lib/errdemon
You’re good to go. To view attributes relating to the errlog, use:
# /usr/lib/errdemon -l
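The reset above deletes the old log outright. If you would rather keep it for later inspection, a variation is to move it aside instead. This is a minimal sketch using the same three paths; reset_errlog, the variable names and the date suffix are assumptions made for illustration.

```shell
# Sketch: reset the error log but keep a dated copy of the old file.
# Paths are the ones used above; variable names and the suffix are illustrative.
ERRSTOP=${ERRSTOP:-/usr/lib/errstop}
ERRDEMON=${ERRDEMON:-/usr/lib/errdemon}
ERRLOG=${ERRLOG:-/var/adm/ras/errlog}

reset_errlog() (
    backup="$ERRLOG.$(date +%Y%m%d)"
    # Stop the error daemon; give up if that fails so the log is not moved while in use
    "$ERRSTOP" || exit 1
    if [ -f "$ERRLOG" ]; then
        # Move the old log aside instead of deleting it
        mv "$ERRLOG" "$backup" || exit 1
    fi
    # Restart the error daemon, which then continues with a fresh log
    "$ERRDEMON"
)
```

If the daemon is already stopped, errstop may report a failure, in which case the remaining steps would need to be run by hand.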