AIX Errpt - Diag - Alog
ERROR LOGGING:
The errdemon is started during system initialization and continuously monitors the special file /dev/error for new entries sent by either the kernel or by applications. The label of each new entry is checked against the contents of the Error Record Template Repository, and if a match is found, additional information about the system environment or hardware status is added. The errdemon process sets up a memory buffer, and newly arrived entries are placed in this buffer before they are written to the log, to minimize the possibility of a lost entry. The errlog file is a circular log that stores as many entries as fit within its defined size. The default log file is /var/adm/ras/errlog, and it is in binary format.
The name and size of the error log file and the size of the memory buffer may be viewed with the errdemon command:
# /usr/lib/errdemon -l
Log File /var/adm/ras/errlog
Log Size 1048576 bytes
Memory Buffer Size 32768 bytes
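The size reported above can also be picked out in a script, for example before deciding whether to enlarge the log. The following is a minimal sketch, not a definitive implementation: errlog_size is a hypothetical helper name, and the sample text simply repeats the output shown above so that the snippet runs on any host.

```shell
# Sketch: extract the configured log size (in bytes) from `errdemon -l` output.
# errlog_size is a hypothetical helper; it reads the report on stdin.
errlog_size() {
    awk '/^Log Size/ { print $3 }'
}

# On an AIX host this would be:  /usr/lib/errdemon -l | errlog_size
# Here the sample output from above is used instead.
sample='Log File /var/adm/ras/errlog
Log Size 1048576 bytes
Memory Buffer Size 32768 bytes'

size=$(printf '%s\n' "$sample" | errlog_size)
echo "errlog size: $((size / 1024)) KB"
```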
------------------------------
/usr/lib/errdemon                     starts the error logging daemon (errdemon)
/usr/lib/errstop                      stops the error logging daemon started by errdemon
/usr/lib/errdemon -l                  shows information about the error log file (path, size)
/usr/lib/errdemon -s 2000000          changes the maximum size of the error log file (in bytes)
errpt                                 retrieves the entries in the error log
errpt -a -j AA8AB241                  shows detailed info about the error (with -j, the error id can be specified)
errpt -s 1122164405 -e 1123100405     shows the error log for a time period (-s start date, -e end date; format mmddhhmmyy)
errpt -d H                            shows hardware errors (errpt -d S: software errors)
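The -s and -e timestamps use the mmddhhmmyy layout (month, day, hour, minute, two-digit year), which is easy to mistype by hand. A small sketch that builds them with date; the helper names errpt_now and errpt_today_start are made up for this example, and errpt is only called if it is found on the host:

```shell
# Sketch: build errpt timestamps (mmddhhmmyy) with date.
# errpt_now and errpt_today_start are hypothetical helper names.
errpt_now() {
    date +%m%d%H%M%y
}

# Midnight of the current day in the same format.
errpt_today_start() {
    date +%m%d0000%y
}

start=$(errpt_today_start)
end=$(errpt_now)
echo "errpt -s $start -e $end"

# Only run errpt where it exists (i.e. on AIX).
if command -v errpt >/dev/null 2>&1; then
    errpt -s "$start" -e "$end"
fi
```

This lists today's entries only; for another window, replace the two helpers with fixed values such as the ones used above.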
Error Classes:
H: Hardware
S: Software
O: Operator
U: Undetermined
Error Type:
P: Permanent - unable to recover from error condition
   Pending - the device or component may become unavailable soon due to many errors
   Performance - the performance of the device or component has degraded to below an acceptable level
T: Temporary - recovered from condition after several attempts
I: Informational
U: Unknown - Severity of the error cannot be determined
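The type and class letters appear as the T and C columns of the errpt summary output, so a saved report can be tallied by type and class. This is only a sketch: errpt_tally is a hypothetical helper, and the sample rows are made up for illustration (they reuse the error ids mentioned in this page), not taken from a real system.

```shell
# Sketch: count errpt summary lines per type (T column) and class (C column).
# errpt_tally is a hypothetical helper; it reads `errpt` summary output on stdin.
errpt_tally() {
    awk 'NR > 1 { count[$3 " " $4]++ }          # skip the header line
         END { for (k in count) print k, count[k] }' | sort
}

# Illustrative sample only; on AIX use:  errpt | errpt_tally
sample='IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
AA8AB241   1123100405 T O OPERATOR       OPERATOR NOTIFICATION
AA8AB241   1122164405 T O OPERATOR       OPERATOR NOTIFICATION
CB4A951F   1122164405 I S SRC            SOFTWARE PROGRAM ERROR'

printf '%s\n' "$sample" | errpt_tally
```

For filtering on the live log, errpt itself can do the selection, e.g. errpt -d H -T PERM for permanent hardware errors.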
Types of Disk Errors:
DISK_ERR1: disk should be replaced; it has been used heavily
DISK_ERR2: caused by loss of electrical power
DISK_ERR3: caused by loss of electrical power
DISK_ERR4: indicates bad blocks on the disk (if more than one entry in a week replace disk)
errclear                      deletes entries from the error log (smitty errclear)
errclear 7                    deletes entries older than 7 days (0 clears all messages)
errclear -j CB4A951F 0        deletes all the messages with the specified ID
errlogger                     logs an operator message to the system error log
                              (errlogger "This is a test message")
------------------------------
Mail notification via errpt and errnotify
AIX has an Error Notification object class in the Object Data Manager (ODM). An errnotify object is a "hook" into the error logging facility that causes the execution of a program whenever a matching error message is recorded. By default, there are a number of predefined errnotify entries, and each time an error is logged, the error logging daemon checks whether that entry matches the criteria of any of the Error Notification objects.
0. make sure that sending mail works correctly from the server
1. create a text file (e.g. /tmp/errnotify.txt), which will be added to ODM
Add the lines below if you want notifications for all kinds of errpt entries:
errnotify:
en_name = "mail_all_errlog"
en_persistenceflg = 1
en_method = "/usr/bin/errpt -a -l $1 | mail -s \"errpt $9 on `hostname`\" aix4adm@gmail.com" <--specify the email address here
Add the lines below if you want notifications for permanent hardware entries only:
errnotify:
en_name = "mail_perm_hw"
en_class = "H"
en_persistenceflg = 1
en_type = "PERM"
en_method = "/usr/bin/errpt -a -l $1 | mail -s \"Permanent hardware errpt $9 on `hostname`\" aix4adm@gmail.com"
2. root@bb_lpar: / # odmadd /tmp/errnotify.txt <--add the content of the text file to ODM
3. root@bb_lpar: / # odmget -q en_name=mail_all_errlog errnotify <--check if it is added successfully
4. root@bb_lpar: / # errlogger "This is a test message" <--check mail notification with a test errpt entry
You can delete the added errnotify object if it is not needed anymore:
root@bb_lpar: / # odmdelete -q 'en_name=mail_all_errlog' -o errnotify
0518-307 odmdelete: 1 objects deleted.
(source: http://www.kristijan.org/2012/06/error-report-mail-notifications-with-errnotify/)
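In the en_method strings above, $1 is the sequence number of the logged entry and $9 is its error label; the error logging facility passes these to the method as positional arguments. If the inline pipeline becomes hard to maintain, the same logic could live in a small script. The sketch below is an assumption, not part of the original setup: the script path, the function names and the use of uname -n instead of hostname are all hypothetical choices.

```shell
# Hypothetical notify script, e.g. saved as /usr/local/bin/errmail.sh and referenced as
#   en_method = "/usr/local/bin/errmail.sh $1 $9"
# $1 = sequence number of the logged entry, $9 = error label.

MAILTO="aix4adm@gmail.com"   # replace with your own address

# Build the mail subject from the error label.
errmail_subject() {
    printf 'errpt %s on %s' "$1" "$(uname -n)"
}

# Mail the detailed report of one entry: $1 = sequence number, $2 = error label.
errmail_send() {
    errpt -a -l "$1" | mail -s "$(errmail_subject "$2")" "$MAILTO"
}

# Only send when both arguments are given and errpt exists (i.e. on AIX).
if [ $# -ge 2 ] && command -v errpt >/dev/null 2>&1; then
    errmail_send "$1" "$2"
fi
```

The script can be tried by hand with a sequence number taken from errpt -a output before it is added to the ODM.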
--------------------------------------------------------------------------------------------
DIAGRPT: (DIAG logs reporter)
diagrpt                  Displays previous diagnostic results
cd /usr/lpp/diag*/bin
./diagrpt -r             Displays the short version of the Diagnostic Event Log
./diagrpt -a             Displays the long version of the Diagnostic Event Log
--------------------------------------------------------------------------------------------
ALOG:
/var/adm/ras             this directory contains the master log files (the alog command can read these files)
                         e.g. /var/adm/ras/conslog
alog -L                  shows what kinds of logs there are (console, boot, bosinst...); these types can be used with: alog -ot <type>
alog -Lt <type>          shows the attributes of a type (console, boot ...): size, path to logfile...
alog -ot console         lists the messages that were written to the console
alog -ot boot            shows the boot log
alog -ot lvmcfg          lvm log file, shows what lvm commands were used (alog -ot lvmt: shows lvm commands and libs)
--------------------------------------------------------------------------------------------