AIX Errpt - Diag - Alog

ERROR LOGGING:

The errdemon is started during system initialization and continuously monitors the special file /dev/error for new entries sent by either the kernel or by applications. The label of each new entry is checked against the contents of the Error Record Template Repository, and if a match is found, additional information about the system environment or hardware status is added. The errdemon process allocates a memory buffer, and newly arrived entries are put into the buffer before they are written to the log to minimize the possibility of a lost entry. The errlog file is a circular log in binary format that stores as many entries as fit within its defined size. The default path is /var/adm/ras/errlog.

The name and size of the error log file and the size of the memory buffer may be viewed with the errdemon command:

# /usr/lib/errdemon -l

Log File                /var/adm/ras/errlog
Log Size                1048576 bytes
Memory Buffer Size      32768 bytes
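
When these values are needed in a script (for example to decide whether the log should be enlarged with errdemon -s), the report can be parsed with awk. This is only a sketch: the function name errdemon_value is made up for illustration, and it assumes the three-line output format shown above.

```shell
# Print the value that follows a given label in `errdemon -l` output (stdin).
# $1 = label exactly as printed, e.g. "Log Size" (hypothetical helper, not an AIX tool)
errdemon_value() {
    awk -v label="$1" '
        index($0, label) == 1 {
            rest = substr($0, length(label) + 1)   # text after the label
            sub(/^[ \t]+/, "", rest)               # drop the column padding
            split(rest, parts, /[ \t]+/)           # first word is the value
            print parts[1]
        }'
}

# On AIX:
#   /usr/lib/errdemon -l | errdemon_value "Log Size"
```

The trailing unit ("bytes") is dropped, so the result can be used directly in a numeric comparison.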

------------------------------

/usr/lib/errdemon               restarts the errdemon program
/usr/lib/errstop                stops the error logging daemon initiated by the errdemon program
/usr/lib/errdemon -l            shows information about the error log file (path, size)
/usr/lib/errdemon -s 2000000    changes the maximum size of the error log file

errpt                           retrieves the entries in the error log
errpt -a -j AA8AB241            shows detailed info about the error (with -j, the error id can be specified)
errpt -s 1122164405 -e 1123100405
                                shows the error log for a time period (-s start date, -e end date, format mmddhhmmyy)
errpt -d H                      shows hardware errors (errpt -d S: software errors)
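
The -s and -e values are written as mmddhhmmyy (month, day, hour, minute, two-digit year). A small helper can build them from separate date fields. errpt_stamp below is a hypothetical name, and the sketch only formats the numbers, it does not check that the date is valid.

```shell
# Build an errpt timestamp (mmddhhmmyy).
# Arguments: year month day hour minute (hypothetical helper, not an AIX tool)
errpt_stamp() {
    # awk reads the values as decimal numbers, so "04" or "08" are safe inputs
    awk -v y="$1" -v m="$2" -v d="$3" -v H="$4" -v M="$5" \
        'BEGIN { printf "%02d%02d%02d%02d%02d\n", m, d, H, M, y % 100 }'
}

# On AIX:
#   errpt -s "$(errpt_stamp 2005 11 22 16 44)" -e "$(errpt_stamp 2005 11 23 10 04)"
```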

Error Classes:
    H: Hardware
    S: Software
    O: Operator
    U: Undetermined

Error Type:
    P: Permanent (PERM) - unable to recover from the error condition
       Pending (PEND) - the device or component may become unavailable soon due to many errors
       Performance (PERF) - the performance of the device or component has degraded to below an acceptable level
    T: Temporary (TEMP) - recovered from the condition after several attempts
    I: Informational (INFO)
    U: Unknown (UNKN) - severity of the error cannot be determined
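
In the errpt summary output the T and C columns (third and fourth field) carry these one-letter type and class codes. As a rough sketch, the following counts entries per class. The function name is an assumption, and the column positions are taken from the usual summary header (IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION).

```shell
# Count errpt summary entries (stdin) per error class.
# Hypothetical helper; expects the default `errpt` summary layout.
errpt_count_by_class() {
    awk '
        $1 == "IDENTIFIER" { next }          # skip the header line
        NF < 4 { next }                      # skip blank or short lines
        {
            if      ($4 == "H") name = "Hardware"
            else if ($4 == "S") name = "Software"
            else if ($4 == "O") name = "Operator"
            else                name = "Undetermined"
            count[name]++
        }
        END { for (name in count) print name, count[name] }
    ' | sort
}

# On AIX:
#   errpt | errpt_count_by_class
```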


Types of Disk Errors:
DISK_ERR1: the disk has seen heavy use and should be replaced
DISK_ERR2: caused by loss of electrical power
DISK_ERR3: caused by loss of electrical power
DISK_ERR4: indicates bad blocks on the disk (replace the disk if more than one entry appears in a week)
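
The "more than one entry in a week" rule for DISK_ERR4 can be checked by counting summary lines. The sketch below assumes the caller supplies a start timestamp seven days back in mmddhhmmyy format (shown as the placeholder variable week_ago), and both function names are made up for illustration.

```shell
# Count entries in errpt summary output (stdin), ignoring header and blank lines.
errpt_entry_count() {
    awk '$1 != "IDENTIFIER" && NF > 0 { n++ } END { print n + 0 }'
}

# Succeed (exit status 0) when stdin holds more than one entry.
errpt_has_repeats() {
    [ "$(errpt_entry_count)" -gt 1 ]
}

# On AIX (week_ago is a caller-supplied mmddhhmmyy stamp):
#   errpt -J DISK_ERR4 -s "$week_ago" | errpt_has_repeats && echo "consider replacing the disk"
```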


errclear                  deletes entries from the error log (smitty errclear)
errclear 7                deletes entries older than 7 days (0 clears all messages)
errclear -j CB4A951F 0    deletes all the messages with the specified ID
errlogger                 logs an operator message to the system error log
                          (errlogger "This is a test message")


------------------------------

Mail notification via errpt and errnotify

AIX has an Error Notification object class in the Object Data Manager (ODM). An errnotify object is a "hook" into the error logging facility that causes the execution of a program whenever a matching error is recorded. By default, there are a number of predefined errnotify entries, and each time an error is logged, the errdemon checks whether the new entry matches the criteria of any of the Error Notification objects.

0. make sure mail sending is working correctly from the server
1. create a text file (e.g. /tmp/errnotify.txt), which will be added to ODM


Add the lines below if you want notifications on all kinds of errpt entries:

errnotify:
  en_name = "mail_all_errlog"
  en_persistenceflg = 1
  en_method = "/usr/bin/errpt -a -l $1 | mail -s \"errpt $9 on `hostname`\" aix4adm@gmail.com"
        <--specify the email address here ($1 is the sequence number of the new entry, $9 is its error label)


Add the below lines if you want notifications on permanent hardware entries only:

errnotify:
  en_name = "mail_perm_hw"
  en_class = H
  en_persistenceflg = 1
  en_type = PERM
  en_method = "/usr/bin/errpt -a -l $1 | mail -s \"Permanent hardware errpt $9 on `hostname`\" aix4adm@gmail.com"



2. root@bb_lpar: / # odmadd /tmp/errnotify.txt                                 <--add the content of the text file to ODM
3. root@bb_lpar: / # odmget -q en_name=mail_all_errlog errnotify               <--check if it is added successfully
4. root@bb_lpar: / # errlogger "This is a test message"                        <--check mail notification with a test errpt entry
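
The stanza file from step 1 can also be generated by a script, which helps when the same notification is rolled out to several servers. The following is a sketch under assumptions: write_errnotify_stanza is a hypothetical helper, and it only reproduces the "all entries" stanza shown above with a caller-supplied name and recipient.

```shell
# Write an errnotify stanza that mails every new error log entry.
# $1 = en_name, $2 = mail recipient, $3 = output file (all caller-supplied)
write_errnotify_stanza() {
    {
        printf 'errnotify:\n'
        printf '  en_name = "%s"\n' "$1"
        printf '  en_persistenceflg = 1\n'
        # $1, $9 and `hostname` must reach the ODM unexpanded, hence the single quotes
        printf '  en_method = "/usr/bin/errpt -a -l $1 | mail -s \\"errpt $9 on `hostname`\\" %s"\n' "$2"
    } > "$3"
}

# Example (the address is a placeholder):
#   write_errnotify_stanza mail_all_errlog admin@example.com /tmp/errnotify.txt
#   odmadd /tmp/errnotify.txt
```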

You can delete the added errnotify object if it is not needed anymore:
root@bb_lpar: / # odmdelete -q 'en_name=mail_all_errlog' -o errnotify
0518-307 odmdelete: 1 objects deleted.
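
The en_method command receives details of the new entry as positional arguments: $1 sequence number, $2 error ID, $3 class, $4 type, $5 alert flag, $6 resource name, $7 resource type, $8 resource class, $9 error label. Instead of a one-line pipeline, the method can therefore be a small script that builds a more descriptive mail subject. The sketch below is illustrative only: errnotify_subject and the NOTIFY_HOST variable are made-up names.

```shell
# Build a mail subject from the nine arguments errnotify passes to en_method.
# NOTIFY_HOST is a hypothetical override; by default the local hostname is used.
errnotify_subject() {
    host=${NOTIFY_HOST:-$(hostname)}
    printf 'errpt %s (class %s, type %s, resource %s, seq %s) on %s\n' \
        "$9" "$3" "$4" "$6" "$1" "$host"
}

# Inside a method script this could be used as:
#   errpt -a -l "$1" | mail -s "$(errnotify_subject "$@")" <recipient>
```

A script like this would be referenced from the stanza with all nine parameters passed through, for example en_method = "/path/to/script $1 $2 $3 $4 $5 $6 $7 $8 $9" (the path is a placeholder).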

(source: http://www.kristijan.org/2012/06/error-report-mail-notifications-with-errnotify/)

--------------------------------------------------------------------------------------------

DIAGRPT: (DIAG logs reporter)

diagrpt                   Displays previous diagnostic results
cd /usr/lpp/diag*/bin
    ./diagrpt -r          Displays the short version of the Diagnostic Event Log
    ./diagrpt -a          Displays the long version of the Diagnostic Event Log



--------------------------------------------------------------------------------------------

ALOG:

/var/adm/ras             this directory contains the master log files (alog command can read these files)
                         e.g. /var/adm/ras/conslog

alog -L                  shows what kind of logs there are (console, boot, bosinst...), these can be used with: alog -ot <type>
alog -Lt <type>          shows the attributes of a type (console, boot ...): size, path to logfile...
alog -ot console         shows the console log (messages that were written to the console)
alog -ot boot            shows the bootlog
alog -ot lvmcfg          lvm log file, shows what lvm commands were used (alog -ot lvmt: shows lvm commands and libs)
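
Because alog -L prints one log type per line, all alog files can be saved as readable text in one loop, for example before a reinstall or when collecting data for a support case. This is a sketch with a hypothetical function name and a caller-supplied target directory.

```shell
# Save every log type reported by `alog -L` as <dir>/<type>.log.
# $1 = target directory (hypothetical helper, not an AIX tool)
dump_alogs() {
    dir=$1
    mkdir -p "$dir" || return 1
    alog -L | while read -r type; do
        [ -n "$type" ] || continue                       # ignore empty lines
        alog -o -t "$type" < /dev/null > "$dir/$type.log"
    done
}

# On AIX:
#   dump_alogs /tmp/alog_dump
```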


--------------------------------------------------------------------------------------------
