What is the OS Watcher Utility and How to Use It for Database Troubleshooting?

Oracle OS Watcher (OSWatcher) is a tool that helps remote DBAs troubleshoot database performance problems, cluster reboots, node evictions, DB server reboots, DB instance crashes, and many similar issues.

OS statistics from tools such as top, mpstat, and netstat play an important role in database troubleshooting, but these tools do not keep historical data on their own. This is where OS Watcher helps the database administrator. Suppose there was a performance issue on a database node yesterday, but you were not aware of it at the time, and by the time you found out, the issue had resolved itself.

The DBA can get database statistics for the previous day from AWR reports, but not OS statistics. To fill this gap, Oracle introduced the OS Watcher utility, which samples OS statistics at a fixed interval and retains them for a fixed period. The environment described here kept seven days of data; a standalone installation started without arguments samples every 30 seconds and keeps 48 hours. The DBA therefore no longer needs to worry about historical OS statistics.

When troubleshooting database performance issues, AWR, ADDM, and OS Watcher logs are the first places for a remote DBA to look. For cluster reboots, node evictions, and DB server reboots, the alert log files, OS Watcher logs, and system messages (/var/log/messages) play an important role.

How to Install the OS Watcher Utility?

1. Download the tar file from Oracle Support article "OSWatcher Black Box (Includes: [Video]) [ID 301137.1]".

2. Copy the file oswbb601.tar to the directory where oswbb is to be installed.

3. Extract the tar file as the "oracle" user.

$ tar xvf oswbb601.tar

4. Change to the oswbb directory that was created.

5. Start the OS Watcher utility using one of the commands below.

Example 1:

./startOSW.sh 60 10
This would start the tool and collect data at 60 second intervals and
log the last 10 hours of data to archive files.

Example 2:

./startOSW.sh
This would use the default values of 30, 48 and collect data at 30
second intervals and log the last 48 hours of data to archive files.

Example 3:

./startOSW.sh 20 24 gzip
This would start the tool and collect data at 20 second intervals and
log the last 24 hours of data to archive files. Each file would be
compressed by running the gzip utility after creation.
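
To make the two arguments in the examples above more concrete, here is a rough sketch that estimates how many samples each collector would retain for a given interval and retention window. The function name osw_sample_count is hypothetical and is not part of OSWatcher; the estimate simply divides the retention window by the interval and ignores the time each collector takes to run.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OSWatcher): estimate how many
# samples one collector keeps for the two startOSW.sh arguments,
# the snapshot interval in seconds and the retention window in hours.
osw_sample_count() {
    interval_seconds=$1
    retention_hours=$2
    # Integer arithmetic: seconds in the window divided by the interval.
    echo $(( retention_hours * 3600 / interval_seconds ))
}
```

For instance, `osw_sample_count 60 10` prints 600, and `osw_sample_count 30 48` (the defaults) prints 5760.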

STOPPING OSW:
To stop the OSW utility execute the stopOSW.sh command. This terminates
all the processes associated with the tool.

Example:

./stopOSW.sh

The default location of OS Watcher files is /opt/oracle.oswatcher/osw/archive. To collect the OS Watcher files for a particular day, use the commands below.

# cd /opt/oracle.oswatcher/osw/archive
# find . -name '*13.03.15*' -print -exec zip /tmp/osw_`hostname`.zip {} \;

(Here 13 is the year, 03 the month, and 15 the day.)
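
The date pattern in the find command above can also be generated rather than typed by hand. The sketch below is one possible way to do that, not an OSWatcher feature. It assumes GNU date (for the -d option), the yy.mm.dd stamp shown in the example, the archive path quoted in this article, and that zip is installed. The function names osw_day_pattern and osw_collect_day are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not part of OSWatcher.

# Build the find(1) name pattern for one day, e.g. 2013-03-15 -> *13.03.15*.
# Assumes GNU date and the yy.mm.dd stamp used in the example above.
osw_day_pattern() {
    date -d "$1" +'*%y.%m.%d*'
}

# Zip every archive file for that day into a single file under /tmp.
# Assumes the archive path quoted in this article and an installed zip.
osw_collect_day() {
    pattern=$(osw_day_pattern "$1") || return 1
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd /opt/oracle.oswatcher/osw/archive || exit 1
        # "{} +" passes many files to each zip call instead of one per file.
        find . -type f -name "$pattern" -exec zip "/tmp/osw_$(hostname).zip" {} +
    )
}
```

Calling `osw_collect_day 2013-03-15` would then gather the same files as the manual command, but with far fewer zip invocations.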

Below is the list of subfolders created under the archive folder:

-bash-4.1$ ls

osw_ib_diagnostics   oswiostat            oswnetstat           oswps                oswvmstat
osw_rds_diagnostics  oswmpstat            oswprvtnet           oswtop


Troubleshooting using OS Watcher


Recently, a remote DBA faced a node eviction issue in a three-node Real Application Clusters environment. To resolve it, we started by looking at the alert log files for the database and the RAC environment, along with the OS Watcher logs. The mpstat values recorded by OS Watcher at the time of the issue are given below.

                 CPU   %user   %nice  %sys %iowait    %irq   %soft  %steal   %idle    intr/s
16:27:00     all    2.60    0.00    1.64   46.53    0.01    0.06    0.00   49.16   3088.40
16:27:05     all    0.44    0.00    1.50   17.50    0.01    0.05    0.00   80.50   2397.39
16:27:10     all    0.47    0.00    0.62   12.98    0.02    0.03    0.00   85.88   2361.48
16:27:15     all    1.00    0.00   14.08    5.34    0.01    0.04    0.00   79.52   2097.98
16:27:21     all    1.11    0.00   72.81   25.22    0.02    0.23    0.00    0.61   6164.79
16:27:28     all    0.73    0.00   98.59    0.56    0.02    0.10    0.00    0.00   5348.05
16:27:33     all    0.39    0.00   99.44    0.11    0.02    0.04    0.00    0.00   3578.19
16:30:02     all    0.16    0.00   96.24    2.63    0.00    0.03    0.00    0.93   1688.58
16:30:07     all    1.34    0.00    1.79   60.13    0.02    0.09    0.00   36.62   5086.03
16:30:12     all    0.99    0.00    0.49   80.87    0.03    0.07    0.00   17.56   3650.30

From the above data, it is clear that from 16:27:21 onward all CPUs were close to 100% utilized, almost entirely in system time (%sys). This resource crunch led to the system being rebooted. The DBA now needs to find out what caused this high system utilization.
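
Spotting such samples by eye gets tedious in a long log, so a small filter can help. The sketch below is only an illustration: it assumes the column layout of the mpstat sample above (time in column 1, CPU in column 2, %sys in column 5, %iowait in column 6, %idle in column 10), which can differ between sysstat versions and locales. The function name osw_busy_samples is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read mpstat lines on stdin and print the samples
# whose %idle is at or below the given threshold.
# Assumed layout (as in the sample above):
#   $1 time, $2 CPU, $5 %sys, $6 %iowait, $10 %idle
osw_busy_samples() {
    max_idle=$1
    awk -v max_idle="$max_idle" '
        # Only the "all" summary rows; header and other lines are skipped.
        $2 == "all" && $10 + 0 <= max_idle + 0 {
            printf "%s sys=%s iowait=%s idle=%s\n", $1, $5, $6, $10
        }'
}
```

Fed the sample above with a threshold of 5, it reports the rows at 16:27:21, 16:27:28, 16:27:33, and 16:30:02.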

To troubleshoot this, the DBA checked the top output for the time of the issue in the OS Watcher logs in the top folder (oswtop). Around 200 parallel processes were running at that time. Cross-checking these processes against top output from a normal period showed that they were not running then. In conclusion, the high number of parallel processes caused the issue.
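
The comparison between a problem snapshot and a normal one can be reduced to counting matching lines in each. The sketch below assumes that parallel execution servers appear in the top output under names of the form ora_p000_<SID>; that is an assumption about how the processes are displayed on your system, so adjust the pattern to what your logs actually show. The function name osw_count_procs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count lines on stdin that match a pattern.
# The default pattern is an assumption: parallel execution servers
# named like ora_p000_<SID>. Pass a different pattern as $1 if needed.
osw_count_procs() {
    pattern=${1:-'ora_p[0-9][0-9][0-9]_'}
    # grep -c prints the number of matching lines (0 if none match).
    grep -c -E "$pattern"
}
```

Running it once on a top snapshot from the time of the issue and once on a snapshot from a normal period gives two numbers that can be compared directly.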

So both the problem and its cause became clear with the help of the Oracle OS Watcher tool. This is a simple real-life scenario that shows how OS Watcher can help a remote DBA resolve issues. I have also described the steps for a detailed analysis of OS Watcher logs.