What Is the OS Watcher Utility and How to Use It for Database Troubleshooting?

Oracle OS Watcher (OSWatcher) is a tool that helps remote DBAs troubleshoot database performance problems, cluster reboots, node evictions, database server reboots, database instance crashes, and similar issues.

OS statistics from tools such as top, mpstat, and netstat play an important role in database troubleshooting, but these tools do not keep historical data on their own. This is the gap OS Watcher fills for the database administrator. Suppose there was a performance issue on a database node yesterday, but you were not aware of it at the time, and by the time you found out, the issue had resolved itself.

The DBA can get database statistics for that period from AWR reports, but not OS statistics. To overcome this, Oracle introduced the OS Watcher utility, which samples OS statistics at a fixed interval and retains them for a fixed period (seven days in the setup described here; both values are configurable, as the startup examples below show). The DBA no longer needs to worry about missing historical OS statistics.

When troubleshooting database performance issues, AWR, ADDM, and OS Watcher logs are the first places a remote DBA should look. For cluster reboots, node evictions, and database server reboots, the alert log files, OS Watcher logs, and system messages (/var/log/messages) play an important role.

How to Install the OS Watcher Utility?

1. Download the tar file from the Oracle Support article "OSWatcher Black Box (Includes: [Video]) [ID 301137.1]".

2. Copy the file oswbb601.tar to the directory where oswbb is to be installed.
3. Extract the tar file as the “oracle” user.

$ tar xvf oswbb601.tar

4. Change to the oswbb directory that was created.

5. Start the OS Watcher utility using one of the commands below.

Example 1:

./startOSW.sh 60 10
This would start the tool and collect data at 60 second intervals and
log the last 10 hours of data to archive files.

Example 2:

./startOSW.sh
This would use the default values of 30, 48 and collect data at 30
second intervals and log the last 48 hours of data to archive files.

Example 3:

./startOSW.sh 20 24 gzip
This would start the tool and collect data at 20 second intervals and
log the last 24 hours of data to archive files. Each file would be
compressed by running the gzip utility after creation.

To stop the OSW utility execute the stopOSW.sh command. This terminates
all the processes associated with the tool.
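
The three examples share one argument pattern: the snapshot interval in seconds, the number of hours to retain, and an optional compression utility. Below is a minimal sketch of a wrapper that checks those arguments before anything is launched. The function name osw_start_args is hypothetical and not part of OS Watcher, and the sketch only prints the command line so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OS Watcher): validate the arguments
# used in the examples above and print the resulting startOSW.sh command line.
# Usage: osw_start_args INTERVAL_SECONDS RETENTION_HOURS [COMPRESS_UTILITY]
osw_start_args() {
    interval=$1
    hours=$2
    compress=${3:-}

    # Both numeric arguments must be digit-only strings greater than zero.
    for value in "$interval" "$hours"; do
        case $value in
            ''|*[!0-9]*) echo "invalid value: '$value'" >&2; return 1 ;;
        esac
        [ "$value" -gt 0 ] || { echo "value must be > 0: '$value'" >&2; return 1; }
    done

    if [ -n "$compress" ]; then
        echo "./startOSW.sh $interval $hours $compress"
    else
        echo "./startOSW.sh $interval $hours"
    fi
}

osw_start_args 60 10        # prints: ./startOSW.sh 60 10
osw_start_args 20 24 gzip   # prints: ./startOSW.sh 20 24 gzip
```

Printing rather than executing keeps the sketch safe to try on a machine where OS Watcher is not installed; the printed line can then be run from the oswbb directory.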



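In the environment described here, OS Watcher writes its files to /opt/oracle.oswatcher/osw/archive; a standalone install keeps them in the archive directory under oswbb. To collect the OS Watcher files for a particular day, use the commands below.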

# cd /opt/oracle.oswatcher/osw/archive
# find . -name '*13.03.15*' -print -exec zip /tmp/osw_`hostname`.zip {} \;

(where 13 is the year, 03 the month, and 15 the day)
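
The date pattern is easy to mistype, so it can help to build it from a full date. The following is a minimal sketch under the assumption that archive file names embed the date as yy.mm.dd, which is what the find pattern above relies on. The function names osw_day_pattern and osw_files_for_day are hypothetical, not part of OS Watcher.

```shell
#!/bin/sh
# Hypothetical helpers; they assume archive file names embed the date as
# yy.mm.dd, which is what the find pattern above relies on.

# Convert an ISO date (YYYY-MM-DD) into yy.mm.dd, e.g. 2013-03-15 -> 13.03.15
osw_day_pattern() {
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "expected YYYY-MM-DD, got '$1'" >&2; return 1 ;;
    esac
    echo "$1" | sed 's/^[0-9][0-9]\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9][0-9]\)$/\1.\2.\3/'
}

# List every regular file under ARCHIVE_DIR whose name contains that day.
# Usage: osw_files_for_day ARCHIVE_DIR YYYY-MM-DD
osw_files_for_day() {
    archive_dir=$1
    pattern=$(osw_day_pattern "$2") || return 1
    find "$archive_dir" -type f -name "*${pattern}*" | sort
}

# Example (not run here): feed the list to zip, which reads names from stdin with -@
#   osw_files_for_day /opt/oracle.oswatcher/osw/archive 2013-03-15 | zip /tmp/osw_$(hostname).zip -@
```

Listing the files first makes it possible to check what will be collected before the zip file is created.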

Below is the list of subfolders created under the archive folder:

-bash-4.1$ ls

osw_ib_diagnostics   oswiostat            oswnetstat           oswps                oswvmstat
osw_rds_diagnostics  oswmpstat            oswprvtnet           oswtop

Troubleshooting using OS Watcher

Recently, we faced a node eviction issue in a three-node Real Application Clusters environment. To resolve it, we started by looking at the alert log files for the database and the cluster, and at the OS Watcher logs. The mpstat values recorded by OS Watcher at the time of the issue are given below.

Time         CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
16:27:00     all    2.60    0.00    1.64   46.53    0.01    0.06    0.00   49.16   3088.40
16:27:05     all    0.44    0.00    1.50   17.50    0.01    0.05    0.00   80.50   2397.39
16:27:10     all    0.47    0.00    0.62   12.98    0.02    0.03    0.00   85.88   2361.48
16:27:15     all    1.00    0.00   14.08    5.34    0.01    0.04    0.00   79.52   2097.98
16:27:21     all    1.11    0.00   72.81   25.22    0.02    0.23    0.00    0.61   6164.79
16:27:28     all    0.73    0.00   98.59    0.56    0.02    0.10    0.00    0.00   5348.05
16:27:33     all    0.39    0.00   99.44    0.11    0.02    0.04    0.00    0.00   3578.19
16:30:02     all    0.16    0.00   96.24    2.63    0.00    0.03    0.00    0.93   1688.58
16:30:07     all    1.34    0.00    1.79   60.13    0.02    0.09    0.00   36.62   5086.03
16:30:12     all    0.99    0.00    0.49   80.87    0.03    0.07    0.00   17.56   3650.30

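From the above data it is clear that the CPUs were fully utilized: between 16:27:21 and 16:30:02, %sys climbed to about 99% and %idle dropped to nearly 0. This starved the system of resources, and the system was rebooted. The DBA now needs to find out what caused this high system utilization.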
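
Scanning a long mpstat log by eye is slow, so the saturated samples can be filtered out with awk. This is a minimal sketch that assumes the column layout of the sample above, where field 2 is the CPU, field 5 is %sys, and field 10 is %idle; other mpstat versions, or logs with an AM/PM column, would need different field numbers. The function name osw_cpu_saturated is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the time, %sys and %idle of every "all"-CPU
# sample whose %idle is below a threshold. Field positions are assumed to
# match the sample above (time, CPU, %user, %nice, %sys, %iowait, %irq,
# %soft, %steal, %idle, intr/s).
# Usage: osw_cpu_saturated MAX_IDLE_PERCENT [FILE...]   (reads stdin if no file)
osw_cpu_saturated() {
    max_idle=$1
    shift
    awk -v max_idle="$max_idle" '
        $2 == "all" && NF >= 10 && $10 + 0 < max_idle + 0 {
            printf "%s sys=%s idle=%s\n", $1, $5, $10
        }
    ' "$@"
}

# Three rows copied from the sample above; only the last two are below 5% idle.
osw_cpu_saturated 5 <<'EOF'
16:27:15     all    1.00    0.00   14.08    5.34    0.01    0.04    0.00   79.52   2097.98
16:27:21     all    1.11    0.00   72.81   25.22    0.02    0.23    0.00    0.61   6164.79
16:27:28     all    0.73    0.00   98.59    0.56    0.02    0.10    0.00    0.00   5348.05
EOF
# prints:
#   16:27:21 sys=72.81 idle=0.61
#   16:27:28 sys=98.59 idle=0.00
```

Header lines are skipped automatically because their second field is not "all".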

To troubleshoot this, I checked the top output for the time of the issue in the OS Watcher logs under the top folder (oswtop). There were around 200 parallel processes running at that time. I then cross-checked these processes against top output from a period when the system was healthy, and it was clear that they were not running then. In conclusion, the high number of parallel processes caused this issue.
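
The same cross-check can be done by counting matching lines in two saved snapshots. This is a minimal sketch, not the exact procedure used here. The function names are hypothetical, and the pattern is an assumption: 'ora_p[0-9]+' matches the usual ora_pNNN_SID naming of parallel execution processes, but whether those names appear at all depends on how the snapshot was captured.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a saved top/ps snapshot that match
# a process-name pattern. The pattern is an assumption; adjust it to however
# parallel processes are named in your snapshots.
# Usage: osw_count_procs PATTERN FILE
osw_count_procs() {
    # grep -c prints the number of matching lines (0 if none match);
    # "|| true" keeps the function from failing when nothing matches.
    grep -c -E -- "$1" "$2" || true
}

# Print the counts for an incident snapshot and a baseline snapshot together.
# Usage: osw_compare_procs PATTERN INCIDENT_FILE BASELINE_FILE
osw_compare_procs() {
    incident=$(osw_count_procs "$1" "$2")
    baseline=$(osw_count_procs "$1" "$3")
    echo "incident=$incident baseline=$baseline"
}

# Example (file names are placeholders):
#   osw_compare_procs 'ora_p[0-9]+' top_at_incident.txt top_at_good_time.txt
```

A large gap between the two counts points to the same conclusion as the manual comparison.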

So, with the help of the Oracle OS Watcher tool, both the problem and its cause are clear. This is a simple real-life scenario that shows how OS Watcher can help a remote DBA resolve issues. I have also mentioned steps for detailed analysis of OS Watcher logs.