What is OS Watcher Utility and How to use it for Database Troubleshooting ?

What is OS Watcher Utility and How to use it for Database Troubleshooting ?

 
 

Oracle OS Watcher (OSWatcher) is a tool to help Remote DBA's to trouble shoot Database performance, Cluster reboot, node eviction, DB server reboot, DB instance Crash related issues and many more.

As we know, OS stats like top, mpstat, netstat plays an important role in Database trouble shooting but there is no way to keep historical date for these stats. Here, OS Watcher is the only rescue for Database Administrator. Suppose Yesterday, There was some performance issue on Database Node but you were not aware about that and when you know that the issue was resolved itself.

Now, DBA can get Database related stats from AWR reports but not OS related stats for last day, To overcome this challenge Oracle introduce OS Watcher utility, which collects OS stats data at a frequency of five minutes and keep it for seven days (default settings). So Now, DBA need not to worry about historical OS stats.

To Trouble shoot Database performance related issues AWR, ADDM and OS Watcher logs are the first place to start for a Remote DBA. Where as for Cluster reboot, node eviction, DB server reboot Alter log files, OS Watcher and System messages (/var/log/messages) plays an important role.

How to Install OS Watcher Utility ?

1. Download tar file from Oracle Support Article "OSWatcher Black Box (Includes: [Video]) [ID 301137.1]"

2. Copy the file oswbb601.tar to the directory where oswbb is to be installed.
3. Extract tar file with “oracle” user

# tar xvf oswbb601.tar

4. Change to oswbb directory created.

5. Start OS Watcher utility using below command.

Example 1:

./startOSW.sh 60 10
This would start the tool and collect data at 60 second intervals and
log the last 10 hours of data to archive files.

Example 2:

./startOSW.sh
This would use the default values of 30, 48 and collect data at 30
second intervals and log the last 48 hours of data to archive files.

Example 3:

./startOSW.sh 20 24 gzip
This would start the tool and collect data at 20 second intervals and
log the last 24 hours of data to archive files. Each file would be
compressed by running the gzip utility after creation.

STOPPING OSW:
To stop the OSW utility execute the stopOSW.sh command. This terminates
all the processes associated with the tool.

Example:

./stopOSW.sh









The default location of OS Watcher files is /opt/oracle.oswatcher/osw/archive. To collect OS Watcher files for a particular day use below command.

# cd /opt/oracle.oswatcher/osw/archive
# find . -name '*13.03.15*' -print -exec zip /tmp/osw_`hostname`.zip {} \;

{where 13- year 03- Month 15-day}

Below are the list of sub folders created under archive folder

-bash-4.1$ ls

osw_ib_diagnostics   oswiostat            oswnetstat           oswps                oswvmstat
osw_rds_diagnostics  oswmpstat            oswprvtnet           oswtop


Troubleshooting using OS Watcher


Recently, Remote DBA face a node eviction issue in a three node Real Application Cluster environment.  To resolve this, we start looking at alter log files for DB and RAC env and OS Watcher logs. In OsWatcher Mpstat values at time of issue are given below

                 CPU   %user   %nice  %sys %iowait    %irq   %soft  %steal   %idle    intr/s
16:27:00     all    2.60    0.00    1.64   46.53    0.01    0.06    0.00   49.16   3088.40
16:27:05     all    0.44    0.00    1.50   17.50    0.01    0.05    0.00   80.50   2397.39
16:27:10     all    0.47    0.00    0.62   12.98    0.02    0.03    0.00   85.88   2361.48
16:27:15     all    1.00    0.00   14.08    5.34    0.01    0.04    0.00   79.52   2097.98
16:27:21     all    1.11    0.00   72.81   25.22    0.02    0.23    0.00    0.61   6164.79
16:27:28     all    0.73    0.00   98.59    0.56    0.02    0.10    0.00    0.00   5348.05
16:27:33     all    0.39    0.00   99.44    0.11    0.02    0.04    0.00    0.00   3578.19
16:30:02     all    0.16    0.00   96.24    2.63    0.00    0.03    0.00    0.93   1688.58
16:30:07     all    1.34    0.00    1.79   60.13    0.02    0.09    0.00   36.62   5086.03
16:30:12     all    0.99    0.00    0.49   80.87    0.03    0.07    0.00   17.56   3650.30

From the above data, this is clear that, All CUP were 100% utilized which cause system resource at crunch and system was rebooted. Now, DBA needs to look what case this high system utilization.

To troubleshoot this, DBA check top output at time of issue from OS Watcher logs in the top folder. There were around 200 Parallel process running at time. Then I cross check these process with another Top command output at good time, and it was clear that, these process was not running at good time. In conclusion, high number of parallel processes cause this issue.

So, the problem and reason is clear with the help of Oracle OS Watcher tool. This is a simple real life scenario to understand how OS watcher can help remote DBA to resolve issues. I have also mentioned steps for detailed analysis of OS Watcher logs.
0 (0)
Article Rating (No Votes)
Rate this article
Attachments
There are no attachments for this article.
Comments
There are no comments for this article. Be the first to post a comment.
Full Name
Email Address
Security Code Security Code
Related Articles RSS Feed
Remove disk from volumegroup in AIX
Viewed 3616 times since Tue, Apr 16, 2019
Manages processor scheduler tunable parameters schedo AIX
Viewed 1925 times since Thu, Sep 20, 2018
How to create a Systemd service in Linux
Viewed 1684 times since Mon, Dec 7, 2020
AIX: Error code 0516-1339, 0516-1397 0516-792: cannot extendvg with a previous Oracle ASM disk
Viewed 1582 times since Wed, Feb 6, 2019
NMON nmon
Viewed 10004 times since Tue, Apr 16, 2019
AIX -- extending Logical Volumes online
Viewed 2019 times since Tue, Mar 12, 2019
DISK OPERATION ERROR in AIX
Viewed 10879 times since Thu, Feb 21, 2019
Super Grub2 Disk
Viewed 1994 times since Wed, May 22, 2019
haproxy linux
Viewed 1545 times since Sun, Dec 6, 2020
Exclude multiple files and directories with rsync
Viewed 1521 times since Wed, Oct 31, 2018