What is the OS Watcher Utility and How to Use It for Database Troubleshooting?

Oracle OS Watcher (OSWatcher) is a tool that helps remote DBAs troubleshoot database performance problems, cluster reboots, node evictions, DB server reboots, DB instance crashes and many similar issues.

As we know, OS stats from tools such as top, mpstat and netstat play an important role in database troubleshooting, but these tools do not keep historical data on their own. This is where OS Watcher comes to the rescue of the database administrator. Suppose there was a performance issue on a database node yesterday, but you were not aware of it, and by the time you found out, the issue had resolved itself.

The DBA can get database-related stats for the previous day from AWR reports, but not OS-related stats. To overcome this challenge, Oracle introduced the OS Watcher utility, which collects OS stats at a fixed interval and keeps them for a fixed retention period. With startOSW.sh the defaults are 30 seconds and 48 hours; some preconfigured installations use other values, such as keeping seven days of data. So the DBA no longer needs to worry about historical OS stats.

To troubleshoot database performance issues, AWR, ADDM and OS Watcher logs are the first places for a remote DBA to look, whereas for cluster reboots, node evictions and DB server reboots, the alert log files, OS Watcher logs and system messages (/var/log/messages) play an important role.

How to Install the OS Watcher Utility?

1. Download the tar file from the Oracle Support article "OSWatcher Black Box (Includes: [Video]) [ID 301137.1]"

2. Copy the file oswbb601.tar to the directory where oswbb is to be installed.
3. Extract the tar file as the “oracle” user

# tar xvf oswbb601.tar

4. Change to the oswbb directory that was created.

5. Start the OS Watcher utility using one of the commands below.

Example 1:

./startOSW.sh 60 10
This would start the tool and collect data at 60 second intervals and
log the last 10 hours of data to archive files.

Example 2:

./startOSW.sh
This would use the default values of 30, 48 and collect data at 30
second intervals and log the last 48 hours of data to archive files.

Example 3:

./startOSW.sh 20 24 gzip
This would start the tool and collect data at 20 second intervals and
log the last 24 hours of data to archive files. Each file would be
compressed by running the gzip utility after creation.
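
To get a feel for what the two arguments mean in terms of data volume, the arithmetic can be sketched as below. This is only an illustration: osw_snapshot_count is a hypothetical helper, not part of OSWatcher, and it simply divides the retention window by the interval to estimate how many snapshots each collector would keep.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OSWatcher): estimate how many
# snapshots one collector retains for a given interval and retention.
#   $1 = snapshot interval in seconds (defaults to 30, as in Example 2)
#   $2 = hours of data to keep        (defaults to 48, as in Example 2)
osw_snapshot_count() {
    interval=${1:-30}
    hours=${2:-48}
    # Reject anything that is not a plain non-negative integer.
    case $interval$hours in
        *[!0-9]*) echo "interval and hours must be integers" >&2; return 1 ;;
    esac
    # A zero interval would divide by zero below.
    [ "$interval" -gt 0 ] || { echo "interval must be greater than 0" >&2; return 1; }
    echo $(( hours * 3600 / interval ))
}

osw_snapshot_count 60 10    # Example 1
osw_snapshot_count          # Example 2 (defaults)
osw_snapshot_count 20 24    # Example 3
```

For the three examples above this gives 600, 5760 and 4320 snapshots per collector, which is worth keeping in mind when choosing a short interval on a server with limited disk space.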

STOPPING OSW:
To stop the OSW utility execute the stopOSW.sh command. This terminates
all the processes associated with the tool.

Example:

./stopOSW.sh

The default location of OS Watcher files on this system is /opt/oracle.oswatcher/osw/archive. To collect the OS Watcher files for a particular day, use the commands below.

# cd /opt/oracle.oswatcher/osw/archive
# find . -name '*13.03.15*' -print -exec zip /tmp/osw_`hostname`.zip {} \;

(where 13 is the year, 03 the month and 15 the day)
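
Typing the YY.MM.DD pattern by hand is easy to get wrong, so the lookup can be wrapped in a small script. The sketch below is an assumption-laden illustration: osw_pattern and osw_list_day are hypothetical helper names, and the only thing taken from the article is that the archive file names contain the date as YY.MM.DD.

```shell
#!/bin/sh
# Hypothetical helper: turn an ISO date (YYYY-MM-DD) into the *YY.MM.DD*
# glob used by the find command above.
osw_pattern() {
    iso=$1
    case $iso in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "usage: osw_pattern YYYY-MM-DD" >&2; return 1 ;;
    esac
    yy=${iso%%-*}    # 2013-03-15 -> 2013
    yy=${yy#??}      # 2013       -> 13
    rest=${iso#*-}   # 2013-03-15 -> 03-15
    mm=${rest%-*}    # 03-15      -> 03
    dd=${rest#*-}    # 03-15      -> 15
    printf '*%s.%s.%s*\n' "$yy" "$mm" "$dd"
}

# Hypothetical helper: list (without zipping) every file for one day.
#   $1 = archive directory, $2 = date as YYYY-MM-DD
osw_list_day() {
    dir=$1
    pat=$(osw_pattern "$2") || return 1
    find "$dir" -type f -name "$pat" -print
}
```

Listing first and zipping afterwards makes it easy to confirm that the pattern matched the intended files, for example with osw_list_day /opt/oracle.oswatcher/osw/archive 2013-03-15 before running the zip command.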

Below is the list of subfolders created under the archive folder:

-bash-4.1$ ls

osw_ib_diagnostics   oswiostat            oswnetstat           oswps                oswvmstat
osw_rds_diagnostics  oswmpstat            oswprvtnet           oswtop


Troubleshooting using OS Watcher


Recently, we faced a node eviction issue in a three-node Real Application Clusters environment. To resolve it, we started by looking at the alert log files for the DB and RAC environment and at the OS Watcher logs. The mpstat values recorded by OS Watcher at the time of the issue are given below.

                 CPU   %user   %nice  %sys %iowait    %irq   %soft  %steal   %idle    intr/s
16:27:00     all    2.60    0.00    1.64   46.53    0.01    0.06    0.00   49.16   3088.40
16:27:05     all    0.44    0.00    1.50   17.50    0.01    0.05    0.00   80.50   2397.39
16:27:10     all    0.47    0.00    0.62   12.98    0.02    0.03    0.00   85.88   2361.48
16:27:15     all    1.00    0.00   14.08    5.34    0.01    0.04    0.00   79.52   2097.98
16:27:21     all    1.11    0.00   72.81   25.22    0.02    0.23    0.00    0.61   6164.79
16:27:28     all    0.73    0.00   98.59    0.56    0.02    0.10    0.00    0.00   5348.05
16:27:33     all    0.39    0.00   99.44    0.11    0.02    0.04    0.00    0.00   3578.19
16:30:02     all    0.16    0.00   96.24    2.63    0.00    0.03    0.00    0.93   1688.58
16:30:07     all    1.34    0.00    1.79   60.13    0.02    0.09    0.00   36.62   5086.03
16:30:12     all    0.99    0.00    0.49   80.87    0.03    0.07    0.00   17.56   3650.30

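From the above data, it is clear that all CPUs were close to 100% utilized: from 16:27:21 to 16:30:02, %sys climbed to between 72% and 99% and %idle dropped to nearly zero. This resource crunch led to the system being rebooted. Now the DBA needs to find out what caused this high system utilization.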
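
Scanning a long mpstat log by eye is tedious, so the busy samples can be filtered out with awk. The following is a minimal sketch, not an OSWatcher feature: osw_flag_busy is a hypothetical helper, and it assumes exactly the column layout of the sample above (time, CPU, %user, %nice, %sys, %iowait, %irq, %soft, %steal, %idle, intr/s). Other mpstat versions order or name the columns differently, so the field numbers may need adjusting.

```shell
#!/bin/sh
# Hypothetical helper: print the mpstat samples whose %sys is at or above
# a threshold.
#   $1       = %sys threshold (defaults to 70)
#   $2 ...   = optional file names; with none, standard input is read
osw_flag_busy() {
    threshold=${1:-70}
    [ $# -gt 0 ] && shift
    # Assumed fields: $1 = time, $2 = CPU ("all"), $5 = %sys, $10 = %idle.
    # Header and other non-data lines are skipped because $2 is not "all".
    awk -v t="$threshold" '
        $2 == "all" && ($5 + 0) >= (t + 0) {
            printf "%s sys=%s idle=%s\n", $1, $5, $10
        }
    ' "$@"
}
```

Run against the sample above with a threshold of 70, this would print the four rows from 16:27:21 to 16:30:02, which is the window where the node ran out of CPU.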

To troubleshoot this, we checked the top output at the time of the issue in the OS Watcher logs under the oswtop folder. There were around 200 parallel processes running at that time. We then cross-checked these processes against the top output from a period when the system was healthy, and it was clear that these processes were not running then. In conclusion, the high number of parallel processes caused this issue.
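
The cross-check between a "bad" and a "good" top snapshot can be scripted roughly as follows. This is only a sketch under stated assumptions: osw_top_diff is a hypothetical helper, each snapshot is assumed to have been saved to its own file, and the lines are assumed to follow the usual top batch layout with the PID in the first column and the command name in the last.

```shell
#!/bin/sh
# Hypothetical helper: show which commands have more processes in the
# "bad" snapshot than in the "good" one.
#   $1 = top snapshot from a healthy period
#   $2 = top snapshot from the time of the issue
# Output: "<extra process count> <command>", largest difference first.
osw_top_diff() {
    awk '
        # Only process lines start with a numeric PID; summary and
        # header lines of top are skipped.
        $1 ~ /^[0-9]+$/ {
            if (FILENAME == ARGV[1]) good[$NF]++
            else                     bad[$NF]++
        }
        END {
            for (cmd in bad)
                if (bad[cmd] > good[cmd] + 0)
                    print bad[cmd] - good[cmd], cmd
        }
    ' "$1" "$2" | sort -k1,1nr -k2,2
}
```

A command that appears near the top of this output with a large count, such as a few hundred extra parallel processes, is a natural first suspect.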

So, the problem and its cause became clear with the help of the Oracle OS Watcher tool. This is a simple real-life scenario that shows how OS Watcher can help a remote DBA resolve issues. The steps above can also serve as a starting point for a more detailed analysis of OS Watcher logs.