Title: Oracle Grid Computing: Trending for Capacity Planning
1Oracle Grid Computing Trending for Capacity
Planning
Ashish RegeSEISession S307772
2What DBAs, SYS admins need
- Database Administrators, Sys Administrators are
called on everyday to make decisions on capacity
planning. - Which database has trended high for CPU, Memory,
I/O etc. over the last year, month, week, day
versus other databases on the same server? - Do these trends have some unique cyclical
patterns based on year, month, week, day for
different applications? - Do we see new functionality, additional users,
increased concurrency with time? - Do we have the capacity to add one or more
database to existing servers? - Where do we re-allocate databases at the next
opportunity and where do we see capacity? - Do we need to buy new servers and factor this
into next years budget?
3Approach and Toolset
- Top down and Bottom up approach edge at the
this sideways too .. - SA-DBA-Middleware cross functional analysis
- Toolset that can help trend CPU/Memory/Elapsed
Time - OS extended stats
- OEM repository DB/Host performance metrics
- Oracle Services stats
- App-DB-Integration-elapsed time/time-out metrics
- Apache Logs fact-dimensional model
- Batch Job-stream runtime metrics
- Business measures e.g. positions, txn counts, tax
lots, fees etc - Fill in the gaps with data points to construct a
complete picture
4Requirements summary
- Database administrators and sys administrators
need - consolidated data points to provide holistic
- "Capacity Trend Analysis"
- from an intraday to multiyear via a web
interface for the databases and their host
servers from CPU, Memory, I/O, wait bottlenecks,
throughput, and efficiencies perspective. - Stack up trends for different database per server
on the same graph to enable side-by-side
comparison and a better awareness of percentage
usage by database/application and its alignment
with the business cycle. - Help trend uptick or downward spiral of business
measures and find correlation of those with
database and server statistics to model what if
analysis - end-to-end trending for
- UI response times
- Batch jobs
- Processed Business measures
5Now to the details
- OEM grid captures at a 15 minutes interval, write
PERL scripts to transfer this to a trending
utility that graphs those on the web with
reporting - At the grain of intraday to multiyear for
performance counters like consumption of CPU,
memory, I/O, throughput, efficiency, wait stats
etc. for the different databases. - Capture metadata into flat files
- (1) Details per database
- (2) Comparative numbers across different
databases on the same or different servers - (3) Supplement the above with details from stats
pack
6Putting it all together
7Reporting Classifications ..
- Thus the reporting that comes out of the OEM is
at two levels - Database statistics
- Host statistics, this also includes side-by-side
Oracle comparison for databases on that host - The above statistics have associated flat files
extracted out of OEM to feed this data to
Orca/RRDTool to generate the graphs/trend. - Have found these graphs very useful in
- Re-alignment decisions
- Budget discussions around shared infrastructure
resources for different cost centers
8Orca/RRD Tool
- Orca/RRD Tool is a tool useful for plotting
arbitrary data from text files onto a directory
on a Web server. It has the following features - files into the same or different plots. Creates
an HTML tree of HTML and image (PNG or GIF)
files. - Creates an index of URL links listing all
available targets. - Creates an index of URL links listing all
different plot types. - No separate CGI set up required.
- Can be run under cron or it can sleep itself
waiting for file updates based on when the file
was last updated. - Configuration file based.
- Reads arbitrarily formatted text or binary data
files. - Watches data files for updates and sleeps between
reads. - Finds new files at specified times.
- Remembers the last modification times for files
so they do not have to be reread continuously. - Allows arbitrary grouping of data from different
sources - Allows arbitrary math performed on data read from
one file
9Orca architecture
- Out of the box Orca statistics for the host
- On clients, orcallator.se, a component of the SE
Toolkit, collects data every 5 minutes and dumps
the data in orca's home directory. Orcallator.se
has a startup script in /etc/init.d. This data is
dumped in /home/orca/data/lthostnamegt - A crontab entry for the orca user polls each
client every 5 minutes, grabs the current data,
and prunes old data. This is done via an SCP from
orca's crontab on the server, and is driven by a
list of hosts on the orca server at
/apps/orca/xfer/hostlist. - Â scp -r -p -v orca_at_host/home/orca/data/
/apps/orca/orcallator - Orca's crontab on the orca server calls
/apps/orca/xfer/pull.sh which processes all files
in /apps/orca/xfer/hostlist
10Out of box OS performance Statistics
- The Orca server app which simply checks a
directory tree (/apps/orca/orcallator) every five
minutes and graphs any new data which has
appeared there - Then, as orca on the orca server does an ssh to
the new client ssh ltclientgt" and when prompted,
adds to known hosts, unless this is done for each
client added, no data transfer can succeed. - Then, add the client to the file
/apps/orca/xfer/hostlist. This file contains a
list of all clients and is used by the script
'pull.sh', driven by orca's crontab, included
here. Orca's crontab also prunes some files on
each client. Clients are completely passive,
orcallator.se dumps performance data in orca's
home directory on each client, the server
collects it, processes it. - Orca produces HTML files, along with
"index.html". A standard Apache server is
required to publish the pages. - The url for viewing orca is http//ltorac server
IPgt/orca/
11OEM Custom statistics
- Extract performance data out of OEM repository
into flat files, via a custom script running at
an hourly frequency. These flat files are dropped
into the same directory which Orca checks and
processes data from to plot graphs
(/apps/orca/orcallator). Thus data is generated
right off the Orca server connecting remotely to
the OEM repository. - Thus any data can be plotted including business
measures. It is that simple - Using the DBIOracle library to query the OEM
GRID Repository joining tables like
sysman.MGMT_TARGETS, sysman.MGMT_METRICS, sysman.
MGMT_METRICS _1HOUR extracting data out of into
files. - A timestamp is written to a snapfile that tracks
time intervals already queried and reported on
the next run is based on querying the sysman
repository table to find snapshots with
rollup_timestamp greater than the last run - e.g. 2008-11-12_190000
12Quarterly Instance Efficiency
13Comparative Quarterly Instance CPU Utilization
The sample compares Waits User CPU, User I/O,
Other between different databases GWMPE02, E2L1,
E2L2, EC02, ER02 across the DB server
seidevdb34 Â E02 is the max consumer, at times
taking 50 or more of CPU compare to other
databases which have negligible CPU consumption.
14SNAPFILE
Snapfile helps to track last extraction time of
performance data. Â pwd /export/home/oracle/orca_or
acle/snapfile  oracle_at_seieaas52z1 tail -f
ltDBNAMEgt.seic.com_Oem_snapid 2009-03-10_120000 2
009-03-10_140000 2009-03-10_160000 2009-03-10_
180000 2009-03-10_200000 Â oracle_at_seieaas52z1
tail -f ltSERVERNAMEgt_OemHost_snapid 2009-03-10_12
0000 2009-03-10_140000 2009-03-10_160000 2009
-03-10_180000 2009-03-10_200000 Â
15OEM SQL QUERY HOST DB PERFORMANCE DATA
Part 1 Decide on beginning and ending
snapshots OEM repository query "select
min(to_char (d.rollup_timestamp
,'YYYY-MM-DD_HH24MISS')) FROM sysman.mgmt_target
s tgt , sysman.mgmt_metrics met ,
sysman.mgmt_metrics_1hour d WHERE lower(tgt.target
_name) lower(?) AND tgt.target_type
'oracle_database' AND tgt.target_guid
d.target_guid AND met.metric_guid
d.metric_guid AND d.rollup_timestamp gt
to_date(?,'YYYY-MM-DD_HH24MISS') " Â
16OEM SQL QUERY HOST DB PERFORMANCE DATA
Part 2 Get the latest snapshot and it's
time "select min(to_char ( d.rollup_timestamp
,'YYYY-MM-DD_HH24MISS' )) ,max(to_char (
d.rollup_timestamp ,'YYYY-MM-DD_HH24MISS'
)) FROM sysman.mgmt_targets tgt ,
sysman.mgmt_metrics met , sysman.mgmt_metrics_1hou
r d WHERE lower(tgt.target_name) lower(?) AND
tgt.target_type 'oracle_database' AND
tgt.target_guid d.target_guid AND
met.metric_guid d.metric_guid "
17OEM SQL QUERY HOST DB PERFORMANCE DATA
Part 3 Collect the raw values from OEM for DB
stats  "SELECT DISTINCT met.metric_column ,d.key_v
alue ,d.value_average FROM sysman.mgmt_targets
tgt , sysman.mgmt_metrics met ,
sysman.mgmt_metrics_1hour d WHERE lower(tgt.target
_name) lower(?) AND tgt.target_type
'oracle_database' AND tgt.target_guid
d.target_guid AND met.metric_guid
d.metric_guid AND d.rollup_timestamp
to_date(?,'YYYY-MM-DD_HH24MISS') ORDER BY
d.rollup_timestamp,met.metric_column"
18OEM SQL QUERY HOST DB PERFORMANCE DATA
Part 4 Collect the raw values from OEM for
HOSTS stats  "SELECT DISTINCT met.metric_column ,d
.key_value ,d.value_average . ..FROM
sysman.mgmt_targets tgt, sysman.mgmt_metrics met,
sysman.mgmt_metrics_1hour d WHERE
lower(tgt.target_name) lower(?) AND
tgt.target_type 'host AND tgt.target_guid
d.target_guid AND met.metric_guid d.metric_guid
AND met.metric_column IN ('cpuLoad,'cpuLoad_15min
,'cpuLoad_1min,'longestServ,'cpuIOWait,'cpuKer
nel' ,'cpuUser,'cpuUtil,'memUsedPct,'memfreePct
,'swapUtil,'noOfProcs,'noOfUsers,'totIO,'pgSc
anRate) AND d.rollup_timestamp
to_date(?,'YYYY-MM-DD_HH24MISS')
19Orcallator.cfg OEM group
group oem find_files
/apps/orca/oem/(.)/(?oracle)-\d4-\d2-\d2(?
-\d3,)?(?\.(?Zgzbz2))? column_description
first_line date_source
column_name timestamp interval
3600 filename_compare sub
my (ay, am, ad) a
/-(\d4)-(\d\d)-(\d\d)/
my (by, bm, bd) b /-(\d4)-(\d\d)-(\d\
d)/ if (my c (( ay
ltgt by)
( am ltgt bm)
((ad gtgt 3) ltgt (bd gtgt
3)))) return 2c
ad ltgt bd Â
20Orcallator.cfg SERVER grouping
group ltSERVER1gt find_files
/apps/orca/oemhost/( SERVER1)/(?(?
SERVER1)(?percol))-\d4-\d2-\d2-\d3 colum
n_description first_line date_source
column_name timestamp interval
3600 filename_compare sub
my (ay, am, ad) a
/-(\d4)-(\d\d)-(\d\d)/
my (by, bm, bd) b /-(\d4)-(\d\d)-(\d\
d)/ if (my c (( ay
ltgt by)
( am ltgt bm)
((ad gtgt 3) ltgt (bd gtgt
3)))) return 2c
ad ltgt bd
21Services trending
Service performance in VSERVICE_STATS VSERVICE
_EVENT VSERVICE_WAIT_CLASS VSERVICEMETRIC VSERV
ICEMETRIC_HISTORY
22Items Learned in this Session
- Database administrators and sys administrators
need consolidated data points to provide holistic
"Capacity analysis" for the databases and their
host servers from CPU, Memory, I/O, wait
bottlenecks, throughput, and efficiencies
perspective. - This presentation outlined a methodology that
helps with "Capacity Trend Analysis" from an
intraday to multiyear via a web interface the
key theme here is the ability to stack up trends
for different databases per server on the same
graph to enable side-by-side comparison and a
better awareness of percentage usage by
database/application and its alignment with the
business cycle.