Title: Those Who Don't Learn From History
1. Those Who Don't Learn From History
- Tom Bascom, President
- Greenfield Technologies
2. "Those who do not learn from history are doomed to repeat it."
George Santayana (1863-1952)
3. Overview
- Types: the kinds of monitoring activities
- What: what to monitor
- How: methods for gathering and presenting monitoring data
- Why: the benefits of historical data
4. Types of Monitoring Activities
- Baselining
- Benchmarking
- Interactive troubleshooting
- Capacity management
- Resource Optimization
5. Baselining
- Allows you to quantify changes in performance
- Apply to:
  - Frequently executed tasks
  - Important tasks
- Time, Activity, Costs, Revenue, Resources
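A baseline for a single metric can be reduced to a sample count, a mean, and a standard deviation. The sketch below is one minimal way to compute those in shell and awk. The function name baseline_stats and the one-sample-per-line input format are assumptions made for illustration; they are not part of the original deck.

```shell
# baseline_stats: read one numeric sample per line on stdin and print
# "count mean stddev" (sample standard deviation, n - 1 denominator).
# Hypothetical helper; exits non-zero if fewer than 2 samples are given.
baseline_stats() {
    awk '
        NF { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n < 2) exit 1
            mean = sum / n
            var  = (sumsq - n * mean * mean) / (n - 1)
            if (var < 0) var = 0          # guard against rounding error
            printf "%d %.2f %.2f\n", n, mean, sqrt(var)
        }'
}
```

Piping a column of per-sample commit counts into baseline_stats would give the baseline against which later samples can be compared.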
6. Benchmarking
- Benchmarking is much like baselining, but it generally seeks to find the limits of a configuration.
- SPECint
- TPC
- ATM
- ReadProbe
- 4glProbe
7. Interactive Troubleshooting
- Confirm configuration (trust, but verify)
- What is happening NOW?
- You can prove anything with a single data point
- How is NOW different from the baseline?
8. Capacity Management
- Filesystem space
- Database extent utilization
- Memory consumption
- Peak CPU utilization
- Network utilization
- IO throughput
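As one example of a capacity check, filesystem space can be sampled from the portable output of df -P. The sketch below is an illustration only: the function name fs_over_threshold and the threshold value are assumptions, and it does not handle mount points that contain spaces.

```shell
# fs_over_threshold LIMIT: read `df -P` output on stdin and print
# "mountpoint use%" for every filesystem at or above LIMIT percent used.
# Hypothetical helper; the header line (first row) is skipped.
fs_over_threshold() {
    awk -v limit="$1" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)             # "95%" -> "95"
            if (pct + 0 >= limit + 0) print $6, pct
        }'
}
```

Typical use would be df -P | fs_over_threshold 90, with the output appended to a timestamped log so that growth can be tracked over time.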
9. Resource Optimization
- Finding unbalanced resources
- Disk hot spots
- Wasted memory
- -spin
- Buffers (-B, -mmax)
10. What Metrics To Monitor
- DB metrics
- OS metrics
- Application metrics
- Business metrics
11. DB Metrics
- Logical IO
- CRUD
- Global and per user
- Table and index stats
- Physical IO
- Latch waits and timeouts
- Record locks and transactions
- Connections and servers
- Extent and area utilization
12. OS Metrics
- CPU utilization: usr, sys, wio
- Disk
  - Free space
  - Operations
  - Queues and service times
- Memory: budget, usage, leaks
- Network: bandwidth, latency
- Tables and limits: nfiles, nproc, semaphores
13. Application/Business Metrics
- Orders, applications taken
- Shipped orders, closed loans
- Items shipped, invoices printed, document packages prepared
- Items in inventory, users online
- Turn time, fallout ratio
14. Business Metrics
- Expenses
- Revenue
- Margin
- Profit
15. How To Gather Data
- Screen Scraping
  - PROMON
  - Scripts
- VSTs
  - ProTop
  - ProMonitor
- Low Level APIs
  - Fathom
16. PROMON
04/29/05        Activity Summary
10:12 (103605)

Event              Total  Per Sec    Event              Total  Per Sec
Commits              354     35.4    DB Reads            1724    172.4
Undos                  0      0.0    DB Writes             50      5.0
Record Reads       15091   1509.1    BI Reads               0      0.0
Record Updates        64      6.4    BI Writes             12      1.2
Record Creates       183     18.3    AI Writes              7      0.7
Record Deletes        71      7.1    Checkpoints            0      0.0
Record Locks        3776    377.6    Flushed at chkpt       0      0.0
Record Waits           0      0.0

Rec Lock Waits     0 %    BI Buf Waits      0 %    AI Buf Waits      0 %
Writes by APW    100 %    Writes by BIW    17 %    Writes by AIW    71 %
DB Size          26 GB    BI Size        249 MB    AI Size         87 MB
Empty blocks   1268766    Free blocks     84945    RM chain       805939
Buffer Hits       96 %    Active trans        0

121 Servers, 513 Users (204 Local, 309 Remote, 10 Batch), 4 Apws
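17. Screen Scraping
# Drive promon non-interactively; the here-document supplies the
# menu keystrokes, one per line, and the screens are saved to a file.
promon $DBNAME > $TMP/mon.$TM <<- "EOF" 2> /dev/null
R&D
1
7
p
4
3
p
2
p
p
p
2
1
p
3
x
EOF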
18. Screen Scraping
# Pull one column per metric out of each saved promon sample,
# then paste the columns side by side for reporting.
ls -1 sample.??.?? | while read FILE
do
  grep -i "logical read" "$FILE" | cut -c46-56 >> $TMP/lr
  grep -i "o/s read"     "$FILE" | cut -c46-56 >> $TMP/osr
  grep -i "hit ratio"    "$FILE" | cut -c11-14 >> $TMP/hr
  grep -i "commit "      "$FILE" | cut -c46-56 >> $TMP/trx
  grep -i "latch "       "$FILE" | cut -c46-56 >> $TMP/lto
done
paste $TMP/lr $TMP/osr $TMP/hr $TMP/trx $TMP/lto
19. VSTs -- ProTop
16:06:50    ProTop xvi -- Progress Database Monitor    05/01/05
Sample      sports2000 /data/s2k/sports2000            Rate

Hit Ratio     63:1   148:1    Commits          2     3    Sessions   18
Miss%        1.590   0.676    Latch Waits    232   234    Local      17
Hit%        98.410  99.324    Tot/Mod Bufs   170    25    Remote      0
Log Reads     4465   10498    Evict Bufs     908   206    Batch      16
OS Reads        71      71    Lock Table    8192    27    Server      0
Rec Reads     1141    1062    LkHWM|OldTrx    67 00:03    Other       1
Log/Rec     3.9132  9.8851    Old/Curr BI      1     1    TRX         6
Area Full        1   99.12    After Image  Disabled       Blocked     0

Resource Waits
 Id Resource                  Locks      Waits   Lock%
--- -------------------- ---------- ---------- -------
 10 DB Buf S Lock             10497          0  100.00
  6 Record Get                 1061          0  100.00
  7 DB Buf Read                  71          0  100.00
  2 Record Lock                  28          0  100.00
 11 DB Buf X Lock                14          0  100.00
 19 TXE Share Lock               14          0  100.00
20. Low Level APIs -- Fathom
21. Low Level APIs -- Fathom
22. Metrics Database
- Multiple targets
- Generic metrics
- Individualized metric properties
- Arbitrary grouping of metrics
- Flexible reporting
- Graphical output
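To make "multiple targets" and "generic metrics" concrete, the sketch below stores every sample as one row of timestamp, target, metric name, and value in a flat file. The file layout, the "|" separator, and the function names record_metric and metric_series are assumptions for illustration; the deck does not describe the actual schema, and a real metrics database would more likely be a proper database.

```shell
# record_metric FILE TARGET METRIC VALUE [EPOCH]
# Append one generic sample row: epoch|target|metric|value.
# EPOCH defaults to the current time (date +%s is widely available,
# though not strictly POSIX).
record_metric() {
    ts=${5:-$(date +%s)}
    printf '%s|%s|%s|%s\n' "$ts" "$2" "$3" "$4" >> "$1"
}

# metric_series FILE TARGET METRIC
# Print "epoch value" rows for one metric of one target, ready for
# reporting or graphing.
metric_series() {
    awk -F'|' -v t="$2" -v m="$3" \
        '$2 == t && $3 == m { print $1, $4 }' "$1"
}
```

Because each row names its own target and metric, new databases or new metrics can be added without changing the layout.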
23. Why? The Benefits of History
- Spend your money wisely!
- Be the first to know when something is wrong!
- Better yet, know before it happens and prevent it!
- Don't be doomed!
25. Effect of Changes
[Chart annotated with changes: EMC, 8k blocks, storage areas, conv89]
27. Where's the Problem?
04/29/05        Activity Summary
10:12 (10 sec)

Event              Total  Per Sec    Event              Total  Per Sec
Commits              354     35.4    DB Reads            1724    172.4
Undos                  0      0.0    DB Writes             50      5.0
Record Reads       15091   1509.1    BI Reads               0      0.0
Record Updates        64      6.4    BI Writes             12      1.2
Record Creates       183     18.3    AI Writes              7      0.7
Record Deletes        71      7.1    Checkpoints            0      0.0
Record Locks        3776    377.6    Flushed at chkpt       0      0.0
Record Waits           0      0.0

Rec Lock Waits     0 %    BI Buf Waits      0 %    AI Buf Waits      0 %
Writes by APW    100 %    Writes by BIW    17 %    Writes by AIW    71 %
DB Size          26 GB    BI Size        249 MB    AI Size         87 MB
Empty blocks   1268766    Free blocks     84945    RM chain       805939
Buffer Hits       96 %    Active trans        0

121 Servers, 513 Users (204 Local, 309 Remote, 10 Batch), 4 Apws
28. TRX Rate
[Chart annotation: Normal]
29. TRX Rate
[Chart annotation: Elevated]
30. Jump in Background TRX Rate
32. Business Surge
[Chart annotations: 5x initial volume; hiring lags; again!; rates drop!]
33. Statistical Rules of Thumb
- A metric is "in control" if it is within 3 standard deviations of the baseline.
- Metrics that fall between the 2nd and 3rd standard deviations should be viewed as warnings.
- 4 consecutive samples trending away from the mean are a warning.
- 4 out of 6 samples on one side of the mean is a warning.
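The rules of thumb above can be sketched as three small shell functions. This is an illustration under stated assumptions, not a definitive implementation: the function names and status labels are made up here, the baseline mean and standard deviation are passed in as arguments, samples arrive one per line on stdin, and "trending away from the mean" is interpreted as the distance from the mean growing with each of the last 4 samples.

```shell
# control_check MEAN SD: print "<value> <status>" for each sample.
# Within 2 SD is OK, between 2 and 3 SD is WARNING, beyond 3 SD is
# OUT_OF_CONTROL.
control_check() {
    awk -v mean="$1" -v sd="$2" '
        NF {
            dev = $1 - mean; if (dev < 0) dev = -dev
            if      (dev > 3 * sd) status = "OUT_OF_CONTROL"
            else if (dev > 2 * sd) status = "WARNING"
            else                   status = "OK"
            print $1, status
        }'
}

# trending_away MEAN: exit 0 (warning) if each of the last 4 samples is
# farther from MEAN than the one before it.
trending_away() {
    awk -v mean="$1" '
        NF { d = $1 - mean; if (d < 0) d = -d; dev[++n] = d }
        END {
            if (n < 4) exit 1
            for (i = n - 2; i <= n; i++)
                if (dev[i] <= dev[i - 1]) exit 1
            exit 0
        }'
}

# one_sided_run MEAN: exit 0 (warning) if at least 4 of the last 6
# samples fall on the same side of MEAN.
one_sided_run() {
    awk -v mean="$1" '
        NF { v[++n] = $1 + 0 }
        END {
            start = (n > 6) ? n - 5 : 1
            for (i = start; i <= n; i++) {
                if      (v[i] > mean + 0) above++
                else if (v[i] < mean + 0) below++
            }
            exit !(above >= 4 || below >= 4)
        }'
}
```

With a baseline mean of 20 and a standard deviation of 10, control_check 20 10 would flag a sample of 45 as a warning and a sample of 55 as out of control.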
34. Summary
- Obtain a baseline
- Consistently gather comprehensive data
  - Business
  - Application
  - Database
  - Operating System
- Review, analyze, and publish
35. "Insanity is doing the same thing over and over again and expecting different results."
Albert Einstein (1879-1955)
36. Questions?
- Resources
- http://www.greenfieldtech.com/exchange05.shtml