Title: Storage Operations Management Work Plan
11
22
33
44
55
66
77
88
99
1010
1111
12Reduce Data Sprawl
13Add Intelligence
14Using Graphics to Tune Oracle
- Saturday, October 10, 2009
- Kyle Hailey
- Kyle.Hailey_at_oracle.com
15 Overview
16Target Audience Production DBA
- A Developer would have a different view
17Performance Anxiety
- When the manager is asking why is the
- Response is slow
- Database is hung
- Batch job behind schedule
- How do you respond?
18Respond Quickly and Confidently
- Like the Doctor
- Test collect Data
- Diagnosis - assessment
- Cure fix problems
19A New Approach to Oracle Tuning
- Old OEM Design Approach
- New OEM Design Approach
- Macintosh Design Approach
- Result of Good Design
- Our Goal
20Old Design Method
- List All Tuning data
- Create screens for all data
- Verify that all data is displayed
- gt Overwhelming and confusing
21OEMs New Design Method
- Start with Typical Problems
- Identify Solutions
- Design a Visual Representation
- gt Put intelligence into the interface
22Macintosh Method
- What do people do
- How do they do it
- Design to support it
- gt can be used without reading a manual
23Without Reading a Manual
- Handsprings site crashed Nov 25, 1999
- Biggest Sales Day of the Year
- Library cache latch contention
- No DBAs
- Downloaded Quests Spotlight
- Installed and Identified problem with Minutes
- Solution in code fix
24Goal
- Make Oracle Tuning easy for Anyone
25 Manageability
Obstacles and Approaches
26What Data to Collect ?
- On NT 9iR2
- Oracle Tuning Data in 9iR2 NT
- Stats 248
- Waits 361
- Latches 239
- Parameters 257
- Hidden Parameters 540
- Total 1645
- As well as
- 259 V tables
- 394 X tables
- And machine data
27Statspack Collects the Data
- Standard for Data Collection
- Selects from 44 x, v and dba tables and views
- 1000-1500 lines output
28But, How to Analyze the Statspack Data?
- One page Summary
- Waits are Emphasized to Guide Analysis
- Despite that, Data can be confusing to Analyze
- Oracle Press book is 600 pages
- Users still get led astray
- Recent mail
- IO waits dominate
- one SQL was 99 of the IO
- still concerned about parsing.
29Origin of Problem Tuning Rules Lists
- Originated from Oracle Version 6
- Calculate ratios to deduce bottlenecks
- Often lead down the wrong path
- Result tuning inconsequential aspects
30Wait Interface The Guide
- Introduced in Version 7
- Pinpoints performance Bottlenecks
- Used by Statspack
- Needs a clearer presentation than Statspack
31Silver Bullet Myth
- _go_fastertrue
- Coping with overwhelming amount of data
- Oracle Expert type products
- Not All Solutions are Known
32Combine Analysis with Diagnostics
- Known Solutions
- Diagnostics Methods
- Powerful Graphic Interface
- Flexibility to Grow with Changes
-
33 Graphics
the Advantage
34Why Use Graphics
-
- You can't imagine how many times I was told that
nobody wanted or would use graphics now
everybody uses them. - -- Jef Raskin, the creator of the Macintosh
35Macintosh Paradigm
- Introduced Graphical Windows
- Introduced the use of the mouse
- Revolutionized the Interface to the computer
- First computer I used without Reading the Manual
- Manageable computer
36Oracle and Manageability
- Microsoft's perceived ease of use
- Competition Brings Innovation
- Oracle is now tackling manageability
- BMC, Quest, Precise improve the manageability
perception of Oracle
37Why Use Graphics in Oracle Tuning
-
- The human nervous system is exceptionally
good at parsing visual information, especially
when that information is coded by color and/or
motion. - Knowledge representation in cognitive
science. Westbury, C. Wilensky, U. (1998)
38Why Use Graphics in Oracle Tuning
- Increase comprehension of Data by
- Correlating
- Increasing data Density
- Providing known Solutions
- Guiding Investigation
39Parsability of Graphics
-
- Infocus (overhead projectors) sited a study
that humans can parse graphical information
400,000 times faster than textual data
40Trends What are the Trends?
41Trends With Graphics
I II
III IV
42Trends Waits Over Time
- date 1713 1713 1714 1714 1715 1715 1716 17
16 1717 1717 1718 1718 1719 1719 1720 1720
1721 1721 1722 1722 1723 1723 1724 1724 1
725 1725 1726 1726 1727 1727 1728 1728 17
29 1729 1730 1730 1731 1731 1732 1732 1733
1733 1734 1734 1735 1735 1736 1736 1737 1
737 1738 1738 1739 1739 1740 1740 1741 17
41 1742 - log_file_switch_completion_3 0 0 0 0 2982936 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 237642 0 0 0 0 0 0 0 0 0 0 0 0 210041
9 0 0 0 0 - log_buffer_space_3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 6478221 0 14945538 5219836
0 0 - QLNet_break/reset_to_client_3 8501199 11385539 70
44359 8234690 6183237 2333297 7776659 6118198 6495
648 7518489 11455326 7702865 11085762 7634300 1058
2833 13260400 8544422 13172524 8502041 12010958 92
08533 9384118 10208957 7399676 6747457 6231611 134
41429 7344615 8761884 6642950 7256315 7073623 8442
915 7060399 7485662 6601072 6349018 9929440 490223
0 6079359 5751021 9455790 9087866 9341209 7664906
16499162 10234535 10588829 8638014 8918795 4312962
8056957 4944408 7195059 3332565 3154348 5530740 4
965315 7011341 5248675 6846116 6740378 5817539 901
2238 7464656 9055304 8781313 9115954 9858761 - latch_free_3 4306495 1931327 3327222 2181023 17640
23 3238736 1637877 1922399 1786672 2312507 2308379
3 1091835 1862666 6949805 6453080 781313 2767384 1
164596 2249654 440342 2076414 4062214 2408959 2886
800 2177722 3332627 1237003 2098221 5986478 828048
2570183 2104943 2686288 2259440 2388341 2044871 1
462208 2306024 1635624 2132461 783102 2677123 4743
636 1292606 1499421 2439544 2753717 1905407 301793
3 1651176 3686294 1088209 922402 2319624 2645539 1
260079 4316997 1311242 2087917 - rdbms_ipc_reply_3 27920139 31888722 29909917 27903
331 27939736 29891410 29901186 29953756 27895419 2
9879828 28086221 29906822 27903067 29915873 298963
07 31878918 28271932 29883910 29885870 29906113 27
933209 27794807 29879997 28842040 30138759 1210295
4 0 0 0 0 0 0 0 0 0 0 0 4681562 0 0 1018 22981 0 0
0 383 0 0 0 0 0 0 0 0 0 0 0 3079 0 700 0 604 0 0
42240 0 0 0 508 2702 - log_file_sync_3 4897094 5984818 8773859 6732708 25
28699 21640100 4667088 9349937 6102384 5648418 162
38557 9202562 4214425 12911555 8677281 1982443 554
9138 2978201 4361914 2924674 3969540 4608851 44990
43 5563593 7480457 8736114 4087525 7916623 9531711
4204114 4002465 4230277 4795701 6324440 4740469 5
123660 2891044 8931372 4254477 6177218 5119836 470
1245 9094181 6751040 6969569 8678671 10627609 8452
116 13505152 9052070 9199762 6939591 39153732 1897
4176 71305938 25005388 6781095 3927165 enqueue_3 3
9067 87598 58429 93754 38807 270867 64623 76158 12
5199 39092426 73909741 82618 78157 12192579 300841
39 30154100 30426042 30034564 30058775 30062114 27
062239 27585615 30040241 23741759 139008 75112 467
78 78246 30794 93934 850847 83964 91959 572765 189
38 214037 112043 93392 513358 80052 121984 148309
1683718 73709 33330 73894 476811 2072430 34138 402
73 88209 512731 460956 492492 2979781 50846 80892
84755 248836 - db_file_sequential_read_3 107039051 109566587 1270
08942 97033615 74615747 70728654 109231682 9375567
6 86966703 117638046 76065788 88137834 66759488 89
709570 80277614 50762622 58207938 61815450 6561058
0 51599402 74260828 78138965 49867005 21069074 176
80159 11460495 12467891 33786092 33211896 40736046
49212320 41162084 45786601 24122201 55651100 3431
6957 11773008 62455231 48897870 67248119 64077462
88899277 89622175 34483526 57402401 35319735 30619
562 61707668 34514340 39504714 37681062 34165368 2
4952296 30172023 25510144 40260331 83215876 784503
52 56963813 33846491 28869234 52964974 38161443 43
878778 31095295 26018747 58239638 19793917 7914975
- itch_(checkpoint_incomplete)_3 0 0 0 0 285950959 2
47915390 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0
43Trends Waits over Time
44Quantity Map Alone is 20,000 points
45Quantity Extent Mapping
46Quantity Extent Mapping with Mouse
47Quantity Seeing Patterns Extent Anomalies
48Quantity Zoom Out, Over 8000 data pts
49Correlating Data
50PerspectiveTufte
51PerspectiveTufte
52Perspective Oracle
53Perspective Oracle
54Clutter
55Clear Concise
56Tuning Lists are Like Full Table Scans
Rule 1
Rule 2
Rule 3
Rule 4
Rule 5
Rule 6
Rule 7
Rule 8
Rule 9
Rule 10
Rule 11
57Use an Index on the Tuning Rules
58Guiding User Shallow Index
A GUI can prune the decision by highlighting
and/or guiding the user (like a shallow index
tree)
59Keeping Context - Schema
A GUI can keep context giving keeping making the
tuning process seem less like Labyrinth
60Keeping Contexkt - Example
61Extending Visual Metaphors
- Graphics can increase the density and clarity
62Buffer Cache Advisor Example
63Ex Buffer Cache with Graph
64Using Visual Metaphors
65Larger Contention Area
66Providing Interactivity Simulations
67 Attempts at Graphics
68First Try 1994 Perot/Europcar
- Goal no interaction needed
- Visible across the room
- Used statistics shown in dials and bar graphs
- Used TCL/TK
69Second try at a Graphical Monitor
- TCL/TK front end that communicated over TCP/IP
with daemon attached to the SGA and reading out
Oracle statistics data - Latch Vlatch
- Library Cache Vlibrary Cache
- Statistics vsysstat
- Waits vsystem_event
- Rollback Segments vrollstat
- Row Cache vrowcache
70Data
- Values are Cumulative
- Necessitates snapshots
- Deltas need to be calculated between snapshots
- Normalizing Data needed to compare any interval
71Monitor Everything Approach
72Wait Events Became the Focus
73Improvements
- Scale Graphs Equally
- Eliminate
- Background waits
- Idle Waits
- Extraneous waits
- Wait counts
74With Improvements
75160K Log Buffer
7650M Log Files
772000 Block_buffers
78Log Files on Raw
794000 Block Buffers
808000 Block Buffers
81Wait vs CPU over tuning tests
http//oraperf.sourceforge.net/seminar/examples.ht
ml
82Quest Paradigm stacking Waits
83GUI Buffer Busy Waits
84GUI Log File Switch
85GUI Log Buffer Space
86GUI Latch and Log Sync
87GUI All Tests 8.1.7
88What do CPU and Wait Mean ?
89Service Time
90Service Time over 5 sec Snapshots
91Two Users Sharing the CPU
92Two Users Combining the Service Times
93Displaying CPU and Wait Time
94Wait Bound System
95CPU Bound System
96Resulting Graphic Interface
97TOP Triangle of Oracle Performance
HOST
System
ORACLE
System
SQL
Users
Sessions
SQL
98Methodology
- 1) Machine running correctly
- Run Q and Paging
- 2) Oracle instance correctly configure
- Compare Waits to CPU
- 3) Drill down to SQL and/or Sessions
- Root of all problems
- Possible group by Sessions, Modules, Actions,
Client id, Service or Machine
99Machine Statistics
- Which Machine Statistics to use?
- CPU
- User
- system
- Memory
- Size
- Free
- Swap
- IO Wait
- Network
1001 Machine Statistics Core
- Only need
- Paging
- Run Queue
- Paging and Run Queue can cause latch free and
unreliable CPU statistics in Oracle and are not
directly detectible in Oracle statistics. Other
Machine statistics such as Network and I/O
problems can be seen directly in Oracle
statistics.
1012 Oracle System Level
- Compare WAIT time to CPU
- If CPU bound look at CPU consumers
- IF wait bound drilldown to data relevant to the
highest wait type
1023 Oracle Drill Down SQL
- SQL is the source of All problems
- Aggregate SQL usage by Session
- Possibly Aggregate by
- Machine
- Module
- Action
- Client ID
- Service
103Putting it Together Graphically
1
2
3
1041) Host Information
- Run Queue
- top sql ( and top user etc)
- Paging
- PGA Advisor
- Shared Pool Advisor
- Buffer Cache Advisor
- Else improve Hardware
1052) Oracle System View
1063) Oracle Correlation Data
107 Sampling
108Snapshot Method has Limitations
- Performance Page populated by Snapshoting
- Snapshoting is prohibitively expensive for the
drill down data
109Snapshots and Drilldown Data
- Top SQL by CPU
- Expensive
- Top SQL by Wait
- Doesnt exist
- Top Session by CPU
- Feasible
- Top Session by wait
- Expensive
- Top Object
- Expensive (could be thousands)
110Sampling Solution for Collecting Data
- Based on VSESSION_WAIT
- Allows easy Aggregation by
- Session
- SQL
- Machine
- Module
- Action
- Client Id
- Identifies SQL, Users and Objects to track
111Sample Data Example
112Sampled Data Visually
113Sampled Data for Top SQL
114 Developing a Drilldown
User I/O Example
115Determine I/O Problems
- Problem SQL
- Hot File or Disk
- Hot Objects due to Sparseness or Row Chaining
- Undersized Buffer Cache
116Determine Solutions
- Tune SQL
- Redistribute I/O load or change I/O subsystem
- Rebuild indexes or Tables
- Increase Buffer Cache size
117Design Visual Interface
- First, Orient the Viewer , give a context
118Generic Drilldown for Waits
119I/O Specific Drilldowns File Response
120I/O Specific Drilldowns Hot Objects
121I/O Specific Drilldowns Buffer Cache
122 Story Board
123IO Wait Problem
124IO Look at Wait Groups
125IO Click on Problem Group
126IO Drill Down Problem SQL
127IO Drill Down Problem SQL FlyOvers
128IO Drill Down Buffer Cache Size
129IO Drill Down Problem Disks
130CPU Bound
131CPU Bound SQL?
132CPU Bound Users
133CPU Drill Down on User