Title: Tuning a Data Warehouse According to its Usage
1Tuning a Data Warehouse According to its Usage
- Tim Gorman
- Principal
- Evergreen Database Technologies, Inc.
- http//www.evergreen-database.com
2Agenda
- Exactly what is going on in the DW?
- Alternatives and considerations
- If we knew, what could we do with it?
- Three ways to get these answers
- Summary
- Documentation and Info on the Web
- These questions are the aftermath of initial
success - Initial build of the DW involved more basic
issues these are the start of improvements and
refinements
3Exactly what is going on in the DW?
- Who are they? Should they be here?
- Dept X paid IT a lot of money to build the DW,
but I often see folks from Dept Y working in
there. - What are they doing? Why are they doing it?
- It was designed for analysis along these data
dimensions (i.e. product, city, month), but Mary
Contrary insists that it needs other dimensions
like color. - Joe Blow complained that his important job isnt
finishing after 12 hours, and he insists it used
to finish in 30 mins. - Jack Sprat wants a separate Data Mart?
- Do we need more H/W? How much? Who pays?
- It just keeps growing? Heck, Im not buying more
disk! - Should these questions be answered on supposition?
4Exactly what is going on in the DW?
- Blind people examining an elephant
- What is going on in the DW?
- End-users have one perspective
- Executives and directors have another
- Data-modelers yet another
- DBAs another
- Tech Support another
- Performance tuners (and other hacks) yet another!
- Put em all together the whole truth starts to
emerge - but its still all anecdotal meaningless to
compile anecdotes - How do we see the whole elephant?
5How can we find out what is going on?
- Alternative 1 centralize control
- Document known processes
- Reduce or eliminate ad-hoc activity
- centralize all queries through batch queues
- funnel all processing through MIS
- Alternative 2 monitor, record, and analyze
- Monitor all (or most) activity
- Record and save information on activity
- Analyze the saved information to detect trends,
anomalies - Using analysis, focus on major issues
- Let it happen, and then react
- Other alternatives?
6Considering the alternatives
- Alternative 1 centralize control
- Advantages
- Well-known requirement of mission-critical OLTP
systems - reports developed by MIS, under change control
- no (or little!) ad-hoc activity permitted
- Many tools available to support this
- job control packages and system management tools
- change control procedures and security measures
- Familiar to people trained in OLTP
- Disadvantages
- De-centralization and ad-hoc activity is often a
goal of DW! - Performance and capacity-planning info usually
not gathered - Tough to close the barn-door after the horse is
gone
7Considering the alternatives
- Alternative 2 monitor, record, and analyze
- Advantages
- Non-intrusive doesnt interfere with anything
directly - Analysis is a familiar activity in data
warehousing - why not extend this to data warehouse management?
- Answers the questions instead of side-stepping
them - Knowledge is power!
- Disadvantages
- Not easy to accept most people are trained in
OLTP - Tools just becoming available
- If ad-hoc activity is not an issue, then
everything must already be under controlperhaps?
8If we knew, what could we do with it?
- Getting the answers means controlling growth
costs - capacity planning can be based on facts
- identification and tuning makes more CPU
unnecessary - pruning unused data can delay adding storage
- archiving dormant data permits economical storage
- Aiding trouble-shooting issues end-user support
- verify/clarify end-user descriptions of problems
- identify training needs and security issues
- DW design is iterative
- re-design can be based on past trends
- end-user testimony can be inaccurate, vague, or
wrong - fact-finding part of infra-structure, not a
special project
9Three ways to find out whats going on?
- Purchase/build end-user tools which record usage
and present it for analysis - Example Oracles Discoverer
- Sample usage information using Oracles own
internal dynamic performance (V) views - Example USAGE package in PL/SQL
- Intercept, record, and analyze all (or selected)
traffic - Pine Cone Systems, Inc. Usage Tracker and Content
Tracker
10Oracles Discoverer
- End-user layer (EUL) is metadata and repository
- All queries submitted from user view are stored
in the EUL repository - Administrator view can analyze queries for
- creation of summary tables
- Discoverer can create and administer summaries
- User view can analyze submitted SQL and
automatically re-direct from detailed tables to
summary tables - Usage summary reports on who, when, frequency,
against which objects are provided
11Oracles Discoverer (contd)
Discoverer
Administrator
User
Target Database
EUL
12USAGE package
- Written in PL/SQL and uses DBMS_JOBS for
execution - periodically polls VSQLAREA (SGA Shared Pool)
- saves performance and session information
- performs EXPLAIN PLAN against saved SQL
statements - can report on worst SQL, worst users, repeated
SQL, busiest tables, more - typical poll time 5-15 mins
- this implies that it misses most things that
complete in less than the poll time - sees the heavy hitters only
13USAGE package
DBMS_JOB.SUBMIT(USAGE.POLL) (every 5 mins)
DBMS_JOB.SUBMIT(USAGE.EXPLAIN) (every 8 hours)
V dynamic performance views
USAGE tables
Canned SQLPlus queries (upon demand)
14Pine Cones Usage Tracker
- Intercepts SQLNet/Net8 traffic
- sniffs traffic for SQL statement text,
user-session information, SQL start- and
stop-time, rows processed - very low performance impact
- sniffed information is stored in a repository for
later analysis - Can provide a complete picture
- certain processes can be excluded either before
or after interception, if desired - Complete set of canned-reports
- usage summaries, can be sliced by
user/object/time - exception reports for security, performance
tuning, space management
15SQLNet
Client Process
1
1521
SQLNet Listener
Oracle instance
16SQLNet
Client Process
1
1521
SQLNet Listener
Oracle server
2
Oracle instance
17SQLNet
Client Process
1
3
1521
9900
SQLNet Listener
Oracle server
2
Oracle instance
18SQLNet
Client Process
1
3
1521
9900
4
SQLNet Listener
Oracle server
2
Oracle instance
19SQLNet
Client Process
1
5
3
1521
9900
4
SQLNet Listener
Oracle server
2
Oracle instance
20SQLNet
Client Process
6
1
5
3
1521
9900
4
SQLNet Listener
Oracle server
2
Oracle instance
21SQLNet
Client Process
1521
9900
SQLNet Listener
Oracle server
Oracle instance
22Pine Cones Usage Tracker
Client Process
1
1521
1523
Pine Cone Listener
SQLNet Listener
Oracle instance
23Pine Cones Usage Tracker
Client Process
1
1521
1523
Pine Cone Listener
SQLNet Listener
2
Oracle instance
24Pine Cones Usage Tracker
Client Process
1
1521
1523
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
Oracle instance
25Pine Cones Usage Tracker
Client Process
1
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
Oracle instance
26Pine Cones Usage Tracker
Client Process
1
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
Oracle instance
27Pine Cones Usage Tracker
Client Process
1
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
28Pine Cones Usage Tracker
Client Process
Pine Cone server
1
7
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
29Pine Cones Usage Tracker
Client Process
Pine Cone server
8
1
7
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
30Pine Cones Usage Tracker
9
9901
Client Process
Pine Cone server
8
1
7
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
31Pine Cones Usage Tracker
9
9901
Client Process
Pine Cone server
8
1
7
10
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
32Pine Cones Usage Tracker
9
9901
Client Process
Pine Cone server
8
1
7
10
11
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
33Pine Cones Usage Tracker
9
9901
Client Process
12
Pine Cone server
8
1
7
10
11
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
34Pine Cones Usage Tracker
9901
Client Process
Pine Cone server
1521
1523
9900
Pine Cone Listener
SQLNet Listener
Oracle server
Oracle instance
35Trade-offs
- End-users tools dont catch everything
- attractive and easy-to-install, configure, and
use - intrusive
- USAGE package misses all but the big stuff
- non-intrusive
- polling frequency misses little stuff, intended
for big stuff - query start/stop times not caught, lots not
captured - Network traffic-sniffing (i.e. Pine Cone) most
complete - non-intrusive
- captures whatever you want it to capture
- useful for providing metrics for many purposes
36Summary
- Recognizing the role of DWA is crucial to
improving and enhancing the data warehouse - using usage information generally done for
2nd-generation DW - no reason why it cant be built-in from the
beginning - excellent ROI both tangible (disk utilization)
and non-tangible (finding problems and fixing
them) - several approaches
- tradeoffs to each
- product with built-in EUL (i.e. Discoverer)
- home-grown polling tool (i.e. USAGE package)
- traffic-interception (i.e. Pine Cone products)
37Information on the Web
- Oracle Discoverer
- purchasing details http//www.oracle.com/
- technical details http//technet.oracle.com/
- USAGE package
- download http//www.evergreen-database.com/
- Pine Cone Systems, Inc.
- http//www.pine-cone.com/
- Other vendors/products involved in Usage
Tracking? - http//www.teleran.com/