Tuning a Data Warehouse According to its Usage - PowerPoint PPT Presentation

1 / 37
About This Presentation
Title:

Tuning a Data Warehouse According to its Usage

Description:

Documentation and Info on the Web. These questions are the ... Tough to close the barn-door after the horse is gone. Evergreen Database Technologies, Inc. ... – PowerPoint PPT presentation

Number of Views:38
Avg rating:3.0/5.0
Slides: 38
Provided by: TimGo6
Category:

less

Transcript and Presenter's Notes

Title: Tuning a Data Warehouse According to its Usage


1
Tuning a Data Warehouse According to its Usage
  • Tim Gorman
  • Principal
  • Evergreen Database Technologies, Inc.
  • http//www.evergreen-database.com

2
Agenda
  • Exactly what is going on in the DW?
  • Alternatives and considerations
  • If we knew, what could we do with it?
  • Three ways to get these answers
  • Summary
  • Documentation and Info on the Web
  • These questions are the aftermath of initial
    success
  • Initial build of the DW involved more basic
    issues these are the start of improvements and
    refinements

3
Exactly what is going on in the DW?
  • Who are they? Should they be here?
  • Dept X paid IT a lot of money to build the DW,
    but I often see folks from Dept Y working in
    there.
  • What are they doing? Why are they doing it?
  • It was designed for analysis along these data
    dimensions (i.e. product, city, month), but Mary
    Contrary insists that it needs other dimensions
    like color.
  • Joe Blow complained that his important job isnt
    finishing after 12 hours, and he insists it used
    to finish in 30 mins.
  • Jack Sprat wants a separate Data Mart?
  • Do we need more H/W? How much? Who pays?
  • It just keeps growing? Heck, Im not buying more
    disk!
  • Should these questions be answered on supposition?

4
Exactly what is going on in the DW?
  • Blind people examining an elephant
  • What is going on in the DW?
  • End-users have one perspective
  • Executives and directors have another
  • Data-modelers yet another
  • DBAs another
  • Tech Support another
  • Performance tuners (and other hacks) yet another!
  • Put em all together the whole truth starts to
    emerge
  • but its still all anecdotal meaningless to
    compile anecdotes
  • How do we see the whole elephant?

5
How can we find out what is going on?
  • Alternative 1 centralize control
  • Document known processes
  • Reduce or eliminate ad-hoc activity
  • centralize all queries through batch queues
  • funnel all processing through MIS
  • Alternative 2 monitor, record, and analyze
  • Monitor all (or most) activity
  • Record and save information on activity
  • Analyze the saved information to detect trends,
    anomalies
  • Using analysis, focus on major issues
  • Let it happen, and then react
  • Other alternatives?

6
Considering the alternatives
  • Alternative 1 centralize control
  • Advantages
  • Well-known requirement of mission-critical OLTP
    systems
  • reports developed by MIS, under change control
  • no (or little!) ad-hoc activity permitted
  • Many tools available to support this
  • job control packages and system management tools
  • change control procedures and security measures
  • Familiar to people trained in OLTP
  • Disadvantages
  • De-centralization and ad-hoc activity is often a
    goal of DW!
  • Performance and capacity-planning info usually
    not gathered
  • Tough to close the barn-door after the horse is
    gone

7
Considering the alternatives
  • Alternative 2 monitor, record, and analyze
  • Advantages
  • Non-intrusive doesnt interfere with anything
    directly
  • Analysis is a familiar activity in data
    warehousing
  • why not extend this to data warehouse management?
  • Answers the questions instead of side-stepping
    them
  • Knowledge is power!
  • Disadvantages
  • Not easy to accept most people are trained in
    OLTP
  • Tools just becoming available
  • If ad-hoc activity is not an issue, then
    everything must already be under controlperhaps?

8
If we knew, what could we do with it?
  • Getting the answers means controlling growth
    costs
  • capacity planning can be based on facts
  • identification and tuning makes more CPU
    unnecessary
  • pruning unused data can delay adding storage
  • archiving dormant data permits economical storage
  • Aiding trouble-shooting issues end-user support
  • verify/clarify end-user descriptions of problems
  • identify training needs and security issues
  • DW design is iterative
  • re-design can be based on past trends
  • end-user testimony can be inaccurate, vague, or
    wrong
  • fact-finding part of infra-structure, not a
    special project

9
Three ways to find out whats going on?
  • Purchase/build end-user tools which record usage
    and present it for analysis
  • Example Oracles Discoverer
  • Sample usage information using Oracles own
    internal dynamic performance (V) views
  • Example USAGE package in PL/SQL
  • Intercept, record, and analyze all (or selected)
    traffic
  • Pine Cone Systems, Inc. Usage Tracker and Content
    Tracker

10
Oracles Discoverer
  • End-user layer (EUL) is metadata and repository
  • All queries submitted from user view are stored
    in the EUL repository
  • Administrator view can analyze queries for
  • creation of summary tables
  • Discoverer can create and administer summaries
  • User view can analyze submitted SQL and
    automatically re-direct from detailed tables to
    summary tables
  • Usage summary reports on who, when, frequency,
    against which objects are provided

11
Oracles Discoverer (contd)
Discoverer
Administrator
User
Target Database
EUL
12
USAGE package
  • Written in PL/SQL and uses DBMS_JOBS for
    execution
  • periodically polls VSQLAREA (SGA Shared Pool)
  • saves performance and session information
  • performs EXPLAIN PLAN against saved SQL
    statements
  • can report on worst SQL, worst users, repeated
    SQL, busiest tables, more
  • typical poll time 5-15 mins
  • this implies that it misses most things that
    complete in less than the poll time
  • sees the heavy hitters only

13
USAGE package
DBMS_JOB.SUBMIT(USAGE.POLL) (every 5 mins)
DBMS_JOB.SUBMIT(USAGE.EXPLAIN) (every 8 hours)
V dynamic performance views
USAGE tables
Canned SQLPlus queries (upon demand)
14
Pine Cones Usage Tracker
  • Intercepts SQLNet/Net8 traffic
  • sniffs traffic for SQL statement text,
    user-session information, SQL start- and
    stop-time, rows processed
  • very low performance impact
  • sniffed information is stored in a repository for
    later analysis
  • Can provide a complete picture
  • certain processes can be excluded either before
    or after interception, if desired
  • Complete set of canned-reports
  • usage summaries, can be sliced by
    user/object/time
  • exception reports for security, performance
    tuning, space management

15
SQLNet
Client Process
1
1521
SQLNet Listener
Oracle instance
16
SQLNet
Client Process
1
1521
SQLNet Listener
Oracle server
2
Oracle instance
17
SQLNet
Client Process
1
3
1521
9900
SQLNet Listener
Oracle server
2
Oracle instance
18
SQLNet
Client Process
1
3
1521
9900
4
SQLNet Listener
Oracle server
2
Oracle instance
19
SQLNet
Client Process
1
5
3
1521
9900
4
SQLNet Listener
Oracle server
2
Oracle instance
20
SQLNet
Client Process
6
1
5
3
1521
9900
4
SQLNet Listener
Oracle server
2
Oracle instance
21
SQLNet
Client Process
1521
9900
SQLNet Listener
Oracle server
Oracle instance
22
Pine Cones Usage Tracker
Client Process
1
1521
1523
Pine Cone Listener
SQLNet Listener
Oracle instance
23
Pine Cones Usage Tracker
Client Process
1
1521
1523
Pine Cone Listener
SQLNet Listener
2
Oracle instance
24
Pine Cones Usage Tracker
Client Process
1
1521
1523
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
Oracle instance
25
Pine Cones Usage Tracker
Client Process
1
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
Oracle instance
26
Pine Cones Usage Tracker
Client Process
1
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
Oracle instance
27
Pine Cones Usage Tracker
Client Process
1
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
28
Pine Cones Usage Tracker
Client Process
Pine Cone server
1
7
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
29
Pine Cones Usage Tracker
Client Process
Pine Cone server
8
1
7
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
30
Pine Cones Usage Tracker
9
9901
Client Process
Pine Cone server
8
1
7
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
31
Pine Cones Usage Tracker
9
9901
Client Process
Pine Cone server
8
1
7
10
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
32
Pine Cones Usage Tracker
9
9901
Client Process
Pine Cone server
8
1
7
10
11
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
33
Pine Cones Usage Tracker
9
9901
Client Process
12
Pine Cone server
8
1
7
10
11
4
1521
1523
9900
3
Pine Cone Listener
SQLNet Listener
Oracle server
2
5
6
Oracle instance
34
Pine Cones Usage Tracker
9901
Client Process
Pine Cone server
1521
1523
9900
Pine Cone Listener
SQLNet Listener
Oracle server
Oracle instance
35
Trade-offs
  • End-users tools dont catch everything
  • attractive and easy-to-install, configure, and
    use
  • intrusive
  • USAGE package misses all but the big stuff
  • non-intrusive
  • polling frequency misses little stuff, intended
    for big stuff
  • query start/stop times not caught, lots not
    captured
  • Network traffic-sniffing (i.e. Pine Cone) most
    complete
  • non-intrusive
  • captures whatever you want it to capture
  • useful for providing metrics for many purposes

36
Summary
  • Recognizing the role of DWA is crucial to
    improving and enhancing the data warehouse
  • using usage information generally done for
    2nd-generation DW
  • no reason why it cant be built-in from the
    beginning
  • excellent ROI both tangible (disk utilization)
    and non-tangible (finding problems and fixing
    them)
  • several approaches
  • tradeoffs to each
  • product with built-in EUL (i.e. Discoverer)
  • home-grown polling tool (i.e. USAGE package)
  • traffic-interception (i.e. Pine Cone products)

37
Information on the Web
  • Oracle Discoverer
  • purchasing details http//www.oracle.com/
  • technical details http//technet.oracle.com/
  • USAGE package
  • download http//www.evergreen-database.com/
  • Pine Cone Systems, Inc.
  • http//www.pine-cone.com/
  • Other vendors/products involved in Usage
    Tracking?
  • http//www.teleran.com/
Write a Comment
User Comments (0)
About PowerShow.com