Welcome to the Workshop on Query Optimization PowerPoint PPT Presentation

presentation player overlay
About This Presentation
Transcript and Presenter's Notes

Title: Welcome to the Workshop on Query Optimization


1
Welcome to the
Workshop on Query Optimization
  • I. I. T. Bombay
  • Jointly organized with and sponsored by
    PSPL Ltd, Pune
  • October 1 and 2, 1999

2
Workshop Overview
  • Goal
  • tutorial on research in query optimization
  • foster RD in query optimization in India
  • Programme
  • 5 research sessions
  • 2 discussion sessions
  • free time after dinner on Friday for discussions
  • lunch and dinner
  • sponsored by PSPL

3
Workshop Highlights
  • Extensions of query optimization
  • parametric query optimization, dynamic reordering
    of query plans, ..
  • Multi-query optimization
  • Caching
  • Semi-structured data and information retrieval
  • including XML
  • Directories and databases
  • Discussion sessions VLDB99 update, and Future
    Directions

4
Introductions.
  • Please introduce yourself (in less than 10
    seconds!)

5
A Brief History of Query
Optimization
  • S. Sudarshan
  • CSE Dept, IIT Bombay

6
The Dark Ages ...
  • Till late 70s
  • Before relational databases
  • Dark ages, all optimization by hand
  • Needed sorcerers and wizards (the human kind, not
    the Microsoft kind!)

7
Let there be light ...
  • Late 70s to mid 80s
  • SQL declarative syntax, optimization is job of
    system
  • System R
  • Seminal work on join order optimization, set
    stage for all later query optimization
  • Distributed database query optimization (System
    R)

8
Why Query Optimization?
  • SQL is declarative
  • Up to system to decide how to execute query
  • Disk access was and is slow compared to memory
    access
  • Join order etc. can have major impact on speed
  • e.g. query ran for days on MS SQL-Server 6.5 due
    to bad optimization decision, half an hour with
    good decision
  • Different from optimization in other domains
  • e.g. linear programming, compiler optimizations

9
System R
  • Join order selection
  • A1 A2 A3 .. An
  • Left deep join trees
  • Dynamic programming
  • Best plan computed for each subset of relations
  • Best plan (A1, .., An) min cost plan of(
    Best plan(A2, .., An) A1
    Best plan(A1, A3, .., An)
    A2 .
    Best plan(A1, .., An-1)) An

An
Ai
10
System R (cont)
  • Selects and projects pushed down to lowest
    possible place
  • Sort order
  • join may be cheaper if inputs are sorted on join
    attributes
  • gt Best plan(set-of-relations, sort-order)
  • Heuristic handling of other SQL features
  • views, nested queries, aggregates, selects,
    unions, ...
  • Many other databases used heuristic optimizers

11
Brief History (Contd)
  • Mid 80s to early 90s
  • Extensible query optimization
  • Exodus, Volcano
  • New data models
  • Nested relational model
  • Object oriented data model
  • Foreign functions
  • Recursive queries / deductive databases

12
Volcano Extensible Query Optimizer Generator
  • General purpose cost based query optimizer, based
    on equivalence rules on algebras
  • e.g. equivalences join associativity, select
    push down, aggregate push down, etc
  • extensible new operations and equivalences can
    be easily added
  • notion of physical properties generalizes
    interesting sort order idea of System R
  • Developed by Graefe and McKenna 1993

13
Key Ideas in Volcano
  • DAG representation of query
  • Equivalence node and operation nodes
  • Compactly represents set of all evaluation plans
  • choose one child of each equivalence node, and
    all children of operation nodes

14
Main Benefits of Volcano
  • Extensible
  • can handle arbitrary algebraic expressions
  • new operators and equivalence rules easy to add
  • must be careful of search space though
  • Yet (reasonably) efficient
  • generalizes the dynamic programming idea of
    System-R optimizer
  • Optimizations of Pellenkroft et al. VLDB 97
    eliminate redundant derivations for joins
  • Used in MS SQL Server and Tandem

15
Optimization in OODB/ORDBs
  • Major issues
  • Path expressions
  • e.g. forall ( p in person) print
    (p-gtspouse-gtname)
  • can convert pointer dereferences to joins
  • can assemble objects in a clever sequence to
    minimize I/O (Graefe 93, Blakeley et al, Open
    OODB optimizer 95)
  • Path indices
  • e.g. forall (p in person suchthat
    p-gtspouse-gtname Rabri)

16
Optimization in ORDBs
  • Expensive predicates/functions in
    selects/projects
  • e.g. selects based on image manipulation
  • usual heuristic of push select predicates to
    lowest possible level does not work
  • Extended ADTs with methods
  • optimizer order of applying methods, and ordering
    of method evaluation and joins

17
Brief History (Contd.)
  • Mid 90s to late 90s
  • Materialized views, view and index selection
  • Web, data warehouses, virtual databases
  • OLAP, data mining,
  • Approximate query answering
  • Improvements on earlier techniques
  • handling outerjoins, aggregates
  • PLUS all the topics covered in this workshop

18
Materialized Views
  • Can materialize (precompute and store) views to
    speed up queries
  • Incremental maintenance
  • when database is updated, propagate updates to
    materialized view without complete recomputation
  • Deciding when to use materialized views
  • even if query does not refer to materialized
    view, optimizer can figure out it can be used

19
Deciding What to Materialize
  • maintenance cost and query cost
  • workload
  • queries and update transactions
  • weights for each component of workload
  • workload cost depends on what is materialized
  • Goal find set of views that gives minimum cost
    if materialized, subject to space constraints
  • Note materializing views can reduce even update
    costs
  • indices, and SQL assertions

20
Data Warehouses
  • Characteristics
  • Very large
  • typical schema very large fact table, small
    dimension tables
  • typical query aggregate on join of fact table
    and dimension tables
  • Can exploit above characteristics for optimizing
    queries
  • e.g., join dimension tables (even if cross
    product), build in memory index, scan fact table,
    probe index. Summarize if required and output

21
Data Warehouses (Cont)
  • Synchronized scans
  • multiple queries can share a scan of fact table
  • slow some queries down so others catch up
  • Bit map indices
  • for selections on low cardinality attributes
  • e.g. M 10011100011001 F
    01100011100110
  • idea and-ing of bit maps is very efficient, use
    on bitmaps to filter to relevant tuples,
    retrieve them
  • Quass and ONeill Sigmod 1997

22
Virtual Warehouses and Databases
  • Data sources are numerous and distributed
  • may be accessible only via HTML / XML
  • gt wrappers needed
  • may support only limited number of access types
    through forms interfaces
  • site descriptions describe what data is
    contained at a site Levy et al 1995.
  • Query sent only to relevant sites.
  • Stanford TSIMMIS project, Junglee, and others

23
And on to the workshop ...
24
Decorrelation
  • Idea convert nested subqueries to joins
  • Consider select from emp E where
    E.numchildren ltgt (select count()
    from person where person.parent
    E.name
  • Cant always express using basic rel. algebra
  • Long history
  • special cases Kim 88, Dayal 88, Muralikrishna 93
  • general case P. Seshadri et al 95 use outerjoin

25
Incremental View Maintenance
  • E.g. R S (R U ir) S R
    S U ir S (R - dr) S R
    S - dr S
  • similar techniques for selection, projection
    (must maintain multiplicity counters though) and
    aggregation
  • Blakeley et al. SIGMOD 87, Gupta and Mumick
    survey DE Bulletin 95.

26
Deciding What to Materialize
  • History
  • Roussopolous 1982 exhaustive A algorithm
  • Ross, Srivastava and Sudarshan SIGMOD 96
    suggest materializing views can reduce update
    costs, give heuristics
  • Labio et al. 1997, Gupta 1997, Sellis et al
    1997, Yang, Karlapalem and Li 1997 give
    various exhaustive/heuristic/greedy algorithms
  • Chaudhuri and Narsayya 1998 considers only
    indices, being introduced in SQL server
  • Exhaustive algos are all doubly exponential!
Write a Comment
User Comments (0)
About PowerShow.com