Query processing - PowerPoint PPT Presentation

1 / 15
About This Presentation
Title:

Query processing

Description:

Many university and industry research projects. Asia, Europe, Americas ... Duality of sorting and hashing. Survey of query evaluation techniques. Post-Volcano ... – PowerPoint PPT presentation

Number of Views:188
Avg rating:3.0/5.0
Slides: 16
Provided by: goetzg
Category:

less

Transcript and Presenter's Notes

Title: Query processing


1
Query processing
  • Goetz Graefe
  • Microsoft
  • SQL Server

2
Volcano optimization
  • Many university and industry research projects
  • Asia, Europe, Americas
  • Red Brick
  • Dynamic query execution plans
  • EROC/NEATO
  • Teradata
  • Cascades
  • Tandem
  • Microsoft SQL Server

3
Volcano Extensible optimization
  • Optimizer generator ICDE 1993
  • Search strategy Bill McKenna
  • Open OODB with José Blakeley
  • Choose plan operation Karen Ward
  • Dynamic execution plans Rick Cole
  • Scientific queries Richard Wolniewicz

4
Volcano Parallel execution
  • Exchange operation SIGMOD 1990
  • Hierarchical hardware Diane Davison
  • Bottom-up scheduling for Informix
  • Resource management Diane Davison
  • Complex object assembly with David Maier and
    Tom Keller
  • Parallel sorting
  • Guided hash join with Henry Bremers
  • Duality of sorting and hashing
  • Survey of query evaluation techniques

5
Post-Volcano
  • Cascades query optimizer
  • Hash joins and hash teams
  • Replicate advantage of interesting orderings
  • B-tree indexes
  • CPU caches
  • Sorting with runs in a partitioned B-tree
  • Incremental index operations using control tables
  • Indexing and caching in nested iteration
  • Write-optimized (log-structured) B-trees
  • Locking and logging in indexed group-by views

6
My current work
  • Ship SQL Server 2005
  • Manage 35 development engineers
  • Query execution, plan caching
  • XML data type, XML indexing, XQuery execution
  • T-SQL execution, CLR hosting
  • Future of SQL Server extensibility
  • Index operations including sorting
  • Robust execution, graceful degradation
  • Miscellaneous data structure ideas
  • Max-diff histograms, order-preserving compression

7
Promising research topics
  • Plan caching
  • Memory management, plan fragments
  • Parameterized queries and plans
  • Policies for histograms and other statistics
  • Sampling, re-sampling thresholds
  • Replacement versus adding, incremental
    maintenance
  • Relationship to materialized and indexed views
  • Feedback and learning
  • Recompilation policies and mechanisms
  • Plan fragments, e.g., cascading updates

8
Promising research topics
  • Resource management within plans
  • Mechanisms, e.g., dynamic bitmaps
  • Policies, e.g., LRU of waiting memory
  • Memory for bitmaps and exchange
  • Nested iteration
  • Nested iteration
  • Index navigation, index optimizations
  • Cost calculation of buffer effects
  • Caching of inner results, use of merged indexes
  • Dynamic query execution plans for nested
    iteration
  • Merging plan caching and data caching

9
Promising research topics
  • Soft indexes, self-tuning
  • Auto-grow, auto-shrink, auto-reorganization
  • Incremental materialization, control tables
  • Streams
  • Recovery log, replication log, error log
  • Audit log, event log, tracing
  • Performance counters, monitoring, notifications
  • Bulk import, bulk update, cascading change
  • Queues of user work and system work
  • Tables and indexes as halted streams
  • Streams as implementation basis for grid
    databases

10
Storage engine interactions
  • Merry-go-round (shared) scans
  • Plan choices by buffer pool contents
  • Plan choices by availability
  • Cost of isolation levels
  • Access or reconstruction of prior versions
  • Consistency among indexes
  • Clustered and non-clustered indexes
  • Multiple non-clustered indexes
  • Indexes on views
  • Correctness of optimization rules
  • Join elimination, view substitution

11
Optimizer quality assurance
  • Daily regression testing is the easy part
  • Primitives for cardinality estimation and cost
    calculation
  • Key issue how to test an AI system?
  • Also database tuning advisor
  • Graceful transition between alternatives
  • All queries and plans
  • Even after errors in cardinality estimation

12
Reducing complexity
  • Code volume and maintenance cost
  • Threshold of multiplying redundancy
  • Supportability and user education
  • Major costs for vendors and users
  • Focus on orders-of-magnitude and on factors
  • Avoid complexity for percentage improvements
  • Peak versus dependable performance
  • Cost containment versus risk management

13
Reducing code volume
  • B-trees for all storage
  • Clustered and non-clustered indexes
  • Indexes on hash values and Z-order
  • Temporary storage for sorting and cursors
  • Materialized and indexed views, no histograms
  • Large objects and cursor positioning
  • Large in-memory data structures?
  • One join but with graceful degradation
  • Fewer optimizer choices and fewer mistakes
  • Fewer transaction isolation levels
  • Serializability, traditional or using snapshots

14
Why are we failing to capture the other 85 of
data?
  • We have yet to master traditional databases
  • Zero administration, zero knobs, zero data loss
  • Self-tuning better than any DBA
  • 100 application availability upgrades,
    failures, tuning, redundancy, etc.
  • Language syntax and semantics, DDL and DML
  • The real answer may be interoperability
  • Complete storage unification may never happen
  • Sociological arguments for separate stores

15
Whats keeping me up at night
  • Am I doing enough to grow my engineers?
  • Am I doing enough technical research?
  • Who will build the first storage toolkit good
    enough for the other 85?
  • Simple enough to use and to maintain
  • Dependable in performance and availability
  • Extensible as a toolkit
  • Truly disrupting the traditional database market
Write a Comment
User Comments (0)
About PowerShow.com