Query Processing for SQL Updates - PowerPoint PPT Presentation

About This Presentation
Title:

Query Processing for SQL Updates

Description:

Store locator of record corresp to primary access path ... locator needed for secondary indices ... sort split delta stream by (locator, action) ... – PowerPoint PPT presentation

Number of Views:38
Avg rating:3.0/5.0
Slides: 17
Provided by: SSuda7
Category:

less

Transcript and Presenter's Notes

Title: Query Processing for SQL Updates


1
Query Processing for SQL Updates
  • Galindo-Legaria, Stefani and Waas (Microsoft
    Corp)
  • Talk for CS 632 by S. Sudarshan

2
Issues in Efficient Processing of Updates
  • Checking constraints
  • primary/unique key
  • check constraints
  • referential integrity, esp. with cascading
    actions
  • Updating primary and secondary indices
  • Bulk updates vs. individual updates
  • Not in this paper
  • Write optimized index structures
  • Materialized view maintenance

3
Modeling Updates
  • Delta stream
  • StreamUpdate operator
  • issues updates to the storage engine
  • multiple of these can be cascaded or run serially
    for a single update

4
General form of update Plan
  • Read-query provides
  • for insert/update values to be inserted/new
    values for update
  • for delete/update locators of tuples to be
    deleted updated

5
Index Maintenance
  • Primary access path
  • B-tree or heap
  • locator uniquely identify row, and allow
    efficient retrieval
  • For B-tree index attributes uniquifier
  • For heap physical locator RID
  • Secondary indices (non-clustered)
  • Store locator of record corresp to primary access
    path
  • Optionally included columns at leaf level
  • (a.k.a. covering indices)
  • Also
  • computed columns
  • unique declaration

6
Index Maintenance
  • Primary access path updated first
  • locator needed for secondary indices
  • old values needed for secondary index update,
    obtained from primary access path
  • copied out into delta table (pre-copy)
  • Per-row vs Per-Index update plans
  • Per-row update all indices corresp. to a row,
    before processing next row
  • Per-index use all rows to update an index, then
    move to next index
  • Allows sorting of delta table, to reduce access
    costs
  • but requires spooling of delta table

7
Per-Index Maintenance
  • A single update affects two areas of an index
  • messes up access order even if delta table is
    sorted
  • idea create two records, one for delete action,
    one for insert action
  • called split delta
  • sort split delta stream by (locator, action)
  • delete sorts first to avoid spurious uniqueness
    violations

8
Split Delta
9
Choosing an Update Plan
  • Per-index plan can be more efficient for a large
    batch
  • Sorted order, piggy back other steps such as
    referential integrity checks
  • But may be less efficient in other cases due to
  • overhead of more operators for per-index plan
  • spooling of delta
  • Wide vs stacked plans
  • i.e. separate deltas for each secondary index vs.
    shared delta pipelined through many operators
  • wide seems better since it can prune unnecessary
    columns
  • Cache effects when you have a choice of index
    for selection, may be better to use one that is
    going to be updated

10
Checking Single Table Constraints
  • Only those that may be affected by update are
    checked
  • exception/rollback on failure

11
Referential Integrity Constraints
  • Updates/deletes on referenced side
  • Inserts/updates on referencing side
  • Use outerjoin of delta rows with other table
  • ensure match/lack of match depending on updated
    side being referencing/referenced
  • Checks performed after updating the table
  • But for updates to referenced table, old values
    must be used
  • May require index on attributes of referencing
    table
  • index on referenced table already present
    normally

12
Checking Referential Integrity
13
Referential Integrity (Cont.)
  • When checking self-referential integrity
    constraints, perform all updates done by a DML
    statement, only then check
  • Why?
  • If there is an index on f.k. attrs, integrity
    check can be combined with index maintenance,
    since both need same sort order
  • place ref integrity check operator just above
    index maintenance
  • Another optimization eliminate duplicates

14
Cascading Updates
  • E.g. S has f.k. referencing R, on update cascade
  • Update to p.k. of tuple in R cascades to S
  • Similar to f.k. checking, but instead of
    validating, update referencing relation(s)
  • to do so, create delta table, and use update plan
    to process delta table

15
Cascading Updates
16
Issues not addressed by paper
  • All optimizations are for a single transaction
  • What if there is a sequence of small
    transactions?
  • Insert/update/delete to large B-tree may require
    1 I/O per update to leaf page
  • even assuming internal pages are in memory
  • Solution write-optimized B-trees
  • collect deltas and apply as a batch
  • but ensure queries are correctly answered
    meanwhile, even if more slowly
  • Several schemes including work at IITB in 1997,
    and a nice survey paper by Graefe with some neat
    implementation ideas
Write a Comment
User Comments (0)
About PowerShow.com