Eddies: Continuously Adaptive Query Processing - PowerPoint PPT Presentation

Provided by: rossro
Learn more at: https://www.cse.psu.edu
Transcript and Presenter's Notes

Title: Eddies: Continuously Adaptive Query Processing


1
Eddies: Continuously Adaptive Query Processing
  • Ross Rosemark

2
Goals of Presentation
  • What are traditional query optimizers?
  • What are the problems with these optimizers?
  • What is an eddy?
  • How does an eddy work?
  • How does an eddy self-tune query plans?

3
Traditional Query Optimizer
  • Metadata (statistics) about the distribution of
    data is collected by the query optimizer
  • Based on the metadata the query optimizer
    determines the most energy efficient query plan
  • The query plan is executed.
  • This works well for snapshot queries
    (short-lived queries)

4
Problems
  • The problem with this approach is that data has
    become streaming (the Internet, sensor nodes,
    etc.)
  • The query plan chosen by the query optimizer
    could eventually become inefficient if the
    statistics it was based on change
  • The cost of operators could change
  • Operator selectivities could change
  • The rate at which tuples arrive from inputs
    could change
  • It is also difficult to simply re-optimize a
    query
  • Determining when to re-optimize the query is a
    difficult research issue that has not been
    addressed
  • You must determine not only when to re-optimize,
    but also that the re-optimized plan still
    produces the same state

5
Example
  • Assume a query that asks for employees' age and
    salary where salary > 100000 is issued into the
    sensor network.
  • Initially the predicate salary > 100000 will be
    very selective, eliminating many tuples
  • As older employees in the database start to be
    read, the predicate salary > 100000 will become
    less selective
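
The drift described above can be sketched in a few lines. This is only an illustration: the employee records, the salary figures, and the helper name pass_fraction are made up, and the stream is assumed to arrive youngest first.

```python
# Hypothetical employee stream: (age, salary) pairs arriving youngest first.
# The figures are invented purely for illustration.
employees = [(22, 40_000), (25, 55_000), (29, 70_000), (33, 90_000),
             (41, 120_000), (48, 150_000), (55, 180_000), (60, 210_000)]

def pass_fraction(tuples, threshold=100_000):
    """Fraction of tuples that satisfy salary > threshold."""
    passed = sum(1 for _, salary in tuples if salary > threshold)
    return passed / len(tuples)

first_half = employees[:4]    # younger employees, read first
second_half = employees[4:]   # older employees, read later

early = pass_fraction(first_half)
late = pass_fraction(second_half)
print(f"early pass fraction: {early:.2f}, late pass fraction: {late:.2f}")
```

A plan that runs this predicate first looks good on the early tuples, where almost nothing passes, and poor on the later ones, where almost everything passes.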

6
Things an Eddy Must Consider
  • An eddy should
  • Minimize synchronization barriers
  • Increase the moments of symmetry

7
Synchronization Barriers
  • A synchronization barrier is a point where one
    operation hinders the speed of another
    operation
  • Extreme example
  • Assume you have a merge join on two
    duplicate-free inputs (slowlo and fasthi)
  • Assume that during processing the next tuple is
    always consumed from the relation whose last
    tuple had the lowest value
  • Assume that slowlo is a slowly delivered external
    relation with many low values in its join column
  • Assume that fasthi is high bandwidth (i.e. it
    delivers tuples fast) and has only high values in
    its join column
  • In this example fasthi is delayed while slowlo
    delivers tuples
  • This is known as a synchronization barrier
  • It is desirable to minimize the number of
    synchronization barriers
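
The slowlo/fasthi example can be sketched as a toy simulation. This is a minimal sketch, not the merge join from the paper: the function name merge_consumption_order and the sample values are assumptions, and delivery speed is not modeled, only the order in which inputs are consumed.

```python
def merge_consumption_order(slowlo, fasthi):
    """Return which input a merge join advances at each step.

    Both inputs are sorted and duplicate-free. At each step the join
    advances the input whose current value is lowest; on a match it
    advances both.
    """
    order = []
    i = j = 0
    while i < len(slowlo) and j < len(fasthi):
        if slowlo[i] < fasthi[j]:
            order.append("slowlo")
            i += 1
        elif slowlo[i] > fasthi[j]:
            order.append("fasthi")
            j += 1
        else:
            order.append("both")
            i += 1
            j += 1
    return order

# slowlo holds only low values, fasthi only high values (made-up data).
order = merge_consumption_order([1, 2, 3, 4, 5], [100, 101, 102])
print(order)
```

Every step reads from slowlo, so fasthi never advances past its first tuple no matter how fast it could deliver. The join runs at the pace of the slow input.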

8
Moments of Symmetry
  • You can only re-optimize at a moment of symmetry
  • A moment of symmetry is a point in execution at
    which the optimizer can change the query plan
    without affecting how the query plan's
    predicates are evaluated
  • Typically happens in joins
  • Example
  • Assume you have a nested loop join with inner
    relation R and outer relation S.
  • In this example you can only re-optimize the
    join at the end of a complete scan of R.
  • It is possible to re-optimize in the middle of
    a scan, but the join algorithm would then have
    to be changed

9
Eddy
  • An eddy is designed to dynamically re-optimize
    queries while they run.
  • The authors implemented eddies in River
  • River is a shared-nothing parallel query
    processing framework that dynamically adapts to
    fluctuations in performance and workload
  • An eddy is implemented as a module in a river
    containing an arbitrary number of input
    relations, a number of participating unary and
    binary modules, and a single output relation
  • An eddy also maintains a fixed-size buffer of
    tuples that need to be processed
  • Each operator takes tuples from the eddy on one
    or two inputs, processes them, and returns its
    output tuples to the eddy

10
Eddy (Cont)
  • A tuple in an eddy is associated with a tuple
    descriptor
  • The descriptor contains a vector of Ready bits
    and a vector of Done bits
  • The eddy ships the tuple only to operators whose
    Ready bit is turned on
  • After an operator has processed the tuple, its
    Done bit is set
  • If all Done bits are set, the tuple is sent to
    the eddy's output
  • Otherwise it is sent to another operator
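
A tuple descriptor along these lines could look like the sketch below. The class name TupleDescriptor and its methods are hypothetical, and the sketch assumes one Ready bit and one Done bit per operator; how Ready bits get updated as a tuple moves through operators is left out.

```python
from dataclasses import dataclass

@dataclass
class TupleDescriptor:
    ready: list  # ready[i] is True if operator i may process the tuple
    done: list   # done[i] is True once operator i has processed the tuple

    def eligible(self):
        """Operators that are ready and have not yet processed the tuple."""
        return [i for i, (r, d) in enumerate(zip(self.ready, self.done))
                if r and not d]

    def mark_done(self, op):
        """Record that operator `op` has processed the tuple."""
        self.done[op] = True

    def finished(self):
        """True when every Done bit is set: the tuple can be output."""
        return all(self.done)

# A tuple that must pass through two operators, both ready from the start.
desc = TupleDescriptor(ready=[True, True], done=[False, False])
first_choices = desc.eligible()    # either operator may go first
desc.mark_done(0)
second_choices = desc.eligible()   # only operator 1 remains
desc.mark_done(1)
print(first_choices, second_choices, desc.finished())
```

Because both operators start out eligible, the eddy is free to pick either order for each tuple, which is what makes per-tuple routing possible.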

11
Eddy (Cont)
  • So how do you route tuples to the different
    operators in an eddy?
  • The paper looks at several routing schemes

12
Naïve Scheme
  • The eddy's buffer is implemented as a priority
    queue
  • Recall the buffer is used to store tuples that
    still need to be processed by the eddy's
    operators
  • When a tuple enters the buffer its priority is
    set to low
  • After it is processed by an operator in the eddy
    and returned to the buffer its priority is set
    to high
  • This ensures that tuples do not get stuck in the
    eddy (i.e. it prevents starvation)
  • This scheme dynamically adjusts to the work
    required by operators
  • Slower operators (e.g. 4 seconds per tuple vs.
    1 second) will receive fewer tuples
  • Note each operator has a fixed-size queue
  • Once the queue is full no more tuples can be
    inserted into it
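
The two mechanisms on this slide, the priority buffer and the bounded operator queues, can be sketched with Python's standard heapq and queue modules. The class name EddyBuffer, the method names, and the queue size of 2 are assumptions made for the illustration.

```python
import heapq
import queue
from itertools import count

HIGH, LOW = 0, 1  # heapq pops the smallest value first, so HIGH wins

class EddyBuffer:
    """Priority buffer: returned tuples outrank newly arrived ones."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker: FIFO order within a priority

    def add_new(self, tup):
        heapq.heappush(self._heap, (LOW, next(self._seq), tup))

    def add_returned(self, tup):
        heapq.heappush(self._heap, (HIGH, next(self._seq), tup))

    def pop(self):
        return heapq.heappop(self._heap)[2]

buf = EddyBuffer()
buf.add_new("a")
buf.add_new("b")
buf.add_returned("c")              # came back from an operator
drained = [buf.pop(), buf.pop(), buf.pop()]

# Back-pressure: a slow operator's bounded input queue fills up,
# so the eddy cannot hand it more tuples until it drains some.
op_queue = queue.Queue(maxsize=2)
op_queue.put_nowait("t1")
op_queue.put_nowait("t2")
try:
    op_queue.put_nowait("t3")
    rejected = False
except queue.Full:
    rejected = True
print(drained, rejected)
```

The returned tuple "c" leaves the buffer before the older new arrivals, and the full operator queue refuses the third tuple. That refusal is what steers tuples toward faster operators.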

13
Lottery Scheme
  • Each time a tuple is routed to an operator the
    operator is credited with a ticket
  • When the operator returns a tuple to the eddy,
    one ticket is debited from the eddy's running
    count for that operator
  • This tracks how efficiently an operator drains
    tuples from the system
  • When a tuple is to be routed to an operator the
    eddy holds a lottery
  • Only the operators that have their Ready bit set
    can participate in the lottery
  • An operator's chance of winning the lottery and
    receiving the tuple corresponds to the count of
    tickets for that operator
  • This dynamically adjusts to the selectivity of
    operators
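
The ticket bookkeeping and the lottery can be sketched as follows. This is a rough illustration rather than the paper's implementation: the class LotteryRouter, the operator names, and in particular the "+1" added to each weight (so an operator with zero tickets can still win) are assumptions of this sketch.

```python
import random

class LotteryRouter:
    def __init__(self, operator_names, seed=None):
        self.tickets = {name: 0 for name in operator_names}
        self._rng = random.Random(seed)

    def on_route(self, op):
        """A tuple was given to `op`: credit one ticket."""
        self.tickets[op] += 1

    def on_return(self, op):
        """`op` returned a tuple to the eddy: debit one ticket."""
        self.tickets[op] -= 1

    def pick(self, ready_ops):
        """Hold a lottery among operators whose Ready bit is set."""
        # Assumption: +1 keeps zero-ticket operators in the draw.
        weights = [max(self.tickets[op], 0) + 1 for op in ready_ops]
        return self._rng.choices(ready_ops, weights=weights, k=1)[0]

router = LotteryRouter(["sel_a", "sel_b"], seed=0)
for _ in range(10):
    router.on_route("sel_a")
    router.on_route("sel_b")
router.on_return("sel_a")          # sel_a drops 9 of its 10 tuples
for _ in range(9):
    router.on_return("sel_b")      # sel_b passes 9 of its 10 tuples

wins = {"sel_a": 0, "sel_b": 0}
for _ in range(1000):
    wins[router.pick(["sel_a", "sel_b"])] += 1
print(router.tickets, wins)
```

The operator that eliminates more tuples keeps more tickets and so wins the lottery more often, which pushes tuples to the more selective operator first.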

14
Window Scheme
  • The problem with the lottery scheme is that it
    uses too much past information
  • Example: an operator that gained many tickets
    initially but then became slow
  • In this scheme the lottery is modified so that
    it only looks at tickets gained by an operator
    in a fixed window.
  • The eddy keeps track of two types of tickets
  • Banked tickets: used when running the lottery
  • Escrow tickets: used to measure efficiency
    during the window
  • At the end of a window
  • Banked tickets = escrow tickets
  • Escrow tickets = 0
  • This ensures operators re-prove themselves each
    window
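
The banked/escrow bookkeeping can be sketched per operator as below. The class name WindowedTickets and its methods are hypothetical, and the sketch assumes that credits and debits inside a window go to the escrow count while the lottery reads only the banked count.

```python
class WindowedTickets:
    """Ticket counts for one operator under the window scheme."""

    def __init__(self):
        self.banked = 0   # used when running the lottery
        self.escrow = 0   # measures efficiency during the current window

    def on_route(self):
        self.escrow += 1

    def on_return(self):
        self.escrow -= 1

    def lottery_weight(self):
        return self.banked

    def end_window(self):
        # banked = escrow, then escrow = 0
        self.banked = self.escrow
        self.escrow = 0

op = WindowedTickets()

# Window 1: the operator receives 5 tuples and returns only 1.
for _ in range(5):
    op.on_route()
op.on_return()
op.end_window()
weight_after_w1 = op.lottery_weight()

# Window 2: the operator receives 5 tuples and returns all 5.
for _ in range(5):
    op.on_route()
    op.on_return()
op.end_window()
weight_after_w2 = op.lottery_weight()
print(weight_after_w1, weight_after_w2)
```

The tickets earned in the first window do not carry over: once the operator stops eliminating tuples, its lottery weight falls back to zero at the next window boundary.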

15
Comparison
  • Shows that due to fluid dynamics (i.e. the
    varying rates of operators) the naïve approach
    naturally adjusts based on the cost of operators
  • Shows that the lottery scheme also adjusts based
    on operator cost

16
Comparison
  • The naïve eddy does not adjust based on
    selectivity
  • Naïve performs between the best and worst
  • Lottery does adjust based on selectivity

17
Joins
  • Shows that Lottery performs nearly optimally
  • Naïve performs between the best and worst

18
Joins (Cont)
  • All joins are ripple joins
  • Change selectivity of join predicate
  • In all cases the Eddy with the Lottery is close
    to optimal

19
Window Scheme
  • Both cases are related to different query plans
  • Shows eddy is in between the best and worst case

20
Adaptability
  • Toggle the cost of the operator 3 times
    throughout experiment
  • Notice that Eddy switches how many tuples are
    first processed by that operator

21
Problems
  • The eddy does not work well when there is
    initially a long delay at an operator
  • The eddy gives all tuples to that operator
    because it is not yet returning tuples that
    satisfy the join predicate
  • No tuples are returned until after the delay
  • Notice that the eddy eventually figures this
    problem out
  • It just may take some time