Operator Scheduling in a Data Stream Manager

1
Operator Scheduling in a Data Stream Manager
  • D. Carney, U. Çetintemel, A. Rasin, S. Zdonik -
    Brown University
  • M.Cherniack - Brandeis University
  • M.Stonebraker - MIT
  • Proceedings of the 29th VLDB Conference, Berlin,
    Germany

Presenter: Sriram Krishnan
Date: 3/30/05
2
Agenda
  • Aurora DSMS Architecture
  • Scheduling Algorithms
  • Tuple Batching
  • Experimental Evaluation
  • QoS Aware Scheduling
  • Conclusion

3
Overview of Stream Processing
  • Many applications / devices create data streams
  • Examples: sensor networks, position tracking,
    network management, health monitoring, etc.
  • These applications require timely processing of
    a large number of continuous, potentially rapid
    and asynchronous data streams.

4
Aurora data stream manager
  • Addresses the performance and processing
    requirements of stream-based applications.
  • Supports multiple concurrent continuous queries
    on one or more application data streams
  • A continuous query consists of a directed acyclic
    graph of a well-defined set of operators (boxes
    in Aurora)
  • Applications define their service expectations
    using Quality-of-Service (QoS) specifications

5
Operator Scheduler
  • A key component of any data stream management
    system.
  • Multiplexes processor usage across multiple
    continuous queries according to
    application-specified QoS.
  • Simple processor allocation can be achieved by
    assigning a thread per operator.
  • Not good (why?)

6
Paper overview
  • This paper shows that having finer-grain control
    of processor allocation can make a significant
    difference to overall system performance.
  • The paper describes the design and implementation
    of the Aurora scheduler.

7
Motivation - Cost components of a continuous query
  • Random and round-robin scheduling.
  • Inference?
  • The actual time spent on processing is less
    than 5% of the overall execution time in both
    cases.

8
Aurora scheduler
  • Performs the following tasks:
  • Constructs a dynamic scheduling plan that
    specifies:
  • Which boxes to schedule
  • In which order to schedule the boxes
  • How many tuples to process at each box execution.
  • Schedules based on the QoS
  • Strives to maximize the overall QoS delivered to
    the client applications

9
Aurora System Model (High Level)
  • Fundamentally a data-flow system.
  • Tuples flow through a loop-free, directed graph
    of processing operations (a.k.a. boxes).

10
Aurora System Model
  • Tuples generated by data sources arrive at the
    input and are queued for processing.
  • The scheduler selects boxes with waiting tuples
    and executes them on one or more of their input
    tuples.
  • The output tuples of a box are queued at the
    input of the next box in sequence.
  • The QoS is specified primarily based on the
    notion of the latency (i.e., delay) of output
    tuples
  • Output tuples should be produced in a timely
    fashion, otherwise, QoS will degrade as latencies
    get longer.

11
Aurora Architecture
The box processor executes the appropriate
operation and then forwards the output tuples to
the router.
Question: Why should we monitor QoS?
  • Conceptually, the scheduler:
  • Picks a box for execution.
  • Ascertains how many tuples to process from its
    input.
  • Passes the information to the multi-threaded box
    processor.

12
Execution Model
  • Thread-based execution
  • Each operator/query is processed in its own
    thread
  • The operating system manages resource allocation
  • Advantages
  • Easy to program
  • Efficient operating system algorithms
  • Disadvantages
  • Overhead due to cache misses, lock contention and
    context switching.
  • Software has limited control of resource
    management.

13
Aurora - Execution Model
  • Aurora uses a state-based scheduling execution
    model.
  • There is a single scheduler thread that tracks
    system state and maintains the execution queue.
  • The execution queue is shared among a small
    number of worker threads
  • This model
  • Enables fine grained allocation of resources
    according to application specifications
  • Enables effective batching of operators and
    tuples (Why is this not possible with Thread
    based?).
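
A minimal sketch of the execution model described above: one scheduler fills a shared execution queue and a small pool of worker threads drains it. This is an illustration under simplifying assumptions, not Aurora's implementation: the scheduler is simply the calling thread, a plan entry is a (box function, tuple batch) pair, and run_plan is a hypothetical name.

```python
import queue
import threading


def run_plan(plan, num_workers=2):
    """plan: list of (box_function, input_tuples) entries chosen by the scheduler."""
    exec_queue = queue.Queue()          # execution queue shared by the workers
    outputs = []
    outputs_lock = threading.Lock()

    def worker():
        while True:
            entry = exec_queue.get()
            if entry is None:           # sentinel: no more work for this worker
                return
            box, tuples = entry
            produced = [box(t) for t in tuples]   # one box call, whole batch
            with outputs_lock:
                outputs.extend(produced)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for entry in plan:                  # the scheduler decides what runs, and when
        exec_queue.put(entry)
    for _ in workers:
        exec_queue.put(None)
    for w in workers:
        w.join()
    return outputs


result = run_plan([(lambda t: t * 2, [1, 2, 3]), (lambda t: t + 1, [10, 20])])
print(sorted(result))
```

Because the scheduler, not the operating system, decides which box runs next and with how many tuples, batching decisions can be made per queue entry. A thread-per-box design leaves that choice to the OS scheduler.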

14
Execution Model - Comparison
  • As system workload increases, performance
    degrades almost linearly in Aurora and
    exponentially in thread-per-box.
  • What does this mean?

15
Two-Level Scheduling
  • The first level decides which continuous
    (sub-)query to process.
  • Used for dynamically assigning priorities to
    operators
  • The second level decides how precisely the
    selected query should be processed.
  • Used for choosing the order in which the
    component operators will be executed.
  • The outcome of these decisions is a sequence of
    operators, referred to as a scheduling plan.

16
Sample Query Tree
  • The tree is rooted at box b1 (Aurora constraint)
  • We will refer to this tree in subsequent slides

17
Superbox - Operator Batching
  • A tree of boxes rooted at an output box
  • Sequence of boxes that is scheduled as an atomic
    group.
  • Superboxes decrease the overall execution costs
    and improve scalability
  • They reduce the scheduling overhead by scheduling
    multiple boxes as a single unit
  • They eliminate the need to access the storage
    manager for each individual box execution.

18
Scheduling
  • First-level scheduling - Superbox selection
  • Static and dynamic scheduling approaches
  • Static approaches to scheduling are defined prior
    to runtime.
  • Aurora implements static superbox selection:
    application-at-a-time, with one superbox per query.
  • Dynamic approaches use runtime information and
    statistics to adjust and prioritize scheduling
    order.
  • Second-level scheduling - Superbox traversal
  • Specifies the ordering of the boxes in the
    scheduling plan.
  • Accomplished by traversing the superbox.

19
Superbox Traversal
  • Superbox traversal refers to how the operators
    within a superbox should be executed
  • Three traversal Algorithms
  • Min-Cost (MC)
  • Min-Latency (ML)
  • Min-Memory (MM)

20
Superbox Traversal - Min Cost
  • Min-Cost (MC): Attempts to optimize throughput
    by minimizing the number of box calls per output
    tuple.
  • Accomplished by traversing the superbox in post
    order.
  • A box is scheduled for execution only after all
    the boxes in its sub-tree are scheduled.
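
A minimal sketch of the post-order rule above. The tree shape used here (b1 at the root, with b2 and b6 below it, b4 and b3 below b2, and b5 below b3) is an assumption inferred from the traversal orders quoted on the following slides, since the query tree figure itself is not in the transcript. The names CHILDREN and min_cost_plan are illustrative only.

```python
# Assumed query tree: each box maps to the boxes feeding it (its sub-tree).
CHILDREN = {
    "b1": ["b2", "b6"],
    "b2": ["b4", "b3"],
    "b3": ["b5"],
}


def min_cost_plan(box):
    """Return a post-order schedule: a box appears only after its whole sub-tree."""
    plan = []
    for child in CHILDREN.get(box, []):
        plan.extend(min_cost_plan(child))
    plan.append(box)
    return plan


mc_plan = min_cost_plan("b1")
print(" -> ".join(mc_plan))
```

With this assumed tree, each box is called exactly once, which is what keeps the number of box calls per output tuple low.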

21
Superbox Traversal - Min Cost (Contd.)
  • Assume each box has
  • A processing cost per tuple of p
  • A box call overhead of o
  • A selectivity equal to one (what is this?)
  • Exactly one non-empty input queue that contains a
    single tuple.
  • MC traversal executes each box only once
  • In which order are the boxes traversed?
  • b4 → b5 → b3 → b2 → b6 → b1
  • Execution cost - 15p + 6o (why?)
  • Average output tuple latency is 12.5p + 6o

22
Superbox Traversal - Min Latency
  • Min-Latency (ML): Average latency of the output
    tuples can be reduced by producing initial output
    tuples as fast as possible.
  • Defines a value called output cost for each box.
  • An estimate of the latency incurred in producing
    one output tuple.
  • Output Selectivity
  • How many tuples must be processed from the input
    to produce 1 tuple at the output.
  • Product of selectivity of all boxes downstream,
    including the current box.
  • Relation between output selectivity and Output
    cost?
  • Approximately inversely proportional (depends on
    the cost of boxes involved.)

23
Superbox Traversal - Min Latency
  • Traversal?
  • b1 → b2 → b1 → b6 → b1 → b4 → b2 → b1 → b3 → b2
    → b1 → b5 → b3 → b2 → b1
  • The ML traversal incurs nine extra box calls over
    an MC traversal
  • Note: MC incurred six box calls.
  • Total execution cost is 15p + 15o
  • Which one has the lower execution time, ML or MC?
  • MC always has a lower execution time.
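
The cost figures for the MC and ML traversals can be checked with a small simulation. This is a sketch under the slide's assumptions (selectivity one, one tuple initially queued at every box, per-tuple cost p, per-call overhead o). The PARENT map is inferred from the ML traversal paths rather than taken from the missing figure, and simulate is a hypothetical helper. Costs are tracked as (units of p, units of o) pairs.

```python
# Assumed downstream link of each box, inferred from the ML traversal paths.
PARENT = {"b2": "b1", "b6": "b1", "b4": "b2", "b3": "b2", "b5": "b3"}
BOXES = ["b1", "b2", "b3", "b4", "b5", "b6"]

MC = ["b4", "b5", "b3", "b2", "b6", "b1"]
ML = ["b1", "b2", "b1", "b6", "b1", "b4", "b2", "b1",
      "b3", "b2", "b1", "b5", "b3", "b2", "b1"]


def simulate(plan):
    """Return ((p_units, o_units) total cost, (p_units, o_units) average latency)."""
    queues = {box: 1 for box in BOXES}   # one tuple waiting at every box
    p_units = o_units = 0
    latencies = []                       # (p, o) clock when each output tuple appears
    for box in plan:
        o_units += 1                     # one box call overhead
        pending, queues[box] = queues[box], 0
        for _ in range(pending):         # the call drains the box's whole queue
            p_units += 1                 # selectivity 1: one tuple in, one tuple out
            if box == "b1":
                latencies.append((p_units, o_units))
            else:
                queues[PARENT[box]] += 1
    avg = (sum(l[0] for l in latencies) / len(latencies),
           sum(l[1] for l in latencies) / len(latencies))
    return (p_units, o_units), avg


mc_total, mc_avg = simulate(MC)
ml_total, ml_avg = simulate(ML)
print("MC:", mc_total, mc_avg)
print("ML:", ml_total, ml_avg)
```

Under these assumptions the ML plan pays nine more box call overheads than MC but delivers its first output after a single call, so its average latency works out to roughly 7.17p + 7.17o against 12.5p + 6o for MC.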

24
Superbox Traversal - Min Memory
  • Min-Memory (MM): Attempts to minimize memory
    usage
  • Schedules boxes in an order that yields the
    maximum increase in available memory
  • Defines an expected memory reduction rate (EMRR)
    for each box.
  • EMRR = function(current queue size, selectivity,
    cost)

25
Superbox Traversal - Min Memory
  • Assume the following box (selectivity, cost)
    values
  • b1 (0.9, 2), b2 (0.4, 2), b3 (0.5, 1),
    b4 (1.0, 2), b5 (0.4, 3), b6 (0.6, 1)
  • Assuming an initial queue size of 1
  • Computed EMRR values for the boxes are
  • b1 = 0.05, b2 = 0.3, b3 = 0.5, b4 = 0, b5 = 0.2,
    b6 = 0.4
  • What will be the scheduling plan?
  • b3 → b6 → b2 → b5 → b3 → b2 → b1 → b4 → b2 → b1
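
The EMRR values listed above are consistent with the formula queue_size * (1 - selectivity) / cost. That formula is an assumption chosen because it reproduces the slide's numbers for a queue size of 1; the slide itself only says EMRR is a function of queue size, selectivity and cost. A short sketch (emrr and BOXES are illustrative names):

```python
# (selectivity, cost) per box, as given on the slide.
BOXES = {
    "b1": (0.9, 2), "b2": (0.4, 2), "b3": (0.5, 1),
    "b4": (1.0, 2), "b5": (0.4, 3), "b6": (0.6, 1),
}


def emrr(selectivity, cost, queue_size=1):
    """Assumed form: memory freed per unit of processing time."""
    return queue_size * (1 - selectivity) / cost


rates = {name: emrr(sel, cost) for name, (sel, cost) in BOXES.items()}
ranked = sorted(rates, key=rates.get, reverse=True)

for name in ranked:
    print(name, round(rates[name], 2))
```

The resulting ranking starts b3, b6, b2, b5, which matches the first four steps of the scheduling plan on the slide. The later steps depend on how queue sizes change as boxes execute, which this sketch does not model.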

26
Tuple Batching (Train Processing)
  • A Tuple Train is the process of executing tuples
    in a batch within a single operator call.
  • The goal of Tuple Train processing is to reduce
    overall processing cost. How?
  • Decreased number of total box calls.
  • Cuts down on low level overhead such as context
    switching, scheduling, and execution queue
    maintenance
  • Improves memory utilization (low memory)
  • Reduces the shuttling of tuples back and forth
    between memory and disk.
  • Some operators execute faster with a larger
    number of tuples available in their queues.
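
The effect of train size on box call overhead can be sketched with the same p and o cost model used for the traversal slides. This is a simplification that assumes a fixed overhead o per box call and a fixed cost p per tuple, and ignores the memory and disk effects listed above; train_cost is a hypothetical helper.

```python
import math


def train_cost(num_tuples, train_size, p, o):
    """Total cost of pushing num_tuples through one box in trains of train_size."""
    box_calls = math.ceil(num_tuples / train_size)
    return num_tuples * p + box_calls * o


# 100 tuples, p = 1.0 per tuple, o = 2.0 per box call (illustrative numbers).
tuple_at_a_time = train_cost(100, 1, p=1.0, o=2.0)
trains_of_25 = train_cost(100, 25, p=1.0, o=2.0)
print(tuple_at_a_time, trains_of_25)
```

The processing term stays the same in both cases; only the number of box calls, and therefore the overhead term, shrinks as the train grows.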

27
Tuple Batching
  • The Aurora scheduler implements train processing
    by telling each box when to execute and how many
    queued tuples to process.
  • Aurora allows an arbitrary number of tuples to be
    contained within a train.
  • What variables dictate the size of a train?
  • Variance in latencies
  • Total memory footprint

28
Operator Batching Evaluation
Capacity: percentage of system resources used.
  • RR_BAAT - Round Robin - Box At A Time.
  • MC_AAAT - Minimum Cost - Application At A Time.
  • What can we infer?
  • The scheduling overhead of the box-at-a-time
    approach is very evident.

29
Latency - Min-Cost vs. Min-Latency
  • What can we infer?
  • For larger processing costs, ML wins as it
    optimizes the traversal by minimizing output
    latency.
  • For smaller box processing costs, box call
    overheads dominate overall costs and MC wins.

30
Memory Requirements - Evaluation
The curves are normalized with respect to the MM
values.
  • Inference?
  • ML is the least efficient in its use of memory,
    with MC second.
  • Crossover towards the end of the time period is a
    consequence of the fact that different traversals
    take different times to finish.

31
Tuple Batching - Evaluation
Train size (x-axis) is given as a percentage of
the queue size.
Overhead: total execution time less processing
time.
In order to isolate the effects of operator
scheduling, round-robin BAAT was used for this
experiment.
  • Inference?
  • For a burst size of 4, the overhead quadruples.
  • When the train size is equal to one (the entire
    queue), the average overhead approaches the
    overhead for the non bursty case.

32
Comparison of Execution times
  • TAAT (tuple-at-a-time)
  • BAAT (tuple trains)
  • MC (Superbox)
  • Number at the top shows actual time for
    processing 100k tuples in the system.
  • TAAT is significantly worse than the other
    methods.
  • Superbox scheduling decreases the overall
    execution time of the system running tuple trains
    by almost 50%
  • As we go from left to right, the scheduler
    algorithms become increasingly more intelligent
    and sophisticated, taking more time to generate
    the scheduling plans.

33
QoS-Driven Scheduling
  • Keep track of the latency of tuples that reside
    at the queues.
  • Pick the tuples whose execution will provide the
    greatest expected increase in the aggregate QoS.
  • This approach is not scalable (Why?)
  • Tuple batching will be difficult
  • High scheduling overhead.
  • Aurora Scheduler maintains latency information at
    the granularity of individual boxes
  • The latency of a box is the average latency of
    the tuples in its input queue.

34
QoS-Driven Scheduling - Algorithm
Expected output latency: eol(b) = latency(b) +
cost(D(b))
Utility: utility(b) = gradient(eol(b))
Expected slack time: est(b) is an indication of
how close a box is to a critical point.
Critical point: a point where the QoS changes
sharply.
  • Priority is assigned so as to order the boxes in
    terms of their utility and urgency.
  • Utility - This value is an estimation of where a
    box's tuples currently are on the QoS-latency
    curve at the corresponding output. When is the
    utility lowest and highest?
  • Urgency - Given by the expected slack time.

35
QoS-Driven Scheduling
  • Scheduler algorithm
  • First choose for execution those boxes that have
    the highest utility,
  • Choose from among those that have the same
    utility, the ones that have the minimum slack
    time.
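
The two-step ordering above (highest utility first, minimum slack as tie-breaker) can be sketched as follows. Several details are assumptions made only for illustration: the QoS-latency curve is piecewise linear, utility is the magnitude of its slope at the box's expected output latency, slack is the distance to the next breakpoint of the curve, and D(b) is read as the boxes downstream of b. The Box class, the curve and all numbers are hypothetical.

```python
from dataclasses import dataclass

# Assumed QoS-latency curve as (latency, QoS) breakpoints: flat, then falling.
QOS_CURVE = [(0.0, 1.0), (10.0, 1.0), (20.0, 0.0)]


@dataclass
class Box:
    name: str
    latency: float          # average latency of tuples in the input queue
    downstream_cost: float  # cost(D(b)): assumed cost of the downstream boxes


def utility_and_slack(box, curve=QOS_CURVE):
    """Return (|slope at eol|, distance from eol to the next breakpoint)."""
    eol = box.latency + box.downstream_cost
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= eol < x1:
            return abs((y1 - y0) / (x1 - x0)), x1 - eol
    return 0.0, float("inf")   # past the last breakpoint: nothing left to lose


def prioritize(boxes):
    """Highest utility first; among equal utilities, minimum slack first."""
    def key(box):
        utility, slack = utility_and_slack(box)
        return (-utility, slack)
    return sorted(boxes, key=key)


boxes = [Box("A", 2.0, 3.0), Box("B", 8.0, 4.0), Box("C", 14.0, 4.0)]
plan = [box.name for box in prioritize(boxes)]
print(plan)
```

In this example boxes B and C sit on the falling part of the curve and share the same utility, so C, which is closer to the next breakpoint, is scheduled first; A sits on the flat part and goes last.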

36
QoS-Driven Scheduling - Evaluation
  • Inference?

37
CONCLUSION
  • Presents an experimental investigation of
    scheduling algorithms for stream data management
    systems.
  • The authors
  • showed that a naïve scheduling approach of using
    a thread per box does not scale.
  • showed that train scheduling and superbox
    scheduling substantially reduce system
    overheads.
  • addressed QoS issues and extended their basic
    algorithms to address application-specific QoS
    expectations.