Flux - PowerPoint PPT Presentation

About This Presentation
Title:

Flux

Description:

Flux: An Adaptive Partitioning Operator for Continuous Query Systems M.A. Shah, J.M. Hellerstein, S. Chandrasekaran, M.J. Franklin UC Berkeley Presenter: Bradley ...

Slides: 32
Provided by: wpi48
Learn more at: http://web.cs.wpi.edu

Transcript and Presenter's Notes


1
Flux
  • Flux: An Adaptive Partitioning Operator for
    Continuous Query Systems
  • M.A. Shah, J.M. Hellerstein, S. Chandrasekaran,
    M.J. Franklin
  • UC Berkeley
  • Presenter: Bradley Momberger

2
Overview
  • Introduction
  • Background
  • Experiments and Considerations
  • Conclusion

3
Introduction
  • Continuous query (CQ) systems
    • Produce unbounded, streaming results from
      unbounded, streaming data sources.
    • May face scalability problems in the long run,
      due to the need for fast response times, the
      possibility of large numbers of users, and the
      management of potentially large histories.
    • Are only as fast as their constituent operators
      allow.

4
Parallelism
  • Traditional parallelism techniques
    • Poor fit for CQ systems
    • Not adaptive
  • CQ requires adaptability to changing conditions

5
Overview
  • Introduction
  • Background
  • Experiments and Considerations
  • Conclusion

6
Background
  • Exchange
    • Producer-consumer pair
      • Ex-Prod: intermediate producer instance
        connected to the consumers
      • Ex-Cons: intermediate consumer instance that
        polls inputs from all producers
    • Content-sensitive routing
  • RiverDQ
    • Content-insensitive routing
    • Random choice of Ex-Cons target
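The difference between the two routing styles can be sketched in a few lines of Python. This is an illustrative sketch, not code from Exchange, River, or Flux: the function names, the use of CRC-32 as the partitioning hash, and the uniform random pick for RiverDQ are all assumptions.

```python
import random
import zlib


def route_content_sensitive(key: str, num_consumers: int) -> int:
    """Exchange-style routing (hypothetical helper): hash the key so equal
    keys always reach the same Ex-Cons, which is needed when consumers keep
    per-key state."""
    # zlib.crc32 is deterministic across runs, unlike Python's salted hash().
    return zlib.crc32(key.encode("utf-8")) % num_consumers


def route_content_insensitive(num_consumers: int, rng: random.Random) -> int:
    """RiverDQ-style routing as summarized on the slide (hypothetical helper):
    any Ex-Cons may take any tuple, so the target is chosen at random."""
    return rng.randrange(num_consumers)


if __name__ == "__main__":
    rng = random.Random(42)
    for key in ["alice", "bob", "alice"]:
        # "alice" maps to the same consumer both times; the random target need not.
        print(key, route_content_sensitive(key, 4),
              route_content_insensitive(4, rng))
```

Content-sensitive routing is what lets a consumer hold per-key state such as an aggregate; the cost is that a skewed key distribution overloads whichever Ex-Cons owns the hot keys.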

7
Flux
  • Flux: Fault-tolerant, Load-balancing eXchange
    • Load balancing through active repartitioning
    • Producer-consumer pair
    • Buffering and reordering
    • Detection of imbalances

8
Short-Term Imbalances
  • A stage runs only as fast as its slowest Ex-Cons
    • Head-of-line blocking
    • Uneven distribution over time
  • The Flux-Prod solution
    • Transient skew buffer
      • Hash table buffer between the producer and
        Flux-Prod
      • Fetches new tuples for each Flux-Cons as buffer
        space becomes available
    • On-demand input reordering
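A minimal Python sketch of a transient skew buffer with on-demand reordering follows. It assumes one bounded buffer per Flux-Prod, holding one queue per destination consumer; the class and method names (`SkewBuffer`, `offer`, `send_ready`) are hypothetical, and the real operator's buffer management may differ.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple


class SkewBuffer:
    """Hypothetical sketch of a transient skew buffer in front of Flux-Prod.

    Buffered tuples live in a hash table keyed by destination consumer, so
    a stalled consumer only backs up its own queue instead of blocking
    tuples bound for the other consumers (head-of-line blocking).
    """

    def __init__(self, num_consumers: int, capacity: int,
                 route: Callable[[Any], int]) -> None:
        self.queues: Dict[int, Deque[Any]] = {
            c: deque() for c in range(num_consumers)}
        self.capacity = capacity  # total tuples the buffer may hold
        self.size = 0
        self.route = route        # tuple -> destination consumer id

    def offer(self, tup: Any) -> bool:
        """Accept a tuple from the upstream producer if space remains."""
        if self.size >= self.capacity:
            return False          # buffer full: the producer must wait
        self.queues[self.route(tup)].append(tup)
        self.size += 1
        return True

    def send_ready(self, ready: List[int]) -> List[Tuple[int, Any]]:
        """On-demand reordering: emit one tuple to each consumer that can
        accept input now, regardless of overall arrival order."""
        sent = []
        for c in ready:
            if self.queues[c]:
                sent.append((c, self.queues[c].popleft()))
                self.size -= 1
        return sent


if __name__ == "__main__":
    buf = SkewBuffer(num_consumers=2, capacity=4, route=lambda t: t % 2)
    for t in [1, 3, 5, 2]:    # three tuples for consumer 1, then one for 0
        buf.offer(t)
    # Consumer 1 is stalled; consumer 0 still gets tuple 2 immediately.
    print(buf.send_ready([0]))  # prints [(0, 2)]
```

Because the buffer is indexed by destination, the tuple for consumer 0 overtakes the earlier tuples queued for the stalled consumer 1. Once the shared capacity is exhausted the producer blocks again, which is why a buffer like this only absorbs short-term skew.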

9
Flux-Prod Design
10
Long-Term Imbalances
  • Eventually overload fixed-size buffers
    • The short-term strategy cannot be reused
  • The Flux-Cons solution
    • Repartition at the consumer level
    • Move state between consumers
    • Aim for maximal benefit per state moved
    • Avoid thrashing
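One way to read "maximal benefit per state moved" and "avoid thrashing" is a greedy step that moves a single partition from the most-loaded to the least-loaded consumer, and only when the imbalance exceeds a threshold. The Python sketch below implements that reading; the function name, the load metric, and the threshold rule are assumptions, not the paper's exact policy.

```python
from typing import Dict, List, Optional, Tuple


def plan_move(machines: List[str],
              assignment: Dict[str, str],
              load: Dict[str, float],
              threshold: float) -> Optional[Tuple[str, str, str]]:
    """One greedy repartitioning step (hypothetical policy).

    machines:   all Flux-Cons sites, including ones with no partitions
    assignment: partition -> machine currently holding its state
    load:       partition -> measured load over the last interval
    threshold:  smallest imbalance worth correcting; damps thrashing
    Returns (partition, source, destination), or None if no move helps.
    """
    totals = {m: 0.0 for m in machines}
    for part, m in assignment.items():
        totals[m] += load[part]
    src = max(totals, key=totals.get)
    dst = min(totals, key=totals.get)
    gap = totals[src] - totals[dst]
    if gap <= threshold:
        return None               # imbalance too small to justify a move
    # Moving a partition with load l turns the gap into |gap - 2l|;
    # pick the single partition that shrinks the gap the most.
    best, best_gap = None, gap
    for part, m in assignment.items():
        if m == src:
            new_gap = abs(gap - 2 * load[part])
            if new_gap < best_gap:
                best, best_gap = part, new_gap
    return None if best is None else (best, src, dst)


if __name__ == "__main__":
    assignment = {"p1": "A", "p2": "A", "p3": "A", "p4": "B"}
    load = {"p1": 5.0, "p2": 3.0, "p3": 1.0, "p4": 1.0}
    print(plan_move(["A", "B"], assignment, load, threshold=2.0))
```

In the example, moving `p1` (load 5) cuts the gap between A (total 9) and B (total 1) from 8 to 2. Raising `threshold` above the current gap makes the function return `None`, which acts as the anti-thrashing damper.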

11
Flux-Cons Design
12
Memory-Constrained Environment
  • First tests were done with adequate memory
    • This does not necessarily reflect reality
  • Memory shortages
    • Large histories
    • Extra operators
  • Load shedding with little memory
    • Push state to disk
    • Move state to another site
    • Decrease history size
      • May not be acceptable in some applications

13
Flux and Constrained Memory
  • Dual-destination repartitioning
    • Other machines
    • Disk storage
  • Local mechanism
    • Flux-Cons spills to disk when memory is low
    • Retrieves from disk when memory becomes
      available
  • Global memory-constrained repartitioning
    • Poll Flux-Cons operators for memory usage
    • Repartition based on the results
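The dual-destination idea can be sketched as a planner that first tries to place a partition on a peer with free memory and falls back to a local spill. In this hypothetical Python sketch, the `DISK` marker, the largest-first eviction order, and the most-free-memory placement rule are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

DISK = "disk"  # hypothetical marker for the local spill destination


def plan_memory_moves(assignment: Dict[str, str],
                      sizes: Dict[str, int],
                      capacity: Dict[str, int]) -> List[Tuple[str, str, str]]:
    """Hypothetical dual-destination plan: relieve each over-full machine by
    moving partitions to the peer with the most free memory (global
    policy), and spill to local disk when no peer has room."""
    used = {m: 0 for m in capacity}   # memory usage polled from each Flux-Cons
    for part, m in assignment.items():
        used[m] += sizes[part]
    moves = []
    for src in capacity:
        # Evict the largest partitions first so fewer states have to move.
        local = sorted((p for p, m in assignment.items() if m == src),
                       key=lambda p: sizes[p], reverse=True)
        for part in local:
            if used[src] <= capacity[src]:
                break                 # this machine now fits in memory
            dst = max(capacity, key=lambda m: capacity[m] - used[m])
            if dst != src and capacity[dst] - used[dst] >= sizes[part]:
                used[dst] += sizes[part]
                moves.append((part, src, dst))   # move to another machine
            else:
                moves.append((part, src, DISK))  # local fallback: spill
            used[src] -= sizes[part]
    return moves


if __name__ == "__main__":
    capacity = {"A": 10, "B": 10, "C": 8}
    sizes = {"p1": 6, "p2": 5, "p3": 4, "p4": 3, "p5": 9, "p6": 1}
    assignment = {"p1": "A", "p2": "A", "p3": "A", "p4": "A",
                  "p5": "B", "p6": "C"}
    print(plan_memory_moves(assignment, sizes, capacity))
```

Here machine A starts 8 units over capacity: `p1` fits on C, but after that no peer has 5 free units for `p2`, so it is spilled locally.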

14
Memory-Adaptive Flux-Cons
15
Overview
  • Introduction
  • Background
  • Experiments and Considerations
  • Conclusion

16
Experimental Methodology
  • Example operator
    • Hash-based, windowed group-by-aggregate
    • Statistic over a fixed-size history
  • Cluster hardware
    • CPU: 1000 MIPS
    • 1 GB main memory
  • Network simulation
    • 1 KB packet size, infinite bandwidth, 0.07 ms
      latency
  • Virtual machines, simulated disk
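The example operator can be pictured as a hash table of per-group histories. The Python sketch below assumes the statistic is a mean over the last `window` values per key; the class name and the choice of statistic are hypothetical, since the slide does not say which aggregate was used.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Hashable


class WindowedGroupAvg:
    """Hypothetical hash-based, windowed group-by-aggregate: for each group
    key, keep the last `window` values and report their running mean."""

    def __init__(self, window: int) -> None:
        self.window = window
        # Hash table from group key to its fixed-size history; state of this
        # kind is what a repartitioning operator would have to move.
        self.history: DefaultDict[Hashable, Deque[float]] = defaultdict(deque)
        self.sums: DefaultDict[Hashable, float] = defaultdict(float)

    def insert(self, key: Hashable, value: float) -> float:
        h = self.history[key]
        h.append(value)
        self.sums[key] += value
        if len(h) > self.window:
            self.sums[key] -= h.popleft()  # evict oldest to keep history fixed-size
        return self.sums[key] / len(h)


if __name__ == "__main__":
    agg = WindowedGroupAvg(window=3)
    for v in [1, 2, 3, 4]:
        print(agg.insert("a", v))  # 1.0, 1.5, 2.0, 3.0
```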

17
Experimental Methodology
  • Simulator
    • TelegraphCQ base system
    • Operators share the physical CPU with the event
      simulator
    • Aggregate evaluation and scheduler simulated
  • Testbed
    • Single producer-consumer stage
    • 32 nodes in the simulated cluster
    • Ex-Cons operator dictates performance

18
Short-Term Imbalance Experiment
  • Give the Flux stage a transient skew buffer
  • Compare to a base Exchange stage with equivalent
    buffer space
  • Comparison statistics
    • 500 ms load per virtual machine, round robin
    • Simulated process: 0.1 ms processing, 0.05 ms
      sleep
    • 16 s runtime (32 machines × 0.5 s/machine)
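Reading the load schedule as a round-robin rotation of 500 ms slots, the 16 s runtime is exactly one pass over the 32 machines. The small Python sketch below encodes that reading; the constant and function names are hypothetical.

```python
NUM_MACHINES = 32
SLOT_S = 0.5  # 500 ms of extra load per virtual machine


def loaded_machine(t_s: float) -> int:
    """Index of the virtual machine carrying the extra load at time t_s,
    assuming the load visits machines round robin in 500 ms slots."""
    return int(t_s // SLOT_S) % NUM_MACHINES


RUNTIME_S = NUM_MACHINES * SLOT_S  # 32 × 0.5 s = 16.0 s, one full rotation

if __name__ == "__main__":
    print(RUNTIME_S, [loaded_machine(t) for t in (0.0, 0.5, 1.2, 15.9)])
```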

19
Short-Term Imbalance Experiment
20
Long-Term Imbalance Experiment
  • Operator stage
    • 64 partitions per virtual machine
    • 10,000-tuple (800 KB) history per partition
    • 160 KB skew buffer
    • 0.2 µs per tuple for partition processing
  • Network
    • 500 Mbps throughput for partitions
    • 250 Mbps point-to-point
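The operator-stage figures imply a few derived quantities that help when reading the results. The Python sketch below works them out, assuming decimal units (1 KB = 1,000 bytes, 1 Mbps = 10^6 bit/s) and that the 500 Mbps figure is the rate at which partition state is transferred; the variable names are hypothetical.

```python
KB = 1_000          # assumption: decimal units (1 KB = 1,000 bytes)
MBPS = 1_000_000    # bits per second in 1 Mbps

history_bytes = 800 * KB        # per-partition history
tuples_per_history = 10_000
partitions_per_vm = 64

bytes_per_tuple = history_bytes / tuples_per_history   # 80 bytes per tuple
state_per_vm = partitions_per_vm * history_bytes       # 51.2 MB per VM
move_s = history_bytes * 8 / (500 * MBPS)   # one partition at 500 Mbps
p2p_s = history_bytes * 8 / (250 * MBPS)    # same transfer at 250 Mbps
per_tuple_s = 0.2e-6
max_tuples_per_s = 1 / per_tuple_s          # about 5 million tuples/s per CPU

if __name__ == "__main__":
    print(f"{bytes_per_tuple:.0f} B/tuple, {state_per_vm / 1e6:.1f} MB/VM")
    print(f"move one partition: {move_s * 1e3:.1f} ms at 500 Mbps, "
          f"{p2p_s * 1e3:.1f} ms at 250 Mbps")
    print(f"processing ceiling: {max_tuples_per_s:,.0f} tuples/s")
```

Under these assumptions each tuple is 80 bytes, a machine holds 51.2 MB of history, and moving one 800 KB partition takes about 12.8 ms at 500 Mbps (25.6 ms at 250 Mbps).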

21
Balancing Processing Load
22
Graceful Degradation
23
Varying Collection Time
24
Memory-Constrained Experiments
  • Memory pressure
    • 768 MB initial memory load
      • 6 MB/partition × 128 partitions/machine
    • Available memory reduced to 512 MB (down from
      1 GB)
    • Change made 1 s into the simulation
    • 14 s required to push out the remaining 256 MB
      • Either to disk or to other machines
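The memory-pressure numbers are consistent with each other, and a quick calculation shows how much state has to leave each machine. This Python sketch assumes the 256 MB and 14 s figures are per machine and that partitions move whole; the constant names are hypothetical.

```python
import math

PARTITION_MB = 6
PARTITIONS_PER_MACHINE = 128
AVAILABLE_MB = 512   # after the drop from 1 GB
PUSH_TIME_S = 14

initial_mb = PARTITION_MB * PARTITIONS_PER_MACHINE        # 768 MB
excess_mb = initial_mb - AVAILABLE_MB                     # 256 MB
partitions_to_move = math.ceil(excess_mb / PARTITION_MB)  # whole partitions only
avg_push_rate = excess_mb / PUSH_TIME_S                   # MB/s over the push

if __name__ == "__main__":
    print(initial_mb, excess_mb, partitions_to_move,
          f"{avg_push_rate:.1f} MB/s")
```

With 6 MB partitions, shedding 256 MB means moving at least 43 partitions, at an average of roughly 18 MB/s over the 14 s.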

25
Throughput during Memory Balancing
26
Avg. Latency during Memory Balancing
27
Average Latency Degradation
28
Hybrid Policy
  • Combines the previous policies
  • Memory-based policy when partitions are on disk
    • Minimizes latency
  • Load-balancing policy when all partitions are in
    memory
    • Maximizes throughput
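The hybrid rule reduces to a switch on whether any state is on disk. The Python sketch below assumes the controller knows, per Flux-Cons site, how many partitions are spilled; `Policy` and `choose_policy` are hypothetical names, and the two policies themselves would be separate planners.

```python
from enum import Enum
from typing import Dict


class Policy(Enum):
    MEMORY = "memory-based"          # goal: minimize latency
    LOAD_BALANCE = "load-balancing"  # goal: maximize throughput


def choose_policy(on_disk: Dict[str, int]) -> Policy:
    """Hypothetical hybrid selector. on_disk maps each Flux-Cons site to the
    number of its partitions currently spilled to disk."""
    if any(count > 0 for count in on_disk.values()):
        return Policy.MEMORY
    return Policy.LOAD_BALANCE


if __name__ == "__main__":
    print(choose_policy({"A": 0, "B": 3}).value)  # memory-based
    print(choose_policy({"A": 0, "B": 0}).value)  # load-balancing
```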

29
Comparative Review
  [Figure: comparison over the last 20 seconds of simulation (steady state)]

30
Overview
  • Introduction
  • Background
  • Experiments and Considerations
  • Conclusion

31
Conclusions
  • Flux
    • Is a reusable mechanism
    • Encapsulates adaptive repartitioning
    • Extends the Exchange operator
    • Alleviates short- and long-term imbalances
    • Outperforms static partitioning when correcting
      imbalances
    • Can use hybrid policies to adapt to changing
      processing and memory requirements