Load Management and High Availability in Borealis
Magdalena Balazinska, Jeong-Hyon Hwang, and the Borealis team
MIT, Brown University, and Brandeis University

Slide 1
Borealis is a distributed stream processing system (DSPS) based on Aurora and Medusa.
Contract-Based Load Management

  • Goals
  • Manage load through collaboration between autonomous participants
  • Ensure an acceptable allocation, in which each node's load is below its threshold

  • Challenges: incentives, efficiency, and customization

  • Approach
  • 1 - Offline, participants negotiate and establish bilateral contracts that
  • Fix or tightly bound the price per unit of load
  • Are private and customizable (e.g., performance or availability guarantees, SLAs)

  [Figure: participants A, B, C, and D connected by bilateral contracts, e.g., a contract specifying that A will pay C a price p per unit of load; other contracts carry price 0.8p or a small price range [p, p+ε].]

  • 2 - At runtime,
  • Load moves only between participants that have a contract
  • Movements are based on marginal costs
  • Each participant has a private convex cost function
  • Load moves when it is cheaper to pay a partner than to process locally
  [Figure: a private convex cost function per participant, plotting total cost (e.g., delay) against offered load (msgs/sec); the marginal costs MC(t) at A and at B are compared against the contract price p.]

  • Task t moves from A to B if
  • the unit marginal cost of task t > p at A, and
  • the unit marginal cost of task t < p at B
  • Properties
  • Simple, efficient, and low overhead (provably small bounds)
  • Provable incentives to participate in the mechanism
  • Experimental result: a small number of contracts and small price ranges suffice to achieve an acceptable allocation
HA Semantics and Algorithms

  • Goal: streaming applications can tolerate different types of failure recovery
  • Gap recovery may lose tuples
  • Rollback recovery produces duplicates but does not lose tuples
  • Precise recovery takes over precisely from the point of failure

  • Challenges: operator and processing non-determinism
  • Arbitrary: Union, operators with timeouts
  • Deterministic, convergent: BSort, Resample, Aggregate
  • Deterministic, repeatable: Filter, Map, Join

  • Approaches
  • Passive Standby: most suitable for precise recovery
  • Active Standby: shortest recovery time
  • Upstream Backup: lowest runtime overhead

  [Figure: the three approaches. Passive standby periodically checkpoints the primary's state to a secondary; active standby runs a live secondary that processes tuples in parallel; upstream backup has upstream nodes buffer output tuples, trim them when acknowledged (ACK), and replay them to rebuild state after a failure.]
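
As a rough illustration of the upstream backup approach, here is a Python sketch; the class and method names are hypothetical, and actual Borealis recovery is considerably more involved.

from collections import deque

class UpstreamBackup:
    # Sketch: an upstream node logs its output tuples, trims the log when
    # the downstream node acknowledges them, and replays the log to a
    # recovery node after a failure. All names here are hypothetical.

    def __init__(self):
        self.log = deque()   # (seq, tuple) pairs not yet acknowledged
        self.next_seq = 0

    def emit(self, tup, downstream):
        # Send a tuple downstream and keep a copy until it is ACKed.
        self.log.append((self.next_seq, tup))
        downstream.receive(self.next_seq, tup)
        self.next_seq += 1

    def on_ack(self, acked_seq):
        # Trim: drop tuples the downstream node has safely processed.
        while self.log and self.log[0][0] <= acked_seq:
            self.log.popleft()

    def on_downstream_failure(self, recovery_node):
        # Replay: rebuild state at the recovery node from buffered tuples.
        # Replay may produce duplicates (rollback recovery semantics).
        for seq, tup in self.log:
            recovery_node.receive(seq, tup)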
Network Partitions

  • Goal: handle network partitions in a distributed stream processing system
  • Approach: favor availability; use updates to achieve consistency
  • Use connection points to create replicas and stream versions
  • Downstream nodes
  • Monitor upstream nodes
  • Reconnect to an available upstream replica
  • Continue processing with minimal disruption
  • Challenges
  • Maximize availability
  • Minimize reprocessing
  • Maintain consistency
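
A sketch of this downstream failover logic follows; the replica interface (is_reachable, subscribe) and the version counter are illustrative stand-ins for the actual Borealis mechanisms.

class DownstreamNode:
    # Sketch of partition handling: monitor the current upstream node,
    # and on a partition reconnect to any reachable replica of the same
    # stream, bumping the version of the output stream. The replica
    # interface used here is an assumed stand-in, not the Borealis API.

    def __init__(self, stream_name, replicas):
        self.stream_name = stream_name
        self.replicas = replicas          # replicas publishing the stream
        self.upstream = replicas[0]
        self.output_version = 0

    def monitor(self):
        if not self.upstream.is_reachable():
            self.fail_over()

    def fail_over(self):
        for replica in self.replicas:
            if replica is not self.upstream and replica.is_reachable():
                replica.subscribe(self.stream_name)
                self.upstream = replica
                # Same stream name, new version for downstream consumers.
                self.output_version += 1
                return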
Slide 2
Load Management Demonstration Setup
All nodes process a network monitoring query over real traces of connection summaries.

Query: count the connections established by each IP over 60 seconds and the number of distinct ports to which each IP connected.

  [Figure: the query network over the connection information stream —
  • Group by IP prefix, sum; Filter > 100; 60 s windows: clusters of IPs that establish many connections
  • Group by IP, count; Filter > 100; 60 s windows: IPs that establish many connections
  • Group by IP, count distinct port; Filter > 10; 60 s windows: IPs that connect over many ports]
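
The three aggregate branches can be approximated offline in plain Python over a batch of connection records; the (timestamp, src_ip, dst_port) record layout and the /24 prefix choice are assumptions for illustration.

from collections import defaultdict

def monitor_windows(connections, window=60):
    # Offline approximation of the three query branches over records of
    # the form (timestamp, src_ip, dst_port); the record layout and the
    # /24 prefix grouping are illustrative assumptions.
    counts = defaultdict(int)        # (window, ip) -> connection count
    ports = defaultdict(set)         # (window, ip) -> distinct dst ports
    prefixes = defaultdict(int)      # (window, prefix) -> connection count
    for ts, ip, port in connections:
        w = int(ts // window)
        counts[(w, ip)] += 1
        ports[(w, ip)].add(port)
        prefixes[(w, ip.rsplit(".", 1)[0])] += 1
    return (
        [k for k, c in prefixes.items() if c > 100],   # busy IP clusters
        [k for k, c in counts.items() if c > 100],     # busy IPs
        [k for k, s in ports.items() if len(s) > 10],  # IPs on many ports
    )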
1) Three nodes with identical contracts and an uneven initial load distribution
2) As node A becomes overloaded, it sheds load to its partners B and C until the system reaches an acceptable allocation
3) Load increases at node B, causing system overload
4) Node D joins the system. Load flows from node B to C and from C to D until the system reaches an acceptable allocation

  [Figure: contract diagrams among A, B, and C at price p (and later D at 0.8p), with load plots showing node A overloaded and shedding load to B then to C, the system overload when load increases at B, and, after node D joins, load flowing from C to D and from B to C until an acceptable allocation is reached.]
Slide 3
High Availability Demonstration Setup

Identical queries traverse nodes that use different high availability approaches: Passive Standby (primary B0, secondary B1), Active Standby (C0, C1), Upstream Backup (D0, D1), and Upstream Backup with Duplicate Elimination (E0, E1). Each primary has a statically assigned secondary.

1) The four primaries, B0, C0, D0, and E0, run on one laptop
2) All other nodes run on the other laptop
3) We compare the runtime overhead of the approaches
4) We kill all primaries at the same time
5) We compare the recovery time and the effects on tuple delay and duplication

  [Figure: plots of tuples received, end-to-end delay, and duplicate tuples around the failure. Active standby has the highest runtime overhead; passive standby adds the most end-to-end delay; upstream backup has the highest overhead during recovery.]
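
The duplicate-elimination step can be reduced to a filter like the following, assuming tuples carry monotonically increasing sequence numbers; this is an illustrative simplification, not the actual Borealis operator.

class DuplicateEliminator:
    # Drop duplicates that rollback recovery (e.g., upstream backup
    # replay) can produce, assuming monotonically increasing sequence
    # numbers assigned at the source (an illustrative simplification).

    def __init__(self):
        self.last_seq = -1

    def process(self, seq, tup):
        if seq <= self.last_seq:
            return None              # duplicate replayed during recovery
        self.last_seq = seq
        return tup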
Slide 4
Network Partition Demonstration Setup
Result: no duplicates and no losses after network partitions.

1) The initial query distribution crosses computer boundaries: laptop 1 runs nodes A and C and replica R; laptop 2 runs node B
2) We unplug the cable connecting the laptops
3) Node C detects that node B has become unreachable
4) Node C identifies node R as a reachable alternate replica; its output stream has the same name but a different version
5) Node C connects to node R and continues processing from the same point on the stream
6) Node C changes the version of its output stream
7) When the partition heals, node C remains connected to R and continues processing uninterrupted

  [Figure: sequence numbers of received tuples, first through B and then through R, and end-to-end tuple delay. The end-to-end tuple delay increases while C detects the network partition and reconnects to R.]
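
Steps 4 through 6 amount to resubscribing at an exact stream position, as in this sketch; the subscribe-from-sequence-number argument is an assumed, illustrative API.

def fail_over_to_replica(node, replica, stream_name):
    # Resume the input stream on replica R from the exact point reached
    # through the failed upstream B: no losses and no duplicates. The
    # start_seq subscription argument is an assumed, illustrative API.
    replica.subscribe(stream_name, start_seq=node.last_seq + 1)
    node.upstream = replica
    node.output_version += 1     # same stream name, new version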