Load Management and High Availability in Borealis
Magdalena Balazinska, Jeong-Hyon Hwang, and the Borealis team
MIT, Brown University, and Brandeis University

Slide 1
Borealis is a distributed stream processing system (DSPS) based on Aurora and Medusa.
Contract-Based Load Management

  • Goals
  • Manage load through collaboration between autonomous participants
  • Ensure an acceptable allocation, in which each node's load is below its threshold

  • Challenges: incentives, efficiency, and customization

  • Approach
  • 1 - Offline, participants negotiate and establish bilateral contracts that
  • Fix or tightly bound the price per unit of load
  • Are private and customizable (e.g., performance or availability guarantees, SLAs)

  [Figure: participants A, B, C, and D connected by bilateral contracts, e.g., a contract specifying that A will pay C a price p per unit of load; other contracts carry price 0.8p or a small price range [p, p+ε].]

  • 2 - At runtime,
  • Load moves only between participants that have a contract
  • Movements are based on marginal costs
  • Each participant has a private convex cost function
  • Load moves when it is cheaper to pay a partner than to process locally
  [Figure: a private convex cost function per participant, plotting total cost (e.g., delay) against offered load (msgs/sec); the marginal costs MC(t) at A and at B are compared against the contract price p.]

  • Task t moves from A to B if
  • the unit marginal cost of task t > p at A, and
  • the unit marginal cost of task t < p at B
  • Properties
  • Simple, efficient, and low overhead (provably small bounds)
  • Provable incentives to participate in the mechanism
  • Experimental result: a small number of contracts and small price ranges suffice to achieve an acceptable allocation
HA Semantics and Algorithms

  • Goal: streaming applications can tolerate different types of failure recovery
  • Gap recovery may lose tuples
  • Rollback recovery produces duplicates but does not lose tuples
  • Precise recovery takes over precisely from the point of failure

  • Challenges: operator and processing non-determinism
  • Arbitrary: Union, operators with timeouts
  • Deterministic, convergent: BSort, Resample, Aggregate
  • Deterministic, repeatable: Filter, Map, Join

  • Approaches
  • Passive Standby: most suitable for precise recovery
  • Active Standby: shortest recovery time
  • Upstream Backup: lowest runtime overhead

  [Figure: the three approaches. Passive standby periodically checkpoints the primary's state to a secondary; active standby runs a live secondary that processes tuples in parallel; upstream backup has upstream nodes buffer output tuples, trim them when acknowledged (ACK), and replay them to rebuild state after a failure.]
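
As a rough illustration of the upstream backup approach, here is a Python sketch; the class and method names are hypothetical, and actual Borealis recovery is considerably more involved.

from collections import deque

class UpstreamBackup:
    # Sketch: an upstream node logs its output tuples, trims the log when
    # the downstream node acknowledges them, and replays the log to a
    # recovery node after a failure. All names here are hypothetical.

    def __init__(self):
        self.log = deque()   # (seq, tuple) pairs not yet acknowledged
        self.next_seq = 0

    def emit(self, tup, downstream):
        # Send a tuple downstream and keep a copy until it is ACKed.
        self.log.append((self.next_seq, tup))
        downstream.receive(self.next_seq, tup)
        self.next_seq += 1

    def on_ack(self, acked_seq):
        # Trim: drop tuples the downstream node has safely processed.
        while self.log and self.log[0][0] <= acked_seq:
            self.log.popleft()

    def on_downstream_failure(self, recovery_node):
        # Replay: rebuild state at the recovery node from buffered tuples.
        # Replay may produce duplicates (rollback recovery semantics).
        for seq, tup in self.log:
            recovery_node.receive(seq, tup)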
Network Partitions

  • Goal: handle network partitions in a distributed stream processing system
  • Approach: favor availability; use updates to achieve consistency
  • Use connection points to create replicas and stream versions
  • Downstream nodes
  • Monitor upstream nodes
  • Reconnect to an available upstream replica
  • Continue processing with minimal disruption
  • Challenges
  • Maximize availability
  • Minimize reprocessing
  • Maintain consistency
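
A sketch of this downstream failover logic follows; the replica interface (is_reachable, subscribe) and the version counter are illustrative stand-ins for the actual Borealis mechanisms.

class DownstreamNode:
    # Sketch of partition handling: monitor the current upstream node,
    # and on a partition reconnect to any reachable replica of the same
    # stream, bumping the version of the output stream. The replica
    # interface used here is an assumed stand-in, not the Borealis API.

    def __init__(self, stream_name, replicas):
        self.stream_name = stream_name
        self.replicas = replicas          # replicas publishing the stream
        self.upstream = replicas[0]
        self.output_version = 0

    def monitor(self):
        if not self.upstream.is_reachable():
            self.fail_over()

    def fail_over(self):
        for replica in self.replicas:
            if replica is not self.upstream and replica.is_reachable():
                replica.subscribe(self.stream_name)
                self.upstream = replica
                # Same stream name, new version for downstream consumers.
                self.output_version += 1
                return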
Slide 2
Load Management Demonstration Setup
All nodes process a network monitoring query over real traces of connection summaries.

Query: count the connections established by each IP over 60 seconds and the number of distinct ports to which each IP connected.

  [Figure: the query network over the connection information stream —
  • Group by IP prefix, sum; Filter > 100; 60 s windows: clusters of IPs that establish many connections
  • Group by IP, count; Filter > 100; 60 s windows: IPs that establish many connections
  • Group by IP, count distinct port; Filter > 10; 60 s windows: IPs that connect over many ports]
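
The three aggregate branches can be approximated offline in plain Python over a batch of connection records; the (timestamp, src_ip, dst_port) record layout and the /24 prefix choice are assumptions for illustration.

from collections import defaultdict

def monitor_windows(connections, window=60):
    # Offline approximation of the three query branches over records of
    # the form (timestamp, src_ip, dst_port); the record layout and the
    # /24 prefix grouping are illustrative assumptions.
    counts = defaultdict(int)        # (window, ip) -> connection count
    ports = defaultdict(set)         # (window, ip) -> distinct dst ports
    prefixes = defaultdict(int)      # (window, prefix) -> connection count
    for ts, ip, port in connections:
        w = int(ts // window)
        counts[(w, ip)] += 1
        ports[(w, ip)].add(port)
        prefixes[(w, ip.rsplit(".", 1)[0])] += 1
    return (
        [k for k, c in prefixes.items() if c > 100],   # busy IP clusters
        [k for k, c in counts.items() if c > 100],     # busy IPs
        [k for k, s in ports.items() if len(s) > 10],  # IPs on many ports
    )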
1) Three nodes with identical contracts and an uneven initial load distribution
2) As node A becomes overloaded, it sheds load to its partners B and C until the system reaches an acceptable allocation
3) Load increases at node B, causing system overload
4) Node D joins the system. Load flows from node B to C and from C to D until the system reaches an acceptable allocation

  [Figure: contract diagrams among A, B, and C at price p (and later D at 0.8p), with load plots showing node A overloaded and shedding load to B then to C, the system overload when load increases at B, and, after node D joins, load flowing from C to D and from B to C until an acceptable allocation is reached.]
Slide 3
High Availability Demonstration Setup

Identical queries traverse nodes that use different high availability approaches: Passive Standby (primary B0, secondary B1), Active Standby (C0, C1), Upstream Backup (D0, D1), and Upstream Backup with Duplicate Elimination (E0, E1). Each primary has a statically assigned secondary.

1) The four primaries, B0, C0, D0, and E0, run on one laptop
2) All other nodes run on the other laptop
3) We compare the runtime overhead of the approaches
4) We kill all primaries at the same time
5) We compare the recovery time and the effects on tuple delay and duplication

  [Figure: plots of tuples received, end-to-end delay, and duplicate tuples around the failure. Active standby has the highest runtime overhead; passive standby adds the most end-to-end delay; upstream backup has the highest overhead during recovery.]
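
The duplicate-elimination step can be reduced to a filter like the following, assuming tuples carry monotonically increasing sequence numbers; this is an illustrative simplification, not the actual Borealis operator.

class DuplicateEliminator:
    # Drop duplicates that rollback recovery (e.g., upstream backup
    # replay) can produce, assuming monotonically increasing sequence
    # numbers assigned at the source (an illustrative simplification).

    def __init__(self):
        self.last_seq = -1

    def process(self, seq, tup):
        if seq <= self.last_seq:
            return None              # duplicate replayed during recovery
        self.last_seq = seq
        return tup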
Slide 4
Network Partition Demonstration Setup
Result: no duplicates and no losses after network partitions.

1) The initial query distribution crosses computer boundaries: laptop 1 runs nodes A and C and replica R; laptop 2 runs node B
2) We unplug the cable connecting the laptops
3) Node C detects that node B has become unreachable
4) Node C identifies node R as a reachable alternate replica; its output stream has the same name but a different version
5) Node C connects to node R and continues processing from the same point on the stream
6) Node C changes the version of its output stream
7) When the partition heals, node C remains connected to R and continues processing uninterrupted

  [Figure: sequence numbers of received tuples, first through B and then through R, and end-to-end tuple delay. The end-to-end tuple delay increases while C detects the network partition and reconnects to R.]
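
Steps 4 through 6 amount to resubscribing at an exact stream position, as in this sketch; the subscribe-from-sequence-number argument is an assumed, illustrative API.

def fail_over_to_replica(node, replica, stream_name):
    # Resume the input stream on replica R from the exact point reached
    # through the failed upstream B: no losses and no duplicates. The
    # start_seq subscription argument is an assumed, illustrative API.
    replica.subscribe(stream_name, start_seq=node.last_seq + 1)
    node.upstream = replica
    node.output_version += 1     # same stream name, new version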