Brahms - PowerPoint PPT Presentation


Transcript and Presenter's Notes

1
Brahms
  • Byzantine-Resilient Random Membership Sampling

Bortnikov, Gurevich, Keidar, Kliot, and Shraer
2
Edward (Eddie) Bortnikov
Maxim (Max) Gurevich
Idit Keidar
Alexander (Alex) Shraer
Gabriel (Gabi) Kliot
3
Why Random Node Sampling?
  • Gossip partners
    • Random choices make gossip protocols work
  • Unstructured overlay networks
    • E.g., among super-peers
    • Random links provide robustness, expansion
  • Gathering statistics
    • Probe random nodes
  • Choosing cache locations

4
The Setting
  • Many nodes, n
    • 10,000s, 100,000s, 1,000,000s, ...
  • Come and go
    • Churn
  • Every joining node knows some others
    • Connectivity
  • Full network
    • Like the Internet
  • Byzantine failures

5
Byzantine Fault Tolerance (BFT)
  • Faulty nodes (portion f)
    • Arbitrary behavior: bugs, intrusions, selfishness
    • Choose their ids arbitrarily
  • No CA, but no panacea for Sybil attacks
  • May want to bias samples
    • Isolate nodes, DoS nodes
    • Promote themselves, bias statistics

6
Previous Work
  • Benign gossip membership
    • Small (logarithmic) views
    • Robust to churn and benign failures
    • Empirical study: Lpbcast, Scamp, Cyclon, PSS
    • Analytical study: Allavena et al.
    • Never proven uniform samples
    • Spatial correlation among neighbors' views: PSS
  • Byzantine-resilient gossip
    • Full views: MMR, MS, Fireflies, Drum, BAR
    • Small views, some resilience: SPSS
    • We are not aware of any analytical work

7
Our Contributions
  • Gossip-based BFT membership
    • Tolerates a linear portion f of Byzantine failures
    • O(n^(1/3))-size partial views
    • Correct nodes remain connected
    • Mathematically analyzed, validated in simulations
  • Random sampling
    • Novel memory-efficient approach
    • Converges to provably independent uniform samples

The view is not all bad
Better than benign gossip
8
Brahms
  1. Sampling - local component
  2. Gossip - distributed component

(Diagram: Gossip maintains the view; the view's id stream feeds the Sampler, which outputs the sample.)
9
Sampler Building Block
  • Input: data stream, one element at a time
    • Bias: some values appear more than others
    • Used with the stream of gossiped ids
  • Output: uniform random sample
    • of unique elements seen thus far
    • Independent of other Samplers
    • One element at a time (converging)

(Diagram: next → Sampler → sample)
10
Sampler Implementation
  • Memory: stores one element at a time
  • Use a random hash function h
    • From a min-wise independent family [Broder et al.]
    • For each set X and every x ∈ X: Pr[h(x) = min h(X)] = 1/|X|

(Diagram: init chooses a random hash function; next keeps the id with the smallest hash so far; sample returns it.)
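A minimal sketch of this building block in Python. It assumes SHA-256 keyed with a fresh random seed per Sampler as a stand-in for drawing h from a min-wise independent family; a real implementation would use an actual (approximately) min-wise family.

```python
import hashlib
import os

class Sampler:
    """Keeps the id whose hash is smallest so far (memory: one element)."""

    def __init__(self):
        self.seed = os.urandom(16)  # "choose random hash function" (assumed: seeded SHA-256)
        self.elem = None            # current sample
        self.min_hash = None        # smallest hash seen so far

    def _h(self, elem: str) -> bytes:
        return hashlib.sha256(self.seed + elem.encode()).digest()

    def next(self, elem: str) -> None:
        """Feed one stream element; keep it if its hash is the new minimum."""
        h = self._h(elem)
        if self.min_hash is None or h < self.min_hash:
            self.min_hash, self.elem = h, elem

    def sample(self):
        """Uniform (in expectation) over the unique elements seen so far."""
        return self.elem
```

Because h is fixed per Sampler, the output depends only on the set of unique ids seen, not on how often an adversary repeats an id, which is exactly the bias-resistance the slide relies on.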
11
Component S: Sampling and Validation

(Diagram: the id stream from gossip is fed, via next, to an array of Samplers; each Sampler's output passes through a Validator that checks liveness using pings, and the validated outputs together form the sample of component S; a failed validation re-inits the Sampler.)
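Component S can be sketched as an array of independent samplers, each paired with a validator that pings its current sample and re-initializes the sampler when the sampled node is unresponsive. The `ping` hook and the dict-based sampler here are illustrative assumptions, not the paper's exact interfaces.

```python
import hashlib
import os

def make_sampler():
    # One min-wise sampler: random seed, current minimum hash, current element.
    return {"seed": os.urandom(16), "min": None, "elem": None}

def sampler_next(s, elem):
    h = hashlib.sha256(s["seed"] + elem.encode()).digest()
    if s["min"] is None or h < s["min"]:
        s["min"], s["elem"] = h, elem

class ComponentS:
    """An array of samplers; validators evict unresponsive samples."""

    def __init__(self, n_samplers, ping):
        self.samplers = [make_sampler() for _ in range(n_samplers)]
        self.ping = ping  # ping(id) -> bool; an assumed network hook

    def update(self, ids):
        """Feed the gossiped id stream into every sampler."""
        for s in self.samplers:
            for i in ids:
                sampler_next(s, i)

    def validate(self):
        """Re-init any sampler whose sampled node fails a ping."""
        for k, s in enumerate(self.samplers):
            if s["elem"] is not None and not self.ping(s["elem"]):
                self.samplers[k] = make_sampler()

    def sample(self):
        return [s["elem"] for s in self.samplers]
```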
12
Gossip Process
  • Provides the stream of ids for S
  • Needs to ensure connectivity
  • Use a bag of tricks to overcome attacks

13
Gossip-Based Membership Primer
  • Small (sub-linear) local view V
  • V constantly changes - essential due to churn
  • Typically evolves in (unsynchronized) rounds
  • Push: send my id to some node in V
    • Reinforces underrepresented nodes
  • Pull: retrieve the view of some node in V
    • Spreads knowledge within the network
  • Allavena et al. '05: both are essential
    • Low probability of partitions and star topologies

14
Brahms Gossip Rounds
  • Each round:
    • Send pushes and pulls to random nodes from V
    • Wait to receive pulls and pushes
    • Update S with all received ids
    • (Sometimes) re-compute V
  • Tricky! Beware of adversary attacks
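The round structure above can be sketched as follows. The network hooks (`send_push`, `pull_from`, `recv_pushes`) and the fan-out of two are assumptions for illustration; view re-computation is deliberately left out, since the following slides explain the safeguards it needs.

```python
import random

def brahms_round(view, update_samplers, send_push, pull_from, recv_pushes):
    """One simplified Brahms gossip round.

    Assumed hooks: send_push(id) pushes my id to a node; pull_from(id)
    returns that node's view; recv_pushes() returns ids pushed to us.
    """
    # Send my id (push) to random nodes from the view.
    for target in random.choices(view, k=min(2, len(view))):
        send_push(target)

    # Pull views from random nodes in the view.
    pulled = []
    for target in random.choices(view, k=min(2, len(view))):
        pulled.extend(pull_from(target))

    # Collect the pushes that arrived this round.
    pushed = recv_pushes()

    # Update S with ALL received ids -- the samplers, not the view,
    # are what converges to a uniform sample.
    update_samplers(pushed + pulled)

    # (Sometimes) re-compute V -- omitted here; see tricks 1-4.
    return pushed, pulled
```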

15
Problem 1: Push Drowning

(Illustration: while correct nodes push single ids - "Push Alice", "Push Bob", "Push Carol", "Push Dana", "Push Ed" - faulty nodes M flood the target with "Push Mallory", "Push Malfoy", and more, drowning out the correct pushes.)
16
Trick 1: Rate-Limit Pushes
  • Use limited messages to bound faulty pushes system-wide
    • E.g., computational puzzles / virtual currency
  • Faulty nodes can send a portion p of them
  • Views won't be all bad

17
Problem 2: Quick Isolation

(Illustration: faulty nodes concentrate their pushes on one victim until her view is all faulty - "Ha! She's out! Now let's move on to the next guy!" - then shift the attack to the next target.)
18
Trick 2: Detection & Recovery
  • Do not re-compute V in rounds when too many pushes are received
  • Slows down isolation; does not prevent it

(Illustration: a swamped node - "Hey! I'm swamped! I better ignore all of 'em pushes" - discards the round's pushes instead of updating V.)
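The detection test can be as simple as comparing the round's push count against the expected rate. The threshold below is an assumed illustration, not the paper's exact rule.

```python
def should_recompute_view(pushes_received, expected_pushes, slack=2.0):
    """Trick 2 sketch: skip view re-computation in rounds that look swamped.

    expected_pushes is the per-round push rate a correct node anticipates;
    slack is an assumed tolerance factor.
    """
    return pushes_received <= slack * expected_pushes
```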
19
Trick 3: Balance Pulls & Pushes
  • Control the contribution of push - α|V| ids - versus the contribution of pull - β|V| ids
    • Parameters α, β
  • Pull-only ⇒ eventually all faulty ids
    • Pulls from faulty nodes yield all faulty ids; from correct nodes, some faulty ids
  • Push-only ⇒ quick isolation of the attacked node
  • Push ensures ids are not all bad system-wide
  • Pull slows down (but does not prevent) isolation

20
Trick 4: History Samples
  • Attacker influences both push and pull
  • Feedback: γ|V| random ids from S
    • Parameters α + β + γ = 1
  • Attacker loses control - samples are eventually perfectly uniform

Yoo-hoo, is there any good process out there?
21
View and Sample Maintenance

(Diagram: the new view V is recomputed from α|V| pushed ids, β|V| pulled ids, and γ|V| ids fed back from the sample S; all received ids also feed S.)
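The re-computation sketched above, with the α|V| / β|V| / γ|V| split, might look like this (a sketch; it assumes each source holds enough distinct ids and that duplicates were already removed):

```python
import random

def recompute_view(pushed, pulled, history, view_size,
                   alpha=0.45, beta=0.45, gamma=0.1):
    """New view = alpha*|V| pushed ids + beta*|V| pulled ids
    + gamma*|V| history samples from S, with alpha + beta + gamma = 1."""
    n_push = int(alpha * view_size)
    n_pull = int(beta * view_size)
    n_hist = view_size - n_push - n_pull  # remainder goes to history
    view = random.sample(pushed, min(n_push, len(pushed)))
    view += random.sample(pulled, min(n_pull, len(pulled)))
    view += random.sample(history, min(n_hist, len(history)))
    return view
```

With γ = 0.1 the history contribution is small, yet per slide 23 it is the amplification step that lets the protocol cope with any adversary.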
22
Key Property
  • Samples take time to help
    • Assume the attack starts when samples are empty
  • With appropriate parameters:
    • E.g.,
    • Time to isolation > time to convergence

Prove lower bound using tricks 1, 2, 3 (not using samples yet)
Prove upper bound on the time until some good sample persists forever
Self-healing from partitions
23
History Samples: Rationale
  • Judicious use essential
    • Bootstrap; avoid slow convergence
    • Deal with churn
  • With a little bit of history samples (10%) we can cope with any adversary
    • Amplification!

24
Analysis
  1. Sampling - mathematical analysis
  2. Connectivity - analysis and simulation
  3. Full system simulation

25
Connectivity ⇒ Sampling
  • Theorem: If the overlay remains connected indefinitely, samples are eventually uniform

26
Sampling ⇒ Connectivity Ever After
  • Perfect sample of a sampler with hash h: the id with the lowest h(id) system-wide
    • If correct, it sticks once the sampler sees it
  • Correct perfect sample ⇒ self-healing from partitions ever after
  • We analyze PSP(t): the probability of a perfect sample at time t

27
Convergence to 1st Perfect Sample
  • n = 1000
  • f = 0.2
  • 40 unique ids in stream

28
Scalability
  • Analysis says:
  • For scalability, we want a small and constant convergence time
    • independent of system size, e.g., when

29
Connectivity Analysis 1: Balanced Attacks
  • Attack all nodes the same
    • Maximizes faulty ids in views system-wide in any single round
  • If repeated, the system converges to a fixed-point ratio of faulty ids in views, which is < 1 if
    • γ = 0 (no history) and p < 1/3, or
    • history samples are used (any p)

There are always good ids in views!
30
Fixed Point Analysis: Push

(Diagram: local views of correct nodes 1..i between time t and t+1; arrows mark a push, a lost push, and a push from a faulty node.)

x(t): portion of faulty ids in correct nodes' views at round t
Portion of faulty pushes to correct nodes: p / (p + (1 − p)(1 − x(t)))
31
Fixed Point Analysis: Pull

(Diagram: local views of correct nodes 1..i between time t and t+1; a pull from node i is faulty with probability x(t); a pull from a faulty node returns only faulty ids.)

E[x(t+1)] = α · p / (p + (1 − p)(1 − x(t))) + β · (x(t) + (1 − x(t)) · x(t)) + γ · f
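The recurrence can be iterated numerically to find the fixed point. This is a sketch: the parameter values mirror the simulations on the following slides, and the history term is assumed (as in the analysis) to contribute a faulty fraction of f.

```python
def next_x(x, p=0.2, alpha=0.45, beta=0.45, gamma=0.1, f=0.2):
    """One application of E[x(t+1)] from the slide."""
    push = p / (p + (1 - p) * (1 - x))  # faulty fraction among received pushes
    pull = x + (1 - x) * x              # faulty fraction among pulled ids
    return alpha * push + beta * pull + gamma * f

def fixed_point(x0=0.0, iters=500):
    """Iterate the recurrence from x0 until it settles."""
    x = x0
    for _ in range(iters):
        x = next_x(x)
    return x
```

With these defaults the iteration settles near x ≈ 0.52: strictly below 1, matching the claim that views are never entirely poisoned.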
32
Faulty Ids in Fixed Point
History assumed perfect in the analysis; real history in simulations
With a few history samples, any portion of bad nodes can be tolerated
Fixed points and convergence perfectly validated
33
Convergence to Fixed Point
  • n = 1000
  • p = 0.2
  • α = β = 0.5
  • γ = 0

34
Connectivity Analysis 2: Targeted Attack
Roadmap
  • Step 1: analysis without history samples
    • Isolation in logarithmic time
    • but not too fast, thanks to tricks 1, 2, 3
  • Step 2: analysis of history-sample convergence
    • Time-to-perfect-sample < time-to-isolation
  • Step 3: putting it all together
    • Empirical evaluation
    • No isolation happens

35
Targeted Attack Step 1
  • Q How fast (lower bound) can an attacker isolate
    one node from the rest?
  • Worst-case assumptions
  • No use of history samples (? 0)
  • Unrealistically strong adversary
  • Observes the exact number of correct pushes and
    complements it to aV
  • Attacked node not represented initially
  • Balanced attack on the rest of the system

36
Isolation w/out History Samples
  • n = 1000
  • p = 0.2
  • α = β = 0.5
  • γ = 0

Isolation time for |V| = 60
Depends on α, β, p
37
Step 2 Sample Convergence
  • n = 1000
  • p = 0.2
  • α = β = 0.5, γ = 0
  • 40 unique ids

Perfect sample in 2-3 rounds
Empirically verified
38
Step 3: Putting It All Together - No Isolation with History Samples
  • n = 1000
  • p = 0.2
  • α = β = 0.45
  • γ = 0.1

Works well despite small PSP
39
Sample Convergence (Balanced)
  • p = 0.2
  • α = β = 0.45
  • γ = 0.1

Convergence twice as fast with
40
Summary
  • O(n^(1/3))-size views
  • Resist Byzantine failures of a linear portion
  • Converge to provably uniform samples
  • Precise analysis of the impact of failures

41
Balanced Attack Analysis (1)
  • Assume (roughly) equal initial node degrees
  • x(t): portion of faulty ids in correct nodes' views at time t
  • Compute E[x(t+1)] as a function of x(t), p, α, β, γ
  • Result 1: Short-Term Optimality
    • Any non-balanced schedule yields a smaller x(t+1) in a single round

42
Balanced Attack Analysis (2)
  • Result 2: Existence of a Fixed Point X
    • E[x(t+1)] = x(t) = X
    • Analyze X (a function of p, α, β, γ)
    • Conditions for uniqueness
    • For α = β = 0.5 and p < 1/3, there exists X < 1
    • The view is not entirely poisoned; history samples are not essential
  • Result 3: Convergence to the fixed point
    • From any initial portion < 1 of faulty ids
    • Via [Hillam 1975] (sequence convergence)