The User is the Computer: From Decentralized Systems to Social Computing

1
The User is the Computer: From Decentralized
Systems to Social Computing
Peter Druschel
2
Course overview
  • Today's computer systems augment a wide range of
    human activity, including cooperation among
    individuals, organizations, businesses
  • This course deals with some of the technology
    underlying this trend, as well as the challenges
    and opportunities that come with it

3
Course overview
  • Decentralized systems (2 hours)
  • Overlays, object lookup, routing
  • Shared state and coordination
  • Applications
  • Challenges
  • Accountability for distributed systems (1.5
    hours)
  • Why and what is accountability?
  • How can we implement it?
  • How well does it work?
  • Social computing and applications (1.5 hours)
  • Exploiting social networks for distributed
    computing
  • Example: enhancing Web search
  • Example: thwarting unwanted communication

4
Credits
  • Colleagues
  • Krishna Gummadi, MPI-SWS
  • Rodrigo Rodrigues, MPI-SWS
  • Anne-Marie Kermarrec, INRIA
  • Ant Rowstron, MSRC
  • Miguel Castro, MSRC
  • Ion Stoica, UC Berkeley
  • John Kubiatowicz, UC Berkeley
  • Frank Dabek, Google
  • Y. Charlie Hu, Purdue
  • Group members
  • Andreas Haeberlen
  • Jeff Hoye
  • Petr Kuznetsov
  • Alan Mislove
  • Animesh Nandi
  • Ansley Post
  • Atul Singh
  • Jim Stewart
  • Funding
  • Max Planck Society
  • National Science Foundation
  • Intel Research
  • Microsoft Research
  • Texas ATP

5
Decentralized (p2p) systems
  • Distributed computer system with
  • Symmetric components
  • Decentralized control and state
  • Self-organization
  • Promise
  • Organic growth
  • Low barrier to deployment
  • Resilience to faults, attack
  • Resource abundance, diversity

6
Partly vs. fully decentralized systems
  • Partly decentralized systems have a dedicated
    controller node
  • Organic growth, abundant/diverse resources
  • Limited scalability, resilience
  • Fully decentralized systems
  • Some fully decentralized systems have powerful
    supernodes
  • Increased efficiency, but reduced resilience

7
Decentralized systems deployment
  • Self-organization enables deployment in dynamic
    networks
  • Ad hoc wireless networks
  • Mobile wireless devices
  • Delay-tolerant networks
  • Devices with intermittent connectivity
  • Overlay networks (most common)
  • Internet-connected devices

8
Outline
  • Decentralized systems state-of-the-art
  • Overlays, object lookup, routing
  • Example: Pastry
  • Shared state and coordination: DHTs and
    Scribe/DOLR
  • Challenges
  • Putting it all together: ePOST
  • Accountability for distributed systems
  • Social computing and applications

9
Overlay networks
[Figure: an overlay network layered on top of the Internet]
  • Overlay links rely on unicast service in the
    Internet
  • Topology can be structured or unstructured

10
Why overlays?
  • Overcome limitations of Internet architecture
  • group communication, content-oriented networking
  • enable innovation
  • Low barrier to deployment
  • resource sharing enables organic growth
  • self-organization simplifies operation
  • Robustness to faults, attacks, unexpected
    workloads
  • decentralization
  • resource diversity, wealth

11
Decentralized (p2p) systems: What do they enable?
  • Cooperative computing
  • Content sharing/distribution (Kazaa, BitTorrent)
  • Streaming media (SOPcast, PPLive, Joost, iPlayer)
  • Telephony (Skype), popular scientific computing
  • Low barrier to deployment, market entry
    Innovation
  • Digital preservation
  • Diversity, abundance of resources provides
    durability
  • Autonomous distributed systems
  • Self-managing networks of little or mobile
    devices
  • Decentralization is necessary for autonomy

12
Popular decentralized systems
  • File sharing, bulk content distribution
  • BitTorrent, eDonkey dominate Internet traffic
  • Streaming media distribution
  • PPLive, CoolStreaming, Joost, iPlayer,
    LiveStation
  • Skype
  • Volunteer computing
  • BOINC apps perform 1 PFLOPS on average

13
Decentralized (p2p) systems: State-of-the-art
  • Decentralized state management
  • Object location
  • Replication
  • Availability, Durability
  • Load balancing
  • Efficient, consistent lookup routing in Internet
    overlays
  • Efficient cooperative content distribution
  • Dependable storage from untrusted components
  • Security: secure routing, content integrity,
    incentives

14
Key problem: Object location
  • Objects partitioned among participating nodes
  • Mapping from objects to nodes is dynamic
  • Unicast routing doesn't help
  • don't know who to talk to
  • don't know where to store objects
  • want to address (data) objects, not nodes!

15
Solution 1: Unstructured overlay
  • No assumptions about overlay graph structure
  • New node is assumed to know one participant
  • Performs random walk to find more nodes to attach
    to
  • Object placement
  • Inserting node or random walk target
  • May leave references along random path
  • Object lookup
  • Scoped flooding or random walk
  • Examples: Gnutella, Kazaa, eDonkey

16
Unstructured object location
  • I inserts an object
  • Leave reference on R
  • S floods a request
  • Finds reference at R
  • Tradeoff between scalability and recall
  • Popular object easy to find
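
The tradeoff between scalability and recall can be illustrated with a small sketch of scoped flooding. This is only an illustration under stated assumptions: the overlay is a hypothetical adjacency dictionary, the node names S and R follow the slide, and `scoped_flood` is a made-up helper, not the API of any deployed system.

```python
from collections import deque


def scoped_flood(graph, start, has_object, ttl):
    """Flood a request breadth-first, at most `ttl` hops from `start`.

    Returns the nodes within scope that hold the object (or a reference).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    hits = []
    while frontier:
        node, hops = frontier.popleft()
        if has_object(node):
            hits.append(node)
        if hops == ttl:
            continue  # scope exhausted: do not forward any further
        for peer in graph[node]:
            if peer not in seen:
                seen.add(peer)
                frontier.append((peer, hops + 1))
    return hits


# Hypothetical overlay: S - A - B - R, with a reference left on R.
overlay = {"S": ["A"], "A": ["S", "B"], "B": ["A", "R"], "R": ["B"]}
holders = {"R"}
```

A small scope keeps traffic low but misses the reference (`ttl=2` finds nothing here), while a larger scope finds it at the cost of contacting more nodes.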

17
Solution 2: Structured overlay networks
  • Overlay graph conforms to a specific graph
    structure
  • Key-based routing primitive (KBR)
  • KBR(M, X): route message M to the live node
    that is currently responsible for the object
    associated with numerical id X
  • Basis for content-oriented networking
  • Examples: Chord, CAN, Pastry, Tapestry, Bamboo,
    Kademlia, SkipNet, Kelips, Accordion, etc.

18
Structured vs. unstructured overlays
  • Structured
  • Pre-determined routes
  • Efficient identity lookup, tree formation
  • More susceptible to churn
  • Unstructured
  • Simple overlay formation
  • Tradeoff between recall and efficiency
  • Robust to churn
  • Can be combined
  • Stable nodes form structure
  • Others attach randomly

19
Outline
  • Decentralized systems state-of-the-art
  • Overlays, object lookup, routing
  • Example: Pastry
  • Shared state and coordination: DHTs and
    Scribe/DOLR
  • Challenges
  • Putting it all together: ePOST
  • Accountability for distributed systems
  • Social computing and applications

20
Pastry: Identifier space
  • Consistent hashing [Karger et al. 97]
  • 160-bit circular id space
  • nodeIds (uniform random)
  • keys (uniform random)
  • Each key is mapped to the live node with
    closest nodeId

[Figure: circular id space from 0 to 2^160 - 1, with nodeIds and a key marked]
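
The mapping from a key to the live node with the closest nodeId can be sketched as follows. This is a minimal illustration, assuming ids are plain integers on a 160-bit ring; the tie-break toward the smaller id and the use of SHA-1 to derive a key are assumptions made for the example, not details taken from the slides.

```python
import hashlib

ID_BITS = 160
ID_SPACE = 1 << ID_BITS


def key_for(name: bytes) -> int:
    """Derive a 160-bit key from a name (SHA-1 is an assumption here)."""
    return int.from_bytes(hashlib.sha1(name).digest(), "big")


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two ids on the circular id space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def responsible_node(key: int, live_node_ids) -> int:
    """Return the live nodeId closest to `key` (ties go to the smaller id)."""
    return min(live_node_ids, key=lambda n: (ring_distance(n, key), n))
```

Note that distance wraps around the ring: a node with an id just below 2^160 can be the closest node for a key just above 0.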
21
Pastry: lookup
  • Msg with key X is routed to the live node with
    nodeId closest to X
  • Problem: complete routing table is not scalable
[Figure: KBR(M, X) on the circular id space from 0 to 2^160 - 1]
22
Pastry: prefix-based routing
  • Properties
  • log_16 N steps
  • O(log N) state

[Figure: KBR(M, d46a1c); nodes shown: 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1]
23
Pastry: routing table (node 65a1fcx)
[Figure: routing table with rows 0 to 3 shown; log_16 N rows in total]
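
Prefix-based routing can be sketched in a few lines. This is a simplified illustration, not Pastry's implementation: ids are hex strings of equal length, every node is assumed to know all others when filling its table, and the fallback to the neighbor set (used when a table slot is empty) is omitted. All function names are hypothetical.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def routing_table_slot(local_id: str, other_id: str):
    """(row, column) where `other_id` belongs in `local_id`'s routing table."""
    row = shared_prefix_len(local_id, other_id)
    return row, int(other_id[row], 16)


def build_table(local_id: str, known_nodes):
    """Keep the first known node for each slot (no proximity selection)."""
    table = {}
    for node in known_nodes:
        if node != local_id:
            table.setdefault(routing_table_slot(local_id, node), node)
    return table


def next_hop(local_id: str, key: str, table):
    """Entry that shares at least one more digit with `key`, or None."""
    row = shared_prefix_len(local_id, key)
    if row == len(key):
        return None
    return table.get((row, int(key[row], 16)))


def route(start: str, key: str, all_nodes):
    """Follow routing-table entries until no longer prefix match is known."""
    path = [start]
    while True:
        hop = next_hop(path[-1], key, build_table(path[-1], all_nodes))
        if hop is None:
            return path
        path.append(hop)


NODES = ["65a1fc", "d13da3", "d4213f", "d462ba", "d467c4", "d471f1"]
```

Each hop resolves at least one more digit of the key, which is where the log_16 N step bound comes from. In this sketch `route("65a1fc", "d46a1c", NODES)` stops at d462ba because no known node matches the prefix d46a; real Pastry would finish the delivery using the neighbor set.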
24
Pastry: prefix-based routing
  • Similar to Plaxton Trees [Plaxton et al. 97]
  • But added:
  • Neighbor sets for consistency, robustness,
    security
  • Consistent routing
  • Self-organization (dynamic joins, fault
    tolerance)
  • Proximity neighbor selection for efficiency
  • Secure routing to defend against malicious nodes

25
Neighbor sets
  • Stabilization protocol ensures eventual
    consistency
  • aids routing consistency
  • enables secure routing
  • localizes fault detection within neighbor sets
  • enables application-specific local coordination
    (e.g., object replica management)

26
Challenge: Inconsistent routing
New node N has informed X, but not yet Y, of its
arrival
  • Routing consistency
  • At any time, at most one overlay node accepts
    messages with a given key
  • Necessary for consistency of mutable data
  • Complicated by Internet routing anomalies

[Figure: ring segment with nodes X, N and Y and a key]
27
Ensuring routing consistency
  • To accept a message with key k, a node Y
    requires a lease from its neighbors, for an
    interval X < k < Z
  • Lease can be issued if grantor has a valid lease
    and previous lease has expired
  • Assumption
  • Any live node can be reached via one of its
    neighbor set members
  • Ensures
  • properly formed ring (eventually)
  • at most one node at a time accepts
    messages with key k
  • => routing consistency

[Figure: adjacent ring nodes X, Y and Z, with leases L1 (Y-Z) and L2 (X-Y)]
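
The acceptance rule can be sketched as a simple check. This is a loose illustration under several assumptions that are not in the slides: leases are modeled as records with an explicit expiry time, every lease covers the whole interval (X, Z), and ring wrap-around is ignored. The `Lease` type and `may_accept` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    grantor: str       # neighbor that issued the lease
    low: int           # X: exclusive lower bound of the key interval
    high: int          # Z: exclusive upper bound of the key interval
    expires_at: float  # time after which the lease is no longer valid


def may_accept(key, leases, neighbors, now):
    """True only if every neighbor has granted an unexpired lease covering `key`."""
    covering = {
        lease.grantor
        for lease in leases
        if lease.expires_at > now and lease.low < key < lease.high
    }
    return set(neighbors) <= covering
```

A node whose leases have expired, or that holds a lease from only one side, must refuse the message, which is what keeps two nodes from accepting the same key at the same time.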
28
Challenge: Self-organization
  • Initializing and maintaining node state
    (overlay construction and maintenance)
  • Node addition
  • Node departure (failure)

29
Pastry: Node join
[Figure: new node d46a1c joins via KBR(Join, d46a1c); nodes shown: 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1]
30
Pastry: Node departure (failure)
  • Neighbor set members exchange keep-alive
    messages (failure detection, neighbor set
    stabilization)
  • Neighbor set repair (eager): request set from
    farthest live node in set
  • Routing table repair (lazy): get table from peers
    in the same row, then higher rows

31
Challenge: Overlay route efficiency
[Figure: Internet hosts OR-DSL, CMU, MIT, MA-Cable, Cisco, Cornell, CA-T1, CCI, NYU, Aros and Utah, with paths annotated 20x, 80x, 81x and 89x]
  • Nodes close in id space, but far away in Internet
  • Goal: choose routing table entries that yield few
    hops and low latency

32
Proximity neighbor selection (PNS)
  • Assumptions
  • scalar proximity metric (e.g., RTT)
  • a node can probe distance to any other node
  • Proximity invariant
  • Each routing table entry refers to a node
    close to the local node (in the physical
    network), among all nodes with the appropriate
    nodeId prefix.
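
The proximity invariant can be sketched as a selection rule for one routing table slot. This is an illustrative sketch only: it assumes the RTT to every candidate is already known in a dictionary, whereas a real node would probe a limited set of candidates. The helper name `pns_entry` is hypothetical.

```python
def pns_entry(local_id, row, col, nodes, rtt_ms):
    """Pick the lowest-RTT node whose id fits slot (row, col) of local_id's table.

    A node fits the slot if it shares the first `row` hex digits with
    `local_id` and has hex digit `col` in position `row`.
    """
    prefix = local_id[:row] + format(col, "x")
    candidates = [n for n in nodes if n != local_id and n.startswith(prefix)]
    return min(candidates, key=lambda n: rtt_ms[n], default=None)


# Hypothetical RTT measurements from node 65a1fc, in milliseconds.
RTT = {"d13da3": 80.0, "d4213f": 12.0, "d462ba": 45.0}
```

Low rows have many candidates, so their entries tend to be very close in the network; higher rows have fewer candidates and their entries are, on average, farther away.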

33
PNS: Routes in delay space
34
PNS: Properties
  • Low-delay routes: average delay stretch, relative
    to IP, is a small constant (1.3 - 2.2) and can be
    derived from the physical network's delay
    distribution
  • Route convergence: routes of messages sent by
    nearby nodes with the same key converge at a node
    near the source nodes
  • Details in [Castro et al., MSR-TR-2002-82]

35
Outline
  • Decentralized systems state-of-the-art
  • Overlays, object lookup, routing
  • Example: Pastry
  • Shared state and coordination: DHTs and
    Scribe/DOLR
  • Challenges
  • Putting it all together: ePOST
  • Accountability for distributed systems
  • Social computing and applications

36
Sharing state: Distributed hash tables (DHT)
  • Hashtable API: put(obj, key), obj <- get(key)
  • Layered on top of a structured overlay
  • Scalability, Robustness
  • Persistent storage
  • High availability
  • Examples: Chord/CFS, Pastry/PAST, Bamboo, Kelips,
    Kademlia

37
Distributed hash table (DHT)
[Figure: key-value pairs (k1,v1) to (k6,v6) stored on nodes of the overlay network]
  • Operations: insert(k, v), v = lookup(k)
  • Structured overlay maps keys to nodes
  • Decentralized and self-organizing
  • Scalable, robust

38
DHT: Insertion and replication
  • Storage invariant: tuple replicas are stored on
    the r nodes with nodeIds closest to key
[Figure: Insert(key, value, r) with r = 4]
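
The storage invariant can be sketched with a toy in-memory DHT. This is an illustration under assumptions, not the PAST implementation: all nodes live in one process, ids are small integers, and the class does no re-replication after a failure. `replica_set` and `ToyDHT` are hypothetical names.

```python
def replica_set(key, node_ids, r, id_bits=160):
    """The r live nodeIds closest to `key` on the circular id space."""
    space = 1 << id_bits

    def distance(node_id):
        d = abs(node_id - key) % space
        return min(d, space - d)

    return sorted(node_ids, key=lambda n: (distance(n), n))[:r]


class ToyDHT:
    """Single-process stand-in for a DHT; each node is just a dictionary."""

    def __init__(self, node_ids, id_bits=160):
        self.id_bits = id_bits
        self.stores = {n: {} for n in node_ids}

    def insert(self, key, value, r):
        for node in replica_set(key, self.stores, r, self.id_bits):
            self.stores[node][key] = value

    def fail(self, node_id):
        del self.stores[node_id]  # the node and its replicas disappear

    def lookup(self, key, r):
        for node in replica_set(key, self.stores, r, self.id_bits):
            if key in self.stores[node]:
                return self.stores[node][key]
        return None
```

Because the replicas sit on the nodes adjacent to the key in id space, a lookup routed toward the key still reaches a replica after the closest node fails.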
39
DHT: Lookup
  • Object located in log_16 N steps (expected)
  • Usually locates replica nearest client C
[Figure: Lookup(key) issued by client C; r replicas stored near the key]
40
DHT: Dynamic caching
  • Nodes cache tuples in the unused portion of their
    allocated disk space
  • Tuples cached on nodes along the route of lookup
    and insert messages
  • Goals
  • maximize query throughput for popular tuples
  • balance query load
  • improve client latency

41
DHT: Dynamic caching
[Figure: Lookup(key) routes in delay space]
42
Coordination: Decentralized group management
  • E.g., SCRIBE [Rowstron et al., JSAC 02]
  • Spanning trees embedded in structured overlay
  • Multicast, anycast primitives
  • Scalable: large numbers of groups, members, wide
    range of members/group, dynamic membership

43
Cooperative group communication
  • Operations: create(g), join(g), leave(g),
    multicast(g, m), anycast(g, m)
  • groupId g mapped to n0
  • decentralized membership
  • robust, scalable

[Figure: nodes n0 to n4 in group g; n0 stores g: {n1, n2} and n2 stores g: {n3, n4}]
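
One way to picture how such a tree forms is to record, at each node on a member's route toward the group's root, which child the join arrived from. The sketch below is a simplified illustration, assuming the overlay routes are given as lists of node names; the function names and the example routes are hypothetical.

```python
def join(children, route_to_root):
    """Graft a member onto the group tree along its route to the root.

    `children` maps a node to the set of its children for this group.
    The join stops at the first node that is already part of the tree.
    """
    root = route_to_root[-1]
    for child, parent in zip(route_to_root, route_to_root[1:]):
        already_in_tree = parent in children or parent == root
        children.setdefault(parent, set()).add(child)
        if already_in_tree:
            break


def multicast(children, root):
    """Return every node reached when the root forwards down the tree."""
    reached, stack = [], [root]
    while stack:
        node = stack.pop()
        reached.append(node)
        stack.extend(children.get(node, ()))
    return reached


tree = {}
for member_route in (["n1", "n0"], ["n3", "n2", "n0"], ["n4", "n2", "n0"]):
    join(tree, member_route)
```

With these example routes the per-node state matches the figure: n0 keeps {n1, n2} and n2 keeps {n3, n4}, so no node needs to know the full membership.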
44
Scribe
[Figure: Join(groupId) routes in delay space]
45
Structured overlay APIs
  • SCRIBE / DOLR: create(g), join(g), leave(g),
    multicast(g, m), anycast(g, m)
  • DHT: insert(k, v), v = lookup(k)
  • KBR: route(M, X)
  • [Dabek et al., IPTPS 05]
46
Outline
  • Decentralized systems state-of-the-art
  • Overlays, object lookup, routing
  • Example: Pastry
  • Shared state and coordination: DHTs and
    Scribe/DOLR
  • Challenges: malicious participants
  • Putting it all together: ePOST
  • Accountability for distributed systems
  • Social computing and applications

47
Malicious participants: threats
  • Prevent messages from reaching root
  • drop or corrupt
  • bias routing tables
  • Cause objects to be placed on faulty nodes
  • choose nodeId values
  • use many identities (Sybil attack)
  • impersonate root

[Figure: ring of overlay nodes around a key]
52
Securing routing
  • Secure node identifier assignment
  • thwarts Sybil and id-choosing attacks
  • Secure membership protocol
  • Prevents routing table bias attacks
  • Secure routing primitive
  • Prevents root impersonation
  • Can tolerate up to 25% malicious nodes

[Figure: ring of overlay nodes around a key]
53
Securing routing
  • Secure routing primitive
  • Prevents root impersonation
  • [Castro et al., OSDI 02]

[Figure: ring of overlay nodes around a key, with the speech bubble "F is my neighbor"]
54
Other threats
  • Freeloading: incentive mechanisms
  • Data corruption: crypto
  • Denial-of-service
  • Several defenses needed

55
Outline
  • Decentralized systems state-of-the-art
  • Overlays, object lookup, routing
  • Example: Pastry
  • Shared state and coordination: DHTs and
    Scribe/DOLR
  • Challenges: malicious participants
  • Putting it all together: ePOST
  • Accountability for distributed systems
  • Social computing and applications

56
Putting it all together: ePOST
  • Decentralized, cooperative email service
  • Based on users' desktops/notebooks
  • Messages transmitted and stored securely
  • Standard mail clients (IMAP/POP)
  • Interoperability via SMTP
  • Nodes may fail arbitrarily
  • Users only trust their local node

[Mislove et al., EuroSys 06]
57
Why Email?
  • Demanding user expectations
  • Privacy
  • Integrity
  • Durability
  • Availability
  • Goal: Demonstrate that a decentralized,
    cooperative email service can be built that users
    can entrust with their production email

58
ePOST: Single-copy store
  • Emails split into MIME components, stored in the
    DHT
  • Using its content-hash as the key
  • Self-certifying (integrity)
  • Identical items stored once
  • Convergent encryption
  • Items replicated thrice for availability
  • Additional erasure-coded replicas for durability
    (Glacier [Haeberlen et al., NSDI 05])

[Figure: email data split into header, body and attachments]
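
Content-hash keys give both properties named above: the key certifies the data, and identical items collapse into one stored copy. The sketch below is an in-memory illustration only. SHA-256 is an assumption (the slides do not name the hash function), the class name is hypothetical, and convergent encryption and replication are not shown.

```python
import hashlib


class SingleCopyStore:
    """Toy content-addressed store: the key is the hash of the stored bytes."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)  # identical items are stored once
        return key

    def get(self, key: str) -> bytes:
        data = self.blobs[key]
        # Self-certifying: anyone can check the data against its key.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored content does not match its key")
        return data
```

If two users receive the same attachment, both `put` calls return the same key and only one copy is kept; a storage node that alters the bytes is caught by the check in `get`.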
59
ePOST: Single-writer log
  • Per-user metadata (folders, inbox, etc.) stored
    as an update log
  • All updates performed by owner
  • Stored in the DHT
  • Entries form a hash chain
  • Log head is signed with owner's key
  • Periodic snapshots stored in log

[Figure: log head pointing to a chain of log entries (Insert msg x, Insert msg y, Mark msg y read) that reference email data (header, body, attachments)]
60
ePOST: Message Delivery
  • Message notifications are signed and contain
    encrypted headers and keys to the message's
    components
  • Each user has a Scribe group
  • Node joins user's group if it has a message for
    the user
  • User announces to the group when online
  • Pending notifications delivered

61
ePOST: Security
  • Users have certificates (public key, node id)
  • Secure communication (SSL)
  • All content stored in the DHT is protected
  • Authenticity
  • Integrity
  • Privacy
  • Incentives to prevent freeloading (Scrivener
    [Nandi, Middleware 05])
  • Secure KBR

62
Deployment and Experience
  • Rice / MPI rings: reserved for internal members
  • PlanetLab ring: open-membership ring, backed by
    PlanetLab
  • Usage
  • 26 internal users (16 used ePOST as primary
    email) over more than two years
  • 40 DHT nodes (Rice / MPI ring), 350 nodes
    (PlanetLab ring)
  • Several times, ePOST was available when Rice or
    MPI-SWS email had failed
  • No system-wide outages after initial testing
    phase
  • Shut down due to overhead of tracking spam
    filtering

63
Decentralized systems challenges
  • Maintaining mutable distributed state remains
    hard
  • Fortunately, lots of useful applications don't
    require it
  • Incentives are basis for cooperation
  • Strategy-proof protocols (e.g. tit-for-tat)
  • Accountability
  • Need to control membership
  • Certified identities (background check or fee)
  • proof-of-work, social networks?

64
Decentralized systems challenges
  • Need to protect data
  • Durability requires non-decreasing membership
  • Scalable storage, high availability, churn
    resilience: pick two [Blake & Rodrigues, HotOS-IX]
  • Manageability
  • Self-organization reduces administrative effort
  • Hardware management is decentralized
  • BUT: evidence that lack of centralized control
    may make it difficult to manage system-wide
    disruptions

65
Outline
  • Decentralized systems state-of-the-art
  • Accountability for distributed systems
  • Why accountability?
  • What is accountability?
  • How can we implement it?
  • How well does it work?
  • Accountable virtual machines
  • Social computing and applications

66
Byzantine faults occur in practice
  • Not all faults cause a node to stop
  • The faulty node continues to operate, but its
    behavior deviates from that of a correct node
  • Examples
  • Hardware malfunction
  • Misconfiguration
  • Software error
  • External security attack
  • Intentional software modification

67
Example: LAX airport outage
  • Aug 2007: 17,000 passengers stranded at LAX
  • Cause: intermittent fault of a network card

68
Example: Botnets in the Internet
[Figure: two administrative domains, Domain A and Domain B]
  • Compromised computer targets different domain
  • Admin A must localize fault, then convince admin
    B that her machine is faulty

69
Example: Insider attack
[Figure: a single administrative domain]
  • Mar 2002: UBS PaineWebber admin disrupts trade
    for days to weeks
  • Difficult to detect, defuse logic bombs

70
Why is detecting faults difficult?
[Figure: an incorrect message and the responsible admin]
  • How to detect faults?
  • How to identify the faulty node?
  • How to convince others that a node is (not)
    faulty?

71
Learning from the 'offline' world
  • Relies on accountability
  • Example: Banks
  • Record can be used to (manually) detect, identify
    and convince
  • Is accountability useful in distributed systems?
  • Is it practical?

72
What does accountability mean?
  • Accountability = tamper-evident record +
    automated, reliable fault detection

73
Is accountability alone useful?
  • No, if faults are severe and irrecoverable
  • need Byzantine fault tolerance (see Lorenzo's
    course)
  • Yes, for
  • systems that provide best-effort service
  • systems that assume crash failures
  • systems that mask severe/irrecoverable faults
  • Accountability
  • reliably detects and localizes faults
  • provides incentives to avoid faults
  • builds trust, reputation

74
Which Systems can benefit?
  • Internet services (BGP, DNS, NTP, NNTP, SMTP)
  • Web services
  • Content distribution networks (CDN)
  • Grid computing
  • Peer-to-peer systems
  • Multi-player games
  • Cloud computing

75
Butler Lampson on accountability
  • "Don't forget that in the real world, security
    depends more on police than on locks, so
    detecting attacks, recovering from them, and
    punishing the bad guys are more important than
    prevention."
    -- Butler Lampson, "Computer Security in the Real
    World", ACSAC 2000

76
Outline
  • Decentralized systems state-of-the-art
  • Accountability for distributed systems
  • Why accountability?
  • What is accountability?
  • How can we implement it?
  • How well does it work?
  • Accountable virtual machines
  • What's next? Social computing and applications

77
Ideal accountability
  • Fault: node deviates from expected behavior
  • Our goal is to automatically
  • detect faults
  • identify the faulty nodes
  • convince others that a node is (or is not) faulty
  • Can we build a system that provides the following
    guarantee?
  • Whenever a node is faulty in any way, the system
    generates a proof of misbehavior against that node

78
Can we detect all faults?
  • Problem: faults that affect only a node's
    internal state
  • Would require online trusted probes at each node
  • Focus on observable faults
  • Faults that affect a correct node
  • Can detect observable faults without requiring
    trusted components

[Figure: nodes A and C; a fault confined to a node's internal state is not visible to the other node]
79
Can we always get a proof?
  • Problem: he-said-she-said
  • Three possible causes
  • A never sent X
  • B refuses to acknowledge X
  • X was lost by the network
  • Cannot get proof of misbehavior!
  • Generalize to verifiable evidence
  • a proof of misbehavior, or
  • a challenge that a faulty node cannot answer
  • What if the challenged node does not respond?
  • Does not prove a fault, but node is suspected
    until it responds

[Figure: A says "I sent X!", B says "I never received X!", and C cannot tell which is true]
80
Practical accountability
  • We propose the following requirement for an
    accountable distributed system
  • Whenever a fault is observed by a correct node,
    the system eventually generates verifiable
    evidence against a faulty node
  • This is useful
  • Any (!) fault that affects a correct node is
    eventually detected and linked to a faulty node
  • It can be implemented in practice

81
Outline
  • Decentralized systems state-of-the-art
  • Accountability for distributed systems
  • Why accountability?
  • What is accountability?
  • How can we implement it?
  • How well does it work?
  • Accountable virtual machines
  • Social computing and applications

82
An implementation: PeerReview
  • Adds accountability to a given system
  • Implemented as a library
  • Provides tamper-evident record
  • Detects faults via state-machine replay
  • Assumptions
  • Nodes can be modeled as deterministic state
    machines
  • Nodes have reference implementations of the
    state machines
  • Correct nodes can eventually communicate
  • Nodes can sign messages

83
PeerReview from 10,000 feet
  • All nodes keep logs of their inputs and outputs
  • Including all messages
  • Each node has a set of witnesses, which audit the
    node periodically
  • If the witnesses detect misbehavior, they
  • generate evidence
  • make the evidence available to other nodes
  • Other nodes check evidence, report fault

[Figure: A and B exchange message M and keep logs; A's witnesses C, D and E audit A's log and report "A is faulty"]
84
PeerReview detects tampering
  • What if a node modifies its log entries?
  • Log entries form a hash chain
  • Inspired by secure histories [Maniatis 02]
  • Hash is included with every message
    (authenticator) => node commits to its
    current state => changes are evident

[Figure: A sends B a message carrying Hash(log); B returns an ACK carrying Hash(log)]
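
A hash-chained log and the commitment made by an authenticator can be sketched as follows. This is a simplified illustration, not PeerReview's actual log format: SHA-256 and the entry layout are assumptions, and signatures are left out, so an "authenticator" here is just a (sequence number, hash) pair. All names are hypothetical.

```python
import hashlib

GENESIS = b"\x00" * 32


def entry_hash(prev_hash: bytes, seq: int, content: bytes) -> bytes:
    """Hash that binds an entry to its position and to all earlier entries."""
    body = prev_hash + seq.to_bytes(8, "big") + hashlib.sha256(content).digest()
    return hashlib.sha256(body).digest()


class Log:
    def __init__(self):
        self.entries = []  # list of (content, hash) pairs

    def append(self, content: bytes) -> bytes:
        prev = self.entries[-1][1] if self.entries else GENESIS
        top = entry_hash(prev, len(self.entries), content)
        self.entries.append((content, top))
        return top  # would be signed and sent along with the message


def verify_chain(entries) -> bool:
    """Recompute every hash; any edited entry breaks the chain."""
    prev = GENESIS
    for seq, (content, recorded) in enumerate(entries):
        if entry_hash(prev, seq, content) != recorded:
            return False
        prev = recorded
    return True


def consistent_with(entries, authenticator) -> bool:
    """Check a log against a hash the node handed out earlier."""
    seq, committed = authenticator
    return verify_chain(entries) and seq < len(entries) and entries[seq][1] == committed
```

Editing an entry in place breaks the chain. Rewriting the whole log yields a valid chain again, but it no longer matches authenticators that other nodes already hold, which is why the changes are evident.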
85
PeerReview detects omission
  • What if a node omits log entries?
  • While inspecting A's log, A's witnesses send msg
    authenticators signed by B to B's witnesses
  • Thus, witnesses learn about all messages their
    node has ever sent or acknowledged
  • Omission of a message from the log is a fault

[Figure: while checking A's log, A's witnesses forward authenticators for message M, signed by B, to B's witnesses]
86
PeerReview detects inconsistencies
  • What if a node
  • keeps multiple logs?
  • forks its log?
  • Witnesses check whether all msg authenticators
    form a single hash chain
  • Two authenticators not connected by a log segment
    indicate a fault

87
PeerReview detects faults
  • How to recognize faults?
  • Assumption
  • Nodes can be modeled as deterministic state
    machines
  • To audit a node, witness
  • Fetches signed log
  • Replays inputs to a trusted copy of the state
    machine
  • Checks outputs against the log

[Figure: inputs from the log are replayed to a reference copy of the state machine (modules A and B), and its outputs are compared with the outputs recorded in the log]
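
The audit steps can be sketched as a replay loop. This is a minimal illustration under assumptions: the log is a list of ("in", value) and ("out", value) records, outputs are expected in the order the reference machine produces them, and `Accumulator` is a made-up deterministic state machine used only for the example. Fetching and verifying the signed log is not shown.

```python
class Accumulator:
    """Hypothetical deterministic state machine: replies with a running total."""

    def __init__(self):
        self.total = 0

    def handle(self, value):
        self.total += value
        return [self.total]  # list of outputs caused by this input


def audit(make_reference, log):
    """Replay logged inputs on a fresh reference machine and compare outputs."""
    reference = make_reference()
    expected = []
    for kind, payload in log:
        if kind == "in":
            expected.extend(reference.handle(payload))
        elif kind == "out":
            if not expected or expected.pop(0) != payload:
                return False  # logged output differs from the reference
        else:
            raise ValueError(f"unknown log record type: {kind!r}")
    return not expected  # every expected output must appear in the log
```

Replay only works because the node is assumed to be deterministic: given the same inputs, a correct node must produce exactly the outputs the reference machine produces.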
88
PeerReview guarantees
  • Observable faults will be detected
  • Good nodes cannot be accused
  • Formal definitions and proof in the TR
  • If a node commits a fault and has a correct
    witness, then the witness obtains
  • a proof of misbehavior (PoM), or
  • a challenge that the faulty node cannot answer
  • If node is correct
  • there can never be a PoM, and
  • it can answer any challenge

89
PeerReview is widely applicable
  • App 1: NFS server in the Linux kernel
  • Many small, latency-sensitive requests
  • Tampering with files
  • Lost updates
  • App 2: Overlay multicast
  • Transfers large volume of data
  • Freeloading
  • Tampering with content
  • App 3: P2P email
  • Complex, large, decentralized
  • Denial of service
  • Attacks on DHT routing
  • More information in [Haeberlen et al., SOSP 07]
  • Metadata corruption
  • Incorrect access control
  • Censorship

90
Outline
  • Decentralized systems state-of-the-art
  • Accountability for distributed systems
  • Why accountability?
  • What is accountability?
  • How can we implement it?
  • How well does it work?
  • Accountable virtual machines
  • Social computing and applications

91
How much does PeerReview cost?
  • Log storage
  • 10 - 100 GByte per month, depending on
    application
  • Message signatures
  • Message latency (e.g. 1.5ms RTT with RSA-1024)
  • CPU overhead (embarrassingly parallel)
  • Log/authenticator transfer, replay overhead
  • Depends on witnesses
  • Can be deferred to exploit bursty/diurnal load
    patterns

92
P2P email, dedicated witnesses
[Figure: average traffic (Kbps/node, 0 to 100) versus number of dedicated witnesses W (baseline, then 1 to 5), broken down into baseline traffic, signatures and ACKs, and checking logs]
  • Dominant cost depends on number of witnesses W
  • O(W^2) component

93
P2P email, mutual auditing
[Figure: a small random sample of peers is chosen as a node's witnesses]
  • Small probability of error is inevitable
  • Example: replication
  • Can use this to optimize PeerReview
  • Accept that an instance of a fault is found only
    with high probability
  • Asymptotic complexity: O(N^2) -> O(log N)

94
PeerReview is scalable
[Figure: average traffic (Kbps/node) versus system size (nodes) for the email system without accountability, with PeerReview (P = 1.0) and with PeerReview (P = 0.999999); the DSL/cable upstream level is marked, and the curves are annotated O(log N) and O((log N)^2)]
  • Assumption: up to 10% of nodes can be faulty
  • Probabilistic guarantees provide scalability
  • Example: email system scales to over 10,000
    nodes with P = 0.999999
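
The two numbers on this slide can be related by a back-of-the-envelope calculation. The sketch below rests on assumptions that are not stated in the slides: witnesses are drawn independently and uniformly from a large population, and P is read as the probability that at least one of a node's witnesses is correct. The function name is hypothetical.

```python
from fractions import Fraction


def min_witnesses(faulty_fraction: Fraction, target: Fraction) -> int:
    """Smallest W such that 1 - faulty_fraction**W >= target.

    Exact rational arithmetic avoids rounding trouble near 0.999999.
    """
    if not (0 <= faulty_fraction < 1 and 0 <= target < 1):
        raise ValueError("both arguments must lie in [0, 1)")
    w = 1
    while 1 - faulty_fraction ** w < target:
        w += 1
    return w
```

Under these assumptions, with 10% faulty nodes, `min_witnesses(Fraction(1, 10), Fraction(999999, 1000000))` evaluates to 6, a number that does not grow with the system size.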

95
PeerReview summary
  • Accountability is a new approach to
    handling faults in distributed systems
  • detects faults
  • identifies the faulty nodes
  • produces evidence
  • PeerReview: a system that enforces accountability
  • Offers provable guarantees and is widely
    applicable
  • Details in Haeberlen et al., SOSP 07

96
Challenges
  • Tension between accountability and privacy
  • PeerReview (PR) requires disclosure to witnesses
  • Zero-knowledge proofs?
  • Fault detection
  • PR uses state-machine replay for fault detection
  • Can't detect deterministic software bugs
  • Different implementations of underspecified
    protocols may diverge
  • Protocol specification or abstract model?

97
Challenges (cont'd)
  • Message signatures
  • PR assumes a public-key infrastructure
  • Web-of-trust (physical network, social network) ?
  • Partial deployment
  • Accountability zones, gateways ?
  • PR requires source code modifications
  • To enable deterministic replay
  • Accountable virtual machines?

98
NetReview
  • Accountability applied to inter-domain routing
  • Fault detection based on a spec of the routing
    policy
  • Web-of-trust-based certificates
  • Auditing limited to peering partners
  • Partial deployment accountability zones
  • Details in [Haeberlen et al., NSDI 09]

99
Outline
  • Decentralized systems state-of-the-art
  • Accountability for distributed systems
  • Why accountability?
  • What is accountability?
  • How can we implement it?
  • How well does it work?
  • Accountable virtual machines
  • What's next? Social computing and applications

100
Accountable virtual machines (AVM)
  • Make unmodified binary VMs accountable
  • VMM provides deterministic logging/replay

[Figure: an AVM consists of a VM running an unmodified binary on top of an accountable VMM, which keeps a log and attaches authenticators to packets]
101
What are AVMs good for?
  • Accountability for proprietary/legacy software
  • Accountable cloud computing
  • Customer can verify correct execution
  • Making an entire host computer accountable
  • Check for compromised software
  • Forensics

102
Trusted network probes
  • Making the Internet accountable, one host at a
    time

[Figure: an accountable workstation connected to the Internet through a cable/DSL modem or the ISP's DSLAM; packets carry authenticators, and the chain of authenticators validates the secure log]
103
Related Work
  • Accountability: [Lampson 00], [Yumerefendi &
    Chase 05], [Yumerefendi et al. 07],
    [Argyraki et al. 07], [Michalakis et al. 07]
  • Practical Byzantine fault tolerance:
    [Castro & Liskov 00], [Ramasamy 07]
  • General fault detection: [Kihlstrom et al. 07],
    [Doudou et al. 99], [Malkhi & Reiter 97]
  • Intrusion detection, reputation systems:
    [Denning 87], [Ko et al. 94], [Kamvar et al. 03]
  • Trusted computing: [Garfinkel et al. 02]
  • Fault-specific defenses: [Cox & Noble 03],
    [Waldman & Mazieres 03]
  • Tamper-evident logs: [Schneier & Kelsey 98],
    [Maniatis & Baker 02]

104
Conclusion
  • Byzantine faults in distributed systems are real
  • Accountability is a new approach to handling
    faults
  • detects observable faults
  • identifies the faulty node
  • produces verifiable evidence
  • Presented a practical definition of
    accountability
  • Practical implementations exist
  • Many challenges remain

105
Outline
  • Decentralized systems state-of-the-art
  • Accountability for distributed systems
  • Social computing and applications
  • Exploiting social networks for distributed
    computing
  • Example: enhancing Web search
  • Example: thwarting unwanted communication

106
From service-centric to user-centric computing
  • Collaborative, social computing and communication
  • In peer-to-peer, users share technical resources
  • In social computing, users share knowledge,
    opinions, referrals, ratings

107
User-centric, social computing
  • Mass collaboration, enabled by technology
  • Human intelligence aggregated through technology
  • User contribution is the most important resource
  • (Underutilized resource of enormous scale?)
  • BUT Outcome depends on user behavior
  • depends on cooperation, good will
  • vulnerable to spoilers

108
Social networks: two concepts
  • Users contribute
  • Content
  • Opinions, recommendations, ratings (explicit or
    implicit)
  • Users form social networks
  • Graph connecting users (explicit or implicit)
  • Links imply shared interest or trust

109
What are social networks?
  • Graphs connecting people
  • Edges connect friends
  • Imply shared interest or trust
  • Online friends may have never met in real life
  • E.g., email, Skype, IM
  • Online social networking sites
  • Network hosted by a Web site
  • Often used to share opinions, advice, ratings,
    multimedia content

Social Network
Online Social Network
110
Huge opportunity
  • to leverage collective user input, e.g.
  • to deal with unwanted communication
  • to thwart security attacks
  • to enable better organization, filtering, search,
    ranking, and distribution of content
  • may provide an answer to the ever-increasing
    flood of information

111
Outline
  • Decentralized systems: state-of-the-art
  • Accountability for distributed systems
  • What's next? Social computing and applications
  • Exploiting social networks for distributed
    computing
  • Example: enhancing Web search
  • Example: thwarting unwanted communication

112
What's it got to do with Systems?
  • Social networks enhance distributed systems
  • Sybil attacks
  • Unwanted communication
  • Personalization
  • Social computing may need distribution
  • Privacy
  • Avoid dependence on a single provider

113
Leveraging social networks to enhance systems
  • Trust can help thwart security problems
  • Sybil attacks: SybilGuard [SIGCOMM'06]
  • Clones unlikely to have diverse links
  • Trust can help block unwanted communication
  • Friends unlikely to send SPAM: RE [NSDI'06]
  • Using social networks to thwart SPAM (Ostra)
  • Shared interest can improve search
  • Web search: PeerSpective [HotNets'06]
  • Related users likely to visit relevant content

114
Leveraging social networks: More ideas
  • Sharing solutions and problem fixes
  • Configurations that work
  • Fixes that others have found
  • Copy what works for others
  • Combine technology and social networks to truly
    stand on the shoulders of giants
  • Answer to the increasing complexity of the
    information age?

115
Outline
  • Decentralized systems: state-of-the-art
  • Accountability for distributed systems
  • What's next? Social computing and applications
  • Exploiting social networks for distributed
    computing
  • Example: enhancing Web search
  • Example: thwarting unwanted communication

116
Example: social network based Web search
  • PeerSpective experiment
  • Idea: users can query their friends' previously
    viewed pages
  • Results from friends appear alongside Google
    results

Google
PeerSpective
117
PeerSpective implementation
  • Prototype is a lightweight HTTP proxy
  • Runs on user's desktop and indexes all browsed
    content
  • When Google search is performed,
  • query other PeerSpective proxies in parallel with
    Google
  • present PeerSpective results alongside Google
    results
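
The flow described on this slide can be sketched roughly as follows. This is a toy model under stated assumptions, not the PeerSpective code: `LocalIndex` and `federated_search` are hypothetical names, the "index" is a plain substring match over page text, and the Web search engine is represented by any callable passed in as `web_search`.

```python
from concurrent.futures import ThreadPoolExecutor

class LocalIndex:
    """Toy index of the pages one user has browsed (URL -> page text)."""
    def __init__(self):
        self.pages = {}

    def add(self, url, text):
        self.pages[url] = text.lower()

    def search(self, query):
        # A page matches if it contains every query term.
        terms = query.lower().split()
        return [url for url, text in self.pages.items()
                if all(term in text for term in terms)]

def federated_search(query, web_search, peers):
    """Query the Web search engine and all peer indexes in parallel.

    Returns (web_results, peer_results) so that a front end can show
    the two result lists side by side.
    """
    with ThreadPoolExecutor(max_workers=len(peers) + 1) as pool:
        web_future = pool.submit(web_search, query)
        peer_futures = [pool.submit(peer.search, query) for peer in peers]
        web_results = web_future.result()
        peer_results = []
        for future in peer_futures:
            for url in future.result():
                if url not in peer_results:   # drop duplicates across peers
                    peer_results.append(url)
    return web_results, peer_results
```

A real proxy would also need ranking, privacy controls and timeouts for slow peers; the sketch only shows the parallel fan-out and the side-by-side result sets.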

(Figure: three PeerSpective proxies)
118
PeerSpective results summary
  • Explored potential of integrating Web and social
    network search
  • Evidence that PeerSpective added value
  • Additional coverage for viewed sites
  • Improved ranking of results
  • Aided in finding content serendipitously
  • However, just an experiment
  • Many challenges remain
  • Opportunities as well
  • Details in Mislove et al., HotNets 06

119
Outline
  • Decentralized systems: state-of-the-art
  • Accountability for distributed systems
  • What's next? Social computing and applications
  • Exploiting social networks for distributed
    computing
  • Example: enhancing Web search
  • Example: thwarting unwanted communication

120
Unwanted communication
  • Well-known problem
  • Email spam
  • Increasingly affects other systems
  • Search-engine spam
  • Mislabeled videos plaguing YouTube
  • Unwanted invitations in Skype
  • Existing solutions insufficient
  • Content filtering for videos?

121
Known defenses
  • Content filtering
  • Works very well for email, but
  • False positives reduce communication reliability
  • Doesn't work for multimedia
  • Holding senders accountable
  • Requires strong user identities
  • Imposing a per-communication cost
  • Refunded if communication is wanted
  • Requires micro-payments/quota market

122
Ostra: Using social relationships
  • Assumptions
  • Cost for acquiring and maintaining social links
  • Cannot create links arbitrarily fast
  • Cannot maintain arbitrary number of links
  • Receivers are willing to classify content
  • Explicit (Junk button)
  • Implicit (Deletion, response)

123
Ostra: Pair-wise credit exchange
(Figure: link credit state, lower bound / balance / upper bound: -2 / 0 / 2)
  • Credit balance/bound associated with each link
  • Credit balances decay at constant rate (10%/day)
  • Sum of all credit = 0 (invariant)

124
Ostra: Pair-wise credit exchange
(Figure: link to the receiver with credit state, lower bound / balance / upper bound: -2 / 0 / 2)
  • Message unwanted -> sender pays receiver one
    credit
  • Sending spam exhausts sender's link balance
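
The pair-wise exchange on these two slides can be sketched as a small model. This is a simplified illustration, not Ostra's actual protocol: the class name `Link` and its methods are hypothetical, the bound of 2 is chosen for the example, the decay rate is assumed to be 10% per day, and the bookkeeping for messages that are still awaiting classification is left out.

```python
class Link:
    """Credit state of one social link between users a and b.

    `balance` is a's credit on this link; b's credit is -balance,
    so the sum of all credit on the link is always 0.
    """
    def __init__(self, a, b, bound=2):
        self.a, self.b = a, b
        self.bound = bound        # balance must stay within [-bound, +bound]
        self.balance = 0.0

    def credit_of(self, user):
        return self.balance if user == self.a else -self.balance

    def can_send(self, sender):
        # The sender must be able to afford one credit in case the
        # receiver classifies the message as unwanted.
        return self.credit_of(sender) - 1 >= -self.bound

    def classify(self, sender, wanted):
        # Unwanted message: the sender pays the receiver one credit.
        if not wanted:
            self.balance += -1 if sender == self.a else 1

    def decay(self, days, rate=0.10):
        # Balances decay toward zero (assumed 10% per day).
        self.balance *= (1 - rate) ** days
```

With a bound of 2, a user whose messages are all classified as unwanted can send two of them over a link before `can_send` turns false, and has to wait for the balance to decay before sending again.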

125
Ostra: End-to-end credit exchange
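(Figure: chain of links with credit states, lower bound / balance / upper bound: -2 / -1 / 2, -2 / 0 / 2, -2 / 1 / 2, -2 / 0 / 2)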
Rate of spam a user can send is proportional
to number of links (s)he has
126
Sybil attacks are not effective

Sybils
Total unwanted communication by Sybils is bounded
by the number of links with other users
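
Both the end-to-end exchange and the Sybil bound can be illustrated with a toy credit network. The sketch below makes several assumptions that are not from the slides: the class `CreditNetwork` and its methods are hypothetical, every link has the same bound of 2, paths are found by breadth-first search, and credit decay and pending-message bookkeeping are omitted.

```python
from collections import deque

class CreditNetwork:
    """Social graph where credit[(u, v)] is u's credit on link u-v."""
    def __init__(self, bound=2):
        self.bound = bound
        self.credit = {}      # credit[(v, u)] == -credit[(u, v)]
        self.neighbors = {}

    def add_link(self, u, v):
        self.credit[(u, v)] = 0
        self.credit[(v, u)] = 0
        self.neighbors.setdefault(u, []).append(v)
        self.neighbors.setdefault(v, []).append(u)

    def find_path(self, src, dst):
        """Shortest path using only hops that can still afford one credit."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if u == dst:
                path = []
                while u is not None:
                    path.append(u)
                    u = parent[u]
                return path[::-1]
            for v in self.neighbors.get(u, []):
                if v not in parent and self.credit[(u, v)] - 1 >= -self.bound:
                    parent[v] = u
                    queue.append(v)
        return None

    def send(self, src, dst, wanted):
        """Deliver a message if a path with credit exists; return success."""
        path = self.find_path(src, dst)
        if path is None:
            return False
        if not wanted:
            # Each user on the path pays the next hop one credit, so
            # intermediate users gain on one link and lose on another.
            for u, v in zip(path, path[1:]):
                self.credit[(u, v)] -= 1
                self.credit[(v, u)] += 1
        return True

    def total(self, user):
        return sum(self.credit[(user, v)] for v in self.neighbors.get(user, []))
```

In this model, an attacker who creates many Sybil identities gains nothing as long as all of them reach honest users through the same few links: every unwanted message has to cross one of those links, and each link carries only a bounded amount of credit.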
127
Ostra
  • Thwarts unwanted communication in existing systems
  • Examples: Email, Skype, IM, YouTube
  • Uses existing relationships among users
  • Online social networks
  • Graph of email/IM/Skype users
  • Does not require strong user identities
  • Does not rely on automatic content classification
  • Respects recipient's idea of wanted/unwanted
    communication
  • Details in Mislove et al., NSDI 08

128
SN and applications: research agenda
  • Measurement/Analysis
  • Theory of complex networks
  • Empirical study of social networks
  • Understanding SN evolution
  • Understanding SN information flow
  • Design
  • Personalized search, filtering, content
    distribution
  • Using social networks to thwart unwanted behavior
  • Online social networks and privacy

129
Outline
  • Decentralized systems: state-of-the-art
  • Accountability for distributed systems
  • Social computing and applications
  • Exploiting social networks for distributed
    computing
  • Example: enhancing Web search
  • Example: thwarting unwanted communication

130
Max Planck Institute for Software
Systems (MPI-SWS)
  • Part of Max Planck Society
  • Academic research institute, publicly funded
  • Focus on basic research
  • Kick-off in Aug 2005
  • 17 faculty positions (tenure-track)
  • 100 doctoral/post-doc positions
  • Administrative and technical support staff
  • Top international research institution

131
MPI-SWS Faculty

  • Rodrigo Rodrigues: Dependable systems
  • Andrey Rybalchenko: Program analysis and
    verification
  • Krishna Gummadi: Networked systems
  • Paul Francis: Large scale Internet systems
  • Peter Druschel: Distributed systems
  • Derek Dreyer: Functional Programming
  • Michael Backes (Fellow): Security and
    Cryptography

132
Graduate program (MS/PhD)
  • Advised by MPI-SWS faculty
  • Stimulating, competitive environment
  • International, diverse student body (80)
  • English language
  • Financial aid
  • Internships available

http://www.mpi-sws.org
133
Thanks for your attention!