Title: PANDA
1. PANDA
- A generative communication system using a structured p2p overlay
2. Agenda
3. Introduction
- Want to design a fast, scalable distributed system
- Need an interface for communication
- RPC?
- Easy to understand and integrate
- Distributed computing hidden in local procedure calls
- Remote calls too slow
- Linda?
- No hidden communication
4. Linda communicates using tuples
- Tuple: an ordered list of typed parameters
- Tuple-space
[Diagram: a node placing tuples into and reading them from the tuple-space via out and in]
5. Implementing Linda for scalability
- Tuples stored on a single server
- Not scalable
- Tuples stored on multiple servers
- Scalable
- How can multiple servers communicate effectively?
- Distributed Hash Tables (DHT)
6. Issues associated with DHTs
- Tuple storage
- Keys based on a data hash
- Exact searches are straightforward
- Partial searches very difficult
- Keys predetermined by shared naming scheme
- Loss of anonymity
- Security
- Decoupling requires trust among participants
7. Pastry, PAST, and Scribe combine to build the tuple-space
- Pastry
- Bounded message routing
- Load balancing
- Good locality in a decentralized fashion
- PAST
- A DHT service built on top of Pastry
- Provides content hashing and replication functionality
- Scribe
- A multicast infrastructure
- Provides communication between nodes storing and requesting tuples
8. Panda: liNDA on PAstry
- Panda
- Provides Linda's functionality as a Java library
- Built on top of the Pastry p2p overlay
- Supports multiple non-overlapping tuple-spaces
- Panda creates a persistent, efficient, scalable p2p communication space
9. Hypothesis
- Panda will gain efficiency as the system grows
10. Agenda
11. Characteristics of the Panda tuple-space
- Panda is based on the LighTS interface
- A lightweight Linda implementation in Java
- Tuple operations
- Default addition and blocking removal
- out(Tuple t)
- rd(Tuple t)
- in(Tuple t)
- Group addition and blocking removal
- outg(Group g)
- rdg(Group g)
- ing(Group g)
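As a rough illustration only (not the actual LighTS/Panda API, whose field and tuple classes differ), the default operations might behave like this minimal in-memory sketch, where a null template field acts as a wildcard:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory tuple space illustrating Linda-style out/rd/in.
// Matching is simplified to equality on actual values; in real Linda,
// rd and in block until a match arrives instead of returning null.
class TupleSpace {
    private final List<List<Object>> tuples = new ArrayList<>();

    // out: add a tuple to the space
    public synchronized void out(List<Object> t) { tuples.add(t); }

    // rd: return a matching tuple without removing it (null if absent)
    public synchronized List<Object> rd(List<Object> template) {
        for (List<Object> t : tuples)
            if (matches(template, t)) return t;
        return null;
    }

    // in: return and remove a matching tuple (null if absent)
    public synchronized List<Object> in(List<Object> template) {
        Iterator<List<Object>> it = tuples.iterator();
        while (it.hasNext()) {
            List<Object> t = it.next();
            if (matches(template, t)) { it.remove(); return t; }
        }
        return null;
    }

    // A null entry in the template acts as a formal (wildcard).
    private static boolean matches(List<Object> template, List<Object> t) {
        if (template.size() != t.size()) return false;
        for (int i = 0; i < template.size(); i++)
            if (template.get(i) != null && !template.get(i).equals(t.get(i)))
                return false;
        return true;
    }
}
```

The group operations (outg/rdg/ing) extend the same idea to sets of tuples.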
12. Tuples: a formal definition
- A tuple is an ordered sequence of typed fields
- Fields can be any Java implementation of the Serializable interface
- A field can
- Contain a value (actual field)
- Be characterized by a type (formal field)
- Tuples with formals are typically used as templates to query the tuple space through pattern matching
13. Active tuples
- Active tuple: a tuple which can be executed on the accepting node
- Nodes in the tuple space trust each other not to be malicious
- Cannot guarantee competency
- Active tuples were not implemented due to time constraints
14. Agenda
15. Design objectives
- Focus on providing the benefits of a distributed application
- Maintain the generative communication functionality
- Definitions
- Tuple signature: the data structure of the contents of the tuple
- Tuple signature group: the set of all tuples in a tuple space that share a tuple signature
16. Two possible naïve insertion/retrieval policies
- Store by cID
- Hash of the tuple's content
- Provides load balancing
- Infeasible search overhead
- Store by rID
- Hash of the tuple's signature
- Not sufficient for load balancing
- Efficient query resolution
- ID collisions among tuples with the same signature
17. Both IDs are used
- The rID is used to route a tuple to a specific node
- The tuple is stored in a hashtable keyed by cID
- All tuples of a given signature are stored on the same node
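A sketch of this two-level scheme (helper names are illustrative, not Panda's; slide 36 notes that Pastry uses SHA-1, which is assumed here): the rID hashes only the ordered field types, so all tuples with the same signature route to one node, while the cID hashes the full content to key the node-local hashtable.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Two-level storage: rID (signature hash) picks the node, cID (content
// hash) keys the tuple inside that node's hashtable.
class TupleStore {
    // hashtable kept by the node responsible for one rID
    private final Map<String, Object[]> byCid = new HashMap<>();

    static String sha1Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1")
                                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    // rID: hash of the ordered field types only (the tuple signature)
    static String rID(Object[] tuple) {
        StringBuilder sig = new StringBuilder();
        for (Object f : tuple) sig.append(f.getClass().getName()).append(',');
        return sha1Hex(sig.toString());
    }

    // cID: hash of the full tuple content
    static String cID(Object[] tuple) {
        return sha1Hex(Arrays.toString(tuple));
    }

    void insert(Object[] tuple) { byCid.put(cID(tuple), tuple); }
    Object[] lookup(String cid) { return byCid.get(cid); }
}
```

Two tuples with the same types but different values share an rID (same storing node) while getting distinct cIDs, avoiding the collision problem of the rID-only policy.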
18. Tuple retrieval
- For each existing tuple signature, there is a Request Scribe group and an Announcement group
- At tuple insertion, the storing node joins (and listens on) the Request group, but publishes the arrival on the Announcement group
- A requesting node joins (and listens on) the Announcement group but publishes the request on the Request group
19. The primary storer is the first to subscribe to the two groups
- Primary storer
- Receives the first tuple of a given signature
- Subscribes to the Request group for that signature
- Replicates the tuple
- All secondary storers subscribe to the Request group
[Diagram: the primary storer joining the Request group]
20. Both groups have a unique message type
- Groups receive rIDs concatenated with a distinct suffix for each of the two groups
- A query message contains
- The tuple pattern to be matched
- The node ID of the query originator
- A notice of insertion contains
- The newly inserted tuple
- The node ID of its primary host
[Diagram: a query message from node 1000 and a notice of insertion from node 1005]
21. A consumer joins the Announcement group before sending a request
- Consumer joins an Announcement group
- Consumer sends out a request
- The request is multicast
- Nodes return a tuple to the Announcement group
- The Announcement group multicasts the tuple
[Diagram: node 1000 publishing a request on rID.q and listening on the Announcement group]
22. The primary storer announces the arrival of a new tuple
- Primary receives a tuple
- Notifies the Announcement group of inserted tuples
- Subscribers compare the new tuple and select the first matching tuple
- NOTE: multicasting in Scribe is only best-effort
[Diagram: primary node 1005 publishing a notice of insertion on the Announcement group]
23. Fault tolerance
- Use PAST for replication
- Keep-alive messages verify member liveness
- Unresponsive nodes are assumed dead
- Failed nodes are replaced with the closest outside node
- Internally, PAST shares the hash tables stored under each rID
- Changes in node membership will affect only the groups sharing a given signature
[Diagram: a failed replica of the primary being replaced]
24. When a primary node fails
- If a majority of replica nodes notice the primary failed
- The node with the closest ID to the primary
- Assumes primary status
- Joins the appropriate Scribe groups and re-announces all tuples
[Diagram: a replica taking over as primary after the primary fails]
25. Load balancing
- Storage space
- Tuples are diverted when storage is close to maximum capacity
- Query processing allocation
- Query load is partitioned among nodes able to serve a group of queries
26. Storage load balancing
- Tuples are diverted when a node is close to capacity
- The space of the tuple content hash is divided
- Deterministic or dependent partitioning
- Alternate nodes are randomly selected in a policy similar to PAST's
- Each partial storer joins the appropriate Scribe groups
- Opted not to implement storage balancing
[Diagram: a full hashtable for an rID diverting tuples under that rID to alternate storers]
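The deterministic-partitioning option could look roughly like this (a sketch under assumed parameters; Panda left storage balancing unimplemented): split the cID hash range into k equal slices and assign each slice to one partial storer.

```java
// Deterministic partitioning of the tuple-content-hash space, as the
// slide above describes: the first byte of a cID (0..255) selects one
// of k partial storers. The slice arithmetic is illustrative only.
class StoragePartition {
    static int storerFor(int firstByteOfCid, int k) {
        // equal-width slices of the 0..255 range across k storers
        return firstByteOfCid * k / 256;
    }
}
```

Because the mapping depends only on the cID, every node can compute which partial storer holds a given tuple without extra coordination.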
27. Query load balancing
- Goal: divide a set of queries
- Recall: queries are routed to all nodes storing a matching tuple signature
- 1st option: partition the query resolution among all replica storers
- Synchronization issues
- 2nd option: generate distinct groups of queries to be resolved by different nodes
- Very complex
- Opted not to implement query load balancing
[Diagram: nodes 1001 through 1006 each receiving queries on rID.q]
28. Deletions
- PAST semantics allow for relaxed file removal
- Avoid issues with PAST by removing the tuple entry in the hash table, not the hashtable itself
[Diagram: an in() operation removing the tuple (add, 2, 2, 4)]
29. Tuple persistence
- Linda guarantees tuples will persist indefinitely
- Storage may eventually fill up
- Tuples will be removed in LRU order
- The system must come to a consensus as a whole
[Diagram: repeated out() operations filling the tuple space]
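Per node, LRU removal can be sketched with Java's access-ordered LinkedHashMap (a simplification: the slide notes the real system must reach LRU consensus as a whole, which a single-node cache does not capture; the capacity is an assumed parameter).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Node-local LRU eviction under a storage cap, as described above.
class LruTupleCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruTupleCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder = true: reads refresh recency
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least-recently-used entry when full
    }
}
```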
30. Blocking request persistence
- Two issues
- Announcement message never delivered to a client
- Refresh messages are sent periodically for blocking queries
- Request message failed
- Periodic transmission of refresh messages reduces the probability of this happening
- Could also have timeouts on blocking reads (violates Linda semantics)
[Diagram: a lost notice of insertion and a lost query message]
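The periodic-refresh mechanism above might be sketched like this (the class name, Runnable hook, and interval are assumptions, not Panda's code): a blocking query is republished at a fixed interval so that a lost request or announcement is eventually compensated for.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Periodically re-publish a blocking query on the Request group until
// a matching tuple arrives, compensating for best-effort multicast.
class QueryRefresher {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    // resend: republishes the query (stubbed as a Runnable here)
    void start(Runnable resend, long periodMillis) {
        timer.scheduleAtFixedRate(resend, periodMillis, periodMillis,
                                  TimeUnit.MILLISECONDS);
    }

    void stop() { timer.shutdownNow(); }  // called once the reply arrives
}
```

Because the refresh repeats until stopped, no timeout is imposed on the blocking read itself, preserving Linda semantics.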
31. Security
- Participants in a tuple space must trust each other to maintain decoupling between sender and data
- However, they may not trust participants in other tuple spaces
- Solution: allow each tuple space to provide a shared secret to participants for encryption and integrity (not implemented)
32. Agenda
34. Backup
35. What is Panda?
- Panda is Linda on Pastry
- Linda: a generative communication blackboard
- Pastry: a peer-to-peer distributed hash table overlay
- Panda creates a persistent, efficient, scalable p2p communication space
36. Pastry and DHTs
- Pastry is a robust, self-organizing network overlay
- Nodes are assigned locations in a circular namespace using a SHA-1 hash of the node's public key
- Each 128-bit nodeID is unique
- Routing uses nodeIDs as keys to a Distributed Hash Table (DHT)
37. Pastry: what a node knows
- Each Pastry node has a routing table
- log_{2^b} N rows of 2^b - 1 entries
- Each entry in row i, 0 <= i < log_{2^b} N, shares a prefix of length i with the current node's ID
- A routing cost based on a proximity metric can be calculated for each entry
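The routing-table arithmetic above can be made concrete with a small sketch (class and method names are illustrative; IDs are treated as hex digit strings, i.e. b = 4):

```java
// Routing-table size and prefix matching for Pastry's parameters:
// about log_{2^b}(N) rows of 2^b - 1 entries each.
class PastryMath {
    static int rows(int n, int b) {
        // number of base-2^b digits needed to distinguish n nodes
        return (int) Math.ceil(Math.log(n) / Math.log(1 << b));
    }

    static int entriesPerRow(int b) { return (1 << b) - 1; }

    // length of the shared prefix of two IDs, in base-2^b digits
    static int sharedPrefix(String a, String c) {
        int i = 0;
        while (i < a.length() && i < c.length() && a.charAt(i) == c.charAt(i)) i++;
        return i;
    }
}
```

For example, with b = 4 a million-node overlay needs only about 5 rows of 15 entries each, which is why routing stays bounded as the network grows.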
38. Pastry: a routing example
[Diagram: routing Route(d46a1c) around the circular namespace from 0 to 2^128 - 1]
39. Pastry: fault tolerance
- Each Pastry node maintains a leaf set
- The largest and smallest adjacent nodeIDs
- Improved fault tolerance when data is replicated amongst leaf-set members
[Diagram: a leaf set of nodeIDs a927dd, a92e1c, a941f1, a9523c, a9a231]
40. The Linda model
- Linda uses generative communication for parallel programming
- Persistent objects rather than transient messages
- Processes don't need to know how to contact each other
- Programming-language independent
41. Linda tuples
- A Tuple is an ordered list of typed parameters
- Tuples may contain formals (wild cards)
42. Linda tuple space
- Tuples are shared via the tuple space
- out(Tuple t): places a tuple t in the tuple space
- rd(Tuple t): reads a tuple that matches t
- in(Tuple t): reads and removes a tuple matching t
- eval(Tuple t): executes a code snippet in t, leaving behind a result tuple
43. Panda: P2P Linda
- Panda features
- Java API based on LighTS, an existing Java-Linda API
- Distributed tuple spaces using the Pastry DHT
- Multiple non-intersecting tuple spaces
- Shared group keys and tuple MACs
44. Panda tuple spaces and the DHT
- Multiple, distributed tuple spaces
- Hashing of tuples based on text keys for efficient inserts and deletes
- Best-effort persistence: deletion policy based on timestamps and timeout values
45. Panda bibliography
- G. P. Picco. http://lights.sourceforge.net.
- M. Castro, P. Druschel, Y. C. Hu and A. Rowstron. Topology-aware routing in structured peer-to-peer overlay networks, 2002.
- A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In Proc. IFIP/ACM Middleware 2001, Heidelberg, Germany, Nov. 2001.
- A. Rowstron and P. Druschel. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. In Proc. ACM SOSP'01, Banff, Canada, Oct. 2001.
- N. Carriero and D. Gelernter. Linda in Context. In Communications of the ACM, April 1989.