Probabilistic Location and Routing
- INFOCOM 2002
- Sean C. Rhea, John Kubiatowicz
Outline
- Introduction
- Algorithm Description
- Experimental Setup
- Results
- Future Work
- Related Work
- Conclusion
Introduction
- Two important challenges
- How should we locate replicas?
- How should we route queries to replicas?
- Location-independent routing techniques
- CAN, Chord, Pastry, and Tapestry
- Location and routing operations require O(log N) hops
Introduction (cont.)
- Divergence: as the replica approaches the location of the query source, the performance of the existing algorithms quickly diverges from optimality
- A small amount of mis-routing in the local area can lead to a large divergence from optimality, since the optimal path is short to begin with
Introduction (cont.)
- Our probabilistic location and routing algorithm is based on attenuated Bloom filters
- It is decentralized
- It is locality aware
- It follows a minimal search path
- It uses constant storage per server
- Attenuated Bloom filters allow us to achieve
- Quickly finding nearby replicas when they exist
- Finding every document even when replicas are scarce
Introduction (cont.)
Algorithm Description
- Bloom Filters
- A bit-vector of length w
- n different hash functions
- False positives are possible
- The false positive rate depends on the width, the number of hash functions, and the cardinality of the represented set
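The bullets above can be captured in a short sketch. This is a minimal, illustrative Bloom filter in Python, not the paper's implementation; the width and hash count are arbitrary example values, and the k hash functions are derived by salting a single SHA-256 hash.

```python
import hashlib

class BloomFilter:
    """A w-bit Bloom filter using k hash functions (illustrative sketch)."""

    def __init__(self, width=64, num_hashes=3):
        self.width = width
        self.num_hashes = num_hashes
        self.bits = 0  # bit-vector of length w, packed into an int

    def _positions(self, key):
        # Derive k bit positions by salting one hash function k ways.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.width

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        # May answer True for a key never added (a false positive),
        # but never False for a key that was added.
        return all(self.bits & (1 << pos) for pos in self._positions(key))
```

The false positive rate rises as more keys are added: each insertion sets up to k bits, so a query for an absent key is more likely to find all of its k positions already set.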
Bloom Filter
Attenuated Bloom Filter
- An attenuated Bloom filter of depth d is an array of d normal Bloom filters
- We associate each neighbor link with an attenuated Bloom filter
- The first filter in the array summarizes documents available from that neighbor
- The ith Bloom filter is the merger of the Bloom filters of all the nodes a distance i through any path starting with that neighbor link, where distance is measured in hops in the overlay network
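A minimal sketch of the structure described above, assuming each level is a plain bit-vector (an int here) and merging two filters is a bitwise OR; the widths, depth, and document names are illustrative, not from the paper.

```python
import hashlib

W, K, D = 64, 3, 3  # filter width, hash count, depth (example values)

def positions(key):
    # k bit positions for a key, via a salted hash
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8],
                           "big") % W for i in range(K)]

def make_filter(keys):
    # A plain Bloom filter as an int bit-vector
    bits = 0
    for key in keys:
        for p in positions(key):
            bits |= 1 << p
    return bits

# An attenuated Bloom filter for one neighbor link is a list of D levels.
# Level 0 summarizes the neighbor's own documents; level i is the OR-merge
# of the filters of every node i+1 hops away through that link.
neighbor_own = make_filter(["doc-x"])
two_hops_away = make_filter(["doc-y"]) | make_filter(["doc-z"])  # merge = OR
link_filter = [neighbor_own, two_hops_away, 0]
```

Because deeper levels merge more nodes' filters, they have more bits set, so matches at greater depth carry less certainty; the attenuation gives nearby replicas priority.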
Attenuated Bloom Filter
The Query Algorithm
- To perform a location query, the querying node examines the first level of each of its neighbors' attenuated Bloom filters
- If one of the filters matches, it is likely that the desired data item is only one hop away, and the query is forwarded to the matching neighbor closest to the current node in network latency
- If no filter matches, the querying node looks for a match in the second level of every filter
The Query Algorithm (cont.)
- As before, if a match is found, the query is forwarded to the matching neighbor of lowest latency
- This time, however, it is not the immediate neighbor who is likely to possess the data item, but one of its neighbors
- This next neighbor is determined as before, by examining the attenuated Bloom filters of the current server
The Query Algorithm (cont.)
- False positives
- Forward the request to the deterministic algorithm
- The query can be returned to the previous server in the query path (depth-first search)
- Each query in the system contains a list of all the servers it has already visited
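The next-hop selection described in the last three slides can be sketched as follows. This is an illustrative reconstruction, not the paper's code: plain Python sets stand in for the per-level Bloom filters (so this sketch has no false positives), and the neighbor names and latencies are made up.

```python
def route_query(doc, neighbors, visited):
    """Pick the next hop for a location query.

    Scan level 0 of every neighbor's attenuated filter first, then level 1,
    and so on; among matches at the shallowest matching level, pick the
    neighbor of lowest network latency.  Neighbors already on the query's
    visited list are skipped, which supports the depth-first backtracking
    described above.

    neighbors: dict name -> (latency_ms, [level_0_set, level_1_set, ...])
    visited:   list of servers the query has already passed through
    """
    depth = max(len(levels) for _, levels in neighbors.values())
    for level in range(depth):
        matches = [(latency, name)
                   for name, (latency, levels) in neighbors.items()
                   if name not in visited and doc in levels[level]]
        if matches:
            return min(matches)[1]  # lowest-latency matching neighbor
    # No match at any level: hand off to the deterministic algorithm.
    return None

neighbors = {
    "A": (10, [{"doc-1"}, set()]),     # doc-1 likely one hop away via A
    "B": (5,  [set(), {"doc-2"}]),     # doc-2 likely two hops away via B
}
```

With real Bloom filters, a match can be a false positive, which is why the query carries its visited list and can back up to the previous server or fall through to the deterministic algorithm.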
The Update Algorithm
- Every server in the system stores both an attenuated Bloom filter for each outgoing link (e.g. FAB in Fig. 3) and a copy of its neighbor's view of the reverse direction
- The server calculates the changed bits in its own filter and in each of the filters its neighbors maintain
- It then sends these bits out to each neighbor
The Update Algorithm (cont.)
- On receiving such a message, each neighbor attenuates the bits one level and computes the changes they will make in each of its own neighbors' filters
- These changes are sent out as well, and so on
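The attenuate-and-forward step can be sketched as below. This is a simplified illustration, assuming each filter level is an int bit-vector; the function names are mine, not the paper's, and the sketch omits the destination/source filtering optimizations discussed next.

```python
def attenuate(update_levels, depth):
    """Shift an incoming update down one level: bits that changed at a
    neighbor's level i affect our outgoing filters at level i + 1.
    The deepest level falls off the end (it is attenuated away)."""
    return [0] + update_levels[: depth - 1]

def apply_update(link_filter, update_levels):
    """OR the changed bits into the filter kept for a link, and return
    only the bits that were actually new.  If nothing new came in, the
    returned levels are all zero and propagation can stop."""
    changed = []
    for i, bits in enumerate(update_levels):
        new_bits = bits & ~link_filter[i]  # bits not already set
        link_filter[i] |= new_bits
        changed.append(new_bits)
    return changed
```

A node that inserts a document sends its newly set bits as a level-0 update; each receiving neighbor applies it, attenuates it one level, and forwards only the bits that changed its own state.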
The Update Algorithm (cont.)
- One problem with this algorithm
- The update will be propagated to some servers more than once (Fig. 3: a document was added to node D), raising the false positive rate
- Two distinct update filtering algorithms
- Destination filtering
- Source filtering
- When a deletion causes bits at any level of a Bloom filter to change from one to zero, we must be careful to propagate this deletion to all appropriate nodes
Experimental Setup
- We simulated our algorithm in conjunction with two different deterministic algorithms
- Home-node location
- A home-node server keeps a set of pointers to every replica of the document
- Tapestry
- Assumes that every server and document in the system can be named with a unique, location-independent identifier
Tapestry (cont.)
- Node-IDs for the node names and globally unique identifiers (GUIDs) for the documents
- Two major components
- A routing mesh
- A distributed directory service
Tapestry Routing Mesh
Publication in Tapestry
Location in Tapestry
Simulation Environment
Simulation Environment (cont.)
- All stub-to-stub edges are 100 Mb/s
- All stub-to-transit edges are 1.5 Mb/s
- All transit-to-transit edges are 45 Mb/s
- In our experiments, we focus on stub-to-transit domain bandwidth, since these inter-domain edges are the most bandwidth constrained in the system
Experiment Descriptions
- Static experiments
- Dynamic experiments
- Based on whether the set of replicas in the
system changes during the test
Results
Static Experiments
Static Experiments (cont.)
Static Experiments (cont.)
Static Experiments (cont.)
Static Experiments (cont.)
Static Experiments (cont.)
Dynamic Experiments
Dynamic Experiments (cont.)
Dynamic Experiments (cont.)
Future Work
- The design of algorithms that adhere to such restrictions while producing an overlay network in a self-organizing manner is thus an important component of our future work
- Since an update to a cache causes Tapestry to send only O(log N) messages, whereas the probabilistic algorithm must send some amount of information to every server in its filter's range, using these more advanced algorithms should only improve the bandwidth consumption of the probabilistic algorithm relative to Tapestry
Related Work
- Bloom filters have long been used as a lossy summary technique; we are the first to combine them into a compound, topology-aware data structure
- In [20], Bloom filters were used to improve the efficiency of distributed join operations by filtering elements without consuming network bandwidth
- In [21], Aoki used Bloom filters to guide searches through generalized search trees
Related Work (cont.)
- Both the Summary Cache and Cache Digests use Bloom filters to summarize the contents of a set of cooperating web caches
- The Secure Discovery Service (SDS) uses Bloom filters to route queries to appropriate services, such as printers or scanners
38Conclusion
- The algorithm is based on a new data structure we
call an attenuated Bloom filter - Furthermore, we have shown that our algorithm may
be combined with a deterministic algorithm - Finally, we have demonstrated that