Scalable Content-Addressable Networks

Provided by: labu373

Learn more at: https://www.eecg.toronto.edu

1
Scalable Content-Addressable Networks

2
High-Level Overview

Hash tables (map keys to values) are heavily used
in building software applications
The concept of a Content-Addressable Network
(CAN) provides hash table-like functionality on
Internet-like scales.
CAN is
Scalable
Robust/Fault-tolerant
Self-organizing
Low-latency

3
Hash Tables and CAN

4
What CAN would do for us

CAN would improve peer-to-peer systems
Napster: the process of locating a file is
centralized
Expensive to scale the central repository, single
point of failure
Gnutella: decentralizes the file location process
(the network self-organizes into an
application-layer mesh)
Requests for files are flooded through the
network, which does not scale and may not find
the content
Conclusion: P2P systems need a scalable indexing
mechanism
CAN would improve large data repositories
These systems need efficient insertion and
retrieval
CAN would enable large-scale name resolution
services that don't depend on a particular naming
scheme (i.e., unlike DNS)
No more location-dependent naming schemes

5
Basic Operations Performed On CANs

Basic Operations
Insertion (of key,value pairs)
Lookup (of key,value pairs)
Deletion (of key,value pairs)
Each CAN node stores
A piece (called a zone) of the entire hash table
Information about a small number of adjacent
zones in the table
Routing in a CAN
Requests for a key are forwarded by intermediate
CAN nodes towards the CAN node whose zone
contains that key
CAN Design is
Distributed (requires no centralized control or
coordination)
Scalable (nodes hold only a small amount of
state that doesn't grow with the network)
Fault-tolerant (nodes can route around failures)
Independent of any naming hierarchy
Implemented entirely at the application layer

6
CAN Design

Centers around a virtual d-dimensional Cartesian
coordinate space on a d-torus
At any time, the entire coordinate space is
dynamically partitioned among all the nodes in
the system
Each node owns a distinct zone

7
CAN Design (2)

To store a pair (K1,V1), key K1 is mapped to a
point P in the coordinate space via a uniform
hash function
The pair is then stored at the node that owns the
zone where P lies
To retrieve the entry corresponding to K1, any
node can apply the same hash function to map K1
to P and then fetch the value from the node that
owns P
A node learns and maintains the IP addresses of
those nodes that hold adjoining coordinate zones
Efficient routing is critical to a useful CAN
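
The store and retrieve steps above can be sketched as follows. This is a minimal illustration, not the presenters' implementation: SHA-256 as the uniform hash, the unit square as the coordinate space, the two hard-coded zones, and the names key_to_point, Zone, owner, put and get are all assumptions. The owner is found by scanning a global list of zones, which stands in for the routing a real CAN would perform.

```python
import hashlib
from dataclasses import dataclass, field

D = 2  # number of dimensions (assumed for this sketch)


def key_to_point(key: str, d: int = D) -> tuple:
    """Map a key to a point in [0, 1)^d using SHA-256 (an assumed hash choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use 4 bytes of the digest per coordinate; 32 bytes supports d <= 8.
    coords = []
    for i in range(d):
        chunk = int.from_bytes(digest[4 * i: 4 * i + 4], "big")
        coords.append(chunk / 2**32)
    return tuple(coords)


@dataclass
class Zone:
    lo: tuple  # inclusive lower corner
    hi: tuple  # exclusive upper corner
    store: dict = field(default_factory=dict)

    def contains(self, p) -> bool:
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))


def owner(zones, p):
    """Return the zone containing point p (global scan in place of routing)."""
    return next(z for z in zones if z.contains(p))


def put(zones, key, value):
    owner(zones, key_to_point(key)).store[key] = value


def get(zones, key):
    return owner(zones, key_to_point(key)).store.get(key)


# Two nodes, each owning half of the unit square.
zones = [Zone((0.0, 0.0), (0.5, 1.0)), Zone((0.5, 0.0), (1.0, 1.0))]
put(zones, "song.mp3", "10.0.0.7")
print(get(zones, "song.mp3"))
```

Because every node applies the same hash function, any node computes the same point P for a given key and therefore agrees on which zone is responsible for it.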

8
Routing in a CAN

Routing in a Content-Addressable Network works
by following the straight-line path through the
Cartesian space from source to destination
coordinates.
A CAN node maintains a coordinate routing table
that holds the IP address and virtual coordinate
zone of each of its immediate neighbors in the
coordinate space.
Average Path Length: $(d/4)\,n^{1/d}$ hops (for n equal zones)
Individual Nodes Have 2d Neighbors
Average Path Length Grows As $O(n^{1/d})$
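
A small simulation can make the path-length figure concrete. The sketch below assumes an idealized CAN in which n = k^d nodes each own one equal cell of a k-by-k-by-... grid on the torus, so every node has exactly 2d neighbors. Greedy forwarding steps one cell along the dimension with the largest remaining offset, which approximates the straight-line path. The function names are hypothetical, and real CAN zones are generally not equal in size.

```python
from itertools import product


def torus_delta(a, b, k):
    """Signed shortest offset from cell a to cell b on a ring of k cells."""
    diff = (b - a) % k
    return diff if diff <= k // 2 else diff - k


def next_hop(cur, dst, k):
    """Greedy forwarding: move one cell along the dimension with the largest remaining offset."""
    deltas = [torus_delta(c, t, k) for c, t in zip(cur, dst)]
    dim = max(range(len(deltas)), key=lambda i: abs(deltas[i]))
    if deltas[dim] == 0:
        return cur  # already at the destination
    step = 1 if deltas[dim] > 0 else -1
    nxt = list(cur)
    nxt[dim] = (cur[dim] + step) % k
    return tuple(nxt)


def route(src, dst, k):
    """Return the list of cells visited from src to dst."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, k))
    return path


def average_hops(k, d):
    """Average hop count from one source to every cell (the torus is symmetric)."""
    src = (0,) * d
    total = sum(len(route(src, dst, k)) - 1
                for dst in product(range(k), repeat=d))
    return total / k**d


print(average_hops(8, 2))
```

For k = 8 and d = 2 (n = 64), average_hops(8, 2) evaluates to 4.0, which agrees with (d/4) n^(1/d) = (2/4)(8) = 4.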

9
Construction of a CAN Overlay

The entire CAN space is divided amongst the nodes
currently in the system
Incremental construction process takes three
steps
The new node finds a node already in the CAN
Using the CAN routing mechanisms, it finds a node
whose zone will be split
The neighbors of the split zone must be notified
so that routing can include the new node
Bootstrapping: CAN bootstrap nodes are
associated with a DNS domain name
Node Insertion Affects Only O(number of
dimensions) existing nodes
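
The zone split at the heart of the join process might look like the sketch below. It covers only the split itself and the handover of stored pairs; finding the bootstrap node, routing to the zone, and notifying neighbors are omitted. Halving along the longest side is an assumption made here for simplicity, and the Zone class and split function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    lo: tuple  # inclusive lower corner
    hi: tuple  # exclusive upper corner
    store: dict = field(default_factory=dict)  # key -> (point, value)

    def contains(self, p) -> bool:
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))


def split(zone):
    """Halve `zone` in place and return the new half for the joining node.

    Assumption: the split is made along the longest side of the zone.
    Pairs whose points fall in the new half are handed over to it.
    """
    sides = [h - l for l, h in zip(zone.lo, zone.hi)]
    dim = sides.index(max(sides))
    mid = (zone.lo[dim] + zone.hi[dim]) / 2
    new_lo = tuple(mid if i == dim else v for i, v in enumerate(zone.lo))
    new_zone = Zone(new_lo, zone.hi)
    zone.hi = tuple(mid if i == dim else v for i, v in enumerate(zone.hi))
    for key in list(zone.store):
        point, _value = zone.store[key]
        if new_zone.contains(point):
            new_zone.store[key] = zone.store.pop(key)
    return new_zone


whole = Zone((0.0, 0.0), (1.0, 1.0))
whole.store["a"] = ((0.25, 0.5), "x")
whole.store["b"] = ((0.75, 0.5), "y")
joined = split(whole)
print(whole.lo, whole.hi, sorted(whole.store))
print(joined.lo, joined.hi, sorted(joined.store))
```

Only the node being split and its immediate neighbors see any change, which is why an insertion touches a number of nodes proportional to the number of dimensions rather than to the size of the network.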

10
Maintenance of a CAN Overlay

Graceful node departure: the node explicitly hands
over its zone and the associated (key,value)
database to one of its neighbors
Abrupt node disappearance: an immediate takeover
algorithm ensures one of the failed node's
neighbors takes over the zone
Under normal conditions, a node sends periodic
update messages to each of its neighbors, giving
its zone coordinates and a list of its neighbors
and their zone coordinates.
Prolonged absence of an update message from a
neighbor signals its failure
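
Failure detection from missing updates can be sketched as a table of last-seen timestamps. The NeighborTable class, its method names, and the timeout value are hypothetical; the clock is passed in explicitly so the example is deterministic. The takeover step itself is not modeled here.

```python
class NeighborTable:
    """Tracks the most recent update received from each neighbor."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds of silence tolerated (assumed value)
        self.last_seen = {}     # neighbor id -> time of last update
        self.zones = {}         # neighbor id -> zone coordinates it reported

    def on_update(self, neighbor, zone, now):
        """Record a periodic update message from a neighbor."""
        self.last_seen[neighbor] = now
        self.zones[neighbor] = zone

    def failed_neighbors(self, now):
        """Neighbors whose updates have been absent for longer than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)


table = NeighborTable(timeout=3.0)
table.on_update("A", ((0.0, 0.0), (0.5, 1.0)), now=0.0)
table.on_update("B", ((0.5, 0.0), (1.0, 1.0)), now=2.0)
print(table.failed_neighbors(now=4.0))
```

At time 4.0 neighbor A has been silent for 4 seconds and B for 2 seconds, so with a 3 second timeout only A is reported as failed.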

11
Design Improvements

12
Design Improvements (2)

Improvement: Multi-dimensional Coordinate Spaces
Increasing the dimensions of the CAN coordinate
space reduces the routing path length and path
latency for a small increase in the size of the
coordinate routing table
Path Length scales as $O(d\,n^{1/d})$
Fault-tolerance improves
Improvement: Multiple Coordinate Spaces (a.k.a.
Multiple Realities)
Maintain multiple independent coordinate spaces
with each node in the system being assigned a
different zone in the coordinate space (each
coordinate space is a reality)
Fault-tolerance improves
Low per-node state (O(d) for a d-dimensional
space)
Short path lengths ($O(d\,n^{1/d})$ hops for d
dimensions and n nodes)
Which is better?
Increasing the dimensions
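
To see the trade-off behind increasing the number of dimensions, the average path length (d/4) n^(1/d) can be tabulated against the 2d neighbors each node must track. This is arithmetic on the equal-zone formula only; the network size of 2^20 nodes is an arbitrary choice, and the sketch does not model multiple realities.

```python
def avg_path_length(n, d):
    """Average hops for n equal zones in d dimensions: (d/4) * n^(1/d)."""
    return (d / 4) * n ** (1 / d)


def neighbors(d):
    """Each node keeps state for 2d neighbors."""
    return 2 * d


n = 2**20  # assumed network size, for illustration only
for d in (2, 3, 4, 5, 10):
    print(f"d={d:2d}  neighbors={neighbors(d):2d}  "
          f"avg hops={avg_path_length(n, d):7.1f}")
```

With these numbers, going from d = 2 to d = 10 raises per-node state from 4 to 20 neighbors while the average path drops from about 512 hops to about 10.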

13
Design Improvements (3)

Improvement: Better CAN Routing Metrics
Have each node measure the network-level
round-trip time (RTT) to each of its neighbors.
Then route messages accordingly.
Favors lower latency paths and avoids
unnecessarily long hops
Improvement: Caching and Replication
A CAN node can maintain a cache of the data keys
it recently accessed
A CAN node can replicate the data key at each of
its neighboring nodes
In both schemes, cached or replicated entries need
an associated time-to-live field so that they
eventually expire
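
The time-to-live requirement can be illustrated with a small cache whose entries carry an expiry time. The TTLCache class and its interface are hypothetical, and the clock is passed in explicitly to keep the example deterministic; a node would consult such a cache before forwarding a request towards the key's home zone.

```python
class TTLCache:
    """Cache of recently accessed keys; each entry expires after `ttl` seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # key -> (value, expiry time)

    def put(self, key, value, now):
        self.entries[key] = (value, now + self.ttl)

    def get(self, key, now):
        """Return the cached value, or None if it is absent or has expired."""
        item = self.entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if now >= expires_at:
            del self.entries[key]  # drop the stale entry
            return None
        return value


cache = TTLCache(ttl=10.0)
cache.put("song.mp3", "10.0.0.7", now=0.0)
print(cache.get("song.mp3", now=5.0))
print(cache.get("song.mp3", now=10.0))
```

The first lookup falls inside the 10 second lifetime and returns the value; the second arrives at the expiry time, so the entry is dropped and None is returned.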

14
Related Systems

Domain Name System
CANs are more general than the DNS because DNS
closely ties the naming scheme to the manner in
which a name is resolved to an IP address
Peer-to-Peer
A simple example is keys being analogous to a URL
Will improve robustness
Key difference is that content within the CAN can
always be located by any other node because there
is a clear home (point) in the CAN for that
content, and every other node knows what the home
is and how to reach it

15
Discussion