Title: Geographic Hash Table
1Geographic Hash Table
- S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R.
Govindan, L. Yin and F. Yu
2Overview
- Data Centric Storage
- Data is stored inside the network; each name corresponds to a location in space
- All data with the same name will be stored at the same sensor network location
- E.g. an elephant sighting
- Why Data centric Storage?
- Energy efficiency
- Robustness against mobility and node failures
- Scalability
3Keywords and Terminology
- Observation
- Low-level readings from sensors
- e.g. detailed temperature readings
- Events
- Predefined constellations of low-level observations
- e.g. temperature greater than 75 F
- Queries
- Used to elicit information from the sensor network
4Performance Metric: Total Usage / Hotspot Usage
- Use communication as a cost function for energy consumption
- Total Usage
- Total number of packets sent in the sensor network
- Hotspot Usage
- The maximal number of packets sent by a particular sensor node
- Costs used in the evaluation
- Message flooding cost: O(n)
- Point-to-point routing cost: O(√n)
- n is the number of nodes
5Alternative Storage Schemes
- External Storage (ES)
- Events propagated and stored at an external location
- Local Storage (LS)
- Events stored locally at the detecting node
- Queries are flooded to all nodes and the events are sent back
- Data Centric Storage (DCS)
- Data for an event stored within the sensor network
- Queries are directed to the node that stores the data
6External Storage (ES)
[Figure: a detected event is sent to external storage]
7Local Storage (LS)
[Figure: queries are flooded to all nodes; events are held at the detecting nodes]
8Why do we need DCS?
- Scalability
- Robustness against node failures and node mobility
- To achieve energy efficiency
9Design Criteria: Scalability, Robustness
- Node failures
- Topology changes
- System must scale to a large number of nodes
- Energy constraints
- Persistence
- A (k,v) pair must remain available to queries, despite sensor node failures and changes in sensor network topology
- Consistency
- A query for k must be routed correctly to a node where (k,v) pairs are stored; if these nodes change, they should do so consistently
- Scaling in database size
- Topological generality: the system should scale well on a large number of topologies
10Assumptions in DCS
- Large-scale networks whose approximate geographic boundaries are known
- Nodes have short-range communication and are within the radio range of several other nodes
- Nodes know their own locations by GPS or some localization scheme
- Communication to the outside world takes place through one or more access points
11Data Centric Storage
- Relevant data are stored by name at nodes within the sensor network
- All data with the same general name will be stored at the same sensor-net node
- e.g. (elephant sightings)
- Queries for data with a particular name are then sent directly to the node storing those named data
12Data centric Storage
[Figure: elephant sighting example. Source: lass.cs.umass.edu]
13Geographic Hash Table
- Events are named with keys, and both the storage and the retrieval are performed using keys
- GHT provides (key, value) based associative memory
14Geographic Hash Table Operations
- GHT supports two operations
- Put(k,v): stores v (observed data) according to the key k
- Get(k): retrieves whatever value is associated with key k
- Hash function
- Hashes the key into geographic coordinates
- Put() and Get() operations on the same key k hash k to the same location
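The two operations can be sketched as a toy in-memory model. This is only an illustration under assumptions: the class name ToyGHT, the use of SHA-256 as the hash function, and the rectangular field of size width x height are hypothetical choices, and no actual routing is performed (the nearest node is found by direct search).

```python
import hashlib
import math


class ToyGHT:
    """Toy in-memory model of GHT Put/Get; no real routing is performed."""

    def __init__(self, node_positions, width, height):
        self.nodes = list(node_positions)      # (x, y) positions of sensor nodes
        self.width = width                     # assumed rectangular field
        self.height = height
        self.storage = {pos: {} for pos in self.nodes}  # per-node key -> values

    def hash_to_location(self, key):
        # Assumption: SHA-256 stands in for the unspecified GHT hash function.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big") / 2**64 * self.width
        y = int.from_bytes(digest[8:16], "big") / 2**64 * self.height
        return (x, y)

    def nearest_node(self, location):
        # The node geographically nearest to the hashed location.
        return min(self.nodes, key=lambda p: math.dist(p, location))

    def put(self, key, value):
        node = self.nearest_node(self.hash_to_location(key))
        self.storage[node].setdefault(key, []).append(value)
        return node

    def get(self, key):
        node = self.nearest_node(self.hash_to_location(key))
        return self.storage[node].get(key, [])


ght = ToyGHT([(0, 0), (10, 20), (30, 5), (25, 25)], width=40, height=40)
stored_at = ght.put("elephant", "sighting near the river")
# Put and Get hash the same key to the same location, so the value is found.
assert ght.get("elephant") == ["sighting near the river"]
```

Because both operations derive the location from the key alone, the querier needs no knowledge of which node detected the event.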
15Storing Data in GHT
[Figure: Put(elephant, data); Hash(elephant) = (12,24), so the data is routed toward location (12,24). Source: lass.cs.umass.edu]
16Retrieving data in GHT
[Figure: Get(elephant); Hash(elephant) = (12,24), so the query is routed toward location (12,24)]
17Geographic Hash Table
[Figure: Node A and Node B]
18Algorithms Used By GHT
- Geographic Hash Table uses GPSR (Greedy Perimeter Stateless Routing) for routing
- Peer-to-peer lookup system
- (a data object is associated with a key, and each node in the system is responsible for storing a certain range of keys)
19Algorithm (Contd)
- GPSR: packets are marked with the position of their destination, and each node is aware of its own position
- Greedy forwarding algorithm
- Perimeter forwarding algorithm
[Figure: two forwarding examples, each showing nodes A and B]
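The greedy forwarding step can be illustrated with a short sketch. It assumes node positions double as node identifiers and that the topology is given as a plain dictionary; the function names are hypothetical. Perimeter forwarding is not implemented here, the sketch simply stops where GPSR would switch to perimeter mode.

```python
import math


def greedy_next_hop(current, neighbors, destination):
    """Return the neighbor closest to destination if it improves on current, else None."""
    best = min(neighbors, key=lambda p: math.dist(p, destination), default=None)
    if best is None or math.dist(best, destination) >= math.dist(current, destination):
        return None  # local minimum: GPSR would switch to perimeter mode here
    return best


def greedy_route(start, destination, topology):
    """Follow greedy hops; topology maps each node position to its neighbor list."""
    path = [start]
    while True:
        nxt = greedy_next_hop(path[-1], topology[path[-1]], destination)
        if nxt is None:
            return path
        path.append(nxt)


# Hypothetical four-node topology.
A, B, C, D = (0, 0), (10, 0), (20, 0), (10, 10)
topology = {A: [B, D], B: [A, C], C: [B], D: [A]}
path = greedy_route(A, (21, 1), topology)
assert path == [A, B, C]  # C has no neighbor closer to (21, 1), so routing stops
```

Each hop strictly reduces the distance to the destination, so the loop always terminates.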
20GPSR Right-Hand Rule In Perimeter Forwarding
[Figure: right-hand rule example with nodes x, y, z and edges numbered 1 to 3]
21Home Node and Home perimeter
- Home node: the node geographically nearest to the destination coordinates of the packet
- Serves as the rendezvous point for Get() and Put() operations on the same key
- In GHT a packet is not addressed to a specific node but only to a specific location
- Use GPSR to find the home node
- Use only the perimeter mode of GPSR to find the home perimeter
- Home perimeter: the perimeter that encloses the destination
- Start from the home node, and use perimeter mode to make a cycle and return to the home node
22Problems
- Robustness could be affected
- Nodes could move (identity of the home node?)
- Node failures can occur
- New nodes can be deployed
- Not scalable
- Storage capacity of the home nodes
- Bottleneck at home nodes
23Solutions to the problems
- Perimeter refresh protocol
- mostly addresses the robustness issue
- Structured Replication
- addresses the scalability issue
- how to handle storage of many events
24Perimeter refresh protocol
- Replicates stored data for key k at nodes around the location to which k hashes
- Stores a copy of the key-value pair at each node on the home perimeter
- Each node on the perimeter is called a replica node
- How do you ensure consistency and persistence?
- A node becomes the home node if a packet for a particular key arrives at that node
- The perimeter refresh protocol periodically sends out refresh packets
- After a time period Th, generate a refresh packet that contains the data for that key
- Packet is forwarded on the home perimeter in the same way as Get() and Put()
- The refresh packet will take a tour of the home perimeter regardless of changes in the network topology since the key's insertion
- This property maintains the perimeter
25Perimeter Refresh Protocol
- How do you guard against node failures?
- When a replica node receives a refresh packet it did not originate, it caches the data in the refresh and sets up a takeover timer Tt
- The timer is reset each time a refresh from another node arrives
- If the timer expires, the replica node initiates a refresh packet addressed to the key's hashed location
- Note: that particular node does not determine a new home node. GHT routing causes the refresh to reach the new home node
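The takeover behaviour of a replica can be sketched as a small state machine. This is a simplified illustration, not the protocol implementation: the class ReplicaNode, the dictionary used as a refresh packet, and the explicit `now` timestamps are hypothetical, and the takeover timeout is taken as twice the refresh interval (Tt = 2Th).

```python
class ReplicaNode:
    """Toy replica tracking a takeover timer for one key (times in seconds)."""

    def __init__(self, name, refresh_interval):
        self.name = name
        self.takeover_timeout = 2 * refresh_interval  # Tt = 2 * Th
        self.cached = None      # data copied from the last refresh seen
        self.deadline = None    # time at which the takeover timer expires

    def on_refresh(self, now, origin, data):
        # A refresh from another node: cache the data and restart the timer.
        if origin != self.name:
            self.cached = data
            self.deadline = now + self.takeover_timeout

    def tick(self, now):
        # If the timer expired, emit a refresh addressed to the key's hashed
        # location; routing (not this node) decides who the new home node is.
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            return {"origin": self.name, "data": self.cached}
        return None


replica = ReplicaNode("D", refresh_interval=10)
replica.on_refresh(now=0, origin="A", data={"elephant": "sighting"})
assert replica.tick(now=15) is None           # timer (20 s) not yet expired
replica.on_refresh(now=10, origin="A", data={"elephant": "sighting"})
assert replica.tick(now=25) is None           # timer was reset at t = 10
packet = replica.tick(now=30)                 # home node silent for 2 * Th
assert packet == {"origin": "D", "data": {"elephant": "sighting"}}
```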
26Perimeter Refresh Protocol
Assume key k hashes at location L. A is closest to L, so it becomes the home node.
[Figure: nodes A to F around location L; A is marked as the home node and other nodes as replicas]
27Perimeter Refresh Protocol
Suppose the node A dies.
[Figure: the remaining nodes B to F around location L, with a new home node and the other nodes marked as replicas]
28Time Specifications
- Refresh time (Th)
- Take over time (Tt)
- Death time (Td)
- General rule
- Td > Th and Tt > Th
- In GHT, Td = 3Th and Tt = 2Th
29Characteristics Of Refresh Packet
- Refresh packet is addressed to the hashed location of the key
- Every Th seconds the home node generates a refresh packet
- Refresh packet contains the data stored for the key and is routed exactly as Get() and Put() operations
- Refresh packet always travels along the home perimeter
30Structured Replication
- If too many events are detected, the home node becomes a communication hotspot
- Structured replication is used to address the scaling problem
- Hierarchical decomposition of the key space
- Event names have a certain hierarchy depth
31Structured Replication
32Structured Replication
- A node that detects a new event stores that event at its closest mirror
- This is easily computable
- This reduces the storage cost, but increases the query cost
- GHT has to route the queries to all mirror nodes
- Queries are routed recursively
- First route the query to the root, then to the first-level and then to the second-level mirrors
- Structured replication becomes more useful for frequently detected events
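How a node might compute its closest mirror can be sketched as follows. The sketch assumes a square field of side `side` that is split into 2^d x 2^d cells at hierarchy depth d, with the root's offset inside its cell repeated in every other cell; the function names and this exact decomposition are assumptions made for illustration.

```python
import math


def mirror_locations(root, depth, side):
    """Root location plus its mirror images for a given hierarchy depth."""
    cells = 2 ** depth                  # cells per axis after `depth` subdivisions
    cell = side / cells                 # side length of one cell
    # Offset of the root inside its own cell, reused in every other cell.
    ox, oy = root[0] % cell, root[1] % cell
    return [(i * cell + ox, j * cell + oy)
            for i in range(cells) for j in range(cells)]


def closest_mirror(node_pos, root, depth, side):
    """Mirror location nearest to the detecting node."""
    return min(mirror_locations(root, depth, side),
               key=lambda m: math.dist(m, node_pos))


mirrors = mirror_locations(root=(3, 3), depth=2, side=100)
assert len(mirrors) == 16               # the root and 15 mirrors
assert closest_mirror((60, 10), root=(3, 3), depth=2, side=100) == (53.0, 3.0)
```

Storing at the nearest of these locations shortens the path for each Put, while a query has to visit every location in the list, which is the storage versus query cost trade-off noted above.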
33Evaluation
- Simulation to test whether the protocol is functioning correctly
- Done in the ns-2 network simulator using an IEEE 802.11 MAC
- This is a well-known event-driven simulator for ad hoc networks
- Larger-scale simulations for the comparative study were done with a custom simulator
34Comparative Study
- Simulation compares the following schemes
- External Storage (ES)
- Local Storage (LS)
- Normal DCS: a query returns a separate message for each detected event
- Summarized DCS (S-DCS): a query returns a single message regardless of the number of detected events
- Structured Replication DCS (SR-DCS): assuming an optimal level of SR
- Comparison based on cost
- Comparison based on total usage and hot-spot usage
35Assumptions in comparison
- Asymptotic costs of O(n) for floods and O(√n) for point-to-point routing
- Event locations are distributed randomly
- Event locations are not known in advance
- No more than one query for each event type
- (Q queries in total)
- Access points are assumed to be the most heavily used area of the sensor network
36Comparison based on Hot-spot/Total Usage
- n: Number of nodes
- T: Number of event types
- Q: Number of event types queried for
- Dtotal: Total number of detected events
- DQ: Number of detected events for queries
37DCS TYPES
- Normal DCS: a query returns a separate message for each detected event
- Summarized DCS: a query returns a single message regardless of the number of detected events
- (usually the summary is preferred)
38Comparison Study (contd.)
[Table: total and hot-spot usage (rows) for ES, LS, and DCS (columns)]
39Observations from the Comparison
- DCS is preferable only in cases where
- The sensor network is large
- There are many detected events and not all event types are queried
- Dtotal >> max(DQ, Q)
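To see why these conditions matter, the approximate packet counts can be worked out from the stated cost model (a flood costs n packets, a point-to-point message costs √n, constants ignored). The per-scheme formulas below are one derivation from those assumptions and are not taken from the comparison table; the function name and the sample numbers are hypothetical.

```python
import math


def usage(n, Q, D_total, D_Q):
    """Approximate (total, hot-spot) packet counts for each storage scheme."""
    p2p = math.sqrt(n)   # point-to-point routing cost, O(sqrt(n))
    flood = n            # flooding cost, O(n)
    return {
        # Every event is sent to the access point, which is the hot spot.
        "ES": (D_total * p2p, D_total),
        # Queries are flooded; matching events are sent back to the access point.
        "LS": (Q * flood + D_Q * p2p, Q + D_Q),
        # Queries and events are routed to home nodes; one reply per event.
        "DCS": (Q * p2p + D_total * p2p + D_Q * p2p, Q + D_Q),
        # Same as DCS, but a single summarized reply per query.
        "S-DCS": (Q * p2p + D_total * p2p + Q * p2p, 2 * Q),
    }


costs = usage(n=10000, Q=50, D_total=5000, D_Q=500)
assert costs["ES"] == (500000.0, 5000)
assert costs["LS"] == (550000.0, 550)
assert costs["DCS"] == (555000.0, 550)
assert costs["S-DCS"] == (510000.0, 100)
```

With these sample numbers the total usage of the schemes is of the same order, while the hot-spot usage of S-DCS (100) is far below that of ES (5000), which is where Dtotal >> max(DQ, Q) pays off.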
40Simulations
- To check the Robustness of GHT
- To compare the Storage methods in terms of total
and hot spot usage
41Simulation Setup
- ns-2
- Node density: 1 node / 256 m²
- Radio range: 40 m
- Number of nodes: 50, 100, 150, 200
- Mobility rate: 0, 0.1, 1 m/s
- Query generation rate: 2 qps
- Event types: 20
- Events detected: 10/type
- Refresh interval: 10 s
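A quick calculation shows what these parameters imply for the simulated field. It assumes a square deployment area and uniform node placement, neither of which is stated in the setup; the function names are hypothetical.

```python
import math

AREA_PER_NODE = 256.0   # m^2 per node (density of 1 node / 256 m^2)
RADIO_RANGE = 40.0      # m


def field_side(num_nodes):
    """Side length in meters of a square field holding num_nodes at this density."""
    return math.sqrt(num_nodes * AREA_PER_NODE)


def expected_nodes_in_range():
    """Expected number of nodes inside one radio disc, ignoring edge effects."""
    return math.pi * RADIO_RANGE ** 2 / AREA_PER_NODE


for count in (50, 100, 150, 200):
    print(f"{count} nodes -> field side {field_side(count):.1f} m")
print(f"about {expected_nodes_in_range():.1f} nodes per radio disc")
```

For example, 100 nodes correspond to a 160 m x 160 m field, and each radio disc covers roughly 19 to 20 nodes, consistent with the assumption that every node is within range of several others.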
42Performance metrics
- Availability of data stored to Queriers
- (In terms of success rate)
- Loads placed on the nodes participating in GHT
(hotspot usage)
43Simulation Results for Robustness
- GHT offers perfect availability of stored events in the static case
- It offers high availability when nodes are subjected to mobility and failures
44Simulation Results under varying Q
- Number of nodes is held constant at 10000
45Simulation results under varying N
- Number of queries: Q = 50
46Simulation Results for comparison of 3 storage methods
- S-DCS has low hot-spot usage under varying Q
- S-DCS has the lowest hot-spot usage under varying n
47Conclusion
- Data-centric storage entails naming data and storing it at nodes within the sensor network
- GHT hashes the key (event name) into geographic coordinates and stores the key-value pair at the sensor node geographically nearest to the hashed location
- GHT uses the Perimeter Refresh Protocol and structured replication to enhance robustness and scalability
- DCS is useful in large sensor networks where there are many detected events but not all event types are queried