Title: Data Centric Storage using GHT Lecture 13 October 14, 2004 EENG 460a / CPSC 436 / ENAS 960 Networked Embedded Systems
1Data Centric Storage using GHTLecture 13
October 14, 2004EENG 460a / CPSC 436 / ENAS 960
Networked Embedded Systems Sensor Networks
- Andreas Savvides
- andreas.savvides_at_yale.edu
- Office AKW 212
- Tel 432-1275
- Course Website
- http//www.eng.yale.edu/enalab/courses/eeng460a
2 Data centric Storage In Sensornets with
GHT
- S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R.
Govindan, L. Yin and F. Yu - MONET Special Issue on Sensor Networks, August
2003
3Overview
- Data Centric Storage
- Data is stored inside the network each name
corresponds to a location in space - All data with the same name will be stored at the
same sensor network location - E.g an elephant sighting
- Why Data centric Storage?
- Energy efficiency
- Robustness against mobility and node failures
- Scalability
4Keywords and Terminology
- Observation
- ? low-level readings from sensors
- ? e.g. Detailed temperature readings
- Events
- ? Predefined constellations of low-level
observations - ? e.g. temperature greater than 75 F
- Queries
- ?Used to elicit information from sensor network
5Performance MetricTotal Usage /Hotspot Usage
- Use communication as a cost function for energy
consumption - Total Usage
- Total number of packets sent in the
Sensor network - Hotspot Usage
- The maximal number of packets send by a
particular sensor node - Costs used in the evaluation
- Message flooding cost O(n)
- Point-to-point routing cost
- n is the number of nodes
6Alternative Storage Schemes
- External Storage (ES)
- Events propagated and stored at an external
location - Local Storage (LS)
- Events stored locally at the detecting node
- Queries are flooded to all nodes and the events
are sent back - Data Centric Storage (DCS)
- Data for an event stored within the sensor
network - Queries are directed to the node that stores the
data
7External Storage (ES)
External storage
event
8Local Storage (LS)
Queries flooded at all the nodes
event
event
9Why do we need DCS?
- Scalability
- Robustness against Node failures and Node
mobility - To achieve Energy-efficiency
10Design Criterial Scalability Robustness
- Node failures
- Topology changes
- System scale to large number of nodes
- Energy Constraints
- Persistence
- (k,v) pair must remain available to queries,
despite sensor node failures and changes in
sensor network topology - Consistency
- A query k must be routed correctly to a node
where (k,v) pairs are stored if these node
change, then they should do this consisently - Scaling in Database Size
- Topological generality system should scale well
on a large number of topologies
11Assumptions in DCS
- Large Scale networks whose approximate
geographic boundaries are known - Nodes have short range communication and are
within the radio range of several other nodes - Nodes know their own locations by GPS or some
localization scheme - Communication to the outside world takes place by
one or more access points
12Data Centric Storage
- Relevant Data are stored by name at nodes
within the Sensor network - All data with the same general name will be
stored at the same sensor-net node. - e.g. (elephant sightings)
- Queries for data with a particular name are then
sent directly to the node storing those named
data
13Data centric Storage
Elephant Sighting
sourcelass.cs.umass.edu
14Geographic Hash Table
- Events are named with keys and both the storage
and the retrieval are performed using keys - GHT provides (key, value) based associative memory
15Geographic Hash Table Operations
- GHT supports two operations
- ? Put(k,v)-stores v (observed data) according
to the key k - ? Get(k)-retrieve whatever value is
associated with key k - Hash function
- ? Hash the key in to the geographic
coordinates - ? Put() and Get() operations on the same
key k hash k to the same location
16Storing Data in GHT
Put (elephant, data)
(12,24)
Hash (elephant)(12,24)
sourcelass.cs.umass.edu
17Retrieving data in GHT
(12,24)
Hash (elephant)(12,24)
Get (elephant)
18Geographic Hash Table
Node A
Node B
19Algorithms Used By GHT
- Geographic hash Table uses GPSR for
Routing(Greedy Perimeter Stateless Routing) - PEER-TO-PEER look up system
- (data object is associated with key and each
node in the system is responsible for storing a
certain range of keys)
20Algorithm (Contd)
- GPSR- Packets are marked with position of
destinations and each node is aware of its
position - Greedy forwarding algorithm
- Perimeter forwarding algorithm
-
B
B
A
A
21GPSR Right-Hand Rule In Perimeter Forwarding
2
x
z
3
1
y
22Home Node and Home perimeter
- Home node Node geographically nearest to the
destination coordinates of the packet - Serves as the rendezvous point for Get() and
Put() operations on the same key - In GHT packet is not addressed to specific node
but only to a specific location - Use GPSR to find the home node
- only perimeter mode of GPSR to find Home
Perimeter - Home Perimeter perimeter that encloses the
destination - Start from the home node, and use perimeter mode
to make a cycle and return to the home node
23Problems
- Robustness could be affected
- Nodes could move (i.d. of Home node?)
- Node failure can Occur
- Deployment of new Nodes
- Not Scalable
- Storage capacity of the home nodes
- Bottleneck at Home nodes
24Solutions to the problems
- Perimeter refresh protocol
- mostly addresses the robustness issue
- Structured Replication
- address the scalability issue
- how to handle storage of many events
25Perimeter refresh protocol
- Replicates stored data for key k at nodes around
the location to which k hashes - Stores a copy of the key value pair at each node
on the home perimeter - Each node on the perimeter is called a replica
node - How do you ensure consistency persistence
- A node becomes the home node if a packet for a
particular key arrives at that node - The perimeter refresh protocols periodically
sends out refresh packets - After a time period Th generate a refresh packet
that contains the data for that key - Packet forwarded on the home perimeter in the
same way as Get() and Put() - The refresh packet will take a tour of the home
perimeter regardless the changes in the network
topology since the keys insertion - This property maintains the perimeter
26Perimeter Refresh Protocol
- How do you guard against node failures
- When a replica node receives a packet it did not
originate, it caches the data in the refresh and
sets up a takeover timer Tt - Timer is reset each time a refresh from another
node arrives - If the timer expires the replica node initiates a
refresh packet addressed to the keys hashed
location - Note That particular node does not determine a
new home node. The GHT routing causes the refresh
to reach a node home node
27Perimeter Refresh Protocol
E
Replica
Assume key k hashes at location L A is closest
to L so it becomes the home node
Replica
D
L
F
A
home
B
C
28Perimeter Refresh Protocol
E
Replica
D
Replica
Suppose the node A dies
L
F
home
C
Replica
B
Replica
29Time Specifications
- Refresh time (Th)
- Take over time (Tt)
- Death time (Td)
- General rule
- TdgtTh and TtgtTh
- In GHT Td3Th and Tt2Th
-
30Characteristics Of Refresh Packet
- Refresh packet is addressed to the hashed
location of the key - Every (Th) secs the home node will generate
refresh packet - Refresh packet contains the data stored for the
key and routed exactly as get() and put()
operations - Refresh packet always travels along the home
perimeter
31Structured Replication
- Too many events are detected then home node will
become the hotspot of communication. - Structured replication is used to address the
scaling problem - Hierarchical decomposition of the key space
- Event names have a certain hierarchy depth
32Structured Replication
33Structured Replication
- A node that detects a new event, stores that
event to its closest mirror - this is easily computable
- This reduces the storage cost, but increases the
query cost - GHT has to route the queries to all mirror nodes
- Queries are routes recursively
- First route query to the root, then to the first
level and then to the second level mirrors - Structured replication becomes more useful for
frequently detected events
34Evaluation
- Simulation to test if the protocol is functioning
correctly - Done in the ns-2 network simulator using an IEEE
802.11 mac - This is a well known event driven simulator for
ad-hoc networks - Larger scale simulations for the comparative
study where done with a custom simulator
35Comparative Study
- Simulation compares the following schemes
- External Storage (ES)
- Local Storage (LS)
- Normal DCS A query returns a separate message
for each detected event - Summarized DCS(S-DCS) A query returns a single
message regardless of the number of detected
events - Structured Replication DCS (SR_DCS) Assuming an
optimal level of SR - Comparison based on Cost
- Comparison based on Total usage and Hot spot
usage
36Assumptions in comparison
- Asymptotic costs of O(n) for floods and O( n) for
point to point routing - Event locations are distributed randomly
- Event locations are not known in advance
- No more than one query for each event type
- (Q Queries in total)
- Assume access points to be the most heavily used
area of the sensor network
37Comparison based onHot-spot/Total Usage
- n - Number of nodes
- T - Number of Event types
- Q Number Of Event types queried for
- Dtotal Total number of detected events
- DQ- Number of detected events for queries
38DCS TYPES
- Normal DCS Query returns a separate message for
each detected event - Summarized DCS Query returns a single message
regardless of the number of detected events - (usually summary is preferred)
39Comparison Study contd..
ES LS DCS
Total
Hot spot
40Observations from the Comparison
- DCS is preferable only in cases where
- Sensor network is Large
- There are many detected events and not all event
types queried - Dtotalgtgtmax(Dq,Q)
41Simulations
- To check the Robustness of GHT
- To compare the Storage methods in terms of total
and hot spot usage
42Simulation Setup
- ns-2
- Node Density 1node/256m2
- Radio Range 40 m
- Number of Nodes -50,100,150,200
- Mobility Rate -0,0.1,1m/s
- Query generation Rate -2qps
- Event types 20
- Events detected -10/type
- Refresh interval -10 s
43Performance metrics
- Availability of data stored to Queriers
- (In terms of success rate)
- Loads placed on the nodes participating in GHT
(hotspot usage)
44Simulation Results for Robustness
- GHT offers perfect availability of stored events
in static case - It offers high availability when nodes are
subjected to mobility and failures
45Simulation Results under varying Q
Number of nodes is
constant 10000
46Simulation results under varying N
Number of Queries Q 50
47Simulation Results for comparison of 3-storage
methods
- S-DCS have low hot-spot usage under varying Q
- S-DCS is has the lowest hot-spot usage under
varying n
48Conclusion
- Data centric storage entails naming of data and
storing data at nodes within the sensor network - GHT- hashes the key (events) in to geographical
co-ordinates and stores a key-value pair at the
sensor node geographically nearest to the hash - GHT uses Perimeter Refresh Protocol and
structured replication to enhance robustness and
scalability - DCS is useful in large sensor networks and there
are many detected events but not all event types
are Queried
49REFERENCES
- Deepak Ganesan, Deborah Estrin, John Heidemann,
Dimensions why do we need a new data handling
architecture for sensor networks?, ACM SIGCOMM
Computer Communication Review, Volume 33 Issue
1, January 2003 Scott Shenker, Sylvia
Ratnasamy, Brad Karp, Ramesh Govindan, Deborah
Estrin, Data-centric storage in sensornets, ACM
SIGCOMM Computer Communication Review, Volume 33
Issue 1, January 2003 - Sylvia Ratnasamy, Brad Karp, Scott Shenker,
Deborah Estrin, Ramesh Govindan, Li Yin, Fang Yu,
Data-centric storage in sensornets with GHT, a
geographic hash table, Mobile Networks and
Applications, Volume 8 Issue 4, August 2003 - Chalermek Intanagonwiwat, Ramesh Govindan,
Deborah Estrin, John Heidemann, Fabio Silva,
Directed diffusion for wireless sensor
networking, IEEE/ACM Transactions on Networking
(TON), Volume 11 Issue, February 2003 - R. Govindan, J. M. Hellerstein, W. Hong, S.
Madden, M. Franklin, S. Shenker, The Sensor
Network as a Database, USC Technical Report No.
02-771, September 2002