1
Data Centric Storage using GHT, Lecture 13
October 14, 2004, EENG 460a / CPSC 436 / ENAS 960
Networked Embedded Systems and Sensor Networks
  • Andreas Savvides
  • andreas.savvides@yale.edu
  • Office: AKW 212
  • Tel: 432-1275
  • Course Website:
  • http://www.eng.yale.edu/enalab/courses/eeng460a

2
Data-Centric Storage in Sensornets with GHT
  • S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R.
    Govindan, L. Yin and F. Yu
  • MONET Special Issue on Sensor Networks, August
    2003

3
Overview
  • Data-Centric Storage
  • Data is stored inside the network; each name corresponds to a location in space
  • All data with the same name will be stored at the same sensor network location
  • E.g., an elephant sighting
  • Why Data-Centric Storage?
  • Energy efficiency
  • Robustness against mobility and node failures
  • Scalability

4
Keywords and Terminology
  • Observations
    - Low-level readings from sensors
    - E.g., detailed temperature readings
  • Events
    - Predefined constellations of low-level observations
    - E.g., temperature greater than 75 °F
  • Queries
    - Used to elicit information from the sensor network

5
Performance Metrics: Total Usage / Hotspot Usage
  • Use communication as a cost function for energy consumption
  • Total usage: total number of packets sent in the sensor network
  • Hotspot usage: the maximal number of packets sent by a particular sensor node
  • Costs used in the evaluation (a small illustration follows):
    - Message flooding cost: O(n)
    - Point-to-point routing cost: O(√n)
    - n is the number of nodes
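As a small illustration (not from the paper), both metrics can be computed from per-node packet counts collected during a simulation; the packets_sent mapping below is hypothetical:

packets_sent = {"n1": 120, "n2": 340, "n3": 95}  # hypothetical per-node counts

total_usage = sum(packets_sent.values())    # total packets sent in the network
hotspot_usage = max(packets_sent.values())  # most packets sent by any one node

print(total_usage, hotspot_usage)  # 555 340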

6
Alternative Storage Schemes
  • External Storage (ES)
  • Events propagated and stored at an external
    location
  • Local Storage (LS)
  • Events stored locally at the detecting node
  • Queries are flooded to all nodes and the events
    are sent back
  • Data Centric Storage (DCS)
  • Data for an event stored within the sensor
    network
  • Queries are directed to the node that stores the
    data

7
External Storage (ES)
(Figure: detected events are propagated out of the network and stored at external storage.)
8
Local Storage (LS)
(Figure: queries are flooded to all nodes; detecting nodes store events locally and send matching events back to the querier.)
9
Why do we need DCS?
  • Scalability
  • Robustness against node failures and node mobility
  • Energy efficiency

10
Design Criteria: Scalability and Robustness
  • Node failures
  • Topology changes
  • System scales to large numbers of nodes
  • Energy constraints
  • Persistence: a (k,v) pair must remain available to queries, despite sensor node failures and changes in sensor network topology
  • Consistency: a query for k must be routed correctly to a node where (k,v) pairs are stored; if these nodes change, they should do so consistently
  • Scaling in database size
  • Topological generality: the system should scale well on a large number of topologies

11
Assumptions in DCS
  • Large-scale networks whose approximate geographic boundaries are known
  • Nodes have short-range communication and are within the radio range of several other nodes
  • Nodes know their own locations via GPS or some localization scheme
  • Communication with the outside world takes place through one or more access points

12
Data Centric Storage
  • Relevant data are stored by name at nodes within the sensor network
  • All data with the same general name will be stored at the same sensor-net node
  • E.g., elephant sightings
  • Queries for data with a particular name are then sent directly to the node storing those named data

13
Data Centric Storage
(Figure: an elephant sighting is routed to and stored at the location its name hashes to. Source: lass.cs.umass.edu)
14
Geographic Hash Table
  • Events are named with keys; both storage and retrieval are performed using these keys
  • GHT provides a (key, value)-based associative memory

15
Geographic Hash Table Operations
  • GHT supports two operations:
    - Put(k,v): stores v (observed data) according to the key k
    - Get(k): retrieves whatever value is associated with key k
  • Hash function:
    - Hashes the key into geographic coordinates
    - Put() and Get() operations on the same key k hash k to the same location (see the sketch below)
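A minimal Python sketch of this interface, assuming an illustrative field size and a SHA-1-based key hash; the route() callback standing in for GPSR delivery is hypothetical:

import hashlib

FIELD_W, FIELD_H = 1000.0, 1000.0  # assumed field dimensions (meters)

def geo_hash(key):
    # Deterministically map a key to (x, y) coordinates in the field;
    # the same key always hashes to the same location.
    d = hashlib.sha1(key.encode()).digest()
    x = int.from_bytes(d[0:4], "big") / 2**32 * FIELD_W
    y = int.from_bytes(d[4:8], "big") / 2**32 * FIELD_H
    return (x, y)

def put(key, value, route):
    # route(location, payload) stands in for GPSR routing to the node
    # nearest the hashed location (the home node).
    route(geo_hash(key), ("put", key, value))

def get(key, route):
    route(geo_hash(key), ("get", key))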

16
Storing Data in GHT
Put(elephant, data)
Hash(elephant) = (12,24)
(Figure: the Put() is routed to the node nearest (12,24). Source: lass.cs.umass.edu)
17
Retrieving Data in GHT
Get(elephant)
Hash(elephant) = (12,24)
(Figure: the Get() is routed to the same node nearest (12,24).)
18
Geographic Hash Table
(Figure: Node A and Node B route operations on the same key to the same hashed location.)
19
Algorithms Used By GHT
  • Geographic Hash Table uses GPSR (Greedy Perimeter Stateless Routing) for routing
  • Peer-to-peer lookup system: each data object is associated with a key, and each node in the system is responsible for storing a certain range of keys

20
Algorithm (Contd)
  • GPSR: packets are marked with the position of their destination, and each node is aware of its own position
  • Greedy forwarding algorithm (a minimal sketch follows the figure below)
  • Perimeter forwarding algorithm

(Figure: greedy forwarding delivers the packet from A toward B via ever-closer neighbors; a void between A and B forces a fallback to perimeter forwarding.)
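A minimal sketch of the greedy step; the neighbor table is illustrative. Each node forwards to the neighbor geographically closest to the destination, and hands off to perimeter mode when no neighbor is closer than itself:

import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def greedy_next_hop(self_pos, neighbors, dest):
    # neighbors: {node_id: (x, y)}, learned from periodic beacons.
    best = min(neighbors, key=lambda n: dist(neighbors[n], dest))
    if dist(neighbors[best], dest) < dist(self_pos, dest):
        return best
    return None  # local maximum: fall back to perimeter forwarding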
21
GPSR Right-Hand Rule In Perimeter Forwarding
(Figure: the right-hand rule; a packet arriving at x from y is forwarded along the first edge counterclockwise from (x,y), traversing the perimeter edges in the order 1, 2, 3 via z.)
22
Home Node and Home perimeter
  • Home node: the node geographically nearest to the destination coordinates of the packet
  • Serves as the rendezvous point for Get() and Put() operations on the same key
  • In GHT a packet is not addressed to a specific node but only to a specific location
  • GPSR is used to find the home node (a minimal sketch follows)
  • Only the perimeter mode of GPSR is used to find the home perimeter
  • Home perimeter: the perimeter that encloses the destination
  • Starting from the home node, use perimeter mode to make a cycle and return to the home node
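A minimal sketch of home-node selection; the node list and position table are illustrative:

import math

def home_node(nodes, positions, dest):
    # The home node is the node geographically nearest to the packet's
    # destination coordinates; it is the rendezvous point for all
    # Put() and Get() operations on the key.
    return min(nodes, key=lambda n: math.hypot(positions[n][0] - dest[0],
                                               positions[n][1] - dest[1]))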

23
Problems
  • Robustness could be affected:
    - Nodes could move (what happens to the identity of the home node?)
    - Node failures can occur
    - Deployment of new nodes
  • Not scalable:
    - Storage capacity of the home nodes
    - Bottleneck at home nodes

24
Solutions to the problems
  • Perimeter Refresh Protocol: mostly addresses the robustness issue
  • Structured replication: addresses the scalability issue (how to handle storage of many events)

25
Perimeter Refresh Protocol
  • Replicates stored data for key k at nodes around the location to which k hashes
  • Stores a copy of the key-value pair at each node on the home perimeter
  • Each node on the perimeter is called a replica node
  • How do you ensure consistency and persistence?
  • A node becomes the home node when a packet for a particular key arrives at that node
  • The Perimeter Refresh Protocol periodically sends out refresh packets:
    - After a time period Th, the home node generates a refresh packet that contains the data for that key
    - The packet is forwarded on the home perimeter in the same way as Get() and Put()
    - The refresh packet will take a tour of the home perimeter regardless of changes in the network topology since the key's insertion
  • This property maintains the perimeter
26
Perimeter Refresh Protocol
  • How do you guard against node failures?
  • When a replica node receives a refresh packet it did not originate, it caches the data in the refresh and sets a takeover timer Tt
  • The timer is reset each time a refresh from another node arrives
  • If the timer expires, the replica node initiates a refresh packet addressed to the key's hashed location
  • Note: that particular node does not determine a new home node; the GHT routing causes the refresh to reach a new home node (a minimal sketch of this timer logic follows)
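A minimal sketch of a replica's timer logic, using the relations from the "Time Specifications" slide (Tt = 2Th, Td = 3Th); the class and callback names are illustrative:

TH = 10.0    # refresh interval, matching the 10 s used in the simulations
TT = 2 * TH  # takeover timer
TD = 3 * TH  # death timer

class Replica:
    def __init__(self, now, data=None):
        self.data = data
        self.last_refresh = now

    def on_refresh(self, now, data):
        # Cache the data carried in the refresh and reset the timer.
        self.data = data
        self.last_refresh = now

    def tick(self, now, send_refresh, key_location):
        # No refresh within TT: re-insert the data; GHT routing then
        # delivers it to the (possibly new) home node.
        if self.data is not None and now - self.last_refresh > TT:
            send_refresh(key_location, self.data)
        # Data never refreshed within TD is considered dead.
        if now - self.last_refresh > TD:
            self.data = None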

27
Perimeter Refresh Protocol
(Figure: key k hashes to location L. Node A is closest to L, so it becomes the home node; the other nodes on the home perimeter (B, C, D, E, F) store replicas.)
28
Perimeter Refresh Protocol
(Figure: node A dies. A replica's takeover timer expires and its refresh is routed toward L; the node now closest to L becomes the new home node, and the remaining perimeter nodes continue as replicas.)
29
Time Specifications
  • Refresh time (Th)
  • Takeover time (Tt)
  • Death time (Td)
  • General rule: Td > Th and Tt > Th
  • In GHT: Td = 3Th and Tt = 2Th

30
Characteristics Of Refresh Packet
  • The refresh packet is addressed to the hashed location of the key
  • Every Th seconds the home node generates a refresh packet
  • The refresh packet contains the data stored for the key and is routed exactly as Get() and Put() operations
  • The refresh packet always travels along the home perimeter

31
Structured Replication
  • If too many events are detected, the home node will become a communication hotspot
  • Structured replication is used to address this scaling problem
  • Hierarchical decomposition of the key space
  • Event names have a certain hierarchy depth

32
Structured Replication
(Figure: hierarchical decomposition of the key space; the root location is mirrored in each subregion of the grid.)
33
Structured Replication
  • A node that detects a new event stores that event at its closest mirror
  • This is easily computable (see the sketch below)
  • This reduces the storage cost, but increases the query cost
  • GHT has to route queries to all mirror nodes
  • Queries are routed recursively: first to the root, then to the first-level mirrors, then to the second-level mirrors
  • Structured replication becomes more useful for frequently detected events
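A minimal sketch of the closest-mirror computation, assuming a rectangular field split into a 2^d x 2^d grid at hierarchy depth d, with each cell holding a mirror at the root's relative offset (the geometry details are illustrative):

def closest_mirror(node_pos, root, depth, field_w, field_h):
    cells = 2 ** depth  # 4**depth subregions arranged as a cells-x-cells grid
    cw, ch = field_w / cells, field_h / cells
    ox, oy = root[0] % cw, root[1] % ch  # root's offset within its own cell
    cx = min(int(node_pos[0] // cw), cells - 1)
    cy = min(int(node_pos[1] // ch), cells - 1)
    # The detecting node stores the event at the mirror in its own cell.
    return (cx * cw + ox, cy * ch + oy)

# e.g. closest_mirror((10, 10), (700, 300), 2, 1000, 1000) -> (200.0, 50.0)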

34
Evaluation
  • Simulation to test whether the protocol functions correctly
  • Done in the ns-2 network simulator using an IEEE 802.11 MAC
  • ns-2 is a well-known event-driven simulator for ad hoc networks
  • Larger-scale simulations for the comparative study were done with a custom simulator

35
Comparative Study
  • The simulation compares the following schemes:
    - External Storage (ES)
    - Local Storage (LS)
    - Normal DCS: a query returns a separate message for each detected event
    - Summarized DCS (S-DCS): a query returns a single message regardless of the number of detected events
    - Structured Replication DCS (SR-DCS): assuming an optimal level of SR
  • Comparison based on cost
  • Comparison based on total usage and hotspot usage

36
Assumptions in comparison
  • Asymptotic costs of O(n) for floods and O(√n) for point-to-point routing
  • Event locations are distributed randomly
  • Event locations are not known in advance
  • No more than one query for each event type (Q queries in total)
  • Access points are assumed to be the most heavily used area of the sensor network

37
Comparison Based on Hotspot / Total Usage
  • n: number of nodes
  • T: number of event types
  • Q: number of event types queried for
  • Dtotal: total number of detected events
  • DQ: number of detected events for queried event types

38
DCS TYPES
  • Normal DCS: a query returns a separate message for each detected event
  • Summarized DCS: a query returns a single message regardless of the number of detected events
  • (Usually the summary is preferred)

39
Comparison Study (contd.)

              ES            LS             DCS
  Total       Dtotal·√n     Q·n + DQ·√n    Q·√n + Dtotal·√n + DQ·√n
  Hotspot     Dtotal        Q + DQ         Q + DQ
  (For S-DCS the total becomes Q·√n + Dtotal·√n + Q·√n and the hotspot 2Q.)
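Written out in code, a sketch of the table's asymptotic formulas (constant factors dropped, √n as the point-to-point routing cost):

import math

def usage(n, q, d_total, d_q):
    rt = math.sqrt(n)  # point-to-point routing cost O(sqrt(n))
    return {
        "ES":    {"total": d_total * rt,             "hotspot": d_total},
        "LS":    {"total": q * n + d_q * rt,         "hotspot": q + d_q},
        "DCS":   {"total": (q + d_total + d_q) * rt, "hotspot": q + d_q},
        "S-DCS": {"total": (2 * q + d_total) * rt,   "hotspot": 2 * q},
    }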
40
Observations from the Comparison
  • DCS is preferable only in cases where:
    - The sensor network is large
    - There are many detected events and not all event types are queried
    - Dtotal >> max(DQ, Q)

41
Simulations
  • To check the robustness of GHT
  • To compare the storage methods in terms of total and hotspot usage

42
Simulation Setup
  • ns-2
  • Node density: 1 node / 256 m²
  • Radio range: 40 m
  • Number of nodes: 50, 100, 150, 200
  • Mobility rate: 0, 0.1, 1 m/s
  • Query generation rate: 2 qps
  • Event types: 20
  • Events detected: 10 per type
  • Refresh interval: 10 s

43
Performance Metrics
  • Availability of stored data to queriers (in terms of success rate)
  • Load placed on the nodes participating in GHT (hotspot usage)

44
Simulation Results for Robustness
  • GHT offers perfect availability of stored events in the static case
  • It offers high availability when nodes are subjected to mobility and failures

45
Simulation Results Under Varying Q
(Figure: number of nodes held constant at 10,000.)
46
Simulation Results Under Varying n
(Figure: number of queries Q = 50.)
47
Simulation Results: Comparison of the Three Storage Methods
  • S-DCS has low hotspot usage under varying Q
  • S-DCS has the lowest hotspot usage under varying n

48
Conclusion
  • Data-centric storage entails naming data and storing it at nodes within the sensor network
  • GHT hashes keys (event names) into geographic coordinates and stores each key-value pair at the sensor node geographically nearest the hashed location
  • GHT uses the Perimeter Refresh Protocol and structured replication to enhance robustness and scalability
  • DCS is useful in large sensor networks where many events are detected but not all event types are queried

49
REFERENCES
  • Deepak Ganesan, Deborah Estrin, John Heidemann, "DIMENSIONS: Why do we need a new data handling architecture for sensor networks?", ACM SIGCOMM Computer Communication Review, Volume 33 Issue 1, January 2003
  • Scott Shenker, Sylvia Ratnasamy, Brad Karp, Ramesh Govindan, Deborah Estrin, "Data-centric storage in sensornets", ACM SIGCOMM Computer Communication Review, Volume 33 Issue 1, January 2003
  • Sylvia Ratnasamy, Brad Karp, Scott Shenker, Deborah Estrin, Ramesh Govindan, Li Yin, Fang Yu, "Data-centric storage in sensornets with GHT, a geographic hash table", Mobile Networks and Applications, Volume 8 Issue 4, August 2003
  • Chalermek Intanagonwiwat, Ramesh Govindan, Deborah Estrin, John Heidemann, Fabio Silva, "Directed diffusion for wireless sensor networking", IEEE/ACM Transactions on Networking (TON), Volume 11, February 2003
  • R. Govindan, J. M. Hellerstein, W. Hong, S. Madden, M. Franklin, S. Shenker, "The Sensor Network as a Database", USC Technical Report No. 02-771, September 2002