Title: Datacentric view of sensornets: An Overview
1Data-centric view of sensornets: An Overview
- Puru Kulkarni
- Vijay Sundaram
- Bhuvan Urgaonkar
2Motivation
- Ubiquitous presence of sensor networks
- Communication, computation, limited storage, sensing capabilities
- Used to sense, actuate, control
- Sensors everywhere, data everywhere!
- Require an infrastructure for data access and
storage
3Overview
- Sensors sense/generate data
- Users/Applications interested in data or some measure of data
- Common user operations are
- Queries and Monitoring
- Actuate and Control
4Typical Queries
- Historical
- What is the average rainfall over past 2 days?
- Current
- What is the current temperature in Rm 226?
- Long Running
- Temperature in Rm 226 over the next 4 hours
every 30 seconds
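The three query types can be sketched over a list of timestamped readings. This is a minimal illustration, not part of the original slides: the function names, the `(timestamp, value)` layout, and the use of integer seconds for time are all assumptions.

```python
def historical_average(readings, now, window):
    """Historical query: mean of the values with now - window <= t <= now."""
    values = [v for t, v in readings if now - window <= t <= now]
    return sum(values) / len(values) if values else None


def current_value(readings):
    """Current query: the value carried by the most recent timestamp."""
    return max(readings)[1] if readings else None  # tuples compare by timestamp first


def long_running_schedule(start, duration, period):
    """Long-running query: the instants at which the sensor is sampled."""
    return [start + i * period for i in range(duration // period)]


# (timestamp in seconds, temperature) pairs; the values are made up.
readings = [(0, 20.0), (30, 21.0), (60, 22.0)]

print(historical_average(readings, now=60, window=30))  # 21.5
print(current_value(readings))                          # 22.0
print(len(long_running_schedule(0, 4 * 3600, 30)))      # 480
```

Four hours at one sample every 30 seconds gives 480 samples, which shows why a long-running query is better installed once in the network than re-issued for every sample.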
5Issues
- How to identify relevant sensors?
- Computation vs. Communication tradeoff
- Where to process query?
- inside the sensor network (route query)
- Need new techniques
- at a centralized location (route data)
- Large amounts of data transfer (not efficient)
- Data gathering may not reflect query rate
- How to process query?
- queries on streaming data
6DataSpace: Querying and Monitoring Deeply Networked Collections in Physical Space
T. Imielinski and S. Goel, Rutgers University
- Billions of objects populate space
- Each produces and locally stores data
- Location aware
- Can be selectively monitored, queried and
controlled
- Physical world enhanced with data
7Characteristics
- Dataspace
- Data lives on the object
- Users access not only local information but can navigate entire dataspace
- Spatial world divided in 3-D datacubes
- CS Bldg., street, block, etc.
- Communication, messaging and computation
techniques for querying and monitoring required
8Querying and Monitoring
- Queries are spatially driven
- Steps
- Identify relevant datacubes
- Identify relevant nodes (dataflocks)
- Datacube directory service
- Aggregation for queries on several datacubes
- e.g. Information about Manhattan taxi cabs
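The steps above (find the datacubes, then the dataflock inside them, then aggregate) can be sketched with a toy directory service. Everything named here is hypothetical: `Node`, `DatacubeDirectory`, `cube_of`, and the cube edge length `CUBE_SIZE` are illustrative assumptions, not the DataSpace design itself.

```python
from dataclasses import dataclass

CUBE_SIZE = 100  # hypothetical datacube edge length, in arbitrary units


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str        # attribute used to form a dataflock, e.g. "taxi"
    position: tuple  # (x, y, z)


def cube_of(position):
    """Map a 3-D position to the index of the datacube that contains it."""
    return tuple(int(c // CUBE_SIZE) for c in position)


class DatacubeDirectory:
    """Toy directory service: datacube index -> nodes registered in it."""

    def __init__(self):
        self._cubes = {}

    def register(self, node):
        self._cubes.setdefault(cube_of(node.position), []).append(node)

    def dataflock(self, cubes, kind):
        """Nodes of the given kind inside any of the given datacubes."""
        return [n for c in cubes for n in self._cubes.get(c, []) if n.kind == kind]


directory = DatacubeDirectory()
directory.register(Node("cab-1", "taxi", (10, 20, 0)))   # cube (0, 0, 0)
directory.register(Node("cab-2", "taxi", (150, 20, 0)))  # cube (1, 0, 0)
directory.register(Node("bus-7", "bus", (160, 30, 0)))   # cube (1, 0, 0)
directory.register(Node("cab-3", "taxi", (450, 20, 0)))  # cube (4, 0, 0)

# Aggregate (here: count) over the taxis in two adjacent datacubes.
flock = directory.dataflock([(0, 0, 0), (1, 0, 0)], "taxi")
print(len(flock))  # 2
```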
9Architecting DataSpace
- Network as DataSpace engine
- multicast mechanisms
- (each node has an IP address!)
- group membership based on
- physical location
- attribute (temperature, vehicles etc)
- multicast fits selective node addressing criteria to access relevant data
- e.g. what is average temperature in CS Bldg?
- Query reaches only sensors that are in the CS Bldg datacube and have the corresponding group address
10Network as DataSpace engine
- Space Handle: encodes datacube information
- Subject Handle: attributes that are part of a multicast group
- Dataspace address is an IPv6 multicast address
- E.g. Space handle 224.4.5, Subject handle 8, Dataspace address 224.4.5.8
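The example address can be reproduced by joining the two handles. This sketch follows the dotted notation of the slide's example only. The function names are hypothetical, and the actual bit layout of handles inside an IPv6 multicast address is not given in the slides, so it is not modelled.

```python
def dataspace_address(space_handle: str, subject_handle: int) -> str:
    """Join a space handle and a subject handle in the slide's dotted notation."""
    return f"{space_handle}.{subject_handle}"


def split_dataspace_address(address: str):
    """Inverse of dataspace_address: recover (space handle, subject handle)."""
    space, _, subject = address.rpartition(".")
    return space, int(subject)


print(dataspace_address("224.4.5", 8))       # 224.4.5.8
print(split_dataspace_address("224.4.5.8"))  # ('224.4.5', 8)
```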
11Geographic Routing infrastructure
- Route message based on physical location rather than IP address
- Use GPS coordinates for locations
- Avoids use of multicast for routing queries to datacubes
- Once query reaches a region use multicast
12Geographic Routing infrastructure
- Geo-router (routes based on datacube location)
- Geo-node (issue query to nodes in datacube)
- Geo-host (process geographic messages)
- Approach
- Route query to datacube
- Geo-nodes route query within datacube
- multicast with a TTL of 1
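The two-stage approach can be sketched as below. Note the assumption: the slides do not say how geo-routers choose the next hop, so this sketch substitutes simple greedy forwarding (always hop to the neighbor closest to the target point). The function names and the toy topology are hypothetical.

```python
import math


def route_to_datacube(positions, neighbors, source, target):
    """Greedy forwarding (an assumed policy): hop to the neighbor closest to
    the target point; stop when no neighbor is closer than the current node."""
    path = [source]
    current = source
    while True:
        best = min(
            neighbors[current],
            key=lambda n: math.dist(positions[n], target),
            default=None,
        )
        if best is None or math.dist(positions[best], target) >= math.dist(
            positions[current], target
        ):
            return path
        path.append(best)
        current = best


def one_hop_multicast(neighbors, geo_node, members):
    """TTL-1 delivery: only direct neighbors that are in the group get the query."""
    return [n for n in neighbors[geo_node] if n in members]


# Toy topology: A and B act as geo-routers, C as the geo-node of the target datacube.
positions = {"A": (0, 0), "B": (5, 0), "C": (10, 0), "s1": (10, 1), "s2": (11, 0)}
neighbors = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "s1", "s2"],
    "s1": ["C"],
    "s2": ["C"],
}

path = route_to_datacube(positions, neighbors, "A", target=(10, 0))
print(path)                                                 # ['A', 'B', 'C']
print(one_hop_multicast(neighbors, path[-1], {"s1", "s2"}))  # ['s1', 's2']
```

Greedy forwarding can stall at a local minimum when no neighbor is closer to the target. The sketch simply stops there, so a real geo-router would need a recovery strategy.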
13- The Sensor Network as a Database
- Govindan, Hellerstein, Hong, Madden, Franklin, Shenker
- Querying the Physical World
- Bonnet, Gehrke, Seshadri
14Sensornet Database architecture
- Given a routing and access mechanism, how to process queries?
- Provide a DB-view to users/apps
- well understood programming interface
- common data operations use computation in network
- help energy-efficiency
- allow users to be unaware of actual network, but treat it as a database
- Sensor Network Data -> Sensor Network Database
15What is required?
- Core DB operations tailored for sensor networks
- Design appropriate building blocks for DB operations
- Join, aggregation, grouping, selection, etc.
16Sensornet Database Architecture
- Two important ideas
- in-network implementations of primitive database query operators such as grouping, aggregation, and joins
- group communication and routing protocols, with possible processing at intermediate nodes, implement the operator in an application-independent way
17Sensornet Database Architecture
- Relax the semantics of database queries to allow approximate results
- relaxation enables energy-efficient implementations even given the expected high level of network dynamics
- A sensor network is a proxy for a continuous real-world phenomenon, and by nature samples that phenomenon discretely at some rate, with some degree of error.
18In-network Implementation
- JOIN operator
- selection over cross-product of a pair of tables
- Tuples generated at different nodes might be joined at a single node
- Some JOIN implementations are blocking
- Blocking is infeasible in sensor networks
- tables can contain unbounded streams of data
- amount of memory available is limited
- Need to retool these operations
- Pipelining
- Partitioning
19Non-Blocking Pipelined Joins
- Symmetric hash-join
- Maintains two hash tables (keyed by the column(s) used for the join)
- On an input tuple, looks up matching tuples from the other input's hash table
- Outputs any matching results
- Ripple joins
- Statistically sample the two tables to be joined, in order to produce a stream of joined tuples
- Relative rates at which the two tables are sampled adapt to match the variance produced by the data in each
- low energy approach to obtain approximate answers
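A symmetric hash join can be sketched in a few lines. The class and method names are hypothetical, and the sketch keeps every tuple it has seen, so on its own it does not address the limited-memory concern raised earlier.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Non-blocking join: each arriving tuple is stored in its own side's hash
    table and immediately probed against the other side's table."""

    def __init__(self):
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}

    def insert(self, side, key, tup):
        """Add a tuple arriving on `side`; return the joined pairs it produces."""
        other = "right" if side == "left" else "left"
        self.tables[side][key].append(tup)
        matches = self.tables[other].get(key, [])  # .get avoids creating empty buckets
        if side == "left":
            return [(tup, m) for m in matches]
        return [(m, tup) for m in matches]


join = SymmetricHashJoin()
print(join.insert("left", 226, ("temp", 21.5)))    # [] (nothing on the right yet)
print(join.insert("right", 226, ("light", 300)))   # [(('temp', 21.5), ('light', 300))]
print(join.insert("right", 111, ("light", 120)))   # [] (no left tuple with key 111)
```

Results are emitted as soon as a match exists, whichever input arrives first, which is what makes the operator usable on unbounded streams.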
20Partitioning
- Partitioning
- tuples are partitioned based on their join-column values and redistributed on the fly across multiple nodes
- the work of joining the individual partitions is done in parallel by each of the nodes
- Partitions can be defined by value, geographically, or by sensor type, and a node (or nodes) can be designated to perform the join for the partition
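Partitioning by value can be sketched as follows. The function names, node names, and sample tuples are hypothetical. The per-node joins run sequentially here, whereas in a network each designated node would perform its own join in parallel.

```python
def partition_by_join_key(tuples, key_index, nodes):
    """Assign each tuple to a node based on the hash of its join-column value."""
    assignment = {node: [] for node in nodes}
    for tup in tuples:
        node = nodes[hash(tup[key_index]) % len(nodes)]
        assignment[node].append(tup)
    return assignment


def local_join(r_part, s_part, r_key, s_key):
    """Join performed by one node on the partition it was given."""
    return [(r, s) for r in r_part for s in s_part if r[r_key] == s[s_key]]


# Two small tables keyed by room number (values are made up).
temperature = [(226, 21.5), (227, 19.0), (228, 23.0)]
light = [(226, 300), (228, 450), (229, 120)]
nodes = ["node-a", "node-b"]

r_parts = partition_by_join_key(temperature, 0, nodes)
s_parts = partition_by_join_key(light, 0, nodes)

# Matching keys always land on the same node, so each node can join independently.
result = [pair for n in nodes for pair in local_join(r_parts[n], s_parts[n], 0, 0)]
print(sorted(result))
```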
21In-network Implementation
- Aggregation operators
- summarization of a column(s) into a single numerical value, e.g. SUM, COUNT, AVERAGE, MIN, MAX, etc.
- query flooded in the network and the responses are routed on the reverse path trees
- results aggregated across several nodes
- E.g. to calculate AVERAGE each node returns (SUM, COUNT) values to parent
- Can be a very common operator
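The AVERAGE example can be sketched as a recursive walk over the reverse-path tree. The tree, the readings, and the function name are made-up illustrations. The recursion stands in for the messages that children would send to their parents.

```python
def partial_aggregate(tree, readings, node):
    """Return the (SUM, COUNT) pair for the subtree rooted at `node`."""
    total, count = readings[node], 1
    for child in tree.get(node, []):
        child_sum, child_count = partial_aggregate(tree, readings, child)
        total += child_sum
        count += child_count
    return total, count


# parent -> children, as formed by the reverse path of the flooded query
tree = {"root": ["a", "b"], "a": ["c"]}
readings = {"root": 20.0, "a": 22.0, "b": 24.0, "c": 26.0}

total, count = partial_aggregate(tree, readings, "root")
print(total, count, total / count)  # 92.0 4 23.0
```

Forwarding (SUM, COUNT) rather than a locally computed average keeps the result exact when subtrees have different sizes, and each node sends a single small pair regardless of how many descendants it has.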
22Distributed Sensnet DBs
- How to represent devices in DBs on sensornets?
- ADTs (Abstract Data Types)
- Methods correspond to sensing functionality
- Virtual Relations (VRs) store local data
- Network used for query operations
23Virtual Relation
- VR with attributes as
- Inputs to an ADT (device) function
- Arguments to an ADT function
- Output of the function
- Timestamp of the function
24Virtual Relation
- Some VR properties
- records are never updated or deleted
- is naturally partitioned over the sensnet (each device takes care of its set of VR records)
- What does this mean? a distributed DB
- Records from the VRs (distributed over the
devices) are processed using distributed query
execution plans
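A virtual relation can be sketched as an append-only set of records, one fragment per device. All names here are hypothetical (`VRRecord`, `DeviceFragment`, `scan`), and the `device_id` attribute is an assumption about how records from different devices are told apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: records are never updated
class VRRecord:
    device_id: str    # which device produced the record (assumed attribute)
    args: tuple       # arguments passed to the ADT function
    output: float     # output of the function
    timestamp: float  # when the function was invoked


class DeviceFragment:
    """The part of a virtual relation held by one device; append-only."""

    def __init__(self, device_id, method):
        self.device_id = device_id
        self._method = method  # the sensing function exposed by the ADT
        self._records = []

    def invoke(self, timestamp, *args):
        """Call the sensing function and append the resulting record."""
        record = VRRecord(self.device_id, args, self._method(*args), timestamp)
        self._records.append(record)
        return record

    def records(self):
        return tuple(self._records)  # read-only view; nothing is deleted


def scan(fragments):
    """Union of all device fragments, i.e. the whole virtual relation."""
    return [r for f in fragments for r in f.records()]


# Two devices exposing a made-up get_temperature(unit) function.
d1 = DeviceFragment("dev-1", lambda unit: 21.5 if unit == "C" else 70.7)
d2 = DeviceFragment("dev-2", lambda unit: 19.0 if unit == "C" else 66.2)
d1.invoke(0.0, "C")
d2.invoke(0.0, "C")
print([r.output for r in scan([d1, d2])])  # [21.5, 19.0]
```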
25Approximate Results
- Energy-efficiency can be achieved using approximate aggregates
- Uniform sampling
- Tuples are uniformly sampled and the resulting average is assumed to represent the actual average
- Packet loss might invalidate the statistical assumptions that confidence intervals depend on.
- Logarithmic sampling
- The number of respondents (or the size of memory needed for the count) scales logarithmically with the size of the network
- Provides looser error bounds but uses significantly less memory or communication.
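Uniform sampling can be sketched as follows. The data, the sample size, and the function name are made up, and the sketch assumes every sampled node actually responds (no packet loss).

```python
import random


def sampled_average(values, sample_size, rng):
    """Estimate the mean from a uniform random sample drawn without replacement."""
    sample = rng.sample(values, sample_size)
    return sum(sample) / len(sample)


rng = random.Random(0)  # fixed seed so the run is repeatable
temperatures = [20.0 + (i % 5) for i in range(100)]  # 100 made-up readings

exact = sum(temperatures) / len(temperatures)
estimate = sampled_average(temperatures, 10, rng)  # only 10 nodes respond

print(exact)     # 22.0
print(estimate)  # close to 22.0, but not guaranteed to equal it
```

Only 10 of the 100 nodes transmit, which is where the energy saving comes from. The price is an estimate instead of the exact average.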
26Complex query evaluation
- R x S x T
- What order to follow?
- (RxS)xT or Rx(SxT) or (RxT)xS
- Decided by query optimizer
- Usually depends on table size
- With Sensornet DB
- Need adaptive policy to route tuples based on
- Energy consumption
- Topology
- Loss rates
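The size-based choice made by a conventional optimizer can be sketched as below. The table sizes and join selectivities are made-up numbers, and the function name is hypothetical. A sensornet policy would have to extend the cost with energy, topology, and loss rates, none of which are modelled here.

```python
from itertools import combinations


def best_join_order(sizes, selectivity):
    """Pick the pair to join first so that the intermediate result is smallest."""

    def intermediate(a, b):
        # Estimated size of a JOIN b = |a| * |b| * selectivity of that join.
        return sizes[a] * sizes[b] * selectivity[frozenset((a, b))]

    first = min(combinations(sorted(sizes), 2), key=lambda pair: intermediate(*pair))
    (last,) = set(sizes) - set(first)
    return first, last


sizes = {"R": 1000, "S": 10, "T": 100}  # made-up table sizes
selectivity = {                          # made-up join selectivities
    frozenset("RS"): 0.01,  # R x S -> 100 tuples
    frozenset("RT"): 0.01,  # R x T -> 1000 tuples
    frozenset("ST"): 0.05,  # S x T -> 50 tuples
}

print(best_join_order(sizes, selectivity))  # (('S', 'T'), 'R')
```

With these numbers the plan is Rx(SxT), because SxT yields the smallest intermediate table.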
27Conclusions
- Explosion of data from sensor networks needs an infrastructure for access, storage, etc.
- Organizing sensors
- Datacubes
- Other techniques?
- Identifying relevant sensors is preliminary to fetching data
- Dataspace provided two solutions
- Other approaches?
28Conclusions
- Sensornets as Distributed DB
- Provide a database view to sensornet data
- Pros
- App development easy
- In-network processing helps resource usage
- Cons
- Distributed DB can be difficult
- Requires retooling DB operations for sensornets
- Other approaches?
29Representations for Devices Functions
- Internal Representation
- We can't use traditional OO DB methods
- they all demand immediate access
- given the asynchronous nature of sensnets this is unacceptable
30Overview
- Direction of sensor networks progress
- Small form-factor devices
- On-board computation
- Wireless communication
- Increased sensing capabilities
- Improved OS and networking functionalities
- Prediction
- Every device (> 1) will have some sensor
- Ubiquitous presence of sensor networks
31Overview
- Typical sensor networks usage
- Sense, collect and convey data
- Provides a ubiquitous computing platform
- Applications query/monitor sensed data
- Ecosystem dynamics
- Temperature/weather sensing
- Automobile traffic analysis
- Data-centric network, generated data more
important than node identity
32Requirements
- Addressing
- Identify relevant sensors
- How to access/process data?
- Communicate data and process centrally
- Compute query at node and perform DB operations
- Interface for querying/monitoring and control
33What to do with data?
- Answer queries/give useful info
- How ??
- Centralized approach
- Communicate data
- Store and process all data at central location (traditional DB approach)
- Is all temporal data to be stored?
- Communication overhead?
34What to do with data?
- De-centralized approach
- Communicate query (query routing)
- Required data attribute of node
- Node stores and communicates data to queries
- Processing at node
- Computation overhead
- Computation overhead smaller than communication!
- How to aggregate data?
- How to route queries?
- How to map nodes to addresses for communication
purposes?
35Need for Decentralization
- Centralized (Traditional databases)
- Inefficient use of resources
- Large amounts of data communicated to central location
- All sensors send data all the time
- Dissociates access to device from query load
- Communication more expensive than computation
- Decentralized (Distributed DBs)
- Data on devices
- In-network query processing
36Pipelining Benefits
- Provide streamed partial answers, hence can enable query refinement
- Schemes like ripple joins form a low energy approach to obtain approximate answers and can be used together with sampling