Title: Wireless Sensor Networks and Query Processing
1Wireless Sensor Networks and Query Processing
- WSN: Wireless Sensor Networks
- Routing Problems
- Routing Algorithms
- Real-time Query Processing
- Sensor Selection and Data Aggregation
2A typical Wireless Sensor Network
[Figure: sensor nodes (SN) and gateways (GW); labeled links: Bluetooth, WLAN, GPRS, Ethernet]
- Integration of Sensor Nodes (SN) and Gateways (GW)
3MANET: Mobile Ad-hoc Networking
4Why Wireless Sensor Networks?
- Ease of deployment
- Speed of deployment
- Decreased dependence on infrastructure
- Self-adaptive and self-organizing
- Sensors are cheap devices and can be deployed in
large numbers
- Sensors can work in harsh environmental conditions,
e.g., deserts
- Sensors can work continuously for monitoring and
surveillance purposes
- Connected to the rest of the system through a
gateway
- From the gateway, various functions and queries
may be submitted into the system to access the
sensor data
5Today's Wireless Sensor Networks (WSN)
- First generation of WSNs is available
- Diverse sensor nodes, several gateways
- Even with special sensors: cameras, body
temperature
- Basic software
- Routing, energy conservation, management
- Several prototypes for different applications
- Environmental monitoring, industrial automation,
wildlife monitoring
- Many see new possibilities for monitoring,
surveillance, protection
- Sensor networks as a cheap and flexible new
means for surveillance (e.g., security)
- Monitoring and protection of goods
- Chemicals, food, vehicles (car parks), machines,
containers, ...
- Large application area besides military
- Law enforcement, disaster recovery, industry,
private homes, ...
6Mobile ad-hoc networks (MANET)
- Network without infrastructure
- Use components of participants for networking
- Examples
- Single-hop: All partners max. one hop apart
- Bluetooth piconet, PDAs in a room, gaming
devices
- Multi-hop: Cover larger distances, circumvent
obstacles
- Bluetooth scatternet, police network,
car-to-car networks
- MANET (Mobile Ad-hoc Networking) group
- Dynamic network topology
- Mobile nodes
7Many Variations
- Fully Symmetric Environment
- All nodes have identical capabilities and
responsibilities
- Asymmetric Capabilities
- Transmission ranges and radios may differ
- Battery life at different nodes may differ
- Processing capacity may be different at different
nodes
- Speed of movement (fixed and mobile)
- Asymmetric Responsibilities
- Only some nodes may route packets
- Some nodes may act as leaders of nearby nodes
(e.g., cluster head)
8Many Variations
- Traffic characteristics may differ in different
mobile ad hoc networks
- Bit rate
- Timeliness constraints
- Reliability requirements
- Unicast / multicast / geocast
- May co-exist (and co-operate) with an
infrastructure-based network
9Many Variations
- Mobility patterns may be different
- People sitting at an airport lounge
- New York taxi cabs
- Kids playing
- Military movements
- Mobility characteristics
- Speed
- Predictability
- Direction of movement
- Pattern of movement
- Uniformity (or lack thereof) of mobility
characteristics among different nodes
10Wireless Sensor Networks Challenges
- Long-lived, autonomous networks
- Use environmental energy sources
- Embed and forget
- Self-healing
- Self-configuring networks
- Routing
- Data aggregation
- Localization
- Managing wireless sensor networks
- Tools for access and programming
- Update distribution
- Scalability, Quality of Service
11Routing Problem
- Routing: finding a route to send data from the
source to the destination
- Highly dynamic network topology
- Device mobility plus varying channel quality
- Separation and merging of networks possible
- Asymmetric connections possible
[Figure: Changing topology. Nodes N1 to N7 shown at time t1 and at time t2, with good links and weak links marked]
12Mobile Ad Hoc Networks
- May need to traverse multiple links to reach a
destination
13Mobile Ad Hoc Networks
- Mobility causes route changes
14Routing Problems
- Asymmetric links
- A path from node A to node B does not imply that
node B can use the same path to send packets to
node A
- Redundant links
- Multiple paths from A to B: which one is the best
(e.g., minimizing the hop count) and should be
chosen?
- Interference
- Collision: neighboring nodes send packets at the
same time
- Collision -> retransmission (MAC)
- Dynamic topology
- Changing link quality due to movement
- Need to find a new path every short period of
time, because the old one no longer works
- Update of path information in the intermediate
nodes
- No node has complete information on the
status of all the nodes in the system
- Transmission delay is changing
- Difficult for load balancing and traffic
control
15Routing Problems
- Routing Problem
- To find a route to connect the source node (S) to
the destination node (D) through a sequence of
relay nodes
- The route may be just for a one-time connection or
for a period of time (continuous monitoring)
- Issues in routing algorithms
- Minimize message overhead (no. of messages)
- On-demand algorithms
- Minimize the searching delay
- Table-driven algorithms
- Route maintenance
- Minimize energy consumption rate
- Power-aware routing algorithms (choosing
high-energy nodes as relay nodes)
- Switching some of the mobile hosts to doze mode
to conserve energy
16Routing Methods
- Two types of routing algorithms
- On-demand protocols (reactive)
- A route is searched for upon receipt of a connection
request
- Table-driven protocols (proactive)
- The topology of the whole network is maintained
- When a connection is needed, the source node can
select the route from its memory directly
17Routing Methods
- Latency of route discovery
- Proactive protocols may have lower latency since
routes are maintained at all times
- Reactive protocols may have higher latency
because a route from X to Y will be found only
when X attempts to send to Y
- Overhead of route discovery/maintenance
- Reactive protocols may have lower overhead since
routes are determined only if needed
- Proactive protocols can (but not necessarily)
result in higher overhead due to continuous route
updating
- Which approach achieves a better trade-off
depends on the traffic and mobility patterns
18Routing Algorithms for Ad Hoc Networks
- Flooding
- Dynamic Source Routing (DSR)
- Location-Aided Routing (LAR)
- Power-Aware Routing (PAR)
- Least Interference Routing (LIR)
19Flooding for Data Delivery
- Sender S broadcasts data packet P to all its
neighbors
- Each node receiving P forwards P to its neighbors
- Sequence numbers used to avoid the possibility of
forwarding the same packet more than once
- Packet P reaches destination D provided that D is
reachable from sender S
- Node D does not forward the packet
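The flooding rules above can be sketched in a few lines of Python. This is a minimal illustration, not a protocol implementation: it assumes an ideal channel with no collisions, handles a single packet (so the `received` set stands in for the sequence-number check), and uses a hypothetical topology that is not the one drawn in the slides.

```python
from collections import deque

def flood(adjacency, source, destination):
    """Simulate flooding of one packet; return the set of nodes that received it."""
    received = {source}          # duplicate filter (role of the sequence number)
    queue = deque([source])      # nodes that still have to rebroadcast
    while queue:
        node = queue.popleft()
        if node == destination:
            continue             # the destination does not forward the packet
        for neighbor in adjacency.get(node, ()):
            if neighbor not in received:
                received.add(neighbor)
                queue.append(neighbor)
    return received

# Hypothetical topology: N is reachable only through D, Z is isolated.
links = {
    "S": ["A", "B"],
    "A": ["S", "D"],
    "B": ["S", "D"],
    "D": ["A", "B", "N"],
    "N": ["D"],
    "Z": [],
}
reached = flood(links, "S", "D")
```

In this sketch `reached` contains S, A, B, and D. Node N is missed because its only path from S runs through the destination, and Z is missed because it is unreachable.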
20Flooding for Data Delivery
[Figure: example network with nodes A to N, S, Y, and Z. Legend: connected nodes are within each other's transmission range; marked nodes have received packet P]
21Flooding for Data Delivery
[Figure: S broadcasts packet P in the example network. Legend: transmission of packet P (broadcast transmission); node that receives packet P for the first time]
22Flooding for Data Delivery
[Figure: the example network at the next step of the flood]
- Node H receives packet P from two neighbors:
potential for collision
23Flooding for Data Delivery
[Figure: the example network at the next step of the flood]
- Node C receives packet P from G and H, but does
not forward it again, because node C has already
forwarded packet P once
24Flooding for Data Delivery
[Figure: the example network at the next step of the flood]
- Nodes J and K both broadcast packet P to node D
- Since nodes J and K are hidden from each other,
their transmissions may collide
- Packet P may therefore not be delivered to node
D at all, despite the use of flooding
25Flooding for Data Delivery
[Figure: the example network at the next step of the flood]
- Node D does not forward packet P, because node D
is the intended destination of packet P
26Flooding for Data Delivery
[Figure: the example network after flooding has completed]
- Flooding completed
- Nodes unreachable from S do not receive packet P
(e.g., node Z)
- Nodes for which all paths from S go through the
destination D also do not receive packet P
(example: node N)
27Flooding for Data Delivery
[Figure: the example network after flooding has completed]
- Flooding may deliver packets to too many nodes
(in the worst case, all nodes reachable from the
sender may receive the packet)
28Flooding for Data Delivery: Advantages
- Simplicity
- May be more efficient than other protocols when
the rate of information transmission is low
enough that the overhead of explicit route
discovery/maintenance incurred by other protocols
is relatively higher
- This scenario may occur, for instance, when nodes
transmit small data packets relatively
infrequently, and many topology changes occur
between consecutive packet transmissions
- Potentially higher reliability of data delivery
- Because packets may be delivered to the
destination on multiple paths
29Flooding for Data Delivery: Disadvantages
- Potentially, very high overhead
- Data packets may be delivered to too many nodes
that do not need to receive them
- Potentially lower reliability of data delivery
- Flooding uses broadcasting: it is hard to implement
reliable broadcast delivery without significantly
increasing overhead
- In our example, nodes J and K may transmit to
node D simultaneously, resulting in loss of the
packet
- In this case, the destination would not receive the
packet at all
30Flooding of Control Packets
- Many protocols perform (potentially limited)
flooding of control packets, instead of data
packets
- The control packets are used to discover routes
- Discovered routes are subsequently used to send
data packet(s)
- Overhead of control packet flooding is amortized
over data packets transmitted between consecutive
control packet floods
31Dynamic Source Routing (DSR)
- DSR consists of two steps
- Route discovery: a node tries to discover a route
to a destination when it has something to send to
that destination
- Route maintenance: if a node detects that the
current route has changed, it needs to find a new
route
- In route discovery, if node S wants to send a
packet to node D, but does not know a route to D,
node S initiates a route discovery (small size
message)
- Source node S floods Route Request (RREQ)
- Each node appends its own identifier when
forwarding RREQ
- If a node has already received the request, it
will drop the request
32Route Discovery in DSR
[Figure: the example network. Legend: marked nodes have received RREQ for D from S]
33Route Discovery in DSR
[Figure: S broadcasts the RREQ, which carries the identifier list [S]. Legend: [X,Y] represents the list of identifiers appended to RREQ; transmission of RREQ (broadcast transmission)]
34Route Discovery in DSR
[Figure: forwarded RREQ copies carry the identifier lists [S,E] and [S,C]]
- Node H receives packet RREQ from two neighbors:
potential for collision
35Route Discovery in DSR
[Figure: forwarded RREQ copies carry the identifier lists [S,E,F] and [S,C,G]]
- Node C receives RREQ from G and H, but does not
forward it again, because node C has already
forwarded RREQ once
36Route Discovery in DSR
[Figure: forwarded RREQ copies carry the identifier lists [S,E,F,J] and [S,C,G,K]]
- Nodes J and K both broadcast RREQ to node D
- Since nodes J and K are hidden from each other,
their transmissions may collide
37Route Discovery in DSR
[Figure: a forwarded RREQ copy carries the identifier list [S,E,F,J,M]]
- Node D does not forward RREQ, because node D
is the intended target of the route discovery
38Route Discovery in DSR
- Destination D, on receiving the first RREQ, sends
a Route Reply (RREP)
- RREP is sent on a route obtained by reversing the
route appended to the received RREQ
- RREP includes the route from S to D on which RREQ
was received by node D
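Route discovery and the reversed-route reply can be sketched as follows. This is a simplified illustration, not the DSR specification: it assumes every hop takes the same time (so a breadth-first search decides which RREQ copy arrives first), bidirectional links, and no collisions. The topology is a reduced one built only from the identifier lists shown in the figures; the function names are hypothetical.

```python
from collections import deque

def discover_route(adjacency, source, target):
    """Flood an RREQ from source; return the route recorded by the first copy reaching target."""
    seen = {source}                       # nodes that already handled this RREQ
    queue = deque([[source]])             # each entry is the identifier list of one RREQ copy
    while queue:
        route = queue.popleft()
        node = route[-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return route + [target]   # first RREQ to arrive; target answers with an RREP
            if neighbor not in seen:      # a node drops requests it has already received
                seen.add(neighbor)
                queue.append(route + [neighbor])
    return None

def rrep_path(route):
    """Path taken by the RREP: the discovered route reversed (bidirectional links assumed)."""
    return list(reversed(route))

# Reduced topology taken from the route labels [S,E,F,J] and [S,C,G,K].
links = {
    "S": ["E", "C"], "E": ["S", "F"], "F": ["E", "J"], "J": ["F", "D"],
    "C": ["S", "G"], "G": ["C", "K"], "K": ["G", "D"], "D": ["J", "K"],
}
route = discover_route(links, "S", "D")
reply = rrep_path(route)
```

Here `route` is the list S, E, F, J, D and `reply` is the same list reversed. Which of the two equal-length routes wins depends only on the neighbor order in `links`, which is an artifact of the sketch.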
39Route Reply in DSR
[Figure: RREP [S,E,F,J,D] travels from D back to S. Legend: RREP control message]
40Route Reply in DSR
- Route Reply can be sent by reversing the route in
Route Request (RREQ) only if links are guaranteed
to be bi-directional
- To ensure this, RREQ should be forwarded only if
it is received on a link that is known to be
bi-directional
- If unidirectional (asymmetric) links are allowed,
then RREP may need a route discovery for S from
node D
- Unless node D already knows a route to node S
- If a route discovery is initiated by D for a
route to S, then the Route Reply is piggybacked
on the Route Request from D
41Dynamic Source Routing (DSR)
- Node S, on receiving RREP, caches the route
included in the RREP
- When node S sends a data packet to D, the entire
route is included in the packet header
- Hence the name source routing
- Intermediate nodes use the source route included
in a packet to determine to whom a packet should
be forwarded
42Data Delivery in DSR
[Figure: DATA [S,E,F,J,D] travels from S to D along the source route]
Packet header size grows with route length
43Dynamic Source Routing: Advantages
- Routes maintained only between nodes that need to
communicate
- Reduces overhead of route maintenance
- Route caching can further reduce route discovery
overhead
- A single route discovery may yield many routes to
the destination, due to intermediate nodes
replying from local caches
44Dynamic Source Routing: Disadvantages
- Packet header size grows with route length due to
source routing
- Flood of route requests may potentially reach all
nodes in the network
- Care must be taken to avoid collisions between
route requests propagated by neighboring nodes
- Insertion of random delays before forwarding RREQ
- Increased contention if too many route replies
come back due to nodes replying using their local
cache
- Route Reply Storm problem
- Reply storm may be eased by preventing a node
from sending RREP if it hears another RREP with a
shorter route
45Enhancement to routing
- There may be multiple routes from the source node
to the destination node. How to choose the route?
- Interference
- The number of neighboring nodes
- If the number of neighboring nodes is larger, the
probability of conflicting transmissions is
higher, which means more retransmissions and
more wasted bandwidth
- Energy level of the intermediate nodes
- Eliminate those nodes with an energy level below a
threshold value
- Location area
- Estimate the possible region of the destination
node
- Broadcast the packets to the estimated region
- e.g., LAR
46Location-Aided Routing (LAR)
- Exploits location information to limit scope of
route request flood
- Location information may be obtained using GPS
- Expected Zone is determined as a region that is
expected to hold the current location of the
destination
- Expected region determined based on potentially
old location information, and knowledge of the
destination's speed
- Route requests limited to a Request Zone that
contains the Expected Zone and location of the
sender node
47Expected Zone in LAR
X = last known location of node D, at time t0
Y = location of node D at current time t1, unknown to node S
r = (t1 - t0) * estimate of D's speed
[Figure: Expected Zone, a circle of radius r around X that contains Y]
48Request Zone in LAR
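[Figure: Network Space with the Request Zone, which contains sender S and the Expected Zone (radius r around X, with Y inside); node B lies inside the Request Zone, node A outside it]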
49LAR
- Only nodes within the request zone forward route
requests
- Node A does not forward RREQ, but node B does
(see previous slide)
- Request zone explicitly specified in the route
request
- Each node must know its physical location to
determine whether it is within the request zone
- If route discovery using the smaller request zone
fails to find a route, the sender initiates
another route discovery (after a timeout) using a
larger request zone
- The larger request zone may be the entire network
- Rest of route discovery protocol similar to DSR
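The zone test that each node performs can be sketched as below. The slides do not fix the shape of the request zone; this sketch assumes the smallest axis-aligned rectangle that holds the sender and the circular expected zone, and it assumes 2-D coordinates. All names (`Rect`, `expected_radius`, `request_zone`) and the sample coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x, y):
        """True if point (x, y) lies inside the rectangle (borders included)."""
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

def expected_radius(t0, t1, speed):
    """r = (t1 - t0) * estimated speed of the destination."""
    return (t1 - t0) * speed

def request_zone(sender, last_known, radius):
    """Smallest axis-aligned rectangle holding the sender and the expected zone (assumed shape)."""
    sx, sy = sender
    dx, dy = last_known
    return Rect(min(sx, dx - radius), min(sy, dy - radius),
                max(sx, dx + radius), max(sy, dy + radius))

# Sender at the origin, destination last seen at (10, 10) five time units ago.
r = expected_radius(t0=0, t1=5, speed=1)
zone = request_zone(sender=(0, 0), last_known=(10, 10), radius=r)
b_forwards = zone.contains(4, 6)     # a node inside the zone forwards the RREQ
a_forwards = zone.contains(-3, 8)    # a node outside the zone drops it
```

If discovery inside `zone` fails, a retry with a larger rectangle (up to the whole network) follows the same membership test.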
50Energy-aware routing
- Only sensors with sufficient energy forward data
for other nodes
- Example: routing via nodes with enough solar
power is considered free
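One possible reading of energy-aware route selection is sketched below. It assumes a set of candidate routes is already known, that each relay costs 1 unless it is solar-powered (cost 0), and that relays below an energy threshold are excluded. The cost model, the threshold, and all names are assumptions made for illustration.

```python
def pick_route(routes, energy, solar, threshold):
    """Return the cheapest candidate route whose relays all have enough energy."""
    def relays(route):
        return route[1:-1]                      # nodes between source and destination

    def cost(route):
        # Solar-powered relays forward for free; every other relay costs 1.
        return sum(0 if node in solar else 1 for node in relays(route))

    usable = [route for route in routes
              if all(energy[node] >= threshold for node in relays(route))]
    return min(usable, key=cost) if usable else None

# Hypothetical candidates from S to D and remaining energy per relay (0.0 to 1.0).
candidates = [["S", "A", "D"], ["S", "B", "C", "D"], ["S", "E", "D"]]
energy = {"A": 0.1, "B": 0.8, "C": 0.9, "E": 0.6}
chosen = pick_route(candidates, energy, solar={"B", "C"}, threshold=0.2)
```

The route through A is rejected because A is below the threshold; the longer route through the solar-powered nodes B and C then wins over the one-relay route through E.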
51System Monitoring and Surveillance
- Wireless sensor systems
- Need to monitor the occurrences of (simple)
events in the system environment
- E.g., when the temperature is higher than 50°C
- E.g., the max and min pressure in a day
- Complex events
- The occurrences of multiple simple events at the
same time
- The maximum temperatures of two rooms when the
pressure is higher than 1000 mmHg
- The light intensity at the arrival time of a bird
52System Monitoring
- Continuous monitoring queries (CMQs)
- Submitted to monitor the events occurring in the
system environment for a period of time (begin
time and end time)
- A condition is defined. Once the condition is
satisfied, an alert is sent to the user
- Based on the attributes defined in the condition,
a set of data items are identified as input to
the query
- Access to a set of data items (pre-defined)
- The data items are generated by sensor nodes
distributed in the system environment
- Sensor nodes
- Each sensor node may have multiple sensors
installed to capture different signals from the
system environment
- Fixed sampling frequency
- Communicate through a low-bandwidth wireless
network
53In-Network Processing
- Processing of queries (two approaches)
- (1) Send sensor data to a centralized server for
processing
- (2) Process the queries at the sensor nodes
- In-network processing
- A query is divided into a set of sub-queries
- Each sub-query is processed at the sensor node
(called a participating node) that is responsible
for generating its required data items
- A coordinator node (one of the participating
sensor nodes) is responsible for aggregating the
results from the sensor nodes
- Example: get the average temperature of sensors A
to D for the next 10 min if they are higher
than 100F
- No need to transmit a large volume of data to a
centralized server for processing
- Issues: routing and aggregation
54System Architecture
- MSPU: Mobile sensor processing units
- Base Station: connecting with MSPU through a
wireless link
- Back-end server: maintains a database, and
provides an interface for submitting CMQ and
displaying query results including performance
statistics
55Continuous Monitoring Query
- CMQi consists of a set of sub-queries, SCMQi,1,
SCMQi,2, ..., SCMQi,n, defined according to the
distribution of the required nodes of the query
- One of the nodes is the coordinator node and the
others are participating nodes
- Each sub-query contains a selection condition to
process on the sensor data from its node
- A CMQ contains an aggregation condition for
execution, e.g., to have the results from all the
sub-queries
- Calculating the maximum value requires at least
two inputs
56Execution of CMQs
- Step 1
- Evaluation of the sensor data items generated by
a sensor node using the selection condition
defined in the sub-query
- Step 2
- Sending sub-query evaluation results to the
coordinator node, which evaluates whether the
aggregation conditions have been satisfied
- Report the query result to the client as a
function of time during the activation period of
the query
57Execution of CMQs
58An Example of a CMQ
- Get the maximum temperature of Sensors A and B,
if they are higher than 100F, from now until 15
min later
- CMQ1 (SCMQ1,1, SCMQ1,2, Operation1, 12:00,
12:15)
- SCMQ1,1: If temperature T1 of sensor data from
MSPU1 > 100F, return the temperature
- SCMQ1,2: If temperature T2 of sensor data from
MSPU2 > 100F, return the temperature
- Aggregate condition1: The outputs from both
SCMQ1,1 and SCMQ1,2 are data values
- Aggregate operation1: IF T1 > T2, return T1 ELSE
return T2
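The example query can be written out as a small Python sketch. It assumes both sensors sample at the same instants and that communication is instantaneous; the function names and the sample readings are hypothetical.

```python
def sub_query(temperature_f, threshold_f=100.0):
    """Sub-query: return the reading if it exceeds the threshold, else None."""
    return temperature_f if temperature_f > threshold_f else None

def aggregate_max(t1, t2):
    """Aggregate condition: both sub-queries returned a value. Operation: the larger one."""
    if t1 is None or t2 is None:
        return None
    return t1 if t1 > t2 else t2

def run_cmq(samples):
    """Evaluate the query on (time, T1, T2) samples; return (time, max) for each hit."""
    results = []
    for time, t1, t2 in samples:
        value = aggregate_max(sub_query(t1), sub_query(t2))
        if value is not None:
            results.append((time, value))
    return results

# Hypothetical readings from MSPU1 and MSPU2 during the activation period.
samples = [("12:00", 101.0, 99.0), ("12:05", 102.5, 104.0), ("12:10", 100.0, 120.0)]
hits = run_cmq(samples)
```

Only the 12:05 sample produces a result (104.0), because it is the only instant at which both readings are strictly above 100F.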
59Temporal Consistency
- The sensor nodes follow their pre-set frequency
(period) to generate sensor data values
- A sensor data value becomes invalid once a new
version is generated
- Data version x is valid if (creation time of x +
generation period) > current time
- The main purpose of a CMQ is to monitor the system
environment
- Not to miss the occurrences of any such events
- Requires continuous evaluation of sensor data
- Ensure that all results generated from the CMQ
are correct (consistent with the real situation
in the monitoring environment)
- Requires each evaluation to use temporally
consistent data, i.e., data that are valid at the
same time point
60Temporal Inconsistency Problem
- If MSPU1 is assigned to be the coordinator node,
MSPU2 will forward its sub-query results to MSPU1
- Due to communication delay, the set of sub-query
results from MSPU2 received by MSPU1 will be
shifted by the transmission delay
- The generated query results may become incorrect
- Incorrect light intensity at the arrival time of
a bird
- Incorrect maximum temperature of the two rooms
61Temporal Inconsistency Problem
62Temporal Inconsistency Problem
- Time-stamping technique
- Using time-stamps to label the validity of a data
version
- From lower valid time (LVT) to upper valid time
(UVT)
- Relative consistency
- The intersection of the validity intervals of all
the accessed data items is non-empty
- The data versions are not too old (currency
requirement)
- Buffering
- The coordinator node buffers the received
sub-query results
- Evaluation follows the relative consistency
requirement
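The validity interval and the relative consistency test can be sketched as follows. The sketch assumes the interval is half-open, [LVT, UVT), with LVT equal to the creation time and UVT equal to the creation time plus the generation period; the names `Version`, `make_version`, `is_valid`, and `relatively_consistent` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    value: float
    lvt: float   # lower valid time (creation time)
    uvt: float   # upper valid time (creation time + generation period)

def make_version(value, created, period):
    """Time-stamp a new data version with its validity interval."""
    return Version(value, created, created + period)

def is_valid(version, now):
    """Valid if creation time + generation period > current time."""
    return version.uvt > now

def relatively_consistent(versions):
    """True if the validity intervals [lvt, uvt) share at least one time point."""
    return max(v.lvt for v in versions) < min(v.uvt for v in versions)

a = make_version(101.0, created=0, period=10)    # valid over [0, 10)
b = make_version(103.0, created=8, period=10)    # valid over [8, 18)
c = make_version(99.0, created=12, period=10)    # valid over [12, 22)
```

With these values, `a` and `b` overlap on [8, 10) and may be evaluated together, while `a` and `c` never coexist, so a coordinator that buffers results would have to pair `c` with a later version from the other node.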
63Aggregation Problem
- How to aggregate the sub-query results from the
participating nodes?
- Objectives
- To minimize the aggregation cost (data
communication cost)
- Fault tolerance to message loss
- Minimize the processing cost at the coordinator
node
- Centralized aggregation
- Select a coordinator node which is close to all
the participating nodes
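One way to read "close to all the participating nodes" is to pick the node with the smallest total hop count to the others. The sketch below makes that assumption; the hop-count table and the function name are hypothetical.

```python
def pick_coordinator(hops):
    """hops[a][b] = hop count from node a to node b; return the most central node."""
    return min(hops, key=lambda node: sum(hops[node].values()))

# Hypothetical hop counts between three participating nodes.
hops = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "C": 1},
    "C": {"A": 2, "B": 1},
}
coordinator = pick_coordinator(hops)
```

Node B is chosen here because its total distance to the others (2 hops) is the lowest. Other criteria, such as remaining energy, could be folded into the key function.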
64Periodic Pushing
- The latest generated sub-query results form a
message and are forwarded to the coordinator node
every fixed submission period
- Each message contains several sensor data
versions of a data item to minimize the message
overhead
- Results are time-stamped to indicate their
validity intervals
- Evaluation at the coordinator node follows the
time-stamps by searching the received data in the
buffer
- Message loss can easily be detected
65Periodic Pushing
66Conditional Pushing
- Aims to reduce the sizes of data versions for
aggregation
- The scheme is the same as periodic pushing except
that the data values are compressed
- Successive data versions with the same value are
compressed
- Although the redundancy in data values within a
message is eliminated, the redundancy in
successive messages cannot be eliminated
- The amount of bandwidth saved depends on how the
data values from the sensor node change
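The compression of successive identical values within one message can be sketched as a run-length merge. The slides do not give the encoding, so this sketch assumes each version is a (value, LVT, UVT) tuple and that a run of equal values is replaced by one tuple with an extended validity interval.

```python
def compress(versions):
    """Merge successive (value, lvt, uvt) versions that carry the same value."""
    merged = []
    for value, lvt, uvt in versions:
        # Merge only if the value repeats and the intervals are back to back.
        if merged and merged[-1][0] == value and merged[-1][2] == lvt:
            prev_value, prev_lvt, _ = merged[-1]
            merged[-1] = (prev_value, prev_lvt, uvt)   # extend the validity interval
        else:
            merged.append((value, lvt, uvt))
    return merged

# One message with four versions; the first two carry the same value.
message = [(101, 0, 10), (101, 10, 20), (103, 20, 30), (101, 30, 40)]
packed = compress(message)
```

The four versions shrink to three. The last 101 is kept separately because it is not adjacent to the first run, and a repeat that spans two messages would not be merged at all, which matches the remark about redundancy across successive messages.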
67Conditional Pushing
68Sequential Pushing
- The sensor nodes of a CMQ are assumed to be close
to each other, so they can communicate directly
with each other
- The submission of sub-query results is triggered
by a triggering node, which is one of the
participating nodes of the CMQ
- The triggering node is the participating node
with the fewest satisfied results in
evaluation
- The pushing of sub-query results follows a
sequential order according to the evaluation
results
- Part of the operation that would otherwise be
performed at the coordinator node is performed
along the way
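A rough sketch of one round of sequential pushing for a maximum query is given below. It is an interpretation, not the scheme as published: it assumes the last node in the sequence acts as coordinator, that each node folds its own value into the partial maximum before forwarding, and that a node whose sub-query is not satisfied stops the round. All names are hypothetical.

```python
def sequential_push(sequence, results):
    """Pass a partial maximum along the node sequence; stop at the first unsatisfied sub-query.

    Returns (aggregate or None, number of messages sent between nodes).
    """
    partial = None
    messages = 0
    for position, node in enumerate(sequence):
        value = results[node]
        if value is None:
            return None, messages        # a false result ends the round early
        partial = value if partial is None else max(partial, value)
        if position < len(sequence) - 1:
            messages += 1                # forward the partial result to the next node
    return partial, messages

order = ["A", "B", "C"]
full_round = sequential_push(order, {"A": 101, "B": 105, "C": 103})
early_stop = sequential_push(order, {"A": None, "B": 105, "C": 103})
```

When the first node's sub-query is false, no message is sent at all, which is why placing the node least likely to be satisfied at the front of the sequence tends to reduce traffic.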
69Sequential Pushing
70Sequential Pushing
- Due to the dynamic properties of sensor data, the
probability of satisfying the condition in a
sub-query at a node may change with time
- Reordering of the nodes
- Assign the false node to be the first node in
the sequence
- All the nodes following the false node
remain in the same relative order to each other
- All the nodes in front of the false node remain
in their original relative order. They rejoin the
node sequence after the last node of the original
sequence
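Taken together, the three reordering rules amount to rotating the sequence so that the false node comes first. A minimal sketch, assuming the false node is the node whose sub-query evaluated to false and that the sequence is a plain Python list:

```python
def reorder(sequence, false_node):
    """Rotate the node sequence so that the false node becomes the first node."""
    k = sequence.index(false_node)
    # Nodes from the false node onward keep their order; earlier nodes go to the end.
    return sequence[k:] + sequence[:k]

new_order = reorder(["n1", "n2", "n3", "n4", "n5"], "n3")
```

In this example `new_order` is n3, n4, n5, n1, n2: the nodes after n3 keep their order, and n1 and n2 rejoin after n5, the last node of the original sequence.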
71Sequential Pushing
72SeqPush vs. Centralized Scheme
- The total number of messages is normally smaller,
especially when the probability of satisfying the
aggregation conditions in all the sub-queries is
not high
- The processing workload at the coordinator node
is lower, as the participating nodes are
responsible for partial computation of the
aggregation operation
- The processing cost in searching for relatively
consistent sensor data values will be lower due
to a false result from a sub-query
73References
- Schiller 8.3
- David B. Johnson and David A. Maltz, "Dynamic
Source Routing in Ad Hoc Wireless Networks"
(DSR), in Mobile Computing, 1996.
- Young-Bae Ko and Nitin H. Vaidya, "Location-Aided
Routing (LAR) in Mobile Ad Hoc Networks", in
Proceedings of the 1998 ACM International Conference
on Mobile Computing and Networking.
- Y. Yao and J. E. Gehrke, "Query Processing in
Sensor Networks", in Proceedings of the First
Biennial Conference on Innovative Data Systems
Research (CIDR 2003), Asilomar, California,
January 2003.