Data Dissemination

About This Presentation

Slides: 58

Provided by: CIT788

Transcript and Presenter's Notes

1
Data Dissemination

Data Dissemination Problems
Basic Dissemination Methods: Push and Pull
Broadcast Disk (for read-only transactions)
Basic Schemes for Data Broadcast
The Hybrid (Push/Pull) Approach
Temporal Consistency and Currency
Broadcasting Consistent Data
Multi-version Data Broadcast
Update First with Order (UFO)

2
Proactive Services
[Figure, from Dollimore: 1. A user enters a room
wearing an active badge. 2. An infrared sensor
detects the user's ID. 3. The display responds to
the user: "Hello Roy".]
Firstly, when a new object enters a smart space,
the list of services provided in the space has
to be downloaded to the object. How should this
information be delivered to the new object?
Secondly, the object may generate requests that
must be served by other objects within the space.
How should these requests be processed? Thirdly,
the object may have submitted a query or
continuous query (CQ) to monitor the status of
the space. How can the execution of the query/CQ
be supported?
3
Distributed Computing: Query Processing Strategies

Query shipping
The server (the service provider) maintains the
latest versions of data objects
Queries from clients are sent to a server for
processing
Query results are returned to the clients
Data shipping
Clients send data requests to the server (the
service provider)
The server returns the requested data objects to
the clients
The queries are processed by the clients

4
Query Shipping Vs. Data Shipping
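[Figure: Query shipping: 1. the client sends
requests to the server over the uplink channel;
2. the server processes the query; 3. the results
are returned over the downlink channel. Data
shipping: 1. the client sends data requests;
2. the server returns the data; 3. the client
processes the query.]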
What are the trade-offs in transmission overhead,
processing cost, scalability and processing
delay? They depend on application and system
characteristics, e.g., data size and number of
queries.
5
Query Shipping Vs. Data Shipping

Which one is more suitable to pervasive
computing?
A large number of moving objects submit
(location-dependent) queries to access
different types of real-time information
Monitoring of real-time sensor data using the
in-network processing approach
Lack of powerful nodes at the device (network)
level
Mobile networks
Low bandwidth
Asymmetric bandwidth: uplink bandwidth <<
downlink bandwidth
Transaction types
Mostly read-only transactions (queries)
Sensor readings of the environment
Continuous queries with a begin and end time for
event detection (e.g., navigation, tracing)
Location-dependent queries at different
locations request to read different sets of data
objects (spatial and temporal properties)

6
Data Dissemination Problems

Data shipping: how can the required data items be
provided to a large number of queries (from
moving objects) for execution?
Note: since these are read-only operations, there
is no need to update any data items at the server.
Reading (detecting events) -> responses

Broadcast data to clients through a mobile network
7
Performance Objectives in Data Dissemination

Workload. To minimize the total data transmission
workload
Waiting delay. Some queries may have a deadline
on their completion time. Meeting the deadline is
important (to respond to critical events
occurring in the system environment)
Tune-in time (to conserve energy)
The clients may not know whether their required
data will be shipped
A client may sleep and then wake up to tune in
to the broadcast channel to get its required data
item (avoiding continuous monitoring of the
broadcast channel)
Currency. Since most of the queries ask for
real-time information about the environment
(e.g., temperature, traffic conditions), the data
items provided to a query have to be the latest
versions (not outdated)
Consistency. To ensure consistency (correctness)
of the data items provided to a query (temporal
consistency: the data items should report the
status of the environment at the same time point).
Otherwise, incorrect results may be generated

8
Data Dissemination Methods: Pull Vs. Push

Using data shipping for processing of mobile
queries
Scalability (the server neither processes the
queries nor needs to serve each data request one
by one). The arrival rate of data requests could
be very high due to the large number of mobile
clients
Suitable for monitoring and surveillance
applications (continuous queries). Why?
How can data shipping be applied to in-network
processing?
Pure Pull (on demand)
Clients explicitly (periodically for data
monitoring) send data requests to a server
through the uplink channel
The server returns the requested data items to
the clients through the downlink channel
Design problem
What should the pulling period be (based on the
dynamic properties of the data)? Pulling period
vs. transmission workload
Scalability problem (although the server does not
need to process the queries, it needs to serve
the data requests one by one)

9
Data Dissemination Method - Pulling
Point-to-point communication
[Figure: clients 1 to n each send requests to the
server and receive data; the server maintains a
request queue and processes the data requests one
by one.]
10
Data Dissemination Methods: Pull Vs. Push

Pure Push (broadcast)
Data shipping with prediction
To predict what the data requirements of the
queries are
Spatial and temporal properties (queries at
similar time and from the clients at similar
locations have similar data requirements)
The server defines a broadcast schedule, e.g.,
based on the popularity of data items (the hot
items are identified using previous access
statistics)
A broadcast schedule is a sequence of data items
to be broadcast by the server
The server repeatedly broadcasts data according
to a broadcast schedule to a client population
without receiving any data requests
Clients monitor the broadcast channel and
retrieve the data items they need as they
appear in the broadcast channel
Applications of Push (data broadcast)
Listening to radio and watching TV
Information feeds such as stock quotes, sports
tickers, electronic newsletters, traffic and
weather information, cable TV

11
Data Dissemination Method: Pushing
Broadcast data to all the clients (one-to-many
communication)
[Figure: the server selects data for broadcast;
the broadcast server transmits it, and the clients
monitor the broadcast channel for the data items
they need.]
12
Comparison Pull Vs. Push (Pull)

Pull or Push? Which one is more suitable for
pervasive computing applications? It depends on
...
Pull places a higher demand on uplink bandwidth
Clients send pull requests and the server returns
data for each request
Each data transmission can serve only one query
No bandwidth is wasted, although the bandwidth
needed to serve each request is higher
All the transmitted data are needed by clients
In Pull, a query knows (approximately; why?) when
its required data items will arrive. The clients
play an active role: they tell the server what
they want
The workload at the server and the network
depends on the arrival rate of requests and the
number of clients. If there are many data
requests, the waiting time becomes much longer.
Will queries miss their completion deadlines?
Arrival rate: $A$; service rate: $C$
Utilization: $U = A/C$ (must be below 1)
Mean number of requests in the system (waiting or
in service): $U/(1-U)$
Assuming Poisson arrivals and exponential service
times (an M/M/1 queue)
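The queueing formulas can be checked numerically. This is a minimal sketch, assuming the pull server behaves as a single M/M/1 queue with Poisson arrivals at rate A and exponential service at rate C. The function name `mm1_metrics` is hypothetical. The mean response time 1/(C - A) is the standard M/M/1 result and does not appear on the slide.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state M/M/1 metrics for a pull server serving requests one by one.

    arrival_rate is A (requests/s); service_rate is C (requests/s).
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    u = arrival_rate / service_rate                # utilization U = A / C
    in_system = u / (1 - u)                        # mean requests waiting or in service
    response = 1 / (service_rate - arrival_rate)   # mean time in system; equals in_system / A (Little's law)
    return {"utilization": u, "in_system": in_system, "response_time": response}


if __name__ == "__main__":
    # Hypothetical server that can serve 100 requests per second.
    for a in (50, 80, 95, 99):
        m = mm1_metrics(a, 100)
        print(f"A={a}/s  U={m['utilization']:.2f}  "
              f"N={m['in_system']:.1f}  W={m['response_time'] * 1000:.0f} ms")
```

As U approaches 1, both the number of queued requests and the response time grow without bound. This is the scalability problem of pure pull.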

13
Comparison Pull Vs. Push (Push)

Some data items are not wanted by any queries (a
waste of bandwidth)
It is only a prediction
Push is suitable for disseminating data items to
a large number of clients (more scalable) with
similar data requirements (hot data items)
One data push could meet the data requirements of
multiple queries (if the prediction is correct)
E.g., many clients may want to know the latest
traffic condition at the cross-harbor tunnel
E.g., TV broadcast vs. video on demand
In Push, the total broadcast workload is
determined by the server (i.e., the pushing rate:
the number of data items pushed each second)
The server may introduce a delay between two
broadcast schedules to reduce the broadcast
workload
Push is suitable for systems with a small
database and small data items
Push is suitable for systems where the access
probabilities of data items are uneven (hot data
vs. cold data)

14
Design Problems in Using Push

How to define the broadcast schedule?
What is the length of a broadcast schedule?
(Number of data items) (All items in the
database?)
The access to required data items is sequential
(a query consists of several read operations and
the operations are processed one after another)
Clients need to listen to the downlink channel
continuously
How to reduce the listening time to conserve
energy? Doze and wake-up modes of operation

15
Broadcast Index

To reduce the monitoring (tune-in) time, an index
is broadcast at the start of each broadcast
schedule
A broadcast schedule consists of two parts
A header index and a sequence of buckets of data
items (one data item per bucket, assuming
equal-size items)
The broadcast index indicates the broadcast
schedule of the data items in a broadcast cycle
From the index, a client can calculate the
broadcast time of its required data items
(position of the data item in the broadcast
schedule x the time to broadcast one data item)
The client reads the broadcast index and then
sleeps until the required data item is about to
be broadcast
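The index calculation can be sketched as follows. This is a minimal sketch under the slide's assumptions: equal-size buckets and one item per bucket. The class `BroadcastCycle`, its fields, and timing the index itself as index size / bandwidth are illustrative assumptions, not part of the original scheme.

```python
from dataclasses import dataclass


@dataclass
class BroadcastCycle:
    positions: dict          # item id -> bucket position (0-based) after the index
    bucket_bits: int         # every bucket has the same size (slide assumption)
    index_bits: int          # size of the header index (assumed known)
    bandwidth_bps: float     # broadcast (downlink) bandwidth

    def bucket_time(self) -> float:
        # time to broadcast one data item = size / broadcast bandwidth
        return self.bucket_bits / self.bandwidth_bps

    def index_time(self) -> float:
        return self.index_bits / self.bandwidth_bps


def tune_in_plan(cycle: BroadcastCycle, item: str) -> tuple:
    """Return (sleep_s, listen_s) for a client that has just finished reading the index."""
    sleep_s = cycle.positions[item] * cycle.bucket_time()   # position x time per item
    listen_s = cycle.bucket_time()                           # tune in for one bucket only
    return sleep_s, listen_s


if __name__ == "__main__":
    cycle = BroadcastCycle(positions={"A": 0, "B": 1, "C": 2, "D": 3, "E": 4},
                           bucket_bits=1_000_000, index_bits=500_000,
                           bandwidth_bps=1_000_000)
    sleep_s, listen_s = tune_in_plan(cycle, "D")
    active = cycle.index_time() + listen_s
    print(f"sleep {sleep_s:.1f} s, active {active:.1f} s")   # sleep 3.0 s, active 1.5 s
```

A client waiting for D is active for only 1.5 s: 0.5 s reading the index and 1 s receiving D. It does not have to listen through the buckets for A, B and C.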

16
Broadcast Index
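[Figure: a broadcast schedule preceded by an
index; the client tunes in to read the index,
sleeps, and tunes in again when item i is
broadcast. Each bucket takes Size / Broadcast
bandwidth to transmit; the schedule is marked
from 1 M to 5 M.]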
17
Broadcast Schedules

A broadcast schedule is a sequence of data items
(bucket)
When a broadcast schedule is finished, the next
schedule will be defined and then be started
immediately (or after a fixed delay)
The use of different methods to define the
broadcast schedule affects the waiting time for
data items
Two types of (read-only) queries
Each query consists of a set of read operations
Unordered: the operations can be executed in ANY
order, depending on the arrivals of their required
data items
Ordered: the operations have a predefined
execution sequence. E.g., query i consists of two
operations, Read_i(x) and Read_i(y); it may be
required that Read_i(x) completes before
Read_i(y) can start
The response time of a query is the time interval
from its generation time to the time when it
receives all its required data items (ignoring
the processing time of the last item)

18
Broadcast Schedules

The waiting time for a data item depends on
The length of a schedule
The position of the data item in a schedule
To minimize the mean waiting time of queries
Hot data items (popular data items) should be
broadcast with a higher frequency

[Figure: query x, consisting of Read(i), Read(j)
and Read(k), shown twice against the broadcast
schedule.]
19
Broadcast Schedules
[Figure: the server sends the index and the
broadcast schedule through the broadcast server
to clients 1 to n.]
20
Basic Schemes for Data Broadcast

Flat Disk, Skew Disk and Multi-Disk
Flat Disk (if it is difficult to identify the
hot items)
A broadcast schedule consists of all the items in
the database
In each broadcast cycle, all the data items in
the database are broadcast one after another
until the end of the database (cycle). Then the
next cycle starts from the first item
The time to complete one broadcast cycle equals
the time to broadcast all data items in the
database
It is suitable for small databases, e.g., the
broadcast of stock quotes (currently we have
about 1000 stock items)
Not scalable and not suitable for large database
systems and multimedia broadcast
The waiting time of a query for its required data
items depends on the size of the database and the
sizes of the data items
Mean waiting time for a data item is half cycle
length
What will be the waiting time for multiple data
items?

21
Flat Disk Schedule

All the data items in the database (A, B and C)
are broadcast with the same frequency

Could it be?
Unordered: the operations can be performed in any
order, e.g., the calculation of a mean
Mean waiting time for one data item: T/2
T is the time to finish one broadcast cycle
How about for queries with ordered operations?
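The flat-disk waiting times can be checked with a small calculator. This covers the mean of T/2 for one item and the much longer waits when operations are ordered. The sketch is an illustration, not a method from the slides, and it assumes the following:

- Each bucket takes one time unit.
- Arrivals are spread uniformly, one at the middle of each bucket.
- The wait is measured until the last required bucket starts.

The names `next_start`, `unordered_wait`, `ordered_wait` and `mean_wait` are hypothetical.

```python
import math


def next_start(schedule, item, t):
    """Earliest time >= t at which `item` starts being broadcast.

    `schedule` is one broadcast cycle, repeated forever; each bucket lasts 1 time unit.
    Returns math.inf if the item is never broadcast.
    """
    n = len(schedule)
    best = math.inf
    for pos, x in enumerate(schedule):
        if x == item:
            cycle = math.ceil((t - pos) / n)      # smallest cycle c with c*n + pos >= t
            best = min(best, cycle * n + pos)
    return best


def unordered_wait(schedule, items, t):
    # Operations may run in any order: each item is taken on its next appearance.
    return max(next_start(schedule, x, t) for x in items) - t


def ordered_wait(schedule, items, t):
    # Operations run in a fixed order: the next read starts only after the
    # previous bucket has been fully received.
    now, start = t, t
    for x in items:
        start = next_start(schedule, x, now)
        now = start + 1
    return start - t


def mean_wait(schedule, items, ordered=False):
    wait = ordered_wait if ordered else unordered_wait
    arrivals = [k + 0.5 for k in range(len(schedule))]   # uniform arrivals, mid-bucket
    return sum(wait(schedule, items, t) for t in arrivals) / len(arrivals)


if __name__ == "__main__":
    flat = ["A", "B", "C", "D", "E"]                            # flat disk, T = 5 time units
    print(mean_wait(flat, ["C"]))                               # 2.5, i.e. T/2
    print(mean_wait(flat, list("EDCBA")))                       # 4.5
    print(mean_wait(flat, list("EDCBA"), ordered=True))         # 18.5
```

When the items are requested in the reverse of their broadcast order, each ordered read just misses the next item and waits almost a full cycle. This approaches the worst case of roughly nC discussed later in the slides.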

22
Skewed broadcast

Some data items are identified to be hot data
items
Hot data items should be broadcast with a higher
frequency since queries are more likely to be
waiting for them
In skewed broadcast, a broadcast schedule consists
of a subset of hot data items in the database
How to define the length of a broadcast schedule
and how to choose the data items to be included
in a broadcast schedule?
Order the data items according to their access
probabilities, which are calculated using previous
access statistics reported by the clients
(Some) mobile clients may be asked to send access
reports to the broadcast server periodically
(like a market survey of a product)
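The selection step can be sketched in a few lines. This is a minimal sketch, assuming clients report one item id per access. It also assumes the schedule length is set by a hypothetical `coverage` knob: keep adding the hottest items until they account for that fraction of all accesses. The slides leave the length rule open, and other rules, such as a fixed number of items, work equally well.

```python
from collections import Counter


def access_probabilities(reports):
    """Estimate access probabilities from client access reports (one item id per access)."""
    counts = Counter(reports)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()}


def skewed_schedule(probs, coverage):
    """Broadcast the hottest items until they cover `coverage` of all accesses.

    `coverage` is a hypothetical tuning knob for the schedule length.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen, covered = [], 0.0
    for item, p in ranked:
        if covered >= coverage:
            break
        chosen.append(item)
        covered += p
    return chosen


if __name__ == "__main__":
    reports = ["A"] * 50 + ["B"] * 30 + ["C"] * 15 + ["D"] * 5
    probs = access_probabilities(reports)
    print(skewed_schedule(probs, coverage=0.75))   # ['A', 'B']
```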

23
Skewed broadcast

Design issue
Size of a broadcast schedule
Calculation of access probability for each item

[Figure: data items ordered by increasing access
probability; the hottest items, up to the
broadcast schedule size, are selected for
broadcast.]
24
Multi-Disk Schedule

Divide the data items in the database into
several groups based on their hot/cold properties
(access probability)
Each group forms a flat disk and the items in the
same disk have the same broadcast frequency
Note that the groups need not be the same size
The broadcast of data items in the same disk is
sequential, i.e., like a flat disk
Different disks have different broadcast
frequencies
Multiple broadcast disks -> Multi-Disk
Changing the disk speeds changes their broadcast
frequency
How to define the broadcast frequencies and the
schedule?
Using the average access probability of the group
of data items

25
Multi-Disk Schedule

Design issue
Calculation of access probability for each item
Grouping of data items
Assigning broadcast frequency

[Figure: data items ordered by increasing access
probability and divided into groups G1 to G5.]
26
Multiple Disk Schedule

Multiple disks of different sizes and speeds are
superimposed on the broadcast channel
Data item A is a hot data item. Its broadcast
frequency is higher than that of B and C
Could it be?
What is the difference?
How to interleave the broadcast of cold/hot data
items so that the inter-arrival time between two
instances of the same data item matches the
clients' needs?
What is the length of a broadcast schedule in
multi-disk?
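One well-known answer to the interleaving question is the chunk-based program generation from the Broadcast Disks work of Acharya et al. The sketch below is an illustrative rendering of that idea, not necessarily the method intended in this course. It works as follows:

1. Each disk gets an integer relative frequency.
2. The number of minor cycles is the least common multiple (LCM) of those frequencies.
3. Each disk is split into LCM / frequency chunks.
4. Each minor cycle sends one chunk from every disk, in rotation.

The function names are hypothetical.

```python
from math import lcm  # Python 3.9+


def split_into_chunks(items, k):
    """Split `items` into k consecutive chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), k)
    chunks, start = [], 0
    for c in range(k):
        end = start + size + (1 if c < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def multi_disk_schedule(disks, rel_freqs):
    """Build one major broadcast cycle.

    disks: lists of items, hottest disk first.
    rel_freqs: integer relative broadcast frequency of each disk.
    """
    minor_cycles = lcm(*rel_freqs)
    chunked = [split_into_chunks(d, minor_cycles // f) for d, f in zip(disks, rel_freqs)]
    schedule = []
    for i in range(minor_cycles):
        for chunks in chunked:
            # each minor cycle carries one chunk from every disk, in rotation
            schedule.extend(chunks[i % len(chunks)])
    return schedule


if __name__ == "__main__":
    disks = [[1], [2, 3], [4, 5, 6, 7, 8, 9, 10, 11]]
    print(multi_disk_schedule(disks, [4, 2, 1]))
    # [1, 2, 4, 5, 1, 3, 6, 7, 1, 2, 8, 9, 1, 3, 10, 11]
```

In this example the major cycle is 16 buckets long. Item 1 appears every 4 buckets, items 2 and 3 every 8, and the cold items once per cycle, so the inter-arrival gaps are fixed for every item. If a disk has fewer items than chunks, this sketch lets some chunks be empty.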

27
Data Dissemination Methods

On-demand (Pull) broadcast
Clients send data requests to the server using
the uplink channel (if uplink bandwidth is
available)
Server defines the broadcast schedule based on
the received client requests and the access
probability of the data items
Hybrid: using both Push and Pull
The downlink channel is divided into two parts
Some of the bandwidth is reserved for sending
data items to clients on demand
Some of the bandwidth is for data broadcast
following the broadcast schedule
How much bandwidth should be reserved for
pulling?
How to interleave the service to push and pull?
Suitable for queries which need to access
multiple data items
Data requests are only sent after waiting for a
long time
Using on-demand for cold items (data items in
slow disks)

28
Push and Pull Broadcast Schedules

Pre-defined broadcast frequency for each group of
data items according to applications and access
statistics
How to divide the bandwidth between broadcast
schedule and on demand schedule?
Access statistics
Periodic collection of access statistics from
mobile clients
Scheduling of on-demand requests
FCFS (first come, first served)
Earliest deadline first (each query is assigned a
deadline for completion)
Longest waiting time first (the deadline
intervals of the queries are different)
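The three policies for on-demand requests can be sketched as a single selection function. This is a minimal sketch under two assumptions. First, one transmission on the broadcast downlink satisfies every pending request for the same item. Second, "longest waiting time first" ranks each item by the total time all its pending requests have waited. That is one common definition in the on-demand broadcast literature; the slide does not specify one. `Request`, `pick_item` and `serve` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    item: str
    arrival: float    # time the request reached the server
    deadline: float   # absolute deadline of the query that issued it


def pick_item(queue, policy, now):
    """Choose the next item to send on the on-demand (pull) part of the downlink."""
    if policy == "FCFS":
        return min(queue, key=lambda r: r.arrival).item
    if policy == "EDF":                      # earliest deadline first
        return min(queue, key=lambda r: r.deadline).item
    if policy == "LWF":                      # longest total waiting time first
        waited = defaultdict(float)
        for r in queue:
            waited[r.item] += now - r.arrival
        return max(waited, key=waited.get)
    raise ValueError(f"unknown policy: {policy}")


def serve(queue, policy, now):
    """Send one item; every pending request for that item is satisfied by the same transmission."""
    item = pick_item(queue, policy, now)
    return item, [r for r in queue if r.item != item]


if __name__ == "__main__":
    queue = [Request("x", 0.0, 20.0), Request("y", 1.0, 30.0), Request("y", 2.0, 30.0),
             Request("y", 3.0, 30.0), Request("z", 9.0, 12.0)]
    for policy in ("FCFS", "EDF", "LWF"):
        print(policy, serve(queue, policy, now=10.0)[0])   # x, then z, then y
```

The three policies pick different items from the same queue. FCFS favours the oldest request (x), EDF the tightest deadline (z), and LWF the item with the most accumulated waiting (y, which has three pending requests).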

29
Broadcast Schedules
[Figure: the broadcast schedule combines push (a
skewed disk) and pull (on-demand data requests
from clients 1 to n, after prioritization) on the
broadcast channel.]
30
Currency and Consistency in Data Broadcast

A query may need to read a set of data items in a
predefined sequence
The definition of a transaction
Consists of a sequence of primitive operations
enclosed between begin and end markers
The operations may be ordered or unordered
(precedence constraints)

[Figure: R(x) and R(y) both precede R(z), which is
followed by the commit C.]
Partial order: R(x) and R(y) may execute
concurrently or in any order
31
Execution Order and Data Broadcast

The constraints in execution of the operations in
a query can greatly increase the waiting time for
data items. Why?
The waiting time for completing a query depends
on both the broadcast schedule and the execution
orders of the operations in a query
Since the operation Read(z) cannot be performed
before Read(x) and Read(y), the query cannot read
z from the broadcast channel (it does not yet know
that it should) until it has obtained data items
x and y
In the worst case, the waiting time is nC, where C
is the time to complete one broadcast cycle and n
is the number of items read by the query
The problem becomes more serious when we consider
two additional issues in data dissemination:
currency and consistency

32
Meeting Currency Requirement

Update transactions are performed at the database
server to maintain the freshness of the data
items in the database (update streams)
Sensors: periodic generation
Location updates: based on the adopted update
generation method, e.g., speed dead reckoning
Data conflicts may occur between update
transactions and mobile queries
Update transactions are performed at the database
server to maintain the freshness of data objects
in the database
Data objects are read (by queries) concurrently

33
Meeting Consistency Requirement

Definition (data conflict): two transactions have
a data conflict if the first one reads a data
object and the second one updates the same object
before the commit (completion) of the first one
How to resolve data conflicts in a database
system?
The conflict cannot be detected by locking or
using the conventional concurrency control
methods
Distributed concurrency control problem
But the overhead of locking in a wireless
network is too heavy
How to handle the disconnection problem after a
lock has been granted to a client program?
Data conflicts in transaction execution may
result in inconsistent data accesses and
generate incorrect results from the transactions

34
Broadcast Schedules
[Figure: as before, but the server also receives
updates while the index and broadcast schedule
are sent through the broadcast server to clients
1 to n.]
35
Concurrent Execution: Inconsistent Retrieval
Problem
Transaction T: BankWithdraw(A, 100);
BankDeposit(B, 100)
Transaction U: BankBranchTotal()
T: balance = A.Read()              200
T: A.Write(balance - 100)          100
U: balance = A.Read()              100
U: balance = balance + B.Read()    300
T: balance = B.Read()              200
T: B.Write(balance + 100)          300
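The interleaving can be replayed with Python generators. Each `yield` marks a point where the scheduler may switch to another transaction. This is a minimal sketch restricted to accounts A and B with the slide's starting balances (200 each); the function names are hypothetical.

```python
def transfer(db, src, dst, amount):
    """Transaction T: Withdraw(src, amount) then Deposit(dst, amount); yields after each step."""
    balance = db[src]
    yield
    db[src] = balance - amount
    yield
    balance = db[dst]
    yield
    db[dst] = balance + amount
    yield


def branch_total(db, accounts, result):
    """Transaction U: sums the balances of `accounts`; yields after each read."""
    total = 0
    for name in accounts:
        total += db[name]
        yield
    result.append(total)


def run(order, *txns):
    """Advance the transactions one step at a time in `order`, then finish them all."""
    for i in order:
        next(txns[i], None)
    for txn in txns:
        for _ in txn:      # run any remaining steps to completion
            pass


if __name__ == "__main__":
    for label, order in (("interleaved", [0, 0, 1, 1, 0, 0]),
                         ("serial T then U", [0, 0, 0, 0, 1, 1])):
        db, result = {"A": 200, "B": 200}, []
        run(order, transfer(db, "A", "B", 100), branch_total(db, ["A", "B"], result))
        print(label, result[0], db)
```

Both runs leave A = 100 and B = 300. The interleaved run reports a total of 300 and the serial run reports 400. The 100 in transit is missed because U reads A after the withdrawal but B before the deposit.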
36
Correct Execution of Transactions

A schedule shows the execution order of the
operations of a set of transactions (update and
mobile transactions)
Serial execution (schedule)
Execute transactions one after another
The next transaction starts only after the
previous one has been committed or aborted
With two transactions, there are two different
serial schedules: T1 then T2, and T2 then T1
A serial schedule always maintains database
consistency since each transaction starts from a
consistent database state
Serial equivalence (serializable)
Transactions are executed concurrently but the
result is equivalent to that of a serial schedule
of the same set of transactions (which serial
schedule? Any one)

37
Serial Execution
Transaction T: BankWithdraw(A, 100);
BankDeposit(B, 100)
Transaction U: BankBranchTotal()
T: balance = A.Read()              200
T: A.Write(balance - 100)          100
T: balance = B.Read()              200
T: B.Write(balance + 100)          300
U: balance = A.Read()              100
U: balance = balance + B.Read()    300
U: balance = balance + C.Read()    400
38
Serial Equivalence
Transaction T: BankWithdraw(A, 4); BankDeposit(B, 4)
Transaction U: BankWithdraw(C, 3); BankDeposit(B, 3)
T: balance = A.Read()      100
T: A.Write(balance - 4)     96
U: balance = C.Read()      300
U: C.Write(balance - 3)    297
T: balance = B.Read()      200
T: B.Write(balance + 4)    204
U: balance = B.Read()      204
U: B.Write(balance + 3)    207
39
Consistency in Data Broadcast

How can the correctness of transaction execution
be determined? That is, under which situations is
a conflict harmful?
Look at the execution order of the conflicting
operations in a schedule
Serialization graph (SG): each edge Ti -> Tj in an
SG means that at least one of Ti's operations
precedes and conflicts with one of Tj's operations
At the client, a query consists of a read
operation to read a data item x
At the server, an update transaction wants to
update x
Serializability theorem
A schedule (history H) is serializable iff SG(H)
is acyclic

40
Consistency in Data Broadcast

Example 1: data conflict between an MT (mobile
transaction) and an update transaction
Suppose update transaction U updates data item d5
and then data item d2, and an MT wants to read d2
and d5. Remember that the update is performed at
the server and the MT is executed at a mobile
client. If the schedule is
Server broadcasts d2
MT reads d2
U updates d5, then d2
Server broadcasts d5
MT reads d5
the MT may observe inconsistent data values. The
serialization graph contains the cycle
MT -> U -> MT, so the schedule is non-serializable
The reason is that the MT reads a data item, d2,
that conflicts with U before U's update, and it
reads another conflicting data item, d5, after
U's update

41
Consistency in Data Broadcast

Example 2: an MT conflicts with two (or more)
update transactions
Even if the serialization order between each
update transaction and the mobile transaction is
acyclic, the final serialization graph can still
be cyclic due to transitive dependencies.
Suppose there are two updates, U1 and U2, such
that U1 updates d2 and then d1, and U2 updates d1
and then d5. If the schedule is
Broadcast transaction (BT) broadcasts d2
MT reads d2
U1 updates d2, then d1
U2 updates d1, then d5
Broadcast transaction (BT) broadcasts d5
MT reads d5
the serialization graph contains the cycle
U2 -> MT -> U1 -> U2
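Both examples can be checked mechanically against the serializability theorem: build SG(H) from the history, then look for a cycle. This is a minimal sketch under several assumptions:

- Each MT read is recorded at the moment the MT reads the broadcast value.
- The broadcast transaction itself is not modelled.
- Only the updates' writes are listed, as on the slides.

The function names are hypothetical.

```python
from itertools import combinations


def serialization_graph(history):
    """Build SG(H): edge Ti -> Tj if an operation of Ti precedes and conflicts with one of Tj.

    history: list of (transaction, op, item) in execution order, op is "r" or "w".
    Two operations conflict if they touch the same item and at least one is a write.
    """
    edges = set()
    for (ti, oi, xi), (tj, oj, xj) in combinations(history, 2):   # first precedes second
        if ti != tj and xi == xj and "w" in (oi, oj):
            edges.add((ti, tj))
    return edges


def is_acyclic(edges):
    """Depth-first search for a back edge; True if the graph has no cycle."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = dict.fromkeys(graph, "new")

    def has_cycle_from(u):
        state[u] = "active"
        for v in graph[u]:
            if state[v] == "active" or (state[v] == "new" and has_cycle_from(v)):
                return True
        state[u] = "done"
        return False

    return not any(state[u] == "new" and has_cycle_from(u) for u in graph)


if __name__ == "__main__":
    example1 = [("MT", "r", "d2"), ("U", "w", "d5"), ("U", "w", "d2"), ("MT", "r", "d5")]
    example2 = [("MT", "r", "d2"), ("U1", "w", "d2"), ("U1", "w", "d1"),
                ("U2", "w", "d1"), ("U2", "w", "d5"), ("MT", "r", "d5")]
    for name, h in (("Example 1", example1), ("Example 2", example2)):
        sg = serialization_graph(h)
        print(name, sorted(sg), "serializable" if is_acyclic(sg) else "not serializable")
```

Both histories are reported as not serializable. If the MT had read d2 and d5 before any update, the only edge would be MT -> U and the graph would be acyclic.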

42
How to resolve this problem?

The conventional methods for concurrency control
are not suitable
Multiversion Data Broadcast
For flat disk only
Update First with Order (UFO)
For flat disk, skew disk and multi-disk

43
Multi-Version Data Broadcast

Multi-version data broadcast
Broadcast multiple versions of a data item
(the current version and previous versions). How
many versions?
A push-based method
No uplink data requests
Clients do not need to set any locks or to inform
the database server before accessing any data
items
Maintains multiple versions for each data item
Each new update creates a new version, and the old
versions are still maintained in the system

44
Multi-Version Data Broadcast

Providing a consistent view to queries by batching
updates
The updates on data items are batched until the
end of a broadcast cycle, even if they arrive in
the middle of the cycle
During updates, the broadcast of data items is
suspended
The version number indicates at which cycle-end
the version is created
Even with no update, a new version is created
using the old version at the end of each
broadcast cycle
After the completion of the batch of updates, the
database is consistent and each newly created
data version is assigned a cycle number as its
version number
Accessing data versions in MV
When a query accesses a data object for its first
read operation, it gets the latest version of the
data object in the current broadcast cycle
The subsequent read operations of the query read
the data objects with the same version number as
the first operation
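The versioning rules can be sketched as two small classes. This is an illustrative sketch only, and it works as follows:

- Updates are batched and applied at each cycle end.
- Every item receives a new version at each cycle end, copied if unchanged.
- A query pins the version number of its first read.

The class names and the `keep` parameter are hypothetical. `keep` is how many versions stay on air, roughly L/BC according to the slides.

```python
class MultiVersionServer:
    """Batches updates to cycle ends and keeps the newest `keep` versions of every item."""

    def __init__(self, initial, keep):
        self.cycle = 0
        self.keep = keep                       # hypothetical knob, roughly L / BC versions
        self.versions = {item: {0: value} for item, value in initial.items()}
        self.pending = {}

    def submit_update(self, item, value):
        self.pending[item] = value             # held back until the current cycle ends

    def end_cycle(self):
        """Apply the batch; every item gets a new version numbered with the new cycle."""
        self.cycle += 1
        for item, history in self.versions.items():
            latest = history[max(history)]
            history[self.cycle] = self.pending.get(item, latest)   # copy if not updated
            for old in sorted(history)[:-self.keep]:
                del history[old]
        self.pending.clear()


class MobileQuery:
    """Client side: all reads use the version number fixed by the first read."""

    def __init__(self):
        self.version = None

    def read(self, server, item):
        history = server.versions[item]
        if self.version is None:
            self.version = max(history)        # first read: latest version on air
        if self.version not in history:
            raise LookupError("version no longer broadcast; the query must abort")
        return history[self.version]


if __name__ == "__main__":
    server = MultiVersionServer({"x": 1, "y": 1}, keep=2)
    q = MobileQuery()
    print(q.read(server, "x"))                 # 1 (version 0)
    server.submit_update("x", 2)
    server.submit_update("y", 2)
    server.end_cycle()
    print(q.read(server, "y"))                 # 1: still version 0, consistent with x
    print(MobileQuery().read(server, "y"))     # 2: a new query starts at version 1
```

If a query outlives the retained versions, the version it pinned disappears and the sketch raises an error. This corresponds to the abort after the query's life-span.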

45
Multi-Version Data Broadcast

How many versions to be broadcast?
In MV, it is assumed that each query has a
maximum life-span and no query exists in the
system longer than the life-span (L)
The life-span can be considered as the deadline
interval of a query: start time + deadline
interval = deadline
After the deadline, the query will be aborted.
Why?
The maximum life-span of the queries, together
with the time required to complete a broadcast
cycle (BC), is used to calculate how many versions
of a data item to broadcast in a cycle, and which
ones
Number of versions: L/BC
Assuming the use of flat disk
Why? What will be the problem if a skew disk is
used?

46
Multi-Version Data Broadcast

Why is data consistency guaranteed in MV?
The update and broadcast of data objects are NOT
interleaved
The view provided in each broadcast cycle is a
CONSISTENT view as of the start time of the
broadcast cycle. What is the definition of a
transaction?
It is a consistent view since there are no
incomplete (partially completed) transactions in
the system at that time point
Remember: if a transaction starts from a
consistent view, the database is consistent when
it completes (assuming a concurrency control
method, e.g., 2PL, resolves the data conflicts
among the conflicting transactions)

47
Multi-Version Data Broadcast

MV data broadcast

48
Multi-Version Data Broadcast

MV can also be applied to accessing cached data
objects
The clients may keep previous versions of data
items in their caches, and the same rule used for
accessing broadcast data is used for accessing
cached items
The multi-version method is very useful for
systems where the mobile clients are frequently
disconnected from the mobile network

49
Multi-Version Data Broadcast

Consistency Vs. Currency
Although MV broadcast can ensure consistency of
data objects provided to a mobile query, the
currency of data objects is sacrificed
Why? Delays (and even skipping) in processing
updates (batch updates)
The latest version of a data object broadcast in
a cycle is the last version before the start of
the broadcast cycle (what about the others?)
Each data object has to be broadcast at least
once in each cycle (flat disk). What will happen
if not?
Multiple version broadcast overhead
Point consistency vs. interval consistency
MV provides a consistent view of the database
between the start time and end time of a query
What about continuous queries, which want to
generate results continuously over an interval?
Skipping some updates means that some events are
ignored

50
Update-first with Order (UFO)

UFO is another algorithm to ensure data
consistency for mobile queries
In UFO, instead of detecting data conflicts
between mobile queries and update transactions,
it checks data conflicts between a broadcast
transaction and an update transaction
The broadcast schedule is modeled as a
transaction (BT)
The length of a BT is defined as the maximum
lifetime of a mobile query
The basic principle of the UFO algorithm is to
ensure that if data conflicts occur between a BT
and an update transaction, the serialization
order between them will always be U -> BT
Since mobile queries (MT) read data items from
broadcast transactions, their serialization
order is always BT -> MT
Therefore, the serialization order between the
update transactions and the mobile queries will
always be U -> MT, which is serializable

51
Update-first with Order (UFO)

The execution of an update transaction (at the
server) is divided into two phases: the execution
phase and the update phase
During the execution phase, an update transaction
is executed and data conflicts with other update
transactions will be resolved according to the
adopted concurrency control protocol
The updates of data items are written in a
private workspace of the transaction during the
execution phase
When all operations of an update transaction have
been executed, it enters the update phase
Permanent updates to the database are performed
by copying the new values from the private
workspace into the database
During the update phase, the broadcast of data
items is stopped (BT always observes a consistent
database)

52
Update-first with Order (UFO)

Before an update transaction starts its update
phase, the system detects data conflict between
the update transaction and the broadcast
transactions in the current and previous
broadcast cycles
At the start time of the update phase, the set of
data items to be updated by the update
transaction is known, since all its operations
have been completed
At the same time, the set of data items to be
read (broadcast) by a broadcast transaction is
also known, since it is produced by a broadcast
algorithm
The two sets of data items are compared. If they
overlap, there is a data conflict
Conflicting items that have already been
broadcast will be re-broadcast
The overhead (re-broadcast) depends on the
conflict probability

53
Update-first with Order (UFO)

BT: the broadcast transaction for the current
broadcast cycle i
O_BT: the set of data items of broadcast
transaction BT
O_U: the set of data items of update
transaction U
BA = { x ∈ O_BT ∩ O_U | x is already broadcast
when U arrives }
Before the permanent update starts, the following
algorithm is performed:
If O_BT ∩ O_U = ∅
Then BT and U have no dependency
Else
  If BA = ∅
  Then the serialization order is U -> BT
  Else
    For each data item i ∈ BA
      re-broadcast data item i
    Next
    the serialization order is U -> BT
  End If
End If
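The UFO conflict check translates almost line for line into Python. This sketch assumes both item sets are known exactly, as the slides describe. It returns the items to re-broadcast and leaves the actual transmission to the caller. `ufo_check` is a hypothetical name.

```python
def ufo_check(bt_items, already_broadcast, update_items):
    """Conflict check run before an update transaction U starts its update phase.

    bt_items:          O_BT, items read (broadcast) by BT in the current cycle
    already_broadcast: items of BT already broadcast when U arrives
    update_items:      O_U, items written by U
    Returns (dependency, items_to_rebroadcast).
    """
    overlap = set(bt_items) & set(update_items)
    if not overlap:
        return None, []                          # BT and U have no dependency
    ba = overlap & set(already_broadcast)        # BA: conflicting items already sent
    # Re-broadcasting BA after U's update makes BT observe U's writes,
    # so the serialization order is U -> BT in both branches.
    return ("U", "BT"), sorted(ba)


if __name__ == "__main__":
    # UFO example: BT has already broadcast d2 when U (writing d5 and d2) arrives.
    print(ufo_check(["d2", "d5"], ["d2"], ["d5", "d2"]))   # (('U', 'BT'), ['d2'])
    print(ufo_check(["d2", "d5"], ["d2"], ["d7"]))          # (None, [])
```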

54
Update-first with Order (UFO)
55
UFO Example

The broadcast transaction (BT) broadcasts d2
MT reads d2
Compare the data sets of U and BT
U updates d5
U updates d2
BT re-broadcasts d2
MT reads the most recently updated value of d2
BT continues its process and broadcasts d5
MT reads d5
The serialization graph is acyclic: U -> MT

56
MV and UFO

Relative consistency problem
MV: accessing data with the same version number
UFO: assigning timestamps to data versions to
indicate their validity; the checking then
follows the requirement of relative consistency
Ordered transaction problem
MV: more versions need to be included since the
query life-span is longer
UFO: a query needs to be restarted if the arrival
order differs from the access order of the data
items
The restart cost can be minimized by caching
previously accessed data items