Title: Data Dissemination
1Data Dissemination
- Data Dissemination Problems
- Basic Data Dissemination Methods: Push and Pull
- Broadcast Disk (for read-only transactions)
- Basic Schemes for Data Broadcast
- The Hybrid (Push and Pull) Approach
- Temporal Consistency and Currency
- Broadcasting Consistent Data
- Multi-version Data Broadcast
- Update First with Order (UFO)
2Proactive Services
[Figure: active badge example (from Dollimore). 1. The user enters the room wearing an active badge. 2. An infrared sensor detects the user's ID. 3. The display responds to the user: "Hello Roy".]
First, when a new object enters a smart space, the list of services provided in the space has to be downloaded to the object. How should this information be delivered to the new object?
Second, the object may generate requests to be served by other objects within the space. How should these requests be processed?
Third, the object may have submitted a query or continuous query (CQ) to monitor the status of the space. How should the execution of the query/CQ be supported?
3Distributed Computing Processing Strategies
- Query shipping
  - The server (the service provider) maintains the latest versions of data objects
  - Queries from clients are sent to a server for processing
  - Query results are returned to the clients
- Data shipping
  - Clients send data requests to the server (the service provider)
  - The server returns the requested data objects to the clients
  - The queries are processed by the clients
4Query Shipping Vs. Data Shipping
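[Figure: query shipping vs. data shipping between a client and a server over the uplink and downlink channels. Query shipping: 1. Requests (client to server), 2. Query processing (at the server), 3. Results (server to client). Data shipping: 1. Data requests (client to server), 2. Data (server to client), 3. Query processing (at the client).]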
What are the tradeoffs in transmission overhead, processing cost, scalability and processing delay? They depend on application and system characteristics, e.g., data size and number of queries.
5Query Shipping Vs. Data Shipping
- Which one is more suitable for pervasive computing?
  - A large number of moving objects submit (location-dependent) queries to access different types of real-time information
  - Monitoring of real-time sensor data using the in-network processing approach
  - Lack of powerful nodes at the device (network) level
- Mobile networks
  - Low bandwidth
  - Asymmetric bandwidth: uplink bandwidth << downlink bandwidth
- Transaction types
  - Mostly read-only transactions (queries)
  - Sensor readings of the environment
  - Continuous queries with a begin and end time for event detection (e.g., navigation, tracing)
  - Location-dependent queries: queries at different locations read different sets of data objects (spatial and temporal properties)
6Data Dissemination Problems
- Data shipping: how to provide the required data items to a large number of queries (from moving objects) for execution?
- Note: since the queries are read-only operations, there is no need to update any data items at the server.
Reading (detect events) => responses
Broadcast data to clients through a mobile network
7Data Dissemination Problems
- Performance objectives
  - Workload: minimize the total data transmission workload (mobile network)
  - Waiting delay: some queries may have a deadline on their completion time. Meeting the deadline is important (to respond to critical events occurring in the system environment)
  - Tune-in time (to conserve energy)
    - The clients may not know whether their required data will be shipped
    - A client may sleep and then wake up to tune in to the broadcast channel for its required data item (avoiding continuous monitoring of the broadcast channel)
  - Currency: since most queries ask for real-time information about the environment (e.g., temperature, traffic conditions), the data items provided to a query have to be the latest versions (not outdated)
  - Consistency: ensure the consistency (correctness) of the data items provided to a query (temporal consistency: two data items report the status of the environment at the same time point). Otherwise, incorrect results may be generated
8Data Dissemination Methods: Pull Vs Push
- Using data shipping for processing mobile queries
  - Scalability (avoid processing queries at the server and serving each data request one by one). The arrival rate of data requests could be very high due to the large number of mobile clients
  - Suitable for monitoring and surveillance applications (continuous queries). Why?
  - How can data shipping be applied to in-network processing?
- Pure pull (on demand)
  - Clients explicitly (periodically, for data monitoring) send data requests to a server through the uplink channel
  - The server returns the requested data items to the clients through the downlink channel
- Design problems
  - What is the pulling period (based on the dynamic properties of the data)? Pulling period vs. transmission workload
  - Scalability problem (although the server does not need to process the queries, it needs to serve the data requests one by one)
9Data Dissemination Method - Pulling
[Figure: pull-based dissemination using point-to-point communication. Clients 1 to n each send requests to the server and receive data in return. The server maintains a request queue and processes the data requests one by one.]
10Data Dissemination Methods: Pull Vs Push
- Pure push (broadcast)
  - Data shipping with prediction
    - Predict the data requirements of the queries
    - Spatial and temporal properties (queries issued at similar times by clients at similar locations have similar data requirements)
  - The server defines a broadcast schedule, e.g., based on the popularity of data items (identifying the hot items using previous access statistics)
  - A broadcast schedule is a sequence of data items to be broadcast by the server
  - The server repeatedly broadcasts data to a client population according to a broadcast schedule, without receiving any data requests
  - Clients monitor the broadcast channel and retrieve the data items they need as they appear in the broadcast channel
- Applications of push (data broadcast)
  - Listening to radio and watching TV
  - Information feeds such as stock quotes, sports tickers, electronic newsletters, traffic and weather information, cable TV
11Data Dissemination Method Pushing
[Figure: push-based dissemination. The broadcast server selects data for broadcast and broadcasts it to all the clients (one-to-many communication). The clients monitor the broadcast channel for their needed data items.]
12Comparison Pull Vs. Push (Pull)
- Pull or push? Which one is more suitable for pervasive computing applications?
- Pull places a higher demand on uplink bandwidth
  - Sending pulling requests and returning data
  - Each data transmission can only serve one query
- No bandwidth is wasted, although the bandwidth for serving each request is higher
  - All the transmitted data are needed by clients
- In pull, a query knows (approximately, why?) when its required data items will come. The clients play an active role: they tell the server what they want
- The workload at the server and the network depends on the arrival rate of requests and the number of clients. If there are many data requests, the waiting time will be very long. Will queries miss their completion deadlines?
  - Arrival rate: A; service rate: C
  - Utilization: U = A/C
  - Queue length: U/(1-U)
  - Assuming Poisson arrivals and exponential service times
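The queueing figures above can be checked numerically. The following is a minimal sketch, assuming a single server with Poisson arrivals and exponential service times (an M/M/1 queue), where utilization is the arrival rate divided by the service rate and U/(1-U) is the mean number of requests in the system, including the one being served. The function name `mm1_metrics` is hypothetical.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Return (utilization, mean number of requests in the system) for an M/M/1 queue."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate must be > 0")
    utilization = arrival_rate / service_rate  # U = A / C
    if utilization >= 1:
        # The server cannot keep up, so the queue grows without bound.
        return utilization, float("inf")
    return utilization, utilization / (1 - utilization)


if __name__ == "__main__":
    for a in (5, 8, 9.5):
        u, n = mm1_metrics(a, 10)
        print(f"A={a}, C=10: U={u:.2f}, mean requests in system={n:.2f}")
```

As the arrival rate approaches the service rate, the mean number of waiting requests grows sharply, which is the scalability problem of pull with many clients.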
13Comparison Pull Vs. Push (Push)
- Some broadcast data items are not wanted by any query (a waste of bandwidth)
  - The schedule is only a prediction
- Push is suitable for disseminating data items to a large number of clients (more scalable) with similar data requirements (hot data items)
- One data push can meet the data requirements of multiple queries (if the prediction is correct)
  - E.g., many clients may want to know the latest traffic conditions at the cross-harbor tunnel
  - E.g., TV broadcast vs. video on demand
- In push, the total broadcast workload is determined by the server (e.g., the pushing rate, the number of data items pushed per second)
  - The server may introduce a delay between two broadcast schedules to reduce the broadcast workload
- Push is suitable for systems with a small database and small data items
- Push is suitable for systems where the access probabilities of data items are uneven (hot data vs. cold data)
14Design Problems in Using Push
- How to define the broadcast schedule?
- What is the length of a broadcast schedule (number of data items)? All items in the database?
- Access to the required data items is sequential (a query consists of several read operations and the operations are processed one after another)
- Clients need to listen continuously to the downlink channel
  - How to reduce the listening time to conserve energy? Doze and wake-up modes of operation
15Broadcast Index
- To reduce the monitoring (tune-in) time, an index is placed before the start of each broadcast schedule
- A broadcast schedule consists of two parts
  - A header index and a sequence of buckets of data items (one data item per bucket, assuming items of the same size)
- The broadcast index gives the broadcast schedule of the data items in a broadcast cycle
- From the index, a client can calculate the broadcast time of its required data item (current time + position of the data item in the broadcast schedule x the time to broadcast a data item)
- The client reads the broadcast index and then sleeps until the required data item is about to be broadcast
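The wake-up calculation can be sketched as follows. This is a minimal illustration, assuming the index is simply the ordered list of item ids in the cycle, all items have the same size, and the time to broadcast one bucket is the item size divided by the broadcast bandwidth. The function name and parameters are hypothetical.

```python
def wake_up_time(index, item_id, index_end_time, item_size, bandwidth):
    """Return the time at which the bucket holding item_id starts to be broadcast.

    index          -- ordered list of item ids in this broadcast cycle
    index_end_time -- time at which the client finishes reading the index
    item_size      -- size of every data item (same unit as bandwidth per second)
    bandwidth      -- broadcast bandwidth
    """
    slot_time = item_size / bandwidth   # time to broadcast one bucket
    position = index.index(item_id)     # 0-based position in the schedule
    return index_end_time + position * slot_time


if __name__ == "__main__":
    schedule_index = ["a", "b", "c", "d"]
    t = wake_up_time(schedule_index, "c", index_end_time=10.0,
                     item_size=1000, bandwidth=500)
    print("sleep until", t)
```

Between reading the index and the returned time, the client can stay in doze mode instead of monitoring the channel.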
16Broadcast Index
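[Figure: a broadcast schedule consisting of an index followed by data buckets, with bucket boundaries marked at sizes 1 M to 5 M. The time to broadcast bucket i is Size / Broadcast bandwidth. The client tunes in to read the index, sleeps, and tunes in again when its required bucket is broadcast.]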
17Broadcast Schedules
- A broadcast schedule is a sequence of data items (buckets)
- When a broadcast schedule is finished, the next schedule is defined and started immediately (or after a fixed delay)
- The method used to define the broadcast schedule affects the waiting time for data items
- Two types of (read-only) queries
  - Each query consists of a set of read operations
  - Unordered: the operations can be executed in ANY order, depending on the arrival of their required data items
  - Ordered: the operations have a predefined execution sequence. E.g., query i consists of two operations, Read_i(x) and Read_i(y). It may be defined that Read_i(x) has to be completed before Read_i(y) can be started
- The response time of a query is the time interval from its generation time to the time when it receives all its required data items (ignoring the processing time of the last item)
18Broadcast Schedules
- The waiting time for a data item depends on
  - The length of a schedule
  - The position of the data item in a schedule
- To minimize the mean waiting time of queries
  - Hot data items (popular data items) should be broadcast with a higher frequency
[Figure: query x drawn twice, each time with the operations Read(i), Read(j) and Read(k).]
19Broadcast Schedules
[Figure: the broadcast server broadcasts an index followed by the broadcast schedule to clients 1 to n.]
20Basic Schemes for Data Broadcast
- Flat disk, skew disk and multi-disk
- Flat disk (if it is difficult to identify the hot items)
  - A broadcast schedule consists of all the items in the database
  - In each broadcast cycle, all the data items in the database are broadcast one after another until the end of the database (cycle). Then the next cycle starts from the first item
  - The time to complete one broadcast cycle equals the time to broadcast all data items in the database
  - It is suitable for small databases, e.g., broadcast of stock items (currently we have about 1000 stock items)
  - Not scalable and not suitable for large database systems and multimedia broadcast
  - The waiting time of a query for its required data items depends on the size of the database and the sizes of the items
  - The mean waiting time for a data item is half the cycle length
  - What will be the waiting time for multiple data items?
21Flat Disk Schedule
- All the data items in the database (A, B and C) are broadcast with the same frequency
- Could it be?
- Unordered: the operations can be performed in any order, e.g., calculation of a mean
  - Mean waiting time: T/2
  - T is the time to finish one broadcast cycle
- How about queries with ordered operations?
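The T/2 figure can be verified with a short calculation. The sketch below assumes equal-sized buckets, a client that tunes in at a uniformly random instant within a cycle, and a wait measured up to the start of the wanted item's bucket. Under those assumptions the mean wait is the sum of the squared gaps between consecutive broadcasts of the item, divided by twice the cycle length. The function name `mean_wait` is hypothetical.

```python
def mean_wait(schedule, item, slot_time=1.0):
    """Mean wait until the next broadcast of item starts, for a random arrival time."""
    starts = [i * slot_time for i, x in enumerate(schedule) if x == item]
    if not starts:
        raise ValueError(f"{item!r} is not in the schedule")
    cycle = len(schedule) * slot_time
    gaps = []
    for k, s in enumerate(starts):
        nxt = starts[(k + 1) % len(starts)]
        gap = (nxt - s) % cycle
        gaps.append(gap if gap > 0 else cycle)  # a single occurrence spans the whole cycle
    return sum(g * g for g in gaps) / (2 * cycle)


if __name__ == "__main__":
    flat = ["A", "B", "C"]
    print(mean_wait(flat, "A"))  # half of the 3-slot cycle
```

For the flat schedule A, B, C each item appears once per cycle, so the single gap equals T and the mean wait is T/2. An item that appears more than once per cycle has shorter gaps and therefore a shorter mean wait.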
22Skewed broadcast
- Some data items are identified as hot data items
- Hot data items should be broadcast with a higher frequency since they are more likely to be awaited by queries
- In skewed broadcast, a broadcast schedule consists of a subset of hot data items in the database
- How to define the length of a broadcast schedule and how to choose the data items to be included in a broadcast schedule?
  - Order the data items according to their access probabilities, which are calculated using previous access statistics reported by the clients
  - (Some) mobile clients may be requested to generate an access report periodically and send it to the broadcast server (cf. a market survey of a product)
23Skewed broadcast
- Design issues
  - Size of a broadcast schedule
  - Calculation of the access probability of each item
[Figure: data items ordered by increasing access probability. The items with the highest access probability, up to the broadcast schedule size, are selected for broadcast.]
24Multi-Disk Schedule
- Divide the data items in the database into several groups based on their hot/cold properties (access probability)
- Each group forms a flat disk and the items in the same disk have the same broadcast frequency
  - Note that the groups need not be of the same size
- The broadcast of data items in the same disk is sequential, i.e., like a flat disk
- Different disks have different broadcast frequencies
  - Multiple broadcast disks => multi-disk
  - Changing the disk speeds changes their broadcast frequencies
- How to define the broadcast frequencies and the schedule?
  - Using the average access probability of the group of data items
25Multi-Disk Schedule
- Design issues
  - Calculation of the access probability of each item
  - Grouping of data items
  - Assigning broadcast frequencies
[Figure: data items ordered by increasing access probability and divided into groups G1 to G5.]
26Multiple Disk Schedule
- Multiple disks of different sizes and speeds are superimposed on the broadcast channel
- Data item A is a hot item. Its broadcast frequency is higher than that of B and C
- Could it be?
- What is the difference?
- How to interleave the broadcast of cold and hot data items so that the inter-arrival time between two instances of the same data item matches the clients' needs?
- What is the length of a broadcast schedule in multi-disk?
27Multiple Disk Schedule
- Order the data items from hottest to coldest
- Partition the list into multiple ranges, called disks. Each disk consists of data items with similar popularity. Let the number of disks be num_disks
- Choose the relative frequency of broadcast for each disk, rel_freq(i), based on its relative popularity
- Cluster the items in each disk into smaller units called chunks: num_chunks(i) = max_chunks / rel_freq(i), where max_chunks is the least common multiple of the relative frequencies
- Create the broadcast schedule as follows
  - For i = 0 to max_chunks - 1
    - For j = 1 to num_disks
      - k = i mod num_chunks(j)
      - Broadcast chunk C(j,k)
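The schedule-generation steps above can be sketched in Python. This is a minimal illustration, assuming the disks are already ordered from hottest to coldest and that items are split into chunks as evenly as possible (how to split a disk whose size is not a multiple of its chunk count is an assumption here, not something the steps specify). It uses `math.lcm` with several arguments, which requires Python 3.9 or later.

```python
from math import lcm


def multi_disk_schedule(disks, rel_freq):
    """Build one broadcast cycle from disks (lists of items) and relative frequencies."""
    max_chunks = lcm(*rel_freq)
    chunked = []
    for items, freq in zip(disks, rel_freq):
        num_chunks = max_chunks // freq
        size = -(-len(items) // num_chunks)  # ceiling division; trailing chunks may be empty
        chunked.append([items[c * size:(c + 1) * size] for c in range(num_chunks)])

    schedule = []
    for i in range(max_chunks):
        for chunks in chunked:               # j runs over the disks, hottest first
            k = i % len(chunks)              # k = i mod num_chunks(j)
            schedule.extend(chunks[k])       # broadcast chunk C(j,k)
    return schedule


if __name__ == "__main__":
    disks = [["A"], ["B", "C"], ["D", "E", "F", "G"]]
    print("".join(multi_disk_schedule(disks, [4, 2, 1])))
```

With three disks of sizes 1, 2 and 4 and relative frequencies 4, 2 and 1, item A appears in every minor cycle, B and C in every second one, and D to G once per cycle.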
28Data Dissemination Methods
- On-demand (pull) broadcast
  - Clients send data requests to the server using the uplink channel (if uplink bandwidth is available)
  - The server defines the broadcast schedule based on the received client requests and the access probabilities of the data items
- Hybrid: using both push and pull
  - The downlink channel is divided into two parts
    - Some of the bandwidth is reserved for sending data items to clients on demand
    - Some of the bandwidth is used for data broadcast following the broadcast schedule
  - How much bandwidth should be reserved for pulling?
  - How to interleave the service of push and pull?
  - Suitable for queries which need to access multiple data items
    - Data requests are only sent after waiting for a long time
  - Use on-demand for cold items (data items in slow disks)
29Push and Pull Broadcast Schedules
- Pre-defined broadcast frequency for each group of data items according to applications and access statistics
- How to divide the bandwidth between the broadcast schedule and the on-demand schedule?
- Access statistics
  - Periodic collection of access statistics from mobile clients
- Scheduling of on-demand requests
  - FCFS (first come, first served)
  - Earliest deadline first (each query is assigned a deadline for completion)
  - Longest waiting time first (the deadline intervals of the queries are different)
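The three policies for on-demand requests can be compared with a small sketch. The `Request` record and the `next_item` function are hypothetical. In particular, "longest waiting time first" is interpreted here as choosing the item whose pending requests have the largest total waiting time, which is one common reading for broadcast servers and is an assumption rather than something stated above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    item: str        # requested data item
    arrival: float   # time the request reached the server
    deadline: float  # completion deadline of the requesting query


def next_item(pending, policy, now):
    """Pick the data item to send next from the pending on-demand requests."""
    if policy == "FCFS":
        return min(pending, key=lambda r: r.arrival).item
    if policy == "EDF":
        return min(pending, key=lambda r: r.deadline).item
    if policy == "LWF":
        total_wait = defaultdict(float)
        for r in pending:
            total_wait[r.item] += now - r.arrival  # one broadcast serves all waiters
        return max(total_wait, key=total_wait.get)
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    pending = [Request("a", 0, 20), Request("b", 1, 5),
               Request("c", 2, 30), Request("c", 3, 30), Request("c", 4, 30)]
    for policy in ("FCFS", "EDF", "LWF"):
        print(policy, "->", next_item(pending, policy, now=10))
```

The example shows that the policies can disagree: the oldest request, the most urgent request and the most widely awaited item are three different items.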
30Broadcast Schedules
[Figure: hybrid push and pull. The broadcast schedule interleaves push slots (skew disk broadcasting) and pull slots. On-demand data requests from clients 1 to n are prioritized at the server.]
31Currency and Consistency in Data Broadcast
- A query may need to read a set of data items in a pre-defined sequence
- Definition of a transaction
  - Consists of a sequence of primitive operations enclosed between begin and end markers
  - The operations may be ordered or unordered (precedence constraints)
[Figure: partial order of a transaction with operations R(x), R(y), R(z) and C. R(x) and R(y) may execute concurrently or in any order.]
32Execution Order and Data Broadcast
- Constraints on the execution order of the operations in a query can greatly increase the waiting time for data items. Why?
- The waiting time for completing a query depends on both the broadcast schedule and the execution order of the operations in the query
- Since the operation Read(z) cannot be performed before Read(x) and Read(y), the query cannot read z from the broadcast channel (it does not yet know that it needs z) if it has not obtained data item x
- In the worst case, the waiting time is nC (C is the time to complete one broadcast cycle and n is the number of items)
- The problem becomes more serious when we consider two additional issues in data dissemination: currency and consistency
33Meeting Currency Requirement
- Update transactions are performed at the database server to maintain the freshness of the data items in the database (update streams)
  - Sensors: periodic generation
  - Location updates: based on the adopted update generation method, e.g., speed dead reckoning
- Temporal consistency (currency)
  - At time t1, data item x is updated to 100
  - At time t2, x is updated to 200 and the previous value becomes invalid
  - Any new query after t2 should not read x = 100
- Temporal consistency: how well the data objects maintained in a database model the actual state of a changing (dynamic) environment
- In principle, we prefer that no transaction (query) is allowed to access a data item which is invalid (outdated)
  - What is the meaning of an outdated item? That a new version has been created? Does it depend on the data generation method? Does it depend on the application requirements?
- Temporal consistency (TC) consists of two parts: absolute and relative consistency
  - Absolute consistency: validity of an individual data item (base item)
  - Relative consistency: the consistency amongst a group of data items
34Temporal Consistency
- The value of a data item changes continuously with time, e.g., the temperature or the location of a moving object (stream data)
- Note: in real cases, it is impossible to have the instantaneous value of many real objects due to continuous changes in value and delays in installing updates
- Approximate solution: if the value of a data item is close to the actual value (within a range acceptable from the application's point of view), the data version may still be considered valid
- A data item is absolutely consistent (fresh) if it timely (or approximately) reflects the current state of the external object that it models
- The validity of a data item may be defined by an absolute validity interval (avi), the life-span of a data value
  - When its avi expires, a new value is needed to refresh the data item
  - No transaction is allowed to access an outdated (stale) data item (absolutely inconsistent)
35Temporal Consistency
- A version xi is valid (fresh) if Start_time(xi) + AVIx > current time
  - Start_time is the creation time of the version
- We may use time bounds, the upper valid time (UVT) and lower valid time (LVT), to label the validity interval of a data item (data version)
  - LVT of xi = start time of xi
  - UVT of xi = LVT of xi + AVIx
- How to define the AVI for a data object?
  - Different data objects may have different AVIs
  - Based on the accuracy requirement and maximum rate of change
  - The sampling (update) period should be smaller than the AVI
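The validity bounds can be expressed directly in code. The sketch below assumes a version is fresh from its start time until, but not including, its UVT, matching the condition that the start time plus the AVI must be greater than the current time. The `Version` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: float
    start_time: float  # creation time of this version
    avi: float         # absolute validity interval of the data item

    @property
    def lvt(self):
        """Lower valid time: the start time of the version."""
        return self.start_time

    @property
    def uvt(self):
        """Upper valid time: LVT + AVI."""
        return self.start_time + self.avi

    def is_fresh(self, now):
        """Absolute consistency check: start_time + avi > now."""
        return self.lvt <= now < self.uvt


if __name__ == "__main__":
    temperature = Version(value=25.0, start_time=100.0, avi=30.0)
    print(temperature.lvt, temperature.uvt, temperature.is_fresh(120.0))
```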
36Data Streams
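[Figure: data streams for data items x, y and z along a time axis. Each version (such as xi, xi+2 and xi+3) is created by a new update, with later versions further along the time axis.]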
37Absolute Consistency Example
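[Figure: absolute consistency example. Version xi is installed by an update and becomes stale when AVIx expires, so a new update needs to be installed. Likewise, version yj is installed by an update and becomes stale when AVIy expires.]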
38Broadcast Schedule and Abs Consistency
- Query Q1 wants to read x and y and then perform a computation (e.g., to find the maximum of x and y)
- Q1 gets x from the broadcast channel at time t1
- If Q1 gets y at time t2 and t2 > t1 + avi(x), then it needs to get x again since x is invalid at t2
- To meet absolute consistency, the time difference between getting the first data item and the last data item must be < the avi of the first data object
- Problem: if a query wants to access multiple data objects from the broadcast channel, its waiting time could be long. The above condition may easily be violated
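A client could apply this check after its last read. The sketch below follows the simplification used above, measuring validity from the time an item was read rather than from the creation time of its version, and generalizes the condition to every item in the read set. The function name `stale_items` and the tuple layout are hypothetical.

```python
def stale_items(reads):
    """Return the names of items that have expired by the time the last item is read.

    reads -- list of (name, read_time, avi) tuples, one per data item obtained
    """
    finish = max(read_time for _, read_time, _ in reads)
    # An item stays usable only if finish - read_time < avi.
    return [name for name, read_time, avi in reads if finish - read_time >= avi]


if __name__ == "__main__":
    # x is read at t1 = 10 with avi(x) = 5, y at t2 = 16: x must be read again.
    print(stale_items([("x", 10, 5), ("y", 16, 8)]))
```

With many items in the read set and a long broadcast cycle, the returned list is rarely empty, which is the problem noted above.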
39Absolute Consistency Problem
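[Figure: absolute consistency problem. The query reads xi and later reads yj. By the time yj is read, the AVI of xi has expired and x has been updated to xi+1, so xi has become stale.]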
40Relative Consistency
- A set of data items is relatively consistent if they are temporally correlated with each other, i.e., they represent the status of the entities at the same time point
- The set of data items accessed by a transaction has to be relatively consistent. Otherwise, the transaction observes information from different time points
  - E.g., the calculation of the best path to a destination using real-time traffic data
- A query may be allowed to access stale data objects provided that they are relatively consistent and not too old
- Definition of relative consistency: given a set of data versions V from different data items, the versions in V are relatively consistent if
  - the intersection of VI(xi) over all xi in V is non-empty
  - where VI(xi) = [LVT(xi), UVT(xi)]
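A relative consistency check reduces to an interval comparison. The sketch below assumes that a set of versions is relatively consistent when their validity intervals share at least one common time point, and it treats each interval as half-open, [LVT, UVT), which is an assumption about the boundary case. The function name is hypothetical.

```python
def relatively_consistent(intervals):
    """Return True if all validity intervals (lvt, uvt) share a common time point."""
    latest_lvt = max(lvt for lvt, _ in intervals)
    earliest_uvt = min(uvt for _, uvt in intervals)
    # The intersection is [latest_lvt, earliest_uvt); it is non-empty iff this holds.
    return latest_lvt < earliest_uvt


if __name__ == "__main__":
    x1, y1, y2 = (0, 10), (5, 15), (12, 20)
    print(relatively_consistent([x1, y1]))  # overlapping intervals
    print(relatively_consistent([x1, y2]))  # disjoint intervals
```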
41Relative Consistency Example
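[Figure: relative consistency example. Data item x is updated to versions x1 and then x2, and data item y to versions y1 and then y2, along a time axis. RC1: {x1, y1} is a correct (relatively consistent) set. {x1, y2} is an incorrect set.]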
42Relative Consistency
- Relative consistency is less restrictive than absolute consistency
- If Q1 gets y at time t2 and t2 < t1 + avi(x), then it does not need to get x again if y is valid within the interval t1 to t1 + avi(x)
- This check can be performed by comparing their validity intervals
- Note: if a query observes absolute consistency, its accessed data items are also relatively consistent
- Of course, we also need to associate a currency requirement with the relative consistency requirement
  - The latest consistency point should not be older than a certain time (the currency threshold) from the current time
  - E.g., when an intruder is reported, the report should be made within 30 seconds of the detection
43Meeting Consistency Requirement
- Data conflicts may occur between update transactions and mobile queries
  - Update transactions are performed at the database server to maintain the freshness of data objects in the database
  - Reads of data objects (by queries) occur concurrently
- Definition of a data conflict: two transactions have a data conflict if the first one reads a data object and the second one updates the same object before the commit (completion) of the first one
- How to resolve data conflicts in a database system?
  - The conflict cannot be detected by locking or using the conventional concurrency control methods
  - Distributed concurrency control problem
  - But the overhead of locking in a wireless network is too heavy
  - How to resolve the disconnection problem after granting a lock to a client program?
- Data conflicts in transaction execution may result in inconsistent data accesses
  - The transactions then generate incorrect results
44Broadcast Schedules
[Figure: the server applies updates while the broadcast server broadcasts an index followed by the broadcast schedule to clients 1 to n.]
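45Concurrent Execution: Inconsistent Retrieval Problem
Transaction T: BankWithdraw(A, 100); BankDeposit(B, 100)
Transaction U: BankBranchTotal()
Interleaved execution:
T: balance = A.Read()              200
T: A.Write(balance - 100)          100
U: balance = A.Read()              100
U: balance = balance + B.Read()    300
T: balance = B.Read()              200
T: B.Write(balance + 100)          300
U adds A after the withdrawal (100) to B before the deposit (200), so it totals 300 for the two accounts instead of 400.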
46Correct Execution of Transactions
- A schedule shows the execution order of the operations of a set of transactions (update and mobile transactions)
- Serial execution (schedule)
  - Execute transactions one after another
  - The next transaction starts only after the previous one has been committed or aborted
  - If we have two transactions, we may have two different serial schedules, i.e., T1 then T2, and T2 then T1
  - Always maintains database consistency since all transactions start from a consistent database state
- Serial equivalence (serializable)
  - Transactions are executed concurrently but the result is equivalent to that of a serial schedule of the same set of transactions (which serial schedule? Any one)
47Serial execution
Transaction T: BankWithdraw(A, 100); BankDeposit(B, 100)
Transaction U: BankBranchTotal()
Serial execution, T then U:
T: balance = A.Read()              200
T: A.Write(balance - 100)          100
T: balance = B.Read()              200
T: B.Write(balance + 100)          300
U: balance = A.Read()              100
U: balance = balance + B.Read()    300
U: balance = balance + C.Read()    400
...
48Serial equivalence
Transaction T: BankWithdraw(A, 4); BankDeposit(B, 4)
Transaction U: BankWithdraw(C, 3); BankDeposit(B, 3)
Interleaved execution:
T: balance = A.Read()        100
T: A.Write(balance - 4)      96
U: balance = C.Read()        300
U: C.Write(balance - 3)      297
T: balance = B.Read()        200
T: B.Write(balance + 4)      204
U: balance = B.Read()        204
U: B.Write(balance + 3)      207
49Consistency in Data Broadcast
- How to determine the correctness of transaction execution? I.e., under which situations is a conflict harmful?
- Look at the execution order of the conflicting operations in a schedule
- Serialization graph (SG): each edge Ti -> Tj in an SG means that at least one of Ti's operations precedes and conflicts with one of Tj's operations
  - At the client, a query consists of a read operation to read a data item x
  - At the server, an update transaction wants to update x
- Serializability theorem
  - A schedule H is serializable iff SG(H) is acyclic
50Consistency in Data Broadcast
- Example 1: data conflict between a mobile transaction (MT) and an update transaction
- Suppose an update transaction, U, updates data item d5 and then data item d2, and an MT wants to read d2 and d5. Remember that the update is performed at the server and the MT is executed at a mobile client. If the schedule is
  - Server broadcasts d2
  - MT reads d2
  - U updates d5 and d2
  - Server broadcasts d5
  - MT reads d5
- The MT may observe inconsistent data values. The serialization graph is cyclic, MT -> U -> MT, and the schedule is non-serializable
- The reason is that the MT reads a data item, d2, which is in conflict with U, before the update from U, and it reads another conflicting data item, d5, after the update from U
51Consistency in Data Broadcast
- Example 2: an MT conflicts with two (or more) update transactions
- Even though the serialization order between each update transaction and a mobile transaction is acyclic, the final serialization graph can still be cyclic due to transitive dependencies.
- Suppose there are two updates U1 and U2 such that U1 updates d2 and then d1, and U2 updates d1 and then d5. If the schedule is
  - Broadcast transaction (BT) broadcasts d2
  - MT reads d2
  - U1 updates d2 and d1
  - U2 updates d1 and d5
  - Broadcast transaction (BT) broadcasts d5
  - MT reads d5
- The serialization graph is cyclic: U2 -> MT -> U1 -> U2
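Both examples can be checked mechanically by building the serialization graph and testing it for a cycle. The sketch below is a minimal illustration, assuming a schedule is given as a list of (transaction, operation, item) tuples in execution order, and that each MT read takes effect at the moment the item is broadcast, so the broadcast steps are folded into the MT reads. The function names are hypothetical.

```python
def serialization_graph(schedule):
    """Return the set of edges (Ti, Tj) where an operation of Ti precedes and
    conflicts with an operation of Tj (same item, at least one write)."""
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> 1 while on the current DFS path, 2 when finished

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state.get(nxt) == 1 or (nxt not in state and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(node not in state and visit(node) for node in graph)


if __name__ == "__main__":
    example1 = [("MT", "r", "d2"), ("U", "w", "d5"), ("U", "w", "d2"), ("MT", "r", "d5")]
    example2 = [("MT", "r", "d2"), ("U1", "w", "d2"), ("U1", "w", "d1"),
                ("U2", "w", "d1"), ("U2", "w", "d5"), ("MT", "r", "d5")]
    for name, sched in (("Example 1", example1), ("Example 2", example2)):
        print(name, "serializable:", not has_cycle(serialization_graph(sched)))
```

In Example 2 no pair of transactions forms a cycle on its own. The cycle only appears once the edge between U1 and U2 is added, which is the transitive dependency described above.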
52How to resolve this problem?
- The conventional methods for concurrency control are not suitable
- Multi-version data broadcast
  - For flat disk only
- Update First with Order (UFO)
  - For flat disk, skew disk and multi-disk
53Multi-version Data Broadcast
- Multi-version data broadcast
  - Broadcast multiple versions of a data item (the current version + previous versions). How many versions?
- A push-based method
  - No uplink data requests
  - No need to set any lock or to inform the database server before accessing any data item
- Maintains multiple versions of each data item
  - Each new update creates a new version and the old versions are still maintained in the system
54Multi-version Data Broadcast
- Providing a consistent view to queries by batch updates
  - The updates on data items are batched until the end of a broadcast cycle, even if they arrive in the middle of a broadcast cycle
  - During updates, the broadcast of data items is suspended
  - The version number indicates at which cycle end the version was created
  - Even with no update, a new version is created from the old version at the end of each broadcast cycle
  - After the completion of the batch of updates, the database is consistent and each newly created data version is assigned the cycle number as its version number
- Accessing data versions in MV
  - If a query wants to access a data object, it gets the latest version of the data object from the broadcast cycle for its first read operation
  - The subsequent read operations of the query read the data objects with the same version number as the first operation
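The read rule can be sketched from the client's side. The code below assumes each broadcast cycle is represented as a mapping from item id to a dictionary of version number to value. The `MVQuery` class is hypothetical, and raising an error when the pinned version is no longer broadcast is an assumption about how a client would react.

```python
class MVQuery:
    """A read-only query that pins the version number of its first read."""

    def __init__(self):
        self.version = None  # version number fixed by the first read operation

    def read(self, broadcast, item):
        """Read item from the current cycle; broadcast maps item -> {version_no: value}."""
        versions = broadcast[item]
        if self.version is None:
            self.version = max(versions)  # first read: take the latest version
        if self.version not in versions:
            raise LookupError("pinned version is no longer broadcast")
        return versions[self.version]


if __name__ == "__main__":
    cycle_7 = {"x": {7: 10, 6: 9}, "y": {7: 20, 6: 19}}
    cycle_8 = {"x": {8: 11, 7: 10}, "y": {8: 21, 7: 20}}
    q = MVQuery()
    print(q.read(cycle_7, "x"))  # first read pins version 7
    print(q.read(cycle_8, "y"))  # later read still uses version 7
```

Because both reads use version 7, the query sees x and y as they were at the same cycle end, even though y is read one cycle later.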
55Multi-version Data Broadcast
- How many versions should be broadcast?
  - In MV, it is assumed that each query has a maximum life-span (L) and no query exists in the system longer than the life-span
  - The life-span can be considered as the deadline interval of a query: start time + deadline interval = deadline
  - After the deadline, the query will be aborted. Why?
  - The maximum life-span of the queries together with the time required for completing a broadcast cycle (BC) is used to calculate the number of versions and the versions to be broadcast in a cycle for a data item
    - L/BC
- Assuming the use of flat disk
  - Why? What will be the problem if a skew disk is used?
56Multi-version Data Broadcast
- Why is data consistency guaranteed in MV?
  - The update and broadcast of data objects are NOT interleaved
  - The view provided in each broadcast cycle is a CONSISTENT view at the start time of the broadcast cycle. What is the definition of a transaction?
  - It is a consistent view since there are no incomplete (partially completed) transactions in the system at that time point
  - Remember: if a transaction starts from a consistent view, the database is consistent when it is completed (assuming a concurrency control method, e.g., 2PL, resolves the data conflict problem among the conflicting transactions)
57Multi-version Data Broadcast
58Broadcast Organization in MV
- A set of multi-version data to be broadcast can be represented as a two-dimensional array indexed by version numbers (Vno) and data ids (Did)
- Dval[i,k] = v means that the k-th version of the i-th data item is equal to v
- Data items can appear in any order
- Versions appear in descending order with the most recent version appearing in the leftmost column
59Broadcast Organization in MV
- The organization of a broadcast schedule in MV can be either horizontal or vertical
- Horizontal broadcast
  - Broadcasts all versions (with different Vno) of the data item with a particular Did first, and then those of the next Did
- Vertical broadcast
  - Broadcasts all data items (with different Did) having a particular version (Vno), and then all data items with the next version
- The broadcast organization affects the client access time
  - If users require different versions of a particular data item, horizontal broadcast may be better
  - If users need the most recent data, vertical broadcast may be better
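The two organizations differ only in the order in which a two-dimensional array of versions is traversed. The sketch below assumes the data are held as a list of rows, one row per data item, with the most recent version in column 0 and the same number of versions for every item. The function names are hypothetical.

```python
def horizontal(dval):
    """All versions of the first data item, then all versions of the next, and so on."""
    return [(i, k, v) for i, row in enumerate(dval) for k, v in enumerate(row)]


def vertical(dval):
    """The most recent version of every data item, then the next version of every item."""
    num_versions = len(dval[0])
    return [(i, k, dval[i][k]) for k in range(num_versions) for i in range(len(dval))]


if __name__ == "__main__":
    dval = [[8, 7],   # data item 0: most recent value 8, previous value 7
            [5, 4]]   # data item 1: most recent value 5, previous value 4
    print(horizontal(dval))
    print(vertical(dval))
```

In the vertical order a client that only needs the most recent values can stop listening after the first column has been broadcast.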
60Compressed Organization
- Broadcasting the same value in different versions is inefficient
- If Dval[i,k] = Dval[i,k-1] = v, we may merge them into a single version to reduce the broadcast overhead
61Compressed Organization
- Compressed horizontal broadcast
- 1x3 8x2 5 6 1x1 2 5 4x2
- Compressed vertical broadcast
- 1x3 8x2 6 5 1x1 4x2 5 2
62Multi-version Data Broadcast
- MV can be applied to accessing cached data objects
  - The clients may maintain the previous versions of data items in their caches, and the same rule used for accessing broadcast data is used for accessing cached items
- The multi-version method is very useful for systems where the mobile clients are frequently disconnected from the mobile network
63Multi-version Data Broadcast
- Consistency vs. currency
  - Although MV broadcast can ensure the consistency of the data objects provided to a mobile query, the currency of the data objects is sacrificed
  - Why? Delays in (and even skipping of) update processing (batch updates)
  - The latest version of a data object to be broadcast in a cycle is the last version before the start of the broadcast cycle (how about the others?)
  - Each data object has to be broadcast at least once in each cycle (flat disk). What will happen if not?
  - Overhead of broadcasting multiple versions
- Point consistency vs. interval consistency
  - MV provides a consistent view of the database between the start time and end time of a query
  - How about continuous queries, which want to generate results continuously over an interval? Skipping some updates means that some events are ignored
64Update-first with Order (UFO)
- UFO is another algorithm for ensuring data consistency for mobile queries
- Instead of detecting data conflicts between mobile queries and update transactions, UFO checks data conflicts between a broadcast transaction and an update transaction
- The broadcast schedule is modeled as a transaction (BT)
- The length of a BT is defined as the maximum lifetime of a mobile query
- The basic principle of the UFO algorithm is to ensure that if data conflicts occur between a BT and an update transaction (U), the serialization order between them is always U -> BT
- Since mobile queries (MT) read data items from broadcast transactions, their serialization order is always BT -> MT
- The serialization order between update transactions and mobile queries is therefore always U -> MT, so the execution is serializable
65Update-first with Order (UFO)
- The execution of an update transaction (at the server) is divided into two phases: the execution phase and the update phase
- During the execution phase, the update transaction is executed, and data conflicts with other update transactions are resolved according to the adopted concurrency control protocol
- Updates to data items are written to a private workspace of the transaction during the execution phase
- When all operations of an update transaction have been executed, it enters the update phase
- Permanent updates to the database are performed by copying the new values from the private workspace into the database
- During the update phase, the broadcast of data items is stopped (so BT always observes a consistent database)
66Update-first with Order (UFO)
- Before an update transaction starts its update phase, the system detects data conflicts between the update transaction and the broadcast transactions in the current and previous broadcast cycles
- At the start of the update phase, the set of data items to be updated by the update transaction is known, since all its operations have been completed
- At the same time, the set of data items to be read (broadcast) by a broadcast transaction is also known, since it is produced by the broadcast algorithm
- The two sets of data items are compared. If they overlap, there is a data conflict
- The conflicting items will be rebroadcast
- The overhead (re-broadcast) depends on the conflict probability
67Update-first with Order (UFO)
- BT: the broadcast transaction for any current broadcast cycle i
- $O_{BT}$: the set of data items of the broadcast transaction, BT
- $O_U$: the set of data items of the update transaction, U
- $BA = \{x \in O_{BT} \cap O_U \mid x \text{ is already broadcast when U arrives}\}$
- Before the permanent update starts, the following algorithm is performed
- If $O_{BT} \cap O_U = \emptyset$
- Then BT and U have no dependency
- Else
- If $BA = \emptyset$
- Then the serialization order is U -> BT
- Else
- For each data item $i \in BA$
- re-broadcast data item i
- Next
- the serialization order is U -> BT
- End If
- End If
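The conflict check above can be sketched in a few lines of Python. The function name `ufo_check`, the use of string ids for data items, and the convention of returning the re-broadcast list are illustrative assumptions; a real server would also have to schedule the re-broadcasts and block broadcasting during the update phase.

```python
from typing import Iterable, List, Set


def ufo_check(o_bt: Set[str], o_u: Set[str], already_broadcast: Iterable[str]) -> List[str]:
    """Return the data items that must be re-broadcast so that U -> BT holds.

    o_bt: items broadcast (read) by the broadcast transaction BT in this cycle
    o_u: items written by the update transaction U
    already_broadcast: items BT has already sent when U reaches its update phase
    """
    conflicts = o_bt & o_u
    if not conflicts:
        return []  # BT and U have no dependency
    # BA: conflicting items whose old value has already gone out on the channel.
    ba = conflicts & set(already_broadcast)
    # If BA is empty, every conflicting item is broadcast after the update,
    # so U -> BT already holds and nothing has to be re-sent.
    return sorted(ba)
```

For instance, if BT covers d1, d2 and d5, has already sent d1 and d2, and U updates d2 and d5, only d2 is returned: d5 has not been broadcast yet, so clients will see its new value anyway.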
68Update-first with Order (UFO)
69UFO Example
- The broadcast transaction (BT) broadcasts d2
- MT reads d2
- Compare the data sets of U and BT
- U updates d5
- U updates d2
- BT re-broadcasts d2
- MT reads the most recent value of d2
- BT continues its process and broadcasts d5
- MT reads d5
- The serialization graph is acyclic, with U -> MT
70MV and UFO
- Relative consistency problem
- MV: access data items with the same version
- UFO: assign time-stamps to data versions to indicate their validity; the check then follows the requirement of relative consistency
- Ordered transaction problem
- MV: more versions need to be included, since the query life-span is longer
- UFO: a query needs to be restarted if the arrival order of the data items differs from their access order
- The restart cost can be minimized by caching previously accessed data items
71Cache Data Management
- Clients may keep data in a cache to reduce the access delay (data items are then accessed both from the cache and from the broadcast cycle)
- Which data items should be kept in the client cache?
- Hot data items? They may not be the best choice, since they may be broadcast more frequently than cold data items
- Need to consider the size of a data item (since the cache size is very limited), the probability that the mobile client accesses it, and its broadcast frequency
- Data items with a longer valid time (longer update period)
- Other cache data management problems
- Cache replacement using FIFO or least recently used (LRU), taking into account the locality of the transactions and the data validity length
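One way to combine the three factors named above (size, access probability, broadcast frequency) is a simple keep-score. The formula below is only an illustrative heuristic chosen for this sketch, not a policy given in the slides, and the `CachedItem` fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedItem:
    did: str
    size: int              # storage units occupied in the cache
    access_prob: float     # estimated probability that this client accesses the item
    broadcast_freq: float  # times the item appears per broadcast cycle


def keep_score(item: CachedItem) -> float:
    """Higher means more worth caching (assumed heuristic, for illustration only).

    Items that are likely to be accessed but rarely broadcast and cheap to
    store score high; frequently broadcast items can be re-read from the air.
    """
    return item.access_prob / (item.broadcast_freq * item.size)


def choose_victim(cache: List[CachedItem]) -> CachedItem:
    """Pick the cached item with the lowest keep-score for replacement."""
    return min(cache, key=keep_score)
```

Under this heuristic a hot item that is broadcast four times per cycle can be evicted before a colder item that is broadcast only once, which matches the observation that hot items are not always the best caching candidates.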
73Cache Data Management
- How to maintain the coherency of data items in the cache? (Similar to the mutual consistency problem in replicated databases)
- Data coherency: the difference between the value of a data item in the client cache and the value of the data item maintained at the server
- Methods
- Auto-refreshment: whenever a data item is broadcast and the same item is found in the cache, the old version in the cache is replaced by the version from the broadcast cycle
- Push: when a new version has been created, the server sends it to the clients that have cached the data item. This can provide closer coherency, but the server needs to maintain the caching status of the clients
- Pull: when the cached version is too old, the client asks the server to send the latest one. How to determine this? Periodically? By age? Higher communication overhead, but more fault tolerant since no state information is kept at the server
74Cache Data Management
- Validation of cached data coherency by invalidation reports
- Compare the time-stamp (creation time) of the version at the client with that of the latest version at the server
- Invalidation reports have to be generated by the server periodically (report period) to validate the consistency of the cached data items (to ensure that they are the most up-to-date versions)
- Before a mobile query is allowed to commit, it has to check against the invalidation report to ensure that all the data it accessed are coherent with the data items at the server
- How to minimize the waiting time of a mobile query?
75Cache Data Validation
- When to send an invalidation report? How frequently?
- High frequency: high broadcast overhead
- What should be included in the report? Data item ids and time-stamps
- To deal with the problem of network disconnection, the report length may be a multiple of the report period
- Report period is the interval between two consecutive reports
- Report length is the time frame covered by a report
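The interplay of report period and report length can be sketched as follows. The constants, the dictionary-based report format, and in particular the policy of discarding the whole cache after a gap longer than the report length are assumptions made for this sketch; the slides only state that the report carries data item ids and time-stamps.

```python
from typing import Dict

REPORT_PERIOD = 10  # assumed: time units between two consecutive reports
REPORT_LENGTH = 30  # assumed: time frame covered by one report (3 periods)


def build_report(last_update: Dict[str, int], now: int) -> Dict[str, int]:
    """Server side: Did -> time-stamp for items updated within the report length."""
    return {did: ts for did, ts in last_update.items() if ts > now - REPORT_LENGTH}


def apply_report(cache: Dict[str, int], report: Dict[str, int],
                 last_report_time: int, now: int) -> Dict[str, int]:
    """Client side: return the cached entries (Did -> time-stamp) that are still valid."""
    if now - last_report_time > REPORT_LENGTH:
        # Assumed policy: the client missed more than one report window,
        # so no cached entry can be validated from this report alone.
        return {}
    return {did: ts for did, ts in cache.items()
            if did not in report or report[did] <= ts}
```

Because the report length covers three periods here, a client that misses one or two consecutive reports can still validate its cache from the next report it hears; only a longer disconnection forces it to refetch everything.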
76References
- Schiller, Mobile Communications, Ch. 6.1 and 6.2
- Ozsu, Principles of Distributed Database Systems, Ch. 16.4
- Michael J. Franklin and Stanley B. Zdonik, "Data In Your Face": Push Technology in Perspective, SIGMOD Conference 1998, pp. 516-519
- Pitoura, E. and Chrysanthis, P.K., Scalable Processing of Read-Only Transactions in Broadcast Push, in Proceedings of International Conference on Distributed Systems, May 1999