Data Dissemination - PowerPoint PPT Presentation

About This Presentation
Slides: 77
Provided by: CIT788

Transcript and Presenter's Notes

Title: Data Dissemination


1
Data Dissemination
  • Data dissemination Problems
  • Basic Data Dissemination Methods: Push and Pull
  • Broadcast Disk (for read-only transactions)
  • Basic Schemes for Data Broadcast
  • The Hybrid (Push + Pull) Approach
  • Temporal Consistency and Currency
  • Broadcasting Consistent Data
  • Multi-version Data Broadcast
  • Update First with Order (UFO)

2
Proactive Services
Figure (from Dollimore): Proactive services with an active badge.
1. A user enters the room wearing an active badge.
2. An infrared sensor detects the user's ID.
3. The display responds to the user ("Hello Roy").
First, when a new object enters a smart space,
the list of services provided in the space has
to be downloaded to the object. How should this
information be delivered to the new object? Second,
the object may generate requests to be served by
other objects within the space. How should these
requests be processed? Third, the object may have
submitted a query/CQ (continuous query) to monitor
the status of the space. How should the execution
of the query/CQ be supported?
3
Distributed Computing Processing Strategies
  • Query shipping
  • The server (the service provider) maintains the
    latest versions of data objects
  • Queries from clients are sent to a server for
    processing
  • Query results are returned to the clients
  • Data shipping
  • Clients send data requests to the server (the
    service provider)
  • The server returns the requested data objects to
    the clients
  • The queries are processed by the clients

4
Query Shipping Vs. Data Shipping
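Figure: Query shipping vs. data shipping.
Query shipping: 1. the client sends requests on the
uplink channel; 2. the server processes the query;
3. the results return on the downlink channel.
Data shipping: 1. the client sends data requests;
2. the server returns the data; 3. the client
processes the query.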
What are the tradeoffs? Transmission overhead,
processing cost, scalability, processing delay?
They depend on application and system
characteristics, e.g., data size, number of queries, etc.
5
Query Shipping Vs. Data Shipping
  • Which one is more suitable to pervasive
    computing?
  • A large number of moving objects submit
    (location-dependent) queries to access
    different types of real-time information
  • Monitoring of real-time sensor data using the
    in-network processing approach
  • Lack of powerful nodes at the device (network)
    level
  • Mobile networks
  • Low bandwidth
  • Asymmetric bandwidth: uplink bandwidth <<
    downlink bandwidth
  • Transaction types
  • Mostly read-only transactions (queries)
  • Sensor readings of environment
  • Continuous queries with a begin and end time for
    event detection (e.g., navigation, tracing)
  • Location-dependent queries at different
    locations request to read different sets of data
    objects (spatial and temporal properties)

6
Data Dissemination Problems
  • Data shipping: how to provide the required data
    items to a large number of queries (from moving
    objects) for execution?
  • Note: since they are read-only operations, there
    is no need to update any data items at the server.
    Reading (detect events) -> responses

Broadcast data to clients through a mobile network
7
Data Dissemination Problems
  • Performance Objectives
  • Workload: minimize the total data transmission
    workload (mobile network)
  • Waiting delay: some queries may have a deadline
    on their completion time. Meeting the deadline is
    important (to respond to critical events
    occurring in the system environment)
  • Tune-in time (conserve energy)
  • The clients may not know whether their required
    data will be shipped
  • A client may sleep and then wake up to tune in to
    the broadcast channel to get its required data
    item (avoiding continuous monitoring of the
    broadcast channel for data items)
  • Currency: since most of the queries ask for
    real-time information about the environment (e.g.,
    temperature, traffic conditions, etc.), the data
    items provided to a query have to be the latest
    versions (not outdated)
  • Consistency: ensure the consistency (correctness)
    of the data items provided to a query (temporal
    consistency: two data items reporting the status
    of the environment at the same time point).
    Otherwise, incorrect results may be generated

8
Data Dissemination Methods: Pull Vs. Push
  • Using data shipping for processing of mobile
    queries
  • Scalability (the server neither processes the
    queries nor serves each data request one by one).
    The arrival rate of data requests could be very
    high due to the large number of mobile clients
  • Suitable for monitoring and surveillance
    applications (continuous queries). Why?
  • How can data shipping be applied to in-network
    processing?
  • Pure Pull (on demand)
  • Clients explicitly (periodically for data
    monitoring) send data requests to a server
    through the uplink channel
  • The server returns the requested data items to
    the clients through the downlink channel
  • Design problem
  • What should the pulling period be (based on the
    dynamic properties of the data)? Pulling period
    vs. transmission workload
  • Scalability problem (although the server does not
    need to process the queries, it needs to serve
    the data requests one by one)

9
Data Dissemination Method - Pulling
Figure: Point-to-point communication. Clients 1 to n
send requests to the server, which maintains a
request queue, processes the data requests one by
one, and returns the data to each client.
10
Data Dissemination Methods: Pull Vs. Push
  • Pure Push (broadcast)
  • Data shipping with prediction
  • To predict what the data requirements of the
    queries are
  • Spatial and temporal properties (queries at
    similar time and from the clients at similar
    locations have similar data requirements)
  • The server defines a broadcast schedule, e.g.,
    based on the popularity of data items (identifying
    the hot items using previous access statistics)
  • A broadcast schedule is a sequence of data items
    to be broadcast by the server
  • The server repeatedly broadcasts data according
    to a broadcast schedule to a client population
    without receiving any data requests
  • Clients monitor the broadcast channel and
    retrieve the data items they need as they
    appear in the broadcast channel
  • Application of Push (data broadcast)
  • Listening to radio and watching TV
  • Information feeds such as stock quotes, sport
    tickets, electronic newsletters, traffic and
    weather information, cable TV

11
Data Dissemination Method - Pushing
Figure: Broadcast data to all the clients (one-to-many
communication). The server selects data for
broadcast, and the clients monitor the broadcast
channel for the data items they need.
12
Comparison Pull Vs. Push (Pull)
  • Pull or Push? Which one is more suitable to
    pervasive computing applications?
  • Pull places a higher demand on uplink bandwidth
  • Sending pull requests and returning data
  • Each data transmission can serve only one query
  • No bandwidth is wasted, although the bandwidth
    for serving each request is higher
  • All the transmitted data are needed by clients
  • In Pull, a query knows when its required data
    items will come (approximately, why?). The
    clients play an active role: they tell the server
    what they want
  • The workload at the server and the network
    depends on the arrival rate of requests and the
    number of clients. If there are many data
    requests, the waiting time can become very long,
    and queries may miss their completion deadlines
  • Arrival rate: A; service rate: C
  • Utilization: $U = A/C$
  • Mean number of requests in the system: $U/(1-U)$
  • Assuming Poisson arrivals and exponential service
    times (an M/M/1 queue)
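As a rough illustration of the M/M/1 figures above, the following Python sketch computes the utilization, the mean number of requests in the system, and the mean response time (via Little's law) for a pull server. The function name `mm1_metrics` and the example rates are hypothetical, and the formulas hold only under the stated Poisson/exponential, single-server assumptions.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state M/M/1 metrics for a pull server.

    Assumes Poisson request arrivals (rate A), exponential service times
    (rate C) and a single server handling requests one by one.
    """
    if arrival_rate >= service_rate:
        # With U >= 1 the request queue grows without bound.
        raise ValueError("unstable: arrival rate must be below service rate")
    u = arrival_rate / service_rate   # utilization U = A / C
    n = u / (1 - u)                   # mean number of requests in the system
    w = n / arrival_rate              # mean response time (Little's law), = 1 / (C - A)
    return {"utilization": u, "mean_in_system": n, "mean_response_time": w}


# Response time grows sharply as the request rate approaches the service rate.
for a in (50, 80, 95, 99):
    print(a, round(mm1_metrics(a, 100)["mean_response_time"], 3))
```

Raising the load from 50 to 99 requests per time unit (with C = 100) raises the mean response time from 0.02 to 1.0 time units, fifty times longer, which is the scalability problem of pure pull.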

13
Comparison Pull Vs. Push (Push)
  • Some data items are not wanted by any queries (a
    waste of bandwidth)
  • It is only a prediction
  • Push is suitable for disseminating data items to
    a large number of clients (more scalable) with
    similar data requirements (hot data items)
  • One data push could meet the data requirements of
    multiple queries (if the prediction is correct)
  • E.g., many clients may want to know the latest
    traffic condition at the cross harbor tunnel
  • E.g., TV broadcast vs. video on demand
  • In Push, the total broadcast workload is
    determined by the server (e.g., the push rate,
    i.e., the number of data items pushed per
    second)
  • The server may introduce a delay between two
    broadcast schedules to reduce the broadcast
    workload
  • Push is suitable for systems with a small
    database and small data items
  • Push is suitable for systems where the access
    probabilities of data items are skewed (hot data
    vs. cold data)

14
Design Problems in Using Push
  • How to define the broadcast schedule?
  • What is the length of a broadcast schedule?
    (Number of data items) (All items in the
    database?)
  • The access to required data items is sequential
    (a query consists of several read operations and
    the operations are processed one after another)
  • Clients need to listen continuously to the
    downlink channel
  • How to reduce the listening time to conserve
    energy? Doze and wake-up modes of operation

15
Broadcast Index
  • To reduce the monitoring (tune-in) time, an index
    is defined before each broadcast schedule starts
  • A broadcast schedule consists of two parts:
  • A header index and a sequence of buckets of data
    items (one bucket holds one data item, assuming
    same-size items)
  • The broadcast index indicates the broadcast
    schedule of the data items in a broadcast cycle
  • From the index, a client can calculate the
    broadcast time of each required data item
    (current time + position of the data item in the
    broadcast schedule × time to broadcast one data
    item)
  • Read the broadcast index, and then sleep until
    the required data item is about to be broadcast
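The index calculation above can be sketched in Python as follows. This is a minimal sketch assuming equal-size data items, an index that lists item ids in broadcast order, and a constant broadcast bandwidth; `wake_up_time` and its parameters are hypothetical names.

```python
from typing import Optional, Sequence


def wake_up_time(index: Sequence[str], item_id: str, index_end_time: float,
                 item_size: float, bandwidth: float) -> Optional[float]:
    """Time at which a client should tune in again to catch `item_id`.

    index: item ids in broadcast order, as read from the header index
    index_end_time: time at which the client has finished reading the index
    item_size / bandwidth: time to broadcast one bucket (equal-size items)
    """
    if item_id not in index:
        return None                           # not broadcast in this cycle
    bucket_time = item_size / bandwidth       # time to broadcast one data item
    position = index.index(item_id)           # 0-based position in the schedule
    # current time + position x time to broadcast one data item
    return index_end_time + position * bucket_time


# A client needing "C" reads the index, dozes, and wakes up at t = 6.0.
print(wake_up_time(["A", "B", "C", "D"], "C", index_end_time=2.0,
                   item_size=1.0, bandwidth=0.5))
```

The client stays in doze mode between the end of the index and the returned time, so it listens only to the index and to the one bucket it needs.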

16
Broadcast Index
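Figure: Broadcast index. The client tunes in to read
the index at the start of the broadcast schedule,
sleeps, and tunes in again when item i is broadcast;
each item takes Size / Broadcast bandwidth to transmit.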
17
Broadcast Schedules
  • A broadcast schedule is a sequence of data items
    (bucket)
  • When a broadcast schedule is finished, the next
    schedule will be defined and then be started
    immediately (or after a fixed delay)
  • The use of different methods to define the
    broadcast schedule affects the waiting time for
    data items
  • Two types of (read-only) queries
  • Each query consists of a set of read operations
  • Unordered: the operations can be executed in ANY
    order, depending on the arrivals of their required
    data items
  • Ordered: the operations have a predefined
    execution sequence. E.g., query i consists of two
    operations, Readi(x) and Readi(y). It may be
    defined that operation Readi(x) has to be
    completed before Readi(y) can be started
  • The response time of a query is the time interval
    from its generation time to the time when it
    receives all its required data items (ignoring
    the processing time of the last item)

18
Broadcast Schedules
  • The waiting time for a data item depends on
  • The length of a schedule
  • The position of the data item in a schedule
  • To minimize the mean waiting time of queries
  • Hot data items (popular data items) should be
    broadcast with a higher frequency

Figure: Query x with unordered operations Read(i),
Read(j) and Read(k) vs. query x with the ordered
sequence Read(i), Read(j), Read(k).
19
Broadcast Schedules
Figure: The broadcast server sends an index followed
by the broadcast schedule to clients 1 to n.
20
Basic Schemes for Data Broadcast
  • Flat Disk, Skew Disk and Multi-Disk
  • Flat Disk (if it is difficult to identify the
    hot items)
  • A broadcast schedule consists of all the items in
    the database
  • In each broadcast cycle, all the data items in
    the database will be broadcast one after one
    until the end of the database (cycle). Then the
    next cycle will be started from the first item
  • The time to complete one broadcast cycle equals
    to the time to broadcast all data items in the
    database
  • It is suitable for small databases, e.g.,
    broadcasting stock items (currently there are
    about 1000 stock items)
  • Not scalable and not suitable for large database
    systems and multimedia broadcast
  • The waiting time of a query for its required data
    items depends on the size of the database and
    the sizes of the items
  • The mean waiting time for a data item is half the
    cycle length
  • What will be the waiting time for multiple data
    items?
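To explore the question above, a small numerical sketch (not from the slides) averages the waiting time of an unordered query over uniformly spread arrival times on a flat disk. It assumes equal-size items and measures the wait until the broadcast of the last needed item starts; `flat_disk_mean_wait` is a hypothetical name.

```python
def flat_disk_mean_wait(positions, num_items, item_time=1.0, samples=10_000):
    """Mean wait until an unordered query has seen the start of the broadcast
    of every item it needs, on a flat disk of `num_items` equal-size items.

    positions: 0-based slots of the needed items in the broadcast cycle
    The arrival time is spread uniformly over one cycle (midpoint grid).
    """
    cycle = num_items * item_time             # T: time for one broadcast cycle
    total = 0.0
    for s in range(samples):
        t = (s + 0.5) * cycle / samples       # arrival time within the cycle
        # wait for each needed item until its next broadcast starts
        waits = [(p * item_time - t) % cycle for p in positions]
        total += max(waits)                   # unordered: done when the last item arrives
    return total / samples


print(flat_disk_mean_wait([3], num_items=10))     # close to T/2 = 5.0
print(flat_disk_mean_wait([0, 5], num_items=10))  # close to 3T/4 = 7.5
```

For a single item the result matches the T/2 rule; for two items half a cycle apart the mean wait rises to 3T/4, and as more spread-out items are needed the expected wait approaches a full cycle.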

21
Flat Disk Schedule
  • All the data items in the database (A, B and C)
    are broadcast with the same frequency
  • Could it be?
  • Unordered: the operations can be performed in any
    order, e.g., calculating a mean
  • Mean waiting time: T/2
  • T is the time to finish one broadcast cycle
  • How about for queries with ordered operations?

22
Skewed broadcast
  • Some data items are identified as hot data
    items
  • Hot data items should be broadcast with a higher
    frequency since they are more likely to be
    awaited by queries
  • In skewed broadcast, a broadcast schedule consists
    of a subset of hot data items in the database
  • How to define the length of a broadcast schedule
    and how to choose the data items to be included in
    a broadcast schedule?
  • Order the data items according to their access
    probabilities, which are calculated using previous
    access statistics reported by the clients
  • (Some) mobile clients may be asked to generate
    an access report periodically and send it to the
    broadcast server (e.g., like a market survey of
    a product)
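The selection step above can be sketched as follows: estimate each item's access probability from client access reports, then keep the hottest items up to the chosen schedule size. The report format (a flat list of requested item ids) and the function name `skewed_schedule` are assumptions for illustration.

```python
from collections import Counter


def skewed_schedule(access_log, schedule_size):
    """Select the hottest items for a skewed broadcast schedule.

    access_log: item ids from clients' periodic access reports (hypothetical format)
    schedule_size: number of items to include in one broadcast cycle
    """
    counts = Counter(access_log)
    total = sum(counts.values())
    probs = {item: c / total for item, c in counts.items()}  # estimated access probability
    # Order by decreasing probability; ties broken by item id for a stable schedule.
    ranked = sorted(probs, key=lambda item: (-probs[item], item))
    return ranked[:schedule_size], probs


schedule, probs = skewed_schedule(["A", "B", "A", "C", "A", "B", "D"], schedule_size=2)
print(schedule)  # ['A', 'B']
```

Choosing `schedule_size` remains the design issue: a smaller schedule shortens the cycle for hot items but forces queries on cold items to wait or to use another channel.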

23
Skewed broadcast
  • Design issue
  • Size of a broadcast schedule
  • Calculation of access probability for each item

Figure: Data items ordered by increasing access
probability; the hottest items, up to the broadcast
schedule size, are selected for broadcast.
24
Multi-Disk Schedule
  • Divide the data items in the database into
    several groups based on their hot/cold properties
    (access probability)
  • Each group forms a flat disk and the items in the
    same disk have the same broadcast frequency
  • Note that the groups need not be the same
    size
  • The broadcast of data items in the same disk is
    sequential, i.e., like a flat disk
  • Different disks have different broadcast
    frequencies
  • Multiple broadcast disks -> Multi-Disk
  • Changing the disk speeds changes their broadcast
    frequency
  • How to define the broadcast frequencies and the
    schedule?
  • Using the average access probability of the group
    of data items

25
Multi-Disk Schedule
  • Design issue
  • Calculation of access probability for each item
  • Grouping of data items
  • Assigning broadcast frequency

Figure: Data items ordered by increasing access
probability and partitioned into groups G1 to G5.
26
Multiple Disk Schedule
  • Multiple disks of different sizes and speeds are
    superimposed on the broadcast channel
  • Data item A is hot. Its broadcast frequency is
    higher than that of B and C
  • Could it be?
  • What is the difference?
  • How to interleave the broadcast of cold/hot data
    items so that the inter-arrival time between two
    different instances of the same data item matches
    the clients' needs?
  • What is the length of a broadcast schedule in
    multi-disk?

27
Multiple Disk Schedule
  • Order the data items from hottest to coldest
  • Partition the list into multiple ranges, called
    disks. Each disk consists of data items with
    similar popularity. Let the number of disks be
    num_disk
  • Choose the relative frequency of broadcast,
    rel_freq(i), for each disk i based on its
    relative popularity
  • Cluster the items in each disk into smaller units
    called chunks: num_chunks(i) = max_chunks /
    rel_freq(i), where max_chunks is the least common
    multiple of the relative frequencies
  • Create the broadcast schedule as follows:
  • For i = 0 to max_chunks - 1
      For j = 1 to num_disk
        k = i mod num_chunks(j)
        Broadcast chunk C(j,k)
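The chunking procedure above can be written out in Python. This is a sketch under two assumptions not stated on the slide: disks are given hottest first, and a disk whose size is not a multiple of its chunk count is split into ceiling-sized chunks (so the last chunks may be short or empty). `broadcast_program` and `split_into_chunks` are hypothetical names.

```python
from functools import reduce
from math import gcd


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def split_into_chunks(items, num_chunks):
    """Split one disk into num_chunks chunks of (nearly) equal size."""
    size = -(-len(items) // num_chunks)  # ceiling division
    return [items[k * size:(k + 1) * size] for k in range(num_chunks)]


def broadcast_program(disks, rel_freq):
    """disks: lists of items, hottest disk first; rel_freq: relative frequency per disk."""
    max_chunks = reduce(lcm, rel_freq)                  # LCM of the relative frequencies
    chunks = [split_into_chunks(disk, max_chunks // f)  # num_chunks(j) = max_chunks / rel_freq(j)
              for disk, f in zip(disks, rel_freq)]
    program = []
    for i in range(max_chunks):                         # for i = 0 .. max_chunks - 1
        for disk_chunks in chunks:                      # for j = 1 .. num_disk
            k = i % len(disk_chunks)                    # k = i mod num_chunks(j)
            program.extend(disk_chunks[k])              # broadcast chunk C(j,k)
    return program


# Three disks broadcast at relative frequencies 4 : 2 : 1.
print(broadcast_program([[1], [2, 3], [4, 5, 6, 7, 8, 9, 10, 11]], [4, 2, 1]))
# [1, 2, 4, 5, 1, 3, 6, 7, 1, 2, 8, 9, 1, 3, 10, 11]
```

With relative frequencies 4 : 2 : 1, item 1 appears four times per cycle, items 2 and 3 twice, and items 4 to 11 once, and the appearances of each item are evenly spaced.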

28
Data Dissemination Methods
  • On-demand (Pull) broadcast
  • Clients send data requests to the server using
    the uplink channel (if uplink bandwidth is
    available)
  • Server defines the broadcast schedule based on
    the received client requests and the access
    probability of the data items
  • Hybrid: using both Push and Pull
  • The down-link channel is divided into two parts
  • Some of the bandwidth is reserved for sending
    data items to clients on demand
  • Some of the bandwidth is for data broadcast
    following the broadcast schedule
  • How much bandwidth should be reserved for
    pulling?
  • How to interleave the service to push and pull?
  • Suitable for queries which need to access
    multiple data items
  • Data requests are only sent after waiting for a
    long time
  • Using on-demand for cold items (data items in
    slow disks)

29
Push and Pull Broadcast Schedules
  • Pre-defined broadcast frequency for each group of
    data items according to applications and access
    statistics
  • How to divide the bandwidth between broadcast
    schedule and on demand schedule?
  • Access statistics
  • Periodic collection of access statistics from
    mobile clients
  • Scheduling of on-demand requests
  • FCFS
  • Earliest deadline first (each query is assigned a
    deadline for completion)
  • Longest waiting time first (the deadline
    intervals of the queries are different)
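The three policies above can be compared with a small Python sketch of the server's pull-slot scheduler. It assumes that one broadcast of an item satisfies every pending request for that item, and it interprets longest-waiting-time-first as choosing the item whose pending requests have the largest total waiting time (a common reading in on-demand broadcast, not spelled out on the slide). `Request` and `next_item` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Request:
    item: str        # requested data item
    arrival: float   # time the request reached the server
    deadline: float  # completion deadline of the issuing query


def next_item(pending, now, policy):
    """Pick the data item to send in the next pull slot.

    Assumes one broadcast of an item satisfies every pending request for it.
    """
    if not pending:
        return None
    if policy == "FCFS":   # oldest outstanding request first
        return min(pending, key=lambda r: r.arrival).item
    if policy == "EDF":    # earliest deadline first
        return min(pending, key=lambda r: r.deadline).item
    if policy == "LWF":    # item with the largest total waiting time of its requests
        total_wait = {}
        for r in pending:
            total_wait[r.item] = total_wait.get(r.item, 0.0) + (now - r.arrival)
        return max(total_wait, key=total_wait.get)
    raise ValueError(f"unknown policy: {policy}")


pending = [Request("x", 0.0, 20.0), Request("y", 1.0, 5.0),
           Request("z", 2.0, 30.0), Request("z", 3.0, 40.0), Request("z", 4.0, 50.0)]
for policy in ("FCFS", "EDF", "LWF"):
    print(policy, next_item(pending, now=10.0, policy=policy))  # x, y, z
```

Each policy picks a different item here: FCFS favours the oldest request, EDF the most urgent query, and LWF the item that serves the most accumulated waiting in one broadcast.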

30
Broadcast Schedules
Figure: Hybrid broadcast schedule. Push slots
broadcast a skew disk to clients 1 to n, while
on-demand data requests from the clients are
prioritized and served in the pull slots.
31
Currency and Consistency in Data Broadcast
  • A query may need to read a set of data items
    in a predefined sequence
  • The definition of a transaction:
  • Consists of a sequence of primitive operations
    enclosed between begin and end markers
  • The operations may be ordered or unordered
    (precedence constraints)

Figure: Partial order. R(x) and R(y) both precede
R(z), followed by commit C; R(x) and R(y) may execute
concurrently or in any order.
32
Execution Order and Data Broadcast
  • The constraints in execution of the operations in
    a query can greatly increase the waiting time for
    data items. Why?
  • The waiting time for completing a query depends
    on both the broadcast schedule and the execution
    orders of the operations in a query
  • Since the operation Read(z) cannot be performed
    before Read(x) and Read(y), the query cannot read z
    from the broadcast channel (it does not yet know
    it needs z) until it has obtained data items x
    and y
  • In the worst case, the waiting time is nC (C is
    the time to complete one broadcast cycle and n is
    the number of items to be read)
  • The problem becomes more serious when we consider
    two additional issues in data dissemination:
    currency and consistency
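To make the effect of execution order concrete, the following sketch (not from the slides) computes when a query completes on a flat broadcast cycle, with and without precedence constraints. It assumes equal-size items, a client that can catch a broadcast starting exactly when it begins listening, and a hypothetical function name.

```python
def query_completion_time(schedule, reads, start, ordered, item_time=1.0):
    """Time at which a query arriving at `start` has received every item in `reads`.

    schedule: item ids broadcast cyclically (flat disk), one bucket of item_time each
    ordered: if True, each read can only start after the previous read has completed
    """
    cycle = len(schedule) * item_time

    def next_end(item, t):
        # end of the first broadcast of `item` that starts at or after time t
        start_slot = schedule.index(item) * item_time
        return t + (start_slot - t) % cycle + item_time

    if ordered:
        t = start
        for item in reads:
            t = next_end(item, t)   # the next read begins only after this one finishes
        return t
    # unordered: every read is satisfied by the first broadcast after `start`
    return max(next_end(item, start) for item in reads)


flat = ["A", "B", "C", "D"]                       # one cycle takes 4 time units
print(query_completion_time(flat, ["D", "C", "B", "A"], 0.0, ordered=False))  # 4.0
print(query_completion_time(flat, ["D", "C", "B", "A"], 0.0, ordered=True))   # 13.0
```

Reading the four items in reverse broadcast order takes one cycle when unordered but more than three cycles when ordered, approaching the nC worst case.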

33
Meeting Currency Requirement
  • Update transactions are performed at the database
    server to maintain the freshness of the data
    items in the database (update streams)
  • Sensors: periodic generation
  • Location updates: based on the adopted update
    generation method, e.g., speed dead reckoning
  • Temporal consistency (currency)
  • At time t1, data item x is updated to 100
  • At time t2, x is updated to 200 and the previous
    value is invalid
  • Any new query after t2 should not read x = 100
  • Temporal consistency: how well the data objects
    maintained in a database model the actual state
    of a changing (dynamic) environment
  • Ideally, no transaction (query) should be allowed
    to access a data item that is invalid
    (outdated)
  • What is the meaning of an outdated item? That a
    new version has been created? It depends on the
    data generation method and on the application
    requirements
  • TC consists of two parts: absolute and relative
    consistency
  • Absolute consistency: validity of an individual
    data item (base item)
  • Relative consistency: consistency among a
    group of data items

34
Temporal Consistency
  • The value of a data item changes continuously
    with time, e.g., the temperature and the location
    of a moving object (stream data)
  • Note: in practice, it is impossible to have the
    instantaneous value of many real objects due to
    continuous changes in value and delays in
    installing updates
  • Approximate solution: if the value of a data item
    is close enough (within an acceptable bound from
    the application's view), the data version may
    still be considered valid
  • A data item is absolutely consistent (fresh) if it
    timely (or approximately) reflects the current
    state of the external object that the data item
    models
  • The validity of a data item may be defined by an
    absolute validity interval (avi) (the life-span of
    a data value)
  • When its avi expires, a new value is needed to
    refresh the data item
  • No transaction is allowed to access an outdated
    (stale) data item (absolutely inconsistent)

35
Temporal Consistency
  • I.e., $\text{Start\_time}(x_i) + AVI_x > \text{current time}$
  • Start_time is the creation time of the version
  • We may use time bounds, the upper valid time (UVT)
    and the lower valid time (LVT), to label the
    validity interval of a data item (data version)
  • $LVT(x_i) = \text{start time of } x_i$
  • $UVT(x_i) = LVT(x_i) + AVI_x$
  • How to define the AVI for a data object?
  • Different data objects may have different AVIs
  • Based on the accuracy requirement and the maximum
    rate of change
  • The sampling (update) period should be smaller
    than the AVI
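The LVT/UVT bookkeeping above can be captured in a few lines of Python. `Version` and `is_fresh` are hypothetical names, and times are plain numbers in a common unit.

```python
from dataclasses import dataclass


@dataclass
class Version:
    item: str
    start_time: float  # creation time of this version (its update time)
    avi: float         # absolute validity interval of the data item

    @property
    def lvt(self) -> float:
        return self.start_time           # LVT(x_i) = start time of x_i

    @property
    def uvt(self) -> float:
        return self.lvt + self.avi       # UVT(x_i) = LVT(x_i) + AVI_x

    def is_fresh(self, now: float) -> bool:
        # absolute consistency: Start_time(x_i) + AVI_x > current time
        return self.uvt > now


x_i = Version("x", start_time=100.0, avi=5.0)
print(x_i.is_fresh(104.0), x_i.is_fresh(106.0))  # True False
```

If the sampling (update) period is smaller than the AVI, a new version is installed before the current one expires, so `is_fresh` holds for the latest version at all times.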

36
Data Streams
Figure: Data streams. Each update creates a new
version of data items x, y and z; later versions
appear further along the time axis.
37
Absolute Consistency Example
Figure: Version x_i becomes stale when AVI_x expires
after its update, so a new update needs to be
installed; similarly, y_j becomes stale after AVI_y.
38
Broadcast Schedule and Abs Consistency
  • Query Q1 wants to read x and y and then perform
    a computation (e.g., to find the maximum of x
    and y)
  • Q1 gets x from the broadcast channel at time t1
  • If Q1 gets y at time t2 and $t_2 > t_1 + avi(x)$,
    then it needs to get x again, since x is invalid
    at t2
  • To meet absolute consistency, the time difference
    between getting the first data item and the last
    data item must be less than the avi of the first
    data object
  • Problem: if a query wants to access multiple
    data objects from the broadcast channel, its
    waiting time could be long, and the above
    condition may easily be violated
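The condition above can be checked mechanically. This sketch generalizes it slightly by using each version's creation time rather than the time it was read: a query is absolutely consistent if every version it read is still valid when it obtains its last item. When a version is read at the moment it is created, this reduces to the slide's condition t2 - t1 < avi(x). The tuple layout and the function name are assumptions.

```python
def absolutely_consistent(reads):
    """reads: (item, read_time, version_start_time, avi) for each item a query read.

    True if every version is still valid (start + avi > t) at the time t the
    query obtains its last item, so no item has to be fetched again.
    """
    t_last = max(read_time for _, read_time, _, _ in reads)
    return all(start + avi > t_last for _, _, start, avi in reads)


# Q1 reads x at t1 = 10 (version created at 10, avi 5) and y later.
print(absolutely_consistent([("x", 10.0, 10.0, 5.0), ("y", 12.0, 11.0, 8.0)]))  # True
print(absolutely_consistent([("x", 10.0, 10.0, 5.0), ("y", 16.0, 15.0, 8.0)]))  # False: x stale at 16
```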

39
Absolute Consistency Problem
Figure: Absolute consistency problem. The query reads
x_i and then y_j; x_i becomes stale (its AVI expires
and x_{i+1} is installed) before y_j is read.
40
Relative Consistency
  • A set of data items is relatively consistent if
    they are temporally correlated with each other,
    i.e., representing the status of the entities at
    the same time point
  • The set of data items accessed by a transaction
    has to be relatively consistent. Otherwise, the
    transaction observes information from different
    time points
  • E.g., calculating the best path to the
    destination using real-time traffic data
  • A query may be allowed to access stale data
    objects provided that they are relatively
    consistent and are not too old
  • Definition of relative consistency: given a set
    of data versions V from different data items, the
    versions in V are relatively consistent if
    $\bigcap_{x_i \in V} VI(x_i) \neq \emptyset$,
  • where $VI(x_i) = [LVT(x_i), UVT(x_i)]$
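The intersection condition translates directly into an interval test: closed intervals share a point exactly when the latest LVT is no later than the earliest UVT. A minimal Python sketch, assuming closed validity intervals; the function name and the interval values are hypothetical.

```python
def relatively_consistent(intervals):
    """intervals: (LVT, UVT) validity interval of each data version read.

    The versions are relatively consistent if the intersection of their
    validity intervals is non-empty, i.e. they were all valid at a common time.
    """
    latest_lvt = max(lvt for lvt, _ in intervals)
    earliest_uvt = min(uvt for _, uvt in intervals)
    return latest_lvt <= earliest_uvt


x1, y1, y2 = (0.0, 10.0), (2.0, 8.0), (12.0, 18.0)  # hypothetical intervals
print(relatively_consistent([x1, y1]))  # True
print(relatively_consistent([x1, y2]))  # False
```

The two calls mirror the x1/y1 (correct) and x1/y2 (incorrect) cases in the relative consistency example.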

41
Relative Consistency Example
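Figure: Successive updates create versions x1, x2 of
x and y1, y2 of y. The validity intervals of x1 and
y1 overlap (consistency point RC1), so RC1 = {x1, y1}
is correct, while {x1, y2} is incorrect.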
42
Relative Consistency
  • Relative consistency is less restrictive
    than absolute consistency
  • If Q1 gets y at time t2 and $t_2 > t_1 + avi(x)$,
    it does not need to get x again if y is
    valid within the interval t1 to t1 + avi(x)
  • This check can be performed by comparing their
    validity intervals
  • Note: if a query observes absolute consistency,
    its accessed data items are also relatively
    consistent
  • Of course, we also need to associate a currency
    requirement with the relative consistency
    requirement
  • The latest consistency point should not be older
    than a certain time (currency threshold) from the
    current time
  • E.g., when an intruder is reported, the detection
    should be based on data from within 30 seconds of
    the current time
43
Meeting Consistency Requirement
  • Data conflicts may occur between update
    transactions and mobile queries
  • Update transactions are performed at the database
    server to maintain the freshness of data objects
    in the database
  • Data objects are read (by queries)
    concurrently
  • Definition (data conflict): two transactions have
    a data conflict if the first one reads a data
    object and the second one updates the same object
    before the commit (completion) of the first one
  • How to resolve data conflicts in a database
    system?
  • The conflict cannot be detected by locking or by
    using the conventional concurrency control
    methods
  • It could be treated as a distributed concurrency
    control problem
  • But the overhead of locking in a wireless
    network is too heavy
  • How to resolve the disconnection problem after
    granting a lock to a client program?
  • Data conflicts in transaction execution may
    result in inconsistent data accesses
  • Generating incorrect results from the transactions

44
Broadcast Schedules
Figure: Updates are applied at the server while it
broadcasts the index and the broadcast schedule to
clients 1 to n.
45
Concurrent Execution: Inconsistent Retrieval Problem
Transaction T: Bank.Withdraw(A, 100); Bank.Deposit(B, 100)
Transaction U: Bank.BranchTotal()
T: balance = A.Read()            200
T: A.Write(balance - 100)        100
U: balance = A.Read()            100
U: balance = balance + B.Read()  300
T: balance = B.Read()            200
T: B.Write(balance + 100)        300
46
Correct Execution of Transactions
  • A schedule shows the execution order of the
    operations of a set of transactions (update and
    mobile transactions)
  • Serial execution (schedule)
  • Execute transactions one after another
  • The next transaction starts only after the
    previous one has been committed or aborted
  • With two transactions, there are two possible
    serial schedules, i.e., T1 then T2, and T2 then
    T1
  • A serial schedule always maintains database
    consistency, since each transaction starts from a
    consistent database state
  • Serial equivalence (serializability)
  • Transactions are executed concurrently, but the
    result is equivalent to that of a serial schedule
    of the same set of transactions (which serial
    schedule? Any one)

47
Serial execution
Transaction T: Bank.Withdraw(A, 100); Bank.Deposit(B, 100)
Transaction U: Bank.BranchTotal()
T: balance = A.Read()            200
T: A.Write(balance - 100)        100
T: balance = B.Read()            200
T: B.Write(balance + 100)        300
U: balance = A.Read()            100
U: balance = balance + B.Read()  300
U: balance = balance + C.Read()  400 ...
48
Serial equivalence
Transaction T: Bank.Withdraw(A, 4); Bank.Deposit(B, 4)
Transaction U: Bank.Withdraw(C, 3); Bank.Deposit(B, 3)
T: balance = A.Read()      100
T: A.Write(balance - 4)    96
U: balance = C.Read()      300
U: C.Write(balance - 3)    297
T: balance = B.Read()      200
T: B.Write(balance + 4)    204
U: balance = B.Read()      204
U: B.Write(balance + 3)    207
49
Consistency in Data Broadcast
  • How to determine the correctness of transaction
    execution? I.e., in which situations is a
    conflict harmful?
  • Look at the execution order of the conflicting
    operations in a schedule
  • Serialization graph (SG): each edge Ti → Tj in an
    SG means that at least one of Ti's operations
    precedes and conflicts with one of Tj's operations
  • At the client, a query consists of a read
    operation to read a data item x
  • At the server, an update transaction wants to
    update x
  • Serializability theorem
  • A history H is serializable iff SG(H) is acyclic

50
Consistency in Data Broadcast
  • Example 1: data conflict between an MT (mobile
    transaction) and an update transaction
  • Suppose update transaction U updates data item
    d5 and then data item d2, and an MT wants to read
    d2 and d5. Remember that the update is performed
    at the server and the MT is executed at a mobile
    client. Suppose the schedule is:
  • Server broadcasts d2
  • MT reads d2
  • U updates d5 and d2
  • Server broadcasts d5
  • MT reads d5
  • The MT may observe inconsistent data values. The
    serialization graph contains the cycle MT → U →
    MT, so the schedule is non-serializable
  • The reason is that the MT reads a data item, d2,
    that conflicts with U before the update from U,
    and it reads another conflicting data item, d5,
    after the update from U

51
Consistency in Data Broadcast
  • Example 2: an MT conflicts with two (or more)
    update transactions
  • Even though the serialization order between each
    update transaction and the mobile transaction is
    acyclic, the final serialization graph can still
    be cyclic due to transitive dependencies.
  • Suppose there are two updates, U1 and U2, such
    that U1 updates d2 and then d1, and U2 updates d1
    and then d5. Suppose the schedule is:
  • Broadcast transaction (BT) broadcasts d2
  • MT reads d2
  • U1 updates d2 and d1
  • U2 updates d1 and d5
  • Broadcast transaction (BT) broadcasts d5
  • MT reads d5
  • The serialization graph contains the cycle U2 →
    MT → U1 → U2
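Both examples can be checked against the serializability theorem with a small Python sketch that builds SG(H) from a schedule and searches it for a cycle. It is a simplification: update transactions are modelled as writes only, broadcast operations are omitted, and the tuple format and function names are hypothetical.

```python
from itertools import combinations


def serialization_graph(schedule):
    """schedule: (transaction, op, item) tuples in execution order, op in {"r", "w"}.

    Adds edge Ti -> Tj when an operation of Ti precedes and conflicts with
    (same item, at least one write) an operation of Tj.
    """
    edges = set()
    for (ti, op_i, x_i), (tj, op_j, x_j) in combinations(schedule, 2):
        if ti != tj and x_i == x_j and "w" in (op_i, op_j):
            edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a back edge in the directed graph given by `edges`."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {node: "new" for node in graph}

    def visit(node):
        state[node] = "active"               # node is on the current DFS path
        for nxt in graph[node]:
            if state[nxt] == "active":
                return True                  # back edge: cycle found
            if state[nxt] == "new" and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(state[n] == "new" and visit(n) for n in graph)


# Example 1: MT reads d2, U updates d5 then d2, MT reads d5.
ex1 = [("MT", "r", "d2"), ("U", "w", "d5"), ("U", "w", "d2"), ("MT", "r", "d5")]
# Example 2: U1 updates d2 then d1, U2 updates d1 then d5, between the MT's reads.
ex2 = [("MT", "r", "d2"), ("U1", "w", "d2"), ("U1", "w", "d1"),
       ("U2", "w", "d1"), ("U2", "w", "d5"), ("MT", "r", "d5")]
print(sorted(serialization_graph(ex1)), has_cycle(serialization_graph(ex1)))
print(sorted(serialization_graph(ex2)), has_cycle(serialization_graph(ex2)))
```

Both graphs contain a cycle, while a schedule in which the MT reads d2 and d5 before U runs yields the single edge MT → U and is serializable.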

52
How to resolve this problem?
  • The conventional methods for concurrency control
    are not suitable
  • Multi-version Data Broadcast
  • For flat disk only
  • Update First with Order (UFO)
  • For flat disk, skew disk and multi-disk

53
Multi-version Data Broadcast
  • Multi-version data broadcast
  • Broadcast multiple versions of a data item
    (current version + previous versions). How many
    versions?
  • A push-based method
  • No uplink data requests
  • No need to set any lock or to inform the
    database server before accessing any data items
  • Maintains multiple versions for each data item
  • Each new update creates a new version and the old
    versions are still maintained in the system

54
Multi-version Data Broadcast
  • Providing a consistent view to queries by batch
    updates
  • Updates on data items are batched until the end
    of a broadcast cycle, even if they arrive in the
    middle of a broadcast cycle
  • During updates, the broadcast of data items is
    suspended
  • The version number indicates at which cycle end
    the version was created
  • Even with no update, a new version is created
    from the old version at the end of each
    broadcast cycle
  • After the completion of the batch of updates, the
    database is consistent and each newly created
    data version is assigned the cycle number as its
    version number
  • Accessing data versions in MV
  • If a query wants to access a data object, it
    gets the latest version of the data object in the
    broadcast cycle for its first read operation
  • The subsequent read operations of the query read
    the data objects with the same version number as
    the first operation
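The version-selection rule above can be sketched from the client's side. Each broadcast cycle is modelled here as a dictionary mapping data items to the versions broadcast in that cycle; this representation and the class name `MVQuery` are assumptions for illustration.

```python
class MVQuery:
    """Client-side read rule for multi-version broadcast (sketch).

    Each broadcast cycle is modelled as a dict: item -> {version_number: value}.
    """

    def __init__(self):
        self.version = None   # version number fixed by the first read
        self.values = {}

    def read(self, cycle_contents, item):
        versions = cycle_contents[item]
        if self.version is None:
            self.version = max(versions)   # first read: latest version in this cycle
        if self.version not in versions:
            # The version is no longer broadcast: the query outlived its life-span.
            raise LookupError(f"version {self.version} of {item} is no longer broadcast")
        self.values[item] = versions[self.version]
        return self.values[item]


cycle5 = {"x": {5: 10, 4: 9}, "y": {5: 20, 4: 19}}
cycle6 = {"x": {6: 11, 5: 10}, "y": {6: 21, 5: 20}}
q = MVQuery()
print(q.read(cycle5, "x"), q.read(cycle6, "y"))  # 10 20: y is read at version 5, not 6
```

If the query outlives its life-span, the version it fixed is no longer broadcast and the read fails, which corresponds to aborting the query after its deadline.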

55
Multi-version Data Broadcast
  • How many versions should be broadcast?
  • In MV, it is assumed that each query has a
    maximum life-span and no query stays in the
    system longer than the life-span (L)
  • The life-span can be considered as the deadline
    interval of a query: start time + deadline
    interval = deadline
  • After the deadline, the query will be aborted.
    Why?
  • The maximum life-span of the queries, together
    with the time required for completing a broadcast
    cycle (BC), is used to calculate the number of
    versions and the versions to be broadcast in a
    cycle for a data item
  • L/BC (the number of broadcast cycles spanned by
    the maximum life-span)
  • Assuming the use of a flat disk
  • Why? What will be the problem if a skew disk is
    used?

56
Multi-version Data Broadcast
  • Why is data consistency guaranteed in MV?
  • The update and broadcast of data objects are NOT
    interleaved
  • The view provided in each broadcast cycle is a
    CONSISTENT view at the start time of the
    broadcast cycle. What is the definition of a
    transaction?
  • It is a consistent view since there are no
    incomplete (partially completed) transactions in
    the system at that time point
  • Remember: if a transaction starts from a
    consistent view, the database is consistent when
    it is completed (assuming a concurrency control
    method, e.g., 2PL, resolves the data conflicts
    among the conflicting transactions)

57
Multi-version Data Broadcast
  • MV data broadcast

58
Broadcast Organization in MV
  • A set of multiversion data to be broadcast can be
    represented as a two-dimensional array indexed by
    version number (Vno) and data id (Did)
  • Dval[i,k] = v means that the k-th version of the
    i-th data item is equal to v
  • Data items can appear in any order
  • Versions appear in descending order, with the
    most recent version in the leftmost column

59
Broadcast Organization in MV
  • Organization of a broadcast schedule in MV can be
    either horizontal or vertical
  • Horizontal broadcast
  • Broadcasts all versions (with different Vno) of a
    data item with a particular Did first, and then
    the next Did
  • Vertical broadcast
  • Broadcasts all data items (with different Did)
    having a particular version (Vno), and then all
    data items with the next version
  • The broadcast organization affects the client
    access time
  • If users require different versions of a
    particular data item, horizontal broadcast may be
    better
  • If users need the most recent data, vertical
    broadcast may be better
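To make the access-time trade-off concrete, the sketch below builds both schedules for a small hypothetical grid and counts how many broadcast slots a client must wait from the start of a cycle. It assumes one version per slot, equal-sized items, and a client that tunes in at the start of the cycle. Version index 0 is the most recent (leftmost) column.

```python
def horizontal(grid: dict) -> list:
    """All versions of one data item, then the next data item."""
    return [(did, k) for did, vals in grid.items() for k in range(len(vals))]


def vertical(grid: dict) -> list:
    """All data items at one version, then all data items at the next version."""
    depth = max(len(vals) for vals in grid.values())
    return [(did, k) for k in range(depth) for did, vals in grid.items() if k < len(vals)]


def slots_until(schedule: list, wanted: list) -> int:
    """Slots to wait, from the start of the cycle, until every wanted (did, k) is seen."""
    return max(schedule.index(w) for w in wanted) + 1


# Hypothetical grid: grid[did] lists the values of its versions, newest first.
grid = {"d1": [1, 1, 1], "d2": [8, 8, 5], "d3": [6, 1, 2]}
latest = [(did, 0) for did in grid]          # most recent version of every item
all_d2 = [("d2", k) for k in range(3)]       # every version of one item
```

With this grid, reading the latest version of every item needs 3 slots under vertical broadcast but 7 under horizontal broadcast, while reading all versions of d2 needs 6 slots horizontally but 8 vertically.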

60
Compressed Organization
  • Broadcasting the same value for different
    versions is inefficient
  • If Dval[i,k] = Dval[i,k-1] = v, we may merge them
    into a single entry to reduce broadcast overhead

61
Compressed Organization
  • Notation: vxn denotes a value v shared by n
    consecutive versions of the same data item
  • Compressed horizontal broadcast
  • 1x3 8x2 5 6 1x1 2 5 4x2
  • Compressed vertical broadcast
  • 1x3 8x2 6 5 1x1 4x2 5 2
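The two compressed sequences are consistent with a 4-item, 3-version grid: d1 = (1, 1, 1), d2 = (8, 8, 5), d3 = (6, 1, 2), d4 = (5, 4, 4), newest version first. This grid is our reconstruction from the sequences and is not given on the slide. The sketch below run-length encodes equal consecutive versions of each item and emits the runs in horizontal or vertical order. Each (v, n) pair corresponds to vxn on the slide.

```python
def runs(vals: list) -> list:
    """(start version index, value, run length) for each maximal run of equal versions."""
    out = []
    k = 0
    while k < len(vals):
        j = k
        while j + 1 < len(vals) and vals[j + 1] == vals[k]:
            j += 1
        out.append((k, vals[k], j - k + 1))
        k = j + 1
    return out


def compressed_horizontal(grid: dict) -> list:
    """All runs of one data item, then the next data item."""
    return [(v, n) for vals in grid.values() for _, v, n in runs(vals)]


def compressed_vertical(grid: dict) -> list:
    """For each version index, the runs that start there, item by item."""
    depth = max(len(vals) for vals in grid.values())
    out = []
    for k in range(depth):
        for vals in grid.values():
            out.extend((v, n) for start, v, n in runs(vals) if start == k)
    return out


# Reconstructed grid (hypothetical): values per item, newest version first.
grid = {"d1": [1, 1, 1], "d2": [8, 8, 5], "d3": [6, 1, 2], "d4": [5, 4, 4]}
```

Both outputs have 8 entries instead of the 12 in the uncompressed grid, and they reproduce the two sequences above.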

62
Multi-version Data Broadcast
  • MV can also be applied to accessing cached data
    objects
  • The clients may keep previous versions of data
    items in their caches, and the same rule used for
    accessing broadcast data is used for accessing
    cached items
  • The multi-version method is very useful for
    systems where mobile clients are frequently
    disconnected from the mobile network

63
Multi-version Data Broadcast
  • Consistency Vs. Currency
  • Although MV broadcast can ensure consistency of
    data objects provided to a mobile query, the
    currency of data objects is sacrificed
  • Why? Delays (and even skipping) in processing
    updates (batch updates)
  • The latest version of a data object broadcast in
    a cycle is the last version created before the
    start of the broadcast cycle (what about the
    other versions?)
  • Each data object has to be broadcast at least
    once in each cycle (flat disk). What will happen
    if not?
  • Overhead of broadcasting multiple versions
  • Point consistency Vs. interval consistency
  • MV provides a consistent view of the database
    between the start time and end time of a query
  • What about continuous queries, which generate
    results continuously over an interval? Skipped
    updates mean that some events are never observed

64
Update-first with Order (UFO)
  • UFO is another algorithm to ensure data
    consistency for mobile queries
  • Instead of detecting data conflicts between
    mobile queries and update transactions, UFO
    checks for data conflicts between a broadcast
    transaction and an update transaction
  • The broadcast schedule is modeled as a
    transaction (BT)
  • The length of a BT is defined as the maximum
    lifetime of a mobile query
  • The basic principle of the UFO algorithm is to
    ensure that if data conflicts occur between a BT
    and an update transaction (U), the serialization
    order between them is always U → BT
  • Since mobile queries (MT) read data items from
    broadcast transactions, their serialization
    order is always BT → MT
  • Hence the serialization order between update
    transactions and mobile queries is always
    U → BT → MT, i.e., U → MT, and the execution is
    serializable
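The serializability argument can be checked mechanically by building the serialization graph and asking for a topological order. This sketch uses Python's standard graphlib module; the function name and edge lists are illustrative. An edge (a, b) means a must precede b.

```python
from graphlib import CycleError, TopologicalSorter


def serialization_order(edges: list):
    """Return a serial order consistent with the edges, or None if the graph is cyclic."""
    preds = {}
    for a, b in edges:
        preds.setdefault(a, set())
        preds.setdefault(b, set()).add(a)   # TopologicalSorter expects predecessors
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None
```

The chain U → BT → MT yields the order U, BT, MT. Adding MT → U (which would arise if MT kept a value of d2 read before U updated it, while reading the new d5) creates a cycle, which is the situation UFO prevents.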

65
Update-first with Order (UFO)
  • The execution of an update transaction (at the
    server) is divided into two phases: the execution
    phase and the update phase
  • During the execution phase, an update transaction
    is executed and data conflicts with other update
    transactions will be resolved according to the
    adopted concurrency control protocol
  • The updates of data items are written in a
    private workspace of the transaction during the
    execution phase
  • When all operations of an update transaction have
    been executed, it enters the update phase
  • Permanent updates to the database are performed
    by copying the new values from the private
    workspace into the database
  • During the update phase, the broadcast of data
    items is stopped (BT always observes a consistent
    database)
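The two phases of an update transaction can be sketched as follows. This is a simplified single-threaded model with hypothetical class names; concurrency control among update transactions during the execution phase is omitted. Writes go only to a private workspace, and broadcasting is suspended while the workspace is copied into the database.

```python
from __future__ import annotations


class Server:
    """Holds the database and a flag telling whether broadcasting is allowed."""

    def __init__(self, db: dict[str, int]) -> None:
        self.db = dict(db)
        self.broadcasting = True

    def broadcast(self, did: str) -> int:
        if not self.broadcasting:
            raise RuntimeError("broadcast suspended during an update phase")
        return self.db[did]


class UpdateTransaction:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.workspace: dict[str, int] = {}   # private workspace

    # Execution phase: reads see the transaction's own writes; writes stay private.
    def read(self, did: str) -> int:
        if did in self.workspace:
            return self.workspace[did]
        return self.server.db[did]

    def write(self, did: str, value: int) -> None:
        self.workspace[did] = value

    # Update phase: suspend broadcasting, then copy the workspace into the database.
    def update_phase(self) -> None:
        self.server.broadcasting = False
        try:
            self.server.db.update(self.workspace)
        finally:
            self.server.broadcasting = True
```

Until update_phase runs, the broadcast still carries the old value, so BT never sees a half-applied update.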

66
Update-first with Order (UFO)
  • Before an update transaction starts its update
    phase, the system detects data conflicts between
    the update transaction and the broadcast
    transactions in the current and previous
    broadcast cycles
  • At the start of the update phase, the set of
    data items to be updated by the update
    transaction is known, since all its operations
    have been completed
  • At the same time, the set of data items to be
    read (broadcast) by a broadcast transaction is
    also known, since it is determined by the
    broadcast algorithm
  • The two sets of data items are compared. If
    they overlap, there is a data conflict
  • Conflicting items that have already been
    broadcast are rebroadcast
  • The overhead (rebroadcasting) depends on the
    conflict probability

67
Update-first with Order (UFO)
  • BT: the broadcast transaction for the current
    broadcast cycle i
  • O_BT: the set of data items of the broadcast
    transaction BT
  • O_U: the set of data items of the update
    transaction U
  • BA = {x ∈ O_BT ∩ O_U | x is already broadcast
    when U arrives}
  • Before the permanent update starts, the following
    algorithm is performed:
      If O_BT ∩ O_U = ∅
        Then BT and U have no dependency
      Else
        If BA = ∅
          Then the serialization order is U → BT
        Else
          For each data item i ∈ BA
            re-broadcast data item i
          Next
          The serialization order is U → BT
        End If
      End If
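The conflict check performed before the update phase translates directly into a set computation. In this sketch, o_bt, o_u, and already_broadcast are Python sets of data ids standing for O_BT, O_U, and the items BT had broadcast when U arrived. The function name and return format are our own.

```python
from __future__ import annotations


def ufo_check(o_bt: set[str], o_u: set[str], already_broadcast: set[str]) -> tuple[str, list[str]]:
    """Return the dependency between U and BT and the items BT must re-broadcast."""
    conflicts = o_bt & o_u
    if not conflicts:
        return "no dependency", []
    # BA: conflicting items that went out before U's new values existed.
    ba = conflicts & already_broadcast
    # Conflicting items not yet broadcast will carry U's values anyway,
    # so re-broadcasting BA is enough to make the order U -> BT.
    return "U -> BT", sorted(ba)
```

If BA is empty, the returned list is empty and no re-broadcast is needed, matching the inner "Then" branch of the algorithm.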

68
Update-first with Order (UFO)
69
UFO Example
  • The broadcast transaction (BT) broadcasts d2
  • MT reads d2
  • Compare the data sets of U and BT
  • U updates d5
  • U updates d2
  • BT re-broadcasts d2
  • MT reads the most recently updated value of d2
  • BT continues its process and broadcasts d5
  • MT reads d5
  • The serialization graph is acyclic: U → MT
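The example can be replayed as a small simulation of the UFO rule. Values such as "d2@old" and "d2@U" are placeholders, and BT's schedule is assumed to be just [d2, d5]. MT keeps the latest value it has read for each item.

```python
def ufo_example():
    db = {"d2": "d2@old", "d5": "d5@old"}
    schedule = ["d2", "d5"]      # BT's broadcast order, i.e. O_BT
    on_air = []                  # items broadcast so far in this cycle
    mt_view = {}                 # latest value MT holds for each item

    def broadcast(item):
        on_air.append(item)
        mt_view[item] = db[item]  # MT reads whatever is on air

    broadcast("d2")                                  # BT broadcasts d2; MT reads d2
    workspace = {"d5": "d5@U", "d2": "d2@U"}         # U's private writes
    ba = set(schedule) & set(workspace) & set(on_air)
    db.update(workspace)                             # U's update phase
    for item in sorted(ba):
        broadcast(item)                              # BT re-broadcasts d2
    for item in schedule:
        if item not in on_air:
            broadcast(item)                          # BT continues with d5
    return on_air, mt_view
```

At the end, the broadcast sequence is d2, d2, d5, and MT holds only values written by U, so its reads are consistent with the order U → MT.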

70
MV and UFO
  • Relative consistency problem
  • MV: accessing data items with the same version
  • UFO: assigning timestamps to data versions to
    indicate their validity; the checking then
    follows the requirement of relative consistency
  • Ordered transaction problem
  • MV: more versions need to be included since the
    query life-span is longer
  • UFO: a query needs to be restarted if the arrival
    order differs from the access order of the data
    items
  • The restart cost can be minimized by caching
    previously accessed data items

71
Cache Data Management
  • Clients may maintain data in a cache to reduce
    the access delay (accessing data items either
    from the cache or from the broadcast cycle)
  • Which data items should be maintained in the
    client cache?
  • Hot data items? They may not be the best choice,
    since they may be broadcast more frequently than
    cold data items
  • Need to consider the size (since the cache size
    is very limited), the access probability of a
    data item by the mobile client, and the broadcast
    frequency of the data item
  • Data items with a longer valid time (longer
    update period)
  • Other cache data management problems
  • Cache replacement using FIFO or least recently
    used (LRU), taking into account the locality of
    the transactions and the data validity length
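One way to combine access probability and broadcast frequency is the PIX heuristic from the Broadcast Disks work: access probability divided by broadcast frequency, so frequently broadcast hot items score lower. The sketch below ranks items by PIX per unit of size and fills the cache greedily. The size normalization, the greedy policy, and the example numbers are our own assumptions and are not part of the slide.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    did: str
    access_prob: float   # probability that this client accesses the item
    bcast_freq: float    # how many times the item is broadcast per cycle
    size: int            # size in cache units


def pix(item: Item) -> float:
    """PIX: access probability divided by broadcast frequency."""
    return item.access_prob / item.bcast_freq


def choose_cache(items: list[Item], capacity: int) -> list[str]:
    """Greedily cache the items with the highest PIX per unit of size (our assumption)."""
    chosen: list[str] = []
    used = 0
    for item in sorted(items, key=lambda i: pix(i) / i.size, reverse=True):
        if used + item.size <= capacity:
            chosen.append(item.did)
            used += item.size
    return chosen


items = [Item("hot", 0.5, 4, 1), Item("warm", 0.3, 1, 1), Item("cold", 0.2, 1, 2)]
```

With a cache of one unit, the sketch keeps warm rather than hot, even though hot has the highest access probability, because hot is broadcast four times as often.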

72
(No Transcript)
73
Cache Data Management
  • How to maintain the coherency of data items in
    the cache? (Similar to the mutual consistency
    problem in replicated databases)
  • Data coherency: the difference between the value
    of a data item in the client cache and the value
    of the data item maintained at the server
  • Methods
  • Auto-refreshment: whenever a data item is being
    broadcast and the same item is found in the
    cache, the old version in the cache is replaced
    by the version from the broadcast cycle
  • Push: when a new version is created, the server
    sends it to the clients that have cached the
    data item. The server needs to maintain the
    caching status of the clients, but can provide
    closer coherency
  • Pull: when the cached version is too old, the
    client asks the server to send the latest one.
    How to determine "too old"? Periodically? By
    age? Higher communication overhead, but more
    fault tolerant since the server keeps no state
    information
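Auto-refreshment and Pull can be combined on the client side, as in the sketch below. The max_age threshold and the pull callback are hypothetical stand-ins for whatever staleness rule and request channel a real system uses. Push is not modeled because it needs server-side state about which clients cache which items.

```python
from __future__ import annotations

from typing import Callable


class ClientCache:
    """Client cache using auto-refreshment plus age-based Pull (illustrative only)."""

    def __init__(self, max_age: float) -> None:
        self.max_age = max_age                               # hypothetical threshold
        self.entries: dict[str, tuple[object, float]] = {}  # did -> (value, time obtained)

    def on_broadcast(self, did: str, value: object, now: float) -> None:
        # Auto-refreshment: a broadcast copy replaces the cached one.
        if did in self.entries:
            self.entries[did] = (value, now)

    def get(self, did: str, now: float, pull: Callable[[str], object]) -> object:
        entry = self.entries.get(did)
        if entry is None or now - entry[1] > self.max_age:
            # Pull: cache miss or version too old, so ask the server for the latest.
            value = pull(did)
            self.entries[did] = (value, now)
            return value
        return entry[0]
```

Between refreshes the client may serve a stale value; max_age bounds how stale it can get, at the cost of extra pull requests.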

74
Cache Data Management
  • Validation of cached data coherency using
    invalidation reports
  • Compare the timestamp (creation time) of the
    version at the client with the latest one at the
    server
  • Invalidation reports have to be generated by
    the server periodically (report period) to
    validate the consistency of the data items (to
    ensure that they are the most recent versions)
  • Before a mobile query is allowed to commit, it
    has to be checked against the invalidation report
    to ensure that all the data it accessed are
    coherent with the data items at the server
  • How to minimize the waiting time of a mobile
    query?
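The commit-time check against an invalidation report reduces to comparing timestamps. In this sketch, both the query's read set and the report are dictionaries from data id to timestamp; items absent from the report are assumed unchanged during the report window. The function names are hypothetical.

```python
from __future__ import annotations


def stale_items(accessed: dict[str, int], report: dict[str, int]) -> set[str]:
    """Items whose version read by the query is older than the version in the report."""
    return {did for did, ts in accessed.items() if did in report and report[did] > ts}


def can_commit(accessed: dict[str, int], report: dict[str, int]) -> bool:
    """A mobile query may commit only if none of its accessed items is stale."""
    return not stale_items(accessed, report)
```

A query that read version 7 of b cannot commit once a report shows b updated at time 9; it must wait or restart.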

75
Cache Data Validation
  • When to send an invalidation report? Frequency?
  • High frequency → high broadcast overhead
  • What should be included in the report? Data item
    ids and timestamps
  • To deal with the problem of network
    disconnection, the report length may be a
    multiple of the report period
  • Report period is the period for sending a report
  • Report length is the time frame covered by a
    report
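The relation between report period and report length can be sketched as follows, in the style of timestamp-based invalidation reports. The constants (a period of 10 time units and a length of three periods), the update-log format, and the function names are illustrative assumptions. A client that has not heard a report for longer than the report length cannot tell which items changed while it was away, so it drops its whole cache.

```python
from __future__ import annotations

REPORT_PERIOD = 10                   # hypothetical: a report every 10 time units
REPORT_LENGTH = 3 * REPORT_PERIOD    # each report covers the last three periods


def build_report(update_log: list[tuple[str, int]], now: int, length: int) -> dict[str, int]:
    """Latest update time of every item updated within (now - length, now]."""
    report: dict[str, int] = {}
    for did, t in update_log:
        if now - length < t <= now:
            report[did] = max(t, report.get(did, t))
    return report


def apply_report(cache: dict[str, int], last_report_time: int, now: int,
                 length: int, report: dict[str, int]) -> dict[str, int]:
    """cache maps data id -> timestamp of the cached version; returns the valid entries."""
    if now - last_report_time > length:
        # Disconnected longer than the report covers: updates may have been missed.
        return {}
    # Keep an entry unless the report shows a newer update than the cached version.
    return {did: ts for did, ts in cache.items() if report.get(did, ts) <= ts}
```

For a log of updates to a at times 5 and 50 and to b at time 42, a report sent at time 60 lists a (50) and b (42). Item a is invalidated because its cached copy dates from time 20, b survives because its cached copy (45) is newer than its last update, and a client whose previous report was at time 20 loses its entire cache.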
