Title: Reading Consistent and Current Data
1 Reading Consistent and Current Data Off the Air
Indian Institute of Technology, Bombay
University of Massachusetts, Amherst
2 Outline
- broadcast environments
- reading current and consistent data
- development of suitable correctness criteria
- mechanisms for disseminating (control) data
- efficient and scalable solutions
- exploiting caches
- time-constrained broadcasting
3 Broadcast Data Dissemination
- business data, e.g., Vitria, Tibco
- election coverage data
- stock related data
- traffic information
- sportscasts, e.g., Praja
- Datacycle
- Broadcast disks
Data Server
4 Example: E-auctions
- numerous potential clients
- clients must have access to current and consistent database state
- only a small fraction actually offer bids
- asymmetric communication medium
-- broadcast current state of auction
-- clients offer bids using low bandwidth uplinks; reduce client transmissions
5 Cyclic Broadcasts
100 Mbps
6 Time Constrained Broadcasts
7 Client-Server Communication Model
- Asymmetric communication environments
- Server periodically broadcasts data to clients using high bandwidth broadcast links
- Clients listen to the broadcast to fetch data
- Clients communicate with the server using low bandwidth upstream links
- Update handling
- Transactions at clients update data and send the updates to the server
- Server resolves update conflicts, commits updates, and broadcasts updates to clients through broadcast links
8 Mutually Consistent Reads
[Figure: a client transaction running from TrBegin to TrEnd on a time axis measured in broadcast cycles]
Are x, y, and z mutually consistent?
9 Serializability in Broadcast Environments
- Serializability is a global property
- dynamic conflict resolution => excessive communication
- e.g., locking
- acquiring read locks by client transactions
- server swamped with lock requests
- client uses precious uplink bandwidth
- avoid potential conflicts
- clients must be conservative
- unilaterally disallow certain correct executions
- unnecessary aborts
10 Schedules
11 Serialization Orders
T2 -> T4
T4 -> T1 -> T2
T2 -> T3 -> T4
But the global history is not serializable
12 Serializability?
All read-only transactions are
1. required to see the same serial order of update transactions -- even if executing at different clients
2. required to be serializable w.r.t. all the update transactions -- even if updates do not affect the values read
unnecessary and inappropriate
13 Broadcast Data Requirements
- Mutual consistency
-- server maintains mutually consistent data
-- clients read mutually consistent data
- Currency
-- clients see data that is current
14 A Sufficient Criterion
- All update transactions are serializable.
- Each read-only transaction is serializable with respect to the update transactions it (directly or indirectly) reads from.
15 A Sufficient Criterion
- All update transactions are serializable.
- Each read-only transaction is serializable with respect to the update transactions it (directly or indirectly) reads from.
- external consistency [Weihl 87], update consistency [Bober and Carey 92]
16 Schedule is Correct
T2 -> T4
T4 -> T1 -> T2
T2 -> T3 -> T4
Even though the global history is not serializable
17 Implications
If clients know the update schedule, read-only transactions need not contact the server.
=> addresses the primary problems with serializability
18 Possible Concerns?
Will transactions executing at the same client see different serial orders of update transactions?
- T1 followed by T2
- because of mutual consistency and currency, T2's reads will be consistent and current relative to T1's
- T1 concurrent with T2
- can see different update orders
- only if the updates are unrelated
19 Schedule is Correct
T2 -> T4
T4 -> T1 -> T2
T2 -> T3 -> T4
Even though the global history is not serializable
20 Outline
- broadcast environments
- reading current and consistent data
- development of correctness criteria
- mechanisms
- performance results
- exploiting caches
- time-constrained broadcasts
21 Relationships
Mutual Consistency
View Serializability
Conflict Serializability
22 The Approach
1. Update transactions at the server are conflict serializable.
2. Each read-only transaction is serializable with respect to the update transactions it (directly or indirectly) reads from.
affect(T): the transactions that affect what T reads, i.e., the transactions T directly or indirectly reads from.
For every read-only transaction T, the serialization graph consisting of only the transactions in {T} U affect(T) is acyclic.
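A minimal sketch of this criterion, assuming the reads-from relation and the serialization-graph edges are already known. The names `affect`, `is_acyclic`, `read_only_ok` and the four-transaction history are illustrative, not taken from the slides. The history is made up to match the pattern discussed earlier: the global graph has a cycle, yet each read-only transaction passes the check.

```python
def affect(t, reads_from):
    """Transactions that t directly or indirectly reads from."""
    seen, stack = set(), list(reads_from.get(t, ()))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(reads_from.get(u, ()))
    return seen

def is_acyclic(nodes, edges):
    """Topological-sort check on the subgraph induced by `nodes`."""
    nodes = set(nodes)
    sub = [(a, b) for a, b in edges if a in nodes and b in nodes]
    indeg = {n: 0 for n in nodes}
    for _, b in sub:
        indeg[b] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for a, b in sub:
            if a == n:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return removed == len(nodes)

def read_only_ok(t, reads_from, edges):
    """Check the criterion for one read-only transaction t."""
    return is_acyclic({t} | affect(t, reads_from), edges)

# Hypothetical history: T1 and T3 are read-only, T2 and T4 are updates.
reads_from = {"T1": {"T4"}, "T3": {"T2"}}
edges = {("T4", "T1"), ("T1", "T2"), ("T2", "T3"), ("T3", "T4")}

print(read_only_ok("T1", reads_from, edges))             # True
print(read_only_ok("T3", reads_from, edges))             # True
print(is_acyclic({"T1", "T2", "T3", "T4"}, edges))       # False
```

Restricting the graph to {T} U affect(T) is what lets each client validate its own reads without agreeing on one global order.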
23 The Algorithm: F-Matrix
- the server functionality
- the client functionality
- the nature of the control information transmitted from the server to the clients to help clients determine consistency of values read
- the client read-only transaction validation protocol
24 Server Functionality
Ensures the conflict serializability of all update transactions
During each cycle the server broadcasts
1. the latest committed values of all data items at the beginning of the cycle
2. a control matrix
Incrementally maintains the control matrix -- as updates occur
25 Client Functionality
26 Control Matrix: Intuition
27 Control Matrix
Objects: n objects, all initialized at cycle 0
C: n x n control matrix
C(x,y) = max( cycle in which T commits ), where T affects the latest committed value of y and also writes to x
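One way the server could maintain C incrementally, sketched under assumptions the slides do not state: update transactions are folded in one at a time in commit order, each reads the latest committed values, and read and write sets are known at commit. The rule follows from the definition: the transactions affecting a newly written y are the writer itself plus everything that affected the values it read. The names `new_matrix` and `commit` are hypothetical.

```python
def new_matrix(objects):
    """C[x][y] = 0 for every pair: all objects are initialized at cycle 0."""
    return {x: {y: 0 for y in objects} for x in objects}

def commit(C, read_set, write_set, cycle):
    """Fold one committed update transaction into C, in place."""
    # Snapshot the matrix: the transaction's reads saw pre-commit values.
    old = {x: dict(row) for x, row in C.items()}
    for y in write_set:
        for x in C:
            if x in write_set:
                # The committing transaction itself writes x and affects y.
                C[x][y] = cycle
            else:
                # Only transactions that affected what was read can matter.
                C[x][y] = max((old[x][z] for z in read_set), default=0)

C = new_matrix(["x", "y"])
commit(C, read_set=set(), write_set={"x"}, cycle=2)   # blind write of x
commit(C, read_set={"x"}, write_set={"y"}, cycle=3)   # y derived from x
print(C["x"]["y"])  # 2: the cycle-2 writer of x affects the latest y
```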
28 Control Matrix
29 Preconditions for Consistent Reads
T previously read x from broadcast cycle b
RT: set of (x,b) pairs
C is the matrix at the beginning of the current cycle
read y iff read-condition(y) holds:
forall (x,b) in RT, C(x,y) < b
i.e., no transaction that affected y wrote x after T read x
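The read condition can be sketched as a client-side check. This assumes C is held as a nested dict and RT as a set of (object, cycle) pairs; `read_condition`, `try_read`, and the sample matrix are illustrative names and values, not from the slides.

```python
def read_condition(C, RT, y):
    """True iff C[x][y] < b for every (x, b) the transaction has read."""
    return all(C[x][y] < b for x, b in RT)

def try_read(C, RT, y, cycle):
    """Record the read of y in RT if it is consistent, else report failure."""
    if not read_condition(C, RT, y):
        return False  # the transaction would abort or restart here
    RT.add((y, cycle))
    return True

# Hypothetical matrix: a transaction that affected y wrote x in cycle 2.
C = {"x": {"x": 2, "y": 2}, "y": {"x": 0, "y": 3}}

early = {("x", 1)}  # x was read in cycle 1, before that write
late = {("x", 3)}   # x was read in cycle 3, after that write
print(try_read(C, early, "y", 4))  # False
print(try_read(C, late, "y", 4))   # True
```

The check needs only column y of the matrix, which is why the client never has to contact the server to validate a read.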
30 Is Schedule Correct?
RT = {(x,1)}
RT = {(x,1),(y,3)}, C(x,y) = 0
ok
RT = {(x,2),(y,2)}
31 Is Schedule Correct?
T is currently reading y; T had read x.
No transaction that affected y changed x after T read it.
RT = {(x,1)}
RT = {(x,1),(y,3)}, C(x,y) = 2
ok
RT = {(x,2),(y,2)}
32 Control Matrix - Overheads
- maxcycles: maximum number of cycles that a read transaction could span
- need to store only cycle numbers from 0 to maxcycles
- perform modulo (maxcycles + 1) arithmetic
- Transmitting the matrix
- n^2 x log(maxcycles) bits per broadcast cycle
- if object size is small, this overhead can be significant
- transmit column j right after object j
33 Smaller Control Matrix
- partition objects into groups
- control matrix: n x numgroups
- SC(x,s) = max over y in s of C(x,y)
- updating an object in s is treated as an update to any object in s
- fewer entries to transmit compared to C
[Figure: objects partitioned into group 1 and group 2]
read-condition(y): forall (x,b) in RT, SC(x,s) < b, where s is the group containing y
T is currently reading y; T had read x.
No transaction that affected any object in y's group changed x after T read it.
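A small sketch of the grouped matrix, assuming C is a nested dict and groups are given as a name-to-members mapping; `group_matrix`, `read_condition_grouped`, and the sample data are hypothetical. The example is chosen so that the full matrix would allow the read while the grouped matrix rejects it, which is the price of the smaller transmission.

```python
def group_matrix(C, groups):
    """SC[x][s] = max over y in group s of C[x][y]."""
    return {
        x: {s: max(C[x][y] for y in members) for s, members in groups.items()}
        for x in C
    }

def read_condition_grouped(SC, group_of, RT, y):
    """Grouped read condition: SC[x][group of y] < b for every (x, b) read."""
    s = group_of[y]
    return all(SC[x][s] < b for x, b in RT)

# Hypothetical state: one transaction wrote x and z in cycle 2; y is untouched.
C = {
    "x": {"x": 2, "y": 0, "z": 2},
    "y": {"x": 0, "y": 0, "z": 0},
    "z": {"x": 2, "y": 0, "z": 2},
}
groups = {"g1": ["x"], "g2": ["y", "z"]}
group_of = {y: s for s, members in groups.items() for y in members}
SC = group_matrix(C, groups)

RT = {("x", 1)}  # x was read in cycle 1
full_ok = all(C[x]["y"] < b for x, b in RT)
grouped_ok = read_condition_grouped(SC, group_of, RT, "y")
print(full_ok, grouped_ok)  # True False
```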
34 Group Size
- increasing size of group => more unnecessary conflicts
- reducing size of group => increased control information overhead
- n groups => F-Matrix
- one group => Datacycle
- achieves serializability
- read-condition for Datacycle
35 R-Matrix
- To achieve Mutual Consistency
Read condition: objects previously read have not been updated by other transactions, or the object being read has not been updated since the beginning of the transaction
36 Schedule is Correct
objects previously read have not been updated by other transactions, or the object being read has not been updated
Not acceptable by Datacycle -- accepted by R-Matrix
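The two read conditions can be compared in a short sketch. Assumptions: the control information is one last-update cycle per object (`last_update`, a hypothetical name), "the beginning of the transaction" is taken as the cycle of its first read, and the Datacycle condition is taken to be the first clause alone, which is what the one-group case of the grouped matrix reduces to.

```python
def datacycle_read_condition(last_update, RT):
    """Assumed Datacycle rule: nothing read so far has since been updated."""
    return all(last_update[x] < b for x, b in RT)

def r_matrix_read_condition(last_update, RT, y):
    """R-Matrix rule: earlier reads unchanged, OR y unchanged since the first read."""
    if not RT:
        return True  # the first read of a transaction is always allowed
    unchanged = all(last_update[x] < b for x, b in RT)
    first_cycle = min(b for _, b in RT)
    return unchanged or last_update[y] < first_cycle

# Hypothetical state: x was updated in cycle 2, y has never been updated.
last_update = {"x": 2, "y": 0}
RT = {("x", 1)}  # x was read in cycle 1, before its update

print(datacycle_read_condition(last_update, RT))      # False
print(r_matrix_read_condition(last_update, RT, "y"))  # True
```

The second clause is what makes the condition weaker: the old x and the unchanged y still belong to one consistent state, so the read can proceed.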
37 Hardware Support
- a bit could be set by hardware if any of the previously read values of a transaction are changed
- a read is disallowed if
- the bit is set and
- the object being read has been changed during or after the cycle in which the first read operation was performed by the transaction
38 Outline
- broadcast environments
- reading current and consistent data
- development of correctness criteria
- mechanisms
- performance results
- exploiting caches
- time-constrained broadcasts
39 Simulated System
- broadcast medium bandwidth -- 64 Kbits/s
- time unit: time to broadcast one bit
- timestamp size: 8 bits
- object size: one KByte
- Control matrix overheads
- Datacycle and R-Matrix -- 0.1%
- with 300 objects, F-Matrix -- 23%
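The overhead figures can be roughly reproduced from the listed parameters. This sketch assumes that overhead means control bits as a fraction of all bits in a broadcast cycle, that one KByte is 1024 bytes, and that Datacycle and R-Matrix send one timestamp per object while F-Matrix sends n per object; the function name is hypothetical.

```python
def control_overhead(n_objects, object_bits, ts_bits, entries_per_object):
    """Control bits as a fraction of all bits broadcast in one cycle."""
    data = n_objects * object_bits
    control = n_objects * entries_per_object * ts_bits
    return control / (data + control)

N = 300              # objects in the database
OBJ_BITS = 1024 * 8  # one KByte per object (assumed 1024 bytes)
TS_BITS = 8          # timestamp size

f_matrix = control_overhead(N, OBJ_BITS, TS_BITS, entries_per_object=N)
r_matrix = control_overhead(N, OBJ_BITS, TS_BITS, entries_per_object=1)
print(f"F-Matrix: {f_matrix:.1%}, R-Matrix/Datacycle: {r_matrix:.2%}")
```

Under these assumptions the two fractions come out at about 23% and 0.1%.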
40 Parameters
- Client Transaction Length: 4
- Server Transaction Length: 8
- Transaction Rate at Server: 1 in 2.5 x 10^5 bit-units
- Number of Objects in Database: 300
- Size of Objects in Database: 1 KB
- Server Read Operation Probability: 0.5
- Client Inter-Operation Delay: 64K bit-units
- Client Inter-Transaction Delay: 128K bit-units
- Client Restart Delay: 0 bit-units
- Timestamp Size: 8 bits
41 Effect of Client Transaction Length
F-Matrix -- has best performance -- scales very well
42 Effect of Server Transaction Length
- Longer server transactions
- more updates at the server for each cycle
- response time increases
- F-Matrix
- little increase in response time
- scalable
43 Effect of Server Transaction Rate
- F-Matrix
- does not degrade
[Figure: x-axis is Transaction Rate (1 per 1000X bits), with ticks from 210 to 250]
44 Effect of Number of Objects
- number of objects increases
- probability of transactions accessing an object decreases
- length of cycle increases
- control information increases
- server transactions per cycle increases
- increases conflicts at server
- response times increase
- F-Matrix
- displays the best response times
- has least rate of increase
- effect is similar when object size increases
45 Summary of Results
- F-Matrix > R-Matrix > Datacycle
- weaker abort condition leads to better response times
- response time scalability
- F-Matrix is highly scalable with respect to
- client / server transaction length
- server transaction rate
- In many cases F-Matrix is very close to F-Matrix-ideal
46 Enhancements
- Concurrency control granularity
- Finer granularity results in higher concurrency
- Matrix not scalable for finer granularity due to overheads resulting from matrix transmission
- Control information transmission
- Reduce transmitted information if matrix is sparse
- Use appropriate indices to speed up clients' access to concurrency control information
47 Outline
- broadcast environments
- reading current and consistent data
- development of suitable correctness criteria
- mechanisms for disseminating (control) data
- performance results
- exploiting caches
- time-constrained broadcasts
48 Client Caches - Serializability
- server maintains read/write locks
- server aware of client cache contents
- invalidations / propagations to clients
- scalability problems
- else, clients read old data
- compromises currency / consistency
49 Client Caches - C Matrix
- transmit only deltas over the previous C matrix transmission
- but
- client has to store the previously transmitted C matrix
- client should listen to the last broadcast of the C matrix -- and the subsequent deltas
- increases usage of scarce client resources (battery power)
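A minimal sketch of delta transmission, assuming the matrix is a nested dict and a delta is simply the set of entries that changed since the previous transmission. The encoding and the names `matrix_delta` and `apply_delta` are illustrative; the slides do not specify a format.

```python
def matrix_delta(prev, curr):
    """Entries of curr that differ from prev, keyed by (x, y)."""
    return {
        (x, y): curr[x][y]
        for x in curr
        for y in curr[x]
        if curr[x][y] != prev[x][y]
    }

def apply_delta(C, delta):
    """Bring a client's stored copy of C up to date, in place."""
    for (x, y), cycle in delta.items():
        C[x][y] = cycle

prev = {"x": {"x": 0, "y": 0}, "y": {"x": 0, "y": 0}}
curr = {"x": {"x": 2, "y": 0}, "y": {"x": 0, "y": 0}}

delta = matrix_delta(prev, curr)                    # sent by the server
client = {x: dict(row) for x, row in prev.items()}  # client's stored copy
apply_delta(client, delta)
print(len(delta), client == curr)  # 1 True
```

The sketch also shows the cost noted on the slide: the client must already hold `prev`, and it must not miss any delta in between.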
50 Caches - Weak Currency
- suppose data needs to be current to only within D time units
- D > broadcast cycle time
- read from the client cache
- data removed from cache as soon as not current
- ensuring mutually consistent reads
- store columns of the control matrix corresponding to the data items cached
- along with the cycle at which the data items were cached
51 Time-constrained Broadcasts
- Meet data temporal coherence
- improve currency of data read by clients
- attach validity interval for each broadcast data object
- Meet time constraints of requests
- Periodically transmit
- frequently accessed data
- hot data
- rest, on demand
52 Summary
- Need for mutual consistency and currency
- Efficient mechanism - F-matrix
- R-matrix is a low overhead alternative
- Exploiting weak consistency, caches
- Temporally-coherent broadcasts
53 Related Work
- Broadcast Environments
- Datacycle [Herman]
- supports serializability
- Broadcast Disks [Acharya]
- consistency not considered
- ProMotion [Chrysanthis]
- assumes caches