Shetal Shah, IITB - PowerPoint PPT Presentation

Title: Shetal Shah, IITB

1
Dissemination of Dynamic Data: Semantics,
Algorithms, and Performance
Shetal Shah, IITB
Modified by Ajinkya Joshi For CS 632
2
More and more of the information we consume is
dynamically constructed
3
Buying a camera? Track auctions
4
Dynamic Data

Data gathered by (wireless sensor) networks
Sensors that monitor light, humidity, pressure,
and heat
Network traffic passing via switches
Sports Scores
Score changes by 5 points
Financials
Rice price changes by Rs. 10 compared to previous
day
Total value of stock portfolio exceeds 10,000

5
Continual Queries
A CQ is a standing query coupled with a
trigger/select condition.
CQ stock_monitor:
SELECT stock_price FROM quotes
WHEN stock_price - prev_stock_price > 0.5
CQ RFP_tracker:
SELECT project_name, contact_info FROM RFP_DB
WHERE skill_set_required ⊆ available_skills
Not every change at a source leads to a change
in the result of the query.
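
A minimal Python sketch of the stock_monitor idea, to illustrate why not every source change alters the query result. The class name StockMonitor, the reading of prev_stock_price as "the price at the previous source update", and the price trace are all assumptions made for illustration; they are not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StockMonitor:
    """Standing query: report stock_price WHEN it rose by more than `threshold`."""
    threshold: float = 0.5
    prev_price: Optional[float] = None  # assumed: price at the previous update
    results: List[float] = field(default_factory=list)

    def on_update(self, price: float) -> bool:
        """Feed one source update; return True if the query result changed."""
        fired = (self.prev_price is not None
                 and price - self.prev_price > self.threshold)
        if fired:
            self.results.append(price)
        self.prev_price = price
        return fired


cq = StockMonitor()
# Hypothetical trace: five source changes, only one of which triggers the CQ.
fired = [cq.on_update(p) for p in [100.0, 100.2, 101.0, 101.1, 100.0]]
```

Here only the jump from 100.2 to 101.0 exceeds the threshold, so four of the five source changes never need to reach the client.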
6
Generic Architecture
[Figure: data sources (sensors, servers) and end-hosts (wired host, mobile host) connected over networks through proxies / caches / data aggregators]
7
Where should the queries execute ?

At clients
Can't optimize across clients, links
At source (where changes take place)
Advantages
Minimum number of refresh messages, high fidelity
Main challenge
Scalability
Multiple sources hard to handle
At Data Aggregators -- DAs/proxies -- placed at
edge of network
Advantages
Allows scalability through consolidation,
Multiple data sources
Main challenge
Need mechanisms for maintaining data consistency
at DAs

8
Coherency of Dynamic Data

Strong coherency
The client and source always in sync with each
other
Strong coherency is expensive!
Relax strong coherency: Δ-coherency
Time domain: Δt-coherency
The client is never out of sync with the source
by more than Δt time units
e.g., traffic data not stale by more than a minute
Value domain: Δv-coherency
The difference in the data values at the client
and the source is bounded by Δv at all times
e.g., only interested in temperature changes larger
than 1 degree
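
The two relaxed notions can be written as simple predicates. This is a sketch only: the function names, the use of seconds as the time unit, and the choice of a non-strict bound (<=) are assumptions, since the slide does not say whether the bound itself is allowed.

```python
def within_time_coherency(last_sync_time: float, now: float, delta_t: float) -> bool:
    """Time domain: the client copy is at most delta_t time units old."""
    return now - last_sync_time <= delta_t


def within_value_coherency(client_value: float, source_value: float,
                           delta_v: float) -> bool:
    """Value domain: client and source values differ by at most delta_v."""
    return abs(source_value - client_value) <= delta_v


# Traffic data must not be stale by more than a minute (60 s).
fresh = within_time_coherency(last_sync_time=0.0, now=45.0, delta_t=60.0)
stale = within_time_coherency(last_sync_time=0.0, now=75.0, delta_t=60.0)

# Temperature: only changes larger than 1 degree matter.
close_enough = within_value_coherency(21.0, 21.6, delta_v=1.0)
too_far = within_value_coherency(21.0, 22.5, delta_v=1.0)
```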

9
Coherency Requirement (c)
temperature, max incoherency 1 degree
10
[Figure: data/query value at the client and at the server over time T, with the coherency bounds and a violation marked]
11
Source pushes interesting changes
Achieves Δv-coherency
Keeps network overhead to a minimum
Poor scalability (has to maintain state and keep connections open)
[Figure: push path among Source, DA, and User]
12
Pull interesting changes
[Figure: pull path among Server, Repository, and User]

Pull after
Time to Live (TTL)
Time To Next Refresh (TTR / TNR)
Can be implemented using the HTTP protocol
Stateless and hence is generally scalable with
respect to state space and computation
Need to estimate when a change of interest will
happen
Heavy polling for stringent coherence requirement
or highly dynamic data
Network overheads higher than for Push

13
Complementary Properties
14
Dynamic Content Distribution Networks
To create a scalable content dissemination
network (CDN) for streaming/dynamic data.
15
Dissemination Network Example
Data set: p, q, r. Max clients: 2.
[Figure: dissemination network with nodes A, B, C, D]
16
Challenges I

Given the data and coherency needs of
repositories,
how should repositories cooperate to satisfy
these needs?
How should repositories refresh the data such
that
coherency requirements of dependents are
satisfied?
How to make repository network resilient to
failures?
VLDB02, VLDB03, IEEE TKDE

17
Challenges - II

Given the data and the coherency available at
repositories in the network,
how to assign clients to repositories?
Given the data and coherency needs of clients in
the network,
what data should reside in each repository
and at what coherency?
If the client requirements keep changing,
how and when should the repositories be
reorganized ?

RTSS 2004, VLDB 2005
18
Dynamics along three axes

Data is dynamic, i.e., data changes rapidly and
unpredictably
Data items that a client is interested in
also change dynamically
Network is dynamic, nodes come and go

19
Data Dissemination
20
Data Dissemination

Different users have different coherency requirements
for the same data item.
Coherency requirement at a repository should be
at least as stringent as that of the dependents.
Repositories disseminate only changes of
interest.

A
B
D
C
Client
21
Condition for Data dissemination

P will send the update to Q only if -

Is this condition sufficient?
22
Data dissemination -- must be done with care
[Figure: example with the values 1, 1.2, 1.4, 1.5, 1.7 propagating from the source through the repositories]
should prevent missed updates!
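
The forwarding condition itself did not survive in this transcript. The sketch below assumes the most obvious rule (a node forwards a value only when it differs from the last value it forwarded by at least the dependent's requirement) and shows how that rule can miss an update. The requirements 0.3 for P and 0.5 for Q, and the helper name held_values, are assumptions; only the value sequence comes from the slide.

```python
from decimal import Decimal as D


def held_values(upstream, c):
    """Value held downstream at each step if the upstream node forwards a
    value only when it differs from the last forwarded one by at least c."""
    held = [upstream[0]]  # the initial value is always delivered
    for v in upstream[1:]:
        held.append(v if abs(v - held[-1]) >= c else held[-1])
    return held


source = [D("1"), D("1.2"), D("1.4"), D("1.5"), D("1.7")]
c_p, c_q = D("0.3"), D("0.5")  # assumed requirements of P and Q

at_p = held_values(source, c_p)      # P is fed by the source
q_via_p = held_values(at_p, c_q)     # Q is fed by P
q_direct = held_values(source, c_q)  # Q fed by the source, for comparison
```

Under these assumptions P never sees 1.5 (it is only 0.1 away from 1.4), so Q still holds 1 while the source is at 1.5, even though Q would have received 1.5 had it been fed by the source directly.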
23
Source Based Dissemination Algorithm

For each data item, source maintains
unique coherency requirements of repositories
the last update sent for that coherency
For every change,
source finds the maximum coherency
for which it must be disseminated
tags the change with that coherency
disseminates (changed data, tag)
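
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the class and function names are hypothetical, "must be disseminated" is read as "differs from the last update sent for that coherency by at least that coherency", and a change tagged c is assumed to reach every repository whose requirement is c or tighter. The requirements 0.3 and 0.5 are also assumed.

```python
from decimal import Decimal as D


class Source:
    """Source-side state for one data item."""

    def __init__(self, initial, coherencies):
        # Last update sent for each unique coherency requirement.
        self.last_sent = {c: initial for c in set(coherencies)}

    def on_change(self, value):
        """Return (value, tag) if the change must be disseminated, else None."""
        due = [c for c, last in self.last_sent.items() if abs(value - last) >= c]
        if not due:
            return None
        tag = max(due)
        # Assumption: a change tagged `tag` is also delivered to every
        # repository with a tighter requirement, so their records move too.
        for c in self.last_sent:
            if c <= tag:
                self.last_sent[c] = value
        return value, tag


def wants(update, c):
    """A tagged change is passed to a dependent with requirement c
    only if c is no looser than the tag."""
    _, tag = update
    return c <= tag


reqs = [D("0.3"), D("0.5")]
src = Source(D("1"), reqs)
received = {c: [D("1")] for c in reqs}
for v in [D("1.2"), D("1.4"), D("1.5"), D("1.7")]:
    update = src.on_change(v)
    if update is not None:
        for c in reqs:
            if wants(update, c):
                received[c].append(update[0])
```

With this trace the repository at 0.5 receives 1.5, the update that a purely local forwarding rule would have missed.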

24
Source Based Dissemination Algorithm
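[Figure: the same example under source based dissemination; the values shown are 1, 1.2, 1.4, 1.5, 1.7, with 1.5 reaching the repositories]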
25
Repository Based Dissemination Algorithm

A repository P sends changes of interest to the
dependent Q if
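
The condition on this slide was lost in extraction and is not reconstructed here. As an illustration only, the sketch below uses one conservative rule that is an assumption and not the slide's formula: P forwards a value to Q when it differs from the last value sent to Q by at least c_q - c_p, which leaves room for P's own lag of up to c_p behind the source. The requirements 0.3 and 0.5 and all names are likewise assumed.

```python
from decimal import Decimal as D


def held_values(upstream, threshold):
    """Value held downstream at each step if the upstream node forwards a
    value only when it differs from the last forwarded one by >= threshold."""
    held = [upstream[0]]  # the initial value is always delivered
    for v in upstream[1:]:
        held.append(v if abs(v - held[-1]) >= threshold else held[-1])
    return held


source = [D("1"), D("1.2"), D("1.4"), D("1.5"), D("1.7")]
c_p, c_q = D("0.3"), D("0.5")  # assumed requirements of P and Q

at_p = held_values(source, c_p)      # the source is exact, so its slack is 0
at_q = held_values(at_p, c_q - c_p)  # assumed rule: threshold c_q - c_p

# Largest gap between the source and Q's copy over the whole trace.
worst_gap = max(abs(s - q) for s, q in zip(source, at_q))
```

On this trace Q's copy never drifts 0.5 or more from the source, at the price of one extra message (1.4) compared with what Q strictly needed.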
26
Repository Based Dissemination Algorithm
[Figure: the same example under repository based dissemination; the values shown are 1, 1.2, 1.4, 1.5, 1.7]
27
Building the content distribution network
Choose parents for repositories such that the
overall fidelity observed by the repositories is
high, i.e., reduce communication and computational
delays.
28
If parents are not chosen judiciously

It may result in
Uneven distribution of load on repositories.
Increase in the number of messages in the system.

A
B
C
D
Increase in loss in fidelity!
29
LeLA

Looks for the position of Q level by level
Each level has a load controller node
For each repository on that level, it calculates
a preference factor
The smaller the preference factor, the better the
chance of a repository becoming the parent

30
Preference factor

Data availability factor
Computational delay factor
Communication delay factor
Preference factor

31
DiTA

Repository N needs data item x
If the source has available push connections,
or the source is the only node
in the dissemination tree for x
N is made the child of the source
Else
repository is inserted in most suitable subtree
where
N's ancestors have more stringent coherency
requirements
N is closest to the root

32
Most Suitable Subtree?

l: smallest level in the subtree with coherency
requirement less stringent than N's.
d: communication delay from the root of the
subtree to N.
Smallest (l × d): most suitable subtree.

Essentially, minimize communication
and computational delays!
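
The selection rule reduces to taking a minimum over candidate subtrees. A small sketch, in which the Subtree record and the candidate values of l and d are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subtree:
    root: str
    level: int       # l: smallest level with a less stringent requirement than N's
    delay_ms: float  # d: communication delay from the subtree root to N


def most_suitable(subtrees: List[Subtree]) -> Subtree:
    """Pick the subtree with the smallest l x d product."""
    return min(subtrees, key=lambda s: s.level * s.delay_ms)


# Hypothetical candidates: products are 60, 45 and 75.
candidates = [Subtree("A", 2, 30.0), Subtree("B", 1, 45.0), Subtree("C", 3, 25.0)]
best = most_suitable(candidates)
```

Note that B wins here despite having the largest delay, because N would sit closest to the root in that subtree.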
33
Example
Initially the network consists of the source.
34
Example
D requests service of q with coherency
requirement 0.2
35
Example
D requests service of q with coherency
requirement 0.2
36
Comparison of LeLA and DiTA
LeLA: each node does more work
DiTA: high communication cost
37
Resiliency
38
Handling Failures in the Network

Need to detect permanent/transient failures in
the network and to recover from them
Resiliency is obtained by adding redundancy
Without redundancy,
failures → loss in fidelity
Adding redundancy can increase cost
→ possible loss of fidelity!
Handle failures such that
the cost of adding resiliency is low!

39
Passive/Active Failure Handling

Passive failure detection
Parent sends "I'm alive" messages at the end of
every time interval.
What should the time interval be?
Active failure handling
Always be prepared for failures.
For example, two repositories can serve the same
data item at the same coherency to a child.
This means lots of work
→ greater loss in fidelity.

40
Middle Path
Let repository R want data item x with coherency
c.
A backup parent B is found for each data item
that the repository needs
P
c
At what coherency should B serve R ?
R
41
If a parent fails

Detection: the child gets two consecutive updates
from the backup parent with no updates from the
parent.

[Figure labels: B, c, k × c]

Recovery: the backup parent is asked to serve at
coherency c until an update arrives from the parent.
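
The detection and recovery rule can be sketched as a small state machine kept at the child. The class name BackupMonitor and the event trace are hypothetical; the threshold of two consecutive backup updates is the one stated on the slide.

```python
class BackupMonitor:
    """Tracks whether the parent should be considered failed."""

    def __init__(self, consecutive_backup_updates: int = 2):
        self.limit = consecutive_backup_updates
        self.backup_streak = 0      # backup updates seen since the last parent update
        self.parent_failed = False  # True while the backup must serve at coherency c

    def on_parent_update(self) -> None:
        # Any update from the parent ends the recovery period.
        self.backup_streak = 0
        self.parent_failed = False

    def on_backup_update(self) -> bool:
        self.backup_streak += 1
        if self.backup_streak >= self.limit:
            self.parent_failed = True
        return self.parent_failed


monitor = BackupMonitor()
trace = []
for sender in ["backup", "parent", "backup", "backup", "parent"]:
    if sender == "parent":
        monitor.on_parent_update()
    else:
        monitor.on_backup_update()
    trace.append(monitor.parent_failed)
```

The failure flag is raised only after the second backup update in a row and is cleared as soon as the parent is heard from again.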

R
42
Adding Resiliency to DiTA

A sibling of P is chosen as the backup parent of
R.
If P fails,
A serves B with coherency c
→ change is local.
If P has no siblings, a sibling of nearest
ancestor is chosen.
Else the source is made the backup parent.

A
B
c
k x c
R
43
Markov Analysis for k

Assumptions
Data changes as a random walk along the line
The probability of an increase is the same as
that of a decrease
No assumptions made about the unit of change or
time taken for a change

Expected misses for any k < 2k^2 - 2
For k = 2, expected misses < 6
44
Experimental Methodology

Physical network: 4 servers, 600 routers,
100 repositories
Communication delay: 20-30 ms
Computation delay: 3-5 ms
Real stock traces: 100-1000
Time duration of observations: 10,000 s
Tight coherency range: 0.01 to 0.05
Loose coherency range: 0.5 to 0.99

45
Failure and Recovery Modelling
Trend for time between failure

Failures and recovery modeled based on trends
observed in practice
Analysis of link failures in an IP backbone by
G. Iannaccone et al
Internet Measurement Workshop 2002

Recovery: 10% > 20 min; 40% > 1 min and < 20 min;
50% < 1 min
46
In the Presence of Failures, Varying Recovery
Times
Addition of resiliency does improve fidelity.
47
In the Presence of Failures, Varying Data Items
Increasing Load
Fidelity improves with addition of resiliency
even for large number of data items.
48
In the Absence of Failures
Increasing Load
Often, fidelity improves with addition of
resiliency, even in the absence of failures!
49

Delay

50
Source of delay

Queuing delay
Time delay between arrival of update and start of
processing
Processing delay
Check delay
Data coherency requirements are checked
Computation delay
Computing the data to be pushed and actually pushing it

51
What is our goal?

Aim is to improve average fidelity over all
repositories
This can be achieved using
Better filtering of updates
Better scheduling of disseminations

52
Better filtering of updates

For every dependent, a repository maintains
Coherency requirement (Cr)
Last pushed value
New value is pushed if it differs from the last pushed value by Cr
This creates a window with
Lower bound lb = last pushed value - Cr
Upper bound ub = last pushed value + Cr
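
A sketch of the window check in Python, using the figures that appear on the next slide (last pushed value 10, requirements 0.5 and 0.3). The class name Dependent is hypothetical, and treating a value that lands exactly on a bound as "to be pushed" is an assumption.

```python
from decimal import Decimal as D


class Dependent:
    """Per-dependent state kept by a repository."""

    def __init__(self, cr, last_pushed):
        self.cr = cr
        self.last_pushed = last_pushed

    def window(self):
        """(lb, ub) = (last pushed - Cr, last pushed + Cr)."""
        return (self.last_pushed - self.cr, self.last_pushed + self.cr)

    def offer(self, value):
        """Push `value` unless it lies strictly inside the window."""
        lb, ub = self.window()
        if lb < value < ub:
            return False
        self.last_pushed = value  # pushing re-centres the window
        return True


a = Dependent(D("0.5"), D("10"))  # window (9.5, 10.5)
b = Dependent(D("0.3"), D("10"))  # window (9.7, 10.3)
first = (a.offer(D("10.3")), b.offer(D("10.3")))
second = (a.offer(D("10.55")), b.offer(D("10.55")))
```

After 10.3 only the tighter dependent is updated and its window moves to (10, 10.6); the next value, 10.55, then has to go to the looser dependent but not to the tighter one.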

53
Cr as ordering parameter?
Is it possible to use Cr as an ordering parameter?

[Figure: source value 10.3; nodes A and B; last value pushed 10; windows (9.5, 10.5), (9.7, 10.3), (9.5, 10.5), (10, 10.6)]

Next update is 10.55. Can Cr still act as an
ordering parameter?
54
Restriction on updates

In order to use Cr as an ordering parameter, some
restrictions on updates are needed
If c1 < c2 => l2 < l1 and u2 > u1
To satisfy these inequalities, a pseudo update value
is used

55
Update with pseudo value
[Figure: source value 10.3, pseudo value 10.2; nodes A and B; last value pushed 10; windows (9.5, 10.5), (9.7, 10.3), (9.5, 10.5), (9.9, 10.5)]
56
How to calculate pseudo update?
10.4
10.1
Next update - 10.55
If v < (lb + ci) then pseudo val = lb + ci
else if v > (ub - ci) then pseudo val = ub - ci
else pseudo val = v
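
The rule amounts to clamping the update into the interval [lb + ci, ub - ci]. In the sketch below, lb and ub are assumed to be the bounds of the looser enclosing window, (9.5, 10.5) in the earlier example, and ci the tighter dependent's requirement, 0.3; the function name is hypothetical.

```python
from decimal import Decimal as D


def pseudo_value(v, lb, ub, ci):
    """Clamp v so that a window of half-width ci centred on the result
    stays inside (lb, ub)."""
    if v < lb + ci:
        return lb + ci
    if v > ub - ci:
        return ub - ci
    return v


lb, ub, ci = D("9.5"), D("10.5"), D("0.3")
pv = pseudo_value(D("10.3"), lb, ub, ci)  # real update 10.3
new_window = (pv - ci, pv + ci)
```

For the update 10.3 this gives the pseudo value 10.2 and the window (9.9, 10.5) shown on the "Update with pseudo value" slide.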
57
Better Scheduling