1
Dissemination of Dynamic Data: Semantics,
Algorithms, and Performance
Shetal Shah, IITB
Modified by Ajinkya Joshi For CS 632
2
More and more of the information we consume is
dynamically constructed
3
Buying a camera? Track auctions
4
Dynamic Data
  • Data gathered by (wireless sensor) networks
  • Sensors that monitor light, humidity, pressure,
    and heat
  • Network traffic passing via switches
  • Sports scores
    • Score changes by 5 points
  • Financials
    • Rice price changes by Rs. 10 compared to the
      previous day
    • Total value of stock portfolio exceeds 10,000

5
Continual Queries
A CQ is a standing query coupled with a
trigger/select condition.

CQ stock_monitor:
  SELECT stock_price FROM quotes
  WHEN stock_price - prev_stock_price > 0.5

CQ RFP_tracker:
  SELECT project_name, contact_info FROM RFP_DB
  WHERE skill_set_required ⊆ available_skills

Not every change at a source leads to a change in
the result of the query.
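The filtering idea behind a CQ can be sketched in a few lines: a standing condition evaluated on every update, where only some updates change the query result. This is a minimal illustration; the function and variable names are assumptions, not from the original system.

```python
# Minimal sketch of a continual query: a standing trigger condition
# evaluated on every new update. Names are illustrative.

def stock_monitor(prev_price, new_price, threshold=0.5):
    """Fire only when the price moves by more than the threshold."""
    return abs(new_price - prev_price) > threshold

# A stream of quotes; only some updates change the query result.
quotes = [100.0, 100.2, 100.9, 101.0, 101.8]
prev = quotes[0]
fired = []
for price in quotes[1:]:
    if stock_monitor(prev, price):
        fired.append(price)
        prev = price  # re-anchor on the last value that fired

print(fired)  # [100.9, 101.8]
```

Note how the 100.2 and 101.0 updates are absorbed: they change the source but not the query result.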
6
Generic Architecture
wired host
Network
Network
sensors
servers
Proxies / caches / Data aggregators
mobile host
Data sources
End-hosts
7
Where should the queries execute ?
  • At clients
    • can't optimize across clients, links
  • At source (where changes take place)
    • Advantages
      • Minimum number of refresh messages, high fidelity
    • Main challenge
      • Scalability
      • Multiple sources hard to handle
  • At Data Aggregators (DAs / proxies), placed at the
    edge of the network
    • Advantages
      • Allows scalability through consolidation,
        multiple data sources
    • Main challenge
      • Need mechanisms for maintaining data consistency
        at DAs

8
Coherency of Dynamic Data
  • Strong coherency
    • The client and source are always in sync with
      each other
    • Strong coherency is expensive!
  • Relax strong coherency → Δ-coherency
    • Time domain: Δt-coherency
      • The client is never out of sync with the source
        by more than Δt time units
      • e.g., traffic data not stale by more than a minute
    • Value domain: Δv-coherency
      • The difference in the data values at the client
        and the source is bounded by Δv at all times
      • e.g., only interested in temperature changes larger
        than 1 degree
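The two relaxed coherency semantics above can be written as simple predicates. This is a sketch under the definitions on this slide; the function names and parameter defaults are illustrative assumptions.

```python
# Sketch of the two relaxed coherency checks: value-domain and
# time-domain. Defaults match the slide's examples (1 degree, 1 minute).

def delta_v_violated(source_value, client_value, delta_v=1.0):
    """Value domain: client and source may differ by at most delta_v."""
    return abs(source_value - client_value) > delta_v

def delta_t_violated(last_sync_time, now, delta_t=60.0):
    """Time domain: client may lag the source by at most delta_t seconds."""
    return (now - last_sync_time) > delta_t

# Temperature example: only changes larger than 1 degree matter.
print(delta_v_violated(25.4, 24.1))  # 1.3 degree gap -> True
print(delta_v_violated(25.4, 24.6))  # 0.8 degree gap -> False
```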

9
Coherency Requirement (c)
Example: temperature, with maximum incoherency of 1 degree
10
(Figure: data/query value at the client vs. at the server
over time T; a bounds violation occurs where the two drift
apart by more than c.)
11
Source pushes interesting changes
Achieves Δv-coherency and keeps network overhead to a
minimum, but scales poorly: the source has to maintain
per-client state and keep connections open.
(Diagram: Source → DA → User, with a push on each hop.)
12
Pull interesting changes
(Diagram: the user pulls from a repository, which pulls
from the server.)
  • Pull after
    • Time To Live (TTL)
    • Time To Next Refresh (TTR / TNR)
  • Can be implemented using the HTTP protocol
  • Stateless, and hence generally scalable with
    respect to state space and computation
  • Need to estimate when a change of interest will
    happen
  • Heavy polling for stringent coherency requirements
    or highly dynamic data
  • Network overheads higher than for Push
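A puller has to estimate when the next change of interest will happen. One simple heuristic (an illustration, not the estimator from the talk): set the TTR to the time it would take the value to drift by the coherency requirement at the recently observed change rate, clamped to sane bounds.

```python
# Sketch of an adaptive Time To Refresh (TTR) estimator for pull:
# poll sooner when the data changes fast relative to the coherency
# requirement. The heuristic and bounds are illustrative assumptions.

def next_ttr(coherency, observed_rate, ttr_min=1.0, ttr_max=60.0):
    """Estimate seconds until the value could drift by `coherency`,
    given the observed change rate (value units per second)."""
    if observed_rate <= 0:
        return ttr_max          # value looks static; back off
    ttr = coherency / observed_rate
    return max(ttr_min, min(ttr, ttr_max))

print(next_ttr(coherency=0.5, observed_rate=0.1))    # 5.0
print(next_ttr(coherency=0.5, observed_rate=0.001))  # capped at 60.0
```

This makes the "heavy polling" bullet concrete: a stringent coherency or a fast-changing value drives the TTR toward its minimum.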

13
Complementary Properties
14
Dynamic Content Distribution Networks
To create a scalable content dissemination
network (CDN) for streaming/dynamic data.
15
Dissemination Network Example
Data set: {p, q, r}; max clients per repository: 2
(Diagram: the source with repositories A, B, C, D in the
dissemination network.)
16
Challenges I
  • Given the data and coherency needs of
    repositories,
    • how should repositories cooperate to satisfy
      these needs?
  • How should repositories refresh the data such that
    • coherency requirements of dependents are
      satisfied?
  • How to make the repository network resilient to
    failures?

[VLDB 2002, VLDB 2003, IEEE TKDE]

17
Challenges - II
  • Given the data and the coherency available at
    repositories in the network,
    • how to assign clients to repositories?
  • Given the data and coherency needs of clients in
    the network,
    • what data should reside in each repository,
    • and at what coherency?
  • If the client requirements keep changing,
    • how and when should the repositories be
      reorganized?

[RTSS 2004, VLDB 2005]
18
Dynamics along three axes
  • Data is dynamic, i.e., data changes rapidly and
    unpredictably
  • The set of data items a client is interested in
    also changes dynamically
  • The network is dynamic: nodes come and go

19
Data Dissemination
20
Data Dissemination
  • Different users have different coherency
    requirements for the same data item.
  • The coherency requirement at a repository should be
    at least as stringent as that of its dependents.
  • Repositories disseminate only changes of
    interest.

(Diagram: the source feeding repositories A, B, C, D,
which in turn serve a client.)
21
Condition for Data dissemination
  • P will send the update to Q only if the change is
    of interest to Q, i.e., it violates Q's coherency
    requirement.

Is this condition sufficient?
22
Data dissemination -- must be done with care
(Example trace: the source value moves 1 → 1.2 → 1.4 →
1.5 → 1.7; a repository that filters each hop using only
its own view can drop changes that a dependent needed.)
should prevent missed updates!
23
Source Based Dissemination Algorithm
  • For each data item, the source maintains
    • the unique coherency requirements of repositories
    • the last update sent for each such coherency
  • For every change, the source
    • finds the maximum coherency
      for which the change must be disseminated,
    • tags the change with that coherency,
    • disseminates (changed data, tag)
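The source-based steps above can be sketched as follows. This is a minimal reading of the slide, assuming the source keeps one last-sent value per unique coherency requirement and tags each outgoing change with the loosest (largest) requirement it violates; the class layout and names are illustrative.

```python
# Sketch of the source-based dissemination algorithm: per unique
# coherency requirement, track the last value disseminated at that
# requirement; tag each change with the loosest requirement violated.

class Source:
    def __init__(self, coherencies, initial):
        # last value sent for each unique coherency requirement
        self.last_sent = {c: initial for c in coherencies}

    def on_change(self, value):
        """Return (value, tag) if the change must go out, else None."""
        due = [c for c, last in self.last_sent.items()
               if abs(value - last) >= c]
        if not due:
            return None
        tag = max(due)                 # loosest requirement still violated
        for c in due:
            self.last_sent[c] = value  # remember what each class last saw
        return (value, tag)

src = Source(coherencies=[0.2, 0.5], initial=1.0)
print(src.on_change(1.1))  # drift below both thresholds -> None
print(src.on_change(1.3))  # violates only 0.2 -> (1.3, 0.2)
print(src.on_change(1.9))  # violates 0.2 and 0.5 -> (1.9, 0.5)
```

Repositories can then forward a tagged change to exactly those dependents whose requirement is at least as stringent as the tag.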

24
Source Based Dissemination Algorithm
(Example trace: the values 1 → 1.2 → 1.4 → 1.5 → 1.7,
each disseminated by the source together with its
coherency tag.)
25
Repository Based Dissemination Algorithm

A repository P sends a change of interest to the
dependent Q if the change violates Q's coherency
requirement relative to the last value P sent to Q.
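The per-dependent check can be stated compactly. This is a reconstruction from the coherency definitions earlier in the deck, not a verbatim transcription of the lost formula; the variable names are assumptions.

```python
# Sketch of the per-dependent check a repository P applies before
# forwarding an update to a dependent Q: compare the new value to the
# last value actually sent to Q, against Q's own requirement c_q.

def should_forward(value, last_sent_to_q, c_q):
    """Forward only if Q's window around its last value is violated."""
    return abs(value - last_sent_to_q) >= c_q

# Q requires coherency 0.4; the last value it received was 1.0.
print(should_forward(1.3, 1.0, 0.4))  # drift 0.3 -> False
print(should_forward(1.5, 1.0, 0.4))  # drift 0.5 -> True
```

Keeping a separate last-sent value per dependent, rather than one value for the repository itself, is what prevents the missed-update problem shown earlier.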
26
Repository BasedDissemination Algorithm
(Example trace: the values 1 → 1.2 → 1.4 → 1.5 → 1.7
flowing through the network under the repository-based
algorithm.)
27
Building the content distribution network
Choose parents for repositories such that the overall
fidelity observed by the repositories is high, i.e.,
reduce communication and computational delays.
28
If parents are not chosen judiciously
  • It may result in
    • uneven distribution of load on repositories,
    • an increase in the number of messages in the
      system.

(Diagram: repositories A, B, C, D under a poorly chosen
parent assignment.)
Increase in loss in fidelity!
29
LeLA
  • Looks for the position of Q level by level
  • Each level has a load controller node
  • For each repository on that level, it calculates a
    preference factor
  • The smaller the preference factor, the better the
    chance of a repository becoming the parent

30
Preference factor
  • Data availability factor
  • Computational delay factor
  • Communication delay factor
  • Preference factor

31
DiTA
  • Repository N needs data item x
  • If the source has available push connections,
    or the source is the only node
    in the dissemination tree for x,
    • N is made a child of the source
  • Else
    • N is inserted in the most suitable subtree,
      where
      • N's ancestors have more stringent coherency
        requirements, and
      • N is closest to the root
32
Most Suitable Subtree?
  • l: the smallest level in the subtree with a coherency
    requirement less stringent than N's.
  • d: the communication delay from the root of the
    subtree to N.
  • The subtree with the smallest (l × d) is the most
    suitable.

Essentially, minimize communication
and computational delays!
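The selection rule above reduces to a one-line minimization once l and d are known for each candidate subtree. The candidate representation below is an illustrative assumption; how l and d are measured is described on the previous slide.

```python
# Sketch of DiTA's "most suitable subtree" rule: among candidate
# subtrees, pick the one minimizing l * d, where l is the smallest
# level with a less stringent coherency requirement than N's and d
# is the communication delay from the subtree root to N.

def most_suitable(candidates):
    """candidates: list of (name, l, d); return the name minimizing l*d."""
    return min(candidates, key=lambda c: c[1] * c[2])[0]

# Three hypothetical subtrees: (name, level l, delay d in ms).
subtrees = [("A", 2, 30), ("B", 1, 45), ("C", 3, 10)]
print(most_suitable(subtrees))  # C: 3*10 = 30 beats 45 and 60
```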
33
Example
Initially the network consists of the source.
34
Example
D requests service of q with coherency
requirement 0.2
36
Comparison of LeLA and DiTA
LeLA: each node does more work; DiTA: higher
communication cost.
37
Resiliency
38
Handling Failures in the Network
  • Need to detect permanent/transient failures in
    the network and to recover from them
  • Resiliency is obtained by adding redundancy
  • Without redundancy,
    • failures → loss in fidelity
  • Adding redundancy can increase cost
    • → possible loss of fidelity!
  • Handle failures such that
    • the cost of adding resiliency is low!

39
Passive/Active Failure Handling
  • Passive failure detection
    • Parent sends "I'm alive" messages at the end of
      every time interval.
    • What should the time interval be?
  • Active failure handling
    • Always be prepared for failures.
    • For example, two repositories can serve the same
      data item at the same coherency to a child.
    • This means lots of work
      → greater loss in fidelity.

40
Middle Path
Let repository R want data item x with coherency c.
A backup parent B is found for each data item that the
repository needs.
(Diagram: the primary parent P serves R at coherency c.)
At what coherency should B serve R?
41
If a parent fails
  • Detection: the child receives two consecutive
    updates from the backup parent with no intervening
    update from the parent
  • Recovery: the backup parent is asked to serve at
    coherency c until an update arrives from the parent

(Diagram: the backup B serves R at coherency k × c while
the parent serves R at c.)
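The detection rule above is a small piece of per-child state. A minimal sketch, assuming the child merely counts consecutive backup-parent updates; the class and method names are illustrative.

```python
# Sketch of the failure-detection rule: two consecutive updates from
# the backup parent with none from the primary parent in between
# means the parent is presumed failed (and the backup is then asked
# to serve at the full coherency c).

class FailureDetector:
    def __init__(self):
        self.backup_streak = 0
        self.parent_failed = False

    def on_update(self, from_parent):
        if from_parent:
            self.backup_streak = 0
            self.parent_failed = False   # parent is alive again
        else:                            # update came from the backup
            self.backup_streak += 1
            if self.backup_streak >= 2:
                self.parent_failed = True
        return self.parent_failed

fd = FailureDetector()
print(fd.on_update(from_parent=False))  # one backup update -> False
print(fd.on_update(from_parent=False))  # second in a row -> True
print(fd.on_update(from_parent=True))   # parent is back -> False
```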
42
Adding Resiliency to DiTA
  • A sibling of P is chosen as the backup parent of
    R.
  • If P fails,
    • the backup parent serves R with coherency c
    • → the change is local.
  • If P has no siblings, a sibling of the nearest
    ancestor is chosen.
  • Failing that, the source is made the backup parent.

(Diagram: the parent serves R at c; the backup sibling
serves R at k × c.)
43
Markov Analysis for k
  • Assumptions
    • Data changes as a random walk on the line
    • The probability of an increase is the same as
      that of a decrease
    • No assumptions made about the unit of change or
      the time taken for a change

Expected misses for any k: < 2k² − 2
For k = 2, expected misses < 6
44
Experimental Methodology
  • Physical network: 4 servers, 600 routers,
    100 repositories
  • Communication delay: 20-30 ms
  • Computation delay: 3-5 ms
  • Real stock traces: 100-1000
  • Time duration of observations: 10,000 s
  • Tight coherency range: 0.01 to 0.05
  • Loose coherency range: 0.5 to 0.99

45
Failure and Recovery Modelling
(Chart: trend for time between failures.)
  • Failures and recovery modeled on trends
    observed in practice
  • Analysis of link failures in an IP backbone by
    G. Iannaccone et al.,
    Internet Measurement Workshop 2002

Recovery times: 10% take more than 20 min, 40% take
between 1 min and 20 min, and 50% take less than 1 min.
46
In the Presence of Failures, Varying Recovery
Times
Addition of resiliency does improve fidelity.
47
In the Presence of Failures, Varying Data Items
Increasing Load
Fidelity improves with the addition of resiliency,
even for a large number of data items.
48
In the Absence of Failures
Increasing Load
Often, fidelity improves with the addition of
resiliency, even in the absence of failures!
49
Delay

50
Source of delay
  • Queuing delay
    • Time between the arrival of an update and the
      start of its processing
  • Processing delay
    • Check delay
      • Data coherency requirements are checked
    • Computation delay
      • Computing the data to be pushed and actually
        pushing it

51
What is our goal?
  • The aim is to improve average fidelity over all
    repositories
  • This can be achieved using
    • better filtering of updates,
    • better scheduling of disseminations

52
Better filtering of updates
  • For every dependent, the repository maintains
    • the coherency requirement (cr)
    • the last pushed value
  • A new value is pushed only if it differs from the
    last pushed value by at least cr
  • This creates a window with
    • lower bound lb = last pushed value − cr
    • upper bound ub = last pushed value + cr
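The window test above is straightforward to state in code. A minimal sketch; the function names are illustrative.

```python
# Sketch of the per-dependent filtering window: keep the last value
# pushed to a dependent and its coherency requirement cr, and push a
# new value only when it leaves the window [last - cr, last + cr].

def window(last_pushed, cr):
    return (last_pushed - cr, last_pushed + cr)

def needs_push(value, last_pushed, cr):
    lb, ub = window(last_pushed, cr)
    return value <= lb or value >= ub

print(window(10.0, 0.5))            # (9.5, 10.5)
print(needs_push(10.3, 10.0, 0.5))  # inside the window -> False
print(needs_push(10.6, 10.0, 0.5))  # outside the window -> True
```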

53
Cr as ordering parameter?
  • Is it possible to use cr as the ordering parameter?

(Example: the source's value is 10.3; the last value
pushed to the dependents was 10; the dependents' windows
are (9.5, 10.5) and (9.7, 10.3). The next update is
10.55. Can cr still act as the ordering parameter?)
54
Restriction on updates
  • In order to use cr as the ordering parameter, some
    restrictions on updates are needed
  • If c1 < c2, then
    • l2 < l1 and u2 > u1
  • To satisfy these inequalities, a pseudo update
    value is used

55
Update with pseudo value
(Example: with pseudo updates, the source's change to
10.3 is delivered as the pseudo value 10.2, keeping the
windows (9.5, 10.5), (9.7, 10.3), and (9.9, 10.5)
consistent with the last pushed value of 10.)
56
How to calculate pseudo update?
(Figure: the next update 10.55 is clamped into the
window, between 10.1 and 10.4.)
If v < (lb + ci) then pseudo_val = lb + ci
else if v > (ub − ci) then pseudo_val = ub − ci
else pseudo_val = v
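The pseudo-update rule is a clamp. In this sketch the second branch clamps to ub − ci, which reads as the intended value where the slide shows lb − ci; the function name is illustrative.

```python
# Sketch of the pseudo-update computation: before pushing value v to
# a dependent with coherency ci, clamp it into [lb + ci, ub - ci] so
# that the windows of differently stringent dependents stay nested.
# Note: the upper branch uses ub - ci (assumed typo fix for "lb - ci").

def pseudo_value(v, lb, ub, ci):
    if v < lb + ci:
        return lb + ci
    if v > ub - ci:
        return ub - ci
    return v

# Parent window (9.5, 10.5), dependent coherency 0.3:
print(pseudo_value(10.55, 9.5, 10.5, 0.3))  # clamped to 10.2
print(pseudo_value(10.0, 9.5, 10.5, 0.3))   # inside -> unchanged
```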
57
Better Scheduling
  • The order in which updates should be processed
  • The order in which updates should be propagated
  • Let u1, u2, ..., un be the updates that we want
    to process

58
Better Scheduling (Cont..)
  • Let C(u1), C(u2), ... be the time delays for
    processing the updates (cost)
  • Let B(u1), B(u2), ... be the total number of
    descendants that would benefit (benefit)
  • The optimal ordering is by the score
    B(ui) / C(ui)
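The scheduling rule above is a sort by benefit-to-cost ratio. A minimal sketch; the tuple layout for pending updates is an illustrative assumption.

```python
# Sketch of the scheduling rule: process pending updates in decreasing
# order of B(u)/C(u), where C(u) is the processing delay (cost) and
# B(u) the number of descendants that would benefit.

def schedule(updates):
    """updates: list of (name, cost, benefit); highest B/C first."""
    return [u[0] for u in sorted(updates,
                                 key=lambda u: u[2] / u[1],
                                 reverse=True)]

# Three pending updates: (name, cost in ms, benefiting descendants).
pending = [("u1", 4.0, 2), ("u2", 1.0, 3), ("u3", 2.0, 5)]
print(schedule(pending))  # ['u2', 'u3', 'u1']  (ratios 3.0, 2.5, 0.5)
```

Greedy ordering by B/C favors cheap updates that reach many dependents, which is what raises average fidelity across repositories.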

59
Better Scheduling (Evaluation)
60
Acknowledgements
  • Allister Bernard
  • Vivek Sharma
  • S. Dharmarajan
  • Shweta Agarwal
  • T. Siva
  • Prof. C. Ravishankar
  • Prof. Sohoni and Prof. Rangaraj
  • Prof. S. Sudarshan
  • Prof. Krithi Ramamritham