UoI Presentation
DBGlobe IST-2001-32645
3rd Meeting, Athens, November 29, 2002
Proactive initiative on Global Computing (GC)
Future and Emerging Technologies (FET)
The roots of innovation
Outline
- Directories
- Resource Location
- Data Delivery
Resource Discovery
Summaries for Resource Discovery
- Maintain summaries (e.g., Bloom filters) to assist the search for a service (resource)
- Directories for XML metadata and appropriate summaries
Motivation (DBGlobe): a large-scale and dynamic environment; how do we locate a resource?
System model:
- Sites that store hierarchical descriptions of services (in XML) or XML documents
- Path queries
Limitations (so far):
- We consider only XML trees (no cycles)
- No value queries
Joint work with Georgia Koloniari
<xml>
<device>
  <printer>
    <color></color>
    <postscript></postscript>
  </printer>
  <camera>
    <digital></digital>
  </camera>
</device>
An example XML description and the corresponding XML tree
Path queries:
- From the root: //device/printer
- Partial: camera/digital
Overall approach: maintain Bloom-based indexes to check whether a document (item) exists at a site (peer)
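As a reference point for the Bloom-based indexes, here is a minimal sketch of evaluating path queries exactly on the example tree, using Python's standard `xml.etree.ElementTree`. It assumes the leading `<xml>` line is dropped, numbers the root as level 1, and represents a query as a list of labels; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The example XML description, without the leading <xml> line.
DOC = """
<device>
  <printer><color/><postscript/></printer>
  <camera><digital/></camera>
</device>
"""

def labels_by_level(root):
    """Map each level (root = 1) to the labels of the nodes at that level."""
    levels, frontier, level = {}, [root], 1
    while frontier:
        levels[level] = [e.tag for e in frontier]
        frontier = [child for e in frontier for child in e]
        level += 1
    return levels

def root_paths(root):
    """Every path that starts at the root, as a list of labels."""
    paths = []
    def walk(elem, prefix):
        path = prefix + [elem.tag]
        paths.append(path)
        for child in elem:
            walk(child, path)
    walk(root, [])
    return paths

def matches_from_root(query, paths):
    return query in paths

def matches_partial(query, paths):
    # A partial query matches any contiguous run of labels inside a root path.
    n = len(query)
    return any(p[i:i + n] == query for p in paths for i in range(len(p) - n + 1))
```

The Bloom-based indexes approximate exactly these checks without shipping the tree itself.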
Bloom Filters
Goal: test whether an element $b$ exists in a set $A = \{a_1, a_2, \ldots, a_n\}$ of $n$ elements (keys)
[Figure: an element $a$ is hashed by $h_1, \ldots, h_4$ to positions $P_1, \ldots, P_4$ of the $m$-bit vector $v$, and those bits are set to 1]
- Allocate a vector $v$ of $m$ bits, initially all set to 0.
- Choose $k$ independent hash functions $h_1, h_2, \ldots, h_k$, each with range $\{1, \ldots, m\}$.
- For each element $a \in A$, set the bits at positions $h_1(a), h_2(a), \ldots, h_k(a)$ to 1. (A particular bit might be set to 1 multiple times.)
- Given a query for $b$, check the bits at positions $h_1(b), h_2(b), \ldots, h_k(b)$. If any of them is 0, then $b$ is certainly not in $A$. Otherwise, we assume that $b$ is in $A$, which may be wrong (a false positive).
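The steps above can be sketched as a small Python class. This is an illustration under stated assumptions: the bit vector is a Python `int`, positions run from 0 to m-1 rather than 1 to m, and the k hash functions are derived from salted SHA-256 digests, which is a hypothetical choice the slides do not prescribe.

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter: an m-bit vector and k hash functions."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = 0  # bit vector v as a Python int; all bits start at 0

    def _positions(self, item: str):
        # k hash values from salted SHA-256 digests (illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p  # a bit may be set more than once

    def might_contain(self, item: str) -> bool:
        # Any 0 bit: certainly absent. All 1s: assume present (maybe a false positive).
        return all((self.bits >> p) & 1 for p in self._positions(item))
```

Inserted keys are never missed. With $n$ keys inserted, the false-positive probability is approximately $(1 - e^{-kn/m})^k$.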
Breadth (or Level) Blooms
The Breadth Bloom Filter (BBF) for an XML tree $T$ with $j$ levels is a set of Bloom filters $BBF_0, BBF_1, BBF_2, \ldots, BBF_i$, $i \le j$.
- One Bloom filter, denoted $BBF_i$, for each level $i$ of the tree: $BBF_i$ holds the labels (attributes) of all nodes at level $i$.
- $BBF_0$ holds all attributes that appear in any node of the XML tree $T$.
- $BBF_0$: device, printer, camera, color, postscript, digital
- $BBF_1$: device
- $BBF_2$: printer, camera
- $BBF_3$: color, postscript, digital
- The $BBF_i$'s are not of the same size.
- We may skip levels.
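The slides define the BBF but not its lookup procedure. Below is a minimal sketch that assumes no skipped levels and gives every filter the same size m for simplicity. A query from the root must match consecutive levels starting at level 1. A partial query may start at any level. The helpers `positions`, `Bloom`, `build_bbf` and `bbf_query` are hypothetical names.

```python
import hashlib

def positions(item, m, k):
    """k bit positions in 0..m-1 for item (salted SHA-256; illustrative)."""
    return [int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
            for i in range(k)]

class Bloom:
    def __init__(self, m, k=3):
        self.m, self.k, self.bits = m, k, 0

    def add(self, item):
        for p in positions(item, self.m, self.k):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in positions(item, self.m, self.k))

def build_bbf(levels, m=64):
    """levels: {level (root = 1): labels at that level}.
    Returns {0: BBF_0, 1: BBF_1, ...}; BBF_0 holds every label of the tree."""
    bbf = {0: Bloom(m)}
    for lvl, labels in levels.items():
        bbf[lvl] = Bloom(m)
        for label in labels:
            bbf[lvl].add(label)
            bbf[0].add(label)
    return bbf

def bbf_query(bbf, query, from_root):
    """Approximate test for a path query given as a list of labels."""
    if not all(label in bbf[0] for label in query):
        return False  # some label appears nowhere in the tree
    depth = max(bbf)  # deepest indexed level (assumes no skipped levels)
    if len(query) > depth:
        return False
    starts = [1] if from_root else range(1, depth - len(query) + 2)
    # Look for a start level s with query[i] in BBF_{s+i} for every i.
    return any(all(label in bbf[s + i] for i, label in enumerate(query))
               for s in starts)
```

Because a BBF records levels but not parent/child links, the partial query printer/digital is reported as possibly present on the example tree even though no such path exists. This is one source of the false positives that the path-based filters try to reduce.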
Depth (or Path) Blooms
The Depth Bloom Filter (DBF) for an XML tree $T$ with $j$ levels is a set of Bloom filters $DBF_0, DBF_1, DBF_2, \ldots, DBF_{i-1}$, $i \le j$.
- One Bloom filter, denoted $DBF_i$, for each path length $i$ (paths with $i+1$ nodes) of the tree: $DBF_i$ holds the labels (attributes) of all paths of length $i$.
- $DBF_0$ holds all attributes that appear in any node of the XML tree $T$.
- $DBF_0$: device, printer, camera, color, postscript, digital
- $DBF_1$: device/printer, device/camera, printer/color, printer/postscript, camera/digital
- $DBF_2$: device/printer/color, device/printer/postscript, device/camera/digital
A special symbol is used for root paths (paths that start at the root).
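A possible sketch of building and probing a DBF follows, under these assumptions. Paths are stored as "/"-joined label strings. The root marker is the hypothetical symbol `^`, prepended to root paths and stored in the same $DBF_i$. A query passes only if every indexed subpath of it is found. All names here are illustrative.

```python
import hashlib

ROOT = "^"  # hypothetical choice of the special symbol for root paths

def positions(item, m, k):
    """k bit positions in 0..m-1 for item (salted SHA-256; illustrative)."""
    return [int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
            for i in range(k)]

class Bloom:
    def __init__(self, m, k=3):
        self.m, self.k, self.bits = m, k, 0

    def add(self, item):
        for p in positions(item, self.m, self.k):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in positions(item, self.m, self.k))

def build_dbf(root_paths, max_len, m=128):
    """root_paths: every root-to-node path as a list of labels.
    DBF_i (i = 0..max_len) holds all paths of length i, i.e. with i + 1 labels."""
    dbf = {i: Bloom(m) for i in range(max_len + 1)}
    for path in root_paths:
        # Each suffix of a root path is a downward path ending at the same node.
        for start in range(len(path)):
            sub = path[start:]
            if len(sub) - 1 <= max_len:
                dbf[len(sub) - 1].add("/".join(sub))
        # The full root path is also stored with the root marker in front.
        if len(path) - 1 <= max_len:
            dbf[len(path) - 1].add(ROOT + "/".join(path))
    return dbf

def dbf_query(dbf, query, from_root):
    """Approximate test: every indexed subpath of the query must be present."""
    max_len = max(dbf)
    for i in range(min(len(query) - 1, max_len) + 1):
        for s in range(len(query) - i):
            if "/".join(query[s:s + i + 1]) not in dbf[i]:
                return False
    if from_root and len(query) - 1 <= max_len:
        return ROOT + "/".join(query) in dbf[len(query) - 1]
    return True
```

Storing whole paths keeps parent/child structure, which is why such filters can reject queries a BBF accepts. The price is more entries, hence more space.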
Preliminary performance results:
- Both outperform a simple Bloom filter of the same size in terms of false positives.
- Depth (path) Blooms are very sensitive to the number of levels.
- Depth (path) Blooms need more space.
- Updates are handled efficiently (only the corresponding vectors are updated).
Distribution
- Each site maintains:
  - a local filter: a Bloom filter for its local resources
  - one or more summary filters
- Summary filter: the merge of the Bloom filters of a set X of other sites
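A Bloom filter merge can be sketched as a bitwise OR. This only works if all sites use the same vector size and the same hash functions. In the sketch below, filters are plain Python `int` bit vectors. The constants `M` and `K` and the helper names are assumptions for illustration.

```python
import hashlib

M, K = 256, 3  # all sites must agree on m and on the k hash functions

def positions(item):
    """K bit positions in 0..M-1 for item (salted SHA-256; illustrative)."""
    return [int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % M
            for i in range(K)]

def local_filter(items):
    """Bloom filter of a site's local resources, as an int bit vector."""
    bits = 0
    for item in items:
        for p in positions(item):
            bits |= 1 << p
    return bits

def summary_filter(filters):
    """Merge of the Bloom filters of a set X of other sites: bitwise OR."""
    merged = 0
    for f in filters:
        merged |= f
    return merged

def might_contain(bits, item):
    return all((bits >> p) & 1 for p in positions(item))
```

ORing the filters gives exactly the filter of the union of the sites' resources, so a summary never misses a resource held by a site in X. The drawback is that the merged vector fills up faster than any single filter, so false positives grow with the size of X.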
Horizons: keep information for the neighbors up to a horizon $d$ (as in routing indexes).
- A merged filter for each path: the merge of the Blooms of all sites on the path, up to length equal to the horizon.
[Figure: a network of sites 0-9 with example merged filters: of nodes 1, 2; of nodes 6, 7, 8; of nodes 3, 4]
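One way to read the horizon scheme, in the spirit of routing indexes, is that a site keeps one merged filter per neighbor. That filter covers the sites reachable through the neighbor within $d$ hops. The sketch below assumes this reading and uses hypothetical names. Filters are plain `int` bit vectors and the overlay is an adjacency dict.

```python
from collections import deque

def merged_filters(graph, filters, site, d):
    """For each neighbor n of site, OR the local filters of all sites reached
    through n within d hops of site (breadth-first, not passing back through site).
    graph: {site: set of neighbor sites}; filters: {site: int bit vector}."""
    result = {}
    for n in graph[site]:
        merged, seen = 0, {site, n}
        queue = deque([(n, 1)])  # (site, hop distance from the origin site)
        while queue:
            node, dist = queue.popleft()
            merged |= filters[node]
            if dist < d:
                for nxt in graph[node]:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, dist + 1))
        result[n] = merged
    return result
```

A query at `site` is then forwarded only to the neighbors whose merged filter may contain the requested item. A larger horizon $d$ means fewer blind forwards but larger, denser merged filters.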
Hierarchical
[Figure: a hierarchy of sites with the root peers at the top]
- Leaf sites: local filter
- Internal sites: summaries for all nodes in their subtree
- Root sites: summaries for the other root sites
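The hierarchical organization can be sketched as follows. Summaries are built bottom-up by ORing filters. A query is then routed top-down, entering only subtrees whose summary has all of the query's bits set. Filters and queries are plain `int` bit vectors here, and `Site`, `build_summaries` and `route` are hypothetical names. The exchange of summaries among root sites is not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    local: int = 0                                # local Bloom filter (int bit vector)
    children: list = field(default_factory=list)  # child sites in the hierarchy
    summary: int = 0                              # covers the whole subtree of this site

def build_summaries(site):
    """Bottom-up: a site's summary is its local filter ORed with its children's summaries."""
    site.summary = site.local
    for child in site.children:
        site.summary |= build_summaries(child)
    return site.summary

def route(site, query_bits):
    """Top-down: names of sites whose local filter has all query bits set,
    descending only into subtrees whose summary has them all set."""
    if (site.summary & query_bits) != query_bits:
        return []
    hits = [site.name] if (site.local & query_bits) == query_bits else []
    for child in site.children:
        hits += route(child, query_bits)
    return hits
```

A root site, which keeps summaries of the other root sites, could use those summaries in the same way to decide which other hierarchies a query should be forwarded to.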
Future work
- Evaluate distribution strategies
- Other ways of summarizing data (related work on selectivity estimation)
- See how this can be related to ontologies (meaningful path queries)
- See whether/how it can be integrated with querying
Data Delivery
For the 1st deliverable on the topic:
- A survey of different modes to transmit data:
  - Push/pull
  - Continuous (periodic)/aperiodic
  - Multicast/unicast
  - Directed diffusion (communication only with neighbor nodes)
- The different data delivery modes in DBGlobe
- Tradeoffs of using one over the other (e.g., in registering services and in directory (location) updates)
- To be extended for D10 (Data Delivery and Querying)
Data Delivery Modes and Coherence
Focus: how to achieve temporal (currency) and semantic (transaction-based) coherency of data under different modes of data delivery
The Data Broadcast Model
- The server broadcasts data from a database to a large number of clients
- Push mode: clients have no direct communication with the server
- Data updates at the server
  - Periodic updates of the values on the channel
[Figure: a server sends data over a broadcast channel to clients]
- Efficient way to disseminate information to large client populations with similar interests
- Physical support in wireless networks (satellite, cellular)
- Alternative way of transmitting information for data-intensive applications (e.g., the web)
Clients must read consistent and current data without contacting the server directly
- Multiple versions: not just one value per item, but k such values (Pitoura, Chrysanthis, IEEE TC 2003)
- Temporal and semantic coherency (theory and protocols) (Pitoura, Chrysanthis, Ramamritham, ICDT'03)
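Here is a minimal sketch of how a client might use the k broadcast versions per item. It looks for a time $t$ at which every item it needs has a version valid in $[c_b, c_e)$. If such a $t$ exists, the values read all belong to the database state at $t$. The version layout `(value, cb, ce)`, the use of `math.inf` for a value that has not changed yet, and the function name are assumptions.

```python
import math

def read_consistent(versions, items):
    """versions: {item: [(value, cb, ce), ...]}, the k versions on the channel.
    Returns (t, {item: value}) for the latest candidate time t at which every
    requested item has a version valid in [cb, ce), or None if none exists."""
    candidates = sorted({cb for x in items for _v, cb, _ce in versions[x]}, reverse=True)
    for t in candidates:
        chosen = {}
        for x in items:
            valid = [v for v, cb, ce in versions[x] if cb <= t < ce]
            if not valid:
                break  # item x has no version valid at t
            chosen[x] = valid[0]
        else:
            return t, chosen  # every item has a version valid at t
    return None
```

With a single version per item, the client fails as soon as any item it needs is updated during its reads. Keeping k versions lets it fall back to an older common state instead.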
Currency
The currency interval of an item $x$ in $RS(R)$, $CI(x, R)$, is $[c_b, c_e)$, where $c_b$ is the time instant when the value was stored in the database and $c_e$ is the time instant of the next change of this value in the database.
Currency interval for a set (readset):
$$\bigcap_{(x, u) \in RS(R)} CI(x, R)$$
- Overlapping: if the intersection is non-empty, say $[c_b, c_e)$, the currency is equal to $c_e^-$, and $RS(R)$ is a subset of an actual database state at the server.
- Oldest value: $OV\_Currency(R) = c_e^-$, where $c_e$ is the smallest among the right limits of $CI(x, R)$.
Two properties:
- Temporal spread (discrepancies among database states)
- Temporal lag (how old the values are with regard to some point in time, e.g., $T\_commit$)
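The currency definitions can be computed directly from the intervals $[c_b, c_e)$. The sketch below tests whether the readset's intervals overlap and returns the smallest right limit $c_e$, so that the currency is $c_e^-$. The formulas for temporal spread and temporal lag are hypothetical, since the slide names these properties without defining them. Using `math.inf` for values that are still current is also an assumption.

```python
import math

def readset_currency(intervals):
    """intervals: {item: (cb, ce)}, the currency interval CI(x, R) = [cb, ce)
    of each value read by R; ce = math.inf if the value has not changed yet.
    Returns (overlapping, ce_min): overlapping is True when the intersection of
    all intervals is non-empty; the currency is just before ce_min (ce_min^-)."""
    latest_start = max(cb for cb, _ in intervals.values())
    ce_min = min(ce for _, ce in intervals.values())
    return latest_start < ce_min, ce_min

def temporal_spread(intervals):
    """Hypothetical measure: gap between the latest start and the earliest end
    (0 when the intervals overlap, i.e. RS(R) fits one database state)."""
    latest_start = max(cb for cb, _ in intervals.values())
    ce_min = min(ce for _, ce in intervals.values())
    return max(0, latest_start - ce_min)

def temporal_lag(intervals, t_ref):
    """Hypothetical measure: how far the currency lies before t_ref (e.g. T_commit)."""
    _, ce_min = readset_currency(intervals)
    return max(0, t_ref - ce_min)
```

For half-open intervals, the intersection is non-empty exactly when the largest $c_b$ is below the smallest $c_e$. This is the single comparison the first function makes.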
Protocols and their properties
- Timestamps (versioning)
- Invalidation Reports
- Propagation
Consistency
Degrees of consistency:
- C0
- C1: $RS(R) \subseteq DS$, for some database state $DS$ at the server
- C2: R is serializable with the set of server transactions that read values read (directly or indirectly) by R
- C3: R is serializable with all server transactions
- C4: R is serializable with all server transactions, and the serializability order of the server transactions that R observes is consistent with the commit order of transactions at the server
Protocols and their properties
- Based on broadcasting the serialization graph of the server (or parts of it)
- Relation to temporal coherency
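A protocol of this kind could apply the textbook serialization-graph test on the client side. The client adds its read-only transaction R to the broadcast graph. It draws an edge $T_i \to R$ for each server transaction whose value R read, and $R \to T_j$ for each server transaction that overwrote a value R read. R is accepted only if the graph stays acyclic. The sketch below uses this standard test with Kahn's algorithm. The edge encoding and function name are assumptions, not the protocol from the slides.

```python
def serializable_with_server(server_edges, reads_from, overwritten_by):
    """server_edges: set of (Ti, Tj) edges of the broadcast serialization graph.
    reads_from: server transactions whose writes R read (edges Ti -> R).
    overwritten_by: server transactions that overwrote values R read (edges R -> Tj).
    Returns True iff the graph with R added is acyclic."""
    R = "R"
    edges = set(server_edges)
    edges |= {(t, R) for t in reads_from}
    edges |= {(R, t) for t in overwritten_by}
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    # Kahn's algorithm: the graph is acyclic iff every node can be removed
    # in topological order.
    indeg = {n: 0 for n in graph}
    for a in graph:
        for b in graph[a]:
            indeg[b] += 1
    ready = [n for n, deg in indeg.items() if deg == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for b in graph[n]:
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)
    return removed == len(graph)
```

Testing against the full server graph corresponds to serializability with all server transactions. Broadcasting only parts of the graph trades precision for a shorter broadcast.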
Future Work
- Multiple servers model
- Applications in sensor networks