Early Profile Pruning on XML-aware Publish-Subscribe Systems

1
Early Profile Pruning on XML-aware
Publish-Subscribe Systems

Mirella M. Moro, Petko Bakalov, Vassilis J. Tsotras
University of California, Riverside

2
Overview

Motivation
Bottom-up Filtering FSM (BUFF)
Bounding-based XML Filtering (BoXFilter)
Core Modules
Filtering algorithms
Experimental results

3
Motivation

Publish-subscribe systems: message transmission is
determined by the message content
Examples: notification websites such as hotwire.com or
ticketmaster.com

[Figure: publishers send documents to the matching algorithm; subscribers submit, update, and delete profiles and receive results.]
4
Publish-subscribe systems

The data is exchanged in XML format.
Nodes correspond to elements, attributes, or
text values
Edges represent immediate element-subelement or
element-value relationships

<Bib>
  <article vol="7" no="11">
    <title>t1</title>
    <author>
      <last>DeWitt</last>
      <mi>J</mi>
      <first>David</first>
    </author>
    <journal>TPDS</journal>
    <year>1996</year>
  </article>
  <article>
    <title>t2</title>
    <author>
      <last>Florescu</last>
      <first>Daniela</first>
    </author>
    <proceedings>SIGMOD</proceedings>
    <year>2006</year>
  </article>
</Bib>
(a) Document
[Figure: (b) Tree representation]
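
The node and edge model described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `edges` is hypothetical, and modelling an attribute as an `@name` node with its value as a child is an assumption, not something the slides specify.

```python
import xml.etree.ElementTree as ET

def edges(elem):
    """Yield (parent, child) label pairs for one element and its subtree."""
    # Assumption: an attribute becomes an "@name" node whose child is its value.
    for name, value in elem.attrib.items():
        yield (elem.tag, "@" + name)
        yield ("@" + name, value)
    # Element-value edge for non-empty text content.
    text = (elem.text or "").strip()
    if text:
        yield (elem.tag, text)
    # Element-subelement edges, followed by the edges of each subtree.
    for child in elem:
        yield (elem.tag, child.tag)
        yield from edges(child)

root = ET.fromstring('<Bib><article vol="7"><title>t1</title></article></Bib>')
for parent_label, child_label in edges(root):
    print(parent_label, "->", child_label)
```

The pairs carry labels only, so two elements with the same tag are not distinguished; a real tree representation would give each node its own identity.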
5
Publish-subscribe systems (cont.)

User profiles are expressed in an XML query
language (XPath, XQuery)
An XML query contains:
structural constraints
value-based constraints

Example query: //article[/author[@last="Smith"]]//procs[@conf="VLDB"]
[Figure: tree pattern with nodes article, author, last, proceedings, conf]
6
Related Work/Our Contribution

Current work:
Construction of overlay networks
Dissemination/indexing of profiles (queries)
Processing of streams of messages
We focus on the matching process that takes place
within a broker:
BUFF improves the performance of a regular FSM by using
a bottom-up evaluation of the document
BoXFilter is an index-based filtering technique that
performs early pruning of the query profiles

8
Bottom-up vs. Top-down filtering

State machines are among the most common methods
for the XML matching process.
Top-down approach: traverses the document in
depth-first order, advancing the state machine
for each XML element (or attribute) read.
It does not consider any form of early pruning.
Bottom-up approach: takes into consideration
the (usual) fact that an XML document has its
more selective elements located at its leaves.

9
Example

The top-down approach groups the queries according to
their common prefixes.
The bottom-up approach groups them according to their
common suffixes.
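
The difference between the two groupings can be illustrated with a plain trie. This is a minimal sketch, not the state machines from the slides: the three query paths and the helper names `build_trie` and `count_states` are made up for illustration.

```python
def build_trie(paths):
    """Insert each path (a list of element names) into a nested-dict trie."""
    root = {}
    for path in paths:
        node = root
        for step in path:
            node = node.setdefault(step, {})
    return root

def count_states(trie):
    """Count trie nodes, excluding the root."""
    return sum(1 + count_states(child) for child in trie.values())

# Hypothetical queries that share a suffix (b/c) but no prefix.
queries = [["a", "b", "c"], ["e", "b", "c"], ["f", "b", "c"]]

top_down = build_trie(queries)                               # shares prefixes
bottom_up = build_trie([list(reversed(q)) for q in queries]) # shares suffixes

print(count_states(top_down))   # 9
print(count_states(bottom_up))  # 5
```

Which grouping is smaller depends entirely on the query workload; this example was chosen so that the suffix sharing is visible.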

[Figure: (a) Document; (b) Queries Q1 to Q6; (c) Top-down; (d) Bottom-up]
10
BUFF

FSM-based Bottom-up approach for XML filtering.
BUFF avoids translating documents and queries to
Prüfer sequences (as the other algorithms do),
and employs a more direct evaluation algorithm.
The document is parsed through a SAX parser,
which triggers events for specific marks (tags)
in the XML document
The machine keeps a runtime stack that stores the
current document path being processed.
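
The SAX-plus-stack arrangement can be sketched with Python's standard `xml.sax` module. This is an assumption-laden illustration, not BUFF itself: the class `PathStackHandler` and the function `leaf_paths` are hypothetical names, and the sketch only records the stack contents at each leaf instead of driving a state machine.

```python
import xml.sax

class PathStackHandler(xml.sax.ContentHandler):
    """Keeps a runtime stack holding the current document path."""

    def __init__(self):
        super().__init__()
        self.stack = []        # element names from the root to the current node
        self.has_child = []    # parallel stack: has this element seen a child?
        self.leaf_paths = []   # root-to-leaf paths, recorded at each leaf

    def startElement(self, name, attrs):
        if self.has_child:
            self.has_child[-1] = True
        self.stack.append(name)
        self.has_child.append(False)

    def endElement(self, name):
        # An element that closes without children is a leaf: a bottom-up
        # matcher could start from here and walk the stack towards the root.
        if not self.has_child.pop():
            self.leaf_paths.append(list(self.stack))
        self.stack.pop()

def leaf_paths(xml_text):
    handler = PathStackHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.leaf_paths

print(leaf_paths("<a><b><c/></b><d/></a>"))  # [['a', 'b', 'c'], ['a', 'd']]
```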

11
BUFF Example
[Figure: (a) Document and BUFF, with queries Q1 and Q2; panels (b) to (g)]
13
Bounding-based XML Filtering

Two major processes working asynchronously
Profile Management
Profile Matching

[Figure: system architecture with components Profiles (queries), Input Documents, Prüfer Sequence, Profile Manager, Matching Module, Matching Algorithm, Matched Documents]
14
Prüfer Sequence

A unique sequential encoding of a labeled tree
Algorithm:
Iteratively remove nodes from the tree until
all nodes but the last two have been removed.
At each iteration, find and remove the leaf with
the smallest label, and add the label of that
leaf's parent to the Prüfer sequence.
Theorem: If a query tree Q is a subgraph of a
document tree D, then the Prüfer sequence of Q is
a subsequence of the Prüfer sequence of D.
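
The construction above can be sketched as follows. The input format is an assumption made for this example: the tree is given as a hypothetical `parent` dictionary mapping each non-root node to its parent, and nodes carry unique integer labels (for instance postorder numbers). A "leaf" here means a node with no remaining children, so the root is never removed. The `is_subsequence` helper shows the kind of test the theorem suggests.

```python
import heapq

def prufer_sequence(parent):
    """Prüfer sequence of a rooted tree given as {child: parent}."""
    child_count = {}
    for par in parent.values():
        child_count[par] = child_count.get(par, 0) + 1
    nodes = set(parent) | set(parent.values())
    # Min-heap of current leaves (nodes without remaining children).
    leaves = [n for n in nodes if child_count.get(n, 0) == 0]
    heapq.heapify(leaves)
    remaining = len(nodes)
    sequence = []
    while remaining > 2:
        leaf = heapq.heappop(leaves)      # leaf with the smallest label
        par = parent[leaf]
        sequence.append(par)              # record the parent's label
        child_count[par] -= 1
        remaining -= 1
        # The parent becomes a leaf once its last child is gone (never the root).
        if child_count[par] == 0 and par in parent:
            heapq.heappush(leaves, par)
    return sequence

def is_subsequence(query, document):
    """True if query appears in document in order, not necessarily contiguously."""
    it = iter(document)
    return all(symbol in it for symbol in query)

# Nodes 1 and 2 are children of 3; nodes 3 and 4 are children of the root 5.
print(prufer_sequence({1: 3, 2: 3, 3: 5, 4: 5}))  # [3, 3, 5]
```

The theorem gives a necessary condition only, so a subsequence hit still has to be verified against the actual tree structure.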

15
Sequence Envelope

Assume a set of k Prüfer sequences S1, ..., Sk
representing user profiles.
We can derive two new sequences:
Upper bound U: for each position, take the largest
element
Lower bound L: for each position, take the smallest
element
L and U form the smallest possible bounding
envelope that encloses all members of the set
of sequences from above and below.

16
Example

Assume 3 sequences with 11 symbols each
abcabababcd
cdcdecdcdec
dedededebab
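
For these three sequences the envelope can be computed position by position. The sketch below assumes equal-length sequences and ordinary character ordering; the function name `envelope` is hypothetical.

```python
def envelope(sequences):
    """Return (lower, upper): position-wise minimum and maximum symbols."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("this sketch assumes equal-length sequences")
    columns = list(zip(*sequences))          # one tuple of symbols per position
    lower = "".join(min(col) for col in columns)
    upper = "".join(max(col) for col in columns)
    return lower, upper

profiles = ["abcabababcd", "cdcdecdcdec", "dedededebab"]
lower, upper = envelope(profiles)
print(lower)  # abcabababab
print(upper)  # dedeeededed
```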

17
Sequence Envelope (Cont.)

A key property of the sequence envelope is that it
can be used as an aggregate of its underlying set of
sequences

18
BoXFilter Tree

Sequence envelopes can be nested, forming a
BoXFilter tree
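
Nesting can be sketched by treating the bounds of child envelopes as sequences themselves. This is a minimal illustration under assumptions: equal-length sequences, a hypothetical `Node` class, and hand-picked groups; how the actual system groups profiles into nodes is not stated on the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lower: str
    upper: str
    children: list = field(default_factory=list)  # child Nodes (inner node)
    profiles: list = field(default_factory=list)  # sequences (leaf node)

def make_leaf(sequences):
    """Envelope over a group of profile sequences."""
    columns = list(zip(*sequences))
    return Node("".join(min(c) for c in columns),
                "".join(max(c) for c in columns),
                profiles=list(sequences))

def make_parent(children):
    """Envelope over child envelopes: min of lowers, max of uppers."""
    lower = "".join(min(c) for c in zip(*(n.lower for n in children)))
    upper = "".join(max(c) for c in zip(*(n.upper for n in children)))
    return Node(lower, upper, children=list(children))

left = make_leaf(["abc", "bcd"])    # lower "abc", upper "bcd"
right = make_leaf(["cab", "dda"])   # lower "caa", upper "ddb"
root = make_parent([left, right])
print(root.lower, root.upper)       # aaa ddd
```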

19
Filtering algorithms

The profiles in the system are organized in a
BoXFilter tree. Documents are passed through
the tree.
There are two variations of the filtering
algorithm:
Sequential: documents are processed one by one
Batch: documents are organized in a tree like
the queries, and the two trees are joined
After the traversal of the BoXFilter tree, there
is a verification step
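
A sequential traversal with early pruning might look like the sketch below. The pruning test is an assumption chosen for illustration, not the criterion used by the authors: a subtree is skipped when the document sequence does not contain, in order, one symbol inside each [lower, upper] range of the envelope. The verification step is represented by a plain subsequence check. Tree nodes are hypothetical tuples `(lower, upper, kind, items)`.

```python
def envelope_may_match(lower, upper, doc):
    """Necessary condition: doc holds, in order, a symbol within each range."""
    it = iter(doc)
    # any() consumes the iterator up to the first symbol that fits the range.
    return all(any(lo <= sym <= hi for sym in it)
               for lo, hi in zip(lower, upper))

def is_subsequence(query, doc):
    it = iter(doc)
    return all(sym in it for sym in query)

def filter_document(node, doc):
    """Return the profiles under node that survive pruning and verification."""
    lower, upper, kind, items = node
    if not envelope_may_match(lower, upper, doc):
        return []                                   # early pruning of a subtree
    if kind == "leaf":
        return [p for p in items if is_subsequence(p, doc)]   # verification
    matched = []
    for child in items:
        matched.extend(filter_document(child, doc))
    return matched

left = ("abc", "bcd", "leaf", ["abc", "bcd"])
right = ("caa", "ddb", "leaf", ["cab", "dda"])
root = ("aaa", "ddd", "inner", [left, right])

print(filter_document(root, "xabyc"))  # ['abc']
```

In this run the right subtree is pruned without looking at its profiles, while the left subtree passes the envelope test and only `abc` survives verification.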

21
Experimental Results

We generated datasets with 1,000, 10,000, and
100,000 small documents (up to 8 KB each)
We generated up to 100,000 queries, with
selectivity fixed to 50%

22
Experimental Results (cont.)

In this set of experiments, we vary the number
of documents that match any of the profile
queries. (Selectivity 1% means that one percent
of the documents satisfy any of the queries.)

23
Thank You!
