Early Profile Pruning on XML-aware Publish-Subscribe Systems - PowerPoint PPT Presentation

About This Presentation
Title:

Early Profile Pruning on XML-aware Publish-Subscribe Systems

Description:

Early Profile Pruning on XML-aware Publish-Subscribe Systems. Mirella M. Moro, Petko Bakalov, Vassilis J. Tsotras, University of California, Riverside ...

Slides: 24
Provided by: ucr98
Learn more at: http://alumni.cs.ucr.edu

Transcript and Presenter's Notes



1
Early Profile Pruning on XML-aware
Publish-Subscribe Systems
  • Mirella M. Moro, Petko Bakalov, Vassilis J.
    Tsotras
  • University of California, Riverside

2
Overview
  • Motivation
  • Bottom-up Filtering FSM (BUFF)
  • Bounding-based XML Filtering (BoxFilter)
  • Core Modules
  • Filtering algorithms
  • Experimental results

3
Motivation
  • Publish-subscribe systems: message transmission
    is determined by the message content
  • Examples: notification websites such as
    hotwire.com or ticketmaster.com

[Figure: publishers send documents to the matching algorithm; subscribers submit, update, and delete profiles and receive the matching results]
4
Publish-subscribe systems
  • The data is exchanged in XML format.
  • Nodes correspond to elements, attributes, or
    text values
  • Edges represent immediate element-subelement or
    element-value relationships

<Bib>
  <article vol="7" no="11">
    <title>t1</title>
    <author>
      <last>DeWitt</last> <mi>J</mi> <first>David</first>
    </author>
    <journal>TPDS</journal>
    <year>1996</year>
  </article>
  <article>
    <title>t2</title>
    <author>
      <last>Florescu</last> <first>Daniela</first>
    </author>
    <proceedings>SIGMOD</proceedings>
    <year>2006</year>
  </article>
</Bib>
(a) Document
(b) Tree representation
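As a rough illustration of this data model, the sketch below parses the bibliography document with Python's standard xml.etree.ElementTree module and lists one root-to-leaf path for every attribute node and text-value node. The function name root_to_leaf_paths and the path notation (@name=value for attributes, "text" for values) are our own assumptions, not part of the presented system.

```python
import xml.etree.ElementTree as ET

# The bibliography document from the slide (attribute quotes restored).
BIB = """<Bib>
  <article vol="7" no="11">
    <title>t1</title>
    <author><last>DeWitt</last><mi>J</mi><first>David</first></author>
    <journal>TPDS</journal>
    <year>1996</year>
  </article>
  <article>
    <title>t2</title>
    <author><last>Florescu</last><first>Daniela</first></author>
    <proceedings>SIGMOD</proceedings>
    <year>2006</year>
  </article>
</Bib>"""


def root_to_leaf_paths(elem, prefix=""):
    """Yield a path for every attribute node and text-value node below elem."""
    path = f"{prefix}/{elem.tag}"
    # Attribute nodes hang off their element.
    for name, value in elem.attrib.items():
        yield f"{path}/@{name}={value}"
    # A non-blank text value is a leaf under its element (element-value edge).
    text = (elem.text or "").strip()
    if text:
        yield f'{path}="{text}"'
    # Element-subelement edges: recurse into the children in document order.
    for child in elem:
        yield from root_to_leaf_paths(child, path)


for p in root_to_leaf_paths(ET.fromstring(BIB)):
    print(p)
```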
5
Publish-subscribe systems (cont.)
  • User profiles are expressed in an XML query
    language (XPath, XQuery)
  • An XML query contains
  • structural constraints
  • value-based constraints

[Figure: example query with structural constraints and value-based constraints (author @last = "Smith", proceedings @conf = "VLDB"), drawn as a tree pattern over the nodes article, proceedings, author, conf, last]
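A small, hypothetical illustration of the two kinds of constraints, using the limited XPath subset that Python's xml.etree.ElementTree supports (child and descendant steps plus simple [@attr='value'] and [tag='text'] predicates; this is not full XPath or XQuery). The document and the profiles P1 to P4 are invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<Bib>
  <article><author last="Smith"/><proceedings conf="VLDB"/></article>
  <article><author last="DeWitt"/><journal>TPDS</journal></article>
</Bib>"""

# Hypothetical profiles written in ElementTree's XPath subset.
PROFILES = {
    "P1": ".//article/proceedings",                  # structural constraints only
    "P2": ".//article/author[@last='Smith']",        # plus a value constraint on an attribute
    "P3": ".//article/proceedings[@conf='SIGMOD']",  # value constraint that fails on DOC
    "P4": ".//article[journal='TPDS']",              # value constraint on a child's text
}


def matching_profiles(xml_text, profiles):
    """Return the ids of the profiles that select at least one node of the document."""
    root = ET.fromstring(xml_text)
    return sorted(pid for pid, expr in profiles.items() if root.findall(expr))


print(matching_profiles(DOC, PROFILES))  # ['P1', 'P2', 'P4']
```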
6
Related Work/Our Contribution
  • Existing work addresses
  • Construction of the overlay network
  • Dissemination/indexing of profiles (queries)
  • Processing of streams of messages
  • We focus on the matching process that takes place
    within a broker
  • We improve the performance of a regular FSM by
    evaluating the document bottom-up (BUFF)
  • We develop an index-based filtering technique
    that performs early pruning of the query
    profiles (BoXFilter)

8
Bottom-up vs. Top-down filtering
  • State machines are among the most common methods
    for the XML matching process
  • Top-down approach: traverses the document in
    depth-first (document) order, advancing the state
    machine for each XML element (or attribute) read
  • Does not perform any form of early pruning
  • Bottom-up approach: exploits the (usual) fact
    that the more selective elements of an XML
    document are located at its leaves
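A simplified top-down sketch, assuming only absolute, child-only path queries such as /a/b/c (no // steps or predicates): queries that share a prefix share the states of a trie, and a SAX handler from Python's standard xml.sax module advances the machine on each start tag and restores the previous states on each end tag. The names build_prefix_trie, TopDownMatcher, and match_top_down are hypothetical; this is a generic prefix-sharing automaton, not the specific machine used in the presentation. Note that every element is still read after no state is active, which reflects the lack of early pruning.

```python
import xml.sax


def build_prefix_trie(queries):
    """Group queries by common prefixes; queries maps an id to a path like '/a/b/c'."""
    trie = {"next": {}, "accept": []}
    for qid, path in queries.items():
        node = trie
        for step in path.strip("/").split("/"):
            node = node["next"].setdefault(step, {"next": {}, "accept": []})
        node["accept"].append(qid)
    return trie


class TopDownMatcher(xml.sax.ContentHandler):
    """Advance the machine on every start tag; undo the step on the matching end tag."""

    def __init__(self, trie):
        super().__init__()
        self.stack = [[trie]]  # active states per open element (start state at the bottom)
        self.matched = set()

    def startElement(self, name, attrs):
        # Even when no state is active, the parser keeps reading the subtree.
        active = [s["next"][name] for s in self.stack[-1] if name in s["next"]]
        for state in active:
            self.matched.update(state["accept"])
        self.stack.append(active)

    def endElement(self, name):
        self.stack.pop()


def match_top_down(xml_text, queries):
    handler = TopDownMatcher(build_prefix_trie(queries))
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return sorted(handler.matched)


print(match_top_down("<a><b><c/></b><d/></a>",
                     {"Q1": "/a/b/c", "Q2": "/a/d", "Q3": "/a/c"}))  # ['Q1', 'Q2']
```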

9
Example
  • The top-down approach groups the queries
    according to their common prefixes
  • The bottom-up approach groups them according to
    their common suffixes

[Figure: (a) Document; (b) Queries Q1 to Q6; (c) Top-down; (d) Bottom-up]
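To make the difference concrete, the sketch below counts how many trie states (excluding the start state) are needed when the same linear queries are grouped by common prefixes (top-down) versus common suffixes (bottom-up). The four queries and the function name count_states are invented for illustration; which grouping is smaller depends entirely on the workload.

```python
def count_states(paths, from_end=False):
    """Count trie states when paths share common prefixes (or suffixes if from_end)."""
    states = set()
    for path in paths:
        steps = path.strip("/").split("/")
        if from_end:
            steps.reverse()  # bottom-up: read each path starting from its last step
        for i in range(1, len(steps) + 1):
            states.add(tuple(steps[:i]))  # each distinct prefix is one shared state
    return len(states)


QUERIES = ["/a/b/c", "/a/b/d", "/e/b/c", "/f/b/c"]
print(count_states(QUERIES))                 # 10 (grouped by common prefixes)
print(count_states(QUERIES, from_end=True))  # 8 (grouped by common suffixes)
```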
10
BUFF
  • FSM-based Bottom-up approach for XML filtering.
  • BUFF avoids translating documents and queries
    into Prüfer sequences (as other algorithms do)
    and employs a more direct evaluation algorithm
  • The document is parsed with a SAX parser, which
    triggers an event for each tag in the XML
    document
  • The machine keeps a runtime stack that stores the
    current document path being processed.
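The following is a much simplified, BUFF-inspired sketch and not the authors' algorithm. It assumes queries of the form //x/y/z (a leading descendant step followed by child steps), stores them in a trie keyed by their steps in reverse so that common suffixes are shared, keeps the current document path on a runtime stack fed by Python's xml.sax events, and, when an element closes, reads the path from the top of the stack downwards, stopping as soon as no query suffix matches. The query syntax and all names (build_suffix_trie, BottomUpMatcher, match_bottom_up) are hypothetical.

```python
import xml.sax


def build_suffix_trie(queries):
    """Group queries like '//b/c' by common suffixes (steps inserted last to first)."""
    trie = {"next": {}, "accept": []}
    for qid, path in queries.items():
        node = trie
        for step in reversed(path.strip("/").split("/")):
            node = node["next"].setdefault(step, {"next": {}, "accept": []})
        node["accept"].append(qid)
    return trie


class BottomUpMatcher(xml.sax.ContentHandler):
    def __init__(self, trie):
        super().__init__()
        self.trie = trie
        self.path = []  # runtime stack holding the current document path
        self.matched = set()

    def startElement(self, name, attrs):
        self.path.append(name)

    def endElement(self, name):
        # The element on top of the stack is complete: read the path upwards from it.
        node = self.trie
        for tag in reversed(self.path):
            node = node["next"].get(tag)
            if node is None:
                break  # no query has this suffix: stop reading the path early
            self.matched.update(node["accept"])
        self.path.pop()


def match_bottom_up(xml_text, queries):
    handler = BottomUpMatcher(build_suffix_trie(queries))
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return sorted(handler.matched)


print(match_bottom_up("<a><b><c/></b><e><c/></e></a>",
                      {"Q1": "//b/c", "Q2": "//c", "Q3": "//d/c", "Q4": "//a/e"}))
# ['Q1', 'Q2', 'Q4']
```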

11
BUFF Example
[Figure: (a) Document and BUFF for queries Q1 and Q2, followed by panels (b) to (g)]
13
Bounding-based XML Filtering
  • Two major processes that work asynchronously:
  • Profile Management
  • Profile Matching

[Figure: the Profile Manager handles the profiles (queries) and their Prüfer sequences; the Matching Module runs the matching algorithm on the input documents and outputs the matched documents]
14
Prüfer Sequence
  • A unique sequential encoding of a labeled tree
  • Algorithm
  • Iteratively removes nodes from the tree until
    only two nodes remain
  • At each iteration, the algorithm finds and
    removes the leaf with the smallest label and adds
    to the Prüfer sequence the label of that leaf's
    parent
  • Theorem: if a query tree Q is a subgraph of a
    document tree D, then the Prüfer sequence of Q is
    a subsequence of the Prüfer sequence of D
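The construction can be sketched as follows, under our own assumptions: nodes are numbered in postorder (so the number plays the role of the "label" that decides which leaf is removed first), and besides the parent's number we also record the parent's tag, since tags are what a filter would compare. The trees, tags, and function names are invented. The subsequence test is only a necessary condition: passing it does not by itself prove that Q matches D.

```python
import heapq


def postorder_number(tree):
    """Number the nodes of tree = (tag, [children]) in postorder, starting at 1."""
    tags, parent = {}, {}
    counter = 0

    def visit(node):
        nonlocal counter
        tag, children = node
        child_ids = [visit(child) for child in children]
        counter += 1
        tags[counter] = tag
        for cid in child_ids:
            parent[cid] = counter
        return counter

    visit(tree)
    return tags, parent


def pruefer(tree):
    """Return the (numbered, tag) Prüfer sequences of a rooted, tagged tree."""
    tags, parent = postorder_number(tree)
    n = len(tags)  # in postorder the root gets number n
    degree = dict.fromkeys(tags, 0)
    for child, par in parent.items():
        degree[child] += 1
        degree[par] += 1
    # The root is never the smallest leaf while 3+ nodes remain, so it is left out.
    leaves = [v for v in tags if degree[v] == 1 and v != n]
    heapq.heapify(leaves)
    numbers = []
    for _ in range(n - 2):  # stop when only two nodes remain
        leaf = heapq.heappop(leaves)  # leaf with the smallest number
        p = parent[leaf]  # its only remaining neighbour
        numbers.append(p)
        degree[p] -= 1
        if degree[p] == 1 and p != n:
            heapq.heappush(leaves, p)
    return numbers, [tags[v] for v in numbers]


def is_subsequence(small, big):
    it = iter(big)
    return all(any(x == y for y in it) for x in small)


doc = ("A", [("B", [("C", []), ("D", [])]), ("E", [])])
q_match = ("A", [("B", [("C", [])])])
q_miss = ("A", [("E", [("C", [])])])
print(pruefer(doc))  # ([3, 3, 5], ['B', 'B', 'A'])
print(is_subsequence(pruefer(q_match)[1], pruefer(doc)[1]))  # True
print(is_subsequence(pruefer(q_miss)[1], pruefer(doc)[1]))   # False
```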

15
Sequence Envelope
  • Assume a set of k Prüfer sequences S1, ..., Sk
    representing user profiles
  • We can derive two new sequences
  • Upper bound U: at each position, take the largest
    element
  • Lower bound L: at each position, take the
    smallest element
  • L and U form the tightest envelope that bounds
    every sequence in the set from above and below

16
Example
  • Assume 3 sequences with 11 symbols each
  • abcabababcd
  • cdcdecdcdec
  • dedededebab
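Computing the envelope of these three sequences position by position gives the bounds printed below. The sketch assumes that symbols are compared alphabetically, which the slides do not specify; sequence_envelope is a hypothetical name.

```python
def sequence_envelope(seqs):
    """Per-position lower (L) and upper (U) bounds of equal-length sequences."""
    columns = list(zip(*seqs))
    lower = "".join(min(col) for col in columns)
    upper = "".join(max(col) for col in columns)
    return lower, upper


S = ["abcabababcd", "cdcdecdcdec", "dedededebab"]
L, U = sequence_envelope(S)
print(L, U)  # abcabababab dedeeededed
```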

17
Sequence Envelope (Cont.)
  • The key property of the sequence envelope is that
    it can be used as an aggregate of the set of
    sequences it bounds
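One way an envelope can act as an aggregate, offered as our own illustration rather than the presentation's exact pruning rule: every member S satisfies L[j] <= S[j] <= U[j] at each position j, so if a document sequence cannot embed the intervals [L[0], U[0]], [L[1], U[1]], ... in order, no member can be a subsequence of it and the whole group is skipped with a single test. The greedy scan used here (take the earliest fitting symbol) finds such an embedding whenever one exists. The name may_contain_member is hypothetical.

```python
def envelope(seqs):
    cols = list(zip(*seqs))
    return "".join(min(c) for c in cols), "".join(max(c) for c in cols)


def may_contain_member(lower, upper, doc_seq):
    """Return False only when no sequence bounded by (lower, upper) is a subsequence of doc_seq."""
    j = 0
    for lo, hi in zip(lower, upper):
        # Greedily take the earliest remaining document symbol inside [lo, hi].
        while j < len(doc_seq) and not (lo <= doc_seq[j] <= hi):
            j += 1
        if j == len(doc_seq):
            return False  # this position cannot be matched: prune the whole group
        j += 1
    return True


lower, upper = envelope(["ab", "cd"])
print(may_contain_member(lower, upper, "zzb"))  # False: pruned without examining "ab" or "cd"
print(may_contain_member(lower, upper, "acd"))  # True: "cd" is indeed a subsequence
print(may_contain_member(lower, upper, "ad"))   # True, yet no member matches (false positive)
```

The false positive in the last call is why a test of this kind can only prune; matches still need to be verified individually.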

18
BoXFilter Tree
  • Sequence envelopes can be nested, forming a
    BoXFilter tree
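A minimal sketch of the nesting, assuming equal-length sequences, alphabetical symbol order, and a simple grouping of consecutive sequences with a fixed fan-out (build_levels and fanout are hypothetical names; a real index would presumably cluster similar sequences instead). Each parent envelope bounds all of its children's bounds, so the root bounds every profile.

```python
def envelope(seqs):
    cols = list(zip(*seqs))
    return "".join(min(c) for c in cols), "".join(max(c) for c in cols)


def build_levels(seqs, fanout=2):
    """Build nested envelopes bottom-up; returns the levels from the leaves to the root."""
    level = [(s, s) for s in seqs]  # a single sequence is its own degenerate envelope
    levels = [level]
    while len(level) > 1:
        # Each parent envelope bounds every lower and upper bound of its children.
        level = [envelope([bound for child in level[i:i + fanout] for bound in child])
                 for i in range(0, len(level), fanout)]
        levels.append(level)
    return levels


levels = build_levels(["abcabababcd", "cdcdecdcdec", "dedededebab"])
for depth, level in enumerate(levels):
    print(depth, level)
```

On the three example sequences the root envelope is ("abcabababab", "dedeeededed"), the same as the envelope of the whole set.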

19
Filtering algorithms
  • The profiles in the system are organized in a
    BoXFilter tree, and documents are filtered by
    traversing the tree
  • There are two variations of the filtering
    algorithm
  • Sequential: documents are processed one by one
  • Batch processing: documents are organized in a
    tree, like the queries, and the two trees are
    joined
  • After the traversal of the BoXFilter tree, a
    verification step follows
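The sequential variant can be sketched as a depth-first traversal that skips any subtree whose envelope fails a pruning test, followed by a verification step on the profiles that survive. Everything here is an illustrative assumption: the Node layout, the greedy envelope test, the equal-length toy sequences, and the use of an exact subsequence test as the verification step (a complete system would still have to check the tree structure, since a subsequence match is only a necessary condition).

```python
from dataclasses import dataclass, field


def envelope(seqs):
    """Per-position (lower, upper) bounds of equal-length sequences."""
    cols = list(zip(*seqs))
    return "".join(min(c) for c in cols), "".join(max(c) for c in cols)


def may_contain_member(lower, upper, doc_seq):
    """Greedy check: False means no bounded sequence can be a subsequence of doc_seq."""
    j = 0
    for lo, hi in zip(lower, upper):
        while j < len(doc_seq) and not (lo <= doc_seq[j] <= hi):
            j += 1
        if j == len(doc_seq):
            return False
        j += 1
    return True


def is_subsequence(small, big):
    it = iter(big)
    return all(any(x == y for y in it) for x in small)


@dataclass
class Node:
    lower: str
    upper: str
    children: list = field(default_factory=list)  # inner node: child Nodes
    profiles: dict = field(default_factory=dict)  # leaf node: profile id -> sequence


def leaf(profiles):
    return Node(*envelope(profiles.values()), profiles=dict(profiles))


def inner(children):
    bounds = [c.lower for c in children] + [c.upper for c in children]
    return Node(*envelope(bounds), children=list(children))


def filter_document(root, doc_seq):
    """Sequential filtering: traverse the tree, prune by envelope, then verify."""
    matched, stack = [], [root]
    while stack:
        node = stack.pop()
        if not may_contain_member(node.lower, node.upper, doc_seq):
            continue  # early pruning: skip every profile below this node
        if node.children:
            stack.extend(node.children)
        else:
            # Verification step: exact subsequence test on the surviving profiles.
            matched += [pid for pid, seq in node.profiles.items()
                        if is_subsequence(seq, doc_seq)]
    return sorted(matched)


tree = inner([leaf({"P1": "ab", "P2": "cd"}), leaf({"P3": "xy", "P4": "xz"})])
print(filter_document(tree, "acd"))  # ['P2']
```

For "acd", the leaf holding P3 and P4 is pruned by its envelope without examining either profile, while P1 survives the pruning test but fails verification.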

21
Experimental Results
  • We generated datasets with 1,000, 10,000 and
    100,000 small documents (up to 8 KB each)
  • We generated up to 100,000 queries, with
    selectivity fixed at 50%

22
Experimental Results (cont.)
  • In this set of experiments, we vary the number
    of documents that match any of the profile
    queries (a selectivity of 1% means that one
    percent of the documents satisfy any of the
    queries)

23
Thank You!