Query subscription - PowerPoint PPT Presentation

Provided by: sergeab
Transcript and Presenter's Notes

1
Query subscription
2
The Web changes all the time
  • Crawler that crawls the web
  • Filtering of the flow of documents based on some
    query subscriptions

3
Subscription Language
  • SQL-like language based on atomic events.
  • Combines the use of monitoring queries and
    continuous queries.
  • The language can be extended by adding new types
    of atomic events.
  • Uses an XML Query Language for continuous queries

4
Example
  • subscription myPaintings
  • (comment) what are the new painting entries on the Musée
    d'Orsay site
  • monitoring newPainting
  • select URL
  • where URL extends www.musee-orsay.fr/
  • and <painter> contains Monet
  • (comment) manage the changes in the expositions
  • continuous delta Exposition
  • select ... from ... where
  • when monthly
  • notify daily (comment: send me a daily report)
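
The monitoring part of the subscription above can be pictured as plain data: a named query whose conditions are a conjunction of atomic events. The sketch below is only an illustration; the class and field names (AtomicCondition, MonitoringQuery, Subscription, "url_extends", "tag_contains") are hypothetical and are not taken from Xyleme. The continuous query is left out because the slide elides its body.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class AtomicCondition:
    # Hypothetical encoding of one atomic event.
    kind: str        # e.g. "url_extends" or "tag_contains"
    target: str      # a URL prefix, or a tag name
    value: str = ""  # keyword looked for inside the tag, if any


@dataclass
class MonitoringQuery:
    name: str
    select: str
    conditions: List[AtomicCondition]  # all must hold (conjunction)


@dataclass
class Subscription:
    name: str
    monitoring: List[MonitoringQuery] = field(default_factory=list)
    notify: str = "daily"


my_paintings = Subscription(
    name="myPaintings",
    monitoring=[
        MonitoringQuery(
            name="newPainting",
            select="URL",
            conditions=[
                AtomicCondition("url_extends", "www.musee-orsay.fr/"),
                AtomicCondition("tag_contains", "painter", "Monet"),
            ],
        )
    ],
)
```

Keeping the conditions as separate atomic items is what lets two subscriptions that share a condition share the same atomic event.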

5
Step 1: Atomic Event Detection
  • Input flow: 5 million pages/day
  • Atomic event 46: URL matches pattern www.musee-orsay.fr/
  • Atomic event 67: XML document contains the tag <painter>
    with the value Monet
  • Diagram labels: atomic events, metadata manager, XML
    loader, complex event detection
  • Document alerts shown in the diagram: d/46 and d/46,67
6
Alerters
  • Each Alerter can be viewed as a plug-in that acts
    on a document flow.
  • All sorts of atomic events can be detected: URL
    patterns, keywords, XPath expressions, page rank
  • Can be distributed.
  • Some advanced alerts are:
  • Long string look-ups
  • Finding XML Patterns (e.g. XPath)
  • Comparing digital signature of text documents
    (copy tracker)
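
A minimal sketch of the plug-in idea, assuming each alerter is just a function that looks at one document and returns the ids of the atomic events it detected. The event ids 46 and 67 come from the Step 1 slide; the function names and the (url, text) signature are assumptions made for illustration, not Xyleme's interface.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Set

# Assumed plug-in shape: (url, document text) -> ids of detected atomic events.
Alerter = Callable[[str, str], Set[int]]


def url_alerter(url: str, text: str) -> Set[int]:
    # Atomic event 46: URL matches the pattern www.musee-orsay.fr/
    return {46} if url.startswith("www.musee-orsay.fr/") else set()


def painter_alerter(url: str, text: str) -> Set[int]:
    # Atomic event 67: the XML document has a <painter> tag containing Monet.
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return set()  # not well-formed XML, so this alerter stays silent
    for element in root.iter("painter"):
        if "Monet" in "".join(element.itertext()):
            return {67}
    return set()


def detect(url: str, text: str, alerters: Iterable[Alerter]) -> List[int]:
    """Run every plug-in on the document and merge the detected event ids."""
    events: Set[int] = set()
    for alerter in alerters:
        events |= alerter(url, text)
    return sorted(events)


alert = detect(
    "www.musee-orsay.fr/collection.xml",
    "<painting><painter>Claude Monet</painter></painting>",
    [url_alerter, painter_alerter],
)
```

Because each alerter only depends on the document it receives, the list of plug-ins can be split across machines, which matches the "can be distributed" point.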

7
URL Patterns Detection
  • Supported patterns:
  • URL prefix, suffix
  • Using a hash table, try all possible patterns
  • Each test is O(1); total test time is O(n), where n is
    the length of the URL
  • Example: http://www.inria.fr/verso/index.html
  • Test:
  • http://www.inria.fr/verso/
  • http://www.inria.fr/
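
The prefix test can be sketched as follows, assuming the registered patterns are stored in a hash table (a Python dict) and that candidate prefixes are cut at each "/" of the URL. The event ids 7 and 12 and the function names are made up for the example.

```python
from typing import Dict, List


def url_prefixes(url: str) -> List[str]:
    """Return every prefix of url ending at a '/', plus the full URL."""
    scheme_end = url.find("://")
    start = scheme_end + 3 if scheme_end != -1 else 0  # skip "http://"
    prefixes = [url[: i + 1] for i in range(start, len(url)) if url[i] == "/"]
    if not url.endswith("/"):
        prefixes.append(url)
    return prefixes


def match_url(url: str, registered: Dict[str, int]) -> List[int]:
    """Look up each candidate prefix in the hash table of registered patterns."""
    return [registered[p] for p in url_prefixes(url) if p in registered]


# Hypothetical registrations: pattern -> atomic event id.
registered = {
    "http://www.inria.fr/": 7,
    "http://www.inria.fr/verso/": 12,
}

candidates = url_prefixes("http://www.inria.fr/verso/index.html")
matches = match_url("http://www.inria.fr/verso/index.html", registered)
```

The number of candidate prefixes is bounded by the length of the URL, so the work per document does not grow with the number of registered patterns.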

8
Stemming
  • On the Alerter
  • Example: Éléphant → ELEPHANT
  • Noise may be introduced
  • (Example: tâche "task" and tache "stain" collapse to the
    same form)
  • On the Subscription Manager
  • To avoid duplicate registration of similar events
  • To show users how their query is stemmed
  • Real stemmers and concept extraction
  • chevaux → cheval (horses → horse)
  • Composite words: beau-fils → gendre (both can mean
    "son-in-law")
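
The alerter-side normalization in the first example can be approximated by stripping accents and upper-casing. This is a sketch of that one step only, using Python's standard unicodedata module; it is not a real stemmer, so it will not map chevaux to cheval, and the function name is an assumption.

```python
import unicodedata


def normalize(word: str) -> str:
    """Strip accents and upper-case a word (a crude normalization, not a stemmer)."""
    # NFD splits an accented letter into the base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.upper()


elephant = normalize("Éléphant")
# The noise mentioned on the slide: two different words share one normalized form.
collision = normalize("tâche") == normalize("tache")
```

Applying the same function when a subscription is registered is one way to avoid registering two atomic events that the alerter could never tell apart.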

9
Step 2: Complex Event Detection
  • Scale: millions of alerts and pages per day; millions of
    subscriptions
  • Complex event 12 = atomic events 67 and 46 (XML document
    contains the tag <painter> with value Monet, and URL
    matches pattern www.musee-orsay.fr/)
  • Diagram labels: HTML parser, XML loader, complex event
    detection
10
Complex Events Algorithm
  • The formal problem is NP-hard
  • We proposed several possible algorithms
  • Experimental (simulation) results showed the
    effectiveness of our solutions
  • The hash-tree based algorithm is well suited to
    our application. Reported figures:
  • 10 million complex events
  • 1 million atomic events
  • 100 atomic events detected per document
  • 0.8 ms to process a document; 2 million
    documents per day (on each PC)
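
The slides name a hash-tree based algorithm without describing it. The sketch below is one plausible reading, not the authors' implementation: each complex event is a conjunction of atomic event ids, stored along a path of a tree whose nodes are hash tables keyed by atomic event id in sorted order. Matching a document then means finding every stored path that is a subset of the document's detected events. The class and method names are assumptions.

```python
from typing import Dict, Iterable, List


class Node:
    def __init__(self) -> None:
        self.children: Dict[int, "Node"] = {}  # hash table: atomic id -> subtree
        self.complex_ids: List[int] = []        # complex events completed here


class ComplexEventIndex:
    def __init__(self) -> None:
        self.root = Node()

    def register(self, complex_id: int, atomic_ids: Iterable[int]) -> None:
        """Store a complex event as the sorted path of its atomic event ids."""
        node = self.root
        for atomic_id in sorted(set(atomic_ids)):
            node = node.children.setdefault(atomic_id, Node())
        node.complex_ids.append(complex_id)

    def match(self, detected: Iterable[int]) -> List[int]:
        """Return the complex events whose atomic events were all detected."""
        events = sorted(set(detected))
        found: List[int] = []

        def walk(node: Node, start: int) -> None:
            found.extend(node.complex_ids)
            # Only follow branches labelled with a detected event further right.
            for i in range(start, len(events)):
                child = node.children.get(events[i])
                if child is not None:
                    walk(child, i + 1)

        walk(self.root, 0)
        return sorted(found)


index = ComplexEventIndex()
index.register(12, [67, 46])  # complex event 12 = 67 and 46, as on the Step 2 slide
index.register(13, [46, 99])  # hypothetical extra subscriptions
index.register(14, [46])
matched = index.match([46, 67])
```

In this sketch the search only descends into branches whose label was actually detected, so the cost depends on the roughly 100 atomic events per document rather than on the total number of registered complex events.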

11
Step 3: Notification Processor
  • Scale: millions of notifications/day
  • Diagram labels: alerts, complex event detection,
    notification/monitoring, clock, triggers, continuous
    queries, notification/results, Reporter
12
Architecture
  • Diagram labels: Xyleme Query Processor, documents, Trigger
    Engine, Xyleme Alerter, Complex Event Detection, Xyleme
    Reporter, Xyleme Subscription Manager, SQL, Web Browser