Module 8 - PowerPoint PPT Presentation

Provided by: PeterF166


1
Module 8
  • Push-based communication
  • (RSS, Publish/Subscribe, Information Filtering)

2
Outline
  • Detour into web services
  • Motivation for Push
  • RSS
  • YFilter
  • XML Path Filtering
  • XML Transformations
  • Semantic Overlays
  • Context-Aware Information Filtering

3
Detour into web services
  • This is an excerpt of module 4 (which was
    omitted)
  • But web services also provide good motivation for
    the rest of today's lecture
  • Important: web services are XML all over

4
Technology overview
  • XML
  • SOAP message format
  • WSDL service description
  • UDDI service directory
  • XQuery
  • BPEL
  • XL

5
SOAP
  • SOAP: Simple Object Access Protocol
  • W3C standard, current version 1.2
  • Communication between applications (e.g., RPC,
    streams of sensor data)
  • Defines the layout (type) of messages
  • Usable over the Internet and through firewalls
  • Platform- and programming-language-independent
  • Based on XML
  • Simple and extensible
  • Basis for further standards (Encryption, ...)
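A minimal sketch of the SOAP message format, using Python's standard xml.etree.ElementTree. It wraps a request in a SOAP 1.2 envelope (namespace http://www.w3.org/2003/05/soap-envelope) and then reads it back. The operation getTopSellers, its parameters and the namespace urn:example:books are hypothetical and serve only as illustration. A real service would take them from its WSDL and send the envelope over a transport such as HTTP.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "urn:example:books"  # hypothetical application namespace

def make_request(city, season):
    """Wrap a hypothetical getTopSellers call in a SOAP 1.2 envelope."""
    ET.register_namespace("env", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")  # optional header blocks go here
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}getTopSellers")
    ET.SubElement(request, f"{{{APP_NS}}}city").text = city
    ET.SubElement(request, f"{{{APP_NS}}}season").text = season
    return ET.tostring(envelope, encoding="unicode")

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.split("}", 1)[1] if tag.startswith("{") else tag

def read_request(xml_text):
    """Return (operation name, {parameter: value}) from a SOAP request."""
    body = ET.fromstring(xml_text).find(f"{{{SOAP_ENV}}}Body")
    operation = body[0]  # the first child of Body is the payload element
    return local_name(operation.tag), {local_name(c.tag): c.text for c in operation}
```

Calling `make_request("Zurich", "Christmas")` produces an `env:Envelope` document. Because both sides agree only on XML and the envelope layout, the client and the service stay platform- and language-independent.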

6
WSDL
  • Web Service Description Language
  • Describes the interface of a web service
  • Web services are called via SOAP
  • Allows the registration of services
  • Basis for UDDI
  • Syntax is XML

7
UDDI
  • Universal Description, Discovery and Integration
  • Directory that stores WSDL descriptions
  • Jini for the Web, yellow pages
  • Communicates via SOAP messages
  • Organized in white, yellow and green pages
  • White and yellow pages: information about
    providers
  • Green pages: WSDL of services
  • IBM and Microsoft have public UDDI servers
  • Today, typically used in intranets

8
Use cases for Web Services
  • Automation of processes
  • Enterprise Application Integration (EAI)
  • Workflow management
  • Data integration
  • Enterprise Information Integration
    (EII): connectivity, global data model
  • Portals

Integration, Integration, Integration
9
Application Integration
[Figure: four applications (App) wired point-to-point through interfaces I/VI, II/V, III/IV, VII/VIII and VII/IX]
10
Application Integration
[Figure: the same point-to-point integration with additional interfaces XI and XII]
11
Application Integration
[Figure: the same point-to-point integration with interfaces X and XI added]
  • What impact do delays have?
  • Who is affected by a change in one interface?
  • How can this process be optimized?
  • What about humans? How can a grid of
    machines be exploited?

12
Loose Coupling of Apps
[Figure: four applications (App) connected through a central message broker]
13
Loose Coupling of Apps (Web Services)
[Figure: web services: applications expose WSDL interfaces and exchange SOAP messages through a message broker; open questions: routing? virtualisation?]
14
Virtualization of Applications
[Figure: web services: an application asks the message broker "I want all x!"; the broker uses find and bind to locate matching services through their WSDL descriptions]
15
Virtualization of Applications
[Figure: as before, with the partial result x1 and a second request "I want y!"]
16
Virtualization of Applications
[Figure: as before, with the results y, x1 and x2, x3 flowing through the message broker]
17
Virtualization of Applications
[Figure: as before; the broker delivers the combined answer x1, x2, x3 to the application that asked for all x]
18
Summary Web Services
  • There is a lot more to web services; see module 4
  • Important:
  • Loose coupling of applications
  • Virtualization
  • Service discovery
  • Common protocols
  • And now back to our main track: push-style
    interaction

19
Outline
  • Detour into web services
  • Motivation for Push
  • RSS
  • YFilter
  • XML Path Filtering
  • XML Transformations
  • Semantic Overlays
  • Context-Aware Information Filtering

20
Is Pull the winner?
  • Most of our interaction with the Web and
    Databases is Pull
  • Browse the web to find the pictures of my
    friend's vacation on some remote island
  • Query the database to find out which books were
    the top sellers in Zurich around Christmas
  • Invoke a web service to compute π to ten
    billion digits
  • Is this the whole story?

21
Examples of Push
  • E-Mail communication
  • Who uses a Blackberry?
  • Who sends more than 100 SMS per month?
  • Event notification
  • Information about offers (apartments, cars,
    jobs)
  • News tickers
  • Sensor data
  • Put 100,000 very cheap (10 cent) sensors all over
    ETH
  • Notify building security if the temperature exceeds
    55 °C

22
Factors favoring Push
  • Long-standing interest
  • Very low or very high update rate
  • 1 update per week
  • 100 updates per second
  • Large number of independent sources
  • Watching 100,000 news sites all over the world is
    impossible, but watching an RSS feed via Google
    News is certainly possible
  • Scalability: many users want the same thing
  • E.g., this lecture, TV

23
Outline
  • Detour into web services
  • Motivation for Push
  • RSS
  • YFilter
  • XML Path Filtering
  • XML Transformations
  • Semantic Overlays
  • Context-Aware Information Filtering

24
RSS
  • Content syndication
  • News tickers
  • Blogs
  • Alerts
  • Simple XML format
  • Lightweight
  • Still, some get it wrong

25
RSS 2.0
  • Simple Message Format for Data Push
  • <channel>
  • <item> ... <cal:startTime>...</cal:startTime></item>
  • </channel>
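The following Python code is a sketch of how an RSS 2.0 item can carry extra data such as the cal:startTime element above. It builds a small feed with xml.etree.ElementTree. The namespace URI bound to the cal prefix is a hypothetical placeholder, because the slide does not name it. The ISO-style start time is also an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "cal:" extension prefix (not given on the slide).
CAL_NS = "http://example.org/ns/cal"

def build_feed(title, link, description, events):
    """Build an RSS 2.0 document; events is a list of (item title, start time) pairs."""
    ET.register_namespace("cal", CAL_NS)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item_title, start in events:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        # Extension elements live in their own namespace, so plain readers can skip them.
        ET.SubElement(item, f"{{{CAL_NS}}}startTime").text = start
    return ET.tostring(rss, encoding="unicode")
```

Keeping extensions in a separate namespace is what keeps the base format simple and lightweight. Consumers that do not know the extension still see a valid channel of items.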

26
RSS Items and Types
27
RSS 2.0 example
  <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <channel>
      <title>D-INFK Events</title>
      <description>Events of the Department of
        Computer Science, ETH Zurich</description>
      <link>http://www.inf.ethz.ch/news/events/</link>
      <docs>http://www.inf.ethz.ch/rss</docs>
      <pubDate>Tue, 17 Jan 2006 11:06:04 GMT</pubDate>
      <image>
        <url>http://www.inf.ethz.ch/rss/inf-logo.png</url>
        <title>Department of Computer Science</title>
        <link>http://www.inf.ethz.ch/</link>
        <width>140</width> <height>35</height>
      </image>
      <item rdf:about="http://www.inf.ethz.ch/news/events/details/index?id=593">
        <title>Establishing trust in electronic
          business correspondence</title>
        <link>http://www.inf.ethz.ch/news/events/details/index?id=593</link>
        <category>ZISC Colloquium</category>
        <description>Tuesday, 17 January 2006 17:15,
          by Dr. Ralf Hauser, Privasphere AG</description>
        <dc:date>2006-01-17</dc:date>
        <guid>http://www.inf.ethz.ch/news/events/details/index?id=593</guid>
      </item>
    </channel>
  </rss>
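On the reading side, a client that pulls the feed needs only a few lines to extract the items. This Python sketch works on a shortened copy of the example above and looks up dc:date through its namespace. The choice of fields to extract is only for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core namespace used by dc:date

FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>D-INFK Events</title>
    <item>
      <title>Establishing trust in electronic business correspondence</title>
      <link>http://www.inf.ethz.ch/news/events/details/index?id=593</link>
      <category>ZISC Colloquium</category>
      <dc:date>2006-01-17</dc:date>
    </item>
  </channel>
</rss>"""

def parse_items(rss_text):
    """Extract a few fields from every <item> of an RSS 2.0 channel."""
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "category": item.findtext("category"),
            "date": item.findtext(f"{{{DC_NS}}}date"),
        }
        for item in root.iterfind("channel/item")
    ]
```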

28
RSS Notes
  • Format wars: RSS 0.91, 1.0, 2.0, Atom
  • All major news sites use it now
  • Blogs would not work without it
  • Currently targeted at human-machine communication
  • Might be a good candidate for push-style
    machine-machine communication, too
  • Ironically, RSS currently uses
  • push only as the interaction model
  • pull at the communication level (readers poll the feed)

29
Outline
  • Detour into web services
  • Motivation for Push
  • RSS
  • YFilter
  • XML Path Filtering
  • XML Transformations
  • Semantic Overlays
  • Context-Aware Information Filtering

30
XML Message Brokering
[Figure: a stream of XML messages (NITF news documents, e.g. "Weather and Tide Updates for Norfolk") enters the XML message broker, which applies filtering, transformation and routing to deliver query results for the client queries Q1 to Q4; one result is a restructured excerpt of the input message]
31
Message-based Middleware
  • Publish/subscribe:
  • Subscribers express interests and are later notified of
    relevant data from publishers.
  • Loose coupling at the communication level.
  • XML, a de facto standard for online data exchange:
  • Flexible, extensible, self-describing.
  • Enhanced functionality: XSLT, XQuery, ...
  • Loose coupling at the content level.
  • XML message brokering:
  • Publish/subscribe + XML: flexibility at the
    communication and content levels.
  • Declarative XML queries provide high
    functionality.
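Below is a toy, in-process sketch of content-based publish/subscribe over XML. Each subscriber registers an ElementTree path expression, and the broker notifies every subscriber whose expression selects a node in a published message. The class XmlBroker and the simplified message layout are hypothetical, since real NITF documents are richer. Note that the broker evaluates each subscription separately. Systems such as YFilter try to avoid exactly this per-query cost by sharing work.

```python
import xml.etree.ElementTree as ET

class XmlBroker:
    """Toy content-based publish/subscribe broker (in-process, synchronous)."""

    def __init__(self):
        self._subscriptions = []  # (subscriber id, ElementTree path, callback)

    def subscribe(self, sub_id, path, callback):
        # The interest is declarative: a path expression over message content.
        self._subscriptions.append((sub_id, path, callback))

    def publish(self, xml_text):
        """Deliver a message to every subscriber whose path selects a node in it."""
        doc = ET.fromstring(xml_text)
        notified = []
        for sub_id, path, callback in self._subscriptions:
            if doc.find(path) is not None:
                callback(sub_id, doc)
                notified.append(sub_id)
        return notified

inbox = []
broker = XmlBroker()
broker.subscribe("weather-fan", "./head/subject[@type='Weather']",
                 lambda sub, doc: inbox.append((sub, doc.findtext(".//hl1"))))
broker.subscribe("sports-fan", "./head/subject[@type='Sports']",
                 lambda sub, doc: inbox.append((sub, doc.findtext(".//hl1"))))
```

Publishers never learn who the subscribers are (loose coupling at the communication level). Subscribers depend only on the message content, not on the message's producer (loose coupling at the content level).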

32
New Applications
  • Message brokering supports a large number of
    emerging distributed applications
  • Application integration
  • Personalized newspaper generation
  • Stock tickers
  • Network monitoring
  • Mobile services

33
Problem Statement
  • Inputs: (1) continuously arriving XML messages
    (usually small)
  • (2) a set of XQuery queries representing
    client interests
  • Main functions of an XML message broker:
  • Filtering: matches messages to query predicates.
  • Transformation: restructures the matching
    messages.
  • Routing: directs messages to queries over a
    network of brokers.
  • Challenge: providing this functionality for
  • large numbers of queries (e.g., tens of thousands of
    them)
  • high volumes of XML messages (e.g., tens or
    hundreds per second)

34
Design Space
[Figure: design space of message brokers along two axes, expressiveness (subject-based, predicate-based, XML filtering, XML filtering + transformation) and distribution. Systems shown: TIBCO, MQ Pub/Sub, JMS Pub/Sub, Siena [SIGCOMM03], Gryphon [PODC99], xmlBlaster, Snoeren et al. [SOSP01], ONYX [VLDB04], Le Subscribe [SIGMOD01], YFilter [VLDB03], Oracle Advanced Queuing]
35
YFilter/ONYX
  • YFilter, a system for XML filtering and
    transformation.
  • Filtering: exploiting sharing
  • Order-of-magnitude performance benefits over
    previous work.
  • Scalable to hundreds of thousands of distinct queries.
  • The YFilter 1.0 release is used in research projects
    and product development, and is being integrated into
    Apache Hermes for WS-Notification.
  • Transformation: exploiting sharing
  • The first algorithm for transformation for a
    large set of queries.
  • Scalable up to tens of thousands of distinct
    queries.
  • Routing (ONYX): an overlay network of brokers
    with routing abilities, providing flexible,
    Internet-scale XML dissemination services.

36
The Filtering Problem
  • Full XPath/XQuery is too expensive
  • Query language: path expressions
  • ((/ | //) (ElementName | *) Predicate*)+
  • The filtering problem:
  • Given (1) a set Q = {Q1, ..., Qn} of path queries,
    where each Qi has an associated query identifier,
    and (2) a stream of XML documents,
  • compute, for each document D, the set of query
    identifiers corresponding to the XPath queries
    that match D.
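The following Python code is a naive reference sketch that pins down the semantics of the filtering problem for simple paths without predicates. Each query is translated into a regular expression over root-to-element label paths, and a document matches a query if any of its elements is selected. It uses the eight example queries Q1 to Q8 from this module's combined-FSM slide. It assumes Q5 = /a/*/b and Q7 = /a/*/*/c, because the '*' characters appear to have been lost in extraction. XML namespaces are ignored.

```python
import re
import xml.etree.ElementTree as ET

# One location step: an axis ("/" or "//") followed by an element name or "*".
STEP = re.compile(r"(//?)([^/]+)")

def query_to_regex(query):
    """Translate a simple path query (no predicates) into a regex over label paths."""
    parts = []
    for axis, name in STEP.findall(query):
        label = "[^/]+" if name == "*" else re.escape(name)
        # "//" lets any number of intermediate elements precede this step.
        parts.append(("(?:/[^/]+)*/" if axis == "//" else "/") + label)
    return re.compile("".join(parts))

def element_paths(root):
    """Yield the root-to-element label path (e.g. "/a/b/c") of every element."""
    stack = [(root, "/" + root.tag)]
    while stack:
        elem, path = stack.pop()
        yield path
        for child in elem:
            stack.append((child, path + "/" + child.tag))

def filter_naive(queries, xml_text):
    """Return the ids of all queries that select at least one element of the document."""
    paths = set(element_paths(ET.fromstring(xml_text)))
    regexes = {qid: query_to_regex(q) for qid, q in queries.items()}
    return {qid for qid, rx in regexes.items() if any(rx.fullmatch(p) for p in paths)}

QUERIES = {"Q1": "/a/b", "Q2": "/a/c", "Q3": "/a/b/c", "Q4": "/a//b/c",
           "Q5": "/a/*/b", "Q6": "/a//c", "Q7": "/a/*/*/c", "Q8": "/a/b/c"}
```

For the document `<a><b><c/></b></a>`, `filter_naive` returns the set {Q1, Q3, Q4, Q6, Q8}. The checker tests every query against every path, so its cost grows with the product of queries and elements.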

37
Constructing an FSM for a Query
  • Key idea: represent query paths as state machines
    that are driven by the XML parser (SAX)
  • Simple paths: ((/ | //) (ElementName | *))+
  • A finite state machine (FSM) for each path,
    mapping location steps to machine states

Map location steps to FSM fragments, then concatenate the FSM fragments for the location steps in a query.
[Figure: location steps and their FSM fragments; the concatenated FSM for the query /a//b]
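The fragment-concatenation idea can be sketched for a single query such as /a//b. A '/' step adds one labeled transition. A '//' step first takes an ε-move into a state that loops on any element, and then adds the labeled transition. This Python sketch simulates the resulting nondeterministic machine on a root-to-element path. The tuple representation and the helper names are illustrative assumptions, not YFilter's data structures.

```python
import re

# One location step: an axis ("/" or "//") followed by an element name or "*".
STEP = re.compile(r"(//?)([^/]+)")

def build_fsm(query):
    """Concatenate FSM fragments; returns (transitions, epsilon moves, accepting state)."""
    transitions, epsilon = {}, {}
    state = 0
    for axis, name in STEP.findall(query):
        if axis == "//":
            # "//" fragment: epsilon move into a state that loops on any element
            epsilon[state] = state + 1
            state += 1
            transitions.setdefault(state, []).append(("*", state))
        # "/name" fragment (also the tail of "//name"): one labeled transition
        transitions.setdefault(state, []).append((name, state + 1))
        state += 1
    return transitions, epsilon, state

def closure(states, epsilon):
    """Add epsilon targets; in this construction they never chain."""
    return set(states) | {epsilon[s] for s in states if s in epsilon}

def accepts(fsm, path):
    """Run the nondeterministic machine on a root-to-element path of tag names."""
    transitions, epsilon, accept = fsm
    current = closure({0}, epsilon)
    for element in path:
        step = set()
        for s in current:
            for label, target in transitions.get(s, []):
                if label in ("*", element):
                    step.add(target)
        current = closure(step, epsilon)
    return accept in current

fsm = build_fsm("/a//b")
```

The machine accepts the paths a/b and a/x/y/b. It rejects a (the path is too short) and a/b/c (because the query selects b, not c).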
38
Constructing the Combined FSM
  • YFilter builds a single combined FSM for all
    paths!
  • Complete prefix sharing among paths.
  • Nondeterministic finite automaton (NFA)-based
    implementation: small machine size, flexible,
    easy to maintain, etc.
  • Output function (Moore machine): accepting states
    map to a partition of the query IDs.

Example queries:
Q1 = /a/b
Q2 = /a/c
Q3 = /a/b/c
Q4 = /a//b/c
Q5 = /a/*/b
Q6 = /a//c
Q7 = /a/*/*/c
Q8 = /a/b/c
[Figure: combined NFA for Q1 to Q8]
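Here is a Python sketch of building one combined NFA with prefix sharing. Queries that start with the same location steps reuse the same states. Each '//' creates at most one shared, self-looping state per parent state. Accepting states carry the set of query IDs, which is the Moore output. The sketch reads Q5 as /a/*/b and Q7 as /a/*/*/c, which is an assumption about the garbled originals. The class layout is hypothetical and simpler than YFilter's.

```python
import re
from collections import defaultdict

STEP = re.compile(r"(//?)([^/]+)")

class CombinedNFA:
    """All path queries in one NFA with shared prefixes (simplified sketch)."""

    def __init__(self):
        self.children = [{}]      # per state: element name or "*" -> next state
        self.eps = [None]         # per state: shared self-looping "//" state, if any
        self.self_loop = [False]
        self.accept = defaultdict(set)  # Moore output: state -> query ids

    def _new_state(self, loop=False):
        self.children.append({})
        self.eps.append(None)
        self.self_loop.append(loop)
        return len(self.children) - 1

    def add(self, qid, query):
        state = 0
        for axis, name in STEP.findall(query):
            if axis == "//":
                if self.eps[state] is None:  # create the "//" state once, then share it
                    self.eps[state] = self._new_state(loop=True)
                state = self.eps[state]
            if name not in self.children[state]:  # reuse an existing transition if possible
                self.children[state][name] = self._new_state()
            state = self.children[state][name]
        self.accept[state].add(qid)

QUERIES = {"Q1": "/a/b", "Q2": "/a/c", "Q3": "/a/b/c", "Q4": "/a//b/c",
           "Q5": "/a/*/b", "Q6": "/a//c", "Q7": "/a/*/*/c", "Q8": "/a/b/c"}
nfa = CombinedNFA()
for qid, query in QUERIES.items():
    nfa.add(qid, query)
```

With these eight queries, the sketch ends up with 13 states. Q3 and Q8 are identical, so they share one accepting state. Adding a query touches only the states along its own path, which is why NFA maintenance stays cheap.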
39
Execution Algorithm
  • YFilter uses a stack mechanism to handle the
    nesting of XML elements:
  • Backtracking in the NFA.
  • No repeated work for the same element!

[Figure: runtime stack and NFA while parsing <b>, <c>, </c>]
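The stack-based execution can be sketched with Python's xml.sax. On each start tag, the handler advances the set of active NFA states on top of the runtime stack. On each end tag, it pops the stack, which restores the parent's states (the backtracking) without recomputing anything. The block repeats the combined-NFA construction so that it is self-contained. Class and function names are hypothetical. Predicates, which YFilter also supports, are omitted, and Q5 and Q7 are read as /a/*/b and /a/*/*/c.

```python
import re
import xml.sax
from collections import defaultdict

STEP = re.compile(r"(//?)([^/]+)")

def build_nfa(queries):
    """Combined NFA with prefix sharing: (children, eps, self_loop, accept)."""
    children, eps, loop, accept = [{}], [None], [False], defaultdict(set)

    def new_state(is_loop=False):
        children.append({})
        eps.append(None)
        loop.append(is_loop)
        return len(children) - 1

    for qid, query in queries.items():
        s = 0
        for axis, name in STEP.findall(query):
            if axis == "//":
                if eps[s] is None:
                    eps[s] = new_state(True)
                s = eps[s]
            if name not in children[s]:
                children[s][name] = new_state()
            s = children[s][name]
        accept[s].add(qid)
    return children, eps, loop, accept

class StackFilter(xml.sax.ContentHandler):
    """Drives the NFA from SAX events; the runtime stack holds active state sets."""

    def __init__(self, nfa):
        super().__init__()
        self.children, self.eps, self.loop, self.accept = nfa
        self.stack = [self._closure({0})]  # active states before the root element
        self.matches = set()

    def _closure(self, states):
        # follow epsilon moves into the shared "//" states
        return states | {self.eps[s] for s in states if self.eps[s] is not None}

    def startElement(self, name, attrs):
        nxt = set()
        for s in self.stack[-1]:
            for label in (name, "*"):
                if label in self.children[s]:
                    nxt.add(self.children[s][label])
            if self.loop[s]:
                nxt.add(s)  # a "//" state stays active below any element
        nxt = self._closure(nxt)
        for s in nxt:
            self.matches |= self.accept.get(s, set())
        self.stack.append(nxt)

    def endElement(self, name):
        self.stack.pop()  # backtrack: the parent's active states are on top again

def stream_filter(queries, xml_text):
    handler = StackFilter(build_nfa(queries))
    xml.sax.parseString(xml_text.encode(), handler)
    return handler.matches

QUERIES = {"Q1": "/a/b", "Q2": "/a/c", "Q3": "/a/b/c", "Q4": "/a//b/c",
           "Q5": "/a/*/b", "Q6": "/a//c", "Q7": "/a/*/*/c", "Q8": "/a/b/c"}
```

`stream_filter(QUERIES, "<a><b><c/></b></a>")` yields {Q1, Q3, Q4, Q6, Q8}, which is the same answer as evaluating each query separately. The difference is that this version makes a single pass over the document for all queries.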
40
DFA vs. NFA
  • A DFA can have an exponential number of states
  • Large main-memory requirements
  • Or I/O needed in order to process messages
  • A DFA has high maintenance costs
  • Need to rerun the Myhill/Büchi algorithm every time a
    new profile is posted or deleted
  • An NFA is slower than a DFA
  • The number of NFA entries on the stack can grow exponentially
  • In practice, XML documents are fairly flat
  • NFA is the clear winner (given current trade-offs)!

41
Performance results for YFilter
  • YFilter scales to 150,000 distinct path queries
    without predicates.
  • Consistently takes 30 ms or less.
  • Achieves a 25x performance improvement over
    previous approaches
  • Deep element nesting: no exponential blow-up of
    active states.
  • Sensitivity to * and //: little, due to
    effective prefix sharing.
  • NFA maintenance for query updates: tens of
    milliseconds for inserting 1000 queries.
  • YFilter handles hundreds of thousands of queries with
    predicates.
  • No real competition before
  • Mechanism not shown here. What are the
    difficulties?

42
Outline
  • Detour into web services
  • Motivation for Push
  • RSS
  • YFilter
  • XML Path Filtering
  • XML Transformations
  • Semantic Overlays
  • Context-Aware Information Filtering

43
Context-aware information filtering
  • This is our (my) current research project
  • Publish/subscribe systems like YFilter assume a
    fairly static profile set: "I am interested in
    all messages regarding soccer"
  • In reality, matches are often influenced by some
    state or external events: "Give me all soccer
    results if they influence the ranking of my
    favorite club. If I am in the office, send me a
    small video clip; if I only have a mobile phone,
    send me an SMS"
  • We call those influencing factors context

44
Example of context
  • I have IBM stock
  • I am at work
  • I sold all my Euros
  • I have 200 unread mails
  • I am on a plane
  • I am at home
Some state (context) is referenced in the profile
and included in the matching decision
45
Examples of context usage in profiles
  • Only send me today's mensa menu if I am less
    than 500 m away from HG/CAB
  • Only send me the train delay messages if I plan
    to take that train or need to pick someone up who
    is on that train
  • Route the order message to a warehouse that has
    all items in stock

46
Context-Aware Information Filter
  • State updates can come from external sources
  • A state change does not directly trigger a reaction;
    it merely affects further matching decisions
  • Context updates compete with messages in the
    filter

[Figure: the filter takes a stream of information (messages) and a stream of context updates, evaluates them against the profiles/rules, and emits a stream of (message, matched profiles) pairs]
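A minimal Python sketch of this idea, based on the train-delay profile from the examples slide. A context update only changes the stored state. Each incoming message is then matched against the profiles under the current context. The Profile representation, the owner "alice" and the train "IC 712" are hypothetical. The filter also simply scans all profiles. A real context-aware filter would use indexes, and keeping those indexes current under frequent context updates is the challenge the following slides address.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Profile:
    pid: str
    owner: str
    # Predicate over (message, owner's current context); hypothetical representation.
    predicate: Callable[[Dict[str, Any], Dict[str, Any]], bool]

class ContextAwareFilter:
    def __init__(self, profiles):
        self.profiles = list(profiles)
        self.context = {}  # owner -> current context state

    def update_context(self, owner, **changes):
        """Record a context update; it triggers no notification by itself."""
        self.context.setdefault(owner, {}).update(changes)

    def filter(self, message):
        """Return ids of profiles that match the message under the current context."""
        return [p.pid for p in self.profiles
                if p.predicate(message, self.context.get(p.owner, {}))]

# "Only send me train delay messages if I plan to take that train"
delays = Profile(
    "train-delays", "alice",
    lambda msg, ctx: msg["type"] == "delay" and msg["train"] in ctx.get("trains", ()))
cif = ContextAwareFilter([delays])
```

The same delay message is dropped or delivered depending only on the latest context update for "alice".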
47
Challenges with Context-Aware IF
  • Information filters are optimized for:
  • High throughput of messages
  • Large number of profiles
  • Static profiles
  • Context usage adds:
  • Dynamic profiles with many context updates
  • Fluctuation in update rates

⇒ Integrated IF and context management
48
Components of an IF
[Figure: profile indexes (P) produce matches for a single index; the merge step (M) combines them with ∧ and ∨ into a result with false positives, which the postfilter cleans up]
  • Profile indexes: produce candidates for matching
    profiles (eliminate most other candidates)
  • Merge: combine results of index lookups
    (evaluate combinations of simple predicates)
  • Postfilter: evaluate predicates individually for
    each profile (eliminate all false positives)
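The following Python code sketches the three components for profiles that are conjunctions of range predicates. A per-attribute bucketed index returns candidates, and its coarse buckets deliberately allow false positives. The merge step intersects the candidate sets, and the postfilter checks each remaining profile exactly. The bucket width and the class names are hypothetical simplifications. So is the assumption that every profile constrains every message attribute.

```python
from collections import defaultdict

BUCKET = 10  # hypothetical bucket width: wider buckets -> more false positives

class AttributeIndex:
    """Bucketed index for range predicates lo <= value <= hi on one attribute."""

    def __init__(self):
        self.buckets = defaultdict(set)

    def insert(self, pid, lo, hi):
        for b in range(lo // BUCKET, hi // BUCKET + 1):
            self.buckets[b].add(pid)

    def probe(self, value):
        # Candidates: every profile whose range overlaps the value's bucket.
        return set(self.buckets.get(value // BUCKET, set()))

class InformationFilter:
    def __init__(self, attributes):
        self.indexes = {a: AttributeIndex() for a in attributes}
        self.profiles = {}  # pid -> {attribute: (lo, hi)}

    def add_profile(self, pid, ranges):
        self.profiles[pid] = ranges
        for attr, (lo, hi) in ranges.items():
            self.indexes[attr].insert(pid, lo, hi)

    def match(self, message):
        # 1. Indexes: one candidate set per attribute (false positives allowed).
        candidate_sets = [self.indexes[a].probe(v) for a, v in message.items()]
        # 2. Merge: profiles are conjunctions (AND), so intersect the sets.
        candidates = set.intersection(*candidate_sets) if candidate_sets else set()
        # 3. Postfilter: evaluate each remaining profile exactly.
        return {pid for pid in candidates
                if all(lo <= message[a] <= hi
                       for a, (lo, hi) in self.profiles[pid].items())}

f = InformationFilter(["price", "volume"])
f.add_profile("P1", {"price": (100, 120), "volume": (0, 50)})
f.add_profile("P2", {"price": (105, 108), "volume": (0, 1000)})
```

For the message {"price": 103, "volume": 20}, the price index returns both P1 and P2. The postfilter then removes P2, because 103 lies outside its range of 105 to 108.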

49
Architecture of a CIF
[Figure: the IF pipeline (indexes, merge, postfilter) extended with a context management component that sends updates (U) to the indexes]
  • Context management: keeps track of context state
  • Propagates changes to the indexes where this
    information is used
  • Postfilter also refers to context information

50
The solution: AGILE (Adaptive Generic Indexing
with Local Escalations)
  • Treat the index as a filter (allow false positives)
  • Adapt to the workload
  • High update rate: make the index less accurate
  • (Fewer updates, more false positives)
  • Decreasing update rate: make the index more accurate
  • (More updates, fewer false positives)
  • Control accuracy at the granularity of individual profiles

51
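Decrease accuracy: escalation example
[Figure: tree index with keys 5, 6 and 2; profiles A and B are stored under key 2. Context management receives the update A = 3, and A is escalated to the index node for values < 5 instead of being moved to a new leaf]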
52
Probe on escalated item
[Figure: the result for Probe(2) is A, B; A is a false positive because its current context value is 3]
53
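Update on escalated item
[Figure: the update A = 1 on the escalated item A is recorded by context management; A stays at its escalated position in the index]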
54
Deescalation of an escalated item
[Figure: A is deescalated, i.e. moved back down to the index entry for its current value 1]
55
Index accuracy control in AGILE
  • Two new operations:
  • Escalate: decrease accuracy
  • Deescalate: increase accuracy
  • Generic, applicable to all tree indexes
  • Fine-grained adaptivity
  • Open issue: control policy
  • Our solution:
  • Escalation on every update (unless unnecessary)
  • Deescalation by feedback of false positives
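Below is a toy Python sketch of escalation and deescalation. It assumes a small binary tree over an integer context domain, and profiles that match a message value v exactly when their current context value equals v. An update escalates the profile's entry to the lowest node that still covers the new value, with no index change if the entry is already covered. A probe collects all entries on the root-to-leaf path. The postfilter then checks the authoritative context, and every false positive is deescalated back to its exact leaf. This is one possible reading of the escalation example slides, not AGILE's actual implementation. Deescalating after a single false positive is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    lo: int                                   # node covers context values in [lo, hi)
    hi: int
    entries: set = field(default_factory=set)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(lo, hi):
    node = Node(lo, hi)
    if hi - lo > 1:
        mid = (lo + hi) // 2
        node.left, node.right = build(lo, mid), build(mid, hi)
    return node

class AgileIndex:
    """Toy escalation index; profile p matches value v iff context[p] == v (assumption)."""

    def __init__(self, domain_size):
        self.root = build(0, domain_size)
        self.where = {}    # profile -> node currently holding its entry
        self.context = {}  # authoritative context values (context management)

    def _path(self, v):
        node = self.root
        while node is not None:
            yield node
            if node.left is None:
                break
            node = node.left if v < node.left.hi else node.right

    def _leaf(self, v):
        return list(self._path(v))[-1]

    def _cover(self, a, b):
        """Lowest node whose range contains both a and b."""
        node = self.root
        while node.left is not None:
            if a < node.left.hi and b < node.left.hi:
                node = node.left
            elif a >= node.right.lo and b >= node.right.lo:
                node = node.right
            else:
                break
        return node

    def _place(self, p, node):
        if p in self.where:
            self.where[p].entries.discard(p)
        node.entries.add(p)
        self.where[p] = node

    def insert(self, p, v):
        self.context[p] = v
        self._place(p, self._leaf(v))  # start fully accurate

    def update(self, p, v):
        """Escalate: no index work if the entry's node still covers v, else move it up."""
        self.context[p] = v
        node = self.where[p]
        if not (node.lo <= v < node.hi):
            self._place(p, self._cover(node.lo, v))

    def probe(self, v):
        candidates = set()
        for node in self._path(v):  # index lookup: all entries on the path
            candidates |= node.entries
        matches = {p for p in candidates if self.context[p] == v}  # postfilter
        for p in candidates - matches:  # feedback: deescalate false positives
            self._place(p, self._leaf(self.context[p]))
        return matches

idx = AgileIndex(8)
idx.insert("A", 2)
idx.insert("B", 2)
idx.update("A", 3)  # A is escalated to the node covering [2, 4)
```

This mirrors the slides. After the update A = 3, a probe with value 2 finds both A and B in the index, but only B survives the postfilter. A is then moved back down to the leaf for 3. Later updates that stay inside an escalated node's range cost no index work at all.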

56
Accuracy control by feedback
[Figure: the postfilter feeds false positives back to the indexes, which receive context updates (U) from context management. Labels: accuracy balance from high to low; cost for message handling vs. cost for update handling; index cost vs. postfilter cost]
57
Accuracy control: more updates
[Figure: the same pipeline; with more updates, the balance shifts to decrease accuracy]
58
Accuracy control: fewer updates
[Figure: the same pipeline; with fewer updates, feedback of false positives shifts the balance to increase accuracy]
59
AGILE benefits
  • Adaptation to changes in update rate
  • No updates ⇒ accurate index
  • Many updates ⇒ increasingly fuzzy index
  • Adaptation to value skew in the workload
  • Updates in one area, probes in another ⇒ fuzzy
    in the first, accurate in the other
  • When the skew changes, the index changes, too
  • Overhead:
  • Filtering false positives
  • Accuracy control

60
Some performance numbers
  • 500K profiles, 8 attributes, range predicates
  • No updates: 450 messages/sec
  • Only updates: 1 million updates/sec
  • Switching between these extremes takes a few hundred
    messages

61
Status of Context-Aware IF
  • Using context in IF enables richer profiles, but
    introduces frequent updates
  • AGILE deals with the update/probe balance
  • Flexible index accuracy
  • Automatic tuning by feedback
  • Applicable to any tree-like index
  • AGILE works well
  • Best performance in middle ground
  • Low overhead at the extremes
  • Good adaptation to load changes
  • No manual intervention needed during runtime

62
Research Topics in IF (also a Thesis Ad)
  • AGILE on classic indexes (B-tree, R-tree), also
    on disk
  • Reliability
  • How do you make sure that all events are
    processed, even if there are machine crashes or
    network outages?
  • QoS issues in IF
  • How do you provide certain service properties,
    e.g., latency or throughput?
  • Adaptivity/feedback for other aspects
  • Go to http://www.dbis.ethz.ch/education/Theses/infofilt
    (no RSS yet)