Title: Module 8
1Module 8
- Push-based communication
- (RSS, Publish/Subscribe, Information Filtering)
2Outline
- Detour into web services
- Motivation for Push
- RSS
- YFilter
- XML Path Filtering
- XML Transformations
- Semantic Overlays
- Context-Aware Information Filtering
3Detour into web services
- This is an excerpt of Module 4 (which was omitted)
- But web services also provide a good motivation for the rest of today's lecture
- Important: Web services are XML all over
4Technology overview
- XML
- SOAP: message format
- WSDL: service description
- UDDI: service directory
- XQuery
- BPEL
- XL
5SOAP
- SOAP: Simple Object Access Protocol
- W3C standard, current version 1.2
- Communication between applications (e.g., RPC, streams of sensor data)
- Defines the layout (type) of messages
- Use on the Internet and through firewalls
- Platform- and programming-language-independent
- Based on XML
- Simple and extensible
- Basis for further standards (Encryption, ...)
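To make the message layout concrete, here is a minimal sketch that builds a SOAP 1.2 envelope with Python's standard library. The envelope namespace is the one defined for SOAP 1.2; the application namespace, the `Reading` element and the sensor id are hypothetical and invented for this illustration.

```python
import xml.etree.ElementTree as ET

# SOAP 1.2 envelope namespace.
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
# Hypothetical application namespace, invented for this example.
APP_NS = "http://example.org/sensors"


def build_envelope(sensor_id: str, celsius: float) -> bytes:
    """Build a SOAP 1.2 envelope that carries one temperature reading."""
    ET.register_namespace("env", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    # The optional header is where extensions (e.g. security) would go.
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    reading = ET.SubElement(body, f"{{{APP_NS}}}Reading", {"sensor": sensor_id})
    reading.text = str(celsius)
    return ET.tostring(envelope, encoding="utf-8")


message = build_envelope("hg-f-12", 21.5)
parsed = ET.fromstring(message)
```

The payload inside `Body` is ordinary application XML; SOAP itself only fixes the envelope around it.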
6WSDL
- Web Service Description Language
- Describes the Interface of a Web Service
- Call of a Web Service done via SOAP
- Allows the registration of services
- Basis for UDDI
- Syntax is XML
7UDDI
- Universal Description, Discovery and Integration
- Directory which stores WSDL
- Jini for the Web, yellow pages
- Communicates via SOAP messages
- Organized in white, yellow and green pages
- White and yellow pages: information about providers
- Green pages: WSDL of services
- IBM and Microsoft run public UDDI servers
- Today, typically used in Intranet
8Use cases for Web Services
- Automation of processes
- Enterprise Application Integration (EAI)
- Workflow management
- Data integration
- Enterprise Information Integration (EII) (connectivity, global data model)
- Portals
Integration, Integration, Integration
9Application Integration
[Figure: four applications (App) integrated point to point; the interactions on the links are numbered I/VI, II/V, III/IV, VII/VIII and VII/IX.]
10Application Integration
[Figure: the same four applications; further interactions XI and XII are added to I/VI, II/V, III/IV, VII/VIII and VII/IX.]
11Application Integration
[Figure: the same four applications with interactions I/VI, II/V, III/IV, VII/VIII, VII/IX, X and XI.]
- What impact do delays have?
- Who is affected by a change in one interface?
- How can this process be optimized?
- What about humans? How to exploit a Grid of
machines?
12Loose Coupling of Apps
[Figure: four applications (App) connected only through a central message broker.]
13Loose Coupling of Apps (Web Services)
[Figure: four applications (App), each described by WSDL, exchange SOAP messages through a central message broker (Web Services). Open questions at the broker: Routing? Virtualisation?]
14Virtualization of Applications
[Figure: four applications (App) with WSDL interfaces around a message broker (Web Services), using find and bind. One application registers the interest "I want all x!".]
15Virtualization of Applications
[Figure: as before, with a second interest "I want y!" and a first message x1 at the message broker.]
16Virtualization of Applications
[Figure: as before; messages y, x1 and x2, x3 are exchanged via the message broker.]
17Virtualization of Applications
[Figure: as before; the application that asked for all x receives x1, x2, x3 through the message broker.]
18Summary Web Services
- There is a lot more to web services; see Module 4
- Important:
- Loose coupling of applications
- Virtualization
- Service discovery
- Common protocols
- And now back to our main track: push-style interaction
19Outline
- Detour into web services
- Motivation for Push
- RSS
- YFilter
- XML Path Filtering
- XML Transformations
- Semantic Overlays
- Context-Aware Information Filtering
20Is Pull the winner?
- Most of our interaction with the Web and databases is pull
- Browse the web to find the pictures of my friend's vacation on some remote island
- Query the database to find out which books were the top sellers in Zurich around Christmas
- Invoke a web service to compute π up to ten billion digits
- Is this the whole story?
21Examples of Push
- E-Mail communication
- Who uses a Blackberry?
- Who sends more than 100 SMS/month?
- Event notification
- Information about offers (apartments, cars, jobs)
- News tickers
- Sensor data
- Put 100,000 very cheap (10 cent) sensors all over ETH
- Notify building security if the temperature exceeds 55 degrees centigrade
22Factors favoring Push
- Long-standing interest
- Very low or very high update rate
- 1 update per week
- 100 updates per second
- Large number of independent sources
- Watching 100,000 news sites all over the world is impossible, but watching an RSS feed via Google News is certainly possible
- Scalability: many users want the same thing
- E.g., this lecture, TV, ...
23Outline
- Detour into web services
- Motivation for Push
- RSS
- YFilter
- XML Path Filtering
- XML Transformations
- Semantic Overlays
- Context-Aware Information Filtering
24RSS
- Content syndication
- News tickers
- Blogs
- Alerts
- Simple XML format
- Lightweight
- Still, some get it wrong
25RSS 2.0
- Simple message format for data push
- <channel>
- <item> ... <cal:startTime>...</cal:startTime></item>
- </channel>
26RSS Items and Types
27RSS 2.0 example
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel>
    <title>D-INFK Events</title>
    <description>Events of the Department of Computer Science, ETH Zurich</description>
    <link>http://www.inf.ethz.ch/news/events/</link>
    <docs>http://www.inf.ethz.ch/rss</docs>
    <pubDate>Tue, 17 Jan 2006 11:06:04 GMT</pubDate>
    <image>
      <url>http://www.inf.ethz.ch/rss/inf-logo.png</url>
      <title>Department of Computer Science</title>
      <link>http://www.inf.ethz.ch/</link>
      <width>140</width> <height>35</height>
    </image>
    <item rdf:about="http://www.inf.ethz.ch/news/events/details/index?id=593">
      <title>Establishing trust in electronic business correspondence</title>
      <link>http://www.inf.ethz.ch/news/events/details/index?id=593</link>
      <category>ZISC Colloquium</category>
      <description>Tuesday, 17 January 2006 17:15, by Dr. Ralf Hauser Privasphere AG</description>
      <dc:date>2006-01-17</dc:date>
      <guid>http://www.inf.ethz.ch/news/events/details/index?id=593</guid>
    </item>
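A consumer of such a feed only needs a standard XML parser. The following sketch reads the items of an abbreviated copy of the feed above with Python's `xml.etree.ElementTree`; the closing tags were added so that the excerpt is well-formed, and the function name `read_items` is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example feed, closed so that it is well-formed.
FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>D-INFK Events</title>
    <link>http://www.inf.ethz.ch/news/events/</link>
    <item>
      <title>Establishing trust in electronic business correspondence</title>
      <link>http://www.inf.ethz.ch/news/events/details/index?id=593</link>
      <category>ZISC Colloquium</category>
      <dc:date>2006-01-17</dc:date>
      <guid>http://www.inf.ethz.ch/news/events/details/index?id=593</guid>
    </item>
  </channel>
</rss>"""

# Prefix-to-namespace map for the Dublin Core extension elements.
DC = {"dc": "http://purl.org/dc/elements/1.1/"}


def read_items(feed_xml: str) -> list[dict]:
    """Return one dictionary per <item> of the first <channel>."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = []
    for item in channel.findall("item"):
        items.append({
            "title": item.findtext("title"),
            "category": item.findtext("category"),
            "date": item.findtext("dc:date", namespaces=DC),
            "guid": item.findtext("guid"),
        })
    return items


items = read_items(FEED)
```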
28RSS Notes
- Format wars: RSS 0.91, 1.0, 2.0, Atom
- All major news sites use it now
- Blogs would not work without it
- Currently targeted at human-machine communication
- Might be a good candidate for push-style machine-machine communication, too
- Ironically, RSS currently uses
- push-only as interaction model
- pull at the communication level
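The last point can be sketched in a few lines: the reader is notified of new items (push as interaction model), while underneath the client fetches the whole feed again and again (pull at the communication level). The class `FeedWatcher`, the helper `make_feed` and the `urn:example:` identifiers are hypothetical; the fetch function is a stand-in for an HTTP GET so that the example runs without a network.

```python
import xml.etree.ElementTree as ET
from typing import Callable


class FeedWatcher:
    """Turns repeated pulls of an RSS document into push-style notifications."""

    def __init__(self, fetch: Callable[[], str], notify: Callable[[str], None]):
        self.fetch = fetch      # pulls the current feed text (e.g. an HTTP GET)
        self.notify = notify    # callback invoked once per new item
        self.seen: set[str] = set()

    def poll(self) -> int:
        """Pull the feed once and push every item that was not seen before."""
        channel = ET.fromstring(self.fetch()).find("channel")
        new = 0
        for item in channel.findall("item"):
            guid = item.findtext("guid")
            if guid not in self.seen:
                self.seen.add(guid)
                self.notify(item.findtext("title"))
                new += 1
        return new


def make_feed(titles):
    """Build a tiny feed; titles double as guids for this illustration."""
    items = "".join(
        f"<item><title>{t}</title><guid>urn:example:{t}</guid></item>" for t in titles
    )
    return f'<rss version="2.0"><channel>{items}</channel></rss>'


# Two successive versions of the same feed stand in for two HTTP responses.
versions = [make_feed(["a"]), make_feed(["a", "b"])]
received = []
watcher = FeedWatcher(fetch=lambda: versions.pop(0), notify=received.append)
first = watcher.poll()
second = watcher.poll()
```

Every poll transfers the complete feed, even when nothing has changed, which is the cost of emulating push on top of pull.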
29Outline
- Detour into web services
- Motivation for Push
- RSS
- YFilter
- XML Path Filtering
- XML Transformations
- Semantic Overlays
- Context-Aware Information Filtering
30XML Message Brokering
[Figure: a stream of XML messages and a set of client queries (Q1 to Q4) enter the XML Message Broker, which performs filtering, transformation and routing and returns query results.]
Example XML message (NITF):
<?xml version="1.0" ?>
<nitf version="-//IPTC-NAA//DTD NITF-XML 2.1//EN">
  <head>
    <tobject tobject.type="news">
      <tobject.subject tobject.subject.type="Weather"/>
      <tobject.subject tobject.subject.matter="Statistics"/>
    </tobject>
    <docdata doc-idref="iptc.32.a">
      <doc-id id-string="iptc.32.b" />
      <evloc city="Norfolk" state-prov="VA" iso-cc="US" />
      <series series.name="Tide Forecasts" series.part="5"/>
    </docdata>
  </head>
  <body>
    <body.head>
      <hedline><hl1>Weather and Tide Updates for Norfolk</hl1></hedline>
      <byline>By <person>John Smith</person></byline>
    </body.head>
    ...
31Message-based Middleware
- Publish/Subscribe
- Subscribers express interests and are later notified of relevant data from publishers.
- Loose coupling at the communication level.
- XML, a de facto standard for online data exchange
- Flexible, extensible, self-describing.
- Enhanced functionality: XSLT, XQuery, ...
- Loose coupling at the content level.
- XML message brokering
- Publish/subscribe + XML: flexibility at the communication and content levels.
- Declarative XML queries provide high functionality.
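The loose coupling at the communication level can be sketched with a toy subject-based broker: publishers and subscribers know only the broker, never each other. The class `Broker` and the subject names are assumptions made for this illustration, and the sketch ignores content-based (XML query) subscriptions.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Subject-based publish/subscribe: parties only know the broker."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, subject: str, callback: Callable[[str], None]) -> None:
        """Register a long-standing interest in one subject."""
        self._subscribers[subject].append(callback)

    def publish(self, subject: str, message: str) -> int:
        """Push the message to every subscriber; return how many were notified."""
        callbacks = self._subscribers.get(subject, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
weather_inbox, sports_inbox = [], []
broker.subscribe("weather", weather_inbox.append)
broker.subscribe("sports", sports_inbox.append)
delivered = broker.publish("weather", "<hl1>Weather and Tide Updates for Norfolk</hl1>")
```

An XML message broker replaces the fixed subject string by a declarative query over the message content, which is what the following slides are about.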
32New Applications
- Message brokering supports a large number of emerging distributed applications:
- Application integration
- Personalized newspaper generation
- Stock tickers
- Network monitoring
- Mobile services
33Problem Statement
- Inputs: (1) continuously arriving XML messages (usually small)
- (2) a set of XQuery queries representing client interests
- Main functions of an XML message broker
- Filtering: matches messages to query predicates.
- Transformation: restructures the matching messages.
- Routing: directs messages to queries over a network of brokers.
- Challenges: providing this functionality for
- large numbers of queries (e.g., tens of thousands)
- high volumes of XML messages (e.g., tens or hundreds per second)
34Design Space
[Figure: design space of publish/subscribe systems. Axes: distribution and expressiveness (subject-based, predicate-based, XML filtering, XML filtering and transformation). Systems shown: TIBCO, MQ Pub/Sub, JMS Pub/Sub, Siena (SIGCOMM03), Gryphon (PODC99), xmlBlaster, Snoeren et al. (SOSP01), ONYX (VLDB04), Le Subscribe (SIGMOD01), YFilter (VLDB03), Oracle Advanced Queuing. An abbreviated NITF sample message illustrates the XML case.]
35YFilter and ONYX
- YFilter, a system for XML filtering and transformation.
- Filtering: exploiting sharing
- Order-of-magnitude performance benefits over previous work.
- Scalable to hundreds of thousands of distinct queries.
- YFilter 1.0 release: used in research projects and product development, being integrated into Apache Hermes for WS-Notification.
- Transformation: exploiting sharing
- The first algorithm for transformation for a large set of queries.
- Scalable up to tens of thousands of distinct queries.
- Routing (ONYX): an overlay network of brokers with routing abilities, providing flexible, Internet-scale XML dissemination services.
36The Filtering Problem
- Full XPath/XQuery: too expensive
- Query language: path expressions
- ( ("/" | "//") (ElementName | "*") Predicate* )+
- The filtering problem
- Given (1) a set Q = {Q1, ..., Qn} of path queries, where each Qi has an associated query identifier, and (2) a stream of XML documents.
- Compute, for each document D, the set of query identifiers corresponding to the XPath queries that match D.
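As a reference point for the problem statement, the sketch below solves it naively: every query is checked separately against every root-to-element path of a document. This is deliberately not YFilter's approach, it ignores predicates, and all function names are assumptions of this example. It shows the input and output of the problem and makes clear why work is repeated when many queries share steps.

```python
import re
import xml.etree.ElementTree as ET


def parse_query(query: str):
    """Split '/a//b/*' into [('/', 'a'), ('//', 'b'), ('/', '*')]."""
    return re.findall(r"(//|/)([^/]+)", query)


def matches_path(steps, path) -> bool:
    """True if the steps select the last element of a root-to-element tag path."""
    def go(i, j):
        # i: index of the next step, j: number of path elements consumed
        if i == len(steps):
            return j == len(path)
        axis, name = steps[i]
        # '/' must use the next element; '//' may skip any number of levels
        candidates = [j] if axis == "/" else range(j, len(path))
        return any(
            (name == "*" or path[k] == name) and go(i + 1, k + 1)
            for k in candidates if k < len(path)
        )
    return go(0, 0)


def element_paths(root):
    """Yield the tag path from the root to every element of the document."""
    def walk(node, prefix):
        path = prefix + [node.tag]
        yield path
        for child in node:
            yield from walk(child, path)
    yield from walk(root, [])


def filter_document(queries: dict, document: str) -> set:
    """Return the identifiers of all queries that match the document."""
    paths = list(element_paths(ET.fromstring(document)))
    return {
        qid for qid, query in queries.items()
        if any(matches_path(parse_query(query), p) for p in paths)
    }


queries = {"Q1": "/a/b", "Q2": "/a/c", "Q3": "/a/b/c", "Q5": "/a//b"}
result = filter_document(queries, "<a><x><b/></x><c/></a>")
```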
37Constructing an FSM for a Query
- Key idea: represent query paths as state machines that are driven by the XML parser (SAX)
- Simple paths: ( ("/" | "//") (ElementName | "*") )+
- A finite state machine (FSM) for each path: mapping steps to machine states.
Map location steps to FSM fragments.
[Figure: location steps and their FSM fragments]
Concatenate the FSM fragments for the location steps in a query.
Query: /a//b
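A minimal sketch of this construction for a single query follows. Each location step becomes a small fragment and the fragments are chained state by state. As a simplification of this example, the `//` fragment is collapsed into a self-loop on the current state instead of a separate state reached by an epsilon move, and the machine is run over one root-to-element tag sequence rather than a full SAX event stream.

```python
import re

ANY = "*"  # label that matches every element name


def build_fsm(query: str):
    """Return (transitions, accepting_state) for one simple path query.

    transitions maps (state, label) to a set of next states.
    Fragment per location step:
      /name  : s --name--> t
      //name : s --ANY--> s (self-loop), and s --name--> t
    """
    transitions = {}
    state = 0
    for axis, name in re.findall(r"(//|/)([^/]+)", query):
        if axis == "//":
            # Self-loop: any number of intermediate levels may be skipped.
            transitions.setdefault((state, ANY), set()).add(state)
        transitions.setdefault((state, name), set()).add(state + 1)
        state += 1
    return transitions, state


def accepts(query: str, tag_path) -> bool:
    """True if the query selects the last element of the given tag path."""
    transitions, accepting = build_fsm(query)
    active = {0}
    for tag in tag_path:
        following = set()
        for s in active:
            following |= transitions.get((s, tag), set())
            following |= transitions.get((s, ANY), set())
        active = following
    return accepting in active


fsm, final_state = build_fsm("/a//b")
```

For `/a//b` the table has three entries: `a` leads from state 0 to 1, any element keeps the machine in state 1, and `b` leads from state 1 to the accepting state 2.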
38Constructing the Combined FSM
- YFilter builds a single combined FSM for all paths!
- Complete prefix sharing among paths.
- Nondeterministic Finite Automaton (NFA)-based implementation: a small machine size, flexible, easy to maintain, etc.
- Output function (Moore machine): accepting states → partition of query ids.
Q1 = /a/b
Q2 = /a/c
Q3 = /a/b/c
Q4 = /a//b/c
Q5 = /a//b
Q6 = /a//c
Q7 = /a/*/*/c
Q8 = /a/b/c
39Execution Algorithm
- YFilter uses a stack mechanism to handle XML
- Backtracking in the NFA.
- No repeated work for the same element!
[Figure: the NFA and its runtime stack of active state sets while the parser reports <b>, <c>, </c>]
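The combined machine and its stack-based execution can be sketched as follows. This is a simplified illustration under stated assumptions, not the YFilter implementation: predicates are ignored, the class and function names are invented, and the query set is the one listed on the previous slide as rendered here. All queries are inserted into one trie-like NFA so that common prefixes share states; a start tag pushes the set of states reached from the top of the stack, and an end tag pops it again, which is the backtracking.

```python
import re
import xml.sax

DESC = "//"  # pseudo-label for the epsilon move into a self-looping state


class SharedNFA:
    """One NFA for all queries; common prefixes share states."""

    def __init__(self):
        self.next = [{}]          # next[s][label] -> target state
        self.accept = [set()]     # accept[s] -> ids of queries ending in s
        self.loop_states = set()  # states with a self-loop on any element

    def _target(self, state, label):
        """Follow an existing transition or create a new state for it."""
        if label not in self.next[state]:
            self.next.append({})
            self.accept.append(set())
            self.next[state][label] = len(self.next) - 1
        return self.next[state][label]

    def add(self, qid, query):
        state = 0
        for axis, name in re.findall(r"(//|/)([^/]+)", query):
            if axis == DESC:
                state = self._target(state, DESC)
                self.loop_states.add(state)
            state = self._target(state, name)
        self.accept[state].add(qid)

    def closure(self, states):
        """Add the states reachable through one epsilon ('//') move."""
        result = set(states)
        for s in states:
            if DESC in self.next[s]:
                result.add(self.next[s][DESC])
        return result


class FilterHandler(xml.sax.ContentHandler):
    def __init__(self, nfa):
        super().__init__()
        self.nfa = nfa
        self.stack = [nfa.closure({0})]  # one set of active states per open element
        self.matched = set()

    def startElement(self, name, attrs):
        active = set()
        for s in self.stack[-1]:
            if s in self.nfa.loop_states:
                active.add(s)            # self-loop: skip this level
            for label in (name, "*"):
                t = self.nfa.next[s].get(label)
                if t is not None:
                    active.add(t)
                    self.matched |= self.nfa.accept[t]
        self.stack.append(self.nfa.closure(active))

    def endElement(self, name):
        self.stack.pop()                 # backtrack to the parent's states


def filter_document(nfa, document: str) -> set:
    handler = FilterHandler(nfa)
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.matched


QUERIES = {"Q1": "/a/b", "Q2": "/a/c", "Q3": "/a/b/c", "Q4": "/a//b/c",
           "Q5": "/a//b", "Q6": "/a//c", "Q7": "/a/*/*/c", "Q8": "/a/b/c"}
nfa = SharedNFA()
for qid, query in QUERIES.items():
    nfa.add(qid, query)
matches = filter_document(nfa, "<a><b><c/></b></a>")
```

Q3 and Q8 end in the same accepting state, and each element is processed once for all queries, no matter how many of them share the prefix.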
40DFA vs. NFA
- DFA has exponential number of states
- Large main-memory requirements
- Or I/O needed in order to process messages
- DFA has high maintenance costs
- Need to rerun Myhill/Büchi algorithm every time a new profile is posted or deleted
- NFA is slower than DFA
- NFA entries in stack can grow exponentially
- In practice, XML documents are fairly flat
- NFA is the clear winner (current trade-offs)!
41Performance results for YFilter
- YFilter scales to 150,000 distinct path queries w/o predicates.
- Consistently takes 30 msec or less.
- Achieves a 25x performance improvement over previous approaches
- Deep element nesting: no exponential blow-up of active states.
- Sensitivity to * and //: little, due to effective prefix sharing.
- NFA maintenance for query updates: tens of milliseconds for inserting 1000 queries.
- YFilter handles hundreds of thousands of queries with predicates.
- No real competition before
- Mechanism not shown here. What are the difficulties?
42Outline
- Detour into web services
- Motivation for Push
- RSS
- YFilter
- XML Path Filtering
- XML Transformations
- Semantic Overlays
- Context-Aware Information Filtering
43Context-aware information filtering
- This is our (my) current research project
- Publish/subscribe systems like YFilter assume a fairly static profile set: "I am interested in all messages regarding soccer"
- In reality, matches are often influenced by some state or external events: "Give me all soccer results if they influence the ranking of my favorite club." "If I am in the office, send me a small video clip; if I only have a mobile phone, send me an SMS."
- We call those influencing factors context
44Example of context
I have IBM stock
I am at work
I sold all my Euros
I have 200 unread mails
I am on a plane
I am at home
Some state (context) is referenced in the profile
and included into the matching decision
45Examples of context usage in profiles
- Only send me today's mensa menu if I am less than 500 m from HG/CAB
- Only send me the train delay messages if I plan to take that train or need to pick up someone who is on that train
- Route the order message to a warehouse that has all items in stock
46Context-Aware Information Filter
- State updates can come from external sources
- A state change does not directly trigger a reaction; it merely affects further matching decisions
- Context updates compete with messages in the filter
[Figure: a filter holding profiles/rules receives a stream of messages and a stream of context updates and emits a stream of (message, matched profiles) pairs.]
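The behaviour described above can be sketched with a small filter in which profiles are predicates over both the message and the current context. A context update changes state silently; only the next message sees its effect. The class `ContextAwareFilter`, the profile id and the attribute `distance_m` are hypothetical names for this illustration, and the sketch does no indexing at all.

```python
from typing import Callable

Context = dict
Message = dict
Profile = Callable[[Message, Context], bool]


class ContextAwareFilter:
    def __init__(self) -> None:
        self.profiles: dict[str, Profile] = {}
        self.context: Context = {}

    def add_profile(self, profile_id: str, predicate: Profile) -> None:
        self.profiles[profile_id] = predicate

    def update_context(self, key: str, value) -> None:
        """Store new state. No notification is produced here."""
        self.context[key] = value

    def match(self, message: Message) -> set[str]:
        """Return the ids of all profiles that match under the current context."""
        return {pid for pid, pred in self.profiles.items()
                if pred(message, self.context)}


cif = ContextAwareFilter()
# Hypothetical profile: mensa menus only while closer than 500 m.
cif.add_profile(
    "menu-nearby",
    lambda msg, ctx: msg["type"] == "mensa-menu"
    and ctx.get("distance_m", float("inf")) < 500,
)
menu = {"type": "mensa-menu", "text": "Pasta"}
cif.update_context("distance_m", 1200)
far = cif.match(menu)    # context says: too far away
cif.update_context("distance_m", 300)
near = cif.match(menu)   # same message, different context
```

Evaluating every profile for every message does not scale to large profile sets, which is why the following slides combine indexes with context management.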
47Challenges with Context-Aware IF
- Information filters are optimized for
- High throughput of messages
- Large number of profiles
- Static profiles
- Context usage adds
- Dynamic profiles with many context updates
- Fluctuation in update rates
=> Integrated IF and context management
48Components of an IF
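[Figure: IF pipeline with the stages Indexes, Merge and Postfilter. A message (M) probes the profile indexes (P); each index returns the matches for a single index; Merge combines them with ∧ and ∨; the merged result, which may still contain false positives, is passed to the Postfilter.]
- Profile indexes: produce candidates for matching profiles (eliminate most other candidates)
- Merge: combine the results of index lookups (evaluate combinations of simple predicates)
- Postfilter: evaluate predicates individually for each profile (eliminate all false positives)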
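The three stages can be sketched as follows. The assumptions of this example are that profiles are conjunctions of integer range predicates, that every profile constrains every indexed attribute, and that the index is a coarse bucket index; `BucketIndex`, `BUCKET` and the sample profiles are invented. The coarse buckets make the index cheap but imprecise, so the postfilter has to remove false positives.

```python
from collections import defaultdict

BUCKET = 10  # hypothetical bucket width; coarse buckets cause false positives


class BucketIndex:
    """Maps value buckets of one attribute to profiles whose range overlaps them."""

    def __init__(self):
        self.buckets = defaultdict(set)

    def insert(self, pid, low, high):
        for b in range(low // BUCKET, high // BUCKET + 1):
            self.buckets[b].add(pid)

    def probe(self, value):
        return self.buckets.get(value // BUCKET, set())


def match(message, profiles, indexes):
    # Index stage: one candidate set per indexed attribute.
    candidate_sets = [indexes[attr].probe(message[attr]) for attr in indexes]
    # Merge stage: a conjunction needs a hit in every index.
    candidates = set.intersection(*candidate_sets) if candidate_sets else set()
    # Postfilter stage: the exact check removes false positives.
    return {
        pid for pid in candidates
        if all(lo <= message[a] <= hi for a, (lo, hi) in profiles[pid].items())
    }


profiles = {
    "P1": {"price": (12, 18), "qty": (0, 5)},
    "P2": {"price": (15, 40), "qty": (3, 9)},
}
indexes = {"price": BucketIndex(), "qty": BucketIndex()}
for pid, predicates in profiles.items():
    for attr, (lo, hi) in predicates.items():
        indexes[attr].insert(pid, lo, hi)

message = {"price": 19, "qty": 4}
candidates_price = indexes["price"].probe(message["price"])
result = match(message, profiles, indexes)
```

Here the price index reports both profiles as candidates for the value 19, and the postfilter drops P1, whose range ends at 18.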
49Architecture of a CIF
[Figure: the IF pipeline (Indexes, Merge, Postfilter) extended with a Context Management component that sends updates (U) to the indexes.]
- Context management: keep track of the context state
- Propagate the changes to the indexes where this information is used
- Postfilter also refers to context information
50The solution: AGILE (Adaptive Generic Indexing with Local Escalations)
- Treat the index as a filter (allow false positives)
- Adapt to the workload
- High update rate: make the index less accurate
- (Fewer index updates, more false positives)
- Decreasing update rate: make the index more accurate
- (More index updates, fewer false positives)
- Control accuracy at the granularity of a profile
51Decrease accuracy: Escalation example
[Figure: a tree index over profiles A and B, both at value 2, next to Context Management. The update of A to 3 escalates A to an inner node labeled A<5.]
52Probe on escalated item
[Figure: Probe(2) on the index with the escalated entry A returns A and B.]
53Update on escalated item
[Figure: the update of A to 1 is applied while A is escalated.]
54Deescalation of an escalated item
[Figure: the index after deescalating A, whose current value is 1.]
55Index accuracy control in AGILE
- Two new operations
- Escalate: decrease accuracy
- Deescalate: increase accuracy
- Generic, applicable to all tree indexes
- Fine-grained adaptivity
- Open issue: control policy
- Our solution
- Escalation on every update (unless not necessary)
- Deescalation by feedback of false positives
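The following sketch illustrates the two operations and the policy named above on a deliberately simplified structure. It is not the AGILE implementation: a flat dictionary of value ranges stands in for a tree index, an escalated entry is modelled as a widened range, and deescalation happens on the first false positive. `FuzzyIndex`, `BLOCK` and `index_writes` are invented names.

```python
BLOCK = 5  # hypothetical width of an escalated range


class FuzzyIndex:
    def __init__(self):
        self.value = {}        # context management: profile id -> exact value
        self.range = {}        # index entry: profile id -> (low, high)
        self.index_writes = 0  # how often the index itself had to change

    def update(self, pid, value):
        """Apply a context update; escalate only if the entry no longer covers it."""
        self.value[pid] = value
        if pid not in self.range:
            self.range[pid] = (value, value)   # new entries start exact
            self.index_writes += 1
            return
        low, high = self.range[pid]
        if not low <= value <= high:
            # Escalate: widen the entry to the whole block around the new value.
            start = (value // BLOCK) * BLOCK
            self.range[pid] = (min(low, start), max(high, start + BLOCK - 1))
            self.index_writes += 1

    def probe(self, value):
        """Return exact matches; deescalate entries that were false positives."""
        candidates = [p for p, (lo, hi) in self.range.items() if lo <= value <= hi]
        matches = set()
        for pid in candidates:
            if self.value[pid] == value:       # postfilter: exact check
                matches.add(pid)
            else:
                # Feedback of a false positive: make the entry exact again.
                current = self.value[pid]
                self.range[pid] = (current, current)
                self.index_writes += 1
        return matches


index = FuzzyIndex()
index.update("A", 2)
index.update("B", 2)
index.update("A", 3)                 # escalates A to the range 0..4
writes_after_escalation = index.index_writes
index.update("A", 1)                 # covered by the range: no index change
matches = index.probe(2)             # A is a false positive and gets deescalated
```

Updates that stay inside an escalated range cost nothing in the index, while probes pay for the lost accuracy through extra postfilter work until feedback restores it.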
56Accuracy control by feedback
[Figure: the CIF pipeline (Indexes, Merge, Postfilter, Context Management) with a feedback path that reports false positives from the Postfilter back to the indexes. An accuracy scale from Low to High balances index cost against postfilter cost, that is, the cost of update handling against the cost of message handling.]
57Accuracy control: more updates
[Figure: the same pipeline and accuracy scale; with more updates, the accuracy of the indexes is decreased.]
58Accuracy control: fewer updates
[Figure: the same pipeline and accuracy scale; with fewer updates, feedback of false positives increases the accuracy of the indexes.]
59AGILE benefits
- Adaptation to changes in update rate
- No updates => accurate index
- Many updates => increasingly fuzzy index
- Adaptation to value skew in the workload
- Updates in one area, probes in another => fuzzy in the first, accurate in the other
- When the skew changes, the index will change, too
- Overhead
- Filtering false positives
- Accuracy control
60Some performance numbers
- 500 K profiles, 8 attributes, range predicates
- No updates: 450 messages/sec
- Only updates: 1 million updates/sec
- Change between those extremes within a few hundred messages
61Status of Context-Aware IF
- Using context in IF enables richer profiles, but introduces frequent updates
- AGILE deals with the update/probe balance
- Flexible index accuracy
- Automatic tuning by feedback
- Applicable to any tree-like index
- AGILE works well
- Best performance in middle ground
- Low overhead at the extremes
- Good adaptation to load changes
- No manual intervention needed during runtime
62Research Topics in IF (also Thesis Ad)
- AGILE on classic indexes (B-Tree, R-Tree), also on disk
- Reliability
- How do you make sure that all events are processed, even if there are machine crashes or network outages?
- QoS issues in IF
- How do you provide certain properties of service, e.g. latency or throughput?
- Adaptivity/feedback for other aspects
- Go to http://www.dbis.ethz.ch/education/Theses/infofilt (No RSS yet)