Title: Publish and Subscribe
1. Publish and Subscribe
- The Information Bus: An Architecture for Extensible Distributed Systems
- Oki, Pfluegl, Siegel, Skeen. 1993.
- Matching Events in a Content-based Subscription System
- Aguilera, Strom, Sturman, Astley, Chandra. 1999.
Dan Sandler  COMP 520  October 7, 2004 
2. Distributed Systems in the Real World
- So far: tools for building distributed systems
- Focused on certain problems 
- Redundancy 
- Distribution 
- Marshalling and communication 
- Less attention paid to others 
- Discoverable systems 
- Maintainable, upgradeable systems
3. Generative Programming in Linda
- Review: Linda
- Typed data organized into tuples 
- Stored indefinitely in global tuple space 
- Tuples requested by partial specification 
- Anonymous communication
TUPLE SPACE 
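The tuple-space model reviewed above can be sketched in a few lines. This is a minimal illustration, not Linda itself: the `TupleSpace` class, the `ANY` wildcard marker, and the method name `in_` are assumptions made for the example, and `in_` returns `None` where a real Linda `in()` would block until a match arrives.

```python
# Hypothetical in-memory tuple space; ANY is an assumed wildcard marker.
ANY = object()


class TupleSpace:
    def __init__(self):
        self._tuples = []  # tuples persist until some in_() removes them

    def out(self, *fields):
        """Deposit a tuple into the space."""
        self._tuples.append(tuple(fields))

    def in_(self, *pattern):
        """Remove and return the first tuple matching a partial specification.

        Returns None when nothing matches (real Linda would block instead).
        """
        for i, candidate in enumerate(self._tuples):
            if len(candidate) == len(pattern) and all(
                p is ANY or p == f for p, f in zip(pattern, candidate)
            ):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.out("apple", "little", "green")
space.out("banana", "little", "green")

# The consumer names only the first field; the producer is never identified.
found = space.in_("banana", ANY, ANY)
```

The sender and receiver never refer to each other; only the tuple contents connect them.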
4. Problems in Tuple Space
- Open Issues 
- Unbounded storage requirements of tuple space 
- Tuple contents weak on flexibility, metadata, 
 discoverability
- General tuple-searching can be complex, slow
TUPLE CLUTTER 
5. Take-aways from Linda
- The content itself connects senders to receivers
- Participants have no other formal relationship
- Let's explore this model further
6. Publish and Subscribe
- Recall Linda's simple in/out operators
- If there is an in() pending when a matching out() 
 is invoked, the scenario resembles what we now
 call Publish and Subscribe
- The Information Bus is such a system
[Figure: a Producer issues out(<>); a Consumer issues in(<>)]
7. The Information Bus
- Goal: develop real-time, 24/7 systems
- Circuit fabrication 
- Securities trading systems 
- Specific requirements derived from these 
 situations
- Continuous operation 
- Legacy systems integration 
- Dynamic system evolution
8. Evolution is hard
- Capacity for change must be planned from the 
 beginning
- Systems may need to evolve in many ways 
- New kinds of data 
- New applications (services, clients) 
- Fault recovery and scalability can be considered 
 evolution
- Remember: evolution must occur without interruption of service
9. Architecture of the Information Bus
- Clients may publish data objects under a specific 
 subject
- Clients may subscribe to one or more subjects to 
 receive data
- Note: the bus broadcasts all published data to all participating hosts
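A rough sketch of this architecture, assuming an in-process bus: every attached host sees every published message and filters locally by subject. The `Bus` and `Host` classes and their method names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class Host:
    """A participating host: sees every message, delivers only subscribed subjects."""

    def __init__(self, name):
        self.name = name
        self.handlers = defaultdict(list)  # subject -> list of callbacks
        self.seen = 0                      # count of messages that reached this host

    def subscribe(self, subject, callback):
        self.handlers[subject].append(callback)

    def receive(self, subject, data):
        self.seen += 1
        for callback in self.handlers.get(subject, []):
            callback(data)


class Bus:
    def __init__(self):
        self.hosts = []

    def attach(self, host):
        self.hosts.append(host)

    def publish(self, subject, data):
        # Broadcast: the bus does no filtering of its own.
        for host in self.hosts:
            host.receive(subject, data)


bus = Bus()
subscriber = Host("subscriber")
uninterested = Host("uninterested")
bus.attach(subscriber)
bus.attach(uninterested)

inbox = []
subscriber.subscribe("apples.little.green", inbox.append)
bus.publish("apples.little.green", {"price": 3})
bus.publish("bananas.little.green", {"price": 1})
```

Both hosts see two messages, but only the subscriber's callback fires, and only for the subject it asked for.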
10. A snapshot of the Information Bus
[Figure: THE INFORMATION BUS carrying four messages, each with a Subject and Data: <object>; attached hosts are labeled PUBLISHER, SUBSCRIBER, and UNINTERESTED]
11. Properties of the Information Bus
- P1. Minimal core semantics 
- Recall the end-to-end argument: complexity at a low level is usually either insufficient or overkill
- Two styles of communication 
- Remote method invocation 
- Publish/subscribe 
- Two kinds of objects 
- Data (things sent on the bus) 
- Services and Clients (things that use the bus)
12. Properties of the Information Bus (cont.)
- P2. Self-describing objects 
- We might call this introspection today 
- Given an object, we can ask at run-time for 
- object type, 
- property types and values, 
- method signatures, etc. 
- All participants and data play by these rules 
- Effect: loose coupling and run-time discovery
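As a present-day analogy only (the Information Bus had its own object model), Python's introspection facilities can answer the same run-time questions. The `Quote` class and the `describe` helper are hypothetical names for this sketch.

```python
import inspect


class Quote:
    """Hypothetical data object used only for this illustration."""

    def __init__(self, symbol: str, price: float):
        self.symbol = symbol
        self.price = price

    def scaled(self, factor: float) -> float:
        return self.price * factor


def describe(obj):
    """Ask an object, at run time, for its type, properties, and method signatures."""
    return {
        "type": type(obj).__name__,
        "properties": {
            name: (type(value).__name__, value) for name, value in vars(obj).items()
        },
        "methods": {
            name: str(inspect.signature(member))
            for name, member in inspect.getmembers(obj, inspect.ismethod)
            if not name.startswith("_")
        },
    }


info = describe(Quote("IBM", 42.5))
```

A receiver that has never seen `Quote` before can still inspect `info` and decide what to do with the object.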
13. Properties of the Information Bus (cont.)
- P3. Dynamic classing 
- A fancy way of expressing the ability of the 
 system implementation to be changed at run-time
- Without interruption of the system 
- New classes can be defined 
- New code can be introduced 
- This is clearly necessary for evolvability
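One way to picture dynamic classing, again as an analogy rather than the Information Bus mechanism: Python's three-argument `type()` builds a class while the program is running. The `registry` dictionary and the `define_class` helper are assumptions made for the example.

```python
registry = {}  # hypothetical class registry shared by running participants


def define_class(name, fields, methods=None):
    """Create a new class at run time and register it by name."""

    def __init__(self, **values):
        for field in fields:
            setattr(self, field, values[field])

    namespace = {"__init__": __init__, "fields": tuple(fields)}
    namespace.update(methods or {})
    cls = type(name, (object,), namespace)
    registry[name] = cls
    return cls


# A new kind of data arrives while the system is already running.
define_class(
    "Trade",
    ["symbol", "shares"],
    {"notional": lambda self, price: self.shares * price},
)
trade = registry["Trade"](symbol="IBM", shares=100)
```

Nothing was restarted to introduce `Trade`; existing code that looks classes up by name can use it immediately.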
14. Properties of the Information Bus (cont.)
- P4. Anonymous communication 
- The hallmark of publish-and-subscribe 
- Data objects are sent and received based on 
 content alone
- Details of the participants are irrelevant 
- In this system, the content which controls 
 subscription is a subject string
- No other part of the data is involved in 
 delivering the object to subscribers
- Subjects typically organized with hierarchy (cf. Usenet groups: rice.owlnews.comp520)
15. Other features of the Information Bus
- What else is going on in the bus? 
- Object discovery 
- Point-to-point remote method invocation 
- Legacy data conversion
16. Discovery protocol
- Discovering participants in a given subject 
- A, B, D all subscribed to Little Green Apples
[Figure: hosts A, B, C, D attached to THE INFORMATION BUS, exchanging these messages]
Subject: apples.little.green  Data: "Who's there?"
Subject: apples.little.green  Data: "I'm here, my name is B"
Subject: apples.little.green  Data: "I'm here, my name is D"
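The exchange above might be sketched as follows. This is a simplified model under stated assumptions: the `Bus` and `Participant` classes, the `("WHO", None)` and `("HERE", name)` message shapes, and synchronous delivery are all invented for the illustration.

```python
class Bus:
    """Minimal broadcast bus: every participant except the sender sees each message."""

    def __init__(self):
        self.participants = []

    def publish(self, subject, data, sender):
        for participant in list(self.participants):
            if participant is not sender:
                participant.receive(subject, data)


class Participant:
    def __init__(self, name, bus, subjects=()):
        self.name = name
        self.bus = bus
        self.subjects = set(subjects)
        self.discovered = []  # names learned from "I'm here" replies
        bus.participants.append(self)

    def discover(self, subject):
        """Ask "Who's there?" on a subject; replies arrive on the same subject."""
        self.subjects.add(subject)
        self.bus.publish(subject, ("WHO", None), sender=self)

    def receive(self, subject, data):
        if subject not in self.subjects:
            return  # not subscribed: ignore the broadcast
        kind, name = data
        if kind == "WHO":
            self.bus.publish(subject, ("HERE", self.name), sender=self)
        elif kind == "HERE":
            self.discovered.append(name)


bus = Bus()
a = Participant("A", bus, ["apples.little.green"])
b = Participant("B", bus, ["apples.little.green"])
c = Participant("C", bus)
d = Participant("D", bus, ["apples.little.green"])

a.discover("apples.little.green")
```

A learns about B and D without ever addressing them directly; C stays silent because it is not subscribed.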
17. RMI brokering
- Finding a participant to invoke methods 
- Like the discovery protocol
[Figure: hosts A, B, C, D with addresses 1, 2, 3, 4 attached to THE INFORMATION BUS, exchanging these messages]
Subject: apples.little.green  Data: "I want to make a method call."
Subject: apples.little.green  Data: "Sure, my address is 2"
Subject: apples.little.green  Data: "Sure, my address is 4"
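A rough sketch of the brokering step, with the bus broadcast reduced to a simple loop. `Server`, `on_request`, `find_servers`, and the `quote` method are hypothetical names; the only point illustrated is that the bus is used to find an address and the method call itself then goes point-to-point.

```python
class Server:
    def __init__(self, address, subjects):
        self.address = address
        self.subjects = set(subjects)

    def on_request(self, subject):
        """Answer a broadcast request with this server's address, or None."""
        return self.address if subject in self.subjects else None

    def quote(self, fruit):  # hypothetical remote method
        return f"{fruit} quoted by server at address {self.address}"


def find_servers(subject, servers):
    """Broadcast the request and collect the addresses that come back."""
    replies = (server.on_request(subject) for server in servers)
    return [address for address in replies if address is not None]


servers = {
    2: Server(2, ["apples.little.green"]),
    3: Server(3, ["bananas.little.green"]),
    4: Server(4, ["apples.little.green"]),
}

offers = find_servers("apples.little.green", servers.values())

# The caller picks one offered address and talks to it directly, bypassing the bus.
result = servers[offers[0]].quote("apples")
```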
18. Adapters
Adapters convert data from legacy systems to 
pub/sub messages
[Figure: THE INFORMATION BUS carrying a message with a Subject and Data: <object>]
Other clients don't know that there's a legacy system involved
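A minimal sketch of an adapter, assuming a made-up fixed-width legacy record (4-character symbol, 8-character price, 6-character quantity). The record layout, the `trades.` subject prefix, and the `adapt` function are all hypothetical.

```python
def parse_legacy_record(record):
    """Split a hypothetical fixed-width record into typed fields."""
    symbol = record[0:4].strip()
    price = float(record[4:12])
    quantity = int(record[12:18])
    return symbol, price, quantity


def adapt(record, publish):
    """Convert one legacy record into a pub/sub message and publish it."""
    symbol, price, quantity = parse_legacy_record(record)
    subject = f"trades.{symbol.lower()}"
    publish(subject, {"symbol": symbol, "price": price, "quantity": quantity})


published = []
adapt("IBM 00042.50000100", lambda subject, data: published.append((subject, data)))
```

Subscribers to `trades.ibm` receive an ordinary message and have no way to tell that it originated in a legacy system.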
19. Dynamic System Evolution
- New clients can be brought on-line at any time 
- Subscribe to current subjects 
- Publish objects of conventional type 
- Publish objects of novel type and implementation 
- Create new subjects for subscription 
- Existing subscriptions unaffected
20. Problems solved by the Information Bus
- System is available, evolvable 
- Maintenance may be performed on-line 
- New services and clients can be rolled out 
 incrementally, without downtime
- Is subject-based subscription a limitation? 
- Simple subject easier to test than arbitrary 
 tuple signatures
- Let's look closer at this matching problem
21. Matching Events in a Content-based Subscription System
- Scenario: the content-based pub/sub system
- Like the Information Bus: subscriptions based on content, rather than a membership list
- A participant has (potentially) many 
 subscriptions
- A participant receives (potentially) many 
 publications
22. The Matching Problem
- Each participant must test each event to see 
 which subscriptions it matches
- Attribute-based subscription model 
- Each event may have multiple attributes, some or 
 all of which may be tested
- Example subscriptions:
- Fruit=apple, Size=little, Color=green
- Fruit=apple, Size=*, Color=red
- Fruit=*, Size=little, Color=*
- * = don't care (match anything)
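The attribute-based model can be sketched as a predicate over dictionaries. The `matches` function and the subscription labels are assumptions for the example; the three subscriptions are the ones listed above, with `*` as the don't-care value.

```python
ANY = "*"  # don't care: matches any value


def matches(subscription, event):
    """True when every attribute the subscription cares about equals the event's value."""
    return all(
        wanted == ANY or event.get(attribute) == wanted
        for attribute, wanted in subscription.items()
    )


subscriptions = {
    "little green apples": {"Fruit": "apple", "Size": "little", "Color": "green"},
    "red apples": {"Fruit": "apple", "Size": ANY, "Color": "red"},
    "anything little": {"Fruit": ANY, "Size": "little", "Color": ANY},
}

event = {"Fruit": "apple", "Size": "little", "Color": "red"}
matched = [name for name, sub in subscriptions.items() if matches(sub, event)]
```

A single event can satisfy several subscriptions at once, which is why each participant must test every incoming event against all of its subscriptions.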
23. The Matching Problem
- Trivially, this problem is linear in the number of subscriptions
- By adding multiple attributes, it's now linear in the number of attributes too
- Can we do better than the naïve matching 
 implementation?
24. The Exact Attribute Problem
- Consider a special case of this problem 
- Each attribute is to be matched exactly 
- (Alternatives: substring match, lexicographic comparison, etc.)
25. General algorithm
- Pre-process all subscriptions into a matching 
 tree
- Like a decision tree of attribute tests 
- Goal: if multiple subscriptions have the same attribute requirements, only test that attribute once for all subscriptions
- Similar problem: matching multiple strings in text
- Consider each char of each string an attribute
26. Naïve Matching
- Subscriptions:
- SUB1: apples.little.green
- SUB2: apples.*.yellow
- SUB3: bananas.little.green
- Algorithm:
- Search each subscription separately
- For each event,
- For each subscription,
- For each attribute,
- Test against event
[Figure: Naïve algorithm. Three separate chains of tests, one per subscription]
SUB1: apples? -> little? -> green?
SUB2: apples? -> yellow?
SUB3: bananas? -> little? -> green?
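The nested loops above translate almost directly into code. In this sketch, subscriptions are positional tuples, `*` is the don't-care value, and the test counter and early exit on the first failed test are assumptions added to make the cost visible.

```python
ANY = "*"

SUBSCRIPTIONS = {
    "SUB1": ("apples", "little", "green"),
    "SUB2": ("apples", ANY, "yellow"),
    "SUB3": ("bananas", "little", "green"),
}


def naive_match(event, subscriptions):
    """Check every subscription independently; return (matches, tests performed)."""
    matched, tests = [], 0
    for name, pattern in subscriptions.items():
        ok = True
        for wanted, actual in zip(pattern, event):
            if wanted == ANY:
                continue  # don't care: nothing to test
            tests += 1
            if wanted != actual:
                ok = False
                break  # this subscription cannot match
        if ok:
            matched.append(name)
    return matched, tests


result = naive_match(("apples", "little", "green"), SUBSCRIPTIONS)
```

For this event the "apples?" test runs twice, once for SUB1 and once for SUB2, which is exactly the duplicated work a shared structure would avoid.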
27. Matching Tree
- Subscriptions:
- SUB1: apples.little.green
- SUB2: apples.*.yellow
- SUB3: bananas.little.green
- Algorithm:
- Search all subscriptions together
- For each event,
- Recursive tree search
- For each attribute (node)
- Test against event
- Follow all matching edges
- Leaf nodes are matches
[Figure: Matching tree algorithm. One shared tree for all three subscriptions]
Root (attribute 1): apples? or bananas?
apples branch (attribute 2): little? -> green? -> SUB1; * -> yellow? -> SUB2
bananas branch (attribute 2): little? -> green? -> SUB3
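A simplified sketch of the matching tree for exact-match attributes, without any of the paper's optimizations. The `Node` class, `build_tree`, and `tree_match` are hypothetical names; the sketch assumes events never contain the literal value `*` and counts one test per node at which an event attribute is looked up.

```python
ANY = "*"


class Node:
    def __init__(self):
        self.edges = {}  # attribute value (or ANY) -> child Node
        self.subs = []   # subscriptions that end at this leaf


def build_tree(subscriptions):
    """Pre-process all subscriptions into one tree, sharing common prefixes."""
    root = Node()
    for name, pattern in subscriptions.items():
        node = root
        for value in pattern:
            node = node.edges.setdefault(value, Node())
        node.subs.append(name)
    return root


def tree_match(node, event, depth=0):
    """Return (matched subscription names, number of nodes tested)."""
    if depth == len(event):
        return list(node.subs), 0
    matched, tests = [], 1  # one lookup of event[depth] at this node
    for key in (event[depth], ANY):  # at most two edges can be followed
        child = node.edges.get(key)
        if child is not None:
            sub_matched, sub_tests = tree_match(child, event, depth + 1)
            matched += sub_matched
            tests += sub_tests
    return matched, tests


tree = build_tree({
    "SUB1": ("apples", "little", "green"),
    "SUB2": ("apples", ANY, "yellow"),
    "SUB3": ("bananas", "little", "green"),
})
result = tree_match(tree, ("apples", "little", "green"))
```

Under this way of counting, the event needs four tests here, because the shared "apples?" test is performed once for both SUB1 and SUB2.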
28. Complexity of the matching tree
- Why is this better? 
- By inspection, the matching tree tends to have 
 fewer tests than the trivial implementation
- Fewer nodes, that is, assuming there's some overlap in attribute values among your subscriptions
- Still linear in number of subscriptions, however
29. Complexity of the matching tree (cont.)
- Deeper insight
- For the exact-matching problem, the number of branches you can follow is at most 2
- i.e. if some event's attr_i = X, you can only follow X and *
- It gets better, however
- If there are no * subscriptions for attr_i, you will follow 0 or 1 branches
- Intuition: more like a traditional search tree
30. Complexity of the matching tree (cont.)
- Time complexity shown to be O(N^(1-λ))
- (The expected complexity for random events; N is the number of subscriptions)
- λ is related to the number of non-* edges in the matched path; can be as high as ½
- Intuition: the more exact tests there are, the fewer branches you will follow
- Other complexity characteristics
- Space complexity: linear
- Pre-computation: linear
31. Complexity of the matching tree (cont.)
- Simulation with random data 
[Figure: plot of matching complexity versus number of subscriptions]
32. Optimizations
- Collapse multiple don't-care (*) edges into a single edge
- Rationale: many subscriptions don't care about most attributes of data (60% speedup in simulation)
- Pre-compute successor nodes 
- Short-circuit parts of the matching tree in 
 special situations
33. Successor Node Optimization
- Subscriptions:
- SUB1: *.little.green
- SUB2: *.*.yellow
- SUB3: bananas.little.green
- What's going on?
- Annotate nodes with links to other nodes you know will also match at that point
- Example: if we match bananas.little, we know *.little and *.* will also match for sure
[Figure: Matching tree algorithm. Tree for SUB1, SUB2, SUB3 testing bananas? at level 1, little? at level 2, and green? or yellow? at level 3]
34. Summary and Discussion
- Publish/subscribe participants connected only by 
 exchanged data
- Flexible, loose connections make for an evolvable system
- No Linda-like storage 
- (but you could implement a storage service in a 
 pub/sub system)
- So what about the matching problem? 
- It only exists in broadcast pub/sub 
- Each participant sees each event 
- Question: is this realistic?
- Trend: multicast instead of broadcast
- Subscription lists mean more administration, but potentially better publication performance
- P2P?