Title: Topology-Aware Overlay Networks for Group Communication
1. Topology-Aware Overlay Networks for Group Communication
Minseok Kwon and Sonia Fahmy
Department of Computer Sciences, Purdue University
kwonm, fahmy_at_cs.purdue.edu
http://www.cs.purdue.edu/fahmy/
2. Overlay Multicast
[Figure: overlay multicast among end hosts in Switzerland, Russia, Austria, Germany, USA, Japan, and Korea]
3. End System Multicast
- Proposed by Y. Chu, S. Rao, S. Seshan, and H. Zhang (CMU) in 2000
- One of the early application-level multicast protocols
- Easy to deploy
- Self-organizing
- Target applications: small, sparse groups (audio/video conferencing groups)
4. Challenges
- Efficiency
  - How can we reduce the delay penalty (relative to unicast) and the number of duplicate packets on a physical link (link stress)?
- Scalability
  - How can we reduce the amount of data maintained by each group member and the amount of routing information exchanged?
5. Our Primary Focus
- Can we exploit the network topology data that routing protocols already establish (together with measurements) to construct efficient, low-overhead overlays for application-level multicast?
6. Benefiting from overlapping shortest paths
[Figure: the source, the shortest path to another group member (now serving as a relay node), and the overlapping shortest path to a new member that attaches to the relay node]
- Minimize the delay penalty and the number of duplicate packets.
7. TAG (Topology-Aware Grouping)
- Mostly single-source or core-based multicast
- The overlay tree is organized based on path overlap information
- Delay (primary) and bandwidth (secondary) are considered as metrics
- Target applications: latency-sensitive applications (limited-bandwidth streaming applications, multi-player online games)
8. TAG Features
- Reduces the delay penalty
- Reduces the number of duplicate packets
- Low space complexity: each group member maintains only a small amount of information (the IP addresses and paths of its parent and children)
- Low average time complexity: member join and leave require O((log n)^2) and O(log n) time, respectively, where n is the total number of group members
9. Difficulties with TAG
- When should the tree be reorganized?
- Overhead at (or near) the sender/core if many members join or rejoin at the same time
- Bandwidth is considered only as a secondary metric (as a loose constraint, and to break ties among equal-delay paths)
10. TAG Definitions
- Path from A to B
  - The sequence of routers comprising the shortest path from A to B
- Spath of A
  - The path from the sender/core to A
- Length of path P
  - The number of routers in path P
- A ⊑ B
  - If the spath of A is a prefix of the spath of B
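The definitions above translate into a few lines of Python. This is a minimal sketch, assuming an spath is represented as a tuple of router identifiers listed from the sender/core toward the member; the names `Spath`, `path_length`, and `is_prefix_of` are hypothetical and not taken from the TAG implementation.

```python
# Hypothetical representation: an spath is a tuple of router IDs,
# listed in order from the sender/core toward the member.
Spath = tuple[str, ...]


def path_length(p: Spath) -> int:
    """Length of path P: the number of routers in P."""
    return len(p)


def is_prefix_of(a: Spath, b: Spath) -> bool:
    """True if spath a is a prefix of spath b (the prefix relation between members)."""
    return len(a) <= len(b) and b[: len(a)] == a
```

For example, `is_prefix_of(("R1", "R2"), ("R1", "R2", "R4"))` is `True`, while `is_prefix_of(("R1", "R5"), ("R1", "R2", "R4"))` is `False`.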
11. TAG Member Join
[Figure: a new member performs path matching against the root, Member1, and Member2]
- A new member finds its parent and children by recursively applying the path matching algorithm
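12. TAG Complete Path Matching
[Figure: a new member D8 is matched against the spaths of existing members recorded in the family table, for three possible spaths of D8]
- D8 (R1,R2,R4,R5): D4 ⊑ D8, so D4 is the best candidate to proceed with
- D8 (R1,R2): D8 ⊑ D4 and D8 ⊑ D7, so the new member subscribes here, and D4 and D7 become the children of the new member D8
- D8 (R1,R5): the new member subscribes here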
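The join procedure described on the last two slides can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the `Member` class, its `children` list standing in for the family table, and the rule "proceed with the child whose spath is the longest prefix" are hypothetical choices made for illustration, and the descent is written as a loop rather than as recursion.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical representation: an spath is a tuple of router IDs from the sender/core.
Spath = tuple[str, ...]


def is_prefix_of(a: Spath, b: Spath) -> bool:
    """True if spath a is a prefix of spath b."""
    return len(a) <= len(b) and b[: len(a)] == a


@dataclass(eq=False)  # compare members by identity, not by field values
class Member:
    name: str
    spath: Spath
    children: list[Member] = field(default_factory=list)  # stands in for the family table


def join(root: Member, new: Member) -> Member:
    """Attach `new` to the tree rooted at `root` by complete path matching.

    Returns the member the new member subscribed to (its parent).
    The loop plays the role of the recursive descent described on the slides.
    """
    current = root
    while True:
        # Children whose spath is a prefix of the new member's spath: the new
        # member belongs somewhere in one of their subtrees.
        candidates = [c for c in current.children if is_prefix_of(c.spath, new.spath)]
        if candidates:
            # Proceed with the best candidate (assumed: the longest matching spath).
            current = max(candidates, key=lambda c: len(c.spath))
            continue
        # No child lies on the way: the new member subscribes here. Children whose
        # spath extends the new member's spath are moved under the new member.
        adopted = [c for c in current.children if is_prefix_of(new.spath, c.spath)]
        current.children = [c for c in current.children if c not in adopted]
        new.children.extend(adopted)
        current.children.append(new)
        return current
```

For instance, if D4 with spath (R1,R2,R4) and D7 with spath (R1,R2,R3) are children of the root S, joining D8 with spath (R1,R2) makes D8 a child of S and moves D4 and D7 under D8; joining D8 with spath (R1,R2,R4,R5) instead descends to D4 and subscribes there.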
13. TAG Member Join Example
[Figure: join example on a topology with source S and routers R0 to R4]
14. TAG Member Leave Example
[Figure: leave example on a topology with source S and routers R0 to R5]
D4 is leaving.
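15. TAG Partial Path Matching
- Complete path matching does not consider available bandwidth
- Minus-k (or partial) path matching
  - Node A can be the parent of node B if A has a common spath prefix of length length(spath of A) − k with B
- Example (k = 1)
  - D1 (R0,R1,R3) and D4 (R0,R1,R2,R4) share the spath prefix (R0,R1), whose length is 3 − 1 = 2, so D1 can be the parent of D4
[Figure: source S, routers R0 to R4, and members D1 and D4]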
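The minus-k rule can be stated as a single comparison on the length of the shared spath prefix. This is a minimal sketch, assuming spaths are tuples of router identifiers; `common_prefix_len` and `can_be_parent` are hypothetical helper names. Setting k = 0 reduces the test to complete path matching.

```python
Spath = tuple[str, ...]  # hypothetical: router IDs from the sender/core toward the member


def common_prefix_len(a: Spath, b: Spath) -> int:
    """Number of leading routers that spaths a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def can_be_parent(a: Spath, b: Spath, k: int) -> bool:
    """Minus-k matching: A can be the parent of B if they share an spath prefix
    of length len(spath of A) - k. With k = 0 this is complete path matching."""
    return common_prefix_len(a, b) >= len(a) - k


d1 = ("R0", "R1", "R3")
d4 = ("R0", "R1", "R2", "R4")
print(can_be_parent(d1, d4, k=1))  # True: shared prefix (R0, R1) has length 3 - 1
print(can_be_parent(d1, d4, k=0))  # False: (R0, R1, R3) is not a prefix of D4's spath
```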
16. TAG Partial Path Matching
- Mitigates possibly high link stress and limited bandwidth near a constrained node
- When is partial path matching activated?
  - When the available bandwidth < bwthresh (a loose constraint)
- With partial matching, a new member examines several delay-based paths and selects the path that maximizes bandwidth (as a tie breaker)
- k may be increased dynamically depending on the available bottleneck bandwidth and other constraints
- Last-hop delay bounds, etc., can also be used as (loose) constraints in addition to available bandwidth
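One way to combine the activation rule and the bandwidth tie breaker is sketched below. It is a hypothetical illustration, not the TAG implementation: candidate parents are flattened into a list (the real protocol examines them while descending the tree), each is assumed to carry a bandwidth estimate `avail_bw`, the delay-based choice is assumed to be the longest complete match, and `choose_parent` is an illustrative name.

```python
from dataclasses import dataclass
from typing import Optional

Spath = tuple[str, ...]  # hypothetical: router IDs from the sender/core toward the member


def common_prefix_len(a: Spath, b: Spath) -> int:
    """Number of leading routers that spaths a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class Candidate:
    name: str
    spath: Spath
    avail_bw: float  # hypothetical: estimated available bandwidth to this candidate


def choose_parent(cands: list[Candidate], new_spath: Spath,
                  bwthresh: float, k: int) -> Optional[Candidate]:
    # Complete matching: candidates whose whole spath is a prefix of the new spath.
    complete = [c for c in cands
                if common_prefix_len(c.spath, new_spath) == len(c.spath)]
    if complete:
        best = max(complete, key=lambda c: len(c.spath))  # longest path overlap
        if best.avail_bw >= bwthresh:
            return best  # bandwidth suffices: keep the delay-based choice
    # Partial (minus-k) matching is activated: relax the prefix requirement by
    # k routers and use available bandwidth to pick among the candidates.
    partial = [c for c in cands
               if common_prefix_len(c.spath, new_spath) >= len(c.spath) - k]
    if not partial:
        return None
    return max(partial, key=lambda c: c.avail_bw)
```

If no acceptable parent is found, a caller could retry with a larger k, in line with the slide's note that k may be increased dynamically.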
17. How do we obtain topology and bandwidth data?
- Topology
  - Traceroute (experiments show that 10% of the routers do not respond)
  - Topology server (e.g., OSPF topology server, AS-level maps)
  - Comparing common subsequences can be used instead of matching paths when complete information is not easily available
- Bandwidth/delay
  - Bandwidth estimation tools (e.g., pathchar, nettimer)
  - In-band measurements
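The slides do not specify how common subsequences are compared; a longest common subsequence (LCS) of two router paths is one plausible reading, sketched below as an assumption. Routers that do not answer traceroute are represented as `None` and never match anything, so gaps in the measured paths lower the score instead of breaking the comparison. `lcs_length` is a hypothetical name.

```python
from typing import Optional, Sequence

Hop = Optional[str]  # None marks a router that did not respond to traceroute


def lcs_length(a: Sequence[Hop], b: Sequence[Hop]) -> int:
    """Length of the longest common subsequence of two router paths.

    Standard dynamic program keeping only one previous row; unresponsive
    hops (None) are never counted as matches.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x is not None and x == y:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[len(b)]
```

For example, a traceroute that returns (R0, R1, *, R4) still shares three routers with the complete path (R0, R1, R2, R4).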
18. Ongoing Work
- More intelligent path matching with multiple tight or loose constraints and incomplete topology data
- Fault resilience
  - Periodic probing of parent and children
- Adaptivity to changes
  - An intermediate node probes the paths to its children
  - Path-based aggregation of destinations
    - A change in an spath affects the members whose spaths overlap with it
19. Economies of Scale Factor
- Two important questions to answer about an overlay multicast tree
  - How much bandwidth does TAG save compared to unicast (economies of scale factor 1)?
  - How much additional bandwidth does TAG consume compared to IP multicast (economies of scale factor ≈ 0.8)?
1. J. Chuang and M. Sirbu, "Pricing Multicast Communications: A Cost-Based Approach," Proc. Internet Society INET, 1998.
2. G. Phillips, S. Shenker, and H. Tangmunarunkit, "Scaling of Multicast Trees: Comments on the Chuang-Sirbu Scaling Law," Proc. ACM SIGCOMM, 1999.
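In the Chuang-Sirbu formulation, the total number of links L(N) in a multicast tree reaching N receivers, normalized by the average unicast path length u, grows roughly as L(N)/u ≈ N^k. Unicast gives k = 1, IP multicast gives k ≈ 0.8, and a smaller k means larger savings. The sketch below estimates k as the least-squares slope of log(L(N)/u) against log(N); the function name and inputs are hypothetical, and the tree costs would come from simulation.

```python
import math
from typing import Sequence


def scaling_exponent(group_sizes: Sequence[int], tree_costs: Sequence[float],
                     mean_unicast_len: float) -> float:
    """Least-squares slope of log(L(N)/u) versus log(N).

    group_sizes: receiver counts N; tree_costs: total links L(N) used by the
    tree for each N; mean_unicast_len: average unicast path length u.
    """
    xs = [math.log(n) for n in group_sizes]
    ys = [math.log(c / mean_unicast_len) for c in tree_costs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

Feeding in costs that grow exactly as u · N^0.8 returns 0.8, and costs of u · N (pure unicast) return 1.0.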
20. A Simple Model
[Figure: primary source at the root of a k-ary tree of routers, with end hosts attached to routers]
- An end host can be attached to any router
- A router can have more than one end host attached to it
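The simple model can be explored numerically. This sketch makes several assumptions that the slide leaves open: routers form a complete k-ary tree stored in heap order (router 0 is the root, where the source sits, and router i has parent (i - 1) // k), every host has its own one-hop access link, and hosts are placed uniformly at random. It counts link traversals for unicast and for IP multicast only; TAG's own cost is not modeled here, and all function names are hypothetical.

```python
import random


def parent(i: int, k: int) -> int:
    """Parent of router i in a heap-ordered k-ary tree (router 0 is the root)."""
    return (i - 1) // k


def depth(i: int, k: int) -> int:
    """Number of router-to-router links between router i and the root."""
    d = 0
    while i > 0:
        i = parent(i, k)
        d += 1
    return d


def unicast_cost(host_routers: list[int], k: int) -> int:
    # One separate copy per host: the path from the root plus its access link.
    return sum(depth(r, k) + 1 for r in host_routers)


def multicast_cost(host_routers: list[int], k: int) -> int:
    # Each router-to-router link is used once, identified by its child endpoint;
    # every host still needs its own access link.
    links: set[int] = set()
    for r in host_routers:
        while r > 0:
            links.add(r)
            r = parent(r, k)
    return len(links) + len(host_routers)


def sample_hosts(n_hosts: int, k: int, tree_depth: int, rng: random.Random) -> list[int]:
    # Complete k-ary tree (k >= 2) with levels 0..tree_depth; hosts may share routers.
    n_routers = (k ** (tree_depth + 1) - 1) // (k - 1)
    return [rng.randrange(n_routers) for _ in range(n_hosts)]
```

Evaluating both costs for increasing numbers of hosts yields the kind of (N, L(N)) data from which an economies of scale factor can be estimated.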
21. TAG Model
- Case 1
  - At least one host is connected to A
- Case 2
  - No host is connected to A
[Figure: routers B and A with subtrees C(1), C(2), ..., C(k), shown for each case; annotation: "A single packet hop over the link B"]
22. Economies of Scale Factor
- Can we develop a more realistic model (e.g., one that adds unary nodes representing transit routers to the tree)?
23. Performance Evaluation
- Simulations
  - Session-level simulations of TAG and ESM
- TAG
  - Minus-k partial matching with fixed k = 1 and loose bwthresh = 200 KB
- ESM
  - Degree bounds of a member in the mesh: lower bound 3, upper bound 6
24. Performance Evaluation
- Topologies
  - Transit-stub model (GT-ITM)
    - TS1 (492 nodes), TS2 (984 nodes), TS3 (1640 nodes)
    - Random symmetric link delays of 1 to 55 ms in transit domains and 1 to 3 ms in stub domains
    - Backbone bandwidth of 100 MB to 500 MB; edge link bandwidth of 500 KB to 1 MB
  - AS-level: AS maps from NLANR, and Inet
    - AS97, Inet97 (3015 nodes)
    - AS98, Inet98 (3878 nodes)
    - AS99, Inet99 (4872 nodes)
25. Performance Evaluation
- Performance metrics
  - Relative Delay Penalty (RDP): the ratio of the delay between two nodes in TAG to the unicast delay between the same two nodes
  - Link Stress (total or maximum): the number of duplicate copies of a packet carried over a physical link
  - Mean Available Bandwidth: the mean available end-to-end bandwidth between every two nodes
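The two delay and stress metrics can be computed from per-pair delays and from the router-level route behind each overlay edge. This is a minimal sketch under stated assumptions: links are treated as undirected, the inputs are hypothetical dictionaries produced by a simulator, and total stress is taken here as the sum of per-link stress, which the slide does not spell out.

```python
from __future__ import annotations

from collections import Counter
from typing import Mapping, Sequence

Edge = tuple[str, str]


def mean_rdp(delays: Mapping[Edge, tuple[float, float]]) -> float:
    """Mean RDP over member pairs; delays maps (a, b) -> (overlay delay, unicast delay)."""
    return sum(overlay / unicast for overlay, unicast in delays.values()) / len(delays)


def link_stress(overlay_edges: Sequence[Edge],
                routes: Mapping[Edge, Sequence[str]]) -> Counter:
    """Copies of one packet carried by each physical link.

    routes maps each overlay edge (u, v) to the router-level unicast path
    u, ..., v that the packet follows; links are treated as undirected.
    """
    stress: Counter = Counter()
    for edge in overlay_edges:
        path = routes[edge]
        for a, b in zip(path, path[1:]):
            stress[frozenset((a, b))] += 1
    return stress
```

Total and maximum stress then follow as `sum(stress.values())` and `max(stress.values())`.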
26. Results: Mean RDP
- ESM performance improves significantly when the upper degree bound is increased to 12
27. Results: Total Stress
28. Results: Maximum Stress
- Partial path matching helps reduce the stress near highly constrained nodes.
29. Results: Mean Bandwidth
30. Results: AS Map and Inet
Configuration | Mean RDP (TAG) | Mean RDP (ESM) | Total Stress (TAG) | Total Stress (ESM) | Max Stress (TAG) | Max Stress (ESM) | Mean Bandwidth (TAG) | Mean Bandwidth (ESM)
AS97 | 4.69 | 3.47 | 12162 | 13665 | 291 | 411 | 172 | 408
Inet97 | 4.87 | 6.24 | 12893 | 11103 | 404 | 310 | 167 | 371
AS98 | 2.67 | 3.03 | 16074 | 17607 | 347 | 352 | 166 | 448
Inet98 | 5.34 | 9.55 | 15436 | 15580 | 187 | 258 | 188 | 468
AS99 | 4.12 | 4.93 | 23774 | 24666 | 460 | 710 | 113 | 396
Inet99 | 4.40 | 9.56 | 20745 | 19590 | 379 | 313 | 161 | 468
31. Related Work
- End System Multicast
  - ScatterCast, Yoid, ALMI, Overcast, Bayeux, SCRIBE, CAN-based multicast
- Overlay networks
  - RON, Detour
- Unicast-based multicast protocols
  - REUNITE
- Theoretical studies
  - Node degree constraints and diameter bounds in overlay multicast networks
32. Conclusions
- Network topology information is used to construct an overlay multicast network with a low delay penalty and a small number of duplicate packets
- Delay (primary) and bandwidth (secondary) are considered as metrics
- The economies of scale factor is 0.94 for TAG
- Simulation results indicate that TAG builds efficient trees for a large number of group members, given appropriate parameter values
33. Ongoing and Future Work
- Two-tiered TAG
  - Core receivers should meet given requirements (latency or bandwidth)
  - More scalable
  - More adaptive to dynamic changes
- Implementation and experiments