Title: Informed Content Delivery Across Adaptive Overlay Networks
1Informed Content Delivery Across Adaptive Overlay
Networks
- Presented by Kelly Whitacre
- Written by John W. Byers, Jeffrey Considine,
Michael Mitzenmacher, Member, IEEE, and Stanislav
Rost
2Problem
- Distributing a large new file across the Internet
to millions of users simultaneously has proven to
be challenging
3Possible Solution Point-to-Point?
- Having individual point-to-point connections from
a single source wastes bandwidth
- Server must handle the load of possibly many clients
- Bandwidth costs money
- Server should utilize available bandwidth
- Transfer rates are limited by the characteristics
of the end-to-end paths
4Possible Solution IP Multicast?
- Solves bandwidth problems of point-to-point
- Server sends one copy
- Network handles the rest
- No flow control
- No retransmission of lost packets
- Limited deployment
5Reliable Multicast
- Digital fountain approach
- Erasure codes: send parity information with
packets to recover lost ones (no feedback channels are
needed to ensure reliable delivery)
- Recirculation: information is re-circulated
(fountain) for asynchronous client arrivals
- Parallel transfer rates: heterogeneous client
transfer rates so as to not flood the network
6Digital Fountain Approach
[Figure: Source (k) → Instantaneous → Encoding Stream → Transmission → Received (k) → Instantaneous → Message (k)]
Can recover the file from any set of k encoding
packets.
7Digital Fountain Approach
[Figure labels: File, Transmission, User 1, User 2]
8Cyclic Interleaving
[Figure labels: File, Tornado Encoding, Encoding Copy 1, Encoding Copy 2, Encoded Blocks, Interleaved Encoding Blocks, Transmission]
9Solution Adaptive Overlay Networks
10Adaptive Overlay Networks
- Differs from IP Multicast
- Do not use Multicast tree
- Flexibly adapt to changing network conditions
- End systems are explicitly required to
collaborate!
- Can improve performance by additional
cross-connections and active collaboration
11Addressing Limitations Content Delivery Scenario
Consider Initial Delivery Tree
S: source. Shaded area: each node has a working
set of packets, the subset of packets it has
received
12Addressing Limitations Improving Transfer Rates
Harnessing the Power of Parallel Downloads
Tree
Directed Acyclic Graph
Establishing concurrent connections to multiple
servers or peers with complete copies of the file
13Addressing Limitations Improving Transfer Rates
Harnessing the Power of Collaborative Transfer
Establishing concurrent connections to multiple
peers
14Addressing Limitations Improving Transfer Rates
Power of Cross-Connections Collaboration
(d) depicts the portions of content which can be
beneficially exchanged via pair-wise transfers
15Considerations
- (a) and (b) impede the full flow of content to
downstream receivers
- Opportunistic connections of (c) and (d) allow for
higher transfer rates
- Yet, they demand more careful orchestration between
end systems
- Must determine the set difference of working sets
- Reconciliation is simple when working sets are limited
to small contiguous blocks
- But this limits flexibility under the frequent changes that arise
in an AON
16Content Delivery Across Adaptive Overlay Networks
- Challenges
- Stateful vs. Non-Stateful Solutions
17Adaptive Overlay Networks in a Fluid Internet
- Asynchrony
- Receivers may open and close connections or leave
and rejoin the infrastructure at arbitrary times
- Heterogeneity
- Connections vary in speed and loss rates
- Transience
- Routers, links, and end systems may fail and
their performance may fluctuate over time
- Scalability
- The service must scale to large receiver
populations and large content
- Adaptively detect and avoid congested or
temporarily unstable areas of the network
- Dynamically establish paths with the most
desirable end-to-end characteristics
- Deliver useful content, often in parallel, with a
minimum of setup overhead and message complexity
18Limitations of Stateful Solutions
- Require significant per-connection state
- Connection issues
- Connections vary in speed and loss rates
- Clients come and go at arbitrary times
- Highly unscalable
- May impact performance
- State must be maintained in the face of
reconfiguration and reconnection
- Problematic with parallel downloading
19Alternative Encoded Content through Digital
Fountain Approach
- Digital Fountain Approach
- Resilience to packet loss: erasure-correcting code
- Guarantee
- Claim: recover the original source file from
any subset of distinct symbols in the encoding
stream equal in size to the original file
- In practice: recover a file from a few percent
more than the number of symbols in the original
file
20Encoded Content through Digital Fountain Approach
- Continuous Encoding
- Senders with a complete copy of a file may
continuously produce fresh encoding symbols
- Time Invariance
- New encoding symbols are produced independently
from symbols produced in the past
- Tolerance
- Digital fountain streams are useful to all
receivers regardless of the times of their
connections or disconnections and their rates of
sampling the stream
- Additivity
- Parallel downloads from multiple servers with
complete copies of the content require no
orchestration
Stateless!
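The additivity property can be illustrated with a minimal sketch. It assumes a toy random linear fountain over GF(2), swapped in for the Tornado-style codes the slides refer to; the names subset_mask, encode_symbol, and Decoder are hypothetical, and blocks are small integers for brevity. Two senders draw seeds from disjoint ranges and the receiver merges whatever arrives, with no coordination between the senders.

```python
import random

def subset_mask(k, seed):
    """Derive from the seed which of the k source blocks are XORed together."""
    mask = random.Random(seed).getrandbits(k)
    return mask or 1  # never return the empty combination

def encode_symbol(blocks, seed):
    """Produce one encoding symbol; it depends only on the seed, not on history."""
    mask = subset_mask(len(blocks), seed)
    payload = 0
    for i, block in enumerate(blocks):
        if (mask >> i) & 1:
            payload ^= block
    return seed, payload

class Decoder:
    """Incremental Gaussian elimination over GF(2) (toy decoder, an assumption)."""

    def __init__(self, k):
        self.k = k
        self.rows = {}  # highest set bit of a row -> (mask, payload)

    def add(self, seed, payload):
        """Absorb one symbol; return True if it carried new information."""
        mask = subset_mask(self.k, seed)
        while mask:
            pivot = mask.bit_length() - 1
            if pivot not in self.rows:
                self.rows[pivot] = (mask, payload)
                return True
            row_mask, row_payload = self.rows[pivot]
            mask ^= row_mask
            payload ^= row_payload
        return False  # linearly dependent on symbols already held

    def done(self):
        return len(self.rows) == self.k

    def decode(self):
        """Back-substitute from the lowest pivot upward."""
        blocks = [0] * self.k
        for pivot in sorted(self.rows):
            mask, payload = self.rows[pivot]
            rest = mask ^ (1 << pivot)
            while rest:
                j = rest.bit_length() - 1
                payload ^= blocks[j]
                rest ^= 1 << j
            blocks[pivot] = payload
        return blocks

rng = random.Random(7)
source = [rng.getrandbits(32) for _ in range(16)]

decoder = Decoder(len(source))
seed_a, seed_b = 0, 1_000_000  # two independent senders, disjoint seed ranges
received = 0
while not decoder.done():
    decoder.add(*encode_symbol(source, seed_a))
    decoder.add(*encode_symbol(source, seed_b))
    seed_a += 1
    seed_b += 1
    received += 2

recovered = decoder.decode()
```

The receiver needs at least as many symbols as source blocks, and typically slightly more, since some random combinations turn out to be redundant.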
21Encoded Content through Digital Fountain Approach
- Encoding/Decoding Overhead
- Reconciliation methods are needed when
collaborating end systems have only a portion of
the content
22Reconciliation and Informed Delivery
- Coarse-grained reconciliation
- Speculative transfers
- Fine-grained reconciliation
23Note
- Approaches proposed are local in scope and
typically involve a pair or a small number of end
systems
- Goal is to provide the most cost-effective
reconciliation mechanisms, measuring cost both in
computation and message complexity
24Coarse-Grained Reconciliation
- Estimate the resemblance of the working sets of pairs of
nodes prior to establishing connections
- Quick estimates of the fraction of symbols common
to the working sets of both peers
- Approach 1: employs random sampling
- Approach 2: employs sketches of each peer's
working set
- High-level information
- Lightweight, computed efficiently
- Incrementally updated
- Fit into a single 1-kB packet
25Notation Framework
- Let peers A and B have working sets SA and SB
containing symbols from an encoding of the file
- Containment
- The containment of B in A is the quantity
|SA ∩ SB| / |SB|
- Resemblance
- The resemblance of A and B is the quantity
|SA ∩ SB| / |SA ∪ SB|
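As a small worked example, assuming the standard definitions (containment of B in A is |SA ∩ SB| / |SB|, resemblance is |SA ∩ SB| / |SA ∪ SB|), the two quantities can be computed exactly on toy sets. The function names and the example sets are hypothetical.

```python
from fractions import Fraction

def containment(set_b, set_a):
    """Containment of B in A: the fraction of B's symbols that A also holds."""
    return Fraction(len(set_a & set_b), len(set_b))

def resemblance(set_a, set_b):
    """Resemblance of A and B: shared symbols over all distinct symbols."""
    return Fraction(len(set_a & set_b), len(set_a | set_b))

set_a = set(range(1, 9))    # keys 1..8
set_b = set(range(5, 11))   # keys 5..10, sharing 5..8 with A

c = containment(set_b, set_a)   # 4 shared out of 6 in B
r = resemblance(set_a, set_b)   # 4 shared out of 10 distinct
```

Containment is asymmetric (it depends on whose set is in the denominator), while resemblance is symmetric in A and B.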
26Notation Framework
- Each element of a working set is identified by an
integer key (sending an element entails sending
its key)
- Keys are distributed over the key space uniformly
at random
- With 64-bit keys, a 1-kB packet can hold roughly
128 keys
- Distinct keys can yield the same element
- If the elements are determined by a hash function
seeded by the key, two keys may generate the same
element with small probability
- Minimal impact
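The packet-capacity figure follows from simple arithmetic, sketched here under the assumptions that 1 kB means 1024 bytes and that the whole packet payload is available for keys (no header); the helper names are hypothetical.

```python
import random
import struct

PACKET_BYTES = 1024   # assumption: 1 kB = 1024 bytes, no header overhead
KEY_BYTES = 8         # one 64-bit key
keys_per_packet = PACKET_BYTES // KEY_BYTES

def pack_keys(keys):
    """Serialize 64-bit keys as big-endian unsigned integers."""
    return struct.pack(f">{len(keys)}Q", *keys)

def unpack_keys(payload):
    """Inverse of pack_keys."""
    return list(struct.unpack(f">{len(payload) // KEY_BYTES}Q", payload))

rng = random.Random(0)
keys = [rng.getrandbits(64) for _ in range(keys_per_packet)]
payload = pack_keys(keys)
```

Any real header would reduce the count somewhat, which is consistent with the "roughly 128" wording.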
27Random Sampling
- Select elements of the working set at random and
transport those to the peer.
28Random Sampling
- Unbiased estimate of containment
- Can be incrementally updated using reservoir
sampling
- The receiving peer must search its own working set for each element
in the random sample
- Does not easily allow one peer to check the
resemblance between two prospective peers
- A cannot check the resemblance between B and C
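A minimal sketch of the random-sampling estimate, assuming integer keys and a sample of 128 keys (one packet's worth). The function names are hypothetical, and the reservoir step follows the textbook Algorithm R rather than anything specified in the slides. Peer A maintains the sample as symbols arrive; peer B looks each sampled key up in its own working set.

```python
import random

def reservoir_update(reservoir, capacity, seen_so_far, key, rng):
    """Algorithm R: keep a uniform sample of `capacity` keys as symbols arrive.

    `seen_so_far` is the number of keys offered before this one.
    """
    if len(reservoir) < capacity:
        reservoir.append(key)
    else:
        slot = rng.randrange(seen_so_far + 1)
        if slot < capacity:
            reservoir[slot] = key

def estimate_containment(sample_from_a, working_set_b):
    """Fraction of A's sampled keys that B already holds."""
    hits = sum(1 for key in sample_from_a if key in working_set_b)
    return hits / len(sample_from_a)

rng = random.Random(1)
set_a = set(range(0, 10_000))
set_b = set(range(5_000, 15_000))   # half of A's keys are also in B

sample = []
for count, key in enumerate(set_a):
    reservoir_update(sample, 128, count, key, rng)

estimate = estimate_containment(sample, set_b)   # close to 0.5
```

The estimate here is of |SA ∩ SB| / |SA|, the containment of A in B, because the sample is drawn from A's working set.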
29Min-Wise Sketches
- Calculates working set resemblance based on
min-wise sketches
30Min-Wise Sketches
- πi represents a random permutation on the key
universe
- A sends B a vector of A's minima, one per permutation
(two minima agree exactly when the minimum of the union
lies in both sets)
- B counts the number of positions where the two
vectors are equal
- Divides by the total number of permutations
The result is an unbiased estimate of the
resemblance
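The estimate can be sketched as follows, assuming linear maps (a*x + b) mod p stand in for the random permutations and that keys are smaller than the prime p; both choices, and all names, are illustrative assumptions rather than the authors' construction.

```python
import random

PRIME = (1 << 61) - 1  # modulus of the stand-in permutations; keys must be below it

def make_permutations(count, seed=0):
    """Pick `count` linear maps x -> (a*x + b) mod PRIME, each a bijection on [0, PRIME)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(count)]

def min_wise_sketch(keys, permutations):
    """One entry per permutation: the smallest permuted key in the working set."""
    return [min((a * key + b) % PRIME for key in keys) for a, b in permutations]

def estimate_resemblance(sketch_a, sketch_b):
    """Fraction of positions where the two sketches agree."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)

perms = make_permutations(180)
rng = random.Random(2)
universe = [rng.randrange(PRIME) for _ in range(3000)]
set_a = set(universe[:2000])
set_b = set(universe[1000:])   # about 1000 shared keys out of 3000 distinct

sketch_a = min_wise_sketch(set_a, perms)
sketch_b = min_wise_sketch(set_b, perms)
estimate = estimate_resemblance(sketch_a, sketch_b)   # close to 1/3
```

Because only the sketches are compared, a third peer holding both sketches could compute the same estimate without seeing either working set.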
31Min-Wise Sketches
- Unbiased estimate of resemblance
- Allows similarity comparisons given any two
sketches for any two peers
- A can check resemblance between B and C
- Truly random permutations cannot be used
- Storage requirements are impractical
- Possibility of false positives
- πi values are hashed to fewer bits to allow for
more sketch elements in a packet
- (Details not discussed)
32Speculative Transfers
- Involve a sender making educated guesses as
to which symbols to generate and transfer
- Send symbols which are probably useful to the
other peer
- This process can be fine-tuned using the results
of coarse-grained reconciliation
33Speculative Transfers
- When the containment of B in A is low, a speculative
transfer is trivial, since most of B's symbols
are useful to A
- When the containment of B in A is high, this strategy is
inefficient: use recoding
34Recoding
- A recoding symbol is simply the bitwise XOR of a
set of encoding symbols
- Must be accompanied by a specification of the
encoding symbols blended to create it
- Must explicitly list the random seeds of the
encoding symbols from which it was produced
35Encoding/Decoding Recoding Symbols
- Similar to the substitution rule
- Example: a peer with y5, y8, y13 generates recoding
symbols
- Z1 = y13
- Z2 = y5 XOR y8
- Z3 = y5 XOR y13
- A peer that receives Z1, Z2, Z3 can recover y13 directly from Z1
- By substitution, it then recovers y5 (from Z3) and y8 (from Z2)
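The example above can be traced in a short sketch. Symbols are modeled as small integers and identified by their index; the function names and the symbol values are hypothetical.

```python
def recode(encoding_symbols, ids):
    """Blend the listed encoding symbols with XOR and attach the list of ids."""
    value = 0
    for i in ids:
        value ^= encoding_symbols[i]
    return frozenset(ids), value

def decode_by_substitution(recoding_symbols):
    """Repeatedly substitute known symbols until no further symbol resolves."""
    known = {}
    pending = [(set(ids), value) for ids, value in recoding_symbols]
    progress = True
    while pending and progress:
        progress = False
        remaining = []
        for ids, value in pending:
            for i in list(ids):
                if i in known:          # cancel an already-recovered symbol
                    value ^= known[i]
                    ids.discard(i)
            if len(ids) == 1:           # exactly one unknown left: it is solved
                known[ids.pop()] = value
                progress = True
            elif ids:
                remaining.append((ids, value))
        pending = remaining
    return known

held = {5: 0x5A5A, 8: 0x0F0F, 13: 0x1234}   # the sender's y5, y8, y13
z1 = recode(held, [13])
z2 = recode(held, [5, 8])
z3 = recode(held, [5, 13])

recovered = decode_by_substitution([z1, z2, z3])
```

Z1 yields y13 at once; substituting it into Z3 gives y5, and substituting y5 into Z2 gives y8.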
36Fine-grained Reconciliation
- Is a set-difference problem
- Tries to determine the exact difference SA - SB
- Many approaches
- Polynomial-Based
- Enumeration-Based
- Bloom filter
- Search-Based
- Approximate Reconciliation Trees (ART) which
combine the compact representation of Bloom
filters with the speed of a search-based approach
37Bloom Filter
- A compact bit array representing the n elements of the working
set, filled in using independent random hash
functions
- Flow
- Peer A sends B a Bloom filter FA of SA
- Peer B then checks each element of SB against FA
- Peer B has thereby determined SB - SA, up to false positives
- This solution is effective particularly when the
number of differences is a large fraction of the
set size
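A minimal sketch of the Bloom filter exchange, assuming 64-bit integer keys, 8 bits per element, 6 hash functions, and SHA-256 with a one-byte salt as a stand-in for independent random hashes (the class and its layout are hypothetical). Any element of SB that the filter rejects is certainly absent from SA; a false positive can only hide a difference, never invent one.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        """Bit positions for a key, one per salted hash."""
        data = key.to_bytes(8, "big")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

set_a = set(range(0, 1000))
set_b = set(range(900, 1100))   # 100 keys that A lacks

# Peer A builds and sends its filter.
filter_a = BloomFilter(num_bits=8 * len(set_a), num_hashes=6)
for key in set_a:
    filter_a.add(key)

# Peer B keeps the keys the filter rejects; these are safe to send to A.
missing_at_a = {key for key in set_b if key not in filter_a}
```

With these parameters a small fraction of the true differences is expected to be missed through false positives, which is why the result is approximate.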
38Experimental Results
- Demonstrate the benefits and costs of using
reconciliation in peer-to-peer transfers and in
parallel downloads
39Simulation Parameters
- All consider transfer of a 128-MB file
- Origin server
- Divides this file into input symbols of 1400
bytes each (to fit in an Ethernet packet with
headers)
- Encodes this file into a large set of encoding
symbols
- Associates each encoding symbol with a 64-bit
identifier representing the set of input symbols
used to produce it
- Min-wise sketches used 180 permutations, yielding
180 entries of 64 bits each for a total of 1440
bytes per summary
- Bloom filters used 6 hash functions and 8(1 +
0.0025)L bits for a total of 96 kB per filter
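The sizes quoted above can be checked with a few lines of arithmetic. This sketch assumes that 128 MB means 128 * 2^20 bytes, that L denotes the number of input symbols, and that the garbled Bloom filter expression reads 8(1 + 0.0025)L bits; all three are assumptions, not statements from the slides.

```python
import math

FILE_BYTES = 128 * 2**20      # assumption: "128 MB" = 128 * 2^20 bytes
SYMBOL_BYTES = 1400

# Number of input symbols (L, by assumption).
num_input_symbols = math.ceil(FILE_BYTES / SYMBOL_BYTES)

# Min-wise sketch: 180 entries of 64 bits each.
sketch_bytes = 180 * 64 // 8

# Bloom filter: 8 bits per element over (1 + 0.0025) * L elements (assumed reading).
bloom_bits = 8 * (1 + 0.0025) * num_input_symbols
bloom_bytes = bloom_bits / 8
```

Under these assumptions the file splits into 95,870 input symbols, the sketch occupies 1440 bytes, and the filter comes to about 96,000 bytes, in line with the quoted 96 kB.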
40Collaboration Methods
- Uninformed
- The sending peer picks a symbol to send at random
- Speculative
- The sending peer uses a min-wise sketch from the
receiving peer to estimate the containment
- Reconciled
- The sending peer uses either a Bloom filter or an
ART from the receiving peer to filter out
duplicate symbols and sends a random permutation
of the differences.
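The gap between the uninformed and reconciled methods can be illustrated with a toy count of transmissions. The sketch assumes the uninformed sender draws with replacement and that the reconciled sender knows the exact difference (a real Bloom filter or ART would be approximate); the speculative method is left out, and all names and set sizes are hypothetical.

```python
import random

def uninformed_sends(sender_set, receiver_set, needed, rng):
    """Send uniformly random symbols until `needed` new ones have arrived."""
    pool = list(sender_set)
    have = set(receiver_set)
    sent = gained = 0
    while gained < needed:
        key = rng.choice(pool)
        sent += 1
        if key not in have:     # duplicates are wasted transmissions
            have.add(key)
            gained += 1
    return sent

def reconciled_sends(sender_set, receiver_set, needed, rng):
    """Send a random permutation of the difference; every symbol is useful."""
    difference = list(sender_set - receiver_set)
    rng.shuffle(difference)
    return len(difference[:needed])

rng = random.Random(3)
sender = set(range(1000))
receiver = set(range(800))      # receiver already holds 80% of the sender's symbols
needed = 100                    # receiver wants 100 of the 200 symbols it lacks

sent_uninformed = uninformed_sends(sender, receiver, needed, rng)
sent_reconciled = reconciled_sends(sender, receiver, needed, rng)
```

The reconciled sender transmits exactly the number of symbols needed, while the uninformed sender wastes most of its transmissions on symbols the receiver already holds.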
41Scenarios and Evaluation
- Varying 3 experimental factors
- Set of connections in the overlay formed between
sources and peers
- Distribution of content among collaborating peers
- Slack of the scenario (1.1 1.3)
- When smaller than (1 + decoding overhead), the set
of peers will be unable to recover the file
- When larger than (1 + decoding overhead), the set
of peers will most likely recover the file
- Methods provide the most significant benefits
over naive methods when there is only a small
amount of slack
42Scenario 1 Two peers with Partial Content
- One peer sends symbols to the other
[Plot; axis label: Shared Encoding Symbols]
- Uninformed collaboration performs poorly and
degrades significantly as the containment
increases
- Speculative collaboration is more efficient, but
the overhead still increases slowly with
containment
- Overhead of reconciliation is purely from the
cost of transmitting a Bloom filter or ART (less
than a )
43Scenario 2 Download from a Server with Complete
Content
- With concurrent transfer from a peer
[Plot; axis label: Shared Encoding Symbols]
- Uninformed collaboration overhead is
considerably lower than in scenario 1 (a larger
fraction of the content is sent directly via
fresh symbols from the server)
- Speculative collaboration performs similarly to
scenario 1
- Reconciled collaboration has overhead slightly
higher than receiving symbols directly from the
server
44Scenario 3 Parallel Download from Peers with
Partial Content
- Collaborating With Multiple Peers in Parallel
[Plot; axis label: Shared Encoding Symbols]
- Can leverage bandwidth from peers with partial
content with only a slight increase in overhead
- Uninformed collaboration performs extremely
poorly
- Speculative collaboration dramatically improves
as containment increases
- Reconciled collaboration has much higher
overhead than before
45Conclusions
- Adaptive overlay networks offer a powerful
alternative to traditional mechanisms for content
delivery
- Flexibility, scalability, and deployability
- Informed and effective collaboration between end
systems can be achieved through the digital
fountain approach
- Care is needed to provide methods for
representing and transmitting the content in a
manner that is as flexible and scalable as the
underlying capabilities of the delivery model
46Questions?
47Supplemental Reading and Resources
- A Digital Fountain Approach to Reliable
Distribution of Bulk Data
http://www.ecse.rpi.edu/Homepages/shivkuma/teaching/sp2001/readings/digital-fountain.pdf
- ACM SIGCOMM 98, A Digital Fountain Approach to
Reliable Distribution of Bulk Data
http://www.sigcomm.org/sigcomm98/tp/abs_05.html