1. Large Scale Audio Distribution on the Internet
A technical perspective
by Kåre Synnes
2.
- Born 1969 in Sollefteå, Sweden
- Books, games, sports, food, film, music, company
- Engaged to Maggie
3. Large Scale Audio Distribution on the Internet
- Techniques for Packet-Loss Repair of Audio Streams
- Layering of Audio Data
- Adaptive Audio Applications
4. Large Scale Audio Distribution on the Internet
- Large Scale: many receivers
- Audio: prioritized temporal data
- Distribution: one-to-many
- Internet: best-effort (lossy)
5. Issues at hand
- Distribution needs to be scalable for very large groups
  - Multicast RTP/UDP/IP
- Best-effort IP transport results in
  - delay (400 ms is acceptable)
  - delay variation (requires buffering)
  - loss (congestion, jitter, overload, delay variation)
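Delay variation is absorbed by a playout buffer whose depth follows the measured jitter. A minimal sketch, assuming a common exponentially weighted estimator (smoothed delay plus four times the smoothed variation); the smoothing factor `alpha`, the factor `k` and the sample delays are assumptions, not values from the talk:

```python
def playout_delays(delays_ms, alpha=0.9, k=4.0):
    """Return the playout delay estimate (ms) after each received packet.

    d is the smoothed one-way delay and v the smoothed delay variation;
    the playout point d + k * v grows when the delay starts to vary.
    """
    d = delays_ms[0]
    v = 0.0
    estimates = []
    for n in delays_ms:
        d = alpha * d + (1 - alpha) * n
        v = alpha * v + (1 - alpha) * abs(d - n)
        estimates.append(d + k * v)
    return estimates


steady = playout_delays([100.0] * 20)
jittery = playout_delays([100.0, 160.0] * 10)
print(round(steady[-1], 1), round(jittery[-1], 1))
```

A constant delay keeps the playout point at the delay itself, while alternating 100/160 ms delays push it above the largest observed delay: more buffering, fewer late packets, but a longer response time.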
6. IP Multicast
- What's HOT!
  - Minimum traffic load
  - Scalable
  - Efficient protocols (RTP/UDP/IP)
  - Cheap, no special network equipment needed (i.e. MTUs)
- What's NOT!
  - Turned off by default
  - Complex distribution tree management
  - No back-off for UDP at congestion
  - Lossy
  - Few applications
7. Loss - a generalization
- Low loss
  - Single packets are lost
  - Losses are 'almost' evenly distributed
- Medium and high loss
  - Packets are lost in twos or threes
  - Losses are 'clustered'
- Also, given a large group
  - Most receivers will have 2-5 % loss
  - A small number of receivers will have greater loss
  - Each packet is assumed to be lost by at least one receiver
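The difference between isolated and clustered loss can be reproduced with a two-state (Gilbert) Markov model, a standard loss model swapped in here for illustration; the transition probabilities are assumptions, not measurements from the talk:

```python
import random


def gilbert_losses(n, p_good_to_bad, p_bad_to_good, seed=1):
    """Simulate n packets with a two-state (Gilbert) loss model.

    Packets arrive in the "good" state and are lost in the "bad" state.
    A small p_bad_to_good gives long bursts (clustered loss); a large one
    gives mostly isolated single losses. True means the packet was lost.
    """
    rng = random.Random(seed)
    bad = False
    lost = []
    for _ in range(n):
        if bad:
            bad = rng.random() >= p_bad_to_good
        else:
            bad = rng.random() < p_good_to_bad
        lost.append(bad)
    return lost


def burst_lengths(lost):
    """Lengths of the runs of consecutive lost packets."""
    runs, current = [], 0
    for is_lost in lost:
        if is_lost:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs
```

The mean burst length is 1 / p_bad_to_good, so p_bad_to_good = 0.5 gives bursts of two packets on average, and p_good_to_bad = 0.02 then gives a long-run loss rate of 0.02 / 0.52, roughly 4 %.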
8. Techniques for Packet-Loss Repair of Audio Streams
- Sender Initiated Repairs
  - Piggy-backed Redundancy
  - Forward Error Correction
  - Parallel Redundancy
- Receiver Initiated Repairs
  - Semi-Reliable Transmissions
- Receiver-Only Repairs
  - Silence Substitution
  - Waveform Substitution
    - White Noise
    - Repetition
    - (Predictive) Interpolation
9. Silence Substitution
- Very simple to implement
- Adequate performance for
  - small packets (< 32 ms)
  - low loss (< 1 %)
- Not very good otherwise (clipping)
10. White Noise
- Also very simple to implement
- Better than silence substitution
- Subconscious repairs by the listener
- Applies to noise, but not to silence
- Tolerance of 5-10 % loss
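A minimal sketch of noise substitution, assuming float samples and that the noise level is taken from the RMS of the last good packet (the function names are hypothetical):

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def noise_substitute(last_good, length, seed=0):
    """Replace a lost packet with white noise at the RMS level of last_good.

    Uniform noise in [-a, a] has RMS a / sqrt(3), so a = sqrt(3) * target.
    If the last packet was silent, the repair is silent too.
    """
    rng = random.Random(seed)
    a = math.sqrt(3.0) * rms(last_good)
    return [rng.uniform(-a, a) for _ in range(length)]
```

Matching the level of the previous packet is what makes the noise blend in; inserting noise into a silent passage would be audible, which is why the technique applies to noise but not to silence.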
11. Self-similarity
- Speech waveforms often exhibit a degree of self-similarity.
- Generation of a replacement packet with similar spectral qualities is possible.
- Clips shorter than 30 ms are recommended (phonemes).
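One way to exploit self-similarity is pattern matching: find the earlier stretch of audio that best resembles the most recent samples and reuse what followed it. This is a sketch of that general idea under simple assumptions (squared-error matching, float samples, hypothetical names), not the method used in the talk:

```python
def pattern_match_substitute(history, template_len, gap_len):
    """Fill a gap by copying what followed the best match of the recent audio."""
    if template_len <= 0 or gap_len <= 0:
        raise ValueError("lengths must be positive")
    if len(history) < 2 * template_len + gap_len:
        raise ValueError("history too short")
    template = history[-template_len:]
    # Candidate windows end before the template begins, and the gap_len
    # samples following each candidate must lie inside history.
    last_start = min(len(history) - 2 * template_len,
                     len(history) - template_len - gap_len)
    best_start, best_err = 0, float("inf")
    for start in range(last_start + 1):
        window = history[start:start + template_len]
        err = sum((w - t) ** 2 for w, t in zip(window, template))
        if err < best_err:
            best_start, best_err = start, err
    follow = best_start + template_len
    return history[follow:follow + gap_len]
```

For voiced speech the best match tends to lie one pitch period back, so the copied samples continue the waveform with similar spectral qualities.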
12. Repetition
- Again, very simple to implement
- Significantly improves audio quality at 5-15 % loss
- Bad effects if overdone (echo/reverberation)
- An amplitude gain shift (fading) is good
- Experience: a 50 % decrease per repetition, for at most 2 consecutive 40 ms clips
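The fading rule above can be sketched as follows, assuming float samples; falling back to silence after the last repetition is one simple choice (the mAudio recovery code instead switches to noise):

```python
def repeat_with_fade(last_good, consecutive_lost, gain=0.5, max_repeats=2):
    """Repair the k-th consecutive lost packet by repeating the last good one.

    Each repetition scales the amplitude by `gain` (50 %, then 25 %).
    Beyond max_repeats losses in a row, silence is returned to avoid the
    echo/reverberation effect of over-long repetition.
    """
    if consecutive_lost > max_repeats:
        return [0.0] * len(last_good)
    scale = gain ** consecutive_lost
    return [s * scale for s in last_good]
```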
13. (Predictive) Interpolation
- Interpolation can be done in two ways
  - Use the two surrounding clips (additional delay)
  - Use two or more earlier clips (less accurate)
- Not so common due to complexity
- Gives better results than repetition
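The simplest form of interpolation from the surrounding clips is a straight line across the gap. A sketch, assuming float samples; real interpolators work on the signal's spectral parameters rather than raw samples:

```python
def interpolate_gap(before, after, gap_len):
    """Fill a lost clip of gap_len samples from the surrounding clips.

    Draws a straight line from the last sample before the gap to the first
    sample after it. The next clip must already have arrived, which is
    where the additional delay comes from.
    """
    start, end = before[-1], after[0]
    step = (end - start) / (gap_len + 1)
    return [start + step * (i + 1) for i in range(gap_len)]
```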
14. Interleaving
- Spread the contents of a packet over several packets, so that each loss leaves smaller gaps to repair
- Phonemes are 20 ms
- Additional delay
- No extra BW cost
- Uncertain results (intelligibility)
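A minimal sketch of block interleaving, assuming fixed-size frames and an interleaving depth chosen by the sender (names are hypothetical):

```python
def interleave(frames, depth):
    """Spread consecutive frames over `depth` packets.

    Packet j carries frames j, j + depth, j + 2 * depth, ...; losing one
    packet then removes every depth-th frame instead of a contiguous run.
    """
    return [frames[j::depth] for j in range(depth)]


def deinterleave(packets, depth, n_frames):
    """Rebuild the frame order; frames from lost packets (None) stay None."""
    frames = [None] * n_frames
    for j, packet in enumerate(packets):
        if packet is None:
            continue
        for i, frame in enumerate(packet):
            frames[j + i * depth] = frame
    return frames
```

Each frame is still sent once, so there is no bandwidth cost, but the receiver must wait for a whole group of `depth` packets before playout, which is the additional delay.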
15. Audio Formats
- Several new codecs have been developed
- proprietary
- down to 1.2 kbps!
16. Redundancy
- Synthetic low-quality, low bit-rate encodings can be used as redundant repairs.
- LPC is considered to contain 60 % of a speech signal, while preserving the frequency spectrum.
- GSM is even better, but at more than twice the bit-rate: 13 vs. 4.8 kbps.
- Multiple redundancy is also an option.
17. Piggy-backed Redundancy
- High tolerance of loss (25-40 %).
- Single redundancy, using PCM (64 kbps) as the main encoding and GSM (13 kbps) as the redundant one, is common.
- The degree of loss determines the optimal delay of the redundant copy.
- Receivers that cannot handle redundancy may be able to skip the redundant encoding(s).
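A sketch of piggy-backing with a redundancy offset of one packet, assuming each frame has already been encoded twice (for example PCM as main and GSM as redundant); the dictionary layout is hypothetical, not an RTP wire format:

```python
def add_redundancy(primary, redundant):
    """Packet n carries primary[n] plus a redundant copy of frame n - 1."""
    return [
        {"seq": n, "primary": primary[n],
         "redundant": redundant[n - 1] if n > 0 else None}
        for n in range(len(primary))
    ]


def recover(packets_received, n_frames):
    """Rebuild the frames: main data first, else the copy in packet n + 1."""
    by_seq = {p["seq"]: p for p in packets_received}
    out = []
    for n in range(n_frames):
        if n in by_seq:
            out.append(by_seq[n]["primary"])
        elif n + 1 in by_seq and by_seq[n + 1]["redundant"] is not None:
            out.append(by_seq[n + 1]["redundant"])
        else:
            out.append(None)  # lost in both packets
    return out
```

With PCM at 64 kbps and GSM at 13 kbps, the payload grows to 77 kbps, about 20 % overhead. An offset of one packet repairs isolated losses; clustered losses call for a larger offset, at the cost of more delay.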
18. Forward Error Correction
- Redundancy is added with XOR (parity) methods
- 50 % extra overhead in the example, but the redundancy can be recoded
- Other options are possible as well, e.g.
  - 1. a, f(a,b), b, f(b,c), c, ...
  - 2. a, b, c, x(a,b,c), d, e, f, x(d,e,f), ...
  - 3. a, b, c, x(a,c), d, x(b,d), e, x(c,e), ...
  - 4. x(a,b), x(b,c), x(a,b,c), ...
- Better than simple redundancy, but more CPU-intensive
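Option 2 above (one parity packet per three data packets) can be sketched with bytewise XOR, assuming all packets in a group have the same length, as with a fixed-rate codec:

```python
from functools import reduce


def xor_bytes(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def protect(group):
    """Return the group followed by its parity packet: a, b, c, x(a,b,c)."""
    return list(group) + [xor_bytes(group)]


def repair(received):
    """Recover at most one missing packet (None) in a protected group.

    The XOR of all other packets, parity included, equals the lost one.
    Returns the data packets without the parity.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one loss per group")
    data = list(received)
    if missing:
        data[missing[0]] = xor_bytes([p for p in received if p is not None])
    return data[:-1]
```

Here the overhead is one packet in three (about 33 %); two losses in the same group cannot be repaired, which is why the interleaved variants (option 3) spread the parity over more packets.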
19. Parallel Redundancy
- The idea is to use several channels.
- Divides the bandwidth need
  - Main transmission in one channel
  - Redundancy over another channel
- Can be applied to any scheme
- Receivers can decide how much redundancy, or even which encoding, they prefer
- Additional overhead (headers)
20. Semi-Reliable Transmissions
- A time-limited repair is achieved
- Protocols such as SRRTP can be used.
- This can be used for small groups on networks with low delay
- Otherwise, other redundancy schemes are preferable
- 1. The sender transmits a packet
- 2. A receiver sends a NACK if it is lost
- 3. The sender retransmits the packet, if it is still in the queue
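The sender side of steps 1-3 can be sketched as a queue with a maximum age; the class and the `max_age_ms` parameter are hypothetical and do not reflect the SRRTP specification:

```python
from collections import OrderedDict


class RetransmitQueue:
    """Keeps sent packets for max_age_ms and answers NACKs while they last."""

    def __init__(self, max_age_ms):
        self.max_age_ms = max_age_ms
        self._queue = OrderedDict()  # seq -> (send_time_ms, payload)

    def sent(self, seq, payload, now_ms):
        """Step 1: remember a transmitted packet (sent in time order)."""
        self._queue[seq] = (now_ms, payload)
        self._expire(now_ms)

    def on_nack(self, seq, now_ms):
        """Steps 2-3: return the payload to resend, or None if expired."""
        self._expire(now_ms)
        entry = self._queue.get(seq)
        return entry[1] if entry else None

    def _expire(self, now_ms):
        # Oldest packets come first, so stop at the first one still fresh.
        while self._queue:
            sent_at, _payload = next(iter(self._queue.values()))
            if now_ms - sent_at <= self.max_age_ms:
                break
            self._queue.popitem(last=False)
```

The age limit is what makes the repair time-limited: a retransmission that would arrive after the packet's playout time is useless, so it is never sent.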
21. mAudio
22. mAudio Recovery
    int cnt = 0;                      // number of consecutive lost packets

    byte[] read(int n) {
        if (received(n)) {            // main or redundant packet
            decreaseBuffer();         // adaptive buffering
            cnt = 0;
            return recode(n);
        }
        increaseBuffer();
        cnt++;
        if (cnt == 1)                 // repeat with 50 % amplitude
            return amplify(n - 1, 0.5);
        if (cnt == 2)                 // repeat with 25 % amplitude
            return amplify(n - 2, 0.25);
        if (cnt < 10)                 // feed noise with the correct amplitude
            return noise(n - cnt);
        return silence();             // feed silence
    }
[Figure: Packet n is lost!]
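A runnable sketch of the read() logic above, assuming packets are lists of float samples keyed by sequence number and that read() is called once per sequence number in order. The buffer adaptation is modelled as a plain counter, and the helper behaviour (repeat, noise level) is an assumption, not mAudio's actual code:

```python
import math
import random


class Recovery:
    """Receiver-side repair: repeat, fade, then noise, then silence."""

    def __init__(self, packets, frame_len, seed=0):
        self.packets = packets      # seq -> samples (main or redundant)
        self.frame_len = frame_len
        self.cnt = 0                # number of consecutive lost packets
        self.buffer_extra = 0       # hypothetical adaptive-buffer depth
        self.rng = random.Random(seed)

    def read(self, n):
        if n in self.packets:
            self.buffer_extra = max(0, self.buffer_extra - 1)
            self.cnt = 0
            return list(self.packets[n])
        self.buffer_extra += 1
        self.cnt += 1
        last = self.packets.get(n - self.cnt)  # last packet that arrived
        if last is None:
            return [0.0] * self.frame_len
        if self.cnt == 1:                      # repeat at 50 % amplitude
            return [s * 0.5 for s in last]
        if self.cnt == 2:                      # repeat at 25 % amplitude
            return [s * 0.25 for s in last]
        if self.cnt < 10:                      # noise at the last packet's RMS
            level = math.sqrt(sum(s * s for s in last) / len(last))
            a = math.sqrt(3.0) * level
            return [self.rng.uniform(-a, a) for _ in range(self.frame_len)]
        return [0.0] * self.frame_len          # silence
```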
23. Layered Encodings
- Allows the receivers to adapt to network conditions
- Main parts are sent over one channel
- Additional parts over other channels
- Example, 6 layers
  - 50, 25, 12, 6, 4, 3 %
- Can be CPU-intensive
- This is tricky for audio, simpler for video
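With the 6-layer example, a receiver can pick how many layers to subscribe to from its available bandwidth. A sketch, assuming each layer is sent on its own channel and that layers are only useful cumulatively from the base layer up; the total rate is a hypothetical parameter:

```python
LAYER_SHARES = [0.50, 0.25, 0.12, 0.06, 0.04, 0.03]  # the 6-layer example


def layers_to_join(total_kbps, available_kbps, shares=LAYER_SHARES):
    """Number of layers, from the base layer up, that fit in available_kbps."""
    used, count = 0.0, 0
    for share in shares:
        rate = share * total_kbps
        if used + rate > available_kbps:
            break
        used += rate
        count += 1
    return count
```

A receiver that sees loss can drop its top layer and one with spare capacity can add the next, so each receiver adapts without affecting the others.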
24. Simple Layering
[Figure: layers of a 32 kHz sampled signal (8, 16, 24, 32 kHz); amplitude (dB) vs. time (ms)]
- Audio artifacts when layers are only merged (frequency overtones)
  - tin-can sound
  - reverberation
- Filtering needed
25. Wavelet Encoding
[Figure: speech, amplitude (dB) vs. frequency (Hz)]
- Transform the data to the frequency domain, and divide it there
- Computationally difficult (expensive)
- Longer delays due to buffering
- Very good division
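The simplest wavelet, the Haar wavelet, shows how a transform divides a signal into a coarse base layer and a detail layer. This is an illustrative sketch, not the encoder from the talk; practical coders use longer filters and several levels:

```python
import math


def haar_split(x):
    """One level of the Haar transform: (low, high) halves of an even-length signal.

    low is a half-rate coarse version (base layer); high holds the detail
    needed to restore full quality (additional layer).
    """
    s = 1 / math.sqrt(2)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high


def haar_merge(low, high):
    """Inverse of haar_split; pass zeros as `high` if that layer is missing."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(low, high):
        out.extend([(a + d) * s, (a - d) * s])
    return out
```

Applying haar_split again to `low` yields further layers. The transform works on blocks, which is one source of the longer buffering delay mentioned above.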
26. Adaptive Audio Applications
- How can we support heterogeneous environments?
  - Network: 56k modem, ISDN, xDSL, Ethernet
  - Load: congestion, hardware jitter, delay variation
  - Client: mobile phone, PDA, NC, PC, workstation
- Allow scaling of quality
  - Do NOT use a least common denominator!
- Senders should adapt slowly while receivers adapt more rapidly, i.e. highly adaptive clients
27. RTP/RTCP Receiver Reports
- The receivers report on
  - Loss rate (long-term congestion)
  - Delay variation (short-term congestion)
  - Throughput
  - Additional (load, encoding, etc.)
- Can be used to change
  - Encoding
  - Redundancy
  - Layering
- How do we do this for
  - many receivers? Voting?
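The loss and delay-variation figures in a receiver report are computed as defined in RFC 3550: the 8-bit "fraction lost" field and the interarrival jitter estimate. A sketch of both calculations (the clamp to 255 is added here so the value fits the 8-bit field):

```python
def fraction_lost(expected_interval, received_interval):
    """RTCP 'fraction lost': lost / expected as an 8-bit fixed-point value.

    Duplicates can make the loss negative, in which case 0 is reported.
    """
    lost = expected_interval - received_interval
    if expected_interval == 0 or lost <= 0:
        return 0
    return min(255, (lost << 8) // expected_interval)


class JitterEstimator:
    """Interarrival jitter, J += (|D| - J) / 16, in RTP timestamp units."""

    def __init__(self):
        self.jitter = 0.0
        self._prev_transit = None

    def update(self, rtp_timestamp, arrival_time):
        # Both arguments must use the same clock (e.g. 8 kHz sample units).
        transit = arrival_time - rtp_timestamp
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            self.jitter += (d - self.jitter) / 16.0
        self._prev_transit = transit
        return self.jitter
```

A fraction-lost value of 12 corresponds to about 5 % loss (12 / 256). How a sender should aggregate such reports from many receivers remains the open question on this slide.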
28. Summary
- Receiver-only techniques are good for low loss and small packets
- Loss rates of up to 40 % can be repaired intelligibly using redundancy schemes
- There is a trade-off between delay and buffering, which affects response times
- Much can be done to enhance audio quality
29. Questions?
E-mail: unicorn@cdt.luth.se
URL: http://www.cdt.luth.se/unicorn/
30. Future Work
- Use real network statistics to model loss, while studying receiver report effects
- Try different combinations of recovery to achieve optimal adaptation
- Measure gain (intelligibility) vs. cost (network and CPU load)