Title: Multimedia: Representation, Compression and Transmission
Chapter 2
- Multimedia Representation, Compression and 
 Transmission
Contents
- 2. Audio 
-  2.1 Human Perception 
-  2.2 Audio Bandwidth 
-  2.3 Digitization 
-  2.4 Audio Compression 
-  2.4.1 Differential PCM 
-  2.4.2 Adaptive Differential PCM 
-  2.4.3 MP3
2.1 Human Perception
- Audio: speech, music or synthesized audio.
- Audio signals are analog.
- Audio perception
- Sound waves generate air pressure oscillations.
- These oscillations stimulate the human auditory system.
- They are transformed into neural signals recognizable by the
 brain.
2.1 Human Perception
- Features of the human auditory system
- 1. Frequency range: Humans can hear audio signals within the
 typical frequency range of 20 to 20,000 Hz.
 
- 2. Dynamic range: the range from the softest to the loudest
 audio amplitude that humans can hear.
 
- Different persons may have different frequency and dynamic
 ranges.
2.2 Audio Bandwidth
- Period and Frequency
- A periodic signal consists of a continuously repeated waveform
 pattern. If its period is T, its frequency is f = 1/T.
 
- Example: The following signals are periodic with period T and
 frequency f = 1/T. [figure]
2.2 Audio Bandwidth
- Signal Characteristic 
-  A signal can be decomposed into many sinusoidal 
 signal components such that different components
 
- 1. have different frequencies and 
- 2. may have different amplitudes. 
-  (This decomposition can be done by mathematical 
 techniques called Fourier series and Fourier
 transform.)
2.2 Audio Bandwidth
- Frequency of 1st component (1st harmonic): f1 = 1/T
- Frequency of 2nd component (2nd harmonic): 3 f1
- Frequency of 3rd component (3rd harmonic): 5 f1
2.2 Audio Bandwidth
-  Frequency Domain 
- After decomposing a signal into its components, 
 we can analyze the properties of this signal in
 the frequency domain.
 
-  Example 
- It is difficult to visualize the energy content 
 of a signal in the time domain, but it is easy to
 do so in the frequency domain.
2.2 Audio Bandwidth
- Bandwidth
- Bandwidth is the range of component frequencies.
 Example: [figure]
 
- A signal may have an infinite number of components.
- In this case, bandwidth is defined as the frequency range over
 which x% (say, 99%) of the energy of the signal lies.
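
The energy-based definition of bandwidth can be sketched numerically. The example below assumes a square wave of fundamental frequency f1, whose components sit at the odd multiples of f1 and whose energies fall off as 1/k^2; that signal, and the function name `bandwidth_for_energy`, are illustrative assumptions and are not taken from the slides.

```python
import math

def bandwidth_for_energy(f1_hz, fraction=0.99):
    """Highest component frequency needed to capture `fraction` of the
    energy of an ideal square wave with fundamental frequency f1_hz.

    Assumption: components at odd multiples k of f1 with energy
    proportional to 1/k^2 (the square-wave Fourier series).
    """
    total = math.pi ** 2 / 8  # sum of 1/k^2 over all odd k
    accumulated = 0.0
    k = 1
    while True:
        accumulated += 1.0 / k ** 2
        if accumulated / total >= fraction:
            return k * f1_hz
        k += 2  # next odd harmonic

bw_90 = bandwidth_for_energy(100.0, 0.90)
bw_99 = bandwidth_for_energy(100.0, 0.99)
print(bw_90, bw_99)
```

Raising the required energy fraction widens the bandwidth, which is why a percentage has to be chosen before a signal with infinitely many components can be given a finite bandwidth.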
2.2 Audio Bandwidth
- Effect of Limited Bandwidth
- If a network does not have sufficient bandwidth to send all the
 frequency components of a signal:
 
- some frequency components are omitted, and
- the signal is distorted.
- If a network has a larger bandwidth and can send more frequency
 components of an audio signal:
 
- the audio signal is less distorted.
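
The effect of limited bandwidth can be illustrated with a small experiment. The sketch below assumes the signal is an ideal square wave (an assumption made for illustration) and measures the RMS error between it and a band-limited version that keeps only the first few sinusoidal components.

```python
import math

def square_wave(t, period=1.0):
    """Ideal square wave: +1 in the first half period, -1 in the second."""
    return 1.0 if (t % period) < period / 2 else -1.0

def band_limited(t, n_components, period=1.0):
    """Sum of the first n_components Fourier components (f1, 3 f1, 5 f1, ...)."""
    f1 = 1.0 / period
    total = 0.0
    for i in range(n_components):
        k = 2 * i + 1  # odd multiples of the fundamental
        total += (4 / (math.pi * k)) * math.sin(2 * math.pi * k * f1 * t)
    return total

def rms_error(n_components, n_points=1000):
    """RMS difference between the square wave and its band-limited version."""
    squared = 0.0
    for j in range(n_points):
        t = (j + 0.5) / n_points  # sample mid-points over one period
        squared += (square_wave(t) - band_limited(t, n_components)) ** 2
    return math.sqrt(squared / n_points)

for n in (1, 3, 10):
    print(n, round(rms_error(n), 3))
```

The error shrinks as more components are allowed through, which matches the statement that a larger bandwidth gives a less distorted signal.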
2.3 Digitization
- Digitization: convert an analog audio signal to digital form
 via sampling and quantization.
 
- Sampling
- Sample the magnitude of the audio signal at a certain rate.
2.3 Digitization
Nyquist Theorem: For a signal that has no frequency components
higher than x Hz, the analog signal can be completely reproduced
from its samples if they are taken at a rate of at least 2x samples
per second.
Illustration of Nyquist sampling rate [figure]
2.3 Digitization
Example: Telephone systems transmit voice signal components of at
most 4000 Hz, so the sampling rate should be 8000 samples/sec.
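
A short sketch of what goes wrong when a component exceeds half the sampling rate. The tone frequencies (3000 Hz and 5000 Hz) and the helper names are illustrative assumptions; only the 4000 Hz limit and the 8000 samples/sec rate come from the telephone example.

```python
import math

def nyquist_rate(max_freq_hz):
    """Minimum sampling rate for a signal with no components above max_freq_hz."""
    return 2 * max_freq_hz

def sample_tone(freq_hz, rate_hz, n_samples):
    """Samples of a unit-amplitude cosine tone taken at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(n_samples)]

rate = nyquist_rate(4000)               # 8000 samples/sec for telephone voice
in_band = sample_tone(3000, rate, 16)   # below 4000 Hz: sampled correctly
too_high = sample_tone(5000, rate, 16)  # above 4000 Hz: violates the theorem

# The two sample sequences coincide, so the 5000 Hz tone cannot be told
# apart from a 3000 Hz tone once it has been sampled (aliasing).
indistinguishable = all(abs(a - b) < 1e-9 for a, b in zip(in_band, too_high))
print(rate, indistinguishable)
```

This is why components above 4000 Hz have to be removed before sampling at 8000 samples/sec.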
2.3 Digitization
- Quantization
- If N bits are used to represent a sample value, there are 2^N
 distinct quantization values.
 
- Each sample value is rounded to the nearest quantization value,
 so there may be a quantization error.
 
2.3 Digitization
If the first sample value is 24.1, it is 
quantized to 24 (0001 1000), so the quantization 
error is 0.1. 
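
The rounding step in this example can be sketched as follows, assuming the quantization values are the integers 0 to 2^N - 1 (a step size of 1). The step size and the function name `quantize` are assumptions made for illustration.

```python
def quantize(value, n_bits):
    """Round a sample to the nearest of 2**n_bits integer levels.

    Returns (quantized value, binary codeword, quantization error).
    Assumption: levels are the integers 0 .. 2**n_bits - 1.
    """
    levels = 1 << n_bits
    q = max(0, min(levels - 1, round(value)))  # clamp to the valid range
    codeword = format(q, f"0{n_bits}b")
    return q, codeword, abs(value - q)

q, codeword, error = quantize(24.1, 8)
print(q, codeword, round(error, 3))
```

With 8 bits the codeword for 24 is the same bit pattern shown above, written without the space.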
2.3 Digitization
- Pulse Code Modulation (PCM)
- PCM performs sampling and quantization on audio signals.
 
- PCM is used in
- Digital telephone networks: a sampling rate of 8000 samples per
 second and 8 bits per sample, so the data rate is 64 kbps
 (adopted in ITU-T G.711).
- Audio CD: a sampling rate of 44,100 samples per second and 16
 bits per sample, so the data rate for stereo audio is 1.411 Mbps.
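
Both data rates follow from multiplying sampling rate, bits per sample, and number of channels. A minimal check (the function name `pcm_bit_rate` is an illustrative choice):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

telephone = pcm_bit_rate(8000, 8)              # mono voice channel
cd_stereo = pcm_bit_rate(44100, 16, channels=2)

print(telephone / 1000, "kbps")
print(cd_stereo / 1e6, "Mbps")
```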
2.4 Audio Compression
- 2.4.1 Differential PCM
- Differential PCM is a compressed version of PCM. It has a lower
 bit rate, but its voice quality may be poorer.
 
- Differential PCM
- A voice signal changes slowly compared with the sampling rate.
 
- Successive sample values therefore differ by only a small amount.
 
- Use fewer bits to encode the difference between the current
 sample value and the previous one.
 
- The bit rate is lower, but voice quality may be degraded when the
 voice amplitude changes abruptly.
2.4 Audio Compression
- Example
- For PCM in digital telephony, the sampling rate is 8000
 samples/sec and 8 bits are used for each sample, so the data rate
 is 64 kbps.
 
- If differential PCM is adopted and 6 bits are used to encode the
 difference between successive sample values, the data rate is
 reduced to 48 kbps.
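
A minimal sketch of differential PCM with 6-bit differences, assuming integer sample values and a signed difference range of -32 to 31. The clamping strategy and the function names are assumptions; practical codecs differ in detail.

```python
def dpcm_encode(samples, diff_bits=6):
    """Encode each sample as a clamped difference from the previous
    reconstructed value (the value the decoder will also hold)."""
    lo = -(1 << (diff_bits - 1))
    hi = (1 << (diff_bits - 1)) - 1
    codes = []
    prev = 0  # decoder-side reconstruction of the previous sample
    for s in samples:
        d = max(lo, min(hi, s - prev))  # clamp to the 6-bit range
        codes.append(d)
        prev += d
    return codes

def dpcm_decode(codes):
    """Rebuild the samples by accumulating the differences."""
    out, prev = [], 0
    for d in codes:
        prev += d
        out.append(prev)
    return out

smooth = [0, 5, 12, 20, 25, 22, 15]   # slowly varying: fits in 6 bits
abrupt = [0, 100, 100, 100]           # sudden jump: differences saturate

print(dpcm_decode(dpcm_encode(smooth)))
print(dpcm_decode(dpcm_encode(abrupt)))
print(8000 * 6 / 1000, "kbps")
```

The slowly varying input is recovered exactly, while the abrupt jump can only be followed 31 units per sample, which is the quality degradation mentioned for sudden amplitude changes.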
2.4 Audio Compression
- 2.4.2 Adaptive Differential PCM
- Adaptive differential PCM is an improved version of differential
 PCM.
- Main idea: When the voice amplitude changes steeply for a
 significant duration, switch to a larger quantization step (i.e.,
 a larger difference between successive quantization values).
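
The main idea can be sketched with a toy step-size rule. This is not the algorithm of any standard: the doubling and halving rule, the 4-bit code range, and all names below are assumptions chosen only to show how the quantization step grows when the differences keep saturating.

```python
def _adapt(step, code, max_code):
    """Toy adaptation rule (an assumption): double the step when the code
    is near saturation, halve it when the code is near zero."""
    if abs(code) >= max_code - 1:
        return step * 2
    if abs(code) <= 1:
        return max(1, step // 2)
    return step

def adpcm_encode(samples, bits=4):
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    codes, pred, step = [], 0, 1
    for s in samples:
        code = max(lo, min(hi, round((s - pred) / step)))
        codes.append(code)
        pred += code * step           # track what the decoder will rebuild
        step = _adapt(step, code, hi)
    return codes

def adpcm_decode(codes, bits=4):
    hi = (1 << (bits - 1)) - 1
    out, pred, step = [], 0, 1
    for code in codes:
        pred += code * step
        out.append(pred)
        step = _adapt(step, code, hi)  # same rule as the encoder
    return out

jump = [0, 100, 100, 100, 100, 100]
print(adpcm_decode(adpcm_encode(jump)))
```

Because the decoder applies the same rule to the codes it receives, the step size never has to be transmitted. On the jump input the growing step lets the output catch up within a few samples, whereas a fixed step of 1 with 4-bit codes could move at most 7 units per sample.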
2.4 Audio Compression
- ITU-T G.721 adopts adaptive differential PCM, a sampling rate of
 8000 samples per second, and 4 bits for encoding the difference
 between successive sample values.
- The bit rate is 32 kbps, but voice quality is only slightly worse
 than that of PCM at 64 kbps.
2.4 Audio Compression
- 2.4.3 MP3
- CD audio has a data rate of 1.411 Mbps. A well-known compression
 method for CD audio is MP3.
 
- MP3: MPEG audio layer 3. (MPEG specifies three audio compression
 layers.)
 
- MP3 adopts perceptual coding to attain a high compression ratio
 while providing very good audio quality.
2.4 Audio Compression
- Perceptual Coding
- It is based on the science of psychoacoustics, which studies how
 people perceive sound.
 
- It exploits certain flaws in the human auditory system for
 compression, so that the compressed audio sounds about the same
 to humans even though its signal waveform may be quite different.
2.4 Audio Compression
- 1st Flaw: Threshold of Audibility
- When a frequency component is very weak (i.e., its power is below
 a threshold), humans cannot hear it.
 
- [Figure: threshold of audibility, averaged over many people]
- Compression: Omit the frequency components whose power falls
 below the threshold of audibility.
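
A rough sketch of this pruning step. The threshold curve used here is a commonly cited closed-form approximation of the threshold in quiet (often attributed to Terhardt), not necessarily the curve in the missing figure, and the component list is made up for illustration.

```python
import math

def threshold_db(freq_hz):
    """Approximate threshold of audibility in dB SPL at freq_hz.

    Assumption: a widely used closed-form approximation of the threshold
    in quiet; real encoders use tabulated psychoacoustic models.
    """
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def drop_inaudible(components):
    """Keep only (frequency, level_db) pairs at or above the threshold."""
    return [(f, level) for f, level in components
            if level >= threshold_db(f)]

# Hypothetical frequency components: (frequency in Hz, level in dB SPL).
components = [(50, 20.0), (1000, 20.0), (3500, 0.0)]
kept = drop_inaudible(components)
print(kept)
```

The 50 Hz component is dropped even though it has the same level as the 1000 Hz one, because the ear is much less sensitive at very low frequencies.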
2.4 Audio Compression
- 2nd Flaw: Frequency Masking
- Some sounds can mask other sounds: a loud sound in one frequency
 band hides a softer sound in another (nearby) frequency band.
 
- [Figure: masking effect]
- Compression: Omit the masked frequency components.
2.4 Audio Compression
- 3rd Flaw: Temporal Masking
- When a masking sound ends, it takes a short time before the
 masked sound can be heard.
 
- [Figure: masking effect]
- Compression: If the amplitudes of the masked frequency components
 are less than the decay envelope, omit these components.
2.4 Audio Compression
- To use MP3 for compression, we select two options:
 
- Sampling rate: We can sample the waveform at 32 kHz, 44.1 kHz or
 48 kHz on one or two channels.
 
- Bit rate: Typically, we choose a bit rate of 96 kbps, 128 kbps or
 160 kbps.
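
To put these bit rates in perspective, the sketch below compares them with the uncompressed stereo CD rate of 1.411 Mbps quoted earlier. The function name is an illustrative choice.

```python
CD_BIT_RATE = 44100 * 16 * 2  # stereo CD audio, bits per second

def compression_ratio(mp3_bit_rate):
    """Ratio of the uncompressed CD rate to the chosen MP3 bit rate."""
    return CD_BIT_RATE / mp3_bit_rate

for kbps in (96, 128, 160):
    print(kbps, "kbps ->", round(compression_ratio(kbps * 1000), 1), ": 1")
```

At the typical rates listed above, MP3 therefore reduces CD audio by roughly a factor of 9 to 15.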
2.4 Audio Compression
- Main Steps for Compression
- Perform sampling on the audio signal. Divide the samples into
 groups of 1152 samples each.
 
- Each group is passed through (i) 32 digital filters to get 32
 frequency subbands, and (ii) a psychoacoustic model to determine
 the masked frequencies.
- Based on the available "bit budget" (which depends on the chosen
 bit rate), allocate more bits to the subbands with larger
 unmasked spectral power.
 
- Finally, use Huffman coding to encode the bits (i.e., assign
 shorter codewords to numbers that appear frequently).
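
The final step can be illustrated with a generic Huffman coder. This sketch builds a code from the observed symbol counts; MP3 itself selects from predefined Huffman tables, so the construction and the sample values below are illustrative assumptions only.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a prefix code in which frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

# Hypothetical quantized values: small numbers occur most often.
values = [0] * 8 + [1] * 4 + [-1] * 2 + [5, 7]
codes = huffman_codes(values)
total_bits = sum(len(codes[v]) for v in values)
print(codes)
print(total_bits, "bits with Huffman coding vs", 3 * len(values), "bits fixed-length")
```

The most frequent value receives a one-bit codeword while the rare values receive the longest ones, so the total falls below the 3 bits per value that a fixed-length code would need for five distinct symbols.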