MP3 and AAC - PowerPoint PPT Presentation

Slides: 22

Provided by: tracd


1
MP3 and AAC

Trac D. Tran
ECE Department
The Johns Hopkins University
Baltimore MD 21218

2
MP3

MP3: MPEG-1/MPEG-2 Audio Layer III audio coding
Transform: cascade of a 32-channel filter bank and a 6-channel or 18-channel MDCT
Quantization: uniform scalar quantizer with a psycho-acoustic model
Entropy coding: run-length and Huffman coding

3
Transformation Stage in MP3
[Figure: x[n] feeds a 32-channel, 512-tap CMFB (filters H_0(z) ... H_31(z), each followed by decimation by 32); each subband output then feeds either a 6-channel, 12-tap MLT/MDCT or an 18-channel, 36-tap MLT/MDCT.]

4
Masking

Masking: discovered through psycho-acoustic experiments
The human auditory system is less sensitive at frequencies near a strong tonal signal

5
Masking: Original Signal

6
Masking Threshold

Signal components below the masking threshold are deemed insignificant (they can be quantized to zero)
Components are computed from overlapping 1024-sample Hann (Hanning) windows
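
The thresholding step above can be sketched as follows. This is only an illustration, not the MP3 psycho-acoustic model: the threshold values are supplied by the caller, and the names `hann_window` and `apply_masking` are hypothetical.

```python
import math

def hann_window(length=1024):
    """Periodic Hann window: w[n] = 0.5 - 0.5 * cos(2*pi*n / length)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]

def apply_masking(components, thresholds):
    """Set to zero every component whose magnitude is below its masking threshold."""
    return [c if abs(c) >= t else 0.0 for c, t in zip(components, thresholds)]

window = hann_window(1024)
# Toy data (assumed): four components, one flat threshold of 0.1.
kept = apply_masking([0.9, 0.05, -0.3, 0.01], [0.1, 0.1, 0.1, 0.1])
```

In a real coder the thresholds would come from the masking analysis of each windowed frame; here they are simply inputs.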

7
Advanced Audio Coding (AAC)

Successor of MP3
Better audio quality than MP3 at most bit rates
Perceptually lossless at 320 kbps for 5-channel surround sound (64 kbps/channel)
Almost CD quality at 96 kbps for 2 channels (48 kbps/channel)
AAC is part of the MPEG-4 standard
Default audio format of Apple's iPhone, iPod, and iTunes; Sony PlayStation 3; Nintendo Wii
Coding chain: MDCT, scalar quantization, Huffman coding

8
Transformation Stage in AAC
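[Figure: two MDCT filter banks applied to x[n]. For transient signals: 128-channel, 256-tap MDCT (filters H_0(z) ... H_127(z), each followed by decimation by 128). For steady-state signals: 1024-channel, 2048-tap MDCT (filters H_0(z) ... H_1023(z), each followed by decimation by 1024).]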

AAC adaptively switches between
8 blocks of 128-point MDCT with 256-point windows (for transient signals)
1 block of 1024-point MDCT with a 2048-point window (for steady-state signals)
All windows have 50% overlap
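
The 50% overlap is what lets an MDCT with N coefficients per 2N-sample window reconstruct the signal exactly. The sketch below shows this with a direct (slow) MDCT and a sine window; the sine window, the zero padding at the edges, and the small block sizes are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block, window):
    """Windowed MDCT: 2N input samples -> N coefficients."""
    n_coef = len(block) // 2
    return [
        sum(window[n] * block[n]
            * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2.0) * (k + 0.5))
            for n in range(2 * n_coef))
        for k in range(n_coef)
    ]

def imdct(coefs, window):
    """Windowed inverse MDCT: N coefficients -> 2N aliased samples."""
    n_coef = len(coefs)
    return [
        window[n] * (2.0 / n_coef)
        * sum(coefs[k]
              * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2.0) * (k + 0.5))
              for k in range(n_coef))
        for n in range(2 * n_coef)
    ]

def analyze_synthesize(signal, n_coef):
    """MDCT every 50%-overlapped block, invert, and overlap-add.

    Assumes len(signal) is a multiple of n_coef.
    """
    window = sine_window(2 * n_coef)
    padded = [0.0] * n_coef + list(signal) + [0.0] * n_coef
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coef + 1, n_coef):
        block = padded[start:start + 2 * n_coef]
        for i, v in enumerate(imdct(mdct(block, window), window)):
            out[start + i] += v  # aliasing cancels between neighbouring blocks
    return out[n_coef:-n_coef]

signal = [math.sin(0.3 * n) + 0.1 * n for n in range(32)]
short_blocks = analyze_synthesize(signal, 4)   # stands in for the 128-point case
long_block = analyze_synthesize(signal, 16)    # stands in for the 1024-point case
```

Both block sizes return the input up to floating-point error; the choice between them trades time resolution (short blocks) against frequency resolution (long blocks).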

9
JPEG Still Image Coding Standard

Trac D. Tran
ECE Department
The Johns Hopkins University
Baltimore MD 21218

10
Overall Structure of JPEG

Color converter: RGB to YUV
Level offset: subtract 2^(N-1), where N is the number of bits per pixel
Quantization: different step sizes for different coefficients
DC: predicted from the DC of the previous block
AC: zigzag scan to get 1-D data
AC: run-level joint coding of each non-zero coefficient and the number of zeros before it
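
The first two stages can be sketched as below. The slide only says "RGB to YUV"; the coefficients used here are the JFIF-style YCbCr ones, which is an assumption, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to YCbCr (JFIF-style coefficients, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def level_offset(sample, bits=8):
    """Subtract 2^(N-1) so that N-bit samples are centred on zero."""
    return sample - (1 << (bits - 1))

y, cb, cr = rgb_to_ycbcr(255, 255, 255)   # white: full luma, neutral chroma
shifted = level_offset(124)               # 124 - 128
```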

11
JPEG Quantization

Uniform mid-tread quantizer
Larger step sizes for chroma components
Different coefficients have different step sizes
Smaller steps for low-frequency coefficients (more bits)
Larger steps for high-frequency coefficients (fewer bits)
The human visual system is less sensitive to errors at high frequencies

Luma Quantization Table

16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

Chroma Quantization Table

17  18  24  47  99  99  99  99
18  21  26  66  99  99  99  99
24  26  56  99  99  99  99  99
47  66  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99

Actual step size: scale the basic table by a quality factor

12
Scaling of Quantization Table

Actual Q table = scaling x Basic Q table
quality factor <= 50: scaling = 50 / quality
quality factor > 50: scaling = 2 - quality / 50

Basic Q table (luma)

16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

Quality Factor   Scaling
--------------   -------
10               5.0
20               2.5
50               1.0
75               0.5

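The scaling rule can be checked against the quality-factor table with a short sketch. Rounding each scaled step to the nearest integer and clamping it to at least 1 are assumptions made here so that the result is a usable table; the slide does not state them. The table is the luminance example table from Annex K of the JPEG standard.

```python
BASIC_LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def scaling_factor(quality):
    """Scaling from the slide: 50/quality up to 50, 2 - quality/50 above."""
    if quality <= 50:
        return 50.0 / quality
    return 2.0 - quality / 50.0

def scaled_table(quality, basic=BASIC_LUMA_TABLE):
    """Scale every step size; round to nearest and keep steps >= 1 (assumed)."""
    s = scaling_factor(quality)
    return [[max(1, int(s * q + 0.5)) for q in row] for row in basic]

factors = {q: scaling_factor(q) for q in (10, 20, 50, 75)}
table_q75 = scaled_table(75)   # every step is halved, so quantization is finer
```
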
13
DC Prediction

DC coefficient: proportional to the average of a block
The DC coefficients of neighboring blocks are still similar to each other: redundancy
The redundancy can be removed by differential coding
e(n) = DC(n) - DC(n-1)
Only the prediction error e(n) is encoded
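
A minimal sketch of the differential coding of DC values and its inverse. The predictor is started at 0 for the first block, which is an assumption of this sketch; the function names are hypothetical.

```python
def dc_differences(dc_values, initial_prediction=0):
    """e(n) = DC(n) - DC(n-1), with DC(-1) taken as initial_prediction."""
    errors = []
    previous = initial_prediction
    for dc in dc_values:
        errors.append(dc - previous)
        previous = dc
    return errors

def dc_reconstruct(errors, initial_prediction=0):
    """Decoder side: accumulate the prediction errors back into DC values."""
    values = []
    previous = initial_prediction
    for e in errors:
        previous += e
        values.append(previous)
    return values

errors = dc_differences([5, 8, 7, 7])   # toy DC sequence (assumed)
restored = dc_reconstruct(errors)
```

Because neighboring DC values are close, the errors are small numbers clustered around zero, which is what makes them cheap to entropy-code.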

[Figure: DC coefficients of Lena]

14
Coefficient Category

Divide coefficients into categories of exponentially increasing sizes
Use a Huffman code to encode the category ID
Use a fixed-length code within each category
Similar to the Exponential-Golomb code

15
Coding of DC Coefficients

Encode e(n) = DC(n) - DC(n-1)

Our example: DC = 8. Assume last DC = 5 → e = 8 - 5 = 3.
Cat. 2, index 3 → Bitstream: 10011

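The category and the fixed-length index bits can be computed as in the sketch below. The Huffman codeword for the category ID is left out because the transcript does not reproduce that table; in the example bitstream 10011, the trailing "11" is the 2-bit index and the leading "100" is presumably the category codeword. The handling of negative values follows the usual JPEG convention (low-order bits of value - 1), and the function names are hypothetical.

```python
def category(value):
    """Category k holds the values with 2^(k-1) <= |value| < 2^k (0 for value 0)."""
    return abs(value).bit_length()

def index_bits(value):
    """Fixed-length code inside the category: exactly `category(value)` bits."""
    cat = category(value)
    if cat == 0:
        return ""
    if value < 0:
        value += (1 << cat) - 1   # negative values map to the lower half
    return format(value, "0{}b".format(cat))

e = 8 - 5                          # the slide's example: DC = 8, last DC = 5
example = (category(e), index_bits(e))
```
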
16
Coding of AC Coefficients

Most non-zero coefficients are in the upper-left
corner
Zigzag scanning

Example

  8  24  -2   0   0   0   0   0
-31  -4   6  -1   0   0   0   0
  0 -12  -1   2   0   0   0   0
  0   0  -2  -1   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

Zigzag scanning result (DC is coded separately)
24 -31 0 -4 -2 0 6 -12 0 0 0 -1 -1 0 0 0 2 -2 0 0 0 0 0 -1 EOB (end-of-block)
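
The scan above can be reproduced with a short sketch that also forms the (run, level) pairs, where run is the number of zeros before each non-zero coefficient. It is a simplification: it always appends EOB and does not split long zero runs. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):                       # s = row + col of a diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                    # even diagonals run upward
        order.extend((r, s - r) for r in rows)
    return order

def zigzag_scan(block):
    return [block[r][c] for r, c in zigzag_order(len(block))]

def run_level_pairs(ac):
    """Pair each non-zero level with the count of zeros preceding it."""
    pairs, run = [], 0
    for level in ac:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")                              # trailing zeros collapse to EOB
    return pairs

block = [
    [8, 24, -2, 0, 0, 0, 0, 0],
    [-31, -4, 6, -1, 0, 0, 0, 0],
    [0, -12, -1, 2, 0, 0, 0, 0],
    [0, 0, -2, -1, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(4)]

scan = zigzag_scan(block)
ac = scan[1:]                                        # DC is coded separately
pairs = run_level_pairs(ac)
```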

17
A Complete Example

Original data

124 125 122 120 122 119 117 118
121 121 120 119 119 120 120 118
126 124 123 122 121 121 120 120
124 124 125 125 126 125 124 124
127 127 128 129 130 128 127 125
143 142 143 142 140 139 139 139
150 148 152 152 152 152 150 151
156 159 158 155 158 158 157 156

2-D DCT

  39.8   6.5  -2.2   1.2  -0.3  -1.0   0.7   1.1
-102.4   4.5   2.2   1.1   0.3  -0.6  -1.0  -0.4
  37.7   1.3   1.7   0.2  -1.5  -2.2  -0.1   0.2
  -5.6   2.2  -1.3  -0.8   1.4   0.2  -0.1   0.1
  -3.3  -0.7  -1.7   0.7  -0.6  -2.6  -1.3   0.7
   5.9  -0.1  -0.4  -0.7   1.9  -0.2   1.4   0.0
   3.9   5.5   2.3  -0.5  -0.1  -0.8  -0.5  -0.1
  -3.4   0.5  -1.0   0.8   0.9   0.0   0.3   0.0
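
The DCT values can be reproduced from the original data with a direct (unoptimized) 2-D DCT. The sketch assumes a level offset of 128 (8-bit samples) and the orthonormal DCT-II scaling; both are consistent with the DC value of about 39.8 shown on the slide, but neither is stated there. The function name is hypothetical.

```python
import math

ORIGINAL = [
    [124, 125, 122, 120, 122, 119, 117, 118],
    [121, 121, 120, 119, 119, 120, 120, 118],
    [126, 124, 123, 122, 121, 121, 120, 120],
    [124, 124, 125, 125, 126, 125, 124, 124],
    [127, 127, 128, 129, 130, 128, 127, 125],
    [143, 142, 143, 142, 140, 139, 139, 139],
    [150, 148, 152, 152, 152, 152, 150, 151],
    [156, 159, 158, 155, 158, 158, 157, 156],
]

def dct_2d(block, bits=8):
    """Level-shift by 2^(bits-1), then apply an orthonormal 2-D DCT-II."""
    n = len(block)
    offset = 1 << (bits - 1)
    shifted = [[v - offset for v in row] for row in block]

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):            # vertical frequency (row index of the result)
        for v in range(n):        # horizontal frequency (column index)
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (shifted[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out

coeffs = dct_2d(ORIGINAL)
```

The large coefficient in the first column reflects the strong top-to-bottom brightness ramp in the block.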

Quantized by basic table

Q table entries used: 16, 11, 12, 14
floor(39.8/16 + 0.5) = 2
floor(6.5/11 + 0.5) = 1
-floor(102.4/12 + 0.5) = -9
floor(37.7/14 + 0.5) = 3

2 1 0 0 0 0 0 0
-9 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Zigzag scanning
2 1 -9 3 EOB
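
The quantization step of this example can be replayed with a short sketch that applies the slide's rounding rule, sign(x) * floor(|x|/q + 0.5), to every coefficient. The coefficient values are the ones listed on the slide; the step sizes are the luminance example table from Annex K of the JPEG standard. The function name is hypothetical.

```python
import math

BASIC_LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

DCT_COEFFS = [
    [39.8, 6.5, -2.2, 1.2, -0.3, -1.0, 0.7, 1.1],
    [-102.4, 4.5, 2.2, 1.1, 0.3, -0.6, -1.0, -0.4],
    [37.7, 1.3, 1.7, 0.2, -1.5, -2.2, -0.1, 0.2],
    [-5.6, 2.2, -1.3, -0.8, 1.4, 0.2, -0.1, 0.1],
    [-3.3, -0.7, -1.7, 0.7, -0.6, -2.6, -1.3, 0.7],
    [5.9, -0.1, -0.4, -0.7, 1.9, -0.2, 1.4, 0.0],
    [3.9, 5.5, 2.3, -0.5, -0.1, -0.8, -0.5, -0.1],
    [-3.4, 0.5, -1.0, 0.8, 0.9, 0.0, 0.3, 0.0],
]

def quantize(coeffs, table):
    """Uniform mid-tread quantizer: round |x|/step to nearest, keep the sign."""
    def q(value, step):
        magnitude = int(math.floor(abs(value) / step + 0.5))
        return magnitude if value >= 0 else -magnitude
    return [[q(v, s) for v, s in zip(c_row, t_row)]
            for c_row, t_row in zip(coeffs, table)]

quantized = quantize(DCT_COEFFS, BASIC_LUMA_TABLE)
# First four zigzag positions: (0,0), (0,1), (1,0), (2,0)
leading = [quantized[0][0], quantized[0][1], quantized[1][0], quantized[2][0]]
```

Only four coefficients survive quantization, so the whole 8x8 block is described by four values and an EOB.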

18
A Complete Example

Zigzag scanning
2 1 -9 3 EOB

19
Progressive JPEG

Baseline JPEG encodes the image block by block
The decoder has to wait until the end to decode and display the entire image
Progressive coding: the DCT coefficients are sent in multiple scans
The first scan generates a low-quality version of the entire image
Subsequent scans gradually refine the entire image
Two procedures are defined in JPEG
Spectral selection
Divide all DCT coefficients into several bands (low, middle, and high frequency subbands)
Bands are coded in separate scans
Successive approximation
Send the most significant bits of all coefficients first
Send the less significant bits in subsequent scans
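
Both procedures can be sketched on a zigzag-ordered coefficient list. This is a simplified illustration rather than the scan syntax of the standard: the band edges are hypothetical, and the signs are sent up front in their own list to keep the bit-plane logic short.

```python
def spectral_selection(zigzag_coeffs, band_edges=(0, 1, 6, 64)):
    """Split zigzag-ordered coefficients into frequency bands, one per scan."""
    return [zigzag_coeffs[start:end]
            for start, end in zip(band_edges, band_edges[1:])]

def successive_approximation(coeffs, low_bits):
    """First scan carries the magnitudes without their `low_bits` lowest bits;
    each refinement scan then carries one more bit plane, high to low."""
    signs = [c < 0 for c in coeffs]
    first_scan = [abs(c) >> low_bits for c in coeffs]
    refinements = [[(abs(c) >> bit) & 1 for c in coeffs]
                   for bit in range(low_bits - 1, -1, -1)]
    return signs, first_scan, refinements

def reconstruct(signs, first_scan, refinements, low_bits):
    """Rebuild coefficients from the first scan plus the refinements received."""
    magnitudes = [m << low_bits for m in first_scan]
    for i, scan in enumerate(refinements):
        bit = low_bits - 1 - i
        magnitudes = [m | (b << bit) for m, b in zip(magnitudes, scan)]
    return [-m if negative else m for m, negative in zip(magnitudes, signs)]

coeffs = [2, 1, -9, 3] + [0] * 60
bands = spectral_selection(coeffs)
signs, first_scan, refinements = successive_approximation(coeffs, 2)
coarse = reconstruct(signs, first_scan, [], 2)              # first scan only
better = reconstruct(signs, first_scan, refinements[:1], 2) # one refinement
exact = reconstruct(signs, first_scan, refinements, 2)      # all scans
```

With only the first scan the decoder sees [0, 0, -8, 0, ...]; each refinement scan halves the remaining uncertainty until the coefficients are exact.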