MP3 and AAC - PowerPoint PPT Presentation


Slides: 22
Provided by: tracd
Tags: aac | mp3 | playstation


Transcript and Presenter's Notes

Title: MP3 and AAC


1
MP3 and AAC
  • Trac D. Tran
  • ECE Department
  • The Johns Hopkins University
  • Baltimore MD 21218

2
MP3
  • MP3: MPEG-1/MPEG-2 Audio Layer III audio coding
  • Transform: cascade of a 32-channel filter bank and a
    6-channel or 18-channel MDCT
  • Quantization: uniform scalar quantizer with a
    psycho-acoustic model
  • Entropy coding: run-length + Huffman

3
Transformation Stage in MP3
[Figure: x[n] enters a 32-channel, 512-tap CMFB
(filters H_0(z) to H_31(z), each followed by
downsampling by 32). Each subband output is then split
further by either a 6-channel, 12-tap MLT/MDCT or an
18-channel, 36-tap MLT/MDCT.]

4
Masking
  • Masking: discovered from psycho-acoustic
    experiments
  • The human auditory system is less sensitive to
    sounds close in frequency to a strong tonal signal

5
Masking: Original Signal
[Figure: original signal]

6
Masking Threshold
  • Signal components below the masking threshold are
    deemed insignificant (can be quantized to zero)
  • Spectral components are computed from overlapping
    1024-sample Hann (Hanning) windows
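
The windowing and thresholding steps above can be sketched as follows. This is only an illustration: the slides do not give the hop size (512 samples, i.e. 50% overlap, is assumed here), and the threshold values are hypothetical inputs that a real psycho-acoustic model would have to supply.

```python
import math

def hann(length):
    """Symmetric Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def split_frames(signal, size=1024, hop=512):
    """Cut the signal into overlapping frames (hop size is an assumption)."""
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, hop)]

def apply_masking(magnitudes, thresholds):
    """Zero every spectral component that lies below its masking threshold."""
    return [m if m >= t else 0.0 for m, t in zip(magnitudes, thresholds)]

# Hypothetical magnitudes and thresholds for three spectral components.
kept = apply_masking([5.0, 0.2, 3.0], [1.0, 1.0, 4.0])
```

Only the first component survives in this toy case; the other two fall below their thresholds and are quantized to zero.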

7
Advanced Audio Coding (AAC)
  • Successor of MP3
  • Better audio quality than MP3 at most bit rates
  • Perceptually lossless at 320 kbps for 5-channel
    surround sound (64 kbps/channel)
  • Almost CD quality at 96 kbps for stereo
    (48 kbps/channel)
  • AAC is part of the MPEG-4 standard
  • Default audio format of Apple's iPhone, iPod,
    iTunes; Sony PlayStation 3; Nintendo Wii
  • MDCT + Scalar Quantization + Huffman Coding

8
Transformation Stage in AAC
[Figure: x[n] feeds one of two MDCT filter banks: a
128-channel, 256-tap MDCT (H_0(z) to H_127(z), each
followed by downsampling by 128) for transient signals,
or a 1024-channel, 2048-tap MDCT (H_0(z) to H_1023(z),
each followed by downsampling by 1024) for steady-state
signals.]
  • AAC adaptively switches between:
  • 8 blocks of 128-point MDCT with 256-point windows
  • 1 block of 1024-point MDCT with 2048-point window
  • All windows have 50% overlap
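
To illustrate why an MDCT with 50% overlapping windows can still reconstruct the signal exactly, here is a minimal sketch of a direct (slow) MDCT/IMDCT pair with overlap-add. The sine window, the 2/N scaling, and the tiny block size N = 8 are assumptions chosen for the demonstration; AAC itself uses N = 1024 or N = 128, and its window shapes are not described in these slides.

```python
import math

def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Map 2N windowed samples to N coefficients (direct O(N^2) form)."""
    n = len(block) // 2
    return [sum(x * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i, x in enumerate(block))
            for k in range(n)]

def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples."""
    n = len(coeffs)
    return [(2.0 / n) * sum(c * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                            for k, c in enumerate(coeffs))
            for i in range(2 * n)]

def analyze_and_resynthesize(signal, n):
    """MDCT analysis followed by windowed overlap-add synthesis, hop size n.

    len(signal) must be a multiple of n; n zeros are padded on each side so
    that every real sample is covered by two overlapping blocks.
    """
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        block = [padded[start + i] * window[i] for i in range(2 * n)]
        rebuilt = imdct(mdct(block))
        for i in range(2 * n):
            output[start + i] += rebuilt[i] * window[i]
    return output[n:n + len(signal)]

signal = [math.sin(0.3 * i) for i in range(32)]
rebuilt = analyze_and_resynthesize(signal, n=8)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

Each block of 2N samples yields only N coefficients, so the transform is critically sampled; the aliasing introduced in one block is cancelled by its neighbours during overlap-add, which leaves `max_error` at floating-point rounding level.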

9
JPEG Still Image Coding Standard
  • Trac D. Tran
  • ECE Department
  • The Johns Hopkins University
  • Baltimore MD 21218

10
Overall Structure of JPEG
  • Color converter: RGB to YUV
  • Level offset: subtract 2^(N-1), where N is the
    number of bits per pixel
  • Quantization: different step sizes for different
    coefficients
  • DC: predict from the DC of the previous block
  • AC: zigzag scan to get 1-D data, then run-level
    joint coding of each non-zero coefficient and the
    number of zeros before it
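
The first two stages can be sketched as below. The slides only say "RGB to YUV"; the conversion weights used here are the JFIF-style YCbCr ones, which is an assumption about what the author intends, and the function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to Y, Cb, Cr (JFIF-style weights, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def level_offset(sample, bits=8):
    """Subtract 2^(N-1) so that N-bit samples are centered around zero."""
    return sample - 2 ** (bits - 1)

y, cb, cr = rgb_to_ycbcr(128, 128, 128)   # a mid-gray pixel
shifted = level_offset(124)               # 124 - 128 = -4 for 8-bit data
```

A gray pixel maps to equal Y, Cb and Cr values of about 128, so after the level offset all three components sit near zero.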

11
JPEG Quantization
  • Uniform mid-tread quantizer
  • Larger step sizes for chroma components
  • Different coefficients have different step sizes
  • Smaller steps for low frequency coefficients
    (more bits)
  • Larger steps for high frequency coefficients
    (fewer bits)
  • The human visual system is less sensitive to
    errors at high frequencies
  • Luma Quantization Table

16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

  • Chroma Quantization Table

17  18  24  47  99  99  99  99
18  21  26  66  99  99  99  99
24  26  56  99  99  99  99  99
47  66  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99

  • Actual step size: scale the basic table by a
    quality factor

12
Scaling of Quantization Table
  • Actual Q table = scaling × Basic Q table
  • quality factor <= 50: scaling = 50/quality
  • quality factor > 50: scaling = 2 - quality/50

Basic Q table (luma):

16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

Quality Factor   Scaling
--------------   -------
10               5.0
20               2.5
50               1.0
75               0.5
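
The scaling rule can be written as a short function. The two-branch formula comes from the slide; rounding the scaled steps to integers and clamping them to at least 1 are assumptions added here so that the table stays usable at very high quality settings.

```python
import math

def scaling_factor(quality):
    """Scaling applied to the basic table for a quality factor in 1..100."""
    if quality <= 50:
        return 50.0 / quality
    return 2.0 - quality / 50.0

def scale_table(basic_table, quality):
    """Scale every step size; rounding and the floor of 1 are assumptions."""
    s = scaling_factor(quality)
    return [[max(1, int(math.floor(q * s + 0.5))) for q in row]
            for row in basic_table]

factors = [scaling_factor(q) for q in (10, 20, 50, 75)]
first_row_q75 = scale_table([[16, 11, 10, 16, 24, 40, 51, 61]], 75)[0]
```

The values in `factors` reproduce the table above (5.0, 2.5, 1.0, 0.5): a lower quality factor enlarges every step size, and a higher one shrinks it.
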
13
DC Prediction
  • DC coefficient: (scaled) average of a block
  • DC coefficients of neighboring blocks are still
    similar to each other: redundancy
  • The redundancy can be removed by differential
    coding
  • e(n) = DC(n) - DC(n-1)
  • Only encode the prediction error e(n)
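
A minimal sketch of this differential coding, assuming the predictor starts at 0 for the first block (the slides do not state the initial value) and using made-up DC values:

```python
def dc_prediction_errors(dc_values, initial_prediction=0):
    """Return e(n) = DC(n) - DC(n-1) for a sequence of block DC values."""
    errors, previous = [], initial_prediction
    for dc in dc_values:
        errors.append(dc - previous)
        previous = dc
    return errors

def dc_from_errors(errors, initial_prediction=0):
    """Decoder side: accumulate the errors to recover the DC values."""
    values, previous = [], initial_prediction
    for e in errors:
        previous += e
        values.append(previous)
    return values

errors = dc_prediction_errors([5, 8, 8, 6])   # hypothetical DC values
```

Here `errors` is `[5, 3, 0, -2]`: after the first block the values to encode are small, which is what makes the entropy coder effective.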

[Figure: DC coeffs of Lena]

14
Coefficient Category
  • Divide coefficients into categories of
    exponentially increasing sizes
  • Use Huffman code to encode category ID
  • Use fixed length code within each category
  • Similar to Exponential Golomb code
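
The category and the fixed-length index within it can be computed as below. This sketch assumes the usual JPEG convention, in which category k holds the magnitudes 2^(k-1) to 2^k - 1 and negative values are indexed from the bottom of the category. The Huffman code for the category ID comes from a code table that is not reproduced in these slides, so it is left out.

```python
def category(value):
    """Category = number of bits needed for |value| (0 for value 0)."""
    return abs(value).bit_length()

def index_bits(value):
    """Fixed-length bits that select the value within its category."""
    cat = category(value)
    if cat == 0:
        return ""
    index = value if value > 0 else value + (1 << cat) - 1
    return format(index, "0{}b".format(cat))

cat_of_3 = category(3)       # 2
bits_of_3 = index_bits(3)    # "11", i.e. index 3
```

For example, category 2 contains -3, -2, 2 and 3, which map to the indices 0, 1, 2 and 3.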

15
Coding of DC Coefficients
  • Encode e(n) = DC(n) - DC(n-1)

Our example: DC = 8. Assume last DC = 5 => e =
8 - 5 = 3. Cat. 2, index 3 => Bitstream: 10011

16
Coding of AC Coefficients
  • Most non-zero coefficients are in the upper-left
    corner
  • Zigzag scanning
  • Example

  8  24  -2   0   0   0   0   0
-31  -4   6  -1   0   0   0   0
  0 -12  -1   2   0   0   0   0
  0   0  -2  -1   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

  • Zigzag scanning result (DC is coded separately)
  • 24 -31 0 -4 -2 0 6 -12 0 0 0 -1 -1 0 0 0 2 -2 0
    0 0 0 0 -1 EOB <end-of-block>
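
The scan above can be reproduced with a short routine. The sketch below walks the anti-diagonals of the block in alternating directions, skips the DC term, trims the trailing zeros that EOB stands for, and then forms (run, level) pairs. The function names are illustrative, and the special handling JPEG applies to runs longer than 15 zeros is not covered.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col of a diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                         # even diagonals run upward
            rows = reversed(rows)
        order.extend((i, s - i) for i in rows)
    return order

def zigzag_ac(block):
    """AC coefficients in zigzag order, trailing zeros removed (EOB)."""
    scanned = [block[i][j] for i, j in zigzag_order(len(block))][1:]
    while scanned and scanned[-1] == 0:
        scanned.pop()
    return scanned

def run_level_pairs(ac):
    """Pair each non-zero coefficient with the number of zeros before it."""
    pairs, run = [], 0
    for value in ac:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs

BLOCK = [
    [8, 24, -2, 0, 0, 0, 0, 0],
    [-31, -4, 6, -1, 0, 0, 0, 0],
    [0, -12, -1, 2, 0, 0, 0, 0],
    [0, 0, -2, -1, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(4)]

ac = zigzag_ac(BLOCK)
pairs = run_level_pairs(ac)
```

With the example block, `ac` matches the 24 values listed on the slide, and `pairs` begins (0, 24), (0, -31), (1, -4).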

17
A Complete Example
  • Original data

124 125 122 120 122 119 117 118
121 121 120 119 119 120 120 118
126 124 123 122 121 121 120 120
124 124 125 125 126 125 124 124
127 127 128 129 130 128 127 125
143 142 143 142 140 139 139 139
150 148 152 152 152 152 150 151
156 159 158 155 158 158 157 156

  • 2-D DCT

  39.8   6.5  -2.2   1.2  -0.3  -1.0   0.7   1.1
-102.4   4.5   2.2   1.1   0.3  -0.6  -1.0  -0.4
  37.7   1.3   1.7   0.2  -1.5  -2.2  -0.1   0.2
  -5.6   2.2  -1.3  -0.8   1.4   0.2  -0.1   0.1
  -3.3  -0.7  -1.7   0.7  -0.6  -2.6  -1.3   0.7
   5.9  -0.1  -0.4  -0.7   1.9  -0.2   1.4   0.0
   3.9   5.5   2.3  -0.5  -0.1  -0.8  -0.5  -0.1
  -3.4   0.5  -1.0   0.8   0.9   0.0   0.3   0.0

  • Quantized by basic table

Q table entries used: 16, 11, 12, 14
floor(39.8/16 + 0.5) = 2
floor(6.5/11 + 0.5) = 1
-floor(102.4/12 + 0.5) = -9
floor(37.7/14 + 0.5) = 3

2 1 0 0 0 0 0 0
-9 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
  • Zigzag scanning
  • 2 1 -9 3 EOB
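
The quantization step of this example can be checked with a few lines. The rule used is the one shown on the slide, rounding the magnitude with floor(|x|/q + 0.5) and restoring the sign afterwards; the reconstruction step (level times step size) is included only to show the size of the quantization error.

```python
import math

def quantize(coeff, step):
    """Uniform mid-tread quantizer: round |coeff|/step, then restore the sign."""
    sign = -1 if coeff < 0 else 1
    return sign * int(math.floor(abs(coeff) / step + 0.5))

def dequantize(level, step):
    """Decoder-side reconstruction of a coefficient from its level."""
    return level * step

# (DCT coefficient, step size) pairs taken from the example above.
pairs = [(39.8, 16), (6.5, 11), (-102.4, 12), (37.7, 14)]
levels = [quantize(c, q) for c, q in pairs]
rebuilt = [dequantize(lv, q) for lv, (_, q) in zip(levels, pairs)]
```

`levels` comes out as `[2, 1, -9, 3]`, matching the zigzag output, and `rebuilt` is `[32, 11, -108, 42]`, so each coefficient is off by at most half a step size.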

18
A Complete Example
  • Zigzag scanning
  • 2 1 -9 3 EOB

19
Progressive JPEG
  • Baseline JPEG encodes the image block by block
  • Decoder has to wait till the end to decode and
    display the entire image
  • Progressive: code the DCT coefficients in multiple
    scans
  • The first scan generates a low-quality version of
    the entire image
  • Subsequent scans refine the entire image
    gradually
  • Two procedures are defined in JPEG:
  • Spectral selection: divide all DCT coefficients
    into several bands (low, middle, high frequency
    subbands); bands are coded in separate scans
  • Successive approximation: send the MSBs of all
    coefficients first, then send the less significant
    bits in subsequent scans
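
Both procedures can be illustrated on a zigzag-ordered coefficient list. This is a simplified sketch, not the actual JPEG scan syntax: the band boundaries are hypothetical, and the signs are kept in a separate list for clarity rather than being signalled the way a real progressive JPEG stream does it.

```python
def split_bands(zigzag, bounds):
    """Spectral selection: cut a zigzag list into bands given (first, last)."""
    return [zigzag[first:last + 1] for first, last in bounds]

def bit_plane_scans(coeffs, low_bits):
    """Successive approximation: coarse magnitudes first, then one bit per scan."""
    signs = [-1 if c < 0 else 1 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    coarse = [m >> low_bits for m in mags]
    refinements = [[(m >> b) & 1 for m in mags]
                   for b in range(low_bits - 1, -1, -1)]
    return signs, coarse, refinements

def rebuild(signs, coarse, refinements):
    """Decoder side: append each refinement bit below the bits already known."""
    mags = list(coarse)
    for plane in refinements:
        mags = [(m << 1) | bit for m, bit in zip(mags, plane)]
    return [s * m for s, m in zip(signs, mags)]

coeffs = [24, -31, 0, -4, -2, 0, 6, -12]
bands = split_bands(coeffs, [(0, 1), (2, 4), (5, 7)])   # hypothetical bands
signs, coarse, refinements = bit_plane_scans(coeffs, low_bits=2)
restored = rebuild(signs, coarse, refinements)
```

After the first scan the decoder only knows the coarse magnitudes (6, 7, 0, 1, ...); each refinement scan adds one bit of precision until `restored` equals the original list.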

20
JPEG Coding Result for Lena
21
Summary
  • Transformation
  • Karhunen-Loeve Transform (KLT): optimal linear
    transform
  • Discrete Cosine Transform (DCT): for images and
    video
  • MDCT: overlapped, higher frequency resolution, for
    audio
  • Discrete Wavelet Transform (DWT):
    multi-resolution representation
  • MP3, AAC
  • Audio coding: FB/MDCT + Quantization + Huffman
  • JPEG: first international compression standard
    for still images
  • DCT + Quantization + Run-length + Huffman
  • JPEG2000: latest technology, wavelet-based
  • Scalable, progressive coding with flexible
    intelligent functionalities