Video Compression and MPEG - PowerPoint PPT Presentation

About This Presentation
Title:

Video Compression and MPEG

Description:

Video Compression and MPEG B. Acharya Video Basics Image Video Basics - Scanning Scanning is a process of sampling of a continuously varying 2D signals. – PowerPoint PPT presentation

Slides: 95
Provided by: B206

Transcript and Presenter's Notes

Title: Video Compression and MPEG


1
Video Compression and MPEG
  • B. Acharya

2
Video Basics
3
Image
Video Cable
Video Monitor
4
Video Basics - Scanning
  • Scanning is the process of sampling a
    continuously varying 2-D signal.
  • Raster scanning converts a 2-D image intensity
    into a 1-D waveform.

5
Video Basics - The Scanning Raster
625 lines (PAL- Europe)
525 lines (NTSC)
Horizontal Blanking
Vertical Blanking
Active Video
6
Video Basics - The Progressive Raster
Scan lines viewed edge-on
y
Active Video
Note: All scan lines are sampled at each time
instant.
Vertical Blanking
time
x
7
Video Basics - The Interlaced Raster
8
Video Basics - Interlaced Raster Scan
  • Interlaced raster scan (IRS) scans the picture by
    sampling two fields at different times, such that
    consecutive lines of a frame belong to alternate
    fields.
  • This allows slow-moving objects to be perceived
    with higher vertical detail and fast-moving
    objects at higher temporal rates.
  • It is used extensively in TV because of
    bandwidth, flicker and resolution considerations.
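The relationship between a frame and its two fields can be sketched in a few lines of Python. This is only a toy illustration: it splits a single stored frame into even and odd lines, whereas a real interlaced camera samples the two fields at different times. The function names `split_fields` and `weave` are made up for this sketch.

```python
import numpy as np

def split_fields(frame):
    """Return (top_field, bottom_field): the even-numbered and odd-numbered lines."""
    return frame[0::2], frame[1::2]

def weave(top, bottom):
    """Interleave two fields back into one frame (top field on even lines)."""
    lines = top.shape[0] + bottom.shape[0]
    frame = np.empty((lines,) + top.shape[1:], dtype=top.dtype)
    frame[0::2] = top
    frame[1::2] = bottom
    return frame

frame = np.arange(6 * 4).reshape(6, 4)   # toy picture with 6 scan lines
top, bottom = split_fields(frame)
```

Each field carries half the lines of the frame, which is where the bandwidth saving comes from.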

9
Common Rasters for Video Coding
10
Interlacing
  • Background
  • In the 1930s, interlaced scanning was developed as
    a bandwidth-saving technique.
  • Persistence of vision causes the two fields to fuse
    into a single image, without flicker.
  • Conventional TV broadcasting uses interlaced scanning.
  • Advantages
  • High vertical detail retained for still portions
    of the scene.
  • Drawbacks
  • Reduced vertical detail for moving areas
  • Flicker at edges of objects (e.g., text), which
    is why computer industry uses progressive
    scanning for monitors.
  • More complicated signal processing for resizing,
    frame rate conversion, etc.

11
Human Vision Basics
  • Human Visual System (HVS) has limitations that
    can be exploited for video system design
  • limited response to black-and-white detail
  • even more limited response to color detail
  • image motion appears fluid at rates above 24 Hz
  • limited ability to track rapidly moving objects
  • insensitivity to noise
  • at object edges
  • in highly detailed areas of a scene
  • in bright areas of a scene
  • immediately after scene changes

12
Colorimetry Basics
  • In broadcast and studio applications, the
    gamma-corrected RGB taking primaries are
    transformed to YC1C2 transmission primaries.
  • Y is the luminance (luma) component; C1 and C2
    are the chrominance (chroma, or color difference)
    components.
  • To exploit the HVS's reduced spatial response to
    chroma, C1 and C2 are further bandlimited in
    spatial frequency compared to Y.
  • The exact transformation matrix is
    system-dependent.

13
Colorimetry Basics
  • In 8-bit implementations,
  • Y occupies 220 levels: [16, 235]
  • Cr and Cb occupy 225 levels: [16, 240]
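As a concrete case of these ranges, the sketch below maps gamma-corrected RGB to 8-bit YCbCr. Since the exact matrix is system-dependent, the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are assumed here; the function name `rgb_to_ycbcr_601` is hypothetical.

```python
def rgb_to_ycbcr_601(r, g, b):
    """Map gamma-corrected R'G'B' values in [0, 1] to 8-bit studio-range Y'CbCr.

    Assumes BT.601 luma weights; other systems use different matrices.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma in [0, 1]
    cb = (b - y) / 1.772                    # scaled B' - Y', in [-0.5, 0.5]
    cr = (r - y) / 1.402                    # scaled R' - Y', in [-0.5, 0.5]
    # Y uses 220 levels (16..235); Cb and Cr use 225 levels (16..240) around 128
    return round(16 + 219 * y), round(128 + 224 * cb), round(128 + 224 * cr)

white = rgb_to_ycbcr_601(1.0, 1.0, 1.0)
black = rgb_to_ycbcr_601(0.0, 0.0, 0.0)
red = rgb_to_ycbcr_601(1.0, 0.0, 0.0)
blue = rgb_to_ycbcr_601(0.0, 0.0, 1.0)
```

Black and white land on the ends of the Y range, while saturated red and blue reach the top of the Cr and Cb ranges respectively.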

14
Compression
  • Data = Information + Redundancy
  • Data: "I need a glass of water, which is
    scientifically called H2O"
  • Information: "I need a glass of water"
  • Compression = Reduce Redundancy

15
Redundancy
  • Spatial
  • Similarity in pattern due to position
  • Temporal
  • Similarity in pattern over time
  • Statistical
  • Similarity due to pattern of occurrence

16
Image Compression Standards
  • Binary (Bi-level, BW) images
  • ITU-T Group 3, Group 4 (Fax) (1980), JBIG (1994), JBIG2
  • Continuous Tone Still Images
  • Both Gray and Colour Image
  • JPEG (1992)
  • JPEG 2000
  • Moving Pictures
  • MPEG-1 (1994), MPEG-2 (1995)
  • MPEG-4 (1996-2003), MPEG-7, MPEG-21
  • H.261 (1990), H.263 (1995), H.264 (ongoing)

17
Image Compression -- Needs
  • Image (Signal) Processing
  • Decorrelation, Transformation
  • Reduce redundancy, compact representation
  • Quantization (Psychovisual analysis)
  • Mask redundant data, loss of information
  • Reduce entropy
  • Entropy Encoding (Information Theory)
  • Encode data losslessly
  • Compact representation for compression
  • Variable-length (Run-length, Huffman, Arithmetic,
    etc.)

18
Entropy
  • E = average amount of information contained per
    source symbol
  • $E = -\sum_k p(a_k)\,\log_2 p(a_k)$
  • Limit of compression
  • Example
  • Pre-processing can improve compression

19
Example (entropy)
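  • Data: 1 2 0 1 1 2 3 1 2 3 1 1 1 2 2 2
  • Symbols: 0, 1, 2, 3
  • Probabilities: 0.0625, 0.4375, 0.375, 0.125
  • $E = -\sum_i p(a_i)\,\log_{10} p(a_i)$
  • $= -\big((-1.204)(0.0625) + (-0.359)(0.4375) + (-0.426)(0.375) + (-0.903)(0.125)\big)$
  • $= 0.505$ (base-10 logarithm; about 1.68 bits per
    symbol with $\log_2$)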
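The arithmetic of this example can be checked with a short Python sketch. The function name `entropy` and its `log` parameter are choices made for this illustration; the data is the sequence from the slide.

```python
from collections import Counter
from math import log2, log10

def entropy(data, log=log2):
    """Average information per symbol: -sum(p * log(p)) over the observed symbols."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * log(c / n) for c in counts.values())

data = [1, 2, 0, 1, 1, 2, 3, 1, 2, 3, 1, 1, 1, 2, 2, 2]
e_base10 = entropy(data, log10)   # the base used in the slide's arithmetic
e_bits = entropy(data)            # the same quantity in bits per symbol
```

With base-10 logarithms the result is about 0.505; with base-2 logarithms it is about 1.68 bits per symbol, so 16 such symbols need roughly 27 bits at best.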

20
Pre-processing (Entropy)
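  • Pre-processing: $a_k \to a_k - a_{k-1}$,
  • where $k \ge 1$, $a_0 = 0$
  • Data: 1 1 -2 1 0 1 1 -2 1 1 -2 0 0 1 0 0
  • Symbols: 0, 1, -2
  • Probabilities: 5/16, 8/16, 3/16 = 0.3125, 0.5, 0.1875
  • E = 0.445 (base-10 logarithm; about 1.48 bits per symbol)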
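A minimal sketch of this pre-processing step, assuming the predictor starts at 0 as stated on the slide. The helper names `delta` and `entropy10` are hypothetical.

```python
from collections import Counter
from math import log10

def delta(data):
    """Replace each sample by its difference from the previous one (first predictor is 0)."""
    prev, out = 0, []
    for a in data:
        out.append(a - prev)
        prev = a
    return out

def entropy10(data):
    """Entropy with base-10 logarithms, matching the slide's arithmetic."""
    n = len(data)
    return -sum((c / n) * log10(c / n) for c in Counter(data).values())

data = [1, 2, 0, 1, 1, 2, 3, 1, 2, 3, 1, 1, 1, 2, 2, 2]
diffs = delta(data)
```

The differenced data uses only three distinct symbols, and its entropy drops from about 0.505 to about 0.445, which is the sense in which pre-processing improves compression.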

21
What do we want in video?
  • Real time (Live viewing)
  • Low delay (No jitter)
  • Good quality (Minimal loss of information)
  • Easy and useful interactivity
  • Play, pause, random access, fast forward
  • Something more?
  • Content based retrieval, Editable, Movie quality
    (high motion, spatial scalability)

22
Target area of DVT
  • Broadcasting
  • High bandwidth
  • Better quality
  • No delay
  • Internet (IP Network)
  • Low Bandwidth
  • Restricted quality
  • Delay
  • Jitter
  • Loss of data
  • Quality degradation
  • Wireless
  • Low bandwidth
  • Small resolution
  • Future Technology
  • Interactive
  • Broadcasting
  • Advertisement
  • Games
  • Multimedia

23
Solutions
  • Decrease size of source
  • Compression
  • Retain quality
  • Eat the cake and have it too
  • Better Delivery
  • Handle delay
  • Conceal error
  • Post-processing

24
Video Compression
25
What is Video Compression?...Orange Juice
Analogy...
26
So? What to do?
  • Exploit limitations in Human Visual System
  • Limited color sensitivity (downscale CB and CR)
  • Limited sensitivity to edges (reduce high
    frequency)
  • Can attain 50:1 or more compression efficiency
  • Remove spatial and temporal redundancy that exist
    in natural video imagery
  • correlation itself can be removed in a lossless
    fashion
  • but this only realizes about 2:1 compression
    efficiency

27
Step 1: Pre-processing
  • Pre-processing
  • Color conversion
  • RGB → YCbCr
  • Downsizing color components
  • 4:2:0, 4:2:2
  • → Reduction in source size
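The size reduction from chroma downsizing can be sketched as follows. Averaging each 2x2 block is an assumption made for simplicity; real systems use proper filters and defined chroma sample positions. Both function names are hypothetical.

```python
import numpy as np

def subsample_420(chroma):
    """Average each 2x2 block of a chroma plane (height and width must be even)."""
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def samples_per_frame(h, w, fmt):
    """Total samples for one Y plane plus two chroma planes."""
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[fmt]
    return int(h * w * (1 + 2 * chroma_fraction))

small = subsample_420(np.array([[1.0, 3.0], [5.0, 7.0]]))
full = samples_per_frame(480, 720, "4:4:4")
half = samples_per_frame(480, 720, "4:2:0")
```

For a 720x480 picture, 4:2:0 carries half as many samples as 4:4:4 before any further compression is applied.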

28
Chroma Formats and Picture Sizes
29
Macroblock Structures
30
Step 2: Transformation
  • Transformation
  • Want to discard high frequency components
  • Little visual quality loss
  • Spatial domain to frequency domain
  • Discrete Fourier Transform, Discrete Cosine
    Transform

31
DFT
  • Any periodic function F(t), with period T, may be
    represented by an infinite series of the form.
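For reference, the standard trigonometric Fourier series of a function with period T is written below; this is the textbook form and is assumed to be what the slide refers to.

```latex
F(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}
       \left[ a_n \cos\!\left(\frac{2\pi n t}{T}\right)
            + b_n \sin\!\left(\frac{2\pi n t}{T}\right) \right]
```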

32
Cosine Transform
  • Original image: M x N
  • A(i,j) = intensity at location (i,j)
  • B(k1, k2) = DCT coefficients

33
DCT and IDCT Formulas
34
DCT
  • DCT is an orthogonal transformation
  • 2-D DCT is separable in x and y dimensions
  • Has good energy compaction properties
  • Efficient hardware realization
  • Theoretically lossless, but slightly lossy in
    practice due to round off errors
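These properties can be tried out with a small NumPy sketch of the orthonormal DCT-II. The function names are made up for this illustration, and square blocks are assumed; practical codecs use fast fixed-point implementations instead of matrix products.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)      # the DC row has a different scale factor
    return c

def dct2(block):
    """2-D DCT of a square block, computed separably."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T          # 1-D DCT down the columns, then along the rows

def idct2(coeffs):
    """Inverse 2-D DCT; the matrix is orthogonal, so its inverse is its transpose."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c

flat = np.full((8, 8), 100.0)       # a perfectly flat 8x8 block
coeffs = dct2(flat)
```

For the flat block all the energy ends up in the single DC coefficient, and `idct2(dct2(x))` returns `x` up to floating-point round-off, which matches the "theoretically lossless" remark.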

35
DCT (contd)
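[Figure: 8x8 forward DCT. An 8x8 block of pixels maps to an 8x8 block of
DCT coefficients; after the DCT, the DC term sits in one corner and the
horizontal and vertical frequencies run from low to high.]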
36
DCT Example
[Figure: "Flower Garden" frame with sample 8x8 pixel blocks and their DCT
coefficients (DC marked): flat area, vertical edge, horizontal edge,
diagonal line, single pixel.]
37
2-D DCT Basis Images
38
Advantages of DCT
  • Separates the image into parts
  • Spectral sub-bands of differing importance (with
    respect to the image's visual quality).
  • All DCT multiplications are real
  • lowers the number of required multiplications
    compared to DFT
  • For most images, much of the signal energy lies
    at low frequencies

39
Step 3: Quantization
STEPS
  • Divide each DCT coefficient by a number
  • The divisor is a frequency-dependent value
  • Round or truncate the result to an integer
  • Inverse quantization is a multiplication by the
    same divisor
  • Quantization coefficients can be tailored to the
    noise sensitivity of the Human Visual System
  • Quantization is LOSSY!
  • Quantization causes information to be
    irretrievably lost
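The divide-and-round step and its inverse can be sketched as below. The step-size matrix `qmatrix` is invented for this example (steps simply grow with frequency) and is not a table from any standard.

```python
import numpy as np

def quantize(coeffs, qmatrix):
    """Divide each coefficient by its step size and round to the nearest integer."""
    return np.rint(coeffs / qmatrix).astype(int)

def dequantize(levels, qmatrix):
    """Inverse quantization: multiply the integer levels by the step sizes."""
    return levels * qmatrix

# Hypothetical step sizes: 8 at DC, growing by 4 per frequency index
u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
qmatrix = 8 + 4 * (u + v)

coeffs = np.zeros((8, 8))
coeffs[0, 0] = 283.0     # large DC term
coeffs[7, 7] = 20.0      # small high-frequency term
levels = quantize(coeffs, qmatrix)
recon = dequantize(levels, qmatrix)
```

The DC term comes back as 280 instead of 283 and the high-frequency term is lost entirely, which is the irretrievable loss the slide refers to.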

40
Quantization - Example
41
Quantization Effect
42
Quantization Artifacts
43
Artifacts - Example



44
Step 4: Spatial Prediction
  • Neighbouring pixels are similar
  • DCT coefficients of neighbouring blocks are
    correlated
  • Consider the Left (L), Top (T) and Left-Top (L-T)
    neighbours

  • Differential coefficients are smaller
  • Fewer bits are required to encode them
  • Encode the difference coefficients

Similar neighbors
45
Difference Image
  • Pixel-wise difference

46
Step 5: Scanning Order
  • It is a rearrangement of the coefficients
  • Most of the coefficients become zero after
    quantization
  • Zigzag scan order

Quantized 8x8 block (DC = 35):

  35   1   2   1   0   0   0   0
   3   2  -1   0   0   0   0   0
   1   0  -1   0   0   0   0   0
   0   1   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0

Zigzag scan: 35, 1, 3, 1, 2, 2, 1, -1, 0, 0, 0, 1, -1, 0, 0,
0, ..., 0
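The zigzag order can be generated by walking the anti-diagonals of the block and alternating direction. The sketch below applies it to the 8x8 block of this example; the function name `zigzag_indices` is hypothetical.

```python
def zigzag_indices(n=8):
    """Return (row, col) pairs in zigzag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):                  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diag = [(r, s - r) for r in rows]       # listed from top-right to bottom-left
        # even diagonals are traversed bottom-left to top-right, odd ones the other way
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

block = [
    [35, 1, 2, 1, 0, 0, 0, 0],
    [3, 2, -1, 0, 0, 0, 0, 0],
    [1, 0, -1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(4)]

scanned = [block[r][c] for r, c in zigzag_indices()]
```

The scan starts 35, 1, 3, 1, 2, 2, 1, -1, 0, 0, 0, 1, -1 and is followed only by zeros, so the non-zero values are gathered at the front.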
47
DC Coefficients
  • DC is average luminance/chrominance
  • Largest of 64 block coefficients
  • Kept as high as possible
  • DC moves slowly between blocks
  • → Differential encoding
  • Example DC values: 12, 13, 11, 11, 10, ...
  • Differences: 12, 1, -2, 0, -1, ...

48
Differential Encoding
  • Values are not sent as they are (as raw bits)
  • Coded as a (length, value) pair
  • Length = number of bits used
  • Value = actual bits used to represent the value
  • Negative values are sent as the one's complement
    of their magnitude
  • Example

Value  Length  Code
12     4       1100
1      1       1
-2     2       01
0      0       (none)
-1     1       0
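A sketch of DC differencing followed by the (length, value) mapping. It assumes the JPEG-style convention in which a negative value is sent as the one's complement of its magnitude, so that -1 maps to 0 and -2 maps to 01; the first predictor is taken as 0, consistent with the example differences. Both function names are hypothetical.

```python
def dc_differences(dc_values):
    """Difference of each DC value from the previous block's DC (first predictor is 0)."""
    prev, diffs = 0, []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def length_and_bits(value):
    """Return (length, bit string) for a difference value.

    Assumed convention: positive values are sent in plain binary, negative
    values as the one's complement of their magnitude.
    """
    length = abs(value).bit_length()
    if length == 0:
        return 0, ""                       # zero needs no value bits
    if value < 0:
        value += (1 << length) - 1         # one's complement within `length` bits
    return length, format(value, "0{}b".format(length))

diffs = dc_differences([12, 13, 11, 11, 10])
pairs = [length_and_bits(d) for d in diffs]
```

The decoder reads the length first, then that many bits; a leading 0 bit tells it the value is negative.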
49
AC Coefficients
  • Smaller values
  • Compared to DC values
  • Contain zeros, even after zigzag scanning
  • 35, 1, 3, 1, 2, 2, 1, -1, 0, 0, 0, 1, -1, 0, 0,
    0, ..., 0
  • Skip the Zeros
  • Run Length Encoding

50
Run Length Encoding
  • A sequence (run of zeros) is encoded as pairs of
    (run, value)
  • Run = number of zeros in the run
  • Value = next non-zero value

Example
Sequence: 35, 1, 3, 1, 2, 2, 1, -1, 0, 0, 0, 1, -1, 0, ..., 0
RLE of the AC coefficients: (0,1), (0,3), (0,1), (0,2), (0,2), (0,1),
(0,-1), (3,1), (0,-1), (0,0)
(0,0) indicates end of block data
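The (run, value) pairing can be sketched as follows, using the AC coefficients of this example (the DC value 35 is coded separately). This is a simplified illustration: it always emits the end-of-block pair and does not handle the limits on run length that real coders impose. The function name is hypothetical.

```python
def run_length_encode(ac):
    """Encode AC coefficients as (run of zeros, next non-zero value) pairs."""
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1                 # extend the current run of zeros
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))             # end of block; trailing zeros are implied
    return pairs

zigzag = [35, 1, 3, 1, 2, 2, 1, -1, 0, 0, 0, 1, -1] + [0] * 51
rle = run_length_encode(zigzag[1:])  # skip the DC coefficient
```

The 63 AC coefficients collapse into ten pairs, the last of which is the end-of-block marker.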
51
Further Encoding
Oops! Which way?
  • Replace long binary strings by shorter strings
    (code words)
  • Length of code word depends on frequency of
    occurrence
  • Frequently occurring symbols get shorter codes
  • Huffman Coding
  • Provides tables of sequence and codeword
  • Has prefix property

52
Huffman Coding
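  • Build a binary tree, starting from the least
    frequent symbols
  • Assign 1 to the right edge and 0 to the left edge

Sequence: AAAABBCD

Character  Frequency    Code
A          4/8 = 0.5    1
B          2/8 = 0.25   01
C          1/8 = 0.125  001
D          1/8 = 0.125  000

[Figure: Huffman tree. The root (1.0) splits into A (0.5, edge 1) and an
internal node (0.5, edge 0); that node splits into B (0.25, edge 1) and an
internal node (0.25, edge 0), which splits into C (0.125, edge 1) and
D (0.125, edge 0).]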
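The tree construction can be sketched with a priority queue. This is a generic Huffman builder, not the exact procedure of any standard; the 0/1 labelling of edges is arbitrary, so the bit patterns may be the complement of those in the table while the code lengths stay the same. The function name is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Return {symbol: bit string} for the symbols in `text`."""
    tie = count()                              # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in Counter(text).items()]
    heapq.heapify(heap)
    if len(heap) == 1:                         # degenerate source with a single symbol
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)      # the two least frequent subtrees
        f1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]

codes = huffman_codes("AAAABBCD")
total_bits = sum(len(codes[ch]) for ch in "AAAABBCD")
```

The eight symbols take 14 bits in total, compared with 16 bits for a fixed 2-bit code, and no code word is a prefix of another.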
53
Step 6: Encoding
  • The Length fields of the differentially encoded DC
    coefficients are Huffman coded
  • The prefix property helps the decoder determine
    each code unambiguously
  • The Length and Run fields of the AC coefficients
    are grouped together and Huffman coded
  • These codes also have the prefix property

54
Let's Recall
  • Sub-sample chrominance components

These steps give Intra-coded (I) frames
  • DCT of each 8 x 8 block
  • Quantize DCT coefficients
  • Scan each block in particular order
  • Code coefficients using Variable Length Coding

DCT → Q → Scan → VLC
55
Temporal Prediction
  • Similarity between consecutive frames
  • Most of the regions do not change
  • Small region changes due to motion
  • Use information of previous frame to predict
    present frame

56
Gray-Scale Statistics of Prediction Error
[Figure: one frame of the original image pair and the prediction error,
each shown with its histogram.]
57
How Does Motion Compensated Prediction Save Bits?
[Figure: the current macroblock X in the current picture is predicted from
block F in the previous picture; the motion vector MV_F links X to F.]
  • Good prediction means small prediction error
  • Needs fewer bits to code
  • Send DCT coefficients of the (X - F) block
  • Motion vectors are differentially coded
  • Difference with motion vectors of neighbouring
    blocks

50 - 80% savings in bits
58
Prediction Direction (Forward)
Current
Previous
Forward
59
Prediction Direction (Backward)
Not a good match
Next
Current
Previous
60
Predictive Frames
  • Depends on direction that gives better prediction
  • P-frame (predictive)
  • B-frame (bi-directional predictive)

61
Motion Estimation motion vector
62
ME - MAD
  • MAD = Mean Absolute Distortion
  • A search area is chosen in which the MADs are
    computed
  • The candidate with the minimum MAD in the search
    area is chosen; it is the closest-matching
    macroblock.
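A minimal full-search sketch of this idea: every candidate position in a square search area is tried and the one with the smallest MAD wins. The 16x16 block size and the search radius of 7 are assumptions for the example, as are the function names; real encoders usually use faster search strategies.

```python
import numpy as np

def mad(a, b):
    """Mean absolute difference between two equally sized blocks."""
    return float(np.mean(np.abs(a.astype(float) - b.astype(float))))

def full_search(cur, ref, top, left, size=16, radius=7):
    """Return ((dy, dx), mad) of the best match in `ref` for the block of `cur` at (top, left)."""
    block = cur[top:top + size, left:left + size]
    best = (None, float("inf"))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue                        # candidate lies outside the reference picture
            cost = mad(block, ref[y:y + size, x:x + size])
            if cost < best[1]:
                best = ((dy, dx), cost)
    return best

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64))            # toy "previous picture"
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))       # content moved 2 down, 3 left
mv, cost = full_search(cur, ref, top=24, left=24)
```

The search recovers the displacement exactly here because the toy picture is a pure translation; with real video the minimum MAD is rarely zero and the residual still has to be coded.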

63
Forward Motion Estimation... used in P and B
frames ...
64
Example: Forward Motion Estimation
Case: Good prediction for still objects.
Inter-coded means predictive-coded or not-coded
65
Example: Forward Motion Estimation
Case: Dealing with featureless regions.
Macroblock Grid
Search Area
Previous I or P Picture. Within the search area,
many good matches are found. Encoder must pick
one and send appropriate motion vector.
Current P Picture. Current MB is shown with heavy
outline. Since a match is found, this MB is
intercoded.
66
Example of Forward Motion Estimation
Case: Good prediction for linearly translating objects.
Macroblock Grid
Search Area
Current P Picture. Current MB is shown with heavy
outline. Since a match is found, this MB is
intercoded.
Previous I or P Picture. Within the search area,
a good match is found for this moving object.
Encoder sends appropriate forward motion vector.
67
Example of Forward Motion Estimation
Case: A good prediction may be missed because it is outside
the search area.
Macroblock Grid
Search Area
Current P Picture. Current MB is shown with heavy
outline. Since no match is found, this MB is
intracoded.
Previous I or P Picture. Within the search area,
no good match is found. Note that a good match
would be found with a larger search area. Search
area is an important encoder design parameter.
68
Example of Forward Motion Estimation
Case: A good prediction may come from an unrelated object.
Macroblock Grid
Search Area
Current P Picture. Current MB is shown with heavy
outline. Since a match is found, this MB is
intercoded.
Previous I or P Picture. Within the search area,
a good match is found, but within a different
object. There is no requirement that
motion vectors represent true motion of objects.
69
Example of Forward Motion Estimation
Case: Prediction Error should have low energy.
Macroblock Grid
Prediction Error Picture, with MB Type and Motion
Vectors Superimposed. (I = Intra, P = Inter)
Previous I or P Picture
Current P Picture
70
Group of Pictures (GOP)
  • Intra (I) pictures → intraframe-only spatial DCT
  • Predicted (P) pictures → DCT with forward
    prediction
  • Bi-directional (B) pictures → DCT with
    bi-directional prediction

71
Anchor Pictures
  • I and P pictures
  • stored in two frame buffers in encoder and
    decoder
  • form the basis for prediction of P and B pictures

72
I Pictures
  • DCT coded without reference to any other pictures
  • stored in a frame buffer in encoder and decoder
  • used as basis of prediction for entire GOP

73
P Pictures
Forward Prediction
  • DCT coded with reference to the preceding anchor
    picture
  • stored in a frame buffer in encoder and decoder
  • use forward prediction only

74
B Pictures
  • DCT coded with reference to either the preceding
    anchor picture, the following anchor picture, or
    both
  • use forward, backward or bi-directional prediction

75
Forward Prediction
  • a forward-predicted macroblock depends on decoded
    pixels from the immediately preceding anchor
    picture
  • can be used to code macroblocks in P and B
    pictures

76
Backward Prediction
Time
  • a backward-predicted macroblock depends on
    decoded pixels from the immediately following
    anchor picture
  • can only be used to code macroblocks in B pictures

77
Bi-directional (Interpolated) Prediction
  • a bi-directionally-predicted macroblock depends
    on decoded pixels from the anchor pictures
    immediately following and immediately preceding
  • can only be used to code macroblocks in B pictures

78
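Review: Encoding Steps
[Block diagram: the Predicted Image is subtracted from the Original Image to
give the Residual Image, which passes through DCT → Q → Scan → VLC to
produce the Encoded Image. The quantized data also passes through Q^-1 and
DCT^-1 and is added to the prediction to form the Reconstructed Image,
which feeds Motion Estimation and Motion Compensation. Motion Estimation
outputs the Motion Vectors.]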
79
Remember
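  • Motion compensation uses the decoded picture as
    the reference image

Why? The decoder only has decoded (lossy) pictures, so the encoder must
predict from the same data; otherwise the encoder's and decoder's
predictions drift apart.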
80
A Typical Motion Estimation Architecture
81
Few More Terms
  • Group of Pictures (GOP)
  • Slice
  • Field Coding
  • Skipped Macroblocks
  • Rate Control

82
Picture Orderings
Group of Pictures
  • Two Distinct Picture Orderings
  • Display Order (input to encoder, output of
    decoder)
  • Coding Order (output of encoder, input to
    decoder)
  • These are different if B frames are present
  • B frames must be reordered so that future
    anchor pictures are available for prediction.
    Note that reordering causes DELAY!
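The reordering can be sketched as a small function that holds back B pictures until the next anchor (I or P) has been emitted. The function name and the example picture pattern are assumptions made for this illustration.

```python
def coding_order(display):
    """Reorder picture types from display order to coding order.

    `display` is a sequence such as "IBBPBBP"; the result is a list of
    (type, display_index) pairs in the order the encoder would send them.
    """
    out, pending_b = [], []
    for idx, ptype in enumerate(display):
        if ptype == "B":
            pending_b.append((ptype, idx))   # must wait for the following anchor
        else:
            out.append((ptype, idx))         # the anchor is sent first...
            out.extend(pending_b)            # ...then the B pictures that depend on it
            pending_b = []
    out.extend(pending_b)                    # trailing B pictures with no later anchor here
    return out

order = coding_order("IBBPBBP")
types = "".join(t for t, _ in order)
indices = [i for _, i in order]
```

For the display order I B B P B B P the coding order becomes I P B B P B B, so each P picture is sent two picture periods before it is displayed, which is the source of the delay noted above.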

83
Slice Structures
  • A slice is a collection of macroblocks in raster
    scan order.
  • Restriction on slice sizes
  • MPEG-1 has none. Can be single MB or entire
    picture.
  • MPEG-2 restricts a slice to be contained within a
    row of macroblocks
  • MPEG-2 allows gaps between slices in General
    Slice Structure
  • MPEG-2 defines Restricted Slice Structure, in
    which no gaps are allowed. This is used in most
    Profiles and Levels.

84
MPEG-2 Field/Frame DCT Coding
  • Frame DCT: Normal MPEG-1 mode of coding
  • Field DCT: Split into top and bottom fields
  • MPEG-2 encoder may choose Field DCT on any
    macroblock.
  • Decoder must interpret coding flag correctly,
    or severe errors will occur.

85
Skipped Macroblocks
  • MBs cannot be skipped in I Pictures
  • MBs can be skipped in P and B pictures if
    certain rules apply

86
Rate Control
  • There may be delay between encoding and decoding
  • There should not be delay during displaying
  • Solutions
  • Introduce a buffer
  • Rate control

87
Rate Control
  • A buffer is used to smooth out the bit rate
  • Rate controller adjusts quantizer
  • Overflow and underflow of the decoder's buffer
    must be avoided (Video Buffer Verifier)
  • Buffer size affects image quality and overall
    delay
  • Rate control algorithm is crucial for high
    quality compression

88
MPEG Encoder Block
[Block diagram. Blocks shown: Video In, subtractor, DCT, Q, RLC, VLC, MUX,
Buffer, Video Out, Rate Control, Q^-1, DCT^-1, SUM, Prediction, Prediction
Picture, Motion Compensator, Motion Estimator, Motion Vectors.]
89
MPEG-2 Video Decoding Process
NOTE: This is a simplified, high-level
functional diagram that integrates several
separate diagrams in the MPEG-2 Video Spec
(ISO/IEC 13818-2).
90
Video Buffer Verifier (VBV)
  • The VBV is a hypothetical input rate buffer for
    the video decoder
  • connected to the output of an encoder.
  • The encoder keeps track of the VBV fullness
  • must ensure that it does not overflow or
    underflow.
  • Assuming constant end-to-end delay, the encoder
    buffer is the mirror image of the VBV.
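The bookkeeping an encoder does on the VBV can be sketched as a simple constant-bit-rate model: bits arrive at a fixed rate, and once per picture period the decoder removes one whole picture instantly. All numbers below (bit rate, frame rate, buffer size, initial fullness) are illustrative assumptions, and the handling of an underflow is deliberately simplified.

```python
def simulate_vbv(picture_bits, bit_rate, frame_rate, buffer_size, initial_fullness):
    """Return a list of (picture_index, "underflow" | "overflow") events."""
    fullness = initial_fullness
    per_picture = bit_rate / frame_rate      # bits arriving during one picture period
    events = []
    for i, bits in enumerate(picture_bits):
        if bits > fullness:
            events.append((i, "underflow"))  # picture has not fully arrived in time
            fullness = 0.0                   # simplification: buffer treated as emptied
        else:
            fullness -= bits                 # decoder removes the picture instantly
        fullness += per_picture              # channel keeps filling at a constant rate
        if fullness > buffer_size:
            events.append((i, "overflow"))   # encoder produced too few bits
            fullness = buffer_size
    return events

RATE, FPS, SIZE, START = 4_000_000, 25, 1_835_008, 900_000   # illustrative values
steady = simulate_vbv([160_000] * 50, RATE, FPS, SIZE, START)
too_big = simulate_vbv([2_000_000], RATE, FPS, SIZE, START)
too_small = simulate_vbv([10_000] * 10, RATE, FPS, SIZE, START)
```

Pictures that match the channel rate keep the fullness constant, one oversized picture underflows the buffer, and a run of very small pictures eventually overflows it; a rate controller reacts to these trends by adjusting the quantizer.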

91
MPEG's VBV Water Tank Analogy (Normal Operation)
92
MPEG's VBV Water Tank Analogy (Overflow Condition)
93
MPEG's VBV Water Tank Analogy (Underflow
Condition)
94
VBV Buffer Size and VBV Delay
-T/2
95
CBR vs. VBR VBV Models
VBV Fullness
VBV Fullness
96
MUX - Video Bitstream
97
Sequence
  • For CD-ROM applications, sequences can be used to
    indicate relatively long clips (e.g. shots,
    scenes or entire movies)
  • For broadcast applications, sequence headers are
    usually sent frequently (e.g., every GOP) so that
    key bitstream info is obtained at channel changes

98
Major Application Areas
  • MPEG-1 Video
  • 1 - 3 Mbps CD-ROM Multimedia
  • Telecommunications and Near Video on Demand
  • MPEG-2 Video
  • 3 - 15 Mbps SDTV Broadcast (e.g., ATSC and DVB)
  • Digital Video Disk (DVD)
  • 15 - 20 Mbps HDTV Broadcast (e.g., ATSC)
  • 25 - 50 Mbps SDTV Production
  • 100 - 300 Mbps HDTV Production

99
Concluding Remarks
  • The MPEG video compression standard is the result
    of many years of competitive and, ultimately,
    collaborative effort among many commercial and
    academic laboratories
  • MPEG video compression can increase a
    broadcaster's channel capacity by 8x or more
  • MPEG video compression is being used successfully
    in many application areas, such as
  • CD-ROM and DVD multimedia, Satellite Broadcast,
    Terrestrial Broadcast, Cable Broadcast, Telco
    Video-on-Demand Systems