Title: Digital Video Processing
1 Digital Video Processing
Digital Image Processing, Fall 2008
Prof. Dmitry Goldgof
- Vasant Manohar
- Computer Science and Engineering
- University of South Florida
- http://www.csee.usf.edu/vmanohar
- vmanohar_at_cse.usf.edu
2 Outline
- Basics of Video
- Digital Video
- MPEG
- Summary
3 Basics of Video
- Static scene capture → image
- Bring in motion → video
- Image sequence: a 3-D signal
- 2 spatial dimensions + 1 time dimension
- Continuous I(x, y, t) → discrete I(m, n, t_k)
4 Video Camera
- Frame-by-frame capturing
- CCD sensors (Charge-Coupled Devices)
- 2-D array of solid-state sensors
- Each sensor corresponds to a pixel
- Stored in a buffer and sequentially read out
- Widely used
5 Video Display
- CRT (Cathode Ray Tube)
- Large dynamic range
- Bulky for large displays: CRT physical depth has to be proportional to screen width
- LCD (flat-panel display)
- Uses an electric field to change the optical properties, and thereby the brightness/color, of a liquid crystal
- The electric field is generated by an array of transistors (active-matrix thin-film transistors)
An active-matrix TFT display has a transistor located at each pixel, allowing the display to be switched more frequently and with less current to control pixel luminance. A passive-matrix LCD has a grid of conductors with pixels located at the grid intersections.
6 Composite vs. Component Video
- Component video
- Three separate signals for tri-stimulus color representation or luminance-chrominance representation
- Pro: higher quality
- Con: needs high bandwidth and synchronization
- Composite video
- Multiplexed into a single signal
- Historical reason: transmitting color TV through a monochrome channel
- Pro: saves bandwidth
- Con: cross talk
- S-video
- Luminance signal + single multiplexed chrominance signal
7 Progressive vs. Interlaced Videos
- Progressive
- Every pixel on the screen is refreshed in order (monitors) or simultaneously (films)
- Interlaced
- Refreshed twice every frame: the little gun at the back of your CRT shoots all the correct phosphors on the even-numbered rows of pixels first, and then the odd-numbered rows
- NTSC's frame-rate of 29.97 means the screen is redrawn 59.94 times a second
- In other words, 59.94 half-frames per second, or 59.94 fields per second
8 Progressive vs. Interlaced Videos
- How interlaced video can cause problems
- Suppose you resize a 720 x 480 interlaced video to 576 x 384 (a 20% reduction)
- How does resizing work?
- It takes a sample of the pixels from the original source and blends them together to create the new pixels
- In the case of interlaced video, you might end up blending scan lines of two completely different images!
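The field-blending problem above can be avoided by resizing each field separately. A minimal Python sketch under toy assumptions (rows are labelled strings, nearest-neighbour vertical resize, and the helper names are hypothetical):

```python
# Sketch: resize interlaced video per field so the two underlying images
# are never blended. Helper names are illustrative, not from any library.

def split_fields(frame):
    """Separate an interlaced frame (list of rows) into even/odd fields."""
    return frame[0::2], frame[1::2]

def resize_rows(field, new_height):
    """Nearest-neighbour vertical resize of one field."""
    old_height = len(field)
    return [field[i * old_height // new_height] for i in range(new_height)]

def weave(top, bottom):
    """Re-interleave two fields into a single frame."""
    frame = []
    for t, b in zip(top, bottom):
        frame += [t, b]
    return frame

# A toy 480-row "frame": even rows come from image A, odd rows from image B.
frame = ["A" if r % 2 == 0 else "B" for r in range(480)]
top, bottom = split_fields(frame)                                 # 240 rows each
resized = weave(resize_rows(top, 192), resize_rows(bottom, 192))  # 384 rows
# Unlike a naive full-frame resize, no output row mixes A and B content.
```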
9 Progressive vs. Interlaced Videos
Observe distinct scan lines
Image in full 720 x 480 resolution
10 Progressive vs. Interlaced Videos
Image after being resized to 576x384
Some scan lines blended together!
11 Aspect Ratio
- When you view pure NTSC video on your monitor, people look a little fatter than normal
- TV video is stored in a 3:2 aspect ratio, while monitors display picture data in a 4:3 aspect ratio
- A lot of capture cards crop off 16 pixels at the horizontal edges and capture at 704 x 480 or 352 x 480
- Aspect ratios in movies
- 5:3 → mostly used in animation movies
- 16:9 → academy ratio
- 21:9 → CinemaScope
12 Aspect Ratio
- Converting widescreen pictures to the 4:3 TV format
- Letterbox format (black bars above and below the picture)
- Losing parts of the picture
- If we convert a 21:9 picture, we might lose a large part of the picture (blue: 16:9, red: 4:3)
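The letterbox arithmetic can be sketched in a few lines of Python (numbers are illustrative only; 720 x 540 stands in for a square-pixel 4:3 screen):

```python
# Sketch: how tall the black letterbox bars are when a wide picture is
# fitted, width-filling, onto a 4:3 screen. Purely illustrative numbers.

def letterbox_bars(src_aspect, screen_w, screen_h):
    """Return (picture_height, bar_height) for a width-filling letterbox."""
    pic_h = round(screen_w / src_aspect)
    return pic_h, (screen_h - pic_h) // 2

pic_h, bar = letterbox_bars(16 / 9, 720, 540)        # 16:9 source
# pic_h = 405, with 67-pixel bars above and below (one spare row)

wide_h, wide_bar = letterbox_bars(21 / 9, 720, 540)  # 21:9 loses even more height
```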
14 Why Digital?
- Exactness
- Exact reproduction without degradation
- Accurate duplication of processing results
- Convenient, powerful computer-aided processing
- Can perform rather sophisticated processing through hardware or software
- Easy storage and transmission
- 1 DVD can store a three-hour movie!
- Transmission of high-quality video through a network in reasonable time
15 Digital Video Coding
- The basic idea is to remove redundancy in the video and encode what remains
- Perceptual redundancy
- The Human Visual System is less sensitive to color and to high frequencies
- Spatial redundancy
- Pixels in a neighborhood have close luminance levels
- Low frequency
- How about temporal redundancy?
- Differences between subsequent frames are very small. Shouldn't we exploit this?
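A toy Python sketch of why temporal redundancy is worth exploiting: the frame-to-frame difference is mostly zeros, which compresses far better than the frames themselves.

```python
# Sketch: the difference between consecutive frames of a mostly static
# scene is sparse. Toy flat "frames" of 100 pixels each.

def frame_difference(prev, cur):
    """Pixel-wise difference between two frames (flat lists)."""
    return [c - p for p, c in zip(prev, cur)]

prev = [100] * 95 + [200] * 5   # static background + a small bright object
cur  = [100] * 95 + [210] * 5   # only the object's 5 pixels changed
diff = frame_difference(prev, cur)
nonzero = sum(1 for d in diff if d != 0)
# Only 5 of 100 difference values are non-zero; that sparsity is what
# inter-frame (predictive) coding exploits.
```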
16 Hybrid Video Coding
- Hybrid: a combination of spatial, perceptual, and temporal redundancy removal
- Issues to be handled
- Not all regions are easily inferable from the previous frame
- Occlusion: solved by backward prediction, using future frames as reference
- The decision of whether to use prediction or not is made adaptively
- Drifting and error propagation
- Solved by encoding reference regions or frames at constant intervals of time
- Random access
- Solved by encoding a frame without prediction at constant intervals of time
- Bit allocation
- According to statistics
- Constant and variable bit-rate requirements
MPEG combines all of these features!
17 MPEG
- MPEG: Moving Pictures Experts Group
- Coding of moving pictures and associated audio
- Picture part
- Can achieve a compression ratio of about 50:1 by storing only the difference between successive frames
- Even higher compression ratios are possible
- Audio part
- Compression of audio data at ratios ranging from 5:1 to 10:1
- MP3 = MPEG-1 Audio Layer 3
18 MPEG Generations
19 Bit Rate
- Defined in two ways
- Bits per second (all inter-frame compression algorithms)
- Bits per frame (most intra-frame compression algorithms except DV and MJPEG)
- What does this mean?
- If you encode something in MPEG and specify it to be 1.5 Mbps, it doesn't matter what the frame-rate is: it takes the same amount of space. A lower frame-rate will look sharper but less smooth
- If you do the same with a codec like Huffyuv or Intel Indeo, you will get the same image quality throughout, but the smoothness and file sizes will change as the frame-rate changes
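The distinction can be sketched with some rough Python arithmetic (all numbers are illustrative):

```python
# Sketch: a bits-per-second budget fixes the file size regardless of
# frame-rate, while a bits-per-frame budget grows with frame-rate.

def size_bytes_bps(bitrate_bps, duration_s):
    """File size under a bits-per-second budget (fps does not appear)."""
    return bitrate_bps * duration_s // 8

def size_bytes_bpf(bits_per_frame, fps, duration_s):
    """File size under a bits-per-frame budget (fps matters)."""
    return bits_per_frame * fps * duration_s // 8

mpeg = size_bytes_bps(1_500_000, 60)        # 1.5 Mbps for 60 s, at any fps
intra_15 = size_bytes_bpf(100_000, 15, 60)  # 15 fps
intra_30 = size_bytes_bpf(100_000, 30, 60)  # 30 fps: twice the size
```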
20 Data Hierarchy
- Sequence: the entire video sequence
- Group of Pictures: basic unit allowing for random access
- Picture: primary coding unit, with three color components, different picture formats, and progressive or interlaced scanning modes
- Slice (or Group of Blocks): basic unit for resynchronization, refresh, and error recovery (skipped if erroneous)
- Macroblock: motion compensation unit
- Block: transform and compression unit
21 MPEG-1 Compression Aspects
- Lossless and lossy compression are both used for a high compression rate
- Down-sampled chrominance
- Perceptual redundancy
- Intra-frame compression
- Spatial redundancy
- Correlation/compression within a frame
- Based on baseline JPEG compression standard
- Inter-frame compression
- Temporal redundancy
- Correlation/compression between like frames
- Audio compression
- Three different layers (MP3)
22 Perceptual Redundancy
- Here is an image represented with 8 bits per pixel
23 Perceptual Redundancy
- The same image at 7 bits per pixel
24 Perceptual Redundancy
25 Perceptual Redundancy
26 Perceptual Redundancy
27 Perceptual Redundancy
- It is clear that we don't need all these bits!
- Our previous example illustrated the eye's sensitivity to luminance
- We can build a perceptual model
- Give more importance to what is perceivable to the Human Visual System
- Usually this is a function of the spatial frequency
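The bit-dropping in the previous slides can be sketched as a simple requantization (pure Python, illustrative pixel values):

```python
# Sketch: requantize 8-bit pixels to k bits by discarding low-order bits.
# At 7 bits the error is imperceptible; at 4 bits banding appears.

def requantize(pixels, bits):
    """Keep only the top `bits` bits of each 8-bit value."""
    shift = 8 - bits
    return [(p >> shift) << shift for p in pixels]

pixels = [0, 37, 128, 200, 255]
seven = requantize(pixels, 7)   # at most 1 grey level of error per pixel
four  = requantize(pixels, 4)   # only 16 grey levels remain
max_err_7 = max(abs(a - b) for a, b in zip(pixels, seven))
max_err_4 = max(abs(a - b) for a, b in zip(pixels, four))
```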
28 Video Coloring Scheme
- Translate the RGB system into a YUV system
- Human perception is less sensitive to chrominance than to brightness
- Separate brightness from chrominance; the chrominance resolution then does not have to be as good → lower necessary bit-rate
Coloring scheme
JPEG coloring blocks: luminance (Y), Cr, Cb, Cg
Normal: Red, Green, Blue
Translation formulas: Y = WrR + WbB + WgG; Cr = Wr(R - Y); Cb = Wb(B - Y); Cg = Wg(G - Y)
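A Python sketch of the translation, using the ITU-R BT.601 luma weights (Wr = 0.299, Wg = 0.587, Wb = 0.114). Note that the chrominance scale factors below follow BT.601 rather than the slide's simplified Wx(X - Y) form:

```python
# Sketch: RGB -> Y, Cb, Cr with BT.601 luma weights. The 0.564 / 0.713
# chroma scale factors are the BT.601 normalizations, an assumption here
# since the slide uses a simplified form.

WR, WG, WB = 0.299, 0.587, 0.114

def rgb_to_ycbcr(r, g, b):
    y = WR * r + WG * g + WB * b
    cb = 0.564 * (b - y)          # 0.564 = 0.5 / (1 - WB)
    cr = 0.713 * (r - y)          # 0.713 = 0.5 / (1 - WR)
    return y, cb, cr

y, cb, cr = rgb_to_ycbcr(255, 255, 255)   # white: full luma, zero chroma
```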
29 Video Coloring Scheme
- Chrominance is the difference between a color and a reference color of the same brightness and chromaticity
- A macroblock is composed of six blocks (4:2:0 or 4:1:1 format)
- Four blocks of Y (luminance)
- One block of Cb (blue chrominance)
- One block of Cr (red chrominance)
- Down-sampled chrominance
- Y Cb Cr coordinates and four sub-sampling formats
Ref: Y. Wang, J. Ostermann, Y.-Q. Zhang, Video Processing and Communications, Prentice-Hall, 2001
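The 4:2:0 down-sampling can be sketched by averaging 2 x 2 neighbourhoods of a chroma plane (pure Python, toy data):

```python
# Sketch: 4:2:0 chroma subsampling. A 16x16 macroblock keeps four 8x8 Y
# blocks but only one 8x8 Cb and one 8x8 Cr block.

def subsample_420(plane):
    """Average 2x2 neighbourhoods of a chroma plane (list of lists)."""
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1] +
              plane[y + 1][x] + plane[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

cb = [[100] * 16 for _ in range(16)]   # a flat 16x16 chroma plane
small = subsample_420(cb)              # 8x8 after subsampling
# Samples per macroblock: 256 Y + 64 Cb + 64 Cr = 384, vs 768 for 4:4:4.
```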
30 Intra-frame Compression
- Intra-frame coding
- Reduces spatial redundancy to reduce the necessary transmission rate
- Encoding of I-blocks is practically identical to the JPEG standard
- Makes use of the DCT transform along with zig-zag ordering
- Lossy data compression
31 Fundamentals of JPEG
Encoder: DCT → Quantizer → Entropy coder → Compressed image data
Decoder: Entropy decoder → Dequantizer → IDCT
32 Fundamentals of JPEG
- JPEG works on 8x8 blocks
- Extract an 8x8 block of pixels
- Convert to the DCT domain
- Quantize each coefficient
- A different step size for each coefficient
- Based on the sensitivity of the human visual system
- Order coefficients in zig-zag order
- Similar frequencies are grouped together
- Run-length encode the quantized values, and then use Huffman coding on what is left
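A minimal Python sketch of these per-block steps: a naive 2-D DCT, uniform quantization (a single step size of 16 stands in for the full JPEG perceptual table), zig-zag ordering, and run-length coding of the zeros. Huffman coding of the resulting symbols is omitted.

```python
# Sketch of the JPEG block pipeline above; clear rather than fast, and the
# uniform quantizer is a simplifying assumption.
import math

N = 8

def dct2(block):
    """Naive 2-D DCT-II of an 8x8 block."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return [[c(u) * c(v) * sum(
                block[y][x]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]

def zigzag(block):
    """Read an 8x8 block in zig-zag order, grouping similar frequencies."""
    idx = sorted(((y, x) for y in range(N) for x in range(N)),
                 key=lambda p: (p[0] + p[1],
                                p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[y][x] for y, x in idx]

def quantize(coeffs, step=16):
    return [round(v / step) for v in coeffs]

def run_length(values):
    """(zero_run, value) symbols, loosely like JPEG AC coding."""
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    return out                    # trailing zeros become an implicit EOB

flat = [[128] * N for _ in range(N)]   # a perfectly flat 8x8 block
symbols = run_length(quantize(zigzag(dct2(flat))))
# The whole flat block collapses to a single DC symbol: [(0, 64)]
```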
33 Random Access and Inter-frame Compression
- Temporal redundancy
- Only repeat the encoding of the parts of a picture frame that are rapidly changing
- Do not repeatedly encode background elements and still elements
- Random access capability
- Prediction that does not depend upon the user accessing the first frame (skipping through movie scenes, arbitrary-point pick-up)
34 3-D Motion → 2-D Motion
3-D MV
2-D MV
35 Sample (2-D) Motion Field
Anchor Frame
Target Frame
Motion Field
36 2-D Motion Corresponding to Camera Motion
Camera zoom
Camera rotation around Z-axis (roll)
37 General Considerations for Motion Estimation
- Two categories of approaches
- Feature-based (more often used in object tracking and 3-D reconstruction from 2-D)
- Intensity-based (based on the constant-intensity assumption; more often used for motion-compensated prediction, required in video coding and frame interpolation)
- Three important questions
- How to represent the motion field?
- What criteria to use to estimate motion parameters?
- How to search for motion parameters?
38 Motion Representation
Pixel-based: one MV at each pixel, with some smoothness constraint between adjacent MVs.
Global: the entire motion field is represented by a few global parameters.
Block-based: the entire frame is divided into blocks, and motion in each block is characterized by a few parameters. Also mesh-based (flow of corners, approximated inside).
Region-based: the entire frame is divided into regions, each region corresponding to an object or sub-object with consistent motion, represented by a few parameters.
39 Examples
target frame
anchor frame
Predicted target frame
Motion field
Half-pel Exhaustive Block Matching Algorithm
(EBMA)
40 Examples
Predicted target frame
Three-level Hierarchical Block Matching Algorithm
41 Examples
EBMA
mesh-based method
EBMA vs. Mesh-based Motion Estimation
42 Motion Compensated Prediction
- Divide the current frame, i, into disjoint 16x16 macroblocks
- Search a window in the previous frame, i-1, for the closest match
- Calculate the prediction error
- For each of the four 8x8 blocks in the macroblock, perform DCT-based coding
- Transmit the motion vector + entropy-coded prediction error (lossy coding)
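The macroblock search can be sketched with a toy one-dimensional exhaustive block matching using the sum of absolute differences (SAD) criterion; the names and numbers below are illustrative:

```python
# Sketch: exhaustive block matching for one block. Real EBMA searches a
# 2-D window; a 1-D "frame" keeps this example short.

def sad(a, b):
    """Sum of absolute differences between two equal-length blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_motion_vector(prev, cur_block, start, search=4):
    """Search +/- `search` positions around `start` in the previous frame."""
    best = None
    for d in range(-search, search + 1):
        pos = start + d
        if 0 <= pos and pos + len(cur_block) <= len(prev):
            cost = sad(prev[pos:pos + len(cur_block)], cur_block)
            if best is None or cost < best[1]:
                best = (d, cost)
    return best

prev = [0] * 10 + [50, 60, 70, 80] + [0] * 10   # object at position 10
cur_block = [50, 60, 70, 80]                    # same object, now at 12
mv, err = best_motion_vector(prev, cur_block, start=12)
# mv = -2: the block is predicted perfectly from 2 pixels earlier (err = 0)
```

Only the motion vector and the (here zero) prediction error would then need to be coded.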
43 Decoding with non-random access
- To decode and play sub-frames located in section G, all frames before section G must be decoded as well
- Synchronization algorithm issues
- If section G is far along in the movie, this could take a considerable amount of time
44 Decoding with random access
Introduce I frames: frames that are NOT predictively encoded, by design. Frames that are still encoded using a prediction algorithm are called P frames.
- When decoding any frame after an I frame (frame G in this example)
- We only have to decode past frames until we reach an I-frame
- Saves time when skipping from frame to frame
- I-frames are not predictively encoded
- Reduction in compression ratio
- Depending on the concentration of I frames, there is a tradeoff
- More I frames → faster random access
- Fewer I frames → better compression ratio
45 MPEG-1 Video Coding
- Most MPEG-1 implementations use a large number of I frames to ensure fast access
- Somewhat low compression ratio by itself
- For predictive coding, P frames depend on only a small number of past frames
- Using fewer past frames reduces the propagation error
- To further enhance compression in an MPEG-1 file, introduce a third frame type called the B frame (bi-directional frame)
- B frames are encoded using predictive coding from only two other frames: a past frame and a future frame
- Looking at both the past and the future helps reduce prediction error due to rapid changes from frame to frame (e.g., a fight scene or fast-action scene)
46 Predictive coding hierarchy: I, P and B frames
- I frames (black) do not depend on any other frame and are encoded separately
- Called anchor frames
- P frames (red) depend on the last P frame or I frame (whichever is closer)
- Also called anchor frames
- B frames (blue) depend on two frames: the closest past P or I frame, and the closest future P or I frame
- B frames are NOT used to predict other B frames; only P frames and I frames are used for predicting other frames
47 MPEG-1 Temporal Order of Compression
- I frames are generated and compressed first
- Have no frame dependence
- P frames are generated and compressed second
- Depend only upon past I/P frame values
- B frames are generated and compressed last
- Depend on surrounding frames
- Both forward and backward prediction needed
48 Adaptive Predictive Coding in MPEG-1
- Coding each block in a P-frame
- Predictive block: using the previous I/P frame as reference
- Intra-block: encode without prediction
- Use this if prediction costs more bits than non-prediction
- Good for occluded areas
- Can also avoid error propagation
- Coding each block in a B-frame
- Intra-block: encode without prediction
- Predictive block:
- Use the previous I/P frame as reference (forward prediction)
- Or use a future I/P frame as reference (backward prediction)
- Or use both for prediction
49 Codec Adjustments
- For smoothing out the bit rate
- A few applications prefer an approximately constant bit-rate (CBR) video stream
- e.g., prescribe the number of bits per second
- Very-short-term bit-rate variations can be smoothed by a buffer
- Variations cannot be too large in the longer term, else the buffer overflows
- For reducing bit rate by exploiting Human Visual System (HVS) temporal properties
- Noise/distortion in a video frame is not very visible when there is a sharp temporal transition (scene change)
- Can compress a few frames right after a scene change with fewer bits
- Changing the frame types
- I I I I I I: lowest compression ratio (like MJPEG)
- I P P P I P P: moderate compression ratio
- I B B P B B P B B I: highest compression ratio
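The effect of the pattern on compression can be sketched with rough, hypothetical per-frame bit costs (the numbers below are invented for illustration, not measured):

```python
# Sketch: average bits per frame for different GOP patterns, using
# made-up typical costs (I frames largest, B frames smallest).

COST = {"I": 150_000, "P": 50_000, "B": 20_000}   # bits; illustrative only

def avg_bits(pattern):
    """Average bits per frame over a GOP pattern string like 'IBBP...'."""
    return sum(COST[f] for f in pattern) / len(pattern)

all_i = avg_bits("IIIIII")       # every frame intra-coded, like MJPEG
ip    = avg_bits("IPPPIPP")
ibbp  = avg_bits("IBBPBBPBBI")
# The average drops as more P and B frames are used.
```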
50 MPEG Library
- The MPEG Library is a C library for decoding MPEG-1 video streams and dithering them to a variety of color schemes
- Most of the code in the library comes directly from an old version of the Berkeley MPEG player (mpeg_play)
- The library can be downloaded from
- http://starship.python.net/gward/mpeglib/mpeg_lib-1.3.1.tar.gz
- It works well on all modern Unix and Unix-like platforms with an ANSI C compiler. I have tested it on grad.
- NOTE: This is not the best library available, but it works well for MPEG-1 and it is fairly easy to use. If you are inquisitive, check out the MPEG Software Simulation Group at http://www.mpeg.org/MPEG/MSSG/ where you can find a free MPEG-2 video coder/decoder.
51 MPEG Library
- Using the library is very similar to the way in which files are normally handled
- Open an MPEG stream and initialize internal data structures
- Read frames until the stream is done
- If need be, you can rewind the stream and start over
- When done, close the stream and clean up
- NOTE: You cannot randomly access the stream. This is a limitation of the library because of the nature of the decoding engine on which it is built
- NOTE: The Berkeley decoding engine depends heavily on global variables and hence cannot decode more than one MPEG at a time
52 MPEGe Library
- The MPEGe(ncoding) Library is designed to allow you to create MPEG movies from your application
- The library can be downloaded from the files section of
- http://groups.yahoo.com/group/mpegelib/
- The encoder library uses the Berkeley MPEG encoder engine, which handles all the complexities of MPEG streams
- As was the case with the decoder, this library can write only one MPEG movie at a time
- The library works well with most of the common image formats
- To keep things simple, we will stick to PPM
53 MPEGe Library Functions
- The library consists of 3 simple functions
- MPEGe_open: for initializing the encoder
- MPEGe_image: called each time you want to add a frame to the sequence. The format of the image pointed to by image is that used by the SDSC Image library
- SDSC is a powerful library which will allow you to read/write 32 different image types and also contains functions to manipulate them. The source code as well as pre-compiled binaries can be downloaded at ftp://ftp.sdsc.edu/pub/sdsc/graphics/
- MPEGe_close: called to end the MPEG sequence. This function will reset the library to a sane state, create the MPEG end sequences, and close the output file
Note: All functions return non-NULL (i.e., TRUE) on success and zero (or FALSE) on failure.
54 Usage Details
- You are not required to write code using the libraries to decode and encode MPEG streams
- Copy the binary executables from
- http://www.csee.usf.edu/vmanohar/DIP/readframes
- http://www.csee.usf.edu/vmanohar/DIP/encodeframes
- Usage
- To read frames from an MPEG movie (say test.mpg) and store them in a directory extractframes (relative to your current working directory) with the prefix testframe (to the filename):
- readframes test.mpg extractframes/testframe
- This will decode all the frames of test.mpg into the directory extractframes with the filenames testframe0.ppm, testframe1.ppm, etc.
- To encode:
- encodeframes 0 60 extractframes/testframe testresult.mpg
- This will encode images testframe0.ppm to testframe60.ppm from the directory extractframes into testresult.mpg
- In order to convert between PPM and PGM formats, copy the script from
- http://www.csee.usf.edu/vmanohar/DIP/batchconvert
- Usage
- To convert all the PPM files in the directory extractframes to PGM:
- batchconvert extractframes ppm pgm
- To convert all the PGM files in the directory extractframes to PPM:
- batchconvert extractframes pgm ppm
55 MPEG-1 Summary
- 3-type frame structure
- I-frame: intra-coded (without prediction)
- Like JPEG
- P-frame: predictive-coded (with the previous frame as reference)
- B-frame: bidirectionally predictive-coded
- Group of Pictures (GOP)
- Frame (display) order
- I1 B B P1 B B P2 B B I2
- Coding order
- I1 P1 B B P2 B B I2 B B
- User-selectable parameters
- Distance between I-frames in a GOP
- Distance between P-frames in a GOP
- Bit-rate (constant/variable, bandwidth)
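The display-to-coding reordering above can be sketched in Python: the rule is simply that each B frame must follow both of its anchors (the surrounding I/P frames), since the decoder needs both references before it can reconstruct the B frame.

```python
# Sketch: reorder a GOP from display order to coding order. Frame labels
# are toy strings; real encoders carry full picture headers.

def coding_order(display):
    """Emit each anchor (I/P) first, then the B frames that depend on it."""
    out, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)   # must wait for the next anchor
        else:
            out.append(frame)         # anchor goes out immediately
            out.extend(pending_b)     # then the B frames it closes off
            pending_b = []
    return out + pending_b

display = ["I1", "B1", "B2", "P1", "B3", "B4", "P2", "B5", "B6", "I2"]
coded = coding_order(display)
# coded == ["I1", "P1", "B1", "B2", "P2", "B3", "B4", "I2", "B5", "B6"],
# matching the slide's I1 P1 B B P2 B B I2 B B coding order.
```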