Digital Video Processing - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Digital Video Processing


1
Digital Video Processing
Digital Image Processing Fall 2008
Prof. Dmitry Goldgof
  • Vasant Manohar
  • Computer Science and Engineering
  • University of South Florida
  • http://www.csee.usf.edu/~vmanohar
  • vmanohar@cse.usf.edu

2
Outline
  • Basics of Video
  • Digital Video
  • MPEG
  • Summary

3
Basics of Video
  • Static scene capture → Image
  • Bring in motion → Video
  • Image sequence: a 3-D signal
  • 2 spatial dimensions + 1 time dimension
  • Continuous I(x, y, t) → discrete I(m, n, tk)

4
Video Camera
  • Frame-by-frame capturing
  • CCD sensors (Charge-Coupled Devices)
  • 2-D array of solid-state sensors
  • Each sensor corresponds to a pixel
  • Stored in a buffer and sequentially read out
  • Widely used

5
Video Display
  • CRT (Cathode Ray Tube)
  • Large dynamic range
  • Bulky for large displays
  • CRT physical depth has to be proportional to
    screen width
  • LCD: flat-panel display
  • Uses an electric field to change the optical
    properties, and thereby the brightness/color, of
    the liquid crystal
  • Generating the electric field
  • By an array of transistors: active-matrix
    thin-film transistors (TFTs)

An active-matrix TFT display has a transistor
located at each pixel, allowing the display to be
switched more frequently and with less current to
control pixel luminance. A passive-matrix LCD has
a grid of conductors with pixels located at the
grid intersections.
6
Composite vs. Component Video
  • Component video
  • Three separate signals for tri-stimulus color
    representation or luminance-chrominance
    representation
  • Pro: higher quality
  • Con: needs high bandwidth and synchronization
  • Composite video
  • Multiplexed into a single signal
  • Historical reason: for transmitting color TV
    through a monochrome channel
  • Pro: saves bandwidth
  • Con: cross talk
  • S-video
  • Luminance signal + a single multiplexed chrominance
    signal

7
Progressive vs. Interlaced Videos
  • Progressive
  • Every pixel on the screen is refreshed in order
    (monitors) or simultaneously (films)
  • Interlaced
  • Refreshed twice every frame: the electron gun at
    the back of your CRT shoots all the correct
    phosphors on the even-numbered rows of pixels
    first, and then the odd-numbered rows
  • An NTSC frame rate of 29.97 means the screen is
    redrawn 59.94 times a second
  • In other words, 59.94 half-frames per second, or
    59.94 fields per second

8
Progressive vs. Interlaced Videos
  • How interlaced video can cause problems
  • Suppose you resize a 720 x 480 interlaced video
    to 576 x 384 (a 20% reduction)
  • How does resizing work?
  • It takes a sample of the pixels from the original
    source and blends them together to create the new
    pixels
  • In the case of interlaced video, you might end up
    blending scan lines of two completely different
    images! (see the sketch below)
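
A minimal sketch of the problem, assuming a naive linear resize (function and variable names are illustrative, not from the slides): each output row blends two adjacent source rows, and in interlaced material adjacent rows belong to opposite fields captured 1/59.94 s apart, so moving objects smear.

    #include <stdint.h>

    /* Naive vertical downscale of one column of luma samples from src_h
       rows to dst_h rows. Each output sample linearly blends two adjacent
       source rows; with interlaced input, rows y and y+1 come from
       different fields, producing the blended scan lines shown later. */
    static void resize_column(const uint8_t *src, int src_h,
                              uint8_t *dst, int dst_h)
    {
        for (int i = 0; i < dst_h; i++) {
            double pos  = (double)i * src_h / dst_h;  /* position in source rows */
            int    y    = (int)pos;
            double frac = pos - y;
            int    y1   = (y + 1 < src_h) ? y + 1 : y;
            dst[i] = (uint8_t)((1.0 - frac) * src[y] + frac * src[y1] + 0.5);
        }
    }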

9
Progressive vs. Interlaced Videos
Observe distinct scan lines
Image in full 720 x 480 resolution
10
Progressive vs. Interlaced Videos
Image after being resized to 576x384
Some scan lines blended together!
11
Aspect Ratio
  • When you view pure NTSC video on your monitor,
    people look a little fatter than normal
  • TV video is stored at a 3:2 aspect ratio, while
    monitors display picture data at a 4:3 aspect ratio
  • A lot of capture cards crop off the 16 pixels at
    the horizontal edges and capture at 704 x 480 or
    352 x 480
  • Aspect ratios in movies
  • 5:3 → mostly used in animation movies
  • 16:9 → academy ratio
  • 21:9 → CinemaScope

12
Aspect Ratio
  • Converting widescreen pictures to the 4:3 TV format
  • Letterbox format (black bars above and below the
    picture)
  • Or losing parts of the picture
  • If we convert a 21:9 picture, we might lose a
    large part of the picture (blue: 16:9, red: 4:3);
    a worked example follows below
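
As a rough worked example (my numbers, not from the slides): letterboxing a 21:9 picture onto a 640 x 480 (4:3) display keeps the full 640-pixel width, so the active picture is only 640 * 9/21, about 274 lines tall, leaving roughly 103 black lines above and below. Cropping instead, so that all 480 lines are filled, keeps only (4/3) / (21/9), about 57%, of the original picture width.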

13
  • DIGITAL VIDEO

14
Why Digital?
  • Exactness
  • Exact reproduction without degradation
  • Accurate duplication of processing result
  • Convenient and powerful computer-aided processing
  • Can perform rather sophisticated processing
    through hardware or software
  • Easy storage and transmission
  • 1 DVD can store a three-hour movie!
  • Transmission of high-quality video through a
    network in reasonable time

15
Digital Video Coding
  • The basic idea is to remove redundancy in video
    and encode what remains
  • Perceptual redundancy
  • The Human Visual System is less sensitive to
    color and to high frequencies
  • Spatial redundancy
  • Pixels in a neighborhood have close luminance
    levels (low frequency content)
  • How about temporal redundancy?
  • Differences between subsequent frames are very
    small. Shouldn't we exploit this? (see the
    frame-difference sketch below)
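
A minimal sketch of the temporal-redundancy idea (illustrative names, not from the slides): instead of coding each frame in full, code the residual against the previous frame, which is near zero wherever the scene is static.

    #include <stdint.h>
    #include <stddef.h>

    /* Residual between two consecutive grayscale frames. In static
       regions the residual is ~0 and compresses very well; this is the
       temporal redundancy that inter-frame coding exploits. */
    void frame_difference(const uint8_t *prev, const uint8_t *curr,
                          int16_t *residual, size_t n_pixels)
    {
        for (size_t i = 0; i < n_pixels; i++)
            residual[i] = (int16_t)curr[i] - (int16_t)prev[i];
    }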

16
Hybrid Video Coding
  • Hybrid: a combination of spatial, perceptual, and
    temporal redundancy removal
  • Issues to be handled
  • Not all regions are easily inferable from the
    previous frame
  • Occlusion: solved by backward prediction, using
    future frames as reference
  • The decision of whether or not to use prediction
    is made adaptively
  • Drifting and error propagation
  • Solved by encoding reference regions or frames at
    constant intervals of time
  • Random access
  • Solved by encoding a frame without prediction at
    constant intervals of time
  • Bit allocation
  • According to statistics
  • Constant and variable bit-rate requirements

MPEG combines all of these features !!!
17
MPEG
  • MPEG: Moving Pictures Experts Group
  • Coding of moving pictures and associated audio
  • Picture part
  • Can achieve a compression ratio of about 50:1 by
    storing only the difference between successive
    frames
  • Even higher compression ratios are possible
  • Audio part
  • Compression of audio data at ratios ranging from
    5:1 to 10:1
  • MP3: MPEG-1 Audio Layer 3

18
MPEG Generations
19
Bit Rate
  • Defined in two ways
  • Bits per second (all inter-frame compression
    algorithms)
  • Bits per frame (most intra-frame compression
    algorithms except DV and MJPEG)
  • What does this mean?
  • If you encode something in MPEG and specify it to
    be 1.5 Mbps, it doesn't matter what the frame rate
    is; the file takes the same amount of space → a
    lower frame rate will look sharper but less smooth
  • If you do the same with a codec like Huffyuv or
    Intel Indeo, you will get the same image quality
    at any frame rate, but the smoothness and file
    sizes will change as the frame rate changes
    (a worked example follows below)
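
As a quick worked example (my arithmetic, not from the slides): a clip encoded at a constant 1.5 Mbps occupies about 1,500,000 bits/s * 60 s / 8 = 11.25 MB per minute whether it runs at 15 or 30 frames per second; at 15 fps each frame simply gets roughly twice as many bits, hence "sharper but less smooth".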

20
Data Hierarchy
  • Sequence: the entire video sequence
  • Group of Pictures: basic unit allowing random
    access
  • Picture: primary coding unit, with three color
    components, different picture formats, and
    progressive or interlaced scanning modes
  • Slice (or Group of Blocks): basic unit for
    resynchronization, refresh, and error recovery
    (skipped if erroneous)
  • Macroblock: motion compensation unit
  • Block: transform and compression unit

21
MPEG-1 Compression Aspects
  • Lossless and lossy compression are both used to
    achieve a high compression rate
  • Down-sampled chrominance
  • Perceptual redundancy
  • Intra-frame compression
  • Spatial redundancy
  • Correlation/compression within a frame
  • Based on the baseline JPEG compression standard
  • Inter-frame compression
  • Temporal redundancy
  • Correlation/compression between similar frames
  • Audio compression
  • Three different layers (MP3 is Layer 3)

22
Perceptual Redundancy
  • Here is an image represented with 8 bits per pixel

23
Perceptual Redundancy
  • The same image at 7 bits per pixel

24
Perceptual Redundancy
  • At 6 bits per pixel

25
Perceptual Redundancy
  • At 5 bits per pixel

26
Perceptual Redundancy
  • At 4 bits per pixel

27
Perceptual Redundancy
  • It is clear that we don't need all these bits!
  • Our previous example illustrated the eye's
    sensitivity to luminance
  • We can build a perceptual model
  • Give more importance to what is perceivable by
    the Human Visual System
  • Usually this is a function of the spatial
    frequency (a bit-depth reduction sketch follows below)
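
A minimal sketch of the bit-depth reduction used in the 8-to-4-bit example above (illustrative names; the slides do not show code): drop the least significant bits of each sample, i.e., uniform quantization, then shift back for display.

    #include <stdint.h>
    #include <stddef.h>

    /* Requantize an 8-bit grayscale image to 'bits' bits per pixel
       (1..8) by discarding the least significant bits, then shifting
       back so the result is still viewable in an 8-bit buffer. */
    void reduce_bit_depth(const uint8_t *src, uint8_t *dst,
                          size_t n_pixels, int bits)
    {
        int shift = 8 - bits;
        for (size_t i = 0; i < n_pixels; i++) {
            uint8_t q = src[i] >> shift;      /* quantize */
            dst[i] = (uint8_t)(q << shift);   /* rescale for display */
        }
    }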

28
Video Coloring Scheme
  • Translate the RGB system into a YUV
    (luminance-chrominance) system
  • Human perception is less sensitive to chrominance
    than to brightness
  • Separate brightness from chrominance; the chrominance
    resolution then does not have to be as good →
    lower necessary bit rate (see the conversion sketch below)

Coloring scheme
JPEG coloring blocks: Luminance (Y), Cr, Cb, Cg
Normal (RGB): Red, Green, Blue
Translation formulas:
  Y  = Wr R + Wb B + Wg G
  Cr = Wr (R - Y)
  Cb = Wb (B - Y)
  Cg = Wg (G - Y)
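
A minimal sketch of the translation above, assuming the ITU-R BT.601 luma weights commonly used with JPEG/MPEG-1 (Wr = 0.299, Wg = 0.587, Wb = 0.114); the scaling of Cb/Cr into the 0..255 range is also my assumption, not something stated on the slide.

    #include <stdint.h>

    static uint8_t clamp255(double v)
    {
        if (v < 0.0)   return 0;
        if (v > 255.0) return 255;
        return (uint8_t)(v + 0.5);
    }

    /* Convert one full-range RGB pixel to Y'CbCr with BT.601 weights. */
    static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                             uint8_t *y, uint8_t *cb, uint8_t *cr)
    {
        double Y = 0.299 * r + 0.587 * g + 0.114 * b;    /* Y = WrR + WgG + WbB */
        *y  = clamp255(Y);
        *cb = clamp255(0.564 * ((double)b - Y) + 128.0); /* ~ Wb'(B - Y), offset */
        *cr = clamp255(0.713 * ((double)r - Y) + 128.0); /* ~ Wr'(R - Y), offset */
    }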
29
Video Coloring Scheme
  • Chrominance is the difference between a color and
    a reference color of the same brightness
  • A macroblock is composed of six blocks (4:2:0 or
    4:1:1 format)
  • Four blocks of Y (luminance)
  • One block of Cb (blue chrominance)
  • One block of Cr (red chrominance)
  • Down-sampled chrominance
  • Y Cb Cr coordinates and four sub-sampling formats
    (a down-sampling sketch follows the reference below)

Ref: Y. Wang, J. Ostermann, Y.-Q. Zhang, Video
Processing and Communications, Prentice-Hall,
2001.
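
A minimal sketch of 4:2:0 chroma down-sampling, assuming a simple 2x2 box average (one of several legal siting choices): one Cb and one Cr sample survives per 2x2 block of luma samples, which is why a macroblock carries four Y blocks but only one Cb and one Cr block.

    #include <stdint.h>

    /* 4:2:0 down-sampling of one chroma plane by averaging each 2x2
       block. 'w' and 'h' are the full-resolution dimensions (even). */
    static void downsample_420(const uint8_t *chroma, int w, int h,
                               uint8_t *out /* (w/2) x (h/2) */)
    {
        for (int y = 0; y < h; y += 2)
            for (int x = 0; x < w; x += 2) {
                int sum = chroma[y * w + x]       + chroma[y * w + x + 1]
                        + chroma[(y + 1) * w + x] + chroma[(y + 1) * w + x + 1];
                out[(y / 2) * (w / 2) + (x / 2)] = (uint8_t)((sum + 2) / 4);
            }
    }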
30
Intra-frame Compression
  • Intra-frame coding
  • Reduces spatial redundancy to reduce the necessary
    transmission rate
  • Encoding of I-blocks is practically identical to
    the JPEG standard
  • Makes use of the DCT transform along with zigzag
    ordering
  • Lossy data compression

31
Fundamentals of JPEG
Encoder: DCT → Quantizer → Entropy coder → Compressed image data
Decoder: Compressed image data → Entropy decoder → Dequantizer → IDCT
32
Fundamentals of JPEG
  • JPEG works on 8x8 blocks
  • Extract an 8x8 block of pixels
  • Convert to the DCT domain
  • Quantize each coefficient
  • A different step size for each coefficient
  • Based on the sensitivity of the human visual system
  • Order the coefficients in zigzag order
  • Similar frequencies are grouped together
  • Run-length encode the quantized values and then
    use Huffman coding on what is left (see the sketch below)
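
A minimal sketch of the quantize-and-zigzag step (illustrative; the per-coefficient step sizes in 'qtable' would come from a JPEG quantization table, which is not reproduced here): larger steps are used for high frequencies, and the zigzag scan groups coefficients of similar frequency so that the trailing run of zeros run-length encodes well.

    #include <stdint.h>

    /* Quantize an 8x8 block of DCT coefficients and emit them in zigzag
       order, generated by walking the anti-diagonals u + v = s. */
    static void quantize_zigzag(const double dct[8][8],
                                const uint16_t qtable[8][8],
                                int16_t out[64])
    {
        int k = 0;
        for (int s = 0; s <= 14; s++) {
            int lo = (s > 7) ? s - 7 : 0;
            int hi = (s < 7) ? s : 7;
            if (s % 2 == 0) {
                for (int u = hi; u >= lo; u--) {   /* even diagonals: up-right */
                    double q = dct[u][s - u] / qtable[u][s - u];
                    out[k++] = (int16_t)(q >= 0 ? q + 0.5 : q - 0.5);
                }
            } else {
                for (int u = lo; u <= hi; u++) {   /* odd diagonals: down-left */
                    double q = dct[u][s - u] / qtable[u][s - u];
                    out[k++] = (int16_t)(q >= 0 ? q + 0.5 : q - 0.5);
                }
            }
        }
    }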

33
Random Access and Inter-frame Compression
  • Temporal redundancy
  • Only repeatedly encode the parts of a picture
    frame that are rapidly changing
  • Do not repeatedly encode background elements and
    still elements
  • Random access capability
  • Prediction that does not depend upon the user
    accessing the first frame (skipping through movie
    scenes, arbitrary-point pick-up)

34
3-D Motion → 2-D Motion
(Figure: a 3-D motion vector in the scene projects to a 2-D motion vector in the image plane)
35
Sample (2D) Motion Field
Anchor Frame
Target Frame
Motion Field
36
2-D Motion Corresponding to Camera Motion
Camera zoom
Camera rotation around Z-axis (roll)
37
General Considerations for Motion Estimation
  • Two categories of approaches
  • Feature-based (more often used in object
    tracking and 3-D reconstruction from 2-D)
  • Intensity-based (based on the constant-intensity
    assumption; more often used for motion-compensated
    prediction, as required in video coding and
    frame interpolation)
  • Three important questions
  • How to represent the motion field?
  • What criteria to use to estimate the motion
    parameters?
  • How to search for the motion parameters?

38
Motion Representation
Pixel-based: one MV at each pixel, with some
smoothness constraint between adjacent MVs.
Global: the entire motion field is represented by a
few global parameters.
Block-based: the entire frame is divided into blocks,
and the motion in each block is characterized by a
few parameters. Also mesh-based (flow of block
corners, approximated inside).
Region-based: the entire frame is divided into
regions, each region corresponding to an object
or sub-object with consistent motion, represented
by a few parameters.
39
Examples
target frame
anchor frame
Predicted target frame
Motion field
Half-pel Exhaustive Block Matching Algorithm
(EBMA)
40
Examples
Predicted target frame
Three-level Hierarchical Block Matching Algorithm
41
Examples
EBMA
mesh-based method
EBMA vs. Mesh-based Motion Estimation
42
Motion Compensated Prediction
  • Divide the current frame, i, into disjoint 16x16
    macroblocks
  • Search a window in the previous frame, i-1, for
    the closest match
  • Calculate the prediction error
  • For each of the four 8x8 blocks in the
    macroblock, perform DCT-based coding
  • Transmit the motion vector + the entropy-coded
    prediction error (lossy coding); a block-matching
    sketch follows below
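
A minimal sketch of the block-matching search behind motion-compensated prediction (my addition; integer full search at whole-pel accuracy, whereas the EBMA example shown earlier uses half-pel accuracy):

    #include <stdint.h>
    #include <stdlib.h>
    #include <limits.h>

    /* Full-search block matching: find the motion vector (best_dx, best_dy)
       minimizing the sum of absolute differences (SAD) between a 16x16
       macroblock at (bx, by) in the current frame and candidate blocks in
       the previous frame, within a +/-range window. Frames are w x h luma. */
    static void match_block(const uint8_t *prev, const uint8_t *curr,
                            int w, int h, int bx, int by, int range,
                            int *best_dx, int *best_dy)
    {
        long best_sad = LONG_MAX;
        *best_dx = *best_dy = 0;
        for (int dy = -range; dy <= range; dy++)
            for (int dx = -range; dx <= range; dx++) {
                if (bx + dx < 0 || by + dy < 0 ||
                    bx + dx + 16 > w || by + dy + 16 > h)
                    continue;                          /* candidate leaves the frame */
                long sad = 0;
                for (int y = 0; y < 16; y++)
                    for (int x = 0; x < 16; x++)
                        sad += labs((long)curr[(by + y) * w + (bx + x)] -
                                    (long)prev[(by + dy + y) * w + (bx + dx + x)]);
                if (sad < best_sad) {
                    best_sad = sad;
                    *best_dx = dx;
                    *best_dy = dy;
                }
            }
    }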

43
Decoding with non-random access
  • To decode and play sub-frames located in
    section G, all frames before section G must be
    decoded as well
  • Synchronization algorithm issues
  • If section G is far along in the movie, this
    could take a considerable amount of time

44
Decoding with random access
Introduce I frames: frames that are NOT
predictively encoded, by design. Frames that are
still encoded using a prediction algorithm are
called P frames.
  • When decoding any frame after an I frame (frame G
    in this example)
  • We only have to decode past frames until we reach
    an I frame
  • This saves time when skipping from frame to frame
  • I frames are not predictively encoded
  • This means a reduction in compression ratio
  • Depending on the concentration of I frames, there
    is a tradeoff (see the seek sketch below)
  • More I frames → faster random access
  • Fewer I frames → better compression ratio
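
A minimal sketch of why I frames enable random access (my illustration; the frame-type bookkeeping is an assumption, not an MPEG API, and it ignores the future anchor a B frame would also need): to show a target frame, decoding only has to start at the closest preceding I frame.

    #include <stddef.h>

    typedef enum { FRAME_I, FRAME_P, FRAME_B } frame_type;

    /* Index of the frame at which decoding must start in order to display
       'target': scan back to the nearest I frame; everything after it can
       be reconstructed by prediction. More I frames => shorter scans. */
    static size_t decode_start(const frame_type *types, size_t target)
    {
        size_t i = target;
        while (i > 0 && types[i] != FRAME_I)
            i--;
        return i;
    }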

45
MPEG-1 Video Coding
  • Most MPEG-1 implementations use a large number of
    I frames to ensure fast access
  • Somewhat low compression ratio by itself
  • For predictive coding, P frames depend on only a
    small number of past frames
  • Using fewer past frames reduces error propagation
  • To further enhance compression in an MPEG-1 file,
    introduce a third frame type, the B frame:
    a bi-directional frame
  • B frames are encoded using predictive coding from
    only two other frames: a past frame and a future
    frame
  • Looking at both the past and the future helps
    reduce the prediction error due to rapid changes
    from frame to frame (e.g., a fight scene or
    fast-action scene)

46
Predictive Coding Hierarchy: I, P, and B Frames
  • I frames (black) do not depend on any other frame
    and are encoded separately
  • Called an anchor frame
  • P frames (red) depend on the last P frame or I
    frame (whichever is closer)
  • Also called an anchor frame
  • B frames (blue) depend on two frames: the closest
    past P or I frame, and the closest future P or I
    frame
  • B frames are NOT used to predict other B frames;
    only P frames and I frames are used for
    predicting other frames

47
MPEG-1 Temporal Order of Compression
  • I frames are generated and compressed first
  • They have no frame dependence
  • P frames are generated and compressed second
  • They depend only upon past I/P frame values
  • B frames are generated and compressed last
  • They depend on surrounding frames
  • Forward prediction is needed

48
Adaptive Predictive Coding in MPEG-1
  • Coding each block in a P-frame
  • Predictive block: uses the previous I/P frame as
    reference
  • Intra-block: encoded without prediction
  • Use this if prediction costs more bits than
    non-prediction
  • Good for occluded areas
  • Can also avoid error propagation
  • Coding each block in a B-frame
  • Intra-block: encoded without prediction
  • Predictive block:
  • uses the previous I/P frame as reference (forward
    prediction)
  • or uses a future I/P frame as reference (backward
    prediction)
  • or uses both for prediction

49
Codec Adjustments
  • For smoothing out the bit rate
  • A few applications prefer an approximately
    constant bit-rate video stream (CBR)
  • e.g., a prescribed number of bits per second
  • Very-short-term bit-rate variations can be
    smoothed by a buffer
  • Variations cannot be too large over the longer
    term, or the buffer overflows
  • For reducing bit rate by exploiting Human Visual
    System (HVS) temporal properties
  • Noise/distortion in a video frame is not very
    visible when there is a sharp temporal transition
    (scene change)
  • So a few frames right after a scene change can be
    compressed with fewer bits
  • Changing the frame types
  • I I I I I I → lowest compression ratio (like MJPEG)
  • I P P P I P P → moderate compression ratio
  • I B B P B B P B B I → highest compression ratio

50
MPEG Library
  • The MPEG Library is a C library for decoding
    MPEG-1 video streams and dithering them to a
    variety of color schemes
  • Most of the code in the library comes directly
    from an old version of the Berkeley MPEG player
    (mpeg_play)
  • The Library can be downloaded from
  • http://starship.python.net/gward/mpeglib/mpeg_lib
    -1.3.1.tar.gz
  • It works well on all modern Unix and Unix-like
    platforms with an ANSI C compiler. I have tested
    it on grad.
  • NOTE: This is not the best library available,
    but it works well for MPEG-1 and it is fairly
    easy to use. If you are inquisitive, you should
    check the MPEG Software Simulation Group at
    http://www.mpeg.org/MPEG/MSSG/ where you can find
    a free MPEG-2 video coder/decoder.

51
MPEG Library
  • Using the Library is very similar to the way in
    which files are normally handled
  • Open an MPEG stream and initialize internal data
    structures
  • Read frames until the stream is done
  • If need be, you can rewind the stream and start
    over
  • When done, you close the stream and clean up
  • NOTE: You cannot randomly access the stream. This
    is a limitation of the Library because of the
    nature of the decoding engine on which the
    Library is built.
  • NOTE: The Berkeley decoding engine depends heavily
    on global variables and hence cannot decode more
    than one MPEG stream at a time.

52
MPEGe Library
  • The MPEGe (encoding) Library is designed to allow
    you to create MPEG movies from your application
  • The library can be downloaded from the files
    section of
  • http://groups.yahoo.com/group/mpegelib/
  • The encoder library uses the Berkeley MPEG
    encoder engine, which handles all the
    complexities of MPEG streams
  • As was the case with the decoder, this library
    can write only one MPEG movie at a time
  • The library works well with most of the common
    image formats
  • To keep things simple, we will stick to PPM

53
MPEGe Library Functions
  • The library consists of 3 simple functions
  • MPEGe_open: for initializing the encoder
  • MPEGe_image: called each time you want to add a
    frame to the sequence. The format of the image
    pointed to by image is that used by the SDSC
    Image library
  • SDSC is a powerful library which will allow you
    to read/write 32 different image types and also
    contains functions to manipulate them. The source
    code as well as pre-compiled binaries can be
    downloaded at ftp://ftp.sdsc.edu/pub/sdsc/graphics/
  • MPEGe_close: called to end the MPEG sequence. This
    function will reset the library to a sane state,
    create the MPEG end sequences, and close the
    output file

Note: All functions return non-NULL (i.e., TRUE)
on success and zero (FALSE) on failure.
54
Usage Details
  • You are not required to write code using the
    libraries to decode and encode MPEG streams
  • Copy the binary executables from
  • http://www.csee.usf.edu/~vmanohar/DIP/readframes
  • http://www.csee.usf.edu/~vmanohar/DIP/encodeframes
  • Usage
  • To read frames from an MPEG movie (say test.mpg)
    and store them in a directory extractframes
    (relative to your current working directory) with
    the prefix testframe (added to each filename):
  • readframes test.mpg extractframes/testframe
  • This will decode all the frames of test.mpg
    into the directory extractframes with the
    filenames testframe0.ppm, testframe1.ppm, ...
  • To encode:
  • encodeframes 0 60 extractframes/testframe
    testresult.mpg
  • This will encode images testframe0.ppm to
    testframe60.ppm from the directory extractframes
    into testresult.mpg
  • To convert between the PPM and PGM formats, copy
    the script from
  • http://www.csee.usf.edu/~vmanohar/DIP/batchconvert
  • Usage
  • To convert all the PPM files in the directory
    extractframes to PGM:
  • batchconvert extractframes ppm pgm
  • To convert all the PGM files in the directory
    extractframes to PPM:
  • batchconvert extractframes pgm ppm

55
MPEG-1 Summary
  • Three frame types
  • I-frame: intra-coded (without prediction)
  • Like JPEG
  • P-frame: predictive-coded (with a previous frame
    as reference)
  • B-frame: bidirectionally predictive-coded
  • Group of Pictures (GOP)
  • Frame (display) order
  • I1 B B P1 B B P2 B B I2
  • Coding order (see the reordering sketch below)
  • I1 P1 B B P2 B B I2 B B
  • User-selectable parameters
  • Distance between I-frames in a GOP
  • Distance between P-frames in a GOP
  • Bit rate (constant/variable, bandwidth)
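
A minimal sketch of the display-to-coding-order reordering summarized above (my addition; it assumes at most 16 consecutive B frames): each anchor (I or P) is emitted before the B frames that precede it in display order, because those B frames use it as their future reference.

    #include <stdio.h>

    /* Convert display order ("IBBPBBPBBI") to coding order ("IPBBPBBIBB"). */
    static void display_to_coding_order(const char *display, char *coding)
    {
        char pending[16];                  /* B frames waiting for their future anchor */
        int n_pending = 0, k = 0;

        for (const char *p = display; *p; p++) {
            if (*p == 'B') {
                pending[n_pending++] = 'B';
            } else {                       /* 'I' or 'P': emit it, then the buffered Bs */
                coding[k++] = *p;
                for (int i = 0; i < n_pending; i++)
                    coding[k++] = pending[i];
                n_pending = 0;
            }
        }
        coding[k] = '\0';
    }

    int main(void)
    {
        char coding[32];
        display_to_coding_order("IBBPBBPBBI", coding);
        printf("%s\n", coding);            /* prints IPBBPBBIBB */
        return 0;
    }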