1
Digital Video Processing
Digital Image Processing Fall 2008
Prof. Dmitry Goldgof
  • Vasant Manohar
  • Computer Science and Engineering
  • University of South Florida
  • http://www.csee.usf.edu/~vmanohar
  • vmanohar@cse.usf.edu

2
Outline
  • Basics of Video
  • Digital Video
  • MPEG
  • Summary

3
Basics of Video
  • Static scene capture → Image
  • Bring in motion → Video
  • Image sequence: a 3-D signal
  • 2 spatial dimensions + 1 time dimension
  • Continuous I(x, y, t) → discrete I(m, n, t_k)
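To make the discrete form concrete, here is a minimal C sketch that stores a short grayscale sequence as a 3-D array indexed by (m, n, t_k); the dimensions and the ramp fill are made-up example values.

    #include <stdio.h>

    #define WIDTH  4   /* spatial dimension m (columns)   */
    #define HEIGHT 3   /* spatial dimension n (rows)      */
    #define FRAMES 2   /* temporal dimension t_k (frames) */

    /* I[t][n][m]: one 8-bit luminance sample per (m, n, t_k) */
    static unsigned char I[FRAMES][HEIGHT][WIDTH];

    int main(void)
    {
        /* fill the sequence with a synthetic ramp so the example runs */
        for (int t = 0; t < FRAMES; t++)
            for (int n = 0; n < HEIGHT; n++)
                for (int m = 0; m < WIDTH; m++)
                    I[t][n][m] = (unsigned char)(10 * t + 4 * n + m);

        /* read one sample of the discrete signal I(m, n, t_k) */
        printf("I(2, 1, t_1) = %d\n", I[1][1][2]);
        return 0;
    }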

4
Video Camera
  • Frame-by-frame capturing
  • CCD sensors (Charge-Coupled Devices)
  • 2-D array of solid-state sensors
  • Each sensor corresponds to a pixel
  • Stored in a buffer and sequentially read out
  • Widely used

5
Video Display
  • CRT (Cathode Ray Tube)
  • Large dynamic range
  • Bulky for large displays: CRT physical depth has to be proportional to screen width
  • LCD (flat-panel display)
  • Uses an electric field to change the optical properties, and thereby the brightness/color, of the liquid crystal
  • The field is generated by an array of transistors (active-matrix thin-film transistors)

An active-matrix TFT display has a transistor at each pixel, allowing the display to be switched more frequently and with less current to control pixel luminance. A passive-matrix LCD has a grid of conductors with pixels located at the grid intersections.
6
Composite vs. Component Video
  • Component video
  • Three separate signals for tri-stimulus color representation or luminance-chrominance representation
  • Pro: higher quality
  • Con: needs high bandwidth and synchronization
  • Composite video
  • Multiplexed into a single signal
  • Historical reason: transmitting color TV through a monochrome channel
  • Pro: saves bandwidth
  • Con: cross-talk
  • S-video
  • Luminance signal + a single multiplexed chrominance signal

7
Progressive vs. Interlaced Videos
  • Progressive
  • Every pixel on the screen is refreshed in order
    (monitors) or simultaneously (films)
  • Interlaced
  • Refreshed twice every frame: the electron gun at the back of your CRT lights the phosphors on the even-numbered rows of pixels first, and then the odd-numbered rows
  • An NTSC frame rate of 29.97 means the screen is redrawn 59.94 times a second
  • In other words, 59.94 half-frames per second, or 59.94 fields per second

8
Progressive vs. Interlaced Videos
  • How interlaced video can cause problems
  • Suppose you resize a 720 x 480 interlaced video to 576 x 384 (a 20% reduction)
  • How does resizing work?
  • It takes a sample of the pixels from the original source and blends them together to create the new pixels
  • In the case of interlaced video, you might end up blending scan lines from two completely different images! (see the field-splitting sketch below)
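One common way to avoid this, sketched below under simplifying assumptions (grayscale frames, even frame height), is to separate the frame into its two fields before resizing so that scan lines captured at different instants are never blended. This only illustrates the field-splitting step, not a full deinterlacer or resizer.

    #include <string.h>

    /* Split an interlaced frame (height rows of `width` bytes) into its
     * even-row field and odd-row field.  Each field buffer must hold
     * (height / 2) * width bytes.  Grayscale is assumed for simplicity. */
    void split_fields(const unsigned char *frame, int width, int height,
                      unsigned char *even_field, unsigned char *odd_field)
    {
        for (int row = 0; row < height; row++) {
            unsigned char *dst = (row % 2 == 0) ? even_field : odd_field;
            memcpy(dst + (row / 2) * width, frame + row * width, width);
        }
    }
    /* Each field can then be resized on its own (or the video deinterlaced)
     * without mixing scan lines captured at two different instants. */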

9
Progressive vs. Interlaced Videos
Observe distinct scan lines
Image in full 720 x 480 resolution
10
Progressive vs. Interlaced Videos
Image after being resized to 576x384
Some scan lines blended together!
11
Aspect Ratio
  • When you view pure NTSC video on your monitor, people look a little fatter than normal
  • TV video is stored in a 3:2 aspect ratio (e.g., 720 x 480), while monitors store picture data in a 4:3 aspect ratio
  • A lot of capture cards crop off the 16 pixels at the horizontal edges and capture at 704 x 480 or 352 x 480
  • Aspect ratios in movies
  • 5:3 → mostly used in animation movies
  • 16:9 → academy ratio
  • 21:9 → CinemaScope

12
Aspect Ratio
  • Converting widescreen pictures to the 4:3 TV format
  • letterbox format (black bars above and below the picture)
  • losing parts of the picture
  • If we convert a 21:9 picture, we might lose a large part of the picture (blue: 16:9, red: 4:3)

13
  • DIGITAL VIDEO

14
Why Digital?
  • Exactness
  • Exact reproduction without degradation
  • Accurate duplication of processing result
  • Convenient and powerful computer-aided processing
  • Can perform rather sophisticated processing through hardware or software
  • Easy storage and transmission
  • 1 DVD can store a three-hour movie !!!
  • Transmission of high-quality video through a network in reasonable time

15
Digital Video Coding
  • The basic idea is to remove redundancy in video
    and encode it
  • Perceptual redundancy
  • The Human Visual System is less sensitive to
    color and high frequencies
  • Spatial redundancy
  • Pixels in a neighborhood have close luminance
    levels
  • Low frequency
  • How about temporal redundancy?
  • Differences between subsequent frames are typically small. Shouldn't we exploit this? (see the frame-difference sketch below)
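As a minimal illustration of temporal redundancy, the C sketch below computes the difference between two consecutive grayscale frames; static regions give residuals near zero, which are cheap to encode. The buffer layout is an assumption of the example.

    /* Difference between two consecutive grayscale frames of `count` pixels.
     * Static regions give residuals near zero, which compress very well. */
    void frame_difference(const unsigned char *prev, const unsigned char *curr,
                          int count, int *residual)
    {
        for (int i = 0; i < count; i++)
            residual[i] = (int)curr[i] - (int)prev[i];
    }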

16
Hybrid Video Coding
  • Hybrid = a combination of spatial, perceptual, and temporal redundancy removal
  • Issues to be handled
  • Not all regions are easily inferable from the previous frame
  • Occlusion: solved by backward prediction, using future frames as reference
  • The decision of whether to use prediction or not is made adaptively
  • Drifting and error propagation
  • Solved by encoding reference regions or frames at constant intervals of time
  • Random access
  • Solved by encoding frames without prediction at constant intervals of time
  • Bit allocation
  • according to statistics
  • constant and variable bit-rate requirements

MPEG combines all of these features !!!
17
MPEG
  • MPEG: Moving Picture Experts Group
  • Coding of moving pictures and associated audio
  • Picture part
  • Can achieve compression ratios of about 50:1 by storing only the difference between successive frames
  • Even higher compression ratios are possible
  • Audio part
  • Compression of audio data at ratios ranging from 5:1 to 10:1
  • MP3 = MPEG-1 Audio Layer 3

18
MPEG Generations
19
Bit Rate
  • Defined in two ways
  • bits per second (all inter-frame compression
    algorithms)
  • bits per frame (most intra-frame compression
    algorithms except DV and MJPEG)
  • What does this mean?
  • If you encode something in MPEG and specify it to be 1.5 Mbps, it doesn't matter what the frame rate is; it takes the same amount of space → a lower frame rate will look sharper but less smooth
  • If you do the same with a codec like Huffyuv or Intel Indeo, you will get the same image quality with all of them, but the smoothness and file sizes will change as the frame rate changes
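A small sketch of the arithmetic behind this: with a bits-per-second (CBR) definition the encoded size depends only on duration, while with a bits-per-frame definition it scales with the frame rate. All the numbers below (1.5 Mbps, 120 s, 30 fps, 200 kbits per frame) are made-up example values.

    #include <stdio.h>

    int main(void)
    {
        double bitrate_bps    = 1.5e6;  /* example: 1.5 Mbps CBR stream   */
        double duration_s     = 120.0;  /* example: two-minute clip       */
        double frame_rate     = 30.0;   /* example frame rate             */
        double bits_per_frame = 200e3;  /* example intra-frame codec cost */

        /* bits-per-second definition: size is independent of frame rate */
        double cbr_bytes = bitrate_bps * duration_s / 8.0;

        /* bits-per-frame definition: size grows with the frame rate */
        double per_frame_bytes = bits_per_frame * frame_rate * duration_s / 8.0;

        printf("CBR stream:      %.1f MB\n", cbr_bytes / 1e6);
        printf("Per-frame codec: %.1f MB\n", per_frame_bytes / 1e6);
        return 0;
    }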

20
Data Hierarchy
  • Sequence: the entire video sequence
  • Group of Pictures: basic unit allowing for random access
  • Picture: primary coding unit, with three color components, different picture formats, and progressive or interlaced scanning modes
  • Slice (or Group of Blocks): basic unit for resynchronization, refresh, and error recovery (skipped if erroneous)
  • Macroblock: motion compensation unit
  • Block: transform and compression unit (see the struct sketch below)
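The layering can be pictured as nested containers. The C struct sketch below is only a mnemonic for this hierarchy; the field names and counts are illustrative and do not reflect the actual MPEG-1 bitstream syntax.

    /* Illustrative nesting only -- not the actual MPEG-1 bitstream syntax. */
    typedef struct { short coeff[8][8]; } Block;        /* transform unit    */
    typedef struct { Block blocks[6];                   /* 4 Y + Cb + Cr     */
                     int mv_x, mv_y; } Macroblock;      /* motion comp. unit */
    typedef struct { Macroblock *mbs; int n_mbs; } Slice;   /* resync unit   */
    typedef struct { Slice *slices; int n_slices;
                     char type; } Picture;              /* 'I', 'P' or 'B'   */
    typedef struct { Picture *pictures; int n_pictures; } GroupOfPictures;
    typedef struct { GroupOfPictures *gops; int n_gops; } Sequence;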

21
MPEG-1 Compression Aspects
  • Lossless and lossy compression are both used to achieve a high compression rate
  • Down-sampled chrominance
  • Perceptual redundancy
  • Intra-frame compression
  • Spatial redundancy
  • Correlation/compression within a frame
  • Based on baseline JPEG compression standard
  • Inter-frame compression
  • Temporal redundancy
  • Correlation/compression between like frames
  • Audio compression
  • Three different layers (MP3)

22
Perceptual Redundancy
  • Here is an image represented with 8-bits per pixel

23
Perceptual Redundancy
  • The same image at 7-bits per pixel

24
Perceptual Redundancy
  • At 6-bits per pixel

25
Perceptual Redundancy
  • At 5-bits per pixel

26
Perceptual Redundancy
  • At 4-bits per pixel

27
Perceptual Redundancy
  • It is clear that we don't need all these bits!
  • Our previous example illustrated the eye's sensitivity to luminance
  • We can build a perceptual model
  • Give more importance to what is perceivable by the Human Visual System
  • Usually this is a function of the spatial frequency
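The bit-depth reduction used in the example images above amounts to dropping low-order bits of each 8-bit pixel. A minimal C sketch of that requantization follows; it only illustrates the effect, not the perceptual model MPEG actually uses.

    /* Requantize an 8-bit pixel to `bits` bits per pixel (1..8) and shift the
     * quantized level back so it spans the original range for display. */
    unsigned char reduce_bit_depth(unsigned char pixel, int bits)
    {
        int shift = 8 - bits;                    /* low-order bits to discard */
        int level = pixel >> shift;              /* quantized level           */
        return (unsigned char)(level << shift);  /* back to displayable range */
    }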

28
Video Coloring Scheme
  • Translate the RGB system into a YUV system
  • Human perception is less sensitive to chrominance
    than to brightness
  • Separate brightness from chrominance; the chrominance resolution then does not have to be as good → lower necessary bit rate

Coloring scheme:
  JPEG coloring blocks: Luminance (Y), Cr, Cb, Cg
  Normal: Red, Green, Blue
Translation formulas:
  Y  = Wr·R + Wb·B + Wg·G
  Cr = Wr (R - Y)
  Cb = Wb (B - Y)
  Cg = Wg (G - Y)
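A small C sketch of the translation formulas above. The luminance weights are the commonly used ITU-R BT.601 values (Wr = 0.299, Wg = 0.587, Wb = 0.114); note that practical YCbCr definitions also apply additional scaling and offsets to the chrominance terms, which this sketch (following the slide's form) omits.

    /* Slide formulas: Y = Wr*R + Wb*B + Wg*G, Cr = Wr*(R - Y), Cb = Wb*(B - Y).
     * Weights below are the common BT.601 luminance weights; real YCbCr
     * standards additionally scale and offset the chrominance components. */
    void rgb_to_ycbcr(double R, double G, double B,
                      double *Y, double *Cb, double *Cr)
    {
        const double Wr = 0.299, Wg = 0.587, Wb = 0.114;

        *Y  = Wr * R + Wg * G + Wb * B;   /* brightness                    */
        *Cr = Wr * (R - *Y);              /* red chrominance (slide form)  */
        *Cb = Wb * (B - *Y);              /* blue chrominance (slide form) */
    }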
29
Video Coloring Scheme
  • Chrominance means the difference between a color and a reference color of the same brightness
  • A macroblock is composed of six blocks (4:2:0 or 4:1:1 format)
  • Four blocks of Y (luminance)
  • One block of Cb (blue chrominance)
  • One block of Cr (red chrominance)
  • Down-sampled chrominance
  • Y Cb Cr coordinates and four sub-sampling formats

Ref: Y. Wang, J. Ostermann, Y.-Q. Zhang, Digital Video Processing and Communications, Prentice-Hall, 2001
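A minimal sketch of the chrominance down-sampling step for a 4:2:0-style format: each 2x2 neighbourhood of a chroma plane is averaged into one sample, so a 16x16 macroblock is left with four 8x8 luminance blocks plus one 8x8 block each of Cb and Cr. Separate planes and even dimensions are assumed.

    /* 4:2:0-style chroma down-sampling: average each 2x2 block of a chroma
     * plane (width x height, both even) into one sample of the half-size plane. */
    void downsample_chroma_420(const unsigned char *chroma, int width, int height,
                               unsigned char *out /* (width/2) x (height/2) */)
    {
        for (int y = 0; y < height; y += 2) {
            for (int x = 0; x < width; x += 2) {
                int sum = chroma[y * width + x]       + chroma[y * width + x + 1]
                        + chroma[(y + 1) * width + x] + chroma[(y + 1) * width + x + 1];
                out[(y / 2) * (width / 2) + (x / 2)] = (unsigned char)(sum / 4);
            }
        }
    }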
30
Intra-frame Compression
  • Intra-frame Coding
  • Reduces spatial redundancy to reduce necessary
    transmission rate
  • Encoding of I-blocks is practically identical to the JPEG standard
  • Makes use of the DCT transform along with zigzag
    ordering
  • Lossy data compression

31
Fundamentals of JPEG
Encoder: DCT → Quantizer → Entropy coder → Compressed image data
Decoder: Compressed image data → Entropy decoder → Dequantizer → IDCT
32
Fundamentals of JPEG
  • JPEG works on 8x8 blocks
  • Extract an 8x8 block of pixels
  • Convert to the DCT domain
  • Quantize each coefficient
  • Different step size for each coefficient
  • Based on the sensitivity of the human visual system
  • Order coefficients in zig-zag order
  • Similar frequencies are grouped together
  • Run-length encode the quantized values and then use Huffman coding on what is left (see the sketch below)
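A sketch of the quantization and zig-zag steps for one 8x8 block of DCT coefficients (the DCT itself and the run-length/Huffman stages are omitted). The zig-zag table is the standard JPEG scan order; the quantization table would normally be supplied by the codec.

    /* Quantize an 8x8 block of DCT coefficients and read it out in zig-zag
     * order (low frequencies first), as done in baseline JPEG. */
    static const int zigzag[64] = {        /* standard JPEG zig-zag scan */
         0,  1,  8, 16,  9,  2,  3, 10,
        17, 24, 32, 25, 18, 11,  4,  5,
        12, 19, 26, 33, 40, 48, 41, 34,
        27, 20, 13,  6,  7, 14, 21, 28,
        35, 42, 49, 56, 57, 50, 43, 36,
        29, 22, 15, 23, 30, 37, 44, 51,
        58, 59, 52, 45, 38, 31, 39, 46,
        53, 60, 61, 54, 47, 55, 62, 63
    };

    void quantize_and_scan(const double dct[64], const int qtable[64], int out[64])
    {
        int quant[64];
        for (int i = 0; i < 64; i++)       /* one step size per coefficient */
            quant[i] = (int)(dct[i] / qtable[i] + (dct[i] >= 0 ? 0.5 : -0.5));

        for (int i = 0; i < 64; i++)       /* group similar frequencies */
            out[i] = quant[zigzag[i]];
    }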

33
Random Access and Inter-frame Compression
  • Temporal Redundancy
  • Only perform repeated encoding of the parts of a
    picture frame that are rapidly changing
  • Do not repeatedly encode background elements and
    still elements
  • Random access capability
  • Prediction that does not depend upon the user
    accessing the first frame (skipping through movie
    scenes, arbitrary point pick-up)

34
3-D Motion → 2-D Motion
3-D MV
2-D MV
35
Sample (2D) Motion Field
Anchor Frame
Target Frame
Motion Field
36
2-D Motion Corresponding to Camera Motion
Camera zoom
Camera rotation around Z-axis (roll)
37
General Considerations for Motion Estimation
  • Two categories of approaches
  • Feature based (more often used in object
    tracking, 3D reconstruction from 2D)
  • Intensity based (based on constant intensity
    assumption) (more often used for motion
    compensated prediction, required in video coding,
    frame interpolation)
  • Three important questions
  • How to represent the motion field?
  • What criteria to use to estimate motion
    parameters?
  • How to search motion parameters?

38
Motion Representation
Pixel-based: one MV at each pixel, with some smoothness constraint between adjacent MVs.
Global: the entire motion field is represented by a few global parameters.
Block-based: the entire frame is divided into blocks, and motion in each block is characterized by a few parameters. Also mesh-based (flow of corners, approximated inside).
Region-based: the entire frame is divided into regions, each region corresponding to an object or sub-object with consistent motion, represented by a few parameters.
39
Examples
target frame
anchor frame
Predicted target frame
Motion field
Half-pel Exhaustive Block Matching Algorithm
(EBMA)
40
Examples
Predicted target frame
Three-level Hierarchical Block Matching Algorithm
41
Examples
EBMA
mesh-based method
EBMA vs. Mesh-based Motion Estimation
42
Motion Compensated Prediction
  • Divide the current frame, i, into disjoint 16x16 macroblocks
  • Search a window in the previous frame, i-1, for the closest match
  • Calculate the prediction error
  • For each of the four 8x8 blocks in the macroblock, perform DCT-based coding
  • Transmit the motion vector + entropy-coded prediction error (lossy coding)
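A sketch of the matching step under simplifying assumptions: full-pel exhaustive search (EBMA) over a +/- range window using the sum of absolute differences as the criterion, for one 16x16 macroblock of a grayscale frame. Real encoders add half-pel refinement, frame-border handling, and rate considerations.

    #include <limits.h>
    #include <stdlib.h>

    #define MB 16   /* macroblock size */

    /* Sum of absolute differences between the macroblock at (bx, by) in `cur`
     * and the candidate block at (bx + dx, by + dy) in `ref`. */
    static long sad(const unsigned char *cur, const unsigned char *ref,
                    int width, int bx, int by, int dx, int dy)
    {
        long s = 0;
        for (int y = 0; y < MB; y++)
            for (int x = 0; x < MB; x++)
                s += labs((long)cur[(by + y) * width + bx + x] -
                          (long)ref[(by + dy + y) * width + bx + dx + x]);
        return s;
    }

    /* Full-pel exhaustive block matching: search a +/- `range` window in the
     * reference (previous) frame for the best match of one macroblock.
     * Assumes the search window stays inside the frame. */
    void ebma(const unsigned char *cur, const unsigned char *ref,
              int width, int bx, int by, int range, int *mvx, int *mvy)
    {
        long best = LONG_MAX;
        for (int dy = -range; dy <= range; dy++) {
            for (int dx = -range; dx <= range; dx++) {
                long cost = sad(cur, ref, width, bx, by, dx, dy);
                if (cost < best) { best = cost; *mvx = dx; *mvy = dy; }
            }
        }
    }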

43
Decoding with non-random access
  • To decode and play sub-frames located in section G, all frames before section G must be decoded as well
  • Synchronization algorithm issues
  • If section G is far along in the movie, this could take a considerable amount of time

44
Decoding with random access
Introduce I frames: frames that are NOT predictively encoded, by design. Frames that are still encoded using a prediction algorithm are called P frames.
  • When decoding any frame after an I frame (frame G in this example)
  • we only have to decode past frames until we reach an I-frame
  • saves time when skipping from frame to frame
  • I-frames are not predictively encoded
  • reduction in compression ratio
  • Depending on the concentration of I frames, there is a tradeoff
  • More I frames → faster random access time
  • Fewer I frames → better compression ratio

45
MPEG-1 Video Coding
  • Most MPEG-1 implementations use a large number of I frames to ensure fast access
  • Somewhat low compression ratio by itself
  • For predictive coding, P frames depend on only a small number of past frames
  • Using fewer past frames reduces error propagation
  • To further enhance compression in an MPEG-1 file, introduce a third frame type called the B frame → bi-directional frame
  • B frames are encoded using predictive coding from only two other frames: a past frame and a future frame
  • Looking at both the past and the future helps reduce the prediction error due to rapid changes from frame to frame (e.g., a fight scene or fast-action scene)

46
Predictive coding hierarchy: I, P and B frames
  • I frames (black) do not depend on any other frame
    and are encoded separately
  • Called Anchor frame
  • P frames (red) depend on the last P frame or I
    frame (whichever is closer)
  • Also called Anchor frame
  • B frames (blue) depend on two frames: the closest past P or I frame, and the closest future P or I frame
  • B frames are NOT used to predict other B frames,
    only P frames and I frames are used for
    predicting other frames

47
MPEG-1 Temporal Order of Compression
  • I frames are generated and compressed first
  • Have no frame dependence
  • P frames are generated and compressed second
  • Only depend upon the past I frame values
  • B frames are generated and compressed last
  • Depend on surrounding frames
  • Forward prediction needed

48
Adaptive Predictive Coding in MPEG-1
  • Coding each block in a P-frame
  • Predictive block: using the previous I/P frame as reference
  • Intra-block: encode without prediction
  • use this if prediction costs more bits than non-prediction
  • good for occluded areas
  • can also avoid error propagation
  • Coding each block in a B-frame
  • Intra-block: encode without prediction
  • Predictive block:
  • use the previous I/P frame as reference (forward prediction)
  • or use a future I/P frame as reference (backward prediction)
  • or use both for prediction

49
Codec Adjustments
  • For smoothing out the bit rate
  • A few applications prefer an approximately constant bit-rate video stream (CBR)
  • e.g., prescribe the number of bits per second
  • very-short-term bit-rate variations can be smoothed by a buffer
  • variations cannot be too large over the longer term, else the buffer overflows
  • For reducing the bit rate by exploiting Human Visual System (HVS) temporal properties
  • Noise/distortion in a video frame is not very visible when there is a sharp temporal transition (scene change)
  • can compress a few frames right after a scene change with fewer bits
  • Changing the frame types
  • I I I I I I → lowest compression ratio (like MJPEG)
  • I P P P I P P → moderate compression ratio
  • I B B P B B P B B I → highest compression ratio

50
MPEG Library
  • The MPEG Library is a C library for decoding MPEG-1 video streams and dithering them to a variety of color schemes.
  • Most of the code in the library comes directly from an old version of the Berkeley MPEG player (mpeg_play)
  • The Library can be downloaded from
  • http://starship.python.net/gward/mpeglib/mpeg_lib-1.3.1.tar.gz
  • It works well on all modern Unix and Unix-like platforms with an ANSI C compiler. I have tested it on grad.
  • NOTE - This is not the best library available. But it works well for MPEG-1 and it is fairly easy to use. If you are inquisitive, you should check the MPEG Software Simulation Group at http://www.mpeg.org/MPEG/MSSG/, where you can find a free MPEG-2 video coder/decoder.

51
MPEG Library
  • Using the Library is very similar to the way in which files are normally handled
  • Open an MPEG stream and initialize internal data structures
  • Read frames until the stream is done
  • If need be, you can rewind the stream and start over
  • When done, you close the stream and clean up (see the sketch below)
  • NOTE: You cannot randomly access the stream. This is a limitation of the Library because of the nature of the decoding engine on which the Library is built.
  • NOTE: The Berkeley decoding engine depends heavily on global variables and hence cannot decode more than one MPEG at a time.
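The open/read/rewind/close pattern above, written out as a C sketch. The function and type names (OpenMPEG, GetMPEGFrame, RewindMPEG, CloseMPEG, ImageDesc) are written from memory of the mpeg_lib documentation; treat the exact signatures and struct fields as assumptions and check them against mpeg.h in the distribution.

    /* Sketch of the decode loop described above.  Function and type names are
     * taken from memory of the mpeg_lib documentation (mpeg.h) and should be
     * verified against the distribution -- treat signatures as assumptions. */
    #include <stdio.h>
    #include <stdlib.h>
    #include "mpeg.h"                     /* mpeg_lib header */

    int main(void)
    {
        ImageDesc info;                   /* filled in by OpenMPEG */
        FILE *in = fopen("test.mpg", "rb");
        if (!in || !OpenMPEG(in, &info))  /* open stream, init internals */
            return 1;

        char *frame = malloc(info.Size);  /* one dithered frame buffer */
        int n = 0;
        while (GetMPEGFrame(frame))       /* read frames until done */
            n++;
        printf("decoded %d frames (%dx%d)\n", n, info.Width, info.Height);

        /* RewindMPEG(in, &info); ... */  /* optional: start over */
        CloseMPEG();                      /* clean up; one stream at a time */
        free(frame);
        fclose(in);
        return 0;
    }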

52
MPEGe Library
  • The MPEGe(ncoding) Library is designed to allow you to create MPEG movies from your application
  • The library can be downloaded from the files section of
  • http://groups.yahoo.com/group/mpegelib/
  • The encoder library uses the Berkeley MPEG encoder engine, which handles all the complexities of MPEG streams
  • As was the case with the decoder, this library can write only one MPEG movie at a time
  • The library works well with most of the common image formats
  • To keep things simple, we will stick to PPM

53
MPEGe Library Functions
  • The library consists of 3 simple functions
  • MPEGe_open: for initializing the encoder.
  • MPEGe_image: called each time you want to add a frame to the sequence. The format of the image pointed to by the image argument is that used by the SDSC Image library
  • SDSC is a powerful library which will allow you to read/write 32 different image types and also contains functions to manipulate them. The source code as well as pre-compiled binaries can be downloaded at ftp://ftp.sdsc.edu/pub/sdsc/graphics/
  • MPEGe_close: called to end the MPEG sequence. This function resets the library to a sane state, writes the MPEG end sequence, and closes the output file

Note: All functions return non-NULL (i.e., TRUE) on success and zero (FALSE) on failure.
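A sketch of how the three calls fit together. Only the function names come from this slide; the argument lists, the options type, the header name, and the load_ppm_frame helper are assumptions made for illustration and must be checked against the MPEGe Library headers and the SDSC Image library.

    /* Sketch only: the three function names are from the slides, but their
     * argument lists and the SDSC image type are ASSUMED for illustration --
     * check the MPEGe Library header for the real signatures. */
    #include <stdio.h>
    #include "mpege.h"                    /* assumed MPEGe Library header name */

    extern ImVfb *load_ppm_frame(int i);  /* hypothetical helper that reads one
                                             PPM frame into an SDSC image */

    int encode_sequence(int n_frames)
    {
        MPEGe_options opts;               /* assumed options type */
        if (!MPEGe_open("out.mpg", &opts))        /* initialize the encoder */
            return 0;

        for (int i = 0; i < n_frames; i++) {
            ImVfb *image = load_ppm_frame(i);
            if (!MPEGe_image(image, &opts))       /* append one frame */
                return 0;
        }
        return MPEGe_close(&opts);        /* write end sequence, close file */
    }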
54
Usage Details
  • You are not required to write code using the
    libraries to decode and encode MPEG streams
  • Copy the binary executables from
  • http://www.csee.usf.edu/~vmanohar/DIP/readframes
  • http://www.csee.usf.edu/~vmanohar/DIP/encodeframes
  • Usage
  • To read frames from an MPEG movie (say test.mpg)
    and store them in a directory extractframes
    (relative to your current working directory) with
    the prefix testframe (to the filename)
  • readframes test.mpg extractframes/testframe
  • This will decode all the frames of test.mpg
    into the directory extractframes with the
    filenames testframe0.ppm, testframe1.ppm
  • To encode,
  • encodeframes 0 60 extractframes/testframe
    testresult.mpg
  • This will encode images testframe0.ppm to
    testframe60.ppm from the directory extractframes
    into testresult.mpg
  • In order to convert between PPM and PGM formats,
    copy the script from
  • http://www.csee.usf.edu/~vmanohar/DIP/batchconvert
  • Usage
  • To convert all the PPM files in the directory
    extractframes to PGM
  • batchconvert extractframes ppm pgm
  • To convert all the PGM files in the directory
    extractframes to PPM
  • batchconvert extractframes pgm ppm

55
MPEG-1 Summary
  • Three frame types
  • I-frame: intra-coded (without prediction)
  • Like JPEG
  • P-frame: predictive-coded (with the previous frame as reference)
  • B-frame: bidirectionally predictive-coded
  • Group-of-Pictures (GOP)
  • Frame (display) order:
  • I1 B B P1 B B P2 B B I2
  • Coding order:
  • I1 P1 B B P2 B B I2 B B
  • User-selectable parameters
  • Distance between I-frames in a GOP
  • Distance between P-frames in a GOP
  • Bit-rate (constant/variable, bandwidth)
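A small sketch of the reordering above: in display order, B frames precede the future anchor they depend on, so the encoder emits each anchor (I or P) before the B frames that come before it in display order. The pattern string is just the example GOP from this slide.

    #include <stdio.h>
    #include <string.h>

    /* Reorder a display-order GOP pattern into coding order: each anchor
     * frame (I or P) is emitted before the B frames that precede it in
     * display order, since B frames need both surrounding anchors. */
    int main(void)
    {
        const char display[] = "IBBPBBPBBI";   /* example from this slide */
        int n = (int)strlen(display);

        printf("display order: %s\ncoding order:  ", display);
        int pending_b = 0;                     /* B frames awaiting an anchor */
        for (int i = 0; i < n; i++) {
            if (display[i] == 'B') {
                pending_b++;
            } else {                           /* anchor: emit it, then its Bs */
                putchar(display[i]);
                for (; pending_b > 0; pending_b--)
                    putchar('B');
            }
        }
        putchar('\n');
        return 0;
    }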