Video Processing - PowerPoint PPT Presentation

About This Presentation

Title:

Video Processing

Slides: 26

Provided by: cwy60

Transcript and Presenter's Notes

1

Video Processing
Wen-Hung Liao
6/2/2005

2
Outline

Basics of Video
Video Processing
Video coding/compression/conversion
Digital video production
Video special effects
Video content analysis
Summary

3
Basics of Video

Component video
Composite video
Digital video

4
Component Video

Higher-end video systems make use of three
separate video signals for the red, green, and
blue image planes. Each color channel is sent as
a separate video signal.
Most computer systems use Component Video, with
separate R, G, and B signals.
For any color separation scheme, Component Video
gives the best color reproduction since there is
no crosstalk between the three channels.
This is not the case for S-Video or Composite
Video, discussed next.
Component video, however, requires more bandwidth
and good synchronization of the three components.

5
Composite Video

Color (chrominance) and intensity (luminance)
signals are mixed into a single carrier wave.
Chrominance is a composition of two color
components (I and Q, or U and V).
In NTSC TV, e.g., I and Q are combined into a
chroma signal, and a color subcarrier is then
employed to put the chroma signal at the
high-frequency end of the signal shared with the
luminance signal.
The chrominance and luminance components can be
separated at the receiver end and then the two
color components can be further recovered.
When connecting to TVs or VCRs, Composite Video
uses only one wire and video color signals are
mixed, not sent separately. The audio and sync
signals are additions to this one signal.
Since color and intensity are wrapped into the
same signal, some interference between the
luminance and chrominance signals is inevitable.

6
S-Video

As a compromise, S-Video (Separated video, or
Super-video, e.g., in S-VHS) uses two wires, one
for luminance and another for a composite
chrominance signal.
As a result, there is less crosstalk between the
color information and the crucial gray-scale
information.
The reason for placing luminance into its own
part of the signal is that black-and-white
information is most crucial for visual
perception.
In fact, humans are able to differentiate spatial
resolution in gray-scale images with a much
higher acuity than for the color part of color
images.
As a result, we can send less accurate color
information than must be sent for intensity
information: we can only see fairly large blobs
of color, so it makes sense to send less color
detail.

7
Digital Video

The advantages of digital representation for
video are many.
For example:
Video can be stored on digital devices or in
memory, ready to be processed (noise removal, cut
and paste, etc.), and integrated into various
multimedia applications
Direct access is possible, which makes nonlinear
video editing achievable as a simple, rather than
a complex, task
Repeated recording does not degrade image
quality
Ease of encryption and better tolerance to
channel noise.

8
Chroma Subsampling

Since humans see color with much less spatial
resolution than they see black and white, it
makes sense to decimate the chrominance signal.
Interesting (but not necessarily informative!)
names have arisen to label the different schemes
used.
To begin with, numbers are given stating how many
pixel values, per four original pixels, are
actually sent.
The chroma subsampling scheme 4:4:4 indicates
that no chroma subsampling is used: each pixel's
Y, Cb and Cr values are transmitted, 4 for each
of Y, Cb, Cr.

9
Chroma Subsampling (2)

The scheme 4:2:2 indicates horizontal subsampling
of the Cb, Cr signals by a factor of 2. That is,
of four pixels horizontally labeled as 0 to 3,
all four Ys are sent, and every two Cb's and two
Cr's are sent, as (Cb0, Y0)(Cr0, Y1)(Cb2, Y2)
(Cr2, Y3)(Cb4, Y4), and so on (or averaging is
used).
The scheme 4:1:1 subsamples horizontally by a
factor of 4.
The scheme 4:2:0 subsamples in both the
horizontal and vertical dimensions by a factor of
2.
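
The schemes above (conventionally written 4:4:4, 4:2:2, 4:1:1 and 4:2:0) can be illustrated with a minimal sketch. It assumes a chroma plane stored as a list of rows and simply drops samples; the function name and the sample values are hypothetical, and a real encoder may average neighboring samples instead, as the slide notes.

```python
def subsample_chroma(plane, scheme):
    """Decimate one chroma plane (a list of rows) by dropping samples."""
    # (horizontal step, vertical step) for each scheme
    steps = {
        "4:4:4": (1, 1),  # no subsampling
        "4:2:2": (2, 1),  # horizontal by 2
        "4:1:1": (4, 1),  # horizontal by 4
        "4:2:0": (2, 2),  # horizontal and vertical by 2
    }
    h_step, v_step = steps[scheme]
    return [row[::h_step] for row in plane[::v_step]]


# A 4x4 plane of hypothetical Cb values
cb = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]
print(subsample_chroma(cb, "4:2:2"))  # [[10, 12], [20, 22], [30, 32], [40, 42]]
print(subsample_chroma(cb, "4:2:0"))  # [[10, 12], [30, 32]]
```

The luminance plane is left untouched in every scheme; only Cb and Cr are passed through a function like this.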

10
Chroma Subsampling (3)

11
RGB/YUV Conversion

http://www.fourcc.org/index.php?http%3A//www.fourcc.org/intro.php
RGB to YUV Conversion
Y = (0.257 R) + (0.504 G) + (0.098 B) + 16
Cr = V = (0.439 R) - (0.368 G) - (0.071 B) + 128
Cb = U = -(0.148 R) - (0.291 G) + (0.439 B) + 128
YUV to RGB Conversion
B = 1.164(Y - 16) + 2.018(U - 128)
G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)
R = 1.164(Y - 16) + 1.596(V - 128)
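
A minimal sketch of the two conversions on this slide, using the coefficients listed above. It assumes 8-bit R, G, B inputs in 0..255; the function names and the rounding-and-clamping helper are illustrative choices, not part of the slides.

```python
def clamp(x):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def rgb_to_yuv(r, g, b):
    """Convert 8-bit RGB to (Y, U, V), where U = Cb and V = Cr."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    v = 0.439 * r - 0.368 * g - 0.071 * b + 128   # Cr
    u = -0.148 * r - 0.291 * g + 0.439 * b + 128  # Cb
    return clamp(y), clamp(u), clamp(v)


def yuv_to_rgb(y, u, v):
    """Convert (Y, U, V) back to 8-bit RGB."""
    r = 1.164 * (y - 16) + 1.596 * (v - 128)
    g = 1.164 * (y - 16) - 0.813 * (v - 128) - 0.391 * (u - 128)
    b = 1.164 * (y - 16) + 2.018 * (u - 128)
    return clamp(r), clamp(g), clamp(b)


print(rgb_to_yuv(255, 255, 255))  # (235, 128, 128)
print(rgb_to_yuv(0, 0, 0))        # (16, 128, 128)
print(yuv_to_rgb(235, 128, 128))  # (255, 255, 255)
```

With these coefficients, black maps to Y = 16 and white to Y = 235, so Y does not use the full 0..255 range. Because of rounding, a round trip may be off by a level or two for saturated colors.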

12
Video Coding Standards

MPEG standards (1, 2, 4, 7, 21)
MPEG-1: VCD
MPEG-2: DVD
MPEG-4: video objects
MPEG-7: multimedia database
MPEG-21: framework
H.26x series (H.261, H.263, H.264): video
conferencing

13
Digital Video Production

Tools: Adobe Premiere, After Effects, ...
Resources: http://www.cc.gatech.edu/dvfx/resources.htm
Examples: http://www.cc.gatech.edu/dvfx/videos/dvfx2005.html

14
Video Special Effects

Examples:
EffecTV: http://effectv.sourceforge.net/
FreeFrame: http://freeframe.sourceforge.net/gallery.html

15
Types of Special Effects

Applying to the whole image frame
Applying to part of the image (edges, moving
pixels, ...)
Applying to a collection of frames
Applying to detected areas
Overlaying virtual objects
at pre-determined locations
in response to the user's position

16
Video Content Analysis

Event detection
For indexing/searching
To obtain high-level semantic description of the
content.

17
Image Databases

Problem: accessing and searching large databases
of images, videos and music
Traditional solutions: file IDs, keywords,
associated text.
Problems
can't query based on visual or musical properties
depends on the particular vocabulary used
doesn't provide queries by example
time consuming
Solution: content-based retrieval using automatic
analysis tools (see http://wwwqbic.almaden.ibm.com)

18
Retrieval of images by similarity

Components
Extraction of features or image signatures and
efficient representation and storage
A set of similarity measures
A user interface for efficient and ordered
representation of retrieved images and to
support relevance feedback
Considerations
Many definitions of similarity are possible
User interface plays a crucial role
Visual content-based retrieval is best utilized
when combined with traditional search

19
Image features for similarity definition

Color similarity
Similarity measure: e.g., distance between color
histograms
Should use perceptually meaningful color spaces
(HSV, Lab...)
Should be relatively independent of illumination
(color constancy)
Locality: "find a red object such as this one"
Texture similarity
Texture feature extraction (statistical models)
Texture qualities: directionality, roughness,
granularity...
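
The histogram-distance idea for color similarity can be sketched as follows. This is an illustration under stated assumptions: pixels are non-empty lists of 8-bit (r, g, b) tuples, the histogram is built directly in RGB for brevity (the slide recommends a perceptually meaningful space such as HSV or Lab), and the L1 distance is one of many possible measures. All names are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint histogram of 8-bit (r, g, b) pixels."""
    width = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Flatten the three per-channel bin numbers into one index
        idx = ((r // width) * bins_per_channel + g // width) * bins_per_channel + b // width
        hist[idx] += 1.0
    total = len(pixels)
    return [count / total for count in hist]


def histogram_distance(h1, h2):
    """L1 distance between normalized histograms: 0 = identical, 2 = disjoint."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


mostly_red = [(250, 10, 10)] * 3 + [(10, 10, 250)]
all_blue = [(10, 10, 250)] * 4
d = histogram_distance(color_histogram(mostly_red), color_histogram(all_blue))
print(d)  # 1.5
```

A global histogram like this ignores where colors occur, which is why the slide lists locality as a separate concern.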

20
Shape Similarity

Must distinguish between similarity between
actual geometrical 2-D shapes in the image and
underlying 3-D shape
Shape features: circularity, eccentricity,
principal axis orientation...
Spatial similarity
Assumes images have been (automatically or
manually) segmented into meaningful objects
(symbolic image)
Considers the spatial layout of the objects in
the scene
Object presence analysis
Is this particular object in the image?
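
Two of the shape features named above can be sketched with commonly used definitions. The slides do not give formulas, so these are assumptions: circularity is taken as 4*pi*area/perimeter^2 (equal to 1 for a perfect circle), and principal axis orientation is computed from second-order central moments of the object's pixel coordinates. Function names are hypothetical.

```python
import math


def circularity(area, perimeter):
    """4*pi*A / P^2: 1.0 for a circle, smaller for less compact shapes."""
    return 4.0 * math.pi * area / perimeter ** 2


def principal_axis_orientation(points):
    """Angle (radians) of the principal axis of a set of (x, y) points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Second-order central moments
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


unit_circle = circularity(math.pi, 2.0 * math.pi)  # radius 1
unit_square = circularity(1.0, 4.0)                # side 1
diagonal = principal_axis_orientation([(0, 0), (1, 1), (2, 2)])
print(unit_circle, unit_square, math.degrees(diagonal))
```

The unit square scores pi/4 (about 0.785), and points lying along the diagonal give an orientation of 45 degrees.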

21
Main components of retrieval system

Database population: images and videos are
processed to extract features (color, texture,
shape, camera and object motion)
Database query: the user composes a query via a
graphical user interface. Features are generated
from the graphical query and input to the
matching engine
Relevance feedback: automatically adjusts the
existing query using information fed back by the
user about the relevance of previously retrieved
objects
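
One way to picture relevance feedback is a Rocchio-style update, sketched below. The slides do not name a method, so this is a swapped-in textbook technique, not necessarily what any particular system uses. It assumes queries and retrieved objects are feature vectors of equal length; the function name and the weights alpha, beta and gamma are illustrative.

```python
def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query vector toward relevant results and away from others."""

    def mean(vectors):
        # Component-wise mean; a zero vector if no feedback was given
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel_mean = mean(relevant)
    non_mean = mean(non_relevant)
    return [
        alpha * q + beta * r - gamma * n
        for q, r, n in zip(query, rel_mean, non_mean)
    ]


new_query = rocchio_update(
    query=[1.0, 0.0],
    relevant=[[0.0, 1.0], [0.0, 1.0]],
    non_relevant=[[1.0, 0.0]],
)
print(new_query)  # [0.75, 0.75]
```

The adjusted query is then sent to the matching engine in place of the original one.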

22
Video parsing and representation

Interaction with video using conventional
VCR-like manipulation is difficult - need to
introduce structural video analysis
Video parsing
Temporal segmentation into elemental units
Compact representation of elemental unit

23
Temporal segmentation

Fundamental unit of video manipulation: video
shots
Types of transition between shots
Abrupt shot change
Fades: slow change in brightness
Dissolve
Wipe: pixels from the second shot replace those of
the previous shot in regular patterns
Other factors of image change
Motion, including camera motion and object motion
Luminosity changes and noise
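
A minimal sketch of detecting abrupt shot changes. The slides do not prescribe a detector; this assumes the common approach of thresholding the histogram difference between consecutive frames. Each frame is a flat, non-empty list of 8-bit gray values, and the function names, bin count and threshold are hypothetical.

```python
def gray_histogram(frame, bins=8):
    """Normalized histogram of a frame given as a flat list of gray values."""
    width = 256 // bins
    hist = [0.0] * bins
    for value in frame:
        hist[value // width] += 1.0
    return [count / len(frame) for count in hist]


def detect_cuts(frames, threshold=0.5):
    """Return the indices i at which frame i starts a new shot."""
    cuts = []
    previous = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        current = gray_histogram(frames[i])
        # L1 distance between consecutive frame histograms
        difference = sum(abs(a - b) for a, b in zip(previous, current))
        if difference > threshold:
            cuts.append(i)
        previous = current
    return cuts


dark = [20] * 16
slightly_brighter = [25] * 16
bright = [220] * 16
print(detect_cuts([dark, slightly_brighter, bright, bright]))  # [2]
```

A single threshold like this only catches abrupt changes. Fades, dissolves and wipes spread the change over many frames, and motion or luminosity changes can trigger false detections, so practical detectors need more than this.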

24
Representation of Video

Video database population has three major
components
Shot detection
Representative frame creation for each shot
Derivation of layered representation of
coherently moving structures/objects
A representative frame (R-frame) is used for
population: the R-frame is treated as a still
image for representation
query: R-frames are the basic units initially
returned in a video query
Choice of R-frame
first, middle, or last frame in the video shot
sprite built by seamlessly mosaicing all frames
in a shot
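
The first/middle/last choice of R-frame can be sketched as below. It assumes shot boundaries are already known as a list of cut positions (the frame index where each new shot starts); the function names and frame numbers are hypothetical. The sprite option is not covered, since it requires image mosaicing.

```python
def shots_from_cuts(cuts, frame_count):
    """Turn cut positions into (first_frame, last_frame) pairs, inclusive."""
    starts = [0] + list(cuts)
    ends = [c - 1 for c in cuts] + [frame_count - 1]
    return list(zip(starts, ends))


def choose_r_frame(first, last, policy="middle"):
    """Pick the index of the representative frame for one shot."""
    if policy == "first":
        return first
    if policy == "last":
        return last
    if policy == "middle":
        return (first + last) // 2
    raise ValueError("unknown policy: " + policy)


shots = shots_from_cuts([40, 100], frame_count=150)
print(shots)                                    # [(0, 39), (40, 99), (100, 149)]
print([choose_r_frame(a, b) for a, b in shots])  # [19, 69, 124]
```

Each selected frame can then be indexed like any still image in the database.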

25
Video soundtrack analysis

Image/sound relationships are critical to the
perception and understanding of video content.
Possibilities
Speech, music and Foley sound detection and
representation
Speaker identification and retrieval
Word spotting and labeling (speech recognition)
A possible query could be: "find the next time
this speaker is present in this soundtrack"
Video scene analysis
500-1000 shots per hour in typical movies
One level above the shot: the sequence or scene
(a series of consecutive shots constituting a
unit from the narrative point of view)