Title: Image and Video Compression A presentation to Avocent
1Image and Video Compression: A presentation to Avocent
- Noel O'Connor, Andrew Kinane, Daniel Larkin
- 19/09/2006
2Overview
- Lossless Compression
- Entropy coding a brief review
- Huffman Coding
- Arithmetic Coding
- Lossless Compression Standards
- The FAX Group Standards, JBIG, Lossless JPEG
- Lossy Compression
- Generic Codec Structure
- DCT/IDCT
- Quantization
- Motion Estimation
- Motion Compensation
- Lossy Compression Standards
- JPEG, JPEG2000, H.261 / H.263 / H.264, MPEG-1/-2/-4
- Image Analysis Techniques
- Visual Feature Extraction
3Lossless Compression: Entropy Coding
4Entropy Coding
- Also referred to as source coding
- Assign each symbol a binary codeword
- Allocate a specific string of bits to a symbol
- Based on information theory
- S = {s1, ..., sN} is the set of symbols to encode, with probabilities {p1, ..., pN}
- Entropy H(s) = -sum(pi x log2(pi)) is a measure of the information content
- Specifies a lower bound on the average number of bits per symbol
5Huffman Coding
- A form of Variable Length Coding
- Assign shorter code-words to symbols most likely to occur, longer to those less likely
- Problem: must choose code-words carefully!
- Must obey the prefix condition so the decoder can parse the bitstream
[Figure: sequence s1, s4, s3, s2; bitstream 1 0 1 0 0 1 1 0 1; decoder output s1, s4, s3, s2 versus s1 followed by "s2 or s4?"]
6Huffman Coding
- Ensures instantaneously parseable code-words
- 100% efficient when p1 ... pN are negative powers of 2 (0.5, 0.25, etc.)
- Algorithm: generate Huffman coding tree
- Form the tree
- Sort the symbols by their probabilities
- Merge the two smallest probabilities by adding them and produce a new node in the tree
- Repeat until only a single node is reached
- Assign bits
- Traverse the tree from the root to the leaf nodes, assigning each branch encountered a one or a zero
- Decoding based on storing codewords in a specially constructed LUT
7Huffman Coding
- Generate code-words for each grey level
- S = {s1, s2, s3, s4, s5} = {0, 4, 5, 6, 7}
- {p1, p2, p3, p4, p5} = {0.125, 0.484, 0.25, 0.125, 0.016}
9Huffman Coding
- Efficiency
- Calculate Average Coding Rate R = sum(pi x li)
- Symbol probability (pi) x code-word length (li)
- Compare to entropy: efficiency = H(s) / R
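A minimal Python sketch of the tree-merging procedure and the efficiency calculation, using the grey-level probabilities from the earlier example. The function name `huffman_lengths` and the heap-based bookkeeping are illustrative assumptions, not the presenters' implementation; only code-word lengths are tracked, since the average rate needs nothing else.

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Return the Huffman code-word length of each symbol."""
    # Heap entries: (probability, tie-breaker, symbols below this node).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        # Merge the two least probable nodes into a new node.
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # every symbol under the new node gains one bit
        heapq.heappush(heap, (p1 + p2, next_id, syms1 + syms2))
        next_id += 1
    return lengths

probs = [0.125, 0.484, 0.25, 0.125, 0.016]
lengths = huffman_lengths(probs)
entropy = -sum(p * log2(p) for p in probs)           # H(s)
rate = sum(p * l for p, l in zip(probs, lengths))    # R
efficiency = entropy / rate
print(lengths, round(rate, 3), round(efficiency, 3))
```

With these probabilities the sketch gives code-word lengths 4, 1, 2, 3, 4 and an average rate of 1.923 bits/symbol, against an entropy of about 1.85 bits/symbol.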
10Huffman Coding
- Problems
- Lower bound of 1 bit/symbol
- Does not facilitate adaptive coding
- Example
11Arithmetic Coding
- Treat groups of symbols but maintain a symbol-by-symbol encoding mechanism
- Assign a single codeword to a group of symbols
- Codeword represents a half-open interval within [0.0, 1.0)
- By assigning enough precision bits, one interval can be distinguished from another
- Symbols with higher probabilities correspond to larger intervals, thereby requiring fewer precision bits
12Arithmetic Coding
- S = {a, b}, {p1, p2} = {1/3, 2/3}
- First symbol narrows the interval to that symbol's range
- Subsequent symbols further restrict the current interval
- Decoding reverses this
- Receives a number in [0.0, 1.0)
- Checks which symbol's range contains this number: decode that symbol
- Since the lower & upper bounds of the symbol are known, their effects on the encoded number can be reversed
- Gives a new number
- REPEAT
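The interval narrowing and its reversal can be sketched in Python with exact fractions. The assignment of a to [0, 1/3) and b to [1/3, 1), and the names `RANGES`, `encode` and `decode`, are assumptions made for illustration; a practical coder would use the integer arithmetic described on a later slide.

```python
from fractions import Fraction

# Model: S = {a, b} with probabilities {1/3, 2/3}.
# Which symbol owns which sub-range of [0, 1) is an assumed convention.
RANGES = {"a": (Fraction(0), Fraction(1, 3)),
          "b": (Fraction(1, 3), Fraction(1))}

def encode(message):
    """Narrow [low, high) once per symbol; any number inside identifies the message."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        sym_low, sym_high = RANGES[sym]
        low, high = low + width * sym_low, low + width * sym_high
    return low, high

def decode(value, count):
    """Recover `count` symbols by locating, then undoing, each symbol's range."""
    out = []
    for _ in range(count):
        for sym, (sym_low, sym_high) in RANGES.items():
            if sym_low <= value < sym_high:
                out.append(sym)
                # Reverse the effect of this symbol to obtain the next number.
                value = (value - sym_low) / (sym_high - sym_low)
                break
    return "".join(out)

low, high = encode("bab")
print(low, high)       # 11/27 5/9
print(decode(low, 3))  # bab
```

The final interval has width 4/27 = 2/3 x 1/3 x 2/3, the product of the symbol probabilities, which is why more probable messages need fewer bits to pin down.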
13Arithmetic Coding
- Incremental transmission
- Example message: BILL<space>GATES
- Digits available for transmission as encoding proceeds: 2, 25, 257, 2572, 257216, 2572167
14Arithmetic Coding
- Can be performed very efficiently using 16/32 bit integer mathematics
- Bits are transmitted as they become available
- Simplification: use the value 0.999... rather than 1.0
- In binary arithmetic this corresponds to 0.111...
- Only use fractional part => only need integers
- High initially stores 0xFFFF, whilst Low stores 0x0000
- For each symbol encoded, examine the most significant bit of both High and Low
- If these bits are the same, output that bit
15Lossless Compression: Standards
16ITU-T Facsimile
- ITU-T Rec. T.4 (Group 3)
- Targets scanned business documents
- Binary images: white (1), black (0)
- Two modes
- Modified Huffman (MH)
- Run-length encoding is used to form runs of 1s and 0s for each line in the image
- Huffman coding applied to these (run, symbol) pairs
- Different Huffman codes for runs of 1s and 0s
- A special end-of-line (EOL) symbol is encoded for error detection purposes
- Modified Read (MR)
- Pixel values from the previous line used as predictors for current pixels to be encoded
- Prediction residual is then encoded using Huffman coding
- MR mode is periodically interspersed with MH mode
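The run-length stage of MH mode can be sketched as follows. This covers only the formation of (run, symbol) pairs for one scan line; the Huffman tables and the EOL symbol of the actual Recommendation are not reproduced, and the function names are illustrative.

```python
from itertools import groupby

def run_lengths(line):
    """Turn one scan line of 0/1 pixels into (run, symbol) pairs."""
    return [(len(list(group)), symbol) for symbol, group in groupby(line)]

def expand(pairs):
    """Inverse operation: rebuild the scan line from (run, symbol) pairs."""
    return [symbol for run, symbol in pairs for _ in range(run)]

line = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1]
pairs = run_lengths(line)
print(pairs)  # [(4, 1), (2, 0), (3, 1), (1, 0), (2, 1)]
```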
17JBIG
- Joint Bi-level Image Experts Group (JBIG), developed jointly by ITU-T and ISO
- Targets bi-level images
- May be either business documents or grey-scale images of natural scenes rendered as bi-level images
- Uses adaptive arithmetic encoding
- Modeling step estimates probability of next symbol based on a context consisting of local pixels
- Probability is then used to drive the arithmetic encoder
- JBIG can be applied to grey-scale images by treating each bit plane of the grey-level image as a bi-level image
18Lossless JPEG
- Joint Photographic Experts Group (JPEG) has a lossless image compression mode
- Prediction for pixel to be encoded based on a context of previously encoded pixels
- Different ways of forming the prediction
- Method used encoded as side-information for each scan line
- To encode the prediction residual
- (length, magnitude) pair formed
- length indicates the number of bits used to encode the magnitude
- A static Huffman code is used
- magnitude is the actual residual value directly encoded
19Lossless JPEG
- p = 190
- p1 = 184, p2 = 176
- P = 180
- R = 180 - 190 = -10
- Encoded as the event (4, 0101)
- Negative residuals encoded as 1s complement
- If the Huffman code for 4 is 001, this gives the final codeword 0010101
- Decoder
- Calculates the prediction value (180)
- Parses the Huffman code, which allows decoding of the magnitude (0101)
- Detects a leading zero => knows the value must be negative, so the four magnitude bits are decoded as -10
- Reconstruction: p = P - R = 180 - (-10) = 190
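The (length, magnitude) mapping in this example can be sketched as below. The function names are illustrative, and the Huffman code 001 for length 4 is taken from the example rather than from a real JPEG table.

```python
def encode_residual(r):
    """Return (length, magnitude bits) for a prediction residual."""
    if r == 0:
        return 0, ""
    length = abs(r).bit_length()
    mask = (1 << length) - 1
    # Negative residuals are sent as the ones' complement of |r|.
    value = r if r > 0 else (~(-r)) & mask
    return length, format(value, "0{}b".format(length))

def decode_residual(length, bits):
    """Invert encode_residual; a leading zero marks a negative value."""
    if length == 0:
        return 0
    value = int(bits, 2)
    if bits[0] == "0":
        value = -((~value) & ((1 << length) - 1))
    return value

P, p = 180, 190                      # prediction and actual pixel value
R = P - p                            # residual, using the sign convention above
length, bits = encode_residual(R)
codeword = "001" + bits              # assumed Huffman code for length 4
print(length, bits, codeword)        # 4 0101 0010101
print(P - decode_residual(length, bits))  # 190
```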
20Lossy Compression: Generic structure of a video codec
21Redundancy in Video Sequences
- Video compression targets 3 kinds of redundancy
- Spatial: the correlation that exists between (groups of) pixels
- Temporal: similarity between video frames
- Perceptual: Human Visual System (HVS) is less sensitive to high-frequency information
- Lossy compression throws information away as part of these processes
- Remaining information is encoded losslessly using entropy coding
22Redundancy in Video Sequences
- Spatial redundancy
- Transform data to be encoded into a new representation where data is less correlated
- Leads to a more compact representation
- Temporal redundancy
- Only encode difference between 2 video frames (lower entropy)
- Form prediction of frame to be encoded and encode prediction residual
- Perceptual redundancy
- Suppress/remove high frequency components corresponding to fine image detail
23Coding Modes
- INTRA
- Encode a frame completely independently (i.e. with no reference to previous/future frames)
- Forms random access point in bitstream, resets encoding, limits error propagation
- Equivalent to having a JPEG-encoded still image at periodic intervals in bitstream
Frame 0
24Coding Modes
- INTER
- Use a previous/future frame (termed reference frame) as the basis for a prediction of the current frame
- Could just simply subtract reference frame from current frame
- Or use a more sophisticated prediction method
- Need to use reconstructed frame as basis for prediction so that encoder/decoder stay synchronised
Frame 0
25Coding Unit
- Break image/frame up into 16 x 16 macro-blocks
- For YUV
- 4 8x8 luminance pixel blocks
- 2 8x8 chrominance pixel blocks.
- Coding decisions made on macro-block basis
- INTRA/INTER coding mode
- prediction method if INTER
- Loss introduced.
- Decisions flagged in bitstream syntax.
26Generic Codec Structure
27Discrete Cosine Transform (DCT)
- Why DCT?
- What is it?
- How does it work?
- How is it computed (in reality)?
- Adoption and variations
- What about the DWT?
- Quantisation
28Why DCT?
- Neighbouring pixels are likely to be similar
- The same is true for prediction residual data
- Want to exploit this spatial correlation
- We want a transform that
- Removes correlation from data
- Packs signal energy into as few coefficients as possible
- Coefficients suitable for entropy coding
29Why DCT?
- Optimal solution
- Use eigenvectors of the covariance matrix of the input pixel data
- Order based on size of eigenvalue
- Based on theory of principal component analysis (PCA)
- Referred to as the Karhunen-Loeve Transform (KLT) [rao90]
- Achieves complete de-correlation
- Packs most energy into fewest coefficients
- Minimises MSE for a given number of coefficients (Quantisation)
- Minimises the entropy
- Disadvantages
- Very computationally demanding
- Transform kernel is data dependent
- Kernel must be sent to decoder also!
- Not practical in a real compression system
- Compromise -> The DCT
30What is the DCT?
- Treat frame as a grid of 8x8 pixel blocks
- Pixel data (intra block)
- Prediction Residual (inter block)
- Compute 8x8 2D DCT on each block
- Formula
- Basis functions derived using Fourier theory
31What is the DCT?
- Fourier's theorem and the Nyquist sampling criterion mean only certain discrete frequencies can be present in an 8x8 block of sampled data
- DCT coefficients tell us how much of a particular frequency is present in a particular block
- Very crude explanation!
- Inverse DCT (IDCT) reverses this process
- Essentially Fourier synthesis
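A direct, unoptimised Python sketch of the 8x8 DCT and IDCT, assuming the orthonormal DCT-II definition and computing the 2D transform as 1D transforms over rows and then columns. The function names are illustrative, and real codecs use fast algorithms rather than this O(N^2) form.

```python
from math import cos, pi, sqrt

N = 8

def scale(u):
    """Normalisation factor that makes the transform orthonormal."""
    return sqrt(1 / N) if u == 0 else sqrt(2 / N)

def dct_1d(v):
    """8-point DCT-II: how much of each cosine frequency u is present in v."""
    return [scale(u) * sum(v[x] * cos((2 * x + 1) * u * pi / (2 * N))
                           for x in range(N)) for u in range(N)]

def idct_1d(c):
    """Inverse transform: rebuild the samples as a weighted sum of cosines."""
    return [sum(scale(u) * c[u] * cos((2 * x + 1) * u * pi / (2 * N))
                for u in range(N)) for x in range(N)]

def transpose(block):
    return [list(row) for row in zip(*block)]

def apply_2d(block, transform):
    """Apply a 1D transform to every row, then to every column."""
    rows = [transform(row) for row in block]
    return transpose([transform(col) for col in transpose(rows)])

# A block that varies smoothly from left to right only.
block = [[16 * x for x in range(N)] for _ in range(N)]
coeffs = apply_2d(block, dct_1d)
restored = apply_2d(coeffs, idct_1d)
error = max(abs(restored[y][x] - block[y][x])
            for y in range(N) for x in range(N))
print(round(coeffs[0][0], 1), error < 1e-9)
```

Because this block has no vertical variation, every coefficient outside the first row of `coeffs` is zero to within rounding, which illustrates the energy compaction the DCT is chosen for.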
32How does the DCT work?
- DCT does not compress anything in isolation!
- This is achieved by quantiser and entropy coding
- DCT output easier to compress though
- Most natural video dominated by low frequencies
33How does the DCT work?
- Human eye less sensitive to high frequencies
- Use a quantiser whose step size depends on frequency
- Effectively discard perceptually unimportant data
- After quantisation there will be many zero-valued coeffs
- Typically only 5 or 6 non-zero-valued coeffs [xanthopoulos99]
- Suitable for run length and entropy coding
34How does the DCT work?
- Zig-zag scan
- Keep statistically related coeffs together
- Better run-length coding
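A small Python sketch of the zig-zag scan followed by run-length coding of the zeros. The (zero_run, value) pairs and the "EOB" end-of-block marker follow a JPEG-like convention that is assumed here for illustration; the example block is made up.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    def key(pos):
        row, col = pos
        diag = row + col
        # Positions on one anti-diagonal share row + col; the direction of
        # travel alternates from one diagonal to the next.
        return (diag, row if diag % 2 else col)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def run_length(block):
    """Zig-zag scan a quantised block into (zero_run, value) pairs."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        if block[r][c] == 0:
            run += 1
        else:
            pairs.append((run, block[r][c]))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end marker
    return pairs

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][1] = 12, 5, -2
print(zigzag_order()[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(run_length(block))   # [(0, 12), (0, 5), (2, -2), 'EOB']
```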
35How is the DCT Computed?
- Most implementations exploit the fact that the 2D DCT is separable
- Compute 1D DCT on each column
- Compute 1D DCT on each resultant row
- 16 x 1D 8-point DCTs in total
- Need efficient implementation of 1D 8-point DCT
- 30 years of research in this field
- Basic implementation (64 multiplications, 56 additions)
- Fast implementation [loeffler89] (11 multiplications, 29 additions)
- Video codec optimised implementation, AAN [arai89] (5 multiplications, 29 additions)
- Arithmetic precision a vital decision
- If constraint is 1920x1080 @ 30Hz
- 972,000 8x8 blocks per second
- Need at least (171x10^6 multiplications, 451x10^6 additions) per second using Loeffler!
36How is the DCT Computed?
- Sometimes dedicated hardware needed
- Performance and/or power reasons
- Hardware architecture taxonomy
37Adoption and Variations
- 8x8 DCT
- Used in JPEG, H.261, H.263, MPEG-1, MPEG-2, MPEG-4 with specific quality requirements
- Shape Adaptive DCT
- Used in MPEG-4 Advanced Coding Efficiency (ACE) profile
- Kernel basis functions determined by object shape
- Integer DCT Approximation
- Used in H.264
- Block size of 4x4 and 8x8 depending on mode
- Avoids the IDCT mismatch problem
- Less computationally demanding (16-bit integer arithmetic)
- More features (can discuss later if necessary)
38What about the DWT
- Discrete Wavelet Transform (DWT)
- Used by JPEG-2000
- MPEG-4 uses SA-DWT (for static shape textures)
- Why? -> Better than Fourier analysis for non-stationary data
- Inherently scalable
- Involves successive LPF and HPF of data and subsampling
- More efficient at very low bit rates
- DCT and coarse Q -> Blocking artefacts
- DWT and coarse Q -> Blurring/smearing (much less perceptible)
- More computationally demanding than DCT
39What is Quantisation?
- A lossy process
- Get rid of information
- Gives compression gain
- Try to minimise distortion
- Try to reduce entropy
- Two primary types
- Scalar quantiser (one to one)
- Vector quantiser (many to one)
40Scalar Quantiser
- Need to find optimal values for
- Decision levels di
- Reconstruction levels ri
- Difficult in general!
41Scalar Quantiser
- Aim to minimise distortion
- Minimise MSE -> Lloyd-Max quantiser
- A good quantiser design depends on the probability distribution of the input data
- Want less error for more probable inputs
- Case 1: Uniform distribution
- Decision bands all same width
- Reconstruction levels equally spaced
- Referred to as a linear quantiser
- Used frequently for simplicity
42Scalar Quantiser
- Case 2: Piecewise constant distribution
- Used when the number of decision levels N is large
- Decision level solution difficult (use numerical methods for Lagrange multipliers)
- Reconstruction levels
43Scalar Quantiser
- Case 3: Nonuniform distribution
- Need numerical methods for di and ri
- Tables available for standard distributions (Gaussian, Laplacian, Rayleigh, ...) for popular N
- This is a true Lloyd-Max quantiser (or optimum mean square quantiser)
- Case 4: Uniform quantiser
- Uniform refers to equal spacing between decision levels regardless of distribution
- Similar structure to Case 1 but different performance because distribution not uniform
- Commonly used (e.g. in JPEG, ...)
44Scalar Quantiser Performance
- MSE correlates well with subjective degradation
- Don't rely on MSE minimisation in isolation though
- Need to consider overall rate-distortion
- Measures MSE as a function of number of bits n
- Constants a and b depend on distribution
- When designing a quantiser for each DCT coefficient i, need to know ni
- 64 quantisers
- How to determine ni (number of bits per coefficient)?
- Depends on variance of coefficient i relative to others and specified average bitrate nav
- Bit allocation algorithm paradigm
45Bit allocation algorithms
- Try to keep distortion constant across coefficients
- As variance increases, distortion decreases by using more bits
- Optimal allocation for N coefficients
- Often a rate controller after entropy encoder with feedback path to quantiser
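The allocation formula itself did not survive in this transcript. As a hedged illustration, the textbook high-resolution result gives coefficient i the average bitrate plus half the base-2 logarithm of its variance relative to the geometric mean of all variances. The sketch below assumes that formula; the function name and the variances are made up.

```python
from math import log2

def allocate_bits(variances, n_av):
    """Textbook allocation: n_i = n_av + 0.5 * log2(var_i / geometric mean)."""
    mean_log = sum(log2(v) for v in variances) / len(variances)
    return [n_av + 0.5 * (log2(v) - mean_log) for v in variances]

variances = [400.0, 100.0, 25.0, 6.25]  # hypothetical coefficient variances
bits = allocate_bits(variances, n_av=3)
print([round(b, 2) for b in bits])      # [4.5, 3.5, 2.5, 1.5]
```

Each factor of 4 in variance earns one extra bit, and the allocations average to nav. The raw formula can return fractional or negative values, so a practical scheme would round and clip them.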
46Scalar Quantiser Summary
- Uniform quantiser most commonly used
- In fact, rather than transmitting a quantised coefficient, usually transmit the quantisation index
- This has much lower entropy
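A minimal sketch of a uniform scalar quantiser that transmits indices rather than reconstructed values. The step size and coefficient values are made up, and rounding to the nearest level is one assumed choice; codecs often add a dead zone around zero.

```python
from math import floor

def quantise(coeff, step):
    """Return the quantisation index of the step-wide band containing coeff."""
    return floor(coeff / step + 0.5)

def reconstruct(index, step):
    """Return the reconstruction level for an index (the centre of its band)."""
    return index * step

coeffs = [312.0, -45.5, 18.2, 3.9, -1.4]  # hypothetical DCT coefficients
step = 16
indices = [quantise(c, step) for c in coeffs]
levels = [reconstruct(i, step) for i in indices]
print(indices)  # [20, -3, 1, 0, 0]
print(levels)   # [320, -48, 16, 0, 0]
```

The indices are small integers clustered around zero, which is what makes them cheaper to entropy code than the reconstruction levels.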
47Vector Quantiser
- Quantise blocks of samples together
- Each block assigned a single code
- A code book used to find code for block
- Code book can be dynamic or pre-defined
- Each pattern has specific encoding
- Can give very good performance
- Quite computationally expensive
- Difficult to design tables
- Used by GIF standard
48Demo
- Compression gain
- ?
- Perceptual quality
49Motion Estimation & Compensation
- Exploiting temporal redundancy
- Motion Estimation
- Block matching algorithm overview
- Matching Criteria
- Selection of Search Strategies
- More advanced motion estimation techniques
- Software / Hardware Considerations
- Motion Compensation
- Adoption in standards discussed later
50Exploiting Temporal Redundancy
- Very slight change between successive frames (e.g. A & B)
- Camera & Object Motion
- Temporal prediction model at encoder & decoder provides compression if
- model parameters + correction terms < raw pixel information
- e.g. Frame differencing (C)
- Entropy
- B = 7.15 bits/pixel
- C = 4.38 bits/pixel
- More complex models can reduce entropy further
- Computational expense, memory and prediction performance trade off
- Temporal Prediction model
- Motion estimation
- Motion compensation
51Taxonomy of Motion Estimation Algorithms
- Good Motion Estimation reviews [Mitchell96] [Furht97] [Kuhn99]
52Block Matching Algorithm
- For each MxN block in the current frame, find the associated best matching block within a predetermined or adaptive ±S pel search range in a reference frame(s)
- Estimates motion of a group of pixels
- Assumes translational motion only
- Typically operates on luminance component only
- Good trade off between computational complexity & prediction accuracy
- Motion vector (relative offsets to the best match) undergoes VLC
- Prediction Residual undergoes further processing (DCT, VLC, etc.)
53Matching Criteria
- At each MxN block search position a matching criterion is evaluated
- Wide variety of matching criteria
- Mean Squared Error
- Mean Absolute Differences
- Sum of Absolute Differences
- Reduced complexity matching criteria
- Binary Block Match
- Others
- Cross correlation
- SAD summation truncation
- SAD estimation
- Reduced Bit Mean Absolute Difference
- Minimised Maximum Error function
- Etc
- Choice of matching criterion is a complexity/prediction performance trade off
54Search Strategies (1/4)
- Many possible search strategies!
- Full Search: search every position
- Best results, but very computationally expensive
- Operations required to generate 1 MV for 1 current block
- (2S+1)^2 block matches
- For each pixel in an M x N block match: subtract, absolute, accumulate
- After each block match, minimum SAD comparison
- Therefore total operations
- (2S+1)^2 x (M x N x 3 + 1), e.g. S = 8: 289 x (M x N x 3 + 1)
- Reduce computational expense
- Logarithmic: reduces number of search positions
- Assumes matching criterion monotonically increases moving away from minimum point; iteratively converge to minimum point
- Possibility of getting stuck in local minimum
- Yields higher energy prediction residual
- Pseudocode for the Three Step Search
- 1: R = 2^(log2(S) - 1)
- 2: Search positions within the search window defined using R
- 3: R = R/2
- 4: if R < 1 finished, else go to 2
55Search Strategies (2/4)
- Logarithmic searches contd.
- Three Step Search [Koga81]
- S = 8, initial R = 4
- Search positions defined using R
- (x-R, y-R), (x, y-R), (x+R, y-R), ..., (x, y), ..., (x+R, y+R)
- Operations required to generate 1 MV
- (9+8+8) x (M x N x 3 + 1)
- Variants
- 2-D logarithmic [Jain81], Parallel 1-D [Chen91], CDS [Rao83], N3SS [Li94], 4SS [Po96]
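A compact Python sketch of SAD-based full search and three step search for one block. The function names, frame layout (lists of rows) and synthetic test frame are assumptions for illustration. The three step search visits 9 + 8 + 8 positions and, as noted above, may settle in a local minimum, so its cost can exceed that of the full search.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def in_frame(ref, bx, by, dx, dy, n):
    """True if the displaced block lies entirely inside the reference frame."""
    return (0 <= by + dy and by + dy + n <= len(ref)
            and 0 <= bx + dx and bx + dx + n <= len(ref[0]))

def full_search(cur, ref, bx, by, n, s):
    """Evaluate all (2s+1)^2 displacements; return (cost, dx, dy) of the best."""
    best = None
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            if in_frame(ref, bx, by, dx, dy, n):
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best

def three_step_search(cur, ref, bx, by, n, s=8):
    """Search 8 neighbours at distance R around the current centre, move the
    centre to the best match, halve R and repeat until R < 1."""
    r = s // 2
    best = (sad(cur, ref, bx, by, 0, 0, n), 0, 0)
    while r >= 1:
        cx, cy = best[1], best[2]
        for dy in (cy - r, cy, cy + r):
            for dx in (cx - r, cx, cx + r):
                if (dx, dy) != (cx, cy) and in_frame(ref, bx, by, dx, dy, n):
                    cost = sad(cur, ref, bx, by, dx, dy, n)
                    if cost < best[0]:
                        best = (cost, dx, dy)
        r //= 2
    return best

# Synthetic test: the current frame is the reference shifted so that the
# true motion vector of the block at (16, 16) is (dx, dy) = (2, -3).
SIZE = 40
ref = [[(7 * x * x + 13 * y * y + 31 * x * y) % 251 for x in range(SIZE)]
       for y in range(SIZE)]
cur = [[ref[y - 3][x + 2] if y >= 3 and x + 2 < SIZE else 0
        for x in range(SIZE)] for y in range(SIZE)]
fs = full_search(cur, ref, 16, 16, 8, 8)
tss = three_step_search(cur, ref, 16, 16, 8, 8)
print(fs)                # (0, 2, -3)
print(tss[0] >= fs[0])   # True
```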
- Hierarchical Search Strategies
- Search fewer positions & use fewer pixels in the matching criteria
- Achieved via sub-sampling current & reference frames
- Disadvantage: increased memory
- Best match in lower resolution seeds search for subsequent resolutions
- Can help to avoid local minima due to low pass filtering effect
- Local minima still possible for small regions which disappear during sub-sampling
56Search Strategies (3/4)
- 3 Level Hierarchical Search Example
- Level 1: Original
- Sub-sampled by factor of 2 generating level 2
- Level 1 sub-sampled by 4 generating level 3
- Motion Estimation starts at level 3
- block size N/4 x M/4
- Search window ±S/4
- FS or TSS employed within this window
- Produces motion vector (Vx3, Vy3)
- Motion Estimation level 2
- block size N/2 x M/2
- Centered on (x/2 + 2Vx3, y/2 + 2Vy3)
- Search window ±1 around this point
- Produces motion vector (Vx2, Vy2)
- Motion Estimation level 1
- Centered on (x + 2Vx2, y + 2Vy2)
- Search window ±1 around this point
- Produces final motion vector (Vx1, Vy1)
- Operations required to generate 1 MV using a FS at level 3
- (2(S/4)+1)^2 x (M/4 x N/4 x 3 + 1) + 9 x (M/2 x N/2 x 3 + 1) + 9 x (M x N x 3 + 1)
57Search Strategies (4/4)
- Scene adaptive search area
- Zone based search strategies
- Can employ stopping threshold in each zone
- Advantageous in a rate/distortion sense
- [chan95] [Jung96] [Zhe97]
- Spiral Search
- Dynamic search window size
- Many techniques used to adjust range
- Spatial correlation of MV [Chain95] [In97]
- Gradient based methods
- Block based gradient descent search [Liu96]
- Stops after 4 steps
- Diamond search [Cote97]
- Early stopping technique
- Skip to next block match when the minimum SAD has been exceeded
- Successive elimination algorithm [Li95]
- Conservative block SAD [Do98]
58Different Search Strategy Performance
- Frame Differencing
- 0 Motion Vector
- Entropy 4.38 bits/pixel
- 1 operation/pixel (subtraction)
- Full Search
- Block size 16x16
- Search range ±8
- Entropy 2.61 bits/pixel
- 868 operations/pixel
- Hierarchical Search
- Block size 4x4, 8x8, 16x16
- Search window ±2, ±4, ±8
- Entropy 3.08 bits/pixel
- 39 operations/pixel
- Hierarchical Search
- Block size 4x4, 16x16, 32x32
- Search window ±2, ±4, ±8
- Entropy 2.91 bits/pixel
- 35 operations/pixel
59More advanced techniques (1/2)
- Bi-directional (Forward and Reverse) Prediction
- Termed B-frames
- Not feasible for real-time systems
- Multiple Reference Frames
- Improves prediction
- Increases computational expense & memory requirements
- Unrestricted Motion Vectors
- Allow block matches outside the reference frame
- Pixel padding used to extend beyond frame boundaries
- Predictive Motion Vectors
- Rather than start at collocated block use a MV predictor
- Temporal and/or Spatial prediction [Lee97] [Kos97] [Zheng97]
- Can improve prediction residual quality
- Can employ thresholds to gate-off motion estimation
- H/W: Reduces pixel reusability between current block positions
- Global Motion Compensation
- Default motion for the frame/object
60More advanced techniques (2/2)
- Sub-pel Motion Estimation
- Real motion is not constrained by integer pixel amounts
- Half-pel & quarter-pel frequently used
- But memory increases
- H.264
- 6-tap FIR filter for ½ pel
- Bilinear for ¼ pel
- Variable Block Size Motion
- Smaller block size will lead to smaller residual
- But number of motion vectors & signalling info increases
- 41 MV per 16x16 block in H.264
- MPEG-4 & H.263 Advanced Prediction Motion Estimation (4MV)
- H.264
- Dynamically adapts between multiple block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4)
- Rate/Distortion Optimised
- Motion Vector Coding & Prediction
- Adding MVs to bitstream can be costly, particularly if block size is small
- DPCM used to exploit spatial MV redundancies
61ME Software/Hardware considerations
- Software algorithmic complexity (simplified analysis)
- To support 1920x1280: 9600 x 30 = 288K 16x16 blocks/sec
- ±8 Search Window = 289 block matches per current block
- Total block matches = 289 x 288K = 83,232,000 matches/sec
- Operations = 83,232,000 x (256 pixels x 3 + 1) = approx. 64 GOPS
- Hardware implementations can be attractive
- Systolic Array (1D/2D) approaches typically employed
- Memory bandwidth efficient & high throughput
- Full Search commonly used
- Architectures also available for heuristic search strategies
- Architectures for H.264 Variable Block Size emerging
- Ball park figures for H.264 VBSME core
- 1-D 16 PE SA
- Area 40-60K gates, Memory Bandwidth 3 pixels per clock cycle
- 1 16x16 block match every 4096 clock cycles (±8 search range)
- 2-D 256 PE SA
- Area 100-200K gates, Memory Bandwidth 48 pixels per clock cycle
- 1 16x16 block match every 256 clock cycles (±8 search range)
- To support 1920x1280: 9600 x 30 = 288K 16x16 blocks/sec
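The simplified software complexity analysis can be reproduced with a few lines of Python. The sketch assumes the operation model used earlier for full search (subtract, absolute and accumulate per pixel, plus one minimum-SAD comparison per block match); the variable names are illustrative.

```python
def ops_per_block_match(m, n):
    """3 operations per pixel plus one minimum-SAD comparison."""
    return m * n * 3 + 1

def full_search_positions(s):
    """Number of candidate positions in a +/-s search window."""
    return (2 * s + 1) ** 2

width, height, fps, block, s = 1920, 1280, 30, 16, 8
blocks_per_sec = (width // block) * (height // block) * fps
matches_per_sec = full_search_positions(s) * blocks_per_sec
ops_per_sec = matches_per_sec * ops_per_block_match(block, block)
ops_per_pixel = (full_search_positions(s) * ops_per_block_match(block, block)
                 / (block * block))

print(blocks_per_sec)        # 288000
print(matches_per_sec)       # 83232000
print(ops_per_sec)           # 64005408000
print(round(ops_per_pixel))  # 868
```

The per-pixel figure agrees with the 868 operations/pixel quoted for full search in the search strategy comparison.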
62Motion Compensation
- Straightforward relative to motion estimation
- Reconstructed MB = Residual + Mot. Comp. MB (pointed to by MVs)
- Copy block of pixels from displaced block in the reference frame into the current frame
- Reference frame must be stored in decoder
- For encoder and decoder to remain synchronised
- Encoder also needs to do motion compensation
- Considerations
- Additional frame memory at the decoder
- Low computational requirements
63Lossy Compression: Standards
64Standards Evolution
65JPEG
- Flexible image coding standard
- 4 Modes of operation
- Lossless encoding (earlier)
- Baseline sequential encoding
- Progressive encoding
- Hierarchical encoding (towards JPEG-2000)
- Motion JPEG
- Baseline encoding of each frame
- No motion estimation
- Not properly standardised
66JPEG-2000
- JPEG not optimised for a wide range of apps
- JPEG-2000 even more flexible
- Interesting features
- Uses DWT instead of DCT
- Region of Interest (ROI) coding
- Scalability
- Spatial scalability
- SNR scalability
- More resilient to channel errors
- Individual quality packets independently decoded
- Also supports lossless coding
- Added flexibility comes at computational cost
67JPEG/JPEG-2000 Summary
- JPEG capable of average compression of 15:1 for subjectively transparent quality
- JPEG-2000: better compression @ fixed rate
- For Foreman
- Gain of 1.5 to 4 dB over the range 1.2 to 0.12 bpp
- Applications
- Internet
- Digital photography
- Many more
68ITU-T H.261
- ITU-T: narrow bandwidth real-time apps
- H.261: (p x 64) kb/s over ISDN (1 <= p <= 30)
- CIF and QCIF resolution
- Real time video telephony/conferencing
- Up to 3 frames interpolated by decoder
- Supports framerates of 30Hz, 15Hz, 10Hz, 7.5Hz
- Video compression tools
- 8x8 DCT
- Uniform scalar quantiser (rate control optional)
- Entropy coder is modified run length and Huffman
- Motion Estimation
- Only forward direction
- Search window limited to ±15
- Integer pixel accuracy only
- Motion Compensation is optional
- Loop filter (alleviate blocking)
69ISO/IEC MPEG-1
- Storage of AV content for delivery at 1.5Mb/s
- Flexible
- Resolutions typically 768x586
- Framerate typically 30Hz
- H.261 was starting point for the standard
- Compression gain at expense of latency
- Specific features
- Standard VLCs determined by Huffman coding
- DCT DC coeffs are differentially predicted
- Bi-directional prediction (I,P,B frames)
- Motion compensation with half-pixel accuracy
- Maximum MV range of (-512, 511.5) for half pixel and (-1024, 1023) for integer pixel
- Weighted quantisation (H.261 does not have this)
- Random access to bitstream, FF, FR
70ISO/IEC MPEG-1
71ISO/IEC MPEG-2
- High quality video @ 4-15Mb/s
- VOD, Broadcast TV, DVD, HDTV, Satellite TV
- Major differences w.r.t. MPEG-1
- More resolutions, framerates, qualities and bitrates
- SIF (352x288 @ 25Hz) -> HDTV (1920x1250 @ 60Hz)
- Profiles and levels
- Has interlaced/progressive option
- Frame/Field based ME, MC and DCT
- Scalability (temporal, spatial, SNR)
- Minor differences
- More bits for quantisation
- Alternate scan (as well as zigzag)
72ITU-T H.263
- Very low bitrate apps (< 64kb/s)
- Video telephony over PSTN, mobile telephony
- Recommended resolutions: subQCIF, QCIF, CIF, 4CIF, 16CIF
- Non-interlaced @ 29.97Hz
- Similar to H.261
- Extensions (some optional in Annex but included in H.264)
- MVs differentially encoded
- Half-pixel accurate motion estimation
- Extensions support quarter and one eighth
- Unrestricted motion vector mode
- MVs can point outside image, edge pixels form prediction
- Advanced prediction mode
- MB can have 4 MVs associated with it
- Syntax-based arithmetic encoding (SAC)
- Optional mode to replace VLCs with arithmetic encoding
- PB frames
- Error resilience
- Synchronisation markers
- Reversible VLCs
73ISO/IEC MPEG-4
- An all encompassing standard!
- Improved compression at 5kb/s -> 1Gb/s
- Resolutions of sub-QCIF to studio
- Content-based interactivity (semantic objects)
- Universal access (scalability, error resilience)
- Synthetic and natural hybrid coding (SNHC)
74ISO/IEC MPEG-4
75ISO/IEC MPEG-4
- Video coding tools
- Integer, half and quarter pixel ME
- Boundary MB ME padding or polygon matching
- Global ME
- Shape Adaptive DCT
- AC/DC intra prediction
- Enhanced scalability FGS
- Still texture coding (uses SA-DWT)
- Shape Coding tools
- Context-based arithmetic encoding (CAE)
- Compute context
- Index into LUT for probability of 0,1
- Drive arithmetic encoder
76ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)
- Targets enhanced compression for wide range of apps
- Improved prediction
- Variable block-size MC with small block sizes
- Up to quarter-pixel MC
- Unrestricted motion vector mode
- Multiple reference picture MC
- Weighted prediction (generalised B-pictures)
- Directional intra prediction (9 4x4 modes, 4 16x16 modes)
- In the loop adaptive deblocking filter
- Improved coding efficiency tools
- Small block size transform
- Hierarchical block transform
- Short word length transform (16-bit integer arithmetic)
- Exact match inverse transform
- CAVLC, CABAC
- Enhanced error robustness and network friendliness
77ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)
78ITU-T H.264 or ISO/IEC MPEG-4 Part 10 (AVC)
- H.264 Version 1 has 3 profiles
- Baseline
- Main
- Extended
- Fidelity Range Extension (FRExt) Amendment
- High Profile
- High 10 Profile
- High 4:2:2 Profile
- High 4:4:4 Profile
- Up to 12 bits per sample
- Supports lossless region coding
- Codes RGB to avoid colour space transformation error
79Comparing Standards
- Video conferencing applications
- Low latency real-time requirement
- H.264/AVC MP would improve by further 10-20%
- Using low delay bi-prediction, CABAC
80Comparing Standards
- Video streaming applications
- Less of delay constraint
81Comparing Standards
- Entertainment-quality applications
- High resolution, delay tolerable
82Comparing Standards
- Professional motion picture production
- Random access to individual frames
- Up to HDTV, H.264/AVC MP comparable or better
than Motion-JPEG2000
83Comparing Standards
- PSNR, while good, does not take into account intricacies of the human eye
- Need subjective video tests
- Other metrics
- MPQM, ...
- Experiments show that H.264 gives lowest bitrate for subjectively equivalent video over a range of apps
- Improved performance comes at the cost of computational complexity
- Main bottleneck is ME (very memory intensive)
84Image Analysis: Visual Feature Extraction
85Visual Features - Still Images
- What features are important?
- Colour
- Texture
- The feel, appearance, consistency of a surface
- In an image
- Distribution over the entire image?
- Of specific parts of the image?
[Figure: example images labelled "No texture" and "Highly textured"]
86Visual Features - Colour
- Colour is visually important to humans
- Colour features and similarity metrics easy to compute
- Histogram [Swain and Ballard, 1992]
- Most commonly used structure to represent global image features
- Invariant to translation and rotation and can be made invariant to scale by normalisation
- MPEG-7 Scalable Colour Descriptor
- H (16 levels) S (4 levels) V (4 levels) histogram encoded with a Haar transform for efficiency & scaling
87Visual Features - Texture
- Simple texture descriptors [Pratt, 1991]
- Autocorrelation function
- Co-occurrence matrices
- Edge frequency
- Primitive length
- More sophisticated (based on transforms and/or filtering)
- Wavelet [Mallat, 1990], Haar [Theodoridis, 1999], Gabor [Bovis, 1990]
- Others
- Mathematical morphology
- Fractals
88Visual Features - Texture
- Example: MPEG-7 Edge Histogram
- Represents the global (and possibly local [Won, 2002]) spatial distribution of edges
- Need to first generate edge map
- Roberts, Sobel and Prewitt, Canny, ...
- Build histogram based on 5 edge types
89Change Detection
- Compare 2 temporally adjacent images and determine how different they are
- Why?
- Surveillance-type applications
- Assume static camera & background
- Anything changing between one image and the next must be an object!
- In fact, this is naïve but the starting point of many object segmentation techniques
- Temporal video structuring
- Breaking video up into chunks for non-linear browsing: shots, scenes, events, story-lines
90Temporal Video Structuring
a video document
A set of keyframes
Keyframe-based video browsers
92Temporal Video Structuring
- Shot boundary detection
- A shot is a continuous piece of video taken with one camera
- A shot cut is the abrupt or gradual transition between two shots
- Uncompressed domain
- Calculate colour histogram for each frame
- Calculate difference between histograms using suitable metric: L1 (city-block), L2 (Euclidean), Mahalanobis, etc.
- Threshold
- Compressed domain
- Parse features directly from bitstream
- E.g. use DCT coefficients for each frame to reconstruct approximation of image
- E.g. motion vectors for each pair of frames and detect changes in global statistics
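The uncompressed-domain approach can be sketched in Python as follows. For brevity the sketch assumes grey-level frames rather than colour ones, uses the L1 (city-block) metric, and picks the bin count and threshold arbitrarily; the function names and test frames are illustrative.

```python
def histogram(frame, bins=8, levels=256):
    """Normalised grey-level histogram of a frame given as rows of pixels."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[pixel * bins // levels] += 1
            total += 1
    return [c / total for c in counts]

def l1_distance(h1, h2):
    """City-block distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def shot_boundaries(frames, threshold=0.5):
    """Indices of frames whose histogram differs from the previous frame's
    by more than the threshold."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if l1_distance(hists[i - 1], hists[i]) > threshold]

dark = [[20] * 4 for _ in range(4)]     # synthetic 4x4 frames
bright = [[200] * 4 for _ in range(4)]
frames = [dark, dark, dark, bright, bright]
print(shot_boundaries(frames))  # [3]
```

A fixed threshold of this kind detects abrupt cuts; gradual transitions spread the histogram change over many frames and usually need a windowed or cumulative measure.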