1
Introduction to H.264/SVC: Differences,
Possibilities, and Limits

Yao-Chung Lin
Image, Video, and Multimedia Systems Group
Information Systems Laboratory
May 10, 2006

2
Scalable Video Coding

A research topic for over 20 years
A single bitstream serves diverse clients
Display resolutions (QCIF, CIF, ..., HDTV)
Frame rates (15 Hz, 30 Hz, ...)
Bit rates/Qualities
Standard under development
October 2003, MPEG Call for Proposals
March 2004, 14 proposals submitted and evaluated
12 proposals are wavelet-based
2 proposals are extensions of H.264/AVC
October 2004, MPEG selected the HHI proposal as the
starting point for H.264/MPEG-4 AVC Amd. 1
2007, final draft will be released

3
Current Draft

Based on H.264 main profile
MCTF/Hierarchical B-picture (MCTF w/o update
step) for temporal scalability
Layered pyramid prediction structure for spatial
scalability
Layered, sub bit-plane, and (run,level) coding
for SNR scalability

4
H.264 Profiles
5
Overall Architecture of SVC
A two layer example
6
Outline

Introduction
Scalabilities
Temporal Scalability
Spatial Scalability
SNR (Quality) Scalability
Other Details
Simulation Results
Conclusion and Discussion

7
Temporal Scalability

Group of pictures (GOP)
Concepts of motion compensated temporal filtering
(MCTF)
Hierarchical B-picture

8
Group of Pictures

Instantaneous decoding refresh (IDR) pictures
Intra coded picture
Also a key picture
A GOP with only one picture
Provide random access ability
Key pictures
The last picture in a GOP
Intra coded
Inter coded by previous key picture
Provide lowest temporal resolution
Non-key pictures
Hierarchically predicted B pictures
High-pass signal of MCTF
Provide various temporal resolutions
Note: the number of reference frames cannot be greater
than 16

9
Group of Pictures
An example of a group of pictures: dyadic, 4
temporal levels
ITU 2006 January, R202
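
The dyadic structure can be illustrated with a short sketch. It assumes a GOP of 8 pictures numbered 1 to 8, with the key picture last and assigned temporal level 0; the function names and the numbering convention are illustrative and not taken from the slides or from JSVM.

```python
def temporal_level(index_in_gop: int, gop_size: int) -> int:
    """Temporal level of a picture in a dyadic GOP.

    Pictures are numbered 1..gop_size. The key picture is the last one
    (index gop_size) and gets level 0; odd-numbered pictures get the
    highest level.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    if not 1 <= index_in_gop <= gop_size:
        raise ValueError("index_in_gop out of range")
    num_b_levels = gop_size.bit_length() - 1  # log2(gop_size)
    # More trailing zero bits in the index means a coarser temporal level.
    trailing_zeros = (index_in_gop & -index_in_gop).bit_length() - 1
    return num_b_levels - trailing_zeros


def pictures_up_to_level(gop_size: int, max_level: int) -> list[int]:
    """Pictures a decoder keeps when it stops at temporal level max_level."""
    return [i for i in range(1, gop_size + 1)
            if temporal_level(i, gop_size) <= max_level]


if __name__ == "__main__":
    for level in range(4):
        print(level, pictures_up_to_level(8, level))
```

Dropping the highest level halves the frame rate each time, which is how a single bitstream can serve several temporal resolutions.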
10
Concepts of MCTF

Based on lifting scheme
Ensures perfect reconstruction
Even if non-linear operations are used
Open loop
Non-recursive temporal decomposition
Prevent drift error
Improves the efficiency of scalable coding, especially
with FGS
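
A minimal sketch of why lifting gives perfect reconstruction even with a non-linear step. It uses a Haar-like filter pair on integer samples and assumes zero motion (no motion compensation), which is a simplification of MCTF; the function names are illustrative.

```python
def lift_forward(even, odd):
    """One temporal decomposition level on paired integer samples."""
    # Prediction step: high-pass = odd sample minus its prediction from even.
    high = [o - e for e, o in zip(even, odd)]
    # Update step: low-pass = even sample plus half the high-pass signal.
    # The right shift rounds toward minus infinity, a non-linear operation.
    low = [e + (h >> 1) for e, h in zip(even, high)]
    return low, high


def lift_inverse(low, high):
    """Undo the steps in reverse order using exactly the same operations."""
    even = [l - (h >> 1) for l, h in zip(low, high)]
    odd = [h + e for e, h in zip(even, high)]
    return even, odd


if __name__ == "__main__":
    even, odd = [10, 200, 7], [13, 50, 7]
    low, high = lift_forward(even, odd)
    print(low, high, lift_inverse(low, high) == (even, odd))
```

The inverse subtracts exactly what the forward transform added, so the rounding cancels regardless of what the prediction and update operators compute.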

11
Lifting Scheme
r: reference index; m: motion vector
Similar to P-picture
Similar to B-picture
ITU, 2006 January, R202
12
Motion Modes

Variable block-size inter modes: from 16x16 to 4x4
Intra modes: 16x16, 8x8, 4x4
Direct modes: 16x16, 8x8

13
Decomposition Structure
HHI Webpage: Scalable Extension of H.264/AVC
14
Decomposition Structure

A dyadic decomposition structure has a delay of 2^N - 1
frames, where N is the number of temporal decomposition levels
Update steps do not cross the GOP border

HHI Webpage: Scalable Extension of H.264/AVC
15
Low Delay Support
ITU, 2006 January, R202
16
Removal of update step

The update step introduces high complexity at the decoder
Derivation of the motion information for the update
step
Smaller block sizes
9-bit residual motion compensation
The update step provides an insignificant coding-efficiency
gain over closed-loop coding with hierarchical B pictures
(HB)
Rate-distortion performance of closed-loop coding
with HB is higher than or similar to that of
MCTF-based coding for all test sequences
Except for the City sequence, where MCTF has a 0.5 dB gain
After temporal pre-filtering of the sequence, the
MCTF gain becomes insignificant

ITU, 2005 July, P059
17
Two Closed Loops
FGS Layer
ITU, 2005 July, P059
18
Spatial Scalability

Layered pyramid prediction structure
Inter-layer intra texture prediction
Inter-layer motion prediction
Inter-layer residual prediction
Extended Spatial Scalability
Cropping
Generic upsampling (non-dyadic spatial resampling)

19
Layered Pyramid Prediction Structure

Same concepts used in H.262/MPEG-2, H.263, MPEG-4
with additional inter-layer prediction
Each spatial resolution is coded as a new layer
with texture and motion refinement
Same mechanism for coarse grain SNR scalability
(spatial downsampling ratio = 1)

20
Inheritance of modes
Previous Spatial Layer
Current Layer
For spatial scaling ratio 2
21
Inter-layer Intra Texture Prediction

Unrestricted inter-layer intra texture prediction
Decode and predict from all lower layers in the
bitstream
Not supported in the standard
Constrained inter-layer intra texture prediction
For MBs in non-key pictures
The co-located block in the previous layer is
intra coded
Not supported in the standard
Constrained inter-layer intra texture prediction
for single-loop decoding
For MBs in all pictures (including key pictures)
The co-located block in the previous layer is
intra coded
Allows decoding with motion compensation in the current
layer only
Supported by the current SVC draft

22
Generation of Inter-layer Texture Prediction

De-blocking filter applied directly
4-sample border extension
Interpolation
2x: half-pel interpolation filter of AVC
Otherwise: quarter-pel interpolation filter
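
The 2x case can be sketched in one dimension. The taps (1, -5, 20, 20, -5, 1)/32 are the AVC half-sample luma filter; the edge-replicating border, the 8-bit clipping, and the function name are assumptions made for illustration, and the exact sample phase used by the SVC draft is not reproduced here.

```python
AVC_HALF_PEL_TAPS = (1, -5, 20, 20, -5, 1)  # normalised by 32


def upsample_2x_row(row, border=4):
    """Double the length of one row of 8-bit samples.

    Integer positions are copied; each half-sample position is
    interpolated with the 6-tap filter. The row is first extended by
    `border` replicated samples on each side (assumed edge handling).
    """
    if not row:
        return []
    padded = [row[0]] * border + list(row) + [row[-1]] * border
    out = []
    for i in range(len(row)):
        c = i + border
        out.append(padded[c])
        # The half-sample between c and c + 1 uses samples c - 2 .. c + 3.
        acc = sum(t * padded[c - 2 + k]
                  for k, t in enumerate(AVC_HALF_PEL_TAPS))
        out.append(min(255, max(0, (acc + 16) >> 5)))
    return out


if __name__ == "__main__":
    print(upsample_2x_row([0, 10, 20, 30]))
```

A separable 2-D upsampler would apply the same routine to rows and then to columns.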

Schwarz, ICIP 2005
23
Inter-layer Motion Prediction

Intra base layer
If previous layer is inter, use scaled
partitioning and motion vectors of base layer
If previous layer is intra, predict from previous
layer
Quarter-pel refinement
Only for reduced spatial resolution
Refine the scaled motion vector of the previous layer
by +1, 0, or -1 in quarter-sample precision
Send the refinement
None
Motion vector prediction from neighbor blocks
Motion vector prediction from previous layer
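
A small sketch of the scaled-vector-plus-refinement idea, assuming motion vectors are stored as integer (x, y) pairs in quarter-sample units and a dyadic spatial ratio of 2. The function and parameter names are hypothetical.

```python
def predict_enhancement_mv(base_mv, refinement=(0, 0), ratio=2):
    """Scale a base-layer motion vector and apply a quarter-pel refinement.

    base_mv and the result are (x, y) in quarter-sample units of their
    own layer. Each refinement component must be -1, 0, or +1.
    """
    for r in refinement:
        if r not in (-1, 0, 1):
            raise ValueError("refinement components must be -1, 0, or 1")
    return tuple(ratio * c + r for c, r in zip(base_mv, refinement))


if __name__ == "__main__":
    print(predict_enhancement_mv((3, -2)))           # scaled vector only
    print(predict_enhancement_mv((3, -2), (1, -1)))  # with refinement sent
```

Only the two refinement components need to be transmitted; the scaled vector itself is inherited from the previous layer.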

24
Inter-layer Residual Prediction

Predict the residual from the previous layer's residual
Upsample the residual
2x: separable bi-linear filter [1, 1]/2
Otherwise: quarter-pel interpolation
Helpful when the motion information is unchanged
or only slightly changed from the previous layer
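
A one-dimensional sketch of 2x bi-linear residual upsampling. It assumes the bi-linear taps are an equal-weight average of the two neighbouring samples, that the last sample is repeated at the right edge, and that rounding is done with an offset and a right shift; none of these details is confirmed by the slide.

```python
def upsample_residual_2x(row):
    """Double the length of one row of signed residual samples."""
    out = []
    for i, value in enumerate(row):
        # Repeat the last sample at the right edge (assumed edge handling).
        nxt = row[i + 1] if i + 1 < len(row) else value
        out.append(value)
        # Equal-weight average of the two neighbours, rounded.
        out.append((value + nxt + 1) >> 1)
    return out


if __name__ == "__main__":
    print(upsample_residual_2x([4, 8, -2]))
```

Unlike texture samples, residual samples are signed, so no clipping to the 8-bit range is applied here.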

25
SNR Scalability

Coarse grain scalability (CGS)
Layered coding
The same mechanism as spatial scalability
Re-quantize the coefficients with a finer step size
Fine grain scalability (FGS)
Sub-bitplane arithmetic coding
Re-quantize the coefficients with a finer step size
Provide a continuous refinement from a quality
base layer

26
Coarse Grain Scalability

Same mechanism as spatial scalability
Except no upsampling
Provide discrete quality refinement
Close to single-layer RD performance if dQP > 6

27
Fine Grain SNR Scalability

Represents the residual between the original
prediction error and the base layer representation
Quantized with a halved step size (dQP = 6)
Coded in the transform domain, so the decoder needs a single
inverse transform
Adaptive references for FGS (AR-FGS) provide
leaky prediction, attenuating drift error
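
The link between a QP difference of 6 and a halved step size follows from the H.264/AVC quantizer design, where the step size doubles for every increase of 6 in QP. The sketch below uses the commonly tabulated base step sizes for QP 0 to 5; the function name is illustrative.

```python
# Quantizer step sizes for QP 0..5; the pattern repeats, doubled,
# for every further increase of 6 in QP.
_BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantizer step size for an H.264/AVC QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("qp must be in 0..51")
    return _BASE_QSTEP[qp % 6] * 2 ** (qp // 6)


if __name__ == "__main__":
    base_qp = 34
    fgs_qp = base_qp - 6  # one refinement layer at dQP = 6
    print(qstep(base_qp), qstep(fgs_qp), qstep(base_qp) / qstep(fgs_qp))
```

Each refinement at a QP difference of 6 therefore bisects the quantization interval of the layer below it.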

28
Illustration of AR-FGS
Zero Coef. Block
ITU, 2006 Jan. R202
29
Outline

Introduction
Scalabilities
Temporal Scalability
Spatial Scalability
SNR (Quality) Scalability
Other details
Simulation Results
Discussion

30
Other Details

Fidelity range extensions (FRExt)
Support 8x8 transform (High Profile)
Increases coding efficiency, especially for
high-resolution sources
Motion search block segment size down to 8x8 only
Weighted prediction
Scale the reference pictures for prediction
Find the weights at the encoder
Explicitly sent in the syntax
Implicitly derived from temporal distance (an
option for B-pictures)
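
A simplified sketch of deriving bi-prediction weights from temporal distance. It uses floating-point weights and picture order counts as plain integers; the actual H.264 derivation works in fixed-point arithmetic with clipping and fallback rules that are not reproduced here, and the function name is hypothetical.

```python
def implicit_bipred_weights(poc_cur, poc_ref0, poc_ref1):
    """Weights (w0, w1) for two references, from temporal distances.

    The reference closer in time to the current picture gets the larger
    weight. Falls back to equal weights when both references have the
    same picture order count.
    """
    td = poc_ref1 - poc_ref0  # distance between the two references
    if td == 0:
        return (0.5, 0.5)
    tb = poc_cur - poc_ref0   # distance from reference 0 to current
    w1 = tb / td
    return (1 - w1, w1)


def weighted_sample(p0, p1, weights):
    """Blend two co-located prediction samples and round."""
    w0, w1 = weights
    return int(round(w0 * p0 + w1 * p1))


if __name__ == "__main__":
    w = implicit_bipred_weights(poc_cur=1, poc_ref0=0, poc_ref1=4)
    print(w, weighted_sample(100, 140, w))
```

No weights are transmitted in this mode, since the decoder can repeat the same derivation from the picture order counts.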

31
Other Details

FGS motion
Progressive refinement slice (FGS slice) contains
motion data
Provide better prediction
Adaptive GOP Structure (AGS)
Divide a GOP into several sub GOPs by appropriate
mode decision
Decreasing the distance between two low-pass
pictures
0.62 dB gain
Details in ITU O018
Loss-aware rate-distortion optimization
The mode/parameter decision considers packet
loss
Details in ITU P057

32
JSVM

Written in C++
Accessible from CVS
Current version: 5.2
Last update: May 2, 2006

33
Simulation Results

Temporal Scalability
GOP sizes ITU, 2005 July, P014
Open loop MCTF vs. closed loop HB ITU, 2005
July, P059
Spatial
Given the same base layer
Examine the inter-layer prediction
SNR
CGS, DQP 2 or 6
FGS
Key pictures predict from base representation
FGS motion optimized at 1/3 bit rate
Is open-loop MCTF helpful? ITU, P059

34
GOP Sizes
35
GOP Sizes
36
Open Loop vs. Closed Loop
37
Open Loop vs. Closed Loop
38
Summary of Temporal Scalability Features

Hierarchical B pictures
B pictures give 0.5 to 1 dB (IPP -> IBBPBBP)
Hierarchical prediction gives an additional 0.5 to 1
dB
MCTF
Only CITY has 0.5 dB gain compared to
closed-loop HB
The gain is diminished by encoder MCTF
pre-filtering
Improvement comes from hierarchical prediction
structure

39
Simulation Results

Temporal Scalability
GOP sizes ITU, 2005 July, P014
Open loop MCTF vs. closed loop HB ITU, 2005
July, P059
Spatial
Given the same base layer, examine the inter-layer
prediction
Multiple-loop decoding vs. single-loop decoding
(constrained inter-layer prediction) ITU, O074
SNR
CGS, DQP 2 or 6
FGS
Key pictures predict from base representation
FGS motion optimized at 1/3 bit rate

40
Spatial Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
41
Spatial Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
42
Constrained Inter-Layer Prediction
[Figure: Foreman, Munich test points; curves labeled CIF@30, CIF@15, QCIF@15, QCIF@7.5]
43
Constrained Inter-Layer Prediction
[Figure: Crew, Munich test points; curves labeled 4CIF@60, 4CIF@30, CIF@30, QCIF@15]
44
Summary of Inter-layer prediction tools

Inter-layer prediction brings a 2 dB gain
Intra prediction: 1 dB
Motion prediction: 0.5 to 1 dB
Residual prediction: 0.5 dB
Constrained inter-layer intra prediction for
single-loop decoding
Provides low-complexity decoding
Costs < 0.5 dB loss

45
Simulation Results

Temporal Scalability
GOP sizes ITU, 2005 July, P014
Open loop MCTF vs. closed loop HB ITU, 2005
July, P059
Spatial
Given the same base layer, examine the inter-layer
prediction
Multiple-loop decoding vs. single-loop decoding
(constrained inter-layer prediction) ITU, O074
SNR
CGS, DQP 2 or 6
FGS
Key pictures predict from base representation
FGS motion optimized at 1/3 bit rate
Is open-loop MCTF helpful? ITU, P059

46
SNR Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
47
SNR Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
48
SNR Scalability
49
SNR Scalability
50
SNR Scalability
51
SNR Scalability

SNR scalability gives rate adaptation with 1 dB
quality loss (30% rate loss)
CGS with dQP 6 has the least loss in
rate-distortion performance
FGS with an appropriate choice of reference quality
gives near-CGS performance

52
Conclusion and Discussion

Differences from H.264/AVC
Layered pyramid prediction coding structure
Inter-layer prediction
Progressive quality refinement (FGS)
Possibilities for low complexity encoding
Use previous layer motion information for ME
Develop prediction of motion vector candidates
for hierarchical prediction structure
Utilize the Philips H.264 encoder on the TriMedia platform
Limits
Encoding needs multiple loops
Picture buffer size increases due to hierarchical
prediction
SVC is still under development

53
References

Julien Reichel, Heiko Schwarz, and Mathias Wien,
Joint Scalable Video Model JSVM-5, (R202) ITU-T
VCEG 18th meeting, January 2006
http://ip.hhi.de/imagecom_G1/savce/index.htm
Heiko Schwarz, Detlev Marpe, and Thomas Wiegand,
Comparison of MCTF and closed-loop hierarchical
B pictures, (P059) ITU-T VCEG 16th Meeting, July
2005
Heiko Schwarz, Tobias Hinz, Detlev Marpe, and
Thomas Wiegand, Constrained Inter-Layer
Prediction for Single-Loop Decoding in Spatial
Scalability, ICIP 2005
Gwang Hoon Park, Min Woo Park, Seyoon Jeong,
Kyuheon Kim, Jinwoo Hong, Improve SVC Coding
Efficiency by Adaptive GOP Structure (SVC CE2),
(O018) ITU-T VCEG 15th Meeting, April 2005
Yiliang Bao, Marta Karczewicz, Implementation of
close-loop coding in JSVM, (P057) ITU-T VCEG
16th Meeting, July 2005
Heiko Schwarz, Detlev Marpe, and Thomas Wiegand,
Hierarchical B pictures, (P014) ITU-T VCEG 16th
Meeting, July 2005
H. Schwarz, D. Marpe, T. Wiegand, Basic Concepts
for Supporting Spatial and SNR Scalability in the
Scalable H.264/MPEG4-AVC Extension, IWSSIP 05
Heiko Schwarz, Detlev Marpe, and Thomas Wiegand,
Further results on constrained inter-layer
prediction, (O074) ITU-T VCEG 15th Meeting,
April 2005
