Title: YaoChung Lin
1. Introduction to H.264/SVC: Differences, Possibilities, and Limits
- Yao-Chung Lin 
- Image, Video, and Multimedia Systems Group 
- Information Systems Laboratory 
- May 10, 2006
2. Scalable Video Coding
- A research topic for over 20 years 
- A single bitstream serves diverse clients 
- Display resolutions (QCIF, CIF, ..., HDTV) 
- Frame rates (15 Hz, 30 Hz, ...) 
- Bit rates/qualities 
- Standard under development 
- October 2003: MPEG Call for Proposals 
- March 2004: 14 proposals submitted and evaluated 
- 12 proposals are wavelet-based 
- 2 proposals are extensions of H.264/AVC 
- October 2004: MPEG selected the HHI proposal as the starting point for H.264/MPEG-4 AVC Amd. 1
- 2007: final draft to be released
3. Current Draft
- Based on H.264 main profile 
- MCTF/Hierarchical B-picture (MCTF w/o update 
 step) for temporal scalability
- Layered pyramid prediction structure for spatial 
 scalability
- Layered, sub bit-plane, and (run,level) coding 
 for SNR scalability
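The three scalability dimensions listed above can be pictured as tags on each coded packet, so that adapting the stream to a client reduces to dropping packets. The following is a minimal sketch under that assumption; the `Packet` class, its field names, and `extract_substream` are hypothetical illustrations and do not reproduce the draft's bitstream syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # All three fields are hypothetical layer tags, not draft syntax elements.
    spatial: int   # 0 = lowest resolution
    temporal: int  # 0 = key pictures only
    quality: int   # 0 = base quality
    payload: bytes = b""


def extract_substream(packets, max_spatial, max_temporal, max_quality):
    """Keep only the packets at or below the requested operating point."""
    return [
        p for p in packets
        if p.spatial <= max_spatial
        and p.temporal <= max_temporal
        and p.quality <= max_quality
    ]


# One scalable stream: 2 spatial layers x 4 temporal levels x 2 quality layers.
stream = [Packet(s, t, q) for s in range(2) for t in range(4) for q in range(2)]

# A low-end client: lowest resolution, two lowest temporal levels, base quality.
low_end = extract_substream(stream, max_spatial=0, max_temporal=1, max_quality=0)
```

No re-encoding is involved: the low-end client simply receives a subset of the same packets.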
4. H.264 Profiles
5. Overall Architecture of SVC
A two-layer example
6. Outline
- Introduction 
- Scalabilities 
- Temporal Scalability 
- Spatial Scalability 
- SNR (Quality) Scalability 
- Other Details 
- Simulation Results 
- Conclusion and Discussion
7. Temporal Scalability
- Group of pictures (GOP) 
- Concepts of motion compensated temporal filtering 
 (MCTF)
- Hierarchical B-picture
8. Group of Pictures
- Instantaneous decoding refresh (IDR) pictures 
- Intra coded 
- Also a key picture 
- A GOP with only one picture 
- Provide random access 
- Key pictures 
- The last picture in a GOP 
- Either intra coded 
- Or inter coded, using the previous key picture as reference 
- Provide the lowest temporal resolution 
- Non-key pictures 
- Hierarchically predicted B pictures 
- High-pass signal of MCTF 
- Provide the various temporal resolutions 
- Note: the number of reference frames cannot be greater than 16
9. Group of Pictures
An example of a group of pictures: dyadic, 4 temporal levels
ITU, January 2006, R202
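As a rough illustration of the dyadic example above, the temporal level of each picture can be derived from its position in the GOP. This is a sketch under assumptions: pictures are numbered in display order, a key picture sits at every multiple of the GOP size, and the function names are hypothetical.

```python
def temporal_level(index, gop_size):
    """Temporal level of a picture in a dyadic GOP (0 = key picture)."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    pos = index % gop_size
    if pos == 0:
        return 0  # key picture, lowest temporal resolution
    max_level = gop_size.bit_length() - 1        # log2(gop_size)
    trailing_zeros = (pos & -pos).bit_length() - 1
    return max_level - trailing_zeros


def pictures_up_to_level(num_pictures, gop_size, max_level):
    """Display-order indices that remain after dropping higher temporal levels."""
    return [i for i in range(num_pictures)
            if temporal_level(i, gop_size) <= max_level]


# GOP size 8 gives 4 temporal levels (0 to 3).
levels = [temporal_level(i, 8) for i in range(9)]
```

Dropping the highest level halves the frame rate; for a 30 Hz source, `pictures_up_to_level(..., 8, 2)` keeps every second picture, which corresponds to 15 Hz.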
10. Concepts of MCTF
- Based on the lifting scheme 
- Ensures perfect reconstruction 
- Even if non-linear operations are used 
- Open loop 
- Non-recursive temporal decomposition 
- Prevents drift error 
- Improves the efficiency of scalable coding, especially with FGS
11. Lifting Scheme
r: reference index; m: motion vector
Similar to P-picture
Similar to B-picture
ITU, January 2006, R202
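The perfect-reconstruction property of lifting can be checked with a small sketch. It assumes a Haar-like decomposition of one frame pair with no motion compensation; `predict` and `update` are hypothetical stand-ins for the motion-compensated operators, and the update step deliberately includes a non-linear rounding.

```python
def predict(ref):
    """Stand-in for motion-compensated prediction (zero motion assumed)."""
    return list(ref)


def update(high):
    """Stand-in for the update operator; the floor shift makes it non-linear."""
    return [x >> 1 for x in high]


def analyze(even, odd):
    """Lifting analysis: prediction step, then update step."""
    high = [o - p for o, p in zip(odd, predict(even))]
    low = [e + u for e, u in zip(even, update(high))]
    return low, high


def synthesize(low, high):
    """Lifting synthesis: undo the same steps in reverse order."""
    even = [l - u for l, u in zip(low, update(high))]
    odd = [h + p for h, p in zip(high, predict(even))]
    return even, odd


frame_a = [10, 20, 30, 255]
frame_b = [12, 19, 0, 3]
low, high = analyze(frame_a, frame_b)
rec_a, rec_b = synthesize(low, high)
```

Reconstruction is exact because synthesis subtracts exactly what analysis added, whatever the operators compute. Returning zeros from `update` gives the variant without an update step, in which the low-pass picture is simply the original frame.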
12. Motion Modes
- Variable block-size inter modes from 16x16 to 4x4 
- Intra modes: 16x16, 8x8, 4x4 
- Direct mode: 16x16, 8x8 
13. Decomposition Structure
HHI webpage: Scalable Extension of H.264/AVC
14. Decomposition Structure
- A dyadic decomposition structure gives a delay of 2^N - 1 frames, where N = number of temporal decomposition levels
- Update steps do not cross the GOP border
HHI webpage: Scalable Extension of H.264/AVC
15. Low Delay Support
ITU, January 2006, R202
16. Removal of update step
- The update step introduces high complexity at the decoder 
- Derivation of the motion information for the update step
- Smaller block sizes 
- 9-bit residual motion compensation 
- It provides an insignificant coding-efficiency gain over closed-loop coding with hierarchical B pictures (HB)
- Rate-distortion performance of closed-loop coding with HB is higher than or similar to that of MCTF-based coding for all test sequences
- Except the City sequence, where MCTF has a 0.5 dB gain 
- After temporal pre-filtering of the sequence, the MCTF gain becomes insignificant
ITU, July 2005, P059
17. Two Closed Loops
FGS Layer
ITU, July 2005, P059
18. Spatial Scalability
- Layered pyramid prediction structure 
- Inter-layer intra texture prediction 
- Inter-layer motion prediction 
- Inter-layer residual prediction 
- Extended Spatial Scalability 
- Cropping 
- Generic upsampling (non-dyadic spatial resampling)
19. Layered Pyramid Prediction Structure
- Same concepts as used in H.262/MPEG-2, H.263, and MPEG-4, with additional inter-layer prediction
- Each spatial resolution is coded as a new layer with texture and motion refinement
- Same mechanism for coarse grain SNR scalability (spatial downsampling ratio = 1)
20. Inheritance of modes
Previous Spatial Layer
Current Layer
For spatial scaling ratio = 2
21. Inter-layer Intra Texture Prediction
- Unrestricted inter-layer intra texture prediction 
- Decode and predict from all lower layers in the bitstream
- Not supported in the standard 
- Constrained inter-layer intra texture prediction 
- For MBs in non-key pictures 
- Allowed only when the co-located block in the previous layer is intra coded
- Not supported in the standard 
- Constrained inter-layer intra texture prediction for single-loop decoding
- For MBs in all pictures (including key pictures) 
- Allowed only when the co-located block in the previous layer is intra coded
- Allows decoding with motion compensation in the current layer only
- Supported by the current SVC draft
22. Generation of Inter-layer Texture Prediction
- Direct deblocking filtering 
- 4-sample border extension 
- Interpolation 
- 2x: half-pel interpolation filter of AVC 
- Otherwise: quarter-pel interpolation filter
Schwarz, ICIP 2005
23. Inter-layer Motion Prediction
- Intra base layer 
- If previous layer is inter, use scaled 
 partitioning and motion vectors of base layer
- If previous layer is intra, predict from previous 
 layer
- Quarter-pel refinement 
- Only for reduced spatial resolution 
- Refine the scaled motion vector of the previous layer by -1, 0, or +1 in quarter-sample precision
- Send the refinement 
- None 
- Motion vector prediction from neighbor blocks 
- Motion vector prediction from previous layer
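The scale-and-refine idea above can be sketched as follows. The sketch assumes a spatial ratio of 2, motion vectors stored as `(x, y)` in quarter-sample units, and a refinement applied independently to each component; the function names and the toy cost function are hypothetical.

```python
def scale_mv(mv, ratio=2):
    """Scale a previous-layer motion vector to the current layer's resolution."""
    return (mv[0] * ratio, mv[1] * ratio)


def refinement_candidates(scaled_mv):
    """The 9 vectors reachable by adding -1, 0, or +1 to each component."""
    x, y = scaled_mv
    return [(x + dx, y + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def best_refinement(scaled_mv, cost):
    """Pick the candidate with the lowest encoder-side cost."""
    return min(refinement_candidates(scaled_mv), key=cost)


base_mv = (3, -5)                 # previous layer, quarter-sample units
scaled = scale_mv(base_mv)        # (6, -10) in the current layer

# Toy cost: distance to a pretend "true" motion of (7, -10).
true_mv = (7, -10)
chosen = best_refinement(
    scaled, lambda mv: abs(mv[0] - true_mv[0]) + abs(mv[1] - true_mv[1])
)
refinement = (chosen[0] - scaled[0], chosen[1] - scaled[1])
```

Only the small `refinement` offset would need to be signalled, rather than a full motion vector difference.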
24. Inter-layer Residual Prediction
- Predict the residual from the previous layer's residual 
- Upsample the residual 
- 2x: separable bi-linear filter (1, 1/2) 
- Otherwise: quarter-pel interpolation 
- Helpful when the motion information is unchanged or only slightly changed from the previous layer
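A separable 2x bilinear upsampling of a residual block can be sketched as below. This is an illustration only: the sample phase, border handling (edge replication), and rounding are assumptions and are not taken from the draft.

```python
def upsample_1d(samples):
    """Double the length: keep each sample and insert the rounded average
    of each sample and its right neighbour (last sample is replicated)."""
    out = []
    last = len(samples) - 1
    for i, value in enumerate(samples):
        right = samples[min(i + 1, last)]
        out.append(value)
        out.append((value + right + 1) >> 1)
    return out


def upsample_2d(block):
    """Separable upsampling: filter the rows first, then the columns."""
    rows = [upsample_1d(row) for row in block]
    cols = [upsample_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


base_residual = [[0, 4],
                 [8, 12]]
predicted_residual = upsample_2d(base_residual)
```

The enhancement layer would then code only the difference between its own residual and `predicted_residual`.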
25. SNR Scalability
- Coarse grain scalability (CGS) 
- Layered coding 
- The same mechanism as spatial scalability 
- Re-quantize the coefficients with a finer step size 
- Fine grain scalability (FGS) 
- Sub-bitplane arithmetic coding 
- Re-quantize the coefficients with a finer step size 
- Provides continuous refinement from a quality base layer
26. Coarse Grain Scalability
- Same mechanism as spatial scalability 
- Except no upsampling 
- Provides discrete quality refinement 
- Close to single-layer RD performance if dQP > 6 
27. Fine Grain SNR Scalability
- Represents the residual between the original prediction error and the base layer representation
- Quantized with a bisected step size (dQP = 6) 
- Coded in the transform domain, so the decoder needs only a single inverse transform
- Adaptive references for FGS (AR-FGS) provide leaky prediction, attenuating drift error
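The link between dQP = 6 and a halved step size follows from the H.264/AVC quantizer design, in which the step size doubles for every increase of 6 in QP. A small sketch of that relation, using the commonly tabulated base step sizes for QP 0 to 5 (the function name is hypothetical):

```python
# Quantizer step sizes for QP 0..5 in H.264/AVC; higher QPs repeat this
# pattern with the step size doubled every 6 QP values.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return BASE_QSTEP[qp % 6] * 2 ** (qp // 6)


base_layer_qp = 34
refinement_qp = base_layer_qp - 6   # dQP = 6
ratio = qstep(base_layer_qp) / qstep(refinement_qp)
```

Each refinement with dQP = 6 therefore bisects the quantization interval of the layer below it.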
28. Illustration of AR-FGS
Zero Coef. Block
ITU, January 2006, R202
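The effect of leaky prediction on drift can be illustrated with a generic model. This is not the AR-FGS procedure of the draft: the blending rule, the leak factor `alpha`, and both function names are assumptions chosen only to show why drift decays.

```python
def leaky_reference(base, enhancement, alpha):
    """Blend base and enhancement reconstructions into one reference.
    alpha = 0 uses the base only (no drift); alpha = 1 uses the enhancement
    only (best prediction, but any mismatch propagates in full)."""
    return [alpha * e + (1 - alpha) * b for b, e in zip(base, enhancement)]


def drift_after(num_predictions, initial_error, alpha):
    """Mismatch left after a chain of predictions, if the decoder's
    enhancement reference was off by initial_error at the start."""
    return initial_error * alpha ** num_predictions


reference = leaky_reference(base=[100], enhancement=[110], alpha=0.5)
remaining = drift_after(num_predictions=3, initial_error=8.0, alpha=0.5)
```

With `alpha` below 1, an error introduced by truncating the FGS layer shrinks geometrically instead of persisting until the next key picture.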
29. Outline
- Introduction 
- Scalabilities 
- Temporal Scalability 
- Spatial Scalability 
- SNR (Quality) Scalability 
- Other details 
- Simulation Results 
- Discussion
30. Other Details
- Fidelity range extensions (FRExt) 
- Support the 8x8 transform (High profile) 
- Increase coding efficiency, especially for high-resolution sources
- Motion search block partition sizes down to 8x8 only 
- Weighted prediction 
- Scale the reference pictures for prediction 
- Find the weights at encoder 
- Explicitly send in syntax 
- Implicitly derive from temporal distance (an 
 option for B-picture)
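Deriving weights implicitly from temporal distance can be sketched in simplified form. The version below uses floating-point weights and picture positions in display order; the fixed-point arithmetic and clipping of H.264/AVC are intentionally left out, and the function names are hypothetical.

```python
def implicit_weights(cur, ref0, ref1):
    """Weights for two references, inversely related to temporal distance.
    Arguments are display-order positions of the pictures."""
    span = ref1 - ref0
    if span == 0:
        return 0.5, 0.5  # fall back to plain averaging
    w1 = (cur - ref0) / span
    return 1.0 - w1, w1


def bipredict(pred0, pred1, w0, w1):
    """Weighted bi-prediction of a block from two reference predictions."""
    return [round(w0 * a + w1 * b) for a, b in zip(pred0, pred1)]


# Current picture at position 2, references at positions 0 and 8:
# the closer reference (position 0) receives the larger weight.
w0, w1 = implicit_weights(cur=2, ref0=0, ref1=8)
block = bipredict([100], [200], w0, w1)
```

Because both encoder and decoder can compute these weights from picture positions, nothing has to be sent in the bitstream, unlike the explicit mode.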
31. Other Details
- FGS motion 
- Progressive refinement slices (FGS slices) contain motion data
- Provides better prediction 
- Adaptive GOP structure (AGS) 
- Divides a GOP into several sub-GOPs by appropriate mode decision
- Decreases the distance between two low-pass pictures
- 0.62 dB gain 
- Details in ITU O018 
- Loss-aware rate-distortion optimization 
- The mode/parameter decision takes packet loss into account
- Details in ITU P057 
32. JSVM
- Written in C++ 
- Accessible via CVS 
- Current version: 5.2 
- Last update: May 2, 2006 
33. Simulation Results
- Temporal scalability 
- GOP sizes (ITU, July 2005, P014) 
- Open-loop MCTF vs. closed-loop HB (ITU, July 2005, P059)
- Spatial 
- Given the same base layer 
- Examine the inter-layer prediction 
- SNR 
- CGS, dQP = 2 or 6 
- FGS 
- Key pictures predicted from the base representation 
- FGS motion optimized at 1/3 bit rate 
- Is open-loop MCTF helpful? (ITU, P059)
34. GOP Sizes
35. GOP Sizes
36. Open Loop vs. Closed Loop
37. Open Loop vs. Closed Loop
38. Summary of Temporal Scalability Features
- Hierarchical B pictures 
- B pictures give 0.5-1 dB (IPP -> IBBPBBP) 
- Hierarchical prediction gives an additional 0.5-1 dB
- MCTF 
- Only CITY has a 0.5 dB gain compared to closed-loop HB
- The gain is diminished by MCTF pre-filtering at the encoder
- The improvement comes from the hierarchical prediction structure
39. Simulation Results
- Temporal scalability 
- GOP sizes (ITU, July 2005, P014) 
- Open-loop MCTF vs. closed-loop HB (ITU, July 2005, P059)
- Spatial 
- Given the same base layer, examine the inter-layer prediction
- Multiple-loop decoding vs. single-loop decoding (constrained inter-layer prediction) (ITU, O074)
- SNR 
- CGS, dQP = 2 or 6 
- FGS 
- Key pictures predicted from the base representation 
- FGS motion optimized at 1/3 bit rate
40. Spatial Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
41. Spatial Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
42. Constrained Inter-Layer Prediction
Figure labels: CIF@30, CIF@15, QCIF@15, QCIF@7.5, CIF@15
Foreman, Munich test points
43. Constrained Inter-Layer Prediction
Figure labels: 4CIF@60, CIF@30, QCIF@15, 4CIF@30, CIF@30, QCIF@15
Crew, Munich test points
44. Summary of Inter-layer Prediction Tools
- Inter-layer prediction brings a 2 dB gain 
- Intra prediction: 1 dB 
- Motion prediction: 0.5-1 dB 
- Residual prediction: 0.5 dB 
- Constrained inter-layer intra prediction for single-loop decoding
- Provides low-complexity decoding 
- Costs < 0.5 dB 
45. Simulation Results
- Temporal scalability 
- GOP sizes (ITU, July 2005, P014) 
- Open-loop MCTF vs. closed-loop HB (ITU, July 2005, P059)
- Spatial 
- Given the same base layer, examine the inter-layer prediction
- Multiple-loop decoding vs. single-loop decoding (constrained inter-layer prediction) (ITU, O074)
- SNR 
- CGS, dQP = 2 or 6 
- FGS 
- Key pictures predicted from the base representation 
- FGS motion optimized at 1/3 bit rate 
- Is open-loop MCTF helpful? (ITU, P059)
46. SNR Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
47. SNR Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
48. SNR Scalability
49. SNR Scalability
50. SNR Scalability
51. SNR Scalability
- SNR scalability gives rate adaptation at a cost of 1 dB in quality (30% in rate)
- CGS with dQP = 6 has the least loss in rate-distortion performance
- FGS with an appropriate choice of reference quality gives near-CGS performance
52. Conclusion and Discussion
- Differences from H.264/AVC 
- Layered pyramid prediction coding structure 
- Inter-layer prediction 
- Progressive quality refinement (FGS) 
- Possibilities for low-complexity encoding 
- Use previous-layer motion information for ME 
- Develop prediction of motion vector candidates for the hierarchical prediction structure
- Utilize the Philips H.264 encoder on the TriMedia platform 
- Limits 
- Encoding needs multiple loops 
- Picture buffer size increases due to hierarchical prediction
- SVC is still under development
53. References
- Julien Reichel, Heiko Schwarz, and Mathias Wien, "Joint Scalable Video Model JSVM-5," (R202) ITU-T VCEG 18th Meeting, January 2006
- http://ip.hhi.de/imagecom_G1/savce/index.htm
- Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, "Comparison of MCTF and closed-loop hierarchical B pictures," (P059) ITU-T VCEG 16th Meeting, July 2005
- Heiko Schwarz, Tobias Hinz, Detlev Marpe, and Thomas Wiegand, "Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability," ICIP 2005
- Gwang Hoon Park, Min Woo Park, Seyoon Jeong, Kyuheon Kim, and Jinwoo Hong, "Improve SVC Coding Efficiency by Adaptive GOP Structure (SVC CE2)," (O018) ITU-T VCEG 15th Meeting, April 2005
- Yiliang Bao and Marta Karczewicz, "Implementation of close-loop coding in JSVM," (P057) ITU-T VCEG 16th Meeting, July 2005
- Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, "Hierarchical B pictures," (P014) ITU-T VCEG 16th Meeting, July 2005
- Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, "Basic Concepts for Supporting Spatial and SNR Scalability in the Scalable H.264/MPEG-AVC Extension," IWSSIP 05
- Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, "Further results on constrained inter-layer prediction," (O074) ITU-T VCEG 15th Meeting, April 2005