YaoChung Lin - PowerPoint PPT Presentation

1
Introduction to H.264/SVC: Differences,
Possibilities, and Limits
  • Yao-Chung Lin
  • Image, Video, and Multimedia Systems Group
  • Information Systems Laboratory
  • May 10, 2006

2
Scalable Video Coding
  • A research topic for over 20 years
  • A single bitstream serves diverse clients
    • Display resolutions (QCIF, CIF, ..., HDTV)
    • Frame rates (15 Hz, 30 Hz, ...)
    • Bit rates/qualities
  • Standard under development
    • October 2003: MPEG Call for Proposals
    • March 2004: 14 proposals submitted and evaluated
      • 12 proposals are wavelet-based
      • 2 proposals are extensions of H.264/AVC
    • October 2004: MPEG selected the HHI proposal as the
      starting point for H.264/MPEG-4 AVC Amd.1
    • 2007: final draft to be released

3
Current Draft
  • Based on H.264 main profile
  • MCTF/Hierarchical B-picture (MCTF w/o update
    step) for temporal scalability
  • Layered pyramid prediction structure for spatial
    scalability
  • Layered, sub bit-plane, and (run,level) coding
    for SNR scalability

4
H.264 Profiles
5
Overall Architecture of SVC
A two layer example
6
Outline
  • Introduction
  • Scalabilities
  • Temporal Scalability
  • Spatial Scalability
  • SNR (Quality) Scalability
  • Other Details
  • Simulation Results
  • Conclusion and Discussion

7
Temporal Scalability
  • Group of pictures (GOP)
  • Concepts of motion compensated temporal filtering
    (MCTF)
  • Hierarchical B-picture

8
Group of Pictures
  • Instantaneous decoding refresh (IDR) pictures
    • Intra-coded picture
    • Also a key picture
    • A GOP with only one picture
    • Provide random access
  • Key pictures
    • The last picture in a GOP
    • Intra coded
    • Inter coded from the previous key picture
    • Provide the lowest temporal resolution
  • Non-key pictures
    • Hierarchically predicted B pictures
    • High-pass signal of MCTF
    • Provide the various temporal resolutions
  • Note: the number of reference frames cannot be greater
    than 16

9
Group of Pictures
An example of a group of pictures: dyadic, 4
temporal levels
ITU, 2006 January, R202
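
The dyadic structure can be illustrated with a small Python sketch. It assumes a GOP of 8 pictures (which yields the 4 temporal levels 0..3), numbers the pictures 1..8 in display order with the key picture last, and uses hypothetical helper names; it is not taken from the JSVM software.

```python
def temporal_level(index, gop_size):
    """Temporal level of picture `index` (1..gop_size) in a dyadic GOP.

    Level 0 is the key picture (the last picture of the GOP); decoding one
    more level doubles the frame rate.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    if not 1 <= index <= gop_size:
        raise ValueError("index must lie in 1..gop_size")
    stages = gop_size.bit_length() - 1            # log2(gop_size)
    # Trailing zero bits of the index: more zeros means a coarser level.
    trailing_zeros = (index & -index).bit_length() - 1
    return stages - trailing_zeros


def frame_rates(full_rate_hz, gop_size):
    """Frame rate obtained when only levels 0..k are decoded, for each k."""
    stages = gop_size.bit_length() - 1
    return [full_rate_hz / (1 << (stages - k)) for k in range(stages + 1)]


gop = 8  # assumed GOP size
print([temporal_level(i, gop) for i in range(1, gop + 1)])
print(frame_rates(30.0, gop))
```

Dropping the pictures of the highest remaining level halves the frame rate, which is how one bitstream can serve, for example, 30 Hz and 15 Hz clients.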
10
Concepts of MCTF
  • Based on the lifting scheme
    • Ensures perfect reconstruction
    • Even if non-linear operations are used
  • Open loop
    • Non-recursive temporal decomposition
    • Prevents drift error
    • Improves the efficiency of scalable coding, especially
      with FGS
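
Why lifting gives perfect reconstruction even with non-linear operations can be seen in a minimal Python sketch. It assumes a Haar-like filter pair without motion compensation (real MCTF applies motion-compensated prediction and update operators) and uses integer truncation as the non-linear operation; the function names are hypothetical.

```python
def lifting_analysis(even, odd):
    """Split a frame pair into low-pass and high-pass signals."""
    # Prediction step: high-pass = odd frame minus its prediction from the even frame.
    high = [o - e for e, o in zip(even, odd)]
    # Update step: low-pass = even frame plus a non-linear (floored) half of the high-pass.
    low = [e + (h >> 1) for e, h in zip(even, high)]
    return low, high


def lifting_synthesis(low, high):
    """Invert the analysis by undoing the steps in reverse order."""
    even = [l - (h >> 1) for l, h in zip(low, high)]   # undo the update step
    odd = [h + e for e, h in zip(even, high)]          # undo the prediction step
    return even, odd


even_frame = [10, 20, 33]
odd_frame = [13, 18, 40]
low, high = lifting_analysis(even_frame, odd_frame)
assert lifting_synthesis(low, high) == (even_frame, odd_frame)
```

Each synthesis step subtracts exactly what the matching analysis step added, so the rounding inside the operators cancels regardless of its form.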

11
Lifting Scheme
r: reference index, m: motion vector
Similar to P-picture
Similar to B-picture
ITU, 2006 January, R202
12
Motion Modes
  • Variable block-size inter modes from 16x16 to 4x4
  • Intra modes: 16x16, 8x8, 4x4
  • Direct modes: 16x16, 8x8

13
Decomposition Structure
HHI Webpage Scalable Extension of H.264/AVC
14
Decomposition Structure
  • A dyadic decomposition structure has a delay of 2^N - 1
    frames, where N is the number of temporal decomposition levels
  • Update steps do not cross the GOP border
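
The delay figure can be checked with a short Python sketch. It assumes a GOP of 2^N pictures numbered 1..2^N in display order, with the key picture coded first and each further stage coding the pictures midway between already coded ones; this ordering and the function names are illustrative assumptions, not the JSVM implementation.

```python
def hierarchical_coding_order(levels):
    """Display indices (1..2**levels) of one GOP in an assumed coding order."""
    gop_size = 1 << levels
    order = [gop_size]                 # the key picture is coded first
    step = gop_size
    while step > 1:
        half = step // 2
        # Pictures midway between the pictures coded so far.
        order.extend(range(half, gop_size, step))
        step = half
    return order


def structural_delay(levels):
    """Frames that must arrive after display picture 1 before it can be coded."""
    order = hierarchical_coding_order(levels)
    first = order.index(1)
    # Picture 1 waits for the latest-displayed picture coded before it.
    return max(order[:first + 1]) - 1


print(hierarchical_coding_order(3))
print([structural_delay(n) for n in range(4)])
```

For N = 3 the first picture in display order depends on picture 8, so it waits for 7 = 2^3 - 1 later frames.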

HHI Webpage Scalable Extension of H.264/AVC
15
Low Delay Support
ITU, 2006 January, R202
16
Removal of update step
  • The update step introduces high complexity at the decoder
    • Derivation of the motion information for the update
      step
    • Smaller block sizes
    • 9-bit residual motion compensation
  • It provides an insignificant coding-efficiency gain over
    closed-loop coding with hierarchical B pictures
    (HB)
    • Rate-distortion performance of closed-loop coding
      with HB is higher than or similar to that of
      MCTF-based coding for all test sequences
    • Except the City sequence, where MCTF has a 0.5 dB gain
    • After temporal pre-filtering of the sequence, the
      MCTF gain becomes insignificant

ITU, 2005 July, P059
17
Two Closed Loops
FGS Layer
ITU, 2005 July, P059
18
Spatial Scalability
  • Layered pyramid prediction structure
  • Inter-layer intra texture prediction
  • Inter-layer motion prediction
  • Inter-layer residual prediction
  • Extended Spatial Scalability
  • Cropping
  • Generic upsampling (non-dyadic spatial resampling)

19
Layered Pyramid Prediction Structure
  • Same concept as used in H.262/MPEG-2, H.263, and MPEG-4,
    with additional inter-layer prediction
  • Each spatial resolution is coded as a new layer
    with texture and motion refinement
  • Same mechanism for coarse grain SNR scalability
    (spatial downsampling ratio = 1)

20
Inheritance of modes
Previous Spatial Layer
Current Layer
For spatial scaling ratio 2
21
Inter-layer Intra Texture Prediction
  • Unrestricted inter-layer intra texture prediction
    • Decode and predict from all lower layers in the
      bitstream
    • Not supported in the standard
  • Constrained inter-layer intra texture prediction
    • For MBs in non-key pictures
    • The co-located block in the previous layer is
      intra coded
    • Not supported in the standard
  • Constrained inter-layer intra texture prediction
    for single-loop decoding
    • For MBs in all pictures (including key pictures)
    • The co-located block in the previous layer is
      intra coded
    • Requires decoding (motion compensation) of only the
      current layer
    • Supported by the current SVC draft

22
Generation of Inter-layer Texture Prediction
  • Direct deblocking filtering
  • 4-sample border extension
  • Interpolation
    • 2x: half-pel interpolation filter of AVC
    • Otherwise: quarter-pel interpolation filter
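
The 2x case can be sketched in one dimension in Python. The sketch uses the 6-tap half-sample luma filter of H.264/AVC, (1, -5, 20, 20, -5, 1)/32, and a 4-sample border extension by replication. The 8-bit clipping, the sample phase alignment, and the function names are assumptions; the draft's exact 2-D procedure is not reproduced.

```python
HALF_PEL_TAPS = (1, -5, 20, 20, -5, 1)  # H.264/AVC half-sample luma filter, taps sum to 32


def extend_border(samples, pad=4):
    """Replicate the first and last sample `pad` times."""
    return [samples[0]] * pad + list(samples) + [samples[-1]] * pad


def upsample_2x(samples, pad=4):
    """Double the resolution of one row of 8-bit base-layer samples."""
    ext = extend_border(samples, pad)
    out = []
    for i in range(len(samples)):
        c = i + pad                      # position of samples[i] inside ext
        out.append(ext[c])               # integer positions are copied
        window = ext[c - 2:c + 4]        # six samples around the half position
        acc = sum(t * s for t, s in zip(HALF_PEL_TAPS, window))
        out.append(min(255, max(0, (acc + 16) >> 5)))  # round, divide by 32, clip
    return out


print(upsample_2x([0, 32, 64, 96]))
```

The border extension is what lets the 6-tap window be applied at the edge of an intra-coded block without reading samples outside it.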

Schwarz, ICIP 2005
23
Inter-layer Motion Prediction
  • Intra base layer
    • If the previous layer is inter, use the scaled
      partitioning and motion vectors of the base layer
    • If the previous layer is intra, predict from the previous
      layer
  • Quarter-pel refinement
    • Only for reduced spatial resolution
    • Refine the scaled motion vector of the previous layer
      by +1, 0, or -1 in quarter-sample precision
    • Send the refinement
  • None
    • Motion vector prediction from neighboring blocks
    • Motion vector prediction from the previous layer
24
Inter-layer Residual Prediction
  • Predict the residual from the previous-layer residual
  • Upsample the residual
    • 2x: separable bilinear filter [1, 1]/2
    • Otherwise: quarter-pel interpolation
  • Helpful when the motion information is unchanged
    or only slightly changed from the previous layer
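
A minimal Python sketch of 2x separable bilinear upsampling of a residual block is given below. It assumes integer residuals, copies the original samples to the even positions, averages neighbours with rounding at the odd positions, and replicates the last sample at the border; the draft's exact phase and border handling may differ, and the function names are hypothetical.

```python
def upsample_1d(samples):
    """Double the length of one row using the [1, 1]/2 filter."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[min(i + 1, len(samples) - 1)]  # replicate the last sample
        out.append(s)                                # original sample
        out.append((s + nxt + 1) >> 1)               # rounded average of neighbours
    return out


def upsample_2d(block):
    """Apply the 1-D filter to the rows, then to the columns (separable)."""
    rows = [upsample_1d(r) for r in block]
    cols = [upsample_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


base_residual = [[0, 4], [8, 12]]
for row in upsample_2d(base_residual):
    print(row)
```

The current layer would then code only the difference between its own residual and this upsampled prediction.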

25
SNR Scalability
  • Coarse grain scalability (CGS)
    • Layered coding
    • The same mechanism as spatial scalability
    • Re-quantize the coefficients with a finer step
  • Fine grain scalability (FGS)
    • Sub-bitplane arithmetic coding
    • Re-quantize the coefficients with a finer step
    • Provides continuous refinement from a quality
      base layer

26
Coarse Grain Scalability
  • Same mechanism as spatial scalability
  • Except no upsampling
  • Provide discrete quality refinement
  • Close to single-layer RD performance if dQP >= 6

27
Fine Grain SNR Scalability
  • Represents the residual between the original
    prediction error and its base-layer representation
  • Quantized with a halved step size (dQP = 6)
  • Coded in the transform domain, so the decoder needs a
    single inverse transform
  • Adaptive references for FGS (AR-FGS) provide
    leaky prediction, attenuating drift error
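
The link between dQP = 6 and a halved step size follows from the H.264/AVC quantizer design, in which the step size doubles for every increase of 6 in QP. A small Python sketch using the commonly tabulated base step sizes (the function name is hypothetical):

```python
# Quantizer step sizes for QP 0..5; each further 6 QP units doubles the step.
_QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for an H.264/AVC QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must lie in 0..51")
    return _QSTEP_BASE[qp % 6] * (1 << (qp // 6))


base_qp = 28
enhancement_qp = base_qp - 6          # dQP = 6
print(qstep(base_qp), qstep(enhancement_qp))
assert qstep(base_qp) == 2 * qstep(enhancement_qp)
```

Each refinement layer with dQP = 6 therefore bisects the quantization interval of the layer below it.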

28
Illustration of AR-FGS
Zero Coef. Block
ITU, 2006 January, R202
29
Outline
  • Introduction
  • Scalabilities
  • Temporal Scalability
  • Spatial Scalability
  • SNR (Quality) Scalability
  • Other details
  • Simulation Results
  • Discussion

30
Other Details
  • Fidelity Range Extensions (FRExt)
    • Support for the 8x8 transform (High profile)
    • Increases coding efficiency, especially for
      high-resolution sources
    • Motion search block partition size down to 8x8 only
  • Weighted prediction
    • Scale the reference pictures for prediction
    • Find the weights at the encoder
    • Explicitly send them in the syntax
    • Implicitly derive them from temporal distance (an
      option for B pictures)
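
The idea behind implicit weights can be sketched in Python. This is a simplified floating-point version that assumes the weights depend only on the display-order positions of the current picture and its two references; the integer arithmetic and clipping of the standard are not reproduced, and the names are hypothetical.

```python
def implicit_weights(pos_cur, pos_ref0, pos_ref1):
    """Bi-prediction weights (w0, w1) from temporal distances only."""
    td = pos_ref1 - pos_ref0      # distance between the two references
    tb = pos_cur - pos_ref0       # distance from reference 0 to the current picture
    if td == 0:
        return 0.5, 0.5           # fall back to plain averaging
    w1 = tb / td
    return 1.0 - w1, w1           # the temporally closer reference gets more weight


def weighted_biprediction(ref0, ref1, w0, w1):
    """Blend two reference blocks sample by sample."""
    return [round(w0 * a + w1 * b) for a, b in zip(ref0, ref1)]


w0, w1 = implicit_weights(pos_cur=2, pos_ref0=0, pos_ref1=8)
print(w0, w1)
print(weighted_biprediction([100, 200], [20, 40], w0, w1))
```

Because the decoder can derive the same weights from the picture positions, nothing extra needs to be sent in this mode.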

31
Other Details
  • FGS motion
    • Progressive refinement slice (FGS slice) contains
      motion data
    • Provides better prediction
  • Adaptive GOP Structure (AGS)
    • Divides a GOP into several sub-GOPs by appropriate
      mode decision
    • Decreases the distance between two low-pass
      pictures
    • 0.62 dB gain
    • Details in ITU, O018
  • Loss-aware rate-distortion optimization
    • The mode/parameter decisions take packet loss into
      account
    • Details in ITU, P057

32
JSVM
  • Written in C++
  • Accessible via CVS
  • Current version: 5.2
  • Last update: May 2, 2006

33
Simulation Results
  • Temporal Scalability
    • GOP sizes (ITU, 2005 July, P014)
    • Open-loop MCTF vs. closed-loop HB (ITU, 2005
      July, P059)
  • Spatial
    • Given the same base layer
    • Examine the inter-layer prediction
  • SNR
    • CGS, dQP = 2 or 6
    • FGS
      • Key pictures predict from the base representation
      • FGS motion optimized at 1/3 bit rate
      • Is open-loop MCTF helpful? (ITU, P059)

34
GOP Sizes
35
GOP Sizes
36
Open Loop vs. Closed Loop
37
Open Loop vs. Closed Loop
38
Summary of Temporal Scalability Features
  • Hierarchical B pictures
    • B pictures give 0.5-1 dB (IPP -> IBBPBBP)
    • Hierarchical prediction gives an additional 0.5-1
      dB
  • MCTF
    • Only CITY has a 0.5 dB gain compared to
      closed-loop HB
    • The gain is diminished by encoder MCTF
      pre-filtering
  • Improvement comes from the hierarchical prediction
    structure

39
Simulation Results
  • Temporal Scalability
    • GOP sizes (ITU, 2005 July, P014)
    • Open-loop MCTF vs. closed-loop HB (ITU, 2005
      July, P059)
  • Spatial
    • Given the same base layer, examine the inter-layer
      prediction
    • Multiple-loop decoding vs. single-loop decoding
      (constrained inter-layer prediction) (ITU, O074)
  • SNR
    • CGS, dQP = 2 or 6
    • FGS
      • Key pictures predict from the base representation
      • FGS motion optimized at 1/3 bit rate

40
Spatial Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
41
Spatial Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
42
Constrained Inter-Layer Prediction
CIF@30, CIF@15, QCIF@15, QCIF@7.5, CIF@15
Foreman, Munich test points
43
Constrained Inter-Layer Prediction
4CIF@60, CIF@30, QCIF@15, 4CIF@30, CIF@30, QCIF@15
Crew, Munich test points
44
Summary of Inter-layer Prediction Tools
  • Inter-layer prediction brings a 2 dB gain
    • Intra prediction: 1 dB
    • Motion prediction: 0.5-1 dB
    • Residual prediction: 0.5 dB
  • Constrained inter-layer intra prediction for
    single layer decoding
    • Provides low-complexity decoding
    • Costs < 0.5 dB loss

45
Simulation Results
  • Temporal Scalability
    • GOP sizes (ITU, 2005 July, P014)
    • Open-loop MCTF vs. closed-loop HB (ITU, 2005
      July, P059)
  • Spatial
    • Given the same base layer, examine the inter-layer
      prediction
    • Multiple-loop decoding vs. single-loop decoding
      (constrained inter-layer prediction) (ITU, O074)
  • SNR
    • CGS, dQP = 2 or 6
    • FGS
      • Key pictures predict from the base representation
      • FGS motion optimized at 1/3 bit rate
      • Is open-loop MCTF helpful? (ITU, P059)

46
SNR Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
47
SNR Scalability
Schwarz, Marpe, and Wiegand, IWSSIP 05
48
SNR Scalability
49
SNR Scalability
50
SNR Scalability
51
SNR Scalability
  • SNR scalability gives rate adaptation with 1 dB
    quality loss (30% rate loss)
  • CGS with dQP = 6 has the least loss in
    rate-distortion performance
  • FGS with an appropriate choice of reference quality
    gives near-CGS performance

52
Conclusion and Discussion
  • Differences from H.264/AVC
    • Layered pyramid prediction coding structure
    • Inter-layer prediction
    • Progressive quality refinement (FGS)
  • Possibilities for low-complexity encoding
    • Use previous-layer motion information for ME
    • Develop prediction of motion vector candidates
      for the hierarchical prediction structure
    • Utilize the Philips H.264 encoder on the TriMedia platform
  • Limits
    • Encoding needs multiple loops
    • Picture buffer size increases due to hierarchical
      prediction
    • SVC is still under development

53
References
  • Julien Reichel, Heiko Schwarz, and Mathias Wien,
    Joint Scalable Video Model JSVM-5, (R202) ITU-T
    VCEG 18th meeting, January 2006
  • http://ip.hhi.de/imagecom_G1/savce/index.htm
  • Heiko Schwarz, Detlev Marpe, and Thomas Wiegand,
    Comparison of MCTF and closed-loop hierarchical
    B pictures, (P059) ITU-T VCEG 16th Meeting, July
    2005
  • Heiko Schwarz, Tobias Hinz, Detlev Marpe, and
    Thomas Wiegand, Constrained Inter-Layer
    Prediction for Single-Loop Decoding in Spatial
    Scalability, ICIP 2005
  • Gwang Hoon Park, Min Woo Park, Seyoon Jeong,
    Kyuheon Kim, Jinwoo Hong, Improve SVC Coding
    Efficiency by Adaptive GOP Structure (SVC CE2),
    (O018) ITU-T VCEG 15th Meeting, April 2005
  • Yiliang Bao, Marta Karczewicz, Implementation of
    close-loop coding in JSVM, (P057) ITU-T VCEG
    16th Meeting, July 2005
  • Heiko Schwarz, Detlev Marpe, and Thomas Wiegand,
    Hierarchical B pictures, (P014) ITU-T VCEG 16th
    Meeting, July 2005
  • H. Schwarz, D. Marpe, T. Wiegand, Basic Concepts
    for Supporting Spatial and SNR Scalability in the
    Scalable H.264/MPEG-AVC Extension, IWSSIP 05
  • Heiko Schwarz, Detlev Marpe, and Thomas Wiegand,
    Further results on constrained inter-layer
    prediction, (O074) ITU-T VCEG 15th Meeting,
    April 2005