JCTVC-A116 Video Coding Technology Proposal by Fraunhofer HHI - PowerPoint PPT Presentation

Slides: 29
Provided by: wftp3Itu
1
JCTVC-A116: Video Coding Technology Proposal by
Fraunhofer HHI
  • M. Winken, S. Boße, B. Bross, P. Helle, T. Hinz,
    H. Kirchhoffer, H. Lakshman, D. Marpe, S.
    Oudin, M. Preiß, H. Schwarz, M. Siekmann,
    K. Sühring, T. Wiegand

2
Outline
  • Overview
  • Generalized Picture Partitioning for Prediction
    and Transform Coding
  • Signalling using two nested quad-tree structures
  • Codec components
  • Spatial intra prediction
  • Motion representation and coding
  • Merging of motion partitions
  • Sub-sample interpolation for inter prediction
  • In-Loop filtering
  • Transform coding of prediction residuals
  • Entropy coding
  • Encoder Control
  • Average Objective Coding Efficiency
  • Summary

3
Basic Approach Summary
  • Generalization of concepts in H.264/AVC
  • Idea: use a simple structure to show potential

4
Overview
  • Hybrid video coding approach
  • Conceptual generalization of the H.264/AVC design
  • Simple individual building blocks (similar to
    those in H.264/AVC)
  • Larger prediction and transform blocks
  • Flexible partitioning into prediction and
    transform blocks
  • Two nested quad-tree structures
  • Merging for inter-coded partitions
  • Spatial intra prediction
  • Motion-compensated prediction (non-adaptive
    filters)
  • Deblocking and adaptive in-loop filter
  • New entropy coding concept
  • Supports parallelized entropy decoding
  • Supports usage of VLC without compromising
    efficiency

5
Overview
  • High-level syntax is similar to H.264/AVC
  • NAL units
  • Sequence parameter sets
  • Picture parameter sets
  • Internal bit depth
  • Accuracy of 14 bits for the intra prediction
    signal, the motion-compensated prediction
    signal, and the reconstructed residual signal
  • Rounding to 8 bits after reconstruction of a
    block
  • Reference pictures have an accuracy of 8 bits
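
The rounding step above can be sketched as follows. This is a minimal illustration, assuming the 14-bit signals are 8-bit sample values scaled by 2^6 and that rounding uses a half-step offset followed by clipping; the function name `reconstruct_sample` and both assumptions are hypothetical and not taken from the proposal.

```python
def reconstruct_sample(pred14: int, resid14: int) -> int:
    """Add a 14-bit prediction and a 14-bit residual, then round to 8 bits.

    Assumption: both inputs are 8-bit values scaled by 2**6.
    """
    shift = 14 - 8
    # Half-step offset gives round-to-nearest before the right shift.
    value = (pred14 + resid14 + (1 << (shift - 1))) >> shift
    # Clip to the 8-bit range of the reference pictures.
    return max(0, min(255, value))
```

Keeping the intermediate signals at higher accuracy and rounding once per block avoids accumulating rounding errors across the prediction and residual stages.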

6
Picture Subdivision for Prediction and Coding
  • Generalized picture plane grouping
  • Partitioning of the colour planes into plane
    groups (with the possibility of inter-plane
    prediction of parameters)
  • Same partitioning and coding parameters for a
    plane group
  • Submitted bitstreams: single plane group (Y, U, V)
  • Quadtree-based partitioning of the plane groups
  • Division of a plane group into square blocks
    (tree blocks) of maximum block size
  • Maximum block size is signalled in slice
    header (64x64 for submitted streams)
  • Quadtree-based subdivision of the tree block into
    prediction and transform blocks

7
Partitioning for Prediction and Transform
  • Two nested quad-tree structures
  • Partitioning into prediction blocks (intra or
    inter prediction)
  • Partitioning of prediction blocks into transform
    blocks (specifying the transform sizes)
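
A minimal sketch of how two nested quad-trees could be traversed. It assumes one subdivision flag per node larger than a minimum size, and that all prediction-tree flags of a tree block precede the transform-tree flags. The function names, the `read_flag` callback, the flag order and the minimum sizes are hypothetical.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size)

def parse_quadtree(x: int, y: int, size: int, min_size: int,
                   read_flag: Callable[[], int]) -> List[Block]:
    """Return the leaf blocks of a quad-tree rooted at (x, y).

    A subdivision flag is read only while the block is larger than
    min_size; at min_size the block is a leaf by inference (assumption).
    """
    if size > min_size and read_flag() == 1:
        half = size // 2
        leaves: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += parse_quadtree(x + dx, y + dy, half,
                                         min_size, read_flag)
        return leaves
    return [(x, y, size)]

def parse_tree_block(size: int, min_pred: int, min_tr: int,
                     read_flag: Callable[[], int]):
    """Outer tree yields prediction blocks; each roots a transform tree."""
    result = []
    for (px, py, psize) in parse_quadtree(0, 0, size, min_pred, read_flag):
        transform_blocks = parse_quadtree(px, py, psize, min_tr, read_flag)
        result.append(((px, py, psize), transform_blocks))
    return result
```

For example, a 64x64 tree block split once into four 32x32 prediction blocks, with only the first of them split further into 16x16 transform blocks, needs 13 flags under these assumptions.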

8
Intra prediction
  • Spatial intra prediction using neighbouring
    samples (conceptually similar to H.264/AVC)
  • Generalization of H.264/AVC intra prediction for
    arbitrary block size
  • 8 directional intra prediction modes
  • DC prediction mode
  • Adaptive smoothing of neighbouring
    samples (signalled via a flag)
  • 3-tap filter (1,2,1)
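
The (1,2,1) smoothing of the neighbouring samples might look like the sketch below. The normalization by 4, the rounding offset and the unfiltered end samples are assumptions, and `smooth_neighbours` is a hypothetical name.

```python
def smooth_neighbours(samples):
    """Apply a 3-tap (1, 2, 1) filter to a line of neighbouring samples.

    Assumptions: division by 4 with rounding, first and last sample
    copied unfiltered.
    """
    if len(samples) < 3:
        return list(samples)
    out = [samples[0]]
    for i in range(1, len(samples) - 1):
        out.append((samples[i - 1] + 2 * samples[i] + samples[i + 1] + 2) >> 2)
    out.append(samples[-1])
    return out
```

An encoder would apply this only when the signalled flag selects smoothing, then run the directional or DC prediction on the filtered samples.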

9
Motion Representation
  • Generalized multi-hypothesis prediction
  • More than two motion hypotheses are supported
  • Only up to two hypotheses are used for the
    submitted streams
  • Each motion hypothesis is specified by
  • a reference list index (into a single reference
    picture list)
  • a displacement vector
  • Displacement vector accuracy is selectable on a
    slice basis
  • Quarter-luma sample accuracy used in submitted
    bitstreams
  • Motion vector prediction and coding
  • Interleaved prediction and coding of horizontal
    and vertical displacement vector components
  • Motion partition merging for inference of motion
    information from neighbouring blocks
  • No Skipped or Direct blocks

10
Motion Vector Prediction and Coding
  • Interleaved motion vector prediction and coding
  • Coding of reference index
  • Selection of neighbouring blocks with same
    reference index
  • Prediction of vertical component using median
    prediction
  • Coding of vertical component of the difference
    vector
  • Selection of neighbouring motion vectors with
    minimum absolute difference in vertical component
  • Prediction of horizontal component using selected
    motion vectors (single vector or median
    prediction)
  • Coding of horizontal component
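
The interleaved steps above can be sketched from the decoder side. Several details are assumptions: the minimum absolute difference is measured against the already decoded vertical component, a zero predictor is used when no neighbour shares the reference index, and the median of an even-sized set takes the lower middle value. All names are hypothetical.

```python
def median(values):
    """Lower median of a non-empty list of integers."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]

def decode_mv(neighbours, ref_idx, mvd_x, mvd_y):
    """Reconstruct (mv_x, mv_y) from differences mvd_x and mvd_y.

    neighbours: list of (ref_idx, mv_x, mv_y) of neighbouring blocks.
    """
    # Step 1: keep neighbours that use the same reference index.
    cands = [(mx, my) for (r, mx, my) in neighbours if r == ref_idx]
    if not cands:
        return (mvd_x, mvd_y)  # zero predictor (assumption)
    # Step 2: median prediction of the vertical component.
    mv_y = median([my for _, my in cands]) + mvd_y
    # Step 3: select vectors closest to the decoded vertical component.
    best = min(abs(my - mv_y) for _, my in cands)
    selected = [mx for mx, my in cands if abs(my - mv_y) == best]
    # Step 4: single vector or median prediction of the horizontal part.
    pred_x = selected[0] if len(selected) == 1 else median(selected)
    return (pred_x + mvd_x, mv_y)
```

Decoding the vertical component first lets it act as side information that narrows the candidate set for the horizontal predictor.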

11
Motion Partition Merging
  • Concept
  • Reduction of side information rate for motion
    information
  • Adaptive inference of motion information from
    neighbouring inter-predicted partitions
    (prediction blocks)
  • Signalling using up to two flags per inter
    prediction block

R: region with the same motion information
B: first block of R in decoding order
(transmission of motion information)
For the remaining blocks of region R, only up to
two flags specifying the merging information are
coded.
12
Signalling of Motion Partition Merging
X: current inter-coded prediction block
L: left neighbour of current block X
T: top neighbour of current block X
  • merge_flag
  • transmitted if one or both neighbours are inter
    coded
  • if equal to 1, block X is merged with one of the
    neighbours
  • otherwise, motion data are transmitted for block
    X
  • merge_left_flag
  • transmitted if merge_flag is equal to 1 and both
    neighbours are inter-coded with different motion
    parameters (inferred otherwise)
  • specifies whether current block is merged with
    left or top neighbour
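
The signalling rules for merge_flag and merge_left_flag can be written as a small parsing routine. This is a sketch that assumes merge_flag is inferred to be 0 when it is not transmitted; `decode_merge` and `read_flag` are hypothetical names, and motion parameters are modelled as plain comparable values.

```python
def decode_merge(left, top, read_flag):
    """Return inherited motion parameters, or None if block X codes its own.

    left, top: motion parameters of the neighbours, or None when the
    neighbour is not inter-coded (or not available).
    """
    if left is None and top is None:
        return None  # merge_flag is not transmitted
    if read_flag() == 0:  # merge_flag
        return None  # motion data are transmitted for block X
    if left is not None and top is not None and left != top:
        # merge_left_flag is transmitted only when there is a real choice.
        return left if read_flag() == 1 else top
    # merge_left_flag is inferred: only one distinct candidate exists.
    return left if left is not None else top
```

At most two flags are read per inter prediction block, matching the bound stated on the previous slide.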

13
Sub-Sample Interpolation for Inter Prediction
  • Overview
  • Non-adaptive sub-sample interpolation
  • Concept is based on interpolation with
    MOMS (basis functions with Maximal Order and
    Minimal Support)
  • Implementation in 16-bit integer arithmetic
  • 2D separable IIR pre-filter (one coefficient)
  • 2D separable FIR interpolation filter with short
    support (4-tap)
  • Both the IIR and FIR filter steps are highly
    parallelizable

14
IIR Pre-filter
  • 1D IIR Filter in horizontal direction
  • Causal and anti-causal filtering
  • Same IIR Filter in vertical direction
  • Pole value (scaled by 2^15): z1 = -11726
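
A floating-point sketch of a one-pole causal and anti-causal pre-filter, applied first along rows and then along columns. The slides give only the pole value; the recursion, the gain normalization and the boundary initialization below follow the usual recursive pre-filter for spline-type interpolation and are assumptions, as are the function names. The proposal itself uses 16-bit integer arithmetic, which this sketch does not reproduce.

```python
def iir_prefilter_1d(signal, pole_q15=-11726):
    """Causal pass followed by anti-causal pass with a single pole."""
    z1 = pole_q15 / 2 ** 15
    n = len(signal)
    if n < 2:
        return [float(v) for v in signal]
    gain = (1.0 - z1) * (1.0 - 1.0 / z1)  # makes the DC gain equal to 1
    c = [gain * v for v in signal]
    # Causal initialization: truncated geometric sum (assumption).
    acc, zk = 0.0, 1.0
    for v in c:
        acc += zk * v
        zk *= z1
    c[0] = acc
    for k in range(1, n):  # causal recursion
        c[k] += z1 * c[k - 1]
    # Anti-causal initialization, then anti-causal recursion.
    c[n - 1] = (z1 / (z1 * z1 - 1.0)) * (c[n - 1] + z1 * c[n - 2])
    for k in range(n - 2, -1, -1):
        c[k] = z1 * (c[k + 1] - c[k])
    return c

def iir_prefilter_2d(picture, pole_q15=-11726):
    """Separable use: horizontal pass, then the same filter vertically."""
    rows = [iir_prefilter_1d(r, pole_q15) for r in picture]
    cols = [iir_prefilter_1d(list(col), pole_q15) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Because each row (and then each column) is filtered independently, the rows can be processed in parallel, which is one way to read the parallelization remark on the previous slide.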

15
FIR Interpolation Filter
  • 4-tap FIR Filter (scaled by 2^15)
  • Applied in horizontal direction on pre-filtered
    reference picture
  • Applied in vertical direction on horizontal
    filtered picture
  • Extendable for arbitrary motion vector accuracy
  • e.g. 1/8, 1/12, 1/16 luma sample accuracy
  • changing FIR kernel while maintaining the same
    IIR filter

FIR filter coefficients by sample position:
Integer sample: 6242, 20285, 6242, 0
¼ sample:       2889, 19078, 10520, 280
½ sample:       1073, 15311, 15311, 1073
¾ sample:       280, 10520, 19078, 2889
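
The 4-tap coefficients listed above can be applied to a line of pre-filtered values as in the following sketch. The tap alignment (positions pos-1 to pos+2), the border clamping and the rounding offset are assumptions inferred from the symmetry of the coefficients; `interpolate_1d` is a hypothetical name.

```python
# Coefficients from the slide, scaled by 2**15, indexed by quarter-sample phase.
FIR_Q15 = {
    0: (6242, 20285, 6242, 0),
    1: (2889, 19078, 10520, 280),
    2: (1073, 15311, 15311, 1073),
    3: (280, 10520, 19078, 2889),
}

def interpolate_1d(coeffs, pos, frac):
    """Interpolate at position pos + frac/4 from pre-filtered integers."""
    taps = FIR_Q15[frac]
    acc = 0
    for i, tap in enumerate(taps):
        # Clamp indices at the picture border (assumption).
        idx = min(max(pos - 1 + i, 0), len(coeffs) - 1)
        acc += tap * coeffs[idx]
    # Remove the 2**15 scaling with rounding.
    return (acc + (1 << 14)) >> 15
```

In the 2D case the same routine would run horizontally on the pre-filtered reference picture and then vertically on the horizontally filtered result; a finer motion vector accuracy only requires a larger coefficient table.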
16
Filtering inside Motion-Compensation Loop
  • Deblocking filter
  • Similar to H.264/AVC
  • Extended for larger block sizes
  • Adaptive In-Loop Filter
  • Separable Wiener filter
  • vertical filtering followed by horizontal
    filtering
  • Potentially different filters in horizontal and
    vertical direction
  • Filter size is chosen by minimizing a Lagrangian
    cost functional
  • supported filter sizes are 3, 5, 7, 9, and 11
  • Filters are separately estimated for luma and
    chroma planes
  • Filters may be re-used for reducing the side
    information

17
Adaptive In-Loop filter
  • Quad-tree based block-wise filter decision
  • Quad-tree is independent of prediction
    partitioning
  • Quad-tree is transmitted as side information
  • Estimation of filter coefficients
  • Estimate filter coefficients and filter size for
    entire picture
  • Determine quad-tree based filter decisions
  • Re-estimate filter coefficients and filter size
    for selected regions

18
Transform Coding of Prediction Residuals
  • Segmentation of prediction blocks
  • Segmentation into transform blocks using a
    quadtree
  • Signalling of the partitioning into transform
    blocks
  • Maximum and minimum transform size are signalled
    in slice header
  • For quad-tree nodes between these
    bounds, subdivision flags are transmitted

[Figure: transform segmentation tree example, with labels for max. transform size, transmitted subdivision flags, and min. transform size]
19
Transform Coding
  • Transform kernels
  • Separable NxN transforms
  • Integer approximations of DCT-II
    kernels (obtained by scaling and rounding of
    DCT-II kernel)
  • 32-bit integer implementation with
    multiplications and additions (employing
    symmetries of basis functions)
  • Integer transform kernels have not been optimized
    for low-complexity implementations (using bit
    shifts and additions)
  • Quantization
  • Similar to H.264/AVC
  • Uniform scalar quantization without extra
    dead-zone
  • 52 quantizers with logarithmically increasing
    step size
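
A sketch of the two ideas on this slide: an integer kernel obtained by scaling and rounding the orthonormal DCT-II basis, and a step size that grows logarithmically over 52 quantizers. The scale factor of 2^7 is a hypothetical choice, and the step-size formula (doubling every 6 indices, starting at 0.625) is borrowed from the H.264/AVC convention as an assumption, since the slides only state that quantization is similar.

```python
import math

def integer_dct_kernel(n, scale_bits=7):
    """N x N integer approximation of the orthonormal DCT-II matrix."""
    kernel = []
    for k in range(n):
        norm = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        row = [round((1 << scale_bits) * norm
                     * math.cos(math.pi * (2 * i + 1) * k / (2 * n)))
               for i in range(n)]
        kernel.append(row)
    return kernel

def step_size(qp):
    """Quantizer step size for qp in 0..51 (H.264/AVC-style assumption)."""
    if not 0 <= qp <= 51:
        raise ValueError("qp must be in 0..51")
    return 0.625 * 2.0 ** (qp / 6.0)
```

Each basis function is symmetric or anti-symmetric about the block centre, which is the property an implementation can exploit to reduce the number of multiplications.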

20
Entropy Coding
  • Novel entropy coding concept
  • Binarization and context modelling as in CABAC of
    H.264/AVC
  • Modified coding of binary decisions (bins)
  • LPB probabilities are quantized (12 classes in
    implementation)
  • Separate bin encoders for each class (fixed LPB
    probabilities)
  • Supports high degree of parallelization
  • Supports variable length codes without
    compromising coding efficiency
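
A sketch of how bins could be distributed over 12 fixed-probability bin encoders and collected again at the decoder. The uniform class boundaries are purely hypothetical (the slides give only the number of classes), and each bin is modelled as a flag saying whether it equals the less probable bin (LPB) value.

```python
NUM_CLASSES = 12

def probability_class(p_lpb):
    """Map an LPB probability in (0, 0.5] to one of 12 classes.

    Uniform intervals are a placeholder assumption.
    """
    if not 0.0 < p_lpb <= 0.5:
        raise ValueError("LPB probability must be in (0, 0.5]")
    return min(int(p_lpb / 0.5 * NUM_CLASSES), NUM_CLASSES - 1)

def split_bins(bins):
    """Encoder side: route (is_lpb, p_lpb) pairs to per-class buffers."""
    buffers = [[] for _ in range(NUM_CLASSES)]
    for is_lpb, p_lpb in bins:
        buffers[probability_class(p_lpb)].append(is_lpb)
    return buffers

def merge_bins(buffers, probabilities):
    """Decoder side: fetch each bin from the buffer its class selects."""
    cursors = [0] * NUM_CLASSES
    out = []
    for p_lpb in probabilities:
        c = probability_class(p_lpb)
        out.append(buffers[c][cursors[c]])
        cursors[c] += 1
    return out
```

Since every buffer holds bins of one fixed probability, each buffer can be coded by its own engine, and the buffers can be processed independently of one another.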

21
Entropy Coding with Arithmetic Codes
  • Parallelization for large slice data NAL units
  • All arithmetic coders are operated at fixed
    probabilities
  • Arithmetic codewords for the different bin
    encoders are written to different partitions of
    the slice data NAL unit
  • Partitioning of the slice data NAL unit is
    signalled in header
  • Multiple arithmetic decoders can be operated in
    parallel
  • Remaining entropy coding process simply reads
    bins from multiple bin buffers
  • Disabling multi-codeword approach for small
    slices
  • Parallelization is not required for small slices
  • Overhead of partitioning information can be
    significant
  • Usage of conventional arithmetic coding engine
    for small slices (signalled in slice header)
  • Arithmetic coding is used in submitted bitstreams

22
Entropy Coding with Variable Length Codes
  • Alternative to arithmetic coding engines
  • Bin encoders/decoders operate at fixed
    probabilities
  • Arithmetic coding engines can be replaced by
    simple variable-length coders
  • Bin coders map a variable number of bins onto
    a variable-length codeword and vice versa
  • Potential termination of bin sequences at the
    end of a slice (use shortest codeword)

Example VNB2VLC mapping for p = 0.15 (0.25%
overhead relative to entropy)
bin sequence    codeword
0000            1
01              001
10              010
001             011
000100          0001
11              00001
00011           000000
000101          000001
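
The overhead figure for this mapping can be checked numerically. The sketch below assumes that bin value 1 is the less probable bin with probability 0.15 and that bins are independent; the names `V2V_TABLE`, `is_prefix_free` and `overhead` are hypothetical.

```python
import math

# Bin sequence -> codeword, as in the example mapping.
V2V_TABLE = {
    "0000": "1", "01": "001", "10": "010", "001": "011",
    "000100": "0001", "11": "00001", "00011": "000000", "000101": "000001",
}

def is_prefix_free(words):
    """True if no word is a proper prefix of another word."""
    words = list(words)
    return not any(a != b and b.startswith(a) for a in words for b in words)

def overhead(table, p_one):
    """Relative excess of bits per bin over the binary entropy."""
    exp_bins = exp_bits = 0.0
    for seq, code in table.items():
        prob = math.prod(p_one if b == "1" else 1.0 - p_one for b in seq)
        exp_bins += prob * len(seq)
        exp_bits += prob * len(code)
    entropy = -(p_one * math.log2(p_one)
                + (1.0 - p_one) * math.log2(1.0 - p_one))
    return exp_bits / (exp_bins * entropy) - 1.0

print(f"overhead: {100 * overhead(V2V_TABLE, 0.15):.2f}%")
```

Both the bin sequences and the codewords form prefix-free sets, which is what allows the mapping to be applied in either direction without separators.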
23
Entropy Coding with Variable Length Codes
  • Codeword interleaving
  • Interleaving of codewords without any overhead
  • Codeword buffer at encoder
  • Instantaneous decoding
  • Low-delay interleaving
  • Specification of maximum buffer delay
  • Codeword termination at encoder and decoder
    if maximum delay is reached
  • Coding efficiency
  • Lossless transcoding of submitted bitstreams
    showed virtually the same coding efficiency
  • 0.18% rate savings with codeword interleaving
  • 0.10% rate savings with low-delay control (64
    bytes)

24
Encoder Control
  • Coding structure
  • Hierarchical B pictures for constraint set 1
    configuration
  • Low-delay hierarchical P pictures for constraint
    set 2 configuration
  • Motion estimation
  • Rate-constrained motion estimation (as in JM,
    JSVM, JMVM)
  • Fast integer sample motion search (same as in
    JSVM, JMVM)
  • Sub-sample refinement search
  • Quantization (for a transform block)
  • Rate-distortion optimized quantization (RDOQ)
  • Similar to JM
  • Coding mode decision (for a prediction block)
  • Rate-constrained mode decision (as in JM, JSVM,
    JMVM)
  • Abort criterion for complexity reduction
  • Intra modes are not tested if, for the inter mode,
  • all transform coefficient levels (RDOQ) are equal
    to 0
  • all transform coefficients are below a certain
    threshold (depending on quantization step size)

25
Encoder Control
  • Selection of Prediction and Transform
    Segmentation
  • Use top-to-bottom and depth-first decision
    strategy with abort criterion
  • Decision is based on Lagrangian costs
  • Same abort criterion as for coding mode selection
  • Smaller blocks are not tested, if
  • all transform coefficient levels (RDOQ) are equal
    to 0
  • all transform coefficients are smaller than a
    threshold (the threshold depends on the
    quantization step size)
  • Uses quad-tree structure for reducing the
    computational complexity of the partition
    selection process
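
The decision strategy above can be sketched as a recursive search. This is an illustration under assumptions: `cost_fn` stands for a Lagrangian cost (distortion plus lambda times rate) of coding a block unsplit, `abort_fn` stands for the abort criterion (for example, all quantized levels equal to 0), and the rate of the subdivision flags is ignored. None of these names come from the proposal.

```python
def decide_partition(x, y, size, min_size, cost_fn, abort_fn):
    """Top-down, depth-first choice between keeping and splitting a block.

    Returns (cost, leaves) where leaves is a list of (x, y, size).
    """
    cost_whole = cost_fn(x, y, size)
    # Smaller blocks are not tested at minimum size or when aborted.
    if size <= min_size or abort_fn(x, y, size):
        return cost_whole, [(x, y, size)]
    half = size // 2
    cost_split, leaves = 0.0, []
    for dy in (0, half):
        for dx in (0, half):
            sub_cost, sub_leaves = decide_partition(
                x + dx, y + dy, half, min_size, cost_fn, abort_fn)
            cost_split += sub_cost
            leaves += sub_leaves
    if cost_split < cost_whole:
        return cost_split, leaves
    return cost_whole, [(x, y, size)]
```

The quad-tree keeps the search tractable: each node compares one unsplit cost against the sum of its four children, instead of enumerating all possible partitions of the tree block.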

26
Average Objective Coding Efficiency
BD-Rate [%]   Constraint Set 1     Constraint Set 2 (beta anchor)
              Low       High       Low       High
Class A       -24.00    -21.84     -         -
Class B1      -32.53    -30.04     -30.61    -28.35
Class B2      -35.87    -35.53     -30.30    -29.23
Class C       -30.20    -29.48     -19.45    -17.63
Class D       -26.66    -27.93     -12.74    -12.12
Class E       -         -          -27.50    -25.64
Average       -29.87    -29.33     -22.71    -21.27
27
Software
  • Standard C Implementation
  • Platform independent
  • Compiles under Windows and Linux (32/64 bit)
  • Focus on modular design and easy extensibility
  • Slim code base
  • Only 55,000 LOC (vs. 150,000 LOC for JM 17.0)
  • Virtually no redundant code
  • English naming of variables and comments
  • No external libraries needed
  • Optional multi-threaded encoding (Boost library
    needed)
  • Parallel encoding of independent pictures
    (depends on the GOP string)
  • No need to regard multi-threading related issues
    when changing the encoding algorithm inside of a
    picture
  • E.g. CS1 bitstreams were encoded almost 8 times
    faster than single-threaded (on a computer with 8
    cores)

28
Summary
  • Hybrid video coding approach
  • Generalization of H.264/AVC concepts
  • Support of larger block sizes for prediction and
    transform
  • Flexible quad-tree based partitioning into
    prediction blocks (with additional merging for
    inter-coded blocks)
  • Flexible quad-tree based partitioning into
    transform blocks
  • Spatial intra prediction
  • Motion-compensated prediction using non-adaptive
    filters
  • Deblocking and adaptive loop filter
  • Novel entropy coding approach
  • Average objective coding results
  • About 29-30% bit rate savings for high-delay
    cases
  • About 22% bit rate savings for low-delay cases