Title: JCTVC-A116 Video Coding Technology Proposal by Fraunhofer HHI
- M. Winken, S. Boße, B. Bross, P. Helle, T. Hinz, H. Kirchhoffer, H. Lakshman, D. Marpe, S. Oudin, M. Preiß, H. Schwarz, M. Siekmann, K. Sühring, T. Wiegand
Outline
- Overview
- Generalized Picture Partitioning for Prediction and Transform Coding
- Signalling using two nested quad-tree structures
- Codec components
- Spatial intra prediction
- Motion representation and coding
- Merging of motion partitions
- Sub-sample interpolation for inter prediction
- In-Loop filtering
- Transform coding of prediction residuals
- Entropy coding
- Encoder Control
- Average Objective Coding Efficiency
- Summary
Basic Approach Summary
- Generalization of concepts in H.264/AVC
- Idea: use a simple structure to show the potential
Overview
- Hybrid video coding approach
- Conceptual generalization of the H.264/AVC design
- Simple individual building blocks (similar to H.264/AVC)
- Larger prediction and transform blocks
- Flexible partitioning into prediction and transform blocks
- Two nested quad-tree structures
- Merging for inter-coded partitions
- Spatial intra prediction
- Motion-compensated prediction (non-adaptive filters)
- Deblocking and adaptive in-loop filter
- New entropy coding concept
- Supports parallelized entropy decoding
- Supports usage of VLCs without compromising coding efficiency
Overview
- High-level syntax is similar to H.264/AVC
- NAL units
- Sequence parameter sets
- Picture parameter sets
- Internal bit depth
- Accuracy of 14 bits for
- intra prediction signal
- motion-compensated prediction signal
- reconstructed residual signal
- Rounding to 8 bits after reconstruction of a block
- Reference pictures have an accuracy of 8 bits
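The bit-depth handling above can be illustrated with a minimal C sketch. It assumes that the 14-bit intermediate signals are the 8-bit sample range scaled by 2^6, and that rounding to 8 bits is a round-half-up shift followed by clipping to [0, 255]. The function `reconstruct_sample` and this scaling are illustrative assumptions and are not details taken from the proposal.

```c
#include <stdint.h>

/* Assumption: 14-bit intermediate signals = 8-bit sample range scaled by 2^6. */
#define INTERNAL_SHIFT 6

/* Clip a value to the 8-bit sample range [0, 255]. */
static uint8_t clip_to_8bit(int32_t v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Hypothetical helper: add a 14-bit prediction sample and a 14-bit residual
 * sample, then round once to 8 bits (round half up) and clip. */
uint8_t reconstruct_sample(int32_t prediction14, int32_t residual14)
{
    int32_t sum = prediction14 + residual14 + (1 << (INTERNAL_SHIFT - 1));
    if (sum < 0)
        return 0;                      /* avoid right-shifting a negative value */
    return clip_to_8bit(sum >> INTERNAL_SHIFT);
}
```

Prediction and residual keep their full precision until the block is reconstructed. Only the final sum is rounded, so the 8-bit reference pictures carry a single rounding error per sample.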
Picture Subdivision for Prediction and Coding
- Generalized picture plane grouping
- Partitioning of the colour planes into plane groups (with the possibility of inter-plane prediction of parameters)
- Same partitioning and coding parameters for a plane group
- Submitted bitstreams: single plane group (Y, U, V)
- Quadtree-based partitioning of the plane groups
- Division of a plane group into square blocks (tree blocks) of maximum block size
- Maximum block size is signalled in the slice header (64x64 for submitted streams)
- Quadtree-based subdivision of the tree block into prediction and transform blocks
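The tree-block grid can be illustrated by counting the tree blocks needed to cover a picture. This sketch assumes that tree blocks at the right and bottom borders, which extend beyond the picture, are still counted (ceiling division). How the proposal handles picture borders is not stated here, and the function names are hypothetical.

```c
/* Number of tree blocks along one dimension (ceiling division). */
static int tree_blocks_along(int picture_size, int max_block_size)
{
    return (picture_size + max_block_size - 1) / max_block_size;
}

/* Total number of tree blocks for a width x height plane group. */
int count_tree_blocks(int width, int height, int max_block_size)
{
    return tree_blocks_along(width, max_block_size) *
           tree_blocks_along(height, max_block_size);
}
```

For example, a 1920x1080 picture with 64x64 tree blocks gives 30 x 17 = 510 tree blocks.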
Partitioning for Prediction and Transform
- Two nested quad-tree structures
- Partitioning into prediction blocks (intra or inter prediction)
- Partitioning of prediction blocks into transform blocks (specifying the transform sizes)
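The nesting of the two quad-trees can be sketched as a recursive parser. Every leaf of the prediction tree is the root of its own transform tree. Several details are assumptions made for illustration: the flag order (depth-first, one split flag per node above a minimum size), the minimum sizes `MIN_PRED_SIZE` and `MIN_TRANSFORM_SIZE`, and the `FlagReader` helper. The sketch only counts the resulting blocks and does not parse real syntax.

```c
/* Hypothetical flag source: split flags stored in depth-first order. */
typedef struct {
    const int *flags;
    int pos;
} FlagReader;

static int read_flag(FlagReader *r) { return r->flags[r->pos++]; }

/* Assumed minimum block sizes (not taken from the proposal). */
#define MIN_PRED_SIZE 8
#define MIN_TRANSFORM_SIZE 4

/* Transform quad-tree nested inside one prediction block. */
static void parse_transform_tree(FlagReader *r, int x, int y, int size,
                                 int *num_transform_blocks)
{
    if (size > MIN_TRANSFORM_SIZE && read_flag(r)) {
        int half = size / 2;
        for (int i = 0; i < 4; i++)
            parse_transform_tree(r, x + (i % 2) * half, y + (i / 2) * half,
                                 half, num_transform_blocks);
    } else {
        (*num_transform_blocks)++;          /* leaf: one size x size transform */
    }
}

/* Prediction quad-tree of one tree block; each leaf is a prediction block
 * whose residual is partitioned by its own transform quad-tree. */
void parse_prediction_tree(FlagReader *r, int x, int y, int size,
                           int *num_pred_blocks, int *num_transform_blocks)
{
    if (size > MIN_PRED_SIZE && read_flag(r)) {
        int half = size / 2;
        for (int i = 0; i < 4; i++)
            parse_prediction_tree(r, x + (i % 2) * half, y + (i / 2) * half,
                                  half, num_pred_blocks, num_transform_blocks);
    } else {
        (*num_pred_blocks)++;
        /* prediction parameters for the block at (x, y) would be parsed here */
        parse_transform_tree(r, x, y, size, num_transform_blocks);
    }
}
```

Consider the flag sequence 1, 0,0, 0,1,0,0,0,0, 0,0, 0,0. It splits a 64x64 tree block into four 32x32 prediction blocks. Only the second of these is further split, into four 16x16 transform blocks. The result is 4 prediction blocks and 7 transform blocks.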
Intra Prediction
- Spatial intra prediction using neighbouring samples (conceptually similar to H.264/AVC)
- Generalization of H.264/AVC intra prediction for arbitrary block sizes
- 8 directional intra prediction modes
- DC prediction mode
- Adaptive smoothing of neighbouring samples (signalled via a flag)
- 3-tap filter (1,2,1)
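The (1,2,1) smoothing of the neighbouring samples can be sketched as follows. The slide specifies only the filter taps. The rounding offset of 2 before the shift by 2 and the choice to copy the two end samples unfiltered are assumptions, and the function name is hypothetical.

```c
/* Smooth a line of n neighbouring reference samples with the (1,2,1)/4 filter.
 * End samples are copied unfiltered (assumption). */
void smooth_reference_samples(const int *in, int *out, int n)
{
    if (n <= 0) return;
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    for (int i = 1; i < n - 1; i++)
        out[i] = (in[i - 1] + 2 * in[i] + in[i + 1] + 2) >> 2;
}
```

The flag from the slide would decide per block whether the predictor uses `out` (smoothed) or the unfiltered `in` samples.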
Motion Representation
- Generalized multi-hypothesis prediction
- More than two motion hypotheses are supported
- Only up to two hypotheses are used for the submitted streams
- Each motion hypothesis is specified by
- a reference list index (into a single reference picture list)
- a displacement vector
- Displacement vector accuracy is selectable on a slice basis
- Quarter-luma-sample accuracy used in submitted bitstreams
- Motion vector prediction and coding
- Interleaved prediction and coding of horizontal and vertical displacement vector components
- Motion partition merging for inference of motion information from neighbouring blocks
- No skipped or direct blocks
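A generalized multi-hypothesis predictor can be sketched as the rounded average of N motion-compensated signals. Each signal is addressed by a reference index into the single list and a displacement vector. Several simplifications are assumptions: equal weighting, integer-sample displacements (no sub-sample interpolation), and displaced positions that always lie inside the picture. The names are hypothetical.

```c
/* One motion hypothesis: index into the single reference picture list plus
 * an integer displacement vector (sub-sample interpolation omitted). */
typedef struct {
    int ref_idx;
    int dx, dy;
} MotionHypothesis;

/* Predict the sample at (x, y) from num_hyp hypotheses; ref_pics[i] points to
 * a picture with the given width. Equal weights with rounding (assumption). */
int predict_sample(const int *const *ref_pics, int width,
                   const MotionHypothesis *hyp, int num_hyp, int x, int y)
{
    int sum = 0;
    for (int h = 0; h < num_hyp; h++) {
        const int *ref = ref_pics[hyp[h].ref_idx];
        sum += ref[(y + hyp[h].dy) * width + (x + hyp[h].dx)];
    }
    return (sum + num_hyp / 2) / num_hyp;   /* non-negative samples assumed */
}
```

With `num_hyp` set to 1 or 2, this covers the configurations used for the submitted streams. Larger values illustrate the generalization.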
Motion Vector Prediction and Coding
- Interleaved motion vector prediction and coding
- Coding of reference index
- Selection of neighbouring blocks with the same reference index
- Prediction of vertical component using median prediction
- Coding of vertical component of the difference vector
- Selection of neighbouring motion vectors with minimum absolute difference in vertical component
- Prediction of horizontal component using selected motion vectors (single vector or median prediction)
- Coding of horizontal component
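The interleaved order can be sketched for one case: three neighbouring motion vectors (for example left, top and top-right) have already been selected because they share the current reference index. The choice of neighbours and the omission of availability handling are assumptions. So is the tie-breaking rule, which uses the median of all three horizontal components when several neighbours are equally close. None of these are details from the proposal.

```c
#include <stdlib.h>

typedef struct { int x, y; } MotionVector;

static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }   /* now a <= b */
    if (b > c) b = c;                          /* b = min(b, c) */
    return a > b ? a : b;                      /* max(a, min(b, c)) */
}

/* Step 1: vertical predictor = median of the neighbours' vertical components. */
int predict_mv_y(const MotionVector nb[3])
{
    return median3(nb[0].y, nb[1].y, nb[2].y);
}

/* Step 2: once mv_y is known, pick the neighbour(s) whose vertical component
 * is closest to it. A unique winner gives the horizontal predictor directly;
 * otherwise the median of all three is used (assumed tie rule). */
int predict_mv_x(const MotionVector nb[3], int mv_y)
{
    int best = abs(nb[0].y - mv_y), best_idx = 0, ties = 1;
    for (int i = 1; i < 3; i++) {
        int d = abs(nb[i].y - mv_y);
        if (d < best) { best = d; best_idx = i; ties = 1; }
        else if (d == best) ties++;
    }
    if (ties == 1)
        return nb[best_idx].x;
    return median3(nb[0].x, nb[1].x, nb[2].x);
}

/* Decoder side: the vertical component is reconstructed first because the
 * horizontal predictor depends on it. */
MotionVector decode_mv(const MotionVector nb[3], int mvd_x, int mvd_y)
{
    MotionVector mv;
    mv.y = predict_mv_y(nb) + mvd_y;
    mv.x = predict_mv_x(nb, mv.y) + mvd_x;
    return mv;
}
```

Take neighbours (4, 2), (10, 8) and (-3, 5). The vertical predictor is 5, so a vertical difference of 1 decodes to a vertical component of 6. The neighbour (-3, 5) is then closest vertically, and a horizontal difference of 2 yields the motion vector (-1, 6).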
Motion Partition Merging
- Concept
- Reduction of side information rate for motion information
- Adaptive inference of motion information from neighbouring inter-predicted partitions (prediction blocks)
- Signalling using up to two flags per inter prediction block
Figure: R = region with the same motion information; B = first block of R in decoding order (motion information is transmitted). For the remaining blocks of region R, only up to two flags specifying the merging information are coded.
Signalling of Motion Partition Merging
Figure: X = current inter-coded prediction block; L = left neighbour of X; T = top neighbour of X
- merge_flag
- transmitted if one or both neighbours are inter-coded
- if equal to 1, block X is merged with one of the neighbours
- otherwise, motion data are transmitted for block X
- merge_left_flag
- transmitted if merge_flag is equal to 1 and both neighbours are inter-coded with different motion parameters (inferred otherwise)
- specifies whether the current block is merged with the left or the top neighbour
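The two signalling rules can be sketched as a decoder-side parse. The `FlagReader` helper and the `motion_id` field are hypothetical: `motion_id` stands in for a comparison of the full motion parameters. When both neighbours carry identical motion, this sketch infers a merge with the left neighbour. That is an arbitrary but harmless choice, because both candidates give the same motion.

```c
#include <stdbool.h>

/* Hypothetical flag source: decoded flags in bitstream order. */
typedef struct { const int *flags; int pos; } FlagReader;

static int read_flag(FlagReader *r) { return r->flags[r->pos++]; }

typedef struct {
    bool available_inter;   /* neighbour exists and is inter-coded */
    int motion_id;          /* stand-in for its motion parameters */
} Neighbour;

typedef enum { NO_MERGE, MERGE_LEFT, MERGE_TOP } MergeDecision;

MergeDecision parse_merge_info(const Neighbour *left, const Neighbour *top,
                               FlagReader *r)
{
    if (!left->available_inter && !top->available_inter)
        return NO_MERGE;                     /* merge_flag not transmitted */
    if (!read_flag(r))                       /* merge_flag == 0 */
        return NO_MERGE;                     /* motion data follow */
    if (left->available_inter && top->available_inter &&
        left->motion_id != top->motion_id)
        return read_flag(r) ? MERGE_LEFT : MERGE_TOP;   /* merge_left_flag */
    /* merge_left_flag inferred: one candidate, or both carry the same motion */
    return left->available_inter ? MERGE_LEFT : MERGE_TOP;
}
```

At most two flags are read per inter prediction block, matching the "up to two flags" on the previous slide.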
Sub-Sample Interpolation for Inter Prediction
- Overview
- Non-adaptive sub-sample interpolation
- Concept is based on interpolation with MOMS (basis functions with Maximal Order and Minimal Support)
- Implementation in 16-bit integer arithmetic
- 2D separable IIR pre-filter (one coefficient)
- 2D separable FIR interpolation filter with short support (4-tap)
- Both the IIR and FIR filter steps are highly parallelizable
IIR Pre-filter
- 1D IIR filter in horizontal direction
- Causal and anti-causal filtering
- Same IIR filter in vertical direction
- Pole value (scaled by 2^15): z1 = -11726
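The pre-filter can be illustrated with a floating-point sketch that uses the pole given above. It follows the usual causal/anti-causal recursion used for spline and MOMS pre-filtering. It does not reproduce the proposal's 16-bit integer implementation. The gain normalisation (chosen so that constant signals pass unchanged) and the simplified, truncated boundary initialisation are assumptions. Running the function on every row and then on every column gives the 2D separable pre-filter.

```c
/* Pole from the slide, scaled by 2^15: z1 = -11726 / 32768. */
#define POLE (-11726.0 / 32768.0)

/* In-place 1D pre-filter of n samples: gain, causal pass, anti-causal pass. */
void iir_prefilter_1d(double *c, int n)
{
    const double z = POLE;
    const double gain = (1.0 - z) * (1.0 - 1.0 / z);
    if (n < 2) return;
    for (int k = 0; k < n; k++)
        c[k] *= gain;

    /* causal initialisation: truncated sum of z^k * c[k] */
    double sum = 0.0, zk = 1.0;
    for (int k = 0; k < n; k++) { sum += zk * c[k]; zk *= z; }
    c[0] = sum;
    for (int k = 1; k < n; k++)
        c[k] += z * c[k - 1];                      /* causal recursion */

    /* anti-causal initialisation and recursion */
    c[n - 1] = (z / (z * z - 1.0)) * (z * c[n - 2] + c[n - 1]);
    for (int k = n - 2; k >= 0; k--)
        c[k] = z * (c[k + 1] - c[k]);
}
```

Each pass needs only one multiplication per sample, and rows (or columns) are independent of each other. This is why the pre-filter parallelizes well.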
FIR Interpolation Filter
- 4-tap FIR filter (scaled by 2^15)
- Applied in horizontal direction on the pre-filtered reference picture
- Applied in vertical direction on the horizontally filtered picture
- Extendable to arbitrary motion vector accuracy
- e.g. 1/8, 1/12, 1/16 luma sample accuracy
- by changing the FIR kernel while maintaining the same IIR filter
Phase            Filter coefficients (scaled by 2^15)
Integer sample   6242, 20285, 6242, 0
¼ sample         2889, 19078, 10520, 280
½ sample         1073, 15311, 15311, 1073
¾ sample         280, 10520, 19078, 2889
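The 1D interpolation step can be sketched with the four kernels listed above, applied to pre-filtered samples. The tap alignment is an assumption: taps act on c[x-1], c[x], c[x+1] and c[x+2], which places the integer and half-sample kernels symmetrically. So is the rounding by 2^14 before the shift by 15. The caller must ensure that all four indices are valid.

```c
#include <stdint.h>

/* 4-tap kernels scaled by 2^15, indexed by phase:
 * 0 = integer, 1 = 1/4, 2 = 1/2, 3 = 3/4 sample position. */
static const int32_t FIR_KERNELS[4][4] = {
    { 6242, 20285,  6242,    0 },
    { 2889, 19078, 10520,  280 },
    { 1073, 15311, 15311, 1073 },
    {  280, 10520, 19078, 2889 },
};

/* Interpolate at position x + phase/4 from pre-filtered samples c[]. */
int32_t interpolate_1d(const int32_t *c, int x, int phase)
{
    const int32_t *k = FIR_KERNELS[phase];
    int64_t acc = (int64_t)k[0] * c[x - 1] + (int64_t)k[1] * c[x]
                + (int64_t)k[2] * c[x + 1] + (int64_t)k[3] * c[x + 2];
    return (int32_t)((acc + (1 << 14)) >> 15);   /* round, remove 2^15 scale */
}
```

On a linear ramp 0, 1000, 2000, ... the phases at x = 2 give 2000, 2250, 2500 and 2750. Other motion vector accuracies would only need a different kernel table.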
Filtering inside Motion-Compensation Loop
- Deblocking filter
- Similar to H.264/AVC
- Extended for larger block sizes
- Adaptive in-loop filter
- Separable Wiener filter
- Vertical filtering followed by horizontal filtering
- Potentially different filters in horizontal and vertical direction
- Filter size is chosen by minimizing a Lagrangian cost functional
- Supported filter sizes are 3, 5, 7, 9, and 11
- Filters are separately estimated for luma and chroma planes
- Filters may be re-used to reduce the side information
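The separable structure of the adaptive in-loop filter can be sketched as a vertical pass followed by a horizontal pass, each with its own odd-length kernel. The sketch takes the coefficients as given, so Wiener estimation and coefficient quantization are not shown. The clamped picture-boundary handling, the floating-point arithmetic and the function names are assumptions.

```c
/* Clamp a coordinate to [0, n-1] (assumed boundary handling). */
static int clamp_index(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

/* Vertical pass with kv (odd length lv), then horizontal pass with kh
 * (odd length lh). tmp and out must each hold width * height values. */
void separable_filter(const double *in, double *tmp, double *out,
                      int width, int height,
                      const double *kv, int lv, const double *kh, int lh)
{
    int rv = lv / 2, rh = lh / 2;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            double acc = 0.0;
            for (int j = -rv; j <= rv; j++)
                acc += kv[j + rv] * in[clamp_index(y + j, height) * width + x];
            tmp[y * width + x] = acc;
        }
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            double acc = 0.0;
            for (int i = -rh; i <= rh; i++)
                acc += kh[i + rh] * tmp[y * width + clamp_index(x + i, width)];
            out[y * width + x] = acc;
        }
}
```

With a separable design, an N-tap filter in each direction costs 2N multiplications per sample instead of N^2. This keeps the larger sizes (up to 11) affordable.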
Adaptive In-Loop Filter
- Quad-tree based block-wise filter decision
- Quad-tree is independent of prediction partitioning
- Quad-tree is transmitted as side information
- Estimation of filter coefficients
- Estimate filter coefficients and filter size for the entire picture
- Determine quad-tree based filter decisions
- Re-estimate filter coefficients and filter size for the selected regions
Transform Coding of Prediction Residuals
- Segmentation of prediction blocks
- Segmentation into transform blocks using a quadtree
- Signalling of the partitioning into transform blocks
- Maximum and minimum transform size are signalled in the slice header
- For quad-tree nodes between these bounds, subdivision flags are transmitted
Figure: transform segmentation tree example (labels: max. transform size, transmitted subdivision flags, min. transform size)
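The signalling rule can be sketched as a recursive parser for the transform tree of one prediction block. Nodes larger than the maximum transform size are split without a flag. Nodes at the minimum size are leaves without a flag. Only nodes in between carry a subdivision flag. Treating nodes above the maximum size as implicitly split is an assumption, and `FlagReader` is a hypothetical helper.

```c
/* Hypothetical flag source: subdivision flags in depth-first order. */
typedef struct { const int *flags; int pos; } FlagReader;

static int read_flag(FlagReader *r) { return r->flags[r->pos++]; }

/* Returns the number of transform blocks (leaves) below this node. */
int parse_transform_segmentation(FlagReader *r, int size,
                                 int max_size, int min_size)
{
    int split;
    if (size > max_size)
        split = 1;                 /* inferred: larger than max. transform size */
    else if (size <= min_size)
        split = 0;                 /* inferred: min. transform size reached */
    else
        split = read_flag(r);      /* transmitted subdivision flag */
    if (!split)
        return 1;
    int leaves = 0;
    for (int i = 0; i < 4; i++)
        leaves += parse_transform_segmentation(r, size / 2, max_size, min_size);
    return leaves;
}
```

For a 64x64 prediction block with a maximum transform size of 32 and a minimum of 4, the first split costs no flag. The flags 1, 0,0,0,0, 0,0,0 then produce seven transform blocks (four 16x16 and three 32x32).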
Transform Coding
- Transform kernels
- Separable NxN transforms
- Integer approximations of DCT-II kernels (obtained by scaling and rounding of the DCT-II kernel)
- 32-bit integer implementation with multiplications and additions (employing symmetries of basis functions)
- Integer transform kernels have not been optimized for low-complexity implementations (using bit shifts and additions)
- Quantization
- Similar to H.264/AVC
- Uniform scalar quantization without extra dead-zone
- 52 quantizers with logarithmically increasing step size
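Because the slide describes quantization as similar to H.264/AVC, this sketch assumes the H.264/AVC step-size table. In that table the step size doubles every 6 QP values, which gives 52 logarithmically spaced quantizers for QP 0 to 51. Step sizes are stored in units of 1/16. The quantizer rounds to the nearest level (offset 1/2), so it has no extra dead zone. The encoder's RDOQ would choose levels differently. The function names are hypothetical.

```c
/* Assumption: H.264/AVC step sizes in units of 1/16 (0.625 ... 1.125). */
static const int QSTEP_BASE_X16[6] = { 10, 11, 13, 14, 16, 18 };

/* Step size times 16 for qp in [0, 51]; doubles every 6 QP values. */
int qstep_x16(int qp)
{
    return QSTEP_BASE_X16[qp % 6] << (qp / 6);
}

/* Uniform scalar quantization: |coeff| / Qstep rounded to the nearest integer. */
int quantize(int coeff, int qp)
{
    int step = qstep_x16(qp);
    int mag = coeff < 0 ? -coeff : coeff;
    int level = (mag * 16 + step / 2) / step;
    return coeff < 0 ? -level : level;
}

/* Reconstruction: level * Qstep, rounded back from 1/16 units. */
int dequantize(int level, int qp)
{
    int mag = (level < 0 ? -level : level) * qstep_x16(qp);
    int value = (mag + 8) / 16;
    return level < 0 ? -value : value;
}
```

For example, QP 4 corresponds to a step size of 1.0, QP 10 to 2.0 and QP 51 to 224.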
Entropy Coding
- Novel entropy coding concept
- Binarization and context modelling as in CABAC of H.264/AVC
- Modified coding of binary decisions (bins)
- LPB (less probable bin) probabilities are quantized (12 classes in implementation)
- Separate bin encoders for each class (fixed LPB probabilities)
- Supports high degree of parallelization
- Supports variable-length codes without compromising coding efficiency
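The class-based handling of bins can be sketched as a router. Each bin arrives with the LPB probability estimated by CABAC-style context modelling. The probability is quantized to one of 12 classes, and the bin goes into that class's buffer, where a bin encoder with a fixed probability would consume it. The uniform class boundaries, the buffer size and all names are hypothetical. The actual class boundaries are not given on the slide.

```c
#define NUM_CLASSES 12
#define BIN_BUFFER_SIZE 256

/* Hypothetical class boundaries: LPB probabilities in (0, 0.5] split into
 * 12 equally wide intervals. */
int probability_class(double p_lpb)
{
    int k = (int)(p_lpb / 0.5 * NUM_CLASSES);
    if (k < 0) k = 0;
    if (k >= NUM_CLASSES) k = NUM_CLASSES - 1;
    return k;
}

/* One buffer per class, each feeding a bin encoder at a fixed probability. */
typedef struct {
    unsigned char bins[BIN_BUFFER_SIZE];
    int count;
} BinBuffer;

/* Route one bin (1 = LPB, 0 = MPB) to its class buffer; returns 0 if full. */
int route_bin(BinBuffer buffers[NUM_CLASSES], int is_lpb, double p_lpb)
{
    BinBuffer *b = &buffers[probability_class(p_lpb)];
    if (b->count >= BIN_BUFFER_SIZE)
        return 0;   /* a real encoder would flush to its bin encoder first */
    b->bins[b->count++] = (unsigned char)is_lpb;
    return 1;
}
```

Each class buffer is independent of the others, so the bin encoders (and the decoders on the other side) can run in parallel.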
Entropy Coding with Arithmetic Codes
- Parallelization for large slice data NAL units
- All arithmetic coders are operated at fixed probabilities
- Arithmetic codewords for the different bin encoders are written to different partitions of the slice data NAL unit
- Partitioning of the slice data NAL unit is signalled in the header
- Multiple arithmetic decoders can be operated in parallel
- Remaining entropy coding process simply reads bins from multiple bin buffers
- Disabling the multi-codeword approach for small slices
- Parallelization is not required for small slices
- Overhead of partitioning information can be significant
- Usage of a conventional arithmetic coding engine for small slices (signalled in slice header)
- Arithmetic coding is used in submitted bitstreams
Entropy Coding with Variable Length Codes
- Alternative to arithmetic coding engines
- Bin encoders/decoders operate at fixed probabilities
- Arithmetic coding engines can be replaced by simple variable-length coders
- Bin coders map a variable number of bins onto a variable-length codeword and vice versa
- Potential termination of bin sequences at the end of a slice (use shortest codeword)
Example VNB2VLC mapping for P = 0.15 (0.25% overhead relative to entropy)
Bin sequence   Codeword
0000           1
01             001
10             010
001            011
000100         0001
11             00001
00011          000000
000101         000001
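An encoder for the VNB2VLC example mapping can be sketched as a table lookup. Its bin sequences form a complete prefix-free set, so exactly one entry matches at each position of a bin string. Bins and codewords are represented as character strings for readability. The function name and the error convention are hypothetical. A bin string that ends in the middle of a sequence is reported as an error, where a real coder would terminate it with the shortest matching codeword.

```c
#include <string.h>

/* Example mapping for P = 0.15 (bin sequence -> codeword). */
static const char *const BIN_SEQ[8]  = { "0000", "01", "10", "001",
                                         "000100", "11", "00011", "000101" };
static const char *const CODEWORD[8] = { "1", "001", "010", "011",
                                         "0001", "00001", "000000", "000001" };

/* Encode a string of bins into concatenated codewords.
 * Returns 0 on success, -1 on an incomplete sequence or a too-small buffer. */
int v2v_encode(const char *bins, char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0) return -1;
    out[0] = '\0';
    while (*bins) {
        int match = -1;
        for (int i = 0; i < 8; i++)
            if (strncmp(bins, BIN_SEQ[i], strlen(BIN_SEQ[i])) == 0) { match = i; break; }
        if (match < 0)
            return -1;                        /* input ends inside a sequence */
        size_t cw = strlen(CODEWORD[match]);
        if (used + cw + 1 > out_size)
            return -1;
        memcpy(out + used, CODEWORD[match], cw + 1);   /* copies the '\0' too */
        used += cw;
        bins += strlen(BIN_SEQ[match]);
    }
    return 0;
}
```

The frequent all-zero sequence "0000" maps to the single bit "1". This is how the code comes close to the entropy of a 0.15 source.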
Entropy Coding with Variable Length Codes
- Codeword interleaving
- Interleaving of codewords without any overhead
- Codeword buffer at encoder
- Instantaneous decoding
- Low-delay interleaving
- Specification of maximum buffer delay
- Codeword termination at encoder and decoder if the maximum delay is reached
- Coding efficiency
- Lossless transcoding of submitted bitstreams showed virtually the same coding efficiency
- 0.18% rate savings with codeword interleaving
- 0.10% rate savings with low-delay control (64 bytes)
Encoder Control
- Coding structure
- Hierarchical B pictures for constraint set 1 configuration
- Low-delay hierarchical P pictures for constraint set 2 configuration
- Motion estimation
- Rate-constrained motion estimation (as in JM, JSVM, JMVM)
- Fast integer-sample motion search (same as in JSVM, JMVM)
- Sub-sample refinement search
- Quantization (for a transform block)
- Rate-distortion optimized quantization (RDOQ)
- Similar to JM
- Coding mode decision (for a prediction block)
- Rate-constrained mode decision (as in JM, JSVM, JMVM)
- Abort criterion for complexity reduction
- Intra modes are not tested if, for the inter mode,
- all transform coefficient levels (RDOQ) are equal to 0
- all transform coefficients are below a certain threshold (depending on quantization step size)
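The abort criterion can be sketched as a check on the inter-mode result of one block. The two conditions on the slide are treated here as alternatives, so either one skips the intra tests. That reading, and a threshold proportional to the quantization step size with the hypothetical factor `THRESHOLD_FRACTION`, are assumptions. The slide states only that the threshold depends on the step size.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical: threshold = fraction of the quantization step size. */
#define THRESHOLD_FRACTION 0.5

/* levels: RDOQ levels of the inter mode; coeffs: its transform coefficients. */
bool skip_intra_tests(const int *levels, const int *coeffs, int n, double qstep)
{
    bool all_levels_zero = true, all_below_threshold = true;
    double threshold = THRESHOLD_FRACTION * qstep;
    for (int i = 0; i < n; i++) {
        if (levels[i] != 0)
            all_levels_zero = false;
        if (abs(coeffs[i]) >= threshold)
            all_below_threshold = false;
    }
    return all_levels_zero || all_below_threshold;
}
```

When the inter mode already leaves essentially no residual, an intra mode is unlikely to win the Lagrangian comparison. Skipping it saves encoder time at little cost in coding efficiency.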
Encoder Control
- Selection of prediction and transform segmentation
- Use top-to-bottom and depth-first decision strategy with abort criterion
- Decision is based on Lagrangian costs
- Same abort criterion as for coding mode selection
- Smaller blocks are not tested if
- all transform coefficient levels (RDOQ) are equal to 0
- all transform coefficients are smaller than a threshold (the threshold depends on the quantization step size)
- Uses the quad-tree structure to reduce the computational complexity of the partition selection process
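The top-to-bottom, depth-first decision can be sketched as a recursion. Each node first evaluates the unsplit block. It stops when the abort criterion holds. Otherwise it compares the Lagrangian cost of the unsplit block with the summed cost of the best decisions for its four children. The `BlockEval` interface is hypothetical. `toy_evaluate` is a stand-in cost model for demonstration only, and the cost of signalling the split flags is ignored.

```c
#include <stdbool.h>

typedef struct {
    double cost;      /* Lagrangian cost J = D + lambda * R of the unsplit block */
    bool all_zero;    /* abort criterion (e.g. all RDOQ levels equal to 0) */
} BlockEval;

typedef BlockEval (*EvaluateFn)(int x, int y, int size);

/* Returns the best cost for the node; adds the chosen leaves to *num_leaves. */
double decide_partition(EvaluateFn eval, int x, int y, int size, int min_size,
                        int *num_leaves)
{
    BlockEval whole = eval(x, y, size);
    if (size <= min_size || whole.all_zero) {   /* smaller blocks not tested */
        (*num_leaves)++;
        return whole.cost;
    }
    int half = size / 2, split_leaves = 0;
    double split_cost = 0.0;
    for (int i = 0; i < 4; i++)                 /* depth-first over children */
        split_cost += decide_partition(eval, x + (i % 2) * half,
                                       y + (i / 2) * half, half, min_size,
                                       &split_leaves);
    if (split_cost < whole.cost) {
        *num_leaves += split_leaves;
        return split_cost;
    }
    (*num_leaves)++;
    return whole.cost;
}

/* Toy cost model: only the top-left 16x16 area is "detailed". */
static BlockEval toy_evaluate(int x, int y, int size)
{
    BlockEval e;
    bool touches_detail = (x < 16 && y < 16);
    if (!touches_detail)   { e.cost = size * size / 2.0; e.all_zero = true; }
    else if (size > 16)    { e.cost = 4.0 * size * size; e.all_zero = false; }
    else                   { e.cost = (double)size * size; e.all_zero = false; }
    return e;
}
```

With the toy model, a 64x64 tree block ends up as three 32x32 blocks and four 16x16 blocks (7 leaves, total cost 2176). Flat regions stop early through the abort criterion.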
Average Objective Coding Efficiency
BD-rate (%)   Constraint Set 1        Constraint Set 2 (beta anchor)
              Low       High          Low       High
Class A       -24.00    -21.84        -         -
Class B1      -32.53    -30.04        -30.61    -28.35
Class B2      -35.87    -35.53        -30.30    -29.23
Class C       -30.20    -29.48        -19.45    -17.63
Class D       -26.66    -27.93        -12.74    -12.12
Class E       -         -             -27.50    -25.64
Average       -29.87    -29.33        -22.71    -21.27
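The "Average" row is consistent with a per-sequence average rather than a plain mean over classes. The sketch below assumes each class is weighted by its number of test sequences in the Call for Proposals: A 2, B1 2, B2 3, C 4, D 4, E 3. These counts are an assumption and are not stated on the slide. With them, the reported averages are reproduced to two decimals, for example -29.87 for Constraint Set 1 (low) and -21.27 for Constraint Set 2 (high).

```c
/* Sequence-weighted mean of per-class BD-rates (n classes). */
double weighted_average(const double *bd_rate, const int *num_seq, int n)
{
    double sum = 0.0;
    int total = 0;
    for (int i = 0; i < n; i++) {
        sum += bd_rate[i] * num_seq[i];
        total += num_seq[i];
    }
    return sum / total;
}
```

Under these counts, the Constraint Set 1 columns average over 15 sequences (classes A to D) and the Constraint Set 2 columns over 16 (classes B1 to E).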
Software
- Standard C implementation
- Platform independent
- Compiles under Windows and Linux (32/64 bit)
- Focus on modular design and easy extensibility
- Slim code base
- Only 55,000 LOC (vs. 150,000 LOC for JM 17.0)
- Virtually no redundant code
- English naming of variables and comments
- No external libraries needed
- Optional multi-threaded encoding (Boost library needed)
- Parallel encoding of independent pictures (depends on the GOP string)
- No need to consider multi-threading issues when changing the encoding algorithm inside a picture
- E.g. CS1 bitstreams were encoded almost 8 times faster than single-threaded (on a computer with 8 cores)
Summary
- Hybrid video coding approach
- Generalization of H.264/AVC concepts
- Support of larger block sizes for prediction and transform
- Flexible quad-tree based partitioning into prediction blocks (with additional merging for inter-coded blocks)
- Flexible quad-tree based partitioning into transform blocks
- Spatial intra prediction
- Motion-compensated prediction using non-adaptive filters
- Deblocking and adaptive loop filter
- Novel entropy coding approach
- Average objective coding results
- About 29-30% bit-rate savings for high-delay cases
- About 22% bit-rate savings for low-delay cases