Title: Overview of H.264 / MPEG-4 Part10
1Overview of H.264 /MPEG-4 Part10
2004. 10. 20.
Soon-kak Kwon (Dongeui University), A. Tamhankar (T-Mobile), K. R. Rao (University of Texas at Arlington)
2Contents
- Introduction
- Layered Structure
- Video Coding Algorithm
- Error Resilience
- Comparison of Coding Efficiency
- Conclusions
3Introduction
- Scope of Image and Video Coding Standards
- Only the Syntax and Decoder are standardized
- Optimization beyond the obvious
- Complexity reduction for implementation
- Provides no guarantees of quality
[Figure: processing chain Input (image / video), Pre-Processing, Encoding, Decoding, Post-Processing and Error Recovery, Output (image / video); the "Scope of Standard" box marks the decoding stage]
4Introduction
Year | Main Applications | Standard
1992-1999, 2000 | Image | JPEG, JPEG2000
1995-2000 | Fax | JBIG
1990 | Video Conferencing | H.261
1995, 2000 | DTV, SDTV | H.262, H.262
1998, 2000 | Videophone | H.263+, H.263++
1992 | Video CD | MPEG-1
1995 | DTV, SDTV, HDTV, DVD | MPEG-2
2000 | Interactive video | MPEG-4
2001 | Multimedia Content Description Interface | MPEG-7
2002 | Multimedia Framework | MPEG-21
2003 | Advanced Video Coding | H.264/MPEG-4 part 10
2004 August | Studio editing, Post processing, Digital cinema | Fidelity Range Extensions (High profile)
5Introduction
- MPEG-1
- Formally ISO/IEC 11172-2 (1993), developed by ISO/IEC JTC1 SC29 WG11 (MPEG); use is fairly widespread, but mostly overtaken by MPEG-2
- Superior quality compared to H.261 when operated at higher bit rates (≥ 1 Mbps for CIF 352x288 resolution)
- Provides approximately VHS quality between 1-2 Mbps using SIF 352x240/288 resolution
- Additional technical features
- Bi-directional motion prediction (B-pictures)
- Half-pel motion vector resolution
- Slice-structured coding
- DC-only D pictures
6Introduction
- Predictive Coding with B Pictures
[Figure: picture sequence I B P B P]
7Introduction
- MPEG-2 / H.262
- Formally ISO/IEC 13818-2 and ITU-T H.262, developed (1994) jointly by ITU-T and ISO/IEC SC29 WG11 (MPEG); now in wide use for DVD and for standard and high-definition DTV (the most commonly used video coding standard)
- Primary new technical features
- Support for interlaced-scan pictures
- Also
- Various forms of scalability (SNR, spatial, temporal and hybrid)
- I-picture concealment motion vectors
- Essentially same as MPEG-1 for progressive-scan pictures, and MPEG-1 forward compatibility is required
- Not especially useful below 2-3 Mbps (range 2-5 Mbps SDTV broadcast, 6-8 Mbps DVD, 18 Mbps HDTV); picture skipping not easy
8Introduction
- H.263: The Next Generation
- ITU-T Rec. H.263 (v1 1995): the next generation of video coding performance, developed by ITU-T; the current premier ITU-T video standard (has overtaken H.261 as dominant videoconferencing codec)
- Superior quality to prior standards at all bit rates (except perhaps for interlaced video)
- Wins by a factor of two at very low rates
- Version 2 (late 1997 / early 1998) and version 3 (2000) later developed with a large number of new features
- Profiles defined early 2001
- H.263+, H.263++ (Extensions to H.263)
9Introduction
- MPEG-4 Visual: Baseline H.263 and Many Creative Extras
- MPEG-4 Visual (formally ISO/IEC 14496-2, v1 early 1999) contains the H.263 baseline design and adds essentially all prior features and many creative new extras
- Segmented coding of shapes
- Scalable wavelet coding of still textures
- Mesh coding
- Face animation coding
- Coding of synthetic and semi-synthetic content
- 10- and 12-bit sampling
- More
- v2 (early 2000) and v3 (early 2001) added later
10Introduction
- Relationship to Other Standards
- Same design to be approved in both ITU-T / VCEG and ISO/IEC / MPEG
- In ITU-T / VCEG this is a new separate standard
- ITU-T Recommendation H.264
- ITU-T Systems (H.32x) is modified to support it
- In ISO/IEC / MPEG this is a new part in the MPEG-4 suite
- Separate codec design from prior MPEG-4 Visual (Part 2)
- New Part 10 called Advanced Video Coding (AVC; similar to AAC in MPEG-2 as a separate audio codec)
- Not backward or forward compatible with prior standards
- MPEG-4 Systems / File Format are being modified to support it
- H.222.0 / MPEG-2 Systems are also being modified to support it
- IETF working on RTP payload packetization
11Introduction
- History of H.264 / MPEG-4 part 10
- ITU-T Q.6/SG16 started work on H.26L (L: Long Range)
- July 2001: H.26L demonstrated at MPEG (Moving Picture Experts Group) call for technology
- December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project, the Joint Video Team (JVT)
- May 2003: Final approval from ISO/IEC and ITU-T
- The standard is named H.264 by ITU-T and MPEG-4 part 10 by ISO/IEC
- Fidelity Range Extensions (August 2004): Amendment 1
- Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3
12Introduction
- Purpose of H.264 / MPEG-4 part 10
- Higher coding efficiency than previous standards: MPEG-1, MPEG-2, MPEG-4 part 2, H.261, H.263
- Simple syntax specifications
- Seamless integration of video coding into all current protocols
- More error robustness
- Various applications like video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
- Network friendliness
- Balance between coding efficiency, implementation complexity and cost, based on state-of-the-art VLSI design technology
13Introduction
- H.264 / MPEG-4 part 10 Architecture
14Introduction
- Applications of H.264 / MPEG-4 part 10: a broad range of applications for video content, including but not limited to the following
- Video streaming over the internet
- CATV: Cable TV on optical networks, copper, etc.
- DBS: Direct broadcast satellite video services
- DSL: Digital subscriber line video services
- DTTB: Digital terrestrial television broadcasting, cable modem, DSL
- ISM: Interactive storage media (optical disks, etc.)
- MMM: Multimedia mailing
- MSPN: Multimedia services over packet networks
- RTC: Real-time conversational services (videoconferencing, videophone, etc.)
- RVS: Remote video surveillance
- SSM: Serial storage media (digital VTR, etc.)
- D-Cinema: content contribution, content distribution, studio editing, post processing
15Introduction
- Profiles and Levels for particular applications
- Profile: a subset of the entire bitstream syntax; different decoder designs are based on the Profile
- Four profiles: Baseline, Main, Extended and High
Profile | Applications
Baseline | Video conferencing, videophone
Main | Digital storage media, television broadcasting
Extended | Streaming video
High | Content contribution, content distribution, studio editing, post processing
16Introduction
- Specific coding parts for the Profiles
17Introduction
- Common coding parts for the Profiles
- I slice (Intra-coded slice): a slice coded using prediction only from decoded samples within the same slice
- P slice (Predictive-coded slice): a slice coded using inter prediction from previously-decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
- CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding
18Introduction
- Coding parts for Baseline Profile
- Common parts: I slice, P slice, CAVLC
- FMO (Flexible macroblock order): macroblocks are not necessarily in raster scan order; the map assigns macroblocks to a slice group
- ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture
- RS (Redundant slice): a slice belonging to redundant coded data, obtained at the same or a different coding rate in comparison with the previously coded data of the same slice
19Introduction
- Coding parts for Main Profile
- Common parts: I slice, P slice, CAVLC
- B slice (Bi-directionally predictive-coded slice): a slice coded using inter prediction from previously-decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding
20Introduction
- Coding parts for Extended Profile
- Common parts: I slice, P slice, CAVLC
- SP slice: a specially coded slice for efficient switching between video streams, similar to coding of a P slice
- SI slice: a switched slice, similar to coding of an I slice
- Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction
21Introduction
Feature | Baseline | Main | Extended | High
I and P Slices | X | X | X | X
Deblocking Filter | X | X | X | X
¼ Pel Motion Compensation | X | X | X | X
Variable Block Size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error Resilience Tools (Flexible MB Order, ASO, Redundant Slices) | X | | X |
SP/SI Slices | | | X |
B Slice | | X | X | X
Interlaced Coding | | X | X | X
CABAC | | X | | X
Data Partitioning | | | X |
22Introduction
23Introduction
- Level corresponding to processing power and
memory capability of a codec
24Introduction
- Parameter set limits for each Level
25Layered Structure
- Two layers: Network Abstraction Layer (NAL), Video Coding Layer (VCL)
- NAL
- Abstracts the VCL data, hence the name Network Abstraction Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems
- VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency
26Layered Structure
27Layered Structure
- Supported picture format: 4:2:0 chroma sampling
- CIF format
- QCIF format
28Video Coding Algorithm
- Block diagram for H.264 encoder
29Video Coding Algorithm
- Block diagram for H.264 Decoder
30VC Algorithm Intra Prediction
- Exploits spatial redundancy between adjacent macroblocks in a frame
- 4 x 4 luma block
- 9 prediction modes: 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down left: 3, diagonal down right: 4, vertical right: 5, horizontal down: 6, vertical left: 7, horizontal up: 8)
Samples a, b, ..., p are the predicted ones for the current block; the above and left samples A, B, ..., M are previously reconstructed ones
31VC Algorithm Intra Prediction
- Example of 4 x 4 luma block
- Samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4) for mode 4
- Samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4) for mode 8
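The prediction rules above can be sketched in a few lines of Python. This is a minimal illustration, not the reference decoder: only modes 0 to 2 are covered for the whole block, the two helper functions reproduce the sample formulas quoted above in integer arithmetic, and all function names are hypothetical.

```python
def predict_intra_4x4(mode, above, left):
    """Return a 4x4 predicted block (list of rows) for modes 0, 1 and 2.

    above: reconstructed samples A, B, C, D above the block
    left:  reconstructed samples I, J, K, L to the left of the block
    """
    if mode == 0:  # vertical: copy the row above into every row
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: copy each left sample across its row
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of the eight neighbouring samples
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are sketched here")


def sample_a_mode4(I, M, A):
    # round(I/4 + M/2 + A/4) written with integer arithmetic
    return (I + 2 * M + A + 2) >> 2


def sample_d_mode8(J, K, L):
    # round(J/4 + K/2 + L/4) written with integer arithmetic
    return (J + 2 * K + L + 2) >> 2


if __name__ == "__main__":
    print(predict_intra_4x4(2, [10, 20, 30, 40], [50, 60, 70, 80]))
    print(sample_a_mode4(50, 12, 10), sample_d_mode8(60, 70, 80))
```

An encoder would typically form the prediction for every candidate mode and keep the one with the smallest residual cost; that search is outside this sketch.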
32VC Algorithm Intra Prediction
- 16 x 16 luma
- 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane works well in smoothly varying luminance
- A linear plane function is fitted to the upper (H) and left side (V) samples
- (8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance
[Figure: plane prediction]
33VC Algorithm Intra Prediction
- Chroma always operates using full MB prediction
- (8x8): 4:2:0 format
- (8x16): 4:2:2
- (16x16): 4:4:4
- (Similar to 16x16 luma block but different mode order)
- 4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)
34VC Algorithm Inter Prediction
- Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures
35VC Algorithm Inter Prediction
- Prediction of variable block size
- A MB can be partitioned into smaller block sizes
- 4 cases for 16 x 16 MB, 4 cases for 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas, small partition sizes suit detailed areas
- The two kinds of partition cannot be mixed, i.e., a MB cannot have both 16x8 and 4x8 partitions
- When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned
36VC Algorithm Inter Prediction
- Sub-pel motion compensation
- Better compression performance than integer-pel MC
- At the expense of increased complexity
- The advantage over integer-pel MC is largest at high bit rates and high resolutions
37VC Algorithm Inter Prediction
- Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition.
- ME can be based on multiple pictures that lie in the past or in the future in display order.
- The reference picture for ME is selected at the MB partition level.
- Sub-MB partitions within the same MB partition must use the same reference picture.
38VC Algorithm Inter Prediction
- Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples
b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)
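The two interpolation formulas above can be tried out with a short Python sketch. It assumes 8-bit samples and adds a clip to the range 0 to 255, which is not written on the slide; the function names are illustrative only.

```python
def clip255(value):
    # Keep the result inside the 8-bit sample range (assumption: 8-bit video)
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    # 6-tap FIR filter (1, -5, 20, 20, -5, 1)/32 with rounding
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    # Bilinear average of two neighbouring integer- or half-pel samples
    return (p + q + 1) >> 1


if __name__ == "__main__":
    b = half_pel(10, 20, 30, 40, 50, 60)  # half-pel sample between G and H
    a = quarter_pel(30, b)                # quarter-pel sample between G and b
    print(b, a)
```

On a flat area the filter returns the input value unchanged, because the six weights sum to 32.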
39VC Algorithm Inter Prediction
- Deblocking filter: adaptive
- To reduce the blocking artifacts at block boundaries and prevent the propagation of accumulated coding noise
- Filtering is applied to horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block-edge, sample)
40VC Algorithm Inter Prediction
- Management of multiple reference pictures
- Takes care of marking some stored pictures as unused and deciding which pictures to delete from the buffer
[Figure: H.264 encoder block diagram: Video Input, Transform / Quantization, Entropy Coding, Bitstream Output, Inverse Quantization / Inverse Transform, Intra/Inter Mode Decision, Motion Compensation, Intra Prediction, Deblocking Filtering, Picture Buffering, Motion Estimation]
41VC Algorithm Transform Quantization
- Transform
- Integer transform, multiplier free: additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform
[Figure: assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks; the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT; (0,0), (0,1), (0,2), ..., (3,3) are the DC coefficients of each 4x4 block]
The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The chroma is handled similarly; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats
42VC Algorithm Transform
- 4 x 4 integer DCT
- X: input pixels, Y: output coefficients
- Y = (Cf X CfT) ⊗ Ef, where CfT is the transpose of Cf and ⊗ denotes element-by-element multiplication
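As an illustration of the core part of this transform, the following Python sketch computes Cf X CfT for one 4 x 4 block. The matrix Cf is not reproduced in the text above; the values used here are the ones commonly given for the H.264 forward core transform and should be read as an assumption. The scaling by Ef is left out, since it is combined with quantization.

```python
# Forward core transform matrix (assumed values, see the note above)
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    # Plain 4x4 integer matrix product
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    # W = Cf X Cf^T, using integer arithmetic only
    return matmul(matmul(CF, x), transpose(CF))


if __name__ == "__main__":
    flat = [[5] * 4 for _ in range(4)]
    for row in core_transform(flat):
        print(row)
```

For a flat block all the energy ends up in the DC position (0,0), which is what makes the second-level Hadamard transform of the DC coefficients worthwhile in smooth areas.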
434x4 Inverse IntDCT
In both forward and inverse transforms, QP (quantization step) is embedded in the matrices Ef and Ei
44VC Algorithm Transform
- Luma DC coefficients for Intra 16x16 MB
- The 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh Hadamard transform
YD: output of the Walsh Hadamard transform of the DC coefficients, where // denotes rounding to the nearest integer
45VC Algorithm Transform
- Chroma DC coefficients: intra prediction mode, (4x4) IntDCT
- Walsh Hadamard transform of the 2 x 2 DC coefficients, 4:2:0
[Figure: chroma block coding order for 4:2:0: 2x2 DC blocks 16 and 17 for U and V, followed by AC blocks 18 to 25]
For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.
46VC Algorithm Transform
- Block diagram emphasizing transform
[Figure: encoder block diagram with the transform stage highlighted: 4 x 4 integer DCT, plus Hadamard transform of the DC coefficients for 16 x 16 intra luma and 8 x 8 chroma blocks]
47VC Algorithm Quantization
- The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling
X: quantizer input; Y: quantizer output; Qstep: quantization step size, selected by the quantization parameter QP; SF: scaling term. QP takes a total of 52 values for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample.
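The doubling rule for the step size can be made concrete with a small Python sketch. The six base step sizes for QP 0 to 5 are not listed above; the values used here are the ones commonly quoted for H.264 and are an assumption of this example, as is the function name.

```python
# Step sizes for QP = 0..5 (assumed values, see the note above)
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for 8-bit video, QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51 for 8-bit samples")
    # The step size doubles for every increment of 6 in QP
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Because six steps span a factor of two, each single increment of QP raises the step size by roughly 12 percent on average.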
48VC Algorithm Transform, Quantization
- Rescale and Inverse transform
- Intra (16x16) prediction mode only
49VC Algorithm Entropy Coding
- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile
[Figure: zig-zag scan and alternate scan]
50- Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed, and are also called Universal Variable Length Codes (UVLC))
51- These are variable length codes with a regular construction
- [M zeroes] [1] [INFO]
- INFO is an M-bit field carrying information
- The first codeword has no leading zero or trailing INFO
- Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on
- The length of each Exp-Golomb codeword is (2M + 1) bits
- M = floor(log2(code_num + 1))
- INFO = code_num + 1 - 2^M
52Decoding
- Read in M leading zeroes followed by 1
- Read in the M-bit INFO field
- code_num = 2^M + INFO - 1
- (For codeword 0, INFO and M are zero)
- CAVLC: codes transform coefficients
- CABAC: codes transform coefficients and MV
- All other syntax elements are coded with the Exp-Golomb codes
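The encoding and decoding rules for the Exp-Golomb codes translate directly into code. The Python sketch below works on strings of '0' and '1' characters for readability, which is a simplification; a real bitstream writer would pack bits. The function names are illustrative.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a string of bits."""
    m = (code_num + 1).bit_length() - 1   # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)        # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of bits.

    Returns (code_num, number of bits consumed).
    """
    m = 0
    while bits[m] == "0":                 # count the M leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0
    return (1 << m) + info - 1, 2 * m + 1


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Since the codeword length is recovered from the leading zeroes alone, the decoder needs no table, which is why these codes are described as having a regular construction.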
53VC Algorithm Entropy Coding
- CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 are coded. For the other coefficients, their levels are coded.
- Encoding steps
- step 1: encode the total number of nonzero coefficients and +/-1 (trailing ones) values
- step 2: encode the sign of each trailing one in reverse order
- step 3: encode the levels of the remaining non-zero coefficients in reverse order
- step 4: encode the total number of zeros before the last coefficient
- step 5: encode each run of zeros
- H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)
- These are adapted to the current stream or context (thus CAVLC)
54VC Algorithm Entropy Coding
order:  0  1  2  3  4  5  6  7  8  9 ... 16
coeff.: c0 c1 c2 0  +1 +1 0  -1 0  0 ... 0
Step 1: encode the number of nonzero total coefficients and of +1 or -1 values (trailing ones) from a look-up table
- no. of nonzero total coefficients: 6 (order 0, 1, 2, 4, 5, 7)
- no. of trailing ones: 3 (order 4, 5, 7)
Step 2: encode the sign of each trailing one in reverse order
- - (order 7), + (order 5), + (order 4)
Step 3: encode the levels of the remaining non-zero coefficients in reverse order
- c2 (order 2), c1, c0
Step 4: encode the total no. of zeros before the last coefficient
- 2 (order 3, 6)
Step 5: encode the runs of zeros in reverse order
- 1 (order 6-5), 0 (order 4), 1 (order 3-2)
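The five steps of this example can be reproduced with a short Python sketch that extracts the symbols CAVLC would go on to code. It stops before the actual table look-ups, and the values chosen for c0, c1 and c2 (3, -2 and 4) are hypothetical stand-ins for the unspecified coefficients in the example.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols for one block of scanned coefficients."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Steps 1 and 2: up to three trailing +/-1 values, scanned in reverse order
    trailing = []
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    signs = ["+" if coeffs[i] > 0 else "-" for i in trailing]

    # Step 3: levels of the remaining nonzero coefficients, in reverse order
    remaining = nonzero[:total_coeffs - len(trailing)]
    levels = [coeffs[i] for i in reversed(remaining)]

    # Step 4: zeros located before the last nonzero coefficient
    total_zeros = nonzero[-1] + 1 - total_coeffs if nonzero else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once every zero has been accounted for
    runs, zeros_left = [], total_zeros
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero[k] - nonzero[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeffs": total_coeffs,
        "trailing_ones": len(trailing),
        "signs": signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


if __name__ == "__main__":
    block = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(block))
```

With the block above, the sketch reports 6 nonzero coefficients, 3 trailing ones with signs -, +, +, the levels c2, c1, c0, 2 zeros in total, and the runs 1, 0, 1, which agrees with the worked example.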
55VC Algorithm Entropy Coding
- CABAC utilizes arithmetic coding; in order to achieve good compression, the probability model for each symbol element is updated. Both MV and residual transform coefficients are coded by CABAC.
- Encoding steps
- step 1: context modeling: choose a suitable model
- step 2: binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- step 3: binary arithmetic coding using the probability estimates provided by context modeling
56CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive
57VC Algorithm B Slice
- Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair, but also forward/forward and backward/backward pairs
- Direct mode
- Derives reference picture, block size, and motion vector data from the subsequent inter picture
- Weighted prediction
- Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)
58VC Algorithm B Slice
- Generalized bidirectional prediction
- Multiple reference pictures mode
- Two forward references: suitable for a region just before a scene change
- Two backward references: suitable for a region just after a scene change
59VC Algorithm B Slice
- Direct mode
- Forward / backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.
mvL0 = tb × mvCol / td
mvL1 = -(td - tb) × mvCol / td
where mvCol is a MV used in the co-located MB of the subsequent picture, tb is the temporal distance between the current picture and the forward reference picture, and td is the temporal distance between the two reference pictures
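The direct-mode scaling above can be sketched as follows. This is a simplified floating-point illustration that assumes tb is the temporal distance from the current picture to the forward reference and td the distance between the two references; the standard itself uses fixed-point arithmetic, and the function name is hypothetical.

```python
def temporal_direct_mv(mv_col, tb, td):
    """Scale the co-located motion vector into a forward and a backward MV.

    mv_col: (x, y) motion vector of the co-located block
    tb:     distance from the current picture to the forward reference
    td:     distance between the forward and backward references
    """
    mv_l0 = tuple(round(tb * c / td) for c in mv_col)
    mv_l1 = tuple(round(-(td - tb) * c / td) for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between its two references
    print(temporal_direct_mv((8, -4), tb=1, td=2))
```

Note that mvL1 equals mvL0 minus mvCol, so no motion vector needs to be transmitted for a block coded in direct mode.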
60VC Algorithm B Slice
- Weighted prediction
- Different weights of reference signals for gradual transitions from scene to scene, e.g., fade to black (the luma samples of the scene gradually approach zero), fade from black
- Different weighted prediction method for a macroblock of a P slice or B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals, r1 and r2
- p = w1 × r1 + w2 × r2
- where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
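A minimal sketch of the weighted sum p = w1 × r1 + w2 × r2 is given below. It uses floating-point weights and 8-bit clipping for readability, whereas the standard works with integer weights and offsets; the implicit rule shown simply gives the temporally closer reference the larger weight. All names are illustrative.

```python
def weighted_bipred(r1, r2, w1, w2):
    """Weighted combination of two reference sample lists, clipped to 8 bits."""
    return [max(0, min(255, int(w1 * a + w2 * b + 0.5)))
            for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Implicit type: weights derived from temporal distances.

    tb: distance from the current picture to reference r1
    td: distance between references r1 and r2
    """
    w2 = tb / td
    return 1.0 - w2, w2


if __name__ == "__main__":
    w1, w2 = implicit_weights(tb=1, td=4)
    print(w1, w2, weighted_bipred([100, 200], [60, 40], w1, w2))
    # Explicit type, e.g. a fade towards black: both weights are reduced
    print(weighted_bipred([200], [200], 0.25, 0.25))
```

For a fade, explicit weights whose sum is below one let the prediction follow the falling brightness, which plain averaging of the two references cannot do.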
61VC Algorithm SP and SI Slices (Extended profile
only)
- Switched slice
- SP slice: a specially coded slice for efficient switching between video streams, similar to coding of a P slice
- SI slice: a switched slice, similar to coding of an I slice
Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing.
62Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice SP/SI
- Data partitioning
- Arbitrary Slice Order ASO
Only in Extended Profile
63Data partitioning slices (Extended profile only)
- The coded data of a slice is placed in three separate data partitions A, B and C
- A has the slice header and header data for each MB in the slice
- B has coded residual data for intra and SI slice MBs
- C has coded residual data for inter coded MBs
- Each partition A, B and C is placed in a separate NAL unit and transported separately
64Error Resilience Parameter setting
- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice
[Figure: reliable parameter set exchange between H.264 encoder and decoder for parameter sets 1, 2 and 3; VCL data is transferred with a reference to parameter set 3. Example contents of parameter set 3: video format NTSC, motion resolution ¼, entropy coding CABAC, frame width 11]
65Error Resilience FMO
- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and the macroblocks in each slice group are dispersed through the picture
- If the packet containing the information of slice group 1 is lost during transmission, then the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
- ASO is similar to FMO. It randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
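The dispersed two-group allocation described above can be illustrated with a checkerboard map. The Python sketch below is only one possible dispersed pattern, chosen here as an assumption; it builds the map and lists the neighbors of a lost macroblock that survive because they belong to the other slice group. The function names are hypothetical.

```python
def checkerboard_map(width_mbs, height_mbs):
    """Assign every macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(width_mbs)]
            for y in range(height_mbs)]


def surviving_neighbors(mb_map, x, y, lost_group):
    """Return the 4-connected neighbors of (x, y) not in the lost group."""
    height, width = len(mb_map), len(mb_map[0])
    found = []
    for dx, dy in ((0, -1), (-1, 0), (1, 0), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and mb_map[ny][nx] != lost_group:
            found.append((nx, ny))
    return found


if __name__ == "__main__":
    mb_map = checkerboard_map(4, 3)
    for row in mb_map:
        print(row)
    # Macroblock (2, 1) belongs to slice group 1; suppose that group is lost
    print(surviving_neighbors(mb_map, 2, 1, lost_group=1))
```

With this pattern every lost interior macroblock keeps all four of its direct neighbors, which gives the concealment step the most spatial information to interpolate from.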
66Error Resilience Redundant Slice
- Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits)
- A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed
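The decoder behaviour described above amounts to a simple selection rule, sketched below in Python. The sketch assumes each received slice carries a counter that is 0 for the primary representation and larger for redundant ones (in the spirit of the redundant_pic_cnt slice header field); the tuple layout and the function name are hypothetical.

```python
def slices_to_decode(received):
    """Pick one representation per slice, preferring the primary one.

    received: list of (slice_id, redundant_cnt, payload) tuples, where
              redundant_cnt is 0 for the primary slice (assumption).
    """
    best = {}
    for slice_id, redundant_cnt, payload in received:
        # Keep the representation with the lowest counter seen so far
        if slice_id not in best or redundant_cnt < best[slice_id][0]:
            best[slice_id] = (redundant_cnt, payload)
    return {slice_id: payload for slice_id, (_, payload) in best.items()}


if __name__ == "__main__":
    # The primary copy of slice 1 was lost; only its redundant copy arrived
    packets = [(0, 0, "primary-0"), (0, 1, "redundant-0"), (1, 1, "redundant-1")]
    print(slices_to_decode(packets))
```

Slice 0 is reconstructed from its primary copy and the redundant copy is discarded, while slice 1 falls back to the coarser redundant copy.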
67Comparison of Coding Efficiency
- Subjective verification test
- Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter T indicates that H.264 achieved transparency.
- H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.
68Comparison of Coding Efficiency
- Subjective verification test
- Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD
- H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.
69Comparison of Coding Efficiency
- Subjective verification test
- Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)
- When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.
- When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.
70Comparison of Coding Efficiency
- Subjective verification test
- Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)
- When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.
- When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.
71Comparison of Coding Efficiency
- Objective test
- PSNR (between original and reconstructed pictures) and bitrate saving results of the Tempete CIF 15Hz sequence for the video streaming application
HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile
72Comparison of Coding Efficiency
- Objective test
- PSNR and bitrate saving results of the Paris CIF 15Hz sequence for the video conferencing application
CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile
73Conclusions
- H.264 outperforms the previous standards
- Comparison of standards
Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation and compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback and random access | Yes | Yes | Yes | Yes
74Conclusions
- Comparison of standards (continued)
Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, ½-pel | Integer, ½-pel | Integer, ½-pel, ¼-pel | Integer, ½-pel, ¼-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization and concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High
75Conclusions
- Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products
- Related companies
- UBVideo website http://www.ubvideo.com
- LSI Logic website http://www.lsilogic.com
- Microsoft website http://www.microsoft.com
- Envivio website http://www.envivio.com
- Broadcom website http://www.broadcom.com
- Nagravision website http://www.nagravision.com
- Philips website http://www.philips.com
- Polycom website http://www.polycom.com
- PixelTools Corporation website http://www.pixeltools.com
- Amphion website http://www.amphion.com
76Conclusions
- Related companies (continued)
- Ligos Corporation website http://www.ligos.com
- LifeSize website http://www.lifesize.com
- Netvideo website http://www.netvideo.com
- Motorola website http://www.motorola.com
- Vanguard Software Solutions website http://www.vsofts.com
- STMicroelectronics website http://us.st.com
- MainConcept website http://www.mainconcept.com
- Impact Labs Inc. website http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)
77Conclusions
- Related group
- MPEG website http://www.mpeg.org
- JVT website ftp://standards.polycom.com
- www.mpegif.org
- Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software http://bs.hhi.de/suehring/tml/download
- Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences/
- http://trace.eas.asu.edu/yuv/yuv.html
78Conclusions
- H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms, decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website http://www.mpegla.com
- Via Licensing http://www.vialicensing.com
- FRExtensions
- to 4:2:2 and 4:4:4 chroma formats
- 12 bit resolution for medical imaging
- Scalable coding / lossless coding for digital cinema application
- High fidelity coding for the next generation optical discs
- Extension for various applications: H. Schwarz, D. Marpe and T. Wiegand, SNR-scalable extension of H.264/AVC, ICIP 2004, vol. , pp. , Singapore, Oct. 2004.
- FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications
79Contacts for Further Information
- JVT documents and software on open ftp website ftp://standards.polycom.com
- http://iphome.hhi.de/suehring
- JVT reflector subscription
- http://mail.imtc.org/cgi-bin/lyris.pl?enterjvt-experts
- JVT reflector e-mail
- jvt-experts_at_mail.imtc.org
- JVT management team
- Chair: Gary Sullivan (garysull_at_microsoft.com)
- Co-chair: Ajay Luthra (aluthra_at_motorola.com)
- Co-chair: Thomas Wiegand (wiegand_at_hhi.de)
- Dr. K. R. Rao, UTA: rao_at_uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon_at_dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati_at_ieee.org
- Karsten.suehring_at_hhi.fraunhofer.de
80References
1 MPEG-2 ISO/IEC JTC1/SC29/WG11 and ITU-T,
ISO/IEC 13818-2 Information Technology-Generic
Coding of Moving Pictures and Associated Audio
Information Video, ISO/IEC and ITU-T, 1994.
2 MPEG-4 ISO/IEC JTCI/SC29/WG11, ISO/IEC 14
4962000-2 Information on Technology-Coding of
Audio-Visual Objects-Part 2 Visual, ISO/IEC,
2000. 3 H.263 International
Telecommunication Union, Recommendation ITU-T
H.263 Video Coding for Low Bit Rate
Communication, ITU-T, 1998. 4 H.264
International Telecommunication Union,
Recommendation ITU-T H.264 Advanced Video
Coding for Generic Audiovisual Services, ITU-T,
2003. 5 T. Stockhammer, M. Hannuksela, and S.
Wenger, H.26L/JVT Coding Network Abstraction
Layer and IP-based Transport, IEEE ICIP 2002,
Rochester, New York, Vol. 2, pp. 485-488, Sep.
2002.
[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.
[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, Standard Codecs: Image Compression to Advanced Video Coding, Herts, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.
[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com
[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs: HDTV, SDTV)
[43] www.pixtree.com
[44] BT Exact: http://www.btexact.bt.com/
[45] DemoGaFrX: www.dolby.com
[46] Equator: http://www.equator.com/
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com/
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] WW Communications (and DSP Research): http://www.wwcoms.com/
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html
[53] FastVDO: http://www.fastvdo.com/
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com/
[56] Sun Microsystems: http://www.sun.com/
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression: from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," Short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch
[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp. , Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. , pp. , Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol. , pp. 7-34, first quarter, 2004.
[63] W. Gao et al., "AVS: The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups/ JVT-EXPERTS LIST (FAQ)
[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, vol. , pp. , June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] Codec vendors:
a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php
Moonlight's codec is one of the popular ones in the industry and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.
[70] ST Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).
[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.
Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.
- http://www.avc-alliance.org
- http://ftp3.itu.int/av-arch/jvt-site
- http://www.dvdforum.org/29cmtg-resolution.htm
- High Profile is now officially mandatory for HD DVD Video (DVD Forum).
- http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)
- http://tinyurl.com/6dnck (ISO/IEC 14496-10, the published MPEG-4 Part 10 standard, costs CHF 260.00.)
91Fidelity Range Extensions
- Slices in a picture are compressed as follows:
- "Intra" spatial (block-based) prediction
  - Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  - 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction
- 4:2:2, 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates use RGB color format, YCgCo
- High resolution
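The residual color transform and YCgCo items above depend on an RGB-to-YCgCo mapping that can be inverted exactly in integer arithmetic. The following is a minimal sketch, assuming the lifting-based reversible variant (often written YCoCg-R); the function names are illustrative, and in the codec the transform is applied to prediction residuals rather than to raw samples.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting steps (assumed reversible YCgCo variant)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse lifting steps: undo the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round trip on a few sample triples (works for residuals too, which may be negative).
samples = [(255, 0, 0), (12, 200, 77), (-3, 5, -8)]
round_trip = [ycgco_to_rgb(*rgb_to_ycgco(*s)) for s in samples]
```

Because every step is an integer add, subtract or shift that the inverse undoes exactly, no rounding loss is introduced by the conversion.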
- "Inter" temporal prediction: block-based motion estimation and compensation
  - Multiple reference pictures
  - Reference B pictures
  - Arbitrary referencing order
  - Variable block sizes for motion compensation
    - Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
  - 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
  - Weighted prediction
  - Frame- or field-based motion estimation for interlaced scanned video
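To make the 1/4-sample luma interpolation item concrete, here is a minimal one-dimensional sketch. It assumes the 6-tap half-sample filter (1, -5, 20, 20, -5, 1)/32 and rounded averaging for quarter samples, as commonly described for H.264; the taps are not stated in these slides, and the helper names are illustrative.

```python
def clip255(x):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, x))


def half_sample(e, f, g, h, i, j):
    """Half-sample value between g and h from six neighbouring integer samples."""
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_sample(a, b):
    """Quarter-sample value as the rounded average of two neighbouring positions."""
    return (a + b + 1) >> 1


row = [10, 10, 10, 50, 50, 50]          # integer-position luma samples across an edge
half = half_sample(*row)                # position halfway between row[2] and row[3]
quarter = quarter_sample(row[2], half)  # position a quarter of the way from row[2]
```

In two dimensions the same filter is applied along rows and columns; that part is omitted here.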
- Interlaced coding features
  - Frame-field adaptation
    - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
  - Field scan
- Lossless representation capability
  - Intra PCM raw sample-value macroblocks
  - Entropy-coded transform-bypass lossless macroblocks (FRExt-only)
- 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)
- Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)
- Scalar quantization
- Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)
- Logarithmic control of quantization step size as a function of quantization control parameter
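The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. This assumes the commonly cited 8-bit relation in which the step size doubles for every increase of 6 in QP, with six base values for QP 0 to 5; the table and the function name `qstep` are assumptions of this sketch, not taken from the slides.

```python
# Assumed base step sizes for QP = 0..5; each further +6 in QP doubles the step.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for an 8-bit QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for this sketch")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


steps = {qp: qstep(qp) for qp in (4, 10, 16, 28, 51)}
```

Under this relation a change of one QP unit changes the step size by roughly 12%, which is what gives the encoder fine-grained rate control over a wide range.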
- Deblocking filter (within the motion compensation loop)
- Coefficient scanning
  - Zig-zag (frame)
  - Field (alternate scan)
- Lossless entropy coding
  - Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  - Context Adaptive VLC (CAVLC)
  - Context-based Adaptive Binary Arithmetic Coding (CABAC)
- Error resilience tools
  - Flexible Macroblock Ordering (FMO)
  - Arbitrary Slice Order (ASO)
  - Redundant slices
- SP and SI synchronization pictures for streaming and other uses
- Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)
- 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats
- Auxiliary pictures for alpha blending (FRExt-only)
- Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.
98Slice
- I (Intra)
- P (Predicted)
- B (Bidirectionally predicted) (reference for temporal prediction or non-reference)
- SP (Switching P)
- SI (Switching I)
99I Slice (MB in I slice and intra MB in P and B slices)
- Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks.
- Apply (4x4) or (8x8) IntDCT to intra prediction errors.
- Note: (8x8) IntDCT is FRExt-only.
- After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only).
- Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC).
- PicAFF: field processing is similar to frame mode.
- MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.
101I Slice (Spatial Prediction)
- (16x16) luma: corresponding chroma block size for full MB prediction
- (8x8) luma prediction (FRExt-only)
- (4x4) luma prediction
- For (16x16) luma, full MB prediction has four modes:
  - Vertical: pels in the MB are predicted from the pels just above the MB
  - Horizontal: pels in the MB are predicted from the pels just left of the MB
  - DC: pels in the MB are predicted as the average value of the neighboring pels
  - Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation.
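The first three full-macroblock modes above can be sketched in a few lines. This is an illustration under stated assumptions: both neighbours (row above, column to the left) are available, the DC value is the rounded mean of the 32 neighbouring pels, and the planar mode is left out because its equation is not given in the slides. The function name and mode strings are hypothetical.

```python
N = 16  # macroblock width and height in luma samples


def predict_16x16(above, left, mode):
    """Return a 16x16 prediction block as a list of rows.

    above: the 16 reconstructed pels in the row just above the MB
    left:  the 16 reconstructed pels in the column just left of the MB
    """
    if mode == "vertical":      # every row copies the row above
        return [list(above) for _ in range(N)]
    if mode == "horizontal":    # every row is filled with its left neighbour
        return [[left[y]] * N for y in range(N)]
    if mode == "dc":            # rounded average of all 32 neighbours
        dc = (sum(above) + sum(left) + N) // (2 * N)
        return [[dc] * N for _ in range(N)]
    raise ValueError("planar mode is not covered by this sketch")


above = list(range(16))   # 0, 1, ..., 15
left = [100] * 16
pred_v = predict_16x16(above, left, "vertical")
pred_h = predict_16x16(above, left, "horizontal")
pred_dc = predict_16x16(above, left, "dc")
```

An encoder would typically form each candidate, compare it with the source macroblock, and transform only the residual of the best mode.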
- Chroma spatial prediction (operates on entire MB)
  - 4:2:0 (8x8): similar to (16x16) luma MB prediction
  - 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
  - 4:4:4 (16x16)
104FRExt Only
For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4
105FRExt Only
Integer 8x8 Transform (luma only)
106FRExt Only HVS Weighting Matrices
107HVS Weighting Matrices
- Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization. (This itself is a multiplication.)
- Weighting matrices can be customized separately for:
  - 4x4 Intra Y
  - 4x4 Intra Cb, Cr
  - 4x4 Inter Y
  - 4x4 Inter Cb, Cr
  - 8x8 Intra Y
  - 8x8 Inter Y
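Since the weighting is only an extra multiplier at inverse quantization, it can be illustrated with a small floating-point sketch. The assumptions here are loud ones: a weight of 16 is treated as neutral (so a flat matrix of 16s reproduces plain scalar dequantization), the `hvs` matrix is a made-up example rather than a default from the standard, and real decoders use integer scaling tables instead of floats.

```python
FLAT = 16  # assumed neutral weight: weight / FLAT == 1 leaves the step size unchanged


def dequantize(levels, qstep, weights):
    """Scale each quantized level by the step size and its per-position weight."""
    return [
        [level * qstep * w / FLAT for level, w in zip(level_row, weight_row)]
        for level_row, weight_row in zip(levels, weights)
    ]


flat = [[16] * 4 for _ in range(4)]
# Hypothetical perceptual matrix: coarser steps towards high frequencies.
hvs = [[16, 20, 24, 28],
       [20, 24, 28, 32],
       [24, 28, 32, 36],
       [28, 32, 36, 40]]
levels = [[5, 2, 1, 0],
          [2, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0]]

plain = dequantize(levels, 2.0, flat)
weighted = dequantize(levels, 2.0, hvs)
```

With the flat matrix every coefficient uses the same step; with the weighted one the effective step grows with frequency, which is where the eye is assumed to be less sensitive.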
108FRExt Only
- Two scans, similar to those for the 4x4 transform, switched for frame/field coding
- Coefficient scanning follows decreasing coefficient variances, so as to maximize the number of zero-valued coefficients along the scan
- Figure: Frame (zig-zag) scan and Field scan
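The frame zig-zag order can be generated rather than tabulated. The sketch below assumes the classical anti-diagonal zig-zag, which for a 4x4 block yields the raster indices 0, 1, 4, 8, 5, 2, ...; the field (alternate) scan follows a different table and is not reproduced. Function names are illustrative.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                      # s = row + col indexes each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()                          # even diagonals run from bottom-left to top-right
        order.extend(diag)
    return order


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


block = [[9, 4, 1, 0],
         [3, 2, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
scanned = zigzag_scan(block)
raster_indices = [r * 4 + c for r, c in zigzag_order(4)]
```

In the example the nonzero coefficients are gathered at the front of the scan and the zeros form one trailing run, which is the property the entropy coder exploits.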
109Examples of parameters to be encoded
- Parameters: Description
- Sequence-, picture- and slice-layer syntax elements: Headers and parameters
- Macroblock type (mb_type): Prediction method for each coded macroblock
- Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients
- Quantiser parameter: Transmitted as a delta value from the previous value of QP
- Reference frame index: Identifies reference frame(s) for inter prediction
- Motion vector: Transmitted as a difference (mvd) from the predicted motion vector
- Residual data: Coefficient data for each 4x4 or 2x2 block
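Two of the parameters above are sent differentially, which a short sketch can make concrete. The predictor used here, a component-wise median of three neighbouring motion vectors, is an assumption of the sketch; the slides only say "predicted motion vector". All function names are hypothetical.

```python
def median3(a, b, c):
    """Middle value of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a, mv_b, mv_c):
    """Assumed predictor: component-wise median of three neighbouring vectors."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


def encode_mvd(mv, pred):
    """Motion vector difference that would be written to the bitstream."""
    return (mv[0] - pred[0], mv[1] - pred[1])


def decode_mv(mvd, pred):
    """Decoder side: add the difference back to the same prediction."""
    return (mvd[0] + pred[0], mvd[1] + pred[1])


def encode_qp_delta(qp, prev_qp):
    """Quantiser parameter sent as a delta from the previous QP."""
    return qp - prev_qp


pred = predict_mv((4, 0), (6, -2), (5, 1))   # neighbouring vectors, made-up values
mvd = encode_mvd((7, 1), pred)
qp_delta = encode_qp_delta(28, 26)
```

Because neighbouring blocks tend to move together and QP changes slowly, both differences are usually small numbers, which map to short codewords.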
110Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed, and are also called Universal Variable Length Codes (UVLC))
- These are variable length codes with a regular construction: [M zeros][1][INFO]
- INFO is an M-bit field carrying information.
- The first codeword has no leading zero or trailing INFO.
- Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.
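The construction described above (M zeros, a 1, then an M-bit INFO field) can be sketched as a small encoder and decoder for unsigned code numbers. Bit strings are used instead of a packed bitstream for readability, and the function names are illustrative.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code number as a bit string."""
    value = code_num + 1
    m = value.bit_length() - 1           # number of leading zeros, also the INFO length
    return "0" * m + format(value, "b")  # binary of value is '1' followed by the M INFO bits


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos + m] == "0":          # count the leading zeros
        m += 1
    end = pos + 2 * m + 1                # M zeros + marker '1' + M INFO bits
    return int(bits[pos + m:end], 2) - 1, end


codewords = [exp_golomb_encode(k) for k in range(7)]

# Decode three codewords concatenated into one stream.
stream = "".join(exp_golomb_encode(k) for k in (3, 0, 5))
decoded, pos = [], 0
while pos < len(stream):
    num, pos = exp_golomb_decode(stream, pos)
    decoded.append(num)
```

The codeword for code number 0 is the single bit `1`; code numbers 1 and 2 use a one-bit INFO field (`010`, `011`), and 3 to 6 use two bits (`00100` to `00111`), matching the pattern in the bullets.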