Thread-Parallel MPEG-2, MPEG-4 and H.264 Video Encoders for SoC Multi-Processor Architecture


1
Thread-Parallel MPEG-2, MPEG-4 and H.264 Video
Encoders for SoC Multi-Processor Architecture

Tom R. Jacobs, Vassilios A. Chouliaras,
and David J. Mulvaney

IEEE Transactions on Consumer Electronics
2
Outline

Introduction
Background knowledge
Main purpose
Previous work
Methodology
Experimental results
Conclusions

3
Introduction: Background Knowledge (1/5)

A number of lossy video compression standards
have been developed:
MPEG-1, MPEG-2, MPEG-4 Part 2, H.264
To maintain image quality while reducing
bit-rates, these standards require
additional computation and power consumption.
4
Introduction: Background Knowledge (2/5)

Such processing-intensive consumer application
algorithms are generally implemented in
System-on-Chip (SoC) devices.
Parallelism
DLP: Data-Level Parallelism
TLP: Thread-Level Parallelism

5
Introduction: Background Knowledge (3/5)

Data-Level Parallelism (DLP)
Distributing the data across different parallel
processing nodes.

program
  if CPU = "a" then
    low_limit = 1
    upper_limit = 5
  else if CPU = "b" then
    low_limit = 6
    upper_limit = 10
  end if
  do i = low_limit, upper_limit
    Task on d(i)
  end do
  ...
end program
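
The pseudocode above can be written as a small runnable sketch. This is only an illustration of the partitioning: the function names and the squaring task are assumptions, not part of the presentation. In CPython the threads show how the data is split between two nodes; they do not imply a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def task(x):
    # Stand-in for "Task on d(i)"; squaring is an arbitrary placeholder.
    return x * x


def process_range(d, low_limit, upper_limit):
    # Apply the task to d(low_limit)..d(upper_limit), 1-based and inclusive
    # to match the pseudocode.
    return [task(d[i - 1]) for i in range(low_limit, upper_limit + 1)]


def data_parallel(d):
    # Node "a" handles elements 1..5, node "b" handles elements 6..10.
    limits = {"a": (1, 5), "b": (6, 10)}
    with ThreadPoolExecutor(max_workers=len(limits)) as pool:
        futures = {cpu: pool.submit(process_range, d, lo, hi)
                   for cpu, (lo, hi) in limits.items()}
        return {cpu: f.result() for cpu, f in futures.items()}


d = list(range(1, 11))  # data array D of size 10
result = data_parallel(d)
```

Each node touches a disjoint part of the array, so no synchronization is needed between them.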
6
Introduction: Background Knowledge (4/5)

[Figure: data array D of size 10 (elements 1 to 10) divided between two processing nodes]
7
Introduction: Background Knowledge (5/5)

Thread-Level Parallelism (TLP)
TLP is the parallelism inherent in an application
that runs multiple threads at once.
Benefit:
Distributing the workload of a single
high-performance processor among a number of
slower and simpler processor cores.

8
Introduction: Main Purpose (1/2)

Utilizing Thread-Level Parallelism (TLP) techniques
to improve the performance of video coding.
Reduce DIC (Dynamic Instruction Count).
How to improve?
Workload distribution among a number of
parallel-executing processors.

9
Introduction: Main Purpose (2/2)

The results presented demonstrate that reductions
in dynamic instruction count can be achieved.

10
Previous Work

The majority of this research is focused on
coarse-granularity TLP exploitation, with
the workload most commonly distributed at the GOP
level.

Little inter-node communication
[Figure: multi-threading across six GOPs]
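
As a minimal sketch of GOP-level distribution, the frame sequence can be cut into GOPs and the GOPs dealt out to workers. The function names and the round-robin policy are assumptions made for illustration; the cited works may assign GOPs differently.

```python
def split_into_gops(num_frames, gop_size):
    # Cut frame indices 0..num_frames-1 into consecutive GOPs.
    return [list(range(start, min(start + gop_size, num_frames)))
            for start in range(0, num_frames, gop_size)]


def assign_gops(gops, num_workers):
    # Round-robin assignment (an assumed policy): GOP k goes to worker k mod N.
    assignment = [[] for _ in range(num_workers)]
    for index, gop in enumerate(gops):
        assignment[index % num_workers].append(gop)
    return assignment


gops = split_into_gops(num_frames=30, gop_size=12)
per_worker = assign_gops(gops, num_workers=2)
```

Because each worker encodes whole GOPs, the workers only need to exchange data at GOP boundaries, which is consistent with the low inter-node communication noted on this slide.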
11
Previous Work

In 1995, K. Shen, L. A. Rowe, and E. J. Delp
implemented parallel MPEG-1 at GOP level.
In 1996, S. Bozoki, S. J. P. Westen, R. L.
Lagendijk and J. Biemond performed a comparison
between GOP and slice level on MPEG-1.

12
Previous Work

In 1997, A. Bilas, J. Fritts and J. P. Singh
evaluated the performance of MPEG-2 decoders
using a shared-memory system.
Akramullah, Ahmad and Liou implemented a threaded
MPEG-2 encoder at the MB level using local
memory.

13
Methodology: Overview

The threaded MPEG-2, MPEG-4 and H.264
encoders were compiled for a multi-context
instruction set simulator (MT-ISS) based on
the SimpleScalar infrastructure.
The most important issues:
Data dependencies between processors.
Avoiding race hazards.

14
Methodology: Race hazards

[Figure: two threads each increment a shared integer i.
Expected condition: the increments do not overlap, and i goes 0, 1, 2.
Error condition (race hazard): both threads read i before either writes it back, and i goes 0, 1, 1.]
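
The error condition can be reproduced deterministically by replaying the interleaving by hand, and avoided by making the read-modify-write exclusive. The sketch below is an illustration only; the `Counter` class and function names are hypothetical and the lock stands in for whatever mechanism the encoders actually use.

```python
import threading


def interleaved_increment():
    # Replay of the error condition: both "threads" read i before either
    # writes back, so one of the two increments is lost.
    i = 0
    t1_read = i        # thread 1 reads 0
    t2_read = i        # thread 2 reads 0
    i = t1_read + 1    # thread 1 writes 1
    i = t2_read + 1    # thread 2 writes 1, overwriting thread 1's update
    return i


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes read, add and write-back one exclusive step.
        with self._lock:
            self.value += 1


def locked_increment(num_threads=2, per_thread=1000):
    counter = Counter()

    def worker():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place every increment is applied, so two threads of 1000 increments each always finish at 2000.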
15
Methodology: Thread-parallel MPEG-2 (1/5)

Test model 5 (TM5) of MPEG-2 encoder is used.
Computation analysis (QCIF)
DIST1: 52% to 73% of total DIC for search windows
of 6 to 62 pels, respectively.
FullSearch: 3.5% to 23.2% of total DIC.
Can be improved by a less complex ME
algorithm (such as 3-step, 4-step or diamond search).
FDCT and IDCT: 2.1% to 21% of total DIC.
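
To show why the block-matching kernel dominates as the search window grows, here is a minimal full-search sketch. It assumes that DIST1 computes a sum of absolute differences (SAD) between two blocks; the function names and frame layout (lists of rows) are illustrative and are not taken from TM5.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    # Sum of absolute differences between the current block at (cx, cy)
    # and the candidate reference block at (rx, ry).
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
    return total


def full_search(cur, ref, cx, cy, size, search_range):
    # Try every displacement in [-search_range, +search_range] on both axes
    # and keep the one with the lowest SAD.
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Synthetic 12x12 reference frame; the current frame is the reference
# shifted by (2, 1), with zeros where the shift leaves the frame.
ref = [[x * x + 3 * y * y + x * y for x in range(12)] for y in range(12)]
cur = [[ref[y + 1][x + 2] if y + 1 < 12 and x + 2 < 12 else 0
        for x in range(12)] for y in range(12)]
mv, cost = full_search(cur, ref, cx=4, cy=4, size=4, search_range=2)
```

A search range of r gives up to (2r + 1)^2 SAD evaluations per block, which is why faster searches such as 3-step or diamond visit only a subset of the candidates.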

16
Methodology: Thread-parallel MPEG-2 (2/5)
17
Methodology: Thread-parallel MPEG-2 (3/5)

Motion Estimation
Kernel implementation can take advantage of data
parallel techniques.
Store the information in the mbinfo structure for
motion compensation.
Maintain exclusivity of all variables during the
parallel sections.

18
Methodology: Thread-parallel MPEG-2 (4/5)

Forward transform
FDCT first scans the MBs on a row-by-row basis,
processing the MBs in each row individually.
Determines the prediction error and applies the DCT to
each block.
The thread-parallel transform function can be
performed at block level.

19
Methodology: Thread-parallel MPEG-2 (5/5)

Inverse transform
IDCT scans the MBs first row-by-row and then
block-by-block.
Due to the absence of data dependencies between
blocks, the inverse transforms
can be executed in parallel.
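
Block-level parallelism for the transforms can be sketched as follows. This uses a textbook orthonormal 8x8 DCT-II and a thread pool; it is not the TM5 implementation, and all names are assumptions chosen for the example.

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8

# Orthonormal DCT-II basis: C[u][x] = a(u) * cos((2x + 1) * u * pi / (2N))
C = [[(math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]


def fdct_block(block):
    # Separable 2-D forward DCT: transform each row, then each column.
    rows = [[sum(C[u][x] * block[y][x] for x in range(N)) for u in range(N)]
            for y in range(N)]
    return [[sum(C[v][y] * rows[y][u] for y in range(N)) for u in range(N)]
            for v in range(N)]


def idct_block(coeffs):
    # Inverse of fdct_block (the basis is orthonormal, so use its transpose).
    cols = [[sum(C[v][y] * coeffs[v][u] for v in range(N)) for u in range(N)]
            for y in range(N)]
    return [[sum(C[u][x] * cols[y][u] for u in range(N)) for x in range(N)]
            for y in range(N)]


def transform_blocks(blocks, transform, workers=4):
    # Each block is independent, so blocks can be handed to separate threads.
    # pool.map returns results in the same order as the input blocks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform, blocks))


blocks = [[[value] * N for _ in range(N)] for value in (10, 20, 30)]
coeffs = transform_blocks(blocks, fdct_block)
restored = transform_blocks(coeffs, idct_block)
```

No locking is needed here because each thread reads one input block and produces one output block of its own.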

20
Methodology: Thread-parallel MPEG-4 (1/8)

The implementation is based on the XviD project with
the Advanced Simple Profile (ASP).
Bidirectional frames
Quarter-pel motion compensation
Global motion compensation
Trellis quantization
Custom quantization matrices

21
Methodology: Thread-parallel MPEG-4 (2/8)

Computation analysis (QCIF)

22
Methodology: Thread-parallel MPEG-4 (3/8)

The nature of the XviD encoder
Intra-frame encoding
Inter-frame encoding

23
Methodology: Thread-parallel MPEG-4 (4/8)

Intra-frame encoding
FrameCodeI (row-by-row over the MBs)
Parallelize the loop for encoding the MBs in a
row of the image.
MB data structure: pMB.
Shared memory array.
The function with the highest DIC in FrameCodeI is
MBTransQuantIntra.

24
Methodology: Thread-parallel MPEG-4 (5/8)

MBTransQuantIntra
Forward transformation, quantization and inverse
transformation.
Shared data structure: pEnc
Includes a count of quantization values.
Serial code section.
Transform specific MB pixel data into the
frequency domain independently.
MBPrediction and MBCoding
Responsible for VLC and writing to the bitstream.
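
One way to picture the split between independent per-MB work and the serial section is sketched below. This is not XviD code: `trans_quant`, `encode_row`, the integer-division "quantizer" and the meaning given to the shared count (number of non-zero quantized values) are all assumptions made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def trans_quant(mb, quant):
    # Hypothetical per-MB work standing in for transform + quantization.
    # It touches only its own MB, so MBs can be processed independently.
    return [sample // quant for sample in mb]


def encode_row(row_mbs, quant, workers=4):
    # Parallel section: one task per MB in the row.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        quantized = list(pool.map(lambda mb: trans_quant(mb, quant), row_mbs))

    # Serial section: update the shared count and write the bitstream
    # in MB order, so no two threads modify shared state at once.
    shared = {"quant_count": 0}
    bitstream = []
    for values in quantized:
        shared["quant_count"] += sum(1 for v in values if v != 0)
        bitstream.extend(values)
    return bitstream, shared["quant_count"]


bitstream, count = encode_row([[10, 3, 8], [0, 4, 9]], quant=4)
```

Keeping the shared update and the bitstream write in one serial loop avoids race hazards on the shared structure while preserving the output order.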

25
Methodology: Thread-parallel MPEG-4 (6/8)

Inter-frame encoding
FrameCodeP
Part 1: Motion Estimation
Part 2: Transformation, Quantization, MC

26
Methodology: Thread-parallel MPEG-4 (7/8)

Motion Estimation
Determines an MV for every MB and applies certain
criteria to indicate when Intra coding should be
used.
Scanning in raster line order.
Two kinds of process:
Motion prediction from the current frame.
ME relative to reference frames.

27
Methodology: Thread-parallel MPEG-4 (8/8)

Motion Prediction
Examining the MVs in neighbouring MBs and
determining an initial estimate for ME.

[Figure: ideal pattern, typical pattern and TLP pattern]
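
A common way to form an initial estimate from neighbouring MVs is the component-wise median of the left, top and top-right neighbours. The slides do not state which predictor the threaded encoder uses, so the sketch below is an assumption and `median_mv` is a hypothetical name.

```python
def median_mv(left, top, top_right):
    # Component-wise median of three neighbouring motion vectors,
    # each given as an (x, y) tuple.
    xs = sorted(mv[0] for mv in (left, top, top_right))
    ys = sorted(mv[1] for mv in (left, top, top_right))
    return (xs[1], ys[1])


predicted = median_mv(left=(1, 2), top=(3, -1), top_right=(2, 5))
```

With such a predictor, an MB needs the MVs of its left and upper neighbours before its own search can start, which constrains the order in which MBs can be processed in parallel.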
28
Methodology: H.264 (1/6)

x264 is used for the implementation.
Frame slicing
Main problems with MB-level parallelism:
Wide variation in processor workload.
Modification of the prediction algorithm is
needed.

29
Methodology: H.264 (2/6)

Slice group in H.264
A group of MBs in a frame.
Can be encoded or decoded separately from the
remainder of the frame.
Motion prediction across slice
boundaries is not allowed.
Drawback:
The required bit-rate increases.
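
As a minimal sketch of frame slicing, the MBs of a frame can be divided into contiguous, nearly equal slices, one per thread. The even split and the function name are assumptions; x264 may choose slice boundaries differently.

```python
def slice_mb_ranges(total_mbs, num_slices):
    # Divide MB indices 0..total_mbs-1 into contiguous slices whose sizes
    # differ by at most one MB. Each range is (first_mb, end_mb_exclusive).
    base, extra = divmod(total_mbs, num_slices)
    ranges = []
    start = 0
    for s in range(num_slices):
        count = base + (1 if s < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges


# QCIF is 176x144 pixels, i.e. 11 x 9 = 99 macroblocks of 16x16.
qcif_slices = slice_mb_ranges(total_mbs=11 * 9, num_slices=4)
```

Because prediction does not cross slice boundaries, each range can be given to a separate thread, at the cost of the bit-rate increase noted above.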

30
Methodology: H.264 (3/6)

Comparison of different numbers of slices

31
Methodology: H.264 (4/6)

Comparison of different numbers of slices

32
Methodology: H.264 (5/6)

Different resolutions with 4 slices

33
Methodology: H.264 (6/6)

Computation analysis

34
Experimental Results: MPEG-2
Search Range
35
Experimental Results: MPEG-4
Quality Setting
36
Experimental Results: H.264
Quantization Parameter
37
Experimental Results: Comparative results
38
Conclusions