1
Thread-Parallel MPEG-2, MPEG-4 and H.264 Video Encoders for SoC Multi-Processor Architecture
  • Tom R. Jacobs, Vassilios A. Chouliaras and David J. Mulvaney

IEEE Transactions on Consumer Electronics
2
Outline
  • Introduction
  • Background knowledge
  • Main purpose
  • Previous work
  • Methodology
  • Experimental results
  • Conclusions

3
Introduction: Background Knowledge (1/5)
  • A number of lossy video compression standards have been developed:
  • MPEG-1, MPEG-2, MPEG-4 Part 2 and H.264.
  • Maintaining image quality while reducing bit-rates requires additional computation and power consumption.
4
Introduction: Background Knowledge (2/5)
  • Such processing-intensive consumer applications are generally implemented in System-on-Chip (SoC) devices.
  • Two kinds of parallelism:
  • DLP: Data-Level Parallelism
  • TLP: Thread-Level Parallelism

5
Introduction: Background Knowledge (3/5)
  • Data-Level Parallelism (DLP)
  • Distributing the data across different parallel processing nodes.

program
  if CPU = "a" then
    low_limit = 1; upper_limit = 5
  else if CPU = "b" then
    low_limit = 6; upper_limit = 10
  end if
  do i = low_limit, upper_limit
    Task on d(i)
  end do
  ...
end program
6
Introduction: Background Knowledge (4/5)

[Figure: data array D of size 10 divided between two processing nodes]
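The DLP pseudocode hard-codes the index ranges for two CPUs. A minimal Python sketch of the same static block partitioning, generalized to any number of nodes, is shown below. `block_limits` and `data_parallel` are hypothetical names, and Python threads only stand in for processing nodes (under CPython's global interpreter lock they illustrate the work split, not a real speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def block_limits(n_items, n_nodes, node):
    """Return the 1-based inclusive (low_limit, upper_limit) range for `node`.

    Items are split into contiguous blocks; the first n_items % n_nodes
    nodes receive one extra item.
    """
    base, extra = divmod(n_items, n_nodes)
    low = node * base + min(node, extra) + 1
    upper = low + base + (1 if node < extra else 0) - 1
    return low, upper


def data_parallel(d, n_nodes, task):
    """Apply `task` to every element of `d`, each node handling its own block."""
    results = [None] * len(d)

    def run_node(node):
        low, upper = block_limits(len(d), n_nodes, node)
        for i in range(low, upper + 1):      # "do i = low_limit, upper_limit"
            results[i - 1] = task(d[i - 1])  # each node writes only its own slots

    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        list(pool.map(run_node, range(n_nodes)))  # list() re-raises worker errors
    return results
```

With 10 elements and 2 nodes, `block_limits` yields (1, 5) and (6, 10), the same ranges as the pseudocode.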
7
Introduction: Background Knowledge (5/5)
  • Thread-Level Parallelism (TLP)
  • TLP is the parallelism inherent in an application that runs multiple threads at once.
  • Benefit: the workload of a single high-performance processor is distributed among a number of slower, simpler processor cores.

8
Introduction: Main Purpose (1/2)
  • Use Thread-Level Parallelism (TLP) techniques to improve video encoding performance.
  • Reduce the DIC (Dynamic Instruction Count).
  • How?
  • Distribute the workload among a number of processors executing in parallel.

9
Introduction: Main Purpose (2/2)
  • The results presented demonstrate that reductions
    in dynamic instruction count can be achieved.

10
Previous Work
  • The majority of this research focuses on coarse-grained TLP, with the workload most commonly distributed at the GOP level.

[Figure: successive GOPs encoded in separate threads (multi-threading), with little inter-node communication]
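GOP-level distribution needs little communication because a closed GOP can be encoded without referring to frames outside it, so threads only exchange finished bitstreams. The minimal sketch below assumes closed GOPs of fixed length and a round-robin assignment (both assumptions, not details of the cited work); `split_into_gops`, `assign_gops` and `reassemble` are hypothetical names.

```python
def split_into_gops(frame_ids, gop_size):
    """Cut a frame sequence into consecutive GOPs of at most gop_size frames."""
    return [frame_ids[i:i + gop_size] for i in range(0, len(frame_ids), gop_size)]


def assign_gops(gops, n_threads):
    """Round-robin assignment: GOP g is encoded by thread g % n_threads."""
    work = [[] for _ in range(n_threads)]
    for g, gop in enumerate(gops):
        work[g % n_threads].append(gop)
    return work


def reassemble(per_thread_output, n_gops, n_threads):
    """Interleave per-thread results back into GOP (display) order."""
    return [per_thread_output[g % n_threads][g // n_threads] for g in range(n_gops)]
```

Because each thread's output is only concatenated at the end, the only synchronization point is the final reassembly.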
11
Previous Work
  • In 1995, K. Shen, L. A. Rowe and E. J. Delp implemented MPEG-1 in parallel at the GOP level.
  • In 1996, S. Bozoki, S. J. P. Westen, R. L. Lagendijk and J. Biemond compared GOP-level and slice-level parallelism for MPEG-1.

12
Previous Work
  • In 1997, A. Bilas, J. Fritts and J. P. Singh evaluated the performance of MPEG-2 decoders on a shared-memory system.
  • Akramullah, Ahmad and Liou implemented a threaded MPEG-2 encoder at the MB level using local memory.

13
Methodology: Overview
  • The threaded MPEG-2, MPEG-4 and H.264 implementations were compiled and run on a multi-context instruction set simulator (MT-ISS) based on the SimpleScalar infrastructure.
  • The most important issues:
  • Data dependencies between processors.
  • Avoiding race hazards.

14
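Methodology: Race hazards

[Figure: race hazard on a shared integer i that two threads each increment (i = i + 1). Expected condition: Thread 1 reads 0 and writes 1, then Thread 2 reads 1 and writes 2. Error condition: both threads read 0 before either writes back, so both write 1 and one increment is lost.]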
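The hazard arises because `i = i + 1` is a separate read and write. The Python sketch below (all names hypothetical) first replays the two interleavings from the figure deterministically, then shows the usual fix: a mutex that turns each read-modify-write into a critical section.

```python
import threading


def run_schedule(schedule):
    """Replay an interleaving of two threads that each execute i = i + 1.

    Each increment is split into a "read" step and a "write" step, which is
    where a race hazard can occur.
    """
    shared_i = 0
    local = {}                     # per-thread copy of i read from memory
    for thread, step in schedule:
        if step == "read":
            local[thread] = shared_i
        else:                      # "write"
            shared_i = local[thread] + 1
    return shared_i


EXPECTED = [(1, "read"), (1, "write"), (2, "read"), (2, "write")]
ERROR = [(1, "read"), (2, "read"), (1, "write"), (2, "write")]


def locked_increments(n_threads, n_increments):
    """Real threads; the lock makes each read-modify-write exclusive."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(n_increments):
            with lock:             # critical section
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

`run_schedule(EXPECTED)` returns 2 and `run_schedule(ERROR)` returns 1, matching the figure; with the lock, `locked_increments(4, 1000)` always returns 4000.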
15
Methodology: Thread-parallel MPEG-2 (1/5)
  • The MPEG-2 Test Model 5 (TM5) encoder is used.
  • Computation analysis (QCIF):
  • DIST1: 52% to 73% of the total DIC for search windows of 6 to 62 pels, respectively.
  • FullSearch: 3.5% to 23.2% of the total DIC.
  • This could be reduced by a less complex ME algorithm (e.g., three-step, four-step or diamond search).
  • FDCT and IDCT: 2.1% to 21% of the total DIC.
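DIST1 is the block-distortion routine that FullSearch calls for every candidate position, so its share of the DIC grows with the search window. A minimal integer-pel sketch of that pairing follows, assuming DIST1 behaves like a plain sum of absolute differences (SAD); any half-pel or early-termination handling in the TM5 routines is omitted, and `sad` and `full_search` are illustrative names, not TM5 code.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block
    displaced by (dx, dy) in ref (a DIST1-style distortion)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, r):
    """Exhaustive integer-pel search over a +/- r window.

    Returns (best_sad, dx, dy, candidates_evaluated), or None if no
    candidate fits inside the reference frame.
    """
    h, w = len(ref), len(ref[0])
    best = None
    evaluated = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= w
                    and 0 <= by + dy and by + dy + n <= h):
                continue
            evaluated += 1
            d = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or d < best[0]:
                best = (d, dx, dy)
    return None if best is None else best + (evaluated,)
```

When the window lies fully inside the frame, the search evaluates (2r + 1)² candidates (49 for r = 3), each costing one full SAD, which is consistent with DIST1 dominating the DIC as the window grows.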

16
Methodology: Thread-parallel MPEG-2 (2/5)
17
Methodology: Thread-parallel MPEG-2 (3/5)
  • Motion Estimation
  • The ME kernel can take advantage of data-parallel techniques.
  • The results are stored in the mbinfo structure for use in motion compensation.
  • Exclusive access to all variables must be maintained during the parallel sections.
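A minimal Python sketch of this arrangement, assuming MB rows are handed to a pool of worker threads and that the motion search itself (`estimate_mb`, a hypothetical callback) only reads the frames. Each thread writes only its own mbinfo entries and keeps its temporaries in local variables, which is one way to keep variables exclusive in the parallel section. The row-based split is an assumption, not taken from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_motion_estimation(mb_cols, mb_rows, n_threads, estimate_mb):
    """Distribute MB rows over threads; store per-MB results in an mbinfo list.

    estimate_mb(col, row) stands in for the per-MB motion search and must
    only read shared frame data.
    """
    mbinfo = [None] * (mb_cols * mb_rows)  # shared, but each slot has one writer

    def do_row(row):
        # Loop variables live in this call frame, so they are private
        # to the thread executing it.
        for col in range(mb_cols):
            mbinfo[row * mb_cols + col] = estimate_mb(col, row)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(do_row, range(mb_rows)))  # list() re-raises worker errors
    return mbinfo
```

For QCIF (176 × 144), the frame has 11 × 9 MBs, so a call might be `parallel_motion_estimation(11, 9, 4, my_search)`.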

18
Methodology: Thread-parallel MPEG-2 (4/5)
  • Forward transform
  • The FDCT stage scans the MBs row by row and processes each MB in a row individually.
  • It determines the prediction error and applies the DCT to each block.
  • The thread-parallel transform can be performed at the block level.
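A minimal sketch of block-level parallelism for this stage, assuming the six 8×8 blocks of a 4:2:0 MB are the unit of work. It uses a direct (slow) orthonormal DCT-II rather than TM5's own fast DCT, and `parallel_forward_transform` is a hypothetical name.

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8  # MPEG-2 transform block size


def fdct8x8(block):
    """Direct 2-D DCT-II of an 8x8 block with orthonormal scaling."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
        for v in range(N):
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out


def prediction_error(cur, pred):
    """Element-wise difference between the current and the predicted block."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]


def parallel_forward_transform(cur_blocks, pred_blocks, n_threads=4):
    """Blocks are independent, so each (error + DCT) job can run in parallel."""
    def job(i):
        return fdct8x8(prediction_error(cur_blocks[i], pred_blocks[i]))

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(job, range(len(cur_blocks))))
```

For a flat prediction error of 6 in every pixel, each block's DC coefficient is 8 × 6 = 48 and all AC coefficients are (numerically) zero.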

19
Methodology: Thread-parallel MPEG-2 (5/5)
  • Inverse transform
  • The IDCT scans the MBs first row by row and then block by block.
  • Because there are no data dependencies between blocks, the blocks can be processed in parallel.
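Since no block depends on another, blocks can be mapped to threads in any order. A minimal sketch, assuming six blocks per MB and using a separable matrix form of the orthonormal inverse DCT rather than TM5's own IDCT routine; the names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8
# Orthonormal DCT-II basis matrix: C[k][n] = c(k) * cos((2n + 1) k pi / 2N)
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def idct8x8(coeffs):
    """Separable inverse DCT: x = C^T X C."""
    return matmul(matmul(transpose(C), coeffs), C)


def parallel_inverse_transform(coeff_blocks, n_threads=4):
    """No block depends on another, so blocks are mapped to threads freely."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(idct8x8, coeff_blocks))
```

A block whose only non-zero coefficient is a DC value of 40 reconstructs to a flat block of 40 / 8 = 5.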

20
Methodology: Thread-parallel MPEG-4 (1/8)
  • The implementation is based on the XviD project using the Advanced Simple Profile (ASP), with the following features:
  • Bidirectional frames
  • Quarter-pel motion compensation
  • Global motion compensation
  • Trellis quantization
  • Custom quantization matrices

21
Methodology: Thread-parallel MPEG-4 (2/8)
  • Computation analysis (QCIF)

22
Methodology: Thread-parallel MPEG-4 (3/8)
  • The XviD encoder has two encoding paths:
  • Intra-frame encoding
  • Inter-frame encoding

23
Methodology: Thread-parallel MPEG-4 (4/8)
  • Intra-frame encoding
  • FrameCodeI processes the MBs row by row.
  • The loop that encodes the MBs in a row of the image is parallelized.
  • The MB data structure, pMB, is a shared memory array.
  • Within FrameCodeI, MBTransQuantIntra has the highest DIC.
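A minimal sketch of parallelizing the per-row MB loop, assuming the per-MB work being parallelized does not depend on other MBs in the same row and that rows are processed one after another. `frame_code_i` and `code_intra_mb` are hypothetical stand-ins for XviD's FrameCodeI and its per-MB work, and `p_mb` plays the role of the shared pMB array.

```python
from concurrent.futures import ThreadPoolExecutor


def frame_code_i(mb_cols, mb_rows, n_threads, code_intra_mb):
    """FrameCodeI-style loop: rows in order, MBs of one row in parallel.

    code_intra_mb(col, row) stands in for the per-MB intra work and
    returns the data to be stored for that MB.
    """
    p_mb = [[None] * mb_cols for _ in range(mb_rows)]  # shared MB array

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for row in range(mb_rows):
            def code(col, row=row):
                p_mb[row][col] = code_intra_mb(col, row)  # one writer per entry

            # Waiting for the whole row acts as a barrier before the next row.
            list(pool.map(code, range(mb_cols)))
    return p_mb
```

Reusing one pool across rows avoids creating threads per row; the `list(...)` call is what makes each row finish before the next starts.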

24
Methodology: Thread-parallel MPEG-4 (5/8)
  • MBTransQuantIntra
  • Performs the forward transform, quantization and inverse transform.
  • Shared data structure: pEnc
  • It includes a count of quantization values, so updating it is a serial code section.
  • The pixel data of each MB is transformed into the frequency domain independently.
  • MBPrediction and MBCoding
  • Responsible for VLC and for writing to the bitstream.
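The split described above (parallel per-MB transform work, a serialized update of the shared count, then serial VLC) might look like the minimal sketch below. `EncoderState`, `trans_quant_mb` and `encode_row` are hypothetical; the integer division is only a placeholder for the real transform and quantization, and what exactly pEnc counts is not stated on the slide, so here it simply counts quantized values.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class EncoderState:
    """Hypothetical stand-in for the shared pEnc structure."""

    def __init__(self):
        self.quant_count = 0          # shared count updated by every MB
        self.lock = threading.Lock()


def trans_quant_mb(enc, mb_pixels, q):
    """Per-MB work: transform/quantization runs independently;
    only the update of the shared count is serialized."""
    coeffs = [p // q for p in mb_pixels]  # placeholder, not a real DCT
    with enc.lock:                        # serial code section
        enc.quant_count += len(coeffs)
    return coeffs


def encode_row(enc, row_pixels, q, n_threads=4):
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        coded = list(pool.map(lambda px: trans_quant_mb(enc, px, q), row_pixels))
    # MBPrediction/MBCoding stand-in: VLC writes to a single bitstream,
    # so this step runs serially in raster order.
    bitstream = []
    for coeffs in coded:
        bitstream.extend(coeffs)
    return bitstream
```

`pool.map` returns results in input order, so the serial coding step sees the MBs in raster order even though they were transformed concurrently.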

25
Methodology: Thread-parallel MPEG-4 (6/8)
  • Inter-frame encoding: FrameCodeP
  • Part 1: Motion Estimation
  • Part 2: Transformation, Quantization and MC

26
Methodology: Thread-parallel MPEG-4 (7/8)
  • Motion Estimation
  • Determines an MV for every MB and applies criteria to decide when intra coding should be used.
  • MBs are scanned in raster order.
  • Two kinds of processing:
  • Motion prediction from the current frame.
  • ME relative to the reference frames.

27
Methodology: Thread-parallel MPEG-4 (8/8)
  • Motion Prediction
  • Examines the MVs of neighbouring MBs to determine an initial estimate for ME.

[Figure: neighbouring MBs used for motion prediction in the ideal pattern, the typical pattern and the TLP pattern]
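In MPEG-4 Part 2 the MV predictor is the component-wise median of the left, above and above-right neighbours. When MBs of the same row are estimated in parallel, the left neighbour may not be finished yet. The sketch below contrasts that usual pattern with one possible TLP pattern that uses only the previous row; the exact TLP pattern in the figure is not recoverable, so the above-left substitution, the (0, 0) value for missing neighbours, the simplified boundary handling and all names are assumptions.

```python
def median3(a, b, c):
    return sorted((a, b, c))[1]


def predict_mv(mvs, col, row, tlp=False):
    """Median motion-vector predictor for MB (col, row).

    mvs[row][col] holds (mvx, mvy); neighbours outside the frame count
    as (0, 0).
    Usual pattern: left, above, above-right.
    Assumed TLP pattern: above-left, above, above-right, so the result
    depends only on the previous MB row, which is complete before the
    current row is processed in parallel.
    """
    rows, cols = len(mvs), len(mvs[0])

    def mv(c, r):
        if 0 <= r < rows and 0 <= c < cols:
            return mvs[r][c]
        return (0, 0)

    if tlp:
        cands = [mv(col - 1, row - 1), mv(col, row - 1), mv(col + 1, row - 1)]
    else:
        cands = [mv(col - 1, row), mv(col, row - 1), mv(col + 1, row - 1)]
    return (median3(*(c[0] for c in cands)), median3(*(c[1] for c in cands)))
```

The trade-off is that the TLP predictor ignores the left neighbour, so its initial estimate may be slightly worse, in exchange for removing the dependency between MBs of the same row.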
28
Methodology: H.264 (1/6)
  • The implementation uses x264.
  • Frame slicing
  • Main problems of MB-level parallelism:
  • Wide variation in processor workload.
  • The prediction algorithm must be modified.

29
Methodology: H.264 (2/6)
  • Slice group in H.264
  • A group of MBs in a frame.
  • Can be encoded or decoded separately from the remainder of the frame.
  • Motion prediction across slice boundaries is not allowed.
  • Drawback: the required bit-rate increases.
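A minimal sketch, assuming each slice is a run of whole MB rows (x264's actual slice partitioning may differ) and using hypothetical names. Each slice could be given to its own thread, and a neighbour is usable for prediction only inside the same slice. Counting the rows that lose their upper neighbour suggests where the bit-rate penalty comes from as the number of slices grows.

```python
def slice_of_rows(mb_rows, n_slices):
    """Map each MB row to a slice index, using contiguous, near-equal groups."""
    base, extra = divmod(mb_rows, n_slices)
    mapping = []
    for s in range(n_slices):
        mapping.extend([s] * (base + (1 if s < extra else 0)))
    return mapping


def neighbour_available(row_slice, row, nb_row):
    """A neighbouring MB in row nb_row can be used for prediction only if it
    lies inside the frame and in the same slice as the current MB."""
    return 0 <= nb_row < len(row_slice) and row_slice[nb_row] == row_slice[row]


def rows_without_upper_neighbour(row_slice):
    """Rows whose upper MB row cannot be used for prediction."""
    return sum(1 for r in range(len(row_slice))
               if not neighbour_available(row_slice, r, r - 1))
```

For QCIF (9 MB rows) with 4 slices, the mapping is [0, 0, 0, 1, 1, 2, 2, 3, 3], and 4 rows (the first row of each slice) lose their upper neighbour instead of only 1 for a single slice.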

30
Methodology: H.264 (3/6)
  • Comparison of different numbers of slices

31
Methodology: H.264 (4/6)
  • Comparison of different numbers of slices

32
Methodology: H.264 (5/6)
  • Different resolutions with 4 slices

33
Methodology: H.264 (6/6)
  • Computation analysis

34
Experimental Results: MPEG-2
[Figure: results plotted against search range]
35
Experimental Results: MPEG-4
[Figure: results plotted against quality setting]
36
Experimental Results: H.264
[Figure: results plotted against quantization parameter]
37
Experimental Results: Comparative results
38
Conclusions
  • The DIC metric of MPEG-2, MPEG-4, and H.264 can
    be greatly reduced by TLP.
  • For HD sequences, the reductions are around 84%, 92% and 96%, respectively.
  • TLP has become more significant for each new
    generation of video encoders.
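Assuming the HD figures are percentage reductions r in DIC (the slide does not state the unit explicitly), the equivalent ratio between the serial and thread-parallel DIC is 1/(1 − r):

```latex
S = \frac{1}{1 - r}:\qquad
S_{\text{MPEG-2}} = \frac{1}{1 - 0.84} = 6.25,\qquad
S_{\text{MPEG-4}} = \frac{1}{1 - 0.92} = 12.5,\qquad
S_{\text{H.264}} = \frac{1}{1 - 0.96} = 25
```

Under this reading, each step from 84% to 92% to 96% roughly doubles the equivalent ratio, which is in line with TLP becoming more significant for each new encoder generation.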