Use of Thread-Level Parallelism in Architectural Complexity Reduction of MPEG2 Encoder

1
Use of Thread-Level Parallelism in Architectural
Complexity Reduction of MPEG2 Encoder
  • T. R. Jacobs
  • V. A. Chouliaras
  • D. J. Mulvaney
  • J. L. Nunez
  • Loughborough University

2
Contents
  • Compression and MPEG2
  • Thread-Level Parallelism: the hows and whys
  • Development environment
  • Testing
  • Results
  • Conclusions
  • Questions

3
Compression and MPEG2
  • Uncompressed video data: 175 Mbit/s
  • DVD and DVB: max. 9.8 Mbit/s
  • Moving Picture Experts Group 2 (MPEG2)
  • Asymmetric encoding/decoding process
  • DCT block-based compression
  • 8x8 pixel blocks
  • 16x16 pixel macroblocks (MB)

4
MPEG2 Dataflow
  • I frame
  • Forward DCT (fDCT) of 8x8 blocks (lossless)
  • Quantize (lossy)
  • B / P frame
  • Recover previous frame (inverse quantization, IDCT)
  • Search for macroblocks (MB), produce motion vectors (MV)
  • Create estimate from the recovered frame and MVs
  • fDCT and quantize the errors between estimated and real frame
  • Variable-length code (VLC) the quantized errors and MVs

5
Thread-Level Parallelism 1
  • Spread workload over a number of processors
  • Decrease number of loop iterations per processor
  • Calculate original loop index in each thread

6
Thread-Level Parallelism 2
  • Control of variables
  • Ensure mutual exclusivity
  • Use of local and shared variables
  • Shared private-space arrays
  • static array[MAX_THREAD];
  • array[context] = x;
  • Serial code inside parallel loop
  • One thread performs serial code using each context's data
  • if (context == 0)
  • for (z = 0; z < MAX_THREAD; z++)
  • a += array[z];
  • Synchronisation
  • Barriers

7
Development Environment
  • Linux workstation
  • MPEG2 Test Model 5
  • Open source
  • SimpleScalar simulator
  • MIPS processor simulator
  • Ideal multi-processor environment
  • Produces instruction count per processor

8
Testing
  • Relative complexity = 100 × context zero instruction count / single-threaded instruction count
  • Three test video sequences
  • 25 frames
  • 352x288 pixels
  • Nine different processor counts
  • Seven search window ranges

9
Cups, Deadline and Paris

10
Results

11
Optimal Context Number
Threaded loop run once only: iterations = 1
  • Iterations = width / (16 × MAX_THREAD)
  • Iterations = block_count / MAX_THREAD
  • Optimal context number (width) = width / 16
  • Optimal context number (chroma) = block_count
  • block_count
  • 4:2:0 = 6
  • 4:2:2 = 8
  • 4:4:4 = 12

12
Conclusions
  • Exploit natural parallelism
  • Achieve complexity savings of 96%
  • Further savings possible through vectorising

13
Questions?