Application Exploration Key Learnings

Christopher Rodrigues, Sara Sadeghi, Christopher Kung, John Stratton,
Ian Steiner, Sain-Zee Ueng, Shane Ryoo, Wen-Mei Hwu

GSRC Soft Systems
Abstract

- Media applications are compute-intensive, soft real-time applications. In single-threaded form, they can be highly demanding on both compute speed and memory bandwidth. Yet at the algorithmic level they are parallel, opening the possibility of using parallel computation to run them on cheap, general-purpose hardware.
- We know that media codecs are parallel by design. But what needs to be done to a common, sequential implementation in order to make it parallel? To answer this question, we hand-parallelized several implementations of common media coders and decoders.

LAME has four encoding modes: Constant Bit Rate, Average Bit Rate, Variable Bit Rate with reservoir enabled, and Variable Bit Rate with reservoir disabled.

Key Observations
- Several well-known parallelizing transformations can be applied to a program:
  - Data Distribution, Pipelining, Task Distribution
- These transformations depend on the program being in the right form:
  - No data or control dependences between the elements to parallelize (either loop iterations or statements)
- Enabling transformations eliminate benign dependences
- Identify latent parallelism, then expose it with enabling transformations
- Goal is to make transformations realizable in the compiler, guided by analysis and/or programmer interaction
- Other issues are still obstacles in some programs:
  - Recursive data structures
  - Unfamiliar forms of control
Constant and Average Bit Rate modes have a low level of available parallelism.
Data flow suggests fission
Control Flow
Data flow properties maintained
There is a similar dependency due to the static variable resv_size in Variable Bit Rate mode, unless the user chooses to disable the reservoir at invocation.
ISOLATING CODE PATHS (LAME)
Data Flow
PRIVATIZATION (MP3Dec)
Loop Fission
Side exit redirected to IDCT loop
Side exit prevents fission
Critical Recurrence in Precomputation
Iterations reflect side exit
OBSTACLES (JPEG)
Front End
Back End
The Precomputation block depends on previous invocations of synth_1to1.
Data buffers need to be saved for the Compute Intensive block.
The Compute Intensive block depends only on Precomputation.
Program Dependence Graphs
High-level
Detailed
- Exposes high-level parallelism for one user option (maximum performance)
- Does not increase performance of other code paths
- Increases code footprint

Problem complexity does not scale linearly when pursuing high levels of parallelism.
Static: dependence between calls
Narrowing b0 selection using pointers
RETILING (MPEG)
Initial transformations prepare loops for parallelization:
- Re-initialize parts of the buffs array
- Not possible to prove that previous writes are killed
Execution Ordering
Fuse
Data Distribute
Source Code
Retile
for (i = 0; i < blocks; i++) mv[i] = ...;
Motion Estimation
mv
Writes in the second loop are not in the same order as reads in the third loop. Retiling the third loop makes the access patterns match, allowing the loops to execute in parallel with respect to each other.
- Retiling is the key to including the subtraction loop in the parallelized loop.
- Choosing a useful retiling is nontrivial:
  - Multiple simultaneous constraints
  - Multiple array accesses per loop (e.g., from hand-unrolling)
  - More than one array carrying data from one loop to the next
  - Loop-carried dependences
Use of b0, part of buffs array
for (i = 0; i < blocks; i++)   /* Use mv[i] */
    for (x = 0; x < 16; x++)
        for (y = 0; y < 16; y++)
            pred[i*256 + x + y*16] = ...;
Motion Compensation
Access Patterns
pred
- Only part of the buffs array is used in the Compute Intensive block
- Minimize the amount of data that is privatized to reduce fission overhead
Subtraction
for (n = 0; n < 256*blocks; n++) diff[n] = orig[n] - pred[n];
Retile