Optimizing task and data representations

1
Optimizing task and data representations
  • Tim Todman
  • Imperial College
  • London
  • 18 January 2008

2
Overview
  • 1. Outline of hArtes project
  • 2. Task transformation
  • 3. Case study: ray tracing
  • 4. Data representation optimisation
  • 5. Conclusion
  • Thanks to Gabriel Coutinho and William Osborne

3
1. The hArtes Project
  • Multimedia systems: getting larger
  • Increase flexibility
  • Reduce time to market
  • C + annotations as intermediate language

4
Three key compilation stages
  • Task Transformation
      • support CPU, DSP and FPGA optimisation
      • automatic task restructuring into efficient architecture
  • Data Representation Optimisation
      • mainly for FPGA
      • trade-offs in accuracy, speed, area, power consumption
  • Task Mapping and Scheduling
      • decides which task runs on which processing element
      • optimises cost metrics

[Diagram labels: Partitioning (WP2.1); Task Transformation (WP2.2.3); Data Representation Optimisation (WP2.2.4); Cost Estimation and Metrics (WP2.2.5); Task Mapping and Scheduling (WP2.3.1); Code Generation (WP2.3.3); System Parameterisation and Optimisation (WP2.3.4)]
5
2. Exploring Task transformation
  • Transform a single node in task graph
      • Start: obvious, but may lead to inefficient design
      • End: non-obvious, better implementation
  • Source-to-source transformations from start to end
  • Domain-specific language to implement transforms
      • Compact description of transform
      • Abstract from implementation in compiler framework
      • Automate housekeeping functions (e.g. Visitor pattern)

6
Requirements: CML language
  • Aim: compact transformation description
  • Describe transformations on
      • Abstract Syntax Tree (AST)
      • Data Flow Graph (DFG)
  • Support transformations specific to
      • Application domain: embedded media
      • Target technology: CPU, DSP, FPGA
  • Allow parameterisable transforms
      • e.g. unrolling factor
  • Interface to data representation optimisation
      • data representation optimisation as transform
  • Facilitate cost estimate, e.g. number of registers
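
To make the idea of a parameterisable transform concrete, here is a minimal hand-written sketch in C of a summation loop before and after unrolling by a factor of 4. The function names and the factor are assumptions chosen for illustration. This is ordinary C showing the shape of such a transform's result; it is not CML and not output of the hArtes tools.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: one addition per iteration. */
long sum_original(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* The same loop unrolled by a factor of 4 (the transform parameter).
 * A remainder loop handles the last n % 4 elements. */
long sum_unrolled4(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    for (; i < n; i++)      /* remainder iterations */
        sum += a[i];
    return sum;
}
```

Both functions return the same result for any input. A different unrolling factor changes only the stride and the number of statements in the main loop body, which is what makes the factor a natural transform parameter.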

7
2. CML design flow
[Diagram labels: Partitioning; Code (C + OpenMP); Requirements; Library of transformations (CML); Transform engine; Data representation transform; Code (C + OpenMP); Cost estimate; Partitioning / code generation]
8
CML for task transformations
  • Basic CML: 3 parts to a transform
      • Pattern: syntax to match, label elements
      • Conditions: based on dataflow
      • Resulting pattern to substitute
  • Proposed novel aspects of extended CML
      • Systematic description of dataflow conditions
      • Parameterised transforms
      • Features for labelling subpatterns
      • Probabilities for machine learning
  • Extend CML code matching: DFGs
      • s1 -> s2 matches true dependence arc from s1 to s2
      • s1 -/> s2 matches antidependence arc from s1 to s2
      • s1 -@-> s2 matches output dependence arc from s1 to s2
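
The three kinds of dependence arc can be seen in a few lines of ordinary C. The sketch below is an illustration only: the function name and the statement labels s1 to s4 in the comments are assumptions, and it says nothing about how CML itself represents these arcs.

```c
#include <assert.h>

/* Four statements on three variables; the comments name the
 * dependence arcs between them. */
int dependence_demo(int x)
{
    int a, b, c;

    a = x + 1;   /* s1: writes a */
    b = a * 2;   /* s2: reads a.  True dependence s1 to s2 (read after write) */
    a = 7;       /* s3: writes a. Antidependence s2 to s3 (write after read),
                    output dependence s1 to s3 (write after write) */
    c = a + b;   /* s4: reads a and b. True dependences s3 to s4 and s2 to s4 */

    /* Would fail if s3 were moved above s2, which the antidependence forbids. */
    assert(b == 2 * (x + 1));
    return c;
}
```

A transform may reorder two statements only if no dependence arc connects them, which is why a pattern language benefits from being able to match on these arcs.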

9
Related work
  • CoSy compiler framework
      • build compiler for new architectures
      • cost criteria for instruction selection
  • Machine learning (Mike O'Boyle)
      • find optimal set of transforms (from a small library)
  • possible use in task transformation
      • transform ordering: add suggested following transforms
      • initial probability for machine learning

10
3. Case study: ray tracing
  • A classical computer graphics algorithm
  • Also has applications in
      • Seismology
      • Acoustics
  • Strengths: photorealistic, global illumination
  • Weaknesses: diffuse reflections, soft shadows
  • Very computationally expensive
  • Sublinear time complexity

11
Example images
[Images: from POV-Ray (hof.povray.org); our raytracer]
12
Ray tracing characteristics
  • Very processor intensive
  • Naturally recursive
  • Massively parallel: each pixel is independent
  • For each pixel, rays depend on results of
    previous rays
  • Ray-object intersection calculations dominate the
    computation time
  • Ray-object intersection calculations are
    relatively simple

13
Basic algorithm
  • Trace rays for each screen pixel
  • If ray hits object
      • Trace to light sources (shadow rays)
      • Trace reflection
      • Trace refraction
  • End tracing when below threshold

[Diagram labels: Camera, Object, Light source, Shadow ray, Reflected ray, Refracted ray]
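
The recursive structure and the threshold cut-off can be sketched in a few lines of C. This toy version only counts rays. It assumes, purely for illustration, that every ray hits an object and that each hit passes half of its contribution to a reflected ray and half to a refracted ray. The function name, the two weights and the omission of shadow rays are all assumptions.

```c
#include <assert.h>

/* Hypothetical scene constants: each hit reflects and refracts half of
 * the incoming contribution. */
#define REFLECT_WEIGHT 0.5
#define REFRACT_WEIGHT 0.5

/* Counts the rays traced for one pixel. Assumes that every ray hits an
 * object; shadow rays are not counted. */
int trace(double weight, double threshold)
{
    assert(threshold > 0.0);
    if (weight < threshold)
        return 0;                                        /* end tracing */
    int rays = 1;                                        /* this ray */
    rays += trace(weight * REFLECT_WEIGHT, threshold);   /* reflection */
    rays += trace(weight * REFRACT_WEIGHT, threshold);   /* refraction */
    return rays;
}
```

With an initial weight of 1.0 and a threshold of 0.2, the recursion traces rays of weight 1.0, 0.5 and 0.25 and stops at 0.125, giving 7 rays for the pixel.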
14
Dataflow in ray tracing
  • Ray-object intersector

[Diagram: FPGA intersector datapath with a sqrt unit. Inputs: Bank 1, ray directions d; Bank 2, ray start points s; Bank 3, sphere C, R2. Output: Bank 4, intersection results dist1, dist2.]
15
Dataflow: software and hardware
  • Process results of batch n
  • Generate batch n+1 in shared memory

[Diagram: Software and Hardware sides. Write: Bank 1, ray directions d; Bank 2, ray start points s; Bank 3, sphere C, R2. FPGA. Read: Bank 4, intersection results dist1, dist2.]
16
Call graph of depth-first ray tracing
After Heckbert
  • Main
  • Screen
      • Generate primary rays
  • Trace
      • If intersection then shade point
  • Shade
      • Test for shadows
      • Compute illumination
      • Recurse for reflection and refraction
  • Intersect

[Diagram arc labels: Secondary rays, Shadow rays]
17
Depth-first: poor match for hardware
  • Ray-object intersection calculations tightly coupled to rest of algorithm
      • Hardware called for small batches of rays
      • Limits pipelining
      • Most time spent communicating over bus
  • Solution: transform algorithm
      • Marshal batches of independent rays together
      • Runs much slower in software, but much faster in hardware

18
Call graph of breadth-first ray tracing
  • Main
  • Calculate pixel colours: visit each ray tree
  • Trace rays
  • Add rays to buffer: visit ray tree roots in order
  • Intersect ray batch
  • Process intersection results
  • Calculate final colour: traverse ray tree
  • Process ray results

[Diagram arc label: Secondary rays]
19
Automate the restructuring
  • Aim: depth-first to breadth-first algorithm
      • Restructure to intersect rays in large batches
  • Standard passes
      • Hoisting initialisation
      • Loop interchange
      • Index normalisation
  • Custom passes: specific recursive structure to iteration
      • Arrays replace stacks
      • For-loops with extra variables as guards
  • Custom passes: strip mining of rays from data structure
      • Marshal into batches for intersection
      • Parameterise by hardware buffer size
      • Split loops to separate buffer fill, intersect and read back
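
The strip-mining and loop-splitting steps can be sketched as follows. The code is a hand-written illustration of the target shape, not output of the transform engine. BUFFER_SIZE, the function names and the placeholder computation inside intersect_batch are assumptions; a real intersector would consume ray and sphere data rather than a single number per ray.

```c
#include <assert.h>
#include <stddef.h>

#define BUFFER_SIZE 4   /* hypothetical hardware buffer size, in rays */

/* Stand-in for the hardware intersector: processes a whole buffer. */
static void intersect_batch(const double *in, double *out, size_t count)
{
    for (size_t i = 0; i < count; i++)
        out[i] = in[i] * in[i];             /* placeholder computation */
}

/* Strip-mines n rays into batches of at most BUFFER_SIZE and splits the
 * work into three loops: buffer fill, intersect, read back.
 * Returns the number of batches issued. */
size_t process_rays(const double *rays, double *results, size_t n)
{
    double in_buf[BUFFER_SIZE], out_buf[BUFFER_SIZE];
    size_t batches = 0;

    assert(n == 0 || (rays != NULL && results != NULL));
    for (size_t base = 0; base < n; base += BUFFER_SIZE) {
        size_t count = (n - base < BUFFER_SIZE) ? n - base : BUFFER_SIZE;

        for (size_t i = 0; i < count; i++)          /* 1. fill buffer */
            in_buf[i] = rays[base + i];

        intersect_batch(in_buf, out_buf, count);    /* 2. intersect */

        for (size_t i = 0; i < count; i++)          /* 3. read back */
            results[base + i] = out_buf[i];

        batches++;
    }
    return batches;
}
```

With 10 rays and a buffer of 4, the loop issues 3 batches (4, 4 and 2 rays). Making the buffer size a parameter lets the same transform target hardware with different amounts of on-board memory.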

20
Performance estimate
  • Hardware: 16 MHz, result every three cycles
      • 5 × 10^6 intersections per second (ips)
  • Software: 2 × 10^6 ips
  • 100 MB/s bus with 10 ms startup latency
  • Application
      • 640 by 480 pixels, 20 objects
  • Depth-first
      • 6140 seconds
      • needs bus read / write per pixel
  • Breadth-first
      • Batch size 1024 => 6.8 seconds
      • Batch size 10240 => 1.3 seconds

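The bus-latency part of this estimate can be reproduced with integer arithmetic. The sketch below is a simplified model and not the one used for the slide: it counts only the 10 ms startup latency, assumes one primary ray per pixel, and assumes one write and one read per pixel (depth-first) or per batch (breadth-first). Computation time and data transfer time are left out, and the function names are hypothetical.

```c
#include <assert.h>

#define LATENCY_MS 10L   /* bus startup latency per transfer */

/* Depth-first: one bus write and one bus read per pixel. */
long depth_first_latency_ms(long pixels)
{
    assert(pixels >= 0);
    return pixels * 2L * LATENCY_MS;
}

/* Breadth-first: assumes one write and one read per batch of rays. */
long breadth_first_latency_ms(long rays, long batch_size)
{
    assert(rays >= 0 && batch_size > 0);
    long batches = (rays + batch_size - 1) / batch_size;   /* round up */
    return batches * 2L * LATENCY_MS;
}
```

For 640 by 480 pixels the depth-first latency is 6,144,000 ms, about 6140 seconds, so startup latency alone accounts for the depth-first figure. For breadth-first the model gives 6000 ms at batch size 1024 and 600 ms at batch size 10240. These are below the 6.8 and 1.3 seconds quoted; the remainder is presumably computation and data transfer, which this sketch does not model.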
21
4. Data representation optimization
  • Uniform vs variable word-lengths
  • Independent of input data

[Chart: area (slices) for each design]

22
Static word-length optimization
  • Multi-stage
      • Range analysis
      • Low-effort
      • High-effort
  • Guaranteed accuracy
      • Reduce area
      • Increase speed
      • Reduce power consumption
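
As a small illustration of how a range feeds into a word-length, the C sketch below computes the narrowest two's-complement integer width that covers a given value range. It is a generic textbook calculation under the assumption of integer-valued signals, and the function name is hypothetical. It is not the multi-stage algorithm from the slide.

```c
#include <assert.h>

/* Returns the smallest number of bits w such that every integer in
 * [lo, hi] fits a w-bit two's-complement value, i.e.
 * -2^(w-1) <= lo and hi <= 2^(w-1) - 1. */
int signed_bits_for_range(long lo, long hi)
{
    assert(lo <= hi);
    int bits = 1;
    long min = -1, max = 0;     /* range of a 1-bit two's-complement value */
    while (lo < min || hi > max) {
        min *= 2;               /* becomes -2^(bits) before the increment */
        max = max * 2 + 1;
        bits++;
    }
    return bits;
}
```

A signal known to stay within [0, 255] needs 9 bits, while one within [-128, 127] needs 8. Giving each signal only the width its own range requires, rather than one uniform width, is where the area saving comes from.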

23
Our approach
  • Range analysis
      • Interval, affine, instrumentation
  • Precision analysis
      • Partitioned, heuristic algorithm
  • Accuracy
      • Genetic algorithm increases accuracy
  • Extension: dynamic analysis
      • Input range analysis
      • Black-box function analysis
      • Branch analysis

24
Range analysis: instrumentation
  • Loops
      • instrument code
      • calculate number of iterations
  • Benefits
      • Increased accuracy
      • reduced area
  • Interval Arithmetic
      • Simplistic
      • no correlation information
  • Affine Arithmetic
      • correlation information used
      • can produce misleading results

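The lack of correlation information in interval arithmetic shows up in the smallest possible example, subtracting a value from itself. The interval type and function names below are assumptions for illustration; outward rounding of the bounds, which a careful implementation would need, is omitted.

```c
#include <assert.h>

typedef struct { double lo, hi; } Interval;

Interval interval_make(double lo, double hi)
{
    assert(lo <= hi);
    Interval r = { lo, hi };
    return r;
}

/* [a.lo, a.hi] + [b.lo, b.hi] */
Interval interval_add(Interval a, Interval b)
{
    return interval_make(a.lo + b.lo, a.hi + b.hi);
}

/* [a.lo, a.hi] - [b.lo, b.hi]: the operands are treated as independent,
 * even when they are the same variable. */
Interval interval_sub(Interval a, Interval b)
{
    return interval_make(a.lo - b.hi, a.hi - b.lo);
}
```

For x in [0, 1], interval_sub(x, x) yields [-1, 1] although x - x is always 0. Affine arithmetic represents x as a central value plus a noise term shared by every use of x, so the two terms cancel and the result is exactly 0.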
Original loop:
    while (acc < in_x)
        acc = acc + 1;

Instrumented loop:
    while (acc < in_x) {
        analyze_loop(loop_1);
        acc = acc + 1;
    }
    analyze_end(loop_1);
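
One way such instrumentation hooks could work is sketched below. Only the names analyze_loop, analyze_end and loop_1 come from the slide; their signatures, the counters and the wrapper function run_instrumented are assumptions made for this sketch. The hooks simply record the largest iteration count observed for each loop.

```c
#include <assert.h>

#define MAX_LOOPS 8

static long current_count[MAX_LOOPS];   /* iterations in the current execution */
static long max_count[MAX_LOOPS];       /* largest iteration count seen so far */

enum { loop_1 = 0 };                    /* loop identifier */

/* Hypothetical hook: called once per iteration of an instrumented loop. */
static void analyze_loop(int id)
{
    assert(id >= 0 && id < MAX_LOOPS);
    current_count[id]++;
}

/* Hypothetical hook: called when the loop exits; records the count. */
static void analyze_end(int id)
{
    assert(id >= 0 && id < MAX_LOOPS);
    if (current_count[id] > max_count[id])
        max_count[id] = current_count[id];
    current_count[id] = 0;
}

/* The instrumented loop, wrapped in a function so it can be run. */
int run_instrumented(int acc, int in_x)
{
    while (acc < in_x) {
        analyze_loop(loop_1);
        acc = acc + 1;
    }
    analyze_end(loop_1);
    return acc;
}
```

After running the loop on representative inputs, max_count[loop_1] holds an observed bound on the iteration count, which a range analysis can use in place of a pessimistic static bound.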
25
Ray tracing: accuracy vs precision
26
Trade-offs
  • Reduce precision
      • reduce area
      • higher speed
      • reduce power consumption

27
5. Conclusion
  • Task Transformations
      • Automation of transformation
      • Proposed extensions to CML
  • Case study: Ray tracing
      • Map ray tracing to hardware
      • Manually use breadth-first transform for best use of slow bus
      • Estimate hardware for complex scenes
  • Data Representation Optimisation
      • Interval arithmetic, affine arithmetic
      • Instrumented source code
      • Partitioned heuristic algorithm
      • Same accuracy, 200 times faster
  • Applications: Ray tracing, Molecular dynamics, String simulations