Compilation Targets - PowerPoint PPT Presentation

About This Presentation
Title:

Compilation Targets

Description:

Pipelined MADD units. Multiplexed Register File. MUL. ADD. Register Fifo. Jan 30, 2003 ... 4-vector fp32 SIMD instruction set. Gathers allowed inside kernels ... – PowerPoint PPT presentation

Slides: 16
Provided by: william507

Transcript and Presenter's Notes



1
Compilation Targets
  • Ian Buck, Francois Labonte
  • February 04, 2003

2
GPU Architectural Differences
  • No SRF (stream register file)
  • Pipelined MADD units
  • Multiplexed Register File

[Figure: MUL and ADD units, register FIFO]
3
GPU Architectural Differences
  • No SRF
  • Pipelined MADD units
  • Multiplexed Register File

[Figure: four MUL/ADD pairs and four register FIFOs]
4
GPU Architectural Differences
  • No SRF
  • Multiplexed Register File
  • Data Parallelism
  • Arithmetic Intensity
  • Gather inside kernels

[Figure: MUL and ADD units, register FIFO]
5
GPU Programming Model
6
GPU Programming Model
  • Positives
    • 4-vector fp32 SIMD instruction set
    • Gathers allowed inside kernels
    • High-level compilers (Cg, HLSL)

7
GPU Programming Model
  • Negatives
    • No exposed SRF
    • Limited scatter capabilities
    • No branching
    • No retained state between stream elements
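
Because each kernel output is written only at its own position, one workaround for the limited scatter support (the "host assist" approach listed under the challenges) is to have the kernel emit (address, value) pairs and let the host perform the writes. The Python sketch below assumes that model; `run_kernel`, `host_scatter`, and the rule of summing colliding writes are hypothetical choices for illustration, not Brook or Cg APIs.

```python
from typing import List, Tuple, Callable

# Hypothetical kernel shape: one input element -> one (address, value) pair.
# The kernel never writes memory itself; it only describes the write.
ScatterKernel = Callable[[float], Tuple[int, float]]


def run_kernel(kernel: ScatterKernel, stream: List[float]) -> List[Tuple[int, float]]:
    """Device-side pass: output i depends only on input i (no scatter)."""
    return [kernel(x) for x in stream]


def host_scatter(pairs: List[Tuple[int, float]], size: int) -> List[float]:
    """Host-side pass: perform the indexed writes, summing collisions (assumed rule)."""
    out = [0.0] * size
    for addr, value in pairs:
        out[addr] += value
    return out


# Example: a 4-bucket histogram of values in [0, 1).
values = [0.1, 0.5, 0.7, 0.2, 0.9]
pairs = run_kernel(lambda x: (int(x * 4), 1.0), values)
print(host_scatter(pairs, 4))  # [2.0, 0.0, 2.0, 1.0]
```

The cost of this split is an extra round trip of the (address, value) stream to the host, which is why native scatter support would matter.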

8
GPU Compilation Target
  • Compile Brook kernels to Cg
  • Streams -> Textures
  • Roll stream operators into gathers
    • Stencil, Group
  • Compile the stream graph into large kernels
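
Two of the steps above can be illustrated with a small Python sketch, assuming a 2D stream stored row-major like a texture and clamp-to-edge addressing at the borders (both assumptions). A stencil operator becomes a kernel that gathers its neighbors explicitly, and fusing two kernels into one pass avoids writing the intermediate stream. `gather`, `blur_kernel`, and the other names are hypothetical.

```python
from typing import Callable, List

Texture = List[List[float]]                    # 2D stream, indexed tex[y][x]
Kernel = Callable[[Texture, int, int], float]  # computes output element (x, y)


def gather(tex: Texture, x: int, y: int) -> float:
    """Indexed read with clamp-to-edge addressing (an assumed boundary rule)."""
    h, w = len(tex), len(tex[0])
    return tex[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def blur_kernel(tex: Texture, x: int, y: int) -> float:
    """A 5-point stencil operator rewritten as five explicit gathers."""
    return (gather(tex, x, y)
            + gather(tex, x - 1, y) + gather(tex, x + 1, y)
            + gather(tex, x, y - 1) + gather(tex, x, y + 1)) / 5.0


def scale(v: float) -> float:
    """A pointwise kernel body."""
    return 2.0 * v


def run_pass(tex: Texture, kernel: Kernel) -> Texture:
    """One pass: evaluate the kernel once per output element."""
    return [[kernel(tex, x, y) for x in range(len(tex[0]))]
            for y in range(len(tex))]


def two_passes(tex: Texture) -> Texture:
    """Unfused: the blurred stream is written out, then read back."""
    blurred = run_pass(tex, blur_kernel)
    return run_pass(blurred, lambda t, x, y: scale(t[y][x]))


def fused_pass(tex: Texture) -> Texture:
    """Fused: both kernel bodies run in one pass, no intermediate stream."""
    return run_pass(tex, lambda t, x, y: scale(blur_kernel(t, x, y)))


print(fused_pass([[1.0, 2.0], [3.0, 4.0]]))
```

Fusion is straightforward here because `scale` is pointwise; fusing a second stencil after `blur_kernel` would require recomputing the blur at every neighbor, trading extra arithmetic for fewer passes.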

9
GPU Compilation Target
  • Challenges
    • Reductions require lg(N) passes
    • Scatter requires host assist
      • May be fixed soon
    • Limited resources
      • registers
      • inputs / outputs
      • instruction counts
    • Needs generalized RDS
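
Since no state is retained between stream elements, a reduction has to be built from passes in which each output element combines a fixed set of inputs. A minimal Python sketch, assuming a pairwise (binary-tree) scheme: each pass halves the stream, so N elements need ceil(lg N) passes. `reduce_multipass` is a hypothetical name.

```python
from typing import Callable, List, Tuple


def reduce_multipass(stream: List[float],
                     op: Callable[[float, float], float]) -> Tuple[float, int]:
    """Pairwise tree reduction; returns (result, number_of_passes).

    Each pass is one kernel invocation whose output i gathers inputs
    2i and 2i+1, so the stream shrinks by half per pass.
    """
    if not stream:
        raise ValueError("cannot reduce an empty stream")
    current = list(stream)
    passes = 0
    while len(current) > 1:
        nxt = [op(current[i], current[i + 1])
               for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:  # odd length: carry the last element through
            nxt.append(current[-1])
        current = nxt
        passes += 1
    return current[0], passes


total, passes = reduce_multipass([float(i) for i in range(8)], lambda a, b: a + b)
print(total, passes)  # 28.0 3
```

The operator must be associative for the result to match a sequential reduction; `max` and `+` on exact values both qualify.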

10
GPU Compilation Target
  • Questions
    • How does a GPU fit into the SVM?
      • Texture memory as SRF?
    • Do we allow gather operations inside kernels?
    • Multinode issues?
      • Not a shared-memory machine

11
Smart Memories
  • Original Smart Memories
    • 4 CPUs in a quad could be configured as a 4-cluster
      machine working in SIMD
    • Control node was one of the processor nodes
    • Memory tiles could be configured as SRF banks,
      kernel instruction memory, or stream buffers

12
Smart Memories Implementation Status
  • Instead of building the whole processor core,
    Smart Memories is looking at using a processor
    core from Tensilica
  • Tensilica provides synthesizable processor cores
    that are extensible (instructions can be added)
  • The status of streaming is uncertain because
  • Until this is resolved, it is not worth
    discussing further

13
x86 Workstation Cluster - Differences
  • No SRF per se
    • Could try to exploit the cache as an SRF (similar
      to Sandia's Sierra)
  • Indexing in kernels is possible
    • Though performance degrades if accesses fall
      outside the cache
  • Conditional branches are possible, predication
    not (single cluster)
  • SIMD instructions (SSE/MMX) provide extra ILP
  • Simultaneous multithreading: chance to overlap
    memory access and kernel execution
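
One way to read "exploit the cache as an SRF" is strip-mining: split the stream into strips small enough to stay in cache and run the whole kernel chain on one strip before moving to the next. The Python sketch below shows only the loop structure (Python gives no control over caching); `STRIP_ELEMS` is a placeholder rather than a tuned value, and the kernels are assumed to be length-preserving.

```python
from typing import Callable, List

Kernel = Callable[[List[float]], List[float]]

STRIP_ELEMS = 4096  # hypothetical: choose so a strip plus temporaries fits in cache


def run_strip_mined(stream: List[float], kernels: List[Kernel],
                    strip: int = STRIP_ELEMS) -> List[float]:
    """Run a chain of kernels strip by strip.

    Each strip is loaded once and passed through every kernel while it is
    (ideally) still cache-resident, the way an SRF would hold it.
    """
    out: List[float] = []
    for start in range(0, len(stream), strip):
        block = stream[start:start + strip]  # "load" the strip
        for k in kernels:
            block = k(block)                 # intermediates stay per-strip
        out.extend(block)                    # "store" results back to memory
    return out


chain = [lambda b: [x + 1.0 for x in b], lambda b: [2.0 * x for x in b]]
print(run_strip_mined([float(i) for i in range(10)], chain, strip=3))
```

With simultaneous multithreading, one thread could fetch strip k+1 while another runs the kernels on strip k, which is the overlap of memory and kernel execution the slide points to.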

14
Multinode issues
  • Not a shared-memory environment
  • Do we need software address translation?
    • Would be simpler to implement on an SGI Origin
      or Flash
  • ScatterOps across multiple nodes must go through
    the CPU of the node that owns the target memory
    location
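
A sketch of the routing step, assuming the global array is block-distributed (node n owns addresses [n*block, (n+1)*block)): `owner_of` does the software address translation, scatter ops are grouped by owning node, and each batch is then applied by that node's CPU to its local memory. In a real cluster the batches would travel as messages; here they are plain dictionaries, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ScatterOp = Tuple[int, float]  # (address, value to add)


def owner_of(addr: int, block: int) -> Tuple[int, int]:
    """Block distribution (assumed): global address -> (node, local offset)."""
    return addr // block, addr % block


def route_scatter(ops: List[ScatterOp], block: int) -> Dict[int, List[ScatterOp]]:
    """Group scatter ops by owning node, translating to local offsets."""
    batches: Dict[int, List[ScatterOp]] = defaultdict(list)
    for addr, value in ops:
        node, offset = owner_of(addr, block)
        batches[node].append((offset, value))
    return dict(batches)


def apply_batch(local_mem: List[float], batch: List[ScatterOp]) -> None:
    """Runs on the owner's CPU: apply its batch as a scatter-add."""
    for offset, value in batch:
        local_mem[offset] += value


# Example: 2 nodes, 4 elements each.
block = 4
memories = [[0.0] * block for _ in range(2)]
ops = [(1, 1.0), (5, 2.0), (1, 3.0), (7, 1.0)]
for node, batch in route_scatter(ops, block).items():
    apply_batch(memories[node], batch)
print(memories)
```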

15
Compilation Paths
  • Brook -> Mattan/Jayanth compiler -> SVM ->
    pthreads
  • Brook on multiple threads - Christos
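
The "Brook on multiple threads" path can be sketched as a data-parallel map: split the stream into contiguous chunks and give each thread one chunk, so every thread writes a disjoint range of the output and no locks are needed. This Python version uses the standard `threading` module as a stand-in for pthreads; `run_kernel_threaded` is a hypothetical name, and CPython's global interpreter lock means it shows the decomposition rather than a real speedup.

```python
import threading
from typing import Callable, List


def run_kernel_threaded(kernel: Callable[[float], float], stream: List[float],
                        num_threads: int = 4) -> List[float]:
    """Apply a per-element kernel to a stream, one thread per contiguous chunk."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    if not stream:
        return []
    out = [0.0] * len(stream)
    chunk = -(-len(stream) // num_threads)  # ceiling division

    def worker(lo: int, hi: int) -> None:
        # Each worker writes only out[lo:hi], so threads never overlap.
        for i in range(lo, hi):
            out[i] = kernel(stream[i])

    threads = [threading.Thread(target=worker,
                                args=(lo, min(lo + chunk, len(stream))))
               for lo in range(0, len(stream), chunk)]
    for t in threads:
        t.start()
    for t in threads:  # barrier: wait for every chunk before returning
        t.join()
    return out


print(run_kernel_threaded(lambda x: x * x, [1.0, 2.0, 3.0, 4.0, 5.0], num_threads=2))
```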