Compilers and Applications

1
Compilers and Applications
  • Kathy Yelick
  • Dave Judd, Ronny Krashinsky, Randi Thomas, Samson
    Kwok, Simon Yau, Kar Ming Tang, Adam Janin,
    Thinh Nguyen
  • Computer Science Division
  • UC Berkeley

2
Compiling for VIRAM
  • Long-term success of DIS technology depends on a
    simple programming model, i.e., a compiler
  • Needs to handle a significant class of applications
  • IRAM: multimedia, graphics, speech and image
    processing
  • ISTORE: databases, signal processing, other DIS
    benchmarks
  • Needs to utilize hardware features for
    performance
  • IRAM: vectorization
  • ISTORE: scalability of shared-nothing programming
    model

3
IRAM Compilers
  • IRAM/Cray vectorizing compiler [Judd]
  • Production compiler
  • Used on the T90, C90, as well as the T3D and T3E
  • Being ported (by SGI/Cray) to the SV2
    architecture
  • Has C, C++, and Fortran front-ends (focus on C)
  • Extensive vectorization capability
  • outer loop vectorization, scatter/gather, short
    loops, etc.
  • VIRAM port is under way
  • IRAM/VSUIF vectorizing compiler [Krashinsky]
  • Based on VSUIF from Corinna Lee's group at
    Toronto, which is based on MachineSUIF from Mike
    Smith's group at Harvard, which is based on the SUIF
    compiler from Monica Lam's group at Stanford
  • This is a research compiler, not intended for
    compiling large, complex applications
  • It has been working since 5/99.

4
IRAM/Cray Compiler Status
[Diagram: frontends (C, C++, Fortran) -> vectorizer (PDGCS) -> code generators (C90, IRAM)]
  • MIPS backend developed this year
  • Validated using a commercial test suite for code
    generation
  • Generated code run through the vas assembler
  • Vector backend recently started
  • Testing with vsim under way this week
  • Leveraging from Cray:
  • Automatic vectorization
  • Basic instruction scheduling framework

5
ISTORE Compiler
[Diagram of the Titanium compilation pipeline. Labels: Java, Titanium, tc, Optimizer, Code Gen, C comm, C compiler, cc, t3e, ISTORE]
  • Titanium language is an extension of Java
  • tc is the Titanium compiler
  • Recent progress:
  • improved portability of generated code and the
    compiler itself, including a port to Cray parallel
    machines
  • additions to generate annotations on C code to
    improve fine-grained parallelism (on Tera MTA)
    and vectorization
  • New benchmarking efforts:
  • database primitives: sorting, hash-join, and
    index-nested-loop join
  • 3D FFT and linear solvers (LU)

6
Applications
  • Hand-written kernels for single-chip VIRAM
  • focus on multimedia kernels; see IRAM hardware
    talk
  • Compiled programs for single-chip VIRAM
  • 2 examples from IRAM/VSUIF: decryption and mvm
  • most effort devoted to IRAM/Cray compiler
  • Performance benchmarks for ISTORE
  • 3D FFT
  • Others
  • SAM benchmarks for ISTORE

7
Automatic Vectorization
  • Vectorizing compilers are very successful on
    scientific applications
  • not entirely automatic, especially for C/C++
  • good tools for training users
  • Multimedia applications have
  • shorter vector lengths
  • can sometimes exploit outer loop vectorization for
    longer vectors
  • often leads to non-unit strides
  • tree traversals could be written as
    scatter/gather (breadth-first), although
    automating this is far from solved

e.g., image compression
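
The trade-off between short inner loops and outer-loop vectorization can be sketched in C. This is only an illustration: the function names, the row length of 8, and the row-major layout are assumptions, not details taken from the IRAM compilers.

```c
#include <stddef.h>

#define TAPS 8  /* hypothetical short inner dimension */

/* Inner-loop form: each row is a dot product over only TAPS elements,
 * so a vectorizer that targets the j loop gets vectors of length 8. */
void rows_dot_inner(const float *a, const float *w, float *out, size_t rows)
{
    for (size_t i = 0; i < rows; i++) {
        float sum = 0.0f;
        for (size_t j = 0; j < TAPS; j++)
            sum += a[i * TAPS + j] * w[j];
        out[i] = sum;
    }
}

/* Outer-loop form: the loops are interchanged so the long i loop is
 * innermost. The vector length is now `rows`, but a[i * TAPS + j] is
 * read with stride TAPS instead of stride 1. */
void rows_dot_outer(const float *a, const float *w, float *out, size_t rows)
{
    for (size_t i = 0; i < rows; i++)
        out[i] = 0.0f;
    for (size_t j = 0; j < TAPS; j++)
        for (size_t i = 0; i < rows; i++)
            out[i] += a[i * TAPS + j] * w[j];
}
```

Both functions accumulate each row in the same order, so they produce the same results; the second simply exposes a longer vector at the cost of a non-unit stride.
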
8
IRAM/VSUIF Decryption (IDEA)
[Figure: performance chart; axis label: lanes]
  • IDEA Decryption operates on 16-bit ints
  • Compiled with IRAM/VSUIF (with unrolling by hand)
  • Note the scalability with both number of lanes and data width
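
IDEA's characteristic operation is multiplication modulo 2^16 + 1 on 16-bit words, which is why narrow data widths matter for this kernel. The following is a minimal sketch of that operation in its standard textbook form, applied element-wise so that independent blocks map onto vector elements. It is not the code that was compiled with IRAM/VSUIF; the function names and the assumption that blocks are processed independently are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply modulo 2^16 + 1, where the 16-bit value 0 stands for 2^16. */
static uint16_t idea_mul(uint16_t a, uint16_t b)
{
    if (a == 0) return (uint16_t)(1u - b);   /* 2^16 == -1 (mod 65537) */
    if (b == 0) return (uint16_t)(1u - a);
    uint32_t p  = (uint32_t)a * b;
    uint16_t lo = (uint16_t)(p & 0xFFFFu);
    uint16_t hi = (uint16_t)(p >> 16);
    /* p = hi * 2^16 + lo == lo - hi (mod 65537) */
    return (uint16_t)(lo - hi + (lo < hi));
}

/* Hypothetical vectorizable loop: one 16-bit word from each of n
 * independent blocks is multiplied by the same subkey. */
void idea_mul_vec(const uint16_t *x, uint16_t key, uint16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = idea_mul(x[i], key);
}
```

Because every iteration of the loop is independent and works on 16-bit data, a vector unit that subdivides its datapath can process more elements per cycle than it could with 32-bit or 64-bit operands.
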

9
VIRAM/VSUIF Matrix/Vector Multiply
  • VIRAM/VSUIF does reasonably well on long loops
  • 256x256 single-precision matrix
  • Compare to 1600 Mflop/s (peak without multadd)
  • Note: BLAS-2 (little reuse)
  • 350 Mflop/s on Power3 and EV6
  • Problems specific to VSUIF:
  • hand strip-mining results in short loops
  • reductions
  • no multadd support
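
The reduction issue listed above can be seen by comparing two loop orderings of a dense matrix/vector product. This is a sketch under stated assumptions: row-major storage, square matrices, and the hypothetical function names `mvm` and `vmm`; it is not the benchmark source.

```c
#include <stddef.h>

/* y = A * x: the inner loop is a dot product, i.e. a reduction into sum. */
void mvm(const float *A, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float sum = 0.0f;
        for (size_t j = 0; j < n; j++)
            sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
}

/* y = x^T * A: the inner loop updates y[j] element-wise (axpy form),
 * so there is no reduction and every access is unit stride. */
void vmm(const float *A, const float *x, float *y, size_t n)
{
    for (size_t j = 0; j < n; j++)
        y[j] = 0.0f;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            y[j] += x[i] * A[i * n + j];
}
```

Each matrix element is used once per product, which is the "little reuse" property of BLAS-2 operations; without a fused multiply-add, every element also costs a separate multiply and add.
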

[Figure: performance chart; series labels: mvm, vmm]
10
3D FFT on ISTORE
  • Performance of large 3D FFTs depends on 2 factors:
  • speed of 1D FFT on a single node (next slide)
  • network bandwidth for transposing data
  • 1.3 Tflop/s FFT possible with 1K IRAM nodes and 0.5
    TB/s bandwidth

11
1D FFT on IRAM
  • FFT study on IRAM [Randi Thomas]
  • hand-coded and scheduled
  • use of ISA features to make in-register FFTs fast
    (128 point)
  • bit-reversal time not included; will also use ISA
    support
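
For readers unfamiliar with the bit-reversal step mentioned above, here is a plain scalar sketch of the permutation. It is the textbook formulation, not the hand-scheduled VIRAM code, and it does not use any ISA support; the function names and the split real/imaginary layout are assumptions.

```c
#include <stddef.h>

/* Reverse the low `bits` bits of i. */
static size_t reverse_bits(size_t i, unsigned bits)
{
    size_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);
        i >>= 1;
    }
    return r;
}

/* Reorder n = 2^bits complex samples in place (real and imaginary parts
 * stored in separate arrays) into bit-reversed index order. */
void bit_reverse_permute(float *re, float *im, unsigned bits)
{
    size_t n = (size_t)1 << bits;
    for (size_t i = 0; i < n; i++) {
        size_t j = reverse_bits(i, bits);
        if (i < j) {               /* swap each pair only once */
            float t = re[i]; re[i] = re[j]; re[j] = t;
            t = im[i]; im[i] = im[j]; im[j] = t;
        }
    }
}
```

For a 128-point FFT, `bits` would be 7. The scattered accesses in this loop are the reason the step is usually costed separately from the butterfly computation.
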

12
Other ISTORE Applications
  • Working on several performance applications for
    ISTORE
  • Database primitives: sorts, joins, scans, etc.
    [Kar Ming Tang]
  • RT_STAP
  • QR Decomposition: vectorizes easily, partially
    complete in IRAM/VSUIF
  • Conjugate Gradient [Samson Kwok]
  • Dominated by sparse matrix-vector multiply
  • Current performance: 500/250 Mflop/s
    (single/double) on VIRAM
  • Compare to 10s of Mflop/s on most RISC machines
  • Dense linear algebra [Simon Yau]
  • Considering other DIS benchmarks, such as MoM
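
The sparse matrix-vector multiply that dominates Conjugate Gradient can be sketched as follows. The compressed sparse row (CSR) layout and the function name are assumptions for illustration; the slides do not say which storage format was used.

```c
#include <stddef.h>

/* y = A * x for A stored in compressed sparse row (CSR) form.
 * row_ptr has nrows + 1 entries; col_idx and val each have
 * row_ptr[nrows] entries. */
void spmv_csr(size_t nrows, const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];   /* indexed load (gather) */
        y[i] = sum;
    }
}
```

The indexed access `x[col_idx[k]]` is what maps onto a vector gather, and it is also the access pattern that cache-based machines tend to handle poorly.
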

13
Conclusions
  • Significant compiler progress
  • Cray collaboration is key [Dave Judd, UCB @ Eagan]
  • Good tech transfer model
  • Vector code gen and instruction scheduling are the
    next steps
  • Even the VSUIF version indicates reasonable
    performance
  • Commercial-quality compiler will allow non-toy
    applications, e.g., speech
  • Benchmarks
  • Have been used to help with final ISA design
  • Simulated results validate performance claims
  • Models show real advantage to Intelligence in
    Memory (and Disk)
  • Machines scale, and with a simpler programming and
    optimization model than conventional
    multiprocessors