Stanford Streaming Supercomputer (SSS) Project Meeting

1
Stanford Streaming Supercomputer (SSS) Project
Meeting
  • Bill Dally, Pat Hanrahan, and Ron Fedkiw, Computer
    Systems Laboratory, Stanford University
  • October 2, 2001

2
Agenda
  • Introductions (now)
  • Vision: subset of ASCI review slides
  • Goals for the quarter
  • Schedule of meetings for the quarter

3
Computation is inexpensive and plentiful
NVIDIA GeForce3: 80 GFLOPS, 800 Gops/s
Velio VC3003: 1 Tb/s I/O bandwidth
DRAM: < $0.20/MB
4
But supercomputers are very expensive
  • Cost more per GFLOPS, GUPS, and GByte than low
    end machines
  • Hard to achieve high fraction of peak performance
    on global problems
  • Based on clusters of CPUs that are scaling at
    only 20%/year vs. 50%/year historically

5
Microprocessors no longer realize the potential
of VLSI
[Figure: trend chart annotated with growth rates of 52%/year, 19%/year, and 74%/year and with gap ratios of 30:1, 1,000:1, and 30,000:1]
6
Streaming processors leverage emerging technology
  • Streaming supercomputer can achieve
    • $20/GFLOPS, $2/M-GUPS
    • Scalable to PFLOPS and 10^13 GUPS
  • Enabled by
    • Stream architecture
      • Exposes and exploits parallelism and locality
      • High arithmetic intensity (ops/BW)
      • Hides latency
    • Efficient interconnection networks
      • High global bandwidth
      • Low latency
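
To make "arithmetic intensity (ops/BW)" concrete, here is a minimal sketch that counts operations per word of memory traffic. The 1-D convolution workload, the operation counts, and the function names are illustrative assumptions, not figures from the slides.

```python
def arithmetic_intensity(ops: float, words_moved: float) -> float:
    """Arithmetic operations performed per word of memory traffic."""
    if words_moved <= 0:
        raise ValueError("words_moved must be positive")
    return ops / words_moved


def convolution_intensity(taps: int) -> float:
    """Intensity of a 1-D convolution that streams each sample once.

    Assumption: each output needs `taps` multiplies and `taps - 1` adds,
    and the kernel reads one input word and writes one output word per
    sample while the filter window stays in local registers.
    """
    ops_per_sample = taps + (taps - 1)
    words_per_sample = 2  # one word read, one word written
    return arithmetic_intensity(ops_per_sample, words_per_sample)


if __name__ == "__main__":
    for taps in (3, 7, 15):
        print(taps, convolution_intensity(taps))
```

Under these assumptions, widening the filter raises the intensity because more arithmetic is done on data that is already local, which is the property the stream architecture is meant to exploit.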

7
What is stream processing?
  • Streams expose data parallelism
  • Operations within a kernel operate on local data
  • Kernels can be partitioned across chips to
    exploit control parallelism
[Figure: stream diagram in which Image 0 and Image 1 each pass through two convolve kernels and a SAD kernel produces the Depth Map]
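
The slide's diagram can be sketched as a chain of kernels connected by streams. The version below is a minimal illustration, assuming one image row per stream record, a small smoothing filter, and a one-row SAD (sum of absolute differences) window; the function names, filter taps, and disparity search are hypothetical choices, not details from the presentation.

```python
from typing import Iterable, Iterator, List, Sequence

Row = List[float]


def convolve(rows: Iterable[Row], taps: Sequence[float]) -> Iterator[Row]:
    """Kernel: filter each row record using only that row's data.

    Written in correlation form with zero padding; for the symmetric
    filter used here this matches a true convolution.
    """
    half = len(taps) // 2
    for row in rows:
        out = []
        for i in range(len(row)):
            acc = 0.0
            for k, weight in enumerate(taps):
                j = i + k - half
                if 0 <= j < len(row):
                    acc += weight * row[j]
            out.append(acc)
        yield out


def sad(rows0: Iterable[Row], rows1: Iterable[Row],
        max_disparity: int, radius: int = 1) -> Iterator[List[int]]:
    """Kernel: per pixel, choose the disparity with the smallest SAD cost.

    Assumes both rows have the same length and that a feature at column
    x in image 0 appears at column x - d in image 1.
    """
    for row0, row1 in zip(rows0, rows1):
        last = len(row0) - 1
        out = []
        for i in range(len(row0)):
            best_d, best_cost = 0, float("inf")
            for d in range(max_disparity + 1):
                cost = 0.0
                for j in range(i - radius, i + radius + 1):
                    a = row0[min(max(j, 0), last)]      # clamp at the edges
                    b = row1[min(max(j - d, 0), last)]
                    cost += abs(a - b)
                if cost < best_cost:
                    best_d, best_cost = d, cost
            out.append(best_d)
        yield out


def depth_map(image0: Iterable[Row], image1: Iterable[Row],
              taps: Sequence[float] = (0.25, 0.5, 0.25),
              max_disparity: int = 3) -> List[List[int]]:
    """Two convolve kernels per image feed one SAD kernel, as in the diagram."""
    stream0 = convolve(convolve(image0, taps), taps)
    stream1 = convolve(convolve(image1, taps), taps)
    return list(sad(stream0, stream1, max_disparity))
```

Each kernel touches only the record it is handed, and records are independent of one another, so rows could be processed in parallel and the kernels themselves could be placed on different chips.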
8
Why does it get good performance easily?
9
Architecture of a Streaming Supercomputer
10
Streaming processor
11
A layered software system simplifies stream
programming
12
Domain-specific language example: Marble shader
in RTSL
float turbulence4_imagine_scalar (texref noise, float4 pos)
{
  fragment float4 addr1 = pos;
  fragment float4 addr2 = pos * {2, 2, 2, 1};
  fragment float4 addr3 = pos * {4, 4, 4, 1};
  fragment float4 addr4 = pos * {8, 8, 8, 1};
  fragment float val;
  val = (0.5) * texture(noise, addr1)[0];
  val = val + (0.25) * texture(noise, addr2)[0];
  val = val + (0.125) * texture(noise, addr3)[0];
  val = val + (0.0625) * texture(noise, addr4)[0];
  return val;
}

float3 marble_color(float x)
{
  float x2;
  x = sqrt(x+1.0)*.7071;
  x2 = sqrt(x);
  return { .30 + .6*x2, .30 + .8*x, .60 + .4*x2 };
}

surface shader float4 shiny_marble_imagine (texref noise)
{
  float4 Cd = lightmodel_diffuse({ 0.4, 0.4, 0.4, 1 }, { 0.5, 0.5, 0.5, 1 });
  float4 Cs = lightmodel_specular({ 0.35, 0.35, 0.35, 1 }, Zero, 20);
  fragment float y;
  fragment float4 pos = Pobj * {10, 10, 10, 1};
  y = pos[1] + 3.0 * turbulence4_imagine_scalar(noise, pos);
  y = sin(y*pi);
  return ({marble_color(y), 1.0f} * Cd + Cs);
}
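
For readers who want to run the arithmetic of the shader above, here is a rough Python rendering of its per-fragment computation. It is a sketch under assumptions: `stand_in_noise` is a hypothetical replacement for the noise texture lookup, the lighting terms (Cd, Cs) are omitted, and the `+` and `*` operators follow the usual form of this marble shader because the transcript does not preserve them.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]
Noise = Callable[[float, float, float], float]


def stand_in_noise(x: float, y: float, z: float) -> float:
    """Hypothetical stand-in for the noise texture; returns a value in [0, 1]."""
    return 0.5 + 0.5 * math.sin(12.9898 * x + 78.233 * y + 37.719 * z)


def turbulence4(noise: Noise, pos: Vec3) -> float:
    """Sum four octaves of noise: weights 0.5 .. 0.0625 at scales 1, 2, 4, 8."""
    x, y, z = pos
    total, weight, scale = 0.0, 0.5, 1.0
    for _ in range(4):
        total += weight * noise(x * scale, y * scale, z * scale)
        weight *= 0.5
        scale *= 2.0
    return total


def marble_color(x: float) -> Vec3:
    """Map a value in [-1, 1] to an RGB marble color."""
    x = math.sqrt(x + 1.0) * 0.7071
    x2 = math.sqrt(x)
    return (0.30 + 0.6 * x2, 0.30 + 0.8 * x, 0.60 + 0.4 * x2)


def marble_base_color(noise: Noise, p_obj: Vec3) -> Vec3:
    """Unlit marble color at an object-space point (lighting omitted)."""
    pos = (10.0 * p_obj[0], 10.0 * p_obj[1], 10.0 * p_obj[2])
    y = pos[1] + 3.0 * turbulence4(noise, pos)
    y = math.sin(y * math.pi)
    return marble_color(y)
```

Every fragment runs the same short sequence of arithmetic on its own position, which is why a shader like this maps naturally onto a stream kernel.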
13
Stream-level application description example:
SHARP Raytracer
[Figure: stream diagram of the ray tracer; labels: Camera, Grid, Triangles, Rays, VoxID, Hits, Pixels]
  • Computation expressed as streams of records
    passing through kernels
  • Similar to computation required for Monte-Carlo
    radiation transport
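
The idea of "streams of records passing through kernels" can be sketched with Python generators. This is a toy model and not the SHARP ray tracer: it assumes orthographic rays, triangles that lie at a fixed depth facing the camera, and no grid traversal stage; all record types and kernel names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Point = Tuple[float, float]
Pixel = Tuple[int, int]


@dataclass(frozen=True)
class Ray:
    pixel: Pixel
    x: float
    y: float


@dataclass(frozen=True)
class Triangle:
    a: Point
    b: Point
    c: Point
    depth: float
    color: int


@dataclass(frozen=True)
class Hit:
    pixel: Pixel
    color: Optional[int]  # None when the ray misses every triangle


def camera(width: int, height: int) -> Iterator[Ray]:
    """Kernel: emit one ray record per pixel, through the pixel center."""
    for py in range(height):
        for px in range(width):
            yield Ray((px, py), px + 0.5, py + 0.5)


def _edge(p: Point, q: Point, x: float, y: float) -> float:
    """Signed area test: which side of edge p->q the point (x, y) lies on."""
    return (q[0] - p[0]) * (y - p[1]) - (q[1] - p[1]) * (x - p[0])


def _inside(t: Triangle, x: float, y: float) -> bool:
    d1, d2, d3 = _edge(t.a, t.b, x, y), _edge(t.b, t.c, x, y), _edge(t.c, t.a, x, y)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def intersect(rays: Iterable[Ray], triangles: List[Triangle]) -> Iterator[Hit]:
    """Kernel: turn each ray record into a hit record (nearest triangle wins)."""
    for ray in rays:
        nearest: Optional[Triangle] = None
        for t in triangles:
            if _inside(t, ray.x, ray.y) and (nearest is None or t.depth < nearest.depth):
                nearest = t
        yield Hit(ray.pixel, nearest.color if nearest is not None else None)


def shade(hits: Iterable[Hit], background: int = 0) -> Iterator[Tuple[Pixel, int]]:
    """Kernel: turn each hit record into a pixel record."""
    for hit in hits:
        yield hit.pixel, background if hit.color is None else hit.color


def render(width: int, height: int, triangles: List[Triangle]) -> Dict[Pixel, int]:
    """Compose the kernels: camera -> rays -> hits -> pixels."""
    return dict(shade(intersect(camera(width, height), triangles)))
```

Each kernel consumes one record at a time and keeps no state between records, so the only data that crosses kernel boundaries is the stream itself.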

14
Expected application performance
  • Arithmetic-limited applications
    • Includes applications where domain decomposition
      can be applied
      • Like TFLO and LES
    • Expected to achieve a large fraction of peak
      performance
  • Communication-limited applications
    • Such as applications requiring matrix solution
      Ax = b
    • At the very least will benefit from high global
      bandwidth
    • We hope to find new methods to solve matrix
      equations using streaming
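
To illustrate why solving Ax = b tends to be communication-limited, here is a minimal sketch of Jacobi iteration. Jacobi is a classic textbook method chosen purely for illustration, not one of the new streaming methods the slide hopes for, and the dense list-of-lists matrix format is an assumption made for brevity.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def jacobi_sweep(A: Matrix, b: Vector, x: Vector) -> Vector:
    """One sweep: every row update is independent, like a kernel over rows.

    Each update does only a few operations per matrix entry, yet it
    reads the entire previous solution vector `x`, which is what ties
    performance to global bandwidth.
    """
    n = len(b)
    new_x = []
    for i in range(n):
        off_diagonal = sum(A[i][j] * x[j] for j in range(n) if j != i)
        new_x.append((b[i] - off_diagonal) / A[i][i])
    return new_x


def jacobi_solve(A: Matrix, b: Vector, sweeps: int = 50) -> Vector:
    """Repeat sweeps from a zero start; converges for diagonally dominant A."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        x = jacobi_sweep(A, b, x)
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [6.0, 7.0]
    print(jacobi_solve(A, b))  # approaches the exact solution x = [1, 2]
```

Within a sweep the rows are data parallel, but between sweeps the whole vector must be redistributed, so the ratio of arithmetic to data movement stays low no matter how many processors are used.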

15
Conclusion
  • Computation is cheap yet supercomputing is
    expensive
  • Streams enable supercomputing to exploit
    advantages of emerging technology by exposing
    locality and concurrency
  • Order of magnitude cost/performance improvement
    for both arithmetic-limited and
    communication-limited codes
    • $20/GFLOPS and $2/M-GUPS
    • Scalable from desktop (1 TFLOPS) to machine room
      (1 PFLOPS)
  • A layered software system using domain-specific
    languages simplifies stream programming
    • MCRT, ODEs, PDEs
  • Early results on graphics and image processing
    are encouraging
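
As a quick check on the scale targets, the sketch below multiplies the quoted cost per GFLOPS by the desktop and machine-room peak rates. It assumes the cost figure is in US dollars and applies uniformly at both scales; the function and constant names are hypothetical.

```python
DOLLARS_PER_GFLOPS = 20.0  # assumption: the slide's 20/GFLOPS figure is in dollars

GFLOPS_PER_TFLOPS = 1_000
GFLOPS_PER_PFLOPS = 1_000_000


def machine_cost(peak_gflops: float,
                 dollars_per_gflops: float = DOLLARS_PER_GFLOPS) -> float:
    """Estimated cost in dollars for a machine with the given peak rate."""
    return peak_gflops * dollars_per_gflops


if __name__ == "__main__":
    print("desktop, 1 TFLOPS:", machine_cost(1 * GFLOPS_PER_TFLOPS))
    print("machine room, 1 PFLOPS:", machine_cost(1 * GFLOPS_PER_PFLOPS))
```

Under these assumptions the desktop configuration comes to $20,000 and the machine-room configuration to $20 million.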

16
Plan for AY2001-2002
17
Project Goals for Fall Quarter AY2001-2002
  • Map two applications to the stream model
    • Candidates: fluid flow (TFLO) and molecular
      dynamics
  • Define a high-level stream programming language
  • Generalize stream access without destroying
    locality
  • Draft strawman SSS architecture and identify key
    issues

18
Meeting Schedule Fall Quarter AY2001-2002
  • Goal: shared knowledge base and vision across the
    project
  • 10/9: TFLO (Juan)
  • 10/16: RTSL (Bill M.)
  • 10/23: Molecular Dynamics (Eric)
  • 10/30: Imagine and its programming system
    (Ujval)
  • 11/6: C, ZPL, etc.; SPL brainstorming (Ian)
  • 11/13: Metacompilation (Ben C.)
  • 11/20: Application followup (Ron/Heinz)
  • 11/27: Strawman architecture (Ben S.)
  • 12/4: Streams vs. CMP (Blue Gene/Light, etc.)
    (Bill D.)