Title: Stanford Streaming Supercomputer (SSS) Project Meeting
1. Stanford Streaming Supercomputer (SSS) Project Meeting
- Bill Dally, Pat Hanrahan, and Ron Fedkiw, Computer Systems Laboratory, Stanford University
- October 2, 2001
2. Agenda
- Introductions (now)
- Vision: subset of ASCI review slides
- Goals for the quarter
- Schedule of meetings for the quarter
3. Computation is inexpensive and plentiful
NVIDIA GeForce3: 80 GFLOPS, 800 Gops/sec
Velio VC3003: 1 Tb/s I/O bandwidth
DRAM: < $0.20/MB
4. But supercomputers are very expensive
- Cost more per GFLOPS, GUPS, and GByte than low-end machines
- Hard to achieve a high fraction of peak performance on global problems
- Based on clusters of CPUs that are scaling at only 20%/year vs. 50% historically
5. Microprocessors no longer realize the potential of VLSI
[Figure: performance growth chart. Recoverable labels: 52%/year, 19%/year, 74%/year; ratios 30:1, 1,000:1, 30,000:1]
6. Streaming processors leverage emerging technology
- Streaming supercomputer can achieve
  - $20/GFLOPS, $2/M-GUPS
  - Scalable to PFLOPS and 10^13 GUPS
- Enabled by
  - Stream architecture
    - Exposes and exploits parallelism and locality
    - High arithmetic intensity (ops/BW)
    - Hides latency
  - Efficient interconnection networks
    - High global bandwidth
    - Low latency
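The "arithmetic intensity (ops/BW)" bullet can be made concrete with a small Python sketch. The per-stage operation counts and the one-word-in, one-word-out traffic model are illustrative assumptions, not measurements from the SSS project, and the function names are hypothetical.

```python
def arithmetic_intensity(ops: float, words_transferred: float) -> float:
    """Arithmetic operations performed per word moved to or from memory."""
    if words_transferred <= 0:
        raise ValueError("words_transferred must be positive")
    return ops / words_transferred


def chain_intensity(stages: int, ops_per_stage: int, fused: bool) -> float:
    """Intensity per element for a chain of kernels (hypothetical traffic model).

    Unfused: every stage reads one word from memory and writes one word back.
    Fused: only the first read and the last write touch memory; intermediate
    values stay in local storage.
    """
    total_ops = stages * ops_per_stage
    words = 2 if fused else 2 * stages
    return arithmetic_intensity(total_ops, words)
```

Under these assumptions, a four-stage chain with five operations per stage rises from 2.5 to 10 operations per word once intermediates stay local, which is the kind of locality the stream architecture bullet refers to.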
7. What is stream processing?
Operations within a kernel operate on local data
Kernels can be partitioned across chips to exploit control parallelism
Streams expose data parallelism
[Figure: stream graph with inputs Image 0 and Image 1, each passing through two convolve kernels, followed by a SAD kernel that produces a Depth Map]
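A minimal Python sketch of the convolve/SAD stream graph named on this slide follows. It is a simplification under stated assumptions: each record is one image row, the filter taps are symmetric, and SAD is computed once per row pair rather than across disparities as a real depth extractor would. All function names are hypothetical.

```python
from typing import Iterable, Iterator, List

Row = List[float]


def convolve(rows: Iterable[Row], taps: List[float]) -> Iterator[Row]:
    """Kernel: filter each row record independently, using only local data.

    Taps are applied without flipping, which equals convolution when the
    taps are symmetric (assumed here). Edges are clamped.
    """
    half = len(taps) // 2
    for row in rows:
        out = []
        for i in range(len(row)):
            acc = 0.0
            for k, tap in enumerate(taps):
                j = min(max(i + k - half, 0), len(row) - 1)
                acc += tap * row[j]
            out.append(acc)
        yield out


def sad(left: Iterable[Row], right: Iterable[Row]) -> Iterator[float]:
    """Kernel: sum of absolute differences between paired row records."""
    for a, b in zip(left, right):
        yield sum(abs(x - y) for x, y in zip(a, b))


def depth_pipeline(image0: List[Row], image1: List[Row],
                   taps: List[float]) -> List[float]:
    """Connect the kernels with streams: two convolves per image, then SAD."""
    s0 = convolve(convolve(image0, taps), taps)
    s1 = convolve(convolve(image1, taps), taps)
    return list(sad(s0, s1))
```

Each row flows through the kernels without reference to other rows, which is the data parallelism the streams expose. The two convolve chains are independent, so they could in principle be placed on different chips.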
8. Why does it get good performance easily?
9. Architecture of a Streaming Supercomputer
10. Streaming processor
11. A layered software system simplifies stream programming
12. Domain-specific language example: Marble shader in RTSL

// Four octaves of noise, each at double the frequency and half the weight
float turbulence4_imagine_scalar (texref noise, float4 pos) {
  fragment float4 addr1 = pos;
  fragment float4 addr2 = pos * {2, 2, 2, 1};
  fragment float4 addr3 = pos * {4, 4, 4, 1};
  fragment float4 addr4 = pos * {8, 8, 8, 1};
  fragment float val;
  val = (0.5) * texture(noise, addr1)[0];
  val = val + (0.25) * texture(noise, addr2)[0];
  val = val + (0.125) * texture(noise, addr3)[0];
  val = val + (0.0625) * texture(noise, addr4)[0];
  return val;
}

// Map a scalar to an RGB marble color
float3 marble_color(float x) {
  float x2;
  x = sqrt(x + 1.0) * .7071;
  x2 = sqrt(x);
  return { .30 + .6*x2, .30 + .8*x, .60 + .4*x2 };
}

surface shader float4 shiny_marble_imagine (texref noise) {
  float4 Cd = lightmodel_diffuse({ 0.4, 0.4, 0.4, 1 }, { 0.5, 0.5, 0.5, 1 });
  float4 Cs = lightmodel_specular({ 0.35, 0.35, 0.35, 1 }, Zero, 20);
  fragment float y;
  fragment float4 pos = Pobj * {10, 10, 10, 1};
  y = pos[1] + 3.0 * turbulence4_imagine_scalar(noise, pos);
  y = sin(y*pi);
  return ({marble_color(y), 1.0f} * Cd + Cs);
}
13. Stream-level application description example: SHARP Raytracer
[Figure: SHARP ray tracer stream graph. Recoverable stream labels: Camera, Grid, Triangles, Rays, VoxID, Hits, Pixels]
- Computation expressed as streams of records passing through kernels
- Similar to computation required for Monte-Carlo radiation transport
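The idea of records passing through kernels can be sketched in Python as below. This is not the SHARP ray tracer: the grid traversal and triangle intersection in the figure are replaced by a single sphere test to keep the example short, and every type and function name here is a hypothetical stand-in.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

Vec = Tuple[float, float, float]


@dataclass(frozen=True)
class Ray:
    pixel: int
    origin: Vec
    direction: Vec  # assumed to be unit length


@dataclass(frozen=True)
class Hit:
    pixel: int
    t: float  # distance along the ray; math.inf marks a miss


def camera(width: int) -> Iterator[Ray]:
    """Kernel: emit one ray record per pixel of a one-row image, aimed along +z."""
    for p in range(width):
        x = p - (width - 1) / 2.0
        yield Ray(p, (x, 0.0, -5.0), (0.0, 0.0, 1.0))


def intersect(rays: Iterable[Ray], center: Vec, radius: float) -> Iterator[Hit]:
    """Kernel: turn each ray record into a hit record (sphere stand-in)."""
    for r in rays:
        oc = tuple(o - c for o, c in zip(r.origin, center))
        b = sum(d * o for d, o in zip(r.direction, oc))
        c = sum(o * o for o in oc) - radius * radius
        disc = b * b - c
        t = -b - math.sqrt(disc) if disc >= 0.0 else math.inf
        if t < 0.0:  # nearest intersection lies behind the ray origin
            t = math.inf
        yield Hit(r.pixel, t)


def shade(hits: Iterable[Hit]) -> Iterator[Tuple[int, float]]:
    """Kernel: turn each hit record into a (pixel, value) record."""
    for h in hits:
        yield (h.pixel, 0.0 if math.isinf(h.t) else 1.0)


def render(width: int, center: Vec = (0.0, 0.0, 0.0),
           radius: float = 1.0) -> List[float]:
    """Chain the kernels: camera -> intersect -> shade."""
    return [value for _, value in shade(intersect(camera(width), center, radius))]
```

Each kernel consumes one record type and produces another, and no kernel needs to see more than the record in hand, which is what makes the graph easy to partition and pipeline.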
14. Expected application performance
- Arithmetic-limited applications
  - Includes applications where domain decomposition can be applied
    - Like TFLO and LES
  - Expected to achieve a large fraction of peak performance
- Communication-limited applications
  - Such as applications requiring matrix solution Ax = b
  - At the very least will benefit from high global bandwidth
  - We hope to find new methods to solve matrix equations using streaming
15. Conclusion
- Computation is cheap yet supercomputing is expensive
- Streams enable supercomputing to exploit advantages of emerging technology
  - by exposing locality and concurrency
- Order of magnitude cost/performance improvement for both arithmetic-limited and communication-limited codes
  - $20/GFLOPS and $2/M-GUPS
  - Scalable from desktop (1 TFLOPS) to machine room (1 PFLOPS)
- A layered software system using domain-specific languages simplifies stream programming
  - MCRT, ODEs, PDEs
- Early results on graphics and image processing are encouraging
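The cost and scaling bullets imply a simple back-of-the-envelope calculation, sketched here in Python. It reads the slide's 20/GFLOPS figure as US dollars and assumes cost scales linearly with peak performance; both are assumptions, and the function name is hypothetical.

```python
USD_PER_GFLOPS = 20.0  # assumed reading of the slide's "20/GFLOPS" target


def system_cost_usd(peak_gflops: float) -> float:
    """Estimated cost of a machine with the given peak GFLOPS (linear model)."""
    return peak_gflops * USD_PER_GFLOPS


desktop_cost = system_cost_usd(1_000)           # 1 TFLOPS = 1,000 GFLOPS
machine_room_cost = system_cost_usd(1_000_000)  # 1 PFLOPS = 1,000,000 GFLOPS
```

Under this linear model the desktop configuration comes to about $20,000 and the machine-room configuration to about $20 million.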
16. Plan for AY2001-2002
17. Project Goals for Fall Quarter AY2001-2002
- Map two applications to the stream model
  - Fluid flow (TFLO) and molecular dynamics are candidates
- Define a high-level stream programming language
  - Generalize stream access without destroying locality
- Draft strawman SSS architecture and identify key issues
18. Meeting Schedule Fall Quarter AY2001-2002
- Goal: shared knowledge base and vision across the project
- 10/9: TFLO (Juan)
- 10/16: RTSL (Bill M.)
- 10/23: Molecular Dynamics (Eric)
- 10/30: Imagine and its programming system (Ujval)
- 11/6: C, ZPL, etc.; SPL brainstorming (Ian)
- 11/13: Metacompilation (Ben C.)
- 11/20: Application followup (Ron/Heinz)
- 11/27: Strawman architecture (Ben S.)
- 12/4: Streams vs. CMP (Blue Gene/Light, etc.) (Bill D.)