Stanford Streaming Supercomputer (SSS) Winter Quarter 2002-2003 Wrapup Meeting


1
Stanford Streaming Supercomputer (SSS) Winter
Quarter 2002-2003 Wrapup Meeting
  • Bill Dally, Computer Systems Laboratory, Stanford
    University
  • March 11, 2003

2
Year 2 Overview
  • Where we are today
  • First year goal was met: demonstrated feasibility
    on single node
  • Feedback from site visit team was very positive
  • Potential for a big impact on scientific
    computing
  • But still much to do!
  • Key FY03 goals
  • Get long-term software infrastructure in place
  • Select approach, implement baseline Brook to SSS
    compiler
  • Multi-node versions that scale
  • Language, compiler, simulator
  • Tackle hard problems: 3-D, irregular
    neighborhoods/sparse matrix solve
  • Language support, numerics support, evaluate on
    simulator
  • Refine architecture
  • Cluster organization, aspect ratio, register
    organization, memory organization
  • Industrial Partner
  • Start serious discussions, outreach to build
    support, close partner in 04

3
Some concerns
  • We're doing a great job, but
  • Losing a bit of focus and momentum
  • Tooling on the detail
  • Need to take a step back and reexamine the big
    picture
  • Need to raise our outside profile
  • Publish
  • Overview paper
  • Brook paper
  • Generate some more convincing evidence of
    advantages
  • Need a control for bandwidth measures
  • Update the web page
  • Visit the labs

4
Let's review our overall goal
  • Exploit capabilities of VLSI to realize
    cost-effective scientific computing.

5
Review: What is the SSS Project About?
  • Exploit streams to give 100x improvement in
    performance/cost for scientific applications vs.
    cluster supercomputers
  • From 100 GFLOPS PCs to TFLOPS single-board
    computers to PFLOPS supercomputers
  • Use layered programming system to simplify
    development and tuning of applications
  • Stream languages
  • Streaming virtual machine
  • Demonstrated feasibility of streaming scientific
    computing in year 1
  • Refine architecture and programming system in
    year 2
  • Demonstrate realistic applications (3D,
    irregular)
  • Build usable compiler
  • Resolve architecture questions: aspect ratio,
    conditional execution, sparse clusters, register
    organization, memory system, etc.
  • Build a prototype and demonstrate CITS
    applications in years 3-6
  • With industrial and government partners
  • Broaden our base of support

6
Industrial Partner Update
  • Candidates
  • Cray, IBM, Sun, HP, SGI, Intel
  • Initial discussion
  • Present SSS project and results to date
  • Discuss collaboration models
  • Identify next steps
  • Met with Cray, Sun, and SGI
  • They listened politely, but little traction
  • Need more convincing evidence
  • Need to address programming issue
  • Have to provide a path for legacy codes

7
Outreach
  • National Labs
  • Los Alamos
  • Livermore
  • Sandia
  • Other Government
  • NASA
  • DARPA
  • DoD (Charlie Holland)
  • AFOSR
  • User communities

8
Software Win 02 Goals
  • Brook
  • Define carefully the semantics of the operators
  • No progress
  • Work on views of memory abstraction
  • Proposed API; will write up for next SW meeting
  • Support for partitioning, shared memory, naming,
    fitting into stream abstraction
  • Adopting UPC; will write up for next SW meeting
  • Support for irregular neighborhoods
  • Failed to find an application
  • Multithreaded version (Christos)
  • Have simple model for multi-node written up
  • (NEW) Preliminary Brooktran spec
  • Concrete Winter goals (Ian/Frank)
  • Review of the language (Pat)
  • Partitioning (UPC)
  • Multi-node/Multi-threaded version
  • Irregular support w/ application
  • PPoPP paper
  • MD on BRT

9
Brook Spring 03 Goals
  • Refine semantics of operators
  • New version of spec
  • Implement views of memory API (UPC)
  • Find application for irregular structures
  • Dijkstra, incomplete LU
  • Dynamic structure
  • Start switching to new compiler
  • Brooktran spec/implementation
  • Implemented in Open64
  • Concern: have lost metacompiler support

10
Software Win 02 Goals
  • SVM
  • Spec has evolved
  • Consensus between MIT, Texas, Stanford, USC
  • Implement multinode version
  • No progress
  • SVM to simulator path
  • No progress
  • Multi-thread

11
SVM Spring 03 Goals
  • Spec is complete and supports SSS
  • Revise single-node simulator
  • Multi-node simulator (prelim)

12
Software Win 02 Goals (3 of 3)
  • Start regular meetings: done
  • Compiler
  • Decide on flow from Brook -> SVM -> SSS (Mattan)
  • Done
  • Select base compiler (Jayanth)
  • ORC, Gnu, SUIF, Tendra, others
  • Done
  • Spike a simple program from Brook -> SSS
    (Mattan/Jayanth)
  • Started: modified front end operating on WHIRL
  • Brook to Nvidia
  • Optimizations: Spring
  • Run time
  • Write a white paper

13
Compiler Spring 03 Goals
  • Complete feasibility study
  • Brook to C path
  • Parse Brook
  • Generate C
  • Optimizations
  • See Mattan's document
  • Need to generate SVM code by mid summer
  • Parse Brooktran (Alan, Fatica, Jayanth)
  • Kernel scheduler MULADD (Das)
  • SVM to SSS (Francois): long term, need plan

14
Application Win 02 Goals
  • StreamFLO (Fatica)
  • Base version is complete
  • Not running on simulator
  • Early start on 3D version; partitioning waiting
    on API def
  • StreamFEM (Barth)
  • Waiting on spec for partitioning
  • 3D arithmetic kernels done
  • Tridiagonal in Brook
  • StreamMD (Eric/student)
  • Ported GROMACS to the NV30 benchmarks
  • Performance dependent on number of registers
  • Doesn't work with CG compiler
  • Model applications (Ron/Frank)
  • Started
  • Look at Sierra, Purple benchmarks (ppm, sweep3D):
    delay

15
Application Spring 03 Goals
  • StreamFLO (Fatica)
  • Parse Brooktran F to WHIRL (Alan, Fatica)
  • Partitioned version multi-node UPC
  • 3D version
  • StreamFEM (Barth)
  • Simulate 3D
  • Sparse LUD
  • Partitioned version
  • StreamMD (Eric/student)
  • Hand-tune NV30 assembly code
  • GROMACS in Brook
  • Model applications (Ron/Frank)
  • C implementations of adaptive structures
  • Look at Sierra, Purple benchmarks (ppm, sweep3D):
    delay

16
Architecture Win 02 Goals
  • Single-Node Simulator (Jung-Ho, Knight)
  • 64-bit support, MULADD, Scalar Processor
  • Not yet
  • Multi-Node Simulator (Jung-Ho, Abhishek)
  • Network model
  • Multi-node mechanisms
  • Not yet
  • Point Studies
  • Aspect ratio
  • SSE vs VLIW
  • Planning
  • Conditional execution (Mattan/Ujval)
  • Started
  • Sparse clusters
  • SRF organization (Nuwan)
  • Complete
  • Cache alternatives (Jung Ho)
  • Add and store study (Jung Ho)
  • Started

17
Architecture Spring 03 Goals
  • Multi-node simulator
  • Point Studies
  • Aspect ratio (Tim)
  • Conditional execution (Mattan/Ujval)
  • Sparse clusters: delay
  • SRF organization (Nuwan)
  • Refine
  • Cache alternatives (Jung Ho)
  • Add and store study (Jung Ho)
  • I/O ?
  • Iterative operations (Francois)
  • 64-bit: delay
  • Scalar Processor: delay

18
Special Win 02 Goals
  • Fix website (Pat)
  • Public and private websites
  • Name that computer
  • Mississippi
  • Axios
  • Submit names to Mattan
  • Bill, Pat, Bill to choose
  • Project party (Mattan): Pat's house

19
Name Resolution
  • From now on, the SSS is called
  • Merrimac

20
Winter Quarter Meeting Schedule
  • 4/1 Fedkiw Party
  • 4/8 Alan, Fatica Brooktran
  • 4/15 Kapasi Conditionals
  • 4/22 Fatica StreamFLO update
  • 4/29 Review Prep
  • 5/6 Review Prep
  • 5/13 Tim, Tim StreamFEM 3D
  • 5/20 Ian, Pat Brook Specification
  • 5/27 Mattan Bandwidth Comparison
  • 6/3 Jayanth Compiler
  • 6/10 Bill Wrapup

21
Papers
  • Arch
  • Indexable SRFs (Nuwan)
  • Streaming Supercomputer Overview (Tim K.)
  • Streaming on conventional CPUs (Mattan)
  • Conditionals (Ujval)
  • Remote Ops (Jung Ho)
  • Aspect Ratio (?)
  • Data parallel (SSE) vs. ILP (VLIW)
  • Software
  • Design of Brook (Ian)
  • Data parallel programming on graphics HW (Pat)
  • Brook to CG
  • Compiler
  • Apps
  • Gromacs
  • StreamFEM (Tim2)
  • Overview (Bill and Pat)