1. Panel: Next Generation Grid Applications: Barriers and Prospects for Petaflops Grid Nodes (The Bright Spots)
Presentation to the Cluster and Computational Grids for Scientific Computing Workshop, 2002
- Thomas Sterling
- California Institute of Technology
- September 11, 2002
4. Earth Simulator: An Opportunity, Not a Threat
5. A Petaflops Today: an Earth Simulator View
- Cost: $8 billion
- By ASCI White, $10 billion
- Footpad: 600,000 square feet
  - 100 tennis courts
  - Flight decks of 3 Nimitz-class aircraft carriers
- Power: almost 100 MW
- 5X the sum of all Top 500 machines
7. LLNL Linux NetworX Cluster
- Fastest Linux supercomputer
- Installation at Lawrence Livermore National Laboratory
- System integrator: Linux NetworX
- Delivery in Fall 2002
- 1,920 Intel Xeon processors at 2.4 GHz
- Peak performance: 9.2 Teraflops
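The quoted peak follows directly from the part count; a quick sanity check, assuming 2 double-precision flops per cycle per Xeon (the SSE2-era rate, an assumption not stated on the slide):

```python
# Peak-performance sanity check for the LLNL Linux NetworX cluster.
processors = 1920
clock_hz = 2.4e9
flops_per_cycle = 2  # assumed: SSE2 double-precision rate per Xeon

peak_flops = processors * clock_hz * flops_per_cycle
print(f"Peak: {peak_flops / 1e12:.1f} Teraflops")  # → Peak: 9.2 Teraflops
```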
8. Prospects: a conservative perspective
- Towards the Teraflops Rack
  - Up to 42 1U modules per rack
  - Approaching 10 Gflops (peak) performance processors
  - Blade technology for dense packaging
  - Green Destiny integrates 240 processors in a single rack
- Optical fiber-based SAN interconnect technology
  - Infiniband
  - > 10 Gbps near term
  - Tbps possible with WDM by end of decade
- Main memory
  - < 1 cent/MByte by end of decade
  - > $1M for a Petabyte
  - But many applications require much less memory capacity
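The petabyte price bullet is simple arithmetic; a sketch, assuming end-of-decade DRAM lands somewhere between 0.1 and 1 cent per MByte (the bracketing prices are an assumption consistent with the "< 1 cent/MByte" bullet):

```python
# Cost of a petabyte of DRAM at the projected end-of-decade price range.
MBYTES_PER_PETABYTE = 1e9  # 10^15 bytes / 10^6 bytes per MByte

for cents_per_mbyte in (0.1, 1.0):  # assumed bracketing prices
    cost_dollars = MBYTES_PER_PETABYTE * cents_per_mbyte / 100
    print(f"{cents_per_mbyte} cents/MByte -> ${cost_dollars / 1e6:.0f}M per Petabyte")
# → $1M to $10M per Petabyte, consistent with "> $1M"
```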
9. Petaflops in 2010 to 2012
- Flops performance gain: 25X at system level
  - Clock rate > 10 GHz (chip wide)
  - The rest is ILP and SoC
  - Petaflops in < 10K chips
- Memory capacity: 64X
  - Petabyte in 128K chips, for $4M
  - 7%/yr speedup in access time
  - Will take 30 times longer to read contents of a memory chip
  - Will take > 100X longer in clock cycles (not including communication latency)
- I/O interface will grow slowly toward 1 Tbps (maybe optics?)
- 1 Petaflops system in 2010: cost between $50M and $200M
- Footpad: 10,000 square feet
- Power is unclear
  - Between 3 MW and 25 MW
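The read-time claim falls out of the growth rates above; a sketch assuming ~10 years of the slide's 7%/yr access-time improvement against 64X capacity growth, with a 2.4 GHz baseline clock (taken from the LLNL slide) against the projected 10 GHz:

```python
years = 10
capacity_growth = 64            # 64X more bits per chip
access_speedup = 1.07 ** years  # 7%/yr access-time improvement, roughly 2X

# Time to read a whole chip grows as capacity growth / access speedup.
read_time_growth = capacity_growth / access_speedup
print(f"Read time grows ~{read_time_growth:.0f}X")  # slide says ~30X

# In clock cycles, multiply by the clock-rate increase (2.4 GHz -> 10 GHz).
clock_growth = 10e9 / 2.4e9
cycles_growth = read_time_growth * clock_growth
print(f"In clock cycles: ~{cycles_growth:.0f}X")    # slide says > 100X
```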
10. Barriers
- Still too big for cheap clusters
- Tactical issues
  - Need more flops
  - Power
  - Reliability
  - Efficiency, e.g. latency, overhead
  - Software environments
  - System management
  - Programming models and tools
- Strategic issues
  - Markets
  - New component type opportunities
  - New systems
11. Strategic Issues: Markets
- Commodity clusters cluster commodity products
- PC market is a replacement industry
  - Current paradigm a dead end
  - Exception is rapid visualization for video games
  - Otherwise, little motivation for increased clock speed
- Server market is databases and search engines
  - Web driver is limited by external interface bandwidth
  - Mass storage and disk caches; speed of disks, not processors
- Mobile computing and embedded processing
  - Reduce power and cost
  - Wrong package for high-scale integration
- Game machines
  - Push the envelope of mass-market computing
  - Too special purpose for building block of a Pflops cluster computer
- Possible conclusion: we're doomed
12. On the S Curve: Where Do You Want to Live?
- Incrementalism
  - Safe, usually right
  - Predictable but bounded
  - Leverages existing investments
  - Easy to sell
- Punctuated equilibrium: jump the curve
  - Dangerous, usually wrong
  - Unpredictable but unbounded
  - Requires and creates new domains; must start from scratch
  - Hard to sell
  - Can change everything or be a profligate waste of time and money
(Figure labels: "If you lived here, you'd be home now"; "Jumping the S curve"; "Exploring the frontier")
13. Strategic Issues: New Component Type Opportunities
- Today's micros: the worst way to build a computer
  - IP and ALU have a stranglehold on the rest of the chip
  - How to spend a billion transistors?
- Some new classes of components
  - SMPoC, SoC (IBM, …)
  - Multithreaded architecture (Smith, Callahan, Eghart, Sterling, …)
  - Streams (Dally, Keckler)
  - Processor in Memory, PIM (Kogge, Hall, Sterling, Brockman, …)
- Advantages
  - Increase performance and efficiency
  - Reduce power, cost, size
- Trickle bounce
  - Good for new mass markets
  - HPC including clusters
14. HTMT Petaflops Computer
15. IBM Blue Gene / Cyclops
16. Cascade Node
17. Attributes of MIND Architecture
- Parcel active message driven computing
  - Decoupled split-transaction execution
  - System-wide latency hiding
  - Move work to data instead of data to work
- Multithreaded control
  - Unified dynamic mechanism for resource management
  - Latency hiding
  - Real-time response
- Virtual-to-physical address translation in memory
  - Global distributed shared memory through distributed directory table
  - Dynamic page migration
  - Wide registers serve as context-sensitive TLB
- Graceful degradation for fault tolerance
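The "move work to data" idea can be illustrated with a toy parcel dispatcher. This is purely illustrative code, not the MIND design: the `Parcel`, `Node`, and `System` names and operations are hypothetical, sketching only the principle that an operation travels to the node owning the target address instead of the data travelling to a central processor.

```python
# Toy sketch of parcel-driven "move work to data".
class Parcel:
    """An active message: a target address plus an operation to run there."""
    def __init__(self, address, op, *args):
        self.address, self.op, self.args = address, op, args

class Node:
    """One memory/logic node owning a contiguous slice of the address space."""
    def __init__(self, base, size):
        self.base = base
        self.memory = [0] * size

    def handle(self, parcel):
        offset = parcel.address - self.base
        if parcel.op == "store":
            self.memory[offset] = parcel.args[0]
        elif parcel.op == "add":  # compute in place, at the data
            self.memory[offset] += parcel.args[0]
        return self.memory[offset]

class System:
    """Routes each parcel to the node owning its target address."""
    def __init__(self, nodes, node_size):
        self.node_size = node_size
        self.nodes = [Node(i * node_size, node_size) for i in range(nodes)]

    def send(self, parcel):
        return self.nodes[parcel.address // self.node_size].handle(parcel)

system = System(nodes=4, node_size=8)
system.send(Parcel(13, "store", 40))           # work lands on node 1
print(system.send(Parcel(13, "add", 2)))       # → 42
```

Note that only small parcels cross the network; the 40-byte operand never leaves its owning node, which is the latency-hiding point of the slide.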
18. MIND Node
(Diagram labels: memory address buffer; parcel interface)
20. A Target MIND Pflops System in 2010
- 1 Petaflops PIM/MIND-based system
  - 256K MIND chips
  - Actually peak > 16 Petaflops
- 1 cubic meter
- 1 Petabyte
  - 32 MByte/node
- Micro-channel cooling
- Zero maintenance
  - Graceful degradation
  - Reliability measured in half-life
- Latency management
  - Multithreading
  - Parcel message driven computation
  - Percolation prestaging
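The system-level bullets imply per-chip figures; a sketch of the arithmetic, where the flops-per-chip and nodes-per-chip numbers are derived here rather than stated on the slide:

```python
# Per-chip figures implied by the target MIND system.
chips = 256 * 1024       # 256K MIND chips
peak_flops = 16e15       # "actually peak > 16 Petaflops"
memory_bytes = 2**50     # 1 Petabyte (binary)
node_bytes = 32 * 2**20  # 32 MByte per node

print(f"Per chip: {peak_flops / chips / 1e9:.0f} Gflops peak")      # ~61 Gflops
print(f"Per chip: {memory_bytes // chips // 2**30} GByte memory")   # 4 GByte
print(f"Nodes per chip: {memory_bytes // chips // node_bytes}")     # 128
```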
22. Conclusions
- GRID is Good
  - Does for machines now what the Internet did for people in the 1970s: email, ftp, rlogin
- Bright-spot Grid nodes: > Pflops by 2010 and beyond
- Commodity clusters @ 1 peak-Petaflops, 2010 to 2012
  - Footpad < 10,000 square feet
  - Power < 5 MW
  - Cost approx. < $80M
- Processor-in-Memory will accelerate Pflops Grid nodes
  - Completes/fixes computer architecture
  - Dramatically improves efficiency
  - Enables reliability through graceful degradation
- Exaflops before 2020 through Continuum Computing Medium Architecture (CCMA)
23. The SIA CMOS Roadmap