1
Research Accelerator for MultiProcessing
  • Dave Patterson, EECS, UC Berkeley
  • President, Association for Computing Machinery
  • November 2005

RAMP: 10 collaborators at Berkeley, Carnegie Mellon University, MIT, Stanford, University of Texas, and University of Washington
2
Conventional Wisdom (CW) in Computer Architecture
  • Old CW: Multiplies are slow, memory access is fast
  • New CW: the memory wall. Memory is slow, multiplies are fast (200 clocks to DRAM, 4 clocks for a multiply)
  • Old CW: Power is free, transistors are expensive
  • New CW: the power wall. Power is expensive, transistors are free (we can put more on a chip than we can afford to turn on)
  • Old CW: Uniprocessor performance doubles every 1.5 years
  • New CW: Power Wall + Memory Wall = Brick Wall; uniprocessor performance now doubles only every 5 years
  • Sea change in chip design: multiple cores (2X processors per chip every 2 years)
  • More, simpler processors are more power-efficient
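To make the slide's memory-wall and brick-wall numbers concrete, here is a small arithmetic sketch using only the figures quoted above (the latencies and doubling periods are the slide's; the script itself is illustrative):

```python
# Memory wall: the slide's figures are ~200 clocks per DRAM access
# versus ~4 clocks per multiply.
DRAM_LATENCY = 200    # clocks (slide's figure)
MULTIPLY_LATENCY = 4  # clocks (slide's figure)

# One cache-missing load costs as much as this many multiplies:
print(DRAM_LATENCY // MULTIPLY_LATENCY)  # 50

# Brick wall: performance doubling every 1.5 years (old CW) versus
# every 5 years (new CW). Speedup over one decade in each regime:
old_cw = 2 ** (10 / 1.5)
new_cw = 2 ** (10 / 5)
print(round(old_cw), round(new_cw))  # 102 4
```

The two prints make the case for multicore plain: sequential performance growth has slowed from roughly 100x per decade to roughly 4x per decade.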

3
Sea Change in Chip Design
  • Intel 4004 (1971): 4-bit processor, 2,312 transistors, 0.4 MHz, 10-micron PMOS, 11 mm² chip
  • RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz, 3-micron NMOS, 60 mm² chip
  • A 125 mm² chip in 0.065-micron CMOS holds 2,312 copies of RISC II + FPU + Icache + Dcache
  • RISC II shrinks to 0.02 mm² at 65 nm
  • Caches via DRAM or 1-transistor SRAM (www.t-ram.com)?
  • Proximity Communication via capacitive coupling at > 1 TB/s? (Ivan Sutherland @ Sun / Berkeley)
  • The processor is the new transistor?
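The RISC II shrink quoted above follows from quadratic area scaling with feature size; a quick check (figures from the slide, constant-layout scaling assumed):

```python
# Die area scales roughly with the square of the feature size.
risc2_area_mm2 = 60.0   # RISC II die at 3-micron NMOS (slide's figure)
old_feature_um = 3.0
new_feature_um = 0.065  # 65 nm

scaled_area = risc2_area_mm2 * (new_feature_um / old_feature_um) ** 2
print(round(scaled_area, 3))  # 0.028, close to the slide's ~0.02 mm2

# Bare shrunken RISC II cores that would fit on a 125 mm2 die at 65 nm:
print(int(125 / scaled_area))  # 4437; the slide's 2,312 figure adds an
                               # FPU and caches to each core, so fewer fit
```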

4
Problems with Sea Change
  • Algorithms, programming languages, compilers, operating systems, architectures, libraries, etc. are not ready for 1000 CPUs per chip
  • Software people don't start working hard until the hardware arrives
  • How do we do research in a timely fashion on 1000-CPU systems (in algorithms, compilers, OS, architecture) without waiting years between hardware generations?

5
FPGAs as New Research Platform
  • Since roughly 25 CPUs can fit in a Field Programmable Gate Array (FPGA), build a 1000-CPU system from about 40 FPGAs?
  • A simple 64-bit soft-core RISC ran at 100 MHz in 2004 (Virtex-II)
  • FPGA generations every 1.5 years: 2X CPUs, 2X clock rate
  • The HW research community does the logic design ("gate shareware") to create an out-of-the-box Massively Parallel Processor that runs standard binaries of OSes and applications
  • Gateware: processors, caches, coherency, Ethernet interfaces, switches, routers, ...
  • E.g., a 1000-processor IBM Power cache-coherent supercomputer
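The capacity argument above can be sketched numerically (core counts and clock rates are the slide's starting points; the doubling model is the slide's stated FPGA-generation trend):

```python
# ~25 soft CPUs per FPGA means ~40 FPGAs reach a 1000-CPU system.
cpus_per_fpga = 25
fpgas = 40
print(cpus_per_fpga * fpgas)  # 1000

def fpga_capacity(years, base_cpus=25, base_mhz=100):
    """CPUs per FPGA and soft-core clock after `years`, doubling each
    1.5-year FPGA generation (the slide's trend, starting from 2004)."""
    generations = int(years // 1.5)
    return base_cpus * 2 ** generations, base_mhz * 2 ** generations

print(fpga_capacity(3))  # (100, 400) after two generations
```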

6
Why RAMP Good for Research?
7
RAMP 1 Hardware
  • Completed Dec. 2004 (14x17-inch, 22-layer PCB)
  • Module: FPGAs, memory, 10 GigE connector, Compact Flash
  • Administration/maintenance ports: 10/100 Ethernet, HDMI/DVI, USB
  • $4K per module without FPGAs or DRAM

8
Multiple Module RAMP 1 Systems
  • 8 compute modules (plus power supplies) in an 8U rack-mount chassis
  • 500-1000 emulated processors
  • Many topologies possible
  • 2U single-module tray for developers
  • Disk storage: disk emulator plus Network Attached Storage

9
RAMP Development Plan
  • Distribute systems internally for RAMP 1 development
  • Xilinx agreed to pay for production of a set of modules for the initial contributing developers and the first full RAMP system
  • Others could be available if costs can be recovered
  • Release a publicly available out-of-the-box MPP emulator
  • Based on a standard ISA (IBM Power, Sun SPARC, ...) for binary compatibility
  • Complete OS/libraries
  • Locally modify RAMP as desired
  • Design the next-generation platform for RAMP 2
  • Based on 65 nm FPGAs (2 generations later than Virtex-II)
  • Pending results from RAMP 1, Xilinx will cover hardware costs for the initial set of RAMP 2 machines
  • Find a 3rd party to build and distribute systems (at near cost)
  • NSF/CRI proposal pending to help support the effort
  • 2 full-time staff (one HW/gateware, one OS/software)
  • Look for grad-student support from industrial donations

10
Gateware Design Framework
  • Insight: almost every large building block fits inside an FPGA today; what doesn't fit is what sits between chips in a real design
  • Supports both cycle-accurate emulation of detailed, parameterized machine models and rapid functional-only emulation
  • Carefully accounts for Target Clock Cycles
  • Units can be written in any hardware design language (will work with Verilog, VHDL, Bluespec, C, ...)
  • RAMP Design Language (RDL) describes the plumbing that connects units

11
Gateware Design Framework
  • A design is composed of units that send messages over channels via ports
  • Units (roughly 10,000+ gates): CPU + L1 cache, DRAM controller, ...
  • Channels (like FIFOs): lossless, point-to-point, unidirectional, in-order message delivery
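The unit/channel model above can be mimicked in ordinary software. The sketch below is a Python analogy, not RDL itself: the class and port names are invented for illustration, but the channel obeys the stated properties (lossless, point-to-point, unidirectional, in-order):

```python
from collections import deque

class Channel:
    """Unidirectional, lossless, in-order FIFO between exactly two units."""
    def __init__(self):
        self._fifo = deque()
    def send(self, msg):
        self._fifo.append(msg)       # never dropped: lossless
    def recv(self):
        return self._fifo.popleft()  # FIFO order: in-order delivery
    def empty(self):
        return not self._fifo

class Unit:
    """A unit (e.g., CPU + L1 cache, or DRAM controller) with named ports."""
    def __init__(self, name):
        self.name = name
        self.ports = {}
    def connect(self, port, channel):
        self.ports[port] = channel

# Wire a CPU unit to a DRAM-controller unit over one channel via a port.
cpu, dram = Unit("cpu0"), Unit("dram_ctrl")
mem_channel = Channel()
cpu.connect("mem_req", mem_channel)
dram.connect("mem_req", mem_channel)

cpu.ports["mem_req"].send({"op": "read", "addr": 0x100})
print(dram.ports["mem_req"].recv())  # {'op': 'read', 'addr': 256}
```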

12
Status
  • Submitted NSF proposal in August
  • 10 more RAMP 1 boards being fabricated
  • Asked IBM and Sun for a simple, industrial-strength, 64-bit HDL model of a commercial-ISA CPU + FPU
  • Working on the design-framework document
  • Biweekly teleconferences (8 since June)
  • RAMP 1 short course and board distribution for RAMP conspirators, Jan. '06 in Berkeley
  • FPGA workshop at HPCA, Feb. '06 in Austin

13
RAMP in RADS: Internet in a Box
  • The building blocks also serve distributed computing
  • RAMP vs. clusters (Emulab, PlanetLab):
  • Scale: RAMP O(1000) vs. clusters O(100)
  • Private use: at ~$100k, every group can have one
  • Develop/debug: reproducibility, observability
  • Flexibility: modify modules (router, SMP, OS)
  • Explore via repeatable experiments while varying parameters and configurations, vs. observations on a single (aging) cluster that is often idiosyncratic

14
Multiprocessing Watering Hole
Parallel file system
Dataflow language/computer
Data center in a box
Thread scheduling
Internet in a box
Security enhancements
Multiprocessor switch design
Router design
Compile to FPGA
Fault insertion to check dependability
Parallel languages
  • RAMP as the next standard research platform? (e.g., VAX/BSD Unix in the 1980s)
  • RAMP attracts many communities to a shared artifact, leading to cross-disciplinary interactions that accelerate innovation in multiprocessing

15
Supporters (wrote letters to NSF)
  • Gordon Bell (Microsoft)
  • Ivo Bolsens (Xilinx CTO)
  • Norm Jouppi (HP Labs)
  • Bill Kramer (NERSC/LBL)
  • Craig Mundie (MS CTO)
  • G. Papadopoulos (Sun CTO)
  • Justin Rattner (Intel CTO)
  • Ivan Sutherland (Sun Fellow)
  • Chuck Thacker (Microsoft)
  • Kees Vissers (Xilinx)
  • Doug Burger (Texas)
  • Bill Dally (Stanford)
  • Carl Ebeling (Washington)
  • Susan Eggers (Washington)
  • Steve Keckler (Texas)
  • Greg Morrisett (Harvard)
  • Scott Shenker (Berkeley)
  • Ion Stoica (Berkeley)
  • Kathy Yelick (Berkeley)

RAMP participants: Arvind (MIT), Krste Asanović (MIT), Derek Chiou (Texas), James Hoe (CMU), Christos Kozyrakis (Stanford), Shih-Lien Lu (Intel), Mark Oskin (Washington), David Patterson (Berkeley), Jan Rabaey (Berkeley), and John Wawrzynek (Berkeley)
16
Conclusion
  • RAMP as a system-level time machine: preview the computers of the future to accelerate HW/SW generations
  • Trace anything, reproduce everything, tape out every day
  • Emulate a multiprocessor, a data center, or a distributed computer
  • FTP a supercomputer overnight and boot it in the morning
  • Clone it to check results (is it as fast in Berkeley as in Boston?)
  • Carpe diem:
  • Systems researchers (HW and SW) need the capability
  • FPGA technology is ready today, and gets better every year
  • Stand on shoulders, not toes: standardize on a design framework, building on the multi-year Berkeley effort on FPGA platforms (Berkeley Emulation Engine, BEE2)
  • Architecture researchers get the opportunity to immediately aid colleagues via gateware (as SW researchers have done in the past)
  • Multiprocessor Research Watering Hole: accelerate research in multiprocessing via a standard research platform, hastening the sea change from sequential to parallel computing

17
Backup Slides
18
Why RAMP Attractive?
Priorities for research parallel computers. Insight: commercial priorities are radically different from research priorities.
  • 1a. Cost of purchase
  • 1b. Cost of ownership (staff to administer it)
  • 1c. Scalability (1000 CPUs much better than 100)
  • 4. Power/space (machine-room cooling, number of racks)
  • 5. Community synergy (shared code, ...)
  • 6. Observability (non-obtrusively measure and trace everything)
  • 7. Reproducibility (to debug and run experiments)
  • 8. Flexibility (change for different experiments)
  • 9. Credibility (faithfully predicts real hardware behavior)
  • 10. Performance (as long as experiments are not too slow)

19
Uniprocessor Performance (SPECint)
  • VAX: 25%/year, 1978 to 1986
  • RISC + x86: 52%/year, 1986 to 2002
  • RISC + x86: 20%/year, 2002 to present
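Compounding the slide's annual growth rates shows how stark the slowdown is (rates and date ranges are the slide's; the 2005 endpoint is the talk's date):

```python
def cumulative_speedup(rate_pct, years):
    """Total speedup from `years` of compound growth at `rate_pct` %/year."""
    return (1 + rate_pct / 100) ** years

vax  = cumulative_speedup(25, 1986 - 1978)  # VAX era
risc = cumulative_speedup(52, 2002 - 1986)  # RISC + x86 era
late = cumulative_speedup(20, 2005 - 2002)  # 2002 to the talk's date

print(round(vax), round(risc, -2), round(late, 1))  # 6 800.0 1.7
```

Sixteen years at 52%/year compounds to roughly 800x, while the post-2002 regime manages under 2x in three years.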

20
Related Approaches (1)
  • Quickturn, Axis, IKOS, Thara:
  • FPGA- or special-processor-based gate-level hardware emulators
  • Synthesizable HDL is mapped onto the array for cycle- and bit-accurate netlist emulation
  • RAMP's emphasis is on emulating high-level architectural behavior:
  • Hardware and supporting software provide architecture-level abstractions for modeling and analysis
  • Targets architecture and software research
  • Provides a spectrum of tradeoffs between speed and the accuracy/precision of emulation
  • RPM at USC in the early 1990s:
  • Only up to 8 processors
  • Only the memory controller was implemented with configurable logic

21
Related Approaches (2)
  • Software simulators
  • Clusters (standard microprocessors)
  • PlanetLab (distributed environment)
  • Wisconsin Wind Tunnel (used the CM-5 to simulate shared memory)
  • All suffer from some combination of: slowness, inaccuracy, poor scalability, unbalanced computation/communication, and target inflexibility