CGaAs PowerPC FXU


1
CGaAs PowerPC FXU
  • Alan J. Drake, Todd D. Basso, Spencer M. Gold,
    Keith L. Kraver, Phiroze N. Parakh, Claude R.
    Gauthier, P. Sean Stetson, and Richard B. Brown
  • University of Michigan

2
Overview
  • CGaAs overview
  • PUMA project overview
  • Architectural studies
  • CMOS vs CGaAs implementations
  • Design methodology
  • Test results
  • Conclusions

3
Motivation for CGaAs
  • BiCMOS alternative
  • N and P transistors
  • Low power
  • High speed
  • Radiation hard microelectronics
  • Motorola CELESTRI
  • Over 1000 new LEO satellites in < 5 years

4
CGaAs Cross-section
[Figure: cross-section of the PFET and NFET, from Brown98. Labels as extracted:]
  • TiW Schottky gate
  • No gate or field oxide
  • Oxygen isolation
  • Ohmic metal (Ni)
  • Region labels: n (x2), p (x2), p- (x2)
  • Layer labels: 30 Å i-GaAs, 250 Å i-AlGaAs, 150 Å i-InGaAs channel, 30 Å i-GaAs, Si delta doping, 2000 Å i-GaAs, i-AlGaAs, LTG GaAs, LEC semi-insulating GaAs
  • No wells
  • Fast recombination time
5
CGaAs Advantages
  • Radiation-hard
  • Total dose (10^8 rads)
  • SEU (10^-10 upsets/bit-day)
  • Latchup (10^12 rads)
  • Potential high-speed
  • HFET operation
  • Low-power
  • VDD 0.9 V to 1.5 V
  • Versatile
  • Complementary transistors
  • SCFL, P-DCFL, Domino, and Complementary Logic

6
CGaAs Disadvantages
  • High VT
  • ≈0.55 V
  • Low Idsat
  • Gate leakage
  • Coarse design rules
  • Large ohmic contacts
  • Conservative active spacing
  • LDD style P doping
  • Coarse gate metal and M1
  • N to P sneak paths
  • Low integration levels

7
History of GaAs at U of M
  • Aurora project
  • MIPS architecture
  • R2000 and R3000
  • Vitesse GaAs E/D MESFET technology
  • HGaAs II and HGaAs III
  • Aurora III (1995)
  • Dual-issue, 0.8 IPC
  • 500 K transistors, 300 MHz, 40 W (simulated)

8
PUMA Project Goals
  • Develop CGaAs digital circuit techniques
  • Complementary and domino logic
  • Automated design tools
  • Optimize microarchitecture suitable for CGaAs
  • Low transistor integration level
  • 500K to 1 million transistors
  • Evaluate CGaAs for digital VLSI applications
  • Speed
  • Power
  • Area

9
PUMA System Block Diagram
[Block diagram; labels as extracted: PCI bus, 64 MB SDRAM, PC board, PCI, CMOS chip, MMU, CGaAs chip, 1 MB L2 i-cache, 1 MB L2 d-cache, CGaAs uProcessor, 16 KB L1 d-cache, MCM-D on ceramic. Link widths are labeled (128) and (32).]
10
Microarchitecture Optimizations
  • Fixed-Point Unit (FXU)
  • 32-bit implementation
  • PUMA ISA
  • Subset of PowerPC ISA, integer instructions
  • GPR and essential supervisor registers
  • Limited exception model
  • Traditionally high-performance features
  • Cache
  • Instruction prefetching
  • Branch prediction
  • Superscalar execution

11
Baseline Microarchitecture
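[Block diagram; blocks and their tunable parameters as extracted:]
  • L1-Icache: size, line size, assoc., miss penalty
  • Fetch: prefetch, branch pred.
  • RF
  • Decode
  • ROB: entries
  • FPU: rs, latency
  • BRU: rs, latency
  • ALU: rs, latency
  • LSU: rs, latency
  • L1-Dcache: size, line size, assoc., miss penalty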
12
Cache Optimization Parameters
  • Optimize cache parameters
  • Size
  • 512 KB to 32 MB
  • Line width
  • 4 to 16 bytes
  • Associativity
  • Direct-mapped (DM) to 8-way
  • Instruction pre-fetching
  • 0 to 8 entry stream buffer
  • Latency
  • 0 to 4 cycles
  • Optimize load/store
  • Load forwarding
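
A parameter sweep like the one listed above is usually driven by a trace-based cache model. The following is a minimal sketch, not the simulator used in the study: the function name `miss_rate`, the LRU replacement policy, and the synthetic traces are all assumptions made for illustration.

```python
from collections import OrderedDict


def miss_rate(trace, size_bytes, line_bytes, ways):
    """Fraction of byte addresses in `trace` that miss in an LRU cache.

    size_bytes, line_bytes and ways correspond to the size, line width and
    associativity parameters on the slide (ways=1 means direct-mapped).
    """
    num_sets = size_bytes // (line_bytes * ways)
    if num_sets < 1:
        raise ValueError("cache too small for this line size and associativity")
    # One ordered dict per set; the oldest key is the least recently used tag.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        index = line % num_sets
        tag = line // num_sets
        entries = sets[index]
        if tag in entries:
            entries.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(entries) >= ways:
                entries.popitem(last=False)  # evict the least recently used tag
            entries[tag] = True
    return misses / len(trace) if trace else 0.0


# Synthetic traces (assumed, not from the presentation).
sequential = list(range(64))   # one pass over 64 consecutive bytes
conflict = [0, 64] * 4         # two lines that collide when direct-mapped

seq_rate = miss_rate(sequential, size_bytes=64, line_bytes=16, ways=1)
dm_rate = miss_rate(conflict, size_bytes=64, line_bytes=16, ways=1)
two_way_rate = miss_rate(conflict, size_bytes=64, line_bytes=16, ways=2)
```

On the conflict trace the direct-mapped configuration misses on every access, while the 2-way configuration misses only on the two cold accesses, which is the kind of trade-off the associativity sweep explores.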

13
Superscalar Optimization
  • Dependencies
  • Accurate branch prediction
  • ISA
  • System configuration
  • Application program
  • Implementation details
  • Modified Tomasulo algorithm
  • Reorder buffer
  • Parameters
  • Degree of superscalar
  • 1 to 5 instructions wide
  • Reservation stations
  • 1 to 6 stations
  • Reorder buffer size
  • 2 to 32 entries
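
The role of the reorder buffer in this design space can be illustrated with a small model: results may complete out of order, but they update architectural state strictly in program order. This is a sketch under assumptions, not the modified Tomasulo implementation from the study; the class and method names are hypothetical.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: out-of-order completion, in-order retirement."""

    def __init__(self, size):
        self.size = size          # the "reorder buffer size" parameter
        self.entries = deque()    # oldest instruction at the left
        self.next_tag = 0

    def allocate(self, dest_reg):
        """Reserve an entry at dispatch; return its tag, or None to stall."""
        if len(self.entries) >= self.size:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append(
            {"tag": tag, "dest": dest_reg, "value": None, "done": False}
        )
        return tag

    def complete(self, tag, value):
        """Record a result broadcast by a functional unit."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"] = value
                entry["done"] = True
                return
        raise KeyError(tag)

    def retire(self, regfile):
        """Commit finished entries from the head only; return retired tags."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            regfile[entry["dest"]] = entry["value"]
            retired.append(entry["tag"])
        return retired


rob = ReorderBuffer(size=2)
regs = {}
t0 = rob.allocate("r1")
t1 = rob.allocate("r2")
stalled = rob.allocate("r3")   # buffer full, dispatch must stall
rob.complete(t1, 7)            # the younger instruction finishes first
early = rob.retire(regs)       # nothing retires: the head is not done yet
rob.complete(t0, 5)
late = rob.retire(regs)        # both retire, in program order
```

A larger buffer stalls dispatch less often but costs more transistors, which is why its size appears as a tuning parameter.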

14
Branch Prediction
  • Two level prediction
  • Nine schemes analyzed
  • Single PHT
  • Global schemes
  • Per-set schemes
  • Per-address schemes
  • Custom simulation program
  • Branch level
  • Millions of instructions
  • Optimization criteria
  • Cost in bits
  • 128 to 8192 bits
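
As an illustration of a global two-level scheme evaluated by cost in bits, here is a minimal gshare-style predictor. It is a sketch, not the custom simulation program from the study: the class name, the 2-bit saturating counters, the initial counter value, the word-aligned PC hashing, and the default 2048-bit table are assumptions. Only the pattern table is counted in `table_bits`; the history register would add a few more bits.

```python
class Gshare:
    """Global-history predictor: index = PC bits XOR global history."""

    def __init__(self, table_bits=2048):
        self.entries = table_bits // 2   # one 2-bit counter per entry
        if self.entries < 1 or self.entries & (self.entries - 1):
            raise ValueError("number of counters must be a power of two")
        self.mask = self.entries - 1
        self.counters = [1] * self.entries   # start weakly not-taken
        self.history = 0                     # global branch history register

    def _index(self, pc):
        # Drop the two low PC bits (4-byte instructions), then hash.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


# Synthetic loop branch: taken seven times, then not taken, repeated.
predictor = Gshare(table_bits=2048)
pattern = [True] * 7 + [False]
hits = 0
total = 0
for _ in range(100):
    for taken in pattern:
        hits += predictor.predict(0x1000) == taken
        total += 1
        predictor.update(0x1000, taken)
accuracy = hits / total
```

Varying `table_bits` across the range on the slide and replaying a branch trace gives one accuracy-versus-cost point per configuration.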

[Figure label: first-level branch history register (BHR)]
15
Branch Prediction
16
Instruction Translation
  • Compound PowerPC instructions
  • Multiple operations
  • Multiple source and destination registers
  • Unit-operations
  • One unit of computational time
  • Modify one architectural register
  • 15% overhead in raw count
  • 2.8% IPC performance loss
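
The translation idea can be sketched with one compound PowerPC instruction. `lwzu rD, d(rA)` loads a word and also writes the effective address back to rA, so it modifies two architectural registers. The split below into a plain load plus an add is an assumed example, not the translation table used in the FXU; the `Op` type, `crack`, and `raw_count_overhead` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Op(NamedTuple):
    """A unit-operation that writes exactly one architectural register."""
    name: str
    dest: str
    srcs: Tuple


def crack(mnemonic, rd, ra, disp) -> List[Op]:
    """Translate one instruction into unit-operations (two cases only)."""
    if mnemonic == "lwz":     # simple load: already a unit-operation
        return [Op("lwz", rd, (ra, disp))]
    if mnemonic == "lwzu":    # load with update: writes rd and ra
        return [
            Op("lwz", rd, (ra, disp)),    # rd <- MEM[ra + disp]
            Op("addi", ra, (ra, disp)),   # ra <- ra + disp
        ]
    raise NotImplementedError(mnemonic)


def raw_count_overhead(program):
    """Extra unit-operations relative to the original instruction count."""
    unit_ops = sum(len(crack(*instr)) for instr in program)
    return (unit_ops - len(program)) / len(program)


ops = crack("lwzu", "r3", "r4", 8)

# Contrived mix: 3 compound loads among 20 instructions.
program = [("lwzu", "r3", "r4", 8)] * 3 + [("lwz", "r5", "r1", 0)] * 17
overhead = raw_count_overhead(program)
```

The instruction mix here was chosen by hand so that the overhead comes out near the figure quoted on the slide; a real measurement would come from compiled benchmark code.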

17
FXU Logical Pipeline
  • 8-stage pipeline
  • X has 3-cycle latency for load and store
    instructions
  • Forwarding via CDB and ROB

[Pipeline diagram; stage labels as extracted: ICA, ICH, IB, D, S, X, W1, W2, plus Reorder Buffer and CDB blocks]
18
Proposed FXU Microarchitecture
  • 500,000 transistors
  • Dual-issue OOO superscalar
  • 2 reservation stations per functional unit
  • 8-entry reorder buffer
  • 1 KB on-chip icache / 16 KB off-chip dcache
    (3-cycle)
  • 256 B gshare (87%)
  • 0.76 IPC
  • 27% improvement over pipeline
  • High efficiency (1.52 IPC/M-transistors)
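
The efficiency figure follows directly from the two numbers above it. The short check below reproduces it; the implied baseline IPC additionally assumes that the quoted improvement is an IPC ratio against a simple pipelined design, which the slide does not state explicitly.

```python
def ipc_per_mtransistor(ipc, transistors):
    """Efficiency metric from the slide: IPC per million transistors."""
    return ipc / (transistors / 1e6)


proposed_efficiency = ipc_per_mtransistor(0.76, 500_000)

# Assumption: "improvement over pipeline" means IPC is 1.27x the baseline.
implied_pipeline_ipc = 0.76 / 1.27
```

Here `proposed_efficiency` evaluates to 1.52, matching the slide, and `implied_pipeline_ipc` comes out at roughly 0.60 under the stated assumption.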

19
CMOS FXU I Microprocessor
  • Purpose
  • Test architecture
  • Debug design tools
  • Characteristics
  • TSMC 0.35 µm 3M CMOS
  • 830K transistors
  • 9.9x9.9 mm (7.5x6.8 mm core)
  • Development
  • 9 months
  • 10 graduate students

20
CMOS FXU I Block Diagram
[Block diagram; labels as extracted: BP, BTB, Fetch, L1-Icache, Decode, MMU, RF, Dispatch, ROB, BRU, ALU, LSU. Size and width labels: 1 KB, 128 bit, 32 bit, 4 KB.]
21
CGaAs FXU II Block Diagram
[Block diagram; labels as extracted: BP, BTB, Fetch, L1-Icache, Decode, MMU, RF, Dispatch, ROB, BRU, ALU, LSU. Size and width labels: 1 KB, 128 bit, 32 bit, 16 KB.]
22
CGaAs FXU II Microprocessor
  • Characteristics
  • Motorola 0.5 µm 3M CGaAs
  • 380K transistors
  • 13.1x11.4 mm
  • Area I/O interface
  • 16 KB Dcache
  • Performance
  • 0.51 IPC
  • 25 MHz
  • 274 mW @ 1.3 V
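
The performance figures above can be combined into throughput and energy per instruction. This is a back-of-the-envelope sketch that assumes the IPC, clock rate, and power figures all describe the same operating point, which the slide does not confirm; the function names are illustrative.

```python
def throughput_mips(ipc, clock_hz):
    """Millions of instructions per second."""
    return ipc * clock_hz / 1e6


def energy_per_instruction_nj(power_w, ipc, clock_hz):
    """Average energy per executed instruction, in nanojoules."""
    return power_w / (ipc * clock_hz) * 1e9


fxu2_mips = throughput_mips(0.51, 25e6)
fxu2_energy_nj = energy_per_instruction_nj(0.274, 0.51, 25e6)
fxu2_ipc_per_mtransistor = 0.51 / (380_000 / 1e6)
```

Under that assumption the chip delivers 12.75 MIPS at about 21.5 nJ per instruction, and roughly 1.34 IPC per million transistors.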

23
PUMA Design Flow
  • Mixed tool set
  • Custom tools
  • Verilog checker
  • Random Test Generator
  • RAM compiler
  • Commercial tools
  • Verilog/VCS
  • EPOCH
  • TDS
  • Mentor IC Station
  • HSpice
  • PowerPC simulator

[Flowchart; labels as extracted: Behavioral RTL model, Manual synthesis, Verification, Error (Yes / No), Gate-level RTL model, Physical design, Fabrication, Testing]
24
Design Verification and Testing
[Flow diagram; labels as extracted: Test program, Random Test Generator, PowerPC compiler, VCD output, RTL model, PowerPC instruction simulator, Checker, TDS, reg, mem, Standard Vectors, Conversion script, Debug, Error (No / Yes), Test Vectors, HP82000]
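
The checker step in this flow compares the register and memory state produced by the RTL model against the PowerPC instruction simulator. A minimal sketch of such a comparison follows; the dictionary layout and the function name `compare_state` are assumptions, since the deck does not describe the checker's file formats.

```python
def compare_state(expected, actual):
    """Return (kind, key, expected_value, actual_value) for each mismatch.

    Both arguments are dicts with optional "reg" and "mem" sub-dicts,
    mirroring the reg and mem outputs named in the flow diagram.
    """
    mismatches = []
    for kind in ("reg", "mem"):
        exp = expected.get(kind, {})
        act = actual.get(kind, {})
        for key in sorted(set(exp) | set(act)):
            if exp.get(key) != act.get(key):
                mismatches.append((kind, key, exp.get(key), act.get(key)))
    return mismatches


# Hypothetical end-of-test state from the instruction simulator (golden)
# and from the RTL model; the RTL run has a wrong value in r2.
golden = {"reg": {"r1": 5, "r2": 7}, "mem": {0x100: 0xDEAD}}
rtl = {"reg": {"r1": 5, "r2": 8}, "mem": {0x100: 0xDEAD}}

errors = compare_state(golden, rtl)
clean = compare_state(golden, golden)
```

An empty result corresponds to the "No" branch of the Error decision, after which the vectors move on to conversion for the tester; any mismatch sends the test to Debug.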
25
Physical Design Methodology
[Flow diagram; labels as extracted: Verification, Verilog, Logic Synthesis, Structural Mapping, Place and Route, Floorplanning, Parasitic Extraction, geodb, Timing, EPOCH, Chip layout, Chip netlist]
26
Physical Design Methodology
[Flow diagram; labels as extracted: Verification, Verilog, Logic Synthesis, Structural Mapping, Place and Route, Floorplanning, Parasitic Extraction, geodb, Timing, EPOCH, Parasitic Extraction, Chip layout, Chip netlist, Cell Check, Verification, Mentor IC, Power/Clock Distribution, DRC, LVS, Final Layout]
27
Standard Cell Generation
  • HSpice analog analysis
  • 1 to 4 input gates
  • Optimized transistor sizes
  • Epoch Cell generation
  • Used CGaAs design rules
  • Basic CMOS cell generated
  • IC Station modification
  • Gate connection
  • DRC errors
  • Cell library
  • 30 types of complementary gates
  • Drive strength
  • 1 to 6x standard gates
  • 1 to 64x buffers

[Flow diagram; labels as extracted: HSpice, EPOCH, geodb, Generate Cells, Masterport, Standard Cells, IO Pads (gdsii), Optimize Cells, Mentor IC]
28
Process-Independent RAM Compiler
  • Generates RAM macrocells
  • Objectives
  • RAM comparisons
  • Performance
  • Area
  • Multiple processes
  • Cost/Benefit Analysis
  • Optimize CGaAs embedded RAM
  • Components Created
  • CGaAs Icache
  • 2 KB test RAM

29
RAM Compilation Methodology
[Flow diagram; labels as extracted:]
  • Input Parameters: Process Description, RAM Configuration
  • Power versus Delay plot: user-specified target cycle time, near-optimal power-delay curve
  • RAM layout, SPICE netlist
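
Selecting a design from a power-delay curve for a given target cycle time can be sketched as a Pareto-front filter followed by a lookup. This is an illustration of the idea only, not the compiler's actual optimization; the function names and the candidate design points are made up.

```python
def pareto_front(designs):
    """Keep designs not dominated in both delay and power.

    `designs` is a list of (delay_ns, power_mw) tuples. The result is
    sorted by increasing delay and strictly decreasing power.
    """
    front = []
    best_power = float("inf")
    for delay, power in sorted(designs):
        if power < best_power:   # slower designs must pay off in power
            front.append((delay, power))
            best_power = power
    return front


def pick_for_cycle_time(designs, target_ns):
    """Lowest-power design on the front that meets the target cycle time."""
    feasible = [d for d in pareto_front(designs) if d[0] <= target_ns]
    return min(feasible, key=lambda d: d[1]) if feasible else None


# Hypothetical candidate RAM configurations: (delay in ns, power in mW).
candidates = [(10, 50), (12, 40), (12, 45), (15, 42), (20, 30)]

front = pareto_front(candidates)
choice = pick_for_cycle_time(candidates, target_ns=16)
too_fast = pick_for_cycle_time(candidates, target_ns=5)
```

With these points the front keeps three designs, and a 16 ns target selects the 12 ns, 40 mW design because the 20 ns design is too slow; a 5 ns target has no feasible design.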
30
Test Features
  • Test limitations
  • Transistor count
  • I/O pin count
  • Test block
  • Pad ring
  • 32-bit inverter ring oscillator
  • 32-bit nand ring oscillator
  • 32-bit shift register
  • Clock tree output
  • Other features
  • Scan paths (FXU II only)
  • ALU, Decoder, Dispatch
  • Functionality Disables
  • Icache, Dcache, Exceptions, Pipeline

31
Test Flow
  • Test block
  • Disable caches and pipelining
  • Single instruction tests
  • Multiple instruction tests
  • Branch tests
  • Critical path
  • Enable pipeline
  • Enable instruction cache
  • Enable data cache
  • Power, frequency, voltage
  • Scan testing

32
FXU II Voltage-Frequency Plot
33
FXU II Voltage-Current Plot
34
Other Results
  • Problems
  • Caches
  • Data output
  • Sources
  • Gate leakage currents
  • Process parameters
  • µn,p, Idn,p, VTp
  • Pseudo-DCFL RAM Decoders

35
Conclusions
  • CGaAs
  • Flexible: Complementary, P-DCFL, Domino
  • Low power
  • Radiation hard
  • Immature process
  • CGaAs FXU II
  • 25 MHz @ 1.3 V
  • First attempt in new process
  • Contributions
  • Cost-efficient microarchitecture
  • Technology independent RAM compiler
  • Portable verification/test environment
  • Future work
  • SOI PUMA

36
(No Transcript)
37
CGaAs V. CMOS (Device)
38
CGaAs Delay Versus VDD
[Plot from Brown98: CGaAs, TFSOI, and CMOS performance comparison; delay versus VDD (V)]
39
CGaAs Pseudo-DCFL
  • Active load p-type transistor
  • Ratioed
  • Poor VOL, noise margins
  • Static power dissipation
  • High speed
  • Cost-effective

40
CGaAs Complementary Logic
  • Dual networks
  • Moderate speed
  • No static power dissipation
  • Switching power
  • Good noise margins
  • Expensive
  • Some tool compatibility

41
CGaAs Domino Logic
  • Two-phase operation
  • High speed
  • Complex functions
  • Cost-effective
  • High power
  • Non-inverting

42
CGaAs Digital Logic Families
43
FXU II Path Distribution
44
Computation Efficiency
45
CGaAs V. CMOS (Fan out)
46
CGaAs V. CMOS (Geometry)
  • CMOS is metal limited (FXU I)
  • CGaAs is transistor limited (FXU II)

47
RAM Compiler Methodology
Input Parameters
  • Process Description: IC design rules, SPICE models, sheet resistances, parasitic capacitances, electromigration rules
  • RAM Configuration: capacity, aspect ratio, read/write config, target cycle time
[Plot: power versus delay, showing the user-specified target cycle time and the near-optimal power-delay curve]
48
2 KB RAM Chip
  • Motorola 0.5 µm CGaAs
  • Test chip
  • Same design as FXU II cache
  • 8 x 2048 bit RAM
  • Pseudo-DCFL decoder