RAMP Gold: Architecture and Timing Model

1
RAMP Gold: Architecture and Timing Model
  • Andrew Waterman, Zhangxi Tan, Rimas Avizienis,
    Yunsup Lee, David Patterson, Krste Asanović

Parallel Computing Laboratory, University of
California, Berkeley
2
RAMP Gold Overview
Par Lab InfiniCore
  • Tiled CMP simulator
  • ISA: SPARC V8
  • (ARM/Thumb-2 later?)
  • Split timing and function (both on FPGA)
  • Host-multithreaded
  • Runs on V5LX110T (XUP)

3
RAMP Gold Target Machine
[Figure: target machine with 64 SPARC V8 cores, each with its own I and D
caches, connected through a shared L2 / interconnect to DRAM]
4
RAMP Gold v1 Target Features
  • 64 single-issue, in-order SPARC V8 processors
  • Simple 5-stage pipeline
  • FPU
  • Cache timing model
  • Configurable cache size, line size, associativity, miss
    penalty, shared/private
  • Change parameters without resynthesis

5
RAMP Gold Architecture
  • Mapping the target machine directly to an FPGA is
    inefficient
  • Solution: split timing and functionality, plus
    multithreading
  • The timing logic decides how many target cycles
    an instruction sequence should take
  • Simulating the functionality of an instruction
    might take multiple host cycles
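
The split described above can be sketched in a few lines of Python. This is only a toy illustration, not the RAMP Gold implementation: the three-instruction ISA, the `Instr` class, and both cost tables are hypothetical. The point is that host cycles (how long the simulator works) and target cycles (how long the modeled machine would take) are counted separately.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str  # "add", "fadd", or "ld" in this hypothetical toy ISA


# Hypothetical host cost: host cycles the functional model spends per op.
HOST_CYCLES = {"add": 1, "fadd": 10, "ld": 3}
# Hypothetical target cost: target cycles the timing model charges per op.
TARGET_CYCLES = {"add": 1, "fadd": 2, "ld": 1}


def simulate(program):
    """Return (host_cycles, target_cycles) for a list of Instr."""
    host = 0
    target = 0
    for instr in program:
        host += HOST_CYCLES[instr.op]      # functional model: work done on the host
        target += TARGET_CYCLES[instr.op]  # timing model: time on the target
    return host, target


program = [Instr("add"), Instr("fadd"), Instr("ld")]
host, target = simulate(program)
print(f"host cycles = {host}, target cycles = {target}")
```

With these made-up costs the run takes 14 host cycles but is reported as 4 target cycles, which is the decoupling the slide describes.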

6
Function/Timing Split Advantages
  • Flexibility
  • Can configure target at runtime
  • Synthesize design once, change target model
    parameters at will
  • Efficient FPGA resource usage
  • Example 1: model a 2-cycle FPU in 10 host cycles
  • Example 2: model a 16MB L2 using only 256KB host
    BRAM to store tags/metadata
  • Enables multithreading
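
To see why a large target cache needs little host storage, the tag/metadata arithmetic can be worked out as below. This is a rough sketch under stated assumptions: the slides do not give the line size, associativity, address width, or state bits for the 16MB example, so the 128B lines, 4 ways, 32-bit addresses, and 2 state bits used here are hypothetical, as is the helper name `tag_store_bytes`.

```python
def tag_store_bytes(cache_bytes, line_bytes, ways, addr_bits=32, state_bits=2):
    """Bytes needed to hold only tags and state bits (no data) for a cache.

    Assumes cache_bytes, line_bytes, and the resulting set count are powers of two.
    """
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2 of a power of two
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    total_bits = lines * (tag_bits + state_bits)
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical configuration: 16MB cache, 128B lines, 4-way set associative.
meta = tag_store_bytes(16 * 1024 * 1024, line_bytes=128, ways=4)
print(f"metadata: {meta} bytes ({meta / 1024:.0f} KiB) for a 16 MiB target cache")
```

Under these assumptions the metadata comes to 192 KiB, well below the 16 MiB that storing the cache data itself would require.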

7
Split Timing and Function
Target Machine    Functional Model    Timing Model
CPU               CPU FM              CPU TM
L1 D              L1 D FM             L1 D TM
MEM               MEM FM              MEM TM
  • Functional model executes ISA correctly
  • Timing model determines how long a program takes
    to run

8

Split Timing and Function
Target Machine    Functional Model    Timing Model
CPU               CPU FM              CPU TM
L1 D              (none)              L1 D TM
MEM               MEM FM              MEM TM
  • Functional model executes ISA correctly
  • Timing model determines how long a program takes
    to run

9
TM & FM from 30,000 ft
[Figure: block diagram of the CPU Timing Model, L1 D Timing Model, Memory
Timing Model, CPU Functional Model, and Memory Functional Model. Signal
labels: ld/st address, store data, stall, instruction complete, instruction,
load data]
10
TM & FM from 3,000 ft
[Figure: more detailed block diagram of the CPU TM, L1 D TM, Memory Timing
Model, CPU FM, and Memory Functional Model. Stage labels: IF, CTRL, TM1, TM2,
DEC, EX, MEM, WB. Signal labels: ld/st address, store data, stall,
instruction complete, instruction, load data]
11
Example Target Load Miss
[Figure: block diagram of the CPU TM, L1 D TM, Memory Timing Model, CPU FM,
and Memory Functional Model (stage labels IF, CTRL, TM1, TM2, DEC, EX, MEM,
WB), annotated with numbered steps 1 through 7 that trace a target load miss.
Signal labels: ld/st address, store data, stall, instruction complete,
instruction, load data]
12
Timing-Driven Host Pipeline
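[Figure: host pipeline diagram. Threads T0, T1, T2 with example instructions
ADD, LD, ST. Blocks: CPU/D Timing Model, L1 D TM, CPU Functional Model,
Target Memory TM/FM, Store Buffer, Load Result Buffer. Stage labels: TS, TM1,
TM2, TM3, IF, DE, EX, MEM1, MEM2, WB. Signal labels: TID,INST and TID,ADDR]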
13
Cache Modeling
  • The cache model maintains tag, state, protocol
    bits internally
  • Whenever the functional model issues a memory
    operation, the cache model determines how many
    target cycles to stall
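
A tag-only cache timing model of the kind described above might look like the following sketch. It is not the RAMP Gold hardware design: the class name, the LRU replacement policy, and the 20-cycle miss penalty are assumptions made for illustration, and coherence state/protocol bits are left out. The model stores no data, only tags, and returns the number of target cycles to stall.

```python
class CacheTimingModel:
    """Tag-only set-associative cache model (hypothetical sketch, LRU assumed)."""

    def __init__(self, size_bytes, line_bytes, ways, miss_penalty):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        self.miss_penalty = miss_penalty
        # One list of tags per set, most recently used last; no data is stored.
        self.sets = [[] for _ in range(self.num_sets)]

    def access(self, addr):
        """Return the number of target cycles to stall for this access."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        tags = self.sets[index]
        if tag in tags:
            tags.remove(tag)   # hit: move the tag to the MRU position
            tags.append(tag)
            return 0
        if len(tags) == self.ways:
            tags.pop(0)        # set is full: evict the LRU tag
        tags.append(tag)
        return self.miss_penalty


# Parameters are plain constructor arguments, so they can change at run time.
l1 = CacheTimingModel(32 * 1024, line_bytes=64, ways=2, miss_penalty=20)
stalls = [l1.access(a) for a in (0x1000, 0x1004, 0x1040)]
print(stalls)
```

The first access misses, the second hits in the same 64B line, and the third touches the next line and misses again.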

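[Figure: the address is divided into tag, index, and offset fields; the index
selects a set of (tag, state) entries, one per way of associativity, and the
tag comparison produces hit/miss]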
14
Multithreaded, Pipelined Cache TM
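[Figure: pipelined cache timing model; the index field of the address selects
the (tag, state) entries, which are compared against the address to produce
the "hit?" signal]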
15
Quick & Dirty Validation
  • 32KB, 2-way L1 D, 64B lines
  • 256KB, 4-way L2, 64B lines
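
For reference, the set count and address-field widths implied by these two configurations can be computed as below. This is a small sketch; the helper name `geometry` is hypothetical, and it assumes power-of-two sizes.

```python
def geometry(size_bytes, ways, line_bytes):
    """Return set count and index/offset widths for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "index_bits": sets.bit_length() - 1,         # log2(sets)
        "offset_bits": line_bytes.bit_length() - 1,  # log2(line size)
    }


l1d = geometry(32 * 1024, ways=2, line_bytes=64)
l2 = geometry(256 * 1024, ways=4, line_bytes=64)
print("L1 D:", l1d)
print("L2:  ", l2)
```

The L1 D works out to 256 sets (8 index bits) and the L2 to 1024 sets (10 index bits), both with 6 offset bits.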

16
Status
  • Functional and simple timing models work in HW
  • Running real programs (e.g. SPLASH2)
  • Near-term future work:
  • Move from the current functional-first stall
    configuration to the timing-driven one described here
  • More interesting memory system timing model
  • Functional potpourri (FDIV, MMU, ...)

17
DEMO
  • Run OCEAN with different L1 D parameters

18
Questions?
  • Thank you!