Title: hunga@am.ics.keio.ac.jp
1??????????
- ??????????
- ????
- hunga_at_am.ics.keio.ac.jp
2????????????
- ????????
- ????????
- ?????????????????
- 0.1µm????????????
- ???????????????
- ??????????????????
- ?????????????????
- ?????VLIW??
- SMT (Simultaneous Multithreading)
- ?????????
- Reconfigurable Systems
3Intel Itanium
- 64bit??????IA-64???
- VLIW???????
- ??????????????????
- ???????????????
- ???????VLIW?????????????????????
- ????????
- ????????????????
4???VLIW(????????)
add   r6 = @gprel(a), gp     // group 1: r6 = &a
ldfpd f1, f2 = [r6]          // group 2: f1 = a[0], f2 = a[1]
ldfd  f3 = [r5], 16          //          f3 = a[2]
fma.d f4 = f1, f2, f3        // group 3: f4 = f1*f2 + f3
5???VLIW(?????????)
group
Bundle(128bit)
Template(5bit)
Cycle Break
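A minimal sketch of the bundle layout named on this slide, assuming the standard IA-64 arrangement of a 5-bit template in the low bits followed by three 41-bit instruction slots (5 + 3 × 41 = 128). The function names `pack_bundle` and `unpack_bundle` are hypothetical and only illustrate the bit arithmetic.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41  # 5 + 3 * 41 = 128 bits per bundle


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer."""
    assert 0 <= template < (1 << TEMPLATE_BITS)
    assert len(slots) == 3
    bundle = template
    for i, slot in enumerate(slots):
        assert 0 <= slot < (1 << SLOT_BITS)
        # Slot i starts right after the template and the previous slots.
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(3)
    ]
    return template, slots
```

The template field is what tells the hardware which execution unit type each slot needs and where a group ends (the "Cycle Break" on the slide).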
6????????
10????????6??????
Instruction Delivery
Operand Delivery
Execution
Front end
IPG
FET
ROT
EXP
REN
WLD
REG
EXE
DET
WRB
IPG: Instruction Pointer Generation
FET: Fetch
ROT: Rotate
EXP: Expand
REN: Rename
WLD: Word-Line Decode
REG: Register Read
EXE: Execute
DET: Exception Detect
WRB: Write Back
7Block Diagram
L1 Instruction CacheFetch/Prefetch Engine
IA-32 Decode and Control
Off Chip L3 Cache
L2 Cache
Branch Prediction
B B B M M I I F F
Register Stack/Re-mapping
Scoreboard etc.
Branch Units
Integer MMU Units
FP Units
Bus Controller
8???????predication register
1???????????????????
IA-32 (with branches):
      cmp  eax, ebx
      jne  L30
      mov  ebx, CONST1
      jmp  L31
L30:  mov  ebx, CONST2
L31:

IA-64 (predicated):
      cmp.eq p7, p8 = r14, r15
(p7)  movi   r15 = CONST1
(p8)  movi   r16 = CONST2
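The if-conversion on this slide can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the sketch writes a single destination on both paths (as the IA-32 version does with `ebx`), whereas the IA-64 listing on the slide names two destination registers.

```python
def select_with_branch(a, b, const1, const2):
    """Branching version: control flow decides which move executes."""
    if a == b:
        return const1
    return const2


def select_with_predicates(a, b, const1, const2):
    """If-converted version: one compare sets two complementary predicates,
    both moves are issued, and each takes effect only if its predicate is true."""
    p7 = (a == b)      # cmp.eq p7, p8 = ...
    p8 = not p7
    result = None
    result = const1 if p7 else result   # (p7) mov result = const1
    result = const2 if p8 else result   # (p8) mov result = const2
    return result
```

Because the predicated version has no branch, there is nothing to mispredict, and the two moves can be scheduled in the same group.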
9Advanced Load
Advanced Load
With advanced load:
ld4.a r2 = [r33]
add   r3 = 4, r0
st4   [r32] = r3
ld4.c r2 = [r33]
add   r5 = r2, r3

Without advanced load:
add   r3 = 4, r0
st4   [r32] = r3
ld4   r2 = [r33]
add   r5 = r2, r3
Check
st??????????????ALAT(Advanced Load Address
Table)?????????
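A toy model of the ALAT check may help here. It is a sketch under stated assumptions, not the hardware's actual organization: the class `Alat`, its methods, and the helper `run` are hypothetical names, the table is keyed by destination register, and a store invalidates only entries with an exactly matching address.

```python
class Alat:
    """Toy Advanced Load Address Table: destination register -> loaded address."""

    def __init__(self, memory):
        self.memory = memory
        self.entries = {}

    def ld_a(self, regs, dst, addr):
        """Advanced load: load early and remember the address."""
        regs[dst] = self.memory[addr]
        self.entries[dst] = addr

    def store(self, addr, value):
        """Store: write memory and invalidate any entry for the same address."""
        self.memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def ld_c(self, regs, dst, addr):
        """Check load: reload only if the entry was invalidated.
        Returns True when a reload was needed."""
        if self.entries.get(dst) == addr:
            return False
        regs[dst] = self.memory[addr]
        return True


def run(store_addr, load_addr):
    """Replay the slide's sequence with r32 = store_addr and r33 = load_addr."""
    memory = {store_addr: 0, load_addr: 10}
    regs = {}
    alat = Alat(memory)
    alat.ld_a(regs, "r2", load_addr)             # ld4.a r2 = [r33]
    regs["r3"] = 4                               # add   r3 = 4, r0
    alat.store(store_addr, regs["r3"])           # st4   [r32] = r3
    reloaded = alat.ld_c(regs, "r2", load_addr)  # ld4.c r2 = [r33]
    regs["r5"] = regs["r2"] + regs["r3"]         # add   r5 = r2, r3
    return regs["r5"], reloaded
```

When the two addresses differ, the early load stands and `ld_c` does nothing; when they alias, the store removes the entry and `ld_c` reloads the stored value, so the result matches the unoptimized order.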
10Speculative Load
With speculative load:
add5:
        ld8.s   r1 = [r32]
        cmp.eq  p6, p5 = r32, r0
  (p6)  add     r8 = -1, r0
  (p6)  br.ret
  (p5)  chk.s   r1, return_error
        add     r8 = 5, r1
        br.ret

Without speculative load:
add5:
        cmp.eq  p6, p5 = r32, r0
  (p6)  add     r8 = -1, r0
  (p6)  br.ret
  (p5)  ld8     r1 = [r32]
        add     r8 = 5, r1
        br.ret
page fault?????load??????
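The deferred-fault behaviour of `ld8.s` and `chk.s` can be sketched as below. This is a simplified model with assumed names: `NAT` stands in for the NaT token attached to a register after a faulting speculative load, memory is a plain dictionary, and the recovery path `return_error` is modelled by raising `MemoryError`.

```python
NAT = object()  # stand-in for the deferred-exception (NaT) token


def ld8_s(memory, addr):
    """Speculative load: a faulting access yields NAT instead of trapping."""
    return memory.get(addr, NAT)


def add5(memory, ptr):
    r1 = ld8_s(memory, ptr)    # ld8.s r1 = [r32], hoisted above the null check
    if ptr == 0:               # cmp.eq p6, p5 = r32, r0
        return -1              # (p6) add r8 = -1, r0 ; (p6) br.ret
    if r1 is NAT:              # (p5) chk.s r1, return_error
        raise MemoryError("deferred fault surfaced by chk.s")
    return r1 + 5              # add r8 = 5, r1 ; br.ret
```

For a null pointer the speculative load fails silently and its result is never checked, so hoisting the load above the comparison does not introduce a spurious fault.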
11??????????
??????????????VLIW????????? ??????????????????????
12Software Pipelining
Original loop:
Loop: LD   F0,0(R1)
      ADDD F4,F0,F2
      SD   0(R1),F4
      SUBI R1,R1,8
      BNEZ R1,Loop

Software-pipelined loop:
Loop: SD   0(R1),F4      ; iteration i
      ADDD F4,F0,F2      ; iteration i-1
      LD   F0,-16(R1)    ; iteration i-2
      SUBI R1,R1,8
      BNEZ R1,Loop
??????????????(???DLX??????) ??????????????????
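The transformation can be checked with a small Python sketch. It is an illustration under assumptions: the function names are hypothetical, the array is walked upward rather than downward as in the DLX code, and the prologue and epilogue (which the slide's kernel omits) are written out explicitly.

```python
def add_scalar_plain(x, s):
    """Original loop: load, add, store within one iteration."""
    out = list(x)
    for i in range(len(out)):
        f0 = out[i]        # LD
        f4 = f0 + s        # ADDD
        out[i] = f4        # SD
    return out


def add_scalar_pipelined(x, s):
    """Software-pipelined loop: each kernel pass mixes three iterations."""
    n = len(x)
    if n < 3:
        return add_scalar_plain(x, s)  # too short to fill the pipeline
    out = list(x)
    # Prologue: start the first two iterations.
    f0 = out[0]            # LD   for element 0
    f4 = f0 + s            # ADDD for element 0
    f0 = out[1]            # LD   for element 1
    # Kernel: store element k, add for element k+1, load element k+2.
    for k in range(n - 2):
        out[k] = f4        # SD
        f4 = f0 + s        # ADDD
        f0 = out[k + 2]    # LD
    # Epilogue: drain the last two iterations.
    out[n - 2] = f4
    f4 = f0 + s
    out[n - 1] = f4
    return out
```

Inside the kernel the store, add, and load belong to different iterations, so none of them has to wait for the result of another instruction issued in the same pass.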
13Software Pipelining?????
- Modulo Schedule loop??
- loop counter?0?????????????????????????EC?0???????
- ???????????
- ?????????????????????????????
14???
- ?????????????????????
- ??????????????
15???
- ???VLIW????
- ?????????????RISC?????????????????
- ??????????????
- ????????????????????
- IA-32??????????
16Simultaneous Multithreading (SMT)
- ???????????
- ????????
- ????????????
- ??????????????????????
- SMT?????????
- ?????????????
17SMT???
[Figure: issue slots per clock cycle for fine-grained multithreaded superscalar, superscalar, and SMT]
18??????????
Instruction Per Cycle(IPC) ?????
SPECInt SPECInt Apache Apache
OS ?? ?? ?? ??
superscalar 3.0 2.6 1.1
SMT 5.9 5.6 4.6
not OS intensive application
SPECInt
Apache
OS intensive application
19SMT???
- ???????????????????
- IPC ???
- ???????????
- OS(kernel)??????
- ????????????????????
20????????????????
- ???????????????????
- ILP(Instruction Level Parallelism)
- ??????????
- Trace Level Parallelism
- ??????????
- Thread Level Parallelism
- ??????????(??????)
- Process Level Parallelism
- ???????????????????????????
21????????vs. ????????
????????
????????
??Thread????
??????
????????????
????????
22Flynn???
- ???(Instruction Stream)?? M(Multiple)/S(Single)
- ????(Data Stream)??M/S
- SISD
- ???????(???????VLIW???)
- MISD?????(Analog Computer)
- SIMD
- MIMD
23SIMD
- ??????????????
- ??????
- Illiac-IV/????????????(???)
- CM-2???(???)
?????
???????
??????
24SIMD????
- ?????????????????????
- ILLIAC-IV,BSP,GF-11
- ??????CPU??????????
- ?????????1bit????bit?????????
- ICL DAP, CM-2,MP-2
- ??????????????????????(CmLisp???)
25CM-2??????
Flags
A
B
F
OP
C
Context
s
c
256bit memory
1bit serial ALU
26CM2?????????
4096???? 64K PE
??
1?????
Router
4x4 Processor Array
12links 4096 Hypercube connection
256bit x 16 PE RAM
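The numbers on this slide fit together as 2^12 = 4096 router nodes with 12 links each, and 4096 × 16 PEs = 64K PEs. A short sketch of hypercube addressing follows; the function names are hypothetical, and the dimension-order route is a generic textbook scheme shown for illustration, not a description of the CM-2 router's actual algorithm.

```python
DIMENSIONS = 12  # 2**12 = 4096 nodes, 12 links per node


def neighbors(node, dimensions=DIMENSIONS):
    """Neighbors of a hypercube node differ from it in exactly one address bit."""
    return [node ^ (1 << k) for k in range(dimensions)]


def hops(src, dst):
    """Minimum hop count is the Hamming distance between the two addresses."""
    return bin(src ^ dst).count("1")


def route(src, dst, dimensions=DIMENSIONS):
    """Dimension-order route: correct differing bits from lowest to highest."""
    path = [src]
    cur = src
    for k in range(dimensions):
        if (cur ^ dst) & (1 << k):
            cur ^= 1 << k
            path.append(cur)
    return path
```

Any two of the 4096 nodes are therefore at most 12 hops apart.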
27SIMD????
- ???SIMD
- ???????????????
- ?????????????????????????????
- ???SIMD
- ????????????
- CM2 ? CM5??????????????????????
- ??SIMD????????????????
28MIMD
- ??????????????????
- ?????
- ??????
- ?????????
?????
???
???(?????)
29?????????????
- UMA(Uniform Memory Access Model)
- ??????????????????????????????????
- NUMA(Non-Uniform Memory Access Model)
- ??????????????????
- NORA/NORMA(No Remote Memory Access Model)
- ??????????????????????
30UMA
- ???????????
- ??????????
- ??????OS?????
- ????????????
- ?????
- ???????
- ???????LSI??????????
31UMA????????
Main Memory
shared bus
PU
SMP(Symmetric MultiProcessor)???????? ??????????
32???????UMA
Local Memory
....
CPU
Interface
Switch
.
Main Memory
???????????????????????
33Stanford's Hydra
Considerations in the design of Hydra
CSL-TR-98-749,
DRAM Main Memory
I/O
34Daytona(Lucent)
- MESI Protocol
- RISC + DSP
- Pipelined operation of bus and memory controller.
- 128bit STBus
- 0.25µm CMOS 4.5m6mm (small chip)
35Daytona(Lucent)
STBus
Memory and I/O Controller
semaphores
arbiter
36Power4(IBM)
- 0.18µm copper process, 400mm²
- 170M Tr.
- Inter-chip interface for MCM(Multi-Chip Module)
- TLP(Thread Level Parallelism)
- Design considering memory bandwidth
- Shared cache links
37Power4(IBM)
>100 GByte/s
>333 MHz, >10 GByte/s
CPU1
CPU2
Chip-to-Chip Interconnect
L3 Tags
Chip-to-Chip Interconnect
>500 MHz, >35 GByte/s
38MAJC
- Hierarchical structure
- Variable length VLIW processing element
- Shared cache
- I/O for inter-processor communication
- I/O for PCI,DRAM
- MAJC 5200: 0.22µm CMOS, 220 mm²
39MAJC (Microprocessor Architecture for Java Computing, Sun)
N-UPA
Rambus I/O
PCI I/O
Graphic Processor
Switch
Shared Cache
S-UPA
40NUMA
- ???????????????????????????????????????????
- ?????????????????????????????
- ??????
- UMA???????????????
- ???????????????????????
41??????
Node 0
0
Node 1
1
Interconnection Network
2
Node 2
3
????????
Node 3
42NUMA???
- (???)NUMA
- ????????????????
- ??????????????????????????
- CC-NUMA: Cache Coherent NUMA
- ?????????????
- ????????????
- COMA: Cache Only Memory Architecture
- ???????????
- ???????????
43Earth Simulator (2002,NEC)
Peak performance 40TFLOPS
Interconnection Network (16GB/s x 2)
.
Node 0
Node 1
Node 639
44SGI Origin
Bristled Hypercube
Main Memory
Hub Chip
Network
Main Memory?Hub Chip?????????? 2PE?1Cluster
45DDM(Data Diffusion Machine)
D
...
46NORA/NORMA
- ??????????
- ????????????????
- ?????????????????????
???????????????????? ???????????????????????
47Fine grain
SIMD
Coarse grain
????????
?????? ?????
?????UMA ???????UMA
Simple NUMA CC-NUMA COMA
MIMD
NUMA
NORA
?????????
????????????? ???????? ???? ????
???
48?????????????
Data?x
?????
Data?y
????????????????????? ???????????????????
49??????? y=Ax
y0 y1 y2 y3
x0 x1 x2 x3
a
yi
yo
X
yo = a × x + yi
x
50??????? y=Ax
a23 a32
a22
a12 a21
a11
X
x1
51??????? y=Ax
a33
a23 a32
a22
a12 a21
y1 = a11 x1
X
X
x1
x2
52??????? y=Ax
a34 a43
a33
a23 a32
a22
y1 = a11 x1 + a12 x2
y2 = a21 x1
X
x3
x1
x2
53??????? y=Ax
a44
a34 a43
a33
a23 a32
y2 = a21 x1 + a22 x2
X
X
x2
x3
54??????? y=Ax
a44
a34 a43
y2 = a21 x1 + a22 x2 + a23 x3
a33
y3 = a32 x2
X
x2
x3
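The cell rule from these slides (output = a × x + incoming partial sum) can be exercised with a cycle-by-cycle simulation. The sketch below is a simplified variant chosen for brevity, not the exact band-matrix arrangement animated on the slides: each cell keeps one x value, only the partial sums move, and the matrix is assumed dense and non-empty. The name `systolic_matvec` is hypothetical.

```python
def systolic_matvec(a, x):
    """Simulate a linear systolic array computing y = A x.

    Cell j holds x[j]. The partial sum for row i enters cell 0 at cycle i,
    moves one cell to the right per cycle, and each cell applies
    y_out = a * x + y_in with the matching coefficient a[i][j].
    """
    n_rows, n_cols = len(a), len(x)
    pipe = [None] * n_cols   # pipe[j]: (row, partial sum) produced by cell j
    y = [None] * n_rows
    for t in range(n_rows + n_cols - 1):
        new_pipe = [None] * n_cols
        for j in range(n_cols):
            if j == 0:
                incoming = (t, 0) if t < n_rows else None
            else:
                incoming = pipe[j - 1]   # value from the left neighbor, last cycle
            if incoming is not None:
                i, y_in = incoming
                new_pipe[j] = (i, a[i][j] * x[j] + y_in)
        pipe = new_pipe
        if pipe[-1] is not None:         # a finished y leaves the last cell
            i, total = pipe[-1]
            y[i] = total
    return y
```

After the pipeline fills, one finished element of y leaves the array every cycle, and every cell communicates only with its immediate neighbor.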
55????????????????
- ??????????
- ???????
- ???????????
- 1980??Kung?????????????LSI?????????????????
56?????????????
- ??????????????????????????????
- ???????????????
- ????????????????????????????
57?????????
d
e
x
c
a
b
x
(a+b)×(c+(d×e))
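Data-driven execution of this graph can be sketched as follows, assuming the slide's expression reads (a+b)×(c+(d×e)) with the addition signs lost in extraction. The node names `t1`, `t2`, `t3`, `out` and the function `run_dataflow` are hypothetical; the point is only that a node fires as soon as all of its input tokens are present.

```python
import operator


def run_dataflow(graph, inputs):
    """Fire every node whose operands have arrived, step by step.

    graph maps node name -> (operation, (source names...)).
    Returns the final token table and the list of nodes fired at each step.
    """
    tokens = dict(inputs)
    pending = dict(graph)
    order = []
    while pending:
        ready = [
            name for name, (_, srcs) in pending.items()
            if all(s in tokens for s in srcs)
        ]
        if not ready:
            raise ValueError("graph cannot make progress")
        for name in ready:
            op, srcs = pending.pop(name)
            tokens[name] = op(*(tokens[s] for s in srcs))
        order.append(sorted(ready))
    return tokens, order


GRAPH = {
    "t1": (operator.add, ("a", "b")),      # a + b
    "t2": (operator.mul, ("d", "e")),      # d * e
    "t3": (operator.add, ("c", "t2")),     # c + (d * e)
    "out": (operator.mul, ("t1", "t3")),   # (a + b) * (c + (d * e))
}
```

In the first step `t1` and `t2` fire together because neither depends on the other, which is the parallelism the graph exposes without any program counter.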
58????????
??????
???????
???????
??????
????
????
?????
?????
?????????
????????
59?????????????
- ????????????(Dennis????)???????
- ??????????????????????????
- ??????
- ??????????????????????????
- ??????
60Reconfigurable System(Custom Computing Machine)
- SRAM??????????????????????????????????????????????
?????? - ??????????
- ?????????
- ?????????????????????????????????????????
????????????
61???FPGA/PLD
- ??????1000K???????(????????)
- ??????????????30MHz??????????
- ????????SRAM???
625??????
SRAM?FPGA (Field Programmable Gate Array)
2 F.F.
I/O
Logic Block
Switch
63SRAM?CPLD (Complex Programmable Logic Device)
I/O
Logic Block
Switch
64????Reconfigurable System
- ?????
- ???????????
- Splash 12, RM-I,II,III,IV, FLEMING
- ???????
- ?????????????????
- PRISM I, II, DISC II
65Reconfigurable System???
- ?????????
- ????????????
- Splash 12, RM-I,II,III,IV, RASH(??), ATTRACTOR(NTT)
- ???????
- ?????????????????
- PRISM I, II, DISC-II, PipeRench, CHIMAERA, Chameleon??
66Reconfigurable System???
Stand Alone
Co-processor
New Device
1990?
?1?FPL
SPLASH
MPLD
PRISM-I
?1?Japanese FPGA/PLD Conf.
1992?
SPLASH-2
PRISM-II
RM-I
WASMII
1993?
?1?FCCM
RM-II
Cache Logic
RM-III
DISC
RM-IV
1995?
YARDS
Mult.Context FPGA
RM-V
DISC-II
HOSMII
ATTRACTOR
FIPSOC
Cont.Switch.FPGA
RASH
PipeRench
DRL
PCA
2000?
CHIMAERA
Chameleon
67Splash-2 (Arnold? 92)
- ???????????
- ???????????DNA????????????????Cray-II?330???????
- ?????????????
- VHDL, ??C??????????
- Annapolis Micro Systems??????(WILDFIRE)
68Splash-II
- ??????????
- ???????????DNA????????????????Cray-II?330???????
- ???????
- VHDL,??C??????????
69RM-IV(????)
FPIC
Interface
70RASH(????)
CompactPCI bus
EXE- ???
CPU???
??????
RASH unit
Ethernet LAN
CD
1Unit ??6??EXE????CPU???(Pentium) ???Unit?????
This slide is supported by Dr.Nakajima of
Mitsubishi.
71?????????? 2??????? PCI??I/F SRAM?? DRAM????????
PCI-bus
PCI-bus I/F
SRAM (2MB)
PCI Local-bus
EXE-board controller
FPGA
FPGA
FPGA
FPGA
FPGA
FPGA
FPGA
FPGA
FPGA Altera FLEX10K100A (62K-158KGate)
72ATTRACTOR(NTT)
?????????(1Gbps)
ATM I/O
ATM SW
Buffer
RAM (LUT)
RISC
RISC
RISC
RISC
Ethernet
Compact PCI
MPU
ATM????? ????????
??????????
Mem.
????????????
73???????
- Core CPU????
- ????????????
- ???Core CPU????????????????????????????????
- NAPA, Garp, Chameleon, Chimaera, PipeRench
74PRISM II(Brown??)
Am29050 CPU
Data
Address Control
Boot ROM
Switch
DRAM
Burst Mode Memory Controller
DRAM
FPGA Module
FPGA Module
FPGA Module
???????????????? ??????????
75Garp (Hauser? 97)
Memory queue
- UCB???????
- MIPS???Reconfigurable Array?????????????
- ?????????????????????????????
- ???????Ultrasparc?43????
MIPS
Cache
Q
Q
Q
Crossbar
32bit buses x 5
Reconfigurable Array
76DISC (Wirthlin? 95)
FPGA 3 Processor Core
System Memory
- Brigham Young??
- ??????????????????????
- ??????????????????????
- ?????????????
- C????????????
- FPGA????????????????????????
FPGA 1 Bus I/F Configuration Controller
FPGA 2 Custom Instruction Space
Host P/C
77CHIMAERA (Ye? 2000)
- Northwestern??
- ????????????????????????????
- ??????????????9???????????
- Out of Order??
- 1020????
???????? ????
???? ????
???? ???
uP??
??????
78Chameleon(Chameleon?)
- Field Programmable System Level Integrated Circuits (FPSLICs)
- ????Reconfigurable Processing Fabric, RISC Core, PCI Controller, Memory Controller, DMA Controller, SRAM?1???????
- ??????????????????DSP?5-10????
79Chameleon CS2112
32-bit PCI Bus
64-bit Memory Bus
PCI Cont.
RISC Core
Memory Controller
128-bit RoadRunner Bus
DMA Subsystem
Configuration Subsystem
Reconfigurable Processing Fabric
160-pin Programmable I/O
80Reconfigurable Processing Fabric???
DPU
CTL
LM
Tile 0
Slice 3
108 DPUs (Data Path Units) in 4 Slices (3 Tiles each)
1 Tile = 9 DPUs: 32bit ALU × 7, 16bit × 16bit multiplier × 2
81DPU???
OPC?Verilog??????? DPU???SIMD,??????
Instruction
Register Mask
Routing MUX
OP
Register
Barrel Shifter
Register
Register Mask
Routing MUX
82Reconfigurable System????
- SRAM?FPGA???????CPU,DSP???10????10???????
- ??????????
- ?????????????????????????
- ??????????????????????
83????
- ??????????????????
- ???????????
- ???????????????Reconfigurable Systems?????????????
??