PowerPoint Sunusu - PowerPoint PPT Presentation

About This Presentation

Title:

PowerPoint Sunusu

Slides: 37

Provided by: esrak1

Transcript and Presenter's Notes

Title: PowerPoint Sunusu

1
SINGLE CHIP MULTIPROCESSORS
Computer Architecture Term Paper (11.12.2003)
Esra KIRBAS 2002701357
1/36
2
Evaluation of Design Alternatives for a Multiprocessor Microprocessor
By Basem A. Nayfeh, Lance Hammond and Kunle Olukotun.
ISCA 23, 1996, pp. 67-77.
3

With advanced integrated-circuit technology,
several options are available for designing
high-performance microprocessors.
In the multiprocessor design option, a small number
of processors are interconnected on a single chip or
on a multi-chip-module (MCM) substrate.
We concentrate on single-chip multiprocessors.

4

Our goal is to study two proposed
cache-sharing mechanisms for single-chip
multiprocessors:
Shared Level-1 (L1) Cache Architecture
Shared Level-2 (L2) Cache Architecture
(The performance of these two architectures is
compared with that of a single-bus-based
shared-memory multiprocessor.)

5

A multiprocessor architecture whose interconnect
is closer to the CPUs in the memory hierarchy
will be able to exploit fine-grained parallelism
more efficiently than a multiprocessor
architecture whose interconnect is further away
from the CPUs in the memory hierarchy.
The aim is to achieve good performance on
fine-grained parallel applications without
sacrificing the performance of independent
parallel jobs.

6

CPU CHARACTERISTICS
We use the same CPU in all three
architectures:
2-way issue processor
Dynamic scheduling
Speculative execution
Non-blocking caches

7
8

16 KB 2-way set-associative instruction and data
caches
32-entry centralized instruction window
32-entry reorder buffer

9
Shared L1-Cache Multiprocessor
10

Advantages of this Architecture
It provides the lowest latency for
interprocessor communication by using a
shared-memory address space.
Low latency for interprocessor communication
helps to achieve high performance in executing
fine-grained parallel applications.
Processors may fetch shared data into the cache
for each other.
It eliminates the cache coherence logic and
implicitly provides a sequentially consistent
memory without sacrificing performance.

11

Disadvantages of this Architecture
The crossbar switch increases the access
time of the L1 cache. (We assume an average access
time of three cycles.)
All memory references go to the L1 cache,
so bank conflicts may cause extra delays.
If the processors are not executing fine-grained
parallel applications, the miss rate will
increase.
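
The bank-conflict cost mentioned above can be illustrated with a small sketch. It assumes a four-bank L1 cache interleaved on cache-line addresses with 32-byte lines; neither the L1 bank count nor the interleaving scheme is given in these slides, and `bank_of` and `count_bank_conflicts` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter

LINE_BYTES = 32  # assumed cache-line size
NUM_BANKS = 4    # assumed number of L1 banks (not stated in the slides)


def bank_of(address: int) -> int:
    """Map a byte address to a bank by interleaving on line addresses."""
    return (address // LINE_BYTES) % NUM_BANKS


def count_bank_conflicts(addresses: list[int]) -> int:
    """Count accesses that must wait because another access issued in the
    same cycle targets the same bank (one access per bank per cycle)."""
    per_bank = Counter(bank_of(a) for a in addresses)
    return sum(n - 1 for n in per_bank.values())


# Four CPUs each issue one reference in the same cycle.
spread = [0x000, 0x020, 0x040, 0x060]     # four different banks
clustered = [0x000, 0x080, 0x100, 0x020]  # three references map to bank 0

print(count_bank_conflicts(spread))     # 0
print(count_bank_conflicts(clustered))  # 2
```

In this toy model every conflicting access would be delayed by at least one cycle, which is the kind of extra delay the shared L1 cache can suffer even when all references hit.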

12
Secondary cache and main memory are
uniprocessor-like systems:
L2: 2 MB, 10-cycle latency, 2-cycle occupancy
Main Memory: 50-cycle latency, 6-cycle occupancy
13
Shared L2-Cache Multiprocessor
14

The primary caches are write-through, with an
access time of 1 cycle.
The latency of the L2 cache increases to 14 cycles
due to the crossbar overhead.

15

The L2 cache has four independent banks to increase
its bandwidth and enable it to support four
independent access streams.
The data path is 64 bits wide.
Occupancy is 4 cycles (for the transfer of a
32-byte cache line).
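
The 4-cycle occupancy follows from simple arithmetic, sketched below. The sketch assumes the cache line is 32 bytes (the size for which a 64-bit data path gives exactly 4 cycles) and that one data-path-wide beat is transferred per cycle; `transfer_cycles` and `peak_lines_per_cycle` are hypothetical helper names.

```python
def transfer_cycles(line_bytes: int, datapath_bits: int) -> int:
    """Cycles needed to move one cache line, one data-path beat per cycle."""
    beat_bytes = datapath_bits // 8
    return -(-line_bytes // beat_bytes)  # ceiling division


def peak_lines_per_cycle(num_banks: int, occupancy_cycles: int) -> float:
    """Upper bound on line transfers per cycle if every bank stays busy.
    Assumes each bank has its own data path, which the slides do not state."""
    return num_banks / occupancy_cycles


occupancy = transfer_cycles(line_bytes=32, datapath_bits=64)
print(occupancy)                           # 4
print(peak_lines_per_cycle(4, occupancy))  # 1.0
```

Under these assumptions, four banks with a 4-cycle occupancy can sustain at most one line transfer per cycle in aggregate, which is why banking matters for supporting four independent access streams.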

16

Only memory accesses that miss in the L1 cache
have to deal with the reduced performance of
the L2 cache.
MCM (multi-chip module) technology can be used
(as of 1996).
Main Memory
50-cycle latency
6-cycle occupancy

17

To keep the primary caches coherent, we need a
coherence protocol.
For simplicity, we assume that each primary cache
uses a write-through policy for shared data.
Additional hardware is required to support
this.
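
A toy model can make the write-through idea concrete. The sketch below assumes that a write updates the shared L2 cache immediately and invalidates any copies in the other primary caches; the invalidation step is an assumption, since the slides only say that a write-through policy and additional hardware are used. `WriteThroughSystem` is a hypothetical class, not the mechanism evaluated in the paper.

```python
class WriteThroughSystem:
    """Toy model: private L1 caches in front of one shared L2 cache."""

    def __init__(self, num_cpus: int) -> None:
        self.l1 = [dict() for _ in range(num_cpus)]  # per-CPU addr -> value
        self.l2 = {}                                 # shared addr -> value

    def read(self, cpu: int, addr: int) -> int:
        if addr not in self.l1[cpu]:       # L1 miss: fill from the shared L2
            self.l1[cpu][addr] = self.l2.get(addr, 0)
        return self.l1[cpu][addr]

    def write(self, cpu: int, addr: int, value: int) -> None:
        self.l1[cpu][addr] = value         # update the writer's own L1
        self.l2[addr] = value              # write through to the shared L2
        for other, cache in enumerate(self.l1):
            if other != cpu:
                cache.pop(addr, None)      # assumed: invalidate stale copies


system = WriteThroughSystem(num_cpus=4)
system.write(0, 0x40, 7)
print(system.read(1, 0x40))  # 7
system.write(1, 0x40, 9)
print(system.read(0, 0x40))  # 9
```

Because every write reaches the shared L2 cache, a reader that misses in its own L1 always finds the latest value there; the invalidation loop stands in for the extra hardware that keeps the other primary caches from serving stale data.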

18
Shared Main Memory Multiprocessor
19

Primary cache access time is 1 cycle.
Secondary cache access time is 12 cycles.
All CPUs must access main memory to communicate.

20
Ideal Memory Latencies of Three Architectures in
CPU Clock Cycles
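
To compare the latencies quoted on the preceding slides, the following sketch computes an average memory access time for each architecture. The miss rates are made-up illustrative values, not results from the paper, and the 50-cycle main-memory latency for the shared-memory architecture is assumed to match the other two; `average_access_time` is a hypothetical helper.

```python
# Latencies in CPU cycles, taken from the slides where stated.
LATENCIES = {
    "shared-L1": {"l1": 3, "l2": 10, "mem": 50},
    "shared-L2": {"l1": 1, "l2": 14, "mem": 50},
    "shared-memory": {"l1": 1, "l2": 12, "mem": 50},  # mem latency assumed
}


def average_access_time(lat: dict, l1_miss_rate: float, l2_miss_rate: float) -> float:
    """Average cycles per reference for a two-level cache hierarchy."""
    return lat["l1"] + l1_miss_rate * (lat["l2"] + l2_miss_rate * lat["mem"])


# Illustrative miss rates only: 5% in L1, 20% of those also miss in L2.
for name, lat in LATENCIES.items():
    print(name, round(average_access_time(lat, 0.05, 0.20), 2))
```

With identical miss rates the shared-L1 design pays its 3-cycle access time on every reference, so it only comes out ahead when sharing lowers its miss rate or when interprocessor communication dominates.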
21

SIMULATION ENVIRONMENT
The SimOS simulation environment is used.
The IRIX 5.3 operating system is simulated.
Hand-parallelized scientific and engineering
applications
Compiler-parallelized scientific and
engineering applications
Multiprogramming workload

22

Two kinds of simulation are performed:
Simple Simulation (no speculative execution,
dynamic scheduling, or non-blocking memory
references)
Dynamic Superscalar Simulation

23
SIMPLE SIMULATION RESULTS
(for a high degree of interprocessor communication)
EAR
24
EQNTOTT
25
(for a moderate degree of interprocessor communication)
VOLPACK
26
FFT Kernel
27
(for a low degree of interprocessor communication)
MULTIPROGRAMMING WORKLOAD
28
OCEAN
29
DYNAMIC SUPERSCALAR SIMULATION RESULTS
30
In the dynamic superscalar simulation, the
performance of the Shared-L1 cache architecture
can diminish substantially, whereas the Shared-L2
and shared-memory architectures retain much of
the relative performance predicted by the simple
simulation results.
31
Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing
By Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara,
Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith,
Robert Stets, and Ben Verghese.
ISCA 27, 2000, pp. 282-293.
32

Designed for online transaction processing (OLTP) systems
Standard ASIC design technology is used
The centerpiece of the Piranha architecture is a
highly integrated processing node, with eight
simple Alpha processor cores, separate
instruction and data caches for each core, a
shared second-level cache, eight memory
controllers, two coherence protocol engines, and
a network router, all on a single chip.