1
SINGLE CHIP MULTIPROCESSORS
Computer Architecture Term Paper (11.12.2003)
Esra KIRBAS 2002701357
2
Evaluation of Design Alternatives for a
Multiprocessor Microprocessor
By Basem A. Nayfeh, Lance Hammond and Kunle
Olukotun. ISCA 23, 1996, pp. 67-77.
3
  • With advanced integrated-circuit technology,
    several options are available for the design of
    high-performance microprocessors.
  • In the multiprocessor design option, a small
    number of processors are interconnected on a
    single chip or on a multi-chip module (MCM)
    substrate.
  • We concentrate on single-chip multiprocessors.
4
  • Our goal is to study two proposed
    cache-sharing mechanisms for single-chip
    multiprocessors:
  • Shared Level-1 (L1) Cache Architecture
  • Shared Level-2 (L2) Cache Architecture
  • (The performance of these two architectures is
    compared with that of a single-bus-based
    shared-memory multiprocessor.)
5
  • A multiprocessor architecture whose interconnect
    is closer to the CPUs in the memory hierarchy
    can exploit fine-grained parallelism more
    efficiently than one whose interconnect is
    farther from the CPUs.
  • The aim is to achieve good performance on
    fine-grained parallel applications without
    sacrificing the performance of independent jobs
    running in parallel.
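The effect of interconnect distance on fine-grained parallelism can be illustrated with a deliberately simple model (an assumption for illustration, not the paper's method): each task computes for a fixed number of cycles and then performs one interprocessor communication whose cost equals the latency of the level where the CPUs meet. The latencies reuse figures quoted later in these slides (3 cycles for the shared L1, 14 for the shared L2, 50 for main memory); the function name `efficiency` is hypothetical.

```python
# Toy model (an assumption for illustration, not the paper's method):
# each fine-grained task computes for `work` cycles and then performs one
# interprocessor communication that costs `latency` cycles.


def efficiency(work: int, latency: int) -> float:
    """Fraction of each task's cycles spent on useful computation."""
    return work / (work + latency)


# Communication cost, in CPU cycles, at the level where the CPUs meet.
# 3 and 14 cycles are the shared-L1 and shared-L2 access times quoted in
# these slides; 50 cycles (main-memory latency) is a rough stand-in for
# communicating through shared main memory.
LATENCIES = {"shared-L1": 3, "shared-L2": 14, "shared-memory": 50}

for work in (10, 100, 1000):  # task size in cycles, from fine to coarse grain
    row = ", ".join(
        f"{name}: {efficiency(work, lat):.2f}" for name, lat in LATENCIES.items()
    )
    print(f"work={work:>4} cycles -> {row}")
```

With 10-cycle tasks the shared-L1 case keeps most cycles busy, while the shared-memory case spends most of its time communicating; at 1000-cycle tasks the three designs nearly converge, which is why coarse-grained and independent jobs are less sensitive to where the interconnect sits.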
6
  • CPU CHARACTERISTICS
  • We use the same CPU in all three
    architectures.
  • 2-way issue processor
  • Dynamic scheduling
  • Speculative execution
  • Non-blocking caches
7
8
  • 2-way set-associative 16 KB instruction and
    data caches
  • 32-entry centralized instruction window
  • 32-entry reorder buffer
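To make the 2-way, 16 KB organization concrete, the sketch below splits a byte address into tag, set index, and line offset and applies LRU replacement within each set. The 32-byte line size, the LRU policy, and the class name `SetAssociativeCache` are assumptions for illustration; the slide gives only the capacity and associativity.

```python
# Sketch of a 2-way set-associative cache with LRU replacement.
# Capacity (16 KB) and associativity (2) come from the slide; the 32-byte
# line size and LRU policy are assumptions made for this example.

CAPACITY = 16 * 1024  # bytes
WAYS = 2
LINE_SIZE = 32        # bytes (assumed)
NUM_SETS = CAPACITY // (WAYS * LINE_SIZE)  # 256 sets


class SetAssociativeCache:
    def __init__(self) -> None:
        # Each set holds up to WAYS tags, ordered least -> most recently used.
        self.sets = [[] for _ in range(NUM_SETS)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Look up a byte address; return True on a hit."""
        line = addr // LINE_SIZE   # strip the byte offset within the line
        index = line % NUM_SETS    # which set the line maps to
        tag = line // NUM_SETS     # identifies the line within that set
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)       # move to the most-recently-used position
            self.hits += 1
            return True
        if len(ways) == WAYS:
            ways.pop(0)            # evict the least recently used line
        ways.append(tag)
        self.misses += 1
        return False


cache = SetAssociativeCache()
stride = NUM_SETS * LINE_SIZE  # 8 KB: addresses this far apart share a set
for addr in (0, stride, 0, 2 * stride, stride):
    cache.access(addr)
print(cache.hits, cache.misses)  # prints: 1 4
```

Three lines that map to the same set cannot all stay resident in two ways, so the last access misses again after being evicted by the third line.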
9
Shared L1-Cache Multiprocessor
10
  • Advantages of this Architecture
  • It provides the lowest interprocessor
    communication latency, because the processors
    share the L1 cache within a single shared-memory
    address space.
  • Low interprocessor communication latency helps
    achieve high performance on fine-grained
    parallel applications.
  • Processors may fetch shared data into the cache
    for each other.
  • It eliminates the need for cache-coherence logic
    and implicitly provides sequentially consistent
    memory without sacrificing performance.
11
  • Disadvantages of this Architecture
  • The crossbar interconnect increases the L1 cache
    access time. (We assume an average access time
    of three cycles.)
  • All memory references go to the shared L1 cache,
    so there may be extra delays due to bank
    conflicts.
  • If the processors are not executing fine-grained
    parallel applications, the miss rate will
    increase, because unrelated jobs compete for the
    same shared cache.
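Bank conflicts arise when two CPUs reference the same bank of the shared L1 in the same cycle. The sketch below counts how many requests issued in one cycle must wait under a line-interleaved bank mapping. The bank count of 4, the 32-byte interleaving unit, and the function names are hypothetical, since the slides do not specify them.

```python
from collections import Counter

# Hypothetical parameters: the slides do not give the number of shared-L1
# banks or how addresses are interleaved across them.
NUM_BANKS = 4
LINE_SIZE = 32  # bytes (assumed interleaving unit)


def bank_of(addr: int) -> int:
    """Line-interleaved mapping: consecutive lines go to consecutive banks."""
    return (addr // LINE_SIZE) % NUM_BANKS


def conflicts_in_cycle(addresses: list[int]) -> int:
    """Given the addresses issued by the CPUs in one cycle, count the requests
    that must wait because another request in that cycle uses their bank."""
    per_bank = Counter(bank_of(a) for a in addresses)
    return sum(n - 1 for n in per_bank.values())


print(conflicts_in_cycle([0, 32, 64, 96]))    # four different banks; prints: 0
print(conflicts_in_cycle([0, 128, 256, 32]))  # three map to bank 0; prints: 2
```

Interleaving at line granularity spreads sequential accesses across banks, but CPUs whose addresses collide in the low line-index bits still serialize on one bank.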
12
In the shared-L1 design, the secondary cache and
main memory resemble those of a uniprocessor system:
  • L2: 2 MB, 10-cycle latency, 2-cycle occupancy
  • Main memory: 50-cycle latency, 6-cycle occupancy
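Latency and occupancy describe different things: latency is how long one request takes to return its data, while occupancy is how long the resource stays busy before it can start the next request. A minimal sketch, assuming requests are served one at a time in order of arrival (the function name `completion_times` is hypothetical), shows how the L2 and main-memory parameters of this slide play out when four CPUs miss in the same cycle.

```python
# Minimal model of a pipelined memory resource (an assumption for
# illustration): a request that starts at cycle t returns its data at
# t + latency, but the resource cannot start another request until
# t + occupancy.


def completion_times(arrivals: list[int], latency: int, occupancy: int) -> list[int]:
    """Return the completion cycle of each request, served in arrival order."""
    done = []
    next_free = 0  # first cycle at which the resource can start a new request
    for t in sorted(arrivals):
        start = max(t, next_free)
        next_free = start + occupancy
        done.append(start + latency)
    return done


# Four misses issued in the same cycle (e.g. one from each CPU):
print(completion_times([0, 0, 0, 0], latency=10, occupancy=2))  # prints: [10, 12, 14, 16]
print(completion_times([0, 0, 0, 0], latency=50, occupancy=6))  # prints: [50, 56, 62, 68]
```

Occupancy, not latency, sets the spacing between back-to-back completions, so the 6-cycle memory occupancy stretches a burst of misses more than the 2-cycle L2 occupancy does.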
13
Shared L2-Cache Multiprocessor
14
  • The write-through primary caches have a 1-cycle
    access time.
  • The L2 cache latency increases to 14 cycles
    because of the crossbar overhead.
15
  • The L2 cache has four independent banks to
    increase its bandwidth and enable it to support
    four independent access streams.
  • The data path is 64 bits wide.
  • Occupancy is 4 cycles (for the transfer of a
    32-byte cache line).
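The 4-cycle occupancy is consistent with moving one cache line over the 64-bit (8-byte) data path, assuming a 32-byte line and one 64-bit transfer per cycle:

```latex
\[
t_{\text{occupancy}}
  = \frac{\text{line size}}{\text{data-path width}}
  = \frac{32\ \text{bytes}}{64\ \text{bits} \,/\, 8\ \text{bits per byte}}
  = \frac{32\ \text{bytes}}{8\ \text{bytes per cycle}}
  = 4\ \text{cycles}
\]
```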
16
  • Only memory accesses that miss in the L1 cache
    pay the longer access time of the shared L2
    cache.
  • MCM (multi-chip module) technology can be used.
  • (for 1996)
  • Main memory:
  • 50-cycle latency
  • 6-cycle occupancy
17
  • To keep the primary caches coherent, a
    coherence protocol is needed.
  • For simplicity, we assume that each primary
    cache uses a write-through policy for shared data.
  • Additional hardware is required to support this.
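One common way to realize such a write-through scheme is to send every write to the shared L2 and invalidate any copies of that line held in the other CPUs' primary caches. The sketch below models that behaviour with dictionaries. The invalidation policy, the class name `SharedL2System`, and its methods are assumptions for illustration, since the slide does not say which extra hardware mechanism is used.

```python
# Sketch: private write-through L1 caches kept coherent with a shared L2
# by invalidating other copies on every write. The invalidation policy and
# all names here are assumptions; the slide only states that the L1s are
# write-through for shared data and that extra hardware is required.


class SharedL2System:
    def __init__(self, num_cpus: int) -> None:
        self.l2: dict[int, int] = {}  # line -> value
        self.l1: list[dict[int, int]] = [{} for _ in range(num_cpus)]

    def read(self, cpu: int, line: int) -> int:
        cache = self.l1[cpu]
        if line not in cache:          # L1 miss: fill from the shared L2
            cache[line] = self.l2.get(line, 0)
        return cache[line]

    def write(self, cpu: int, line: int, value: int) -> None:
        self.l2[line] = value          # write-through to the shared L2
        if line in self.l1[cpu]:
            self.l1[cpu][line] = value  # keep the writer's own copy current
        for other, cache in enumerate(self.l1):
            if other != cpu:
                cache.pop(line, None)  # invalidate stale copies elsewhere


chip = SharedL2System(num_cpus=2)
chip.read(1, 0x40)         # CPU 1 caches line 0x40 (value 0)
chip.write(0, 0x40, 7)     # CPU 0 writes; CPU 1's copy is invalidated
print(chip.read(1, 0x40))  # CPU 1 misses and re-reads from the L2; prints: 7
```

Because every write already reaches the L2, the L2 always holds the current value, so invalidating the other L1 copies is enough to prevent stale reads in this model.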
18
Shared Main Memory Multiprocessor
19
  • Primary cache access time is 1 cycle.
  • Secondary cache access time is 12 cycles.
  • All CPUs must access main memory to communicate.

20
Ideal Memory Latencies of Three Architectures in
CPU Clock Cycles
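The latencies quoted for the three designs can be compared with the standard average memory access time (AMAT) formula. In the sketch below, the hit times come from these slides, while the 50-cycle memory latency for the shared-main-memory design and the miss rates are assumptions for illustration; the function name `amat` is hypothetical.

```python
# AMAT for a two-level hierarchy:
#   AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem)
# Hit times (CPU cycles) come from these slides. The 50-cycle memory
# latency for the shared-main-memory design and the miss rates are
# assumptions for illustration only.

DESIGNS = {
    # name: (t_L1, t_L2, t_mem)
    "shared-L1": (3, 10, 50),
    "shared-L2": (1, 14, 50),
    "shared-memory": (1, 12, 50),
}


def amat(t_l1: float, t_l2: float, t_mem: float, m_l1: float, m_l2: float) -> float:
    """Average memory access time in cycles, ignoring occupancy and contention."""
    return t_l1 + m_l1 * (t_l2 + m_l2 * t_mem)


M_L1, M_L2 = 0.05, 0.20  # hypothetical local miss rates, equal for every design
for name, (t1, t2, tm) in DESIGNS.items():
    print(f"{name:>13}: {amat(t1, t2, tm, M_L1, M_L2):.2f} cycles")
```

With equal miss rates the shared-L1 design comes out slowest because of its 3-cycle hit time. This toy comparison ignores interprocessor communication, which is exactly where the shared-L1 design is expected to gain.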
21
  • SIMULATION ENVIRONMENT
  • The SimOS simulation environment is used.
  • The IRIX 5.3 operating system is simulated.
  • Workloads:
  • Hand-parallelized scientific and engineering
    applications
  • Compiler-parallelized scientific and
    engineering applications
  • Multiprogramming workload
22
  • Two kinds of simulation are performed:
  • Simple simulation (no speculative execution,
    dynamic scheduling, or non-blocking memory
    references)
  • Dynamic superscalar simulation
23
SIMPLE SIMULATION RESULTS
(for a high degree of interprocessor communication)
  • EAR
24
  • EQNTOTT
25
(for a moderate degree of interprocessor
communication)
  • VOLPACK
26
  • FFT kernel
27
(for a low degree of interprocessor
communication)
  • MULTIPROGRAMMING WORKLOAD
28
  • OCEAN
29
DYNAMIC SUPERSCALAR SIMULATION RESULTS
30
In the dynamic superscalar simulation, the
performance of the shared-L1 architecture can
diminish substantially, whereas the shared-L2 and
shared-memory architectures retain much of the
relative performance predicted by the simple
simulation results.
31
Piranha: A Scalable Architecture Based on
Single-Chip Multiprocessing
By Luiz Andre Barroso, Kourosh Gharachorloo, Robert
McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton
Sano, Scott Smith, Robert Stets, and Ben Verghese.
ISCA 27, 2000, pp. 282-293.
32
  • Targeted at online transaction processing (OLTP)
    systems
  • Standard ASIC design technology is used.
  • The centerpiece of the Piranha architecture is a
    highly integrated processing node, with eight
    simple Alpha processor cores, separate
    instruction and data caches for each core, a
    shared second-level cache, eight memory
    controllers, two coherence protocol engines, and
    a network router, all on a single chip.
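The composition of one Piranha node can be written down as a small configuration record, which makes per-chip totals easy to derive. The counts come from the slide; the class and field names are hypothetical labels chosen for this sketch.

```python
from dataclasses import dataclass


# Component counts for one Piranha processing node, as listed on the slide.
# The field names are hypothetical labels chosen for this sketch.
@dataclass(frozen=True)
class PiranhaNode:
    alpha_cores: int = 8
    l1_caches_per_core: int = 2  # separate instruction and data caches
    shared_l2_caches: int = 1
    memory_controllers: int = 8
    coherence_protocol_engines: int = 2
    network_routers: int = 1

    @property
    def total_l1_caches(self) -> int:
        return self.alpha_cores * self.l1_caches_per_core


node = PiranhaNode()
print(node.total_l1_caches)  # prints: 16
```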
33
34
35
36
SIMULATION