L2 to Off-Chip Memory Interconnects for CMPs

1
L2 to Off-Chip Memory Interconnects for CMPs
  • Presented by Allen Lee
  • CS258 Spring 2008
  • May 14, 2008

2
Motivation
  • In modern many-core systems, there is significant
    asymmetry between the number of cores and the
    number of memory access points
  • Tilera's multiprocessor has 64 cores and only 4
    memory controllers
  • PARSEC benchmarks suggest that off-chip memory
    traffic increases with the number of cores for
    CMPs
  • We explore mechanisms to lower latency and power
    consumption for processor-memory interconnect

3
Tilera Tile64
x5
4
Tilera Tile64
  • Five physical mesh networks
    • UDN, IDN, SDN, TDN, MDN
  • TDN and MDN are used for handling memory traffic
  • Memory requests transit TDN
    • Large store requests, small load requests
  • Memory responses transit MDN
    • Large load responses, small store responses
  • Includes cache-to-cache transfers and off-chip
    transfers

5
Tapered Fat-Tree
  • Good for many-to-few connectivity
  • Fewer hops → Shorter latency
  • Fewer routers → Less power, less area
  • Root nodes directly connect to memory controller
  • Replace MDN mesh network with two tapered
    fat-tree networks
    • One for routing requests up
    • One for routing responses down
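
The hop-count argument on this slide can be illustrated with a rough sketch. The slides do not give controller placement or tree arity, so everything below is an assumption: an 8 x 8 mesh with dimension-order routing, four memory controllers attached at hypothetical corner tiles, requests spread evenly over all controllers, and a 4-ary tree that narrows from 64 leaves to 4 roots. The function names are illustrative only.

```python
from itertools import product


def mesh_avg_hops(n, controllers):
    """Mean Manhattan distance from every tile of an n x n mesh
    to every controller tile (dimension-order routing assumed)."""
    total = 0
    for x, y in product(range(n), repeat=2):
        for cx, cy in controllers:
            total += abs(x - cx) + abs(y - cy)
    return total / (n * n * len(controllers))


def tree_levels(leaves, arity, roots):
    """Number of router levels between a leaf and a root when each
    level merges `arity` children, stopping once `roots` routers remain."""
    levels = 0
    width = leaves
    while width > roots:
        width = -(-width // arity)  # ceiling division
        levels += 1
    return levels


# Hypothetical placement: one controller at each corner tile.
CORNER_CONTROLLERS = [(0, 0), (7, 0), (0, 7), (7, 7)]

mesh_hops = mesh_avg_hops(8, CORNER_CONTROLLERS)
tree_hops = tree_levels(leaves=64, arity=4, roots=4)
print(f"mesh: {mesh_hops:.1f} link hops on average")
print(f"tree: {tree_hops} router levels from tile to root")
```

Under these assumptions the mesh averages 7 link hops per request while the tree needs 2 router levels. These figures follow only from the assumed geometry and are not results from the presentation.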

6
Tile64 with Tapered Fat Tree
7
Memory Model
  • Directory-based cache coherence
  • Directory cache at every node
  • Off-chip directory controller
  • Tile-to-tile requests and responses transit the
    TDN
  • Off-chip memory requests and responses transit
    the MDN

8
TDN and MDN Traffic for L2 Read Misses
9
Synthetic Benchmarks
  • Statistical simulation
  • Model benchmarks from PARSEC suite
  • Based on off-chip traffic for 64-byte cache-line
    for 64 cores
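
A statistical traffic generator of this kind might look like the sketch below. The slides do not describe the generator, so the model here is an assumption: each core independently issues an off-chip request in a given cycle with a fixed probability, and the target memory controller is chosen uniformly. The `miss_rate` value is a placeholder, not a PARSEC measurement.

```python
import random


def synthetic_trace(cores, controllers, cycles, miss_rate, seed=0):
    """Return a list of (cycle, core, controller) off-chip requests.

    Each core issues a request in each cycle with probability
    `miss_rate` (a Bernoulli injection process, assumed here).
    """
    rng = random.Random(seed)  # seeded so a trace is reproducible
    trace = []
    for cycle in range(cycles):
        for core in range(cores):
            if rng.random() < miss_rate:
                trace.append((cycle, core, rng.randrange(controllers)))
    return trace


# Hypothetical injection rate for a 64-core, 4-controller system.
trace = synthetic_trace(cores=64, controllers=4, cycles=1000, miss_rate=0.01)
print(f"{len(trace)} requests generated")
```

A per-benchmark model would replace the single `miss_rate` with rates derived from measured off-chip traffic for 64-byte cache lines.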

(Figure: benchmarks arranged by working set size, small to large, and by sharing, more to less)
10
(No Transcript)
11
Breakdown of Average Latency
  • Latency of memory-intensive applications is
    dominated by queuing delay.
  • Benchmarks with little off-chip traffic save on
    transit time.
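
One way to separate the two latency components is sketched below. It assumes the simulator records three cycle stamps per packet (when the request was created, when it entered the network, and when it was delivered). That record format is hypothetical, as is the choice to count all waiting before injection as queuing delay.

```python
def latency_breakdown(packets):
    """Split average latency into queuing and transit parts.

    packets: iterable of (created, injected, delivered) cycle stamps.
    Returns (average queuing delay, average transit time) in cycles.
    """
    count = 0
    queuing = 0
    transit = 0
    for created, injected, delivered in packets:
        queuing += injected - created    # time spent waiting to enter
        transit += delivered - injected  # time spent crossing the network
        count += 1
    if count == 0:
        raise ValueError("no packets to average")
    return queuing / count, transit / count


# Two made-up packets, for illustration only.
avg_queuing, avg_transit = latency_breakdown([(0, 2, 10), (1, 9, 15)])
print(f"queuing {avg_queuing} cycles, transit {avg_transit} cycles")
```

With this split, a shorter path mainly reduces the transit term, while a congested memory controller shows up in the queuing term.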

12
Power Modeling
  • Orion power simulator for on-chip routers from
    Princeton University
  • Models switching power as sum of
    • Buffer power
    • Crossbar power
    • Arbitration power
  • Specify parameters
    • Activity factor, number of input and output
      ports, virtual channels, size of input buffer,
      etc.
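
The additive structure of the model can be sketched as follows. This is not Orion's interface. It uses the textbook CMOS dynamic power form P = a * C * V^2 * f per component, and the capacitance values are hypothetical placeholders. The default supply voltage and clock frequency are the ones listed on the Parameters slide.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """Textbook CMOS switching power in watts: a * C * V^2 * f."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


def router_switching_power(activity, capacitances, vdd=1.0, freq_hz=750e6):
    """Sum the switching power of each router component.

    capacitances: dict mapping component name to switched
    capacitance in farads (placeholder values, not Orion output).
    """
    return sum(
        dynamic_power(activity, cap, vdd, freq_hz)
        for cap in capacitances.values()
    )


# Hypothetical per-component switched capacitances.
EXAMPLE_CAPS = {"buffer": 1e-12, "crossbar": 2e-12, "arbiter": 1e-12}

power_w = router_switching_power(activity=0.5, capacitances=EXAMPLE_CAPS)
print(f"router switching power: {power_w * 1e3:.2f} mW")
```

In a real power model the capacitances would be derived from the port count, buffer size, and virtual channel count rather than entered by hand.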

13
Tilera MDN Routers
14
Tree Routers
15
Parameters
  • 100 nm CMOS process
  • VDD: 1.0 V
  • Clock frequency: 750 MHz
  • 32-bit flit width
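
These parameters fix the raw link capacity, which a few lines of arithmetic make concrete. The sketch assumes one flit crosses a link per clock cycle and ignores header flits. The 64-byte cache line size is the one used for the synthetic benchmarks.

```python
FLIT_BITS = 32            # flit width from the parameter list
CLOCK_HZ = 750_000_000    # 750 MHz clock
CACHE_LINE_BYTES = 64     # cache line size used for the benchmarks


def flits_per_payload(payload_bytes, flit_bits=FLIT_BITS):
    """Flits needed to carry a payload, rounded up (no header flit)."""
    return -(-payload_bytes * 8 // flit_bits)


def peak_link_bytes_per_second(flit_bits=FLIT_BITS, clock_hz=CLOCK_HZ):
    """Peak link bandwidth, assuming one flit per cycle."""
    return flit_bits * clock_hz // 8


line_flits = flits_per_payload(CACHE_LINE_BYTES)
link_bw = peak_link_bytes_per_second()
print(f"{line_flits} flits per cache line")
print(f"{link_bw / 1e9:.1f} GB/s peak per link")
```

Under these assumptions a cache line occupies a link for 16 cycles, and each link peaks at 3 GB/s.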

16
(No Transcript)
17
(No Transcript)
18
Conclusion
  • Physical design of the tapered fat-tree is more
    difficult
  • The TFT topology can reduce memory latency and
    power dissipation for many-core systems