Transcript and Presenter's Notes

Title: DRAM background


1
  • DRAM background
  • Fully-Buffered DIMM Memory Architectures:
    Understanding Mechanisms, Overheads and Scaling,
    Ganesh et al., HPCA'07
  • CS 8501, Mario D. Marino, 02/08

2
  • DRAM Background

3
Typical Memory
  • Buses: address, command, data, DIMM (Dual
    In-Line Memory Module) selection

4
DRAM cell
5
DRAM array
6
DRAM device or chip
7
Command/data movement in the DRAM chip
8
Operations (commands)
  • protocol, timing

9
Examples of DRAM operations (commands)
10
(No Transcript)
11
  • The purpose of a row access command is to move
    data from the DRAM arrays to the sense
    amplifiers.
  • tRCD and tRAS
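A minimal sketch of how these two constraints gate later commands, assuming example cycle counts that are not from the slides:

```python
# Illustrative values only (real numbers come from the device datasheet):
tRCD = 4    # ACTIVATE -> earliest column command (row-to-column delay)
tRAS = 12   # ACTIVATE -> earliest PRECHARGE (row must stay open this long)

activate = 0                          # cycle at which the row access is issued
earliest_column = activate + tRCD     # data must reach the sense amplifiers first
earliest_precharge = activate + tRAS  # restoring the row takes at least tRAS

print(earliest_column, earliest_precharge)   # -> 4 12
```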

12
  • A column read command moves data from the array
    of sense amplifiers of a given bank to the memory
    controller.
  • tCAS, tBurst
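A similarly hedged sketch of the data return: the first beat appears tCAS cycles after the column read, and the data bus stays busy for tBurst cycles (burst length divided by transfers per cycle on a DDR bus); all values here are assumptions:

```python
tCAS = 4                 # column READ -> first data beat (CAS latency)
burst_length = 8         # beats per burst (e.g. BL8)
transfers_per_cycle = 2  # DDR moves two beats per clock
tBurst = burst_length // transfers_per_cycle   # cycles the data bus is occupied

read_issued = 10
first_data = read_issued + tCAS      # first beat reaches the controller
bus_free = first_data + tBurst       # data bus is available again
print(first_data, bus_free)          # -> 14 18
```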

13
  • Precharge: a separate phase that is a prerequisite
    for the subsequent phases of a row access
    operation (bitlines set to Vcc/2 or Vcc)
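Putting the three phases together, a tiny sketch (assumed values) of the minimum row cycle time that precharge imposes on back-to-back activations of the same bank:

```python
tRAS = 12   # ACTIVATE -> earliest PRECHARGE
tRP = 4     # PRECHARGE -> earliest next ACTIVATE (bitlines back at Vcc/2)
tRC = tRAS + tRP   # minimum spacing between two ACTIVATEs to the same bank
print(tRC)  # -> 16
```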

14
  • Organization, access, protocols

15
  • Logical channel: a set of physical channels
    connected to the same memory controller

16
Examples of Logical Channels
17
Rank: set of banks
18
(No Transcript)
19
Row: DRAM page
20
Width: aggregating DRAM chips
21
Scheduling banks
22
Scheduling banks
23
Scheduling ranks
24
Open vs. close page
  • Open-page: data access to and from cells requires
    separate row and column commands
  • Favors accesses to the same row (sense amps kept
    open)
  • Typical of general-purpose computers
    (desktop/laptop)
  • Close-page
  • Intense request traffic; favors random accesses
  • Large multiprocessor/multicore systems
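A hedged sketch of the trade-off, with a single bank, made-up timing values, and no scheduling; it only shows why row-buffer hits make open-page attractive and why random traffic favors close-page:

```python
tRP, tRCD, tCAS = 4, 4, 4   # assumed cycle counts, illustration only

def access_latency(open_row, requested_row, policy):
    """Return (latency_cycles, row_left_open_after_access)."""
    if policy == "open-page":
        if open_row == requested_row:              # row-buffer hit
            return tCAS, requested_row
        if open_row is None:                       # bank already precharged
            return tRCD + tCAS, requested_row
        return tRP + tRCD + tCAS, requested_row    # row conflict
    # close-page: precharge immediately after every access
    return tRCD + tCAS, None

print(access_latency(7, 7, "open-page"))      # (4, 7)    hit
print(access_latency(3, 7, "open-page"))      # (12, 7)   conflict
print(access_latency(None, 7, "close-page"))  # (8, None)
```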

25
Available Parallelism in DRAM System Organization
  • Channel
  • Pros: performance
  • different logical channels, independent memory
    controllers
  • scheduling strategies
  • Cons:
  • number of pins, power to deliver
  • smart but not adaptive firmware

26
Available Parallelism in DRAM System Organization
  • Rank
  • Pros:
  • accesses can proceed in parallel in different
    ranks (bus availability)
  • Cons:
  • rank-to-rank switching penalties at high
    frequency
  • globally synchronous DRAM (global clock)

27
Available Parallelism in DRAM System Organization
  • Bank
  • Different banks can be accessed in parallel (bus
    availability)
  • Row
  • Only one row per bank can be active at any time
  • Column
  • Depends on management (close-page / open-page)
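To make the channel/rank/bank/row/column dimensions concrete, here is one possible, purely illustrative physical-address decomposition; real controllers choose the bit layout that best exposes this parallelism for their workloads:

```python
from collections import namedtuple

Geometry = namedtuple("Geometry", "channels ranks banks rows cols")
geom = Geometry(channels=2, ranks=2, banks=8, rows=2**14, cols=2**10)  # assumed sizes

def map_address(addr, g=geom):
    # Low-order column and channel bits interleave consecutive accesses
    # across channels; higher bits select bank, rank, and finally the row.
    col,  addr = addr % g.cols,     addr // g.cols
    chan, addr = addr % g.channels, addr // g.channels
    bank, addr = addr % g.banks,    addr // g.banks
    rank, addr = addr % g.ranks,    addr // g.ranks
    row = addr % g.rows
    return {"channel": chan, "rank": rank, "bank": bank, "row": row, "col": col}

print(map_address(0x1234567))
```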

28
  • Paper: Fully-Buffered DIMM Memory Architectures:
    Understanding Mechanisms, Overheads and Scaling,
    Ganesh et al., HPCA'07

29
(No Transcript)
30
Issues
  • parallel bus scaling: frequency, width, length,
    depth (more hops → higher latency)
  • memory controllers: their number has increased
    with more CPUs, GPUs
  • DIMMs/channel (depth) decreases
  • 4 DIMMs/channel in DDR
  • 2 DIMMs/channel in DDR2
  • 1 DIMM/channel in DDR3
  • scheduling
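The depth problem can be seen with back-of-the-envelope arithmetic; the DIMM counts are from the slide, the per-DIMM capacity is an assumption:

```python
dimms_per_channel = {"DDR": 4, "DDR2": 2, "DDR3": 1}   # shrinking depth (slide above)
gb_per_dimm = 4                                        # assumed DIMM capacity

for gen, depth in dimms_per_channel.items():
    print(f"{gen}: up to {depth * gb_per_dimm} GB per parallel channel")
```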

31
Contributions
  • Applied DDR-based memory controller policies to
    FBDIMM memory
  • Performance evaluation
  • Exploit FBDIMM depth: rank (DIMM) parallelism
  • latency and bandwidth for FBDIMM and DDR
  • High channel utilization, FBDIMM:
  • 7% in latency
  • 10% in bandwidth
  • Low channel utilization:
  • 25% in latency
  • 10% in bandwidth

32
  • Northbound channel: reads / Southbound channel:
    writes
  • AMB (Advanced Memory Buffer): pass-through switch,
    buffer, serial/parallel converter
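A hedged back-of-the-envelope model of the daisy chain (parameter values are assumptions, not results from the paper): a read command travels south through the AMBs in front of the target DIMM, the DRAM access is performed, and the data travels north through the same AMBs back to the controller, so deeper DIMMs pay more pass-through hops:

```python
def fbdimm_read_latency(target_dimm, amb_passthrough_ns=2.0, dram_access_ns=40.0):
    hops = target_dimm                       # AMBs crossed before reaching the target
    southbound = hops * amb_passthrough_ns   # command/write frames forwarded south
    northbound = hops * amb_passthrough_ns   # read data forwarded north
    return southbound + dram_access_ns + northbound

for dimm in range(4):
    print(f"DIMM {dimm}: {fbdimm_read_latency(dimm):.1f} ns")
```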

33
Methodology
  • DRAMsim simulator
  • Execution-driven simulator
  • Detailed models of FBDIMM and DDR2 based on real
    standard configurations
  • Standalone / coupled with M5/SS/Sesc
  • Benchmarks (bandwidth-bound):
  • SVM from BioParallel (r 90%)
  • SPEC-mixed, 16 independent (r:w 2:1)
  • UA from NAS (r:w 3:2)
  • ART (SPEC-2000, OpenMP) (r:w 2:1)

34
Methodology cont
  • Different scheduling policies: greedy, OBF,
    most/least pending, and RIFF
  • 16-way CMP, 8MB L2
  • Multi-threaded traces gathered with CMP$im
  • SPEC traces using SimpleScalar with 1MB L2,
    in-order core
  • 1 rank/DIMM

35
  • High bandwidth utilization
  • Better bandwidth with FBDIMM
  • Larger latency

36
  • ART and UA: latency reduction

37
  • Low utilization: serialization cost dominates
  • Depth: the FBDIMM scheduler offsets serialization

38
  • Overhead: queue, southbound channel, and rank
    availability
  • Single rank: higher latency

39
Scheduling
  • Best: RIFF, which prioritizes reads over writes
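A minimal sketch of a read-first pick in the spirit of RIFF as summarized on the slide (the queue format and request fields are assumptions):

```python
from collections import deque

def pick_next(queue):
    """Schedule the oldest pending read if one exists, otherwise the oldest write."""
    for req in queue:                 # queue is ordered oldest -> newest
        if req["type"] == "read":
            queue.remove(req)
            return req
    return queue.popleft() if queue else None

q = deque([{"type": "write", "addr": 0x100},
           {"type": "read",  "addr": 0x200},
           {"type": "write", "addr": 0x300}])
print(pick_next(q))   # the read at 0x200 goes first, ahead of the older write
```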

40
  • Bandwidth is less sensitive than latency
  • Higher latency in open-page mode
  • More channels → decreased channel utilization