Lecture 4: Memory Scheduling, Refresh

Provided by: RajeevBalas159

Learn more at: https://my.eng.utah.edu

Transcript and Presenter's Notes

1
Lecture 4: Memory Scheduling, Refresh

Topics: scheduling policies, refresh basics

2
Scheduling Policies Basics

Must honor several timing constraints for each bank/rank
Commands: PRE, ACT, COL-RD, COL-WR, REF, Power-Up/Dn
Must handle reads and writes on the same DDR3 bus
Must issue refreshes on time
Must maximize row buffer hit rates and parallelism
Must maximize throughput and fairness

3
Address Mapping Policies

Consecutive cache lines can be placed in the same row to boost row buffer hit rates
Consecutive cache lines can be placed in different ranks to boost parallelism
Example address mapping policies:
row:rank:bank:channel:column:blkoffset
row:column:rank:bank:channel:blkoffset
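
A minimal sketch of how the two example mappings could be decoded. The field widths are hypothetical (the slides give none), and the names `WIDTHS`, `ROW_LOCALITY`, `PARALLELISM`, and `decode` are introduced here only for illustration.

```python
# Hypothetical field widths in bits; the slides do not specify these.
WIDTHS = {"row": 15, "rank": 1, "bank": 3, "channel": 1, "column": 7, "blkoffset": 6}

# The two example policies, listed from most- to least-significant field.
ROW_LOCALITY = ["row", "rank", "bank", "channel", "column", "blkoffset"]
PARALLELISM = ["row", "column", "rank", "bank", "channel", "blkoffset"]


def decode(addr, order):
    """Split a physical address into DRAM coordinates for a given field order."""
    fields = {}
    for name in reversed(order):  # peel fields off the low-order end
        width = WIDTHS[name]
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields


# Two consecutive 64-byte cache lines (with the assumed 6-bit block offset).
first, second = 0x0000, 0x0040
same_row = decode(second, ROW_LOCALITY)   # only the column field changes
spread = decode(second, PARALLELISM)      # the channel field changes first
```

Under these assumed widths, the first mapping keeps the two lines in the same row of the same bank (a likely row buffer hit), while the second sends them to different channels, and further consecutive lines to different banks and ranks.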

4
Reads and Writes

A single bus is used for reads and writes
The bus direction must be reversed when switching between reads and writes; this takes time and leads to bus idling
Hence, writes are performed in bursts; a write buffer stores pending writes until a high water mark is reached
Writes are drained until a low water mark is reached
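
The water-mark policy can be sketched roughly as follows. The class name `WriteBuffer` and the threshold values are assumptions made for illustration, not taken from the slides.

```python
from collections import deque


class WriteBuffer:
    """Write buffer that drains in bursts between two water marks."""

    def __init__(self, high_mark=32, low_mark=8):  # hypothetical thresholds
        self.pending = deque()
        self.high_mark = high_mark
        self.low_mark = low_mark
        self.draining = False

    def add(self, write):
        self.pending.append(write)
        if len(self.pending) >= self.high_mark:
            self.draining = True  # turn the bus around and start a write burst

    def next_write(self):
        """Return the next write to issue, or None if reads should go instead."""
        if not self.draining:
            return None
        write = self.pending.popleft()
        if len(self.pending) <= self.low_mark:
            self.draining = False  # hand the bus back to reads
        return write
```

Because the drain only stops at the low water mark, the bus turnaround cost is paid once per burst rather than once per write.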

5
Refresh

Example: tREFI (gap between refresh commands) = 7.8 us, tRFC (time to complete refresh command) = 350 ns
JEDEC: must issue 8 refresh commands in an 8*tREFI time window
Allows for some flexibility in when each refresh command is issued
Elastic refresh: issue a refresh command when there is a lull in activity; any unissued refreshes are handled at the end of the 8*tREFI window
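
A simplified sketch of the elastic idea over one eight-tREFI window. It assumes the controller can flag each tREFI slot as idle or busy and that all owed refreshes are caught up during a lull; a real elastic-refresh controller uses more elaborate idle prediction, so treat `elastic_schedule` as a hypothetical illustration only.

```python
WINDOW = 8  # refresh commands owed per window of eight tREFI slots (from the slide)


def elastic_schedule(idle_slots):
    """Return how many refreshes to issue in each of the 8 tREFI slots.

    idle_slots: 8 booleans, True where the (hypothetical) controller
    sees a lull in memory activity during that slot.
    """
    owed = 0
    issued = []
    for slot, idle in enumerate(idle_slots):
        owed += 1  # one more refresh falls due every tREFI
        last_slot = slot == WINDOW - 1
        n = owed if (idle or last_slot) else 0  # catch up in lulls, or at the deadline
        issued.append(n)
        owed -= n
    return issued


plan = elastic_schedule([False, False, True, False, False, False, False, False])
# plan == [0, 0, 3, 0, 0, 0, 0, 5]
```

Whatever the activity pattern, the counts always sum to 8, so the JEDEC requirement for the window is still met.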

6
Maximizing Row Buffer Hit Rates

FCFS: Issue the first read or write in the queue that is ready for issue (not necessarily the oldest in program order)
First Ready - FCFS: First issue row buffer hits if you can
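
The two policies can be sketched as selection functions over a request queue. The `Request` fields and the `open_rows` bookkeeping are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: int   # arrival order in the queue (smaller = older)
    bank: int
    row: int
    ready: bool    # True if timing constraints allow issue this cycle


def fcfs(queue):
    """Oldest request that is ready for issue."""
    ready = [r for r in queue if r.ready]
    return min(ready, key=lambda r: r.arrival, default=None)


def fr_fcfs(queue, open_rows):
    """Prefer ready row-buffer hits; otherwise fall back to FCFS.

    open_rows maps bank -> currently open row (hypothetical bookkeeping).
    """
    hits = [r for r in queue if r.ready and open_rows.get(r.bank) == r.row]
    return min(hits, key=lambda r: r.arrival, default=None) or fcfs(queue)
```

With row 7 open in bank 0, a younger request to row 7 is picked ahead of an older request to row 5 in the same bank, which is the behavior that raises row buffer hit rates.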

7
STFM (Mutlu and Moscibroda, MICRO 2007)

When multiple threads run together, threads with row buffer hits are prioritized by FR-FCFS
Each thread has a slowdown S = Tshared / Talone, where T is the number of cycles the ROB is stalled waiting for memory
Unfairness is estimated as Smax / Smin
If unfairness is higher than a threshold, thread priorities override other priorities (Stall Time Fair Memory scheduling)
Estimation of Talone requires some book-keeping: does an access delay critical requests from other threads?
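
A small sketch of the unfairness check, taking slowdown as shared-mode stall time divided by alone-mode stall time, as in the STFM paper. The threshold value and the choice to favor the most slowed-down thread are assumptions made for illustration; the function names are hypothetical.

```python
def slowdowns(t_shared, t_alone):
    """Per-thread slowdown S = Tshared / Talone (memory stall cycles)."""
    return {tid: t_shared[tid] / t_alone[tid] for tid in t_shared}


def pick_policy(t_shared, t_alone, threshold=1.1):  # threshold is a made-up value
    s = slowdowns(t_shared, t_alone)
    unfairness = max(s.values()) / min(s.values())
    if unfairness > threshold:
        # Fairness mode (assumed): prioritize the most slowed-down thread.
        return "fairness", max(s, key=s.get)
    return "fr-fcfs", None
```

For example, stall times of 300 and 120 cycles when shared versus 100 cycles each when alone give slowdowns of 3.0 and 1.2, an unfairness of 2.5, and so the fairness rule takes over.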

8
PAR-BS (Mutlu and Moscibroda, ISCA 2008)

A batch of requests (per bank) is formed: each thread can only contribute R requests to this batch; batch requests have priority over non-batch requests
Within a batch, priority is first given to row buffer hits, then to threads with a higher rank, then to older requests
Rank is computed based on the thread's memory intensity; low-intensity threads are given higher priority; this policy improves batch completion time and overall throughput
By using rank, requests from a thread are serviced in parallel; hence, parallelism-aware batch scheduling
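
The batching and priority rules can be sketched as follows. Requests are plain dictionaries with hypothetical keys (`arrival`, `thread`, `bank`, `row`), `cap` plays the role of R, and the thread ranking is supplied by the caller rather than computed.

```python
from collections import defaultdict


def form_batch(queue, cap):
    """Mark up to `cap` oldest requests per (thread, bank) as the batch."""
    count = defaultdict(int)
    batch = []
    for req in sorted(queue, key=lambda r: r["arrival"]):
        key = (req["thread"], req["bank"])
        if count[key] < cap:
            count[key] += 1
            batch.append(req)
    return batch


def priority_key(req, in_batch, open_rows, thread_rank):
    """Sort key: batch first, then row hits, then thread rank, then age."""
    row_hit = open_rows.get(req["bank"]) == req["row"]
    return (
        not in_batch,                 # batched requests before non-batched
        not row_hit,                  # row-buffer hits first
        thread_rank[req["thread"]],   # 0 = highest rank (lowest memory intensity)
        req["arrival"],               # older first
    )
```

Sorting the queue with `sorted(queue, key=lambda r: priority_key(r, r in batch, open_rows, thread_rank))` yields the issue order; because every bank applies the same thread ranking, one thread's requests tend to be serviced across banks at the same time.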

9
TCM (Kim et al., MICRO 2010)

Organize threads into latency-sensitive and bw-sensitive clusters based on memory intensity; former gets higher priority
Within bw-sensitive cluster, priority is based on rank
Rank is determined based on "niceness" of a thread and the rank is periodically shuffled with insertion shuffling or random shuffling (the former is used if there is a big gap in niceness)
Threads with low row buffer hit rates and high bank level parallelism are considered "nice" to others
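
A rough sketch of the clustering step only. It assumes memory intensity is measured as misses per kilo-instruction and that the latency-sensitive cluster is filled with the least intensive threads until it would exceed a fixed share of total bandwidth; the parameter `cluster_thresh` and its value are hypothetical, and niceness-based shuffling is not modeled.

```python
def cluster_threads(mpki, bw_usage, cluster_thresh=0.2):
    """Split threads into latency-sensitive and bandwidth-sensitive clusters.

    mpki: thread -> misses per kilo-instruction (memory intensity proxy)
    bw_usage: thread -> bandwidth consumed in the last interval
    cluster_thresh: hypothetical fraction of total bandwidth given to the
        latency-sensitive cluster
    """
    budget = cluster_thresh * sum(bw_usage.values())
    latency, bandwidth = [], []
    used = 0.0
    for tid in sorted(mpki, key=mpki.get):  # least intensive first
        if not bandwidth and used + bw_usage[tid] <= budget:
            latency.append(tid)
            used += bw_usage[tid]
        else:
            bandwidth.append(tid)  # this thread and all more intensive ones
    return latency, bandwidth
```

The latency-sensitive threads use little bandwidth, so giving them priority costs the bandwidth-sensitive threads very little.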

10
Minimalist Open-Page (Kaseridis et al., MICRO 2011)

Place 4 consecutive cache lines in one bank, then the next 4 in a different bank and so on; provides the best balance between row buffer locality and bank-level parallelism
Don't have to worry as much about fairness
Scheduling first takes priority into account, where priority is determined by wait-time, prefetch distance, and MLP in thread
A row is precharged after 50 ns, or immediately following a prefetch-dictated large burst
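
The interleaving rule can be illustrated with a short sketch. The bank count is an assumed value, and `bank_of` is a hypothetical helper operating on cache-line addresses.

```python
LINES_PER_BANK_VISIT = 4  # from the slide: 4 consecutive cache lines per bank
NUM_BANKS = 8             # hypothetical bank count


def bank_of(line_addr):
    """Bank index for a cache-line address (byte address / line size)."""
    return (line_addr // LINES_PER_BANK_VISIT) % NUM_BANKS


banks = [bank_of(i) for i in range(12)]
# -> [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

A streaming thread therefore gets at most a few row buffer hits per bank before moving on, which limits how long it can monopolize any one row buffer.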

11
Other Scheduling Ideas

Using reinforcement learning (Ipek et al., ISCA 2008)
Co-ordinating across multiple MCs (Kim et al., HPCA 2010)
Co-ordinating requests from GPU and CPU (Ausavarungnirun et al., ISCA 2012)
Several schedulers in the Memory Scheduling Championship at ISCA 2012
Predicting the number of row buffer hits (Awasthi et al., PACT 2011)

12
Refresh Basics

A cell is expected to have a retention time of 64ms; every cell must be refreshed within a 64ms window
The refresh task is broken into 8K refresh operations; a refresh operation is issued every tREFI = 7.8 us
If you assume that a row of cells on a chip is 8Kb and there are 8 banks, then every refresh operation in a 4Gb chip must handle 8 rows in each bank
Each refresh operation takes time tRFC = 300ns
Larger chips have more cells and tRFC will grow
13
More Refresh Details

To refresh a row, it needs to be activated and precharged
Refresh pipeline: the first bank draws the max available current to refresh a row in many subarrays in parallel; each bank is handled sequentially; the process ends with a recovery period to restore charge pumps
"Row" on the previous slide refers to the size of the available row buffer when you do an Activate; an Activate only deals with some of the subarrays in a bank; refresh performs an activate in all subarrays in a bank, so it can do multiple rows in a bank in parallel

14
Fine Granularity Refresh

Will be used in DDR4
Breaks refresh into small tasks; helps reduce read queuing delays (see example)
In a future 32Gb chip, tRFC = 640 ns, tRFC_2x = 480 ns, tRFC_4x = 350 ns; note the high overhead from the recovery period
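
A back-of-the-envelope sketch of the trade-off, using the tRFC values above and the 7.8 us tREFI quoted earlier in the lecture. It assumes that the 2x and 4x modes issue refresh commands two and four times as often; `busy_fraction` is a hypothetical helper.

```python
TREFI_NS = 7800  # base refresh interval used elsewhere in the lecture

# mode -> (tRFC in ns from the slide, refresh-rate multiplier)
MODES = {"1x": (640, 1), "2x": (480, 2), "4x": (350, 4)}


def busy_fraction(trfc_ns, rate):
    """Fraction of time spent refreshing when commands come `rate` times as often."""
    interval_ns = TREFI_NS / rate  # assumed: 2x/4x modes shrink the interval
    return trfc_ns / interval_ns


for name, (trfc_ns, rate) in MODES.items():
    print(f"{name}: longest stall {trfc_ns} ns, busy {busy_fraction(trfc_ns, rate):.1%}")
```

Under these assumptions the longest stall a read can see shrinks from 640 ns to 350 ns, but the total time spent refreshing rises from roughly 8% to roughly 18%, because the recovery period is paid on every command.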

15
What Makes Refresh Worse

Refresh operations are issued per rank; LPDDR does allow per bank refresh
Can refresh all ranks simultaneously: this reduces memory unavailable time, but increases memory peak power
Can refresh ranks in staggered manner: increases memory unavailable time, but reduces memory peak power
High temperatures will increase the leakage rate and require faster refresh rates (> 85 degrees C: 3.9 us tREFI)

16
Next Class

Refresh optimizations: elastic refresh, refresh pausing, smart refresh, Flikker, preemptive command drain, refresh and commands together, etc.
