Title: DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings
1. DFTL: A flash translation layer employing demand-based selective caching of page-level address mappings
- A. Gupta, Y. Kim, B. Urgaonkar, Penn State; ASPLOS 2009
- Presented by Shimin Chen, Big Data Reading Group
2. Introduction
- Goal: improve the performance of flash-based devices for workloads with random writes
- New proposal: DFTL (Demand-based FTL)
- FTL: flash translation layer
  - The FTL maintains a mapping table from virtual to physical addresses
3. Outline
- Introduction
- Background on FTL
- Design of DFTL
- Experimental Results
- Summary
4. Basics of Flash Memory
- OOB (out-of-band) area stores per-page metadata:
  - ECC
  - Logical page number
  - State: erased / valid / invalid
5. Flash Translation Layer
- Maintains the mapping: virtual address (exposed to the upper level) → physical address (on flash)
  - Uses a small, fast SRAM to store this mapping
- Hides the erase operation from the layers above by:
  - Avoiding in-place updates
  - Updating a clean page instead
  - Performing garbage collection and erasure
- Note:
  - The OOB area holds the physical → virtual mapping
  - So the FTL's virtual → physical mapping can be rebuilt at restart
6. Page-Level FTL
- Keeps a page-to-page mapping table
- Pro: can map any logical page to any physical page
  - Efficient flash page utilization
- Con: the mapping table is large
  - E.g., a 16GB flash with 2KB pages requires 32MB of SRAM
  - As flash size increases, the SRAM size must scale with it
  - Too expensive!
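The 32MB figure above follows from the page count times the per-entry size. A minimal back-of-the-envelope check in Python, assuming 4-byte mapping entries (an assumption implied by the slide's numbers, not stated explicitly):

```python
# Back-of-the-envelope size of a full page-level mapping table.
# Assumes each LPN -> PPN entry is 4 bytes (enough for a 32-bit PPN).
FLASH_SIZE = 16 * 2**30   # 16 GB
PAGE_SIZE = 2 * 2**10     # 2 KB
ENTRY_SIZE = 4            # bytes per mapping entry (assumption)

num_pages = FLASH_SIZE // PAGE_SIZE   # 8M logical pages
table_size = num_pages * ENTRY_SIZE   # bytes of SRAM needed

print(table_size // 2**20, "MB")      # -> 32 MB
```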
7. Block-Level FTL
- Keeps a block-to-block mapping
- Pro: small mapping table
  - Size reduced by a factor of (block size / page size), i.e., 64x here
- Con: a page's offset within its block is fixed
  - Garbage collection overheads grow
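Under block-level mapping, only the block number is translated; the page offset within the block is preserved. A minimal sketch (the map contents are illustrative, not from the paper):

```python
# Block-level address translation: only the block number is remapped;
# the page offset within the block stays fixed.
PAGES_PER_BLOCK = 64  # 128KB block / 2KB page

# Hypothetical block map: logical block number -> physical block number.
block_map = {0: 17, 1: 3}

def translate(lpn: int) -> int:
    """Translate a logical page number to a physical page number."""
    lbn, offset = divmod(lpn, PAGES_PER_BLOCK)
    return block_map[lbn] * PAGES_PER_BLOCK + offset

print(translate(65))  # logical block 1, offset 1 -> 3*64 + 1 = 193
```

The fixed offset is exactly the limitation the slide names: an update to one logical page cannot simply go to any free page, which is what drives the merge and GC overheads.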
8. Hybrid FTLs (a generic description)
- LPN: Logical Page Number
- Data blocks: block-level mapping
- Log/update blocks: page-level mapping
9. Operations in Hybrid FTLs
- Updates to data blocks are written to log blocks
  - The log region is small (e.g., 3% of total flash size)
- Garbage collection (GC)
  - When no free log blocks are available, GC is invoked to merge log blocks with data blocks
10. Full Merges Can Be Recursive, thus Expensive
- Often caused by random writes
11. Outline
- Introduction
- Background on FTL
- Design of DFTL
- Experimental Results
- Summary
12. DFTL Idea
- Avoid expensive full merges entirely
  - Do not use log blocks at all
- Idea:
  - Use page-level mapping
  - Keep the full mapping table on flash to reduce SRAM use
  - Exploit temporal locality in workloads
  - Dynamically load/unload page-level mappings into/out of SRAM
13. DFTL Architecture
[Figure: DFTL architecture; global mapping table on flash]
14. DFTL Address Translation
- Case 1: the requested LPN hits in the cached mapping table (CMT)
  - Done. Retrieve the mapping directly
15. DFTL Address Translation
- Case 2: a miss in the cached mapping table (CMT), and the CMT is not full
  - Look up the GTD (global translation directory)
  - Read the translation page
  - Fill in the CMT entry
  - Go to Case 1
16. DFTL Address Translation
- Case 3: a miss in the cached mapping table (CMT), and the CMT is full
  - Select a CMT entry to evict (LRU)
  - Write back the evicted entry if dirty
  - Go to Case 2
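The three cases above amount to an LRU cache sitting in front of the on-flash mapping. A sketch under assumptions: flash I/O is simulated with plain dicts, and the structure names are illustrative, not the paper's implementation:

```python
from collections import OrderedDict

# Sketch of DFTL's cached mapping table (CMT). The full LPN -> PPN map
# lives on flash in translation pages; the GTD locates those pages.
# Here translation pages are simulated with in-memory dicts.
translation_pages = {0: {0: 100, 1: 101}, 1: {64: 200}}  # hypothetical
ENTRIES_PER_TPAGE = 64

class CMT:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()  # LPN -> (PPN, dirty)

    def lookup(self, lpn: int) -> int:
        # Case 1: hit in the CMT -- refresh LRU order and return.
        if lpn in self.cache:
            self.cache.move_to_end(lpn)
            return self.cache[lpn][0]
        # Case 3: CMT full -> evict the LRU entry, writing it back if dirty.
        if len(self.cache) >= self.capacity:
            victim, (ppn, dirty) = self.cache.popitem(last=False)
            if dirty:
                translation_pages[victim // ENTRIES_PER_TPAGE][victim] = ppn
        # Case 2: consult the GTD, read the translation page, fill the CMT.
        tpage = translation_pages[lpn // ENTRIES_PER_TPAGE]
        self.cache[lpn] = (tpage[lpn], False)
        return tpage[lpn]

cmt = CMT(capacity=2)
print(cmt.lookup(0))   # miss: read translation page -> 100
print(cmt.lookup(0))   # hit: served from the CMT    -> 100
```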
17. Address Translation Cost
- Worst-case cost (Case 3):
  - 2 translation page reads
  - 1 translation page write
- Temporal locality helps:
  - More hits, fewer misses, fewer evictions
- Multiple CMT entries can belong to a single translation page
  - Enables batched updates
18. Data Read
- Address translation: LPN → PPN
- Read the data page at PPN
19. Writes
- Current data block
  - The updated data page is appended to the current data block
- Current translation block
  - The updated translation page is appended to the current translation block
- Continues until the number of free blocks < GC_threshold
20. Garbage Collection
- Victim selection follows [15] Kawaguchi et al., 1995
21. Garbage Collection
- If the selected victim block is a translation block:
  - Copy valid pages to a free translation block
  - Update the GTD (global translation directory)
- If the selected victim block is a data block:
  - Copy valid pages to a free data block
  - Update the page-level translation for each copied data page:
    - Possibly update the CMT entry (if present, done)
    - Otherwise, locate the translation page, update it, and change the GTD
  - Batch-update opportunities arise when multiple page-level translations are in the same translation page
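The two victim cases above can be sketched compactly. This is illustrative only: structures are plain dicts, flash I/O is elided, and the names are not from the paper:

```python
# Sketch of DFTL garbage collection for the two victim-block types.

def collect(victim, free_ppns, gtd, cmt, entries_per_tpage=64):
    """Copy valid pages out of `victim` and fix up the mappings.
    Returns the set of translation pages needing a (batchable) update."""
    dirty_tpages = set()
    for lpn, ppn in list(victim["valid"].items()):
        new_ppn = free_ppns.pop(0)           # copy the page to a free block
        if victim["type"] == "translation":
            gtd[lpn] = new_ppn               # GTD points at translation pages
        else:
            if lpn in cmt:
                cmt[lpn] = new_ppn           # cached: update the CMT, done
            else:                            # else the translation page (and
                dirty_tpages.add(lpn // entries_per_tpage)  # GTD) must change
    victim["valid"].clear()                  # victim can now be erased
    return dirty_tpages

# Two data pages sharing one translation page -> one batched update.
victim = {"type": "data", "valid": {10: 500, 11: 501}}
print(collect(victim, [900, 901], {}, {}))  # -> {0}
```

The returned set shows the batching opportunity the slide mentions: co-located translations collapse into a single translation-page rewrite.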
22. Benefits
- Page-level mapping
  - No expensive full merge operations
  - Better random write performance as a result
- But random writes are still worse than sequential ones:
  - More CMT misses, more translation page writes
  - Data pages in a block are more scattered
  - GC costs are higher; fewer opportunities for batch updates
23. Outline
- Introduction
- Background on FTL
- Design of DFTL
- Experimental Results
- Summary
24. FTL Schemes Implemented
- FlashSim simulator
- The authors enhanced DiskSim
- Block-based FTL
- A state-of-the-art hybrid FTL (FAST FTL)
- DFTL
- An idealized page-based FTL
25. Experimental Setup
- Model: 32GB flash memory, 2KB pages, 128KB blocks
- Timing parameters are shown in Table 1
26. Traces Used in Experiments
27. Block Erases
- Baseline: idealized page-level FTL
28. Extra Read/Write Operations
- 63% CMT hit rate for the Financial trace
29. Response Times (from the tech report)
30. CDF
31. CDF
- Address translation overhead shows up
32. CDF
- FAST has a long tail
33. Figure 10: Microscopic Analysis
34. Summary
- Demand-based page-level FTL
- Two-level page table:
  - (Flash) Translation pages: LPN → PPN entries
  - (SRAM) Global translation directory: translation page entries
- Mapping cache (CMT) in SRAM