Title: Design of Flash-Based DBMS: An In-Page Logging Approach
1. Design of Flash-Based DBMS: An In-Page Logging Approach
SIGMOD 2007
- Bongki Moon
- Department of Computer Science
- University of Arizona
- Tucson, AZ 85721, U.S.A.
- bkmoon_at_cs.arizona.edu
- Sang-Won Lee
- School of Info & Comm Eng
- Sungkyunkwan University
- Suwon 440-746, Korea
- wonlee_at_ece.skku.ac.kr
Presented by Yinan Li at HKUST
2. Outline
- Flash memory
- Disk-Based DBMS on Flash Memory
- Flash-Based DBMS: In-Page Logging approach
- My reviews
3. Flash Memory
- Flash memory is a type of electrically erasable programmable read-only memory (EEPROM)
- A page is the unit of read and write operations
  - Typical size: 2KB
- A write operation can only clear bits (change their value from 1 to 0)
- The only way to change a bit from 0 to 1 is to erase an entire region of memory
- This region has a fixed size and is called an erase unit, erase block, or just block
  - Typical size: 128KB for large flash memory
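The write and erase rules above can be illustrated with a small model. This is only a sketch: the class name `FlashBlock` and its methods are hypothetical, and the sizes are the typical values quoted on the slide. Programming a page is modelled as a bitwise AND with the stored bytes, because bits can only go from 1 to 0.

```python
PAGE_SIZE = 2 * 1024                      # typical page size from the slide
BLOCK_SIZE = 128 * 1024                   # typical erase unit size from the slide
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE


class FlashBlock:
    """Hypothetical model of one erase unit; not a real device API."""

    def __init__(self):
        # An erased block has every bit set to 1.
        self.cells = bytearray(b"\xff" * BLOCK_SIZE)

    def program(self, page_no, data):
        """Write one page. Bits can only be cleared, so the result is old AND new."""
        if len(data) != PAGE_SIZE:
            raise ValueError("a page write must cover exactly one page")
        start = page_no * PAGE_SIZE
        for i, byte in enumerate(data):
            self.cells[start + i] &= byte

    def read(self, page_no):
        start = page_no * PAGE_SIZE
        return bytes(self.cells[start:start + PAGE_SIZE])

    def erase(self):
        """Reset the whole block to all 1s; there is no per-page erase."""
        self.cells[:] = b"\xff" * BLOCK_SIZE


block = FlashBlock()
block.program(0, bytes([0b11110000]) * PAGE_SIZE)
block.program(0, bytes([0b00111100]) * PAGE_SIZE)   # overwrite without erasing
after_overwrite = block.read(0)[0]                  # only bits set in both writes survive
block.erase()
after_erase = block.read(0)[0]
```

The second write does not store `0b00111100`; it leaves `0b00110000`, which is why a page must be erased (together with its whole block) before it can hold new data.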
4. Characteristics of Flash
- No in-place update
  - A data item must be erased before it can be written again
  - An erase unit (16KB or 128KB) is much larger than a sector
- No mechanical latency
  - Flash memory is an electronic device without moving parts
  - It provides uniform random access speed without seek or rotational latency
- Asymmetric read and write speed
  - Reads are typically at least twice as fast as writes
  - Write (and erase) optimization is critical
5. Magnetic Disk vs Flash Memory
- Magnetic disk: Seagate Barracuda 7200.7 ST380011A
- NAND flash: Samsung K9WAG08U1A 16 Gbits SLC NAND
  - Unit of read/write: 2KB; unit of erase: 128KB
6. Category of Flash Memory
- NAND vs. NOR flash
  - NOR: high erase cost (several seconds), directly addressable (access is by bit or byte)
  - NAND: relatively low erase cost (several ms), access is by page
- MLC vs. SLC NAND flash
  - MLC (Multi-Level Cell): stores multiple bits per cell, but has significantly slower read and write speeds and a 10x lower read/write lifetime
  - SLC (Single-Level Cell): stores only one bit per cell; SLC flash has much better performance, lifetime, and reliability than MLC
7. Small Flash vs Large Flash
- Small flash memory has been widely used in PDAs, MP3 players, mobile phones, and sensor networks
  - Advantages: size, weight, shock resistance, power consumption, noise
  - Typical size: a few gigabytes
- Recently, some vendors have developed large flash memory devices called flash SSDs (Solid State Disks)
  - Mainly used in notebook PCs, e.g. Apple MacBook Air / ThinkPad X300
  - Typical size: > 16GB
8. NAND Flash System Architecture
- Flash Translation Layer (FTL): a software layer that makes NAND flash fully emulate a magnetic disk
  - Logical-to-physical mapping
  - Garbage collection
  - Power-off recovery
  - Wear-leveling
  - Bad block management
  - Error correction code (ECC)
  - Power management
9. Different FTLs for Large and Small Flash Memory
- Page-mapping FTL (used for small flash memory)
  - Maintains the mapping between each logical page and its physical page separately
  - Log-structured architecture
  - Needs a large memory for its mapping information
  - The mapping must be reconstructed by scanning the whole flash memory at start-up, which may result in a long mount time
- Block-mapping FTL
  - Needs only a small memory for its mapping information
  - Any update causes a whole block rewrite (that is why random writes are so slow!)
  - In real products, there are some optimizations for improving concentrated updates
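The cost of the whole-block rewrite can be counted with a toy model. This is a sketch under simplifying assumptions: the class `BlockMappedFTL` is hypothetical, it keeps exactly one spare physical block, and it applies none of the optimizations found in real products. Page and block sizes are the 2KB and 128KB values used earlier in the slides.

```python
PAGE_SIZE_KB = 2
BLOCK_SIZE_KB = 128
PAGES_PER_BLOCK = BLOCK_SIZE_KB // PAGE_SIZE_KB   # 64 pages per erase block


class BlockMappedFTL:
    """Hypothetical, unoptimized block-mapping FTL used only to count I/O."""

    def __init__(self, num_blocks):
        # Logical block i starts in physical block i; one extra block is free.
        self.mapping = list(range(num_blocks))
        self.free = [num_blocks]
        self.page_writes = 0
        self.erases = 0

    def update_page(self, logical_page):
        logical_block = logical_page // PAGES_PER_BLOCK
        old_physical = self.mapping[logical_block]
        new_physical = self.free.pop()
        # Copy the 63 unchanged pages plus the updated page into the clean block.
        self.page_writes += PAGES_PER_BLOCK
        self.mapping[logical_block] = new_physical
        # The old block is erased so that it can serve as the next clean block.
        self.erases += 1
        self.free.append(old_physical)


ftl = BlockMappedFTL(num_blocks=4)
for logical_page in [0, 70, 130, 5]:      # four scattered single-page updates
    ftl.update_page(logical_page)
```

In this model four 2KB updates cost 256 page writes and four erases, which is the write amplification the slide refers to. The mapping table, by contrast, needs only one entry per block.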
10. Flash Memory for Server Applications
- More recently, thanks to the advantages of flash memory and its increasing capacity, there is a new trend toward using large flash memory for database server applications
- Jim Gray said:
  - "Tape is Dead, Disk is Tape, Flash is Disk!"
11. Outline
- Flash memory
- Disk-Based DBMS on Flash Memory
- Flash-Based DBMS: In-Page Logging approach
- My reviews
12. Disk-Based DBMS on Flash Memory
- What happens if a disk-based DBMS runs on flash memory?
  - Because there is no in-place update, updating a page rewrites the whole block into another clean block
  - This consumes free blocks quickly, causing frequent garbage collection and erases
[Figure: an SQL update/insert/delete updates a 4KB page in the buffer manager; writing the page back causes a dirty block write of a 128KB erase unit in the data block area of flash memory]
13. Disk-Based DBMS Performance
- Run SQL queries on a commercial DBMS
  - Sequential scan or update of a table
  - Non-sequential read or update of a table (via B-tree index)
- Experimental settings
  - Storage: magnetic disk vs. M-Tron SSD (Samsung flash chip)
  - Data page size: 8KB
  - 10 tuples per page, 640,000 tuples in a table (64,000 pages, 512MB)
14. Disk-Based DBMS Performance
- Read performance: the result is not surprising at all
- Hard disk
  - Read performance is poor for non-sequential accesses, mainly because of seek and rotational latency
- Flash memory
  - Read performance is insensitive to access patterns
15. Disk-Based DBMS Performance
- Hard disk
  - Write performance is poor for non-sequential accesses, mainly because of seek and rotational latency
- Flash memory
  - Write performance is poor (worse than disk) for non-sequential accesses due to out-of-place updates and erase operations
- This demonstrates the need for write optimization in a DBMS running on flash
16. Outline
- Flash memory
- Disk-Based DBMS on Flash Memory
- Flash-Based DBMS: In-Page Logging approach
- My reviews
17. In-Page Logging (IPL) Approach
- Design principles
  - Take advantage of the characteristics of flash memory
    - Fast read speed
  - Overcome the erase-before-write limitation of flash memory
  - Minimize the changes to the DBMS architecture
    - Limited to the buffer manager and storage manager
18. Design of the IPL
- Logging on a per-page basis in both memory and flash
  - An in-memory log sector can be associated with a buffer frame in memory
    - Allocated on demand when a page becomes dirty
  - An in-flash log segment is allocated in each erase unit
[Figure: in the database buffer, each in-memory data page (8KB) is updated in place and has an in-memory log sector (512B); in flash memory, each 128KB erase unit holds 15 data pages (8KB each) and an 8KB log area of 16 sectors]
The log area is shared by all the data pages in an erase unit
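The sizes in this layout are consistent with each other, as the arithmetic below checks. The offset helpers are hypothetical, and placing the log area after the data pages is an assumption made for illustration; the slides only state the sizes.

```python
ERASE_UNIT = 128 * 1024   # bytes per erase unit
DATA_PAGE = 8 * 1024      # bytes per data page
LOG_AREA = 8 * 1024       # bytes reserved for the in-flash log segment
LOG_SECTOR = 512          # bytes per log sector

DATA_PAGES_PER_UNIT = (ERASE_UNIT - LOG_AREA) // DATA_PAGE   # 15
LOG_SECTORS_PER_UNIT = LOG_AREA // LOG_SECTOR                # 16


def data_page_offset(page_no):
    """Byte offset of a data page inside its erase unit (assumed layout)."""
    if not 0 <= page_no < DATA_PAGES_PER_UNIT:
        raise IndexError("page number out of range")
    return page_no * DATA_PAGE


def log_sector_offset(sector_no):
    """Byte offset of a log sector, assuming the log area follows the data pages."""
    if not 0 <= sector_no < LOG_SECTORS_PER_UNIT:
        raise IndexError("sector number out of range")
    return DATA_PAGES_PER_UNIT * DATA_PAGE + sector_no * LOG_SECTOR
```

With 16 log sectors shared by 15 data pages, an erase unit can absorb 16 log-sector flushes before its log area is full.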
19. IPL Write
- Whenever an update is performed on a data page, the in-memory copy of the data page is updated immediately. In addition, the IPL buffer manager adds a log record to the in-memory log sector
- When a dirty page is evicted by the replacement policy or the in-memory log sector is full,
  - the content of the data page is not written to flash memory
  - instead, the in-memory log sector is written to the in-flash log segment
[Figure: an update/insert/delete is applied in place to the 8KB page in the buffer manager and recorded as a physiological log record in a 512B sector; the sector is written to a 128KB block in the data block area of flash memory]
20. IPL Read
- When a page is read from flash, the current version is computed on the fly
- Read from flash
  - The original copy of Pi
  - All log records belonging to Pi (I/O overhead)
- Apply the physiological actions to the copy read from flash (CPU overhead)
- Re-construct the current in-memory copy in the buffer manager
[Figure: the buffer manager reads page Pi from the data area (120KB, 15 pages) and its log records from the log area (8KB, 16 sectors) of an erase unit in flash memory]
21. IPL Merge
- When all free log sectors in an erase unit are consumed:
  - Log records are applied to the corresponding data pages
  - The current data pages are copied into a new erase unit
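The write, read, and merge steps can be put together in one small simulation. This is a minimal sketch, not the authors' implementation: `EraseUnit` and `BufferedPage` are hypothetical names, a page is modelled as a dict, a log record is simplified to a `(key, value)` pair instead of a physiological log record, and `RECORDS_PER_SECTOR` is an assumed capacity chosen to keep the demo short. The merge is modelled as a counter rather than an actual copy to a new erase unit.

```python
from dataclasses import dataclass, field

PAGES_PER_UNIT = 15          # data pages per erase unit (from the slides)
LOG_SECTORS_PER_UNIT = 16    # log sectors per erase unit (from the slides)
RECORDS_PER_SECTOR = 4       # assumption: records that fit in one log sector


@dataclass
class EraseUnit:
    pages: list                                       # stored page images (dicts)
    log_sectors: list = field(default_factory=list)   # (page_no, records) entries
    erases: int = 0                                   # number of merges performed

    def flush_sector(self, page_no, records):
        """IPL write: append an in-memory log sector to the in-flash log segment."""
        if len(self.log_sectors) == LOG_SECTORS_PER_UNIT:
            self.merge()                              # no free log sector is left
        self.log_sectors.append((page_no, list(records)))

    def read_page(self, page_no):
        """IPL read: stored copy plus all log records that belong to the page."""
        page = dict(self.pages[page_no])
        for owner, records in self.log_sectors:
            if owner == page_no:
                for key, value in records:
                    page[key] = value
        return page

    def merge(self):
        """IPL merge: apply the log records and move to a fresh erase unit."""
        self.pages = [self.read_page(n) for n in range(len(self.pages))]
        self.log_sectors = []
        self.erases += 1                              # the old unit gets erased


class BufferedPage:
    """A buffer frame together with its in-memory log sector."""

    def __init__(self, unit, page_no):
        self.unit, self.page_no = unit, page_no
        self.data = unit.read_page(page_no)
        self.log = []

    def update(self, key, value):
        self.data[key] = value                        # update the in-memory copy
        self.log.append((key, value))                 # and add a log record
        if len(self.log) == RECORDS_PER_SECTOR:
            self.flush()                              # log sector is full

    def flush(self):
        """Called when the sector is full or the page is evicted."""
        if self.log:
            self.unit.flush_sector(self.page_no, self.log)
            self.log = []


unit = EraseUnit(pages=[{} for _ in range(PAGES_PER_UNIT)])
frame = BufferedPage(unit, page_no=3)
for i in range(10):
    frame.update(f"k{i % 3}", i)
frame.flush()                                         # eviction
sectors_after_eviction = len(unit.log_sectors)
erases_after_eviction = unit.erases
stored_copy_after_eviction = dict(unit.pages[3])
for i in range(64):                                   # enough updates to force a merge
    frame.update("x", i)
```

Ten updates reach flash as three log-sector writes and no erase, and the stored page image is left untouched. The erase happens only once the 16 log sectors of the unit are used up.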
22. Why can IPL improve the write performance of a DBMS?
- The number of writes does not decrease
  - Actually, writes may increase because
    - Flushing a full log sector introduces extra writes
    - The merge operation introduces overhead
- Then why can IPL improve write performance?
  - IPL overcomes the erase-before-write property of flash
  - It reduces the number of erases
23. IPL Simulation with TPC-C
- TPC-C log data generation
  - Run a commercial DBMS to generate reference streams of the TPC-C benchmark
  - The HammerOra utility was used for TPC-C workload generation
  - Each trace contains log records of physiological updates as well as physical page writes
  - Average length of a log record: 20 to 50B
- TPC-C traces
  - 100M.20M.10u: 100MB DB, 20MB buffer, 10 simulated users
  - 1G.20M.100u: 1GB DB, 20MB buffer, 100 simulated users
  - 1G.40M.100u: 1GB DB, 40MB buffer, 100 simulated users
- Parameter settings
  - Write (2KB): 200 us
  - Merge (128KB): 20 ms
24. Log Segment Size vs Merges
- TPC-C write frequencies are highly skewed (and have low temporal locality)
  - Erase units containing hot pages consume log sectors quickly
  - This could cause a large number of erase operations
- With more log sectors: more storage is used, but merges are less frequent
25. Estimated Write Performance
- Performance trend with varying buffer sizes
  - The size of the log segment was fixed at 8KB
- Estimated write time
  - With IPL: (# of sector writes) × 200 us + (# of merges) × 20 ms
  - Without IPL: α × (# of page writes) × 20 ms
  - α is the probability that a page write causes an erase operation
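The two estimates are easy to evaluate once the counts are known. The sketch below uses the 200 us and 20 ms parameters from the simulation settings. The function names are hypothetical, and the counts and the value of α in the example are made-up inputs for illustration, not results from the paper.

```python
SECTOR_WRITE_US = 200      # cost of one log-sector write (simulation parameter)
MERGE_US = 20_000          # cost of one merge of a 128KB erase unit (20 ms)


def write_time_with_ipl_us(sector_writes, merges):
    """Estimated write time with IPL, in microseconds."""
    return sector_writes * SECTOR_WRITE_US + merges * MERGE_US


def write_time_without_ipl_us(page_writes, alpha):
    """Estimated write time without IPL; alpha is P(page write causes an erase)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be a probability")
    return alpha * page_writes * MERGE_US


# Hypothetical inputs, chosen only to show how the formulas are used.
ipl_us = write_time_with_ipl_us(sector_writes=100_000, merges=1_000)
conventional_us = write_time_without_ipl_us(page_writes=50_000, alpha=0.5)
```

Because a merge costs 100 times as much as a sector write, the IPL estimate is dominated by the merge term as soon as merges exceed 1% of sector writes.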
26. Support for Recovery
- IPL helps realize a lean recovery mechanism
  - Additional logging: transaction log and list of dirty pages
- Transaction commit
  - Similar to flushing the log tail
  - An in-memory log sector is forced out to flash if it contains at least one log record of a committing transaction
  - No explicit REDO action is required at system restart
- Transaction abort
  - De-apply the log records of an aborting transaction
  - Use selective merge instead of regular merge, because a merge is irreversible
    - If committed, merge the log record
    - If aborted, discard the log record
    - If active, carry over the log record to a new erase unit
  - To avoid thrashing, allow an erase unit to have overflow log sectors
  - No explicit UNDO action is required
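The three cases of a selective merge can be written as one loop over the log records. This is a sketch under simplifying assumptions: the function `selective_merge` is hypothetical, a page is modelled as a dict, each log record is a `(txn_id, key, value)` tuple, and transaction status is looked up in a plain dict standing in for the transaction log.

```python
def selective_merge(page, log_records, txn_status):
    """Merge only committed records; return the new page and carried-over records."""
    merged = dict(page)
    carried_over = []
    for txn_id, key, value in log_records:
        status = txn_status[txn_id]
        if status == "committed":
            merged[key] = value                           # merge the log record
        elif status == "aborted":
            continue                                      # discard the log record
        elif status == "active":
            carried_over.append((txn_id, key, value))     # goes to the new erase unit
        else:
            raise ValueError(f"unknown transaction status: {status}")
    return merged, carried_over


page = {"a": 1, "b": 2}
records = [(1, "a", 10), (2, "b", 20), (3, "c", 30)]
status = {1: "committed", 2: "aborted", 3: "active"}
new_page, pending = selective_merge(page, records, status)
```

Since aborted records never reach a data page and committed ones are already in flash, the merged page needs neither an explicit UNDO nor an explicit REDO pass in this model.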
27. Conclusion
- Clear and present evidence that flash can replace disk
- The IPL approach demonstrates its potential for TPC-C type database applications by
  - Overcoming the erase-before-write limitation
  - Exploiting the fast and uniform random access
- IPL also helps realize a lean recovery mechanism
28. Outline
- Flash memory
- Disk-Based DBMS on Flash Memory
- Flash-Based DBMS: In-Page Logging approach
- My reviews
29. Reviews
- IPL hurts read performance
  - For each read operation, it has to read both the data page and its log sectors
  - Read performance will be about 2x slower
- No experimental results
  - The authors only report results from an I/O access simulation
- Simulation
  - The data size of the simulation is too small (1GB)
  - It did not show the overall performance of TPC-C (most operations in TPC-C are read operations)
30. Any Questions?