Conquest: Preparing for Life After Disks - PowerPoint PPT Presentation

Slides: 75
Provided by: andy241
Learn more at: https://lasr.cs.ucla.edu

Transcript and Presenter's Notes

Title: Conquest: Preparing for Life After Disks


1
Conquest: Preparing for Life After Disks
  • CS239 Seminar
  • October 24, 2002
  • An-I Andy Wang
  • University of California, Los Angeles

2
Conquest Overview
  • File systems are optimized for disks
  • Performance problem
  • Complexity
  • Now we have tons of inexpensive RAM
  • What can we do with that RAM?

3
Conquest Approach
  • Combine disk and persistent RAM (e.g.,
    battery-backed RAM) in a novel way
  • Simplification
  • > 20% fewer semicolons than ext2, reiserfs, and
    SGI XFS
  • Performance (under popular benchmarks)
  • 24% to 1900% faster than LRU disk caching

4
Outline of the Talk
  • Motivation
  • Conquest design (high level)
  • Conquest components
  • Performance evaluation
  • Conclusion

Motivation Conquest Design Conquest
Components Performance Evaluation Conclusion

5
Motivation
  • Most file systems are built for disks
  • Problems with the disk assumption
  • Performance
  • Complexity

6
Hardware Evolution
[Figure: accesses per second (log scale, 1 KHz to 1 GHz) by year, 1990 to
2000. CPU and memory improve at 50%/yr, disk at 15%/yr. Annotations:
(1 sec : 6 days) and (1 sec : 3 months).]
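
A back-of-the-envelope check of the growth rates on this slide. It assumes the rates are 50% and 15% per year (the percent signs appear to be lost in the transcript) and that they compound over the 1990 to 2000 span on the axis:

```python
# Assumed annual improvement rates, read from the slide as percentages.
CPU_RATE = 0.50
DISK_RATE = 0.15
YEARS = 10  # 1990 to 2000


def growth(rate: float, years: int) -> float:
    """Compound improvement factor after `years` at `rate` per year."""
    return (1.0 + rate) ** years


cpu_gain = growth(CPU_RATE, YEARS)
disk_gain = growth(DISK_RATE, YEARS)
gap_widening = cpu_gain / disk_gain  # how much further disk falls behind

print(f"CPU/memory gain over {YEARS} years: {cpu_gain:.1f}x")
print(f"Disk gain over {YEARS} years:       {disk_gain:.1f}x")
print(f"Gap widens by:                      {gap_widening:.1f}x")
```

Under these assumptions the gap widens by roughly 14x in a decade. That is in line with the annotations if they are read as one CPU-second corresponding to 6 days of disk time in 1990 and about 3 months in 2000.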

7
Inside Pandora's Box
  • Disk arm
  • Disk platters
  • Access time = seek time (disk arm)
    + rotational delay (disk platter)
    + transfer time
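
The three components of access time can be added up in a small worked example. The drive parameters below (9 ms seek, 7200 RPM, 40 MB/s, 4 KB request) are illustrative assumptions, not figures from the talk:

```python
def disk_access_time_ms(seek_ms: float, rpm: float,
                        transfer_mb_per_s: float, request_kb: float) -> float:
    """Access time = seek time + rotational delay + transfer time (in ms)."""
    # On average the platter must turn half a revolution to reach the sector.
    rotational_ms = 0.5 * 60_000.0 / rpm
    # Time to stream the requested bytes once the head is in position.
    transfer_ms = (request_kb / 1024.0) / transfer_mb_per_s * 1000.0
    return seek_ms + rotational_ms + transfer_ms


# Hypothetical drive: 9 ms average seek, 7200 RPM, 40 MB/s, 4 KB request.
total_ms = disk_access_time_ms(seek_ms=9.0, rpm=7200, transfer_mb_per_s=40.0,
                               request_kb=4.0)
print(f"estimated access time: {total_ms:.2f} ms")
```

For small requests the mechanical terms (seek and rotation) dominate, which is why the optimizations on the next slide concentrate on avoiding or hiding arm and platter movement.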

8
Disk Optimization Methods
  • Disk arm scheduling
  • Group information on disk
  • Disk readahead
  • Buffered writes
  • Disk caching
  • Data mirroring
  • Hardware parallelism

9
Complexity Bytes

10
Storage Media Alternatives
[Figure: storage media alternatives plotted by cost in $/MB (log scale)
against accesses/sec (log scale)]

11
Price Trend of Persistent RAM
[Figure: price of persistent RAM in $/MB (log scale, 10^-2 to 10^2) by
year, 1995 to 2005]

12
Old Order New World
  • Disk will stay around
  • Cost, capacity, power, heat
  • RAM as a viable storage alternative
  • PDAs, digital cameras, MP3 players
  • More architectural changes due to RAM
  • A big assumption change from disk
  • Rethink data structures, interfaces, applications

13
Getting a Fresh Start
  • What does it take to design and build a system
    that assumes ample persistent RAM as the primary
    storage medium?

14
Conquest Design
  • Design and build a disk/persistent-RAM hybrid
    file system
  • Deliver all file system services from memory,
    with the exception of high-capacity storage
  • Two separate data paths to memory and disk
  • Benefits
  • Simplicity
  • Performance

15
Simplicity
  • Remove disk-related complexities for most files
  • Make things simpler for disk as well
  • Less complexity
  • Fewer bugs
  • Easier maintenance
  • Shorter data paths

16
Performance
  • Overall
  • All management performed in memory
  • Memory data path
  • No disk-related overhead
  • Disk data path
  • Faster speed due to simpler access models

17
Conquest Components
  • Media management
  • Metadata representation
  • Directory service
  • Allocation service
  • Persistence support
  • Resiliency support

18
User Access Patterns
  • Small files
  • Take little space (10%)
  • Represent most accesses (90%)
  • Large files
  • Take most space
  • Mostly sequential accesses
  • Not characteristic of database applications

19
Files Stored in Persistent RAM
  • Small files (< 1 MB)
  • No seek time or rotational delays
  • Fast byte-level accesses
  • Contiguous allocation
  • Metadata
  • Fast synchronous update
  • No dual representations
  • Executables and shared libraries
  • In-place execution
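
A minimal sketch of the placement rule described here: metadata and small files live in persistent RAM, large files go to disk. The names `Medium`, `LARGE_FILE_THRESHOLD`, and `choose_medium` are hypothetical, and treating a file of exactly 1 MB as small is an assumption. Executables and shared libraries are not modeled.

```python
from enum import Enum


class Medium(Enum):
    PERSISTENT_RAM = "persistent RAM"
    DISK = "disk"


# Assumed cutoff of about 1 MB; the exact boundary handling is a guess.
LARGE_FILE_THRESHOLD = 1 << 20


def choose_medium(size_bytes: int, is_metadata: bool = False) -> Medium:
    """Pick the storage medium for a piece of file system state."""
    if is_metadata or size_bytes <= LARGE_FILE_THRESHOLD:
        return Medium.PERSISTENT_RAM
    return Medium.DISK


for size in (4 * 1024, 1 << 20, 100 << 20):
    print(f"{size:>11} bytes -> {choose_medium(size).value}")
```

Because the decision depends only on size and type, no per-access heuristics (caching, replacement, readahead) are needed for anything that lands in RAM.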

20
Memory Data Path of Conquest
[Diagram] Conventional File Systems: storage requests -> IO buffer
management -> IO buffer -> persistence support -> disk management -> disk

21
Large-File-Only Disk Storage
  • Allocate in big chunks
  • Lower access overhead
  • Reduced management overhead
  • No fragmentation management
  • No tricks for small files
  • Storing data in metadata
  • No elaborate data structures
  • Wrapping a balanced tree onto disk cylinders

22
Sequential-Access Large Files
  • Sequential disk accesses
  • Near-raw bandwidth
  • Well-defined readahead semantics
  • Read-mostly
  • Little synchronization overhead (between memory
    and disk)

23
Disk Data Path of Conquest
[Diagram] Conventional File Systems: storage requests -> IO buffer
management -> IO buffer -> persistence support -> disk management -> disk
[Diagram] Conquest Disk Data Path: storage requests -> IO buffer
management -> IO buffer -> disk management -> disk (large-file-only file
system), alongside battery-backed RAM (small file and metadata storage)

24
Random-Access Large Files
  • Random access?
  • Common definition: nonsequential access
  • A typical movie has 150 scene changes
  • MP3 stores the title at the end of the files
  • Near sequential access?
  • Simplifies large-file metadata representation
    significantly

25
Logical File Representation
File

26
Physical File Representation
Name(s)
File

27
Ext2 Data Representation
i-node

28
Disadvantages with Ext2 Design
  • Designed for disk storage
  • Optimization for small files makes things complex
  • Random-access data structure for large files that
    are accessed mostly sequentially
  • Data access time dependent on the byte position
    in a file
  • Maximum file size is limited

29
Conquest Representation
  • Persistent RAM
  • Hash(file name) = location of data
  • Offset(location of data)
  • Disk storage
  • Per-file, doubly linked list of disk block
    segments (stored in persistent RAM)

30
Advantages of Conquest Design
  • Direct data access for in-core files
  • Worst case: sequential memory search for random
    disk locations
  • Maximum file size limited by physical storage
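
A minimal sketch of a per-file, doubly linked list of disk block segments, with the worst-case sequential search for a random offset. The class and function names and the 4 KB block size are assumptions for illustration, not Conquest's actual kernel structures:

```python
from typing import Optional, Tuple

BLOCK_SIZE = 4096  # assumed block size


class Segment:
    """A contiguous run of disk blocks belonging to one large file."""

    def __init__(self, disk_start: int, num_blocks: int):
        self.disk_start = disk_start
        self.num_blocks = num_blocks
        self.prev: Optional["Segment"] = None
        self.next: Optional["Segment"] = None


def append_segment(tail: Segment, disk_start: int, num_blocks: int) -> Segment:
    """Link a new segment after `tail` and return it as the new tail."""
    seg = Segment(disk_start, num_blocks)
    seg.prev, tail.next = tail, seg
    return seg


def locate(head: Segment, offset: int) -> Tuple[int, int]:
    """Map a file byte offset to (disk block, byte within block).

    The list lives in memory, so a random access costs at most one
    sequential walk over the segments, with no disk I/O for the lookup.
    """
    block_index, within = divmod(offset, BLOCK_SIZE)
    seg: Optional[Segment] = head
    while seg is not None:
        if block_index < seg.num_blocks:
            return seg.disk_start + block_index, within
        block_index -= seg.num_blocks
        seg = seg.next
    raise ValueError("offset beyond end of file")


head = Segment(disk_start=1000, num_blocks=4)
tail = append_segment(head, disk_start=5000, num_blocks=2)
print(locate(head, 4 * BLOCK_SIZE + 10))
```

Nothing in the list bounds the number of segments, which is why the file size is limited only by physical storage.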

31
Directory Service
  • Requirements
  • Fast sequential traversal (e.g., ls)
  • Fast random lookup (e.g., locate file x)
  • Hard links (apply multiple names to data)

32
First Design
  • A doubly hashed table for each directory
  • Conserves space
  • Problems
  • Dynamic resizing of directories
  • Need to handle the current file position
  • Important for rm -fr

33
Second Design
  • A variant of extensible hash table for each
    directory
  • An old data structure fits nicely

34
Additional Engineering Details
  • Popular hash functions randomize lower bits
  • Dynamic file positioning
  • Need to handle collisions
  • Memory overhead and complexity tradeoffs
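
A minimal sketch of a textbook extendible hash table used as a per-directory index. It is indexed by the low bits of the hash, which fits the note that popular hash functions randomize lower bits. This is not Conquest's variant: the class names, the bucket capacity, and the use of Python's built-in `hash` are assumptions.

```python
class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.entries = {}  # file name -> metadata ID


class ExtendibleDirectory:
    def __init__(self, bucket_capacity: int = 4):
        self.bucket_capacity = bucket_capacity
        self.global_depth = 1
        self.slots = [Bucket(1), Bucket(1)]

    def _index(self, name: str) -> int:
        # Use the low `global_depth` bits of the hash as the slot index.
        return hash(name) & ((1 << self.global_depth) - 1)

    def lookup(self, name: str):
        """Fast random lookup: one hash, one slot, one bucket."""
        return self.slots[self._index(name)].entries.get(name)

    def insert(self, name: str, metadata_id: int) -> None:
        while True:
            bucket = self.slots[self._index(name)]
            if name in bucket.entries or len(bucket.entries) < self.bucket_capacity:
                bucket.entries[name] = metadata_id
                return
            # Bucket full: split and retry (assumes hashes are not all equal).
            self._split(bucket)

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == self.global_depth:
            # Double the slot table; slot i + 2^d aliases slot i.
            self.slots = self.slots + self.slots
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.slots):
            if b is bucket and (i & new_bit):
                self.slots[i] = sibling
        old_entries, bucket.entries = bucket.entries, {}
        for key, value in old_entries.items():
            self.slots[self._index(key)].entries[key] = value

    def names(self):
        """Sequential traversal (as for ls): visit each bucket once."""
        seen = set()
        for b in self.slots:
            if id(b) not in seen:
                seen.add(id(b))
                yield from b.entries


d = ExtendibleDirectory(bucket_capacity=2)
for i in range(10):
    d.insert(f"file{i}.txt", i)
print(d.lookup("file7.txt"), sorted(d.names()))
```

Growth only splits one bucket at a time, so a directory can be resized without rehashing every entry. That addresses the dynamic-resizing problem raised for the first design.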

35
Metadata Allocation
  • Requirements
  • Keep track of usage status of metadata entries
  • Avoid duplicate allocation with unique IDs
  • Fast retrieval of metadata with a given ID

36
Existing Memory Allocation
  • Services
  • Keep track of unallocated memory
  • No duplicate allocation of physical addresses
  • Hmm

37
Conquest Metadata Management
  • Metadata memory allocated by memory manager
  • Metadata ID = physical address of metadata

ADDR 0xe000000 free
ADDR 0xe000038 in use
ADDR 0xe000070 free
ADDR 0xe0000A8 free
ADDR 0xe0000E0 free
ADDR 0xe000118 in use
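
A small simulation of the idea that the memory manager's address doubles as the metadata ID. The base address and 0x38-byte stride are taken from the listing above. The `MetadataArena` class is hypothetical, and a Python dict stands in for physical memory, where a real kernel would simply dereference the address:

```python
class MetadataArena:
    """Fixed-size metadata entries; an entry's address is its ID."""

    def __init__(self, base: int = 0xE000000, entry_size: int = 0x38,
                 num_entries: int = 6):
        # Free addresses kept as a stack, lowest address on top.
        self._free = [base + i * entry_size
                      for i in reversed(range(num_entries))]
        self._memory = {}  # address -> entry contents (simulated RAM)

    def allocate(self, entry) -> int:
        """Allocate an entry; the returned address is the unique ID."""
        addr = self._free.pop()
        self._memory[addr] = entry
        return addr

    def get(self, metadata_id: int):
        """Retrieve by ID with no translation table: the ID is the location."""
        return self._memory[metadata_id]

    def release(self, metadata_id: int) -> None:
        del self._memory[metadata_id]
        self._free.append(metadata_id)


arena = MetadataArena()
first = arena.allocate({"name": "a.txt"})
second = arena.allocate({"name": "b.txt"})
print(hex(first), hex(second), arena.get(second))
```

The allocator already guarantees that no address is handed out twice, so uniqueness of IDs and tracking of free entries come for free.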

38
Persistence Support
  • Restore file system states after a reboot
  • Data
  • Metadata
  • Memory manager
  • Keep track of metadata allocation

39
Linux Memory Manager (1)
  • Page allocator maintains individual pages

Page allocator

40
Linux Memory Manager (2)
  • Zone allocator allocates memory in power-of-two
    sizes

Page allocator

41
Linux Memory Manager (3)
  • Slab allocator groups allocations by sizes to
    reduce internal memory fragmentation
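
A short worked example of why power-of-two allocation wastes space for oddly sized objects, which is the internal fragmentation that size-grouped allocation is meant to reduce. The helper names are hypothetical, and the 56-byte object size is only an illustration:

```python
def round_up_pow2(n: int) -> int:
    """Smallest power of two that is >= n."""
    if n < 1:
        raise ValueError("size must be positive")
    return 1 << (n - 1).bit_length()


def wasted_bytes(n: int) -> int:
    """Bytes lost when an n-byte request is served from a power-of-two block."""
    return round_up_pow2(n) - n


for size in (56, 4096, 4097):
    print(f"{size:>5} bytes -> block of {round_up_pow2(size):>5}, "
          f"{wasted_bytes(size):>4} wasted")
```

A request just over a power of two loses almost half of its block. Packing many same-sized objects into shared pages avoids that loss.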

Zone allocator
Page allocator

42
Linux Memory Manager (4)
  • Difficult to restore the persistent states
  • Three layers of pointer-rich mappings
  • Mixing of persistent and temporary allocations

Slab allocator
Zone allocator
Page allocator

43
Conquest Persistence
  • Create memory zones with own instantiations of
    memory managers

Slab allocator
Zone allocator
Page allocator

44
Conquest Persistence
  • Encapsulate all pointers within each zone
  • Pointers can survive reboots
  • No serialization and deserialization
  • Swapping and paging
  • Disabled for Conquest memory zones
  • Enabled for non-Conquest zones

45
Resiliency Support
  • Instantaneous metadata commit
  • No fsck (ad hoc metadata consistency check)
  • Built-in checkpointing
  • Pointer-switch commit semantics
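
A generic illustration of pointer-switch commit semantics: changes are prepared on a private copy, then published by switching one reference. This is a sketch of the general technique under that assumption, not Conquest's kernel code, and `CheckpointedState` is a hypothetical name:

```python
class CheckpointedState:
    """Holds a committed version; updates commit by switching one pointer."""

    def __init__(self, initial: dict):
        self._current = dict(initial)

    def read(self) -> dict:
        return self._current

    def update(self, mutate) -> None:
        shadow = dict(self._current)  # private working copy
        mutate(shadow)                # may fail; committed state is untouched
        self._current = shadow        # the commit: a single reference switch


state = CheckpointedState({"size": 0})
state.update(lambda s: s.update(size=4096))


def failing_update(s: dict) -> None:
    s["size"] = -1
    raise RuntimeError("simulated crash before commit")


try:
    state.update(failing_update)
except RuntimeError:
    pass

print(state.read())
```

A reader sees either the old version or the new one, never a half-applied change, so no after-the-fact consistency check is needed.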

46
Implementation Status
  • Kernel module under Linux 2.4.2
  • Fully functional and POSIX compliant
  • Modified memory manager to support Conquest
    persistence
  • Need to overcome BIOS limitations for
    distribution
  • Looking for licensing opportunities

47
Performance Evaluation
  • Architectural simplification
  • Feature count
  • Performance improvement
  • Memory-only workload
  • Memory and disk workload

48
Conventional Data Path
  • Buffer allocation management
  • Buffer garbage collection
  • Data caching
  • Metadata caching
  • Predictive readahead
  • Write behind
  • Cache replacement
  • Metadata allocation
  • Metadata placement
  • Metadata translation
  • Disk layout
  • Fragmentation management

[Diagram] Conventional File Systems: storage requests -> IO buffer
management -> IO buffer -> persistence support -> disk management -> disk

49
Memory Path of Conquest
  • Buffer allocation management
  • Buffer garbage collection
  • Data caching
  • Metadata caching
  • Predictive readahead
  • Write behind
  • Cache replacement
  • Metadata allocation
  • Metadata placement
  • Metadata translation
  • Disk layout
  • Fragmentation management

[Diagram] Conquest Memory Data Path: storage requests -> persistence
support -> battery-backed RAM (small file and metadata storage)
  • Memory manager encapsulation

50
Disk Path of Conquest
  • Buffer allocation management
  • Buffer garbage collection
  • Data caching
  • Metadata caching
  • Predictive readahead
  • Write behind
  • Cache replacement
  • Metadata allocation
  • Metadata placement
  • Metadata translation
  • Disk layout
  • Fragmentation management

[Diagram] Conquest Disk Data Path: storage requests -> IO buffer
management -> IO buffer -> disk management -> disk (large-file-only file
system), alongside battery-backed RAM (small file and metadata storage)

51
PostMark Benchmark (1)
  • Conquest is comparable to ramfs
  • At least 24% faster than the LRU disk cache
  • ISP workload (emails, web-based transactions)

40 to 250 MB working set with 2 GB physical RAM

52
PostMark Benchmark (2)
  • When both memory and disk components are
    exercised, Conquest can be several times faster
    than ext2fs, reiserfs, and SGI XFS

10,000 files, 80 MB to 3.5 GB working set with 2
GB physical RAM

53
PostMark Benchmark (3)
  • When working set > RAM, Conquest is 1.4 to 2
    times faster than ext2fs, reiserfs, and SGI XFS

10,000 files, 80 MB to 3.5 GB working set with 2
GB physical RAM

54
Sprite LFS Microbenchmarks (1)
  • Small-file benchmark
  • Operates on 10,000 1-KB files in three phases
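
A user-level sketch of a small-file benchmark in this style. The three phases are assumed to be create, read, and delete, as in the original Sprite LFS small-file benchmark. The function name and the use of a temporary directory are illustrative choices:

```python
import os
import tempfile
import time


def small_file_benchmark(directory: str, num_files: int = 10_000,
                         file_size: int = 1024) -> dict:
    """Time create, read, and delete phases over many small files."""
    payload = b"x" * file_size
    paths = [os.path.join(directory, f"f{i:05d}") for i in range(num_files)]
    timings = {}

    start = time.perf_counter()
    for path in paths:                      # phase 1: create
        with open(path, "wb") as f:
            f.write(payload)
    timings["create"] = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:                      # phase 2: read back in order
        with open(path, "rb") as f:
            f.read()
    timings["read"] = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:                      # phase 3: delete
        os.remove(path)
    timings["delete"] = time.perf_counter() - start
    return timings


# Small run for demonstration; the slide's configuration is 10,000 1-KB files.
with tempfile.TemporaryDirectory() as tmp:
    for phase, seconds in small_file_benchmark(tmp, num_files=100).items():
        print(f"{phase:>6}: {seconds:.4f} s")
```

Numbers from a run like this reflect the operating system's caches as much as the file system underneath, which is the kind of effect the later slides on odd microbenchmark numbers examine.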

55
Sprite LFS Microbenchmarks (2)
  • Modified large-file microbenchmark: 10 1-MB
    files (Conquest in-core files)

56
Sprite LFS Microbenchmarks (3)
  • Modified large-file microbenchmark: 10 1.01-MB
    files (Conquest on-disk files)

57
Sprite LFS Microbenchmarks (4)
  • Large-file microbenchmark: 40 100-MB files
    (Conquest on-disk files)

58
History's Mystery
  • Puzzling Microbenchmark Numbers

Geoffrey Kuenning: "If Conquest is slower than
ext2, I will toss you off of the balcony"

59
With me hanging off a balcony
  • Original large-file microbenchmark: 1-MB file
    (Conquest in-core file)

60
Odd Microbenchmark Numbers
  • Why are random reads slower than sequential
    reads?

61
Odd Microbenchmark Numbers
  • Why are RAM-based file systems slower than
    disk-based file systems?

62
A Series of Hypotheses
  • Warm-up effect?
  • Maybe
  • Why do RAM-based systems warm up slower?
  • Bad initial states?
  • No
  • Pentium III streaming IO option?
  • No

63
Effects of Cache Footprint Sizes
  • Large cache footprint
  • Small cache footprint

64
Sprite LFS Microbenchmarks
  • Modified large-file microbenchmark: 10 1-MB
    files (Conquest in-core files)

Random accesses are faster than sequential accesses due to
cache reuse
65
Sprite LFS Microbenchmarks (2)
  • Modified large-file microbenchmark: 10 128-KB
    files (Conquest in-core files)

Random accesses are slower than sequential accesses due to
the extra lseek
66
Lessons Learned
  • Faster than LRU caching, unexpected
  • Heavyweight disk handling
  • Severe penalty for accessing memory content
  • Matching user access patterns to storage media
    offers considerable simplification and better
    performance
  • Not an automatic result
  • Need careful design

67
More Lessons Learned
  • Effects of L2 caching become highly visible in
    memory workloads (modern workloads)
  • Cannot blindly apply existing disk-based
    microbenchmarks to measure memory performance of
    file systems
  • Need to consider states of L2 cache and memory
    behaviors at each stage of microbenchmarking

68
Additional Lessons Learned
  • Don't discuss your performance numbers next to a
    balcony... unless...

69
Related Work (1)
  • Disk caching
  • Assumption of scarce memory
  • Complex mechanisms to maintain consistency
  • Especially with the presence of metadata
  • RAM drives and RAM file systems
  • Not meant to be persistent
  • Use disk-related mechanisms
  • Limitations on storage capacity

70
Related Work (2)
  • Disk emulators
  • RAM storage accessed through SCSI interface
  • Ad hoc approaches
  • Manual transferring of files to and from ramfs
  • Capacity limitation
  • Background daemon to stage RAM files to a disk
  • Semantic and name space problems

71
Going Beyond Conquest (1)
  • Matching usage patterns with heterogeneous
    machines in the distributed domain
  • Specialized tasks for machines within a cluster
  • Preferably self-organizing and self-evolving
  • State-rich computing
  • Caching of runtime data structures
  • Similar to /tmp

72
Going Beyond Conquest (2)
  • Separate storage of metadata from data
  • Association of metadata with data of different
    fidelity
  • Opportunity for hierarchical replication across
    devices with different calibers
  • Benchmarking memory performance of file systems
  • Developing new memory benchmarks

73
Contributions
  • Demonstrated the feasibility of disk-memory
    hybrid file systems
  • Showed performance does not preclude simplicity
  • Pinpointed cache-related problems with modern
    benchmarks
  • Opened doors to many exciting areas of research

74
Conclusion
  • Conquest demonstrates how rethinking changes in
    underlying assumptions can lead to significant
    architectural and performance improvements
  • Radical changes in hardware, applications, and
    user expectations in the past decade should lead
    us to rethink other aspects of OS as well.

75
Questions . . .
Conquest: http://lasr.cs.ucla.edu/conquest
Andy Wang: awang_at_cs.ucla.edu