Title: File Management
1 File Management
UNIVERSITY of WISCONSIN-MADISON, Computer Sciences Department
CS 537: Introduction to Operating Systems
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Haryadi S. Gunawi
- All file management strategies of the last 30 years
- Advantages and disadvantages
2 Recap
- FS/Disk (bottom-up approach)
  - Low-level disk management (last lecture)
  - File management (today)
  - Naming and directories (next lecture)
- Common theme
  - I/O is much slower than processor and memory
    - CPU speed improves 2x annually
    - Seek time improves only 1.2x annually
  - Amdahl's Law: if we keep improving only part of an application (e.g., processing), we get diminishing returns in speedup
    - A process has a CPU-bound part and an I/O-bound part
    - The gain from speeding up the CPU-bound part saturates if the I/O-bound part is not improved
  - Tuning I/O performance is crucial
[Figure: layer stack, top to bottom: Naming and Directories, File management, Disk management]
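The Amdahl's Law point above can be checked numerically. This is a minimal sketch, not part of the lecture: the function name `amdahl_speedup` and the 50/50 split between CPU time and I/O time are assumptions chosen only for illustration.

```python
def amdahl_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Overall speedup when only the CPU-bound fraction of a run gets faster."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be in [0, 1]")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    io_fraction = 1.0 - cpu_fraction  # the part that is NOT improved
    return 1.0 / (io_fraction + cpu_fraction / cpu_speedup)


if __name__ == "__main__":
    # Assumed workload: half the time on the CPU, half waiting on I/O.
    for factor in (2, 4, 8, 1024):
        print(factor, round(amdahl_speedup(0.5, factor), 3))
```

Under this assumed split the overall speedup stays below 2x however fast the CPU becomes, which is why the I/O-bound part has to be tuned as well.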
3 Recap (LBN)
- A disk is commonly divided into several partitions
  - In Linux, type df -h: /dev/hda1, /dev/hda2, /dev/hda3, ... (for IDE)
  - In Linux, type df -h: /dev/sda1, /dev/sda2, /dev/sda3, ... (for SCSI)
  - In Windows: C:, D:, E:, F:
- Each partition is mapped to a region on the disk
- Each disk partition starts at LBN 0
  - Max LBN = (partition size / block size) - 1, because numbering starts at 0
  - Common block size: 4 KB (8 sectors)
[Figure: /dev/hda1 and /dev/hda2 side by side, each with its own LBNs numbered 0, 1, 2, 3, 4, ...]
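The LBN arithmetic above can be sketched as follows. This is an illustration under stated assumptions: 512-byte sectors (implied by "4 KB = 8 sectors"), and a hypothetical `partition_start_sector` giving where the partition's region begins on the disk. None of these names come from the lecture.

```python
SECTOR_SIZE = 512   # bytes, assumed; consistent with 4 KB = 8 sectors
BLOCK_SIZE = 4096   # bytes, the common block size from the slide
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE


def block_count(partition_bytes: int) -> int:
    """Number of whole blocks in a partition; valid LBNs are 0 .. count - 1."""
    return partition_bytes // BLOCK_SIZE


def lbn_to_sector(partition_start_sector: int, partition_bytes: int, lbn: int) -> int:
    """First disk sector of a logical block inside one partition."""
    if not 0 <= lbn < block_count(partition_bytes):
        raise ValueError("LBN is outside this partition")
    return partition_start_sector + lbn * SECTORS_PER_BLOCK


if __name__ == "__main__":
    one_gib = 1 << 30
    print(block_count(one_gib))                # blocks in a 1 GiB partition
    print(lbn_to_sector(2048, one_gib, 6543))  # hypothetical start sector 2048
```

Because every partition restarts its numbering at LBN 0, the same LBN in two partitions maps to two different disk sectors.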
4
emperor11% df -h
Filesystem  Size   Used  Avail  Use%  Mounted on
/dev/hda1   996M   453M  492M   48%   /
/dev/hda8   14G    171M  14G    2%    /tmp
/dev/hda5   3.9G   584M  3.2G   16%   /var
/dev/hda9   996M   710M  235M   76%   /var/vice/cache
/dev/hda7   996M   34M   911M   4%    /var/tmp
/dev/hda3   4.9G   143M  4.5G   4%    /var/home
/dev/hda2   9.7G   5.4G  3.9G   59%   /usr
tmpfs       1013M  0     1013M  0%    /dev/shm
AFS         8.6G   0     8.6G   0%    /afs
emperor11% mount
/dev/hda1 on / type ext3 (rw)
/dev/hda8 on /tmp type ext3 (rw)
/dev/hda5 on /var type ext3 (rw)
/dev/hda9 on /var/vice/cache type ext3 (rw)
/dev/hda7 on /var/tmp type ext3 (rw)
/dev/hda3 on /var/home type ext3 (rw)
/dev/hda2 on /usr type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
AFS on /afs type afs (rw)
5 Layers
- Human
  - Jump to slide 20 of /tmp/slides.ppt -> random access
  - Say /tmp/ is mounted on /dev/hda4
- PowerPoint application
  - Convert slide 20 to a byte offset (e.g., the 20000th byte)
- System call
  - read(/tmp/slides.ppt, byte offset 20000)
- File system (today)
  - Get the file information of /tmp/slides.ppt -> inode 76
  - Convert the byte offset into a block offset within the file (e.g., block offset 20)
  - Get the block number at block offset 20 (e.g., block number 6543)
  - To block layer: read logical block number 6543 (logical with respect to this partition, /dev/hda4)
- Block layer
  - Converts LBN 6543 of /dev/hda4 to a disk sector
  - Block layer to device driver: read sector
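The file system's first step, turning a byte offset into a block offset, is an integer division. This is a minimal sketch assuming 4 KB blocks (the common block size from the recap); the function name `byte_to_block` is hypothetical. The numbers on the slide are illustrative: under the 4 KB assumption, byte offset 20000 falls in block offset 4 rather than 20.

```python
from typing import Tuple

BLOCK_SIZE = 4096  # bytes, assumed


def byte_to_block(byte_offset: int, block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Return (block offset within the file, byte position inside that block)."""
    if byte_offset < 0:
        raise ValueError("byte offset must be non-negative")
    return divmod(byte_offset, block_size)


if __name__ == "__main__":
    block_offset, within_block = byte_to_block(20000)
    print(block_offset, within_block)
```

The allocation strategies that follow all answer the next question in the chain: given a block offset in the file, which LBN holds it?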
6 Workloads
- Motivation: workloads influence the design of the file system (e.g., file management)
- File characteristics (measurements of UNIX and NT)
  - Most files are small (about 8 KB)
  - Most of the disk is allocated to large files
    - (90% of data is in 10% of files)
- Access patterns
  - Sequential: data in the file is read/written in order
    - Most common access pattern
  - Random (direct): access a block without referencing its predecessors
    - Difficult to optimize
7 Allocation Strategies
- How does an inode manage its data blocks?
- Progression of different approaches
  - Contiguous Allocation
  - Extent-based
  - Linked
  - File-allocation Tables
  - Indexed
  - Multi-level Indexed
- Questions
  - Inode simplicity?
  - Inode space overhead (wasted space for pointers to data blocks)?
  - Amount of fragmentation (internal and external)?
  - Ability to grow file over time?
  - Seek cost for sequential accesses?
  - Speed to find data blocks for random accesses?
8 Contiguous Allocation
- Allocate each file to contiguous blocks (contiguous LBNs) on disk
  - Example: IBM OS/360 (30 years ago)
- Inode stores the starting block (base) and the size of the file
  - Inode A (base 2, size 3), Inode B (base 6, size 4), Inode C (base 10, size 3)
  - Inode D (base 100, size 50): read block offset 20 -> read LBN 120 (base + block offset)
- OS allocates by finding sufficient free space
  - Must predict the future size of the file. Should space be reserved?
- Inode simplicity?
  - Simple, with very little overhead for storing base and size
- Inode space overhead (wasted space for pointers to data blocks)?
  - None; only two variables are stored
- Amount of external fragmentation?
  - Horrible external fragmentation (requires periodic compaction -> undesirable)
- Ability to grow file over time?
  - Bad; a file may not be able to grow without being moved
- Seek cost for sequential accesses?
  - Excellent performance, because a file is always in contiguous blocks
- Speed to find data blocks for random accesses?
  - Fast, because it is simple to calculate random addresses (base + block offset)
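The "base + block offset" lookup can be sketched as below. The class and method names are hypothetical; the inode values are the ones from the slide.

```python
from dataclasses import dataclass


@dataclass
class ContiguousInode:
    base: int  # LBN of the file's first block
    size: int  # file length in blocks

    def lbn_of(self, block_offset: int) -> int:
        """Map a block offset within the file to an LBN with one addition."""
        if not 0 <= block_offset < self.size:
            raise IndexError("block offset is beyond the end of the file")
        return self.base + block_offset


if __name__ == "__main__":
    inode_d = ContiguousInode(base=100, size=50)
    print(inode_d.lbn_of(20))  # 120, as on the slide
```

No disk access is needed to find the address, which is why random access is fast under this scheme.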
9 Extent-Based Allocation
- Allocate multiple contiguous regions (extents) per file
- Meta-data stores a small array (2-6 entries)
  - Each entry designates an extent
  - Each entry: starting block (base) and size
  - Inode D: Extent0 (base 0, size 2), Extent1 (base 5, size 1)
  - Inode B: Extent0 (base 6, size 4), Extent1 (base 13, size 2)
[Figure: disk layout, LBN 0-14: D D A A A D B B B B C C C B B]
- Inode simplicity?
  - Simple
- Inode space overhead (wasted space for pointers to data blocks)?
  - Yes, if the array has many entries but the file is small (only the first couple of entries are used)
- Amount of external fragmentation?
  - Helps with external fragmentation, but it can still be a problem
- Ability to grow file over time?
  - File can grow over time (until it runs out of extents)
- Seek cost for sequential accesses?
  - Very good performance for sequential accesses
  - An extent consists of contiguous blocks
- Speed to find data blocks for random accesses?
  - Simple to calculate random addresses
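Finding a block with extents means walking the small extent array and subtracting extent sizes until the offset fits. A minimal sketch with hypothetical names (`Extent`, `extent_lookup`), using the Inode B and Inode D extents from the slide:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Extent:
    base: int  # LBN of the first block in the extent
    size: int  # extent length in blocks


def extent_lookup(extents: List[Extent], block_offset: int) -> int:
    """Map a block offset within the file to an LBN by scanning the extents."""
    if block_offset < 0:
        raise IndexError("block offset must be non-negative")
    remaining = block_offset
    for extent in extents:
        if remaining < extent.size:
            return extent.base + remaining
        remaining -= extent.size  # skip past this whole extent
    raise IndexError("block offset is beyond the end of the file")


if __name__ == "__main__":
    inode_b = [Extent(6, 4), Extent(13, 2)]
    inode_d = [Extent(0, 2), Extent(5, 1)]
    print(extent_lookup(inode_b, 4))  # 13: first block of B's second extent
    print(extent_lookup(inode_d, 2))  # 5: D's only block in its second extent
```

The scan touches only the in-inode array, so the lookup still costs no extra disk reads.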
10 Linked Allocation
- Allocate a linked list of fixed-size blocks
  - Examples: DEC TOPS-10 (1960s), Xerox Alto (first PC, early 1970s)
- Inode stores the location of the first block of the file (base)
- Each block also contains a pointer to the next block
  - Inode D: base 0
  - Block 0 -> Block 1 -> Block 5 -> Block 15 -> Block 17
- Inode simplicity?
  - Simple (just store the head of the list)
- Inode space overhead (wasted space for pointers to data blocks)?
  - None
- Amount of external fragmentation?
  - None, because a file can live in non-contiguous blocks
- Ability to grow file over time?
  - Files can easily grow, with no limit
- Seek cost for sequential accesses?
  - Sequential access may not be good, especially if the data blocks are scattered
  - Try to allocate the blocks of a file contiguously for best performance
- Speed to find data blocks for random accesses?
  - Horrible; cannot calculate random addresses without reading the previous blocks
  - That is, reading the n-th block of a file requires N I/Os to get to the n-th block
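The cost of random access in linked allocation can be made concrete by counting block reads. In this sketch the "disk" is a hypothetical in-memory dictionary holding only each block's next pointer, filled in with file D's chain from the slide; `linked_lookup` is an illustrative name.

```python
from typing import Dict, Optional, Tuple

# Hypothetical disk contents: LBN -> next-block pointer stored inside that block.
NEXT_POINTER: Dict[int, Optional[int]] = {0: 1, 1: 5, 5: 15, 15: 17, 17: None}


def linked_lookup(base: int, block_offset: int) -> Tuple[int, int]:
    """Return (LBN, block reads) needed to reach a block offset (0-based)."""
    if block_offset < 0:
        raise IndexError("block offset must be non-negative")
    lbn = base
    reads = 1  # the target block itself must be read
    for _ in range(block_offset):
        next_lbn = NEXT_POINTER[lbn]  # the pointer is only known after reading the block
        if next_lbn is None:
            raise IndexError("block offset is beyond the end of the file")
        lbn = next_lbn
        reads += 1
    return lbn, reads


if __name__ == "__main__":
    print(linked_lookup(0, 3))  # (15, 4): the 4th block costs 4 reads
```

Offsets here are 0-based, so the block at offset k is the (k+1)-th block and costs k+1 reads, matching the "n-th block needs N I/Os" statement.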
11 File-Allocation Table (FAT)
- Variation of linked allocation
  - Example: DOS (Gates and McDonald) in the 1970s
- Keep the linked-list information for all files in an on-disk FAT
  - Cache the FAT in main memory
- An inode stores the location of the first block of the file (base)
- Example
  - Inode for file D starts at 0 (D's blocks: 0, 1, 5, 15, 17)
  - Inode for file A starts at 2 (A's blocks: 2, 3, 4)
- Comparison to linked allocation
  - Same basic advantages and disadvantages
  - Disadvantage: read from two disk locations (the FAT and the data block) for every data read
  - Advantage: greatly improves random accesses (read from the cached FAT)
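With the FAT cached in memory, the chain walk no longer touches the disk. A minimal sketch, assuming a dictionary as the cached table and a hypothetical `END_OF_CHAIN` marker; only the chains of files D and A from the slide are filled in.

```python
from typing import Dict

END_OF_CHAIN = -1  # hypothetical marker for the last block of a file

# Cached FAT: entry for LBN x holds the LBN of the next block of the same file.
FAT: Dict[int, int] = {
    0: 1, 1: 5, 5: 15, 15: 17, 17: END_OF_CHAIN,  # file D
    2: 3, 3: 4, 4: END_OF_CHAIN,                  # file A
}


def fat_lookup(fat: Dict[int, int], base: int, block_offset: int) -> int:
    """Follow the chain in memory; only the final data block needs a disk read."""
    if block_offset < 0:
        raise IndexError("block offset must be non-negative")
    lbn = base
    for _ in range(block_offset):
        lbn = fat[lbn]
        if lbn == END_OF_CHAIN:
            raise IndexError("block offset is beyond the end of the file")
    return lbn


if __name__ == "__main__":
    print(fat_lookup(FAT, 0, 3))  # 15: fourth block of file D
    print(fat_lookup(FAT, 2, 2))  # 4: third block of file A
```

The walk is still linear in the block offset, but each step is a memory access rather than a disk read.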
12 Indexed Allocation
- Allocate fixed-size blocks for each file
- Inode stores a fixed-size array of block pointers
  - Allocate space for the pointers at file creation time (e.g., 5 pointers max)
  - Inode D: array contains 0, 1, 5, 15, 17
  - Inode A: array contains 2, 3, 4, x, x
[Figure: disk layout with file D in blocks 0, 1, 5, 15, 17 and file A in blocks 2, 3, 4; files B and C occupy the remaining blocks]
- Inode simplicity?
  - Simple (just an array of direct pointers that point to the data blocks)
- Inode space overhead (wasted space for pointers to data blocks)?
  - Large overhead; space is wasted on unneeded pointers (and most files are small)
- Amount of external fragmentation?
  - None, because a file can live in non-contiguous blocks
- Ability to grow file over time?
  - Files can easily grow, up to the limit of the array size
- Seek cost for sequential accesses?
  - Sequential access may not be good, because data blocks could be scattered
  - Try to allocate the blocks of a file contiguously for best performance
- Speed to find data blocks for random accesses?
  - Very fast: reading block offset N just means reading the LBN stored in Array[N]
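The Array[N] lookup is a plain array index. A minimal sketch with hypothetical names, using `None` for the unused slots shown as "x" on the slide:

```python
from typing import List, Optional

MAX_POINTERS = 5  # the example limit from the slide


def indexed_lookup(pointers: List[Optional[int]], block_offset: int) -> int:
    """Return the LBN stored at the given index of the inode's pointer array."""
    if not 0 <= block_offset < len(pointers):
        raise IndexError("block offset exceeds the pointer array")
    lbn = pointers[block_offset]
    if lbn is None:
        raise IndexError("no block allocated at this offset")
    return lbn


if __name__ == "__main__":
    inode_d = [0, 1, 5, 15, 17]
    inode_a = [2, 3, 4, None, None]  # two pointers are wasted for this small file
    print(indexed_lookup(inode_d, 3))  # 15
    print(indexed_lookup(inode_a, 1))  # 3
```

The two `None` entries in `inode_a` are the space overhead the slide describes, and the fixed array length is the growth limit.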
13 Multi-Level Indexed Files
- Principle: add another level of indirection
- Variation of indexed allocation
  - Examples: UNIX FFS-based file systems, Linux ext2
- Dynamically allocate a hierarchy of pointers to blocks as needed
- Inode: a small number of pointers allocated statically
  - 10 direct pointers (point to data blocks)
  - 1 indirect pointer (points to an indirect block)
    - If a block is 4 KB and a pointer is 4 B, a block holds 1024 pointers. Hence the indirect block can point to at most 1024 data blocks
  - 1 double indirect pointer (points to a double indirect block)
    - The double indirect block points to 1024 indirect blocks
[Figure: inode pointing to indirect, double indirect, and triple indirect blocks]
14 Multi-Level Indexed Files
- Examples (how many I/Os?)
  - Read block offset 4
    - 1 I/O: read the LBN stored in Inode.Direct[4]
  - Read block offset 23 (= 10 + 13)
    - 2 I/Os (read the indirect block, and then the data block)
    - Read the indirect block pointed to from the inode
    - Read the direct pointer stored at entry 13 (i.e., IndirectBlock[13])
  - Read block offset 23, and then read block offset 33
    - 2 I/Os for the first read; the indirect block is then cached, so 1 I/O for the second read
  - Read block offset 4000 (= 10 + 1024 + 1024 + 1024 + 918)
    - 3 I/Os
    - Read the double indirect block pointed to from the inode
    - Read the 3rd indirect pointer in the double indirect block (i.e., DoubleIndirectBlock[2]) -> we get an indirect block
    - Read the direct pointer at entry 918 (i.e., IndirectBlock[918]) to get the data block
- Comparison to indexed allocation
  - Advantage: does not waste space for unneeded pointers
    - Still fast access for small files
  - Disadvantage: need to read indirect blocks of pointers to calculate addresses (extra disk reads)
    - Keep indirect blocks cached in main memory
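The worked examples above follow one rule: subtract the capacity of each level until the block offset fits. The sketch below assumes the layout from the previous slide (10 direct pointers, 1024 pointers per block) plus the triple indirect pointer mentioned on the observations slide. It also assumes the inode is already in memory and nothing else is cached. The function name `resolve` is hypothetical.

```python
from typing import List, Tuple

DIRECT_POINTERS = 10
POINTERS_PER_BLOCK = 1024  # 4 KB block / 4 B pointer


def resolve(block_offset: int) -> Tuple[str, List[int], int]:
    """Return (level, index at each pointer level, block reads needed)."""
    if block_offset < 0:
        raise ValueError("block offset must be non-negative")
    p = POINTERS_PER_BLOCK
    n = block_offset
    if n < DIRECT_POINTERS:
        return "direct", [n], 1           # data block only
    n -= DIRECT_POINTERS
    if n < p:
        return "indirect", [n], 2         # indirect block + data block
    n -= p
    if n < p * p:
        return "double indirect", [n // p, n % p], 3
    n -= p * p
    if n < p ** 3:
        return "triple indirect", [n // (p * p), (n // p) % p, n % p], 4
    raise ValueError("block offset is beyond the largest supported file")


if __name__ == "__main__":
    for offset in (4, 23, 4000):
        print(offset, resolve(offset))
```

For offset 4000 the sketch yields indices 2 and 918, matching DoubleIndirectBlock[2] and IndirectBlock[918] in the example.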
15 Multi-Level Indexed (Observations)
- What is the largest file we can create?
  - A block is 4 KB
    - 1024^3 + 1024^2 + 1024 + 10 pointers to data blocks ≈ 1 G pointers
    - Total: 1 G x 4 KB = 4 TB
  - A block is 512 B (1/2 KB)
    - 128^3 + 128^2 + 128 + 10 pointers ≈ 2 M pointers
    - Total: 2 M x 1/2 KB = 1 GB
- Block size is an important parameter!
  - Competing goals
    - If the block size is too big, we get internal fragmentation
    - If the block size is too small, the largest file size is lowered
- Inode complexity
  - The inode keeps 10 direct pointers, 1 indirect pointer, 1 double indirect pointer, and 1 triple indirect pointer
  - This is quite messy! Why not maintain a pure tree (i.e., keep just the triple indirect pointer in the inode)?
    - Reading block offset 0 of a file would take 4 I/Os! Bad!
    - Read the triple indirect block, the double indirect block, the indirect block, and finally the data block
    - But lots of files are small -> we will access small block offsets a lot
  - Principle: make the common case fast, even if the data structure becomes a little messy (hence the multi-level indexed design)
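The largest-file arithmetic on this slide can be reproduced exactly. A minimal sketch, assuming 4-byte pointers and the 10 direct + single + double + triple indirect layout described above; the function names are hypothetical.

```python
def data_pointers(block_size: int, pointer_size: int = 4, direct: int = 10) -> int:
    """Total number of data blocks reachable from one inode."""
    per_block = block_size // pointer_size  # pointers held by one indirect block
    return direct + per_block + per_block ** 2 + per_block ** 3


def max_file_size(block_size: int, pointer_size: int = 4, direct: int = 10) -> int:
    """Largest file size in bytes for the given block size."""
    return data_pointers(block_size, pointer_size, direct) * block_size


if __name__ == "__main__":
    for block_size in (4096, 512):
        size = max_file_size(block_size)
        print(block_size, data_pointers(block_size), size, round(size / 2 ** 30, 2), "GiB")
```

The exact results are slightly above the rounded 4 TB and 1 GB figures because the slide rounds the pointer counts down to 1 G and 2 M.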