File System Part 2 - PowerPoint PPT Presentation

Title: File System Part 2

1
File System (Part 2)
UNIVERSITY of WISCONSIN-MADISON
Computer Sciences Department
CS 537: Introduction to Operating Systems
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Haryadi S. Gunawi

Directories and Naming
From pathname to inode
Tree-structure Directory hierarchy
Block and Inode allocation
Mechanism (bitmaps) and policy
More optimizations

2
Layers

Human
Jump to slide 20 of /tmp/slides.ppt -> random access
Say /tmp/ is mounted on /dev/hda4
PowerPoint application
Convert slide 20 to a byte offset (e.g. the 20000th byte)
System call
read("/tmp/slides.ppt", byte offset 20000)
File System
Get the file information of /tmp/slides.ppt -> inode 76
Convert the byte offset into a block offset within the file (e.g. block offset 20)
Get the block number at block offset 20 (e.g. block number 6543)
To block layer: read logical block number 6543 (logical with respect to this partition, /dev/hda4)
Block layer
Converts LBN 6543 of /dev/hda4 to a disk sector
Block layer to device driver: read sector
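
The chain of translations on this slide can be sketched in a few lines. This is only an illustration, not a real kernel code path: the 4 KB block size, the 512-byte sector size, the partition start sector, and the block-pointer table for inode 76 are all assumptions invented for the example. With these assumed sizes, byte 20000 lands in block offset 4; the slide's block offset 20 is an illustrative number.

```python
BLOCK_SIZE = 4096            # assumed file system block size (bytes)
SECTOR_SIZE = 512            # assumed disk sector size (bytes)
PARTITION_START = 1_000_000  # hypothetical first sector of /dev/hda4


def byte_to_block_offset(byte_offset):
    """File system: split a byte offset into (block offset, offset in block)."""
    return divmod(byte_offset, BLOCK_SIZE)


def block_offset_to_lbn(block_map, block_offset):
    """File system: look up the logical block number in the inode's pointers."""
    return block_map[block_offset]


def lbn_to_sector(lbn):
    """Block layer: convert a partition-relative LBN to an absolute sector."""
    return PARTITION_START + lbn * (BLOCK_SIZE // SECTOR_SIZE)


# Hypothetical block pointers for inode 76 (/tmp/slides.ppt).
inode_76_blocks = {0: 6100, 1: 6101, 2: 6102, 3: 6542, 4: 6543}

block_offset, within_block = byte_to_block_offset(20000)
lbn = block_offset_to_lbn(inode_76_blocks, block_offset)
sector = lbn_to_sector(lbn)
print(block_offset, within_block, lbn, sector)
```

Each function corresponds to one layer boundary, so a change in block size or partition layout touches only one of them.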

Today
3
Abstraction: File

User view
Named collection of bytes
Untyped or typed
Examples: text, source, object, executables, application-specific
Permanently and conveniently available
Operating system view
Map bytes as a collection of blocks on a physical non-volatile storage device
Magnetic disks, tapes, NVRAM, battery-backed RAM
Persistent across reboots and power failures
File meta-data (or inode): additional system information associated with each file
Name of file
Type of file (e.g. directory, regular file, device file, symbolic link, etc.)
Pointers to data blocks on disk
File size
Times: creation, access, modification
Owner and group id
Protection bits (read or write)
Inode is stored on disk
Conceptually, meta-data can be stored as an array on disk
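
If inodes are fixed-size records laid out as an array, an inode number can be turned into a disk location with simple arithmetic. A minimal sketch, assuming hypothetical values for the block size, the on-disk inode size, and the first block of the inode array:

```python
BLOCK_SIZE = 4096        # assumed block size (bytes)
INODE_SIZE = 128         # assumed on-disk inode size (bytes)
INODE_TABLE_START = 10   # hypothetical first block of the inode array

INODES_PER_BLOCK = BLOCK_SIZE // INODE_SIZE


def locate_inode(ino):
    """Return (block number, byte offset within that block) of inode `ino`."""
    block = INODE_TABLE_START + ino // INODES_PER_BLOCK
    offset = (ino % INODES_PER_BLOCK) * INODE_SIZE
    return block, offset


print(locate_inode(76))
```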

4
Abstraction: Directories

Organization technique: map file name to blocks of file data on disk
Actually, map file name to file inode (which enables one to find the data on disk)
/tmp/slides.ppt -> inode 76
Old (bad) approaches
Single-level directory
Each file has a unique name
Special part of disk holds the directory listing
Two-level directory
Directory for each user
Specify file with user name and file name

5
Directories: Tree-Structured

Directory in Linux file system
Directory is stored and treated like a file
Special bit set in meta-data for directories
User programs can read directories
Only system programs can write directories
Directory listing is stored in the file's data blocks in the form of <inode, name>
name can be of arbitrary length (up to a maximum limit)
Array of inodes is stored in a fixed location on the disk
Special directories
Root: fixed index for meta-data (e.g., 2)
This directory: .
Parent directory: ..

6
Absolute path to inode
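Figure: directory inodes and their data blocks
Inode 2: Type DIR, direct0, direct1
Data block of inode 2: <2, .> <2, ..> <50, a> <51, tmp> <52, boot>
Inode 50: Type DIR, direct0, direct1
Data block of inode 50: <50, .> <2, ..> and an entry for b
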
Example: read("/a/b/c") // read some bytes from c
Read inode 2, look for "a", find <a, 50>
Read inode 50, look for "b", find <b, 900>
Read inode 900, look for "c", find <c, 2000>
Read inode 2000, and get the data block
Total: 8 I/Os (roughly)
(if the inodes are not in memory initially, and all the inodes needed in this example are stored in different blocks)
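
The I/O count can be checked with a toy model of the walk. This is a sketch under the slide's own assumptions (nothing cached, one data block per directory); the `Disk` class and its dictionary layout are invented for illustration.

```python
class Disk:
    """Toy disk: every inode read and every data-block read costs one I/O."""

    def __init__(self, inodes):
        self.inodes = inodes   # inode number -> {"type": ..., "data": ...}
        self.io_count = 0

    def read_inode(self, ino):
        self.io_count += 1
        return self.inodes[ino]

    def read_data(self, inode):
        self.io_count += 1
        return inode["data"]


def lookup(disk, path, root_ino=2):
    """Resolve an absolute path component by component; return the file data."""
    ino = root_ino
    for name in path.strip("/").split("/"):
        directory = disk.read_inode(ino)       # 1 I/O: directory inode
        entries = disk.read_data(directory)    # 1 I/O: directory data block
        ino = entries[name]                    # <name, inode> entry
    return disk.read_data(disk.read_inode(ino))  # 2 I/Os: file inode + data


disk = Disk({
    2:    {"type": "DIR", "data": {".": 2, "..": 2, "a": 50, "tmp": 51, "boot": 52}},
    50:   {"type": "DIR", "data": {".": 50, "..": 2, "b": 900}},
    900:  {"type": "DIR", "data": {".": 900, "..": 50, "c": 2000}},
    2000: {"type": "FILE", "data": b"contents of c"},
})
data = lookup(disk, "/a/b/c")
print(data, disk.io_count)
```

Three directories cost two I/Os each, and the file itself costs two more.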

7
Working Directory and Tilde

Ideas
Cumbersome to write the full path
Bad for the OS if you always give the full path
Example: current working directory is "lecture"
open("/mylife/college/courses/cs/537/lecture/14/notes")
OS must traverse the directory hierarchy every time -> lots of I/Os
If we use the cwd, the parent directories (e.g. /mylife/college/courses/cs/537) do not have to be in memory
Store cwd in PCB
PCB stores the inode number of the cwd
read("14/notes") -> generally speaking, only 6 I/Os
Read lecture's inode, lecture's dir data block (to get the inode for folder 14)
Read 14's inode, 14's dir data block (to get the inode for notes)
Read notes' inode, and notes' data block
Tilde
Not a feature of the file system (chdir("~mjordan") -> fail!)
Look up the entry for mjordan in the /etc/passwd file to get the absolute path name
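
Since the file system does not understand the tilde, something in user space (typically the shell) has to expand it before calling chdir() or open(). A minimal sketch of that lookup follows. The passwd text is a made-up sample, the home directory /home/mjordan is hypothetical, and a bare "~" (the current user) is not handled.

```python
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/sh
mjordan:x:1001:1001:M Jordan:/home/mjordan:/bin/sh
"""


def home_directory(user, passwd_text):
    """Return the home directory (6th colon-separated field) for `user`."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 7 and fields[0] == user:
            return fields[5]
    raise KeyError(user)


def expand_tilde(path, passwd_text):
    """Expand a leading ~user into an absolute path; leave other paths alone."""
    if not path.startswith("~"):
        return path
    user, sep, rest = path[1:].partition("/")
    return home_directory(user, passwd_text) + sep + rest


print(expand_tilde("~mjordan", SAMPLE_PASSWD))
print(expand_tilde("~mjordan/notes", SAMPLE_PASSWD))
```

Python's own os.path.expanduser performs the same kind of expansion in user space.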

8
Acyclic-Graph Directories

More general than tree structure
Add connections across the tree (no cycles)
Create links from one file (or directory) to another
Hard link: ln a b (a must already exist)
Idea: can use name a or b to get to the same file data
Implementation: multiple directory entries point to the same inode
What happens when you remove a? Does b still exist?
Yes
How is this feature implemented? A counter (link count) in the inode
Unix does not allow hard links to directories. Why?
There's only one entry for ".."
"Please search for file A" would never end
Due to the cyclic graph
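
The link counter can be observed from user space. A small sketch using Python's os.link and os.stat in a temporary directory; the file names a and b mirror the slide, and it assumes a file system that supports hard links.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a")
    b = os.path.join(d, "b")

    with open(a, "w") as f:
        f.write("hello")

    os.link(a, b)                          # same effect as: ln a b
    same_inode = os.stat(a).st_ino == os.stat(b).st_ino
    links_before = os.stat(a).st_nlink     # counter in the inode

    os.remove(a)                           # removes one directory entry
    links_after = os.stat(b).st_nlink      # counter decremented
    with open(b) as f:
        data_via_b = f.read()              # data still reachable through b

print(same_inode, links_before, links_after, data_via_b)
```

The data blocks are freed only when the counter drops to zero, which is why removing a leaves b intact.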

9
Acyclic-Graph Directories

Symbolic (soft) link
Some kind of shortcut
ln -s /a/very/long/directory/path/ /shortpath
shortpath is a special file (designated by a bit in the meta-data)
Content of shortpath's data block is /a/very/long/directory/path
Optimization: put the pathname in the inode array that is normally used for pointers to data blocks and indirect blocks
Only if the pathname is short
To access the pathname, no need to read a data block
ls -l /shortpath
shortpath -> /a/very/long/directory/path/
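
A symbolic link stores a pathname rather than pointing at an inode, which can be seen with os.symlink and os.readlink. The sketch below works in a temporary directory, so the paths are stand-ins for the slide's /a/very/long/directory/path and /shortpath. The last step shows a consequence of storing only a name: the link dangles once its target is removed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "a", "very", "long", "directory", "path")
    os.makedirs(target)
    link = os.path.join(d, "shortpath")

    os.symlink(target, link)                 # same effect as: ln -s target link
    is_link = os.path.islink(link)           # marked as a special file
    stored_path = os.readlink(link)          # the pathname kept in the link
    resolves = os.path.samefile(link, target)

    os.rmdir(target)                         # remove the target directory
    dangling = os.path.islink(link) and not os.path.exists(link)

print(is_link, stored_path == target, resolves, dangling)
```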

10
Bitmaps (space management)

How do we know which blocks are free/allocated?
Old days
Free list (like in Project 3)
Poor free-list organization
Consecutive file blocks not close together
Pay seek cost even between sequential disk transfers
Today: bitmaps
1 bit for each block
Very small space overhead: 1 bit per 4 KB block (0.003%)
400 GB file system -> 100 Mbits = 12.5 MB
Automatically merges adjacent free blocks
Bitmap vs. free list
Bitmap is useful for space management of fixed-size chunks
The smallest allocation unit is large (e.g. a 4 KB block) -> small space overhead
Number of requested units is small -> scanning the bitmap is fast
Free list is useful for variable-size chunks
The smallest allocation unit is small (e.g. in Project 3, the unit is 4 bytes) -> a bitmap would have high space overhead
Number of requested units can be large (e.g. Alloc(1000 bytes)) -> scanning a bitmap would be slow
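
A bitmap allocator is short enough to sketch in full. The class below is an illustrative first-fit allocator, not any particular file system's code; `BlockBitmap` and `bitmap_bytes` are names made up for the example. Adjacent free blocks need no explicit merging, because two neighbouring zero bits already form a free run. The last line reproduces the slide's 12.5 MB figure, assuming binary units (400 GiB, 4 KiB blocks).

```python
class BlockBitmap:
    """One bit per block: 0 = free, 1 = allocated."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_allocated(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def _set(self, block, allocated):
        if allocated:
            self.bits[block // 8] |= 1 << (block % 8)
        else:
            self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def allocate_run(self, count):
        """First-fit scan for `count` adjacent free blocks; return the first."""
        run_start, run_len = 0, 0
        for block in range(self.num_blocks):
            if self.is_allocated(block):
                run_start, run_len = block + 1, 0
                continue
            run_len += 1
            if run_len == count:
                for b in range(run_start, run_start + count):
                    self._set(b, True)
                return run_start
        raise MemoryError("no run of free blocks of that length")

    def free(self, block):
        self._set(block, False)


def bitmap_bytes(fs_bytes, block_bytes=4096):
    """Size in bytes of the bitmap for a file system of `fs_bytes` bytes."""
    return fs_bytes // block_bytes // 8


bm = BlockBitmap(16)
first = bm.allocate_run(3)     # takes blocks 0-2
second = bm.allocate_run(2)    # takes blocks 3-4
bm.free(1)
bm.free(2)                     # freed neighbours form a 2-block run again
third = bm.allocate_run(2)     # reuses blocks 1-2
print(first, second, third, bitmap_bytes(400 * 2**30) / 2**20)
```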

11
Allocation Policy (1)

Problems
Create "file.txt" in /tmp: which inode should represent file.txt?
"Give me 10 new data blocks": which data blocks to give?
A good policy needs to match the common workload
Workloads influence the design of the file system
File characteristics (measurements of UNIX and NT)
Most files are small (about 8 KB)
Most of the disk is allocated to large files (90% of data is in 10% of files)
Access patterns
Sequential: data in file is read/written in order
Most common access pattern
Random (direct): access a block without referencing its predecessors
Difficult to optimize
Access files in the same directory together
Spatial locality
Access meta-data when accessing file data
Need meta-data to find data

12
Allocation Policy (2)

Old days: unorganized free list
Initially, files have contiguous data blocks
But file systems are long-lived entities
No locality in allocation to disk
In the end, data are scattered
Inodes far from data blocks
Pay a long seek for every data transfer
Inodes of files in a directory not close together
Pay a seek for every inode (e.g., ls -l)

13
Allocation Policy (3)

Implications: locality-driven policy
Large files should be allocated sequentially
Files in the same directory should be allocated near each other
Data should be allocated near its meta-data
Spread unrelated data far apart
Leaves room for related things to be placed together
Keep free space on disk
Always find a free block nearby
90% rule of thumb

14
BSD/Linux Solution: Cylinder Groups

Divide disk into cylinder groups
Set of adjacent cylinders
Little seek time between cylinders in the same group
Each cylinder group contains
Superblock (contains information about where
Vary offset within each cylinder group for reliability
Otherwise all superblock copies will be on the same platter
Inodes
Fixed number per cylinder group
Bitmap of free blocks
Usage summary for high-level allocation policy
Data blocks
Cylinder groups vs. 1 cylinder group
Compare this approach with the default approach, where the array of inodes is stored at the beginning of the disk
Lots of seeks! Because you need to read the inode and its data (in both cases: reading/writing directories and regular files)
The inode is at the beginning of the disk and the data could be in any part of the disk

15
Solution to Achieving Locality

Maintain locality of each file
Allocate runs of blocks within a cylinder group
Maintain locality of files and inodes in a directory
Problem: create a file "file.txt" inside directory "dir"; where should file.txt be located?
Keep files in a directory in the same cylinder group
Make room for locality within a directory
Spread out directories among the cylinder groups
A new directory is placed in a cylinder group that has a greater than average number of free inodes and the smallest number of directories
The files in the directory will be placed in the same cylinder group
Goal: inode clustering policies succeed most of the time
Switch to a different cylinder group for large files
After 48 KB and every 1 MB thereafter
Prevent one file from filling a cylinder group
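
The two placement rules on this slide can be written down directly. This is a sketch of the policy as stated, not the BSD implementation: the per-group summary is a hypothetical list of dictionaries, and `chunk_of` is one reading of "after 48 KB and every 1 MB thereafter", where each new chunk would be placed in a different cylinder group.

```python
KB = 1024
MB = 1024 * KB


def pick_group_for_directory(groups):
    """Pick a cylinder group with above-average free inodes and the fewest
    directories. `groups` is a list of {"free_inodes": int, "dirs": int}."""
    average = sum(g["free_inodes"] for g in groups) / len(groups)
    candidates = [i for i, g in enumerate(groups) if g["free_inodes"] > average]
    if not candidates:               # every group is exactly average
        candidates = list(range(len(groups)))
    return min(candidates, key=lambda i: groups[i]["dirs"])


def chunk_of(offset):
    """Chunk index of a byte offset in a large file: chunk 0 is the first
    48 KB, then a new chunk starts every 1 MB."""
    if offset < 48 * KB:
        return 0
    return 1 + (offset - 48 * KB) // MB


groups = [
    {"free_inodes": 10, "dirs": 1},
    {"free_inodes": 90, "dirs": 7},
    {"free_inodes": 80, "dirs": 3},
    {"free_inodes": 20, "dirs": 0},
]
chosen = pick_group_for_directory(groups)
print(chosen, chunk_of(0), chunk_of(48 * KB), chunk_of(48 * KB + MB))
```

Group 3 has the fewest directories but too few free inodes, so it is filtered out before the directory count is compared.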

16
Layout: Global vs. Local

Decompose allocation into two steps
Global: heuristics for allocating files and directories to cylinder groups
Pick the optimal next block for allocation
Example
A file is in cg10, and its first data block is block 1001
A user appends to the file such that a second data block must be added
Heuristic: put the 2nd data block in cg10, at block 1002
Local: handles a request for a specific block
If free, use it, else
Find a block (or some blocks) within the cylinder group (e.g. cg10)
Rehash on cylinder group to choose another group
Exhaustive search
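
The local allocator's fallback chain can be sketched as four steps tried in order. The data layout (one set of free block numbers per cylinder group) and the rehash scheme (probing at doubling distances) are assumptions chosen for illustration; the slide does not specify either.

```python
def allocate_block(free, preferred_cg, preferred_block):
    """free[cg] is the set of free block numbers in cylinder group cg.
    Returns (cg, block); raises OSError if nothing is free anywhere."""
    ncg = len(free)

    # 1. The exact block the global policy asked for.
    if preferred_block in free[preferred_cg]:
        free[preferred_cg].remove(preferred_block)
        return preferred_cg, preferred_block

    # 2. Another block in the same cylinder group, closest to the request.
    if free[preferred_cg]:
        block = min(free[preferred_cg], key=lambda b: abs(b - preferred_block))
        free[preferred_cg].remove(block)
        return preferred_cg, block

    # 3. Rehash: probe other groups at distances 1, 2, 4, ... (assumed scheme).
    step = 1
    while step < ncg:
        cg = (preferred_cg + step) % ncg
        if free[cg]:
            block = min(free[cg])
            free[cg].remove(block)
            return cg, block
        step *= 2

    # 4. Exhaustive search over every group.
    for cg in range(ncg):
        if free[cg]:
            block = min(free[cg])
            free[cg].remove(block)
            return cg, block
    raise OSError("no free blocks")


# Hypothetical disk with four cylinder groups.
free = [{7}, {1002, 1005}, set(), {3001}]
results = [
    allocate_block(free, 1, 1002),   # step 1: requested block is free
    allocate_block(free, 1, 1003),   # step 2: nearest block in the same group
    allocate_block(free, 1, 1004),   # step 3: group 1 is full, rehash
    allocate_block(free, 1, 1004),   # step 4: exhaustive search
]
print(results)
```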

17
More optimizations

Preallocation
Problem: append(f1), append(f2), append(f1), append(f2) interleaves the blocks of the two files
Preallocate disk data blocks before they are actually used
E.g. block preallocation of 8
If the file is closed, the preallocation is released
Why does it work?
Common workload: once a file is created, it tends to grow in the near future as long as it is not closed
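
The effect of preallocation on interleaved appends can be seen with a toy allocator. This sketch assumes a simple "next free block" allocator and does not model releasing the reservation on close; `append_blocks` is a name made up for the example.

```python
def append_blocks(order, prealloc=1):
    """Simulate one-block appends in the given order of file names.
    Whenever a file has no reserved blocks left, reserve `prealloc` adjacent
    blocks for it. Returns {file name: [block numbers]}."""
    next_free = 0
    reserved = {}   # file name -> reserved but unused block numbers
    layout = {}
    for name in order:
        pool = reserved.setdefault(name, [])
        if not pool:
            pool.extend(range(next_free, next_free + prealloc))
            next_free += prealloc
        layout.setdefault(name, []).append(pool.pop(0))
    return layout


order = ["f1", "f2"] * 4
without = append_blocks(order, prealloc=1)
with_prealloc = append_blocks(order, prealloc=8)
print(without)
print(with_prealloc)
```

Without preallocation the two files alternate block by block; with a reservation of 8, each file's blocks stay adjacent.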
Readahead
disk_read(block 1001)
8-block readahead: reads blocks 1001 - 1008
Why useful? Common workload: sequential access
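
A toy reader makes the benefit concrete: with an 8-block readahead window, eight sequential reads cost a single disk request. The `ReadaheadDisk` class and its in-memory cache are illustrative assumptions, not a real block-layer interface.

```python
class ReadaheadDisk:
    """Toy block reader that fetches `window` adjacent blocks per request."""

    def __init__(self, window=8):
        self.window = window
        self.cache = {}          # block number -> data
        self.disk_requests = 0   # requests actually sent to the disk

    def _read_from_disk(self, first, count):
        self.disk_requests += 1
        return {b: f"data-{b}" for b in range(first, first + count)}

    def disk_read(self, block):
        if block not in self.cache:
            self.cache.update(self._read_from_disk(block, self.window))
        return self.cache[block]


disk = ReadaheadDisk(window=8)
for block in range(1001, 1009):      # sequential access to blocks 1001..1008
    disk.disk_read(block)
requests_for_eight_reads = disk.disk_requests
print(requests_for_eight_reads)
```

For a random access pattern the extra seven blocks would mostly be wasted, which is why readahead pays off only when sequential access is the common case.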