Title: File System
1. File System
2. Long-term Information Storage
- Must store large amounts of data
- Information stored must survive the termination of the process using it
- Multiple processes must be able to access the information concurrently
3. File Structure
- Three kinds of files
- byte sequence
- record sequence
- tree
4. File Types
- (a) An executable file (b) An archive
5. File Access
- Sequential access
- read all bytes/records from the beginning
- cannot jump around, could rewind or back up
- convenient when the medium was magnetic tape
- Random access
- bytes/records read in any order
- essential for database systems
- read can be
- move file marker (seek), then read or
- read and then move file marker
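The two access patterns above map directly onto the POSIX file API. A minimal sketch, assuming a hypothetical file example.dat, of sequential reading versus seek-then-read:

```c
/* Illustrative only: sequential access vs. random access with lseek()/read(). */
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    char buf[128];
    int fd = open("example.dat", O_RDONLY);   /* hypothetical file name */
    if (fd < 0) return 1;

    /* Sequential access: read bytes in order from the beginning. */
    while (read(fd, buf, sizeof buf) > 0)
        ;                                      /* process buf here */

    /* Random access: move the file marker (seek), then read. */
    lseek(fd, 4096, SEEK_SET);                 /* jump to byte offset 4096 */
    read(fd, buf, sizeof buf);

    close(fd);
    return 0;
}
```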
6. Memory-Mapped Files
- map() and unmap()
- map a file onto a portion of the address space
- read() and write() are replaced with memory operations
- implementation
- page tables map the file like ordinary pages
- same sharing/protection as pages
- issues
- interaction between the file system and VM when two processes access the same file via different methods
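On UNIX systems the map/unmap operations correspond to mmap() and munmap(). A minimal sketch, assuming a hypothetical non-empty, writable file example.dat, of replacing read()/write() with plain loads and stores:

```c
/* Illustrative only: file I/O through the page tables via mmap()/munmap(). */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("example.dat", O_RDWR);       /* hypothetical file name */
    if (fd < 0) return 1;

    struct stat st;
    fstat(fd, &st);
    if (st.st_size == 0) return 1;              /* sketch assumes a non-empty file */

    char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;

    p[0] = 'X';                 /* a store to memory updates the file */
    char c = p[st.st_size - 1]; /* a load reads file data via the page tables */
    (void)c;

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```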
7. File System Implementation
- A possible file system layout
8. Implementing Files (1)
- (a) Contiguous allocation of disk space for 7 files
- (b) State of the disk after files D and E have been removed
9. Implementing Files (2)
- Storing a file as a linked list of disk blocks
10. Implementing Files (3)
- Linked list allocation using a file allocation table (FAT) in RAM (see the C sketch below)
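A minimal sketch of the FAT idea: the table in RAM holds, for each block, the number of the next block of the same file, so a chain can be followed without touching the disk. Block numbers and the table size are made up for illustration:

```c
/* Illustrative linked-list allocation with a file allocation table in RAM. */
#include <stdio.h>

#define FAT_EOF  (-1)          /* end-of-chain marker */
#define NBLOCKS  16

static int fat[NBLOCKS];       /* fat[b] = next block after b, or FAT_EOF */

/* Print the chain of blocks belonging to a file that starts at 'first'. */
static void list_blocks(int first) {
    for (int b = first; b != FAT_EOF; b = fat[b])
        printf("block %d\n", b);
}

int main(void) {
    /* A file occupying blocks 4 -> 7 -> 2, FAT style. */
    for (int i = 0; i < NBLOCKS; i++) fat[i] = FAT_EOF;
    fat[4] = 7;
    fat[7] = 2;
    fat[2] = FAT_EOF;

    list_blocks(4);
    return 0;
}
```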
11. Implementing Files (4)
12. Implementing Directories (1)
- (a) A simple directory
- fixed-size entries
- disk addresses and attributes in the directory entry
- (b) Directory in which each entry just refers to an i-node
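The two designs can be contrasted with two illustrative C structs; the field names and sizes are assumptions, not any real on-disk format:

```c
#include <stdint.h>

struct dir_entry_a {            /* (a) simple fixed-size entry */
    char     name[14];
    uint32_t size;              /* attributes kept in the entry ...      */
    uint32_t first_block;       /* ... together with the disk address    */
    uint16_t mode;
};

struct dir_entry_b {            /* (b) entry that just refers to an i-node */
    char     name[14];
    uint16_t inode;             /* all attributes live in the i-node       */
};
```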
13. Implementing Directories (2)
- Two ways of handling long file names in a directory
- (a) In-line
- (b) In a heap
14. Shared Files (1)
- File system containing a shared file
15. Shared Files (2)
- (a) Situation prior to linking
- (b) After the link is created
- (c) After the original owner removes the file
16. Disk Space Management (1)
Block size
- Dark line (left-hand scale) gives the data rate of a disk
- Dotted line (right-hand scale) gives disk space efficiency
- All files are 2 KB
17. Disk Space Management (2)
- (a) Storing the free list on a linked list
- (b) A bit map
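A minimal sketch of bit-map free-space management, with one bit per block (1 = free); the linear scan and the sizes are illustrative choices:

```c
#include <stdint.h>

#define NBLOCKS 1024
static uint8_t freemap[NBLOCKS / 8];   /* the bit map: 1 = free, 0 = in use */

int alloc_block(void) {                 /* linear scan for a free block */
    for (int b = 0; b < NBLOCKS; b++)
        if (freemap[b / 8] & (1 << (b % 8))) {
            freemap[b / 8] &= ~(1 << (b % 8));   /* mark as in use */
            return b;
        }
    return -1;                          /* disk full */
}

void free_block(int b) {
    freemap[b / 8] |= 1 << (b % 8);     /* mark as free */
}
```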
18. Disk Space Management (3)
- (a) An almost-full block of pointers to free disk blocks in RAM, and three blocks of pointers on disk
- (b) Result of freeing a 3-block file
- (c) Alternative strategy for handling the 3 free blocks
- shaded entries are pointers to free disk blocks
19. Disk Space Management (4)
- Quotas for keeping track of each user's disk usage
20. fsck() - blocks
- File system states (see the C sketch below)
- (a) consistent
- (b) missing block 2 -> put it in the free list
- (c) duplicate block 4 in the free list -> can happen only with a linked-list free list
- (d) duplicate data block 5 -> deleting either file leaves the block both in use and free
- solution: copy the data to a new block
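A sketch of the block check: fsck builds two tables of counters, one counting occurrences of each block in files and one counting occurrences in the free list, then flags the cases (b)-(d) above. The table size and example data are invented for illustration:

```c
#include <stdio.h>

#define NBLOCKS 16
static int in_use[NBLOCKS];     /* occurrences in some file's block list */
static int in_free[NBLOCKS];    /* occurrences in the free list */

static void check_blocks(void) {
    for (int b = 0; b < NBLOCKS; b++) {
        if (in_use[b] == 0 && in_free[b] == 0)
            printf("missing block %d: add it to the free list\n", b);
        else if (in_free[b] > 1)
            printf("block %d duplicated in the free list: rebuild the list\n", b);
        else if (in_use[b] > 1)
            printf("block %d in several files: copy it to a new block\n", b);
    }
}

int main(void) {
    in_use[5] = 2;              /* duplicate data block, as in case (d) */
    check_blocks();             /* block 2 (all zero) reports as missing */
    return 0;
}
```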
21. fsck() - files
- examines the directory system
- a counter per file
- a hard link makes a file appear in multiple directories
- compares the counter values with the link count in the i-node
- if the i-node's link count is higher, the i-node will not be deleted even when no directory refers to it
- if the counted value is higher, a still-linked file could be deleted
22. Buffer Cache
- The block cache data structures
23. File System Performance (2)
- i-nodes placed at the start of the disk
- Disk divided into cylinder groups
- each with its own blocks and i-nodes
24. The MS-DOS File System (1)
- The MS-DOS directory entry
25. The MS-DOS File System (2)
- Maximum partition size for different block sizes
- The empty boxes represent forbidden combinations
26. The Windows 98 File System (1)
- The extended MS-DOS directory entry used in Windows 98 (field sizes in bytes)
- the 32-bit block address is split into two places
- for a long name, a file has two names
- e.g. My Document and MYDOCU~1
27. The Windows 98 File System (2)
- An example of how a long name is stored in Windows 98
28. The UNIX V7 File System (1)
- A UNIX V7 directory entry
29. The UNIX V7 File System (2)
30. The UNIX V7 File System (3)
- The steps in looking up /usr/ast/mbox (see the C sketch below)
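A sketch of the lookup loop, one directory search per path component, starting at the root i-node. The dir_lookup() helper and the toy i-node numbers for usr, ast, and mbox are illustrative:

```c
#include <stdio.h>
#include <string.h>

#define ROOT_INODE 1

/* Toy directory contents: (directory i-node, name) -> i-node number. */
struct dirmap { int dir; const char *name; int inode; };
static const struct dirmap table[] = {
    { 1,  "usr",  6  },   /* root directory: "usr"  -> i-node 6  (example numbers) */
    { 6,  "ast",  26 },   /* /usr directory: "ast"  -> i-node 26 */
    { 26, "mbox", 60 },   /* /usr/ast:       "mbox" -> i-node 60 */
};

/* Hypothetical helper: scan a directory's entries for a name. */
static int dir_lookup(int dir_inode, const char *name) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].dir == dir_inode && strcmp(table[i].name, name) == 0)
            return table[i].inode;
    return -1;                                   /* not found */
}

/* Resolve an absolute path, one component at a time, starting at the root. */
static int namei(const char *path) {
    char copy[256];
    strncpy(copy, path, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';

    int inode = ROOT_INODE;
    for (char *part = strtok(copy, "/"); part; part = strtok(NULL, "/")) {
        inode = dir_lookup(inode, part);         /* "usr", then "ast", then "mbox" */
        if (inode < 0)
            return -1;
    }
    return inode;
}

int main(void) {
    printf("i-node of /usr/ast/mbox: %d\n", namei("/usr/ast/mbox"));
    return 0;
}
```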
31. DEMOS (Cray-1)
- in the normal case, contiguous allocation
- flexibility for non-contiguous allocation
- file header
- table of base and size (10 entries)
- each block group is contiguous on disk
- [figure: a block group, i.e. a group of contiguous blocks, described by a (base, size) pair]
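The table of base and size pairs can be pictured as a small struct; the field names and types here are assumptions, not the actual DEMOS layout:

```c
#include <stdint.h>

/* One contiguous run of blocks ("block group"). */
struct block_group {
    uint32_t base;    /* first disk block of the group */
    uint32_t size;    /* number of blocks in the group */
};

/* DEMOS-style file header: a table of 10 (base, size) entries. */
struct demos_file_header {
    struct block_group groups[10];
};
```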
32. DEMOS (2)
- if a file needs more than 10 block groups, set the BIGFILE flag in the file header (max 10 GB)
- each block group then contains pointers to block groups
- pros and cons
- easy to find free block groups (small bitmap)
- free areas merge automatically
- when the disk comes close to full:
- no long runs of blocks (fragmentation)
- CPU overhead to find a free block
- the disk should keep some space in reserve
- experience tells us 10% would be good
33. Transactions in File System
- reliability from unreliable components
- concepts
- atomicity: all or nothing
- durability: once it happens, it is there
- serializability: transactions appear to happen one by one
34. Transactions in File System (2)
- Motivation
- File Systems have lots of data structures
- bitmap for free blocks
- directory
- file header
- indirect blocks
- data blocks
- for performance reasons, all must be cached
- read requests are easy
- what about writes?
35. Transactions in File System
- Write to cache
- a write-through cache is not of any help
- with write-back, data can be lost on a crash
- Multiple updates are usual for a single file operation
- what happens if a crash occurs between updates?
- e.g. 1: move a file between directories
- delete the file from the old directory
- add the file to the new directory
- e.g. 2: create a new file
- allocate space on disk for the header and data
- write the new header to disk
- add the new file to the directory
36. Transactions in File System
- Unix approach (ad hoc)
- meta-data consistency
- synchronous write-through
- multiple updates are done in a specific order
- after a crash, the fsck program fixes up anything in progress
- file created, but not yet in a directory -> delete the file
- blocks allocated, but not in the bitmap -> update the bitmap
- user data consistency
- write back to disk every 30 seconds or on user request
- no guarantee that blocks are written to disk in any order
- no support for transactions
- the user may want multiple file operations done as a unit
37. Transactions in File System
- Write-ahead logging (see the C sketch below)
- Almost all file systems since 1985 use write-ahead logging
- Windows NT, Solaris, OSF, etc.
- mechanism
- operation
- write all changes in a transaction to the log
- send the file changes to disk
- reclaim the log space
- on a crash, read the log
- if the log isn't complete, make no change!
- if the log is completely written, apply all changes to disk
- if the log is empty, don't worry: all updates have already gotten to disk
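A minimal sketch of the write-ahead discipline using ordinary POSIX files: the whole transaction is forced to a log before any change is applied, and the log space is reclaimed afterwards. The record format, the file name journal.log, and the apply step are all illustrative, not any particular file system:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void apply_change(const char *rec) {
    /* a real system would update directories, bitmaps, i-nodes, ... here */
    printf("applying: %s", rec);
}

int main(void) {
    const char *rec = "move mbox from /old to /new\n";  /* one transaction */

    /* 1. write all changes of the transaction to the log and force it out */
    int log = open("journal.log", O_CREAT | O_WRONLY | O_APPEND, 0644);
    if (log < 0) return 1;
    write(log, rec, strlen(rec));
    fsync(log);               /* the log must be on disk before any change is */
    close(log);

    /* 2. send the file (metadata) changes to disk */
    apply_change(rec);

    /* 3. reclaim the log space once the changes are durable */
    truncate("journal.log", 0);
    return 0;
}
```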
38. Log-Structured File Systems
- Idea
- write data once
- the log is the only copy of the data
- as you modify disk blocks, store them in the log
- put everything (data blocks, file headers, etc.) in the log
- Data fetch
- if you need to get data from disk, get it from the log
- keep a map in memory
- it tells you where everything is
- the map should also be in the log for crash recovery
39. Log-Structured File Systems
- Advantage
- all writes are sequential!!
- no seeks, except for reads, which can be handled by the cache
- the cache is getting bigger
- in the extreme case, disk I/O is only for writes, which are sequential
- same problems as contiguous allocation
- many files are deleted within the first 5 minutes
- need garbage collection
- if the disk fills up, problem!!
- keep the disk under-utilized
40. Log-Structured File Systems
- Mechanism
- Issues for implementing the log
- how to retrieve information from the log
- enough free space for the log
- Cache file changes, and write them sequentially to the disk in a single operation
- fast writes
- Information retrieval (see the sketch below)
- an inode map at a fixed checkpoint region
- indices to the inodes contained in the write
- most of them are cached in memory
- fast reads
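A sketch of the retrieval path: an in-memory i-node map records, for every i-node number, the log address at which that i-node was most recently written, so reads go straight to the right place in the log. Names and sizes are illustrative:

```c
#include <stdint.h>

#define NINODES 1024

/* i-node number -> address of that i-node's latest copy in the log */
static uint64_t inode_map[NINODES];

/* Called when a segment write places i-node 'ino' at 'log_addr'. */
void imap_update(int ino, uint64_t log_addr) {
    inode_map[ino] = log_addr;
}

/* A read consults the map first, then fetches the i-node from the log. */
uint64_t imap_locate(int ino) {
    return inode_map[ino];
}
```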
41. Log Examples
- [figure: in the LFS log, data blocks, i-nodes, directory blocks, and the i-node map are appended one after another; in FFS, i-nodes sit in a fixed area, with directory and data blocks placed elsewhere]
- In FFS, each inode is at a fixed location on disk
- an index into the inode set is sufficient to find it
- in LFS, a map is needed to locate an inode since it is mixed with data in the log
42. Log-Structured File Systems
- Space management
- holes left by deleting files
- threading
- use the dispersed holes like a linked list
- fragmentation will get worse
- copying
- copy live data out of the log to leave a large hole
- expensive, especially for long-lived files
43. Segments in LFS
- Concept
- clean segments are linked (threading)
- segments with holes may be copied into a clean segment
- collect long-lived files into the same segment
- Cleaning Policy
- when? low watermark for clean segments
- how many segments? high watermark
- which segments? the most fragmented
- how to group files?
- files in the same directory
- age sort: sort by the last modification time
44. Recovery
- checkpoints and roll-forward (NOT a roll-back!!)
- possible since all the file operations are in the log
- checkpoint
- a point in the log at which the file system is consistent and complete
- contains
- address of the inode map
- segment usage table
- current time
- checkpoint region
- contains the checkpoint
- placed at a specific location on disk
45. Recovery (2)
- operation
- 1. write out all modified information to disk
- 2. write out the checkpoint region
- on a crash,
- roll forward through the operations logged after the last checkpoint
- if the crash occurs while writing a checkpoint,
- keep the old checkpoint
- so two checkpoint regions are needed
46. Roll-Forward
- Recover as much information as possible
- from the segment summary blocks:
- a new inode: there must be data blocks before it, so just update the inode map
- data blocks without an inode: ignore them, since we don't know whether the data blocks are complete
- Each inode has a counter indicating how many directories refer to it
- the reference counter may be updated while the directory is not yet written
- or the directory may be written while the reference counter is not yet updated
- solution: a special write-ahead log for directory changes
47. Informed Prefetching ..
- Prefetching
- memory prefetching (to cache memory)
- disk prefetching (to a memory buffer)
- disk latency is larger by orders of magnitude
- Pros and cons of prefetching
- reduces latency when the prefetched data is accessed
- file cache may be wasted if the prefetched data is unused
- difficult to know when the prefetched data will be used
- interference with other cached data and virtual memory is difficult to understand
- Assumptions
- disk parallelism is underutilized
- applications provide hints
48. Limits of RAID
- RAID increases disk throughput when the workload can be processed in parallel
- very large accesses
- multiple concurrent accesses
- Much real I/O workload is not parallel
- get a byte from a file
- think
- get another byte from (the same or another) file
- such access touches only a single disk at a time
49. Real I/O Workload
- Recent trends
- faster CPUs generate I/O requests more often
- programs favor larger data objects
- the file cache hit ratio is more important than before
- Most of the workload is reads
- writes can be done behind, in parallel (e.g., Linux)
- processes are blocked on reads
- most access patterns are predictable
- let's use the predictability as hints
50. Overview
- The application discloses its future resource requirements
- the system makes the final decisions, not the applications
- Disclosing hints are issued through ioctl() (see the sketch below)
- file specifier
- file name or file descriptor
- pattern specifier
- sequential
- list of <offset, length>
- What to do with the disclosing hints
- parallelize the I/O requests for RAID
- keep the data in the cache
- schedule the disk to reduce seek time
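A hypothetical sketch of disclosing a hint through ioctl(): the request code TIPIO_HINT, the struct io_hint, and its fields are invented for illustration and are not a real kernel interface; on a stock kernel the call simply fails:

```c
#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>

#define TIPIO_HINT 0x7101        /* hypothetical request code */

struct io_hint {                 /* hypothetical <offset, length> pattern specifier */
    long offset;                 /* start of a segment the application will read */
    long length;                 /* length of that segment */
};

int main(void) {
    int fd = open("trace.dat", O_RDONLY);       /* file specifier: a descriptor */
    if (fd < 0) return 1;

    struct io_hint hint = { .offset = 0, .length = 1 << 20 };
    ioctl(fd, TIPIO_HINT, &hint);   /* disclose: "I will read this range soon"  */
                                    /* (returns ENOTTY on an unmodified kernel) */
    close(fd);
    return 0;
}
```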
51. Informed Cache Manager
52. A System Model
- total execution time: T = N_I/O × (T_CPU + T_I/O)
- i.e. number of I/Os × (time between I/Os + per-I/O time)
- T_I/O is either T_hit or T_miss
- T_miss = T_hit + T_driver + T_disk
- T_disk: latency of the disk fetch
- T_driver: buffer allocation, queueing at the driver, and interrupt service
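A worked instance of the model with assumed, purely illustrative timings (T_CPU = 1 ms, T_hit = 0.1 ms, T_driver = 0.4 ms, T_disk = 10 ms) and 1000 accesses that all miss:

```latex
\[
  T_{\mathit{miss}} = T_{\mathit{hit}} + T_{\mathit{driver}} + T_{\mathit{disk}}
                    = 0.1 + 0.4 + 10 = 10.5\ \text{ms}
\]
\[
  T = N_{I/O}\,(T_{\mathit{CPU}} + T_{I/O})
    = 1000 \times (1 + 10.5)\ \text{ms} = 11.5\ \text{s}
    \quad\text{(all 1000 accesses miss)}
\]
```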
53. Benefit of a Buffer
- T_stall(x): read stall time when there are x buffers for x prefetches
- T_pf(x): service time for a hinted read when there are x buffers
- benefit of using one more buffer
- [figure: timeline from the point the application issues a hint, through prefetching with x buffers, to the point the application accesses the data]
54. Stall Time T_stall(x)
- before the x-th request is generated, at least x(T_CPU + T_hit + T_driver) of CPU time elapses
- if all accesses are cache hits, there is no stall
- [figure: two timelines from prefetch issue to access of the prefetched data - when T_disk exceeds x(T_CPU + T_hit + T_driver) the access stalls, otherwise there is no stall]
55. Prefetch Horizon
- the stall time T_stall(x) is bounded by T_disk - x(T_CPU + T_hit + T_driver)
- this bound overestimates the actual stall!!
- prefetch horizon P(T_CPU): the distance x at which the bound on T_stall reaches zero, i.e. x = T_disk / (T_CPU + T_hit + T_driver); there is no need to prefetch beyond this point
56. What Really Happens
- 3 buffers are assumed
57. Benefit of a Single Buffer
- When used for prefetching
- When used for a demand miss?
58. Model Verification
- The model underestimates the stall time due to
- neglecting disk contention
- variation in disk service time (queueing effect)
- overall, it is a good estimator
59. Cost of Shrinking the LRU Buffer Cache
- hit ratio H(n) for a file cache with n buffers
- service time
- T_LRU(n) = H(n) T_hit + (1 - H(n)) T_miss
- cost of taking a buffer from the file cache
- ΔT_LRU(n) = T_LRU(n-1) - T_LRU(n)
- = (H(n) - H(n-1)) (T_miss - T_hit)
- H(n) varies with the workload
- need dynamic run-time monitoring
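The last step follows by expanding the definition of T_LRU; no quantities beyond those above are involved:

```latex
\[
\begin{aligned}
\Delta T_{\mathit{LRU}}(n)
  &= T_{\mathit{LRU}}(n-1) - T_{\mathit{LRU}}(n) \\
  &= \bigl[H(n-1)\,T_{\mathit{hit}} + (1-H(n-1))\,T_{\mathit{miss}}\bigr]
   - \bigl[H(n)\,T_{\mathit{hit}} + (1-H(n))\,T_{\mathit{miss}}\bigr] \\
  &= \bigl(H(n) - H(n-1)\bigr)\,\bigl(T_{\mathit{miss}} - T_{\mathit{hit}}\bigr)
\end{aligned}
\]
```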
60. Cost of Ejecting a Prefetched Block
- the cost is paid when the ejected block is accessed again later
- if the block had stayed in the cache, the access would cost T_hit
- cost when that block is prefetched back x accesses in advance of its reuse
- the ejection frees one buffer for y - x accesses, where y is the number of accesses until the block is needed again
- increase in service time per access
- [figure: timeline showing the eject, the prefetch x accesses before the reaccess at distance y, and the region affected by the eviction]
61. Local Value Estimates
62. Seeking the Global Optimum
- Normalization of each estimate (LRU, hinted prefetch)
- multiply each by its usage rate
- the unhinted demand access rate for the LRU cache estimate (T_LRU)
- the access rate to the hinted sequence for the prefetch estimate (T_PF)
- When the manager needs a new block
- each estimator selects its least valuable block
- hinted: the block that will be accessed furthest in the future
- LRU: the block at the bottom of the LRU stack
- the manager selects the least valuable block overall
- comparing the benefit with the cost of the least valuable block
63. After 4 Years
- Providing hints is too much of a burden for programmers
- Automatic hint generation is desired
- there is idle CPU time while a program blocks for I/O
- speculative execution can provide hints for future I/O accesses
- Approaches made
- a kernel thread performs the speculative execution
- this speculating thread shares the address space
- Issues
- run-time overhead
- incorrectness
- may affect the correctness of the results
- incorrect hints may waste I/O bandwidth
64. Ensuring Program Correctness
- Software copy-on-write
- prevents code/data corruption
- for each new write to a memory region, make a copy
- code is inserted before every load/store to check whether it targets a copied region
- software fault isolation
- the checking code is inserted into a copy of the code (shadow code)
- the original code is not changed, so there is no overhead for normal execution
- No system calls are generated
- the system state is not changed by the speculative execution
- Signal handler
- catches all exceptions that may disturb normal execution
65. Generating Correct and Timely Hints
- Problems
- the speculating thread may lag behind, generating stale hints
- the speculating thread may stray from the actual execution path
- How to detect the problems
- the original thread checks the hint log prepared by the speculating thread
- if a hint is wrong, the original thread prepares a copy of its register set and sets a flag
- when the speculating thread is next invoked, it checks the flag
- if the flag is set, it restarts using the saved register set