File System - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
File System
2
Long-term Information Storage
  • Must store large amounts of data
  • Information stored must survive the termination
    of the process using it
  • Multiple processes must be able to access the
    information concurrently

3
File Structure
  • Three kinds of files
  • byte sequence
  • record sequence
  • tree

4
File Types
  • (a) An executable file (b) An archive

5
File Access
  • Sequential access
  • read all bytes/records from the beginning
  • cannot jump around, could rewind or back up
  • convenient when medium was mag tape
  • Random access
  • bytes/records read in any order
  • essential for database systems
  • read can be
  • move file marker (seek), then read or
  • read and then move file marker

6
Memory-Mapped Files
  • mmap() and munmap()
  • map a file onto a portion of the address space
  • read( ) and write( ) are replaced with memory
    operations
  • implementation
  • page tables map the file like ordinary pages
  • same sharing/protection as pages
  • issues
  • interaction between file system and VM when two
    processes access the same file via different
    methods
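The replace-read()/write()-with-memory-operations idea can be sketched at user level with Python's mmap module (the file contents here are illustrative; the kernel still does the page-table work described above):

```python
import mmap
import os
import tempfile

# Create a small file, map it, and modify it with a memory store
# instead of a write() call.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello world")
os.close(fd)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:  # map the whole file
        m[0:5] = b"HELLO"                # a store into the mapping

with open(path, "rb") as f:
    result = f.read()                    # the store reached the file
os.unlink(path)
print(result)
```

Reading the file back shows the memory store took effect without any explicit write() call.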

7
File System Implementation
  • A possible file system layout

8
Implementing Files (1)
  • (a) Contiguous allocation of disk space for 7
    files
  • (b) State of the disk after files D and E have
    been removed

9
Implementing Files (2)
  • Storing a file as a linked list of disk blocks

10
Implementing Files (3)
  • Linked list allocation using a file allocation
    table in RAM
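The table-in-RAM scheme can be sketched as follows (the block numbers and the dict standing in for the FAT are illustrative simplifications):

```python
# FAT[i] holds the number of the block that follows block i in its
# file; -1 marks end-of-file. Free blocks are simply absent here.
FAT = {4: 7, 7: 2, 2: 10, 10: 12, 12: -1,   # file A: starts at block 4
       6: 3, 3: 11, 11: 14, 14: -1}         # file B: starts at block 6

def chain(first_block):
    """Follow FAT pointers from a file's first block to end-of-file."""
    blocks = []
    b = first_block
    while b != -1:
        blocks.append(b)
        b = FAT[b]
    return blocks

print(chain(4))   # [4, 7, 2, 10, 12]
```

Random access still requires walking the chain from the start, but the walk touches only the in-RAM table, not the disk.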

11
Implementing Files (4)
  • An example i-node

12
Implementing Directories (1)
  • (a) A simple directory
  • fixed size entries
  • disk addresses and attributes in directory entry
  • (b) Directory in which each entry just refers to
    an i-node

13
Implementing Directories (2)
  • Two ways of handling long file names in directory
  • (a) In-line
  • (b) In a heap

14
Shared Files (1)
  • File system containing a shared file

15
Shared Files (2)
  • (a) Situation prior to linking
  • (b) After the link is created
  • (c) After the original owner removes the file

16
Disk Space Management (1)
Block size
  • Dark line (left hand scale) gives data rate of a
    disk
  • Dotted line (right hand scale) gives disk space
    efficiency
  • All files 2KB

17
Disk Space Management (2)
  • (a) Storing the free list on a linked list
  • (b) A bit map
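A minimal sketch of bitmap-based free-space management (16 blocks and the first-fit policy are illustrative choices):

```python
# One bit per block: 1 = free, 0 = in use.
bitmap = [1] * 16

def alloc():
    """Claim the first free block found by scanning the bitmap."""
    for i, free_bit in enumerate(bitmap):
        if free_bit:
            bitmap[i] = 0
            return i
    raise OSError("disk full")

def free(i):
    # Freeing is a single bit flip; adjacent free blocks need no
    # explicit merging, unlike a linked list of free runs.
    bitmap[i] = 1

a, b = alloc(), alloc()   # blocks 0 and 1
free(a)
c = alloc()               # block 0 is reused
print(a, b, c)
```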

18
Disk Space Management (3)
  • (a) Almost-full block of pointers to free disk
    blocks in RAM
  • - three blocks of pointers on disk
  • (b) Result of freeing a 3-block file
  • (c) Alternative strategy for handling 3 free
    blocks
  • - shaded entries are pointers to free disk blocks

19
Disk Space Management (4)
  • Quotas for keeping track of each user's disk use

20
fsck() - blocks
  • File system states
  • (a) consistent
  • (b) missing block: add it to the free list
  • (c) duplicate block in the free list: happens
    only with a linked free list; rebuild the list
  • (d) duplicate data block (used by two files): a
    delete would leave the block both used and free
  • solution: copy the data to a new block

21
fsck() - files
  • examines the directory system
  • keeps a counter per file
  • a hard link makes one file appear in multiple
    directories
  • compares the counters with the link count stored
    in each i-node
  • if the i-node's link count is too high, the
    i-node will never be deleted; lower the count
  • if the i-node's link count is too low, the file
    could be deleted while still linked; raise it
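The directory pass can be sketched like this (the directory tree and counts are made-up data; real fsck reads them from disk):

```python
from collections import Counter

# Count how many directory entries reference each i-node, then
# compare with the link count stored in the i-node itself.
directories = {
    "/":    [("usr", 2)],
    "/usr": [("a.txt", 5), ("link_to_a", 5), ("b.txt", 6)],
}
inode_links = {2: 1, 5: 2, 6: 2}   # i-node 6 claims one link too many

seen = Counter(ino for entries in directories.values()
               for _name, ino in entries)
fixes = {}
for ino, stored in inode_links.items():
    if stored != seen[ino]:
        # Too high: the file would never be reclaimed.
        # Too low: removing one link could free a live file.
        # Either way, write the true count into the i-node.
        fixes[ino] = seen[ino]
print(fixes)   # {6: 1}
```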

22
Buffer Cache
  • The block cache data structures

23
File System Performance (2)
  • i-nodes placed at the start of the disk
  • Disk divided into cylinder groups
  • each with its own blocks and i-nodes

24
The MS-DOS File System (1)
  • The MS-DOS directory entry

25
The MS-DOS File System (2)
  • Maximum partition for different block sizes
  • The empty boxes represent forbidden combinations

26
The Windows 98 File System (1)
Bytes
  • The extended MS-DOS directory entry used in
    Windows 98
  • the 32-bit block address is split into two fields
  • for a long name, a file has two names
  • My Document → MYDOCU~1

27
The Windows 98 File System (2)
  • An example of how a long name is stored in
    Windows 98

28
The UNIX V7 File System (1)
  • A UNIX V7 directory entry

29
The UNIX V7 File System (2)
  • A UNIX i-node

30
The UNIX V7 File System (3)
  • The steps in looking up /usr/ast/mbox

31
DEMOS (Cray-1)
  • in normal case, contiguous allocation
  • flexibility for non-contiguous allocation
  • file header
  • table of base and size (10 entries)
  • each block group is contiguous on disk

[figure: the file header contains a table of (base, size) entries,
 each describing one contiguous block group]
32
DEMOS (2)
  • if a file needs more than 10 block groups, the
    BIGFILE flag is set in the file header (max 10GB)
  • each block group then contains pointers to
    further block groups
  • pros
  • easy to find free block groups (small bitmap)
  • free areas merge automatically
  • cons - when the disk comes close to full
  • no long runs of blocks remain (fragmentation)
  • CPU overhead to find a free block
  • the disk should keep some space in reserve
  • experience suggests about 10% is enough

33
Transactions in File System
  • reliability from unreliable components
  • concepts
  • atomicity: all or nothing
  • durability: once it happens, it stays
  • serializability: transactions appear to happen
    one by one

34
Transactions in File System(2)
  • Motivation
  • File Systems have lots of data structures
  • bitmap for free blocks
  • directory
  • file header
  • indirect blocks
  • data blocks
  • for performance reasons, all of these must be cached
  • read requests are easy
  • what about writes?

35
Transactions in File System
  • Write to cache
  • a write-through cache is of no help here
  • with write-back, data can be lost on a crash
  • A single file operation usually requires
    multiple updates
  • what happens if a crash occurs between updates?
  • e.g. 1: move a file between directories
  • delete the file from the old directory
  • add the file to the new directory
  • e.g. 2: create a new file
  • allocate space on disk for the header and data
  • write the new header to disk
  • add the new file to the directory

36
Transactions in File System
  • Unix Approach (ad hoc)
  • meta-data consistency
  • synchronous write-through
  • multiple updates are done in a specific order
  • after crash, fsck program fixes up anything in
    progress
  • file created, but not yet in a directory →
    delete the file
  • blocks allocated, but not in the bitmap →
    update the bitmap
  • user data consistency
  • write back to disk every 30 seconds or by user
    request
  • no guarantee that blocks are written to disk in
    any order
  • no support for transactions
  • a user may want multiple file operations done as
    a unit

37
Transactions in File System
  • Write-ahead logging
  • Almost all the file systems since 1985 use
    write-ahead logging
  • Windows/NT, Solaris, OSF, etc.
  • mechanism
  • operation
  • write all changes in a transaction to log
  • send file changes to disk
  • reclaim log space
  • on a crash, read the log
  • if the log isn't complete, apply no changes!
  • if the log is completely written, apply all
    changes to disk
  • if the log is empty, nothing to do: all updates
    have already reached disk
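The mechanism can be sketched with in-memory stand-ins for the disk and the log (a single-transaction toy; the key/value "disk" is an illustrative assumption):

```python
# The "disk" and "log" are a dict and a list here; in real systems
# both live on stable storage and the log is written first.
disk = {"free_bitmap": "old", "dir": "old"}
log = []

def commit(changes):
    """Write all changes of a transaction to the log, then mark it done."""
    log.append(("begin", None))
    log.extend(("write", c) for c in changes)
    log.append(("commit", None))   # the log is complete only past this

def recover():
    """On a crash: apply the log only if it is complete."""
    if not log or log[-1][0] != "commit":
        return                     # incomplete log: change nothing
    for op, change in log:
        if op == "write":
            key, value = change
            disk[key] = value      # send the file changes to "disk"
    log.clear()                    # reclaim log space

commit([("free_bitmap", "new"), ("dir", "new")])
recover()
print(disk)
```

If the crash had interrupted commit() before its final record, recover() would leave the old state untouched: all or nothing.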

38
Log-Structured File Systems
  • Idea
  • write data once
  • log is the only copy of the data
  • as you modify disk blocks, store them in log
  • put everything (data blocks, file headers, etc.)
    in the log
  • Data fetch
  • if need to get data from disk, get it from the
    log
  • keep map in memory
  • tells you where everything is
  • map should be in the log for crash recovery

39
Log-Structured File Systems
  • Advantage
  • all writes are sequential!!
  • no seeks, except for reads which can be handled
    by cache
  • cache is getting bigger
  • in extreme case, disk IO only for writes which
    are sequential
  • but it shares a problem with contiguous allocation
  • many files are deleted in the first 5 minutes
  • need garbage collection
  • if disk fills up, problem!!
  • keep disk under-utilized

40
Log-Structured File Systems
  • Mechanism
  • Issues for implementing the log
  • how to retrieve information from the log
  • enough free space for the log
  • Cache file changes, and writes sequentially on
    the disk in a single operation
  • fast writes
  • Information retrieval
  • inode map at a fixed checkpoint region
  • indices to inodes contained in the write
  • most of them are cached in memory
  • fast reads
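A toy version of the write path above (the tuple log records and the in-memory map are illustrative simplifications of real LFS segments):

```python
log = []        # the log is the only copy of the data
inode_map = {}  # i-node number -> log position of its newest copy

def write_file(ino, data_blocks):
    """Append data blocks and then the i-node itself to the log."""
    for d in data_blocks:
        log.append(("data", ino, d))
    log.append(("inode", ino, len(data_blocks)))
    inode_map[ino] = len(log) - 1      # newest i-node wins

write_file(1, ["aa", "bb"])
write_file(1, ["aa2", "bb2"])          # rewrite: old copies become garbage
print(inode_map[1], len(log))
```

The rewrite leaves the first three log entries as garbage, which is exactly what the cleaner described later has to reclaim.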

41
Log Examples
  [figure: the LFS log is a single sequence of data blocks, i-nodes,
   directory blocks, and the i-node map, appended in write order;
   FFS places the same i-nodes, directories, and data at fixed,
   scattered disk locations]
  • In FFS, each inode is at a fixed location on disk
  • an index into the inode set is sufficient to find
    it
  • in LFS, a map is needed to locate inode since it
    is mixed with data on the log

42
Log-Structured File Systems
  • Space management
  • holes left by deleting files
  • threading
  • use the dispersed holes like a linked list
  • fragmentation will get worse
  • copying
  • copy a file out of the log to leave a large hole
  • expensive especially for long-lived files

43
Segment of LFS
  • Concept
  • clean segments are linked (threading)
  • segments with holes may be copied into a clean
    segment
  • collect long-lived files into the same segment
  • Cleaning Policy
  • when? low watermark for clean segments
  • how many segments? high watermark
  • which segments? - most fragmented
  • how to group files?
  • files in the same directory
  • age sort: sort by the last modification time

44
Recovery
  • checkpoints and roll-forward (NOT a roll-back!!)
  • possible since all the file operations are in the
    log
  • checkpoint
  • a point in the log at which the file system is
    consistent and complete
  • contains
  • address of inode maps
  • segment usage table
  • current time
  • checkpoint region
  • contains checkpoint
  • placed at a specific location on disk

45
Recovery(2)
  • operation
  • 1. write out all modified information to disk
  • 2. write out checkpoint region
  • on a crash,
  • roll-forward operations logged after the last
    checkpoint
  • if the crash occurs while writing a checkpoint,
  • keep old checkpoint
  • need two checkpoint regions
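The two-region trick can be sketched as follows (the checksum field and dict layout are illustrative assumptions, not LFS's actual on-disk format):

```python
import json

regions = [None, None]   # two checkpoint regions on "disk"

def write_checkpoint(slot, inode_map_addr, time):
    rec = {"inode_map": inode_map_addr, "time": time}
    # A checksum lets recovery detect a checkpoint torn by a crash.
    rec["checksum"] = hash(json.dumps(rec, sort_keys=True))
    regions[slot] = rec

def latest_valid():
    """On recovery, use the newest region whose checksum verifies."""
    good = []
    for r in regions:
        if r is None:
            continue
        body = {k: v for k, v in r.items() if k != "checksum"}
        if r["checksum"] == hash(json.dumps(body, sort_keys=True)):
            good.append(r)
    return max(good, key=lambda r: r["time"], default=None)

write_checkpoint(0, inode_map_addr=100, time=1)
write_checkpoint(1, inode_map_addr=200, time=2)
regions[1]["checksum"] ^= 1            # simulate a crash mid-write
print(latest_valid()["time"])          # falls back to the older one
```

Because the two regions alternate, a crash can corrupt at most the one being written; the other always holds a complete checkpoint.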

46
Roll-Forward
  • Recover as much information as possible
  • using the segment summary blocks:
  • a new i-node: the data blocks before it must be
    complete, so just update the i-node map
  • data blocks without an i-node: ignore them, since
    we don't know whether they are complete
  • Each i-node has a counter of how many directories
    refer to it; a crash can leave either
  • the reference counter updated, but the directory
    not yet written, or
  • the directory written, but the reference counter
    not yet updated
  • solution: employ a special write-ahead log for
    directory changes

47
Informed Prefetching ..
  • Prefetching
  • memory prefetching (to cache memory)
  • disk prefetching (to memory buffer)
  • disk latency is larger by orders of magnitude
  • Pros / cons of prefetching
  • reduce latency when the prefetched data is
    accessed
  • file cache may be wasted if the prefetched data
    is unused
  • difficult to know when the prefetched data will
    be used
  • interference with other cached data and virtual
    memory is difficult to understand
  • Assumptions
  • disk parallelism is underutilized
  • applications provide hints

48
Limits of RAID
  • RAID increases disk throughput when the workload
    can be processed in parallel
  • very large accesses
  • multiple concurrent accesses
  • Many real I/O workloads are not parallel
  • get a byte from a file
  • think
  • get another byte from (the same or another) file
  • access only a single disk at a time

49
Real I/O Workload
  • Recent trends
  • faster CPUs generate I/O requests more often
  • programs favor larger data objects
  • file cache hit ratio is more important than
    before
  • Most workload is read
  • writes can be done behind in parallel - Linux
  • processes are blocked on read
  • most access patterns are predictable
  • Let's use the predictability as hints

50
Overview
  • Application discloses its future resource
    requirements
  • the system makes the final decisions, not
    applications
  • Disclosing hints are issued through ioctl
  • file specifier
  • file name or file descriptor
  • pattern specifier
  • sequential
  • list of <offset, length> pairs
  • What to do with the disclosing hints
  • parallelize the I/O request for RAID
  • keep the data in the cache
  • schedule disk to reduce seek time
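One thing the system can do with a disclosed hint list is expand it into the block numbers to prefetch (the BLOCK size and function name are illustrative, not the actual interface):

```python
BLOCK = 4096   # assumed block size

def hints_to_blocks(hints):
    """Expand <offset, length> hints into block numbers to prefetch."""
    blocks = set()
    for offset, length in hints:
        first = offset // BLOCK
        last = (offset + length - 1) // BLOCK
        blocks.update(range(first, last + 1))
    return sorted(blocks)

# The application discloses two future read regions; with the block
# list in hand, the system can prefetch in parallel across a RAID.
print(hints_to_blocks([(0, 8192), (20480, 100)]))   # [0, 1, 5]
```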

51
Informed Cache Manager
52
A System Model
  • total execution time: T = N_I/O (T_CPU + T_I/O)
  • (number of I/Os) × (time between I/Os + per-I/O
    time)
  • T_I/O is T_hit on a cache hit and T_miss on a miss
  • T_miss = T_hit + T_driver + T_disk
  • T_disk: latency of the disk fetch
  • T_driver: buffer allocation, queueing at the
    driver, and interrupt service
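Plugging made-up numbers into the model (the hit-ratio weighting of T_I/O is my addition for illustration; all constants are invented):

```python
# Illustrative timings in seconds.
T_cpu, T_hit, T_driver, T_disk = 50e-6, 10e-6, 100e-6, 10e-3
hit_ratio, n_io = 0.8, 1000

T_miss = T_hit + T_driver + T_disk            # as in the model above
T_io = hit_ratio * T_hit + (1 - hit_ratio) * T_miss
total = n_io * (T_cpu + T_io)                 # T = N_I/O (T_CPU + T_I/O)
print(round(total, 4))                        # dominated by miss disk time
```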

53
Benefit of a buffer
  • T_stall(x): read stall time when there are x
    buffers for x prefetches
  • T_pf(x): service time for a hinted read when
    there are x buffers

  [figure: timeline with x buffers between the application issuing a
   hint and later accessing the data; the reduction in T_pf is the
   benefit of using one more buffer]
54
Stall time T_stall(x)
  • before the x-th request is generated, at least
    x(T_CPU + T_hit + T_driver) of CPU time elapses
  • if this exceeds T_disk, the prefetch completes
    in time: all cache hits, no stall

  [figure: two timelines from prefetch issue to data access; when
   x(T_CPU + T_hit + T_driver) < T_disk the access stalls for the
   difference, otherwise there is no stall]
55
Prefetch Horizon
  • the stall time T_stall(x) is bounded by
    T_disk − x(T_CPU + T_hit + T_driver)
  • this bound overestimates the actual stall!
  • prefetch horizon P(T_CPU): the distance x at
    which T_stall reaches zero, i.e., there is no
    need to prefetch further ahead than this
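With the same illustrative constants as before, the bound and the horizon can be computed directly (the numbers are invented; rounding up to a whole access is my choice):

```python
import math

# Illustrative timings in seconds.
T_cpu, T_hit, T_driver, T_disk = 50e-6, 10e-6, 100e-6, 10e-3

def stall_bound(x):
    """Upper bound on stall for a prefetch issued x accesses ahead."""
    return max(0.0, T_disk - x * (T_cpu + T_hit + T_driver))

# Distance at which the bound reaches zero: no need to prefetch
# further ahead than this.
horizon = math.ceil(T_disk / (T_cpu + T_hit + T_driver))
print(horizon, stall_bound(horizon))
```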

56
What really happens
  • 3 buffers are assumed

57
Benefit of a single buffer
  • When used for prefetching
  • When used for a demand miss?

58
Model Verification
  • The model underestimates the stall time due to
  • neglecting disk contention
  • variation in disk service time (queueing effect)
  • overall, it is a good estimator

59
Cost of Shrinking LRU buffer cache
  • hit ratio H(n) for a file cache with n buffers
  • service time
  • T_LRU(n) = H(n) T_hit + (1 − H(n)) T_miss
  • cost of taking one buffer from the file cache:
  • ΔT_LRU(n) = T_LRU(n−1) − T_LRU(n)
  •          = (H(n) − H(n−1)) (T_miss − T_hit)
  • H(n) varies with workload
  • need dynamic run time monitoring
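A worked instance of the cost formula (the hit-ratio values are invented; a real system would measure H(n) at run time, as the slide says):

```python
T_hit, T_miss = 10e-6, 10.11e-3      # illustrative service times
H = {99: 0.78, 100: 0.80}            # measured hit ratio at n-1 and n

# Delta T_LRU(n) = (H(n) - H(n-1)) * (T_miss - T_hit): the expected
# extra service time per access after giving up one buffer.
delta = (H[100] - H[99]) * (T_miss - T_hit)
print(delta)
```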

60
Cost for Ejecting a Prefetched Block
  • cost is paid when the ejected block is accessed
    again later
  • if the block had stayed in the cache, the access
    would cost only T_hit
  • suppose the block is reaccessed y accesses from
    now and was prefetched back x accesses early
  • the ejection frees one buffer for y − x accesses
  • the increase in service time per access is the
    cost of the extra fetch spread over those y − x
    accesses

  [figure: timeline marking the eviction, the prefetch x accesses
   before the reaccess at y, and the region affected by the eviction
   in between]
61
Local Value Estimates
62
Seeking Global Optimum
  • Normalization of the estimates (LRU, hinted
    prefetch)
  • multiply each with its usage rate
  • unhinted demand access rate for the LRU cache
    estimate (T_LRU)
  • access rate to the hinted sequence for the
    prefetch estimate (T_PF)
  • When the manager needs a new block
  • each estimator selects its least valuable block
  • hint estimator: the block accessed furthest in
    the future
  • LRU estimator: the block at the bottom of the
    LRU stack
  • the manager then ejects the least valuable block
    overall
  • comparing the benefit against the cost of the
    least valuable block

63
After 4 Years,
  • Providing hints is too much of a burden for
    programmers
  • Automatic hint generation is desired
  • there is idle CPU time while a program blocks
    for I/O
  • speculative execution can provide hints for
    future I/O accesses
  • Approach taken
  • a kernel thread performs the speculative
    execution
  • this speculating thread shares the address space
  • Issues
  • run time overhead
  • incorrectness
  • may affect the correctness of the results
  • incorrect hints may waste I/O bandwidth

64
Ensuring Program Correctness
  • Software copy-on-write
  • prevents code/data distortion
  • on the first write to a memory region, make a
    copy of it
  • insert a check at every load/store to see
    whether it targets a copied region
  • software fault isolation
  • the checks are inserted into a copy of the code
    (shadow code)
  • the original code is unchanged, so normal
    execution pays no overhead
  • No system calls are issued
  • system state is not changed by the speculative
    execution
  • Signal handler
  • catches all exceptions that may disturb normal
    execution

65
Generating Correct and Timely Hints
  • Problems
  • the speculating thread may lag behind,
    generating stale hints
  • the speculating thread may stray from the actual
    execution path
  • How to detect the problems
  • the original thread checks the hint log prepared
    by the speculating thread
  • if a hint is wrong, the original thread saves a
    copy of its register set and sets a flag
  • when the speculating thread is next invoked, it
    checks the flag
  • if set, it restarts from the saved register set