File System (Part 2)
CS 537: Introduction to Operating Systems
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Haryadi S. Gunawi
  • Directories and Naming
  • From pathname to inode
  • Tree-structure Directory hierarchy
  • Block and Inode allocation
  • Mechanism (bitmaps) and policy
  • More optimizations

  • Human
  • Jump to slide 20 of /tmp/slides.ppt (random access)
  • Say /tmp/ is mounted on /dev/hda4
  • PowerPoint application
  • Convert slide 20 to a byte offset (e.g. the 20000-th byte)
  • System call
  • read(/tmp/slides.ppt, byte offset 20000)
  • File system
  • Get the file information of /tmp/slides.ppt → inode 76
  • Convert byte offset into block offset in the file
    (e.g. block offset 20)
  • Get the block number at block offset 20
    (e.g. block number 6543)
  • To block layer: read logical block number 6543
    (logical with respect to this partition, /dev/hda4)
  • Block layer
  • Converts LBN 6543 of /dev/hda4 to a disk sector
  • Block layer to device driver: read sector
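
The chain of translations above (byte offset, block offset, logical block number, disk sector) can be sketched in a few lines. This is only an illustration: the block size, sector size, block map, and partition start sector are assumed values that do not come from the slides, so the numbers differ from the slide's illustrative ones.

```python
BLOCK_SIZE = 4096          # bytes per file system block (assumed)
SECTOR_SIZE = 512          # bytes per disk sector (assumed)
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE


def block_offset(byte_offset):
    """File system: which block of the file holds this byte?"""
    return byte_offset // BLOCK_SIZE


def logical_block(block_map, byte_offset):
    """File system: look up the logical block number (LBN) in the file's block map."""
    return block_map[block_offset(byte_offset)]


def first_sector(partition_start_sector, lbn):
    """Block layer: translate a partition-relative LBN into an absolute disk sector."""
    return partition_start_sector + lbn * SECTORS_PER_BLOCK


# Hypothetical file whose five blocks live at these LBNs within the partition.
block_map = [6539, 6540, 6541, 6542, 6543]
lbn = logical_block(block_map, 20000)    # byte 20000 falls in block offset 4
sector = first_sector(1_000_000, lbn)    # partition assumed to start at sector 1,000,000
```

Each layer only knows its own address space: the file system works in blocks relative to the partition, and the block layer adds the partition's starting position.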

Abstraction: File
  • User view
  • Named collection of bytes
  • Untyped or typed
  • Examples: text, source, object, executables, ...
  • Permanently and conveniently available
  • Operating system view
  • Map bytes as a collection of blocks on a physical
    non-volatile storage device
  • Magnetic disks, tapes, NVRAM, battery-backed RAM
  • Persistent across reboots and power failures
  • File meta-data (or inode): additional system
    information associated with each file
  • Name of file
  • Type of file (e.g. directory, regular file,
    device file, symbolic link, etc.)
  • Pointer to data blocks on disk
  • File size
  • Times: creation, access, modification
  • Owner and group id
  • Protection bits (read or write)
  • Inode is stored on disk
  • Conceptually meta-data can be stored as array on

Abstraction: Directories
  • Organization technique: map file name to blocks
    of file data on disk
  • Actually, map file name to file inode (which
    enables one to find data on disk)
  • /tmp/slides.ppt → inode 76
  • Old (bad) approaches
  • Single-level directory
  • Each file has unique name
  • Special part of disk holds directory listing
  • Two-level directory
  • Directory for each user
  • Specify file with user name and file name

Directories: Tree-Structured
  • Directory in Linux file system
  • Directory is stored and treated like a file
  • Special bit set in meta-data for directories
  • User programs can read directories
  • Only system programs can write directories
  • Directory listing is stored in the file's data
    blocks in the form of <inode, name>
  • name can be of arbitrary length (there is a max
  • Array of inodes is stored in a fixed location on
    the disk
  • Special directories
  • Root: fixed index for meta-data (e.g., 2)
  • This directory: .
  • Parent directory: ..

Absolute path to inode
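  • Figure: inodes and their data blocks
  • Inode 2 (type DIR, pointers direct0, direct1): data
    block holds <2, .>, <2, ..>, <50, a>, <51, tmp>, <52, boot>
  • Inode 50 (type DIR, pointers direct0, direct1): data
    block holds <50, .>, <2, ..>, and an entry for b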
  • Example: read(/a/b/c) // read some bytes from c
  • Read inode 2, look for a, find <a, 50>
  • Read inode 50, look for b, find <b, 900>
  • Read inode 900, look for c, find <c, 2000>
  • Read inode 2000, and get the data block
  • Total: 8 I/Os (roughly)
  • (if inodes are not in memory initially, and all
    the inodes needed in this example are stored in
    different blocks)
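
The lookup above can be replayed on a toy in-memory "disk" to see where the 8 I/Os come from. This is a minimal sketch, assuming every inode and every directory data block costs exactly one I/O and nothing is cached. The inode numbers follow the slide; the dictionary layout, the function name `read_file`, and the file contents are made up for illustration.

```python
ROOT_INO = 2  # fixed inode number of the root directory, as on the slide

# Toy "disk": directory inodes map names to inode numbers; file inodes hold data.
INODES = {
    2:    {"type": "DIR", "entries": {".": 2, "..": 2, "a": 50, "tmp": 51, "boot": 52}},
    50:   {"type": "DIR", "entries": {".": 50, "..": 2, "b": 900}},
    900:  {"type": "DIR", "entries": {".": 900, "..": 50, "c": 2000}},
    2000: {"type": "REG", "data": b"contents of c"},
}


def read_file(path):
    """Resolve an absolute path and return (data, io_count)."""
    ios = 0
    ino = ROOT_INO
    for name in [p for p in path.split("/") if p]:
        inode = INODES[ino]          # I/O: read the directory's inode
        ios += 1
        if inode["type"] != "DIR":
            raise NotADirectoryError(name)
        entries = inode["entries"]   # I/O: read the directory's data block
        ios += 1
        if name not in entries:
            raise FileNotFoundError(path)
        ino = entries[name]
    inode = INODES[ino]              # I/O: read the file's inode
    ios += 1
    if inode["type"] != "REG":
        raise IsADirectoryError(path)
    data = inode["data"]             # I/O: read the file's data block
    ios += 1
    return data, ios
```

Each path component costs two I/Os (inode plus directory data block), and the file itself costs two more, which gives 3 × 2 + 2 = 8 for /a/b/c.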

Working Directory and Tilde
  • Ideas
  • Cumbersome to write full path
  • Bad for OS if you always give the full path
  • Example: current working directory is lectures
  • OS must traverse the directory hierarchy every
    time → lots of I/Os
  • If cwd is used, the parent paths (e.g.
    /mylife/college/courses/cs/537) do not have to be
    in memory
  • Store cwd in PCB
  • PCB stores the inode number of the cwd
  • read(notes) → generally speaking, only 6 I/Os
  • Read lectures' inode, lectures' dir data block
    (to get inode for folder 14)
  • Read 14's inode, 14's dir data block (to get
    inode for notes)
  • Read notes' inode, and notes' data block
  • Tilde
  • Not a feature of the file system
  • (chdir(~mjordan) → fail!)
  • Look at entry mjordan in /etc/passwd file to
    get the absolute path name
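
Because the file system knows nothing about the tilde, something in user space (typically the shell) has to rewrite the path before any system call is made. The sketch below shows one way that rewriting could look, assuming the usual colon-separated passwd format where the sixth field is the home directory. The sample entries and the helper names `home_directories` and `expand_tilde` are hypothetical.

```python
# Sample passwd-format text; the entries and home directories are made up.
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/sh
mjordan:x:1023:100:M. Jordan:/home/mjordan:/bin/sh
"""


def home_directories(passwd_text):
    """Map login name -> home directory (field 6 of each passwd line)."""
    homes = {}
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 7:
            homes[fields[0]] = fields[5]
    return homes


def expand_tilde(path, homes):
    """Rewrite ~user/rest to an absolute path; leave other paths untouched."""
    if not path.startswith("~"):
        return path
    user, sep, rest = path[1:].partition("/")
    if user not in homes:   # unknown user, or a bare "~" that this sketch does not handle
        return path
    return homes[user] + sep + rest


homes = home_directories(SAMPLE_PASSWD)
expanded = expand_tilde("~mjordan/notes", homes)
```

Only the expanded absolute path would then be handed to chdir or open.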

Acyclic-Graph Directories
  • More general than tree structure
  • Add connections across the tree (no cycles)
  • Create links from one file (or directory) to
  • Hard link: ln a b (a must exist already)
  • Idea: can use name a or b to get to the same file
  • Implementation: multiple directory entries point
    to the same inode
  • What happens when you remove a? Does b still
  • Yes
  • How is this feature implemented? A counter in the
  • Unix: does not create hard links to directories.
  • There's only one entry for ..
  • "Please search for file A" will never end
  • Due to the cyclic graph
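
The hard-link behavior described above can be observed from user space. The sketch below is an illustration that assumes a POSIX-like system where `os.link` is supported. It creates a, links b to it, removes a, and reports the inode's link count (`st_nlink`) before and after. The function name `hard_link_demo` is made up.

```python
import os
import tempfile


def hard_link_demo():
    """Create a, link b to it, remove a, and report what happens to b."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        with open(a, "w") as f:
            f.write("hello")
        os.link(a, b)                                  # like: ln a b
        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        links_before = os.stat(b).st_nlink             # two names, one inode
        os.remove(a)                                   # like: rm a
        links_after = os.stat(b).st_nlink              # count dropped by one
        with open(b) as f:
            content = f.read()                         # data is still reachable via b
    return same_inode, links_before, links_after, content
```

Removing a only deletes one directory entry; the data stays until the last name referring to the inode is gone.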

Acyclic-Graph Directories
  • Symbolic (soft) link
  • Some kind of shortcut
  • ln -s /a/very/long/directory/path/ shortpath
  • shortpath is a special file (designated by bit in
  • Content of shortpath's data block is
  • Optimization put the pathname in the array that
    is supposedly used for pointing to data blocks
    and indirect blocks
  • Only if the pathname is short
  • To access the pathname, no need to read a data
  • ls -l /shortpath
  • shortpath -> /a/very/long/directory/path/
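
A symbolic link can be inspected from user space in the same spirit. The following sketch assumes a system where unprivileged symlink creation is allowed. It creates a link in a temporary directory, checks its file type with `lstat`, and reads back the pathname stored in it. The directory names mirror the slide; the function name `symlink_demo` is made up.

```python
import os
import stat
import tempfile


def symlink_demo():
    """Create a symlink and report (is_symlink, stored_path_matches, reaches_target)."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "a", "very", "long", "directory", "path")
        os.makedirs(target)
        link = os.path.join(d, "shortpath")
        os.symlink(target, link)                         # like: ln -s target shortpath
        is_link = stat.S_ISLNK(os.lstat(link).st_mode)   # file type recorded in the meta-data
        stored = os.readlink(link)                       # the link's content is a pathname
        same_dir = os.path.samefile(link, target)        # following the link reaches target
    return is_link, stored == target, same_dir
```

Unlike a hard link, the symlink has its own inode and only stores a name, so it can dangle if the target is removed.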

Bitmaps (space management)
  • How do we know which blocks are free/allocated?
  • Old days
  • Free list (like in Project 3)
  • Poor freelist organization
  • Consecutive file blocks not close together
  • Pay seek cost between even sequential disk
  • Today: bitmaps
  • 1 bit for each block
  • Very small space overhead: 1 bit / 4 KB (0.003%)
  • 400 GB file system → 100 Mbits = 12.5 MB
  • Automatically merge adjacent free blocks
  • Bitmap vs. free list
  • Bitmap is useful for space management for
    fixed-size chunks
  • The smallest allocation unit is large (e.g. a 4 KB
    block) → small space overhead
  • Number of requested units is small → scanning the
    bitmap is fast
  • Free list is useful for variable-size chunks
  • The smallest allocation unit is small (e.g. in
    Project 3, the unit is 4 bytes) → high space
    overhead
  • Number of requested units can be large (e.g.
    Alloc(1000 bytes)) → scanning a bitmap is slow
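
A block bitmap and the overhead arithmetic above can be sketched as follows. This is a minimal illustration rather than any particular file system's code: the class name `BlockBitmap`, the first-fit search in `alloc_run`, and the helper `bitmap_bytes` are all assumptions.

```python
class BlockBitmap:
    """One bit per block: 0 = free, 1 = allocated."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_allocated(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def set(self, block, allocated):
        if allocated:
            self.bits[block // 8] |= 1 << (block % 8)
        else:
            self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def alloc_run(self, count, start=0):
        """Allocate `count` consecutive free blocks at or after `start`.

        Returns the first block of the run, or None if no such run exists.
        """
        run = 0
        for block in range(start, self.num_blocks):
            run = 0 if self.is_allocated(block) else run + 1
            if run == count:
                first = block - count + 1
                for b in range(first, block + 1):
                    self.set(b, True)
                return first
        return None


def bitmap_bytes(fs_bytes, block_bytes=4096):
    """Size in bytes of the bitmap for a file system of `fs_bytes` bytes."""
    return (fs_bytes // block_bytes + 7) // 8
```

Freeing two neighboring blocks needs no explicit coalescing step: their bits simply sit next to each other, so a later `alloc_run(2)` finds them as one run. With binary units, `bitmap_bytes(400 * 2**30)` gives 13,107,200 bytes, the 12.5 MB quoted above.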

Allocation Policy (1)
  • Problems
  • "create file.txt in /tmp": which inode should
    represent file.txt?
  • "give me 10 new data blocks": which data blocks
    to give?
  • A good policy needs to match with the common
  • Workloads influence design of file system
  • File characteristics (measurements of UNIX and
  • Most files are small (about 8KB)
  • Most of the disk is allocated to large files
  • (90% of data is in 10% of files)
  • Access patterns
  • Sequential: data in file is read/written in order
  • Most common access pattern
  • Random (direct): access block without referencing
  • Difficult to optimize
  • Access files in same directory together
  • Spatial locality
  • Access meta-data when accessing file data
  • Need meta-data to find data

Allocation Policy (2)
  • Old days: unorganized free list
  • Initially files have contiguous data blocks
  • But file systems are long-lived entities
  • No locality in allocation to disk
  • In the end, data are scattered
  • Inodes far from data blocks
  • Pay long seek for every data transfer
  • Inodes of files in a directory not close together
  • Pay seek for every inode (e.g., ls -l)

Allocation Policy (3)
  • Implications: locality-driven policy
  • Large files should be allocated sequentially
  • Files in same directory should be allocated near
    each other
  • Data should be allocated near its meta-data
  • Spread unrelated data far apart
  • Leaves room for related things to be placed
  • Keep free space on disk
  • Always find free block nearby
  • 90% rule of thumb

BSD/Linux Solution: Cylinder Groups
  • Divide disk into cylinder groups
  • Set of adjacent cylinders
  • Little seek time between cylinders in same group
  • Each cylinder group contains
  • Superblock (contains information about where
  • Vary offset within each cylinder group for
  • Otherwise all superblock copies will be on the
    same platter
  • Inodes
  • Fixed number per cylinder group
  • Bitmap of free blocks
  • Usage summary for high-level allocation policy
  • Data blocks
  • Cylinder groups vs. 1 cylinder group
  • Compare this approach with the default approach
    where array of inodes is stored in the beginning
    of the disk
  • Lots of seeks! Because you need to read inode and
    its data (in both cases reading/writing
    directories/regular files)
  • Inode is in the beginning part of the disk and
    data could be in any part of the disk

Solution to Achieving Locality
  • Maintain locality of each file
  • Allocate runs of blocks within a cylinder group
  • Maintain locality of files and inodes in a
  • Problem Create a file file.txt inside
    directory dir, where should file.txt be
  • Keep files in a directory in same cylinder group
  • Make room for locality within a directory
  • Spread out directories among the cylinder groups
  • A new directory is placed in a cylinder group
    that has a greater-than-average number of free
    inodes and the smallest number of directories
  • The files in the directory will be placed in the
    same cylinder group
  • Goal: allow inode clustering policies to succeed
    most of the time
  • Switch to a different cylinder group for large
  • After 48KB and every 1MB thereafter
  • Prevent one file from filling a cylinder group
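
The directory-placement rule above (more free inodes than average, then fewest directories) is easy to express in code. The sketch below is an illustration of that rule only. The `CylinderGroup` record, the function names, the tie-breaking by group index, and the fallback when no group is above average are assumptions, not details from the slides.

```python
from dataclasses import dataclass


@dataclass
class CylinderGroup:
    index: int
    free_inodes: int
    num_dirs: int


def pick_group_for_new_dir(groups):
    """Return the index of the group for a new directory, or None if no inode is free."""
    avg_free = sum(g.free_inodes for g in groups) / len(groups)
    candidates = [g for g in groups if g.free_inodes > avg_free]
    if not candidates:
        # Assumed fallback: all groups are equally full, so accept any free inode.
        candidates = [g for g in groups if g.free_inodes > 0]
    if not candidates:
        return None
    # Fewest directories wins; ties go to the lowest group index (assumption).
    return min(candidates, key=lambda g: (g.num_dirs, g.index)).index


def pick_group_for_new_file(parent_dir_group):
    """Files start out in the same cylinder group as their directory."""
    return parent_dir_group


groups = [
    CylinderGroup(0, free_inodes=100, num_dirs=5),
    CylinderGroup(1, free_inodes=400, num_dirs=9),
    CylinderGroup(2, free_inodes=300, num_dirs=2),
    CylinderGroup(3, free_inodes=50, num_dirs=0),
]
chosen = pick_group_for_new_dir(groups)
```

In this example the average is 212.5 free inodes, so only groups 1 and 2 qualify, and group 2 is chosen because it holds fewer directories. Group 3 has no directories at all but is skipped because it is short on inodes.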

Layout: Global vs. Local
  • Decompose allocation into two steps
  • Global: heuristics for allocating files/directories
    to cylinder groups
  • Pick optimal next block for allocation
  • Example
  • A file is in cg10, and its first data block is
    in block 1001
  • A user appends to the file such that a second data
    block must be added
  • Heuristics: put 2nd data block in cg10, at
    block 1002
  • Local: handles request for a specific block
  • If free, use it, else
  • Find a block (or some blocks) within cylinder
    group (e.g. cg10)
  • Rehash on cylinder group to choose another group
  • Exhaustive search
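
The four-step fallback of the local allocator can be sketched as below. This is a simplified model: free space is a list of Python sets, "nearest free block" is used for the within-group step, and the rehash probes groups at doubling distances. Those choices and the name `allocate_block` are assumptions made for illustration; the slides do not specify them.

```python
def allocate_block(free, preferred_cg, preferred_block):
    """Allocate a block, preferring (preferred_cg, preferred_block).

    free[cg] is the set of free block numbers in cylinder group cg.
    Returns (cg, block) and marks it allocated, or None if nothing is free.
    """
    ncg = len(free)

    def take(cg, block):
        free[cg].remove(block)
        return cg, block

    # 1. The exact block the global policy asked for.
    if preferred_block in free[preferred_cg]:
        return take(preferred_cg, preferred_block)
    # 2. Another block in the same group, as close as possible to the request.
    if free[preferred_cg]:
        nearest = min(free[preferred_cg], key=lambda b: (abs(b - preferred_block), b))
        return take(preferred_cg, nearest)
    # 3. Rehash: probe other groups at doubling distances (assumed probe sequence).
    step = 1
    while step < ncg:
        cg = (preferred_cg + step) % ncg
        if free[cg]:
            return take(cg, min(free[cg]))
        step *= 2
    # 4. Exhaustive search over every group.
    for cg in range(ncg):
        if free[cg]:
            return take(cg, min(free[cg]))
    return None
```

The cheap steps run first, and the exhaustive scan is reached only when both the preferred group and the probed groups are full.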

More optimizations
  • Preallocation
  • Problem: append(f1), append(f2), append(f1), ...
  • Preallocates disk data blocks before they are
    actually used
  • E.g. block preallocation: 8
  • If the file is closed, preallocation is released
  • Why does it work?
  • Common workload: when a file is created, it tends
    to grow in the near future as long as the file is
    not closed
  • Readahead
  • disk_read(block 1001)
  • 8-block readahead: reads blocks 1001 - 1008
  • Why useful? Common workload: sequential access
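
Readahead can be illustrated with a small wrapper that fetches a window of blocks on every miss. This is a sketch under simplifying assumptions: the cache is an unbounded dictionary, the device is replaced by a stand-in function, and the names `ReadaheadDisk` and `fake_read_blocks` are made up.

```python
class ReadaheadDisk:
    """Wraps a block-read function and prefetches the following blocks."""

    def __init__(self, read_blocks, window=8):
        self.read_blocks = read_blocks   # read_blocks(first, count) -> list of block contents
        self.window = window
        self.cache = {}                  # block number -> content (never evicted in this sketch)
        self.disk_requests = 0

    def disk_read(self, block):
        if block not in self.cache:
            self.disk_requests += 1
            # One request covers block .. block + window - 1.
            data = self.read_blocks(block, self.window)
            for i, content in enumerate(data):
                self.cache[block + i] = content
        return self.cache[block]


def fake_read_blocks(first, count):
    """Stand-in for a real device: block n just contains the text 'block n'."""
    return [f"block {n}" for n in range(first, first + count)]


disk = ReadaheadDisk(fake_read_blocks, window=8)
contents = [disk.disk_read(n) for n in range(1001, 1009)]   # sequential scan of 8 blocks
```

A sequential scan of blocks 1001 to 1008 then costs a single disk request instead of eight, which is why the optimization pays off for the common sequential workload.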

Misc (Roles of FS)
  • Standard library
  • Provide file abstraction
  • Simplify access to disk (a bunch of bits)
  • Resource coordination
  • Provide protection across users
  • Provide fair and efficient performance
  • User processes do not have direct access to
  • Could crash entire system
  • Could read/write data without appropriate
  • Could hog device unfairly
  • FS exports higher-level functions
  • File system: provides file and directory
  • File system operations: mkdir, create, read, write