File Systems


Slides: 76

Provided by: RPy1

1
File Systems

We need a mechanism that provides long-term
information storage with the following
characteristics:
It must be possible to store a large amount of information.
The information must survive the termination of any process.
Multiple processes must be able to access the information concurrently.
The file system is the component of the O.S. that
manages this information as files and directories.
From the user's standpoint, the file system is how the
stored information appears, and it involves two main
structures: files and directories.

2
Files

Information stored in files must be persistent, that
is, not affected by process creation and
termination.
A file is a logical storage unit defined by the
O.S. It gives the user a mechanism to store information
on physical storage devices such as disk, tape
and CD.
(Figure: user, logical view; O.S., physical view)

3
File Naming

Some operating systems distinguish between upper and
lower case letters (e.g., UNIX) and some
do not (e.g., MS-DOS).
The file extension usually indicates what type of
file it is (see the next slide). In some systems
(e.g., UNIX), file extensions are just conventions
and are not enforced by the O.S. Other systems
(e.g., Windows) are aware of extensions and launch
the program assigned to each extension
(e.g., opening file.doc starts Word).

5
File Structure

The structure of a file is determined by the O.S.
Some operating systems (e.g., CP/M and old mainframes)
impose the view that a file is a sequence of
fixed-length records (e.g., b in the next slide).
Other operating systems may impose a B-tree (or index) like
structure on a file to support rapid
search (e.g., c in the next slide).
The problem with the O.S. imposing more structure
is that it becomes difficult to do anything
the O.S. designer did not foresee.

7
File Structure

Operating systems such as UNIX and Windows impose no
structure, to ensure maximum flexibility. They
treat a file as a stream of bytes, and user
processes define any structure they want.
I/O is usually performed in units of one physical
block. All blocks have the same size, which is
related to the page size of the paging scheme.

8
File Types

Some of the file types are:
Regular files: user files (ASCII files or binary
files).
Directory files: system files used to maintain
the directory structure.
I/O files: special system files dedicated to I/O.
Executable files: the O.S. usually expects a special
structure for these files. For example, in UNIX
they must start with a magic number. The next slide
shows the difference between an executable file (a) and
an archive (i.e., compiled but not linked) file in
UNIX.

10
File Access

Two types of access are generally provided for
files:
Sequential access: reading starts from the beginning and
proceeds in order (typically used with tapes).
Random access: any byte in the file can be accessed
directly.
The O.S. provides these operations to the user.
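
The two access styles can be sketched with ordinary file calls. This is only an illustration: the function names, the chunk size and the sample file contents are assumptions, not part of the slides.

```python
import os
import tempfile

def read_sequential(path, chunk_size=4):
    """Sequential access: read the file front to back, one chunk at a time."""
    chunks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:          # empty read means end of file
                break
            chunks.append(chunk)
    return b"".join(chunks)

def read_random(path, offset, length):
    """Random access: jump straight to `offset` and read `length` bytes."""
    with open(path, "rb") as f:
        f.seek(offset)             # reposition the file pointer
        return f.read(length)

def demo():
    """Write a small sample file (assumed contents) and read it both ways."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "abc")
        with open(path, "wb") as f:
            f.write(b"0123456789")
        return read_sequential(path), read_random(path, 6, 3)
```

Here read_random never touches the first six bytes, while read_sequential has to pass over everything before the data it wants.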

11
File Attributes

File attributes describe:
Location: where the file is physically located.
Size: how big the file is.
Type: what kind of file it is.
Protection: who can access the file.
Time and date: when the file was last accessed or
modified.
User: who created the file.
There is other information as well. Some of the attributes
are shown in the next slide.

13
File Operations

The most common system calls relating to files:
Create: announce that the file is coming, set its
attributes and allocate space.
Delete: free the disk space and adjust the directory
structure.
Open: fetch the attributes and location of the
file.
Close: release internal table space and write
the file's last block.

14
File Operations

Read: data are read from the file and put into memory
for user access.
Write: data are written to the file, usually at
the current position.
Append: adds data to the end of the file.
Seek: repositions the file pointer so that data can be
accessed at random.
Rename: changes the name of the file.
Get/set attributes: gets or sets the attributes of a
file (e.g., the read-only attribute).
See the program for copying a file in UNIX shown
in the next slides. It can be called with the
following command line:
copyfile abc xyz
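
The program on the original slides is not in this transcript. As a stand-in, here is a minimal sketch of such a copy routine using the open, read, write and close calls listed above. The buffer size and the 0o644 permission bits are assumptions.

```python
import os

BUF_SIZE = 4096  # assumed buffer size; any positive value works

def copyfile(src, dst):
    """Copy src to dst with the low-level open/read/write/close calls."""
    in_fd = os.open(src, os.O_RDONLY)
    try:
        # Create the destination (or truncate it if it already exists).
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                buf = os.read(in_fd, BUF_SIZE)
                if not buf:                     # end of input file
                    break
                view = memoryview(buf)
                while view:                     # write() may be partial
                    written = os.write(out_fd, view)
                    view = view[written:]
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)
```

Calling copyfile("abc", "xyz") corresponds to the command line shown on the slide.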

17
Directories

Directories are the mechanism provided by the O.S. to
keep track of files. A directory records information
about the files in a particular partition.
A directory typically contains one entry per file.
An entry may contain the name, attributes and location, or
it may contain the name and a pointer to the attribute
information.

18
Directory Structure

Single-level directory system
There is no notion of an owner. The problem is files with the
same name created by two different owners.
Note that in the following figures the files are
labeled with their owners' names. For example, the files
named A were created by the same owner.

19
Directory Structure

Two-level directory system
Searching the directories is based on the user name.
The problem is a user with a large number of
files.

20
Directory Structure

Hierarchical directory system

21
Path Names

Absolute path name: /usr/ast/mailbox. It always
starts with / (i.e., the separator).
Relative path name: mailbox
The current (working) directory determines
how a relative path name is resolved.
In UNIX, . is the current directory and
.. refers to the parent.
For example: cp ../lib/abdy.doc .
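
As a rough sketch of how a relative name is resolved against the working directory, the helper below (a hypothetical name) uses the standard posixpath module. It is purely lexical and ignores symbolic links.

```python
import posixpath

def resolve(cwd, path):
    """Return a normalized absolute path for `path`, relative to `cwd` if needed."""
    if not path.startswith("/"):           # relative: prepend the working directory
        path = posixpath.join(cwd, path)
    return posixpath.normpath(path)        # collapse "." and ".." components
```

With /usr/ast as the working directory, mailbox resolves to /usr/ast/mailbox and ../lib/abdy.doc resolves to /usr/lib/abdy.doc.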

22
Directory Operations

Create: creates the . and .. entries.
Delete: only an empty directory can be deleted.
Rename
Link and unlink: linking is a common technique
for sharing files or directories between users
(see next slide). Shared files could be duplicated
instead of linked, but
consistency between the copies is
difficult to maintain. A link within a directory
can be a hard link (implemented with i-nodes,
explained later) or a symbolic link (a
file that contains the path of the linked file).
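
The difference between the two kinds of link can be seen on a POSIX system with a short experiment. The file names are made up for illustration, and the sketch assumes the platform allows both os.link and os.symlink.

```python
import os
import tempfile

def link_demo():
    """Create a hard link and a symbolic link, then unlink the original name."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "report")
        hard = os.path.join(d, "report.hard")
        soft = os.path.join(d, "report.soft")
        with open(original, "w") as f:
            f.write("shared data")

        os.link(original, hard)       # second directory entry, same i-node
        os.symlink(original, soft)    # small file holding the target path

        same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
        link_count = os.stat(original).st_nlink
        target_is_path = os.readlink(soft) == original

        os.unlink(original)           # remove the original name only
        with open(hard) as f:
            hard_still_readable = f.read() == "shared data"
        soft_dangling = not os.path.exists(soft)   # exists() follows the link
        return (same_inode, link_count, target_is_path,
                hard_still_readable, soft_dangling)
```

After the original name is unlinked, the data is still reachable through the hard link, while the symbolic link is left pointing at a path that no longer exists.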

23
Directories

Creating a shared file with a link changes the
directory structure from a tree to a graph.

24
File System Layout

Most disks are divided into one or more
partitions, with an independent file system on each
partition.
Sector 0 of the disk is called the MBR (Master Boot
Record). It contains the partition table, which
gives the start and end address of each
partition.
The layout of a disk partition depends on its
file system. For example, after its first block
(i.e., the boot block) it may contain a superblock that
holds administrative information such as a magic
number identifying the file-system type (see next slide).

26
Implementing the Files

Different operating systems use various methods for
implementing files.
Contiguous allocation: each file is stored in
consecutive disk blocks. For example, on a disk
with a 4K block size, a 20K file is stored in 5
consecutive blocks (see next slide).
Advantages:
It is simple to implement, because we need to know only
the disk address of the first block and the number of
blocks.
Read performance is excellent, because the entire file
can be read in a single disk operation.
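
The address arithmetic behind contiguous allocation is small enough to sketch. The function names and the sample starting block are hypothetical; the 4K block size matches the example above.

```python
BLOCK_SIZE = 4 * 1024  # 4K blocks, as in the example above

def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Number of consecutive blocks a file of `file_size` bytes occupies."""
    return -(-file_size // block_size)     # ceiling division

def locate(first_block, num_blocks, offset, block_size=BLOCK_SIZE):
    """Map a byte offset to (disk block, offset inside that block)."""
    index = offset // block_size
    if index >= num_blocks:
        raise ValueError("offset is beyond the end of the file")
    return first_block + index, offset % block_size
```

Only the first block number and the block count are needed, which is why the directory entry for a contiguous file can be so small.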

27
Contiguous Allocation

28
Contiguous Allocation

The disadvantages of contiguous allocation are:
Disk fragmentation happens when files are
removed. Compaction is expensive because all the
blocks following a hole must be copied. The problem
gets worse as the disk fills up.
The final size of a new file must be known in order to
choose a hole large enough to hold it, and that
is hard to know in advance.
Contiguous allocation is good for write-once
media such as CD-ROMs and DVDs.

29
Linked List Allocation

In this method each file is kept as a linked list of disk
blocks (the first word of each block is a pointer to the next).
Every disk block can be used (nothing is lost except to
internal fragmentation).
Reading the blocks of a file sequentially is
easy, but random access to a block is slow
because we have to read all the blocks of the file
before that block.
Because of the pointer, the amount of data stored in
each block is no longer a power of two.

30
Linked List Allocation

31
Linked List Allocation using a Table in Memory

Both disadvantages of linked list
allocation can be eliminated by keeping the table
of block pointers (the FAT) in memory.
MS-DOS uses this method.
Random access is easy because the chain is followed
in memory, with no disk references. We need only the
starting block number.
The problem is the table size: for a 20 GB disk with a 1 KB block
size, the table needs 20 million entries. If each entry is 4
bytes, the table takes approximately 80 MB.
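
A minimal sketch of following a chain through an in-memory table, plus the table-size arithmetic from this slide. The table contents, the -1 end-of-chain marker and the function names are assumptions for illustration.

```python
END_OF_FILE = -1  # assumed end-of-chain marker

def file_blocks(fat, start_block):
    """Follow the chain from `start_block` and list every block of the file."""
    blocks = []
    block = start_block
    while block != END_OF_FILE:
        blocks.append(block)
        block = fat[block]        # table lookup in memory, no disk access
    return blocks

def nth_block(fat, start_block, n):
    """Return the disk block holding logical block `n` (0-based) of the file."""
    block = start_block
    for _ in range(n):
        block = fat[block]
        if block == END_OF_FILE:
            raise IndexError("file has fewer blocks than requested")
    return block

def fat_size_bytes(disk_bytes, block_bytes, entry_bytes):
    """Memory taken by a table with one entry per disk block."""
    return (disk_bytes // block_bytes) * entry_bytes

# Hypothetical table: a file that starts at block 4 and occupies 4, 7, 2, 10, 12.
example_fat = {4: 7, 7: 2, 2: 10, 10: 12, 12: END_OF_FILE}
```

With binary units, fat_size_bytes(20 * 2**30, 2**10, 4) gives 80 * 2**20 bytes, which matches the roughly 80 MB quoted above.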

32
File Allocation Table

33
I-nodes

To solve the problem of the large in-memory table we
can use i-nodes.
In this method each file has a table (its i-node) that
contains the attributes and disk addresses of the
blocks of that file. If an i-node occupies n
bytes and k files are open, we need kn bytes of
memory. Thus the memory required depends on the number
of open files, not on the disk size.
The problem: if each i-node has room for a fixed
number of disk addresses, what happens when a file
grows beyond this limit?
One solution is to keep multiple levels of index in the
i-node.

35
I-node in Unix

An i-node in UNIX has:
10 direct disk addresses.
The address of a single indirect block, which holds the
addresses of further blocks for larger files.
The address of a double indirect block, which holds the
addresses of blocks that each contain a list of single
indirect blocks.
The address of a triple indirect block, which holds the
addresses of blocks that are each a double indirect block.
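
To make the levels concrete, the sketch below works out which part of the i-node maps a given logical block and how many blocks a file can have at most. The 10 direct addresses come from the slide. The number of addresses per indirect block is a parameter, and the value 256 used in the note assumes 1 KB blocks and 4-byte addresses.

```python
NUM_DIRECT = 10  # direct addresses in the i-node, as on the slide

def classify(block_index, addrs_per_block):
    """Return which part of the i-node maps logical block `block_index`."""
    if block_index < NUM_DIRECT:
        return "direct"
    block_index -= NUM_DIRECT
    if block_index < addrs_per_block:
        return "single indirect"
    block_index -= addrs_per_block
    if block_index < addrs_per_block ** 2:
        return "double indirect"
    block_index -= addrs_per_block ** 2
    if block_index < addrs_per_block ** 3:
        return "triple indirect"
    raise ValueError("block index exceeds the maximum file size")

def max_file_blocks(addrs_per_block):
    """Largest number of data blocks one i-node can address."""
    return (NUM_DIRECT + addrs_per_block
            + addrs_per_block ** 2 + addrs_per_block ** 3)
```

With 256 addresses per block, logical blocks 0 to 9 are direct, 10 to 265 go through the single indirect block, and the maximum file has 16,843,018 blocks.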

36
I-node in Unix

37
Implementing Directories

Basically, a directory is a file that contains an
entry for each file or subdirectory in that
directory.
When a file is opened, the O.S. uses the path name to
locate its directory entry.
Each directory entry provides the file information.
The file information can be stored directly in the
directory entry (a in the next slide),
or it can be stored in an i-node, with
each directory entry referring to the i-node (b in the
next slide).

38
Implementing Directories

39
Directories in MS-DOS

As in CP/M, directory entries are 32 bytes
each.
In CP/M, the extent field supports a large file that
requires more than one directory entry: it gives the order
in which the directory entries should be followed.
The first block number is the number of the first
disk block of the file.

40
Directories in MS-DOS

41
Directories in UNIX

Each directory entry contains file name and
i-node number

42
Directories in UNIX

Directory lookup works the same way in UNIX and in all
hierarchical systems.
First the file system locates the root directory.
Then it looks up the first component of the path
there to find its i-node.
From that i-node the system finds the blocks of the
next directory, looks up the next component there, and
continues in the same way
until the file is found. For example, the next
slide shows the steps in looking up /usr/ast/mbox.
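
The lookup loop can be sketched with a tiny in-memory stand-in for the disk. The i-node numbers and the table layout below are hypothetical and only serve to illustrate the walk.

```python
ROOT_INODE = 1  # assumed i-node number of the root directory

# Hypothetical "disk": i-node number -> directory entries or file data.
inodes = {
    1: {"type": "dir", "entries": {"usr": 6}},
    6: {"type": "dir", "entries": {"ast": 26}},
    26: {"type": "dir", "entries": {"mbox": 60}},
    60: {"type": "file", "data": b"mail"},
}

def lookup(path):
    """Walk an absolute path one component at a time; return the i-node number."""
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    inode_no = ROOT_INODE
    for name in path.split("/"):
        if not name:                      # skip empty pieces from "/" or "//"
            continue
        inode = inodes[inode_no]
        if inode["type"] != "dir":
            raise NotADirectoryError(name)
        if name not in inode["entries"]:
            raise FileNotFoundError(path)
        inode_no = inode["entries"][name] # follow the entry to the next i-node
    return inode_no
```

Looking up /usr/ast/mbox visits i-nodes 1, 6 and 26 in turn and returns 60. A real file system reads a directory block from disk at each step.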

44
Disk Space Management: Physical Disk Structure

The main secondary storage is disk. Tape is mainly
used for backup.
The physical disk consists of cylinders. Each
cylinder is divided into tracks, and each track is
divided further into sectors. One or more sectors
form a logical block. Data transfers between
main memory and disk are in units of logical
blocks. The size of a logical block is usually
512 bytes or larger, although the disk can be
formatted to have different logical block sizes.

47
Disk Read Speed

The total time for accessing a file consists of
the time to move the head to the right track
(seek time), the time for the correct sector to rotate
under the head
(rotational delay), and the time to transfer the data
(transfer time). Seek time contributes the most
to the total delay of accessing files,
especially when files are not stored in
contiguous blocks.

48
Disk Read

Example: for a disk system, the average seek time is 10 msec
per block, the average rotational latency is 8 msec per block
and the transfer time is 0.25 msec for a
1 KB block. The average time to read
one block in this disk system is
10 + 8 + 0.25 = 18.25 msec.
As this example shows, seek time and
rotational latency usually dominate disk read
latency.
This means that if we reduce seek time or rotational
latency we can reduce disk read time
significantly. Therefore most
optimizations for increasing disk performance are
based on reducing disk seek time.
For example, FFS in UNIX uses the cylinder grouping
technique to reduce disk seek time.
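
The same arithmetic shows why contiguous placement matters. The sketch below rests on a simplifying assumption that is not in the slides: a contiguous file pays the seek and rotational delay once, while a scattered file pays them for every block.

```python
def block_read_ms(seek_ms, rotation_ms, transfer_ms):
    """Average time to read one block: seek + rotational delay + transfer."""
    return seek_ms + rotation_ms + transfer_ms

def file_read_ms(num_blocks, seek_ms, rotation_ms, transfer_ms, contiguous):
    """Rough time to read a whole file under the assumption stated above."""
    if contiguous:
        # One seek and one rotational delay, then the blocks stream past the head.
        return seek_ms + rotation_ms + num_blocks * transfer_ms
    # Scattered blocks: every block pays the full positioning cost.
    return num_blocks * block_read_ms(seek_ms, rotation_ms, transfer_ms)
```

With the numbers from the example, a 100-block file takes about 43 msec when contiguous and about 1825 msec when every block needs its own seek.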

49
Cylinder Grouping Technique

Fast File System (FFS, a UNIX file system) uses
the cylinder grouping technique to provide both
block-level and file-level clustering. With
cylinder grouping, users or
applications have to place related files in
a directory. The files of the directory are
allocated in one or more consecutive cylinders to
reduce disk seek time (see next slide), so files
belonging to a
directory are stored in consecutive blocks on
disk. With the same approach, FFS also tries
to store a single file in consecutive disk
blocks.

51
Keeping Track of Free Blocks

There are two methods for keeping track of
free disk blocks: a linked list and a bitmap.
With a linked list, free blocks on disk can themselves be
used to hold the numbers of the free blocks. For example, (a) in the
next slide shows three free blocks (16, 17 and 18)
that hold the block numbers of the free
blocks using the linked list method.

52
Free disk blocks 16, 17 , 18

(Figure: (a) free list kept as a linked list; (b) bitmap)
53
Keeping Track of Free Blocks

In the bitmap method one bit is required for each
block, where 1 shows the block is used and 0 shows
the block is free. The bitmap requires less
space than the linked list, except when
the disk is nearly full and only
a few blocks are free.
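
A small sketch of both ideas: the space each method needs, and a bitmap using the convention above (1 = used, 0 = free). The class and function names are hypothetical. The free-list layout assumes each list block holds block numbers plus one pointer to the next list block.

```python
def bitmap_bytes(total_blocks):
    """Space for a bitmap: one bit per disk block, used or free."""
    return -(-total_blocks // 8)          # ceiling division

def free_list_blocks(free_blocks, block_size, addr_bytes):
    """Disk blocks needed to list `free_blocks` block numbers."""
    per_block = block_size // addr_bytes - 1   # one slot is the next pointer
    return -(-free_blocks // per_block)

class BlockBitmap:
    """Bitmap allocator: bit value 1 means used, 0 means free."""

    def __init__(self, total_blocks):
        self.total = total_blocks
        self.bits = bytearray(bitmap_bytes(total_blocks))

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Mark the lowest-numbered free block as used and return it."""
        for block in range(self.total):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise OSError("no free blocks")

    def free(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
```

The bitmap size depends only on the disk size, while the free list shrinks as the disk fills. That is why the list needs less space only when few blocks are free.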

54
File System Reliability

Bad block management: most hard disks have bad
blocks, which can be handled by a hardware
or a software solution.

55
File System Reliability

Backups
Full backups
Problem: they take a long time and a lot of space.
Solution: instead of the entire file system,
only part of it is backed up. There is no
reason to back up /bin or /dev files in UNIX.

56
File System Reliability

Incremental dumps: make a complete dump
(backup) periodically and make a daily backup of
only those files that have been modified since
the last dump.
Advantage: minimizes the backup time.
Disadvantage: makes recovery more complicated.
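
Selecting the files for an incremental dump comes down to comparing modification times with the time of the last dump. The function name is hypothetical, and a real dump program also has to handle directories, deletions and the restore order.

```python
import os

def modified_since(root, last_dump_time):
    """Return paths under `root` whose modification time is after the last dump."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_dump_time:
                changed.append(path)
    return sorted(changed)
```

Here last_dump_time is a timestamp in seconds since the epoch, the same unit os.path.getmtime returns.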

57
File System Consistency

If the system crashes before all the
modified blocks have been written, the file system becomes
inconsistent.
Solution: check the file system's consistency,
for example with fsck in UNIX or scandisk in Windows.

58
File System Consistency

Two types of consistency check can be made:
block and file consistency checks.
Block consistency check:
Two tables are built, each containing a counter for
every block.
The program reads all i-nodes to find the blocks in use and
updates the first table.
The program examines the free list or bitmap to find the
unused blocks and updates the second table.
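
The two-table check can be sketched as follows. The input format (a list of block lists, one per i-node, plus a list of free block numbers) and the last problem category are assumptions added for illustration.

```python
def check_blocks(total_blocks, inode_blocks, free_blocks):
    """Build both counter tables and report every inconsistent block."""
    in_use = [0] * total_blocks     # first table: times a block appears in a file
    free = [0] * total_blocks       # second table: times it appears as free

    for blocks in inode_blocks:     # blocks listed by each i-node
        for b in blocks:
            in_use[b] += 1
    for b in free_blocks:           # blocks on the free list or in the bitmap
        free[b] += 1

    problems = {}
    for b in range(total_blocks):
        if in_use[b] == 0 and free[b] == 0:
            problems[b] = "missing block"
        elif in_use[b] > 1:
            problems[b] = "duplicate data block"
        elif free[b] > 1:
            problems[b] = "duplicate block in free list"
        elif in_use[b] == 1 and free[b] == 1:
            problems[b] = "block both in use and free"   # extra case, assumed
    return problems
```

A consistent file system has exactly one 1 across the two tables for every block, so the returned dictionary is empty.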

59
Block Consistency Check

(Figure: counters per block number for four cases: missing block, consistent, duplicate data block, duplicate block in free list)
60
File Inconsistency Check

It can be done by:
Using a table of counters, one per file.
Verifying the directory system by traversing the
directory tree, incrementing
the counter for each file by the number of
times that file appears in the directories.
Comparing each file's counter with its link
count (i.e., the number stored in the i-node of that
file), which shows whether the file system is consistent.
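
A minimal sketch of the per-file check. The input shapes are assumptions: directories maps each directory to its entries (name to i-node number), and inode_link_counts holds the link count stored in each i-node.

```python
def check_link_counts(directories, inode_link_counts):
    """Compare directory references per i-node with the stored link counts."""
    seen = {}
    for entries in directories.values():        # traverse every directory
        for inode_no in entries.values():
            seen[inode_no] = seen.get(inode_no, 0) + 1

    mismatches = {}
    for inode_no, stored in inode_link_counts.items():
        found = seen.get(inode_no, 0)
        if found != stored:
            mismatches[inode_no] = (stored, found)   # (i-node says, directories say)
    return mismatches
```

An empty result means every link count matches the number of directory entries that refer to the file.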

61
File System Performance

Access to disk is much slower than access to
memory. In memory, reading a word takes about 10 nsec.
Solution: use a block cache (buffer cache) in memory.
For each read request, the cache is checked first to see
whether it holds the requested block.

62
Caching

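Cache references are much less frequent than paging
references, so implementing exact LRU for the cache is feasible.
The disadvantage of pure LRU is that a crash can leave the
file system inconsistent, because modified blocks may sit
in the cache for a long time before they are written to disk.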
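
A read-only LRU block cache can be sketched with an ordered dictionary. The class name and the read_block callback are hypothetical. Writes and dirty blocks, which are the source of the crash problem, are deliberately left out.

```python
from collections import OrderedDict

class BlockCache:
    """LRU cache of disk blocks; the least recently used block is evicted first."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block      # function that fetches a block from disk
        self.blocks = OrderedDict()       # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)    # now the most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_block(block_no)         # go to the disk
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used
        return data
```

Every hit moves the block to the rear of the list, so the block at the front is always the one that has gone unused the longest.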

63
Buffer Cache Data Structure

64
Caching

Solution:
Critical blocks such as i-node and directory blocks
can be put at the front of the LRU list (to be evicted sooner)
instead of the rear. This means they are written to
disk more frequently, which reduces the chance of
inconsistency in the file system.
Modified data blocks can be written out promptly. The sync
call in UNIX and the write-through cache in MS-DOS do
that.

65
Block Read Ahead

Block read ahead is the second technique for improving file
system performance.
Blocks are read ahead on each file read. This is only
good for sequential file reads.
Solution: track the access pattern of each file using
a bit for that file. By setting the bit on each
sequential access and resetting it on each random
access (i.e., when a seek is done), the system can guess whether
the file is in sequential or random access mode.
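
One possible reading of this heuristic is sketched below. The class name, the choice to start in sequential mode and the rule that a single read after a seek re-arms read-ahead are all assumptions.

```python
class ReadAheadTracker:
    """Tracks one file's access-pattern bit and suggests a block to prefetch."""

    def __init__(self):
        self.sequential = True     # the per-file bit (assumed initial value)
        self.next_block = 0        # block the next read will fetch

    def on_seek(self, block_no):
        """A seek is a random access: reset the bit."""
        self.sequential = False
        self.next_block = block_no

    def on_read(self):
        """Return (block read, block to prefetch or None)."""
        block = self.next_block
        prefetch = block + 1 if self.sequential else None
        self.sequential = True     # a read with no seek before it counts as sequential
        self.next_block = block + 1
        return block, prefetch
```

After a seek the first read fetches only the requested block, and read-ahead resumes on the following read.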

66
Reducing Arm Motion

Place i-nodes in the middle of the disk instead
of at the start of the disk (see the next slide).
Use the cylinder grouping technique (i-nodes and their
files are in the same cylinder group).

68
Log-Structured File System

The log-structured file system (LFS) was
designed at Berkeley for UNIX to reduce disk seek
times for write operations.
In UNIX most write operations are small
writes.

69
Log-Structured File System

LFS treats the entire disk as a log. It
buffers writes in memory and periodically writes them
as a single segment at the end of the log.
Each segment may contain i-nodes, directory
blocks and data blocks.
The problem is that i-nodes are scattered all over the
log instead of being at a fixed disk position.

70
Log-Structured File System

Opening a file consists of using the i-node map to locate
the i-node for that file.
LFS has a bookkeeping program named the cleaner that
scans the log and removes old segments.
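
A toy model of the idea: an append-only log, a map from i-node number to the latest position in the log, and a cleaner that drops superseded records. The class is hypothetical and far simpler than the real LFS, which works on whole segments.

```python
class LogFS:
    """Toy log-structured store: append-only log plus an i-node map."""

    def __init__(self):
        self.log = []          # list of (inode_no, data) records, oldest first
        self.inode_map = {}    # inode_no -> index of its latest record

    def write(self, inode_no, data):
        """Every write appends to the end of the log; old copies stay behind."""
        self.log.append((inode_no, data))
        self.inode_map[inode_no] = len(self.log) - 1

    def read(self, inode_no):
        """Use the map to find the current copy, wherever it is in the log."""
        return self.log[self.inode_map[inode_no]][1]

    def clean(self):
        """Cleaner: keep only live records and compact them into a new log."""
        live = sorted(self.inode_map.items(), key=lambda item: item[1])
        new_log = []
        for inode_no, position in live:
            self.inode_map[inode_no] = len(new_log)
            new_log.append(self.log[position])
        self.log = new_log
```

Rewriting a file leaves its old record in the log as garbage until the cleaner runs, which is the bookkeeping cost of turning every write into a sequential append.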

71
The Sun Network File System (NFS)

The implementation is part of the Solaris and
SunOS operating systems running on Sun
workstations, using an unreliable datagram
protocol (UDP/IP) and Ethernet.
NFS is designed to operate in a heterogeneous
environment.
In NFS, clients access the server's directories by
mounting them.

72
Remote Mounting in NFS
73
Remote Mounting in NFS
74
Remote Mounting in NFS

A mount operation includes the name of the remote directory
to be mounted and the name of the server machine
storing it.
The mount request is mapped to the corresponding RPC and
forwarded to the mount server running on the server
machine.
The export list specifies the local file systems that the
server exports for mounting, along with the names of
machines that are permitted to mount them.

75
Remote Mounting in NFS

Following a mount request that conforms to its
export list, the server returns a file handle, a
key for further accesses.
The file handle consists of a file-system identifier and an
i-node number that identifies the mounted
directory within the exported file system.
