CT213

1
CT213 File Management

Petronel Bigioi

2
Content

File Management Overview
File Management Functional Requirements
File Management Architecture
File Organization: pile, sequential, indexed
sequential, indexed and hashed
File Directory
Content and structure
File Sharing
Access rights and simultaneous access
Record blocking
Secondary Storage management
File Allocation and Free Space Management

3
File Management Overview

Filing systems provide:
A storage service: clients don't know about the
physical characteristics of the disks or where
the files have been stored on them
The filing system must make sure that a file is
not lost, even if there are hardware failures or
software crashes
A directory service: clients can give convenient
text names to files and group them in directories
(establishing some relationship between them)
Clients should be able to control the sharing of
their files with others by specifying who can
access a given file and in what way

4
File Management Overview

(1) The client calls an operation such as open-file
with the text name as an argument
Another argument could be a field specifying the
type of access (read-only or read-write)
The directory service will carry out an access
check to ensure that the client is authorized to
access the file
The directory service will translate the text
name into a form that enables the file storage
service to locate the file on disk (name
resolution)
In order to do the name resolution, the directory
service may call the file storage service, and the
file storage service may call the disk handler to
access the disk and find the required information

(2) The filing system stores a very large
number of files, only a few of which are in use at
any time.
Once the file has been opened, the filing system
is ready for the client to use it (it will have
set up the information about this file in its
tables in main memory)
It will return a user-file-identifier (UFID) for
the client to use in subsequent requests to read
or write the file
(3) Request to read the file
(4) The storage service returns the portion of
the file that was requested at (3)
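
The four steps above can be sketched as a toy in-memory model. This is only an illustration: the class name ToyFilingSystem, its methods and the file "notes.txt" are hypothetical and are not part of the slides, and real filing systems keep far more state.

```python
class ToyFilingSystem:
    """Toy model of the open/read sequence; all names are illustrative."""

    def __init__(self):
        self._directory = {}   # text name -> (allowed users, file contents)
        self._open_files = {}  # UFID -> (text name, access mode)
        self._next_ufid = 1

    def create(self, name, data, allowed_users):
        self._directory[name] = (set(allowed_users), bytes(data))

    def open_file(self, user, name, mode="read-only"):
        # Steps (1)-(2): resolve the name, check access, hand back a UFID.
        if name not in self._directory:
            raise FileNotFoundError(name)
        allowed, _ = self._directory[name]
        if user not in allowed:
            raise PermissionError(f"{user} may not open {name}")
        ufid = self._next_ufid
        self._next_ufid += 1
        self._open_files[ufid] = (name, mode)
        return ufid

    def read(self, ufid, offset, length):
        # Steps (3)-(4): return the requested portion of the file.
        name, _ = self._open_files[ufid]
        _, data = self._directory[name]
        return data[offset:offset + length]


fs = ToyFilingSystem()
fs.create("notes.txt", b"file management", allowed_users=["alice"])
ufid = fs.open_file("alice", "notes.txt")
print(fs.read(ufid, 0, 4))  # b'file'
```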

5
File Management Overview

Concurrency control must be provided
One approach is to allow multiple clients read
access, but only one client the right to write
An operating system implements this approach by
noting whether a file has been opened for reading,
in which case it will allow multiple clients to
read it, or for writing, in which case it will
refuse any subsequent requests to read or write
from other clients. This is called mandatory
concurrency control. A file is said to be locked
for reading or writing.
Many applications may need to have write access
to different parts of the same file (the above
approach is inflexible in this case)
Many operating systems allow simultaneous write
access to files, leaving it to the applications
to synchronize their access
The OS may optionally provide an extra locking
service to help the clients cooperate with each
other.
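
A minimal sketch of the "many readers or one writer" rule described above. The class FileLockTable and its method names are hypothetical; refusing a write request while the file is open for reading is an assumption that follows from "only one client the right to write", not something the slide spells out.

```python
class FileLockTable:
    """Tracks open modes per file: many readers or exactly one writer."""

    def __init__(self):
        self._readers = {}      # file name -> number of clients reading
        self._writers = set()   # file names currently locked for writing

    def open(self, name, mode):
        """Return True if the open is granted, False if it is refused."""
        if name in self._writers:
            return False                      # locked for writing
        if mode == "read":
            self._readers[name] = self._readers.get(name, 0) + 1
            return True
        if mode == "write":
            if self._readers.get(name, 0) > 0:
                return False                  # locked for reading (assumed)
            self._writers.add(name)
            return True
        raise ValueError(f"unknown mode: {mode}")

    def close(self, name, mode):
        if mode == "read":
            self._readers[name] -= 1
        else:
            self._writers.discard(name)


locks = FileLockTable()
print(locks.open("payroll", "read"))   # True
print(locks.open("payroll", "read"))   # True, a second reader is allowed
print(locks.open("payroll", "write"))  # False, the file is locked for reading
```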

6
Files: Common Terms

Field: the basic element of data, containing a
single value (e.g. an employee's last name)
Characterized by its length and data type (e.g.
ASCII string, decimal, etc.)
Depending on file design, it may be fixed or
variable length
Record: a collection of related fields that can
be treated as a single unit
The record of an employee can contain fields such
as name, social security number, etc.
Depending on the design of the file system, it may
be fixed or variable in length

7
Files: Common Terms

File: a collection of similar records
The file is treated as a single entity by users
and applications and may be referenced by name
Files have unique file names and may be created
and deleted
Access control restrictions usually apply at the
file level
Database: a collection of related data
The essential aspect is that the relationships
that exist between elements of data are explicit
The database itself consists of one or more
types of files
Usually there is a separate database management
system that is independent of the file management
system and the operating system

8
File Management Requirements

To meet the data management needs and
requirements of the user:
Create, delete, read and change files
Controlled access to other users' files
Control the type of access to own files
Restructure the files
Move data between files
Back up and recover files in case of damage
Access files by using symbolic file names
To guarantee that the data in the file is valid
To optimize performance
To provide I/O support for a wide range of
storage device types
To provide a standardized set of I/O interface
routines
To provide I/O support for multiple users, in the
case of multiple-user systems

9
File System Architecture

The Basic I/O Supervisor is responsible for all
file I/O initiation and termination
Maintains control structures that deal with
device I/O scheduling and file status
Is concerned with the selection of the device on
which file I/O is to be performed, on the basis
of which file has been selected
Is also concerned with scheduling disk and tape
access to optimize performance
I/O buffers and secondary memory are allocated at
this level

Logical I/O enables users and applications to
access records. Thus, whereas the basic file
system deals with blocks of data, the logical I/O
module deals with file records.
Logical I/O provides a general purpose record I/O
capability and maintains basic data about files
The access method is the level of the file
system closest to the user
Provides a standard interface between
applications and the file system
Different access methods: pile, sequential,
indexed sequential, indexed, hashed

At the lowest level we have device drivers, which
communicate directly with the peripheral
devices or their controllers or channels
A device driver is responsible for starting I/O
operations on a device and processing the
completion of an I/O request
Device drivers are considered to be part of the
operating system, and in the case of file system,
the devices controlled are disks and tapes

The Basic File System deals with blocks of data
that are exchanged with the disk or tape systems.
It is concerned with the placement of those
blocks on the secondary storage device and with
the buffering of those blocks in main memory
It doesn't understand the content of the data or
the structure of the files involved

10
File Organization

Criteria
Rapid access
Ease of update
Economy of storage
Simple maintenance
Reliability
These criteria may vary in importance
CD-ROM: ease of update is irrelevant
Indexes: faster access, but use more storage
We will outline five common organizations (the
actual number that have been implemented or
proposed is unimaginably large)
Pile
Sequential file
Indexed sequential file
Indexed file
Direct (Hashed) file

11
File Organization

Pile
Add data to the file as it arrives (chronological
order)
Record size and field order may vary (variable
length records, variable set of fields)
Requires use of exhaustive search
Sequential File
Fixed length record format
Size and order of fields fixed
Key field - unique record ID
Records stored in order based on key
Handles random requests poorly
Must use sequential search (batch system)
Hard to insert new records

12
File Organization

Indexed Sequential File
Still maintains the organization of records in
sequence, based on a key
Adds an index to the file to speed lookup
Index provides a lookup capability to reach
quickly the vicinity of a desired record
Each record in the index file contains two fields:
the key and a pointer to the main file
May have multiple levels of indexes
Overflow area to handle new records
Each record in the main file contains a hidden
pointer to the overflow file (used if needed)
Link from main records to overflow, and back
Operations
Search: To find a specific record, the search
begins in the index file. The entry with the
highest key value that is less than or equal to
the desired key value is looked up in the index
file. A pointer to the main file is retrieved and
the search continues in the main file.
Additions: Each record in the main file contains
an additional field (not visible to the
application) that is a pointer to the overflow
file. When a new record is to be inserted, it is
actually added to the overflow file. The record
in the main file that immediately precedes the
new record in logical sequence is updated to
contain a pointer to the new record (in the
overflow file).
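
The search operation can be sketched as follows. This is a simplified illustration under stated assumptions: the main file is a sorted in-memory list of (key, record) pairs, the index holds every fifth key, and the overflow file is left out. The function names build_index and search are hypothetical.

```python
from bisect import bisect_right


def build_index(main_file, step):
    """Index every `step`-th record: list of (key, position in main file)."""
    return [(main_file[i][0], i) for i in range(0, len(main_file), step)]


def search(main_file, index, key):
    """Return the record stored under `key`, or None if it is absent."""
    index_keys = [k for k, _ in index]
    slot = bisect_right(index_keys, key) - 1   # highest index key <= key
    if slot < 0:
        return None                            # key precedes every record
    position = index[slot][1]
    # Continue sequentially in the main file from the indexed position.
    while position < len(main_file) and main_file[position][0] <= key:
        if main_file[position][0] == key:
            return main_file[position][1]
        position += 1
    return None


main_file = [(k, f"record-{k}") for k in range(10, 200, 10)]  # keys 10..190
index = build_index(main_file, step=5)
print(search(main_file, index, 70))   # record-70
print(search(main_file, index, 75))   # None
```

With 19 records and an index entry for every fifth one, a lookup examines the 4-entry index and then at most five main-file records instead of scanning all 19.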

13
File Organization

Indexed File
Useful when it is necessary to search for a record
on the basis of some attribute other than the key
field
May have multiple indexes
One for each field we may search
Records accessed only through the indexes
Each index may be
Exhaustive: contains one entry for every record
in the main file
Partial: contains entries only for records where
the field of interest exists
Used in applications where time is critical and
where data is rarely processed exhaustively (such
as reservation systems, inventory control, etc.)
Direct (Hashed) file
Use hashing on a key to find the record
No notion of sequential access
Generally used when rapid access to one record is
required (e.g. a directory)

14
Hashing

It can find most of the items with a single seek
Insertions and deletions can be handled without
added complexity
Assume that N items are to be inserted into a
hash table of length M, with M > N
Insert an item into the hash table:
Convert the label of the item to a near-random
number n (between 0 and M-1); e.g. if the label
is numeric, a popular mapping function is to
divide the label by M and take the remainder as
the value of n
Use n as index into the hash table
If the entry is empty, then store the item
If the entry is occupied, then store the item
according to the hashing scheme (linear or
overflow with chaining)
Table lookup of an item whose label is known:
Convert the label of the item to a near-random
number n (using the same mapping function as for
insertion)
Use n as index into the hash table
If the corresponding entry is empty, then the
item hasn't been inserted
If the corresponding entry is occupied and the
labels match, then retrieve the value
If the corresponding entry is occupied and the
labels do not match, then continue the search
according to the hashing scheme (linear or
overflow with chaining)

15
Linear Hashing
Labels of the items to be stored are numeric and
the hash table has eight positions (M = 8). The
hashing function takes the remainder upon division
by 8

In the linear hashing scheme, if the entry is
already occupied, set n = (n + 1) mod M and try
again. Repeat this step until an empty entry is
found
The figure assumes that the entries have been
inserted in ascending order
Items 50 and 51 map to positions 2 and 3
Item 74 maps to position 2; position 2 is taken,
so we try position 3 (also taken). The next
position to try is 4, which is empty, so the item
is written to position 4
The average search length does not depend on the
table size; it depends on how full the table is
(at 80% full, the average search length is
around 3)
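
The example above can be reproduced with a short sketch of linear hashing (linear probing). The function names insert and lookup and the stored values are illustrative assumptions; the labels 50, 51 and 74 and the table size M = 8 come from the slide.

```python
M = 8
table = [None] * M   # each entry is None or a (label, value) pair


def insert(table, label, value):
    """Store the item and return the position it ended up in."""
    n = label % len(table)
    for _ in range(len(table)):
        if table[n] is None or table[n][0] == label:
            table[n] = (label, value)
            return n
        n = (n + 1) % len(table)        # occupied: try the next position
    raise RuntimeError("hash table is full")


def lookup(table, label):
    """Return the value stored under `label`, or None if it is absent."""
    n = label % len(table)
    for _ in range(len(table)):
        if table[n] is None:
            return None                 # empty entry: item was never inserted
        if table[n][0] == label:
            return table[n][1]
        n = (n + 1) % len(table)
    return None


positions = [insert(table, label, f"item-{label}") for label in (50, 51, 74)]
print(positions)          # [2, 3, 4]
print(lookup(table, 74))  # item-74
```

The figure of about 3 probes at 80% occupancy is consistent with the classical estimate (1 + 1/(1 - a)) / 2 for successful searches under linear probing, where a is the fraction of the table in use.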

16
Hashing using Overflow with Chaining

A separate table is kept, into which overflow
entries are inserted. This table includes pointers
passing down the chain of entries associated with
any position in the hash table.
For large values of N and M, with N = M, the
average search length is around 1.5
This method provides for compact storage with
fast lookup.
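
A minimal sketch of overflow with chaining, assuming numeric labels and the same remainder mapping as in the linear hashing example. The class ChainedHashTable and the entry layout [label, value, next_index] are hypothetical choices made for illustration.

```python
class ChainedHashTable:
    """Hash table with a separate overflow table linked by index pointers."""

    def __init__(self, size):
        self.primary = [None] * size   # each entry: [label, value, next_index]
        self.overflow = []             # separate overflow table, same layout

    def insert(self, label, value):
        n = label % len(self.primary)
        if self.primary[n] is None:
            self.primary[n] = [label, value, None]
            return
        entry = self.primary[n]
        while True:                    # walk the chain to its last entry
            if entry[0] == label:
                entry[1] = value       # label already present: update it
                return
            if entry[2] is None:
                break
            entry = self.overflow[entry[2]]
        self.overflow.append([label, value, None])
        entry[2] = len(self.overflow) - 1

    def lookup(self, label):
        entry = self.primary[label % len(self.primary)]
        while entry is not None:
            if entry[0] == label:
                return entry[1]
            entry = self.overflow[entry[2]] if entry[2] is not None else None
        return None


chained = ChainedHashTable(8)
for label in (50, 51, 74, 82):         # 50, 74 and 82 all map to position 2
    chained.insert(label, f"item-{label}")
print(chained.lookup(82))              # item-82
```

Unlike linear hashing, colliding items do not occupy neighbouring positions of the primary table, so position 4 stays empty in this example.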

17
File Directory

A file directory is a structure associated with
any file management system and collection of
files
It contains information about the files,
including attributes, location and ownership.
Most of this information (especially that
concerned with storage) is handled by the
operating system
The directory itself is a file, owned by the
operating system and accessible by various file
management routines.
Some of the information in directories is
available to users through system routines
The users cannot directly access the directory
even in read-only mode

18
Typical Directory Entries

Basic
Name: unique in directory (some systems permit
file versions)
Type: text, binary, load module, etc.
Organization: sequential, indexed, etc.
Address
Device: which disk holds the file
Often this must be the same device as the
directory is on
Starting address/Blocks used
Block, cylinder, or other location ID
Size used: current file size
May be in bytes or blocks
Size allocated: maximum space allocated for this
file
Not used on all file systems

19
Typical Directory Entries

Access Control
Owner: who has control of the file
Access Information: which users are allowed to
work with the file
Permitted Actions: controls reading, writing,
etc.
Usage Information
Date Created
Identity of Creator
Date Last Read Access
Identity of Last Reader
Date Last Modified
Identity of Last Modifier
Date of Last Backup
Current Usage: who has the file open, is the
file locked, are there updates waiting in main
memory?

20
Directory Structure

Operations to support
Search for the file entry (open)
Create a new file
Delete a file
List the files in the directory
May be for all or part of the directory
May also include attribute information
Simplest form
A list of directory entries, one for each file
(CP/M, DOS 1.0)
Difficult to handle large numbers of files or
multiple users
The directory would be very large and held on the
disk (looking up a given filename in it would
take the directory service a long time)
Different users might use the same text names for
their files. Unique text names would be achieved
by appending the username to each filename.
Some support for organizing the information is
desirable. Convenient grouping within users'
files should be supported for easy location and
access control
More complex form
One directory for each user
Easier to manage access information
Users still can't structure their files

21
Directory Structure

Tree-structured file system
Single master (root) directory
DOS: master directory for each drive
Each directory may contain files and other
subdirectories
Names need only be unique within a directory
Each directory often stored as a sequential file
Less effective when there are a large number of
files in a given directory

22
Directory Tree Structure

23
Directory Tree Example

Path: the sequence of directories followed from
the master directory to the file
Example: /UserB/Word/UnitA/ABC
/ is often used to separate directories
Working directory
Current directory for files: /UserB/Word
Files are assumed to be in this directory unless
a path is given
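
The relationship between a working directory and a path can be sketched with Python's standard posixpath module. The helper name resolve and the second path, /UserA/Draw, are hypothetical; the first call uses the slide's own example.

```python
import posixpath


def resolve(working_directory, name):
    """Return the absolute path for `name`, relative to the working directory."""
    # posixpath.join discards the working directory when `name` is absolute.
    return posixpath.normpath(posixpath.join(working_directory, name))


print(resolve("/UserB/Word", "UnitA/ABC"))     # /UserB/Word/UnitA/ABC
print(resolve("/UserB/Word", "/UserA/Draw"))   # /UserA/Draw
```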

24
File Sharing

Rights that may be granted
None: others don't know it exists
Often done by preventing the user from reading the
parent directory (Unix)
May have an explicit permission bit for access to
the file name (Novell)
Knowledge: know it is there and who the owner is
Execution: able to run a program
Read: look at/copy contents
Execute and Read may be independent
Append: add data to the file
Cannot modify existing contents
Update: modify/delete/add data
Change Protection: grant rights to the file
Owner can specify which other users have rights to
this file
Deletion: can delete the file

25
File Sharing

Who to grant rights to
Specific user
May allow different users to have distinct
permissions
Group of users
World (public files)
Simultaneous Access
Multiple users may want to access or modify the
same file
Example: airline reservation database
Locking: entire file vs. records
Easier to lock entire file
Locking records allows more concurrency
Instance of reader/writer problem
Must address mutual exclusion and deadlock

26
Record Blocking

A record is the logical access unit of a file
Blocks are the unit of I/O with secondary storage.
For I/O to be performed, records must be
organized as blocks.
Issues to consider
Should blocks be fixed or variable length?
On most systems blocks are fixed length
Simplifies I/O, buffer allocation in main memory
and organization of blocks on secondary storage
What should the relative size of a block be
compared to the average record size?
The larger the block, the more records are
passed in one I/O operation
If a file is being processed or searched
sequentially, then this is an advantage
If records are being accessed randomly, it will
result in unnecessary transfer of unused records,
which is a disadvantage
Three methods of blocking
Fixed blocking, variable-length spanned blocking
and variable-length unspanned blocking

27
Fixed Blocking

Fixed length records are used and an integral
number of records are stored in a block
There may be unused space at the end of each
block (internal fragmentation)

28
Variable Length Spanned Blocking

Variable length records are used and are packed
into blocks with no unused space
Some records may span two blocks, with the
continuation indicated by a pointer to the
successor block
Wastes space only at the end of the file

29
Variable Length Unspanned Blocking

Variable length records are used, but spanning is
not employed.
There is wasted space in most blocks because of
the inability to use the remainder of a block if
the next record is larger than the remaining
unused space
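
The space behaviour of the three blocking methods can be compared with a small calculation. This is a sketch under simplifying assumptions: block and record sizes are in bytes, the spanned case ignores the space taken by continuation pointers, and all function names and sample sizes are hypothetical.

```python
def fixed_blocking(block_size, record_size, n_records):
    """Return (blocks needed, unused bytes per full block) for fixed blocking."""
    per_block = block_size // record_size
    blocks = -(-n_records // per_block)        # ceiling division
    return blocks, block_size - per_block * record_size


def spanned_blocks(block_size, record_sizes):
    """Blocks needed when records are packed and may span blocks."""
    return -(-sum(record_sizes) // block_size)


def unspanned_blocks(block_size, record_sizes):
    """Blocks needed when each record must fit entirely within one block."""
    blocks, free = 0, 0
    for size in record_sizes:
        if size > block_size:
            raise ValueError("record is larger than a block")
        if size > free:            # start a new block, wasting the remainder
            blocks += 1
            free = block_size
        free -= size
    return blocks


print(fixed_blocking(512, 100, 12))                 # (3, 12)
print(spanned_blocks(512, [300, 300, 300, 100]))    # 2
print(unspanned_blocks(512, [300, 300, 300, 100]))  # 3
```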

30
Secondary Storage Management

On secondary storage a file is a collection of
blocks; the operating system or file management
system is responsible for allocating blocks to
files
Two management issues
Space on secondary storage must be allocated to
files
Keep track of the space available for allocation
The approach taken for file allocation may
influence the approach taken for available space
management

31
File Allocation

Issues in file allocation
When a new file is created, do we specify the
maximum size? Is that space allocated at once?
Space is allocated to a file as one or more
contiguous units (portions). How big of a unit
should we use when allocating space for a file?
How do we keep track of what space has been
allocated to a given file (what kind of structure
or table is used to keep track of the units
allocated to a file)?
Pre-Allocation
Declare max size in advance
May be hard to guess space needed
Tendency to overestimate space needed
Ok if the file will never change
Dynamic allocation
Get space as the file needs it
Files are often no longer contiguous

32
File Allocation

Portion (unit) size
At one extreme, a single unit large enough to
hold the entire file, while at the other extreme
space on disk is allocated one block at a time.
In choosing the unit allocation size, there is a
tradeoff between efficiency from the point of
view of a single file versus overall system
efficiency
A few items to be considered
Having lots of small units requires more space
for allocation tables
Fixed-size portions simplify the reallocation
of space
Variable-sized units or small fixed-size units
reduce wasted space

33
File Allocation

Two common alternatives
Variable-sized large contiguous portions
Minimizes waste, allocation overhead
Have to deal with fragmentation
First-Fit: choose the first unused contiguous
group of blocks of sufficient size from a free
block list.
Best-Fit: choose the smallest unused group that
is of sufficient size
Nearest-Fit: choose the unused group
of sufficient size that is closest to the
previous allocation for the file to increase
locality
Blocks: small fixed-size portions
May require large tables or complex structures
for their allocation
Abandons contiguity
Allocate blocks as needed
Either strategy is compatible with pre-allocation
and dynamic allocation. Not clear which strategy
is best.
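
The three selection strategies for variable-sized portions can be sketched as follows. The free list is assumed to be a list of (start block, length) pairs in ascending order of start; the function names and the sample free list are hypothetical, and "closest" is taken to mean the smallest distance between start blocks.

```python
def first_fit(free_groups, needed):
    """Start of the first free group with at least `needed` blocks."""
    for start, length in free_groups:
        if length >= needed:
            return start
    return None


def best_fit(free_groups, needed):
    """Start of the smallest free group that is still large enough."""
    candidates = [(length, start) for start, length in free_groups
                  if length >= needed]
    return min(candidates)[1] if candidates else None


def nearest_fit(free_groups, needed, previous):
    """Start of the large-enough group closest to block `previous`."""
    candidates = [start for start, length in free_groups if length >= needed]
    return min(candidates, key=lambda s: abs(s - previous)) if candidates else None


free_groups = [(0, 3), (10, 8), (30, 4), (50, 20)]
print(first_fit(free_groups, 4))                 # 10
print(best_fit(free_groups, 4))                  # 30
print(nearest_fit(free_groups, 4, previous=48))  # 50
```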

34
File Allocation Methods

Three methods are in common use
Contiguous allocation
Single contiguous set of blocks is allocated to a
file at the time of file creation
Chained allocation
Each block contains a pointer to the next block
in the chain
Indexed allocation
The file allocation table contains a separate
one-level index for each file; the index has one
entry for each portion (unit) allocated to the
file.

35
Contiguous allocation

A single contiguous set of blocks assigned to a
file when it is created
Pre-allocation strategy with variable-sized
portions (units)
Good performance (especially for sequential
files)
External fragmentation tends to occur
Use compaction to combine free space
Need to specify the size of the file at the time
of creation
Used by CD-ROMs (ISO 9660)
Before compaction (left)
After compaction (right)

36
Chained Allocation

Allocate on the basis of individual blocks
Directory only links to the first block
Each block points to the next block
Easy to add blocks to a file
No external fragmentation
MS-DOS FAT12/16/32 is a variation
Best suited to sequential files
In the figure, File B has start = 1 and length = 5
No accommodation for locality
If it is necessary to bring in several blocks of a
file at a time, then a series of accesses to
different parts of the disk is required
To overcome this problem, some systems
consolidate files
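
A sketch of how a chained file is followed from its first block. The chain below is an illustrative assumption (a five-block file starting at block 1, in the spirit of File B); the pointers are held in a dictionary here, whereas on disk each pointer would live inside its block.

```python
END = -1   # marker for "no next block" (hypothetical convention)

# Hypothetical chain: block number -> next block of the same file.
next_block = {1: 8, 8: 3, 3: 14, 14: 28, 28: END}


def blocks_of(start, next_block):
    """Follow the chain from the first block and list every block in order."""
    chain = []
    block = start
    while block != END:
        chain.append(block)
        block = next_block[block]
    return chain


print(blocks_of(1, next_block))   # [1, 8, 3, 14, 28]
```

Reaching the i-th block requires following i pointers, which is why the method suits sequential access and why scattered blocks hurt locality.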

37
Indexed Allocation

The file allocation table contains a one-level
index for each file; the index has one entry for
each portion (unit) allocated to the file
Typically the file indexes are not stored as part
of the allocation table
The file index for a file is kept in a separate
block and the entry for the file allocation table
points to that block
Supports both sequential and random access to a
file
In the figure above, a fixed-size block
allocation scheme is presented
Eliminates external fragmentation
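
Random access under indexed allocation with fixed-size blocks reduces to one table lookup. In this sketch the 512-byte block size, the contents of the index block and the function name locate are all assumptions made for illustration.

```python
BLOCK_SIZE = 512   # assumed block size in bytes


def locate(index_block, byte_offset, block_size=BLOCK_SIZE):
    """Map a byte offset in the file to (disk block, offset inside the block)."""
    slot, within = divmod(byte_offset, block_size)
    if slot >= len(index_block):
        raise IndexError("offset is beyond the end of the file")
    return index_block[slot], within


index_block = [1, 8, 3, 14, 28]    # hypothetical index: i-th entry = i-th block
print(locate(index_block, 1300))   # (3, 276)
```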

38
Indexed Allocation

Indexed allocation also supports a variable-size
portion (unit) allocation scheme
Improves locality
It is the most popular form of file allocation
In both cases, consolidation may be done from
time to time
It reduces the size of the index for the
variable-sized portion scheme

39
Free Space Management

Just as the space allocated to files must be
managed, so must the free space that is not
currently allocated
In order to be able to perform file allocation,
we need to know what blocks on the disk are
available, therefore we need to keep a disk
allocation table, in addition to file allocation
table
A number of methods exist to record free space
Bit Tables: one bit for each block, indicating
whether it is free
Works well with any of the presented allocation
methods
It is as small as possible
Still, it can be large. The amount of memory (in
bytes) required for a block bitmap is: disk
size (bytes) / (8 x file system block size
(bytes)). For a 16 GByte disk with 512-byte
blocks, we would get a 4 MByte table, which is
large to hold in memory and also large to search
To speed up the search in the bit tables, the OS
may divide disk into sections
Additional data structures must be kept to
summarize the status of each section (i.e. the
number of free blocks and the maximum sized
contiguous number of free blocks)
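
The bit table size can be checked with a short calculation. The 512-byte block size is an assumption (it is the value that yields a 4 MByte table for a 16 GByte disk), binary units are used, and the function name is hypothetical.

```python
def bit_table_bytes(disk_bytes, block_bytes):
    """One bit per block, eight bits per byte."""
    return disk_bytes // (8 * block_bytes)


size = bit_table_bytes(16 * 2**30, 512)   # 16 GByte disk, 512-byte blocks
print(size // 2**20)                      # 4 (MBytes)
```

With 4096-byte blocks the same disk would need only 512 KBytes, which shows how strongly the table size depends on the block size.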

40
Bit Table Example

Example bit vector for the above figure
00111000011111000011111111111011000
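
The bit vector above can be scanned for contiguous groups of free blocks. This sketch assumes that 0 marks a free block and 1 an allocated one (the slide does not state the convention) and that blocks are numbered from 0; the function name free_runs is hypothetical.

```python
from itertools import groupby

bits = "00111000011111000011111111111011000"


def free_runs(bits, free="0"):
    """Return (start block, length) for each run of consecutive free blocks."""
    runs, position = [], 0
    for value, group in groupby(bits):
        length = len(list(group))
        if value == free:
            runs.append((position, length))
        position += length
    return runs


print(free_runs(bits))   # [(0, 2), (5, 4), (14, 4), (29, 1), (32, 3)]
```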

41
Free Space Management

A number of methods exist to record free space
Chained Free List
Free portions may be chained together by using a
pointer and length value in each free portion
It can produce fragmentation of the disk, and many
portions (units) will be a single block long. In
this situation, every time a block gets
allocated, it first needs to be read to find
the pointer to the next free block. If this is
done for a file creation (for multiple blocks) it
can slow down the operation. Similarly, deleting
highly fragmented files is time consuming
Indexing
Free space is treated as a file; the list of free
blocks is stored in the same manner as for
ordinary files
For efficiency, the index should be on the basis
of variable-size portions rather than blocks
One entry in the file for each free portion on
the disk
Provides efficient support for all known file
allocation methods
Free Block List
Each block is assigned a number sequentially and
the list of the numbers of all free blocks is
maintained in a reserved portion of the disk
Can treat it as a stack and re-allocate recently
freed blocks; only the last few blocks need to
be kept in memory
Can use FIFO structure (a block is allocated from
the head of the FIFO, and de-allocated by adding
it to the tail of the queue)
May have a background process that works to
facilitate contiguous allocation
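
The two ways of treating the free block list can be sketched as follows. Both classes and their method names are hypothetical, and the lists live entirely in memory here, whereas the slide keeps most of the list in a reserved portion of the disk.

```python
from collections import deque


class FreeBlockStack:
    """Stack discipline: the most recently freed block is allocated first."""

    def __init__(self, free_blocks):
        self._blocks = list(free_blocks)

    def allocate(self):
        return self._blocks.pop()

    def release(self, block):
        self._blocks.append(block)


class FreeBlockQueue:
    """FIFO discipline: allocate from the head, release to the tail."""

    def __init__(self, free_blocks):
        self._blocks = deque(free_blocks)

    def allocate(self):
        return self._blocks.popleft()

    def release(self, block):
        self._blocks.append(block)


stack = FreeBlockStack([5, 6, 7])
stack.release(2)
print(stack.allocate())   # 2, the block that was just freed

queue = FreeBlockQueue([5, 6, 7])
queue.release(2)
print(queue.allocate())   # 5, the block at the head of the queue
```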