Title: Chapter Eight: File Management
- The File Manager
- Interacting With File Manager
- File Organization
- Physical Storage Allocation
- Data Compression
- Access Methods
- Levels in File Management System
- Access Control Verification Module
Figure: record formats (fixed-length, variable-length), storage methods (contiguous, noncontiguous, indexed), and file access (sequential or direct).
The File Manager
- The File Manager controls every file in the system, which is a complex job.
- Its efficiency depends on:
  - how the system's files are organized (sequential, direct, or indexed sequential);
  - how they're stored (contiguously, noncontiguously, or indexed);
  - how each file's records are structured (fixed-length or variable-length);
  - how access to these files is controlled.
Responsibilities of the File Manager
- Track where each file is stored.
- Determine where and how files will be stored.
- Use available storage space efficiently.
- Provide efficient access to files.
- Allocate each file when a user has been cleared for access to it, then record its use.
- Deallocate the file when it is returned to storage.
- Communicate its availability to others waiting for it.
Important Definitions
- Field -- group of related bytes that can be identified by the user with a name, type, and size.
- Record -- group of related fields.
- File (flat file) -- group of related records that contains information used by specific application programs to generate reports.
- Database -- group of related files that are interconnected at various levels to give users flexible access.
  - Appears to the File Manager to be a type of file.
Definitions - 2
- Program files contain instructions.
- Data files contain data.
- Directories -- listings of file names and their attributes.
- Every program and data file accessed by the computer system, and every piece of computer software, is treated as a file.
- The File Manager treats all files exactly the same way as far as storage is concerned.
Interacting With File Manager
- Users communicate with the File Manager via specific commands that may be either embedded in the user's program or submitted interactively by the user.
- Embedded commands:
  - OPEN and CLOSE -- pertain to the availability of a file for the program invoking it.
  - READ and WRITE -- I/O commands.
  - MODIFY -- specialized WRITE command for existing data files that allows records to be appended or rewritten.
Interactive Commands
- CREATE and DELETE -- deal with the system's knowledge of the file.
- SAVE -- the first time it's used, the file is actually created.
- OPEN NEW -- within a program, indicates that the file must be created.
- OPEN ... FOR OUTPUT -- creates the file by making an entry for it in the directory and finding space for it in secondary storage.
- RENAME -- allows users to change the name of an existing file.
- COPY -- allows users to make duplicate copies of existing files.
Commands Are Device-Independent
- Interface commands are designed to be as simple as possible to use.
  - They lack detailed instructions for running the device where the file is stored.
  - They are device independent.
- To access a file, the user doesn't need to know its exact physical location on the disk pack or other storage medium.
- Each logical command is broken down into a sequence of low-level signals that:
  - trigger the step-by-step actions performed by the device;
  - supervise the progress of the operation by testing the device's status.
Typical Volume Configuration
- Each secondary storage unit (removable or nonremovable) is considered a volume.
- A volume can contain several files; such volumes are called multifile volumes.
- Some files are extremely large and are contained in several volumes; these are called multivolume files.
- Generally, each volume in the system is given a name.
- The File Manager writes this name and other descriptive information in an easy-to-access place on each unit.
Master File Directory (MFD)
- The MFD is stored immediately after the volume descriptor.
- It lists the names and characteristics of every file contained in the volume.
  - File names refer to program files, data files, and/or system files.
  - Subdirectories are listed too, if the system supports them.
- The remainder of the volume is used for file storage.
- Early OSs supported only a single directory per volume.
  - Created by the File Manager.
  - Contains the names of files, usually organized in alphabetical, spatial, or chronological order.
  - Simple to implement and maintain.
  - Has some major disadvantages.
Volume Descriptor
Some Major Disadvantages of a Single Directory per Volume
- It takes a long time to search for an individual file, especially if the MFD is organized in an arbitrary order.
- If a user has many small files stored in the volume, the directory space fills before the disk storage space fills. The user is told the disk is full when only the directory is full.
- Users can't create subdirectories to group related files.
- Multiple users can't safeguard their files from other users browsing file lists, because the entire directory is listed on request.
- Each program in the entire directory needs a unique name.
  - E.g., only one person using the directory can name a program PROG1.
About Subdirectories
- Semi-sophisticated File Managers create an MFD for each volume, with entries for files and subdirectories.
- A subdirectory is created when a user opens an account to access the computer.
- The MFD entry is flagged to indicate a subdirectory with unique properties.
- This is an improvement over the single-directory scheme.
- Users still can't group their files in a logical order to improve the accessibility and efficiency of the system.
Subdirectories Can Be Implemented As an Upside-down Tree
- Today's File Managers allow users to create subdirectories so related files are grouped together.
- This is an extension of the previous two-level directory structure.
- Tree structures allow the system to search individual directories efficiently, because each one has fewer entries.
- The path to a requested file may lead through several directories.
- When a user wants to access a specific file, the file name is sent to the File Manager. The File Manager searches the MFD for the user's directory, then searches the user's directory and any subdirectories for the requested file's location.
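The search just described can be sketched in Python. This is a minimal illustration, assuming the directory tree is modeled as nested dictionaries (a hypothetical stand-in for real directory entries) whose root plays the role of the MFD, with each file mapped to a made-up block location:

```python
# Hypothetical tree: a directory is a dict; a file maps to its location.
MFD: dict = {
    "alice": {
        "reports": {"q1.txt": "block 120", "q2.txt": "block 348"},
        "notes.txt": "block 77",
    },
    "bob": {"prog1.cpp": "block 410"},
}


def locate(path: str) -> str:
    """Follow each directory named in the path, starting at the MFD."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise FileNotFoundError("empty path")
    directory = MFD  # the search starts at the MFD
    for name in parts[:-1]:
        sub = directory.get(name)
        if not isinstance(sub, dict):
            raise FileNotFoundError(f"no directory named {name!r}")
        directory = sub  # descend one level; each directory has few entries
    entry = directory.get(parts[-1])
    if not isinstance(entry, str):
        raise FileNotFoundError(f"no file named {parts[-1]!r}")
    return entry


print(locate("/alice/reports/q2.txt"))  # block 348
```

Each step searches only one small directory, which is the efficiency argument made for tree structures.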
File Descriptor
- Each file entry in every directory contains information describing the file:
  - File name -- usually represented in ASCII code.
  - File type -- organization and usage that are dependent on the system (e.g., files and directories).
  - File size -- kept here for convenience.
  - File location -- identification of the first physical block (or all blocks) where the file is stored.
  - Date and time of creation.
  - Owner.
  - Protection information -- access restrictions based on who is allowed to access the file and what type of access is allowed.
  - Record size -- its fixed size or its maximum size, depending on the type of record.
File Names
- Absolute file name (complete file name) -- long name that includes all path information.
- Relative file name -- short name seen in directory listings.
  - Selected by the user when the file is created.
  - E.g., ACCOUNT ADDRESSES, TAXES 2001, or AUTOEXEC.
- Extension -- 2-3 character name used to identify the type of file or its contents.
  - Separated from the relative name by a period.
  - E.g., CPP, BAS, BAT, COB, and EXE signal the system to use a specific compiler or program to run these files.
  - E.g., TXT, DOC, OUT, MIC, and KEY are created by applications or by users for their own identification.
File Naming Conventions
- Names can be one or more characters long.
- They can include letters of the alphabet and digits.
- Every OS has specific rules that affect the length of the relative name and the types of characters allowed.
  - E.g., MS-DOS allows names of 1-8 alphanumeric characters without spaces.
  - More modern OSs allow names with dozens of characters, including spaces.
- Try to select descriptive relative names that readily identify the file's contents or purpose.
Base and Current Directories Used by the File Manager to Locate Files
- The File Manager selects a base directory for the user when the interactive session begins.
  - All file operations requested by that user start here.
- Then the user selects a subdirectory (the current directory or working directory).
  - Thereafter, files are presumed to be located in the current directory.
  - Whenever a file is accessed, the user types in the relative name and the File Manager adds the proper prefix.
- As long as users refer to files in the working directory, they can access them without entering the complete name.
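A minimal sketch of how the File Manager might add the proper prefix to a relative name, assuming POSIX-style "/" separators and hypothetical directory names; `posixpath` is Python's standard module for such paths:

```python
import posixpath


def complete_name(working_dir: str, relative_name: str) -> str:
    """Turn a relative file name into a complete (absolute) file name."""
    # join() returns relative_name unchanged if it is already absolute;
    # normpath() collapses "." and ".." components.
    return posixpath.normpath(posixpath.join(working_dir, relative_name))


print(complete_name("/alice/reports", "q1.txt"))        # /alice/reports/q1.txt
print(complete_name("/alice/reports", "../notes.txt"))  # /alice/notes.txt
```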
File Organization: Record Format
- Fixed-length records -- easiest to access directly.
  - Most common type; ideal for data files.
  - Record size is critical (too small truncates data; too large wastes space).
- Variable-length records -- difficult to access directly, because it's hard to calculate exactly where each record is located.
  - They don't leave empty storage space and don't truncate any characters.
  - Frequently used in files accessed sequentially (e.g., text files, program files) or in files that use an index to access records.
- The file descriptor stores the record format, how it's blocked, and other related information.
Physical File Organization
- Concerned with how records are arranged and the characteristics of the medium used to store them.
- On magnetic disks, files can be organized as:
  - Sequential
  - Direct
  - Indexed sequential
Characteristics Considered When Selecting a File Organization
- Volatility of the data -- frequency with which additions and deletions are made.
- Activity of the file -- percentage of records processed during a given run.
- Size of the file.
- Response time -- amount of time the user is willing to wait before the requested operation is completed.
Sequential Record Organization
- Easiest to implement, because records are stored and retrieved serially, one after the other.
- To speed the process, some optimization features may be built into the system.
  - E.g., select a key field from the record, then sort the records by that field before storing them.
  - This aids the search process.
  - It complicates maintenance algorithms, because the original order must be preserved every time records are added or deleted.
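The sketch below illustrates that trade-off with an in-memory list standing in for a sequential file sorted on a numeric key field (both assumptions for illustration): a search can stop early once it passes the key, but every insertion must keep the order intact.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical sequential file: (key, data) records kept in key order.
records: List[Tuple[int, str]] = []


def add_record(key: int, data: str) -> None:
    # Maintenance cost: the record must be slotted in so order is preserved.
    bisect.insort(records, (key, data))


def find_record(key: int) -> Optional[str]:
    # Serial search from the start; sorted keys let it stop early.
    for k, data in records:
        if k == key:
            return data
        if k > key:
            break  # already past where the key would be
    return None


for key, data in [(30, "Chen"), (10, "Adams"), (20, "Baker")]:
    add_record(key, data)
print([k for k, _ in records])  # [10, 20, 30]
```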
Direct Record Organization (Random Organization)
- Uses direct access files, which can be implemented only on direct access storage devices.
- Gives users the flexibility of accessing any record in any order without having to begin the search from the beginning of the file.
- Records are identified by their relative addresses (their addresses relative to the beginning of the file).
- Logical addresses are computed when records are stored and again when records are retrieved.
  - They are computed with hashing algorithms.
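As an illustration only, the sketch below uses the division-remainder method, one common hashing choice (the slide doesn't name one), with a hypothetical number of slots and record length. Because the same function runs at storage time and at retrieval time, a record is found where it was put:

```python
NUM_SLOTS = 101     # hypothetical number of record slots set at creation
RECORD_LENGTH = 80  # hypothetical fixed record length, in bytes


def relative_address(key: int) -> int:
    """Hash a numeric key to a slot number relative to the file's start."""
    return key % NUM_SLOTS


def byte_address(key: int) -> int:
    """Byte offset of the record's slot from the beginning of the file."""
    return relative_address(key) * RECORD_LENGTH


print(relative_address(12345), byte_address(12345))  # 23 1840
```

Keys 12345 and 12446 both hash to slot 23, which is a collision.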
Advantages of Direct Access Organization
- Fast access to records.
- Records can be accessed sequentially by starting at the first relative address and incrementing it by one to get to the next record.
- Files can be updated more quickly than sequential files, because records are quickly rewritten to their original addresses after modifications.
- There's no need to preserve the order of the records, so adding or deleting them takes very little time.
Collisions Are a Problem With Direct Access Organization
- Several records with unique keys may generate the same logical address (a collision).
- When that happens, the program generates another logical address before presenting it to the File Manager for storage.
- Colliding records are stored in an overflow area via links.
- The File Manager handles the physical allocation of space.
- The maximum file size is established when the file is created; eventually the file is full, or too many records are stored in the overflow area.
  - Then the programmer must reorganize and rewrite the file.
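One way to picture the overflow area is the sketch below: a small prime area with one record per slot, and colliding records appended to a separate overflow list and chained by index. The tiny slot count, the dict-based records, and chaining by list index are all assumptions made to keep the example short.

```python
from typing import Dict, List, Optional

NUM_SLOTS = 7  # hypothetical; tiny so collisions are easy to produce

prime: List[Optional[Dict]] = [None] * NUM_SLOTS  # home slots
overflow: List[Dict] = []                         # colliding records


def store(key: int, data: str) -> None:
    record = {"key": key, "data": data, "next": None}
    slot = key % NUM_SLOTS
    if prime[slot] is None:
        prime[slot] = record
        return
    # Collision: put the record in the overflow area and link it
    # to the end of the chain that starts at its home slot.
    overflow.append(record)
    node = prime[slot]
    while node["next"] is not None:
        node = overflow[node["next"]]
    node["next"] = len(overflow) - 1


def fetch(key: int) -> Optional[str]:
    node = prime[key % NUM_SLOTS]
    while node is not None:
        if node["key"] == key:
            return node["data"]
        node = overflow[node["next"]] if node["next"] is not None else None
    return None


for key, data in [(3, "A"), (10, "B"), (17, "C")]:  # all hash to slot 3
    store(key, data)
print(fetch(17), len(overflow))  # C 2
```

As the overflow chains grow, every fetch walks further, which is why the file eventually has to be reorganized.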
Indexed Sequential Record Organization
- Combines the best of sequential and direct access.
- Created and maintained through the Indexed Sequential Access Method (ISAM) software package.
- Doesn't create collisions, because it doesn't use the result of a hashing algorithm to generate a record's address.
  - Instead, it uses this information to generate an index file through which records are retrieved.
- Divides an ordered sequential file into blocks of equal size.
  - The size is determined by the File Manager to take advantage of physical storage devices and to optimize retrieval strategies.
- Each entry in the index file contains the highest record key and the physical location of the data block where this record, and the records with smaller keys, are stored.
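The index lookup can be sketched with Python's `bisect` module, assuming a hypothetical in-memory index of (highest key, block location) pairs sorted by key. The first entry whose highest key is at least the requested key names the block that must hold the record:

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical index file: (highest key in block, block location), ascending.
index: List[Tuple[int, str]] = [
    (120, "block 0"), (245, "block 1"), (390, "block 2"), (500, "block 3"),
]
high_keys = [high for high, _ in index]


def block_for(key: int) -> Optional[str]:
    """Return the data block whose highest key is >= the requested key."""
    i = bisect.bisect_left(high_keys, key)
    if i == len(index):
        return None  # key exceeds every key in the file
    return index[i][1]


print(block_for(200))  # block 1
```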
Indexed Sequential - 2
- To access any record in the file, the system begins by searching the index file, then goes to the physical location indicated at that entry.
- Overflow areas are spread throughout the file.
  - Existing records can expand, and new records are kept in close physical and logical sequence.
  - A last-resort overflow area is located apart from the main data area, but it is used only when the other overflow areas are completely filled.
- When retrieval time becomes too slow, the file has to be reorganized.
- Allows both direct access to a few requested records and sequential access to many records, which makes it suitable for most dynamic files.
- A variation of indexed sequential files is the B-tree.
Physical Storage Allocation
- The File Manager must work with files not just as whole units but also as logical units, or records.
  - Records within a file must have the same format but can vary in length.
  - Records are subdivided into fields.
  - Their structure is usually managed by application programs, not the OS.
- When we talk about file storage, we're actually referring to record storage.
Figure: record storage formats -- unblocked and blocked fixed-length records; unblocked and blocked variable-length records.
Contiguous Storage
- Records are stored one after the other.
- Any record can be found and read once its starting address and size are known, so the directory is very streamlined.
- Direct access is easy, because every part of the file is stored in the same compact area.
- Files can't be expanded unless there's empty space available immediately following them.
  - Room for expansion must be provided when the file is created.
- Fragmentation occurs (slivers of unused storage space).
  - Files can be compacted and rearranged.
  - Files can't be accessed while compaction is taking place.
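A toy model makes the contiguous-storage trade-offs concrete. It assumes a hypothetical 12-block disk and a first-fit search for free space (the slide doesn't prescribe a placement strategy); the directory keeps only each file's starting block and length:

```python
from typing import Dict, List, Optional, Tuple

disk: List[Optional[str]] = [None] * 12      # None marks a free block
directory: Dict[str, Tuple[int, int]] = {}   # name -> (start block, length)


def allocate(name: str, length: int) -> bool:
    """First-fit: claim the first run of `length` consecutive free blocks."""
    if length <= 0:
        raise ValueError("length must be positive")
    run = 0
    for i, owner in enumerate(disk):
        run = run + 1 if owner is None else 0
        if run == length:
            start = i - length + 1
            disk[start:i + 1] = [name] * length
            directory[name] = (start, length)
            return True
    return False  # no single gap is large enough


def block_of(name: str, k: int) -> int:
    """Direct access: block k of a file is simply start + k."""
    start, length = directory[name]
    if not 0 <= k < length:
        raise IndexError("block number outside the file")
    return start + k


def can_extend(name: str) -> bool:
    """A file can grow only if the block right after it is free."""
    start, length = directory[name]
    end = start + length
    return end < len(disk) and disk[end] is None


allocate("A", 4)
allocate("B", 3)
print(block_of("B", 2), can_extend("A"), can_extend("B"))  # 6 False True
```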
Noncontiguous Storage
- Allows files to use any storage space available on the disk.
- A file's records are stored contiguously if there's enough empty space.
- Any remaining records, and all other additions to the file, are stored in other sections of the disk (extents).
  - Extents are linked together with pointers.
  - The physical size of each extent is determined by the OS (e.g., 256 bytes).
Linking File Extents
- Linking at the storage level -- each extent points to the next one in sequence.
  - The directory entry consists of the file name, the storage location of the first extent, the location of the last extent, and the total number of extents, not counting the first.
- Linking at the directory level -- each extent is listed with its physical address, its size, and a pointer to the next extent.
  - A null pointer indicates that it's the last one.
- Linked extents eliminate external storage fragmentation and the need for compaction.
- They don't support direct access, because there's no easy way to determine the exact location of a specific record.
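The sketch below shows why linked extents rule out direct access: to reach record n, the File Manager has to follow the chain from the first extent. The extent addresses and record names are hypothetical, and `None` plays the role of the null pointer:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical extents: address -> (records held, address of next extent).
extents: Dict[int, Tuple[List[str], Optional[int]]] = {
    40: (["r1", "r2"], 97),
    97: (["r3", "r4"], 12),
    12: (["r5"], None),  # null pointer: this is the last extent
}


def read_record(first_extent: int, n: int) -> str:
    """Return record n (counting from 0) by walking the extent chain."""
    address: Optional[int] = first_extent
    while address is not None:
        records, next_address = extents[address]
        if n < len(records):
            return records[n]
        n -= len(records)       # skip this whole extent
        address = next_address  # follow the pointer
    raise IndexError("record number is past the end of the file")


print(read_record(40, 4))  # r5
```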
Indexed Storage
- Allows direct record access by bringing the pointers linking every extent of the file into an index block.
- Every file has its own index block (the addresses of each disk sector that makes up the file).
  - The index block lists each entry in the same order in which the sectors are linked.
- When a file is created, the pointers in the index block are set to null.
- As each sector is filled, its pointer is set to the appropriate sector address.
  - The address is removed from the empty space list and copied into its position in the index block.
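A minimal sketch of that bookkeeping, assuming a hypothetical free-sector list and a fixed-size index block per file; the class and method names are illustrative, not any real file system's API:

```python
from typing import List, Optional

free_sectors: List[int] = [17, 4, 92, 33, 58]  # hypothetical empty space list


class IndexedFile:
    def __init__(self, max_sectors: int) -> None:
        # At creation, every pointer in the index block is null.
        self.index_block: List[Optional[int]] = [None] * max_sectors
        self.used = 0

    def fill_next_sector(self) -> int:
        """Take a sector off the free list and record it in the index block."""
        if self.used == len(self.index_block):
            raise RuntimeError("index block is full")
        sector = free_sectors.pop(0)          # removed from empty space list
        self.index_block[self.used] = sector  # copied into its position
        self.used += 1
        return sector

    def sector_of(self, k: int) -> int:
        """Direct access: read the k-th sector's address from the index."""
        address = self.index_block[k]
        if address is None:
            raise IndexError("sector not allocated yet")
        return address


f = IndexedFile(max_sectors=4)
for _ in range(3):
    f.fill_next_sector()
print(f.index_block, f.sector_of(2))  # [17, 4, 92, None] 92
```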
Indexed Storage - 2
- Supports both sequential and direct access.
- Doesn't necessarily improve the use of storage space, because each file must have an index block.
- For larger files with more entries, several levels of indexes can be generated.
  - To find a desired record, the File Manager accesses the first index (highest level), which points to a second index (lower level), which points to an even lower-level index and eventually to the data record.
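The multilevel lookup can be sketched as repeated searches, one per level. The two-level layout, key ranges, and block names below are hypothetical; a linear scan is used for brevity where a real system might binary-search each index:

```python
from typing import Dict, List, Optional, Tuple

Index = List[Tuple[int, str]]  # (highest key covered, target) pairs

# Hypothetical two-level index: the top level points to lower-level
# indexes, and those point to data blocks.
top_level: Index = [(200, "ix0"), (400, "ix1")]
lower_levels: Dict[str, Index] = {
    "ix0": [(100, "block 0"), (200, "block 1")],
    "ix1": [(300, "block 2"), (400, "block 3")],
}


def first_covering(entries: Index, key: int) -> Optional[str]:
    """Target of the first entry whose highest key is >= key."""
    for high, target in entries:
        if key <= high:
            return target
    return None


def find_block(key: int) -> Optional[str]:
    lower = first_covering(top_level, key)  # search the highest-level index
    if lower is None:
        return None
    return first_covering(lower_levels[lower], key)  # then the next level


print(find_block(250))  # block 2
```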
Data Compression
- Three techniques are used to save space in files.
- The system must be able to distinguish between compressed and uncompressed data.
- Trade-off: storage space is gained, but processing time is lost.
- 1. Records with repeated characters can be abbreviated.
  - E.g., a fixed-length field holding a short name and many blank characters can be replaced with a variable-length field and a special code indicating how many blanks were truncated.
  - ADAMSbbbbbbbbbb → ADAMSb10 (b = blank)
  - Numbers with many zeros can be shortened the same way: 300000000 becomes a 3 plus a code indicating eight zeros.
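Both examples above follow one pattern: strip a trailing run of a fill character and store a marker plus a count. The sketch assumes a hypothetical marker character `#` that never occurs in the data itself, which is one way a system can tell compressed values from uncompressed ones:

```python
MARKER = "#"  # hypothetical code; assumed never to appear in real data


def compress(field: str, fill: str) -> str:
    """Replace a trailing run of `fill` characters with MARKER + count."""
    body = field.rstrip(fill)
    count = len(field) - len(body)
    return f"{body}{MARKER}{count}" if count else field


def expand(stored: str, fill: str) -> str:
    """Rebuild the trailing run that compress() removed."""
    body, found, count = stored.rpartition(MARKER)
    if not found:
        return stored  # value was stored uncompressed
    return body + fill * int(count)


print(compress("ADAMS" + " " * 10, " "))  # ADAMS#10
print(compress("300000000", "0"))         # 3#8
```

The `expand` step is the processing time paid back for the space saved.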
Data Compression: Repeated Terms
- 2. Repeated terms can be compressed by using symbols to represent each of the most commonly used words in the database.
  - E.g., in a university's student database, common words like student, course, teacher, classroom, grade, and department could each be represented with a single character.
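A minimal sketch of term substitution, assuming a hypothetical code table that maps each common word to a single reserved control character and that words are separated by single spaces:

```python
# Hypothetical code table: each common word -> one reserved character.
CODES = {
    "student": "\x01", "course": "\x02", "teacher": "\x03",
    "classroom": "\x04", "grade": "\x05", "department": "\x06",
}
WORDS = {code: word for word, code in CODES.items()}


def compress_terms(text: str) -> str:
    return " ".join(CODES.get(word, word) for word in text.split(" "))


def expand_terms(text: str) -> str:
    return " ".join(WORDS.get(word, word) for word in text.split(" "))


line = "student enrolled in course"
packed = compress_terms(line)
print(len(line), len(packed))  # 26 15
```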
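Data Compression: Front-end Compression
- 3. Front-end compression is used for index compression.
  - E.g., a student database where the students' names are kept in alphabetical order could be compressed by storing, for each name, a count of the leading characters it shares with the previous name, followed only by the characters that differ.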
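Front-end compression can be sketched as follows. Each entry of an alphabetized list is stored as the number of leading characters it shares with the previous entry plus the remaining suffix; the names are hypothetical sample data:

```python
from typing import List, Tuple


def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def front_compress(names: List[str]) -> List[Tuple[int, str]]:
    """Store each name as (chars shared with previous name, rest of name)."""
    entries, previous = [], ""
    for name in names:
        shared = shared_prefix_len(previous, name)
        entries.append((shared, name[shared:]))
        previous = name
    return entries


def front_expand(entries: List[Tuple[int, str]]) -> List[str]:
    names, previous = [], ""
    for shared, rest in entries:
        previous = previous[:shared] + rest
        names.append(previous)
    return names


roster = ["Smith, Betty", "Smith, Donald", "Smith, Gino", "Smithers, John"]
print(front_compress(roster))
# [(0, 'Smith, Betty'), (7, 'Donald'), (7, 'Gino'), (5, 'ers, John')]
```

Because each entry depends on the one before it, the list must stay in alphabetical order for the savings to hold.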
Access Methods
- Access methods are dictated by a file's organization.
  - The most flexibility is allowed with indexed sequential files and the least with sequential files.
- A file organized in sequential fashion can support only sequential access to its records; these records can be of fixed or variable length.
- The File Manager uses the address of the last byte read to access the next sequential record.
  - The current byte address (CBA) must be updated every time a record is accessed.
Sequential Access
- For sequential access of fixed-length records, the CBA is updated by incrementing it by the record length (RL), which is constant:
  - $\mathrm{CBA} = \mathrm{CBA} + \mathrm{RL}$
- For sequential access of variable-length records, the File Manager adds the length of the record ($\mathrm{RL}_k$) plus the number of bytes used to hold the record length ($N$) to the CBA:
  - $\mathrm{CBA} = \mathrm{CBA} + N + \mathrm{RL}_k$
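Applying the two update rules step by step, under the assumption that the first record starts at byte 0 and, for variable-length records, that each record's N-byte length field is stored just before its data (the record lengths are made-up sample values):

```python
from typing import List


def fixed_cbas(rl: int, count: int) -> List[int]:
    """Starting CBA of each of `count` fixed-length records."""
    cba, starts = 0, []
    for _ in range(count):
        starts.append(cba)
        cba = cba + rl  # CBA = CBA + RL
    return starts


def variable_cbas(lengths: List[int], n: int) -> List[int]:
    """Starting CBA of each variable-length record (length field included)."""
    cba, starts = 0, []
    for rl_k in lengths:
        starts.append(cba)
        cba = cba + n + rl_k  # CBA = CBA + N + RL_k
    return starts


print(fixed_cbas(100, 3))              # [0, 100, 200]
print(variable_cbas([30, 12, 55], 2))  # [0, 32, 46]
```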
Direct Access: Fixed-Length Records
- If a file is organized in direct fashion and has fixed-length records, it can be accessed easily in either direct or sequential order.
- For direct access with fixed-length records, the CBA is computed directly from the record length and the desired record number RN (information provided through the READ command) minus one:
  - $\mathrm{CBA} = (\mathrm{RN} - 1) \times \mathrm{RL}$
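The direct-access formula as a small function, assuming records are numbered from 1 and the file starts at byte 0. With RL = 100, record 3 starts at byte 200, the same address a sequential pass over the first two records would reach:

```python
def direct_cba(rn: int, rl: int) -> int:
    """Byte address of record number rn in a file of fixed-length records."""
    if rn < 1:
        raise ValueError("record numbers start at 1")
    return (rn - 1) * rl  # CBA = (RN - 1) x RL


print(direct_cba(3, 100))  # 200
```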
Direct Access: Variable-Length Records
- It's virtually impossible to access a record directly, because the address of the desired record can't be easily computed.
- To access a record, the File Manager must do a sequential search through the records.
  - If the File Manager saves the address of the last record accessed, it can do a half-sequential read through the file: when the next request arrives, it can search forward from the CBA.
  - Or the File Manager can keep a table of record numbers and their CBAs, and search the table for the exact storage location of the desired record.
- To avoid this problem, many systems force users who want direct access to records to organize their files with fixed-length records.
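The table-based approach can be sketched like this: one sequential pass records each record number's CBA, after which any record can be located with a single lookup. The record lengths and the 2-byte length field are assumed sample values:

```python
from typing import Dict, List


def build_cba_table(lengths: List[int], n: int) -> Dict[int, int]:
    """Map record number (from 1) to its CBA in one sequential pass."""
    table: Dict[int, int] = {}
    cba = 0
    for rn, rl_k in enumerate(lengths, start=1):
        table[rn] = cba
        cba += n + rl_k  # step past the length field and the data
    return table


table = build_cba_table([30, 12, 55, 8], n=2)
print(table[4])  # 103
```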
Access of Records in an Indexed Sequential File
- Records can be accessed either sequentially or directly.
- Either set of CBA computations applies, but with one extra step:
  - The index file must be searched for the pointer to the block where the data is stored.
  - Because the index file is smaller, it's kept in main memory, and a quick search locates the block where the desired record is located.
  - The block is retrieved from secondary storage, and the beginning byte address of the record is calculated.
- In systems with several levels of indexing, the index at each level must be searched before the CBA is computed.
- The entry point to this type of data file is usually through the index file.
Levels in a File Management System
- Efficient management of files can't be separated from efficient management of the devices that house them.
- A wide range of functions must be organized for the I/O system to perform efficiently.
- Each level is implemented using structured and modular programming techniques, which also set up a hierarchy.
- The levels, from highest to lowest:
  - Basic File System
  - Access Control Module
  - Logical File System
  - Physical File System
  - Device Interface Module
  - Device
Basic File System
- Highest-level module; it passes information to the logical file system, which notifies the physical file system, which works with the Device Manager.
- It activates the access control verification module to verify that this user is permitted to perform this operation with this file.
Access Control Verification Module
- Any file can be shared.
  - Sharing saves space and allows for synchronization of data updates.
  - It improves the efficiency of the system's resources, because if files are shared in main memory, I/O operations are reduced.
- However, the integrity of each file must be safeguarded.
  - This means controlling who is allowed to access the file and what type of access is permitted.
  - Types of access: READ only, WRITE only, EXECUTE only, DELETE only, or some combination.
File Access Control Methods
- Each file management system has its own file access control method. Methods include:
  - Access control matrix
  - Access control lists
  - Capability lists
  - Lockword control
- Access control lists and capability lists are the most common methods.
Access Control Matrix
- Intuitively appealing and easy to implement.
- Works well only for systems with few files and few users.
- In the matrix, each column identifies a user and each row identifies a file.
  - The intersection of a row and a column contains the access rights for that user to that file.
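A minimal sketch of the matrix, with hypothetical users, files, and rights encoded as letters (R = read, W = write, E = execute, D = delete). Rows are files and columns are users, as described above:

```python
USERS = ["alice", "bob", "carol"]       # columns
FILES = ["payroll", "report", "prog1"]  # rows
MATRIX = [
    # alice  bob  carol
    ["RWED", "", "R"],    # payroll
    ["RW", "R", "R"],     # report
    ["RWED", "E", ""],    # prog1
]


def allowed(user: str, file: str, right: str) -> bool:
    """Check the cell where the file's row meets the user's column."""
    cell = MATRIX[FILES.index(file)][USERS.index(user)]
    return right in cell


print(allowed("bob", "prog1", "E"), allowed("bob", "payroll", "R"))  # True False
```

Every new user adds a column and every new file adds a row, mostly of empty cells, which is why the matrix suits only small systems.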
Access Control Lists
- A modification of the access control matrix technique.
- Each file is entered in the list, which contains the names of the users allowed to access it and the type of access each is permitted.
- To shorten the list, only those who may use the file are named; those denied any access are grouped under a global heading such as WORLD.
- Or shorten it by putting every user into a category:
  - SYSTEM -- system personnel with unlimited access to all files.
  - OWNER -- absolute control over all files created in their own account.
  - GROUP -- all users belonging to the appropriate group have access.
  - WORLD -- all other users in the system; they receive the default access types set by the File Manager.
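The category scheme can be sketched for a single file. The owner, group members, system staff, and rights letters below are all hypothetical; the check finds the first category the user falls into and consults that entry of the file's list:

```python
# Hypothetical access control list for one file, using the four categories.
ACL = {"SYSTEM": "RWED", "OWNER": "RWED", "GROUP": "R", "WORLD": ""}
OWNER = "alice"
GROUP = {"alice", "bob"}
SYSTEM_STAFF = {"root"}


def category(user: str) -> str:
    if user in SYSTEM_STAFF:
        return "SYSTEM"
    if user == OWNER:
        return "OWNER"
    if user in GROUP:
        return "GROUP"
    return "WORLD"  # everyone else gets the default entry


def allowed(user: str, right: str) -> bool:
    return right in ACL[category(user)]


print(allowed("bob", "R"), allowed("eve", "R"))  # True False
```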
Access Control List Example
Capability Lists
- List every user and the files to which each has access.
- Require less storage space than an access control matrix.
- Easier to maintain than an access control list when users are added to or deleted from the system.
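A capability-list sketch with hypothetical users, files, and rights letters (R = read, W = write, E = execute, D = delete). Only the rights a user actually holds are stored, and removing a user touches a single entry:

```python
from typing import Dict

# Hypothetical capability lists: user -> {file: rights held}.
CAPABILITIES: Dict[str, Dict[str, str]] = {
    "alice": {"payroll": "RWED", "report": "RW", "prog1": "RWED"},
    "bob": {"report": "R", "prog1": "E"},
    "carol": {"payroll": "R", "report": "R"},
}


def allowed(user: str, file: str, right: str) -> bool:
    return right in CAPABILITIES.get(user, {}).get(file, "")


def remove_user(user: str) -> None:
    CAPABILITIES.pop(user, None)  # one deletion, no per-file lists to edit


remove_user("carol")
print(allowed("bob", "prog1", "E"), allowed("carol", "report", "R"))  # True False
```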
Lockword Control
- A lockword is similar to a password but protects a single file.
- When a file is created, the owner protects it with a lockword.
  - The lockword is stored in the directory but isn't revealed in a directory listing.
  - A user must provide the correct lockword to access the protected file.
- Requires the smallest amount of storage for file protection.
- Can be guessed by hackers or passed on to unauthorized users.
- Generally doesn't control the type of access to the file.
  - Anyone who knows the lockword can read, write, execute, or delete the file.
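The sketch below models a lockword check, with a hypothetical directory held in a dictionary and a made-up lockword. The listing function hides lockwords, and once the right lockword is supplied the caller gets the whole entry, reflecting the fact that a lockword doesn't restrict the type of access:

```python
from typing import Dict, List

# Hypothetical directory: file name -> {"lockword": ..., "data": ...}.
directory: Dict[str, Dict[str, str]] = {}


def create(name: str, data: str, lockword: str) -> None:
    directory[name] = {"lockword": lockword, "data": data}


def listing() -> List[str]:
    return sorted(directory)  # names only; lockwords are never shown


def unlock(name: str, lockword: str) -> Dict[str, str]:
    entry = directory[name]
    if lockword != entry["lockword"]:
        raise PermissionError(f"wrong lockword for {name!r}")
    return entry  # full access: read, rewrite, or delete alike


create("payroll", "salary table", lockword="blue42")
print(listing())                            # ['payroll']
print(unlock("payroll", "blue42")["data"])  # salary table
```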
Terminology
- access control list
- access control matrix
- capability list
- complete file name
- current byte address (CBA)
- current directory
- data compression
- data file
- database
- device independent
- direct access files
- direct record organization
- directory
- extension
- extents
- file
- file descriptor
- fixed-length record
- hashing algorithm
- indexed sequential record organization
- key field
- lockword
- logical address
- master file directory (MFD)
- relative address
- relative file name
- sequential record organization
- subdirectory
- variable-length record
- volume
- working directory