CT213 - PowerPoint PPT Presentation

About This Presentation
Title:

CT213

Description:

CT213 File Management Petronel Bigioi Content File Management Overview File Management Functional Requirements File Management Architecture File Organization ... – PowerPoint PPT presentation

Number of Views:27
Avg rating:3.0/5.0
Slides: 44
Provided by: ww2ItNui
Category:

less

Transcript and Presenter's Notes

Title: CT213


1
CT213 File Management
  • Petronel Bigioi

2
Content
  • File Management Overview
  • File Management Functional Requirements
  • File Management Architecture
  • File Organization pile, sequential, indexed
    sequential, indexed and hashed
  • File Directory
  • Content and structure
  • File Sharing
  • Access rights and simultaneous access
  • Record blocking
  • Secondary Storage management
  • File Allocation and Free Space Management

3
File Management Overview
  • Filing systems provide
  • A storage service clients dont know about the
    physical characteristics of the disks or where
    the files have been stored on them
  • The filing systems must make sure that a file is
    not lost, even if there are hardware failures or
    software crashes
  • A directory service clients can give convenient
    text names to files and group them in directories
    (establish some relationship between them)
  • Clients should be able to access the sharing of
    their files with other by specifying who can
    access a given file and in what way

4
File Management Overview
  • (1) client calls an operation such as open-file
    with the text name as an argument
  • Another argument could be a field specifying the
    type of access (read-only or read-write)
  • The directory service will carry out an access
    check to ensure that the client is authorized to
    access the file
  • The directory service will translate the text
    name into a form that enables the file storage
    service to locate the file on disk (name
    resolution)
  • In order to do the name resolution, the directory
    service may call the file storage service, the
    file storage service may call the disk handler to
    access the disk and find the required information
  • (2) The filing systems stores a very large
    numbers of files, only a few being used at any
    time.
  • The filing system is ready for the client to use
    the file (It will have set up the information
    about this file in its tables in main memory)
  • It will return a user-file-identifier (UFID) for
    the client to use in subsequent requests to read
    or write the file
  • (3) Request to read the file
  • (4) The storage service returns the portion of
    the file that was requested at (3)

5
File Management Overview
  • Concurrency control must be provided
  • One approach is to allow multiple clients read
    control, but only one client the right to write
  • An operating system will deal with such approach
    by noting if a file system has been opened for
    reading access, in which case it will allow
    multiple clients or for writing, in which case it
    would refuse any subsequent requests to read or
    write from other clients this is called
    mandatory concurrency control. A file is said to
    be locked for reading or writing.
  • Many applications may need to have write access
    to different parts of the same file (the above
    approach is inflexible in this case)
  • Many operating systems allow simultaneous write
    access to files, being the application job to
    make sure it will sync the access
  • Eventually the OS will provide some extra locking
    service to help the clients to cooperate with
    each other.

6
Files Common Terms
  • Field basic element of data, containing a
    single value (i.e. an employees last name)
  • Characterized by its length and data-type (i.e.
    ASCII string, decimal, etc..)
  • Depending on file design, it may be fixed or
    variable length
  • Record a collection of related fields that can
    be treated as single unit
  • The record of an employee can contain fields such
    as name, social security number, etc..
  • Up to the design of the file system it may be
    fixed or variable in length

7
Files Common terms
  • File is a collection of similar records
  • The file is treated as a single entity by users
    and applications and may be referenced by name
  • Files have unique file names and may be created
    and deleted
  • Access control restrictions usually apply at the
    file level
  • Database a collection of related data
  • The essential aspect is the relations that exists
    between elements of data are explicit
  • The data base itself consists of one or more
    types of files
  • Usually there is a separate database management
    system that is independent of the file system
    management and the operating system

8
File Management Requirements
  • To meet the data management needs and
    requirements from the user
  • Create, delete, read and change files controlled
    access to other users files control the type of
    access to own files restructure the files move
    data between files backup and recovery files in
    case of damage file access by using symbolic
    file names
  • To guarantee that the data in the file is valid
  • To optimize performance
  • To provide I/O support for a wide range of
    storage device types
  • To provide a standardized set of I/O interface
    routines
  • To provide I/O support for multiple users, in the
    case of multiple-user systems

9
File System Architecture
  • The Basic I/O Supervisor is responsible for all
    file I/O initiation and termination
  • Maintains control structures that deal with
    device I/O scheduling and file status
  • Is concerned with the selection of the device on
    which file I/O is to be performed, on the basis
    of which file has been selected
  • Is also concerned with scheduling disk and tape
    access to optimize performance
  • I/O buffers and secondary memory is allocated at
    this level
  • Logical I/O enables users and applications to
    access records. Thus, whereas the basic file
    system deals with blocks of data, the logical I/O
    module deals with file records.
  • Logical I/O provides a general purpose record I/O
    capability and maintains basic data about files
  • Access method is the closest level of the file
    system to the user
  • Provides a standard interface between
    applications and file system
  • Different access methods pile, sequential,
    indeed sequential, indexed, hashed
  • At the lowest level we have device drivers, that
    do communicate directly with the peripheral
    device or their controllers or channels
  • A device driver is responsible for staring I/O
    operations on a device and processing the
    completion of an I/O request
  • Device drivers are considered to be part of the
    operating system, and in the case of file system,
    the devices controlled are disks and tapes
  • Basic File System deals with blocks of data that
    are exchanged with the disk or tape systems.
  • It is concerned with the placement of those
    blocks on the secondary storage device and on the
    buffering of those blocks in main memory
  • It doesnt understand the content of the data
    structure of the files involved

10
File Organization
  • Criteria
  • Rapid access
  • Ease of update
  • Economy of storage
  • Simple maintenance
  • Reliability
  • These criteria may vary in importance
  • CD-ROM Ease of update irrelevant
  • Indexes Faster but use more storage
  • We will outline five common organizations (the
    actual number that have been implemented or
    proposed is unimaginably large)
  • Pile
  • Sequential file
  • Indexed sequential file
  • Indexed file
  • Direct (Hashed) file

11
File Organization
  • Pile
  • Add data to the file as it arrives (chronological
    order)
  • Record size and field order may vary (variable
    length records, variable set of fields)
  • Requires use of exhaustive search
  • Sequential File
  • Fixed length record format
  • Size and order of fields fixed
  • Key field - unique record ID
  • Records stored in order based on key
  • Handles random requests poorly
  • Must use sequential search (batch system)
  • Hard to insert new records

12
File Organization
  • Indexed Sequential File
  • Still maintains the organization of records in
    sequence, based on a key
  • Adds an index to the file to speed lookup
  • Index provides a lookup capability to reach
    quickly the vicinity of a desired record
  • The index file contains two records the key and
    a pointer to the main file
  • May have multiple levels of indexes
  • Overflow area to handle new records
  • Each record in the main file contains a hidden
    pointer to the overflow file (used if needed)
  • Link from main records to overflow, and back
  • Operations
  • Search To find a specific field, a search begins
    in the index file. The highest key value that is
    less than or equal to the desired key record is
    looked up in the index file. A pointer to the
    main file is retrieved and the search continues
    in the main file.
  • Additions Each record in the main file contains
    an additional field (not visible to the
    application) that is a pointer to the overflow
    file When a new record is to be inserted, it is
    actually added to the overflow file. The record
    in the main file, that immediately precedes the
    new record in logical sequence is updated to
    contain a pointer to the new record (in the
    overflow file).

13
File Organization
  • Indexed File
  • Useful when is necessary to search for a record
    on the basis of some other attribute than the key
    field
  • May have multiple indexes
  • One for each field we may search
  • Records accessed only through the indexes
  • Each index may be
  • Exhaustive contains one entry for every record
    in the main file
  • Partial contains entries only for records where
    the field of interest exists
  • Used in applications where time is critical and
    where data is rarely processed exhaustively (such
    as reservation systems, inventory controls, etc)
  • Direct (Hashed) file
  • Use hashing on a key to find the record
  • No notion of sequential access
  • Generally used when rapid access to one record is
    required (directory)

14
Hashing
  • It can find most of the items with a single seek
  • Insertions and deletions can be handled without
    added complexity
  • Assuming that a number of N items are to be
    inserted into a hash table of length M, with MgtN
  • Insert an item into the hash table
  • Convert the label of the item to near random
    number n (between 0 and M-1) (i.e. if the label
    is numeric, then a popular mapping function is to
    divide the label by M and take the reminder as
    the value of n
  • Use n as index into the hash table
  • If the entry is empty, then store the item
  • If the entry is occupied, then store the item
    according to the hashing criteria (linear or
    overflow with chaining)
  • Table lookup of an item whose label is known
  • Convert the label of the item to a near random
    number n (using same mapping function as for
    insertion)
  • Use n as index into the hash table
  • If the corresponding entry is empty, then the
    item hasnt been inserted
  • If the corresponding entry is occupied, and the
    labels match, then retrieve the value
  • If the corresponding entry is occupied and the
    labels are not matching, then continue the search
    according to the hashing criteria ( linear or
    overflow with chaining)

15
Linear Hashing
Labels of the items to be stored are numeric and
the hash table has eight positions (M8). The
hashing function takes the reminder upon division
by 8
  • In the linear hashing schema, if the entry is
    already occupied, set n(n1)(mod M) and try
    again. Perform this step until we will find an
    empty entry
  • The figure assumes that the entries have been
    inserted in the ascending order
  • Item 50 and 51 maps in positions 2 and 3
  • Item 74 maps in position 2, position 2 is taken,
    so we try position 3 (taken). Next is position 4
    we need to try and is empty, so we write it on
    position 4
  • The average search is not depending on the table
    size, is dependent of how full the table is (at
    80, we are getting an average for the search
    around 3)

16
Hashing using Overflow with Chaining
  • A separate table in which overflow entries are
    inserted is kept. This table includes pointers
    passing down the chain of entries associated with
    any positions in the hash table.
  • For large values of N and M, for NM, the average
    search is around 1.5
  • This method provides for compact storage with
    fast lookup.

17
File Directory
  • A file directory is a structure associated with
    any file management system and collection of
    files
  • It contains information about the files,
    including attributes, location and ownership.
    Most of this information (especially the one
    concerned with the storage is handled by the
    operating system)
  • The directory itself is a file, owned by the
    operating system and accessible by various file
    management routines.
  • Some of the information in directories is
    available to users through system routines
  • The users cannot directly access the directory
    even in read-only mode

18
Typical Directory Entries
  • Basic
  • Name Unique in directory (some systems permit
    file versions)
  • Type Text, binary, load module, etc.
  • Organization Sequential, indexed, etc.
  • Address
  • Device Which disk holds the file
  • Often this must be the same device as the
    directory is on
  • Starting address/Blocks used
  • Block , cylinder , or other location id
  • Size used Current file size
  • May be in bytes or blocks
  • Size allocated Maximum space allocated for this
    file
  • Not used on all file systems

19
Typical Directory Entries
  • Access Control
  • Owner Who has control of the file
  • Access Information What users are allowed to
    work with the file
  • Permitted Actions Controls reading, writing,
    etc.
  • Usage Information
  • Date Created
  • Identity of Creator
  • Date Last Read Access
  • Identity of Last Reader
  • Date Last Modified
  • Identity of Last Modifier
  • Date of Last Backup
  • Current Usage Who has the file open, is the
    file locked, are there updates waiting in main
    memory?

20
Directory Structure
  • Operations to support
  • Search for the file entry (open)
  • Create a new file
  • Delete a file
  • List the files in the directory
  • May be for all or part of the directory
  • May also include attribute information
  • Simplest form
  • A list of directory entries, one for each file
    (CP/M, DOS 1.0)
  • Difficult to handle large numbers of files or
    multiple users
  • The directory would be very large and held on the
    disk (looking up a given filename in it would
    take the directory service a long time)
  • Different users might use the same text names for
    their files. Unique text names would be achieved
    by appending the username to each filename.
  • Some support for organizing the information is
    desirable. Convenient grouping within users
    files should be supported for easy location and
    access control
  • More complex form
  • One directory for each user
  • Easier to manage access information
  • Users still cant structure files

21
Directory Structure
  • Tree-structured file system
  • Single master (root) directory
  • DOS Master directory for each drive
  • Each directory may contain files and other
    subdirectories
  • Names only unique in directory
  • Each directory often stored as a sequential file
  • Less effective when there are a large number of
    files in a given directory

22
Directory Tree Structure
23
Directory Tree Example
  • Path - following set of directories from master
    directory to file
  • Example /UserB/Word/UnitA/ABC
  • / often used to separate directories
  • Working directory
  • Current directory for files /UserB/Word
  • Files in this directory unless path given

24
File Sharing
  • Rights that may be granted
  • None Others dont know it exists
  • Often done by preventing user from reading the
    parent directory (Unix)
  • May have an explicit permission bit for access to
    the file name (Novell)
  • Knowledge Know it is there and who the owner is
  • Execution Able to run a program
  • Read Look at/copy contents
  • Execute and Read may be independent
  • Append Add data to the file
  • Cannot modify existing contents
  • Update Modify/delete/add data
  • Change Protection Grant rights to file
  • Owner can specify what other users have rights to
    this file
  • Deletion Can delete file

25
File Sharing
  • Who to grant rights to
  • Specific user
  • May allow different users to have distinct
    permissions
  • Group of users
  • World (public files)
  • Simultaneous Access
  • Multiple users may want to access or modify the
    same file
  • Example Airline reservation database
  • Locking Entire file vs. Records
  • Easier to lock entire file
  • Locking records allows more concurrency
  • Instance of reader/writer problem
  • Must address mutual exclusion and deadlock

26
Record Blocking
  • A record is the logical access unit of a file
  • Blocks are unit of I/O with secondary storage.
    For I/O to be performed, records must be
    organized as blocks.
  • Issues to consider
  • Should be blocks be fixed or variable length
  • On most systems blocks are fixed length
  • Simplifies I/O, buffer allocation in main memory
    and organization of blocks on secondary storage
  • What should the relative size of a block be
    compared to the average record size
  • The larger the block the more records that are
    passed in one I/O operation
  • If a file is being processed or searched
    sequentially, than this is an advantage
  • If records are being accessed randomly, it will
    result in unnecessary transfer of unused records,
    than this is a disadvantage
  • Three methods of blocking
  • Fixed blocking, variable-length spanned blocking
    and variable-length un-spanned blocking

27
Fixed Blocking
  • Fixed length records are used and an integral
    number of records are stored in a block
  • There may be unused space at the and of each
    block (internal fragmentation)

28
Variable Length Spanned Blocking
  • Variable length records are used and are packed
    into blocks with no unused space
  • Two records may span across two blocks with the
    continuation indicated by a pointer to the
    successor block
  • Wastes space only at the end of the file

29
Variable Length Unspanned Blocking
  • Variable length records are used, but spanning is
    not employed.
  • There is wasted space inmost blocks because of
    the inability to use the remainder of a block if
    the next record is larger than the remaining
    unused space

30
Secondary Storage Management
  • On secondary storage a file is a collection of
    blocks the operating system or file management
    system is responsible for allocating blocks to
    files
  • Two management issues
  • Space on secondary storage must be allocated to
    files
  • Keep track of the space available for allocation
  • The approach taken for file allocation may
    influence the approach taken for available space
    management

31
File Allocation
  • Issues in file allocation
  • When a new file is created, do we specify the
    maximum size? Is that space allocated at once?
  • Space is allocated to a file as one or more
    contiguous units (portions). How big of a unit
    should we use when allocating space for a file?
  • How do we keep track of what space has been
    allocated to a given file (what kind of structure
    or table is used to keep track for a unit
    allocated to a file)?
  • Pre-Allocation
  • Declare max size in advance
  • May be hard to guess space needed
  • Tendency to overestimate space needed
  • Ok if the file will never change
  • Dynamic allocation
  • Get space as the file needs it
  • Files are often no longer contiguous

32
File Allocation
  • Portion (unit) size
  • At one extreme, a single unit large enough to
    hold the entire file, while at the other extreme
    space on disk is allocated one block at a time.
  • In choosing the unit allocation size, there is a
    tradeoff between efficiency from the point of
    view of a single file versus overall system
    efficiency
  • Few items to be considered
  • Having lots of small units requires more space
    for allocation tables
  • Fixed-size portions simplifies the reallocation
    of space
  • Variable-sized units or small fixed-size units
    reduces wasted space

33
File Allocation
  • Two common alternatives
  • Variable-sized large contiguous portions
  • Minimizes waste, allocation overhead
  • Have to deal with fragmentation
  • First-Fit choose the first unused contiguous
    group of blocks of sufficient size from a free
    block list.
  • Best-Fit choose the smallest unused group that
    is of sufficient size
  • Nearest-Fit allocation choose the unused group
    of sufficient size that is closest to the
    previous allocation for the file to increase
    locality
  • Blocks Small fixed-size portions
  • May require large tables or complex structures
    for their allocation
  • Abandons contiguity
  • Allocate blocks as needed
  • Either strategy is compatible with pre-allocation
    and dynamic allocation. Not clear which strategy
    is best.

34
File Allocation Methods
  • Three methods are in common use
  • Contiguous allocation
  • Single contiguous set of blocks is allocated to a
    file at the time of file creation
  • Chained allocation
  • Each block contains a pointer to the next block
    in the chain
  • Indexed allocation
  • The file allocation table contains a separate one
    level index for each file the index has one
    entry for each allocated portion (unit) to the
    file.

35
Contiguous allocation
  • A single contiguous set of blocks assigned to a
    file when it is created
  • Pre-allocation strategy with variable-sized
    portions (units)
  • Good performance (especially for sequential
    files)
  • External fragmentation tends to occur
  • Use compaction to combine free space
  • Need to specify the size of the file at the time
    of creation
  • Used by CD-ROMs (ISO 9660)
  • Before compaction (left)
  • After compaction (right)

36
Chained Allocation
  • Allocate on the basis of individual blocks
  • Directory only links to the first block
  • Each block points to the next block
  • Easy to add blocks to a file
  • No external fragmentation
  • MSDOS FAT12/16/32 is a variation
  • Best suited to sequential files
  • File B, with start1and len5
  • No accommodation for locality
  • If necessary to brig in several blocks of a file
    at a time, then series of access to different
    parts of the disk is necessary
  • To overcome this problem, files are
    consolidated by some systems

37
Indexed Allocation
  • File allocation table contains one level index
    for each file the index has just one entry for
    portion (unit) allocated to the file
  • Typically the file indexes are not stored as part
    of the allocation table
  • The file index for a file is kept in a separate
    block and the entry for the file allocation table
    points to that block
  • Supports both sequential and random access to a
    file
  • In the figure above, a fixed size blocks
    allocation schema is presented
  • Eliminates external fragmentation

38
Indexed Allocation
  • Indexed allocation supports also a variable size
    portions (units) allocation schema
  • Improves locality
  • It is the most popular form of file allocation
  • In both cases, from time to time consolidation
    may be done
  • It reduces the size of the index for the variable
    sized portions schema

39
Free Space Management
  • Same as managing the space allocated to files,
    the free space that is not currently allocated
    needs to be managed
  • In order to be able to perform file allocation,
    we need to know what blocks on the disk are
    available, therefore we need to keep a disk
    allocation table, in addition to file allocation
    table
  • A number to methods to record free space
  • Bit Tables Free bit for each block
  • Works well with any of the presented allocation
    methods
  • It is as small as possible
  • Still, it can be large. The amount (in bits)
    required for a block bitmap is as follows disk
    size (Bytes)/ (8 file system block
    size(BYTES)). For a 16GBytes disk, we would get
    a 4MBytes large table it is large to hold in
    memory and also large to search
  • To speed up the search in the bit tables, the OS
    may divide disk into sections
  • Additional data structures must be kept to
    summarize the status of each section (i.e. the
    number of free blocks and the maximum sized
    contiguous number of free blocks)

40
Bit Table Example
  • Example bit vector for the above figure
  • 00111000011111000011111111111011000

41
Free Space Management
  • A number to methods to record free space
  • Chained Free List
  • Free portions may be chained together by using a
    pointer and length value in each free portion
  • It can produce fragmentation of the disk and many
    portions (units) will be a single block long. In
    this situation, every time a block gets
    allocated, it needs to be first read, to find out
    the pointer to the next free block. If this is
    done for a file creation (for multiple blocks) it
    can slow down the operation. Similarly, deleting
    highly fragmented files, it is time consuming
  • Indexing
  • Free space is treated as a file, store list of
    blocks in the same manner as ordinary files
  • For efficiency, the index should be on the basis
    of variable-size portions rather than blocks
  • One entry in the file for each free portion on
    the disk
  • Provides efficient support for all known file
    allocation methods
  • Free Block List
  • Each block is assigned a number sequentially and
    the list of the numbers of all free blocks is
    maintained in a reserved portion of the disk
  • Can treat it as a stack and re-allocate recently
    freed blocks only the last few blocks need to
    be kept in memory
  • Can use FIFO structure (a block is allocated from
    the head of the FIFO, and de-allocated by adding
    it to the tail of the queue)
  • May have a background process that works to
    facilitate contiguous allocation

42
Free Space Management
  • Where to store allocation tables
  • Memory
  • Table size may be a problem
  • Information may be lost if it crashes
  • Disk
  • Requires extra read/write to allocate a block -
    slows system down dramatically
  • Handling system crashes
  • Lock the allocation table on disk, do the
    allocation, update the disk
  • Will make the system very slow
  • May choose to pre-allocate a batch of blocks,
    then allocate to files on demand
  • Mark it in use on disk
  • Clean it up when a crash occurs

43
References
  • Operating Systems, William Stallings,
    ISBN 0-13-032986-X
  • Operating Systems, Jean Bacon and Tim Harris,
    ISBN 0-321-11789-1
Write a Comment
User Comments (0)
About PowerShow.com