File Structures - PowerPoint PPT Presentation

About This Presentation
Title:

File Structures


Slides: 49
Provided by: shyhka


Transcript and Presenter's Notes

Title: File Structures


1
File Structures
  • Shyh-Kang Jeng
  • Department of Electrical Engineering/
  • Graduate Institute of Communication Engineering
  • National Taiwan University

2
Logical Records
  • High-level programming languages provide
    primitives for requesting access to files via the
    operating system
  • Logical records: units of a file that are
    compatible with the application, as provided by
    the language's file-access primitives
  • Fields: smaller units of information within a
    logical record

3
File, Logical Record, and Field
4
Physical Records
5
Operating System and File Access
6
Physical Records and Buffers
  • Manipulating the file in terms of blocks is
    handled by the operating system
  • The operating system responds to a reading
    request by reading enough physical records,
    placing the data in a buffer, and making the buffer
    available to the application
  • When storing, the operating system stores the
    data in a buffer until a complete physical record
    has been accumulated and then transfers the
    entire physical record to mass storage
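The storing side of this behavior can be modeled in C. This is a simplified sketch, not how any particular operating system implements it: `BlockWriter`, `put_record`, and `flush_block` are hypothetical names, and the 16-byte physical record size (`BLOCK_SIZE`) is an assumption chosen to keep the example small.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 16   /* assumed physical-record size, in bytes */

/* Hypothetical writer that collects logical records into a block buffer. */
typedef struct {
    FILE  *device;             /* stands in for mass storage */
    char   block[BLOCK_SIZE];  /* the buffer */
    size_t used;               /* bytes currently held in the buffer */
    int    physical_writes;    /* number of blocks transferred so far */
} BlockWriter;

/* Transfer whatever is buffered to the device as one physical record. */
static void flush_block(BlockWriter *w)
{
    assert(w != NULL);
    if (w->used > 0) {
        fwrite(w->block, 1, w->used, w->device);
        w->physical_writes++;
        w->used = 0;
    }
}

/* Store one logical record; the device is written only when a block fills. */
static void put_record(BlockWriter *w, const char *rec, size_t len)
{
    assert(w != NULL && rec != NULL);
    while (len > 0) {
        size_t room  = BLOCK_SIZE - w->used;
        size_t chunk = len < room ? len : room;
        memcpy(w->block + w->used, rec, chunk);
        w->used += chunk;
        rec += chunk;
        len -= chunk;
        if (w->used == BLOCK_SIZE)
            flush_block(w);   /* a complete physical record has accumulated */
    }
}
```

For example, five 7-byte logical records (35 bytes) cause only two transfers of full 16-byte blocks; the remaining 3 bytes wait in the buffer until `flush_block` is called, as would happen when the file is closed.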

7
File Descriptor
  • Also called a file control block
  • A table storing information about the file being
    manipulated, such as
  • the device on which the file is stored
  • the name of the file
  • the location of the buffer
  • a flag indicating whether the file should be saved
    when the application terminates
  • Opening and closing a file are the processes of
    creating and discarding its file descriptor
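A file descriptor can be pictured as a record type. The struct below is a hypothetical sketch (`FileDescriptor`, `open_descriptor`, and `close_descriptor` are made-up names, and the field sizes are arbitrary); C's standard library hides a comparable structure behind the `FILE *` that `fopen` creates and `fclose` discards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical file descriptor (file control block). */
typedef struct {
    char  device[16];     /* device on which the file is stored */
    char  name[64];       /* name of the file */
    char *buffer;         /* location of the buffer */
    bool  save_on_exit;   /* save the file when the application terminates? */
} FileDescriptor;

/* "Opening" (hypothetical): create and fill in a descriptor, or return NULL. */
static FileDescriptor *open_descriptor(const char *device, const char *name,
                                       size_t buffer_size, bool save)
{
    assert(device != NULL && name != NULL && buffer_size > 0);
    FileDescriptor *fd = malloc(sizeof *fd);
    if (fd == NULL)
        return NULL;
    fd->buffer = malloc(buffer_size);
    if (fd->buffer == NULL) {
        free(fd);
        return NULL;
    }
    strncpy(fd->device, device, sizeof fd->device - 1);
    fd->device[sizeof fd->device - 1] = '\0';
    strncpy(fd->name, name, sizeof fd->name - 1);
    fd->name[sizeof fd->name - 1] = '\0';
    fd->save_on_exit = save;
    return fd;
}

/* "Closing" (hypothetical): discard the descriptor and its buffer. */
static void close_descriptor(FileDescriptor *fd)
{
    if (fd != NULL) {
        free(fd->buffer);
        free(fd);
    }
}
```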

8
Opening and Closing Files
  • Imperative Paradigm
  • Open the file document.txt as DocFile for input
  • Close the file DocFile
  • Object-Oriented Paradigm
  • Create the object DocFile as the input file
    document.txt
  • Send the message GetCharacter to DocFile to
    retrieve Symbol
  • Send the message Close to the object DocFile
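In C, the imperative steps above map onto `fopen`, `fgetc` (playing the role of GetCharacter), and `fclose`. A minimal sketch, assuming the file name is passed in as a parameter; `count_symbols` is a hypothetical helper that simply counts the characters it retrieves.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open the file for input, retrieve every Symbol,
   then close it. Returns the character count, or -1 if opening fails. */
static long count_symbols(const char *path)
{
    assert(path != NULL);
    FILE *doc_file = fopen(path, "r");     /* open ... as DocFile for input */
    if (doc_file == NULL)
        return -1;

    long count = 0;
    int symbol;                             /* int, so EOF is distinguishable */
    while ((symbol = fgetc(doc_file)) != EOF)
        count++;

    fclose(doc_file);                       /* close the file DocFile */
    return count;
}
```

Calling `count_symbols("document.txt")` returns the number of characters read from that file.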

9
Sequential Files
  • A file accessed serially, from its beginning to
    its end
  • Examples
  • Audio
  • Video
  • Files containing programs
  • Files containing text documents

10
Processing a Sequential File
  • while (the end of the file has not been reached)
    do
  • (retrieve the next record from the file and
    process it)
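When logical records have a fixed size, the loop above translates directly to C. A sketch, assuming a hypothetical `Record` layout (an integer key plus a 12-byte name) and a hypothetical `sum_keys` function whose "processing" is just adding up keys; `fread` returning fewer than one record signals the end of the file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical fixed-length logical record. */
typedef struct {
    int  key;
    char name[12];
} Record;

/* Retrieve each record in turn and process it (here: sum the keys). */
static long sum_keys(FILE *file)
{
    assert(file != NULL);
    Record rec;
    long total = 0;
    rewind(file);                                  /* start at the beginning */
    while (fread(&rec, sizeof rec, 1, file) == 1)  /* until end of file */
        total += rec.key;                          /* process the record */
    return total;
}
```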

11
File Allocation Tables (FAT)
  • Clusters (4-16 sectors, about 2 KB in Windows)
  • FAT keeps a record of which cluster is assigned
    to which file
  • Through FAT, the operating system can retrieve
    the file in the proper cluster-by-cluster order
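  • FAT16 (16-bit entries, up to about 64K clusters;
    128 MB with 2 KB clusters)
  • FAT32 (32-bit entries, of which 28 bits address
    clusters: about 268 million clusters, volumes of
    terabytes)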
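Following a file through the FAT amounts to walking a linked list stored in an array: each entry holds the number of the cluster that comes next. A simplified sketch; the table contents, the name `cluster_chain`, and the end-of-chain marker `FAT_EOC` are hypothetical (real FAT variants reserve a range of high values for this purpose).

```c
#include <assert.h>
#include <stddef.h>

#define FAT_EOC (-1)   /* hypothetical end-of-chain marker */

/* Hypothetical helper: copy a file's clusters, in order, into out[].
   fat[c] holds the cluster that follows c. Returns the chain length. */
static size_t cluster_chain(const int *fat, size_t fat_len, int start,
                            int *out, size_t out_cap)
{
    size_t n = 0;
    for (int c = start; c != FAT_EOC; c = fat[c]) {
        assert(c >= 0 && (size_t)c < fat_len);  /* guard against a bad table */
        assert(n < out_cap);                    /* also catches a cyclic chain */
        out[n++] = c;
    }
    return n;
}
```

The capacity figure for FAT16 follows from the same numbers: 2^16 entries × 2 KB (2^11 bytes) per cluster = 2^27 bytes = 128 MB.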

12
Maintaining a File's Order
13
Disk Scanning
14
Detection of EOF
  • EOF: end-of-file
  • Identifying EOF
  • Place a special record (a sentinel) at the end of
    the file
  • retrieve the first record from the file
  • while (the record is not sentinel) do
  • ( retrieve the next record from the file)
  • Leave the task to operating system
  • while (not EOF) do
  • (retrieve the next record from the file)
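Both strategies can be sketched in C for a text file of integers. The sentinel value -1 and the function names are assumptions for illustration. In the second version, `fscanf` returning something other than 1 (for example `EOF`) is how the standard library reports that no further record exists.

```c
#include <assert.h>
#include <stdio.h>

#define SENTINEL (-1)   /* hypothetical sentinel record */

/* Strategy 1 (hypothetical helper): stop at the sentinel record. */
static int count_until_sentinel(FILE *f)
{
    assert(f != NULL);
    int value, count = 0;
    rewind(f);
    if (fscanf(f, "%d", &value) != 1)      /* retrieve the first record */
        return 0;
    while (value != SENTINEL) {
        count++;                           /* process the record */
        if (fscanf(f, "%d", &value) != 1)  /* retrieve the next record */
            break;                         /* guard: sentinel missing */
    }
    return count;
}

/* Strategy 2 (hypothetical helper): let the library detect EOF. */
static int count_until_eof(FILE *f)
{
    assert(f != NULL);
    int value, count = 0;
    rewind(f);
    while (fscanf(f, "%d", &value) == 1)
        count++;
    return count;
}
```

With the contents `10 20 30 -1`, the first version processes three records, while the second treats the sentinel as an ordinary fourth record.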

15
Key Field
  • A single field to identify a logical record in a
    file
  • Example
  • social security number in an employee file
  • Storing a file's records in order of their key
    field can greatly reduce processing time
  • Updating classic sequential files
  • Transaction file (new information)
  • Master file (file to be updated)

16
Merging Two Files (1)
17
Merging Two Files (2)
18
Merging Two Sequential Files
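The merge can be sketched in C for two sequential files whose records are already sorted by key. Assumptions: each record is just an integer key stored as text, and `merge_files` is a hypothetical name. Each input is read once, from beginning to end, which is what makes the procedure suitable for sequential files.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: merge two files of ascending integer keys into out. */
static void merge_files(FILE *in1, FILE *in2, FILE *out)
{
    assert(in1 != NULL && in2 != NULL && out != NULL);
    int a, b;
    int has_a = fscanf(in1, "%d", &a) == 1;   /* current record of file 1 */
    int has_b = fscanf(in2, "%d", &b) == 1;   /* current record of file 2 */

    while (has_a && has_b) {
        if (a <= b) {                          /* output the smaller key */
            fprintf(out, "%d\n", a);
            has_a = fscanf(in1, "%d", &a) == 1;
        } else {
            fprintf(out, "%d\n", b);
            has_b = fscanf(in2, "%d", &b) == 1;
        }
    }
    /* One input is exhausted: copy the rest of the other. */
    while (has_a) {
        fprintf(out, "%d\n", a);
        has_a = fscanf(in1, "%d", &a) == 1;
    }
    while (has_b) {
        fprintf(out, "%d\n", b);
        has_b = fscanf(in2, "%d", &b) == 1;
    }
}
```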
19
Text Files
  • A sequential file in which each logical record
    consists of a single encoded character
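  • ASCII files and Unicode files
  • In contrast: binary files, whose contents are not
    meant to be read as encoded characters
  • Line-ending conventions: Line Feed (UNIX),
    Carriage Return Line Feed (PC/Windows), Carriage
    Return (classic Apple Mac OS)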
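Because these conventions differ, programs that exchange text files often normalize line endings. A sketch, assuming the text is already in memory; `normalize_newlines` is a hypothetical helper that rewrites CR LF and lone CR as a single LF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert CR LF (Windows) and lone CR (classic Mac OS)
   to LF (UNIX). Works in place because the output never grows.
   Returns the new length. */
static size_t normalize_newlines(char *text, size_t len)
{
    assert(text != NULL || len == 0);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\r') {
            text[out++] = '\n';
            if (i + 1 < len && text[i + 1] == '\n')
                i++;                 /* skip the LF of a CR LF pair */
        } else {
            text[out++] = text[i];
        }
    }
    return out;
}
```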

20
Editors vs. Word Processors
  • Editors create and modify plain text files
  • Word processors insert nonprintable codes in the
    file to represent changes in fonts, alignment
    information, etc.
  • Email is designed to carry text, so word
    processor output is usually transferred as an
    attachment

21
A Simple Employee File Implemented as a Text File
22
The First Two Bars of Beethoven's Fifth Symphony
23
Representing Sheet Music by a Text File
    <staff clef="treble">
      <key>C minor</key><time>2/4</time>
      <measure><rest>qtr</rest>
        <notes>qtr G, qtr G, qtr G</notes>
      </measure>
      <measure><notes>hlf E</notes></measure>
    </staff>

24
Advantages of Representing Music as a Text File
  • Can be encoded, modified, stored, and transferred
    over the internet
  • Software can be written to present the contents
    of such files in the form of traditional sheet
    music or even to play the music on a synthesizer

25
eXtensible Markup Language
  • A standard style for designing notation systems
    (markup languages) for representing data as text
    files
  • Some markup languages that follow the XML standard
  • MathML
  • SMIL (multimedia presentations)
  • 4ML (music)
  • XHTML
  • WML and MPEG-7

26
Programming Concerns
  • Imperative paradigm
  • Apply the procedure ReadFile to retrieve
    MailRecord from MailList
  • Object-oriented paradigm
  • Send the ReadFile message to the object MailList
    to retrieve MailRecord
  • Peripheral devices can be treated as files
  • Apply the procedure ReadFile to retrieve Name
    from the file KeyBoard

27
Programming Concerns
  • Examples
  • Apply the procedure GetCharacter to retrieve
    Symbol from the file Text
  • Apply the procedure ReadLine to retrieve TextLine
    from the file Text
  • Apply the procedure Write to place the value of
    Length in the file Text
  • Apply the procedure ReadFile to retrieve the
    value of Length from the file Text
  • Apply the procedure ReadFile to retrieve Age from
    the file KeyBoard
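In C, the procedures named above correspond roughly to `fgetc` (GetCharacter), `fgets` (ReadLine), `fprintf` (Write), and `fscanf` (ReadFile); reading from the keyboard uses the same calls on `stdin`. A sketch of the Write/ReadFile round trip for Length, with `round_trip_length` as a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write Length to a text file as characters, then
   read it back. Returns 0 on success and stores the value in *length_out. */
static int round_trip_length(FILE *text, int length, int *length_out)
{
    assert(text != NULL && length_out != NULL);
    rewind(text);
    if (fprintf(text, "%d\n", length) < 0)      /* Write Length */
        return -1;
    rewind(text);                               /* required between write and read */
    if (fscanf(text, "%d", length_out) != 1)    /* ReadFile Length */
        return -1;
    return 0;
}
```

After the round trip, the same file can also be read a line at a time with `fgets`, which returns the characters `1234\n` for a Length of 1234.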

28
Converting data from two's complement notation
into ASCII for storage in a text file
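The conversion can be sketched as repeated division by ten, where each remainder becomes one ASCII digit ('0' plus the remainder). This is a hand-written illustration with a hypothetical name, `int_to_ascii`; in practice `fprintf` or `snprintf` performs the same job.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: convert a two's complement int into its decimal
   ASCII characters in buf. Returns the number of characters written. */
static size_t int_to_ascii(int value, char *buf, size_t cap)
{
    char digits[sizeof(int) * CHAR_BIT / 3 + 2];
    size_t n = 0, len = 0;
    /* Work with a non-positive copy so INT_MIN does not overflow. */
    int v = value < 0 ? value : -value;
    do {
        digits[n++] = (char)('0' - (v % 10));  /* v % 10 is in -9..0 */
        v /= 10;
    } while (v != 0);

    assert(buf != NULL && cap >= n + (value < 0) + 1);
    if (value < 0)
        buf[len++] = '-';
    while (n > 0)
        buf[len++] = digits[--n];   /* digits were produced in reverse */
    buf[len] = '\0';
    return len;
}
```

For example, the bit pattern for -2047 becomes the five characters `-2047`, which is what a text file stores.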
29
Indices
  • An index contains a list of values called keys
  • Each key identifies a block of information
    residing in the related storage structure
  • Along with each key, the index holds an entry
    indicating where the associated block of
    information is stored
  • To find a particular block of information
  • First find the identifying key in the index
  • Then retrieve the block of information stored at
    the location associated with that key
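The two-step lookup can be sketched in C with the index held in memory as an array sorted by key, each entry pairing a key with a byte offset into the data file. The `IndexEntry` layout and the `lookup` name are hypothetical, and binary search is one reasonable way to find the key, not the only one.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical index entry: a key and where its record starts. */
typedef struct {
    long key;
    long offset;   /* byte offset of the record in the data file */
} IndexEntry;

/* Step 1: find the key in the index (binary search on sorted keys).
   Step 2: seek to the stored location and read the record.
   Returns 1 if found and read, 0 otherwise. */
static int lookup(const IndexEntry *index, size_t n, long key,
                  FILE *data, char *rec, size_t rec_size)
{
    assert(index != NULL && data != NULL && rec != NULL);
    size_t lo = 0, hi = n;                 /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (index[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == n || index[lo].key != key)
        return 0;                          /* key not in the index */
    if (fseek(data, index[lo].offset, SEEK_SET) != 0)
        return 0;
    return fread(rec, 1, rec_size, data) == rec_size;
}
```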

30
Indexed File
31
Inverted File
32
Partial Index
33
Hierarchical Index System
34
File Pointer
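  • fgetpos( Personnel, &Position ) saves the current
    position in the file Personnel (Position has type
    fpos_t)
  • fsetpos( Personnel, &Position ) returns to that
    saved position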

35
Example (1)
    /* FGETPOS.C: This program opens a file and reads
     * bytes at several different locations.
     */
    #include <stdio.h>

    int main( void )
    {
        FILE   *stream;
        fpos_t pos;
        int    have_pos = 0;
        char   buffer[20];
        size_t n;

        if( (stream = fopen( "fgetpos.c", "rb" )) == NULL ) {
            printf( "Trouble opening file\n" );
            return 1;
        }

        /* Read some data and then save the position. */
        fread( buffer, sizeof( char ), 10, stream );
        if( fgetpos( stream, &pos ) != 0 )
            perror( "fgetpos error" );
        else {
            /* fpos_t is opaque in standard C, so ftell
               supplies a printable byte offset */
            long offset = ftell( stream );
            have_pos = 1;
            n = fread( buffer, sizeof( char ), 10, stream );
            printf( "10 bytes at byte %ld: %.*s\n",
                    offset, (int)n, buffer );
        }

36
Example (2)
        /* Set a new position and read more data; fseek is
           used because a number cannot portably be
           assigned to an fpos_t */
        if( fseek( stream, 140L, SEEK_SET ) != 0 )
            perror( "fseek error" );
        else {
            n = fread( buffer, sizeof( char ), 10, stream );
            printf( "10 bytes at byte 140: %.*s\n",
                    (int)n, buffer );
        }

        /* Return to the position saved by fgetpos. */
        if( have_pos && fsetpos( stream, &pos ) != 0 )
            perror( "fsetpos error" );

        fclose( stream );
        return 0;
    }

37
Hashing
  • A technique that provides access similar to that
    of an index structure
  • But without the need to maintain an index
  • Buckets
  • The sections into which the data storage space
    is divided
  • Hash function
  • An algorithm that converts a key value into a
    bucket number
  • Hash files (in mass storage) and hash tables (in
    main memory)

38
Hashing System
39
Hashing a Key
40
Distribution Problems
  • Select a hash function that distributes the
    records evenly among the buckets
  • If a dividend and a divisor share a common
    factor, that factor is present in the remainder
    as well
  • Example
  • 40 buckets, and all keys are multiples of 5
  • The entries cluster in the buckets associated
    with the remainders 0, 5, 10, 15, 20, 25, 30,
    and 35
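The clustering effect is easy to reproduce: hash the keys 5, 10, 15, ... by remainder into 40 buckets and into 41 buckets, then count how many buckets receive any entry. `occupied_buckets` is a hypothetical helper for this experiment, and the 64-bucket limit is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BUCKETS 64   /* arbitrary upper limit for this experiment */

/* Hypothetical helper: hash keys step, 2*step, ..., n*step by remainder
   and return how many distinct buckets are used. */
static int occupied_buckets(int buckets, int step, int n)
{
    assert(buckets > 0 && buckets <= MAX_BUCKETS);
    bool used[MAX_BUCKETS] = { false };
    int occupied = 0;
    for (int k = 1; k <= n; k++) {
        int b = (k * step) % buckets;
        if (!used[b]) {
            used[b] = true;
            occupied++;
        }
    }
    return occupied;
}
```

With 200 multiples of 5, only 8 of 40 buckets are used, while all 41 of 41 buckets are used, because 41 shares no factor with 5.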

41
Collision
  • The phenomenon of two keys hashing to the same
    value
  • Leads to clustering and should be kept rare,
    although it cannot be avoided entirely

42
Probability Calculation
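  • 41 buckets, with keys hashed uniformly
  • Probability that the first eight entries all land
    in different buckets (the first seven occupy
    distinct buckets and the eighth still finds an
    empty one)
  • $\frac{41}{41}\cdot\frac{40}{41}\cdot\frac{39}{41}\cdots\frac{34}{41} = \prod_{i=0}^{7}\frac{41-i}{41} \approx 0.482$
  • So a collision is more likely than not by the
    eighth entry, although fewer than 20% of the
    buckets are in use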
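The product generalizes to any number of buckets and entries: the i-th entry (counting from 0) avoids the i buckets already taken with probability (buckets - i)/buckets. A sketch that evaluates it, under the same uniform-hashing assumption; the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: probability that `entries` keys, hashed uniformly
   into `buckets` buckets, all land in different buckets. */
static double prob_no_collision(int buckets, int entries)
{
    assert(buckets > 0 && entries >= 0);
    double p = 1.0;
    for (int i = 0; i < entries; i++)
        p *= (double)(buckets - i) / buckets;   /* i buckets already taken */
    return p;
}
```

`prob_no_collision(41, 8)` evaluates to about 0.4825, matching the slide's 0.482.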

43
Handling Bucket Overflow
44
Load Factor
  • The ratio of the number of entries actually
    stored in the structure to the total capacity of
    the buckets
  • As long as the ratio is below 50%, performance is
    normally good
  • If the load factor creeps above 75%, system
    performance generally degrades
  • The system is usually rebuilt with more buckets
    when the load factor approaches 75%
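The rebuild policy can be sketched as a chained hash table (each bucket a linked list) that enlarges itself before the load factor would exceed 75%. The threshold follows the slide; growing to 2n + 1 buckets, every name, and the lack of a duplicate-key check are assumptions for illustration. Because buckets here have no fixed capacity, the load factor is computed as entries divided by buckets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Node {            /* hypothetical chain node */
    long key;
    struct Node *next;
} Node;

typedef struct {                 /* hypothetical chained hash table */
    Node  **buckets;
    size_t  nbuckets;
    size_t  entries;
} Table;

static size_t bucket_of(long key, size_t nbuckets)
{
    return (size_t)((unsigned long)key % nbuckets);   /* remainder hashing */
}

static bool table_init(Table *t, size_t nbuckets)
{
    assert(t != NULL && nbuckets > 0);
    t->buckets = calloc(nbuckets, sizeof *t->buckets);
    t->nbuckets = nbuckets;
    t->entries = 0;
    return t->buckets != NULL;
}

static double load_factor(const Table *t)
{
    return (double)t->entries / (double)t->nbuckets;
}

/* Rebuild with more buckets, relinking every node into its new bucket. */
static bool table_grow(Table *t)
{
    size_t n = t->nbuckets * 2 + 1;
    Node **nb = calloc(n, sizeof *nb);
    if (nb == NULL)
        return false;
    for (size_t i = 0; i < t->nbuckets; i++) {
        Node *p = t->buckets[i];
        while (p != NULL) {
            Node *next = p->next;
            size_t b = bucket_of(p->key, n);
            p->next = nb[b];
            nb[b] = p;
            p = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nbuckets = n;
    return true;
}

/* Insert a key, growing first if the load factor would pass 0.75. */
static bool table_put(Table *t, long key)
{
    if ((double)(t->entries + 1) / (double)t->nbuckets > 0.75 && !table_grow(t))
        return false;
    size_t b = bucket_of(key, t->nbuckets);
    Node *node = malloc(sizeof *node);
    if (node == NULL)
        return false;
    node->key = key;
    node->next = t->buckets[b];
    t->buckets[b] = node;
    t->entries++;
    return true;
}

static void table_free(Table *t)
{
    for (size_t i = 0; i < t->nbuckets; i++) {
        Node *p = t->buckets[i];
        while (p != NULL) {
            Node *next = p->next;
            free(p);
            p = next;
        }
    }
    free(t->buckets);
}
```

Starting from 5 buckets, inserting 20 keys triggers rebuilds to 11, 23, and then 47 buckets, leaving a load factor of about 0.43.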

45
Java Class Hashtable
  • table = new Hashtable(capacity, factor)
  • Each bucket is a linked list
  • The load factor is the largest ratio of entries
    to buckets allowed before the table is enlarged
    and its entries are rehashed (default 0.75)
  • Methods put and get store and retrieve entries

46
Hash File
47
Java Class Properties
  • A Properties object is in effect a Hashtable (the
    class extends Hashtable)
  • Initialized from a file by the method load
  • Saved in mass storage by the method store
  • The file is simply a sequential file (store writes
    lines of the form key=value), from which the
    appropriate hash table can be reconstructed in
    main memory

48
Exercises
  • Review problems
  • 6, 9, 12, 16, 18, 20, 25, 32, 38, 39