Managing Files of Records

1
Managing Files of Records
  • CS 3050, Spring 2007
  • 4/4/2007
  • Dr Melanie Martin

2
Assume
  • We have a file
  • The file is made up of records
  • The records are made up of fields
  • We want to access a specific record

3
Identifying the Record
  • RRN (relative record number)
  • Saw previously
  • Access fixed length records directly
  • Byte offset = RRN × size of record in bytes
  • Variable length
  • Use an index
  • The index itself has fixed length records
  • At RRN j, the index contains the record's byte offset in the data file
  • Adds an extra look-up
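
The fixed length case can be sketched in a few lines of Python. This is only an illustration: the 32-byte record size, the 0-based RRN, the helper name `read_by_rrn`, and the in-memory sample file are all assumptions, not part of the lecture.

```python
import io

RECORD_SIZE = 32  # assumed fixed record length in bytes

def read_by_rrn(f, rrn, record_size=RECORD_SIZE):
    """Seek straight to record number `rrn` (0-based) and read it."""
    f.seek(rrn * record_size)          # byte offset = RRN x record size
    data = f.read(record_size)
    if len(data) < record_size:
        raise IndexError(f"no record at RRN {rrn}")
    return data

# Build a tiny in-memory "file" of three space-padded records.
names = [b"widget", b"gadget", b"sprocket"]
data_file = io.BytesIO(b"".join(n.ljust(RECORD_SIZE, b" ") for n in names))

record = read_by_rrn(data_file, 2).rstrip()   # b"sprocket"
```

The same function works on a real file opened in binary mode, since only `seek` and `read` are used.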

4
Identifying the Record
  • Key
  • Field or set of fields
  • Canonical
  • Rule for exact format
  • All caps
  • Remove or add hyphens (-) in SSN or phone
  • Distinct (unique)
  • Required for primary key
  • Ex: ISBN, SSN, phone
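
A canonical form is just a rule applied to every key before it is stored or compared. The sketch below shows two such rules; the function names and the choice to strip (rather than add) hyphens are assumptions made for illustration.

```python
import re

def canonical_name(name):
    """Collapse runs of whitespace and upper-case (the 'all caps' rule)."""
    return " ".join(name.split()).upper()

def canonical_ssn(ssn):
    """Keep digits only, so '123-45-6789' and '123456789' compare equal."""
    digits = re.sub(r"\D", "", ssn)
    if len(digits) != 9:
        raise ValueError(f"expected 9 digits, got {len(digits)}")
    return digits

same = canonical_ssn("123-45-6789") == canonical_ssn("123456789")  # True
```

Whichever rule is chosen, it has to be applied both when a record is written and when a search key is typed in, otherwise equal keys can fail to match.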

5
Identifying the Record
  • Keys come in two main flavors
  • Primary
  • Uniquely identifies a single record
  • Ex: your specific bank account
  • Secondary
  • Identifies a group of records
  • Ex: all bank customers in Turlock
  • Ex: all bank customers overdrawn
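
One way to picture the difference, using in-memory structures rather than files: a primary key maps to exactly one record, a secondary key maps to a list of them. The account data and field names below are made up for the example.

```python
from collections import defaultdict

# Hypothetical records; "acct" plays the role of the primary key.
accounts = [
    {"acct": "1001", "city": "TURLOCK", "balance": 250.0},
    {"acct": "1002", "city": "MODESTO", "balance": -40.0},
    {"acct": "1003", "city": "TURLOCK", "balance": -15.5},
]

# Primary key: one key, one record. Duplicates are an error.
by_acct = {}
for rec in accounts:
    if rec["acct"] in by_acct:
        raise ValueError("duplicate primary key")
    by_acct[rec["acct"]] = rec

# Secondary key: one key, a group of records.
by_city = defaultdict(list)
for rec in accounts:
    by_city[rec["city"]].append(rec["acct"])

# A secondary "key" can also be a condition on a field.
overdrawn = [rec["acct"] for rec in accounts if rec["balance"] < 0]
```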

6
Finding the Record
  • Two extremes
  • Direct access
  • Sequential search
  • Lots of algorithms in between, but we'll start
    with the extremes

7
Measuring Algorithm Performance
  • In general we'll count reads (seeks)
  • Big O
  • Asymptotic upper bound - worst case
  • g(n) = O(f(n)) means c·f(n) is an upper bound for
    g(n): there exist constants c, n0 such that to
    the right of n0 the value of g(n) is always at or
    below c·f(n)
  • Draw Picture
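
Written out symbolically, the definition above reads as follows. The worked example uses functions chosen only for illustration.

```latex
g(n) = O(f(n))
\iff
\exists\, c > 0,\ \exists\, n_0 \ \text{such that}\ 
g(n) \le c \cdot f(n) \ \text{for all}\ n \ge n_0

% Example: g(n) = 3n + 5 is O(n).
% Take c = 4 and n_0 = 5:  3n + 5 \le 4n  \iff  n \ge 5.
```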

8
Direct Access
  • Just go get the record we want
  • O(1)
  • No matter how large the file we can get the
    record in one seek
  • See previous discussion of using RRN for fixed
    length or index + RRN for variable length
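
For variable length records, the index approach might look like the sketch below. The layout is assumed for the example (8-byte offsets in the index, a 2-byte length prefix on each data record), and `build` and `read_by_rrn` are hypothetical helper names.

```python
import io
import struct

OFFSET = struct.Struct("<Q")   # one fixed length index entry: 8-byte offset
LENGTH = struct.Struct("<H")   # assumed 2-byte length prefix on each record

def build(records):
    """Write variable length records plus a parallel index of byte offsets."""
    data, index = io.BytesIO(), io.BytesIO()
    for rec in records:
        index.write(OFFSET.pack(data.tell()))
        data.write(LENGTH.pack(len(rec)) + rec)
    return data, index

def read_by_rrn(data, index, rrn):
    """Two seeks: one into the index, one into the data file."""
    index.seek(rrn * OFFSET.size)          # index entries are fixed length
    entry = index.read(OFFSET.size)
    if len(entry) < OFFSET.size:
        raise IndexError(f"no record at RRN {rrn}")
    (offset,) = OFFSET.unpack(entry)
    data.seek(offset)
    (length,) = LENGTH.unpack(data.read(LENGTH.size))
    return data.read(length)

data, index = build([b"Ames", b"Mary Ann Featherstonehaugh", b"Li"])
second = read_by_rrn(data, index, 1)
```

The number of seeks is two instead of one, but it does not grow with the size of the file, so the access is still O(1).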

9
Sequential Access
  • Go through the records in the file sequentially
    until we find the one we're looking for
  • RRN or Key
  • Read one record at a time from disk
  • O(n) where n is the number of records in the file
  • I.e., time is proportional to the number of records
    in the file (average and worst case)
  • BUT what if we use blocks and read 100 records at
    a time?
  • STILL proportional to number of records in the
    file
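
The blocking point can be checked by counting reads in a small simulation. This sketch uses an in-memory list of keys instead of a disk file, and `records_per_read` is a made-up parameter that stands in for the blocking factor.

```python
def sequential_search(keys, target, records_per_read=1):
    """Scan `keys` in order; return (position, number_of_reads).

    `records_per_read` models blocking: one read brings in that many records.
    """
    reads = 0
    for start in range(0, len(keys), records_per_read):
        reads += 1                                    # one simulated disk read
        block = keys[start:start + records_per_read]
        for i, key in enumerate(block):
            if key == target:
                return start + i, reads
    return -1, reads

keys = list(range(10_000))
unblocked = sequential_search(keys, 9_999)                        # (9999, 10000)
blocked = sequential_search(keys, 9_999, records_per_read=100)    # (9999, 100)
```

Blocking divides the number of reads by a constant (100 here), but doubling the file still doubles the reads, so the search remains O(n).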

10
Why would we ever do this?
  • Sequential search can be good when
  • There are few records
  • Rarely need to search
  • ASCII files where we're looking for patterns (grep)
  • Lots of records that will match a secondary key

11
Pros and Cons
  • Sequential search
  • + easy to program
  • + only requires simple file structures
  • - takes too long
  • Soon we will start looking at ways to get around
    this and get closer to direct access

12
Some Miscellaneous Topics
  • Structure and length
  • Fixed length fields (think inventory example)
  • Make sure record size fits evenly into sectors
  • Ex: 512 byte sectors
  • 30 byte records -> increase to 32 bytes
  • Records never span sectors
  • More challenging with variable length fields
    (records)
  • Estimate longest possible field values (waste
    issues if too big, truncation/data loss if too
    small)
  • Averaging effect
  • Longest name unlikely to occur with longest
    address in mailing list
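
The sector arithmetic in the example can be reproduced with a short helper. `padded_record_size` is a hypothetical name, and the rule it implements (grow the record until it divides the sector size evenly) is one reasonable reading of the slide.

```python
def padded_record_size(record_size, sector_size=512):
    """Smallest size >= record_size that divides sector_size evenly."""
    if not 0 < record_size <= sector_size:
        raise ValueError("record must be between 1 byte and one sector")
    size = record_size
    while sector_size % size != 0:     # stops at sector_size at the latest
        size += 1
    return size

size = padded_record_size(30)    # 32
per_sector = 512 // size         # 16 records per sector, none spanning two
wasted = size - 30               # 2 bytes of padding per record
```

With 30-byte records left unpadded, 17 records would fit in 510 bytes and the 18th would straddle a sector boundary, which costs an extra sector read for that record.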

13
Some Miscellaneous Topics
  • Distinguishing data from unused space
  • Read length at beginning
  • Special delimiter at end
  • Count fields

14
Some Miscellaneous Topics
  • Header records
  • Commonly used
  • At beginning of file
  • Might contain
  • Number of records
  • Length of records
  • Date and time of last update
  • Name of file
  • Need to be able to distinguish it from data
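
A header record with the fields listed above could be laid out as in this sketch. The exact layout, the `HDR1` tag used to tell the header apart from data, and the function names are all hypothetical.

```python
import struct
import time

# Assumed layout: 4-byte tag, record count, record length,
# last-update time (seconds since the epoch), 16-byte file name.
HEADER = struct.Struct("<4sIHQ16s")
MAGIC = b"HDR1"   # hypothetical tag marking this as a header, not data

def pack_header(count, record_len, name, updated=None):
    updated = int(time.time()) if updated is None else updated
    return HEADER.pack(MAGIC, count, record_len, updated, name.encode("ascii"))

def unpack_header(raw):
    magic, count, record_len, updated, name = HEADER.unpack(raw)
    if magic != MAGIC:
        raise ValueError("not a header record")
    return {
        "count": count,
        "record_len": record_len,
        "updated": updated,
        "name": name.rstrip(b"\x00").decode("ascii"),  # drop struct padding
    }

raw = pack_header(3, 32, "inventory.dat", updated=1175644800)
info = unpack_header(raw)
```

Once a header sits at the beginning of the file, the byte offset of a fixed length record becomes `HEADER.size + RRN * record_len` rather than `RRN * record_len`.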

15
Some Miscellaneous Topics
  • Metadata
  • Data that describes the primary data in the file
  • Ex: astronomer with image data generated by
    telescopes
  • Mostly interested in the image
  • Need info about image
  • Where and when taken
  • Which telescope
  • Names of related files/images
  • Etc.