LEARNING OBJECTIVES - PowerPoint PPT Presentation

Provided by: dhu65

1
LEARNING OBJECTIVES
  • Index files.
  • Operations Required to Maintain an Index File.
  • Primary keys.
  • Secondary keys.

2
Index
  • An index is a tool for finding records in a file.
    It consists of a key field on which the index is
    searched and a reference field (an address or a
    relative record number, RRN) that tells where to
    find the data file record associated with a
    particular key.

3
Examples of an Index
  • The index to a book (usually at the end of the
    book) provides a way to find a topic quickly.
    Imagine a book without an index!
  • The index in a library (an on-line catalog)
    allows you to locate items by an author, by a
    title, or by a call number.

4
Index in Databases: Example
  • A music recording store uses an index file to
    keep track of its inventory.
  • The data file consists of the following fields in
    each record:
  • Id number
  • Title
  • Composer or composers
  • Artist or artists
  • Label (publisher)

5
recording.h
  #include <iostream>
  class IOBuffer; // buffer class, defined elsewhere in the course code

  class Recording // a recording with a composite key
  {
  public:
      Recording ();
      Recording (char * label, char * idNum, char * title,
                 char * composer, char * artist);
      char IdNum [7];
      char Title [30];
      char Composer [30];
      char Artist [30];
      char Label [7];
      char * Key () const; // returns the key of the object
      int Unpack (IOBuffer &);
      int Pack (IOBuffer &) const;
      void Print (std::ostream &, char * label = 0) const;
  };

6
Primary key: Example
  • The primary key in our example consists of the
    initials for the company label combined with the
    product ID. The canonical form of this key
    consists of the uppercase form of the Label field
    followed by the ASCII representation of the ID
    number.
  • E.g. DG241
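
The canonical form described above can be sketched as a small helper. This is only an illustration: the function name makeKey is hypothetical and does not come from recording.h, and it assumes the label and ID number arrive as plain strings.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of the slides' Recording class):
// builds the canonical primary key as the uppercase label followed
// by the ID number, unchanged.
std::string makeKey(const std::string& label, const std::string& idNum) {
    std::string key;
    for (unsigned char c : label) {
        key += static_cast<char>(std::toupper(c));
    }
    key += idNum;
    return key;
}
```

For example, makeKey("dg", "241") yields "DG241", the key shown on this slide.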

7
Index file
  • The index file is used to provide rapid keyed
    access to individual records in the data file.
  • Each entry in the index file consists of the
    following fields:
  • key (e.g. ANG3795)
  • reference: the address of the corresponding
    record in the data file
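
A minimal sketch of such an index entry and a keyed lookup, assuming the whole index is held in memory as a vector sorted by key. The names IndexEntry and findAddress are illustrative, not taken from the course code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One index entry: a key plus the address of the matching data record.
struct IndexEntry {
    std::string key;  // e.g. "ANG3795"
    long recAddr;     // byte offset (or RRN) of the record in the data file
};

// Binary search over an index that is sorted by key.
// Returns the record address, or -1 if the key is not in the index.
long findAddress(const std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it != index.end() && it->key == key) {
        return it->recAddr;
    }
    return -1;
}
```

The caller would then seek to the returned address in the data file and read the record from there.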

8
Operations Required to Maintain an Indexed File
  • Create the original empty index file and data
    file
  • Load index file into memory before using it (if
    possible, load the whole file)
  • Rewrite the index file from memory to the
    permanent storage after modifying it
  • Add data records to the data file
  • Delete data records from the data file
  • Update records in the data file
  • Update the index to reflect changes in the data
    file

9
Creating Files
  • Create two empty files
  • index file and
  • data record file

10
Loading Index into Main Memory
  • This can be supported with buffered I/O or with
    an array.

11
Rewriting the Index File from Memory
  • This can be supported as a part of the close
    operation for the index file (i.e. write the
    buffer or the array to the disk).

12
Dangers of losing the index file
  • If the index file is
  • outdated
  • corrupted or
  • lost
  • then there must be some means of
    reconstructing the index file from the data file!

13
Record addition
  • Adding a new data record to the data file
    requires that we add a new record to the index
    file too.
  • Since the index file is usually kept sorted,
    adding a new record requires rearranging the
    records in this file. (This is easily done
    if the index is kept in main memory.)
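
With the index in memory, the rearrangement amounts to inserting at the right position in a sorted array. A minimal sketch, assuming a vector-based index; insertEntry is an illustrative name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long recAddr;
};

// Inserts a new entry so that the index stays sorted by key.
// Returns false (and changes nothing) if the primary key already exists.
bool insertEntry(std::vector<IndexEntry>& index, const IndexEntry& entry) {
    auto pos = std::lower_bound(
        index.begin(), index.end(), entry.key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (pos != index.end() && pos->key == entry.key) {
        return false;  // primary keys must be unique
    }
    index.insert(pos, entry);  // shifts the later entries one slot along
    return true;
}
```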

14
Record deletion
  • Deleting a data record requires deletion of the
    corresponding index record.
  • Note that in an index file organization all data
    records are pinned. (WHY?)
  • What are the consequences of this fact?
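
Removing the index entry could be sketched as follows, again assuming a sorted in-memory vector. The function returns the address of the data record so that the caller can mark that slot as deleted in the data file; how the data file itself handles deleted records is left open here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long recAddr;
};

// Removes the index entry for `key`.
// Returns the address of the data record, or -1 if the key was not found.
long removeEntry(std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it == index.end() || it->key != key) {
        return -1;
    }
    long addr = it->recAddr;
    index.erase(it);  // later entries shift down; data records do not move
    return addr;
}
```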

15
Record Updating
  • There are two categories of updates
  • the update modifies the value of the key
  • the update does not modify the value of the key
  • If the update modifies (changes) the primary key,
    then re-ordering of the index file might be
    required.
  • If the update does not change the primary key, it
    might still require reordering of records in the
    data file. (WHY?)

16
Indexes that are too large to hold in Memory
  • If the index file is too large to be kept in main
    memory, then it has to be kept on secondary
    storage. There are a number of disadvantages of
    keeping an index file on the disk:
  • searching the index file can be very time
    consuming, since each probe may require a disk
    seek
  • index rearrangement can be time consuming too.

17
Possible alternatives to storing index files
  • If the index file is too large to be kept in main
    memory, then the following alternative
    organizations should be considered:
  • a hashed organization (if access speed is very
    important)
  • a tree structured organization, or a multilevel
    index such as a B-tree

18
Pros of a simple index file
  • Even if a simple index file has to be stored on
    the disk, in some cases it might prove a useful
    method of organizing data.
  • Advantages of the simple index file:
  • it allows the use of binary search to obtain
    keyed access to a record
  • if index entries are much smaller than data
    records, then sorting and maintaining the index
    is much easier than sorting and maintaining the
    data file
  • if the data records are pinned, then the index
    file allows for rearranging the keys without
    moving the data records

19
Indexing with Multiple Key Access
  • Since the primary key is unique, it is often
    used as a search key.
  • An example of a primary key for the Recording
    class is Label followed by Id (e.g. ANG3795). But
    most of the time when one searches for a music CD
    one would rather provide a title, a composer, or
    an artist.

20
Secondary key
  • A secondary key is a key for which multiple
    records may exist in the data file.
  • Examples:
  • The composer's name in the Recording class
    example (there can be a number of CDs with
    Beethoven's work in a store).
  • The artist's name in the Recording class.

21
Secondary Index File
  • A secondary index file might be created for each
    of the possible secondary keys.
  • Each entry in the secondary index file should
    consist of the following two fields:
  • secondary key field (e.g. Beethoven)
  • the corresponding primary key (e.g. ANG3795)
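
A sketch of such an entry and a lookup that returns every primary key stored under one secondary key. It assumes the secondary index is an in-memory vector sorted by secondary key; the names SecondaryEntry and findPrimaryKeys are illustrative.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One secondary index entry: it refers to a primary key,
// not to a record address.
struct SecondaryEntry {
    std::string secKey;      // e.g. "BEETHOVEN"
    std::string primaryKey;  // e.g. "ANG3795"
};

// Returns the primary keys of all entries whose secondary key matches.
// The index must be sorted by secKey.
std::vector<std::string> findPrimaryKeys(const std::vector<SecondaryEntry>& index,
                                         const std::string& secKey) {
    auto it = std::lower_bound(
        index.begin(), index.end(), secKey,
        [](const SecondaryEntry& e, const std::string& k) { return e.secKey < k; });
    std::vector<std::string> result;
    for (; it != index.end() && it->secKey == secKey; ++it) {
        result.push_back(it->primaryKey);
    }
    return result;
}
```

Each returned primary key is then looked up in the primary index to obtain the record address.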

22
Record Addition
  • Adding a record to the data file implies adding a
    record to the secondary index file.
  • The cost is similar to the cost of adding a
    record to the primary index file (e.g. records
    might have to be shifted).

23
Record Deletion
  • Deleting a record implies removing all references
    to that record in the file system.
  • After the search on the secondary key, we search
    the matching entries for the primary key of the
    record to be deleted and remove that entry from
    the secondary index file.
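
The removal step can be sketched as below. For brevity this version uses a linear scan rather than a binary search on the secondary key; removeReference is an illustrative name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct SecondaryEntry {
    std::string secKey;
    std::string primaryKey;
};

// Removes the one entry that pairs `secKey` with `primaryKey`.
// Returns false if no such entry exists.
bool removeReference(std::vector<SecondaryEntry>& index,
                     const std::string& secKey,
                     const std::string& primaryKey) {
    auto it = std::find_if(index.begin(), index.end(),
                           [&](const SecondaryEntry& e) {
                               return e.secKey == secKey && e.primaryKey == primaryKey;
                           });
    if (it == index.end()) {
        return false;
    }
    index.erase(it);  // other entries with the same secondary key stay
    return true;
}
```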

24
Record Updating
  • There are three possible situations:
  • The update changes the secondary key (we may
    have to rearrange the secondary key index so it
    stays in sorted order).
  • The update changes the primary key (this has a
    big impact on the primary key index, but in the
    secondary key index we only need to update the
    affected primary key field).

25
Record Updating
  • The update is confined to other fields: updates
    that do not affect either the primary or
    secondary key fields do not affect the secondary
    key index, even if the update is substantial.

26
Retrieving Data with Multiple Secondary Keys
  • Example: If we want to find all CDs in a music
    store that have Beethoven's Symphony No. 9, then
    we should search using the following secondary
    keys:
  • composer AND title.
  • Both of those searches should produce a list of
    CDs by providing their primary keys.

27
Boolean AND in searches
  • E.g. the search by composer could produce the
    following list of CDs (ANG3795, DG139201,
    DG18807, RCA2626) and the search by title could
    produce the following list of CDs (ANG3795,
    COL31809, DG18807).
  • The CDs that we are interested in have to
    belong to both of the above lists. (In other
    words, we are taking the intersection of two
    sets.) WHY?
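
The AND step can be sketched with the standard library's set intersection, assuming each search returns its primary keys as a vector of strings. The function name matchAll is illustrative.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Returns the primary keys that appear in both result lists (boolean AND).
// The lists are taken by value so they can be sorted locally.
std::vector<std::string> matchAll(std::vector<std::string> a,
                                  std::vector<std::string> b) {
    std::sort(a.begin(), a.end());  // set_intersection needs sorted input
    std::sort(b.begin(), b.end());
    std::vector<std::string> both;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(both));
    return both;
}
```

Applied to the two lists on this slide, the result is ANG3795 and DG18807.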

28
Boolean OR searches
  • If we want to find all CDs by Beethoven or by
    Chopin, then we will use the OR operation in our
    secondary key searches.
  • To obtain the list of CDs that we are interested
    in, we have to combine the outcomes of both
    searches (i.e. take the union of two sets). WHY?
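
The OR step is the mirror image and can be sketched with the standard library's set union, which also drops keys that appear in both lists. As before, matchAny is an illustrative name and the result lists are assumed to be vectors of primary keys.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Returns the primary keys that appear in either result list (boolean OR),
// each key listed once.
std::vector<std::string> matchAny(std::vector<std::string> a,
                                  std::vector<std::string> b) {
    std::sort(a.begin(), a.end());  // set_union needs sorted input
    std::sort(b.begin(), b.end());
    std::vector<std::string> either;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(either));
    return either;
}
```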

29
Cons of the Current Secondary Index Structure
  • The index file has to be rearranged every time a
    new record is added to the file.
  • If there are duplicate secondary keys, the
    secondary key field is repeated for each entry.

30
Improvements to the secondary index key structure
  • Solution 1
  • Allow for multiple primary keys to be associated
    with a single secondary key by allocating an
    array of primary keys for each secondary key
    entry.
  • Avoids re-sorting the index each time a new
    entry is added for an existing secondary key.
  • Suffers from internal fragmentation (WHY?), and
    the number of allocated entries in the array may
    prove too small.
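
Solution 1 might be sketched as an entry with a fixed-size array of primary keys. The capacity of 4 is an arbitrary value chosen for this example. Unused slots waste space, and addReference fails once the array is full.

```cpp
#include <array>
#include <cstddef>
#include <string>

constexpr std::size_t kMaxRefs = 4;  // assumed capacity, for illustration

// A secondary index entry with room for a fixed number of primary keys.
struct SecondaryEntry {
    std::string secKey;
    std::array<std::string, kMaxRefs> primaryKeys;  // unused slots stay empty
    std::size_t used = 0;

    // Adds a reference without touching any other index entry.
    // Returns false when the array is already full.
    bool addReference(const std::string& primaryKey) {
        if (used == kMaxRefs) {
            return false;
        }
        primaryKeys[used++] = primaryKey;
        return true;
    }
};
```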

31
Improvements to the secondary index key structure
  • Solution 2
  • Create an inverted list of indexes. Have each
    secondary key point to a list of primary key
    references associated with it.
  • This method eliminates most of the problems
    associated with maintaining a secondary index
    file. WHY?
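
A minimal in-memory sketch of Solution 2. Each secondary key stores the position of the first node of its list, and every node stores a primary key plus the position of the next node. The class name, the use of std::map for the secondary keys, and adding new references at the head of the list are all choices made for this example.

```cpp
#include <map>
#include <string>
#include <vector>

struct ListNode {
    std::string primaryKey;
    int next;  // position of the next node in the list, or -1 at the end
};

class InvertedIndex {
public:
    // Appends a node to the list area and links it in at the head of the
    // secondary key's list; existing nodes never move.
    void add(const std::string& secKey, const std::string& primaryKey) {
        auto it = heads_.find(secKey);
        int oldHead = (it == heads_.end()) ? -1 : it->second;
        nodes_.push_back({primaryKey, oldHead});
        heads_[secKey] = static_cast<int>(nodes_.size()) - 1;
    }

    // Follows the list for one secondary key (most recently added first).
    std::vector<std::string> find(const std::string& secKey) const {
        std::vector<std::string> result;
        auto it = heads_.find(secKey);
        int pos = (it == heads_.end()) ? -1 : it->second;
        while (pos != -1) {
            result.push_back(nodes_[pos].primaryKey);
            pos = nodes_[pos].next;
        }
        return result;
    }

private:
    std::map<std::string, int> heads_;  // secondary key -> first node
    std::vector<ListNode> nodes_;       // list area, in order of arrival
};
```

Each secondary key is stored once, and adding a reference for an existing key only appends one node and updates one head position.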

32
Selective Index
  • A selective index contains keys for only a
    portion of the records in the data file. Such an
    index provides the user with a view of a specific
    subset of the file's records (e.g. all CDs of
    Beethoven's work produced in 1998).
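
Building a selective index can be sketched as filtering the records with a predicate before indexing them. The RecordInfo struct is hypothetical: the Recording class shown earlier has no year field, so one is assumed here in order to express the example from this slide.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record summary; the year field is an assumption.
struct RecordInfo {
    std::string key;
    std::string composer;
    int year;
    long recAddr;
};

struct IndexEntry {
    std::string key;
    long recAddr;
};

// Builds an index that covers only the records accepted by `selected`.
std::vector<IndexEntry> buildSelectiveIndex(
        const std::vector<RecordInfo>& records,
        const std::function<bool(const RecordInfo&)>& selected) {
    std::vector<IndexEntry> index;
    for (const RecordInfo& r : records) {
        if (selected(r)) {
            index.push_back({r.key, r.recAddr});
        }
    }
    std::sort(index.begin(), index.end(),
              [](const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; });
    return index;
}
```

A predicate such as "composer is Beethoven and year is 1998" would give the view described on this slide.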

33
Binding
  • Binding takes place when a key is associated with
    a particular physical record in the data file.
    This can take place either during the preparation
    of the data file and indexes or later on during
    program execution.