Indexing and Hashing

Description:

Redwood. Round Hill. 11/4/09. B.Ramamurthy. 9. Sparse Index. Brighton A-217 750. Downtown ... Redwood. Which one is better? Dense or sparse? It is a trade off ... – PowerPoint PPT presentation

Provided by: bina1
Learn more at: https://cse.buffalo.edu

1
Indexing and Hashing
  • B.Ramamurthy
  • Chapter 11

2
Representing Data
  • Attributes are represented in fixed or variable
    length collections called fields
  • Fields in turn are put into fixed or variable
    length collections called records.
  • Records are stored in physical blocks.
  • A collection of records that forms a relation is
    stored as a collection of blocks called a file.
  • This file is different from an OS file. How?
  • Organization is different.
  • Extra indices to accommodate easy search and
    access.

3
Basic Concepts (indexing)
  • Indexing works the same way as a catalog for a
    book in a library.
  • Indexing needs to be efficient to allow fast
    access to records.
  • Two types of indices
  • ordered indices and
  • hash indices

4
Techniques and Evaluation
  • Access types: the types of access that are
    supported efficiently, such as search by a
    specific value or by a range.
  • Access time: the time it takes to find a
    particular data item or a set of items.
  • Insertion time: the time it takes to insert a new
    item.
  • Deletion time: the time it takes to delete an item.
  • Space overhead: the additional space occupied by
    the index structure.

5
Ordered Indices
  • To gain fast access to records in a file we can
    use an index structure.
  • If the file containing the records is
    sequentially ordered, the index whose search key
    specifies the sequential order of the file is the
    primary index.
  • Primary indices are also called clustering
    indices.

6
Primary Index
  • Assume that all files are ordered sequentially on
    some search key.
  • Such files, with a primary index on the search
    key, are called index-sequential files.
  • These files accommodate both sequential and
    random access to individual records.

7
Dense and Sparse Index
  • Dense index
  • An index record appears for every search key
    value in the file.
  • The index record contains the search key and a
    pointer to the first data record with that
    search-key value.
  • Sparse index
  • An index record is created for only some of the
    search-key values. Each index record contains a
    search-key value and a pointer to the first data
    record with that value.
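
The two lookups can be sketched in Python as follows. This is only an illustration under stated assumptions: a sorted in-memory list stands in for the sequential file, list positions stand in for record pointers, and the sample records and the names dense_lookup and sparse_lookup are made up for the example.

```python
from bisect import bisect_right

# Sorted "file" of (search key, payload) pairs; illustrative data only.
records = [
    ("Brighton", 1), ("Downtown", 2), ("Downtown", 3), ("Mianus", 4),
    ("Perryridge", 5), ("Redwood", 6), ("Round Hill", 7),
]

# Dense index: one entry per distinct key, pointing at its first record.
dense = {}
for pos, (key, _) in enumerate(records):
    dense.setdefault(key, pos)

# Sparse index: entries for only a few keys, kept in key order.
sparse = [(k, dense[k]) for k in ("Brighton", "Mianus", "Redwood")]


def dense_lookup(key):
    """Return the position of the first record with this key, or None."""
    return dense.get(key)


def sparse_lookup(key):
    """Find the largest indexed key <= key, then scan the file forward."""
    keys = [k for k, _ in sparse]
    i = bisect_right(keys, key) - 1
    if i < 0:
        return None                      # key sorts before every indexed key
    pos = sparse[i][1]
    while pos < len(records) and records[pos][0] <= key:
        if records[pos][0] == key:
            return pos
        pos += 1
    return None


print(dense_lookup("Perryridge"))   # 4, found directly
print(sparse_lookup("Perryridge"))  # 4, found by scanning forward from "Mianus"
```

The sparse index holds 3 entries instead of 6, but a lookup may have to scan several records after the index probe.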

8
Dense Index
9
Sparse Index
[Figure: sparse index with entries for Brighton, Mianus, and Redwood]
Which one is better, dense or sparse? It is a
trade-off between access time and space overhead.
10
Multi-level Indices
  • Indices themselves may become too large for
    efficient processing.
  • Example
  • Consider a file with 100,000 records, with 10
    records in a block.
  • With a sparse index and one index entry per block
    we have about 10,000 index entries.
  • Assuming 100 index entries fit into a block, we
    need about 100 blocks for the index.
  • It is desirable to keep the index file in the
    main memory.
  • Problem: searching a large index file becomes
    expensive.
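
The arithmetic in this example can be checked with a few lines of Python. The three input numbers come from the slide; the variable names are illustrative, and the last line uses the usual estimate that a binary search over b index blocks reads about ceil(log2(b)) of them.

```python
import math

num_records = 100_000      # records in the file (from the slide)
records_per_block = 10     # records per data block (from the slide)
entries_per_block = 100    # index entries per index block (from the slide)

data_blocks = math.ceil(num_records / records_per_block)
index_entries = data_blocks            # sparse index: one entry per data block
index_blocks = math.ceil(index_entries / entries_per_block)

# Estimated block reads for a binary search over the index blocks.
binary_search_reads = math.ceil(math.log2(index_blocks))

print(data_blocks, index_entries, index_blocks, binary_search_reads)
# 10000 10000 100 7
```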

11
Multi-level Index
  • Solution: index the index file. We treat the
    index as we would treat any other sequential file
    and construct a sparse index on the primary
    index.
  • We binary-search the outer level index to find
    the largest search key less than or equal to the
    one we desire.
  • Two-level sparse index: see Figure 11.4
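
A rough sketch of a two-level lookup, assuming sorted integer keys held in memory and the block sizes from the previous example. The function names build_two_level and find_data_block are hypothetical, and block numbers stand in for disk addresses.

```python
from bisect import bisect_right


def build_two_level(sorted_keys, records_per_block=10, entries_per_block=100):
    """Build an inner sparse index (one entry per data block) and an
    outer sparse index (one entry per inner index block)."""
    inner = [(sorted_keys[i], i // records_per_block)
             for i in range(0, len(sorted_keys), records_per_block)]
    inner_blocks = [inner[i:i + entries_per_block]
                    for i in range(0, len(inner), entries_per_block)]
    outer = [block[0][0] for block in inner_blocks]
    return outer, inner_blocks


def find_data_block(outer, inner_blocks, key):
    """Return the number of the data block that may hold key, or None."""
    # Outer level: largest search key less than or equal to the one we want.
    o = bisect_right(outer, key) - 1
    if o < 0:
        return None
    # Inner level: the same search inside a single index block.
    block = inner_blocks[o]
    i = bisect_right([k for k, _ in block], key) - 1
    return block[i][1]


keys = list(range(0, 200_000, 2))           # 100,000 sorted keys
outer, inner_blocks = build_two_level(keys)
print(len(outer), len(inner_blocks[0]))     # 100 100
print(find_data_block(outer, inner_blocks, 12_345))  # 617
```

If the outer index stays in main memory, a lookup touches one inner index block and one data block instead of binary-searching 100 index blocks.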

12
Secondary Index
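  • A secondary index is on attributes whose values
    are not stored sequentially in the file.
  • A secondary index must be dense. If its search
    key is not a candidate key, a pointer to the first
    matching record is not enough, because the other
    records with that value may be anywhere in the file.
  • We can use an extra level of indirection, with
    buckets of record pointers at the second level.
  • See Fig. 11.5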
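
The bucket idea can be sketched as below. The sample records, the choice of a numeric attribute as the secondary search key, and the name lookup_balance are assumptions made for illustration; list positions stand in for record pointers.

```python
from collections import defaultdict

# File ordered on branch name; the numeric attribute is in no particular order.
records = [
    ("Brighton", 750), ("Downtown", 500), ("Downtown", 600), ("Mianus", 700),
    ("Perryridge", 400), ("Perryridge", 900), ("Perryridge", 700),
    ("Redwood", 700), ("Round Hill", 350),
]

# Second level: one bucket per value, holding pointers to every matching record.
buckets = defaultdict(list)
for pos, (_, balance) in enumerate(records):
    buckets[balance].append(pos)

# First level: a dense, ordered index with one entry per distinct value.
secondary = sorted(buckets.items())


def lookup_balance(balance):
    """Follow the bucket for this value and fetch each record it points to."""
    return [records[pos] for pos in buckets.get(balance, [])]


print([value for value, _ in secondary])
# [350, 400, 500, 600, 700, 750, 900]
print(lookup_balance(700))
# [('Mianus', 700), ('Perryridge', 700), ('Redwood', 700)]
```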

13
Secondary Index
[Figure: secondary index with entries 350, 400, 500, 600, 700, 750, 900]
14
B+-Tree Index Files
  • The main disadvantage of the index-sequential file
    organization is that performance degrades as the
    file grows, both for index lookups and for
    sequential scans.
  • The B+-tree index structure is the most widely
    used of several index structures that maintain
    their efficiency despite insertion and deletion
    of data.

15
B+-Tree Index Files
  • A B+-tree index is a balanced tree in which every
    path from the root to a leaf has the same length
    and each non-leaf node other than the root has
    between ceiling(n/2) and n children, where n is
    fixed for the tree.
  • A typical node in a B+-tree has
  • n-1 search keys K1, K2, ..., Kn-1
  • n pointers P1, P2, ..., Pn
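
A minimal sketch of a lookup over nodes of this shape (n-1 keys, n pointers), assuming the B+-tree convention that pointer Pi leads to keys smaller than Ki and that records are reached only through the leaves. The Node class, the search function, and the hand-built tree with n = 3 are illustrative; insertion and deletion are not shown.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # child nodes (non-leaf only)
    values: list = field(default_factory=list)    # record pointers (leaf only)

    @property
    def is_leaf(self):
        return not self.children


def search(node, key):
    """Walk from the root to a leaf, then look for key inside that leaf."""
    while not node.is_leaf:
        # The number of keys <= key selects which child pointer to follow.
        node = node.children[bisect_right(node.keys, key)]
    for k, v in zip(node.keys, node.values):
        if k == key:
            return v
    return None


# Hand-built tree with n = 3: every path from root to leaf has length 1.
leaf1 = Node(keys=["Brighton", "Downtown"], values=[0, 1])
leaf2 = Node(keys=["Mianus", "Perryridge"], values=[2, 3])
leaf3 = Node(keys=["Redwood", "Round Hill"], values=[4, 5])
root = Node(keys=["Mianus", "Redwood"], children=[leaf1, leaf2, leaf3])

print(search(root, "Perryridge"))  # 3
print(search(root, "Clearview"))   # None
```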

16
B+-Tree Node
17
B+-Tree (contd.)
  • Structure of a B+-tree
  • Queries on B+-trees
  • Updates on B+-trees (insertion, deletion)
  • B+-tree file organization
  • B-tree: a variation of the B+-tree that avoids
    redundant storage of search-key values

18
Hashing
  • Can we avoid the I/O operations that result
    from accessing the index file?
  • Hashing offers a way.
  • It also provides a way of constructing indices
    (which need not be sequential).
  • We will study static and dynamic hashing.

19
Hash File Organization
  • The address of the disk block containing a
    desired record is computed by applying a function
    (the hash function) to the search key.
  • Let K denote the set of all search keys and B the
    set of all bucket addresses. A hash function h is
    a function from K to B.
  • A bucket is typically a disk block.

20
Operations
  • To insert a record with key Ki, compute h(Ki),
    which gives the address of the bucket for the
    record. If there is space in the bucket, the
    record is stored in that bucket; otherwise an
    overflow mechanism such as chaining is needed.
  • To look up a record with key Ki, compute h(Ki),
    then check every record in that bucket to find
    the record.
  • To delete, a similar sequence is followed: hash,
    find, and delete.
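
These three operations can be sketched as a small static hash file with overflow chaining. Everything named here (HashFile, the toy character-sum hash function, the bucket capacity, the sample keys) is an assumption for illustration; lists in memory stand in for disk blocks.

```python
class HashFile:
    def __init__(self, num_buckets=4, bucket_capacity=2):
        self.capacity = bucket_capacity
        # Each bucket is a chain of blocks; block 0 is the primary block,
        # any later blocks are overflow blocks.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def h(self, key):
        # Toy deterministic hash: character-code sum modulo the bucket count.
        return sum(ord(c) for c in key) % len(self.buckets)

    def insert(self, key, record):
        chain = self.buckets[self.h(key)]
        for block in chain:
            if len(block) < self.capacity:
                block.append((key, record))
                return
        chain.append([(key, record)])      # no space left: chain a new block

    def lookup(self, key):
        # Different keys can share a bucket, so every record is checked.
        return [r for block in self.buckets[self.h(key)]
                for k, r in block if k == key]

    def delete(self, key):
        removed = 0
        for block in self.buckets[self.h(key)]:
            kept = [(k, r) for k, r in block if k != key]
            removed += len(block) - len(kept)
            block[:] = kept
        return removed


hf = HashFile(num_buckets=1, bucket_capacity=2)   # one bucket forces overflow
for key, balance in [("A-101", 500), ("A-215", 700), ("A-217", 750)]:
    hf.insert(key, balance)
print(len(hf.buckets[0]))    # 2 blocks: the primary block plus one overflow
print(hf.lookup("A-217"))    # [750]
```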

21
Hash Functions
  • The hash function should be chosen so that
  • the distribution of records across buckets is
    uniform;
  • the distribution is random.
  • Handling bucket overflows
  • Overflows may occur because there are too few
    buckets.
  • They may also occur because of bucket skew.
  • Solutions: overflow buckets, chaining, double
    hashing, linear probing, quadratic probing
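
Bucket skew is easy to reproduce. The sketch below compares two toy hash functions, both invented for this example, on 40 made-up account numbers that all begin with the same letter.

```python
from collections import Counter

NUM_BUCKETS = 8
keys = [f"A-{i}" for i in range(100, 140)]   # 40 illustrative keys


def h_first_letter(key):
    # Poor choice: every key here starts with "A", so all land in one bucket.
    return (ord(key[0]) - ord("A")) % NUM_BUCKETS


def h_char_sum(key):
    # Better choice: uses every character of the key.
    return sum(ord(c) for c in key) % NUM_BUCKETS


load_first = Counter(h_first_letter(k) for k in keys)
load_sum = Counter(h_char_sum(k) for k in keys)

print(max(load_first.values()))   # 40: a single bucket holds every record
print(sorted(load_sum.values()))  # loads spread over several buckets
```

With the first function the bucket overflows no matter how many buckets exist, which is why skew calls for a better hash function rather than more buckets.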

22
Hash Indices
  • Hashing can be used for organizing indices. A
    hash index organizes search keys with their
    associated pointers.
  • See Fig.11.22
  • Typically only secondary indices need to be
    organized using hashing.

23
Dynamic Hashing
  • Many of today's databases grow very large, often
    in a short time.
  • If we use a static hash function, we have three
    options:
  • Choose the hash function based on the current
    size.
  • Choose the hash function based on the anticipated
    size.
  • Periodically restructure the hash file in
    response to growth.
  • Another solution: dynamic hashing.

24
Dynamic Hash Techniques
  • Dynamic hash techniques allow the hash function
    to be modified dynamically to accommodate the
    growth and shrinkage of the database.
  • One form of dynamic hashing is extendable hashing.
  • Extendable hashing copes with the growth in the
    database size by splitting and coalescing buckets
    as the database grows and shrinks.
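
A compact sketch of bucket splitting in extendable hashing, under several simplifying assumptions: keys are non-negative integers used directly as hash values, the directory is indexed by the low-order bits of the hash (a common variant; presentations that use a bit prefix work analogously), and coalescing on deletion is left out. The class and method names are illustrative.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth      # local depth: hash bits this bucket depends on
        self.items = {}


class ExtendableHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]       # 2**global_depth bucket pointers

    def _index(self, key):
        # Assumption: the key itself is the hash value; use its low bits.
        return key & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)            # bucket is full: split and retry

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory; both halves point at the same buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        new_bit = 1 << (bucket.depth - 1)
        # Directory slots whose newly examined bit is 1 move to the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        # Redistribute the records between the two buckets.
        for k in list(bucket.items):
            if k & new_bit:
                sibling.items[k] = bucket.items.pop(k)


eh = ExtendableHash(capacity=2)
for k in range(16):
    eh.put(k, k * 10)
print(eh.global_depth, len(eh.directory))   # 3 8
print(eh.get(11))                            # 110
```

Only the overflowing bucket is split on each step; the rest of the file is untouched, which is the advantage over periodically rebuilding a static hash file.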