Indexing and Hashing - PowerPoint PPT Presentation

About This Presentation

Title:

Indexing and Hashing

Description:

B.Ramamurthy, 11/4/09. Excerpt from slide 9 (Sparse Index): Brighton A-217 750, Downtown ..., Redwood, Round Hill. Which one is better? Dense or sparse? It is a trade-off ...

Slides: 25

Provided by: bina1

Learn more at: https://cse.buffalo.edu

Transcript and Presenter's Notes

1
Indexing and Hashing

B.Ramamurthy
Chapter 11

2
Representing Data

Attributes are represented as fixed-length or
variable-length fields.
Fields in turn are grouped into fixed-length or
variable-length collections called records.
Records are stored in physical blocks.
A collection of records that forms a relation is
stored as a collection of blocks called a file.
How does this file differ from an OS file?
Its organization is different.
It has extra indices for easy search and access.

3
Basic Concepts (indexing)

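Indexing works the same way as a library catalog
does for the books in a library.
Indexing needs to be efficient to allow fast
access to records.
Two basic types of indices:
ordered indices, based on a sorted ordering of
the search-key values, and
hash indices, based on a hash function that
distributes the values across a range of buckets.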

4
Techniques and Evaluation

Access types: the types of access that are
supported efficiently, e.g., search for a
specific value or for a range of values.
Access time: the time it takes to find a
particular data item or a set of items.
Insertion time: the time it takes to insert a
new item.
Deletion time: the time it takes to delete an
item.
Space overhead: the additional space occupied by
the index structure.

5
Ordered Indices

To gain fast access to records in a file, we can
use an index structure.
If the file containing the records is
sequentially ordered, the index whose search key
specifies the sequential order of the file is the
primary index. Its search key is usually, but not
necessarily, the primary key.
Primary indices are also called clustering
indices.

6
Primary Index

Assume that all files are ordered sequentially on
some search key.
Such files, with a primary index on the search
key, are called index-sequential files.
These files support both sequential processing of
the entire file and random access to individual
records.

7
Dense and Sparse Index

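Dense index
An index record appears for every search-key
value in the file.
The index record contains the search-key value and
a pointer to the first data record with that
search-key value.
Sparse index
Index records are created for only some of the
search-key values (e.g., one per block). Each
index record contains a search-key value and a
pointer to the first data record with that value.
To locate a record, find the index entry with the
largest search-key value less than or equal to
the one sought, then scan the file sequentially
from the record it points to.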
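A minimal Python sketch of the two lookup strategies, under stated assumptions: the file is a hypothetical list of blocks sorted on a unique search key (branch names from the slides, placeholder payloads), a dense index holds one (key, (block, slot)) entry per key, and a sparse index holds one (first key, block number) entry per block. The names build_dense, build_sparse, dense_lookup and sparse_lookup are illustrative, not from the source.

```python
from bisect import bisect_left, bisect_right

# Hypothetical file sorted on a unique search key, stored as blocks of
# two records. Each record is (search_key, payload); payloads are placeholders.
blocks = [
    [("Brighton", "rec0"), ("Downtown", "rec1")],
    [("Mianus", "rec2"), ("Redwood", "rec3")],
    [("Round Hill", "rec4")],
]


def build_dense(blocks):
    """Dense index: one (key, (block_no, slot)) entry per search-key value."""
    return [(key, (b, s))
            for b, block in enumerate(blocks)
            for s, (key, _) in enumerate(block)]


def build_sparse(blocks):
    """Sparse index: one (first key in block, block_no) entry per block."""
    return [(block[0][0], b) for b, block in enumerate(blocks)]


def dense_lookup(dense, blocks, key):
    keys = [k for k, _ in dense]
    i = bisect_left(keys, key)          # exact match via binary search
    if i == len(keys) or keys[i] != key:
        return None
    b, s = dense[i][1]
    return blocks[b][s][1]


def sparse_lookup(sparse, blocks, key):
    keys = [k for k, _ in sparse]
    i = bisect_right(keys, key) - 1     # largest index key <= search key
    if i < 0:
        return None
    # Scan the file sequentially, starting at the block the entry points to.
    for block in blocks[sparse[i][1]:]:
        for k, payload in block:
            if k == key:
                return payload
            if k > key:                 # passed the key's position: not present
                return None
    return None


dense, sparse = build_dense(blocks), build_sparse(blocks)
print(len(dense), len(sparse))                   # 5 3
print(dense_lookup(dense, blocks, "Redwood"))    # rec3
print(sparse_lookup(sparse, blocks, "Redwood"))  # rec3
```

The sparse lookup may examine a few extra records (it scans forward from the start of a block), but its index has only one entry per block instead of one per key.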

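8
Dense Index

[Figure: dense index]

9
Sparse Index

[Figure: sparse index with index entries Brighton, Mianus, Redwood]

Which one is better, dense or sparse? It is a
trade-off between access time and space overhead:
a dense index generally locates a record faster,
while a sparse index needs less space and less
maintenance for insertions and deletions.

10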
Multi-level Indices

Indices themselves may become too large for
efficient processing.
Example
Consider a file with 100,000 records, with 10
records per block (10,000 blocks).
With a sparse index and one index entry per
block, we have 10,000 index entries.
Assuming 100 index entries fit into a block, the
index occupies 100 blocks.
It is desirable to keep the index file in main
memory; if it is too large, it must be kept on
disk.
Problem: searching a large index file becomes
expensive, since a binary search over a
disk-resident index reads several blocks.
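The example's arithmetic can be checked with a few lines of Python. The constants come from the slide; the binary-search cost of ceiling(log2 b) block reads for b index blocks, and the size of a sparse outer index over those blocks, are standard consequences computed here as a sketch.

```python
import math

records = 100_000
records_per_block = 10
entries_per_index_block = 100

data_blocks = records // records_per_block      # 10,000 data blocks
sparse_entries = data_blocks                     # one index entry per block
index_blocks = math.ceil(sparse_entries / entries_per_index_block)  # 100

# Binary search over b disk-resident index blocks reads up to ceil(log2 b).
binary_search_reads = math.ceil(math.log2(index_blocks))             # 7

# A sparse outer index with one entry per inner index block fits in
# ceil(100 / 100) = 1 block, so a two-level lookup reads one outer and one
# inner index block.
outer_index_blocks = math.ceil(index_blocks / entries_per_index_block)  # 1

print(data_blocks, index_blocks, binary_search_reads, outer_index_blocks)
```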

11
Multi-level Index

Solution: index the index file. We treat the
primary index as we would treat any other
sequential file and construct a sparse outer
index on it.
We binary-search the outer index to find the
entry with the largest search key less than or
equal to the one we desire, follow its pointer to
a block of the inner index, and repeat the search
within that block to find the data block.
Two-level sparse index: see Figure 11.4.
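A sketch of the two-level lookup, assuming the outer index stays in main memory so that only inner-index and data block reads are counted. The data (even integer keys 0 to 1998), block sizes, and function names are hypothetical; a real system stores each level as disk blocks rather than Python lists.

```python
from bisect import bisect_right

RECORDS_PER_BLOCK = 10        # hypothetical block capacities
ENTRIES_PER_INDEX_BLOCK = 10


def chunk(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_two_level(data_blocks):
    # Inner (primary) sparse index: one (first key, data block no) per block.
    inner_entries = [(blk[0], b) for b, blk in enumerate(data_blocks)]
    inner_blocks = chunk(inner_entries, ENTRIES_PER_INDEX_BLOCK)
    # Outer sparse index: one (first key, inner block no) per inner block.
    outer = [(blk[0][0], i) for i, blk in enumerate(inner_blocks)]
    return outer, inner_blocks


def floor_entry(entries, key):
    """Binary search for the entry with the largest key <= key (or None)."""
    i = bisect_right([k for k, _ in entries], key) - 1
    return entries[i] if i >= 0 else None


def lookup(key, outer, inner_blocks, data_blocks):
    """Return (found, blocks_read); the outer index is assumed in memory."""
    reads = 0
    top = floor_entry(outer, key)
    if top is None:
        return False, reads
    inner = inner_blocks[top[1]]       # read one inner index block
    reads += 1
    entry = floor_entry(inner, key)
    if entry is None:
        return False, reads
    block = data_blocks[entry[1]]      # read one data block
    reads += 1
    return key in block, reads


keys = list(range(0, 2000, 2))                 # hypothetical sorted keys
data_blocks = chunk(keys, RECORDS_PER_BLOCK)   # 100 data blocks
outer, inner_blocks = build_two_level(data_blocks)
print(len(outer), len(inner_blocks))           # 10 10
print(lookup(1000, outer, inner_blocks, data_blocks))  # (True, 2)
```

Whatever the key, a successful search costs one inner index block and one data block, instead of a binary search over all inner index blocks.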

12
Secondary Index

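A secondary index has a search key that specifies
an order different from the sequential order of
the file, so records with nearby key values are
not stored sequentially.
Secondary indices must be dense: since the file is
not sorted on the search key, there is no way to
scan forward from a sparse entry to the desired
record.
If the search key of a secondary index is not a
candidate key, records with the same value may be
scattered throughout the file, so a pointer to the
first such record is not enough. We can use an
extra level of indirection: each index entry
points to a bucket that holds pointers to all
records with that value.
See Fig. 11.5.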
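A minimal sketch of a dense secondary index with buckets, assuming a hypothetical file whose records are (record id, value) pairs stored in an order unrelated to the indexed value. Each distinct value gets one sorted index entry pointing to a bucket of record positions; the records and the names build_secondary and lookup are illustrative.

```python
from bisect import bisect_left

# Hypothetical records (record_id, value), stored in an order that is
# unrelated to the value being indexed.
records = [("r0", 500), ("r1", 400), ("r2", 600), ("r3", 900),
           ("r4", 700), ("r5", 750), ("r6", 700), ("r7", 700),
           ("r8", 350)]


def build_secondary(records, attr):
    """Dense secondary index: sorted (value, bucket) entries, where each
    bucket lists the positions of ALL records having that value."""
    buckets = {}
    for pos, rec in enumerate(records):
        buckets.setdefault(rec[attr], []).append(pos)
    return sorted(buckets.items())


def lookup(index, value):
    keys = [k for k, _ in index]
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return index[i][1]      # follow the bucket to every matching record
    return []


index = build_secondary(records, attr=1)
print([k for k, _ in index])   # [350, 400, 500, 600, 700, 750, 900]
print([records[p][0] for p in lookup(index, 700)])   # ['r4', 'r6', 'r7']
```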

13
Secondary Index

[Figure: secondary index with search-key values 350, 400, 500, 600, 700, 750, 900]

14
B+-Tree Index Files

The main disadvantage of the index-sequential
file organization is that performance degrades as
the file grows, both for index lookups and for
sequential scans.
The B+-tree index structure is the most widely
used of several index structures that maintain
their efficiency despite insertion and deletion
of data.

15
B+-Tree Index Files

A B+-tree index is a balanced tree in which every
path from the root to a leaf has the same length,
and each non-leaf node other than the root has
between ceiling(n/2) and n children, where n is
fixed for a particular tree.
A typical B+-tree node contains
n-1 search keys K1, K2, ..., Kn-1, kept in sorted
order, and
n pointers P1, P2, ..., Pn.
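To make the node layout concrete, here is a Python sketch of B+-tree search over a small hand-built tree with n = 3 (at most two keys per node). It assumes unique integer keys and the convention that pointer Pi of a non-leaf node leads to keys k with K(i-1) <= k < Ki; the classes, functions, keys and values are hypothetical, and insertion and deletion (node splitting and merging) are not shown.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]                 # K1 < K2 < ... (at most n-1 keys)
    values: List[Any]               # Pi points to the record with key Ki
    next: Optional["Leaf"] = None   # Pn: pointer to the next leaf


@dataclass
class Internal:
    keys: List[int]                 # K1 < ... < Km, m <= n-1
    children: List[Union["Internal", Leaf]]   # P1 ... P(m+1)


def descend(node, key):
    """Follow Pi such that K(i-1) <= key < Ki until a leaf is reached."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def find(root, key):
    leaf = descend(root, key)
    for k, v in zip(leaf.keys, leaf.values):
        if k == key:
            return v
    return None


def range_scan(root, lo, hi):
    """Return values for lo <= key <= hi by walking the leaf chain."""
    leaf, out = descend(root, lo), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append(v)
        leaf = leaf.next
    return out


# Hand-built tree with n = 3 (hypothetical keys and values).
l3 = Leaf([9, 11], ["e", "f"])
l2 = Leaf([5, 7], ["c", "d"], l3)
l1 = Leaf([1, 3], ["a", "b"], l2)
root = Internal([5, 9], [l1, l2, l3])
print(find(root, 7), find(root, 4))   # d None
print(range_scan(root, 3, 9))         # ['b', 'c', 'd', 'e']
```

Because the leaves are chained through their last pointer, range_scan descends the tree once and then walks the leaf level, which is what makes B+-trees efficient for range queries.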

16
B+-Tree Node

[Figure: typical B+-tree node]

17
B+-Tree (contd.)

Structure of a B+-tree
Queries on B+-trees
Updates on B+-trees (insertion, deletion)
B+-tree file organization
B-tree: a variation of the B+-tree that avoids
storing search keys redundantly

18
Hashing

Can we avoid the I/O operations that result from
accessing an index structure?
Hashing offers a way.
It also provides a way of constructing indices
(which need not be sequential).
We will study static and dynamic hashing.

19
Hash File Organization

The address of the disk block containing a
desired record is computed by applying a function
(the hash function) to the record's search key.
Let K denote the set of all search-key values and
B the set of all bucket addresses. A hash
function h is a function from K to B.
A bucket is typically a disk block.

20
Operations

To insert a record with search key Ki, compute
h(Ki), which gives the address of the bucket for
the record. If there is space in the bucket, the
record is stored in that bucket; otherwise, a
bucket overflow occurs (handled, e.g., by
chaining overflow buckets).
To look up a record with key Ki, compute h(Ki)
and check every record in that bucket, since
different keys may hash to the same bucket.
Deletion is similar: compute the hash, find the
record in its bucket, and delete it.
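The three operations can be sketched in Python as a static hash file whose buckets hold a fixed number of records and overflow into a chain of overflow buckets. The bucket capacity, bucket count, and toy hash function (sum of character codes modulo the number of buckets) are hypothetical; a deterministic function is used instead of Python's hash(), which is randomized for strings across runs.

```python
BUCKET_CAPACITY = 2   # hypothetical: records per bucket (disk block)


class Bucket:
    def __init__(self):
        self.records = []      # (key, record) pairs
        self.overflow = None   # next overflow bucket in the chain


class StaticHashFile:
    def __init__(self, num_buckets=4):
        self.buckets = [Bucket() for _ in range(num_buckets)]

    def h(self, key):
        # Toy deterministic hash: sum of character codes mod number of buckets.
        return sum(map(ord, key)) % len(self.buckets)

    def insert(self, key, record):
        b = self.buckets[self.h(key)]
        while len(b.records) >= BUCKET_CAPACITY:   # bucket full: follow chain
            if b.overflow is None:
                b.overflow = Bucket()              # chain a new overflow bucket
            b = b.overflow
        b.records.append((key, record))

    def lookup(self, key):
        # Different keys may share a bucket, so check every record in the chain.
        b, found = self.buckets[self.h(key)], []
        while b is not None:
            found.extend(r for k, r in b.records if k == key)
            b = b.overflow
        return found

    def delete(self, key):
        # Empty overflow buckets are left in place for simplicity.
        b = self.buckets[self.h(key)]
        while b is not None:
            b.records = [(k, r) for k, r in b.records if k != key]
            b = b.overflow


f = StaticHashFile()
for i in range(5):
    f.insert("Mianus", i)       # 5 records, capacity 2: two overflow buckets
f.insert("Redwood", "r")
print(f.lookup("Mianus"))       # [0, 1, 2, 3, 4]
f.delete("Mianus")
print(f.lookup("Mianus"), f.lookup("Redwood"))   # [] ['r']
```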

21
Hash Functions

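The hash function should be chosen so that
the distribution is uniform: each bucket is
assigned the same number of search-key values
from the set of all possible values, and
the distribution is random: each bucket receives
about the same number of records, whatever the
actual distribution of key values in the file.
Handling bucket overflows
Overflows may occur because there are too few
buckets, or because of bucket skew (many records
share a search-key value, or the hash function
distributes keys non-uniformly).
Solutions: overflow buckets with chaining, or
placing the record in another bucket by linear
probing, quadratic probing, or double hashing.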
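For comparison with chaining, a sketch of linear probing, one of the alternatives listed above. It assumes each bucket holds a single entry, integer keys, and the hypothetical hash h(k) = k mod table size; deletion is omitted because it requires tombstone markers to keep probe sequences intact.

```python
class LinearProbingTable:
    """Sketch: one entry per bucket; on a collision, try (h(k) + i) mod size."""

    def __init__(self, size=8):
        self.slots = [None] * size

    def h(self, key):
        return key % len(self.slots)        # toy hash for integer keys

    def insert(self, key, value):
        n = len(self.slots)
        for i in range(n):
            j = (self.h(key) + i) % n
            if self.slots[j] is None or self.slots[j][0] == key:
                self.slots[j] = (key, value)
                return j                    # bucket where the entry landed
        raise OverflowError("all buckets are full")

    def lookup(self, key):
        n = len(self.slots)
        for i in range(n):
            j = (self.h(key) + i) % n
            if self.slots[j] is None:       # empty bucket ends the probe sequence
                return None
            if self.slots[j][0] == key:
                return self.slots[j][1]
        return None


t = LinearProbingTable(8)
print([t.insert(k, str(k)) for k in (3, 11, 19)])   # [3, 4, 5]
print(t.lookup(19), t.lookup(27))                   # 19 None
```

Keys 3, 11 and 19 all hash to bucket 3 and end up in buckets 3, 4 and 5, which shows how probing clusters colliding keys into neighboring buckets.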

22
Hash Indices

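Hashing can be used for organizing indices. A
hash index organizes search keys, with their
associated record pointers, into a hash file
structure.
See Fig. 11.22.
Typically, hash indices are used only as
secondary indices: if the file itself is
organized by hashing on the search key, a
separate primary hash index is unnecessary.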
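A hash index can be sketched the same way as a hash file, except that buckets hold (search key, record pointer) pairs rather than the records themselves. This assumes a hypothetical file of (record id, value) pairs, record positions as pointers, and a toy hash h(v) = v mod 7; the names are illustrative.

```python
NUM_BUCKETS = 7   # hypothetical number of index buckets


def h(value):
    return value % NUM_BUCKETS   # toy hash function on integer values


def build_hash_index(records, attr):
    """Each bucket holds (search-key value, record pointer) pairs."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for pos, rec in enumerate(records):
        buckets[h(rec[attr])].append((rec[attr], pos))
    return buckets


def probe(buckets, value):
    # Keys that collide share a bucket, so compare the stored key too.
    return [pos for k, pos in buckets[h(value)] if k == value]


# Hypothetical file of (record_id, value) pairs in no particular order.
records = [("r0", 500), ("r1", 400), ("r2", 600), ("r3", 900),
           ("r4", 700), ("r5", 750), ("r6", 700), ("r7", 700),
           ("r8", 350)]
index = build_hash_index(records, attr=1)
print(probe(index, 700), probe(index, 350))   # [4, 6, 7] [8]
```

Here 700 and 350 both hash to bucket 0, so the stored key is what separates them. Unlike an ordered index, this structure answers only equality lookups; a range query would have to examine every bucket.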

23
Dynamic Hashing

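Many of today's databases grow very large, often
in a short time.
With a static hash function, we have three
options:
Choose a hash function based on the current file
size (performance degrades as the database
grows).
Choose a hash function based on the anticipated
size (significant space is wasted initially).
Periodically reorganize the hash structure in
response to growth (expensive, and it disrupts
normal operations).
Another solution: dynamic hashing.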

24
Dynamic Hash Techniques

Dynamic hash techniques allow the hash function
to be modified dynamically to accommodate the
growth and shrinkage of the database.
One form of dynamic hashing is extendable
hashing.
Extendable hashing copes with changes in database
size by splitting buckets as the database grows
and coalescing them as it shrinks.
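A compact Python sketch of extendable hashing under stated assumptions: integer keys, a hypothetical bucket capacity of 2, and the low-order global_depth bits of Python's built-in hash() as the directory index (a common formulation uses a prefix of the hash value instead; the idea is the same). When a full bucket's local depth equals the global depth, the directory doubles; otherwise only the bucket splits. Coalescing on deletion is omitted, and the sketch assumes keys do not all share the same hash value, since such keys could never be separated by splitting.

```python
BUCKET_CAPACITY = 2   # hypothetical records per bucket


class Bucket:
    def __init__(self, depth):
        self.depth = depth   # local depth: hash bits this bucket depends on
        self.items = {}      # key -> value


class ExtendableHash:
    def __init__(self):
        self.global_depth = 0
        self.directory = [Bucket(0)]   # 2**global_depth entries

    def _index(self, key):
        # Low-order global_depth bits of the hash value select the entry.
        return hash(key) & ((1 << self.global_depth) - 1)

    def lookup(self, key):
        return self.directory[self._index(key)].items.get(key)

    def insert(self, key, value):
        while True:
            b = self.directory[self._index(key)]
            if key in b.items or len(b.items) < BUCKET_CAPACITY:
                b.items[key] = value
                return
            self._split(b)             # bucket full: split, then retry

    def _split(self, b):
        if b.depth == self.global_depth:
            # Double the directory; entries i and i + 2**d share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        b.depth += 1
        new = Bucket(b.depth)
        bit = 1 << (b.depth - 1)       # the newly distinguishing hash bit
        for i, bucket in enumerate(self.directory):
            if bucket is b and i & bit:
                self.directory[i] = new
        old_items, b.items = b.items, {}
        for k, v in old_items.items():
            (new if hash(k) & bit else b).items[k] = v


eh = ExtendableHash()
for k in range(8):
    eh.insert(k, str(k))
print(eh.global_depth, len({id(b) for b in eh.directory}))   # 2 4
print(eh.lookup(5), eh.lookup(8))                            # 5 None
```

Only the overflowing bucket's records are redistributed on a split, which is why growth is handled incrementally rather than by rehashing the whole file.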
