Searching - PowerPoint PPT Presentation

About This Presentation

Slides: 20

Provided by: WillT153

Learn more at: http://faculty.winthrop.edu

Transcript and Presenter's Notes

1
Searching

Given distinct keys $k_1, k_2, \ldots, k_n$ and a
collection T of n records of the form
$(k_1, I_1), (k_2, I_2), \ldots, (k_n, I_n)$
Search Problem - For key value K, locate the
record $(k_j, I_j)$ in T such that $k_j = K$.
Searching is a systematic method for locating the
record(s) with key value $k_j = K$.
A successful search is one in which a record with
key $k_j = K$ is found.
An unsuccessful search is one in which no record
with $k_j = K$ is found (because none exists).
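The definitions above can be illustrated with a plain sequential search. This is a minimal sketch. It assumes that records are `(key, info)` tuples held in a Python list standing in for T. The function name `sequential_search` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

Record = Tuple[int, str]  # (k_j, I_j); a str payload is an assumption


def sequential_search(table: Sequence[Record], K: int) -> Optional[Record]:
    """Return the record (k_j, I_j) with k_j == K, or None if none exists."""
    for record in table:
        key, _info = record
        if key == K:
            return record  # successful search
    return None  # unsuccessful search: no record has key K


T = [(17, "alpha"), (4, "beta"), (42, "gamma")]
print(sequential_search(T, 42))  # (42, 'gamma')
print(sequential_search(T, 5))   # None
```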

2
Searching Ordered Arrays

Binary Search - been there done that.
Dictionary Search - interpolation search
Determine how far from an endpoint your value is
probably going to be:
$pos = lo + \frac{value - A[lo]}{A[hi] - A[lo]} \cdot (hi - lo)$
Look here rather than at mid.
Assumes the key values are evenly distributed.
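The following is a minimal sketch of interpolation search using the position estimate above. It assumes a sorted list of integer keys. The name `interpolation_search` is hypothetical, and integer floor division stands in for rounding the estimate.

```python
from typing import Sequence


def interpolation_search(A: Sequence[int], value: int) -> int:
    """Return an index i with A[i] == value, or -1. A must be sorted ascending."""
    lo, hi = 0, len(A) - 1
    while lo <= hi and A[lo] <= value <= A[hi]:
        if A[hi] == A[lo]:  # flat range: avoid dividing by zero
            return lo if A[lo] == value else -1
        # Estimate how far from the lo endpoint the value probably lies.
        pos = lo + (value - A[lo]) * (hi - lo) // (A[hi] - A[lo])
        if A[pos] == value:
            return pos
        if A[pos] < value:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


A = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(interpolation_search(A, 70))  # 6, found on the first probe
print(interpolation_search(A, 75))  # -1
```

On evenly spaced keys like these, the first probe lands directly on the target. Binary search would start at the midpoint instead.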

3
Lists Ordered by Frequency

Order lists by (expected) frequency of
occurrence.
Perform sequential search
Cost for first record = 1
Cost for second record = 2
Expected search cost = $1p_1 + 2p_2 + 3p_3 + \cdots + np_n$
Worst case (all records equally likely): $(n+1)/2$
Best if a few items are accessed many times
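The expected-cost formula can be evaluated directly. This is a sketch. The probability lists are made-up illustrations, not data from the slides, and `expected_search_cost` is a hypothetical name.

```python
from typing import Sequence


def expected_search_cost(p: Sequence[float]) -> float:
    """1*p1 + 2*p2 + ... + n*pn: expected comparisons for a sequential search
    when the i-th record (1-based) is requested with probability p[i-1]."""
    return sum(i * pi for i, pi in enumerate(p, start=1))


n = 5
uniform = [1 / n] * n                         # all equally likely: worst case
skewed = [0.5, 0.25, 0.125, 0.0625, 0.0625]   # sorted by decreasing frequency

print(expected_search_cost(uniform))  # approximately (n + 1) / 2 = 3.0
print(expected_search_cost(skewed))   # 1.9375
```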

4
Self Organizing Lists

80/20 rule: 80% of the accesses are to 20% of the
records.
With the list ordered by frequency, the expected search cost is about 0.122n.
Self organizing lists modify the order of records
within the list based on the actual pattern of
record accesses.
Self organizing lists use a rule called a
heuristic for deciding how to reorder the list.

5
Self Organizing Heuristics

Order by actual frequency - most frequently used
first
When a record is found, swap it with the first
item
When a record is found, move it to the front of
the list
When a record is found, swap it with the record
ahead of it
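Two of the heuristics listed above (move-to-front and swap-with-the-record-ahead, often called transpose) are sketched below. The sketch assumes a Python list of integer keys. Both function names are hypothetical, and each returns the number of comparisons made.

```python
from typing import List


def access_move_to_front(lst: List[int], key: int) -> int:
    """Search lst sequentially; if key is found, move it to the front."""
    for i, item in enumerate(lst):
        if item == key:
            lst.insert(0, lst.pop(i))
            return i + 1
    return len(lst)  # unsuccessful search examined every record


def access_transpose(lst: List[int], key: int) -> int:
    """Search lst sequentially; if key is found, swap it with the record ahead of it."""
    for i, item in enumerate(lst):
        if item == key:
            if i > 0:
                lst[i - 1], lst[i] = lst[i], lst[i - 1]
            return i + 1
    return len(lst)


mtf = [1, 2, 3, 4, 5]
tr = [1, 2, 3, 4, 5]
for k in [5, 5, 3]:
    access_move_to_front(mtf, k)
    access_transpose(tr, k)
print(mtf)  # [3, 5, 1, 2, 4]
print(tr)   # [1, 2, 3, 5, 4]
```

Move-to-front reacts immediately to a single access. Transpose promotes a record one position at a time, so it needs repeated accesses before a record reaches the front.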

6
Hashing

The process of mapping a key value to a position
in a table.
A hash function maps key values to positions.
A hash table is an array that holds the records.
The hash table T has M slots (indexed 0 to M-1).
For any value K in the key range and some hash
function h,
$h(K) = i$, where $0 \le i < M$ and $key(T[i]) = K$.

7
Hashing Situations

Hashing is appropriate for unique keys.
Good for both in-memory and disk-based
applications.
Answers the question "What record, if any, has
key value K?"
Example: Store the n records with keys in the range
0 to n-1.
Store the record with key i in slot i.
Uses the hash function $h(k) = k$ (the identity
function).
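The identity-hash example above can be sketched as a direct-address table. Several details are assumptions here: the table size n = 8, string payloads, and the helper names `h`, `insert` and `search`, which are all hypothetical.

```python
from typing import List, Optional

n = 8
table: List[Optional[str]] = [None] * n  # one slot per possible key 0..n-1


def h(k: int) -> int:
    return k  # identity hash: the record with key i lives in slot i


def insert(k: int, info: str) -> None:
    table[h(k)] = info


def search(k: int) -> Optional[str]:
    return table[h(k)]  # None means no record has key k


insert(3, "record three")
print(search(3))  # record three
print(search(5))  # None
```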

8
Collisions

A more reasonable example:
Store about 1000 records with keys in the range
0-16,383.
It is impractical to keep a table of size 16,384 for so few records.
We need a hash function to map keys to a smaller
range.
Given a hash function h and different keys k1 and
k2, let β be a position in the hash table.
If $h(k_1) = \beta = h(k_2)$, then k1 and k2 have a
collision at β under h.
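The collision definition can be made concrete with a small example. It assumes a table of M = 1000 slots and the hash function h(k) = k mod M; neither choice comes from the slide.

```python
M = 1000  # assumed table size for about 1000 records


def h(k: int) -> int:
    return k % M  # maps keys 0..16383 onto slots 0..M-1


k1, k2 = 1234, 5234
beta = h(k1)
print(h(k1), h(k2))            # 234 234
print(h(k1) == beta == h(k2))  # True: k1 and k2 collide at slot beta under h
```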

9
Collision Resolution

To search for the record with key K:
Compute the table location h(K).
Starting with slot h(K), locate the record
containing key K using (if necessary) a collision
resolution policy.
Collisions are inevitable in most applications.
Example: In a group of only 23 people, the probability
that at least one pair shares a birthday is better than even (about 50.7%).
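The birthday figure can be checked with a short calculation. It assumes birthdays are independent and uniform over 365 days. In hashing terms, this is the chance that 23 keys hashed uniformly into 365 slots produce at least one collision. The function name is hypothetical.

```python
def prob_shared_birthday(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday,
    assuming independent, uniformly distributed birthdays."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


print(round(prob_shared_birthday(23), 3))  # 0.507
```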

10
Hash Functions

Must return a value within the table range.
Should evenly distribute the records to be stored
among the table slots.
Ideally, the function should distribute records
with equal probability to all the positions. In
reality, how well it does usually depends on the data.
If we know nothing about the key distribution,
evenly distribute the key range among the
positions.
If we know about the key distribution, use a
distribution-dependent hash function.

11
Example Hash Functions

h(key) = key % 16 uses only the last 4 bits of the key.
h(key) = key % 1000 uses only the last 3 digits.
Take the result modulo the table size to make sure it is
in range.
Mid-square method: square the key and take the
middle r bits for a table of size $2^r$.
Sum up the ASCII codes of the characters and take the result modulo the
table size (a folding technique).
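The four example hash functions above are sketched below. For the mid-square method, the sketch assumes keys fit in w = 16 bits, so the square fits in 32 bits. The function names are hypothetical. Python's `ord` returns the Unicode code point, which equals the ASCII code for ASCII text.

```python
def h_mod16(key: int) -> int:
    return key % 16  # keeps only the low 4 bits of the key


def h_mod1000(key: int) -> int:
    return key % 1000  # keeps only the last 3 decimal digits


def h_midsquare(key: int, r: int, w: int = 16) -> int:
    """Square a w-bit key (giving up to 2w bits) and keep the middle r bits,
    yielding a slot in a table of size 2**r. The width w is an assumption."""
    square = key * key
    shift = (2 * w - r) // 2  # discard this many low-order bits
    return (square >> shift) & ((1 << r) - 1)


def h_fold(s: str, tablesize: int) -> int:
    """Folding: sum the character codes, then reduce modulo the table size."""
    return sum(ord(c) for c in s) % tablesize


print(h_mod16(0b1011_0110))  # 6 (low bits 0110)
print(h_mod1000(123456))     # 456
print(h_midsquare(4567, 8))  # some slot in 0..255
print(h_fold("abc", 100))    # (97 + 98 + 99) % 100 = 94
```

The mid-square method keeps middle bits because they depend on every bit of the key. The low bits of the square, in contrast, depend only on the low bits of the key.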

12
Collision Handling Categories

Open hashing (separate chaining) - when there is a collision, put
the collided item outside the table.
Closed hashing (open addressing) - when there is a collision, put
the collided item inside the table.

13
Open Hashing

Look at each table element as the head of a
linked list of items that hash to that position.
Can organize the linked lists in many ways:
Ordered by key: unsuccessful searches are detected
quickly.
Ordered by frequency: if a few items are searched for
frequently, then this is a good technique.
If there are N records to be stored and the table
is of size M, then the average search length is
O(N/M).
Good for internal memory. Linked nodes may be in
different blocks on disk and cause many disk
accesses.
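Open hashing can be sketched with one chain per slot. This is a minimal sketch. Python lists stand in for the linked lists, and the chains are left unordered. The hash function h(k) = k mod M and the class name `ChainedHashTable` are assumptions.

```python
from typing import List, Optional, Tuple


class ChainedHashTable:
    """Open hashing: each of the M slots heads a chain of (key, info) records."""

    def __init__(self, M: int) -> None:
        self.M = M
        self.slots: List[List[Tuple[int, str]]] = [[] for _ in range(M)]

    def _h(self, key: int) -> int:
        return key % self.M  # assumed hash function

    def insert(self, key: int, info: str) -> None:
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, info)  # replace the existing record
                return
        chain.append((key, info))

    def search(self, key: int) -> Optional[str]:
        for k, info in self.slots[self._h(key)]:
            if k == key:
                return info
        return None  # walked the whole chain without a match


t = ChainedHashTable(10)
t.insert(3, "a")
t.insert(13, "b")     # collides with key 3 at slot 3; joins the same chain
print(t.search(13))   # b
print(t.slots[3])     # [(3, 'a'), (13, 'b')]
```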

14
Closed Hashing - Linear Probe

If the item you are looking for is not in its
hash position, look in the next position.
Do the same for insert until you find an empty
location.
When you reach the bottom of the table, wrap around to the beginning.
Must have at least one empty slot, or an unsuccessful search
will loop forever.
Tends to produce clustering because the probe
sequence is not uniformly distributed (i.e., if keys
collide at position 4, they go to position 5, then 6,
independent of key).
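Linear probing for integer keys is sketched below. The sketch assumes h(k) = k mod M and uses `None` for an empty slot; the function names are hypothetical. Each loop is capped at M probes, so a full table raises an error instead of looping forever.

```python
from typing import List, Optional

EMPTY = None


def lp_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key using linear probing and return the slot used."""
    M = len(table)
    home = key % M  # assumed hash function
    for i in range(M):  # at most M probes
        slot = (home + i) % M  # next position, wrapping to the beginning
        if table[slot] is EMPTY or table[slot] == key:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


def lp_search(table: List[Optional[int]], key: int) -> Optional[int]:
    """Return the slot holding key, or None once an empty slot is reached."""
    M = len(table)
    home = key % M
    for i in range(M):
        slot = (home + i) % M
        if table[slot] is EMPTY:
            return None
        if table[slot] == key:
            return slot
    return None


table: List[Optional[int]] = [EMPTY] * 10
for k in [4, 14, 24, 9, 19]:
    lp_insert(table, k)
print(table)  # [19, None, None, None, 4, 14, 24, None, None, 9]
```

Keys 4, 14 and 24 form a cluster in slots 4 to 6. Key 19 wraps around to slot 0.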

15
Better Linear Probe

Instead of going to the next slot, skip by some
constant c.
The table size M and c should be relatively prime.
This ensures the probing will cycle through the
entire table.
Still has some clustering.
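The relative-primality condition can be checked directly. This sketch lists the slots visited when stepping by c in a table of assumed size M = 10. The function name is hypothetical.

```python
from math import gcd
from typing import List


def probe_sequence(home: int, c: int, M: int) -> List[int]:
    """Slots visited by probing home, home + c, home + 2c, ... (mod M)."""
    return [(home + i * c) % M for i in range(M)]


print(probe_sequence(0, 3, 10))  # [0, 3, 6, 9, 2, 5, 8, 1, 4, 7]: every slot
print(probe_sequence(0, 4, 10))  # [0, 4, 8, 2, 6, 0, 4, 8, 2, 6]: even slots only

for c in range(1, 10):
    covers_all = len(set(probe_sequence(0, c, 10))) == 10
    print(c, gcd(c, 10) == 1, covers_all)  # the two flags always agree
```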

16
Quadratic Probe

Instead of adding 1 to the home position, add $i^2$,
where i is the probe number, so add 1, 4, 9, 16, ...
Remember we also take the result mod the table size.
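The quadratic probe sequence can be listed directly. This sketch assumes a home slot of 0 and a table size of M = 7. The function name is hypothetical.

```python
from typing import List


def quadratic_probes(home: int, M: int, count: int) -> List[int]:
    """First `count` slots of the probe sequence (home + i*i) mod M."""
    return [(home + i * i) % M for i in range(count)]


print(quadratic_probes(0, 7, 7))  # [0, 1, 4, 2, 2, 4, 1]
```

Unlike a linear probe with a step relatively prime to M, this sequence does not visit every slot. Here it only ever reaches slots 0, 1, 2 and 4.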

17
Double Hashing

After a collision, use a second hash function to choose the step size.
Reduces clustering.
For example, if h(k) causes a collision, then use
$p(k, i) = i \cdot h_2(k)$
and probe slot $(h(k) + p(k, i)) \bmod M$, where
h2 is a different hash function, so different keys generate different probe sequences.
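Double hashing is sketched below, with h1 standing for the slide's h. The specific functions are assumptions: h1(k) = k mod M and h2(k) = 1 + (k mod (M - 1)). The table size M = 11 is prime, so every step size from 1 to M-1 is relatively prime to M.

```python
from typing import List


def h1(k: int, M: int) -> int:
    return k % M  # home position (the slide's h)


def h2(k: int, M: int) -> int:
    return 1 + (k % (M - 1))  # step size; never 0, so probing always advances


def double_hash_probes(k: int, M: int) -> List[int]:
    """Slots probed for key k: (h1(k) + i * h2(k)) mod M for i = 0..M-1."""
    return [(h1(k, M) + i * h2(k, M)) % M for i in range(M)]


M = 11
print(double_hash_probes(3, M))   # [3, 7, 0, 4, 8, 1, 5, 9, 2, 6, 10]
print(double_hash_probes(14, M))  # [3, 8, 2, 7, 1, 6, 0, 5, 10, 4, 9]
```

Keys 3 and 14 share home slot 3 but then follow different paths. This is how double hashing avoids the clustering that linear probing produces.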

18
Analysis of Closed Hashing

Load factor lf = N/M, where
N is the number of records and
M is the size of the table.
N/M is the fraction of the table that is full.
The larger the load factor, the greater the
probability of a collision.
Average search length is O(1/(1-lf)).
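The following arithmetic shows how fast the 1/(1-lf) term grows as the table fills. The O() hides a constant factor, so these numbers show relative growth, not exact probe counts. The table sizes and the function name are illustrative assumptions.

```python
def growth_term(lf: float) -> float:
    """The 1 / (1 - lf) term from the O(1/(1 - lf)) bound."""
    if not 0 <= lf < 1:
        raise ValueError("closed hashing needs 0 <= lf < 1")
    return 1.0 / (1.0 - lf)


for N, M in [(500, 1000), (750, 1000), (900, 1000)]:
    lf = N / M
    print(f"lf = {lf:.2f}  1/(1-lf) = {growth_term(lf):.1f}")
```

Going from half full to 90% full multiplies the term by about five (2 versus 10).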

19
Deletions

If we delete a value, it may stop a later search
prematurely (break the chain).
Use a special mark (a tombstone) to indicate something was
deleted. When searching, continue past this
mark rather than stopping as if the slot were empty.
Once we have many deleted items, we may wish to
rehash everything remaining;
it is best to rehash the most frequently accessed
items first.
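Tombstone deletion is sketched below for linear probing. The sketch assumes h(k) = k mod M and a hypothetical `DELETED` sentinel object; all function names are also hypothetical. For brevity, `rehash` reinserts keys in table order rather than most-frequently-accessed first.

```python
from typing import List, Optional

EMPTY = None
DELETED = object()  # tombstone: "something was here", unlike a truly empty slot


def find(table: List[object], key: int) -> Optional[int]:
    M = len(table)
    for i in range(M):
        slot = (key % M + i) % M
        if table[slot] is EMPTY:
            return None  # only a truly empty slot ends the search
        if table[slot] == key:
            return slot
        # a tombstone or a different key: keep probing
    return None


def insert(table: List[object], key: int) -> None:
    if find(table, key) is not None:
        return  # already present
    M = len(table)
    for i in range(M):
        slot = (key % M + i) % M
        if table[slot] is EMPTY or table[slot] is DELETED:
            table[slot] = key  # tombstones can be reused on insert
            return
    raise RuntimeError("table is full")


def delete(table: List[object], key: int) -> bool:
    slot = find(table, key)
    if slot is None:
        return False
    table[slot] = DELETED  # mark instead of emptying, so chains stay intact
    return True


def rehash(table: List[object]) -> List[object]:
    """Rebuild the table from the surviving keys, dropping all tombstones."""
    fresh: List[object] = [EMPTY] * len(table)
    for item in table:
        if item is not EMPTY and item is not DELETED:
            insert(fresh, item)
    return fresh


table: List[object] = [EMPTY] * 10
for k in [4, 14, 24]:
    insert(table, k)   # slots 4, 5, 6
delete(table, 14)      # slot 5 becomes a tombstone
print(find(table, 24))  # 6: the search probes past the tombstone
```

If slot 5 had been reset to empty instead of marked, the search for 24 would stop at slot 5 and wrongly report that 24 is absent.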
