Searching - PowerPoint PPT Presentation

Provided by: WillT153
Title: Searching


1
Searching
  • Given distinct keys k1, k2, ..., kn and a
    collection T of n records of the form
  • (k1, I1), (k2, I2), ..., (kn, In)
  • Search Problem - For key value K, locate the
    record (kj, Ij) in T such that kj = K.
  • Searching is a systematic method for locating the
    record(s) with key value kj = K.
  • A successful search is one in which a record with
    key kj = K is found.
  • An unsuccessful search is one in which no record
    with kj = K is found (because none exists).

2
Searching Ordered Arrays
  • Binary Search - been there done that.
  • Dictionary Search - interpolation search
  • Determine how far from an endpoint your value is
    probably going to be.
  • Pos = lo + (value - A[lo]) / (A[hi] - A[lo]) *
    (hi - lo)
  • Probe at Pos rather than at the midpoint.
  • Assumes the data is evenly distributed.
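
The position estimate above can be sketched in Python as follows. This is a minimal illustration, assuming a sorted list of integer keys; the function name interpolation_search is made up for this example and does not come from the slides.

```python
def interpolation_search(a, value):
    """Return an index of value in the sorted list a, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= value <= a[hi]:
        if a[hi] == a[lo]:
            # All remaining keys are equal; avoid dividing by zero.
            return lo if a[lo] == value else -1
        # Estimate the position from how far value lies between the endpoints.
        pos = lo + (value - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == value:
            return pos
        if a[pos] < value:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```

On evenly spaced keys such as [10, 20, 30, 40, 50], the first estimate for 40 lands directly on its slot. On skewed data the estimates can be poor, which is why the even-distribution assumption matters.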

3
Lists Ordered by Frequency
  • Order lists by (expected) frequency of
    occurrence.
  • Perform sequential search
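  • Cost to find the first record: 1
  • Cost to find the second record: 2
  • Expected search cost: 1*p1 + 2*p2 + 3*p3 + ... +
    n*pn, where pi is the probability of accessing
    record i
  • Worst case (all records equally likely): (n+1)/2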
  • Best if a few items are accessed many times
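
As a small worked example of the cost formula, the sketch below computes the expected number of comparisons for a list ordered by access probability. The function name and the sample probabilities are illustrative assumptions, not values from the slides.

```python
def expected_search_cost(probabilities):
    """Expected comparisons for sequential search.

    probabilities[i] is the chance of accessing the record stored
    at position i + 1 of the list.
    """
    return sum(i * p for i, p in enumerate(probabilities, start=1))


uniform = [0.2] * 5                    # every record equally likely
skewed = [0.5, 0.25, 0.125, 0.125]     # a few records dominate
```

With five equally likely records the cost is (5+1)/2 = 3 comparisons, while the skewed four-record list costs 1.875, below its uniform value of 2.5.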

4
Self Organizing Lists
  • 80/20 rule: 80% of the accesses are to 20% of the
    records
  • Expected search cost: about 0.122n
  • Self organizing lists modify the order of records
    within the list based on the actual pattern of
    record accesses.
  • Self organizing lists use a rule called a
    heuristic for deciding how to reorder the list.

5
Self Organizing Heuristics
  • Order by actual frequency - most frequently used
    first
  • When a record is found, swap it with the first
    item
  • When a record is found, move it to the front of
    the list
  • When a record is found, swap it with the record
    ahead of it
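
Two of the heuristics above, move-to-front and swap-with-the-record-ahead (often called transpose), might look like this in Python. This is a sketch that assumes the list holds bare keys in a Python list; the function names are hypothetical.

```python
def search_move_to_front(records, key):
    """Sequential search; on a hit, move the record to the front."""
    for i, k in enumerate(records):
        if k == key:
            records.insert(0, records.pop(i))
            return True
    return False


def search_transpose(records, key):
    """Sequential search; on a hit, swap the record with the one ahead."""
    for i, k in enumerate(records):
        if k == key:
            if i > 0:
                records[i - 1], records[i] = records[i], records[i - 1]
            return True
    return False
```

Move-to-front reacts quickly to a change in the access pattern, while transpose moves a record forward only one position per access.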

6
Hashing
  • The process of mapping a key value to a position
    in a table.
  • A hash function maps key values to positions.
  • A hash table is an array that holds the records.
  • The hash table has M slots (0 to M-1).
  • For any value K in the key range and some hash
    function h,
  • h(K) = i, where 0 <= i < M, and key(T[i]) = K

7
Hashing Situations
  • Hashing is appropriate for unique keys.
  • Good for both in-memory and disk based
    applications.
  • Answers the question: "What record, if any, has
    key value K?"
  • Example: Store the n records with keys in range
    0 to n-1.
  • Store the record with key i in slot i.
  • Uses the hash function h(k) = k (the identity
    function).

8
Collisions
  • A more realistic example:
  • Store about 1000 records with keys in the range
    0-16,383.
  • Impractical to keep a table of size 16,384.
  • We need a hash function to map keys to a smaller
    range.
  • Given a hash function h and different keys k1 and
    k2. Let β be a position in the hash table.
  • If h(k1) = h(k2) = β, then k1 and k2 have a
    collision at β under h.

9
Collision Resolution
  • To search for the record with key K:
  • Compute the table location h(K).
  • Starting with slot h(K), locate the record
    containing key K using (if necessary) a collision
    resolution policy.
  • Collisions are inevitable in most applications.
  • Example: In a group of 23 people, the odds are
    slightly better than even (about 50.7%) that at
    least one pair shares a birthday.
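
The birthday example can be checked numerically. The sketch below assumes each key is equally likely to land in any of m slots, which is an idealization; the function name collision_probability is invented for this example.

```python
def collision_probability(n, m):
    """Probability that n keys hashed uniformly into m slots share a slot."""
    p_distinct = 1.0
    for i in range(n):
        # The (i+1)-th key must avoid the i slots already taken.
        p_distinct *= (m - i) / m
    return 1.0 - p_distinct
```

For n = 23 and m = 365 this gives roughly 0.507, so even a table that is mostly empty should be expected to see collisions.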

10
Hash Functions
  • Must return a value within the table range.
  • Should evenly distribute the records to be stored
    among the table slots.
  • Ideally, the function should distribute records
    with equal probability to all the positions. In
    reality, how well it does usually depends on the
    data.
  • If we know nothing about the key distribution,
    evenly distribute the key range among the
    positions.
  • If we know about the key distribution, use a
    distribution-dependent hash function.

11
Example Hash Functions
  • h(key) = key % 16 - uses only the last 4 bits.
  • h(key) = key % 1000 - uses only the last 3
    digits.
  • Use % tablesize to make sure the result is in
    the range.
  • Mid-square method: square the key and take the
    middle r bits for a table of size 2^r.
  • Sum up ASCII characters and take results modulo
    tablesize (a folding technique).
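
The three kinds of hash function listed above can be sketched as follows. The function names and the key_bits parameter (an assumed upper bound on the key width, used to locate the "middle" of the square) are illustrative choices, not part of the slides.

```python
def mod_hash(key, table_size):
    """Keep only the low-order part of an integer key."""
    return key % table_size


def mid_square_hash(key, r, key_bits=16):
    """Square a key of at most key_bits bits and return the middle r bits."""
    squared = key * key                 # up to 2 * key_bits bits wide
    shift = (2 * key_bits - r) // 2     # bits to discard on the low side
    return (squared >> shift) & ((1 << r) - 1)


def ascii_sum_hash(text, table_size):
    """Fold a string by summing its character codes, modulo the table size."""
    return sum(ord(ch) for ch in text) % table_size
```

Note that the character-sum hash sends anagrams such as "abc" and "cba" to the same slot, one reason it tends to distribute poorly on real text.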

12
Collision Handling Categories
  • Open hashing - when there is a collision, put
    collided item outside the table.
  • Closed hashing - when there is a collision, put
    collided item inside the table.

13
Open Hashing
  • Look at each table element as the head of a
    linked list of items that hash to that position.
  • Can organize the linked lists in many ways:
  • Ordered by key: unsuccessful searches can stop
    early, as soon as a larger key is seen.
  • Ordered by frequency: if a few are searched for
    frequently, then this is a good technique.
  • If there are N records to be stored and the table
    is of size M then the average search length is
    O(N/M).
  • Good for internal memory. Linked nodes may be in
    different blocks on disk and cause many disk
    accesses.
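
A minimal sketch of open hashing, assuming integer keys and the simple hash key % M. For brevity each chain is a Python list standing in for the linked list described above, and the class name ChainedHashTable is hypothetical.

```python
class ChainedHashTable:
    """Open hashing: each slot heads a chain of (key, info) records."""

    def __init__(self, size):
        self.slots = [[] for _ in range(size)]

    def _chain(self, key):
        # Assumes integer keys; key % M picks the home slot.
        return self.slots[key % len(self.slots)]

    def insert(self, key, info):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, info)   # keys are unique: overwrite
                return
        chain.append((key, info))

    def search(self, key):
        for k, info in self._chain(key):
            if k == key:
                return info
        return None
```

Keys 10, 17 and 24 all hash to slot 3 of a 7-slot table, so they end up sharing one chain, and a search walks only that chain.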

14
Closed Hashing - Linear Probe
  • If the item you are looking for is not in the
    hash position, look in the next position.
  • Do the same for insert until you find an empty
    location.
  • When you reach the bottom of the table, wrap
    around to the beginning.
  • Must have at least one empty slot or there will
    be an infinite loop.
  • Tends to cause clustering, since the positions
    probed after a collision are not uniformly
    distributed (e.g., after a collision at position
    4, every key goes to position 5, then 6,
    independent of the key).
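
Linear probing for insert and search could be sketched like this, assuming integer keys, the hash key % M, and None as the empty-slot marker. The function names are illustrative. The loops are bounded by the table size, so a full table raises an error or reports a miss instead of looping forever.

```python
EMPTY = None


def probe_insert(table, key):
    """Insert key with linear probing; return the slot used."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m            # wrap around at the bottom
        if table[slot] is EMPTY or table[slot] == key:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


def probe_search(table, key):
    """Return the slot holding key, or -1 if it is not in the table."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m
        if table[slot] is EMPTY:         # an empty slot ends the search
            return -1
        if table[slot] == key:
            return slot
    return -1
```

Inserting 4, 11, 18 and 25 into a 7-slot table shows the clustering: all four have home slot 4 and fill slots 4, 5, 6 and then 0.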

15
Better Linear Probe
  • Instead of going to the next slot, skip by some
    constant c.
  • The tablesize M and c should be relatively prime.
  • This ensures that the probe sequence visits
    every slot in the table.
  • Still has some clustering.
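
The relatively-prime requirement is easy to check experimentally. The helper names below are assumptions made for this sketch.

```python
from math import gcd


def probe_sequence(home, c, m):
    """Slots visited when stepping by constant c from the home slot."""
    return [(home + i * c) % m for i in range(m)]


def covers_table(c, m):
    """True when stepping by c reaches every slot of a table of size m."""
    return gcd(c, m) == 1
```

With M = 10, a step of c = 3 visits all ten slots, while c = 2 only ever reaches the five even slots (from an even home position).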

16
Quadratic Probe
  • Instead of adding 1 to the home position, add
    i^2.
  • i is the probe number, so the offsets are 1, 4,
    9, 16, ...
  • Remember we also mod with table size.
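
A short sketch of the quadratic probe sequence, with the offset i^2 taken modulo the table size. The function name is hypothetical.

```python
def quadratic_probe_sequence(home, m, probes):
    """First few slots visited: home, home + 1, home + 4, home + 9, ..."""
    return [(home + i * i) % m for i in range(probes)]
```

For home slot 3 in an 11-slot table the first five probes are 3, 4, 7, 1, 8. Unlike the constant-step probe, this sequence need not reach every slot: with M = 11 it visits only 6 distinct slots.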

17
Double Hashing
  • After a collision, use a different hash function.
  • Eliminates clustering to some degree.
  • For example, if h(k) causes a collision, then
    use the probe offset
  • p(k, i) = i * h2(k)
  • h2 is a different hash function
  • Keys with the same home position generate
    different probe sequences
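
The probe offset p(k, i) = i * h2(k) can be sketched as below. The particular choice h2(k) = 1 + k % (M - 1) is an assumption for illustration (it is never 0, so the probe always moves); the slides do not specify h2.

```python
def h1(key, m):
    """Primary hash: the home position."""
    return key % m


def h2(key, m):
    """Secondary hash (assumed form); always in 1..m-1, never 0."""
    return 1 + key % (m - 1)


def double_hash_sequence(key, m, probes):
    """Slots visited: h1(k) + i * h2(k), modulo the table size."""
    return [(h1(key, m) + i * h2(key, m)) % m for i in range(probes)]
```

With M = 11, keys 14 and 25 share home slot 3 but then diverge: 14 probes 3, 8, 2, 7 and 25 probes 3, 9, 4, 10.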

18
Analysis of Closed Hashing
  • Load factor lf = N/M
  • N is the number of records
  • M is the size of the table
  • N/M is the fraction of the table that is full
  • The larger the load factor the greater the
    probability of a collision
  • Average search length is O(1/(1-lf))
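
To see how quickly the cost grows, the sketch below evaluates 1/(1 - lf) for a given N and M. Treating this expression as an exact probe count is an idealization (it is the usual estimate when probe positions are assumed to be uniformly random), and the function name is made up for this example.

```python
def expected_probes(n, m):
    """Estimate 1 / (1 - lf) for n records in a table of m slots (n < m)."""
    lf = n / m
    return 1.0 / (1.0 - lf)
```

A half-full table gives about 2 probes, while a table that is 90% full gives about 10, which is why closed hash tables are usually kept well below full.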

19
Deletions
  • If we delete a value it may stop the search
    prematurely (break the chain).
  • Use a special mark to indicate something was
    deleted. When searching, continue past this
    mark rather than stopping as if the slot were
    empty.
  • Once we have many deleted items we may wish to
    rehash everything remaining.
  • Best if we rehash the most frequently accessed
    items first.
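
The special deletion mark (often called a tombstone) might be sketched as follows for a linear-probe table. It assumes integer keys, the hash key % M, and that insert is only called for keys not already present; the names EMPTY, DELETED, find_slot, insert and delete are illustrative.

```python
EMPTY = object()      # slot has never held a record
DELETED = object()    # slot held a record that was removed


def find_slot(table, key):
    """Return the slot holding key, or -1. Deleted marks do not stop it."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m
        if table[slot] is EMPTY:
            return -1
        if table[slot] is not DELETED and table[slot] == key:
            return slot
    return -1


def insert(table, key):
    """Place key in the first empty or deleted slot on its probe path."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m
        if table[slot] is EMPTY or table[slot] is DELETED:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


def delete(table, key):
    """Replace key with the deleted mark; return its old slot or -1."""
    slot = find_slot(table, key)
    if slot >= 0:
        table[slot] = DELETED
    return slot
```

After inserting 4, 11 and 18 into a 7-slot table and deleting 11, a search for 18 still succeeds because it steps over the mark in slot 5, and a later insert can reuse that slot.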