Skip List PowerPoint PPT Presentation


Title: Skip List


1
Skip List & Hashing
  • CSE, POSTECH

2
Introduction
  • The search operation on a sorted array using the
    binary search method takes O(log n) time
  • The search operation on a sorted chain takes O(n) time
  • How can we improve the search performance of a
    sorted chain?
  • By putting additional pointers in some of the
    chain nodes
  • Chains augmented with additional forward pointers
    are called skip lists

3
Dictionary
  • A dictionary is a collection of elements
  • Each element has a field called key
  • (key, value)
  • Every key is usually distinct
  • Typical dictionary operations are
  • Determine whether or not the dictionary is empty
  • Determine the dictionary size (i.e., the number of pairs)
  • Insert a pair into the dictionary
  • Search the pair with a specified key
  • Delete the pair with a specified key

4
Accessing Dictionary Elements
  • Random Access
  • Any element in the dictionary can be retrieved by
    simply performing a search on its key
  • Sequential Access
  • Elements are retrieved one by one in ascending
    order of the key field
  • Sequential Access Operations
  • Begin retrieves the element with smallest key
  • Next retrieves the next element

5
Dictionary with Duplicates
  • Keys are not required to be distinct
  • Word dictionary is such an example
  • Pairs are of the form (word, meaning)
  • May have two or more entries for the same word
  • For example, the meanings of the word, rank
  • (rank, a relative position in a society)
  • (rank, an official position or grade)
  • (rank, to give a particular order or position to)
  • etc.

6
Application of Dictionary
  • Collection of student records in a class
  • (key, value) = (student-number, a list of
    assignment and exam marks)
  • All keys are distinct
  • Get the element whose key is Tiger Woods
  • Update the element whose key is Seri Pak
  • Read Examples 10.1, 10.2, and 10.3
  • Exercise: Give other real-world applications of
    dictionaries and/or dictionaries with duplicates

7
Dictionary ADT Class Definition
  • See ADT 10.1 for the abstract data type
    Dictionary
  • See Program 10.1 for the abstract class
    Dictionary

8
Dictionary as an Ordered Linear List
  • L = (e1, e2, e3, ..., en)
  • Each ei is a pair (key, value)
  • Array or chain representation
  • unsorted array: O(n) search time
  • sorted array: O(log n) search time
  • unsorted chain: O(n) search time
  • sorted chain: O(n) search time
  • See Program 10.2 (find), 10.3 (insert), 10.4
    (erase) of the class sortedChain

9
Skip Lists
  • Skip lists improve the performance of insert and
    delete operations
  • Employ a randomization technique to determine
    where to put additional forward pointers and
    how many to add
  • The expected performance of search and delete
    operations on skip lists is O(log n)
  • However, the worst-case performance is Θ(n)

10
Dictionary as a Skip List
  • Read Example 10.4 and see Figure 10.1 for
  • A sorted chain with head and tail nodes
  • Adding forward pointers
  • Search and insert operations in skip lists
  • For general n, the level 0 chain includes all
    elements
  • Level 1 chain includes every second element
  • Level 2 chain includes every fourth element
  • Level i chain includes every 2^i-th element
  • An element is a level i element iff it is in the
    chains for levels 0 through i

11
Skip List pointers, search, insert

12
Skip List Insertions & Deletions
  • When insertions or deletions occur, maintaining the
    regular structure of a skip list requires O(n) work
  • When an insertion is made, the pair level is i
    with probability 1/2^i
  • More generally, we can assign the newly inserted
    pair to level i with probability p^i
  • For general p, the number of chain levels is
    ⌊log_{1/p} n⌋ + 1
  • See Figure 10.1(d) for inserting 77
  • We have no control over the structure that is
    left following a deletion

13
Skip List Assigning Levels
  • The level assignment of newly inserted pair is
    done using a random number generator (0 to
    RAND_MAX)
  • The probability that the next random number is <=
    CutOff = p * RAND_MAX is p
  • The following is used to assign a level number
  • int lev = 0;
  • while (rand() <= CutOff) lev++;
  • In a regular skip list structure with N pairs,
    the maximum level is ⌈log_{1/p} N⌉ - 1
  • Read Example 10.5
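
The two-line loop above can be wrapped in a small function. This is a minimal sketch, not the textbook's code: the function name `randomLevel`, the parameter names, and the cap at `maxLevel` are assumptions made for illustration (the cap mirrors the maximum-level bullet above).

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: returns a level in [0, maxLevel].
// Without the cap, a pair reaches level i or higher with probability
// of about p^i.
int randomLevel(double p, int maxLevel)
{
    assert(p > 0.0 && p < 1.0 && maxLevel >= 0);
    // A random number <= cutOff occurs with probability of about p.
    const double cutOff = p * RAND_MAX;
    int lev = 0;
    while (lev < maxLevel && std::rand() <= cutOff)
        ++lev;
    return lev;
}
```

With p = 0.5, roughly half of the inserted pairs stay at level 0, a quarter stop at level 1, and so on, which approximates the regular structure on average without any rebuilding.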

14
Skip List Class definition
  • The class definition for skipNode is in Program
    10.5
  • The data members of the class skipList are defined
    in Program 10.6
  • See Programs 10.7 through 10.12 for skipList operations
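
For readers without the textbook at hand, the following is a rough sketch of what a skip list with find and insert might look like. It is not Programs 10.5 through 10.12: the names `SkipNode` and `SkipList`, the `int` keys, the default `maxLevel` of 8, and the use of `nullptr` in place of a tail node are all assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical minimal skip list of distinct int keys.
struct SkipNode {
    int key;
    std::vector<SkipNode*> next;   // next[i] = successor on the level-i chain
    SkipNode(int k, int levels) : key(k), next(levels, nullptr) {}
};

class SkipList {
public:
    explicit SkipList(int maxLevel = 8, double p = 0.5)
        : maxLevel_(maxLevel), cutOff_(p * RAND_MAX),
          head_(new SkipNode(0, maxLevel + 1))
    {
        assert(maxLevel >= 0 && p > 0.0 && p < 1.0);
    }

    ~SkipList()
    {
        // Every node is on the level-0 chain, so walking it frees them all.
        for (SkipNode* n = head_; n != nullptr; ) {
            SkipNode* following = n->next[0];
            delete n;
            n = following;
        }
    }
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;

    bool find(int key) const
    {
        const SkipNode* cur = head_;
        // Start on the highest chain; drop one level whenever the next
        // key on the current chain would overshoot.
        for (int i = maxLevel_; i >= 0; --i)
            while (cur->next[i] != nullptr && cur->next[i]->key < key)
                cur = cur->next[i];
        const SkipNode* candidate = cur->next[0];
        return candidate != nullptr && candidate->key == key;
    }

    void insert(int key)
    {
        // last[i] = last node on level i whose key is smaller than key.
        std::vector<SkipNode*> last(maxLevel_ + 1, head_);
        SkipNode* cur = head_;
        for (int i = maxLevel_; i >= 0; --i) {
            while (cur->next[i] != nullptr && cur->next[i]->key < key)
                cur = cur->next[i];
            last[i] = cur;
        }
        if (cur->next[0] != nullptr && cur->next[0]->key == key)
            return;                        // keys are distinct: ignore duplicates

        int lev = 0;                       // random level assignment
        while (lev < maxLevel_ && std::rand() <= cutOff_)
            ++lev;

        SkipNode* node = new SkipNode(key, lev + 1);
        for (int i = 0; i <= lev; ++i) {   // splice into chains 0..lev
            node->next[i] = last[i]->next[i];
            last[i]->next[i] = node;
        }
    }

private:
    int maxLevel_;
    double cutOff_;
    SkipNode* head_;   // head node; its key is never compared
};
```

Both operations follow the same top-down walk, so their expected cost is governed by the number of levels plus the few nodes visited on each level.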

15
Hash Table
  • A hash table is an alternative method for
    representing a dictionary
  • In a hash table, a hash function is used to map
    keys into positions in a table. This act is
    called hashing
  • The ideal hashing case: if a pair p has the key k
    and f is the hash function, then p is stored in
    position f(k) of the table
  • Hash table is used in many real world
    applications!

16
Hash Table
  • Hash Table Operations
  • Search: compute f(k) and see if a pair exists
  • Insert: compute f(k) and place the pair in that
    position
  • Delete: compute f(k) and delete the pair in that
    position
  • In the ideal situation, hash table search, insert,
    or delete takes Θ(1) time
  • Read Examples 10.6 and 10.7

17
Ideal Hashing Example
  • Pairs are (22,a),(33,c),(3,d),(72,e),(85,f)
  • Hash table is ht[0:7], b = 8 (where b is the
    number of positions in the hash table)
  • Hash function f is key % b = key % 8
  • Where are the pairs stored?
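
The question above can be answered by evaluating the hash function for each key. The sketch below assumes non-negative integer keys; the function names `f` and `printIdealPlacement` are illustrative only.

```cpp
#include <cassert>
#include <cstdio>

// f(k) = k % b for non-negative integer keys.
int f(int key, int b)
{
    assert(key >= 0 && b > 0);
    return key % b;
}

// Prints the table position of each pair from the slide, using b = 8.
void printIdealPlacement()
{
    const int b = 8;
    const int keys[] = {22, 33, 3, 72, 85};
    const char values[] = {'a', 'c', 'd', 'e', 'f'};
    for (int i = 0; i < 5; ++i)
        std::printf("(%d,%c) -> ht[%d]\n", keys[i], values[i], f(keys[i], b));
}
```

The five pairs land in positions 6, 1, 3, 0, and 5 respectively. All positions are distinct, which is what makes this example ideal.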

18
What Can Go Wrong? - Collision
  • Where does (25,g) go?
  • The home bucket for (25,g) is already occupied by
    (33,c)
  • This situation is called a collision
  • Keys that have the same home bucket are called
    synonyms
  • 25 and 33 are synonyms with respect to the hash
    function that is in use

19
What Can Go Wrong? - Overflow
  • A collision occurs when the home bucket for a new
    pair is occupied by a pair with different key
  • An overflow occurs when there is no space in the
    home bucket for the new pair
  • When a bucket can hold only one pair, collisions
    and overflows occur together
  • Need a method to handle overflows

20
Hash Table Issues
  • The choice of hash function
  • Overflow handling
  • The size (number of buckets) of hash table

21
Hash Functions
  • Two parts
  • Convert the key into an integer if it is not
    already one
  • Map an integer into a home bucket
  • f(k) is an integer in the range [0, b-1], where b
    is the number of buckets in the table

22
Converting String to Integer
  • Let us assume that each character is 2 bytes long
  • Let us assume that an integer is 4 bytes long
  • A 2 character string s may be converted into a
    unique 4 byte integer using the following code
  • int answer = (int) s[0];
  • answer = (answer << 16) + (int) s[1];
  • In this case, strings that are longer than 2
    characters do not have a unique integer
    representation
  • Read Example 10.8 and see Program 10.13
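
A compilable version of the two-character conversion might look as follows. This is a sketch under the slide's assumptions (2-byte characters, 4-byte integers), expressed with `std::u16string` and `std::uint32_t`; the function name `twoCharsToInt` is hypothetical, and an unsigned type is used here to avoid signed overflow when the first character has its high bit set.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Packs a 2-character string of 16-bit characters into one 32-bit integer:
// the first character fills the high 16 bits, the second the low 16 bits.
std::uint32_t twoCharsToInt(const std::u16string& s)
{
    assert(s.size() == 2);
    std::uint32_t answer = static_cast<std::uint32_t>(s[0]);
    answer = (answer << 16) + static_cast<std::uint32_t>(s[1]);
    return answer;
}
```

Because each character occupies its own 16-bit half, two different 2-character strings can never map to the same integer. A third character would have nowhere to go without overlapping, which is why longer strings lose uniqueness.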

23
Mapping Into a Home Bucket
  • Most common method is by division
  • homeBucket = k % divisor
  • The divisor equals the number of buckets b
  • 0 <= homeBucket < divisor = b

24
Overflow Handling
  • Search the hash table in some systematic fashion
    for a bucket that is not full
  • Linear probing (linear open addressing)
  • Quadratic probing
  • Random probing
  • Eliminate overflows by permitting each bucket to
    keep a list of all pairs for which it is home
    bucket
  • Array linear list
  • Chain

25
Hashing with Linear Open Addressing
  • If a collision occurs, insert the entry into the
    next available bucket regarding the table as
    circular
  • Example
  • the size of the hash table is b = 11
  • f(k) = k % b
  • after inserting the three keys 80, 40, and 65

26
Linear Open Addressing
  • Example
  • after inserting the two keys 58 (collision) and
    24
  • after inserting the key 35 (collision)
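
Since the figures for this example did not survive in the transcript, the insertions can be replayed with a short sketch. It assumes non-negative integer keys and uses -1 as the marker for an empty bucket; `EMPTY` and `insertLinear` are illustrative names, not the textbook's hashTable class.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;   // assumed marker for an unused bucket (keys are >= 0)

// Inserts key using f(k) = k % b and linear probing, treating the table as
// circular. Returns the bucket used, or -1 if the table is full.
int insertLinear(std::vector<int>& ht, int key)
{
    assert(key >= 0 && !ht.empty());
    const int b = static_cast<int>(ht.size());
    const int home = key % b;
    int j = home;
    do {
        if (ht[j] == EMPTY || ht[j] == key) {
            ht[j] = key;
            return j;
        }
        j = (j + 1) % b;    // wrap around at the end of the table
    } while (j != home);
    return -1;
}
```

With b = 11, the keys 80, 40, and 65 go to their home buckets 3, 7, and 10. Key 58 collides at bucket 3 and moves to 4, key 24 goes to bucket 2, and key 35 collides at bucket 2, skips the occupied buckets 3 and 4, and ends up in bucket 5.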

27
Linear Open Addressing
  • Search operation
  • The search begins at the home bucket f(k) of the
    key k
  • Continue the search by examining successive
    buckets in the table until one of the following
    happens
  • (c1) A bucket containing an element with key k is
    reached
  • (c2) An empty bucket is reached
  • (c3) We return to the home bucket
  • In the cases of (c2) and (c3), the table contains
    no element with key k
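
The three stopping conditions map directly onto a loop. The sketch below makes the same assumptions as a simple integer table would: non-negative keys, -1 as the empty marker, and the hypothetical names `EMPTY` and `searchLinear`.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;   // assumed marker for an unused bucket (keys are >= 0)

// Returns the index of the bucket holding key, or -1 if key is absent.
int searchLinear(const std::vector<int>& ht, int key)
{
    assert(key >= 0 && !ht.empty());
    const int b = static_cast<int>(ht.size());
    const int home = key % b;
    int j = home;
    do {
        if (ht[j] == key) return j;       // (c1) found the key
        if (ht[j] == EMPTY) return -1;    // (c2) reached an empty bucket
        j = (j + 1) % b;
    } while (j != home);                  // (c3) back at the home bucket
    return -1;
}
```

Condition (c3) only triggers when the table is completely full, which is one reason to keep the table from filling up.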

28
Linear Open Addressing
  • Delete operation
  • Perform the search operation to find the bucket
    for key k
  • Clear the bucket
  • Then do either one of the following
  • Move zero or more elements to fill the empty
    bucket
  • Introduce and use the NeverUsed field in each
    bucket (Read how this is done on page 388)
  • See Programs 10.16-10.19 for hashTable class
    definition and operations
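
The second option can be sketched as follows. This is one plausible reading of the NeverUsed idea, not the textbook's code: the `Bucket` layout and the function names are assumptions. A search stops only at a bucket that has never held a pair, so clearing a bucket does not cut off pairs that were placed beyond it.

```cpp
#include <cassert>
#include <vector>

struct Bucket {
    int key = 0;
    bool occupied = false;    // currently holds a pair
    bool neverUsed = true;    // becomes false once any pair is stored here
};

// Returns the bucket holding key, or -1 if key is absent.
int findBucket(const std::vector<Bucket>& ht, int key)
{
    assert(key >= 0 && !ht.empty());
    const int b = static_cast<int>(ht.size());
    const int home = key % b;
    int j = home;
    do {
        if (ht[j].neverUsed) return -1;   // no pair was ever placed past here
        if (ht[j].occupied && ht[j].key == key) return j;
        j = (j + 1) % b;
    } while (j != home);
    return -1;
}

// Returns false only if the table has no free bucket.
bool insertKey(std::vector<Bucket>& ht, int key)
{
    if (findBucket(ht, key) != -1) return true;   // already present
    const int b = static_cast<int>(ht.size());
    const int home = key % b;
    int j = home;
    do {
        if (!ht[j].occupied) {            // free: never used or cleared
            ht[j].key = key;
            ht[j].occupied = true;
            ht[j].neverUsed = false;
            return true;
        }
        j = (j + 1) % b;
    } while (j != home);
    return false;
}

bool eraseKey(std::vector<Bucket>& ht, int key)
{
    const int j = findBucket(ht, key);
    if (j == -1) return false;
    ht[j].occupied = false;               // clear it; neverUsed stays false
    return true;
}
```

The trade-off is that after many deletions few buckets remain marked as never used, so unsuccessful searches get longer until the table is reorganized.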

29
Performance of Linear Probing
  • The worst-case search/insert/delete time is
    Θ(n), where n is the number of pairs in the table
  • When does the worst-case happen?
  • When all n key values have the same home bucket
  • For the worst case, the performance of hash table
    and linear list are the same
  • However, for average performance, hashing is much
    better

30
Expected (Average) Performance
  • alpha = loading factor = n / b
  • Sn = average number of buckets examined in a
    successful search
  • Un = average number of buckets examined in an
    unsuccessful search
  • Time to insert and delete is governed by Un.

31
Expected Performance
  • Sn ≈ ½ (1 + 1/(1 - alpha))
  • Un ≈ ½ (1 + 1/(1 - alpha)^2)
  • Note that 0 <= alpha <= 1; the formulas are defined
    only for alpha < 1.
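
The two approximations are easy to evaluate for any loading factor. A minimal sketch, with the hypothetical function names `expectedSn` and `expectedUn`:

```cpp
#include <cassert>

// Approximate average number of buckets examined in a successful search
// under linear probing, for loading factor alpha.
double expectedSn(double alpha)
{
    assert(alpha >= 0.0 && alpha < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}

// Approximate average number of buckets examined in an unsuccessful search.
double expectedUn(double alpha)
{
    assert(alpha >= 0.0 && alpha < 1.0);
    const double d = 1.0 - alpha;
    return 0.5 * (1.0 + 1.0 / (d * d));
}
```

For alpha = 0.5 these give 1.5 and 2.5 buckets. Un grows much faster than Sn as alpha approaches 1, because it depends on the square of 1/(1 - alpha).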

alpha   Sn (buckets)   Un (buckets)
0.50    1.5            2.5
0.75    2.5            8.5
0.90    5.5            50.5

32
Hash Table Design
  • In practice, the choice of the divisor D (i.e.,
    the number of buckets b) has a significant effect
    on the performance of hashing
  • Best results are obtained when D is either a
    prime number or has no prime factors less than 20
  • The key question is how to determine D (see the
    next slide)
  • Read Example 10.12

33
Methods for Determining D
  • Method 1
  • First, determine what constitutes acceptable
    performance.
  • Using the formulas for Un and Sn, determine the
    largest alpha that can be used.
  • From the value of n and the computed value of
    alpha, obtain the smallest permissible value for
    b.
  • Method 2
  • Begin with the largest possible value for b as
    determined by the max. amount of space available.
  • Then find the largest D no larger than this
    largest value that is either a prime or has no
    factors smaller than 20.
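
Method 1 can be sketched in code by inverting the linear-probing formulas Sn ≈ ½ (1 + 1/(1 - alpha)) and Un ≈ ½ (1 + 1/(1 - alpha)^2) for alpha. The function names and the sample targets in the note below are illustrative assumptions. The last step, which searches upward from the smallest permissible b for a suitable D, combines Method 1 with the divisor rule and is a design choice of this sketch.

```cpp
#include <cassert>
#include <cmath>

// Largest alpha for which Sn <= maxSn and Un <= maxUn both hold.
double largestAlpha(double maxSn, double maxUn)
{
    assert(maxSn > 1.0 && maxUn > 1.0);
    // Sn <= s  is equivalent to  alpha <= 1 - 1/(2s - 1)
    const double fromSn = 1.0 - 1.0 / (2.0 * maxSn - 1.0);
    // Un <= u  is equivalent to  alpha <= 1 - 1/sqrt(2u - 1)
    const double fromUn = 1.0 - 1.0 / std::sqrt(2.0 * maxUn - 1.0);
    return fromSn < fromUn ? fromSn : fromUn;
}

// True if d is prime or has no prime factor smaller than 20.
bool isAcceptableDivisor(int d)
{
    assert(d >= 2);
    const int smallPrimes[] = {2, 3, 5, 7, 11, 13, 17, 19};
    for (int q : smallPrimes)
        if (d % q == 0)
            return d == q;   // only acceptable if d is that prime itself
    return true;
}

// Smallest acceptable D that is at least ceil(n / alpha).
int smallestDivisor(int n, double alpha)
{
    assert(n > 0 && alpha > 0.0 && alpha < 1.0);
    int d = static_cast<int>(std::ceil(n / alpha));
    if (d < 2) d = 2;
    while (!isAcceptableDivisor(d))
        ++d;
    return d;
}
```

For instance, with n = 1000 and the made-up targets Sn <= 4 and Un <= 50.5, the Sn bound is the tighter one (alpha <= 6/7), so b must be at least 1167, and the first acceptable divisor at or above that is 1171.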

34
Hashing with Chains
  • Hash table can handle overflows using chaining
  • Each bucket keeps a chain of all pairs for which
    it is the home bucket (see Figure 10.3)
  • The chain may or may not be sorted by key
  • See Program 10.20 for hashChains methods

35
Hash Table with Sorted Chains
  • Put in pairs whose keys are 6, 12, 34, 29, 28, 11,
    23, 7, 0, 33, 30, 45
  • Home bucket = key % 17.
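
The figure for this example is not part of the transcript, but the table can be rebuilt with a short sketch. It is not the textbook's hashChains class: it stores bare non-negative integer keys in `std::list` chains, and the names `Table`, `insertSorted`, and `contains` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// One sorted chain per bucket; the number of buckets is the divisor.
using Table = std::vector<std::list<int>>;

// Inserts key into the chain of its home bucket, keeping the chain sorted.
void insertSorted(Table& table, int key)
{
    assert(key >= 0 && !table.empty());
    std::list<int>& chain = table[static_cast<std::size_t>(key) % table.size()];
    auto it = chain.begin();
    while (it != chain.end() && *it < key)
        ++it;
    if (it == chain.end() || *it != key)   // skip duplicate keys
        chain.insert(it, key);
}

bool contains(const Table& table, int key)
{
    assert(key >= 0 && !table.empty());
    const std::list<int>& chain =
        table[static_cast<std::size_t>(key) % table.size()];
    for (int k : chain) {
        if (k == key) return true;
        if (k > key) return false;         // sorted chain: stop early
    }
    return false;
}
```

With 17 buckets and the keys listed above, the non-empty chains are bucket 0: 0, 34; bucket 6: 6, 23; bucket 7: 7; bucket 11: 11, 28, 45; bucket 12: 12, 29; bucket 13: 30; and bucket 16: 33.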

36
Exercise & Reading
  • Exercise
  • Suppose we are hashing integers with a 7-bucket
    hash table using the hash function f(k) = k % 7.
  • (a) Show the hash table if 1, 8, 23, 40, 51, 69,
    70 are to be inserted. Use the linear open
    addressing method to resolve collisions.
  • (b) Repeat part (a) using chaining to resolve
    collisions. Assume the chain is sorted.
  • Read Chapter 10