Chapter 4: Sorting (PowerPoint presentation transcript)
1
Chapter 4 Sorting
  • Sorting is an important application to study
  • Sorting is used in numerous applications
  • As time has gone on, technology has permitted
    easier sorting
  • For instance, it is no longer necessary (usually)
    to worry about how to keep portions of a list on
    disk because memory is too small
  • From an algorithmic perspective, we find many
    different strategies for sorting
  • Each has its own advantages and disadvantages
  • We will study several sorting algorithms here and
    determine their average and worst-case
    complexities
  • We will also note the amount of temporary storage
    required for each so that we can compare their
    space complexity as well as their time complexity
  • Many of the algorithms we will view will be
    limited to sorting lists implemented as arrays,
    but some also accommodate linked lists

2
Insertion Sort
  • Basic idea
  • given a new item, insert it into an already
    ordered list
  • repeat this process for each item in the list
    until the list is sorted
  • for instance, insert the second element with
    respect to the first, then insert the third
    element with respect to the first and second
    (which are already sorted), etc
  • The insertion portion can be done either
    iteratively or recursively (see example to the
    right)
  • Notice that the given code has two comparisons
    (in bold on the original slide), one being the
    base case

int shiftVacRec(Element[] E, int vacant, Key x) {
    int xLoc;
    if (vacant == 0)                  // base case: reached the front
        xLoc = vacant;
    else if (E[vacant-1].key <= x)    // proper place found
        xLoc = vacant;
    else {
        E[vacant] = E[vacant-1];      // shift the larger item right
        xLoc = shiftVacRec(E, vacant-1, x);
    }
    return xLoc;
}

3
Complexity
  • The recurrence equation for shiftVacRec is easy
    to derive
  • T(n) = T(n-1) + 1, with the base case being
    T(0) = 1
  • Again remember that we are only counting
    comparisons
  • T(n) = T(n-1) + 1 = T(n-2) + 2 = T(n-3) + 3 = ...
    = T(0) + n = n + 1, which is in Θ(n)
  • Therefore, to insert 1 item into an already
    sorted list is in Θ(n)
  • How do we perform the insertion sort? We use
    shiftVacRec n-1 times, once for each item in the
    array
  • However, notice that n, from T(n), grows (or
    changes) from call to call
  • The number of items in the array for shiftVacRec
    to work with increases from 1 for the first
    iteration to n-1 for the last
  • So in fact, we have, in the worst case, a
    sequence of 1 comparison the first time, 2 the
    second, 3 the third, etc., up to n-1 the last time
  • We saw previously that the summation from 1 to
    n-1 is n(n-1)/2
  • So, insertion sort takes ½(n² - n) comparisons,
    which is in Θ(n²)

4
Iterative Insertion Sort
  • If n is large, the number of recursive calls
    makes our shiftVacRec inefficient because of the
    overhead of a procedure call
  • Instead we might choose to implement insertion
    sort iteratively by replacing procedure calls
    with a loop
  • we will find other sorting algorithms where we
    must use recursion, but here we do not HAVE to
  • This code is given to the right
  • It should be easy to see that the while loop will
    execute at most 1 time for the first iteration of
    xindex, 2 times for the second, etc., and n-1
    times for the last, so iterative insertion sort
    does the same amount of work: it is in Θ(n²)
void insertionSort(Element[] E, int n) {
    int xindex, xLoc; Element current;
    for (xindex = 1; xindex < n; xindex++) {
        current = E[xindex];          // the item being inserted
        xLoc = shiftVac(E, xindex, current);
        E[xLoc] = current;
    }
    return;
}

int shiftVac(Element[] E, int xindex, Element x) {
    int vacant, xLoc;
    vacant = xindex; xLoc = 0;        // if no smaller key is found, x goes to index 0
    while (vacant > 0) {
        if (E[vacant-1] <= x) { xLoc = vacant; break; }
        E[vacant] = E[vacant-1];      // shift the larger item right
        vacant--;
    }
    return xLoc;
}
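For concreteness, here is a minimal runnable Java version of the same
iterative algorithm, with the shift folded inline and the book's
Element/Key types replaced by plain ints (an assumption for brevity):

// Sketch of iterative insertion sort for int keys.
public class InsertionSortDemo {
    static void insertionSort(int[] e) {
        for (int xindex = 1; xindex < e.length; xindex++) {
            int current = e[xindex];   // the item being inserted
            int vacant = xindex;
            // shift larger items right until the vacancy is in place
            while (vacant > 0 && e[vacant - 1] > current) {
                e[vacant] = e[vacant - 1];
                vacant--;
            }
            e[vacant] = current;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1, 7};
        insertionSort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 2, 5, 7, 9]
    }
}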

5
Average Case Complexity
  • The average case complexity is not as easy to
    determine
  • First, assume that no values of the array are
    equal
  • Next, assume that a value to be inserted has an
    equal chance of being inserted in any location
  • There are i+1 locations to insert the ith item
    (before the first, between the first and second,
    etc., after the last)
  • In order to determine the average case, we
    compute the average cost of inserting item i over
    each of the i+1 possibilities
  • For each of the i+1 possibilities, there is 1
    comparison (the if-statement), and each
    possibility has a 1/(i+1) chance, so we have

1/(i+1) · Σ(j = 1 to i) j + i/(i+1)
  = (1/(i+1)) · i(i+1)/2 + i/(i+1)
  = i/2 + i/(i+1) comparisons to insert the ith item
(remember i ranges from 1 to n-1)
6
Average Case Continued
  • So, all of insertion sort takes
  • A(n) = Σ(i = 1 to n-1) [i/2 + i/(i+1)]
  •      = Σ(i = 1 to n-1) [i/2 + 1 - 1/(i+1)]
  •      = (n-1)n/4 + (n-1) - Σ(i = 1 to n-1) 1/(i+1)
  • The first term above is (n² - n)/4, the second
    term is n-1, and the last term is roughly equal
    to ln n
  • see Example 1.7 on pages 26-27 for an explanation
  • Therefore, insertion sort, on average, takes
    (n² - n)/4 + n - 1 - ln n = (n² + 3n - 4)/4 - ln n
    ≈ n²/4, so insertion sort's average case is in
    Θ(n²)
  • Insertion sort has a space complexity of Θ(n)
    because it is an in-place sort; that is, it
    does not copy the array and only needs 1
    temporary variable for x (the item being
    inserted)

7
Lower Bound on in-place Sorts
  • The idea behind insertion sort is to sort
    in-place by taking one element and finding its
    proper position
  • Other sorting algorithms are similar
  • The common aspect is that these sorting
    algorithms place 1 item in its proper position
    for each iteration
  • What is the lower bound on such an algorithm?
  • Consider that in such an algorithm, we are
    comparing some portion of an already sorted array
    to a new item
  • Such a comparison must, in the worst case, check
    the item to be placed against all items in the
    already sorted portion of the array
  • If the comparison is to the entire array (as is
    the case with Selection Sort) then we have n
    comparisons performed n-1 times
  • If the comparison is to a growing portion of the
    array, we have a sequence of comparisons of
    1 + 2 + 3 + ... + (n-1) = n(n-1)/2
  • Therefore, the minimum amount of work in which
    exactly 1 item is placed per iteration is in
    Ω(n²)
  • Does this mean insertion sort is an optimal sort?

8
Divide and Conquer Sorts
  • Let us apply recursion to sorting as follows
  • The sorting problem is solved by
  • Dividing the array into smaller arrays
  • Sorting the smaller arrays
  • Combining the sorted smaller arrays into a larger
    array
  • See the algorithm to the right
  • Our algorithm's complexity is described by the
    following recurrence relation
  • T(n) = D(n) + Σ S(size of part i) + C(n)
  • We must determine how to divide the arrays (D),
    how to sort each one (S), and how to combine them
    when done (C)
  • We might find that S is simplified by recursion
solve(I) {
    n = size(I)
    if (n <= smallSize)
        solution = directlySolve(I)
    else {
        divide I into I1, ..., Ik
        for each i in {1, ..., k}
            Si = solve(Ii)
        solution = combine(S1, ..., Sk)
    }
    return solution
}

9
Quicksort
  • In Quicksort, we use the exact strategy described
    previously as follows
  • Before dividing, move elements so that, given
    some element at index x, all elements left of x
    are less than E[x] and all elements right of x
    are greater than E[x]
  • Now we divide by simply repeating this on the
    left hand side and right hand side of x
  • Combining is not necessary (it is simply a return
    statement)
  • This brings about some questions
  • What element should be x? Is the choice of x
    important?
  • How do we move elements so that x is positioned
    correctly?
  • What kind of complexity will this algorithm yield?

10
Quicksort Continued
  • We divide Quicksort into two procedures, the main
    Quicksort procedure finds x and recursively calls
    itself with the left-hand and right-hand sides
  • Partition will be used to move the elements of
    the array around so that x falls in its proper
    place with respect to all other elements
  • That is, E[y] ≤ E[x] ≤ E[z] where y < x < z
  • Partition does most of the work of the Quicksort
    algorithm
  • How might Partition work?
  • Move from right-to-left until we find an item
    less than E[x]; we move that item into the vacant
    position
  • Move from left-to-right (after x) until we find
    an item greater than E[x] and move that to the
    newly freed position
  • Repeat until we meet in the middle somewhere and
    place E[x] there
  • Partition will take n-1 comparisons

11
Quicksort Algorithm
void quicksort(Element[] E, int first, int last) {
    int splitPoint; Element pivot;
    if (first < last) {
        pivot = E[first];
        splitPoint = partition(E, pivot, first, last);
        E[splitPoint] = pivot;
        quicksort(E, first, splitPoint - 1);
        quicksort(E, splitPoint + 1, last);
    }
    return;
}

int partition(Element[] E, Element pivot, int first, int last) {
    int lowVac = first, highVac = last;
    while (lowVac < highVac) {
        // scan right-to-left for an element less than the pivot
        while (lowVac < highVac && E[highVac] >= pivot)
            highVac--;
        E[lowVac] = E[highVac];   // move it into the vacancy on the left
        // scan left-to-right for an element greater than the pivot
        while (lowVac < highVac && E[lowVac] <= pivot)
            lowVac++;
        E[highVac] = E[lowVac];   // move it into the vacancy on the right
    }
    return lowVac;                // the vacancy where the pivot belongs
}

12
Quicksort Analysis
  • Quicksort has a recurrence equation of
  • T(n) = T(n-r-1) + T(r) + n - 1
  • Where r is the number of elements to the pivot's
    right
  • notice the "-1" in n-r-1 because the pivot is
    already in its proper place
  • How do we solve T(n) when r changes each time?
  • The worst case occurs when r = 0 or r = n-1, so
    let's use r = 0
  • This gives us T(n) = T(n-1) + T(0) + n - 1, with
    T(0) = T(1) = 0
  • So T(n) = T(n-1) + n - 1 = T(n-2) + (n-2) + (n-1)
    = ... = 1 + 2 + ... + (n-1) = n(n-1)/2
  • So, Quicksort's worst case is in Θ(n²)

13
Quicksort Analysis continued
  • What about Quicksort's average case?
  • The worst case happens when r = 0 or r = n-1;
    what will r typically be?
  • On average, r will be between these two extremes,
    and the average value of r is then
  • (1/n) Σ(i = 0 to n-1) i = n(n-1)/(2n) = (n-1)/2
  • So, the average case complexity has the following
    recurrence equation
  • T(n) = T((n-1)/2) + T((n-1)/2) + n - 1
         = 2·T((n-1)/2) + n - 1
  • Using the Master Theorem from chapter 3, we have
    f(n) = n - 1, b = 2 and c = 2, so that E = 1, and
    that means that T(n) is in Θ(n log n)
  • A more formal analysis is given on pages 167-168
    if you want to see the math!
  • What is Quicksort's space usage? Partition is
    done in place, and so the only space required is
    the array plus a few temporary variables, so
    Quicksort's space usage is in Θ(n)

14
Improving Quicksort
  • Even though Quicksort has a poor worst-case
    complexity (as opposed to some other sorting
    algorithms), the partition strategy is faster
    than most other sorting mechanisms, so Quicksort
    is a desired sort as long as we can prevent the
    worst case from arising
  • How?
  • There are numerous improvements
  • Make sure the pivot is a good one, possibly by
    selecting 3 values (at random, or the first 3, or
    the first, middle and last in the array) and
    taking their median (see the sketch after this
    list). This makes partition slightly more
    complicated, but not any more computationally
    complex
  • Remove the subroutines from partition (the
    version presented in these notes is like that,
    the version in the textbook uses the subroutines)
  • For small arrays, use a different sort
  • Optimize the stack space so that the recursive
    calls do not slow down the process
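As an illustration of the first improvement, here is a hedged Java
sketch of median-of-three pivot selection (a common variant; the
helper names and int keys are ours, not the textbook's):

// Sketch: choose the median of E[first], E[mid], E[last] as the pivot
// and move it to E[first] so the usual partition code runs unchanged.
static void medianOfThree(int[] e, int first, int last) {
    int mid = (first + last) / 2;
    // order e[first] <= e[mid] <= e[last] with three comparisons
    if (e[mid] < e[first])  swap(e, mid, first);
    if (e[last] < e[first]) swap(e, last, first);
    if (e[last] < e[mid])   swap(e, last, mid);
    swap(e, first, mid);    // the median now sits at e[first] as the pivot
}

static void swap(int[] e, int i, int j) {
    int t = e[i]; e[i] = e[j]; e[j] = t;
}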

15
Merging Sorted Sequences
  • Recall from our earlier idea to use divide and
    conquer to sort, we need a way to combine the
    sorted subarrays
  • We did not need this in Quicksort because
    Quicksort didn't really need a combining step
  • How can we do this?
  • Let's assume we have two sorted subarrays, A and
    B
  • We want to sort them into a new array C which is
    equal in size to A and B combined
  • Which item goes into C[0]? It will either be
    A[0] or B[0]
  • Which item goes into C[1]? It will either be
    A[1] or B[0] if A[0] has already been placed, or
    A[0] or B[1] if B[0] has already been placed
  • etc
  • Until we have placed all of A or B into C, then
    the remainder of the merge requires copying the
    rest of whichever array still remains

16
Recursive Merge
  • Merge algorithm (to the right)
  • moves the rest of a or b into c if the other
    array is done, or finds the smaller of
    a[currentA] and b[currentB], moves it into c, and
    then recursively calls itself with the rest of a
    and b
  • The recurrence equation for merge is
  • T(n) = T(n-1) + 1
  • The base case is 0; however, at what n does the
    base case occur? It depends on when we run out
    of one of the two arrays, but this will be no
    greater than n, so T(n) is in O(n) and, in the
    worst case, T(n) = n - 1

void merge(Element[] a, Element[] b, Element[] c, int sizeA,
           int sizeB, int currentA, int currentB, int currentC) {
    if (currentA >= sizeA) {                 // a is exhausted
        for (int k = currentB; k < sizeB; k++)
            c[currentC++] = b[k];
    } else if (currentB >= sizeB) {          // b is exhausted
        for (int k = currentA; k < sizeA; k++)
            c[currentC++] = a[k];
    } else {
        if (a[currentA] <= b[currentB]) {
            c[currentC] = a[currentA];
            merge(a, b, c, sizeA, sizeB, currentA+1, currentB, currentC+1);
        } else {
            c[currentC] = b[currentB];
            merge(a, b, c, sizeA, sizeB, currentA, currentB+1, currentC+1);
        }
    }
    return;
}
17
Iterative Merge
  • The iterative version of merge works similarly to
    the recursive one: move one element into array c
    based on whether the current element of a or b is
    smaller, and repeat until one array is emptied
    out, then move the remainder of the other
    array into c
  • see Algorithm 4.4, page 172 (a sketch follows
    below)
  • This requires n total copies (where n is the
    number of elements in a and b combined), but the
    number of comparisons varies depending on when
    the first array is emptied
  • We have the same situation as with the recursive
    version: T(n) = n - 1 in the worst case and
    T(n) ∈ O(n) in any case
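Since Algorithm 4.4 itself is not reproduced in these notes, here is a
minimal Java sketch of an iterative merge in the same spirit (int keys
assumed for brevity):

// Sketch of iterative merge: repeatedly copy the smaller current
// element of a or b into c, then copy whatever remains.
static void merge(int[] a, int[] b, int[] c) {
    int i = 0, j = 0, k = 0;
    while (i < a.length && j < b.length) {
        if (a[i] <= b[j]) c[k++] = a[i++];
        else              c[k++] = b[j++];
    }
    while (i < a.length) c[k++] = a[i++];  // b was emptied first
    while (j < b.length) c[k++] = b[j++];  // a was emptied first
}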

18
Optimality of Merge
  • Is there any better way to merge two arrays?
  • Consider the following two arrays
  • A = [1, 2, 3, 4, 5, 6, 7, 8]
  • B = [9, 10, 11, 12, 13, 14, 15, 16]
  • We can merge these two arrays with 1 comparison
    (how?)
  • In the worst case, our merge algorithm does the
    least amount of work required; consider these
    arrays
  • A = [1, 3, 5, 7, 9, 11, 13, 15]
  • B = [2, 4, 6, 8, 10, 12, 14, 16]
  • We could not improve over merge in this case
    because every pair A[i] and B[j] will have to
    be compared, where i and j are equal or off by
    one
  • Thus, there are n-1 comparisons in the worst case
    (n-2 is possible in the worst case if A and B
    differ in size by 1 element)

19
Merge's Space Usage
  • While Merge gives us an optimal worst-case
    merger, it does so at a cost of space usage
  • As seen in the previous examples, we have two
    arrays totaling n elements merged into array C
  • Array C then must also be of size n
  • So Merge takes 2n space, as opposed to n for
    Quicksort and Insertion Sort
  • Could we improve?
  • In a non-worst case situation, yes, by not
    copying all of the second array into C, but
    instead, copying what we have in C back into the
    original array
  • There is a discussion of this on pages 173-174 if
    you are interested

20
Mergesort
  • Now that we have a merge algorithm, we can define
    the rest of the sorting algorithm
  • Recall that we need a mechanism to
  • Divide arrays into subarrays
  • Sort each subarray
  • Combine the subarrays
  • Merge will combine the subarrays
  • Sorting each subarray is done when two sorted
    subarrays are merged, so we won't need a sort
  • Dividing arrays into two subarrays will be done
    by finding the midpoint of the two arrays and
    calling the divide/sort/combine procedure
    recursively with the two halves

21
Mergesort Continued
  • The Mergesort algorithm is given to the right
  • The midpoint will either be at n/2 or (n-1)/2
    creating two subarrays that will be (n-1)/2 in
    size, or n/2 in size
  • To simplify, we will consider these to be
    floor(n/2) and ceiling(n/2)
  • Each recursive call on a subarray of size k will
    require at most k-1 comparisons to merge them
  • Since all subarrays at any recursive level will
    sum up to a total of n array elements, the number
    of comparisons at that level is n-1
  • The recurrence relation is then
  • T(n) = T(floor(n/2)) + T(ceil(n/2)) + n - 1
  • To simplify, T(n) = 2·T(n/2) + n - 1
void mergeSort(Element[] E, int first, int last) {
    if (first < last) {
        int mid = (first + last) / 2;
        mergeSort(E, first, mid);
        mergeSort(E, mid+1, last);
        merge(E, first, mid, last);
    }
    return;
}
  • The base case is T(1) = 0
  • With f(n) = n - 1, b = 2, c = 2, we have E = 1,
    and the Master Theorem tells us that T(n) ∈
    Θ(n log n)
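For reference, a self-contained Java sketch of Mergesort (int keys
assumed), showing one common way to realize merge(E, first, mid, last)
with a scratch array:

// Sketch of mergesort over a subrange; the merged run is built in a
// temporary array and copied back into e.
static void mergeSort(int[] e, int first, int last) {
    if (first < last) {
        int mid = (first + last) / 2;
        mergeSort(e, first, mid);
        mergeSort(e, mid + 1, last);
        merge(e, first, mid, last);
    }
}

static void merge(int[] e, int first, int mid, int last) {
    int[] tmp = new int[last - first + 1];
    int i = first, j = mid + 1, k = 0;
    while (i <= mid && j <= last)
        tmp[k++] = (e[i] <= e[j]) ? e[i++] : e[j++];
    while (i <= mid)  tmp[k++] = e[i++];   // copy leftover left half
    while (j <= last) tmp[k++] = e[j++];   // copy leftover right half
    System.arraycopy(tmp, 0, e, first, tmp.length);
}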

22
Recursion Tree for Mergesort
  • We can also visualize Mergesort's complexity
    through the recursion tree to obtain a more
    precise complexity
  • Notice that each level doubles the number of
    recursive calls, but at each level the amount of
    work needed is x fewer comparisons, where x
    doubles per level (1, 2, 4, 8, etc.)
  • Thus, Mergesort will require the following amount
    of work:
    Σ(i = 0 to (log n) - 1) (n - 2^i)
    = (n - 1) + (n - 2) + (n - 4) + (n - 8) + ...
      + (n - n/2)
    = n log n - Σ(i = 0 to (log n) - 1) 2^i
    = n log n - (2^(log n) - 1)
    = n log n - n + 1
  • So, we get a worst case complexity of about
    n log n - n
  • In fact, Mergesort's worst case complexity is
    between ceiling(n log n - n + 1) and
    ceiling(n log n - .914 n)

23
Lower Bound for Sorting with Comparisons
  • Consider our worst-case complexities for sorting
  • Θ(n²) for in-place sorts that move 1 item at a
    time
  • Θ(n log n) for divide and conquer based sorts
  • Can we do better for any sorting algorithm that
    compares pairs of values to find their positions?
  • The reason we ask the question as above is that
    we will next see a sort that doesn't compare
    values against one another
  • The answer to our question is unfortunately no,
    we cannot do better
  • Why not?
  • Consider, for n items
  • there are n! possible permutations of those n
    items
  • for n = 3, we would have 6 permutations
  • ⟨x1, x2, x3⟩, ⟨x1, x3, x2⟩, ⟨x2, x1, x3⟩,
    ⟨x2, x3, x1⟩, ⟨x3, x1, x2⟩, ⟨x3, x2, x1⟩
  • let's arrange the possibilities in a tree where
    we traverse the tree to make the fewest
    comparisons; this is known as a decision tree

24
Our Decision Tree (for n 3)
  • First, compare x1 and x2
  • if x1 < x2, take the left branch; otherwise take
    the right branch
  • On left, compare x2 and x3; on right, compare x1
    and x3
  • On left, if x2 < x3 we are done: ⟨x1, x2, x3⟩
  • On right, if x1 < x3 we are done: ⟨x2, x1, x3⟩
  • We might need another comparison (x1 vs x3 on the
    left, x2 vs x3 on the right)
  • How many comparisons might we have to make and
    why?

The height of this tree is ceiling(log n!), because
there are n! leaf nodes. So, the maximum number of
comparisons is ceiling(log n!), since we might not
know the sequence until reaching the leaves of the
tree. How much is log n! ?
25
Lower Bound for Worst Case
  • We see from the previous slide that the least
    number of comparisons needed to sort by comparing
    individual array elements is log n!
  • How does log n! compare to n log n, our current
    lower bound for worst case?
  • n! = n · (n-1) · (n-2) · ... · 3 · 2 · 1
  • log n! = log (n · (n-1) · (n-2) · ... · 2 · 1)
  •   = log n + log (n-1) + log (n-2) + ... + log 2
      + log 1
  •   = Σ(i = 1 to n) log i ≈ ∫(1 to n) log x dx
      = log e · ∫(1 to n) ln x dx
  •   = log e · (x ln x - x) evaluated from 1 to n
      = log e · (n ln n - n + 1)
      = n log n - n log e + log e
  •   ≈ n log n - 1.443 n + 1.443
  • for our worst case complexity, we round up to
    the nearest integer, and so log n! ≈ ceiling(n
    log n - 1.443 n); we can omit the final +1.443
    as it is merely a constant
  • So our lower bound for a worst case complexity is
    ceiling(n log n - 1.443 n)

26
Lower Bound for Average Case
  • Can we improve over n log n for an average case
    complexity?
  • The proof for this is given on pages 180-181
  • However, we can simplify this proof by
    reconsidering the decision tree
  • In any decision tree, the leaves will all occur
    at either level ceiling(log n!) or
    ceiling(log n!) - 1
  • So, our average case complexity will be between
    log n! - 1 and log n! ≈ n log n - 1.443 n; so,
    like our worst case complexity, the lower bound
    for average case complexity is n log n - 1.443 n
  • Notice in this case that we do not need to take
    the ceiling, since average case complexity does
    not have to be an integer value

27
Mergesort vs. Quicksort
  • We have two sorting algorithms (so far) that can
    give us n log n average case complexity, but
  • Mergesort requires twice the amount of storage
    space
  • Quicksort cannot guarantee n log n complexity
  • Mergesort's merge operation is more time
    consuming than Quicksort's partition operation,
    even though they are both in Θ(n)
  • In practice, Mergesort does about 30% fewer
    comparisons in the worst case than Quicksort does
    in the average case, but because Quicksort does
    far fewer element movements, Quicksort often
    turns out to be faster
  • But what if we want a guaranteed better
    performance than Mergesort? Quicksort cannot
    give us that.
  • So we turn to a third n log n sort, Heapsort
  • Heapsort is interesting for two reasons
  • It guarantees n log n performance in average and
    worst case, like Mergesort, but is faster than
    Mergesort
  • It does no recursion, thus saving stack space and
    the overhead of a stack

28
Heapsort and Heaps
  • You might recall from 364 a Heap
  • A binary tree stored in an array with the Heap
    property
  • A value stored in a heap node will be greater
    than the values stored in that node's subtrees
  • The tree must be a left-complete tree, which
    means that all leaf nodes are on the bottom two
    levels such that the nodes on the lowest level
    have no open nodes to their left
  • In an array, a tree has the following pattern
  • Node i has children in positions 2i and 2i+1
    (see the index sketch after this list)
  • To satisfy the second attribute of a heap above,
    the tree will be stored in array locations
    0..n-1 for n nodes (or 1..n if we are using a
    language other than Java/C/C++)
  • Because of the first attribute, we know the
    largest value will be at the root of the heap, so
    Heapsort iteratively removes the root of the heap
    and restructures the heap until the heap is empty
  • Thus, Heapsort will sort in descending order
  • NOTE we can change the heap property so that a
    node is less than any value in its subtree to
    sort in ascending order
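The 2i / 2i+1 formulas assume the root is at index 1; for the 0-based
arrays of Java/C/C++, the arithmetic shifts by one, as in this small
sketch (the helper names are ours):

// Index arithmetic for a binary heap in a 0-based array (root at 0);
// with a 1-based array the children of i are simply 2i and 2i+1 and
// the parent is i/2.
static int leftChild(int i)  { return 2 * i + 1; }
static int rightChild(int i) { return 2 * i + 2; }
static int parent(int i)     { return (i - 1) / 2; }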

29
Heapsort
  • The Heapsort algorithm itself is simple once you
    have the Heap ADT implemented
  • Given an array a of elements (a runnable sketch
    follows below):

for (i = 0; i < a.length; i++)
    heap.add(a[i]);
for (i = a.length-1; i >= 0; i--)
    a[i] = heap.delete();
  • That is, take the original array and build the
    heap by adding 1 element at a time
  • Each time a new value is added to the heap, the
    heap is restructured to ensure the heap structure
    such that it is left-complete and each node is
    greater than all nodes in the subtree
  • Now, refill array a by removing the largest item
    from the heap, restructure the heap, and repeat
    until the heap is empty
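As a sanity check of this two-loop structure, here is a hedged Java
sketch that substitutes java.util.PriorityQueue for the course's Heap
ADT (PriorityQueue is a min-heap, so the array is refilled
front-to-back and comes out ascending):

import java.util.PriorityQueue;

// Sketch of the heapsort driver with PriorityQueue standing in for
// the Heap ADT; poll() removes the smallest remaining item.
public class HeapSortDemo {
    static void heapSort(int[] a) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int x : a) heap.add(x);        // build the heap
        for (int i = 0; i < a.length; i++)
            a[i] = heap.poll();             // remove min, refill array
    }

    public static void main(String[] args) {
        int[] a = {22, 7, 16, 3, 10};
        heapSort(a);
        System.out.println(java.util.Arrays.toString(a)); // [3, 7, 10, 16, 22]
    }
}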

30
Adding to the Heap
  • Consider a heap as stored in the array given
    below
  • We now want to add a new value, 16
  • Start by placing 16 at the end of the array and
    then walk the value up in position until it
    reaches its proper place
  • If the array currently has n items, then insert
    16 at location n and set temp = n
  • Now, compare 16 to its parent (which will be at
    temp / 2)
  • If heap[temp] > heap[temp / 2], then this value
    is greater than the parent: swap the two
  • Continue until either heap[temp] ≤ heap[temp / 2]
    or temp = 0 (we have reached the root of the
    tree)

(Figures: the original heap; 16 inserted at the end; 16 walked up into place)
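A hedged Java sketch of this walk-up step for a 0-based max-heap
stored in an int array (the name addToHeap is ours, not the
textbook's):

// Sketch: insert x at the end of a 0-based max-heap holding n items,
// then walk it up while it is larger than its parent.
static void addToHeap(int[] heap, int n, int x) {
    heap[n] = x;
    int temp = n;
    while (temp > 0 && heap[temp] > heap[(temp - 1) / 2]) {
        int parent = (temp - 1) / 2;
        int t = heap[temp]; heap[temp] = heap[parent]; heap[parent] = t;
        temp = parent;
    }
}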
31
Deleting from the Heap
  • We only want to remove the largest element from
    the heap, which by definition must be at the root
    (index 0)
  • Store heap[0] in a temporary variable to be
    returned
  • Now, restructure the heap by moving the item at
    heap[n-1] (the last element in the array) to the
    root and walking it down into its proper
    position. How?
  • Let temp = 0
  • Compare heap[temp] with heap[2·temp] and
    heap[2·temp + 1], that is, with its two children
  • If heap[temp] is not the largest of the three,
    swap heap[temp] with the larger of the two
    children and repeat until either the value is at
    a leaf, or is greater than its two children and
    thus in its proper place
  • Return the old root (temporary value) and
    subtract 1 from n

(Figures: 22 returned from the root; 10 moved to the root and walked down into place)
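And a matching hedged sketch of the delete/walk-down step (again
0-based, with our own names):

// Sketch: remove and return the max of a 0-based max-heap of size n.
// The last element moves to the root and walks down until it is at a
// leaf or larger than both children.
static int deleteMax(int[] heap, int n) {
    int max = heap[0];
    heap[0] = heap[n - 1];           // move last element to the root
    int temp = 0, size = n - 1;
    while (2 * temp + 1 < size) {    // while temp has at least one child
        int child = 2 * temp + 1;    // left child
        if (child + 1 < size && heap[child + 1] > heap[child])
            child++;                 // pick the larger child
        if (heap[temp] >= heap[child]) break;  // proper place found
        int t = heap[temp]; heap[temp] = heap[child]; heap[child] = t;
        temp = child;
    }
    return max;
}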
32
Heapsort Code and Analysis
  • The student is invited to examine the algorithm
    as given on page 184 and 186 and the code on page
    190 and 191
  • It is not reproduced here for brevity (also, the
    code in the book is not the best version!)
  • What is the complexity of this algorithm?
  • Since the heap is always a balanced tree, it
    should be obvious that the heap's height will be
    log n or log n + 1
  • To add an element requires walking it up from a
    leaf node to its proper position, which, at most,
    will be the root: log n operations
  • To remove an element requires deleting the root,
    moving the last item to the root and walking it
    down to its proper position, which at most will
    be a leaf: log n operations

33
Analysis Continued
  • How many times do we perform walkup and walkdown
    during a sort? Once per add and once per delete
  • How many times do we add? n times
  • How many times do we delete? n times
  • Since walkup and walkdown are both log n
    operations, it takes Θ(n log n) to build the
    heap and Θ(n log n) to remove all items from the
    heap
  • So, Heapsort is in Θ(n log n) in the worst
    case
  • A more formal analysis is given on pages 190-191,
    where we see that the actual number of
    comparisons is roughly 2(n log n - 1.443 n)
  • What about the average case?
  • We must determine the average amount of work
    performed by walkup and walkdown
  • Let's assume that all elements will differ in the
    array to be sorted
  • Then, to insert element i, there is a 1/(i + 1)
    chance that the element will fall between any two
    other elements (as we saw with insertion sort)
  • However, unlike insertion sort, the amount of
    work does not range from 1 to i comparisons but
    instead from 1 to log i comparisons

34
Analysis Continued
  • The average amount of work for any given walk up
    or walk down is 1/(j+1) · Σ(i = 1 to j) log i
  •   ≈ (j log j - 1.443 j) / (j + 1) ≈ log j - 1.443
  • We must now sum this for all j from 1 to n twice
    (once for each walk up and once for each walk
    down)
  • So, the average case complexity for Heapsort is
  • 2 · Σ(i = 1 to n) (log i - 1.443)
  • = 2 (Σ(i = 1 to n) log i - 1.443 n)
  • = 2 (n log n - 1.443 n - 1.443 n) = 2 (n log n
    - 2 · 1.443 n)
  • Thus, the only change in complexity between the
    worst and average cases is a doubling of the
    1.443 n term
  • So, the average case of Heapsort is in Θ(n log n)

35
Improving Heapsort
  • Walkup requires fewer comparisons than walkdown
  • walkup compares a given value against its parent,
    walkdown compares a given value against both
    children
  • When a heap is large,
  • the amount of walking down a value might be
    improved by adding some guessing in terms of how
    far down the value might be walked
  • rather than walking it down the whole tree, we
    might walk it down some distance, and then bubble
    up a larger value at a leaf level
  • a value walked down from root to leaf takes about
    2 log n comparisons
  • if we can walk it halfway down and bubble up a
    value from the leaf, we only need 2·(log n)/2 +
    1·(log n)/2 = (3/2) log n comparisons, i.e.,
    log(n³)/2 comparisons
  • How much of an improvement is this over the full
    walkdown?
  • If n = 1000, the normal walkdown takes 20
    comparisons, the bubble-up walkdown takes 15
  • But this improvement is risky: what if we don't
    need to do any bubbling up? See pages 192-196
    for a more detailed analysis

36
The Shell Sort
  • The Shell Sort algorithm is somewhat like
    Insertion Sort
  • It is an in-place sort
  • Keys are compared so that smaller keys are moved
    before larger keys
  • The main difference is that in Shell Sort, the
    values being compared are not necessarily next to
    each other
  • Instead, we start by comparing values some
    interval apart and then lower the interval
    distance
  • For instance, with an interval of 5, compare
    A[0], A[5], A[10], A[15], and compare A[1], A[6],
    A[11], A[16], and compare A[2], A[7], A[12],
    A[17], and compare A[3], A[8], A[13], A[18], and
    compare A[4], A[9], A[14], A[19]
  • Next, lower the interval to 3
  • Next, lower the interval to 2
  • Finally, lower the interval to 1 (a sketch of the
    full algorithm follows below)
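A compact Java sketch of the idea, using a simple gap-halving interval
sequence for illustration (the choice of interval sequence is exactly
what the analysis slides below discuss):

// Sketch of Shell Sort: each pass is an insertion sort over elements
// that are 'gap' apart; the gap shrinks until it reaches 1.
static void shellSort(int[] a) {
    for (int gap = a.length / 2; gap >= 1; gap /= 2) {
        for (int i = gap; i < a.length; i++) {
            int current = a[i];
            int j = i;
            while (j >= gap && a[j - gap] > current) {
                a[j] = a[j - gap];   // shift within the interval
                j -= gap;
            }
            a[j] = current;
        }
    }
}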

37
The Advantage
  • We already know from earlier that an in-place
    sort has a worst-case complexity in Ω(n²), so is
    this an improvement?
  • It turns out that it is because we are not moving
    a single value into its proper place with each
    pass through the list, but instead moving several
    values to their proper place within the given
    intervals
  • We then repeat
  • We do not have to repeat the process n times
    either, but we have to pick a proper interval to
    make it work
  • We won't go over the algorithm in detail (see
    section 4.10 if you are interested), but we will
    note its analysis next

38
Shell Sort Analysis
  • The exact worst case performance of Shell Sort
    has yet to be proven, because it is not known
    what set of intervals is best
  • It has been shown that if only two intervals are
    used, first 1.72 · n^(1/3) and then 1, the
    performance is roughly n^(5/3)
  • It is also known that for intervals of 2^k - 1
    (for k from about log n down to 1), the
    performance is in Θ(n^(3/2))
  • Finally, there is a sequence of intervals that
    gives Shell Sort a performance in O(n (log n)²)
  • To determine how good this is, we know the
    following
  • n log n < n (log n)² < n^(3/2) < n^(5/3) < n²
  • So, Shell Sort improves over some of the
    algorithms we have seen in this chapter without
    the overhead of additional memory space or Θ(n)
    operations per iteration (as with Mergesort or
    Heapsort)

39
Bucket Sorting
  • Recall earlier that we proved the worst case
    lower bound for a sort that compares values is n
    log n
  • But not all sorting algorithms must compare
    values against other values to be sorted
  • How can we get around this?
  • Let's sort a list of values by analyzing the key
    of each value and placing it in a bucket (or
    pile)
  • We can now sort only those keys in a given pile
  • Then scoop up all of the piles, keeping them in
    their proper order
  • If we can create a pile using some n operations
    for n keys, and scoop them up in n operations,
    two thirds of our work is in Θ(n)
  • Can we also sort each pile in Θ(n)? Only if
    we can do so without comparing the values in that
    pile
  • Or if we can keep each pile small enough so that
    the k log k operations on the given pile (where k
    is the size of the pile) are cheap enough to keep
    the entire solution in Θ(n)

40
Radix Sort
  • Radix sort is an example of a bucket sort where
    the "sort a pile" step is omitted
  • However, the "distribute keys to a pile" and
    "scoop up pile" steps must be repeated
  • The good news: because the distribute and
    scoop steps are in Θ(n), and because the number
    of repetitions is a constant based not on n but
    on the size of the keys, the Radix sort is in
    Θ(n)
  • The bad news: in many cases, the algorithm may
    be difficult to implement, and the amount of work
    required to distribute and scoop is in Θ(n)
    with a large constant multiplier
  • The result is interesting: an algorithm with a
    worst case complexity in Θ(n) but with a
    run-time substantially longer than sorts in
    Θ(n log n)

41
The Radix Sort Algorithm
  • For the purpose of describing the algorithm
  • we will assume that we are dealing with int keys
    that are no longer than k digits
  • create 10 FIFO queues, queue[0]..queue[9]
  • for i = k downto 1 do (digit k is the rightmost,
    least significant digit)
  • for j = 0 to n-1 do
  • temp = the jth value in the list
  • peel off digit i of temp, call it d, and place
    temp into queue[d]
  • for j = 0 to 9 do
  • remove all items from queue[j] and return them to
    the original list in the order they were removed
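A hedged Java sketch of the same algorithm for non-negative int keys,
peeling digits off with / and % and using ArrayDeque for the FIFO
queues:

import java.util.ArrayDeque;

// Sketch of LSD radix sort: k passes, one per decimal digit,
// distributing into 10 FIFO queues and scooping them up in order.
public class RadixSortDemo {
    static void radixSort(int[] a, int k) {   // k = max number of digits
        @SuppressWarnings("unchecked")
        ArrayDeque<Integer>[] queue = new ArrayDeque[10];
        for (int q = 0; q < 10; q++) queue[q] = new ArrayDeque<>();
        int divisor = 1;
        for (int pass = 1; pass <= k; pass++) {
            for (int x : a)                        // distribute
                queue[(x / divisor) % 10].add(x);  // peel off current digit
            int idx = 0;
            for (int q = 0; q < 10; q++)           // scoop up in order
                while (!queue[q].isEmpty()) a[idx++] = queue[q].poll();
            divisor *= 10;
        }
    }

    public static void main(String[] args) {
        int[] a = {170, 45, 75, 90, 802, 24, 2, 66};
        radixSort(a, 3);
        System.out.println(java.util.Arrays.toString(a));
    }
}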

42
Believe it or not
  • The algorithm really works! We show this by
    induction
  • After one pass, the values are sorted by their
    final digit
  • Assuming that after pass k-1, the values are
    sorted on their second through last digits, then
  • If we remove all items from queue 0, they are
    already sorted from the second digit to the
    last. Since they all start with the digit 0, they
    are sorted correctly. Next we remove all items
    from queue 1 (already sorted), etc., and so we
    will have a completely sorted list
  • (Example figure omitted)

43
Radix Sort Analysis
  • The complexity is easy to figure out from the
    pseudocode earlier
  • In sorting int values
  • We create 10 queues (0 comparisons)
  • We iterate for each digit
  • for int values, assume 10 digits, since int
    values are stored in 32 bits, giving a range of
    roughly -2 billion to +2 billion
  • The inner loop requires taking each key (n of
    them), peeling off the current digit, and
    determining which queue it is placed into: n
    operations
  • this assumes a Θ(1) enqueue operation and a Θ(1)
    peel-digit-off operation
  • Now we have to remove every item from every
    queue; this requires n dequeues, again assuming
    Θ(1) for dequeue
  • Thus, the algorithm performs 2n Θ(1)
    operations per iteration, and there are 10
    iterations, giving roughly 20n operations, which
    is Θ(n)!
  • What if we were dealing with floats or doubles
    instead of int values?
  • What if we were dealing with strings instead of
    int values?

44
The Bad News
  • Radix Sort isn't a great sort, because of several
    problems
  • We need a lot of extra storage space for the
    queues
  • How much? We need an array of k queues, where k =
    10 for numeric keys and 26, 52, or 128 for string
    keys (or even 65536 if we are dealing with
    Unicode!)
  • If our queues are linked-list based, then our
    queues will take up a total of n entries, but if
    we use arrays for our queues, then we will need n
    entries per queue to make sure we have enough
    space, and so we wind up wasting 9n (or 25n,
    51n, 127n, or 65535n) entries!
  • Peeling off a digit from an int, float or double
    is not easy, especially in a language other than
    Java
  • Peeling a char off a string is easy in most
    languages, but peeling a digit off a number might
    require first converting the number to a string,
    or isolating the digit by a series of / and %
    operations
  • While Radix sort is in Θ(n), the constant
    multiplier is quite large, depending on the size
    of the keys
  • Strings might be as many as 255 characters!