Title: Chapter 4: Sorting
1. Chapter 4: Sorting
- Sorting is an important application to study
- Sorting is used in numerous applications
- As time has gone on, technology has made sorting easier
- For instance, it is usually no longer necessary to worry about keeping portions of a list on disk because memory is too small to hold it all
- From an algorithmic perspective, we find many different strategies for sorting
- Each has its own advantages and disadvantages
- We will study several sorting algorithms here and determine their average and worst-case complexities
- We will also note the amount of temporary storage required for each so that we can compare their space complexity as well as their time complexity
- Many of the algorithms we will view are limited to sorting lists implemented as arrays, but some also accommodate linked lists
2. Insertion Sort
- Basic idea
  - given a new item, insert it into an already ordered list
  - repeat this process for each item in the list until the list is sorted
  - for instance, insert the second element with respect to the first, then insert the third element with respect to the first and second (which are already sorted), etc.
- The insertion portion can be done either iteratively or recursively (see the example below)
- Notice that the given code has two comparisons (in bold), one being the base case
  int shiftVacRec(Element[] E, int vacant, Key x) {
      int xLoc;
      if (vacant == 0)
          xLoc = vacant;                    // base case: reached the front, x goes here
      else if (E[vacant-1].key <= x)
          xLoc = vacant;                    // left neighbor is no larger, x goes here
      else {
          E[vacant] = E[vacant-1];          // shift the larger element into the vacancy
          xLoc = shiftVacRec(E, vacant-1, x);
      }
      return xLoc;
  }
3. Complexity
- The recurrence equation for shiftVacRec is easy to derive
- T(n) = T(n-1) + 1, with the base case T(0) = 1
- Again, remember that we are only counting comparisons
- T(n) = T(n-1) + 1 = T(n-2) + 2 = T(n-3) + 3 = ... = T(0) + n = n + 1, so T(n) ∈ Θ(n)
- Therefore, inserting 1 item into an already sorted list is in Θ(n)
- How do we perform the insertion sort? We use shiftVacRec n-1 times, once for each item in the array
- However, notice that n, from T(n), grows (or changes)
- The number of items in the array for shiftVacRec to work with increases from 1 for the first iteration to n-1 for the last
- So in fact, we have, in the worst case, a sequence of 1 comparison the first time, 2 the second, 3 the third, etc., up to n-1 the last time
- We saw previously that the summation from 1 to n-1 = n(n-1)/2
- So, insertion sort takes ½(n² - n) comparisons, which is in Θ(n²)
4. Iterative Insertion Sort
- If n is large, the number of recursive calls makes our shiftVacRec inefficient because of the overhead of a procedure call
- Instead we might choose to implement insertion sort iteratively by replacing the procedure calls with a loop
  - we will find other sorting algorithms where we must use recursion, but here we do not HAVE to
- This code is given below
- It should be easy to see that the while loop will execute at most 1 time for the first iteration of xindex, 2 times for the second, etc., and n-1 times for the last, so iterative insertion sort does the same amount of work, Θ(n²)
  void insertionSort(Element[] E, int n) {
      int xindex, xLoc;
      Element current;
      for (xindex = 1; xindex < n; xindex++) {
          current = E[xindex];
          xLoc = shiftVac(E, xindex, current.key);
          E[xLoc] = current;
      }
      return;
  }

  int shiftVac(Element[] E, int xindex, Key x) {
      int vacant = xindex;
      int xLoc = 0;                    // x belongs at the front unless we find otherwise
      while (vacant > 0) {
          if (E[vacant-1].key <= x) {
              xLoc = vacant;           // found the insertion point
              break;
          }
          E[vacant] = E[vacant-1];     // shift the larger element right
          vacant--;
      }
      return xLoc;
  }
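- To see the algorithm run, here is a self-contained sketch on a plain int array, standing in for the book's Element/Key types (the class name and test data are mine, not the text's):

  public class InsertionSortDemo {
      // Shift larger elements right until the vacancy is where x belongs.
      static int shiftVac(int[] e, int xindex, int x) {
          int vacant = xindex, xLoc = 0;
          while (vacant > 0) {
              if (e[vacant - 1] <= x) { xLoc = vacant; break; }
              e[vacant] = e[vacant - 1];    // shift right
              vacant--;
          }
          return xLoc;
      }

      static void insertionSort(int[] e) {
          for (int xindex = 1; xindex < e.length; xindex++) {
              int current = e[xindex];
              e[shiftVac(e, xindex, current)] = current;
          }
      }

      public static void main(String[] args) {
          int[] a = {7, 2, 9, 4, 1};
          insertionSort(a);
          System.out.println(java.util.Arrays.toString(a));   // [1, 2, 4, 7, 9]
      }
  }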
5. Average Case Complexity
- The average case complexity is not as easy to determine
- First, assume that no values of the array are equal
- Next, assume that a value to be inserted has an equal chance of being inserted in any location
- There are i+1 locations to insert the ith item (before the first, between the first and second, etc., after the last)
- In order to determine the average case, we compute the average over inserting the ith item into each of the i+1 possibilities
- Each of the i+1 possibilities has probability 1/(i+1); landing j positions from the rear takes j comparisons (for j = 1 to i), and landing at the very front also takes just i comparisons, so we have
  1/(i+1) · (Σ(j = 1 to i) j + i) = 1/(i+1) · (i(i+1)/2 + i) = i/2 + i/(i+1)
  comparisons to insert the ith item (remember i ranges from 1 to n-1)
6. Average Case Continued
- So, all of insertion sort takes
  - A(n) = Σ(i = 1 to n-1) (i/2 + i/(i+1))
  - = Σ(i = 1 to n-1) (i/2 + 1 - 1/(i+1))
  - = (n-1)n/4 + (n-1) - Σ(i = 1 to n-1) 1/(i+1)
- The first term above is (n² - n)/4, the second term is n-1, and the last term is roughly equal to ln n
  - see Example 1.7 on pages 26-27 for an explanation
- Therefore, insertion sort, on average, takes (n² - n)/4 + n - 1 - ln n = (n² + 3n - 4)/4 - ln n ≈ n²/4, so insertion sort's average case is in Θ(n²)
- Insertion sort has a space complexity of Θ(n) because it is an "in-place" sort; that is, it does not copy the array and only needs 1 temporary variable for x (the item being inserted)
7. Lower Bound on In-place Sorts
- The idea behind insertion sort is to sort in-place by taking one element and finding its proper position
- Other sorting algorithms are similar
- The common aspect is that these sorting algorithms place 1 item in its proper position on each iteration
- What is the lower bound on such an algorithm?
- Consider that in such an algorithm, we are comparing some portion of an already sorted array to a new item
- Such a comparison must encompass all items in the already sorted portion of the array against the item to be placed
- If the comparison is to the entire array (as is the case with Selection Sort), then we have n comparisons performed n-1 times
- If the comparison is to a growing portion of the array, we have a sequence of comparisons of 1 + 2 + 3 + ... + n-1 = n(n-1)/2
- Therefore, the minimum amount of work in which exactly 1 item is placed per iteration is in Θ(n²)
- Does this mean insertion sort is an optimal sort?
8. Divide and Conquer Sorts
- Let us apply recursion to sorting as follows
- The sorting problem is solved by
  - Dividing the array into smaller arrays
  - Sorting the smaller arrays
  - Combining the sorted smaller arrays into a larger array
- See the algorithm below
- Our algorithm's complexity is described by the following recurrence relation
  - T(n) = D(n) + Σ S(size of part i) + C(n)
- We must determine how to divide the arrays (D), how to sort each one (S), and how to combine them when done (C)
- We might find that S is simplified by recursion
  solve(I)
      n = size(I)
      if (n <= smallSize)
          solution = directlySolve(I)
      else
          divide I into I1, ..., Ik
          for each i in {1, ..., k}
              Si = solve(Ii)
          solution = combine(S1, ..., Sk)
      return solution
9. Quicksort
- In Quicksort, we use the exact strategy described previously, as follows
- Before dividing, move elements so that, given some element at position x, all elements left of x are less than E[x] and all elements right of x are greater than E[x]
- Now we divide by simply repeating this on the left-hand side and right-hand side of x
- Combining is not necessary (it is simply a return statement)
- This brings about some questions
  - What element should be x? Is the choice of x important?
  - How do we move elements so that x is positioned correctly?
  - What kind of complexity will this algorithm yield?
10. Quicksort Continued
- We divide Quicksort into two procedures; the main quicksort procedure finds x and recursively calls itself with the left-hand and right-hand sides
- Partition will be used to move the elements of the array around so that x falls into its proper place with respect to all other elements
  - That is, E[z] < E[x] for all z left of x, and E[z] > E[x] for all z right of x
- Partition does most of the work of the Quicksort algorithm
- How might Partition work?
  - Move from right-to-left until we find an item less than E[x]; move that item into the vacancy at x
  - Move from left-to-right (after x) until we find an item greater than E[x], and move that into the newly freed position
  - Repeat until we meet somewhere in the middle and place E[x] there
- Partition will take n-1 comparisons
11. Quicksort Algorithm

  void quicksort(Element[] E, int first, int last) {
      if (first < last) {
          Element pivot = E[first];
          int splitPoint = partition(E, pivot, first, last);
          E[splitPoint] = pivot;                   // pivot lands in its proper place
          quicksort(E, first, splitPoint - 1);
          quicksort(E, splitPoint + 1, last);
      }
  }

  int partition(Element[] E, Element pivot, int first, int last) {
      int low = first, high = last;
      int lowVac = low, highVac = high;            // the vacancy starts at first
      while (low < high) {
          while (highVac > lowVac && E[highVac].key >= pivot.key)
              highVac--;                           // scan right-to-left for a small element
          E[lowVac] = E[highVac];                  // move it into the left vacancy
          while (lowVac < highVac && E[lowVac].key <= pivot.key)
              lowVac++;                            // scan left-to-right for a large element
          E[highVac] = E[lowVac];                  // move it into the right vacancy
          low = lowVac; high = highVac - 1;
      }
      return low;                                  // the final vacancy
  }
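- To see it run, here is a self-contained int-array version of the same vacancy-style partition (class name and test data are mine, not the textbook's):

  public class QuicksortDemo {
      // Vacancy-based partition; the vacancy begins at e[first], where the pivot was.
      static int partition(int[] e, int pivot, int first, int last) {
          int lowVac = first, highVac = last;
          while (lowVac < highVac) {
              while (lowVac < highVac && e[highVac] >= pivot) highVac--;
              e[lowVac] = e[highVac];          // fill the vacancy from the right
              while (lowVac < highVac && e[lowVac] <= pivot) lowVac++;
              e[highVac] = e[lowVac];          // fill the vacancy from the left
          }
          return lowVac;                       // the final vacancy
      }

      static void quicksort(int[] e, int first, int last) {
          if (first < last) {
              int pivot = e[first];
              int split = partition(e, pivot, first, last);
              e[split] = pivot;                // pivot lands in its final place
              quicksort(e, first, split - 1);
              quicksort(e, split + 1, last);
          }
      }

      public static void main(String[] args) {
          int[] a = {5, 3, 8, 1, 9, 2};
          quicksort(a, 0, a.length - 1);
          System.out.println(java.util.Arrays.toString(a));   // [1, 2, 3, 5, 8, 9]
      }
  }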
12. Quicksort Analysis
- Quicksort has a recurrence equation of
  - T(n) = T(n-r-1) + T(r) + n - 1
  - where r is the number of elements to the pivot's right
  - notice the "-1" in n-r-1: the pivot is already in its proper place
- How do we solve T(n) when r changes each time?
- The worst case occurs when r = 0 or r = n-1, so let's use r = 0
- This gives us T(n) = T(n-1) + T(0) + n - 1, with T(0) = T(1) = 0
- So T(n) = T(n-1) + (n-1) = T(n-2) + (n-2) + (n-1) = ... = Σ(i = 1 to n-1) i = n(n-1)/2
- So, Quicksort's worst case is in Θ(n²)
13. Quicksort Analysis Continued
- What about Quicksort's average case?
- The worst case happens when r = 0 or r = n-1, but on average, what will r be?
- On average, r will fall between these two extremes, and the average value of r is then
  - 1/n · Σ(i = 1 to n-1) i = n(n-1)/(2n) = (n-1)/2
- So, the average case complexity has the following recurrence equation
  - T(n) = T((n-1)/2) + T((n-1)/2) + n - 1 = 2T((n-1)/2) + n - 1
- Using the Master Theorem from chapter 3, we have f(n) = n - 1, b = 2 and c = 2, so that E = 1, and that means that T(n) is in Θ(n log n)
- A more formal analysis is given on pages 167-168 if you want to see the math!
- What is Quicksort's space usage? Partition is done in place, and so the only space required is the array plus a few temporary variables, so Quicksort's space usage is in Θ(n)
14. Improving Quicksort
- Even though Quicksort has a poor worst-case complexity (as opposed to some other sorting algorithms), the partition strategy is faster than most other sorting mechanisms, so Quicksort is a desirable sort as long as we can prevent the worst case from arising
- How? There are numerous improvements
  - Make sure the pivot is a good one, possibly by selecting 3 values (at random, or the first 3, or the first, middle and last in the array) and finding the median (see the sketch below). This makes partition slightly more complicated, but not any more computationally complex
  - Remove the subroutines from partition (the version presented in these notes is like that; the version in the textbook uses the subroutines)
  - For small arrays, use a different sort
  - Optimize the stack space so that the recursive calls do not slow down the process
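- A minimal sketch of median-of-three pivot selection (the method name and layout are mine; it plugs into the int-array quicksort demo earlier by swapping the median into e[first] before calling partition):

  // Return the index of the median of e[first], e[mid], e[last].
  static int medianOfThree(int[] e, int first, int last) {
      int mid = (first + last) / 2;
      int a = e[first], b = e[mid], c = e[last];
      if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;     // b is the median
      if ((b <= a && a <= c) || (c <= a && a <= b)) return first;   // a is the median
      return last;                                                  // otherwise c is
  }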
15. Merging Sorted Sequences
- Recall from our earlier idea to use divide and conquer to sort: we need a way to combine the sorted subarrays
- We did not need this in Quicksort because Quicksort didn't really need a combining step
- How can we do this?
- Let's assume we have two sorted subarrays, A and B
- We want to merge them into a new array C which is equal in size to A + B
- Which item goes into C[0]? It will either be A[0] or B[0]
- Which item goes into C[1]? It will either be A[1] or B[0] if A[0] has already been placed, or A[0] or B[1] if B[0] has already been placed
- etc.
- Once we have placed all of A or B into C, the remainder of the merge requires copying the rest of whichever array still remains
16. Recursive Merge
- The merge algorithm (below)
  - moves the rest of A or B into C if the other array is done, or finds the smaller of a[currentA] and b[currentB], moves it into C, and then recursively calls itself with the rest of A and B
- The recurrence equation for merge is
  - T(n) = T(n-1) + 1
- The base case is 0; however, at what n does the base case occur? It depends on when we run out of one of the two arrays, but this will be no greater than n, so T(n) ∈ Θ(n), and in the worst case, T(n) = n - 1
  void merge(Element[] a, Element[] b, Element[] c, int sizeA, int sizeB,
             int currentA, int currentB, int currentC) {
      if (currentA >= sizeA) {                  // a is exhausted: copy the rest of b
          for (int k = currentB; k < sizeB; k++)
              c[currentC++] = b[k];
      } else if (currentB >= sizeB) {           // b is exhausted: copy the rest of a
          for (int k = currentA; k < sizeA; k++)
              c[currentC++] = a[k];
      } else if (a[currentA].key <= b[currentB].key) {
          c[currentC] = a[currentA];            // a's element is smaller
          merge(a, b, c, sizeA, sizeB, currentA + 1, currentB, currentC + 1);
      } else {
          c[currentC] = b[currentB];            // b's element is smaller
          merge(a, b, c, sizeA, sizeB, currentA, currentB + 1, currentC + 1);
      }
      return;
  }
17. Iterative Merge
- The iterative version of merge works similarly to the recursive one: move one element into array C based on whether the current element of A or B is smaller, and repeat until one array is emptied out, then move the remainder of the other array into C
  - see Algorithm 4.4, page 172
- This requires n total copies (where n is the number of elements in A and B combined), but the number of comparisons varies depending on when the first array is emptied
- We have the same situation as with the recursive version: T(n) = n - 1 in the worst case, and T(n) ∈ Θ(n) in any case
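- A self-contained sketch of the iterative merge on int arrays (a paraphrase of the idea, not a verbatim copy of the textbook's Algorithm 4.4):

  import java.util.Arrays;

  public class MergeDemo {
      // Merge sorted arrays a and b into c, one comparison per element placed.
      static void merge(int[] a, int[] b, int[] c) {
          int i = 0, j = 0, k = 0;
          while (i < a.length && j < b.length)
              c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
          while (i < a.length) c[k++] = a[i++];    // copy whatever remains of a...
          while (j < b.length) c[k++] = b[j++];    // ...or of b
      }

      public static void main(String[] args) {
          int[] a = {1, 3, 5, 7}, b = {2, 4, 6, 8}, c = new int[8];
          merge(a, b, c);
          System.out.println(Arrays.toString(c));  // [1, 2, 3, 4, 5, 6, 7, 8]
      }
  }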
18. Optimality of Merge
- Is there any better way to merge two arrays?
- Consider the following two arrays
  - A = 1, 2, 3, 4, 5, 6, 7, 8
  - B = 9, 10, 11, 12, 13, 14, 15, 16
- We can merge these two arrays with 1 comparison (how?)
- In the worst case, our merge algorithm does the least amount of work required; consider these arrays
  - A = 1, 3, 5, 7, 9, 11, 13, 15
  - B = 2, 4, 6, 8, 10, 12, 14, 16
- We could not improve over merge in this case because every pair A[i] and B[j] will have to be compared where i and j are equal or off by one
- Thus, there are n-1 comparisons in the worst case (n-2 is possible in the worst case if A and B differ in size by 1 element)
19. Merge's Space Usage
- While merge gives us an optimal worst-case merger, it does so at a cost of space usage
- As seen in the previous examples, we have two arrays totaling n elements merged into array C
- Array C then must be of size n
- So merge takes 2n space, as opposed to n for Quicksort and Insertion Sort
- Could we improve?
- In a non-worst-case situation, yes, by not copying all of the second array into C, but instead copying what we have in C back into the original array
- There is a discussion of this on pages 173-174 if you are interested
20. Mergesort
- Now that we have a merge algorithm, we can define the rest of the sorting algorithm
- Recall that we need a mechanism to
  - Divide arrays into subarrays
  - Sort each subarray
  - Combine the subarrays
- Merge will combine the subarrays
- Sorting each subarray is done when two sorted subarrays are merged, so we won't need a separate sort
- Dividing an array into two subarrays will be done by finding the midpoint and calling the divide/sort/combine procedure recursively with the two halves
21. Mergesort Continued
- The Mergesort algorithm is given below
- The midpoint will either be at n/2 or (n-1)/2, creating two subarrays of roughly half the size; to simplify, we will consider their sizes to be floor(n/2) and ceiling(n/2)
- Each recursive call that merges subarrays totaling k elements will require at most k-1 comparisons
- Since all subarrays at any recursive level sum up to a total of n array elements, the number of comparisons at that level is at most n-1
- The recurrence relation is then
  - T(n) = T(floor(n/2)) + T(ceiling(n/2)) + n - 1
  - To simplify, T(n) = 2T(n/2) + n - 1

  void mergeSort(Element[] E, int first, int last) {
      if (first < last) {
          int mid = (first + last) / 2;
          mergeSort(E, first, mid);        // sort the left half
          mergeSort(E, mid + 1, last);     // sort the right half
          merge(E, first, mid, last);      // combine the two sorted halves
      }
      return;
  }

- The base case is T(1) = 0
- With f(n) = n - 1, b = 2, c = 2, we have E = 1, and the Master Theorem tells us that T(n) ∈ Θ(n log n)
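- A self-contained int-array sketch combining mergeSort with an auxiliary-buffer merge (class name, buffer handling, and data are mine):

  public class MergesortDemo {
      static void mergeSort(int[] e, int first, int last) {
          if (first < last) {
              int mid = (first + last) / 2;
              mergeSort(e, first, mid);
              mergeSort(e, mid + 1, last);
              merge(e, first, mid, last);
          }
      }

      // Merge the sorted halves e[first..mid] and e[mid+1..last] via a buffer.
      static void merge(int[] e, int first, int mid, int last) {
          int[] tmp = new int[last - first + 1];
          int i = first, j = mid + 1, k = 0;
          while (i <= mid && j <= last) tmp[k++] = (e[i] <= e[j]) ? e[i++] : e[j++];
          while (i <= mid) tmp[k++] = e[i++];
          while (j <= last) tmp[k++] = e[j++];
          System.arraycopy(tmp, 0, e, first, tmp.length);
      }

      public static void main(String[] args) {
          int[] a = {4, 1, 8, 3, 2};
          mergeSort(a, 0, a.length - 1);
          System.out.println(java.util.Arrays.toString(a));   // [1, 2, 3, 4, 8]
      }
  }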
22. Recursion Tree for Mergesort
- We can also visualize Mergesort's complexity through the recursion tree to obtain a more precise complexity
- Notice that each level doubles the number of recursive calls, but at each level the amount of work needed is x fewer comparisons, where x doubles per level (1, 2, 4, 8, etc.)
- Thus, Mergesort will require the following amount of work:
  Σ(i = 0 to lg n) (n - 2^i) = (n - 1) + (n - 2) + (n - 4) + (n - 8) + ... + (n - n/2) + (n - n)
- The first part of the sum is
  Σ(i = 0 to lg n) n = n(lg n + 1) = n lg n + n
- The second part is
  Σ(i = 0 to lg n) 2^i = 2^(lg n + 1) - 1 = 2n - 1
- So, we get a worst-case complexity of n lg n + n - (2n - 1) = n lg n - n + 1
- In fact, Mergesort's worst-case complexity is between
  - ceiling(n lg n - n + 1) and
  - ceiling(n lg n - 0.914n)
23. Lower Bound for Sorting with Comparisons
- Consider our worst-case complexities for sorting
  - Θ(n²) for in-place sorts that move 1 item at a time
  - Θ(n log n) for divide and conquer based sorts
- Can we do better for any sorting algorithm that compares pairs of values to find their positions?
- The reason we ask the question this way is that we will next see a sort that doesn't compare values against each other
- The answer to our question is unfortunately no, we cannot do better
- Why not? Consider, for n items
  - there are n! possible permutations of those n items
  - for n = 3, we would have 6 permutations
    - (x1, x2, x3), (x1, x3, x2), (x2, x1, x3), (x2, x3, x1), (x3, x1, x2), (x3, x2, x1)
  - let's arrange the possibilities in a tree where we traverse the tree to make the fewest comparisons; this is known as a decision tree
24. Our Decision Tree (for n = 3)
- First, compare x1 and x2
  - if x1 < x2, take the left branch; otherwise take the right branch
- On the left, compare x2 and x3; on the right, compare x1 and x3
  - On the left, if x2 < x3, the sorted order is x1, x2, x3
  - On the right, if x1 < x3, the sorted order is x2, x1, x3
- Otherwise, we might need another comparison to separate the remaining orderings
- How many comparisons might we have to make, and why?
- The height of this tree is ceiling(log n!) because there are n! leaf nodes, so the maximum number of comparisons is ceiling(log n!), since we might not know the sequence until reaching the leaves of the tree
- How much is log n!?
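- As a quick arithmetic check for n = 3 (my numbers, consistent with the tree above):

  \[
  \lceil \log_2 3! \rceil = \lceil \log_2 6 \rceil = \lceil 2.585 \rceil = 3
  \]

  so three comparisons suffice in the worst case, and two cannot, since 2 comparisons distinguish only 4 of the 6 outcomes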
25. Lower Bound for Worst Case
- We see from the previous slide that the least number of comparisons needed to sort by comparing individual array elements is ceiling(log n!)
- How does log n! compare to n log n, our current lower bound for the worst case?
- n! = n · (n-1) · (n-2) · ... · 3 · 2 · 1
- log n! = log(n · (n-1) · (n-2) · ... · 2 · 1)
  = log n + log(n-1) + log(n-2) + ... + log 2 + log 1
  = Σ(i = 1 to n) log i ≥ ∫(1 to n) log x dx = log e · ∫(1 to n) ln x dx
- log e · (x ln x - x) evaluated from 1 to n = log e · (n ln n - n + 1) = n log n - n log e + log e ≈ n log n - 1.443n
- for our worst-case complexity we round up to the nearest integer, so log n! ≈ ceiling(n log n - 1.443n); we can omit the trailing + log e term as it is merely a constant
- So our lower bound for a worst-case complexity is ceiling(n log n - 1.443n)
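- As a quick numeric check (my arithmetic, not the text's), for n = 8 the exact bound and its approximation are

  \[
  \lceil \log_2 8! \rceil = \lceil \log_2 40320 \rceil = \lceil 15.30 \rceil = 16
  \;\ge\; 8\log_2 8 - 1.443 \cdot 8 \approx 12.5
  \]

  so the approximation is a true lower bound, though loose for small n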
26. Lower Bound for Average Case
- Can we improve over n log n for an average case complexity?
- The proof for this is given on pages 180-181
- However, we can simplify this proof by reconsidering the decision tree
- In any decision tree, the leaves will all occur at either level floor(log n!) or floor(log n!) + 1
- So, our average case complexity will be between log n! and log n! + 1 ≈ n log n - 1.443n; so, like our worst-case complexity, the lower bound for average case complexity is n log n - 1.443n
- Notice in this case that we do not need to take the ceiling, since an average case complexity does not have to be an integer value
27. Mergesort vs. Quicksort
- We have two sorting algorithms (so far) that can give us n log n average case complexity, but
  - Mergesort requires twice the amount of storage space
  - Quicksort cannot guarantee n log n complexity
  - Mergesort's merge operation is more time-consuming than Quicksort's partition operation, even though they are both in Θ(n)
- In practice, Mergesort does about 30% fewer comparisons in the worst case than Quicksort does in the average case, but because Quicksort does far fewer element movements, Quicksort often turns out to be faster
- But what if we want guaranteed n log n performance that is better than Mergesort's? Quicksort cannot give us that.
- So we turn to a third n log n sort, Heapsort
- Heapsort is interesting for two reasons
  - It guarantees n log n performance in the average and worst case, like Mergesort, but is faster than Mergesort
  - It does no recursion, thus saving stack space and the overhead of a stack
28. Heapsort and Heaps
- You might recall a Heap from 364
  - A binary tree stored in an array with the Heap property
  - A value stored in a heap node will be greater than the values stored in that node's subtrees
  - The tree must be a left-complete tree, which means that all leaf nodes are on the bottom two levels such that the nodes on the lowest level have no open nodes to their left
- In an array, the tree has the following pattern
  - Node i has children in positions 2i and 2i+1 (using 1-based indexing; in 0-based arrays, as in Java/C/C++, the children of node i are at 2i+1 and 2i+2)
- To satisfy the second attribute of a heap above, the tree will be stored in array locations 1..n for n nodes (in Java/C/C++ we can simply leave location 0 unused)
- Because of the first attribute, we know the largest value will be at the root of the heap, so Heapsort iteratively removes the root of the heap and restructures the heap until the heap is empty
- Thus, Heapsort removes values in descending order
- NOTE: we can change the heap property so that a node is less than any value in its subtrees to reverse the resulting order
29. Heapsort
- The Heapsort algorithm itself is simple once you have the Heap ADT implemented
- Given an array a of elements:

  for (i = 0; i < a.length; i++)
      heap.add(a[i]);               // build the heap one element at a time
  for (i = a.length - 1; i >= 0; i--)
      a[i] = heap.delete();         // the largest remaining item goes to the rear

- That is, take the original array and build the heap by adding 1 element at a time
- Each time a new value is added to the heap, the heap is restructured to ensure the heap structure, such that it is left-complete and each node is greater than all nodes in its subtrees
- Now, refill array a by removing the largest item from the heap, restructuring the heap, and repeating until the heap is empty
30. Adding to the Heap
- Consider a heap as stored in an array
- We now want to add a new value, 16
- Start by placing 16 at the end of the array and then walk the value up in position until it reaches its proper place
- If the heap currently has n items (in positions 1..n), then insert 16 at location n+1 and set temp = n+1
- Now, compare 16 to its parent (which will be at temp/2)
- If (heap[temp] > heap[temp/2]), then this value is greater than its parent: swap the two and set temp = temp/2
- Continue until either heap[temp] <= heap[temp/2], or temp = 1 (we have reached the root of the tree)
(figure: the original heap; 16 inserted at the end; 16 walked up into place)
31. Deleting from the Heap
- We only want to remove the largest element from the heap, which by definition must be at the root
- Store the root in a temporary variable to be returned
- Now, restructure the heap by moving the item at heap[n] (the last element in the array) to the root and walking it down into its proper position. How?
- Let temp = 1 (the root)
- Compare heap[temp] with heap[2*temp] and heap[2*temp + 1], that is, with its two children
- If heap[temp] is not the largest of the three, swap heap[temp] with the larger of the two children and repeat until either the value is at a leaf, or is greater than its two children and thus in its proper place
- Return the old root (the temporary value) and subtract 1 from n
(figure: returning 22 from the root; walking 10 down; the heap after walking 10 into place)
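- Below is a hedged sketch of both walks on a 1-based int-array max-heap (class and method names are mine; the textbook's code differs):

  public class HeapOps {
      // heap[1..n] is a max-heap: node i has children 2i and 2i+1, parent i/2.
      static void walkUp(int[] heap, int i) {
          while (i > 1 && heap[i] > heap[i / 2]) {          // larger than parent: swap up
              int t = heap[i]; heap[i] = heap[i / 2]; heap[i / 2] = t;
              i = i / 2;
          }
      }

      static void walkDown(int[] heap, int n, int i) {
          while (2 * i <= n) {                              // while node i has a child
              int child = 2 * i;
              if (child + 1 <= n && heap[child + 1] > heap[child])
                  child++;                                  // pick the larger child
              if (heap[i] >= heap[child]) break;            // heap property restored
              int t = heap[i]; heap[i] = heap[child]; heap[child] = t;
              i = child;
          }
      }

      public static void main(String[] args) {
          int[] heap = new int[16];
          int n = 0;
          for (int v : new int[]{10, 22, 15, 7, 9}) {
              heap[++n] = v;                 // add at the end...
              walkUp(heap, n);               // ...and walk it up
          }
          int max = heap[1];                 // delete: save the root...
          heap[1] = heap[n--];               // ...move the last item to the root...
          walkDown(heap, n, 1);              // ...and walk it down
          System.out.println(max);           // 22
          System.out.println(heap[1]);       // 15, the new root
      }
  }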
32. Heapsort Code and Analysis
- The student is invited to examine the algorithm as given on pages 184 and 186 and the code on pages 190 and 191
- It is not reproduced here for brevity (also, the code in the book is not the best version!)
- What is the complexity of this algorithm?
- Since the heap is always a balanced tree, it should be obvious that the heap's height will be log n or log n + 1
- To add an element requires walking it up from a leaf node to its proper position, which at most will be the root, or log n operations
- To remove an element requires deleting the root, moving the last item to the root, and walking it down to its proper position, which at most will be a leaf, or log n operations
33. Analysis Continued
- How many times do we perform walkUp and walkDown during a sort? Once per add and once per delete
- How many times do we add? n times
- How many times do we delete? n times
- Since walkUp and walkDown are both log n operations, it takes Θ(n log n) to build the heap and Θ(n log n) to remove all items from the heap
- So, Heapsort is in Θ(n log n) in the worst case
- A more formal analysis is given on pages 190-191, where we see that the actual number of comparisons is roughly 2(n log n - 1.443n)
- What about the average case?
- We must determine the average amount of work performed by walkUp and walkDown
- Let's assume that all elements in the array to be sorted differ
- Then, to insert element i, there is a 1/(i+1) chance that the element will fall between any two other elements (as we saw with insertion sort)
- However, unlike insertion sort, the amount of work does not range from 1 to i comparisons but instead from 1 to log i comparisons
34. Analysis Continued
- The average amount of work for any given walk up or walk down is
  - 1/(j+1) · Σ(i = 1 to j) log i ≈ (j log j - 1.443j)/(j+1) ≈ log j - 1.443
- We must now sum this for all j from 1 to n twice (once for each walk up and once for each walk down)
- So, the average case complexity for Heapsort is
  - 2 · Σ(i = 1 to n) (log i - 1.443)
  - = 2 · (Σ(i = 1 to n) log i - 1.443n)
  - ≈ 2 · (n log n - 1.443n - 1.443n) = 2 · (n log n - 2 · 1.443n)
- Thus, the only change in complexity between the worst and average cases is a doubling of the 1.443n term in the latter
- So, the average case of Heapsort is in Θ(n log n)
35. Improving Heapsort
- WalkUp requires fewer comparisons than walkDown
  - walkUp compares a given value against its parent only; walkDown compares a given value against both children
- When a heap is large
  - the amount of walking down a value requires might be reduced by guessing how far down the value will be walked
  - rather than walking it down the whole tree, we might walk it down some distance, and then bubble a larger value back up from the leaf level
  - a value walked down from root to leaf takes ≈ 2 log n comparisons (two per level)
  - if we can walk it halfway down and bubble up from the leaf, we only need 2·(log n)/2 + 1·(log n)/2 = (3/2) log n = log(n³)/2 comparisons
- How much of an improvement is this over 2 log n?
  - If n = 1000, the normal walkdown takes 20 comparisons; the bubble-up walkdown takes 15
- But this improvement is risky: what if we don't need to do any bubbling up? See pages 192-196 for a more detailed analysis
36. The Shell Sort
- The Shell Sort algorithm is somewhat like Insertion Sort
  - It is an in-place sort
  - Keys are compared so that smaller keys are moved before larger keys
- The main difference is that in Shell Sort, the values being compared are not necessarily next to each other
- Instead, we start by comparing values separated by some interval and then lower the interval distance, as in the sketch after this list
- For instance, with an interval of 5, compare A[0], A[5], A[10], A[15]; and compare A[1], A[6], A[11], A[16]; and compare A[2], A[7], A[12], A[17]; and compare A[3], A[8], A[13], A[18]; and compare A[4], A[9], A[14], A[19]
- Next, lower the interval to 3
- Next, lower the interval to 2
- Finally, lower the interval to 1
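- Below is a hedged, self-contained sketch of Shell Sort in Java (class name, gap sequence, and data are mine; the halving sequence is just one common choice among the interval sequences discussed next):

  public class ShellSortDemo {
      static void shellSort(int[] e) {
          for (int gap = e.length / 2; gap >= 1; gap /= 2) {
              // insertion-sort each interleaved subarray of stride 'gap'
              for (int i = gap; i < e.length; i++) {
                  int current = e[i], j = i;
                  while (j >= gap && e[j - gap] > current) {
                      e[j] = e[j - gap];     // shift within the gap-separated subarray
                      j -= gap;
                  }
                  e[j] = current;
              }
          }
      }

      public static void main(String[] args) {
          int[] a = {9, 1, 8, 2, 7, 3, 6, 4, 5};
          shellSort(a);
          System.out.println(java.util.Arrays.toString(a));  // [1, 2, ..., 9]
      }
  }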
37. The Advantage
- We already know from earlier that an in-place sort has a worst-case complexity in Θ(n²), so is this an improvement?
- It turns out that it is, because we are not moving a single value into its proper place with each pass through the list, but instead moving several values toward their proper places within the given intervals
- We then repeat with a smaller interval
- We do not have to repeat the process n times either, but we do have to pick a proper interval sequence to make it work
- We won't go over the algorithm in detail (see section 4.10 if you are interested), but we will note its analysis next
38. Shell Sort Analysis
- The exact worst-case performance of Shell Sort has yet to be proven because it is not known what set of intervals is best
- It has been shown that if only two intervals are used, first 1.72·n^(1/3) and then 1, the performance is roughly n^(5/3)
- It is also known that for intervals of the form 2^k - 1 (for k from about log n down to 1), the performance is in O(n^(3/2))
- Finally, there is a sequence of intervals that gives Shell Sort a performance in O(n (log n)²)
- To determine how good this is, note that n log n < n (log n)² < n^(3/2) < n^(5/3) < n²
- So, Shell Sort improves over some of the algorithms we have seen in this chapter without the overhead of additional memory space or Θ(n) operations per iteration (as with Mergesort or Heapsort)
39. Bucket Sorting
- Recall that earlier we proved the worst-case lower bound for a sort that compares values is n log n
- But not all sorting algorithms must compare values against other values to be sorted
- How can we get around this?
- Let's sort a list of values by analyzing the key of each value and placing it in a bucket (or pile)
- We can now sort only those keys in a given pile
- Then scoop up all of the piles, keeping them in their proper order
- If we can create the piles using some n operations for n keys, and scoop them up in n operations, two thirds of our work is in Θ(n)
- Can we also sort each pile in Θ(n)? Only if we can do so without comparing the values in that pile
- Or if we can keep each pile small enough so that the k log k operations on the given pile (where k is the size of the pile) are small enough to keep the entire solution in Θ(n)
40. Radix Sort
- Radix Sort is an example of a bucket sort where the "sort a pile" step is omitted
- However, the "distribute keys to a pile" and "scoop up pile" steps must be repeated
- The good news: because the distribute and scoop steps are Θ(n), and because the number of repetitions is a constant based not on n but on the size of the keys, Radix Sort is in Θ(n)
- The bad news: in many cases, the algorithm may be difficult to implement, and the amount of work required to distribute and scoop is in Θ(n) with a large constant multiplier
- The result is interesting: an algorithm with a worst-case complexity in Θ(n) but with a run-time that can be substantially longer than sorts in Θ(n log n)
41. The Radix Sort Algorithm
- For the purpose of describing the algorithm
  - we will assume that we are dealing with int keys that are no longer than k digits
- create 10 FIFO queues, queue[0]..queue[9]
- for i = k downto 1 do
  - for j = 0 to n-1 do
    - temp = the jth value in the list
    - peel off digit i from temp (digit k is the rightmost) and place temp into the queue for that digit
  - for j = 0 to 9
    - remove all items from queue[j] and return them to the original list in the order they were removed
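- A self-contained Java sketch of this pseudocode for non-negative int keys (class name, queue choice, and test data are mine):

  import java.util.ArrayDeque;
  import java.util.Arrays;
  import java.util.Queue;

  public class RadixSortDemo {
      // Sort non-negative ints of at most k decimal digits using 10 FIFO queues.
      static void radixSort(int[] list, int k) {
          Queue<Integer>[] queues = new ArrayDeque[10];       // queue[0]..queue[9]
          for (int d = 0; d < 10; d++) queues[d] = new ArrayDeque<>();
          int divisor = 1;
          for (int pass = 1; pass <= k; pass++) {             // least significant digit first
              for (int value : list)
                  queues[(value / divisor) % 10].add(value);  // "peel off" a digit, enqueue
              int i = 0;                                      // "scoop up" queues in order
              for (int d = 0; d < 10; d++)
                  while (!queues[d].isEmpty()) list[i++] = queues[d].remove();
              divisor *= 10;
          }
      }

      public static void main(String[] args) {
          int[] a = {170, 45, 75, 90, 802, 24, 2, 66};
          radixSort(a, 3);
          System.out.println(Arrays.toString(a));  // [2, 24, 45, 66, 75, 90, 170, 802]
      }
  }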
42. Believe It or Not
- The algorithm really works! We show this by induction
  - After one pass, the values are sorted by their final digit
  - Assume that after pass k-1, the values are sorted by their last k-1 digits; then
  - If we remove all items from queue 0, they are already sorted by those trailing digits. Since they all share the current digit 0, they are in the correct relative order. Next we remove all items from queue 1 (likewise already sorted), etc., and so, after the final pass, we have a completely sorted list
- Example below
43. Radix Sort Analysis
- The complexity is easy to figure out from the pseudocode earlier
- In sorting int values
  - We create 10 queues (0 comparisons)
  - We iterate for each digit
    - for int values, assume 10 digits, since int values are stored in 32 bits, giving a range of roughly -2 billion to +2 billion
  - The inner loop requires taking each key (n of them), peeling off the current digit, and determining which queue it is placed into: n operations
    - this assumes a Θ(1) enqueue operation and a Θ(1) "peel digit off" operation
  - Now we have to remove every item from every queue; this requires n dequeues, again assuming Θ(1) for dequeue
- Thus, the algorithm has 2n Θ(1) operations per iteration, and there are 10 iterations, or roughly 20n operations, which is Θ(n)!
- What if we were dealing with floats or doubles instead of int values?
- What if we were dealing with strings instead of int values?
44. The Bad News
- Radix Sort isn't a great sort because of these problems
- We need a lot of extra storage space for the queues
  - How much? We need an array of k queues where k = 10 for numeric keys and 26, 52, or 128 for string keys (or even 65,536 if we are dealing with Unicode!)
  - If our queues are linked-list based, then our queues will take up a total of n entries, but if we use arrays for our queues, then we will need n entries per queue to make sure we have enough space, and so we wind up wasting 9n (or 25n or 51n or 127n or 65,535n) entries!
- Peeling off a digit from an int, float or double is not easy, especially in a language other than Java
  - Peeling a char off a string is easy in most languages, but peeling a digit off a number might require first converting the number to a string, or isolating the digit by a series of / and % operations
- While Radix Sort is in Θ(n), the constant multiplier is quite large depending on the size of the keys
  - Strings might be as many as 255 characters!