Title: Chapter 4: Sorting
1. Chapter 4: Sorting
- Sorting is an important application to study
- Sorting is used in numerous applications
- As time has gone on, technology has made sorting easier
- For instance, it is usually no longer necessary to worry about keeping portions of a list on disk because memory is too small to hold it all
- From an algorithmic perspective, we find many different strategies for sorting
- Each has its own advantages and disadvantages
- We will study several sorting algorithms here and determine their average and worst-case complexities
- We will also note the amount of temporary storage required for each so that we can compare their space complexity as well as their time complexity
- Many of the algorithms we will view are limited to sorting lists implemented as arrays, but some also accommodate linked lists
2. Insertion Sort
- Basic idea
  - given a new item, insert it into an already ordered list
  - repeat this process for each item in the list until the list is sorted
  - for instance, insert the second element with respect to the first, then insert the third element with respect to the first and second (which are already sorted), etc.
- The insertion portion can be done either iteratively or recursively (see the example below)
- Notice that the given code has two comparisons (in bold), one being the base case
  int shiftVacRec(Element[] E, int vacant, Key x) {
      int xLoc;
      if (vacant == 0)
          xLoc = vacant;                    // base case: reached the front, x goes here
      else if (E[vacant-1].key <= x)
          xLoc = vacant;                    // left neighbor is no larger, x goes here
      else {
          E[vacant] = E[vacant-1];          // shift the larger element into the vacancy
          xLoc = shiftVacRec(E, vacant-1, x);
      }
      return xLoc;
  }
3. Complexity
- The recurrence equation for shiftVacRec is easy to derive
- T(n) = T(n-1) + 1, with the base case T(0) = 1
- Again, remember that we are only counting comparisons
- T(n) = T(n-1) + 1 = T(n-2) + 2 = T(n-3) + 3 = ... = T(0) + n = n + 1, so T(n) ∈ Θ(n)
- Therefore, inserting 1 item into an already sorted list is in Θ(n)
- How do we perform the insertion sort? We use shiftVacRec n-1 times, once for each item in the array
- However, notice that n, from T(n), grows (or changes)
- The number of items in the array for shiftVacRec to work with increases from 1 for the first iteration to n-1 for the last
- So in fact, we have, in the worst case, a sequence of 1 comparison the first time, 2 the second, 3 the third, etc., up to n-1 the last time
- We saw previously that the summation from 1 to n-1 = n(n-1)/2
- So, insertion sort takes ½(n² - n) comparisons, which is in Θ(n²)
4. Iterative Insertion Sort
- If n is large, the number of recursive calls makes our shiftVacRec inefficient because of the overhead of a procedure call
- Instead we might choose to implement insertion sort iteratively by replacing the procedure calls with a loop
  - we will find other sorting algorithms where we must use recursion, but here we do not HAVE to
- This code is given below
- It should be easy to see that the while loop will execute at most 1 time for the first iteration of xindex, 2 times for the second, etc., and n-1 times for the last, so iterative insertion sort does the same amount of work, Θ(n²)
  void insertionSort(Element[] E, int n) {
      int xindex, xLoc;
      Element current;
      for (xindex = 1; xindex < n; xindex++) {
          current = E[xindex];
          xLoc = shiftVac(E, xindex, current.key);
          E[xLoc] = current;
      }
      return;
  }

  int shiftVac(Element[] E, int xindex, Key x) {
      int vacant = xindex;
      int xLoc = 0;                    // x belongs at the front unless we find otherwise
      while (vacant > 0) {
          if (E[vacant-1].key <= x) {
              xLoc = vacant;           // found the insertion point
              break;
          }
          E[vacant] = E[vacant-1];     // shift the larger element right
          vacant--;
      }
      return xLoc;
  }
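- To see the algorithm run, here is a self-contained sketch on a plain int array, standing in for the book's Element/Key types (the class name and test data are mine, not the text's):

  public class InsertionSortDemo {
      // Shift larger elements right until the vacancy is where x belongs.
      static int shiftVac(int[] e, int xindex, int x) {
          int vacant = xindex, xLoc = 0;
          while (vacant > 0) {
              if (e[vacant - 1] <= x) { xLoc = vacant; break; }
              e[vacant] = e[vacant - 1];    // shift right
              vacant--;
          }
          return xLoc;
      }

      static void insertionSort(int[] e) {
          for (int xindex = 1; xindex < e.length; xindex++) {
              int current = e[xindex];
              e[shiftVac(e, xindex, current)] = current;
          }
      }

      public static void main(String[] args) {
          int[] a = {7, 2, 9, 4, 1};
          insertionSort(a);
          System.out.println(java.util.Arrays.toString(a));   // [1, 2, 4, 7, 9]
      }
  }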
5. Average Case Complexity
- The average case complexity is not as easy to determine
- First, assume that no values of the array are equal
- Next, assume that a value to be inserted has an equal chance of being inserted in any location
- There are i+1 locations to insert the ith item (before the first, between the first and second, etc., after the last)
- In order to determine the average case, we compute the average over inserting the ith item into each of the i+1 possibilities
- Each of the i+1 possibilities has probability 1/(i+1); landing j positions from the rear takes j comparisons (for j = 1 to i), and landing at the very front also takes just i comparisons, so we have
  1/(i+1) · (Σ(j = 1 to i) j + i) = 1/(i+1) · (i(i+1)/2 + i) = i/2 + i/(i+1)
  comparisons to insert the ith item (remember i ranges from 1 to n-1)
6. Average Case Continued
- So, all of insertion sort takes
  - A(n) = Σ(i = 1 to n-1) (i/2 + i/(i+1))
  - = Σ(i = 1 to n-1) (i/2 + 1 - 1/(i+1))
  - = (n-1)n/4 + (n-1) - Σ(i = 1 to n-1) 1/(i+1)
- The first term above is (n² - n)/4, the second term is n-1, and the last term is roughly equal to ln n
  - see Example 1.7 on pages 26-27 for an explanation
- Therefore, insertion sort, on average, takes (n² - n)/4 + n - 1 - ln n = (n² + 3n - 4)/4 - ln n ≈ n²/4, so insertion sort's average case is in Θ(n²)
- Insertion sort has a space complexity of Θ(n) because it is an "in-place" sort; that is, it does not copy the array and only needs 1 temporary variable for x (the item being inserted)
7. Lower Bound on In-place Sorts
- The idea behind insertion sort is to sort in-place by taking one element and finding its proper position
- Other sorting algorithms are similar
- The common aspect is that these sorting algorithms place 1 item in its proper position on each iteration
- What is the lower bound on such an algorithm?
- Consider that in such an algorithm, we are comparing some portion of an already sorted array to a new item
- Such a comparison must encompass all items in the already sorted portion of the array against the item to be placed
- If the comparison is to the entire array (as is the case with Selection Sort), then we have n comparisons performed n-1 times
- If the comparison is to a growing portion of the array, we have a sequence of comparisons of 1 + 2 + 3 + ... + n-1 = n(n-1)/2
- Therefore, the minimum amount of work in which exactly 1 item is placed per iteration is in Θ(n²)
- Does this mean insertion sort is an optimal sort?
8. Divide and Conquer Sorts
- Let us apply recursion to sorting as follows
- The sorting problem is solved by
  - Dividing the array into smaller arrays
  - Sorting the smaller arrays
  - Combining the sorted smaller arrays into a larger array
- See the algorithm below
- Our algorithm's complexity is described by the following recurrence relation
  - T(n) = D(n) + Σ S(size of part i) + C(n)
- We must determine how to divide the arrays (D), how to sort each one (S), and how to combine them when done (C)
- We might find that S is simplified by recursion
  solve(I)
      n = size(I)
      if (n <= smallSize)
          solution = directlySolve(I)
      else
          divide I into I1, ..., Ik
          for each i in {1, ..., k}
              Si = solve(Ii)
          solution = combine(S1, ..., Sk)
      return solution
9. Quicksort
- In Quicksort, we use the exact strategy described previously, as follows
- Before dividing, move elements so that, given some element at position x, all elements left of x are less than E[x] and all elements right of x are greater than E[x]
- Now we divide by simply repeating this on the left-hand side and right-hand side of x
- Combining is not necessary (it is simply a return statement)
- This brings about some questions
  - What element should be x? Is the choice of x important?
  - How do we move elements so that x is positioned correctly?
  - What kind of complexity will this algorithm yield?
10. Quicksort Continued
- We divide Quicksort into two procedures; the main quicksort procedure finds x and recursively calls itself with the left-hand and right-hand sides
- Partition will be used to move the elements of the array around so that x falls into its proper place with respect to all other elements
  - That is, E[z] < E[x] for all z left of x, and E[z] > E[x] for all z right of x
- Partition does most of the work of the Quicksort algorithm
- How might Partition work?
  - Move from right-to-left until we find an item less than E[x]; move that item into the vacancy at x
  - Move from left-to-right (after x) until we find an item greater than E[x], and move that into the newly freed position
  - Repeat until we meet somewhere in the middle and place E[x] there
- Partition will take n-1 comparisons
11. Quicksort Algorithm

  void quicksort(Element[] E, int first, int last) {
      if (first < last) {
          Element pivot = E[first];
          int splitPoint = partition(E, pivot, first, last);
          E[splitPoint] = pivot;                   // pivot lands in its proper place
          quicksort(E, first, splitPoint - 1);
          quicksort(E, splitPoint + 1, last);
      }
  }

  int partition(Element[] E, Element pivot, int first, int last) {
      int low = first, high = last;
      int lowVac = low, highVac = high;            // the vacancy starts at first
      while (low < high) {
          while (highVac > lowVac && E[highVac].key >= pivot.key)
              highVac--;                           // scan right-to-left for a small element
          E[lowVac] = E[highVac];                  // move it into the left vacancy
          while (lowVac < highVac && E[lowVac].key <= pivot.key)
              lowVac++;                            // scan left-to-right for a large element
          E[highVac] = E[lowVac];                  // move it into the right vacancy
          low = lowVac; high = highVac - 1;
      }
      return low;                                  // the final vacancy
  }
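- To see it run, here is a self-contained int-array version of the same vacancy-style partition (class name and test data are mine, not the textbook's):

  public class QuicksortDemo {
      // Vacancy-based partition; the vacancy begins at e[first], where the pivot was.
      static int partition(int[] e, int pivot, int first, int last) {
          int lowVac = first, highVac = last;
          while (lowVac < highVac) {
              while (lowVac < highVac && e[highVac] >= pivot) highVac--;
              e[lowVac] = e[highVac];          // fill the vacancy from the right
              while (lowVac < highVac && e[lowVac] <= pivot) lowVac++;
              e[highVac] = e[lowVac];          // fill the vacancy from the left
          }
          return lowVac;                       // the final vacancy
      }

      static void quicksort(int[] e, int first, int last) {
          if (first < last) {
              int pivot = e[first];
              int split = partition(e, pivot, first, last);
              e[split] = pivot;                // pivot lands in its final place
              quicksort(e, first, split - 1);
              quicksort(e, split + 1, last);
          }
      }

      public static void main(String[] args) {
          int[] a = {5, 3, 8, 1, 9, 2};
          quicksort(a, 0, a.length - 1);
          System.out.println(java.util.Arrays.toString(a));   // [1, 2, 3, 5, 8, 9]
      }
  }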
12. Quicksort Analysis
- Quicksort has a recurrence equation of
  - T(n) = T(n-r-1) + T(r) + n - 1
  - where r is the number of elements to the pivot's right
  - notice the "-1" in n-r-1: the pivot is already in its proper place
- How do we solve T(n) when r changes each time?
- The worst case occurs when r = 0 or r = n-1, so let's use r = 0
- This gives us T(n) = T(n-1) + T(0) + n - 1, with T(0) = T(1) = 0
- So T(n) = T(n-1) + (n-1) = T(n-2) + (n-2) + (n-1) = ... = Σ(i = 1 to n-1) i = n(n-1)/2
- So, Quicksort's worst case is in Θ(n²)
13. Quicksort Analysis Continued
- What about Quicksort's average case?
- The worst case happens when r = 0 or r = n-1, but on average, what will r be?
- On average, r will fall between these two extremes, and the average value of r is then
  - 1/n · Σ(i = 1 to n-1) i = n(n-1)/(2n) = (n-1)/2
- So, the average case complexity has the following recurrence equation
  - T(n) = T((n-1)/2) + T((n-1)/2) + n - 1 = 2T((n-1)/2) + n - 1
- Using the Master Theorem from chapter 3, we have f(n) = n - 1, b = 2 and c = 2, so that E = 1, and that means that T(n) is in Θ(n log n)
- A more formal analysis is given on pages 167-168 if you want to see the math!
- What is Quicksort's space usage? Partition is done in place, and so the only space required is the array plus a few temporary variables, so Quicksort's space usage is in Θ(n)
14. Improving Quicksort
- Even though Quicksort has a poor worst-case complexity (as opposed to some other sorting algorithms), the partition strategy is faster than most other sorting mechanisms, so Quicksort is a desirable sort as long as we can prevent the worst case from arising
- How? There are numerous improvements
  - Make sure the pivot is a good one, possibly by selecting 3 values (at random, or the first 3, or the first, middle and last in the array) and finding the median (see the sketch below). This makes partition slightly more complicated, but not any more computationally complex
  - Remove the subroutines from partition (the version presented in these notes is like that; the version in the textbook uses the subroutines)
  - For small arrays, use a different sort
  - Optimize the stack space so that the recursive calls do not slow down the process
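- A minimal sketch of median-of-three pivot selection (the method name and layout are mine; it plugs into the int-array quicksort demo earlier by swapping the median into e[first] before calling partition):

  // Return the index of the median of e[first], e[mid], e[last].
  static int medianOfThree(int[] e, int first, int last) {
      int mid = (first + last) / 2;
      int a = e[first], b = e[mid], c = e[last];
      if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;     // b is the median
      if ((b <= a && a <= c) || (c <= a && a <= b)) return first;   // a is the median
      return last;                                                  // otherwise c is
  }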
15. Merging Sorted Sequences
- Recall from our earlier idea to use divide and conquer to sort: we need a way to combine the sorted subarrays
- We did not need this in Quicksort because Quicksort didn't really need a combining step
- How can we do this?
- Let's assume we have two sorted subarrays, A and B
- We want to merge them into a new array C which is equal in size to A + B
- Which item goes into C[0]? It will either be A[0] or B[0]
- Which item goes into C[1]? It will either be A[1] or B[0] if A[0] has already been placed, or A[0] or B[1] if B[0] has already been placed
- etc.
- Once we have placed all of A or B into C, the remainder of the merge requires copying the rest of whichever array still remains
16. Recursive Merge
- The merge algorithm (below)
  - moves the rest of A or B into C if the other array is done, or finds the smaller of a[currentA] and b[currentB], moves it into C, and then recursively calls itself with the rest of A and B
- The recurrence equation for merge is
  - T(n) = T(n-1) + 1
- The base case is 0; however, at what n does the base case occur? It depends on when we run out of one of the two arrays, but this will be no greater than n, so T(n) ∈ Θ(n), and in the worst case, T(n) = n - 1
  void merge(Element[] a, Element[] b, Element[] c, int sizeA, int sizeB,
             int currentA, int currentB, int currentC) {
      if (currentA >= sizeA) {                  // a is exhausted: copy the rest of b
          for (int k = currentB; k < sizeB; k++)
              c[currentC++] = b[k];
      } else if (currentB >= sizeB) {           // b is exhausted: copy the rest of a
          for (int k = currentA; k < sizeA; k++)
              c[currentC++] = a[k];
      } else if (a[currentA].key <= b[currentB].key) {
          c[currentC] = a[currentA];            // a's element is smaller
          merge(a, b, c, sizeA, sizeB, currentA + 1, currentB, currentC + 1);
      } else {
          c[currentC] = b[currentB];            // b's element is smaller
          merge(a, b, c, sizeA, sizeB, currentA, currentB + 1, currentC + 1);
      }
      return;
  }
17. Iterative Merge
- The iterative version of merge works similarly to the recursive one: move one element into array C based on whether the current element of A or B is smaller, and repeat until one array is emptied out, then move the remainder of the other array into C
  - see Algorithm 4.4, page 172
- This requires n total copies (where n is the number of elements in A and B combined), but the number of comparisons varies depending on when the first array is emptied
- We have the same situation as with the recursive version: T(n) = n - 1 in the worst case, and T(n) ∈ Θ(n) in any case
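- A self-contained sketch of the iterative merge on int arrays (a paraphrase of the idea, not a verbatim copy of the textbook's Algorithm 4.4):

  import java.util.Arrays;

  public class MergeDemo {
      // Merge sorted arrays a and b into c, one comparison per element placed.
      static void merge(int[] a, int[] b, int[] c) {
          int i = 0, j = 0, k = 0;
          while (i < a.length && j < b.length)
              c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
          while (i < a.length) c[k++] = a[i++];    // copy whatever remains of a...
          while (j < b.length) c[k++] = b[j++];    // ...or of b
      }

      public static void main(String[] args) {
          int[] a = {1, 3, 5, 7}, b = {2, 4, 6, 8}, c = new int[8];
          merge(a, b, c);
          System.out.println(Arrays.toString(c));  // [1, 2, 3, 4, 5, 6, 7, 8]
      }
  }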
18. Optimality of Merge
- Is there any better way to merge two arrays?
- Consider the following two arrays
  - A = 1, 2, 3, 4, 5, 6, 7, 8
  - B = 9, 10, 11, 12, 13, 14, 15, 16
- We can merge these two arrays with 1 comparison (how?)
- In the worst case, our merge algorithm does the least amount of work required; consider these arrays
  - A = 1, 3, 5, 7, 9, 11, 13, 15
  - B = 2, 4, 6, 8, 10, 12, 14, 16
- We could not improve over merge in this case because every pair A[i] and B[j] will have to be compared where i and j are equal or off by one
- Thus, there are n-1 comparisons in the worst case (n-2 is possible in the worst case if A and B differ in size by 1 element)
19. Merge's Space Usage
- While merge gives us an optimal worst-case merger, it does so at a cost of space usage
- As seen in the previous examples, we have two arrays totaling n elements merged into array C
- Array C then must be of size n
- So merge takes 2n space, as opposed to n for Quicksort and Insertion Sort
- Could we improve?
- In a non-worst-case situation, yes, by not copying all of the second array into C, but instead copying what we have in C back into the original array
- There is a discussion of this on pages 173-174 if you are interested
20. Mergesort
- Now that we have a merge algorithm, we can define the rest of the sorting algorithm
- Recall that we need a mechanism to
  - Divide arrays into subarrays
  - Sort each subarray
  - Combine the subarrays
- Merge will combine the subarrays
- Sorting each subarray is done when two sorted subarrays are merged, so we won't need a separate sort
- Dividing an array into two subarrays will be done by finding the midpoint and calling the divide/sort/combine procedure recursively with the two halves
21. Mergesort Continued
- The Mergesort algorithm is given below
- The midpoint will either be at n/2 or (n-1)/2, creating two subarrays of roughly half the size; to simplify, we will consider their sizes to be floor(n/2) and ceiling(n/2)
- Each recursive call that merges subarrays totaling k elements will require at most k-1 comparisons
- Since all subarrays at any recursive level sum up to a total of n array elements, the number of comparisons at that level is at most n-1
- The recurrence relation is then
  - T(n) = T(floor(n/2)) + T(ceiling(n/2)) + n - 1
  - To simplify, T(n) = 2T(n/2) + n - 1

  void mergeSort(Element[] E, int first, int last) {
      if (first < last) {
          int mid = (first + last) / 2;
          mergeSort(E, first, mid);        // sort the left half
          mergeSort(E, mid + 1, last);     // sort the right half
          merge(E, first, mid, last);      // combine the two sorted halves
      }
      return;
  }

- The base case is T(1) = 0
- With f(n) = n - 1, b = 2, c = 2, we have E = 1, and the Master Theorem tells us that T(n) ∈ Θ(n log n)
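- A self-contained int-array sketch combining mergeSort with an auxiliary-buffer merge (class name, buffer handling, and data are mine):

  public class MergesortDemo {
      static void mergeSort(int[] e, int first, int last) {
          if (first < last) {
              int mid = (first + last) / 2;
              mergeSort(e, first, mid);
              mergeSort(e, mid + 1, last);
              merge(e, first, mid, last);
          }
      }

      // Merge the sorted halves e[first..mid] and e[mid+1..last] via a buffer.
      static void merge(int[] e, int first, int mid, int last) {
          int[] tmp = new int[last - first + 1];
          int i = first, j = mid + 1, k = 0;
          while (i <= mid && j <= last) tmp[k++] = (e[i] <= e[j]) ? e[i++] : e[j++];
          while (i <= mid) tmp[k++] = e[i++];
          while (j <= last) tmp[k++] = e[j++];
          System.arraycopy(tmp, 0, e, first, tmp.length);
      }

      public static void main(String[] args) {
          int[] a = {4, 1, 8, 3, 2};
          mergeSort(a, 0, a.length - 1);
          System.out.println(java.util.Arrays.toString(a));   // [1, 2, 3, 4, 8]
      }
  }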
22. Recursion Tree for Mergesort
- We can also visualize Mergesort's complexity through the recursion tree to obtain a more precise complexity
- Notice that each level doubles the number of recursive calls, but at each level the amount of work needed is x fewer comparisons, where x doubles per level (1, 2, 4, 8, etc.)
- Thus, Mergesort will require the following amount of work:
  Σ(i = 0 to lg n) (n - 2^i) = (n - 1) + (n - 2) + (n - 4) + (n - 8) + ... + (n - n/2) + (n - n)
- The first part of the sum is
  Σ(i = 0 to lg n) n = n(lg n + 1) = n lg n + n
- The second part is
  Σ(i = 0 to lg n) 2^i = 2^(lg n + 1) - 1 = 2n - 1
- So, we get a worst-case complexity of n lg n + n - (2n - 1) = n lg n - n + 1
- In fact, Mergesort's worst-case complexity is between
  - ceiling(n lg n - n + 1) and
  - ceiling(n lg n - 0.914n)
23. Lower Bound for Sorting with Comparisons
- Consider our worst-case complexities for sorting
  - Θ(n²) for in-place sorts that move 1 item at a time
  - Θ(n log n) for divide and conquer based sorts
- Can we do better for any sorting algorithm that compares pairs of values to find their positions?
- The reason we ask the question this way is that we will next see a sort that doesn't compare values against each other
- The answer to our question is unfortunately no, we cannot do better
- Why not? Consider, for n items
  - there are n! possible permutations of those n items
  - for n = 3, we would have 6 permutations
    - (x1, x2, x3), (x1, x3, x2), (x2, x1, x3), (x2, x3, x1), (x3, x1, x2), (x3, x2, x1)
  - let's arrange the possibilities in a tree where we traverse the tree to make the fewest comparisons; this is known as a decision tree
24. Our Decision Tree (for n = 3)
- First, compare x1 and x2
  - if x1 < x2, take the left branch; otherwise take the right branch
- On the left, compare x2 and x3; on the right, compare x1 and x3
  - On the left, if x2 < x3, the sorted order is x1, x2, x3
  - On the right, if x1 < x3, the sorted order is x2, x1, x3
- Otherwise, we might need another comparison to separate the remaining orderings
- How many comparisons might we have to make, and why?
- The height of this tree is ceiling(log n!) because there are n! leaf nodes, so the maximum number of comparisons is ceiling(log n!), since we might not know the sequence until reaching the leaves of the tree
- How much is log n!?
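- As a quick arithmetic check for n = 3 (my numbers, consistent with the tree above):

  \[
  \lceil \log_2 3! \rceil = \lceil \log_2 6 \rceil = \lceil 2.585 \rceil = 3
  \]

  so three comparisons suffice in the worst case, and two cannot, since 2 comparisons distinguish only 4 of the 6 outcomes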
25. Lower Bound for Worst Case
- We see from the previous slide that the least number of comparisons needed to sort by comparing individual array elements is ceiling(log n!)
- How does log n! compare to n log n, our current lower bound for the worst case?
- n! = n · (n-1) · (n-2) · ... · 3 · 2 · 1
- log n! = log(n · (n-1) · (n-2) · ... · 2 · 1)
  = log n + log(n-1) + log(n-2) + ... + log 2 + log 1
  = Σ(i = 1 to n) log i ≥ ∫(1 to n) log x dx = log e · ∫(1 to n) ln x dx
- log e · (x ln x - x) evaluated from 1 to n = log e · (n ln n - n + 1) = n log n - n log e + log e ≈ n log n - 1.443n
- for our worst-case complexity we round up to the nearest integer, so log n! ≈ ceiling(n log n - 1.443n); we can omit the trailing + log e term as it is merely a constant
- So our lower bound for a worst-case complexity is ceiling(n log n - 1.443n)
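- As a quick numeric check (my arithmetic, not the text's), for n = 8 the exact bound and its approximation are

  \[
  \lceil \log_2 8! \rceil = \lceil \log_2 40320 \rceil = \lceil 15.30 \rceil = 16
  \;\ge\; 8\log_2 8 - 1.443 \cdot 8 \approx 12.5
  \]

  so the approximation is a true lower bound, though loose for small n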
26. Lower Bound for Average Case
- Can we improve over n log n for an average case complexity?
- The proof for this is given on pages 180-181
- However, we can simplify this proof by reconsidering the decision tree
- In any decision tree, the leaves will all occur at either level floor(log n!) or floor(log n!) + 1
- So, our average case complexity will be between log n! and log n! + 1 ≈ n log n - 1.443n; so, like our worst-case complexity, the lower bound for average case complexity is n log n - 1.443n
- Notice in this case that we do not need to take the ceiling, since an average case complexity does not have to be an integer value
27. Mergesort vs. Quicksort
- We have two sorting algorithms (so far) that can give us n log n average case complexity, but
  - Mergesort requires twice the amount of storage space
  - Quicksort cannot guarantee n log n complexity
  - Mergesort's merge operation is more time-consuming than Quicksort's partition operation, even though they are both in Θ(n)
- In practice, Mergesort does about 30% fewer comparisons in the worst case than Quicksort does in the average case, but because Quicksort does far fewer element movements, Quicksort often turns out to be faster
- But what if we want guaranteed n log n performance that is better than Mergesort's? Quicksort cannot give us that.
- So we turn to a third n log n sort, Heapsort
- Heapsort is interesting for two reasons
  - It guarantees n log n performance in the average and worst case, like Mergesort, but is faster than Mergesort
  - It does no recursion, thus saving stack space and the overhead of a stack
28. Heapsort and Heaps
- You might recall a Heap from 364
  - A binary tree stored in an array with the Heap property
  - A value stored in a heap node will be greater than the values stored in that node's subtrees
  - The tree must be a left-complete tree, which means that all leaf nodes are on the bottom two levels such that the nodes on the lowest level have no open nodes to their left
- In an array, the tree has the following pattern
  - Node i has children in positions 2i and 2i+1 (using 1-based indexing; in 0-based arrays, as in Java/C/C++, the children of node i are at 2i+1 and 2i+2)
- To satisfy the second attribute of a heap above, the tree will be stored in array locations 1..n for n nodes (in Java/C/C++ we can simply leave location 0 unused)
- Because of the first attribute, we know the largest value will be at the root of the heap, so Heapsort iteratively removes the root of the heap and restructures the heap until the heap is empty
- Thus, Heapsort removes values in descending order
- NOTE: we can change the heap property so that a node is less than any value in its subtrees to reverse the resulting order
29. Heapsort
- The Heapsort algorithm itself is simple once you have the Heap ADT implemented
- Given an array a of elements:

  for (i = 0; i < a.length; i++)
      heap.add(a[i]);               // build the heap one element at a time
  for (i = a.length - 1; i >= 0; i--)
      a[i] = heap.delete();         // the largest remaining item goes to the rear

- That is, take the original array and build the heap by adding 1 element at a time
- Each time a new value is added to the heap, the heap is restructured to ensure the heap structure, such that it is left-complete and each node is greater than all nodes in its subtrees
- Now, refill array a by removing the largest item from the heap, restructuring the heap, and repeating until the heap is empty
30. Adding to the Heap
- Consider a heap as stored in an array
- We now want to add a new value, 16
- Start by placing 16 at the end of the array and then walk the value up in position until it reaches its proper place
- If the heap currently has n items (in positions 1..n), then insert 16 at location n+1 and set temp = n+1
- Now, compare 16 to its parent (which will be at temp/2)
- If (heap[temp] > heap[temp/2]), then this value is greater than its parent: swap the two and set temp = temp/2
- Continue until either heap[temp] <= heap[temp/2], or temp = 1 (we have reached the root of the tree)
(figure: the original heap; 16 inserted at the end; 16 walked up into place)
31. Deleting from the Heap
- We only want to remove the largest element from the heap, which by definition must be at the root
- Store the root in a temporary variable to be returned
- Now, restructure the heap by moving the item at heap[n] (the last element in the array) to the root and walking it down into its proper position. How?
- Let temp = 1 (the root)
- Compare heap[temp] with heap[2*temp] and heap[2*temp + 1], that is, with its two children
- If heap[temp] is not the largest of the three, swap heap[temp] with the larger of the two children and repeat until either the value is at a leaf, or is greater than its two children and thus in its proper place
- Return the old root (the temporary value) and subtract 1 from n
(figure: returning 22 from the root; walking 10 down; the heap after walking 10 into place)
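- Below is a hedged sketch of both walks on a 1-based int-array max-heap (class and method names are mine; the textbook's code differs):

  public class HeapOps {
      // heap[1..n] is a max-heap: node i has children 2i and 2i+1, parent i/2.
      static void walkUp(int[] heap, int i) {
          while (i > 1 && heap[i] > heap[i / 2]) {          // larger than parent: swap up
              int t = heap[i]; heap[i] = heap[i / 2]; heap[i / 2] = t;
              i = i / 2;
          }
      }

      static void walkDown(int[] heap, int n, int i) {
          while (2 * i <= n) {                              // while node i has a child
              int child = 2 * i;
              if (child + 1 <= n && heap[child + 1] > heap[child])
                  child++;                                  // pick the larger child
              if (heap[i] >= heap[child]) break;            // heap property restored
              int t = heap[i]; heap[i] = heap[child]; heap[child] = t;
              i = child;
          }
      }

      public static void main(String[] args) {
          int[] heap = new int[16];
          int n = 0;
          for (int v : new int[]{10, 22, 15, 7, 9}) {
              heap[++n] = v;                 // add at the end...
              walkUp(heap, n);               // ...and walk it up
          }
          int max = heap[1];                 // delete: save the root...
          heap[1] = heap[n--];               // ...move the last item to the root...
          walkDown(heap, n, 1);              // ...and walk it down
          System.out.println(max);           // 22
          System.out.println(heap[1]);       // 15, the new root
      }
  }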
32. Heapsort Code and Analysis
- The student is invited to examine the algorithm as given on pages 184 and 186 and the code on pages 190 and 191
- It is not reproduced here for brevity (also, the code in the book is not the best version!)
- What is the complexity of this algorithm?
- Since the heap is always a balanced tree, it should be obvious that the heap's height will be log n or log n + 1
- To add an element requires walking it up from a leaf node to its proper position, which at most will be the root, or log n operations
- To remove an element requires deleting the root, moving the last item to the root, and walking it down to its proper position, which at most will be a leaf, or log n operations
33. Analysis Continued
- How many times do we perform walkUp and walkDown during a sort? Once per add and once per delete
- How many times do we add? n times
- How many times do we delete? n times
- Since walkUp and walkDown are both log n operations, it takes Θ(n log n) to build the heap and Θ(n log n) to remove all items from the heap
- So, Heapsort is in Θ(n log n) in the worst case
- A more formal analysis is given on pages 190-191, where we see that the actual number of comparisons is roughly 2(n log n - 1.443n)
- What about the average case?
- We must determine the average amount of work performed by walkUp and walkDown
- Let's assume that all elements in the array to be sorted differ
- Then, to insert element i, there is a 1/(i+1) chance that the element will fall between any two other elements (as we saw with insertion sort)
- However, unlike insertion sort, the amount of work does not range from 1 to i comparisons but instead from 1 to log i comparisons
34. Analysis Continued
- The average amount of work for any given walk up or walk down is
  - 1/(j+1) · Σ(i = 1 to j) log i ≈ (j log j - 1.443j)/(j+1) ≈ log j - 1.443
- We must now sum this for all j from 1 to n twice (once for each walk up and once for each walk down)
- So, the average case complexity for Heapsort is
  - 2 · Σ(i = 1 to n) (log i - 1.443)
  - = 2 · (Σ(i = 1 to n) log i - 1.443n)
  - ≈ 2 · (n log n - 1.443n - 1.443n) = 2 · (n log n - 2 · 1.443n)
- Thus, the only change in complexity between the worst and average cases is a doubling of the 1.443n term in the latter
- So, the average case of Heapsort is in Θ(n log n)
35. Improving Heapsort
- WalkUp requires fewer comparisons than walkDown
  - walkUp compares a given value against its parent only; walkDown compares a given value against both children
- When a heap is large
  - the amount of walking down a value requires might be reduced by guessing how far down the value will be walked
  - rather than walking it down the whole tree, we might walk it down some distance, and then bubble a larger value back up from the leaf level
  - a value walked down from root to leaf takes ≈ 2 log n comparisons (two per level)
  - if we can walk it halfway down and bubble up from the leaf, we only need 2·(log n)/2 + 1·(log n)/2 = (3/2) log n = log(n³)/2 comparisons
- How much of an improvement is this over 2 log n?
  - If n = 1000, the normal walkdown takes 20 comparisons; the bubble-up walkdown takes 15
- But this improvement is risky: what if we don't need to do any bubbling up? See pages 192-196 for a more detailed analysis
36. The Shell Sort
- The Shell Sort algorithm is somewhat like Insertion Sort
  - It is an in-place sort
  - Keys are compared so that smaller keys are moved before larger keys
- The main difference is that in Shell Sort, the values being compared are not necessarily next to each other
- Instead, we start by comparing values separated by some interval and then lower the interval distance, as in the sketch after this list
- For instance, with an interval of 5, compare A[0], A[5], A[10], A[15]; and compare A[1], A[6], A[11], A[16]; and compare A[2], A[7], A[12], A[17]; and compare A[3], A[8], A[13], A[18]; and compare A[4], A[9], A[14], A[19]
- Next, lower the interval to 3
- Next, lower the interval to 2
- Finally, lower the interval to 1
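- Below is a hedged, self-contained sketch of Shell Sort in Java (class name, gap sequence, and data are mine; the halving sequence is just one common choice among the interval sequences discussed next):

  public class ShellSortDemo {
      static void shellSort(int[] e) {
          for (int gap = e.length / 2; gap >= 1; gap /= 2) {
              // insertion-sort each interleaved subarray of stride 'gap'
              for (int i = gap; i < e.length; i++) {
                  int current = e[i], j = i;
                  while (j >= gap && e[j - gap] > current) {
                      e[j] = e[j - gap];     // shift within the gap-separated subarray
                      j -= gap;
                  }
                  e[j] = current;
              }
          }
      }

      public static void main(String[] args) {
          int[] a = {9, 1, 8, 2, 7, 3, 6, 4, 5};
          shellSort(a);
          System.out.println(java.util.Arrays.toString(a));  // [1, 2, ..., 9]
      }
  }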
37. The Advantage
- We already know from earlier that an in-place sort has a worst-case complexity in Θ(n²), so is this an improvement?
- It turns out that it is, because we are not moving a single value into its proper place with each pass through the list, but instead moving several values toward their proper places within the given intervals
- We then repeat with a smaller interval
- We do not have to repeat the process n times either, but we do have to pick a proper interval sequence to make it work
- We won't go over the algorithm in detail (see section 4.10 if you are interested), but we will note its analysis next
38. Shell Sort Analysis
- The exact worst-case performance of Shell Sort has yet to be proven because it is not known what set of intervals is best
- It has been shown that if only two intervals are used, first 1.72·n^(1/3) and then 1, the performance is roughly n^(5/3)
- It is also known that for intervals of the form 2^k - 1 (for k from about log n down to 1), the performance is in O(n^(3/2))
- Finally, there is a sequence of intervals that gives Shell Sort a performance in O(n (log n)²)
- To determine how good this is, note that n log n < n (log n)² < n^(3/2) < n^(5/3) < n²
- So, Shell Sort improves over some of the algorithms we have seen in this chapter without the overhead of additional memory space or Θ(n) operations per iteration (as with Mergesort or Heapsort)
39. Bucket Sorting
- Recall that earlier we proved the worst-case lower bound for a sort that compares values is n log n
- But not all sorting algorithms must compare values against other values to be sorted
- How can we get around this?
- Let's sort a list of values by analyzing the key of each value and placing it in a bucket (or pile)
- We can now sort only those keys in a given pile
- Then scoop up all of the piles, keeping them in their proper order
- If we can create the piles using some n operations for n keys, and scoop them up in n operations, two thirds of our work is in Θ(n)
- Can we also sort each pile in Θ(n)? Only if we can do so without comparing the values in that pile
- Or if we can keep each pile small enough so that the k log k operations on the given pile (where k is the size of the pile) are small enough to keep the entire solution in Θ(n)
40. Radix Sort
- Radix Sort is an example of a bucket sort where the "sort a pile" step is omitted
- However, the "distribute keys to a pile" and "scoop up pile" steps must be repeated
- The good news: because the distribute and scoop steps are Θ(n), and because the number of repetitions is a constant based not on n but on the size of the keys, Radix Sort is in Θ(n)
- The bad news: in many cases, the algorithm may be difficult to implement, and the amount of work required to distribute and scoop is in Θ(n) with a large constant multiplier
- The result is interesting: an algorithm with a worst-case complexity in Θ(n) but with a run-time that can be substantially longer than sorts in Θ(n log n)
41. The Radix Sort Algorithm
- For the purpose of describing the algorithm
  - we will assume that we are dealing with int keys that are no longer than k digits
- create 10 FIFO queues, queue[0]..queue[9]
- for i = k downto 1 do
  - for j = 0 to n-1 do
    - temp = the jth value in the list
    - peel off digit i from temp (digit k is the rightmost) and place temp into the queue for that digit
  - for j = 0 to 9
    - remove all items from queue[j] and return them to the original list in the order they were removed
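- A self-contained Java sketch of this pseudocode for non-negative int keys (class name, queue choice, and test data are mine):

  import java.util.ArrayDeque;
  import java.util.Arrays;
  import java.util.Queue;

  public class RadixSortDemo {
      // Sort non-negative ints of at most k decimal digits using 10 FIFO queues.
      static void radixSort(int[] list, int k) {
          Queue<Integer>[] queues = new ArrayDeque[10];       // queue[0]..queue[9]
          for (int d = 0; d < 10; d++) queues[d] = new ArrayDeque<>();
          int divisor = 1;
          for (int pass = 1; pass <= k; pass++) {             // least significant digit first
              for (int value : list)
                  queues[(value / divisor) % 10].add(value);  // "peel off" a digit, enqueue
              int i = 0;                                      // "scoop up" queues in order
              for (int d = 0; d < 10; d++)
                  while (!queues[d].isEmpty()) list[i++] = queues[d].remove();
              divisor *= 10;
          }
      }

      public static void main(String[] args) {
          int[] a = {170, 45, 75, 90, 802, 24, 2, 66};
          radixSort(a, 3);
          System.out.println(Arrays.toString(a));  // [2, 24, 45, 66, 75, 90, 170, 802]
      }
  }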
42. Believe It or Not
- The algorithm really works! We show this by induction
  - After one pass, the values are sorted by their final digit
  - Assume that after pass k-1, the values are sorted by their last k-1 digits; then
  - If we remove all items from queue 0, they are already sorted by those trailing digits. Since they all share the current digit 0, they are in the correct relative order. Next we remove all items from queue 1 (likewise already sorted), etc., and so, after the final pass, we have a completely sorted list
- Example below
43. Radix Sort Analysis
- The complexity is easy to figure out from the pseudocode earlier
- In sorting int values
  - We create 10 queues (0 comparisons)
  - We iterate for each digit
    - for int values, assume 10 digits, since int values are stored in 32 bits, giving a range of roughly -2 billion to +2 billion
  - The inner loop requires taking each key (n of them), peeling off the current digit, and determining which queue it is placed into: n operations
    - this assumes a Θ(1) enqueue operation and a Θ(1) "peel digit off" operation
  - Now we have to remove every item from every queue; this requires n dequeues, again assuming Θ(1) for dequeue
- Thus, the algorithm has 2n Θ(1) operations per iteration, and there are 10 iterations, or roughly 20n operations, which is Θ(n)!
- What if we were dealing with floats or doubles instead of int values?
- What if we were dealing with strings instead of int values?
44. The Bad News
- Radix Sort isn't a great sort because of these problems
- We need a lot of extra storage space for the queues
  - How much? We need an array of k queues where k = 10 for numeric keys and 26, 52, or 128 for string keys (or even 65,536 if we are dealing with Unicode!)
  - If our queues are linked-list based, then our queues will take up a total of n entries, but if we use arrays for our queues, then we will need n entries per queue to make sure we have enough space, and so we wind up wasting 9n (or 25n or 51n or 127n or 65,535n) entries!
- Peeling off a digit from an int, float or double is not easy, especially in a language other than Java
  - Peeling a char off a string is easy in most languages, but peeling a digit off a number might require first converting the number to a string, or isolating the digit by a series of / and % operations
- While Radix Sort is in Θ(n), the constant multiplier is quite large depending on the size of the keys
  - Strings might be as many as 255 characters!