Title: Chapter 12 Sorting
1. Chapter 12: Sorting
- CS 260 Data Structures
- Indiana University Purdue University Fort Wayne
2. Note
- We temporarily skip ahead to Section 12.3
- Pages 633-643
3. Heapsort
- Heapsort is a superior O( n log(n) ) method
- Assume the array to sort is

    int[] a = new int[n];

- Overview
  - First convert an unsorted array to a heap
  - Then, iteratively, remove the root element, rebuilding the heap each time
  - The root element is always the largest remaining element
  - The elements, as removed, are in descending order
- Notation: Define a[i..j] to consist of the elements a[i], a[i+1], a[i+2], ..., a[j]
4. Heapsort method details

    public static void heapsort( int[] a, int n )

- 1. Note that a[0..0] is already a heap (only one element)
- 2. Turn a[0..(n-1)] into a heap by successively adding . . .

    a[1] to a[0..0]
    a[2] to a[0..1]
    a[3] to a[0..2]
    . . .
    a[n-1] to a[0..(n-2)]

- These steps could be written as a private helper method called makeHeap

    public static void makeHeap( int[] a, int n )
5. Heapsort method details
- 3. Iteratively remove the largest remaining element and rebuild the heap
- Note: The largest remaining element is only removed from the heap logically but remains physically in the array (in its new position)
- Note: Reheapification downward could be written as a private helper method called reheapifyDown

    for ( int i = n-1; i > 0; i-- )
    {
        Exchange a[0] with a[i]
        Perform reheapification downward on a[0..(i-1)]
    }

    public static void reheapifyDown( int[] a, int n )
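The outline on slides 4 and 5 can be sketched as one complete Java class. The method names heapsort, makeHeap, and reheapifyDown come from the slides, but the bodies below are one plausible implementation (makeHeap adds each element with an upward sweep; reheapifyDown pushes a[0] down), not necessarily the textbook's code.

```java
public class HeapsortDemo
{
    // Turn a[0..(n-1)] into a max-heap by successively adding
    // a[1], a[2], ..., a[n-1] (reheapification upward).
    public static void makeHeap( int[] a, int n )
    {
        for ( int i = 1; i < n; i++ )
        {
            int child = i;
            while ( child > 0 && a[child] > a[(child - 1) / 2] )
            {
                int parent = (child - 1) / 2;
                int t = a[child]; a[child] = a[parent]; a[parent] = t;
                child = parent;
            }
        }
    }

    // Push a[0] down until a[0..(n-1)] is again a max-heap.
    public static void reheapifyDown( int[] a, int n )
    {
        int i = 0;
        while ( 2*i + 1 < n )
        {
            int big = 2*i + 1;                        // left child
            if ( big + 1 < n && a[big + 1] > a[big] ) // right child larger?
                big++;
            if ( a[i] >= a[big] )
                break;                                // heap property restored
            int t = a[i]; a[i] = a[big]; a[big] = t;
            i = big;
        }
    }

    public static void heapsort( int[] a, int n )
    {
        makeHeap( a, n );
        for ( int i = n - 1; i > 0; i-- )
        {
            int t = a[0]; a[0] = a[i]; a[i] = t; // largest moves to position i
            reheapifyDown( a, i );               // rebuild heap on a[0..(i-1)]
        }
    }

    public static void main( String[] args )
    {
        int[] a = { 8, 5, 7, 2, 1 };
        heapsort( a, a.length );
        System.out.println( java.util.Arrays.toString( a ) ); // [1, 2, 5, 7, 8]
    }
}
```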
6. Heapsort example
[Diagram: heapsort trace on a five-element array a[0..4] containing the values 8, 5, 7, 2, 1 — makeHeap builds the heap with a series of swaps, then a[0] is repeatedly swapped with the last heap element and reheapified downward, yielding the final sorted array.]
7. Analysis of heapsort
- Since the heap is a complete binary tree, it is automatically balanced
- The depth of the tree is O( log(n) )
- Worst case analysis of heapsort is O( n log(n) )
  - Steps 1 and 2 have O( n log(n) ) performance
  - Step 3 has O( n log(n) ) performance
  - The steps form a sequence
- The worst case performance is also the best case performance and the average case performance
8. Quadratic sorting algorithms
Now, back to the beginning of Chapter 12 (Section 12.1, page 600)
- Quadratic sorting algorithms
  - Have inefficient worst case O( n^2 ) performance
  - Are easy to implement
  - O( n^2 ) performance doesn't matter for small arrays
9. Selection sort

    public static void selectionsort( int[] data, int first, int n )
    {
        Find the largest element. Swap it with the last.
        Find the next largest. Swap it with the next to last.
        Etc.
    }

- The sort range is data[first .. (first + n - 1)]
- A typical call is selectionsort( a, 0, n )
  - Here a is an array of n cells
- Analysis
  - Best case = worst case = average case = O( n^2 )
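The pseudocode above can be sketched as a concrete Java method with the slide's signature; the loop structure below is one straightforward way to realize it, not necessarily the textbook's version.

```java
public class SelectionSortDemo
{
    public static void selectionsort( int[] data, int first, int n )
    {
        // Repeatedly find the largest element of the unsorted range
        // and swap it into the last unsorted position.
        for ( int last = first + n - 1; last > first; last-- )
        {
            int big = first;
            for ( int i = first + 1; i <= last; i++ )
                if ( data[i] > data[big] )
                    big = i;
            int t = data[big]; data[big] = data[last]; data[last] = t;
        }
    }

    public static void main( String[] args )
    {
        int[] a = { 8, 5, 7, 2, 1 };
        selectionsort( a, 0, a.length );  // the typical call from the slide
        System.out.println( java.util.Arrays.toString( a ) ); // [1, 2, 5, 7, 8]
    }
}
```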
10. Insertion sort

    public static void insertionsort( int[] data, int first, int n )
    {
        Consider data[0..0] already sorted
        Insert data[1] into the proper position of data[0..1]
        Insert data[2] into the proper position of data[0..2]
        Etc.
    }

- Each insert operation places an additional element into a portion of the array that has already been sorted, as follows
[Diagram: a twelve-element array a[0..11] with a sorted front portion and the next element about to be inserted into its proper place.]
11. Insertion sort
- Analysis
  - Worst case = average case = O( n^2 )
  - Best case = O( n )
- The algorithm takes advantage of the situation when the array is already sorted
- This is a good method when . . .
  - a few updates need to be added from time to time so that the array remains sorted
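The insertion steps on slide 10 can be sketched as Java with the slide's signature; the shifting loop below is a common realization and an assumption on my part, not the textbook's listing. Note how a nearly sorted input makes the inner while loop exit almost immediately, which is where the O( n ) best case comes from.

```java
public class InsertionSortDemo
{
    public static void insertionsort( int[] data, int first, int n )
    {
        // data[first..first] is already sorted; insert each later element
        // into its proper position within the sorted front portion.
        for ( int i = first + 1; i < first + n; i++ )
        {
            int entry = data[i];
            int j = i;
            while ( j > first && data[j - 1] > entry )
            {
                data[j] = data[j - 1]; // shift larger elements right
                j--;
            }
            data[j] = entry;
        }
    }

    public static void main( String[] args )
    {
        int[] a = { 1, 2, 5, 7, 8, 3 };  // sorted except one new element
        insertionsort( a, 0, a.length );
        System.out.println( java.util.Arrays.toString( a ) ); // [1, 2, 3, 5, 7, 8]
    }
}
```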
12. Recursive O( n log(n) ) methods
- We will consider
- Mergesort
- Quicksort
13. Mergesort

    public static void mergesort( int[] data, int first, int n )
    {
        Divide the array in half.
        Recursively apply mergesort to each half.
        Merge the sorted halves into a sorted temporary array.
        Copy the temporary array back to the original array.
    }

[Diagram: a ten-element array a[0..9] split in half, with mergesort applied recursively to each half.]
14. Mergesort
- Recursive stopping case
  - When a subarray to be sorted consists of only one element
- During the merge process, when one of the halves becomes empty, simply copy the remainder of the remaining half to the end of the temporary array
15. Mergesort merge method

    private static void merge( int[] data, int first, int n1, int n2 )
    {
        int[] temp = new int[n1+n2];  // Allocate the temporary array
        int copied  = 0;  // Number of elements copied from data to temp
        int copied1 = 0;  // Number copied from the first half of data
        int copied2 = 0;  // Number copied from the second half of data
        int i;            // Array index to copy from temp back into data

        while ( ( copied1 < n1 ) && ( copied2 < n2 ) )
        {
            if ( data[first + copied1] < data[first + n1 + copied2] )
                temp[copied++] = data[first + (copied1++)];       // bad style !
            else
                temp[copied++] = data[first + n1 + (copied2++)];  // bad style !
        }

        while ( copied1 < n1 )
            temp[copied++] = data[first + (copied1++)];           // bad style !
        while ( copied2 < n2 )
            temp[copied++] = data[first + n1 + (copied2++)];      // bad style !

        for ( i = 0; i < n1+n2; i++ )
            data[first + i] = temp[i];
    }
16. Mergesort

    private static void mergesort( int[] data, int first, int n )
    {
        int n1;  // Size of the first half of the array
        int n2;  // Size of the second half of the array

        if ( n > 1 )
        {
            // Compute sizes of the two halves
            n1 = n / 2;
            n2 = n - n1;

            mergesort( data, first, n1 );       // Sort data[first] through data[first+n1-1]
            mergesort( data, first + n1, n2 );  // Sort data[first+n1] to the end

            // Merge the two sorted halves.
            merge( data, first, n1, n2 );
        }
    }
17. Mergesort analysis
- The usual technique of determining the big-O of a recursive method does not work here
  - There are only half as many elements in the merge phase within each successive recursive call
- Instead, look at the merge activity across an entire level at a time
  - The big-O of merge across each level is O(n)
  - There are O( log(n) ) levels
  - Ignore the actual number of recursive calls
18. Mergesort analysis
[Diagram: the mergesort recursion tree — O( log(n) ) levels, with all n elements merged at each level.]
- Worst case = average case = best case = O( n log(n) )
19. Mergesort analysis
- A disadvantage of mergesort when used with arrays is that a second temporary array is needed
  - This effectively cuts the size of the largest array that can be sorted in half
- Advantages
  - Works with linked lists without need for a temporary array !
  - Can be used to sort data in a huge disk file
    - A file much too large to fit in memory
    - Subdivide the file into pieces small enough to fit in memory
    - Sort the pieces
    - Merge the pieces together
20. Quicksort

    public static void quicksort( int[] data, int first, int n )
    {
        Partition the array in two parts such that
        (all elements in the left part) < (all elements in the right part)
        Recursively apply quicksort to each part
    }

- Quicksort works in a manner opposite to mergesort
  - The partition operation iterates through all the elements before the recursive calls rather than after
  - The partition operation does rough sorting ahead of time
21. Quicksort partition method

    public static int partition( int[] data, int first, int n )

- The partition method rearranges elements such that all the elements in the left part will be smaller than any of the elements in the right part
- A value called the pivot value will end up between the elements of the two parts
- This pivot value is
  - >= each element of the left part
  - <= each element of the right part
- The partition method returns the pivot index, giving the position of the pivot value
22. Quicksort partition method
- Start by choosing a pivot value to help with the partitioning process
- Ideally, the pivot value would be the median of the elements in the sort range
  - Then the two parts would be nearly equal in size
  - However, finding the median is an O(n) operation
  - This is too inefficient
- Instead, simply choose data[first] for the pivot
  - Later, we will improve on this guess
- Then use two indices to sweep through the data
  - u for up
  - d for down
23. Quicksort partition method
- Sweep through the data as follows
  - Move u up and d down until the values they refer to are out of order with respect to the pivot value
  - Then swap the u and d values and continue
  - Stop when u and d pass each other
- Finally, swap data[d] with the pivot value data[first]
- Return index d
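The sweep just described might be coded as follows. The signature matches the slide, but the exact index handling (u starting at first + 1, the loop guards) is an assumption; the textbook's version may differ in such details while following the same u/d scheme.

```java
public class PartitionDemo
{
    // Partition data[first..first+n-1] around the pivot data[first];
    // return the final index of the pivot value.
    public static int partition( int[] data, int first, int n )
    {
        int pivot = data[first];
        int u = first + 1;      // sweeps up
        int d = first + n - 1;  // sweeps down

        while ( u <= d )
        {
            while ( u <= d && data[u] <= pivot )  // move u up past small values
                u++;
            while ( data[d] > pivot )             // move d down past large values
                d--;                              // (stops at first: data[first] == pivot)
            if ( u < d )                          // out of order: swap and continue
            {
                int t = data[u]; data[u] = data[d]; data[d] = t;
                u++; d--;
            }
        }

        // u and d have passed each other: place the pivot between the parts
        data[first] = data[d];
        data[d] = pivot;
        return d;
    }

    public static void main( String[] args )
    {
        int[] a = { 7, 9, 2, 8, 1, 5 };
        int p = partition( a, 0, a.length );
        System.out.println( p + " " + java.util.Arrays.toString( a ) ); // 3 [1, 5, 2, 7, 8, 9]
    }
}
```

After the call, every element left of index p is <= the pivot 7 and every element to its right is greater.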
24. Quicksort partition example
[Diagram: partition trace — u sweeps up and d sweeps down, swapping out-of-order pairs, until the two indices pass each other; afterward every element left of the pivot (7) is < the pivot and every element to its right is > the pivot.]
25. Quicksort
- Once elements in both parts are sorted, the entire array is sorted

    public static void quicksort( int[] data, int first, int n )
    {
        int pivotIndex;  // Array index for the pivot element
        int n1;          // Number of elements before the pivot element
        int n2;          // Number of elements after the pivot element

        if ( n > 1 )
        {
            // Partition the array, and set the pivot index.
            pivotIndex = partition( data, first, n );

            // Compute the sizes of the two pieces.
            n1 = pivotIndex - first;
            n2 = n - n1 - 1;

            // Recursive calls will now sort the two pieces.
            quicksort( data, first, n1 );
            quicksort( data, pivotIndex + 1, n2 );
        }
    }
26. Quicksort analysis
- Best case = average case = O( n log(n) )
  - When the pivot occurs near the center each time
  - Number of levels = O( log(n) )
  - Number of probes within each level = O(n)
- Worst case = O( n^2 )
  - When the pivot occurs near an end most of the time
  - For example, when the array is already sorted
  - Number of levels is only limited by n
27. Quicksort
- There is a better way to choose the pivot value
- 1. Choose the median of the three values . . .
  - data[first]
  - data[first + n - 1]
  - data[first + n/2]
- 2. Swap the chosen value with data[first]
- 3. Continue as before
- This method is called the median of three
- Statistically, this gives a much better pivot value
  - Performance is much more likely to be O( n log(n) )
  - Even when the data is already sorted
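The three numbered steps above can be sketched as a small helper. The name medianOfThree and the comparison logic are my assumptions for illustration; the slides only specify which three values to inspect and that the chosen one is swapped into data[first].

```java
public class MedianOfThreeDemo
{
    // Choose the median of data[first], data[first + n/2], and
    // data[first + n - 1], and swap it into data[first] so the
    // usual partition method can still use data[first] as the pivot.
    public static void medianOfThree( int[] data, int first, int n )
    {
        int mid  = first + n / 2;
        int last = first + n - 1;

        int median;
        if ( ( data[first] <= data[mid] ) == ( data[mid] <= data[last] ) )
            median = mid;    // mid lies between the other two
        else if ( ( data[mid] <= data[first] ) == ( data[first] <= data[last] ) )
            median = first;  // first lies between the other two
        else
            median = last;

        int t = data[first]; data[first] = data[median]; data[median] = t;
    }

    public static void main( String[] args )
    {
        int[] a = { 1, 2, 3, 4, 5, 6, 7 };  // already sorted: worst case for data[first]
        medianOfThree( a, 0, a.length );
        System.out.println( a[0] );          // 4 -- the middle value becomes the pivot
    }
}
```

On already-sorted data, data[first] would be the minimum (the worst pivot), while the median of three yields the true median, keeping the partition balanced.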
28. Other improvements
- Both mergesort and quicksort encounter more and more overhead due to recursion when the subarrays get small
- Both can be improved as follows
  - When a subarray contains fewer than some number M of elements, use the insertion sort method on the subarray instead of making a recursive call
  - A typical value for M might be around 100
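The cutoff idea can be sketched for quicksort as follows. This is an assumption-laden illustration: M = 16 is chosen only to exercise both branches on a small test (the slides suggest around 100 in practice), and the insertionsort and partition helpers are local versions, not the textbook's listings.

```java
public class HybridSortDemo
{
    static final int M = 16;  // cutoff; the slides suggest around 100 in practice

    public static void hybridQuicksort( int[] data, int first, int n )
    {
        if ( n < M )
            insertionsort( data, first, n );  // small subarray: no recursion
        else
        {
            int pivotIndex = partition( data, first, n );
            int n1 = pivotIndex - first;
            int n2 = n - n1 - 1;
            hybridQuicksort( data, first, n1 );
            hybridQuicksort( data, pivotIndex + 1, n2 );
        }
    }

    static void insertionsort( int[] data, int first, int n )
    {
        for ( int i = first + 1; i < first + n; i++ )
        {
            int entry = data[i];
            int j = i;
            while ( j > first && data[j - 1] > entry )
            {
                data[j] = data[j - 1];
                j--;
            }
            data[j] = entry;
        }
    }

    static int partition( int[] data, int first, int n )
    {
        int pivot = data[first];
        int u = first + 1, d = first + n - 1;
        while ( u <= d )
        {
            while ( u <= d && data[u] <= pivot ) u++;
            while ( data[d] > pivot ) d--;
            if ( u < d )
            {
                int t = data[u]; data[u] = data[d]; data[d] = t;
                u++; d--;
            }
        }
        data[first] = data[d];
        data[d] = pivot;
        return d;
    }

    public static void main( String[] args )
    {
        java.util.Random rng = new java.util.Random( 42 );
        int[] a = new int[100];
        for ( int i = 0; i < a.length; i++ )
            a[i] = rng.nextInt( 1000 );
        hybridQuicksort( a, 0, a.length );

        boolean sorted = true;
        for ( int i = 1; i < a.length; i++ )
            if ( a[i - 1] > a[i] ) sorted = false;
        System.out.println( sorted ); // true
    }
}
```

The same cutoff works for mergesort: test n < M at the top of the recursive method and fall back to insertion sort there as well.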