Title: Sorting
Chapter 7
Objective
- To understand the importance of sorting
- Review known sorting techniques
- Develop improved sorting techniques (from a computational complexity viewpoint)
- Implement several sorting classes
Review of Sorting Techniques
- Selection sort
- Bubble sort
- Insertion sort
- All of the above are O(N^2)
- Radix sort
- O(mN), where m is the number of columns to be sorted
Selection Sort Steps
- If the array is of length n, we need n-1 steps
- At step i, we must find the smallest element among a[i] through a[n-1]
- We then exchange a[i] with this smallest element
- The step-by-step process to sort the components of the array a into ascending order is:
  - Compare a[0] with a[1] through a[4] and swap a[0] with the smallest value (a[4])
  - Next, compare a[1] with the contents of a[2] through a[4]. No change.
  - Next, compare a[2] with the contents of a[3] through a[4] and swap a[2] with the smallest of these values.
  - Next, compare a[3] with the contents of a[4] and swap a[3] with the smallest of these values.
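A minimal C++ sketch of these steps, assuming a 0-based std::vector<int> (the slides index the array as a[0] through a[4]); the name selection_sort is hypothetical. Pass i finds the smallest element in a[i..n-1] and exchanges it with a[i].

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Selection sort: n-1 passes; pass i swaps the smallest of a[i..n-1] into a[i].
void selection_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        std::size_t smallest = i;
        for (std::size_t j = i + 1; j < n; ++j)
            if (a[j] < a[smallest])
                smallest = j;
        std::swap(a[i], a[smallest]);   // a no-op when a[i] is already the smallest
    }
}
```

Every pass scans the whole unsorted tail regardless of the data, which is why selection sort is O(N^2) even on sorted input.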
Bubble Sort Steps
- During the first pass through the array:
  - Compare each pair of consecutive elements
  - Interchange consecutive elements when they are out of order
  - After the first pass, the nth (largest) element is in its final position
- Repeat, this time stopping at element n-1
- In general, after the kth pass, the last k elements are in their final positions
- Notice that if no interchanges are made during a pass, we can stop (the array is in sequence)
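A minimal C++ sketch of bubble sort with the early exit described above, assuming a 0-based std::vector<int>; bubble_sort is a hypothetical name, and returning the pass count is only there to make the early exit visible.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bubble sort that stops after any pass with no interchanges.
// Returns the number of passes performed.
std::size_t bubble_sort(std::vector<int>& a) {
    std::size_t passes = 0;
    for (std::size_t end = a.size(); end > 1; --end) {   // a[end..n-1] is already in place
        bool swapped = false;
        ++passes;
        for (std::size_t j = 1; j < end; ++j)
            if (a[j - 1] > a[j]) {                       // consecutive pair out of order
                std::swap(a[j - 1], a[j]);
                swapped = true;
            }
        if (!swapped) break;                             // no changes: array is in sequence
    }
    return passes;
}
```

On already sorted input the first pass makes no interchanges, so the sort stops after one O(N) pass.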
- The step-by-step process using the bubble sort is shown in figures of the array after the first, second, third, and fourth passes
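Insertion Sort Steps
- Starting with the first element, advance one element at a time and insert each new element into its correct position among the elements already processed
- To insert, compare the new element with the element before it and keep moving downward (shifting larger elements up) until its correct position is found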
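A minimal C++ sketch of these steps, assuming a 0-based std::vector<int>; insertion_sort is a hypothetical name. Before step p, a[0..p-1] is sorted; a[p] is saved and larger elements are shifted up while moving downward.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort: insert a[p] into the sorted prefix a[0..p-1].
void insertion_sort(std::vector<int>& a) {
    for (std::size_t p = 1; p < a.size(); ++p) {
        int tmp = a[p];
        std::size_t j = p;
        for (; j > 0 && tmp < a[j - 1]; --j)
            a[j] = a[j - 1];                 // shift the larger element up one slot
        a[j] = tmp;                          // drop the new element into its slot
    }
}
```

Shifting instead of swapping writes each moved element once, but the number of element moves is the same as with adjacent exchanges.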
- The step-by-step process to sort by insertion is shown in figures of the array after the first step, the second step, the third step, and the last step
Computational Complexity
- All of the above are O(N^2)
- They can be classified as sorts that exchange adjacent elements
- It can be shown that any sort in this class requires Ω(N^2) time on average
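The Ω(N^2) average bound rests on inversions: an inversion is a pair of positions i < j with a[i] > a[j]. Exchanging two adjacent out-of-order elements removes exactly one inversion, and a random permutation of N distinct elements has N(N-1)/4 inversions on average, so any adjacent-exchange sort makes Ω(N^2) exchanges on average. The sketch below (function names are hypothetical) counts inversions directly and counts the adjacent exchanges made by an insertion sort; for any input the two counts agree.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Counts inversions directly: pairs i < j with a[i] > a[j]. O(N^2), for illustration.
std::size_t count_inversions(const std::vector<int>& a) {
    std::size_t inv = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[i] > a[j]) ++inv;
    return inv;
}

// Insertion sort written with explicit adjacent exchanges; returns how many it made.
std::size_t adjacent_swap_sort(std::vector<int>& a) {
    std::size_t swaps = 0;
    for (std::size_t p = 1; p < a.size(); ++p)
        for (std::size_t j = p; j > 0 && a[j - 1] > a[j]; --j) {
            std::swap(a[j - 1], a[j]);       // removes exactly one inversion
            ++swaps;
        }
    return swaps;
}
```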
The Radix Sort
- This sort is based on the method used to sort IBM's 80-column Hollerith punched cards
- To sort on columns 1-5, sort on column 5 first and progress leftward until column 1
- Note that sorting 5 columns takes 5 passes over the N items, i.e., 5N steps, so this sort is O(N)
- The items distributed on each pass must be held in memory in groups called bins
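A minimal C++ sketch of a least-significant-digit radix sort in the card-sorter style, assuming non-negative integers with at most `columns` decimal digits and 10 bins (one per digit); radix_sort is a hypothetical name. Each pass is O(N), so m columns cost O(mN).

```cpp
#include <cstddef>
#include <vector>

// LSD radix sort of non-negative integers with at most `columns` decimal digits.
// Assumes 10^(columns-1) fits in an unsigned int.
void radix_sort(std::vector<unsigned>& a, int columns) {
    unsigned divisor = 1;                            // selects the current column
    for (int pass = 0; pass < columns; ++pass, divisor *= 10) {
        std::vector<std::vector<unsigned>> bins(10); // one bin per digit 0..9
        for (unsigned x : a)
            bins[(x / divisor) % 10].push_back(x);   // distribute, keeping arrival order
        a.clear();
        for (const auto& bin : bins)                 // collect bins 0..9 in order
            a.insert(a.end(), bin.begin(), bin.end());
    }
}
```

The rightmost column is handled first, and each bin preserves arrival order (the distribution is stable); that stability is what keeps the ordering from earlier passes intact.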
Sorting Indices
- Maintain a separate array of indices into the data
- Run time can be reduced by sorting the indices instead of the data itself
- Why?
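A sketch of index sorting in C++, assuming a hypothetical Record type whose payload makes it expensive to move. One reason it helps: exchanging two indices moves a few bytes, while exchanging two large records copies all their data. This uses std::iota and std::sort from the standard library; sorted_indices is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical record type that is expensive to move.
struct Record {
    int key;
    std::string payload;   // stands in for a large amount of data
};

// Returns indices ordered by key; the records themselves are never moved.
std::vector<std::size_t> sorted_indices(const std::vector<Record>& recs) {
    std::vector<std::size_t> idx(recs.size());
    std::iota(idx.begin(), idx.end(), 0);            // 0, 1, ..., n-1
    std::sort(idx.begin(), idx.end(),
              [&recs](std::size_t a, std::size_t b) { return recs[a].key < recs[b].key; });
    return idx;
}
```

Reading recs[idx[0]], recs[idx[1]], ... visits the records in key order.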
Heapsort
- A heap is a natural structure for sorting
- Build a binary heap
  - This takes O(N) time
- Perform deleteMin N times
  - Place each deleted node in a second array
  - Each deleteMin takes O(log N)
- Hence the total run time is O(N log N)
- The drawback is that the second array doubles the memory required for the sort
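A sketch of this first version in C++, assuming std::priority_queue (with std::greater, so it is a min-heap) as the binary heap; that is a substitution for illustration, not the slides' code, and heap_sort_copy is a hypothetical name. The range constructor builds the heap with std::make_heap in linear time, and each pop is O(log N).

```cpp
#include <functional>
#include <queue>
#include <vector>

// Heapsort with a second array: build a min-heap, then deleteMin N times,
// placing each deleted element in the output array.
std::vector<int> heap_sort_copy(const std::vector<int>& data) {
    std::priority_queue<int, std::vector<int>, std::greater<int>>
        heap(data.begin(), data.end());          // build the heap: O(N)
    std::vector<int> out;
    out.reserve(data.size());                    // the extra N slots are the memory cost
    while (!heap.empty()) {
        out.push_back(heap.top());               // smallest remaining element
        heap.pop();                              // deleteMin: O(log N)
    }
    return out;
}
```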
Better Heapsort Algorithm
- Build a max heap (largest element at the root), so the result ends up in ascending order
- Interchange the root with the last element of the heap
- Reduce the heap size by 1
- Restore the heap property
- Repeat this for each of the n elements in the array
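The same in-place steps can be sketched with the C++ standard heap algorithms (a substitution for the hand-written code later in the chapter): std::make_heap builds a max heap, and each std::pop_heap swaps the root with the last element of the range and restores the heap property on the range that is one element shorter. heap_sort_in_place is a hypothetical name; std::sort_heap performs the same loop in one call.

```cpp
#include <algorithm>
#include <vector>

// In-place heapsort: no second array is needed.
void heap_sort_in_place(std::vector<int>& a) {
    std::make_heap(a.begin(), a.end());              // max heap (default std::less)
    for (auto end = a.end(); end - a.begin() > 1; --end)
        std::pop_heap(a.begin(), end);               // max moves to *(end - 1)
}
```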
Complexity
- The complexity is O(n log n), even in the worst case
- If worst-case behavior is critically important, this is a very good sort
Build Heap and Delete Max
- The heap is a max heap
- Remove 97 (the root)
- The last element, 31, tries to go to the root but filters down to its natural position
- The 97 goes to where the 31 used to be, but is no longer part of the heap
- This process is continued until the data is sorted in ascending order
Heapsort Code - 1
    template <class Etype>
    void
    Perc_Down( Etype A[ ], unsigned int i, const unsigned int N )
    {
        // The heap occupies A[ 1 .. N ]; A[ 0 ] is unused.
        unsigned int Child;
        Etype Tmp = A[ i ];

        for( ; i * 2 <= N; i = Child )
        {
            Child = i * 2;
            if( Child != N && A[ Child + 1 ] > A[ Child ] )
                Child++;                    // Pick the larger child (max heap).
            if( Tmp < A[ Child ] )
                A[ i ] = A[ Child ];        // Move the child up one level.
            else
                break;
        }
        A[ i ] = Tmp;
    }
Heapsort Code - 2
    template <class Etype>
    void
    Heap_Sort( Etype A[ ], const unsigned int N )
    {
        for( unsigned int i = N / 2; i > 0; i-- )    // Build_Heap.
            Perc_Down( A, i, N );
        for( unsigned int i = N; i >= 2; i-- )
        {
            Swap( A[ 1 ], A[ i ] );                   // Delete_Max.
            Perc_Down( A, ( unsigned int ) 1, i - 1 );
        }
    }
Homework
- Build a max heap from the following data, then show the data pass by pass as the heapsort is performed: 77 22 84 34 35 75 21 46 88
Mergesort - 1
- A mergesort is based on merging two sorted subsequences to produce a single sorted, combined subsequence
- Divide and conquer
- Surprisingly, the complexity is O(N log N)
- The sizes of the sorted subsequences grow from 1 to 2 to 4 to 8 to 16 and so forth, so it takes log N merge passes, each of complexity O(N)
- Merging requires 3 pointers, called Actr, Bctr, and Cctr, where we are merging A and B to form C
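A sketch in C++ of a merge that uses the three counters named above, followed by a bottom-up mergesort whose run widths grow 1, 2, 4, 8, ... as described. It assumes a 0-based std::vector<int>; merge_sorted and merge_sort_bottom_up are hypothetical names, and this iterative form differs from the recursive code that follows.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Merges sorted A and B into C using the counters Actr, Bctr, Cctr.
std::vector<int> merge_sorted(const std::vector<int>& A, const std::vector<int>& B) {
    std::vector<int> C(A.size() + B.size());
    std::size_t Actr = 0, Bctr = 0, Cctr = 0;
    while (Actr < A.size() && Bctr < B.size())
        C[Cctr++] = (A[Actr] <= B[Bctr]) ? A[Actr++] : B[Bctr++];  // take the smaller
    while (Actr < A.size()) C[Cctr++] = A[Actr++];                 // copy leftovers of A
    while (Bctr < B.size()) C[Cctr++] = B[Bctr++];                 // copy leftovers of B
    return C;
}

// Bottom-up mergesort: about log2 N passes, each touching all N elements.
void merge_sort_bottom_up(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t width = 1; width < n; width *= 2) {
        for (std::size_t lo = 0; lo + width < n; lo += 2 * width) {
            std::size_t mid = lo + width, hi = std::min(lo + 2 * width, n);
            std::vector<int> left(a.begin() + lo, a.begin() + mid);
            std::vector<int> right(a.begin() + mid, a.begin() + hi);
            std::vector<int> merged = merge_sorted(left, right);
            std::copy(merged.begin(), merged.end(), a.begin() + lo);
        }
    }
}
```

Using `<=` when the front elements are equal takes from A first, which keeps the merge stable.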
Mergesort - 2
Mergesort - 3
Mergesort Code - 1
    template <class Etype>
    void
    Merge_Sort( Etype A[ ], const unsigned int N )
    {
        // A[ 1 .. N ] holds the data. std::nothrow (from <new>) makes new return
        // NULL on failure instead of throwing, so the check below is meaningful.
        Etype *Tmp_Array = new (std::nothrow) Etype[ N + 1 ];
        unsigned int New_N = N;     // Non-constant, for M_Sort.

        if( Tmp_Array != NULL )
        {
            M_Sort( A, Tmp_Array, ( unsigned int ) 1, New_N );
            delete [ ] Tmp_Array;
        }
        else
            Error( "No space for tmp array" );   // Error is not shown on these slides.
    }
    template <class Etype>
    void
    M_Sort( Etype A[ ], Etype Tmp_Array[ ],
            unsigned int Left, unsigned int Right )
    {
        if( Left < Right )
        {
            unsigned int Center = ( Left + Right ) / 2;
            M_Sort( A, Tmp_Array, Left, Center );
            M_Sort( A, Tmp_Array, Center + 1, Right );
            Merge( A, Tmp_Array, Left, Center + 1, Right );
        }
    }
Mergesort Code - 2
    // Left_Pos = start of left half.
    // Right_Pos = start of right half.
    template <class Etype>
    void
    Merge( Etype A[ ], Etype Tmp_Array[ ], unsigned int Left_Pos,
           unsigned int Right_Pos, unsigned int Right_End )
    {
        unsigned int Left_End = Right_Pos - 1;
        unsigned int Tmp_Pos = Left_Pos;
        unsigned int Num_Elements = Right_End - Left_Pos + 1;

        // Main loop.
        while( Left_Pos <= Left_End && Right_Pos <= Right_End )
            if( A[ Left_Pos ] <= A[ Right_Pos ] )
                Tmp_Array[ Tmp_Pos++ ] = A[ Left_Pos++ ];
            else
                Tmp_Array[ Tmp_Pos++ ] = A[ Right_Pos++ ];

        while( Left_Pos <= Left_End )      // Copy rest of first half.
            Tmp_Array[ Tmp_Pos++ ] = A[ Left_Pos++ ];
        while( Right_Pos <= Right_End )    // Copy rest of second half.
            Tmp_Array[ Tmp_Pos++ ] = A[ Right_Pos++ ];

        // Copy Tmp_Array back into A.
        for( unsigned int i = 1; i <= Num_Elements; i++, Right_End-- )
            A[ Right_End ] = Tmp_Array[ Right_End ];
    }
Homework
- Sort the following data using a mergesort. Show the results pass by pass: 77 22 84 34 35 75 21 46 88
Analysis Using Telescoping
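The telescoping argument can be sketched as follows, assuming N is a power of 2 and the mergesort recurrence T(1) = 1, T(N) = 2T(N/2) + N (these constants are the usual simplifying assumptions). Dividing by N and writing the same equation for N/2, N/4, ..., 2 gives log2 N equations whose intermediate terms cancel when added:

```latex
\begin{aligned}
\frac{T(N)}{N} &= \frac{T(N/2)}{N/2} + 1 \\
\frac{T(N/2)}{N/2} &= \frac{T(N/4)}{N/4} + 1 \\
&\;\;\vdots \\
\frac{T(2)}{2} &= \frac{T(1)}{1} + 1 \\[4pt]
\frac{T(N)}{N} &= \frac{T(1)}{1} + \log_2 N
\;\Longrightarrow\; T(N) = N\log_2 N + N = O(N \log N)
\end{aligned}
```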
Quicksort: The Basic Approach
- The basic idea:
  - Partition the data into two sets: elements greater than a pivot and elements less than the pivot
  - The key insight is that no data moves from one partition to the other after partitioning is finished
  - This makes quicksort a divide-and-conquer algorithm: the data in the two partitions can be sorted independently
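A minimal sketch of this idea in C++ (C++14 or later), using std::partition from the standard library instead of the in-place pointer scheme developed in the following slides. The pivot is simply the middle element's value (an assumption), and elements equal to the pivot are grouped between the two sets so both recursive calls get strictly smaller ranges; simple_quicksort is a hypothetical name.

```cpp
#include <algorithm>
#include <vector>

// Partition-based quicksort on the random-access range [first, last).
template <class It>
void simple_quicksort(It first, It last) {
    if (last - first < 2) return;                    // 0 or 1 element: already sorted
    const auto pivot = *(first + (last - first) / 2);
    It mid1 = std::partition(first, last,
                             [&](const auto& x) { return x < pivot; });     // < pivot
    It mid2 = std::partition(mid1, last,
                             [&](const auto& x) { return !(pivot < x); });  // == pivot
    simple_quicksort(first, mid1);   // the two outer sets are sorted independently
    simple_quicksort(mid2, last);
}
```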
The Algorithm
The Big Picture
- Some critical issues:
  - How is the pivot selected?
  - How can the number of recursive steps be reduced?
  - How can worst-case behavior be avoided?
Picking the Pivot
- Some choices are:
  - Pick the leftmost or rightmost element
  - Pick the center element
  - Pick the median of the leftmost, center, and rightmost elements
- We will use the median-of-three approach
- It performs better because, statistically, the median of a small sample (three elements here) is more likely to be near the median of all the data, which is the optimal pivot
Algorithm
- We can perform the sort in place using the following algorithm:
  - Determine the pivot using the median of the first, center, and last elements
  - Interchange the pivot with the last element
  - Use 2 pointers: i pointing to the first element and j pointing to the last element before the pivot
  - Move i to the right until a large number (a number greater than the pivot) is encountered
  - Move j to the left until a small number (a number less than the pivot) is encountered
  - Swap the elements at i and j
  - Continue until j is to the left of i
  - Then swap the element at i with the pivot
- Example: 8 1 4 9 6 3 5 2 7 0
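A sketch of one partitioning step in C++ following the steps above, assuming a 0-based std::vector<int> and left < right; partition_median3 is a hypothetical name. As in the usual form of this scheme, both scans also stop on elements equal to the pivot. The `j > left` guard is needed here because this version only selects the median of three and does not arrange the other two values as sentinels.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Partitions a[left..right] and returns the pivot's final index.
std::size_t partition_median3(std::vector<int>& a, std::size_t left, std::size_t right) {
    const std::size_t center = left + (right - left) / 2;
    const int x = a[left], y = a[center], z = a[right];
    std::size_t m = right;                               // index of the median of three
    if ((x <= y && y <= z) || (z <= y && y <= x)) m = center;
    else if ((y <= x && x <= z) || (z <= x && x <= y)) m = left;
    std::swap(a[m], a[right]);                           // interchange pivot with last
    const int pivot = a[right];

    std::size_t i = left, j = right - 1;
    while (true) {
        while (a[i] < pivot) ++i;                        // stops at a[right] at the latest
        while (j > left && a[j] > pivot) --j;            // guard keeps j inside the range
        if (i >= j) break;                               // pointers have met or crossed
        std::swap(a[i], a[j]);
        ++i;
        --j;
    }
    std::swap(a[i], a[right]);                           // swap the element at i with pivot
    return i;
}
```

For the example 8 1 4 9 6 3 5 2 7 0, the median of 8, 6, and 0 is 6; the call on the whole array returns 6 and leaves the array as 2 1 4 5 0 3 6 8 7 9.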
Partitioning
- Assume the pivot, 6, has been placed in the rightmost position
- The accompanying pictures show a complete partitioning of the data
Driver for Quicksort
    template <class Etype>
    void
    Quick_Sort( Etype A[ ], const unsigned int N )
    {
        const unsigned int One = 1;
        Q_Sort( A, One, N );        // Leaves partitions smaller than Cutoff unsorted.
        Insertion_Sort( A, N );     // Finishes them; Insertion_Sort is not shown here.
    }

    template <class Etype>
    inline void
    Swap( Etype & A, Etype & B )    // Reference parameters, so the caller's values swap.
    {
        Etype Tmp;
        Tmp = A;
        A = B;
        B = Tmp;
    }
Median of Three
    template <class Etype>
    Etype
    Median3( Etype A[ ],
             const unsigned int Left, const unsigned int Right )
    {
        unsigned int Center = ( Left + Right ) / 2;

        if( A[ Left ] > A[ Center ] )
            Swap( A[ Left ], A[ Center ] );
        if( A[ Left ] > A[ Right ] )
            Swap( A[ Left ], A[ Right ] );
        if( A[ Center ] > A[ Right ] )
            Swap( A[ Center ], A[ Right ] );

        // Invariant: A[ Left ] <= A[ Center ] <= A[ Right ].
        // Now hide and return pivot.
        Swap( A[ Center ], A[ Right - 1 ] );
        return A[ Right - 1 ];
    }
Quicksort Routine
    template <class Etype>
    void
    Q_Sort( Etype A[ ],
            const unsigned int Left, const unsigned int Right )
    {
        if( Left + Cutoff <= Right )    // Cutoff: small-array threshold, defined elsewhere.
        {
            Etype Pivot = Median3( A, Left, Right );
            unsigned int i = Left, j = Right - 1;

            for( ; ; )
            {
                while( A[ ++i ] < Pivot );  // The hidden pivot A[ Right - 1 ] stops this scan.
                while( A[ --j ] > Pivot );  // A[ Left ] <= Pivot stops this scan.
                if( i < j )
                    Swap( A[ i ], A[ j ] );
                else
                    break;
            }
            Swap( A[ i ], A[ Right - 1 ] );   // Restore pivot.

            Q_Sort( A, Left, i - 1 );
            Q_Sort( A, i + 1, Right );
        }
    }
Homework
- Perform a quicksort on the following data. Show the results pass by pass: 77 22 84 34 35 75 21 46 88
Small Arrays
- For small arrays (N < 20), quicksort does not perform as well as the O(N^2) sorts
- Use quicksort until a partition shrinks to about 20 elements
- Then use one of the other sorts
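A sketch of this hybrid in C++, assuming a 0-based std::vector<int>, a threshold of 20 (kCutoff, a hypothetical name), and a middle-element Hoare-style partition as a simplification of the median-of-three routine. Like the driver in this chapter, it leaves small partitions unsorted and finishes with one insertion sort over the whole array.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

constexpr long kCutoff = 20;   // hypothetical threshold, following the slide's N < 20

// Insertion sort over the whole vector.
void insertion_sort(std::vector<int>& a) {
    for (std::size_t p = 1; p < a.size(); ++p) {
        int tmp = a[p];
        std::size_t j = p;
        for (; j > 0 && tmp < a[j - 1]; --j) a[j] = a[j - 1];
        a[j] = tmp;
    }
}

// Quicksort on a[left..right], leaving partitions with fewer than kCutoff
// elements unsorted for the final insertion sort.
void coarse_quicksort(std::vector<int>& a, long left, long right) {
    if (right - left + 1 < kCutoff) return;
    const int pivot = a[left + (right - left) / 2];   // middle-element pivot (simplification)
    long i = left, j = right;
    while (i <= j) {                                  // Hoare-style partition
        while (a[i] < pivot) ++i;
        while (a[j] > pivot) --j;
        if (i <= j) std::swap(a[i++], a[j--]);
    }
    coarse_quicksort(a, left, j);
    coarse_quicksort(a, i, right);
}

// Every element ends inside its small partition, fewer than kCutoff places from
// its final position, so the closing insertion sort costs O(N * kCutoff).
void hybrid_sort(std::vector<int>& a) {
    if (!a.empty()) coarse_quicksort(a, 0, static_cast<long>(a.size()) - 1);
    insertion_sort(a);
}
```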
Analysis of Quicksort: Worst Case
- General recurrence
- Worst-case recurrence
Analysis of Quicksort: Best Case
- Best-case recurrence
- The average-case recurrence is more complex but results in O(n log n)
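A worked sketch of these recurrences under the usual simplifying assumptions: T(N) is the running time, i is the number of elements in the left partition, c is a constant per-element partitioning cost, the constant T(0) term is ignored, and N is a power of 2 in the best case (the symbols are chosen here for illustration). Both closed forms follow by telescoping:

```latex
\begin{aligned}
\text{General:}\quad & T(N) = T(i) + T(N-i-1) + cN \\
\text{Worst case } (i = 0):\quad & T(N) = T(N-1) + cN = T(1) + c\sum_{k=2}^{N} k = O(N^2) \\
\text{Best case } (i \approx N/2):\quad & T(N) = 2\,T(N/2) + cN = cN\log_2 N + N\,T(1) = O(N \log N)
\end{aligned}
```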
Quick Select - General
- The goal is to find the kth smallest element
- A simple approach is to sort the data and take the element at index k-1
  - The complexity would be O(n log n)
- A linear average-time approach, which avoids sorting all the data, adapts quicksort into a new routine, quickselect:
  - Apply the partitioning recursively only to the partition containing the desired final position
  - When the partition gets small enough, use an insertion sort to avoid the cost of recursion
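A self-contained sketch of quickselect in C++, assuming a 0-based std::vector<int> and a 1-based k (so the kth smallest element ends up at index k-1). It swaps in a simpler Hoare-style partition around the middle element instead of the median-of-three routine, and it loops instead of recursing; quickselect is a hypothetical name. The expected time is linear, but, as with quicksort, the worst case is O(n^2).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns the kth smallest element (1 <= k <= a.size()); reorders a in the process.
int quickselect(std::vector<int>& a, std::size_t k) {
    long left = 0, right = static_cast<long>(a.size()) - 1;
    const long target = static_cast<long>(k) - 1;   // 0-based final position
    while (left < right) {
        const int pivot = a[left + (right - left) / 2];
        long i = left, j = right;
        while (i <= j) {                    // Hoare-style partition on a copied pivot value
            while (a[i] < pivot) ++i;
            while (a[j] > pivot) --j;
            if (i <= j) std::swap(a[i++], a[j--]);
        }
        // Now a[left..j] <= pivot, a[i..right] >= pivot, and a[j+1..i-1] == pivot.
        if (target <= j) right = j;         // continue only in the left part
        else if (target >= i) left = i;     // continue only in the right part
        else return a[target];              // target holds a pivot-valued element
    }
    return a[target];
}
```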
QuickSelect Code
    // On return, the kth smallest element is in A[ k ].
    template <class Etype>
    void
    Q_Select( Etype A[ ], const unsigned int k,
              const unsigned int Left, const unsigned int Right )
    {
        if( Left + Cutoff <= Right )
        {
            Etype Pivot = Median3( A, Left, Right );
            unsigned int i = Left, j = Right - 1;

            for( ; ; )
            {
                while( A[ ++i ] < Pivot );
                while( A[ --j ] > Pivot );
                if( i < j )
                    Swap( A[ i ], A[ j ] );
                else
                    break;
            }
            Swap( A[ i ], A[ Right - 1 ] );   // Restore pivot.

            // Recurse only into the partition that contains position k.
            if( k < i )
                Q_Select( A, k, Left, i - 1 );
            else if( k > i )
                Q_Select( A, k, i + 1, Right );
        }
        else
            Insertion_Sort( A, Left, Right );  // Assumed subrange insertion sort (not shown).
    }
A Lower Bound on Complexity
- Sorts based on comparisons
  - We will prove that no sorting routine based on comparisons can do better than Ω(n log n) comparisons in the worst case
  - We have studied three sorts in this complexity class:
    - Quicksort has the best average-time behavior
    - Heapsort has the best worst-case behavior
    - Mergesort is well suited for data stored sequentially in external files
- Other sorts
  - Radix sort can have linear-time behavior, but the multiplying constant may be high
  - Radix sort applies only to certain types of data
A Decision Tree
- Three elements can be ordered in 3! = 6 ways; this decision tree determines which ordering is correct
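The decision-tree argument generalizes this example: each comparison is an internal node with two outcomes, so a tree of depth d has at most 2^d leaves, and a tree that sorts N elements needs a leaf for each of the N! possible orderings:

```latex
2^{d} \ge N! \;\Longrightarrow\; d \ge \lceil \log_2 N! \rceil,
\qquad
\log_2 N! \;=\; \sum_{k=1}^{N} \log_2 k \;\ge\; \sum_{k=\lceil N/2 \rceil}^{N} \log_2 k \;\ge\; \frac{N}{2}\log_2\frac{N}{2} \;=\; \Omega(N \log N)
```

For N = 3 this gives at least ⌈log2 6⌉ = 3 comparisons in the worst case.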
Some Theorems - 1
Some Theorems - 2