Sorting - PowerPoint PPT Presentation

Provided by: DennisK152
Learn more at: https://ics.uci.edu

1
Sorting
  • What makes it hard?
  • Chapter 7 in DS&AA
  • Chapter 8 in DS&PS

2
Insertion Sort
  • Algorithm:
  • Conceptually, incrementally add each element to a sorted
    array or list, starting with an empty array
    (list).
  • Works as an incremental or a batch algorithm.
  • Analysis:
  • In the best case, the input is already sorted; time is O(N).
  • In the worst case, the input is reverse sorted; time is
    O(N^2).
  • Average case (by a loose argument) is O(N^2).
  • Inversion: a pair of elements that is out of order
  • the critical variable for determining the algorithm's
    time cost
  • each swap of adjacent elements removes exactly 1 inversion
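The link between swaps and inversions can be checked with a short Python sketch. The function name `insertion_sort` and the choice to return a swap counter are illustrative assumptions, not part of the slides.

```python
def insertion_sort(items):
    """Return a sorted copy of items and the number of adjacent swaps used."""
    a = list(items)
    swaps = 0
    for i in range(1, len(a)):
        j = i
        # Move a[i] left until the prefix a[0..i] is sorted.
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            swaps += 1  # each adjacent swap removes exactly one inversion
            j -= 1
    return a, swaps


if __name__ == "__main__":
    print(insertion_sort([3, 1, 2]))     # two inversions: (3,1) and (3,2)
    print(insertion_sort([4, 3, 2, 1]))  # reverse sorted: N(N-1)/2 swaps
```

On sorted input the inner loop never runs, which gives the O(N) best case; on reverse-sorted input every pair is an inversion, which gives the O(N^2) worst case.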

3
Inversions
  • What is the average number of inversions, over all
    inputs?
  • Let A be any array of N distinct integers
  • Let revA be the reverse of A
  • Note: if a pair (i,j) is in order in A, it is out of
    order in revA, and vice versa.
  • The total number of pairs (i,j) is N(N-1)/2, so the
    average number of inversions is N(N-1)/4, which
    is O(N^2)
  • Corollary: any algorithm that removes only a
    single inversion at a time takes at least
    Ω(N^2) time on average!
  • To do better, we need to remove more than one
    inversion at a time.
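For small N the N(N-1)/4 average can be verified by brute force. The sketch below is only a sanity check over all permutations; the helper names `count_inversions` and `average_inversions` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations


def count_inversions(a):
    """Count pairs (i, j) with i < j and a[i] > a[j] (quadratic, for checking only)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


def average_inversions(n):
    """Exact average number of inversions over all n! permutations of n distinct keys."""
    perms = list(permutations(range(n)))
    total = sum(count_inversions(p) for p in perms)
    return Fraction(total, len(perms))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, average_inversions(n), Fraction(n * (n - 1), 4))
```

The pairing argument also shows up directly: for any array `a` of distinct keys, `count_inversions(a) + count_inversions(a[::-1])` equals N(N-1)/2.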

4
BubbleSort
  • Most frequently used sorting algorithm
  • Algorithm:
  • for j = n-1 down to 1 ... O(n)
  • for i = 0 to j-1 ... O(j)
  • if A[i] and A[i+1] are out of order,
    swap them
  • (that's the bubble) ... O(1)
  • Analysis:
  • Bubblesort is O(n^2)
  • Appropriate for small arrays
  • Appropriate for nearly sorted arrays
  • Comparisons versus swaps?
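A minimal Python sketch of the two loops above, with counters added so that comparisons and swaps can be compared. The counters and the function name `bubble_sort` are assumptions added for illustration.

```python
def bubble_sort(items):
    """Return (sorted copy, number of comparisons, number of swaps)."""
    a = list(items)
    comparisons = 0
    swaps = 0
    for j in range(len(a) - 1, 0, -1):   # j = n-1 down to 1
        for i in range(j):               # i = 0 to j-1
            comparisons += 1
            if a[i] > a[i + 1]:          # out of order: bubble the larger one up
                a[i], a[i + 1] = a[i + 1], a[i]
                swaps += 1
    return a, comparisons, swaps


if __name__ == "__main__":
    print(bubble_sort([3, 1, 2]))
    print(bubble_sort([1, 2, 3, 4]))  # sorted input: no swaps, same comparisons
```

This version always performs n(n-1)/2 comparisons; only the number of swaps depends on how disordered the input is.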

5
Shell Sort (1959, by Shell)
  • Motivated by the inversion result: need to move
    elements far
  • Still quadratic in the worst case
  • Found only in textbooks
  • Of historical and theoretical interest;
    not fully understood.
  • Algorithm (with schedule 1, 3, 5):
  • bubble sort things spaced 5 apart
  • bubble sort things 3 apart
  • bubble sort things 1 apart
  • Faster than insertion sort, but still O(N^2)
  • No one knows the best schedule
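The schedule above can be sketched in Python as follows. Note one swapped-in detail: this sketch uses a gapped insertion sort for each pass, which is the usual formulation, rather than the bubble sort named on the slide. The function name `shell_sort` and the `gaps` parameter are illustrative assumptions.

```python
def shell_sort(items, gaps=(5, 3, 1)):
    """Return a sorted copy of items using the given gap schedule (must end with 1)."""
    a = list(items)
    for gap in gaps:
        # Gapped insertion sort: each element moves `gap` positions at a time,
        # so a single move can remove more than one inversion.
        for i in range(gap, len(a)):
            value = a[i]
            j = i
            while j >= gap and a[j - gap] > value:
                a[j] = a[j - gap]
                j -= gap
            a[j] = value
    return a


if __name__ == "__main__":
    print(shell_sort([9, 8, 7, 6, 5, 4, 3, 2, 1, 0]))
```

The final pass with gap 1 is an ordinary insertion sort, which guarantees correctness regardless of the earlier gaps.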

6
Divide and Conquer: Merge Sort
  • Let A be an array of integers of length N
  • Define Sort(A) recursively via auxSort(A, 0, N),
    where
  • define array auxSort(A, low, high), which sorts A[low..high-1]:
  • if (high - low <= 1) return A[low..high-1]
  • else
  • mid = (low + high)/2
  • temp1 = auxSort(A, low, mid)
  • temp2 = auxSort(A, mid, high)
  • temp3 = merge(temp1, temp2)
  • return temp3

7
Merge
  • int[] merge(int[] temp1, int[] temp2)
  • int[] temp = new int[temp1.length + temp2.length]
  • int i = 0, j = 0, k = 0
  • repeat
  • if (temp1[i] < temp2[j]) temp[k++] = temp1[i++]
  • else temp[k++] = temp2[j++]
  • for all appropriate i, j (once one array is used up,
    copy the rest of the other).
  • Analysis of Merge:
  • time O(temp1.length + temp2.length)
  • memory O(temp1.length + temp2.length)
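Merge and the recursive sort fit together as in this Python sketch. It assumes a half-open range [low, high) and uses `<=` in the comparison so that equal keys keep their order; both choices, and the names `merge` and `merge_sort`, are assumptions rather than the slides' exact code.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list in O(len(left) + len(right))."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One input is exhausted; copy whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(a, low=0, high=None):
    """Return a sorted copy of a[low:high]."""
    if high is None:
        high = len(a)
    if high - low <= 1:
        return a[low:high]
    mid = (low + high) // 2
    return merge(merge_sort(a, low, mid), merge_sort(a, mid, high))


if __name__ == "__main__":
    print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))
```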

8
Analysis of Merge Sort
  • Time
  • Let N be number of elements
  • Number of levels is O(log N)
  • At each level, O(N) work
  • Total is O(N log N)
  • This is the best possible for comparison-based sorting.
  • Space
  • At each level, O(N) temporary space
  • Space can be freed, but calls to new are costly
  • Needs O(N) space
  • Bad: better to have an in-place sort
  • Quick Sort (chapter 8) is the sort of choice.

9
Quicksort Algorithm
  • QuickSort: typically the fastest algorithm in practice
  • QuickSort(S)
  • 1. If the size of S is 0 or 1, return S
  • 2. Pick an element v in S (the pivot)
  • 3. Construct L = all elements less than v and
    R = all elements greater than v.
  • 4. Return QuickSort(L), then v, then QuickSort(R)
  • Algorithm can be done in situ (in place).
  • On average runs in O(N log N), but can take O(N^2)
    time
  • depends on the choice of pivot.
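The four steps can be sketched directly in Python. This is the list-building version, not the in-place one. Two details are assumptions beyond the slide: the pivot is chosen at random, and elements equal to the pivot are kept in their own group so that duplicate keys are handled.

```python
import random


def quicksort(s):
    """Return a sorted copy of the sequence s."""
    if len(s) <= 1:                       # step 1
        return list(s)
    v = random.choice(s)                  # step 2: pick the pivot
    left = [x for x in s if x < v]        # step 3: L
    equal = [x for x in s if x == v]      # pivot and any duplicates of it
    right = [x for x in s if x > v]       # step 3: R
    return quicksort(left) + equal + quicksort(right)  # step 4


if __name__ == "__main__":
    print(quicksort([8, 1, 4, 9, 0, 3, 5, 2, 7, 6]))
```

A random pivot makes the O(N^2) case unlikely for any fixed input, though it does not remove it.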

10
Quicksort Analysis
  • Worst Case
  • T(N) = worst-case sorting time
  • T(1) = 1
  • if bad pivot, T(N) = T(N-1) + N
  • Via telescoping argument (expand and add)
  • T(N) = O(N^2)
  • Average Case (text argument)
  • Assume all subproblem sizes are equally likely
  • Note: the chance of picking the ith smallest element as pivot is 1/N
  • T(N) = average cost to sort

11
Analysis continued
  • T(left branch) = T(right branch) (on average), so
  • T(N) = 2(T(0) + T(1) + ... + T(N-1))/N + N, where N
    is the cost of partitioning
  • Multiply by N
  • N T(N) = 2(T(0) + ... + T(N-1)) + N^2 (*)
  • Subtract the N-1 case of (*)
  • N T(N) - (N-1) T(N-1) = 2T(N-1) + 2N - 1
  • Rearrange and drop -1
  • N T(N) = (N+1) T(N-1) + 2N
  • Divide by N(N+1)
  • T(N)/(N+1) = T(N-1)/N + 2/(N+1)

12
Last Step
  • Substitute N-1, N-2, ..., 2 for N
  • T(N-1)/N = T(N-2)/(N-1) + 2/N
  • ...
  • T(2)/3 = T(1)/2 + 2/3
  • Add
  • T(N)/(N+1) = T(1)/2 + 2(1/3 + 1/4 + ... + 1/(N+1))
  • = 2(1 + 1/2 + ... + 1/(N+1)) - 5/2, since T(1) = 1
  • = O(log N)
  • Hence T(N) = O(N log N)
  • The literature contains more accurate proofs.
  • For better results, choose the pivot as the median of 3
    random values.
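The telescoped bound can be checked numerically against the original recurrence. The sketch below assumes T(0) = 0 (so that T(1) = 1) and uses exact fractions; the function names `average_costs` and `harmonic` are illustrative.

```python
from fractions import Fraction


def average_costs(n_max):
    """T(0..n_max) from T(N) = 2(T(0) + ... + T(N-1))/N + N, assuming T(0) = 0."""
    costs = [Fraction(0)]
    running_sum = Fraction(0)  # T(0) + ... + T(n-1)
    for n in range(1, n_max + 1):
        running_sum += costs[n - 1]
        costs.append(2 * running_sum / n + n)
    return costs


def harmonic(n):
    """H(n) = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))


if __name__ == "__main__":
    costs = average_costs(20)
    for n in (1, 2, 5, 10, 20):
        exact = costs[n] / (n + 1)
        bound = 2 * harmonic(n + 1) - Fraction(5, 2)
        print(n, float(exact), float(bound))
```

Because the derivation drops the -1 term, the telescoped expression is an upper bound on T(N)/(N+1) rather than an exact value, and the harmonic sum grows like ln N.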

13
Quickselect Algorithm
  • Problem: find the kth smallest item
  • Algorithm: modify Quicksort
  • let |S| be the number of elements in S.
  • QuickSelect(S, k)
  • if |S| = 1, return the element in S
  • Pick an element p in S (the pivot)
  • Partition S via p as in QuickSort into L and R
  • if k <= |L|, return QuickSelect(L, k)
  • if k = |L| + 1, return the pivot
  • otherwise return QuickSelect(R, k - |L| - 1)
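A Python sketch of QuickSelect with 1-based k follows. As assumptions beyond the slide, the pivot is random and duplicates of the pivot are counted separately, so the middle case becomes a range of ranks instead of the single rank |L| + 1.

```python
import random


def quickselect(s, k):
    """Return the k-th smallest element of the non-empty sequence s (k is 1-based)."""
    if not 1 <= k <= len(s):
        raise ValueError("k must be between 1 and len(s)")
    pivot = random.choice(s)
    left = [x for x in s if x < pivot]
    right = [x for x in s if x > pivot]
    equal_count = len(s) - len(left) - len(right)  # pivot and its duplicates
    if k <= len(left):
        return quickselect(left, k)
    if k <= len(left) + equal_count:
        return pivot
    return quickselect(right, k - len(left) - equal_count)


if __name__ == "__main__":
    data = [5, 3, 1, 4, 2]
    print([quickselect(data, k) for k in range(1, 6)])
```

Only one side is searched after each partition, which is why the average cost drops from O(N log N) to O(N).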

14
Quickselect Analysis
  • Worst case is O(N^2)
  • Average-case analysis is similar to quicksort's.
  • Here T(N) = 1 * (T(0) + T(1) + ... + T(N-1))/N + N
    (only one branch is followed)
  • Multiply by N
  • N T(N) = T(0) + T(1) + ... + T(N-1) + N^2
  • Substitute N-1 for N and subtract
  • N T(N) - (N-1) T(N-1) = T(N-1) + 2N - 1
  • Rearrange and divide by N
  • T(N) = T(N-1) + 2 - 1/N <= T(N-1) + 2
  • T(N) <= T(N-2) + 4 <= ... <= T(1) + 2N = O(N)
  • Average case: linear.

15
Bucket Sort
  • A linear-time sorting algorithm!
  • Need to know the possible values.
  • Example 1: sort N nonnegative integers less than M.
  • Make an array A of size M
  • Read each integer i and increment A[i]
  • Example 2: 200 names
  • make an array of size 26*26 = 676
  • Using the first 2 letters of each name, put it in the
    char-char bucket (usually a short ordered
    linked list)
  • Collect them up
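Example 1 can be sketched in Python as a counting pass followed by a collection pass. The function name `bucket_sort_ints` and the explicit range check are assumptions added for illustration.

```python
def bucket_sort_ints(values, m):
    """Sort integers in the range 0 <= v < m in O(N + M) time."""
    counts = [0] * m                      # array A of size M
    for v in values:
        if not 0 <= v < m:
            raise ValueError("value out of range")
        counts[v] += 1                    # read each integer i and increment A[i]
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)    # collect the buckets in order
    return result


if __name__ == "__main__":
    print(bucket_sort_ints([3, 0, 2, 3, 1], 4))
```

No comparisons between keys are made, which is how this avoids the N log N bound for comparison sorts; the price is O(M) extra space.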

16
Radix Sorting (card sorting)
  • Uses linked lists
  • Idea: multiple passes of Bucket Sort
  • Trick: iteratively sort by the last index, next to
    last, etc.
  • Example:
  • ed ca xa cd xd bd
  • pass 1: a: ca, xa; d: ed, cd, xd, bd
  • ca xa ed cd xd bd
  • pass 2: b: bd; c: ca, cd; e: ed; x: xa,
    xd
  • bd ca cd ed xa xd
  • Complexity: O(N * number of passes)
  • number of passes = length of key
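The two passes of the example can be reproduced with a short Python sketch. It assumes all keys are strings of the same length and uses Python lists as buckets in place of linked lists; the function name `radix_sort` is illustrative.

```python
def radix_sort(words):
    """Sort equal-length strings by bucketing on the last character first."""
    if not words:
        return []
    key_length = len(words[0])
    if any(len(w) != key_length for w in words):
        raise ValueError("all keys must have the same length")
    result = list(words)
    for position in range(key_length - 1, -1, -1):   # last index, next to last, ...
        buckets = {}
        for word in result:
            # Appending keeps the earlier order inside each bucket (stability).
            buckets.setdefault(word[position], []).append(word)
        # Collect the buckets in character order.
        result = [word for ch in sorted(buckets) for word in buckets[ch]]
    return result


if __name__ == "__main__":
    print(radix_sort(["ed", "ca", "xa", "cd", "xd", "bd"]))
```

Each pass must be stable: pass 2 relies on words within the same bucket keeping the order produced by pass 1.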

17
External Sorting (Tape or CD)
  • Idea: merge sort (2-way)
  • Suppose memory size is M (enough to sort M records
    internally)
  • Ta1, Ta2, Tb1, Tb2 are tape drives
  • Data on Ta1 (initially)
  • Pass 1
  • read M records at a time
  • sort and write to Tb1, Tb2 alternately
  • (each run of M records on Tb1, Tb2 is
    sorted)
  • Pass 2
  • merge runs from Tb1 and Tb2 onto Ta1 and Ta2
  • Note: merging takes O(1) memory
  • Each run of 2M records is sorted
18
External Sorting
  • Continue merging, alternately writing to Ta1 and
    Ta2.
  • Number of merge passes is log(N/M)
  • Time complexity is O((N/M) * M log(M)) = O(N log(M)) for the first pass
  • O(N) for each subsequent pass
  • Total: O(N log(M) + N log(N/M)) = O(N log(N))
  • With more tapes, can reduce time by doing a k-way
    merge rather than a 2-way merge
  • Replace log base 2 with log base k
  • A trickier algorithm (Polyphase) can do it with
    fewer tapes.
  • Who uses tapes? The algorithm also works for CDs
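The pass structure can be simulated in memory with a Python sketch. Everything about the setup is an assumption for illustration: tapes are modelled as Python lists of runs, `memory_size` plays the role of M, and `heapq.merge` stands in for the streaming 2-way merge.

```python
import heapq


def external_sort(records, memory_size):
    """Simulate 2-way external merge sort; return (sorted records, merge passes)."""
    if memory_size < 1:
        raise ValueError("memory_size must be at least 1")
    # Pass 1: read M records at a time, sort them internally, write out a run.
    runs = [sorted(records[i:i + memory_size])
            for i in range(0, len(records), memory_size)]
    merge_passes = 0
    # Later passes: merge runs pairwise, doubling the run length each time.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + 2]))
                for i in range(0, len(runs), 2)]
        merge_passes += 1
    return (runs[0] if runs else []), merge_passes


if __name__ == "__main__":
    data = list(range(16, 0, -1))
    print(external_sort(data, 4))  # 4 initial runs, so 2 merge passes
```

With N = 16 and M = 4 there are N/M = 4 initial runs, and log2(4) = 2 merge passes are needed, matching the pass count given above.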

19
Lower Bound for Sorting
  • Theorem: if you sort by comparisons, then you must
    use at least log(N!) comparisons in the worst case. Hence an
    O(N log N) algorithm is optimal.
  • Proof:
  • N items can be rearranged in N! ways.
  • Consider a decision tree where each internal node
    is a comparison.
  • Each possible input array goes down one path
  • Number of leaves is at least N!
  • so the minimum depth of a decision tree is log(N!)
  • log(N!) = log 1 + log 2 + ... + log(N) is Ω(N log N)
  • Proof: use the partition trick
  • sum >= log(N/2) + log(N/2 + 1) + ... + log(N) > (N/2) log(N/2)
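The bound is easy to evaluate for concrete N. The Python sketch below computes ceil(log2(N!)) and checks the partition-trick inequality numerically; the function name `min_comparisons` is an illustrative assumption, and logs are taken base 2.

```python
import math


def min_comparisons(n):
    """Lower bound on worst-case comparisons to sort n items: ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))


if __name__ == "__main__":
    for n in (3, 4, 5, 10, 100):
        lower = min_comparisons(n)
        partition_bound = (n / 2) * math.log2(n / 2)
        print(n, lower, round(partition_bound, 1), round(n * math.log2(n), 1))
```

For N = 3 the bound is 3 comparisons, since a decision tree needs at least 3! = 6 leaves and a tree of depth 2 has only 4.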

20
Summary
  • For online sorting, use heapsort.
  • Online: get elements one at a time
  • Offline or batch: have all elements available
  • For small collections, bubble sort is fine
  • For large collections, use quicksort
  • You may hybridize the algorithms, e.g.,
  • use quicksort until the size is below some k
  • then use bubble sort
  • Sorting is important and well studied, yet often
    done inefficiently.
  • Libraries often contain sorting routines, but
    beware: the quicksort routine in Visual C++ seems
    to run in quadratic time. Java sorts in
    Collections are fine.
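The hybrid idea from the summary can be sketched in Python as an in-place quicksort that hands small ranges to bubble sort. The cutoff value of 8, the middle-element pivot, and the name `hybrid_sort` are assumptions chosen for illustration, not recommendations from the slides.

```python
def hybrid_sort(a, cutoff=8):
    """Sort the list a in place: quicksort above the cutoff, bubble sort below it."""

    def bubble(lo, hi):
        # Bubble sort the inclusive range a[lo..hi].
        for end in range(hi, lo, -1):
            for i in range(lo, end):
                if a[i] > a[i + 1]:
                    a[i], a[i + 1] = a[i + 1], a[i]

    def partition(lo, hi):
        # Use the middle element as pivot; park it at the end while partitioning.
        mid = (lo + hi) // 2
        a[mid], a[hi] = a[hi], a[mid]
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        return store  # final position of the pivot

    def sort_range(lo, hi):
        if hi - lo + 1 <= cutoff:
            bubble(lo, hi)
            return
        p = partition(lo, hi)
        sort_range(lo, p - 1)
        sort_range(p + 1, hi)

    sort_range(0, len(a) - 1)


if __name__ == "__main__":
    data = [(i * 37) % 101 for i in range(200)]
    hybrid_sort(data)
    print(data[:10])
```

The best cutoff depends on the machine and the data, so it is worth measuring rather than fixing a value in advance.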