Title: Heapsort Algorithm
1 Heapsort Algorithm
- CS 583
- Analysis of Algorithms
2 Outline
- Sorting Problem
- Heaps
- Definition
- Maintaining heap property
- Building a heap
- Heapsort Algorithm
3 Sorting Problem
- Sorting is usually performed not on isolated data, but on records.
- Each record contains a key, which is the value to be sorted.
- The remainder is called satellite data.
- When a sorting algorithm permutes the keys, it must permute the satellite data as well.
- If the satellite data is large for each record, we often permute pointers to records.
- This level of detail is usually irrelevant in the study of algorithms, but is important when converting an algorithm to a program.
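The pointer idea above can be sketched in Python as follows. This is only an illustration: the `Record` class, its `payload` field, and the `sorted_order` helper are hypothetical names, and Python's built-in `sorted` stands in for whatever sorting algorithm is used. Instead of moving the records themselves, the sketch sorts a list of indices by key.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: int       # the value to sort by
    payload: str   # satellite data (hypothetical field for illustration)


def sorted_order(records):
    """Return the indices of `records` in ascending key order."""
    return sorted(range(len(records)), key=lambda i: records[i].key)


records = [Record(30, "c"), Record(10, "a"), Record(20, "b")]
order = sorted_order(records)          # only small integers are permuted
by_key = [records[i] for i in order]   # satellite data travels with its key
```

The records are never copied while sorting; only the index list is rearranged, which is the same trade-off as permuting pointers to large records.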
4 Sorting Problem Importance
- Sorting is arguably the most fundamental problem in the study of algorithms, for the following reasons:
- The need to sort information is often a key part of an application. For example, sorting financial reports by security ID.
- Algorithms often use sorting as a key subroutine. For example, in order to match a security against benchmarks, the latter set needs to be sorted by some key elements.
- There is a wide variety of sorting algorithms, and they use a rich set of techniques.
5 Heaps
- The heapsort algorithm sorts in place and its running time is O(n log(n)).
- It combines the better attributes of the insertion sort and merge sort algorithms: it sorts in place like insertion sort, and it runs in O(n log(n)) time like merge sort.
- It is based on a data structure called a heap.
- The (binary) heap data structure is an array object that can be viewed as a nearly complete binary tree.
- An array A that represents a heap is an object with two attributes:
- length[A], which is the number of elements in the array, and
- heap-size[A], the number of elements in the heap stored within the array A.
6 Heaps Example
- A = [10, 8, 6, 5, 7, 3, 2]
- Viewed as a tree, level by level:
- 10
- 8 6
- 5 7 3 2
- The root of the tree is A[1]. The children of a node i are determined as follows:
- Left(i)
-     return 2i
- Right(i)
-     return 2i + 1
7 Heaps Example (cont.)
- The above is proven by induction:
- The root's left child is 2 = 2·1.
- Assume it is true for node n, that is, Left(n) = 2n and Right(n) = 2n + 1.
- The left child of node (n+1) immediately follows the right child of node n: Left(n+1) = (2n + 1) + 1 = 2(n+1).
- The parent of a node i is calculated from i = 2p or i = 2p + 1, where p is the parent node. Hence:
- Parent(i)
-     return floor(i/2)
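The index arithmetic above can be tried out directly. The following is a minimal Python sketch that keeps the slides' 1-based numbering by leaving slot 0 of the list unused; the function names and the padding trick are assumptions made for illustration only.

```python
def left(i):
    """1-based index of the left child of node i."""
    return 2 * i


def right(i):
    """1-based index of the right child of node i."""
    return 2 * i + 1


def parent(i):
    """1-based index of the parent of node i (integer division is the floor)."""
    return i // 2


# Slot 0 is a placeholder so that A[1] is the root, as on the slides.
A = [None, 10, 8, 6, 5, 7, 3, 2]
children_of_second_node = (A[left(2)], A[right(2)])
```

With 0-based arrays the same relations become 2i + 1, 2i + 2 and floor((i - 1)/2).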
8 Max-Heaps
- In a max-heap, for every node i other than the root:
- A[Parent(i)] ≥ A[i]
- For the heapsort algorithm, we use max-heaps.
- The height of the heap is defined to be the length of the longest path from the root to a leaf, and it is $\Theta(\lg n)$ since the heap is a nearly complete binary tree.
- We will consider the following basic procedures on the heap:
- Max-Heapify to maintain the max-heap property.
- Build-Max-Heap to produce a max-heap from an unordered input array.
- Heapsort to sort an array in place.
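The max-heap property and the height formula can be checked mechanically. The sketch below is one possible checker, not part of the original slides; `is_max_heap` and `heap_height` are hypothetical helper names, and the input is an ordinary 0-based Python list whose element `a[i - 1]` plays the role of A[i].

```python
def is_max_heap(a):
    """True if every non-root node is no larger than its parent.

    `a` is a 0-based list; the node with 1-based index i is a[i - 1],
    so its parent A[floor(i/2)] is a[i // 2 - 1].
    """
    n = len(a)
    return all(a[i // 2 - 1] >= a[i - 1] for i in range(2, n + 1))


def heap_height(n):
    """Height floor(lg n) of an n-element heap, for n >= 1."""
    return n.bit_length() - 1


example_ok = is_max_heap([10, 8, 6, 5, 7, 3, 2])
example_height = heap_height(7)
```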
9 Maintaining the Heap Property
- The Max-Heapify procedure takes an array A and an index i into the array.
- It is assumed that the left and right subtrees of node i are already max-heaps.
- The procedure lets the value of A[i] "float down" in the max-heap so that the subtree rooted at index i becomes a max-heap.
10 Max-Heapify Algorithm
- Max-Heapify(A, i)
-     l ← Left(i)
-     r ← Right(i)
-     if l ≤ heap-size[A] and A[l] > A[i]
-         largest ← l
-     else
-         largest ← i
-     if r ≤ heap-size[A] and A[r] > A[largest]
-         largest ← r
-     if largest ≠ i
-         exchange A[i] with A[largest]
-         Max-Heapify(A, largest)
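The procedure above translates almost line for line into Python. The version below is a sketch under two assumptions that are not in the slides: it uses 0-based indexing (so the children of i are 2i + 1 and 2i + 2), and it takes the heap size as an explicit `heap_size` argument instead of an attribute of the array.

```python
def max_heapify(a, i, heap_size):
    """Let a[i] float down until the subtree rooted at i is a max-heap.

    Assumes the subtrees rooted at the children of i are already max-heaps.
    Only a[0:heap_size] is treated as part of the heap.
    """
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]   # exchange with the larger child
        max_heapify(a, largest, heap_size)    # continue in that subtree


# The root violates the heap property; both of its subtrees are max-heaps.
a = [4, 10, 8, 5, 7, 3, 2]
max_heapify(a, 0, len(a))
```

In this run the value 4 first swaps with 10 and then with 7, after which it is a leaf and the recursion stops.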
11 Max-Heapify Analysis
- It takes $\Theta(1)$ time to find A[largest], plus the time to run the procedure recursively on a subtree of at most $2n/3$ elements. (This is the maximum size of a child subtree. It occurs when the last row of the tree is exactly half full.)
- Assume there are n nodes in a tree whose levels 0 through x-1 are full and whose last level x is exactly half full. This means:
- $n = 1 + 2 + \dots + 2^{x-1} + 2^x/2$
- $(2^x - 1) + 2^x/2 = n$
- Let $a = 2^{x-1}$; then $2a + a = n + 1$, so
- $2^{x-1} = (n+1)/3$
12 Max-Heapify Analysis (cont.)
- Max subtree size = (half of the non-root elements down to level x-1) + (elements at the last level). Using $2^{x-1} = (n+1)/3$:
- $(2^x - 2)/2 + 2^x/2 = 2^x - 1 = 2(n+1)/3 - 1 = 2n/3 - 1/3 < 2n/3$
- Therefore the running time of Max-Heapify is described by the following recurrence:
- $T(n) \le T(2n/3) + \Theta(1)$
- According to the master theorem (case 2 with $a = 1$, $b = 3/2$, $f(n) = \Theta(1)$), the recurrence taken with equality solves to $T(n) = \Theta(\lg n)$.
- Since T(n) describes the worst-case scenario, the running time of the algorithm is $O(\lg n)$.
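The 2n/3 bound on the size of a child subtree can also be checked numerically. The following Python sketch counts subtree sizes directly from the 1-based index arithmetic; `subtree_size` and `largest_child_subtree` are hypothetical helper names, and the check covers only the small range of n shown, so it illustrates the bound rather than proving it.

```python
def subtree_size(i, n):
    """Number of nodes in the subtree rooted at 1-based index i of an n-node heap."""
    if i > n:
        return 0
    return 1 + subtree_size(2 * i, n) + subtree_size(2 * i + 1, n)


def largest_child_subtree(n):
    """Size of the larger of the root's two subtrees."""
    return max(subtree_size(2, n), subtree_size(3, n))


# Integer form of "size <= 2n/3", to avoid floating-point comparisons.
bound_holds = all(3 * largest_child_subtree(n) <= 2 * n for n in range(2, 200))

# Half-full last level: n = 5 (levels 0-1 full, 2 of 4 slots used on level 2).
worst_case_small = largest_child_subtree(5)
```

For n = 5 the left subtree holds 3 of the 5 nodes, which matches $2n/3 - 1/3$.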
13 Building a Heap
- We can use the procedure Max-Heapify in a bottom-up manner to convert the whole array A[1..n] into a max-heap.
- Note that the elements A[floor(n/2)+1 .. n] are leaves. The last element that is not a leaf is the parent of the last node, at index floor(n/2).
- The procedure Build-Max-Heap goes through all non-leaf nodes and runs Max-Heapify on each of them.
14 Build-Max-Heap Algorithm
- Build-Max-Heap(A, n)
- 1 heap-size[A] ← n
- 2 for i ← floor(n/2) downto 1
- 3     Max-Heapify(A, i)
- Invariant:
- At the start of each iteration of the loop in lines 2-3, each node i+1, ..., n is the root of a max-heap.
- Proof.
- Initialization: i = floor(n/2). The nodes floor(n/2)+1, ..., n are all leaves and hence are roots of trivial max-heaps.
15 Build-Max-Heap Correctness
- Maintenance: the children of node i are numbered higher than i, and by the loop invariant they are roots of max-heaps.
- This is exactly the condition required by Max-Heapify.
- Moreover, Max-Heapify preserves the property that nodes i+1, ..., n are roots of max-heaps.
- Decrementing i by 1 re-establishes the loop invariant for the next iteration.
- Termination: i = 0, hence each node 1, 2, ..., n is the root of a max-heap.
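The bottom-up construction and its loop invariant can be sketched in Python as follows. This is an illustrative version, assuming 0-based indexing (so the last non-leaf node is at index n // 2 - 1) and an explicit `heap_size` parameter; neither detail comes from the slides.

```python
def max_heapify(a, i, heap_size):
    """Float a[i] down; assumes both child subtrees are already max-heaps."""
    left, right, largest = 2 * i + 1, 2 * i + 2, i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)


def build_max_heap(a):
    """Rearrange the list `a` in place into a max-heap."""
    n = len(a)
    # Indices n // 2 .. n - 1 are leaves, i.e. trivial max-heaps.
    # Invariant: before each iteration, every node after i roots a max-heap.
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(a, i, n)


a = [3, 5, 2, 10, 7, 6, 8]
build_max_heap(a)
```

Python's standard library offers `heapq.heapify` for the same job, but it builds a min-heap, so it is not a drop-in replacement for this sketch.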
16 Build-Max-Heap Performance
- Each call to Max-Heapify takes O(lg n) time and there are O(n) such calls (one per non-leaf node).
- Therefore the running time of Build-Max-Heap is O(n lg n).
- To derive a tighter bound, we observe that the running time of Max-Heapify depends on the node's height.
- An n-element heap has height $\lfloor \lg n \rfloor$. There are at most $\lceil n/2^{h+1} \rceil$ nodes of any height h. Assume these nodes are at level x of the original tree. Then we have
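17 Build-Max-Heap Performance (cont.)
- $1 + 2 + \dots + 2^x + \dots + 2^{x+h} = n \;\Rightarrow\; 2^{x+h+1} = n + 1 \;\Rightarrow\; 2^x = (n+1)/2^{h+1} = \lceil n/2^{h+1} \rceil$ (taking every level of the tree to be full)
- The time required by Max-Heapify when called on a node of height h is O(h). Hence:
- $\sum_{h=0}^{\lfloor \lg n \rfloor} \lceil n/2^{h+1} \rceil \, O(h) = O\left(n \sum_{h=0}^{\lfloor \lg n \rfloor} h/2^h\right)$
- By formula (A.8), for $|x| < 1$: $\sum_{k=0}^{\infty} k x^k = x/(1-x)^2$
- With $x = 1/2$: $\sum_{h=0}^{\infty} h/2^h = (1/2)/(1 - 1/2)^2 = 2$
- Thus, the running time of Build-Max-Heap can be bounded:
- $O\left(n \sum_{h=0}^{\lfloor \lg n \rfloor} h/2^h\right) = O\left(n \sum_{h=0}^{\infty} h/2^h\right) = O(n)$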
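The counting argument can be sanity-checked numerically. The Python sketch below computes the height of every node in an n-node heap and compares the counts against the ceil(n / 2^(h+1)) bound; the helper names are hypothetical, and the check covers only n below 300, so it illustrates the bound rather than proving it.

```python
def ceil_div(n, d):
    """Integer ceiling of n / d for positive d."""
    return -(-n // d)


def heights(n):
    """Heights of nodes 1..n in an n-node heap (leaves have height 0)."""
    h = [0] * (n + 1)
    for i in range(n // 2, 0, -1):
        # The left child always exists for a non-leaf, and the left
        # subtree is never shorter than the right one.
        h[i] = 1 + h[2 * i]
    return h[1:]


# At most ceil(n / 2^(h+1)) nodes have height h.
counts_ok = all(
    heights(n).count(h) <= ceil_div(n, 2 ** (h + 1))
    for n in range(1, 300)
    for h in range(n.bit_length())
)

# Worst-case Build-Max-Heap work is proportional to the sum of node heights.
linear_ok = all(sum(heights(n)) < n for n in range(1, 300))

# Partial sums of h / 2^h approach the closed-form value 2.
series = sum(h / 2 ** h for h in range(60))
```

The sum of heights staying below n over this range is the concrete face of the O(n) bound.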
18 The Heapsort Algorithm
- The heapsort algorithm uses Build-Max-Heap on A[1..n]. Since the maximum element of the array is at A[1], it can be put into its correct position by exchanging it with A[n]. Now A[1..(n-1)] can be made a max-heap again.
- Heapsort(A, n)
- 1 Build-Max-Heap(A, n)
- 2 for i ← n downto 2
- 3     exchange A[1] with A[i]
- 4     heap-size[A] ← heap-size[A] - 1
- 5     Max-Heapify(A, 1)
- Step 1 takes O(n) time. Loop 2 is repeated (n-1) times, with step 5 taking the most time, O(lg n). Hence the running time of heapsort is O(n) + O(n lg n) = O(n lg n).
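Putting the three procedures together gives a complete in-place sort. The Python sketch below is one possible rendering, assuming 0-based indexing and an explicit `heap_size` argument in place of the heap-size[A] attribute; the loop variable `end` is an illustrative name that doubles as the shrinking heap size.

```python
def max_heapify(a, i, heap_size):
    """Float a[i] down within a[0:heap_size]."""
    left, right, largest = 2 * i + 1, 2 * i + 2, i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)


def build_max_heap(a):
    """Turn the whole list into a max-heap, bottom-up."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))


def heapsort(a):
    """Sort the list `a` in place in ascending order."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current maximum to its final slot
        max_heapify(a, 0, end)        # heap now occupies a[0:end]


data = [5, 3, 8, 1, 9, 2, 7]
heapsort(data)
```

Everything happens inside the input list, which is the in-place property mentioned at the start of the deck; the recursion in `max_heapify` is at most as deep as the heap's height.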