Title: Today's Material
1 Today's Material
- The dynamic equivalence problem
- a.k.a. Disjoint Sets/Union-Find ADT
- Covered in Chapter 8 of the textbook
2 Motivation
- Consider the relation "=" between integers
  - For any integer A, A = A (reflexive)
  - For integers A and B, A = B means that B = A (symmetric)
  - For integers A, B, and C, A = B and B = C means that A = C (transitive)
- Consider cities connected by two-way roads
  - A is trivially connected to itself
  - A is connected to B means B is connected to A
  - If A is connected to B and B is connected to C, then A is connected to C
3 Equivalence Relationships
- An equivalence relation R obeys three properties
  - reflexive: for any x, xRx is true
  - symmetric: for any x and y, xRy implies yRx
  - transitive: for any x, y, and z, xRy and yRz implies xRz
- Preceding relations are all examples of equivalence relations
- What are not equivalence relations?
4 Equivalence Relationships
- An equivalence relation R obeys three properties
  1. reflexive: for any x, xRx is true
  2. symmetric: for any x and y, xRy implies yRx
  3. transitive: for any x, y, and z, xRy and yRz implies xRz
- What about "<" on integers?
  - 1 and 2 are violated
- What about "≤" on integers?
  - 2 is violated
5 Equivalence Classes and Disjoint Sets
- Any equivalence relation R divides all the elements into disjoint sets of equivalent items
- Let ~ be an equivalence relation. If A ~ B, then A and B are in the same equivalence class.
- Examples
  - On a computer chip, if ~ denotes "electrically connected", then sets of connected components form equivalence classes
  - On a map, cities that have two-way roads between them form equivalence classes
  - What are the equivalence classes for the relation "Modulo N" applied to all integers?
6 Equivalence Classes and Disjoint Sets
- Let ~ be an equivalence relation. If A ~ B, then A and B are in the same equivalence class.
- Examples
  - The relation "Modulo N" divides all integers into N equivalence classes (for the remainders 0, 1, ..., N-1)
  - Under Mod 5
    - 0 ~ 5 ~ 10 ~ 15 ...
    - 1 ~ 6 ~ 11 ~ 16 ...
    - 2 ~ 7 ~ 12 ...
    - 3 ~ 8 ~ 13 ...
    - 4 ~ 9 ~ 14 ...
  - (5 equivalence classes denoting remainders 0 through 4 when divided by 5)
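As a small illustration of the Mod 5 example, the class name of an integer can be taken to be its non-negative remainder. This is only a sketch: the helper name `mod_class` is hypothetical and does not come from the slides, and the adjustment for negative inputs is there because C's `%` operator can return a negative result.

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): returns the name of the
 * equivalence class of a under "Modulo n", i.e. a remainder in 0..n-1. */
int mod_class(int a, int n) {
    assert(n > 0);
    int r = a % n;               /* since C99, r has the sign of a */
    return (r < 0) ? r + n : r;  /* shift negative remainders into 0..n-1 */
}
```

Two integers are equivalent under Mod 5 exactly when `mod_class` returns the same value for both, e.g. 13 and 18 both map to class 3.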
7 Union and Find Problem Definition
- Given a set of elements and some equivalence relation ~ between them, we want to figure out the equivalence classes
- Given an element, we want to find the equivalence class it belongs to
  - E.g. Under mod 5, 13 belongs to the equivalence class of 3
  - E.g. For the map example, want to find the equivalence class of Eskisehir (all the cities it is connected to)
- Given a new element, we want to add it to an equivalence class (union)
  - E.g. Under mod 5, since 18 ~ 13, perform a union of 18 with the equivalence class of 13
  - E.g. For the map example, Ankara is connected to Eskisehir, so add Ankara to the equivalence class of Eskisehir
8 Disjoint Set ADT
- Stores N unique elements
- Two operations
  - Find: Given an element, return the name of its equivalence class
  - Union: Given the names of two equivalence classes, merge them into one class (which may have a new name or one of the two old names)
- ADT divides elements into E equivalence classes, 1 ≤ E ≤ N
- Names of classes are arbitrary
  - E.g. 1 through N, as long as Find returns the same name for 2 elements in the same equivalence class
9 Disjoint Set ADT Properties
- Disjoint set equivalence property: every element of a DS ADT belongs to exactly one set (its equivalence class)
- Dynamic equivalence property: the set of an element can change after execution of a union
Disjoint Set ADT
- Example
  - Initial classes: {1,4,8}, {2,3}, {6}, {7}, {5,9,10}
  - Each class is named by one of its members
  - Find(4) returns 8, the name of the class {1,4,8}
  - Union(6, 2) merges {6} and {2,3} into {2,3,6}
10 Disjoint Set ADT Formal Definition
- Given a set U = {a1, a2, ..., an}
- Maintain a partition of U, a set of subsets (or equivalence classes) of U denoted by S1, S2, ..., Sk such that
  - each pair of subsets Si and Sj are disjoint
  - together, the subsets cover U
  - each subset has a unique name
- Union(a, b) creates a new subset which is the union of a's subset and b's subset
- Find(a) returns the unique name for a's subset
11 Implementation Ideas and Tradeoffs
- How about an array implementation?
  - N element array A: A[i] holds the class name for element i
  - E.g. Assume 8 ~ 4 ~ 3
    - pick 3 as class name and set A[8] = A[4] = A[3] = 3
  - Sets: 0, 1, 2, 5, 9, 3, 4, 8, 6, 7
- Running time for Find(i)?
  - O(1) (just return A[i])
- Running time for Union(i, j)?
  - O(N) (every entry holding one of the two class names may have to be relabelled)
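The array idea above can be sketched as follows. The names `qf_init`, `qf_find`, `qf_union`, and the constant `QF_N` are illustrative assumptions, not taken from the slides. The sketch makes the cost asymmetry visible: Find is a single array read, while Union scans the whole array.

```c
#include <assert.h>

#define QF_N 10          /* assumed: elements 0..9 */
int A[QF_N];             /* A[i] holds the class name of element i */

/* Put every element in its own class, named after itself. */
void qf_init(void) {
    for (int i = 0; i < QF_N; i++) A[i] = i;
}

/* O(1): the class name is stored directly. */
int qf_find(int i) {
    assert(i >= 0 && i < QF_N);
    return A[i];
}

/* O(N): every element labelled with j's class name is relabelled
 * with i's class name. */
void qf_union(int i, int j) {
    int keep = qf_find(i), gone = qf_find(j);
    if (keep == gone) return;            /* already in the same class */
    for (int k = 0; k < QF_N; k++)
        if (A[k] == gone) A[k] = keep;
}
```

Calling `qf_union(3, 4)` and then `qf_union(3, 8)` after `qf_init()` reproduces the slide's state A[8] = A[4] = A[3] = 3.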
12 Implementation Ideas and Tradeoffs
- How about linked lists?
  - One linked list for each equivalence class
  - Class name = head of list
  - E.g. Sets: 0, 1, 2, 5, 9, 3, 4, 8, 6, 7
- Running time for Union(i, j)?
  - E.g. Union(1, 3)
  - O(1): Simply append one list to the end of the other
- Running time for Find(i)?
  - O(N): Must scan all lists in the worst case
13 Implementation Ideas and Tradeoffs
- Tradeoff between Union and Find: can we do both in O(1) time?
- N-1 Unions (the maximum possible) and M Finds
  - O(N^2 + M) for array
  - O(N + MN) for linked list implementation
- Can we do this in O(M + N) time?
14 Towards a new Data Structure
- Intuition: Finding the representative member (= class name) for an element is like the opposite of searching for a key in a given set
- So, instead of trees with pointers from each node to its children, let's use trees with a pointer from each node to its parent
- Such trees are known as Up-Trees
15 Up-Tree Data Structure
- Each equivalence class (or disjoint set) is an up-tree with its root as its representative member
- All members of a given set are nodes in that set's up-tree
- Up-Trees are not necessarily binary
[Figure: three up-trees with node sets {c, f}, {h}, and {a, d, g, b, e}; each root's parent pointer is NULL]
16 Implementing Up-Trees
- Forest of up-trees can easily be stored in an array (call it up)
  - up[X] = parent of X
  - -1 if root
[Figure: forest of four up-trees with node sets {g}, {h, i}, {c, f}, and {a, b, d, e}, shown next to the corresponding array up]
17 Example Find
- Find(x): Just follow parent pointers to the root
  - Find(e) = a
  - Find(f) = c
  - Find(g) = g
[Figure: the up-trees {g}, {h, i}, {c, f}, {a, b, d, e} and the array up, with the path followed by Find(e) highlighted]
18 Implementing Find(x)
    #define N 9
    int up[N];

    /* Returns setid of x */
    int Find(int x) {
      while (up[x] >= 0) {   /* a root stores a negative value */
        x = up[x];
      } /* end-while */
      return x;
    } /* end-Find */
- Running time?
  - O(maxHeight)
[Figure: the up-trees {g}, {h, i}, {c, f}, {a, b, d, e} and the array up, tracing Find(4)]
19 Recursive Find(x)
    #define N 9
    int up[N];

    /* Returns setid of x */
    int Find(int x) {
      if (up[x] < 0) return x;
      return Find(up[x]);
    } /* end-Find */
[Figure: the up-trees {g}, {h, i}, {c, f}, {a, b, d, e} and the array up, tracing Find(4)]
20 Example Union
- Union(x, y): Just hang one root from the other!
- Union(c, a)
[Figure: up-trees {g}, {h, i}, and the merged tree {a, b, d, e, c, f}; in the array up, entry 0 (a) changes from -1 to 2]
21 Implementing Union(x, y)
    #define N 9
    int up[N];

    /* Joins two sets. x and y must be roots */
    void Union(int x, int y) {
      assert(up[x] < 0);
      assert(up[y] < 0);
      up[y] = x;             /* hang y's tree under x */
    } /* end-Union */
- Running time?
  - O(1)
[Figure: up-trees {g}, {h, i}, {a, b, d, e, c, f} and the array up]
22 MakeSets(): Creating initial sets
[Figure: nine single-node up-trees a, b, c, d, e, f, g, h, i, each with a NULL parent pointer]
    #define N 9
    int up[N];

    /* Make initial sets */
    void MakeSets() {
      int i;
      for (i = 0; i < N; i++) {
        up[i] = -1;
      } /* end-for */
    } /* end-MakeSets */
23 Detailed Example
- Initial sets: {a}, {b}, {c}, {d}, {e}, {f}, {g}, {h}, {i}
- Union(b, e)
24 Detailed Example
- Sets: {a}, {b, e}, {c}, {d}, {f}, {g}, {h}, {i}
- Union(a, d)
25 Detailed Example
- Sets: {a, d}, {b, e}, {c}, {f}, {g}, {h}, {i}
- Union(a, b)
26 Detailed Example
- Sets: {a, d, b, e}, {c}, {f}, {g}, {h}, {i}
- Union(h, i)
27 Detailed Example
- Sets: {a, d, b, e}, {c}, {f}, {g}, {h, i}
- Union(c, f)
28 Detailed Example
- Sets: {a, d, b, e}, {c, f}, {g}, {h, i}
- Union(c, a)
- Q: Can we do a better job on this union for faster finds in the future?
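The six unions of the detailed example can be replayed in code to see why the last one is costly. This is a sketch under two assumptions that are not stated on the slides: nodes a through i are mapped to indices 0 through 8, and the helpers `Depth` and `RunExample` are hypothetical additions. `MakeSets`, `Find`, and `Union` follow the plain versions given earlier.

```c
#include <assert.h>

#define N 9
enum { a, b, c, d, e, f, g, h, i };   /* assumed mapping: a = 0 ... i = 8 */
int up[N];

void MakeSets(void) { for (int k = 0; k < N; k++) up[k] = -1; }

int Find(int x) { while (up[x] >= 0) x = up[x]; return x; }

/* Plain union: hang root y under root x. */
void Union(int x, int y) {
    assert(up[x] < 0);
    assert(up[y] < 0);
    up[y] = x;
}

/* Hypothetical helper: number of parent links from x to its root. */
int Depth(int x) {
    int steps = 0;
    while (up[x] >= 0) { x = up[x]; steps++; }
    return steps;
}

/* Replays the unions of the detailed example in order. */
void RunExample(void) {
    MakeSets();
    Union(b, e);
    Union(a, d);
    Union(a, b);
    Union(h, i);
    Union(c, f);
    Union(c, a);   /* hangs the 4-node tree under the 2-node tree */
}
```

After `RunExample()`, Find(e) follows three links (e to b to a to c). Hanging c under a instead would have left e only two links from the root, which is the improvement the question on the slide is pointing at.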
29 Implementation of Find & Union
    #define N 9
    int up[N];

    /* Joins two sets. x and y must be roots */
    void Union(int x, int y) {
      assert(up[x] < 0);
      assert(up[y] < 0);
      up[y] = x;
    } /* end-Union */
- Running time of Union: O(1)
- Height (and so the cost of Find) depends on previous unions
  - Best case: 1-2, 1-4, 1-5, ... gives O(1)
  - Worst case: 2-1, 3-2, 4-3, ... gives O(N)
- Q: Can we do better?
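The best and worst cases can be reproduced with a few lines of code. The sketch below reads a pair such as "2-1" as the call Union(2, 1), which is an assumption about the slide's shorthand, and uses 0-based indices. `BuildChain`, `BuildStar`, `Steps`, and `CHAIN_N` are hypothetical names.

```c
#include <assert.h>

#define CHAIN_N 8
int up[CHAIN_N];

static void Reset(void) { for (int k = 0; k < CHAIN_N; k++) up[k] = -1; }

/* Plain union: hang root y under root x. */
static void Union(int x, int y) {
    assert(up[x] < 0 && up[y] < 0);
    up[y] = x;
}

/* Worst case: Union(1,0), Union(2,1), ... builds one long chain. */
void BuildChain(void) {
    Reset();
    for (int k = 0; k + 1 < CHAIN_N; k++) Union(k + 1, k);
}

/* Best case: Union(0,1), Union(0,2), ... hangs everything under node 0. */
void BuildStar(void) {
    Reset();
    for (int k = 1; k < CHAIN_N; k++) Union(0, k);
}

/* Number of parent links a Find on x would follow. */
int Steps(int x) {
    int steps = 0;
    while (up[x] >= 0) { x = up[x]; steps++; }
    return steps;
}
```

With the chain, a Find on node 0 follows CHAIN_N - 1 links; with the star, no Find follows more than one.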
30 Let's look back at our example
- Sets: {a, d, b, e}, {c, f}, {g}, {h, i}
- Union(c, a)
- Q: Can we do a better job on this union for faster finds in the future? How can we make the new tree shallow?
31 Speeding up Find: Union-by-Size
- Idea: In Union, always make the root of the larger tree the new root (union-by-size)
32 Trick for Storing Size Information
- Instead of storing -1 in root, store up-tree size as negative value in root node
[Figure: the up-trees {g}, {h, i}, {c, f}, {a, b, d, e} and the array up; the roots store -1, -2, -2, and -4 respectively]
33 Implementing Union-by-Size
    #define N 9
    int up[N];

    /* Joins two sets. Assumes x & y are roots */
    void Union(int x, int y) {
      assert(up[x] < 0);
      assert(up[y] < 0);
      if (up[x] < up[y]) {   // x is bigger. Join y to x
        up[x] += up[y];
        up[y] = x;
      } else {               // y is bigger (or same size). Join x to y
        up[y] += up[x];
        up[x] = y;
      } /* end-else */
    } /* end-Union */
- Running time?
  - O(1)
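To see what union-by-size buys, the sketch below runs two union sequences and measures the deepest node afterwards. `MaxDepth`, `ChainPattern`, and `MergeInRounds` are hypothetical helpers, and N is set to 8 here purely for the demonstration. The first sequence is the chain-building pattern that gives plain Union a tree of height N-1; the second always merges equal-sized trees, which is the pattern that makes a union-by-size tree as tall as it can get.

```c
#include <assert.h>

#define N 8
int up[N];               /* root: -(tree size); otherwise: parent index */

void MakeSets(void) { for (int k = 0; k < N; k++) up[k] = -1; }

int Find(int x) { while (up[x] >= 0) x = up[x]; return x; }

/* Union-by-size; x and y must be distinct roots. */
void Union(int x, int y) {
    assert(up[x] < 0 && up[y] < 0 && x != y);
    if (up[x] < up[y]) { up[x] += up[y]; up[y] = x; }
    else               { up[y] += up[x]; up[x] = y; }
}

/* Longest chain of parent links from any node to its root. */
int MaxDepth(void) {
    int best = 0;
    for (int k = 0; k < N; k++) {
        int steps = 0;
        for (int x = k; up[x] >= 0; x = up[x]) steps++;
        if (steps > best) best = steps;
    }
    return best;
}

/* Element k+1 is united with element k, for k = 0, 1, 2, ... */
void ChainPattern(void) {
    MakeSets();
    for (int k = 0; k + 1 < N; k++) Union(Find(k + 1), Find(k));
}

/* Always merges equal-sized trees: 1+1, then 2+2, then 4+4. */
void MergeInRounds(void) {
    MakeSets();
    for (int step = 1; step < N; step *= 2)
        for (int k = 0; k + step < N; k += 2 * step)
            Union(Find(k), Find(k + step));
}
```

With N = 8 the chain pattern ends with every node one link from the root, and the equal-size merges end with a tree of depth 3, which is log2(8).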
34 Running Time for Find with Union-by-Size
- Finds are O(MaxHeight) for a forest of up-trees containing N nodes
- Theorem: Number of nodes in an up-tree of height h using union-by-size is ≥ 2^h
  - Pick up-tree with MaxHeight
  - Then, 2^MaxHeight ≤ N
  - MaxHeight ≤ log N
  - Find takes O(log N)
- Proof by Induction
  - Base case: h = 0, tree has 2^0 = 1 node
  - Induction hypothesis: Assume true for all h < h'
  - Induction step: A tree of height h' is formed when a tree of height h'-1 is hung under the root of a tree that is at least as large
  - The hung tree has ≥ 2^(h'-1) nodes by the induction hypothesis, and the other tree has at least as many
  - So, total nodes ≥ 2^(h'-1) + 2^(h'-1) = 2^h'
  - Therefore, true for all h
35 Union-by-Height
- Textbook describes alternative strategy of Union-by-height
- Keep track of height of each up-tree in the root nodes
- Union makes root of up-tree with greater height the new root
- Same results and similar implementation as Union-by-Size
  - Find is O(log N) and Union is O(1)
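A possible shape for union-by-height, written to mirror the union-by-size code. The encoding is an assumption: a root stores -(height + 1), so that a single-node tree (height 0) still stores -1 and roots remain distinguishable from parent indices. `UnionByHeight` and `Height` are hypothetical names, and N is 8 only for the demonstration.

```c
#include <assert.h>

#define N 8
int up[N];   /* root: -(height + 1) (assumed encoding); otherwise: parent */

void MakeSets(void) { for (int k = 0; k < N; k++) up[k] = -1; }

int Find(int x) { while (up[x] >= 0) x = up[x]; return x; }

/* Height of the tree rooted at root, decoded from the stored value. */
int Height(int root) {
    assert(up[root] < 0);
    return -up[root] - 1;
}

/* Joins two sets; x and y must be distinct roots. */
void UnionByHeight(int x, int y) {
    assert(up[x] < 0 && up[y] < 0 && x != y);
    if (up[x] < up[y]) {
        up[y] = x;                      /* x is taller: y hangs under x */
    } else {
        if (up[x] == up[y]) up[y]--;    /* equal heights: y grows by one */
        up[x] = y;                      /* x hangs under y */
    }
}
```

The height of the surviving root only changes when two trees of equal height are joined, which is why a single decrement in the tie case is enough.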
36 Can we make Find go faster?
- Can we make Find(g) do something so that future Find(g) calls will run faster?
- Right now, M Find(g) calls run in total O(M log N) time
- Can we reduce this to O(M)?
- Idea: Make Find have side-effects so that future Finds will run faster.
[Figure: up-trees {h, i}, {c, f}, and {a, b, d, e, g}]
37 Introducing Path Compression
- Path Compression: Point everything along path of a Find to root
- Reduces height of entire access path to 1
- Finds get faster!
[Figure: Find(g) with path compression on the up-trees {h, i}, {c, f}, and {a, b, d, e, g}]
38 Another Path Compression Example
[Figure: Find(g) with path compression on the up-trees {c, f} and {a, b, d, h, e, i, g}]
39 Implementing Path Compression
- Path Compression: Point everything along path of a Find to root
- Reduces height of entire access path to 1
- Finds get faster!
    #define N 9
    int up[N];

    /* Returns setid of x */
    int Find(int x) {
      if (up[x] < 0) return x;
      int root = Find(up[x]);
      up[x] = root;          /* Point to the root */
      return root;
    } /* end-Find */
- Running time: O(MaxHeight)
- But, what happens to the tree height over time?
  - It gets smaller
- What's the total running time if we do M Finds?
  - Turns out this is equal to O(M InvAckermann(M, N))
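The recursive Find uses stack depth proportional to the tree height. One alternative, shown here as a sketch rather than as the slides' method, is a two-pass iterative version: the first pass locates the root, the second repoints every node on the path. `FindCompress`, `BuildChain`, and `Depth` are hypothetical names, and the 5-node chain is just a test fixture.

```c
#include <assert.h>

#define N 5
int up[N];

/* Test fixture: the chain 0 -> 1 -> 2 -> 3 -> 4, with 4 as the root. */
void BuildChain(void) {
    for (int k = 0; k + 1 < N; k++) up[k] = k + 1;
    up[N - 1] = -1;
}

/* Number of parent links from x to its root. */
int Depth(int x) {
    int steps = 0;
    while (up[x] >= 0) { x = up[x]; steps++; }
    return steps;
}

/* Find with path compression, without recursion. */
int FindCompress(int x) {
    assert(x >= 0 && x < N);
    int root = x;
    while (up[root] >= 0) root = up[root];   /* pass 1: locate the root */
    while (up[x] >= 0) {                     /* pass 2: repoint the path */
        int next = up[x];
        up[x] = root;
        x = next;
    }
    return root;
}
```

After one `FindCompress(0)` on the chain, nodes 0 through 3 all point directly at node 4, so any later Find on them follows a single link.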
40 Running time of Find with Path Compression
- What's the total running time if we do M Finds?
  - Turns out this is equal to O(M InvAckermann(M, N)) when path compression is combined with union-by-size or union-by-height
  - InverseAckermann(M, N) < 4 for all practical values of M and N
  - So, total running time of M Finds < 4M = O(M)
- Meaning that the amortized running time of Find with path compression is O(1)
41 Summary of Disjoint Set ADT
- The Disjoint Set ADT allows us to represent objects that fall into different equivalence classes or sets
- Two main operations: Union of two classes and Find class name for a given element
- Up-Tree data structure allows efficient array implementation
  - Unions take O(1) worst case time, Finds can take O(N)
  - Union-by-Size (or by-Height) reduces worst case time for Find to O(log N)
  - If we use both Union-by-Size/Height & Path Compression
    - Any sequence of M Union/Find operations results in O(1) amortized time per operation (for all practical purposes)
42 Applications of Disjoint Set ADT
- Disjoint sets can be used to represent
  - Cities on a map (disjoint sets of connected cities)
  - Electrical components on chip
  - Computers connected in a network
  - Groups of people related to each other by blood
- Textbook example: Maze generation using Unions/Finds
  - Start with walls everywhere and each cell in a set by itself
  - Knock down walls randomly and Union cells that become connected
  - Use Find to find out if two cells are already connected
  - Terminate when starting and ending cell are in same set, i.e. connected (or when all cells are in same set)
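The maze procedure above can be sketched as follows. Everything specific here is an assumption made for illustration: a 4x4 grid, cell (r, c) numbered r * COLS + c, the top-left cell as start and the bottom-right cell as end, and the names `wall_right`, `wall_down`, and `GenerateMaze`. Walls are only knocked down between cells that are not yet connected, and `rand()` supplies the randomness.

```c
#include <assert.h>
#include <stdlib.h>

#define ROWS 4
#define COLS 4
#define CELLS (ROWS * COLS)

int up[CELLS];                  /* root: -(size); otherwise: parent */
int wall_right[ROWS][COLS];     /* 1 = wall between (r,c) and (r,c+1) */
int wall_down[ROWS][COLS];      /* 1 = wall between (r,c) and (r+1,c) */

int Find(int x) {
    if (up[x] < 0) return x;
    return up[x] = Find(up[x]);              /* path compression */
}

void Union(int x, int y) {                   /* union-by-size on two roots */
    assert(up[x] < 0 && up[y] < 0 && x != y);
    if (up[x] < up[y]) { up[x] += up[y]; up[y] = x; }
    else               { up[y] += up[x]; up[x] = y; }
}

/* Builds a maze and returns the number of walls knocked down. */
int GenerateMaze(unsigned seed) {
    int removed = 0;
    srand(seed);
    for (int k = 0; k < CELLS; k++) up[k] = -1;       /* each cell alone */
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            wall_right[r][c] = wall_down[r][c] = 1;   /* walls everywhere */

    while (Find(0) != Find(CELLS - 1)) {              /* start vs. end cell */
        int r = rand() % ROWS, c = rand() % COLS;
        int go_right = rand() % 2;
        int r2 = go_right ? r : r + 1;
        int c2 = go_right ? c + 1 : c;
        if (r2 >= ROWS || c2 >= COLS) continue;       /* outer border */
        int x = Find(r * COLS + c), y = Find(r2 * COLS + c2);
        if (x == y) continue;                         /* already connected */
        if (go_right) wall_right[r][c] = 0;
        else          wall_down[r][c] = 0;
        Union(x, y);
        removed++;
    }
    return removed;
}
```

Because a wall is removed only when it joins two different sets, at most CELLS - 1 walls can ever come down, and the resulting passages contain no cycles.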
43 Disjoint Set ADT Declaration & Operations
    class DisjointSet {
    private:
      int *up;                          // Up links array
      int N;                            // Number of sets
    public:
      DisjointSet(int n);               // Creates N sets
      ~DisjointSet() { delete [] up; }  // up was allocated with new[]
      int Find(int x);
      void Union(int x, int y);
    };
44 Operations: DisjointSet, Find
    /* Create N sets */
    DisjointSet::DisjointSet(int n) {
      int i;
      N = n;
      up = new int[N];
      for (i = 0; i < N; i++) up[i] = -1;
    } //end-DisjointSet

    /* Returns setid of x */
    int DisjointSet::Find(int x) {
      if (up[x] < 0) return x;
      int root = Find(up[x]);
      up[x] = root;          /* Point to the root */
      return root;
    } /* end-Find */
45 Operations: Union (by size)
    /* Joins two sets. Assumes x & y are roots */
    void DisjointSet::Union(int x, int y) {
      assert(up[x] < 0);
      assert(up[y] < 0);
      if (up[x] < up[y]) {   // x is bigger. Join y to x
        up[x] += up[y];
        up[y] = x;
      } else {               // y is bigger (or same size). Join x to y
        up[y] += up[x];
        up[x] = y;
      } /* end-else */
    } /* end-Union */