Title: Trees (Ch. 9.2) Longin Jan Latecki Temple University based on slides by Simon Langley and Shang-Hua Teng
2. Basic Data Structures - Trees
- Informal: a tree is a structure that looks like a real tree (upside-down).
- Formal: a tree is a connected graph with no cycles.
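3. Trees - Terminology
[Figure: a tree of size 7 and height 3 whose nodes hold the values x, b, e, m, c, d, a; labels point out the root, a subtree, a value, the nodes, and a leaf.]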
- Every node must have its value(s).
- A non-leaf node has subtree(s).
- A non-root node has a single parent node.
- A node may have 0 or more children.
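The terms size and height can be made concrete with a small sketch. The arrangement of the seven values below is hypothetical (the shape of the tree in the figure is not recoverable), and height is assumed to count edges on the longest root-to-leaf path; the names `Node`, `size`, and `height` are illustrative.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    value: str
    children: list[Node] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def height(t: Node) -> int:
    """Number of edges on the longest path from t down to a leaf (assumed convention)."""
    if not t.children:  # a leaf has height 0
        return 0
    return 1 + max(height(c) for c in t.children)


# Hypothetical arrangement of the seven values: x is the root.
tree = Node("x", [
    Node("b", [Node("c"), Node("d", [Node("a")])]),
    Node("e"),
    Node("m"),
])

print(size(tree), height(tree))  # 7 3
```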
4. Types of Tree
- Binary tree: each node has at most 2 sub-trees.
- m-ary tree: each node has at most m sub-trees.
5. Binary Search Trees
- A binary search tree
  - is a binary tree;
  - if a node has value N, all values in its left sub-tree are less than or equal to N, and all values in its right sub-tree are greater than N.
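One way to check this property is to pass down the range of values each sub-tree is allowed to hold. The sketch below is only an illustration; `Node` and `is_bst` are assumed names, and `None` stands for an empty tree.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: int
    left: Optional[Node] = None
    right: Optional[Node] = None


def is_bst(t: Optional[Node], lo: Optional[int] = None, hi: Optional[int] = None) -> bool:
    """True if every label in t satisfies lo < label <= hi and t is a BST."""
    if t is None:
        return True
    if lo is not None and not t.label > lo:
        return False
    if hi is not None and not t.label <= hi:
        return False
    # Left sub-tree: values <= N. Right sub-tree: values > N.
    return is_bst(t.left, lo, t.label) and is_bst(t.right, t.label, hi)


good = Node(5, Node(3), Node(8))
bad = Node(5, Node(3, None, Node(6)), Node(8))  # 6 sits in the left sub-tree of 5
print(is_bst(good), is_bst(bad))  # True False
```

Comparing each node only with its direct children would accept the second tree, which is why the bounds are carried down from the ancestors.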
6. Inorder Traversal
inorder(t)
  if t != NIL
    inorder(left[t])
    write(label[t])
    inorder(right[t])
Inorder traversal is an algorithm that visits each node of a tree after its left subtree and before its right subtree. It lists the values stored in a binary search tree in sorted order.
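The traversal translates almost line for line into Python. This is a sketch with assumed names (`Node`, `inorder`); it yields the labels instead of writing them, so the result can be collected into a list.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    label: int
    left: Optional[Node] = None
    right: Optional[Node] = None


def inorder(t: Optional[Node]) -> Iterator[int]:
    """Yield each label after its left subtree and before its right subtree."""
    if t is not None:
        yield from inorder(t.left)
        yield t.label
        yield from inorder(t.right)


# A small binary search tree chosen for illustration.
root = Node(40, Node(20, Node(10), Node(30)), Node(60, Node(50), Node(70)))
print(list(inorder(root)))  # [10, 20, 30, 40, 50, 60, 70]
```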
7. This is NOT a binary search tree
8. Try it!!
- Build binary search trees for the following input sequences:
  - 7, 4, 2, 6, 1, 3, 5, 7
  - 7, 1, 2, 3, 4, 5, 6, 7
  - 7, 4, 2, 1, 7, 3, 6, 5
  - 1, 2, 3, 4, 5, 6, 7, 8
  - 8, 7, 6, 5, 4, 3, 2, 1
9. Searching a binary search tree
search(t, s)
  if (t is empty) return null
  if (s == label(t))
    return t
  if (s < label(t))
    return search(t's left tree, s)
  else
    return search(t's right tree, s)
Time per level: O(1). Number of levels: h, the height of the tree. Total: O(h).
10. Searching a binary search tree
search(t, s)
  while (t != null)
    if (s == label(t)) return t
    if (s < label(t))
      t = leftSubTree(t)
    else
      t = rightSubTree(t)
  return null
Time per level: O(1). Number of levels: h, the height of the tree. Total: O(h).
11. Here's another function that does the same (we search for label s):
TreeSearch(t, s)
  while (t != NULL and s != label[t])
    if (s < label[t])
      t = left[t]
    else
      t = right[t]
  return t
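Both the recursive and the loop-based search can be sketched in Python as follows. The names `Node`, `search_recursive`, and `search_iterative` are illustrative, and `None` plays the role of null for an empty tree or a missing child.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: int
    left: Optional[Node] = None
    right: Optional[Node] = None


def search_recursive(t: Optional[Node], s: int) -> Optional[Node]:
    """Return the node labelled s, or None; one O(1) step per level."""
    if t is None:
        return None
    if s == t.label:
        return t
    return search_recursive(t.left if s < t.label else t.right, s)


def search_iterative(t: Optional[Node], s: int) -> Optional[Node]:
    """Same search written as a loop that walks down one level per iteration."""
    while t is not None and s != t.label:
        t = t.left if s < t.label else t.right
    return t


root = Node(7, Node(4, Node(2), Node(6)), Node(9))
print(search_iterative(root, 6).label)  # 6
print(search_recursive(root, 5))        # None
```

Either version follows a single root-to-leaf path, so the work is bounded by the height h of the tree.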
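12. Insertion in a binary search tree: we need to search before we insert
[Figure: inserting 6 and 11 into a binary search tree; each search walks down from the root and the new value is attached as a new leaf.]
- Always insert at a leaf.
- Time complexity: O(height_of_tree), which is O(log n) if the tree is balanced, where n is the size of the tree.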
13. Insertion
insertInOrder(t, s)
  if (t is an empty tree)  // insert here
    return a new tree node with value s
  else if (s < label(t))
    t.left = insertInOrder(t.left, s)
  else
    t.right = insertInOrder(t.right, s)
  return t
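A runnable sketch of the insertion follows, with assumed names (`Node`, `insert_in_order`, `height`). One deliberate difference from the pseudocode above: ties are sent left, so that the result matches the "less than or equal" rule stated earlier for left sub-trees. The two input sequences are made up to contrast a balanced tree with a degenerate one.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: int
    left: Optional[Node] = None
    right: Optional[Node] = None


def insert_in_order(t: Optional[Node], s: int) -> Node:
    """Insert s as a new leaf and return the (possibly new) root."""
    if t is None:                      # empty tree: insert here
        return Node(s)
    if s <= t.label:                   # assumption: equal values go left
        t.left = insert_in_order(t.left, s)
    else:
        t.right = insert_in_order(t.right, s)
    return t


def height(t: Optional[Node]) -> int:
    """Edges on the longest root-to-leaf path; -1 for an empty tree."""
    return -1 if t is None else 1 + max(height(t.left), height(t.right))


def build(values: list[int]) -> Optional[Node]:
    root = None
    for v in values:
        root = insert_in_order(root, v)
    return root


balanced = build([50, 30, 70, 20, 40, 60, 80])
chain = build([10, 20, 30, 40, 50, 60, 70])   # sorted input degenerates to a list
print(height(balanced), height(chain))        # 2 6
```

The same seven values give height 2 or height 6 depending on insertion order, which is why the O(log n) bound holds only for balanced trees.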
14. Comparison: insertion in an ordered list
insertInOrder(list, s)
  loop 1: search from the beginning of the list for an item > s
  loop 2: shift the remaining items one place to the right, starting from the end of the list
  insert s
[Figure: inserting 6 into an ordered list containing 2, 3, 4, 5, 7, 8, 9.]
Time complexity? O(n), where n is the size of the list.
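The two loops can be written out explicitly. This is a sketch only: the function name `insert_in_order` is illustrative, and a Python list stands in for the array.

```python
def insert_in_order(items: list[int], s: int) -> list[int]:
    """Insert s into the sorted list items in place, using two explicit loops."""
    # Loop 1: search from the beginning for the first item greater than s.
    pos = 0
    while pos < len(items) and items[pos] <= s:
        pos += 1

    # Grow the list by one slot; the appended value is overwritten by the
    # shift below whenever pos is not the last position.
    items.append(s)

    # Loop 2: shift the remaining items one place right, starting from the end.
    i = len(items) - 1
    while i > pos:
        items[i] = items[i - 1]
        i -= 1

    items[pos] = s
    return items


print(insert_in_order([2, 3, 4, 5, 7, 8, 9], 6))  # [2, 3, 4, 5, 6, 7, 8, 9]
```

Together the two loops touch every position once, which gives the O(n) bound regardless of where s lands.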
15. Data Compression
- Suppose we have a 1000000000 (1G) character data file that we wish to include in an email.
- Suppose the file only contains the 26 letters {a, ..., z}.
- Suppose each letter α in {a, ..., z} occurs with frequency f_α.
- Suppose we encode each letter by a binary code.
- If we use a fixed-length code, we need 5 bits for each character.
- The resulting message length is 5 × 1000000000 = 5000000000 bits (5G).
- Can we do better?
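The fixed-length figures follow from the smallest number of bits that can distinguish all symbols. A short sketch of that arithmetic, with illustrative function names:

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Smallest b with 2**b >= alphabet_size (at least 1 bit)."""
    return max(1, (alphabet_size - 1).bit_length())


def fixed_length_size(num_chars: int, alphabet_size: int) -> int:
    """Total bits when every character uses the same number of bits."""
    return num_chars * bits_per_symbol(alphabet_size)


print(bits_per_symbol(26))               # 5
print(fixed_length_size(10**9, 26))      # 5000000000
```

Using `int.bit_length` avoids floating-point rounding that a `log2` call could introduce for large alphabets.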
16. Data Compression: A Smaller Example
- Suppose the file only has 6 letters {a, b, c, d, e, f} with frequencies
  [Table: per-letter frequencies, with rows labeled "Fixed length" and "Variable length"]
- Fixed length: 3G = 3000000000 bits
- Variable length
17. How to decode?
- At first it is not obvious how decoding will happen, but it is possible if we use prefix codes.
18. Prefix Codes
- No encoding of a character can be a prefix of the longer encoding of another character.
- For example, we could not encode t as 01 and x as 01101, since 01 is a prefix of 01101.
- By using a binary tree representation we generate prefix codes with the letters as leaves.
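Whether a set of codewords satisfies the prefix condition can be tested mechanically. The sketch below (function name `is_prefix_free` is illustrative) relies on the fact that, after sorting, a codeword that is a prefix of any other is also a prefix of its immediate successor.

```python
def is_prefix_free(codewords: list[str]) -> bool:
    """True if no codeword is a prefix of another one."""
    ordered = sorted(codewords)
    # Only adjacent pairs need checking once the words are sorted.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


print(is_prefix_free(["01", "01101"]))    # False: 01 is a prefix of 01101
print(is_prefix_free(["0", "10", "11"]))  # True
```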
19. Prefix codes allow easy decoding
Decode 11111011100 (letters decoded so far, followed by the bits still to be read):
s 1011100
sa 11100
san 0
sane
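The trace can be reproduced in a few lines. The code table below is not given explicitly in the slides; it is inferred from the bits consumed at each step of the trace (s = 1111, a = 10, n = 1110, e = 0), and the function name `decode` is illustrative.

```python
def decode(bits: str, code: dict[str, str]) -> str:
    """Decode a bit string with a prefix code by reading one bit at a time."""
    by_word = {word: symbol for symbol, word in code.items()}
    out = []
    current = ""
    for b in bits:
        current += b
        # With a prefix code, the first codeword that matches is the only
        # possible one, so the symbol can be emitted immediately.
        if current in by_word:
            out.append(by_word[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


# Inferred from the decoding trace (an assumption, see above).
CODE = {"s": "1111", "a": "10", "n": "1110", "e": "0"}
print(decode("11111011100", CODE))  # sane
```

Accumulating bits until they match a codeword is equivalent to walking down the code tree until a leaf is reached.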
20. Prefix codes
- A message can be decoded uniquely.
- Follow the tree until you reach a leaf, and then repeat!
- Draw a few more trees and produce the codes!!!
21. Some Properties
- Prefix codes allow easy decoding.
- An optimal code must be a full binary tree (a tree where every internal node has two children).
- For C leaves there are C-1 internal nodes.
- The number of bits to encode a file is
  $\sum_{c} f(c) \cdot \mathrm{length}_T(c)$
  where f(c) is the frequency of c and length_T(c) is the tree depth of c, which corresponds to the code length of c.
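As a small worked instance of this sum, the sketch below counts the bits needed for the word "sane". The codewords are an assumption inferred from the decoding trace of 11111011100 earlier in the deck, and `encoded_bits` is an illustrative name.

```python
from collections import Counter


def encoded_bits(freq: dict[str, int], code: dict[str, str]) -> int:
    """Sum of f(c) * length(c) over all characters c."""
    return sum(f * len(code[c]) for c, f in freq.items())


# Codewords inferred from the decoding trace (an assumption).
code = {"s": "1111", "a": "10", "n": "1110", "e": "0"}
freq = Counter("sane")              # every letter occurs once

print(encoded_bits(freq, code))     # 11, the length of 11111011100
```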
22. Optimal Prefix Coding Problem
- Input: a set of n letters (c1, ..., cn) with frequencies (f1, ..., fn).
- Construct a full binary tree T to define a prefix code that minimizes the average code length.
23. Greedy Algorithms
- Many optimization problems can be solved using a greedy approach.
- The basic principle is that locally optimal decisions may be used to build an optimal solution.
- But the greedy approach may not always lead to an optimal solution overall for all problems.
- The key is knowing which problems will work with this approach and which will not.
- We study
  - the problem of generating Huffman codes.
24. Greedy algorithms
- A greedy algorithm always makes the choice that looks best at the moment.
- My everyday examples:
  - Driving in Los Angeles, NY, or Boston for that matter
  - Playing cards
  - Investing in stocks
  - Choosing a university
- The hope: a locally optimal choice will lead to a globally optimal solution.
- For some problems, it works.
- Greedy algorithms tend to be easier to code.
25. David Huffman's idea
- A term paper at MIT
- Build the tree (code) bottom-up in a greedy fashion
Each tree has a weight in its root and symbols as its leaves. We start with a forest of one-vertex trees representing the input symbols. We repeatedly merge the two trees whose sum of weights is minimal until we have only one tree.
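The merging procedure can be sketched with a priority queue. This is one possible rendering, not necessarily the construction shown in the lecture: the frequencies are made up for illustration, `huffman_code` is an assumed name, a nested tuple stands for a merged tree, and a counter breaks ties so that trees themselves are never compared.

```python
import heapq
from itertools import count


def huffman_code(freq: dict[str, int]) -> dict[str, str]:
    """Build a prefix code by repeatedly merging the two lightest trees."""
    if not freq:
        return {}
    tiebreak = count()
    # Forest of one-vertex trees: (weight, tie-breaker, tree).
    heap = [(w, next(tiebreak), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                       # a single symbol still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)      # lightest tree
        w2, _, t2 = heapq.heappop(heap)      # second lightest tree
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))

    code: dict[str, str] = {}

    def walk(tree, prefix: str) -> None:
        if isinstance(tree, tuple):          # internal node: 0 = left, 1 = right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                # leaf: a symbol
            code[tree] = prefix

    walk(heap[0][2], "")
    return code


# Hypothetical frequencies, chosen only for illustration.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = huffman_code(freq)
total = sum(freq[c] * len(code[c]) for c in freq)
print(code["a"], total)  # 0 224
```

For these frequencies a 3-bit fixed-length code would need 300 bits per 100 characters, while the greedy construction needs 224.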
26-30. Building the Encoding Tree
[Figures: successive steps of merging trees to build the encoding tree.]