Title: Huffman Codes and A Lot More Java
1Huffman Codes and (A Lot More) Java
- CS 2 Introduction to Programming Methods
- 30 January 2003
2Trees!
3Tree terms
- Node: the basic unit in constructing a tree
- Children: the nodes below and connected to a given node
- Parent: the node above and connected to a given node
- Root: the only node with no parent; it is the top of the tree
- Leaf: a childless node, i.e., one of the bottommost nodes
4Tree Examples
- Note how only the leaves in this example store
data. Depending on the application, this may be
appropriate.
5Tree Examples
- In this tree, Nodes have more than two children
and all the Nodes have data (color in this
example).
6Traversing a Tree
- Traversing a tree is fairly simple using recursion.
- void traverse(TreeNode t, Operation o) {
-   o.operate(t.getRoot()); // do something to self
-   if (t instanceof TreeLeaf) {
-     // No children to bother with
-   } else {
-     // Get the children and do something to them
-     Iterator kids = t.children();
-     while (kids.hasNext()) traverse((TreeNode) kids.next(), o);
-   }
- }
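The recursion above can be turned into a small runnable program. This is only a sketch: the TreeNode, TreeLeaf, and Operation types below are minimal stand-ins invented for the example (the slides do not show their definitions), and the operation is applied to the node itself rather than through getRoot().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for the types named on the slide.
interface Operation {
    void operate(TreeNode t);
}

class TreeNode {
    final String label;
    private final List<TreeNode> kids = new ArrayList<TreeNode>();

    TreeNode(String label) { this.label = label; }

    // Appends a child and returns this node so calls can be chained.
    TreeNode add(TreeNode child) { kids.add(child); return this; }

    Iterator<TreeNode> children() { return kids.iterator(); }
}

class TreeLeaf extends TreeNode {
    TreeLeaf(String label) { super(label); }
}

class TraversalSketch {
    // Operate on the node itself first, then recurse into each child in order.
    static void traverse(TreeNode t, Operation o) {
        o.operate(t);
        if (t instanceof TreeLeaf) {
            return; // no children to bother with
        }
        Iterator<TreeNode> kids = t.children();
        while (kids.hasNext()) {
            traverse(kids.next(), o);
        }
    }

    // Collects the labels in the order the traversal visits them.
    static List<String> visitOrder(TreeNode root) {
        final List<String> seen = new ArrayList<String>();
        traverse(root, new Operation() {
            public void operate(TreeNode t) { seen.add(t.label); }
        });
        return seen;
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode("root")
            .add(new TreeNode("a").add(new TreeLeaf("a1")).add(new TreeLeaf("a2")))
            .add(new TreeLeaf("b"));
        System.out.println(visitOrder(root)); // [root, a, a1, a2, b]
    }
}
```

Each node is handled before any of its children, so a parent always appears before its descendants in the output.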
7Tree Traversal
- The nodes in this tree are numbered in the order in which the preceding traversal example operates on them.
8The instanceof keyword
9instanceof usage
- So I used instanceof in the traversal example. But what is it?
- instanceof is used to determine whether a given Object is an instance of a particular class. For instance, if t is a TreeLeaf, then (t instanceof TreeLeaf) evaluates to true; otherwise it evaluates to false.
- Moreover, if t is a TreeLeaf, which is a subclass of TreeNode, then (t instanceof TreeNode) also evaluates to true.
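A short runnable check of both claims follows. The two nested classes are empty stand-ins assumed for illustration; only the subclass relationship between them comes from the slide.

```java
class InstanceofSketch {
    // Hypothetical minimal classes: TreeLeaf is a subclass of TreeNode, as on the slide.
    static class TreeNode { }
    static class TreeLeaf extends TreeNode { }

    // Tests the most specific class first, because a TreeLeaf also passes the TreeNode test.
    static String describe(Object t) {
        if (t instanceof TreeLeaf) return "leaf";
        if (t instanceof TreeNode) return "node";
        return "not a tree";
    }

    public static void main(String[] args) {
        TreeNode leaf = new TreeLeaf();
        TreeNode node = new TreeNode();
        System.out.println(leaf instanceof TreeLeaf); // true
        System.out.println(leaf instanceof TreeNode); // true: a TreeLeaf is also a TreeNode
        System.out.println(node instanceof TreeLeaf); // false
        System.out.println(describe("hello"));        // not a tree
    }
}
```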
10Exceptions, quick and dirty
11Exceptions
- In Java, when something goes wrong, an exception is thrown.
- So far, if you saw an exception, it meant you did something wrong. For instance, popping an object off an empty stack.
- There are many times when something will go wrong through no fault of your own. When trying to do file input and output (I/O), bad things can happen and you need a graceful way to deal with them. Exceptions are the answer.
12Try-catch blocks
- The construct for dealing with Exceptions is the try-catch block. Basically, it says: try to do something, and if something goes wrong, use the code in the catch block to try to deal with the problem.
- Example
- try {
-   doACM95set(1); // total time to spend in hours
- } catch (WishfulThinkingException e) {
-   System.err.println("Not gonna happen!");
- }
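For a version of this that compiles and runs, the sketch below reuses the empty-stack case mentioned earlier. It relies on java.util.Stack, whose pop() throws EmptyStackException on an empty stack; the popOrDefault helper is a name made up for this example.

```java
import java.util.EmptyStackException;
import java.util.Stack;

class TryCatchSketch {
    // Returns the popped value, or the fallback if the stack turns out to be empty.
    static int popOrDefault(Stack<Integer> stack, int fallback) {
        try {
            return stack.pop(); // throws EmptyStackException when the stack is empty
        } catch (EmptyStackException e) {
            System.err.println("Nothing to pop, using " + fallback);
            return fallback;
        }
    }

    public static void main(String[] args) {
        Stack<Integer> stack = new Stack<Integer>();
        stack.push(7);
        System.out.println(popOrDefault(stack, -1)); // 7
        System.out.println(popOrDefault(stack, -1)); // -1
    }
}
```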
13And now, the quick and dirty way out
- If you don't want to deal with Exceptions right now, all you need to do is add throws Exception to your method declarations, like so:
- public static void main(String[] args) throws Exception {
-   // ha ha I don't have to handle any Exceptions
-   // I am invincible
- }
14Worst coding habits ever
- You can do this for lab 4 to ease the learning curve. There are a lot of new programming concepts you need to understand for the lab, and you don't need to understand exceptions yet.
- However, this is a terrible coding habit. I cannot emphasize enough how bad it is to do this. Once you learn the right way to handle exceptions, never do this again. Ever.
- Ever!
15File I/O
16File I/O intro
- Files are stored on the hard disk as a series of bytes. Bytes are numbers from 0 to 255 (2^8 - 1) that represent data. Plain text is usually stored in ASCII, which maps the numbers 0-127 to characters; 8-bit extensions of ASCII also assign characters to 128-255.
- Java normally uses streams to do I/O. Basically, this means that you cannot easily go backwards while doing I/O. If you want to go back and change something, you have to start from the beginning. There are ways around this, but you won't need them for this lab.
- Almost every I/O operation can throw an IOException.
- To use I/O in Java you need to import java.io.*
17Input
- In Java, FileInputStream is a class that allows you to read in a file. It has a lot of methods, but the useful one is read(), which returns the next byte in the file, or -1 if there are no bytes left to read.
- Usage example
- try {
-   FileInputStream fis = new FileInputStream(filename);
-   int c = fis.read();
-   while (c != -1) {
-     // do something with c
-     c = fis.read();
-   }
- } catch (IOException e) {
-   // deal with the exception
-   System.err.println("IO Error! " + e.getMessage());
- }
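A complete program built around the same read loop might look like the sketch below. The countBytes helper and the choice to count bytes are illustrative assumptions. The try-with-resources form needs Java 7 or later and closes the stream even when read() throws.

```java
import java.io.FileInputStream;
import java.io.IOException;

class ReadBytesSketch {
    // Counts the bytes in a file by calling read() until it returns -1.
    static long countBytes(String filename) throws IOException {
        long count = 0;
        try (FileInputStream fis = new FileInputStream(filename)) {
            int c = fis.read();
            while (c != -1) {
                count++;        // this is the "do something with c" step
                c = fis.read();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java ReadBytesSketch <file>");
            return;
        }
        try {
            System.out.println(countBytes(args[0]) + " bytes");
        } catch (IOException e) {
            System.err.println("IO Error! " + e.getMessage());
        }
    }
}
```

Note that read() returns an int, not a byte, so that -1 can signal end of file without colliding with the byte value 255.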
18Output
- Output is done by FileOutputStream and its method write(data). Look in the documentation to see all the possible arguments for write, since there are a bunch.
- write(int b) writes the byte in b to the next spot in the file.
- Usage example
- try {
-   FileOutputStream fos = new FileOutputStream(filename);
-   for (int i = 0; i < 256; i++) fos.write(i);
- } catch (IOException e) {
-   System.err.println("IO Error! Oh no!");
- }
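The same loop as a self-contained program, offered as a sketch: the writeAllByteValues helper and the temporary file used in main are assumptions made so that the example can run anywhere.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

class WriteBytesSketch {
    // Writes the byte values 0..255, in order, to the named file.
    static void writeAllByteValues(String filename) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(filename)) {
            for (int i = 0; i < 256; i++) {
                fos.write(i); // write(int) stores only the low 8 bits of i
            }
        }
    }

    public static void main(String[] args) {
        try {
            File out = File.createTempFile("bytes", ".bin"); // temporary file for the demo
            out.deleteOnExit();
            writeAllByteValues(out.getPath());
            System.out.println(out.length()); // 256
        } catch (IOException e) {
            System.err.println("IO Error! " + e.getMessage());
        }
    }
}
```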
19Huffman Tree: How do these structures relate to trees?
20HuffmanTree
- HuffmanLeaf
- Acts like a leaf in a tree
- May or may not have parent
- Has no children
- HuffmanNode
- Acts like a non-leaf node in a tree
- May or may not have parent
- Can have 0, 1 or 2 children
21HuffmanTree
- Some examples of valid Huffman trees (figure: trees built from Leaf and Node elements)
22HuffmanTree: Methods of HuffmanLeaf
- HuffmanLeaf(int value)
- HuffmanLeaf(int value, int frequency)
- value: the value the leaf is representing
- frequency: how often the value occurs (more on that later)
- getValue()
- setValue()
- toString()
- Returns a string of the form value, frequency
23HuffmanTree: Methods of HuffmanNode
- HuffmanNode()
- HuffmanNode(int frequency)
- HuffmanNode(int frequency, HuffmanTree left, HuffmanTree right)
- left and right are the children of the HuffmanNode
- Figure: a HuffmanNode with its Left and Right children
24HuffmanTree: Methods of HuffmanNode
- HuffmanNode()
- HuffmanNode(int frequency)
- HuffmanNode(int frequency, HuffmanTree left, HuffmanTree right)
- left and right are the children of the HuffmanNode
- Nodes can also be children
- Figure: HuffmanNodes with their Left and Right children
25HuffmanTree: Methods of HuffmanNode
- getLeft(), getRight()
- Return the node's left and right child, respectively
- setLeft(), setRight()
- Set the node's left and right child, respectively
- toString()
- Returns a string representation of the node
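One way these classes could fit together is sketched below. It is a guess inferred only from the method lists on these slides, not the lab's actual code: the shared HuffmanTree base class, the field names, and the node's toString() format are all assumptions.

```java
// Assumed common supertype, so a node's children can be leaves or other nodes.
abstract class HuffmanTree {
    private int frequency;

    HuffmanTree(int frequency) { this.frequency = frequency; }

    int getFrequency() { return frequency; }
    void setFrequency(int frequency) { this.frequency = frequency; }
}

class HuffmanLeaf extends HuffmanTree {
    private int value;

    HuffmanLeaf(int value) { this(value, 0); }
    HuffmanLeaf(int value, int frequency) { super(frequency); this.value = value; }

    int getValue() { return value; }
    void setValue(int value) { this.value = value; }

    // "value, frequency", following the form given on the slide.
    @Override public String toString() { return value + ", " + getFrequency(); }
}

class HuffmanNode extends HuffmanTree {
    private HuffmanTree left;
    private HuffmanTree right;

    HuffmanNode() { this(0); }
    HuffmanNode(int frequency) { this(frequency, null, null); }
    HuffmanNode(int frequency, HuffmanTree left, HuffmanTree right) {
        super(frequency);
        this.left = left;
        this.right = right;
    }

    HuffmanTree getLeft() { return left; }
    HuffmanTree getRight() { return right; }
    void setLeft(HuffmanTree left) { this.left = left; }
    void setRight(HuffmanTree right) { this.right = right; }

    // Assumed format: frequency followed by both children in brackets.
    @Override public String toString() {
        return getFrequency() + " [" + left + " | " + right + "]";
    }
}

class HuffmanClassesSketch {
    public static void main(String[] args) {
        HuffmanLeaf a = new HuffmanLeaf('A', 3); // a char widens to its int code
        HuffmanLeaf h = new HuffmanLeaf('H', 2);
        HuffmanNode parent = new HuffmanNode(a.getFrequency() + h.getFrequency(), a, h);
        System.out.println(parent);
    }
}
```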
26Compression: One reason we care about trees
27Compression
- Used EVERYWHERE
- Examples
- Music: MP3s
- Images: JPG, GIF
- Other: ZIP, TAR
- Idea: Compression takes a file and reduces the number of bits it takes to express that information.
- One simple but effective compression algorithm is Huffman Encoding
- For more info visit http://www.data-compression.com/
28Huffman Encoding: A Greedy Algorithm
- Huffman encoding is an example of a greedy algorithm
- A greedy algorithm is an algorithm that always makes the choice that looks best at the moment
- For problems where the greedy approach works, as it does here, the locally optimal choices lead to a globally optimal solution.
- There are many other greedy algorithms (CS 38)
29Huffman Encoding: Resources
- Huffman encoding is a common algorithm and there are many resources available online explaining it
- http://www.cs.duke.edu/csed/poop/huff/info/
- If you have any questions about this or any other part of your homework, come to any of our office hours or e-mail the TAs
30Huffman Encoding: Overview
- Compression
- Reads a text file and creates a tree representing a best possible encoding
- Uses the tree to convert the original file into a compressed file.
- Tree info is saved (we do that)
- As you can see, compression requires reading over the original file twice (we handle most of the file reading)
31Huffman Encoding: Creating the Tree
- Start with an array of 256 HuffmanLeafs
- A byte can take 256 values (ASCII plus its 8-bit extensions), and converting chars to ints is simple
- HuffmanLeaf[] Foo = new HuffmanLeaf[256];
- Foo[i].getValue() == i, the ith char in ASCII
- All frequencies initially zero
- More on arrays in a bit
- Read in the file a character at a time
- Whenever you encounter character i, increment Foo[i]'s frequency
- Foo[i].setFrequency(Foo[i].getFrequency() + 1);
- End up with an array of the frequencies of each of the 256 chars
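The counting pass can be sketched as follows. To stay self-contained, this version keeps the counts in a plain int array rather than in HuffmanLeaf objects, and reads from any InputStream; both are simplifications assumed for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class FrequencyCountSketch {
    // Returns counts where counts[b] is how often byte value b (0..255) occurs in the stream.
    static int[] countFrequencies(InputStream in) throws IOException {
        int[] counts = new int[256]; // Java initializes every slot to zero
        int c = in.read();
        while (c != -1) {
            counts[c]++; // plays the role of Foo[c].setFrequency(Foo[c].getFrequency() + 1)
            c = in.read();
        }
        return counts;
    }

    public static void main(String[] args) {
        byte[] text = "AAACCCCCHH".getBytes(StandardCharsets.US_ASCII);
        try {
            int[] counts = countFrequencies(new ByteArrayInputStream(text));
            System.out.println(counts['A'] + " " + counts['C'] + " " + counts['H']); // 3 5 2
        } catch (IOException e) {
            System.err.println("IO Error! " + e.getMessage());
        }
    }
}
```

Because read() returns values from 0 to 255, the result can be used directly as an array index without any conversion.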
32Huffman Encoding: Creating the Tree
- Building the Tree
- Select the two Trees (initially leafs) with the lowest frequencies
- We will call them L and R
- Create a new Node with L as its left child, R as its right child, and a frequency of L.frequency + R.frequency
- Repeat until all the nodes are merged into one tree
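The merge loop described above can be sketched with a priority queue ordered by frequency. The nested Tree class is a minimal stand-in for the lab's classes, and using java.util.PriorityQueue to find the two lowest frequencies is a choice made for this example, not something the slides prescribe. The sample frequencies (C 5, E 8, H 2, I 7, A 3) are the ones used in the slides that follow.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class BuildTreeSketch {
    // Minimal stand-in for the lab's tree classes: a leaf is a Tree with no children.
    static class Tree {
        final char value;      // meaningful only for leaves
        final int frequency;
        final Tree left, right;

        Tree(char value, int frequency, Tree left, Tree right) {
            this.value = value;
            this.frequency = frequency;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    // Repeatedly merges the two lowest-frequency trees until one tree remains.
    // Returns null when given no values.
    static Tree build(char[] values, int[] frequencies) {
        PriorityQueue<Tree> queue = new PriorityQueue<Tree>(11, new Comparator<Tree>() {
            public int compare(Tree a, Tree b) {
                return Integer.compare(a.frequency, b.frequency);
            }
        });
        for (int i = 0; i < values.length; i++) {
            queue.add(new Tree(values[i], frequencies[i], null, null));
        }
        while (queue.size() > 1) {
            Tree l = queue.poll(); // lowest frequency
            Tree r = queue.poll(); // second lowest
            queue.add(new Tree('\0', l.frequency + r.frequency, l, r));
        }
        return queue.poll();
    }

    // Depth of the leaf holding value (root is depth 0), or -1 if it is absent.
    static int depthOf(Tree t, char value, int depth) {
        if (t == null) return -1;
        if (t.isLeaf()) return t.value == value ? depth : -1;
        int d = depthOf(t.left, value, depth + 1);
        return d != -1 ? d : depthOf(t.right, value, depth + 1);
    }

    public static void main(String[] args) {
        Tree root = build(new char[] {'C', 'E', 'H', 'I', 'A'}, new int[] {5, 8, 2, 7, 3});
        System.out.println(root.frequency);        // 25
        System.out.println(depthOf(root, 'A', 0)); // 3
        System.out.println(depthOf(root, 'E', 0)); // 2
    }
}
```

When frequencies tie, which subtree ends up on the left or right depends on the queue's internal order, so the exact shape can differ between implementations even though the depths for this input do not.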
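33Huffman Encoding: Creating the Tree
- Figure: five leaves with their frequencies: C 5, E 8, H 2, I 7, A 3
34Huffman Encoding: Creating the Tree
- Figure: A (3) and H (2) are merged under a new node of frequency 5, alongside C 5, E 8, I 7
35Huffman Encoding: Creating the Tree
- Figure: C (5) and the A-H node (5) are merged under a new node of frequency 10, alongside E 8, I 7
36Huffman Encoding: Creating the Tree
- Figure: E (8) and I (7) are merged under a new node of frequency 15, alongside the node of frequency 10
37Huffman Encoding: Creating the Tree
- Figure: the nodes of frequency 10 and 15 are merged under the root, of frequency 25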
38Huffman Encoding: Creating the Tree
- Use the tree built using the Huffman algorithm to get compression
- Most frequently used characters have the shortest codeword lengths
- Less common characters have longer codeword lengths
- Get codewords by walking from the root of the tree to the leaves representing the characters
- No codeword is a prefix of any other codeword in this algorithm
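Reading codewords off a tree can be sketched as a recursive walk that appends one bit per branch. The Tree class and the slideTree() helper are stand-ins written for this example. The tree is built by hand to match the lecture's five-character example, with a left branch read as 0 and a right branch as 1.

```java
import java.util.Map;
import java.util.TreeMap;

class CodewordSketch {
    // Minimal stand-in tree: a leaf holds a character, an inner node holds two children.
    static class Tree {
        final char value;
        final Tree left, right;

        Tree(char value) { this(value, null, null); }
        Tree(Tree left, Tree right) { this('\0', left, right); }
        private Tree(char value, Tree left, Tree right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    // Records the path to every leaf: "0" for a left branch, "1" for a right branch.
    // Assumes every inner node has two children.
    static void collect(Tree t, String path, Map<Character, String> codes) {
        if (t.isLeaf()) {
            codes.put(t.value, path);
            return;
        }
        collect(t.left, path + "0", codes);
        collect(t.right, path + "1", codes);
    }

    static Map<Character, String> codewords(Tree root) {
        Map<Character, String> codes = new TreeMap<Character, String>();
        collect(root, "", codes);
        return codes;
    }

    // True if no codeword is a prefix of a different codeword.
    static boolean isPrefixFree(Map<Character, String> codes) {
        for (String a : codes.values()) {
            for (String b : codes.values()) {
                if (!a.equals(b) && b.startsWith(a)) return false;
            }
        }
        return true;
    }

    // The tree for frequencies C 5, E 8, H 2, I 7, A 3, laid out by hand.
    static Tree slideTree() {
        Tree ah = new Tree(new Tree('H'), new Tree('A'));
        Tree fifteen = new Tree(new Tree('I'), new Tree('E'));
        Tree ten = new Tree(new Tree('C'), ah);
        return new Tree(fifteen, ten);
    }

    public static void main(String[] args) {
        Map<Character, String> codes = codewords(slideTree());
        System.out.println(codes);               // {A=111, C=10, E=01, H=110, I=00}
        System.out.println(isPrefixFree(codes)); // true
    }
}
```

The prefix property follows from the construction: characters sit only at leaves, so the path to one character can never continue on to another.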
39Huffman Encoding: Creating the Tree
- Figure: the finished tree (root frequency 25) with every branch labeled 0 or 1
- Codewords: E 01, I 00, C 10, A 111, H 110
- The length of a codeword depends on the frequency of the character.
40Huffman Encoding: Overview
- Decompression
- Takes in information to rebuild the tree used in creating the compressed file (we handle)
- Converts the compressed file into the original file by walking down the tree from the root to the leaves representing characters.
- Outputs results
41Huffman Encoding: Decoding from Tree
- Figure, animated over slides 41-49: the bit string 1111001 is decoded by walking the tree from slide 39, returning to the root after each leaf
- After 111: A
- After 10: AC
- After 01: ACE
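The decoding walk can be sketched in a few lines. In the lab the tree is rebuilt for you; here, as an assumption made to keep the example self-contained, it is rebuilt from the codeword table given earlier (E 01, I 00, C 10, A 111, H 110). The Tree class and helper names are invented for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DecodeSketch {
    static class Tree {
        Tree zero, one; // children reached by a 0 bit and by a 1 bit
        char value;     // meaningful only when this is a leaf

        boolean isLeaf() { return zero == null && one == null; }
    }

    // Rebuilds a tree from a codeword table by laying down one path per codeword.
    static Tree fromCodewords(Map<Character, String> codes) {
        Tree root = new Tree();
        for (Map.Entry<Character, String> e : codes.entrySet()) {
            Tree t = root;
            for (char bit : e.getValue().toCharArray()) {
                if (bit == '0') {
                    if (t.zero == null) t.zero = new Tree();
                    t = t.zero;
                } else {
                    if (t.one == null) t.one = new Tree();
                    t = t.one;
                }
            }
            t.value = e.getKey();
        }
        return root;
    }

    // Follows one branch per bit; on reaching a leaf, emits its character and restarts at the root.
    static String decode(Tree root, String bits) {
        StringBuilder out = new StringBuilder();
        Tree t = root;
        for (int i = 0; i < bits.length(); i++) {
            t = (bits.charAt(i) == '0') ? t.zero : t.one;
            if (t == null) throw new IllegalArgumentException("bits do not match the tree");
            if (t.isLeaf()) {
                out.append(t.value);
                t = root;
            }
        }
        if (t != root) throw new IllegalArgumentException("input ends in the middle of a codeword");
        return out.toString();
    }

    // The codeword table from the lecture's example.
    static Map<Character, String> slideCodes() {
        Map<Character, String> codes = new LinkedHashMap<Character, String>();
        codes.put('E', "01");
        codes.put('I', "00");
        codes.put('C', "10");
        codes.put('A', "111");
        codes.put('H', "110");
        return codes;
    }

    public static void main(String[] args) {
        Tree root = fromCodewords(slideCodes());
        System.out.println(decode(root, "1111001")); // ACE
    }
}
```

No separators are needed between codewords: because no codeword is a prefix of another, reaching a leaf unambiguously ends the current character.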
50Huffman Encoding: Resources
- Once more, this is linked to homework 2
- http://www.cs.duke.edu/csed/poop/huff/info/
- If you have any questions about this or any other part of your homework, come to any of our office hours or e-mail the TAs
51Arrays
52Arrays
- A way of organizing multiple objects of the same type that are logically connected
- int[] charFreq = new int[size]; // size is an int
- Can access an element of an array quickly and easily
- charFreq[200] refers to the element at index 200 of the array
- Arrays run from 0 to size - 1, so you can access charFreq[i] for 0 <= i < size
- Say size = 256; trying to access charFreq[256] is VERY BAD
- You will get lots of problems in your code from this if you are not careful
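The bounds rule is easy to see in a short program. In Java, an out-of-range index throws an ArrayIndexOutOfBoundsException at run time; the inBounds helper below is a name made up for this sketch.

```java
class ArrayBoundsSketch {
    // True if i is a legal index into the array, i.e. 0 <= i < a.length.
    static boolean inBounds(int[] a, int i) {
        return i >= 0 && i < a.length;
    }

    public static void main(String[] args) {
        int size = 256;
        int[] charFreq = new int[size];         // slots 0..255, all initially 0
        charFreq[200] = 5;
        charFreq[200]++;
        System.out.println(charFreq[200]);      // 6
        System.out.println(charFreq[size - 1]); // 0, the last legal slot
        try {
            System.out.println(charFreq[size]); // index 256 does not exist
        } catch (ArrayIndexOutOfBoundsException e) {
            System.err.println("Out of bounds: " + e.getMessage());
        }
    }
}
```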
53Arrays
- If you have an array of objects, you can access them by selecting an element of the array and then calling the method you want
- Ex. Want the frequency of element 127 in an array of TreeLeafs called Leaves
- Leaves[127].getFrequency() // returns the value you are looking for
- Once again, be very careful about going out of bounds when looking at array elements
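One detail worth seeing in code: creating an array of objects only creates the slots, and each slot holds null until an object is stored in it. The TreeLeaf class below is a minimal stand-in assumed for this sketch, and the array is named leaves in lower case to follow Java convention.

```java
class LeafArraySketch {
    // Hypothetical minimal leaf with just the methods this example needs.
    static class TreeLeaf {
        private final int value;
        private int frequency;

        TreeLeaf(int value) { this.value = value; }

        int getValue() { return value; }
        int getFrequency() { return frequency; }
        void setFrequency(int frequency) { this.frequency = frequency; }
    }

    // Allocates the array, then fills every slot with its own leaf.
    static TreeLeaf[] makeLeaves(int size) {
        TreeLeaf[] leaves = new TreeLeaf[size]; // every slot starts out null
        for (int i = 0; i < size; i++) {
            leaves[i] = new TreeLeaf(i);
        }
        return leaves;
    }

    public static void main(String[] args) {
        TreeLeaf[] leaves = makeLeaves(256);
        leaves[127].setFrequency(4);
        System.out.println(leaves[127].getFrequency()); // 4
        System.out.println(leaves[127].getValue());     // 127
    }
}
```

Skipping the fill loop would make leaves[127].getFrequency() throw a NullPointerException, a different failure from going out of bounds.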
54Arguments
55So that's what it does
- Ever wonder why you always have to write main(String[] args) instead of just main()? It's because args is an array containing all the arguments passed to your program from the command line. If no arguments are passed, args just has length 0.
56Making good use of your new knowledge
- public class MidgetSearch {
-   public static void main(String[] args) {
-     for (int i = 0; i < args.length; i++) {
-       System.out.println(findMidget(args[i]));
-     }
-   }
- }
- > java MidgetSearch Bridget Larry Dan
- Bridget the midget is putting on a show in Ricketts
- Larry the midget is trapped in a well in Rhode Island
- Dan isn't really a midget and therefore cannot be located