Title: Attributes of Algorithms
1. Attributes of Algorithms
- Or how I gave up working for a living
2. Correctness
- Must meet the formal definition
- It must work in a finite amount of time
- Wrong results can cause havoc.
- If built without understanding the whole problem, you can end up with a partial solution.
- Correct, but incomplete.
- Incorrect because the problem was not understood
3. Limits to Accuracy
- What is the last digit of Pi?
- Can't be found
- What's the real-world limit we'll accept: 5 digits, 10 digits, 30 digits?
- Difference between theoretical CS (no limit to accuracy) and applied CS (limits exist)
- Most algorithms are intended for the real world, to work in real computers.
4. Theory vs. Practice
- Railroads face this problem all the time
- In the 1860s, Canada wanted a railroad from East to West
- Theory says lay down track
- Applied says engines can't pull useful loads unless the track grade is less than 4% (a rise of less than 40 m per 1 km of track)
5. Applied CS
- The people aspect is important
- People use the programs
- Don't write a program for a single use and then throw it away
- Many different inputs to the same kind of problem (the Tin Can algorithm)
- Problems always change.
- Needs change
- Expectations change
- Program maintenance is important
- The maintainer is often not the person who wrote the program
- The algorithm must be clear and easy to understand
6. Elegant Solutions
- Style
- Elegant Solution vs. BFAI
- (Brute Force And Ignorance)
- I. Ask for n
- II. Set Sum to 0
- III. Set X to 1
- IV. While X ≤ n Do
- A. Add X to Sum
- B. Add 1 to X
- V. Print value of Sum
- VI. Stop
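A minimal Python sketch of the brute-force (BFAI) pseudo-code above, assuming (as the Gauss slide that follows confirms) that the goal is the sum 1 + 2 + … + n:

```python
def sum_bfai(n):
    """Brute force: add every number from 1 to n, one at a time."""
    total = 0            # step II: Set Sum to 0
    x = 1                # step III: Set X to 1
    while x <= n:        # step IV
        total += x       # step IV.A: Add X to Sum
        x += 1           # step IV.B: Add 1 to X
    return total         # step V: Print value of Sum

print(sum_bfai(100))     # 5050
```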
7. Gauss' Solution
- If you add pairs together from opposite ends you get the same value:
- n + 1
- (n - 1) + 2
- (n - 2) + 3
- Etc.
- Sum = (n/2) · (n + 1)
- Elegant and much faster in most cases!
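The same sum with Gauss' closed-form formula, as a short Python sketch (the function name is just for illustration):

```python
def sum_gauss(n):
    """Gauss' insight: pairs taken from opposite ends each add up to n + 1."""
    return n * (n + 1) // 2   # Sum = (n/2)(n + 1), kept in whole numbers

print(sum_gauss(100))   # 5050, with one multiplication instead of a 100-step loop
```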
8. More Real World CS
- Best if elegant and easy to understand
- Limits on space in the computer
- Memory size (RAM) limits the largest program
- Hard disk limits the amount of data that can be stored
- Speed of the computer limits calculations per second
- Efficient algorithms minimize space used (RAM and hard disk) and time used.
- Often not possible to do both
9. Why Not Wait?
- If an algorithm is too slow to run today, why not wait for tomorrow's computers?
- There is some truth to this
- More complex problems now than in the past need better computers
- We still need efficient algorithms
10. But What is Efficient?
- Measuring run time? Storage space?
- Do we measure on a supercomputer or a palmtop? With a large amount of data or just a little?
- Testing a program with real data and a real example (the New York telephone book and Jim Smith, for example) is called a benchmark.
- Rate one computer against another on the same problem.
- Rate one algorithm against another on the same computer using the same problem.
- Cannot rate different computers running different algorithms.
11. Real Definition for Efficiency
- How much work the computer has to do.
- Usually stated in terms of the fundamental work the program is doing.
- Efficiency is often the number of steps each algorithm requires to complete.
- Not the time the algorithm takes on a particular computer; that depends on the speed of the computer and numerous other factors.
- Used to compare two algorithms doing the same task with exactly the same data.
12. Choice of Algorithms
- Often in the real world, you gather great piles of data. Because it's generated by electronics, some of it is garbled or damaged and needs to be removed. What is considered garbage depends on the task you are doing.
- Suppose we have a task which generates zero as the garbage value.
- Before we process anything, we'll need to remove the zeros.
13. The Data
[Data list: cells 1-10 hold 5, 8, 0, 3, 2, 9, 0, 8, 5, 7; Length = 10]
- We'll need a value to tell us how many entries are usable in the list. Let's call this variable length.
- Initially, length would be 10
- We want to go through each item and discard the zeros.
14. Shuffle Left Algorithm
- Let's use our fingers to help solve the problem.
- Our left index finger will point to the value we're currently looking at
- When we find a zero, we'll copy all the values after the one we're pointing to down one slot
- Of course, we'll have to reduce length by one, too.
15. Shuffle Left Algorithm
[Data: 5, 8, 0, 3, 2, 9, 0, 8, 5, 7 in cells 1-10; Length = 10]
- We move our finger along, one cell at a time,
until we reach Cell 3. This is the first zero.
16. Shuffle Left Algorithm
[Data mid-shuffle: 5, 8, 3, 3, 2, 9, 0, 8, 5, 7; Length = 10 - the value in cell 4 has been copied into cell 3]
- Starting with the next cell and proceeding until
we get to the last cell, we copy the contents of
each cell to the previous one.
17. Shuffle Left Algorithm
[Data after the shuffle: 5, 8, 3, 2, 9, 0, 8, 5, 7, 7; Length = 9]
- The extra 7 on the end doesn't belong any more, so we have to reduce length by one (to 9)
18. Shuffle Left Algorithm
[Data: 5, 8, 3, 2, 9, 0, 8, 5, 7, 7; Length = 9 - the finger now scans toward the zero in cell 6]
- We can now scan our finger along until we find
the next 0 (at Cell 6). When we shuffle
everything down, we end up with another 7 and a
length of 8
19. Shuffle Left Algorithm (Done!)
[Data: 5, 8, 3, 2, 9, 8, 5, 7, 7, 7; Length = 8]
- When we scan the rest of the list, we find no more zeros and we're done.
- BTW, we don't need to scan past our last real piece of data.
- To write this as an algorithm, we're going to need a variable to track our cell (left) and another to track the cell we're copying from (right)
20. Shuffle Down (Pseudo-Code)
- I. Get number of data items and store in n.
- II. Get the n data items
- III. Set length to n
- IV. Set left to 1
- V. While left ≤ length Do
- A. If item in Cell at position left is not zero Then
- 1. Add one to left
- Else
- 1. Set right to left + 1
- 2. While right ≤ length Do
- a. Copy item in Cell at position right into Cell at position right - 1
- b. Add one to right
- 3. Subtract one from length
- VI. Stop
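A runnable Python version of the shuffle-left pseudo-code, offered as a sketch (the list plays the role of the row of cells, with 0-based indices instead of the slides' 1-based cells):

```python
def shuffle_left(data):
    """Remove zeros by shuffling every later value down one slot, as in the pseudo-code."""
    length = len(data)                 # step III
    left = 0                           # step IV (0-based)
    while left < length:               # step V
        if data[left] != 0:            # step V.A: good data, move along
            left += 1
        else:
            for right in range(left + 1, length):   # inner loop (step 2)
                data[right - 1] = data[right]       # copy each cell down one slot
            length -= 1                             # step 3
    return data[:length], length

values = [5, 8, 0, 3, 2, 9, 0, 8, 5, 7]
print(shuffle_left(values))   # ([5, 8, 3, 2, 9, 8, 5, 7], 8)
```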
21. Copy Over Algorithm
[Old list: 5, 8, 0, 3, 2, 9, 0, 8, 5, 7 in cells 1-10; new list: empty; Length = 0]
- An alternative is to get a clean piece of paper and write onto it only those pieces of data which aren't zeros.
22. Copy Over Algorithm
[Old list: 5, 8, 0, 3, 2, 9, 0, 8, 5, 7; new list: empty; Length = 0]
- Once again we'll use our left hand to mark the value we're looking at.
23. Copy Over Algorithm
[Old list: 5, 8, 0, 3, 2, 9, 0, 8, 5, 7; new list so far: 5; Length = 1]
- Whenever a cell is good data, we'll add one to length and copy the cell we're looking at into the new list at position length.
- In either case, we'll go on to the next cell.
24. Copy Over Algorithm
[Old list: 5, 8, 0, 3, 2, 9, 0, 8, 5, 7; new list so far: 5, 8; Length = 2; left has reached the zero in cell 3]
- Whenever a cell is a zero, we'll leave length alone and forget about copying the value
- We still want to move left on to the next cell
25. Copy Over Algorithm
- I. Get number of data items and store in n.
- II. Get the n data items
- III. Set left to 1
- IV. Set length to 0
- V. While left ≤ n Do
- A. If the Cell at position left in the old list ≠ 0 Then
- 1. Add one to length
- 2. Copy the Cell at position left in the old list to the Cell at position length in the new list.
- B. Add one to left.
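A Python sketch of the copy-over pseudo-code, writing only the non-zero values onto the "clean sheet of paper":

```python
def copy_over(data):
    """Copy only the non-zero items into a brand-new list."""
    new_list = []
    for value in data:                 # steps V and V.B: visit every cell of the old list
        if value != 0:                 # step V.A: keep only good data
            new_list.append(value)     # steps A.1 and A.2: grow length, copy the value
    return new_list, len(new_list)

values = [5, 8, 0, 3, 2, 9, 0, 8, 5, 7]
print(copy_over(values))   # ([5, 8, 3, 2, 9, 8, 5, 7], 8)
```

Unlike shuffle left, this version needs room for a second list.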
26. Converging Fingers
- Suppose we have two fingers: one moving from left to right and a second one moving from right to left.
- Whenever we encounter a zero with the left hand, we copy the value in the right-hand cell to replace it, move the right finger one space to the left, and decrement our length.
- We're done when our hands reach the same cell.
27. Converging Fingers
[Data: 5, 8, 0, 3, 2, 9, 0, 8, 5, 7 in cells 1-10; Length = 10; left at cell 1, right at cell 10]
- First we move the right hand until we find a non-zero value. Each time we move the right hand, we decrement the length.
- Now we move the left hand until we find a zero and can copy the right value over the zero
28. Converging Fingers
[Data: 5, 8, 7, 3, 2, 9, 0, 8, 5, 7; Length = 9 - the 7 from cell 10 has replaced the zero in cell 3, and right now points at cell 9]
- We repeat the process again (left gets to cell 7)
29. Converging Fingers
[Data after the second copy: the zero in cell 7 has been replaced by the value under the right finger; Length = 8]
30. Converging Fingers Algorithm
- I. Get number of data items and store in n.
- II. Get the n data items
- III. Set left to 1
- IV. Set length to n
- V. Set right to n
- VI. While value in Cell at position right = 0 and right > 0 Do
- A. Decrement right
- B. Decrement length
- VII. While left < right Do
- A. If value in Cell at position left = 0 Then
- 1. Copy the contents of Cell at position right into Cell at position left
- 2. Decrement length
- 3. Decrement right
- Else
- 1. Add one to left
- VIII. If value in Cell at position left is zero Then
- A. Decrement length
- IX. Stop
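A Python sketch of the converging-fingers pseudo-code (the Else branch that walks the left finger past non-zero values is written out explicitly here):

```python
def converging_pointers(data):
    """Replace each zero the left finger finds with a value taken from the right end."""
    length = len(data)
    left, right = 0, len(data) - 1            # 0-based positions of the two fingers
    while right > 0 and data[right] == 0:     # step VI: skip zeros already at the right end
        right -= 1
        length -= 1
    while left < right:                       # step VII
        if data[left] == 0:
            data[left] = data[right]          # copy the right-hand value over the zero
            length -= 1
            right -= 1
        else:
            left += 1                         # good data: just move the left finger along
    if data[left] == 0:                       # step VIII: the meeting cell may itself be a zero
        length -= 1
    return data[:length], length

values = [5, 8, 0, 3, 2, 9, 0, 8, 5, 7]
print(converging_pointers(values))   # ([5, 8, 7, 3, 2, 9, 5, 8], 8)
```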
31. So Which is Better?
- If the number of copies were our measure (metric), the Converging Pointers algorithm would be best
- Converging Pointers doesn't need any extra space either.
- From both a space and a time efficiency measure, the Converging Pointers algorithm is more efficient.
- But is it easier to understand?
32. Measuring Efficiency
- The study of the efficiency of various algorithms
is called the Analysis of Algorithms.
33. Some Notation
- When we deal with a list of items (say names or phone numbers or whatever), we give it a name. For example, our list of names might be called N and its associated list of telephone numbers would be called T
- To relate one name and one phone number, we assume they are in the same slot in each of our lists
- So the first name in N has as its phone number the value in the first location in T
- These are called associated lists.
34. Individual Locations
- Rather than writing out the whole description (such as "the first location in N"), we use a shorthand: N3
- N is the list, 3 is the location
- This is the third location in the name list (N)
35. Index Values
- Sometimes we use a variable to tell us which location we are working with: Ni
- N is the list, i is the location
- To know which slot we were dealing with, we would need to know what value is in the variable i. Thus if i held 5, we would be working with N5, the fifth location in N
36. Sequential Search
- I. Get values for Names (N1 .. Nn) and Telephone numbers (T1 .. Tn)
- II. Set x to 1
- III. Set found to false
- IV. While found is false and x ≤ n Do
- A. If Nx = Name Then
- 1. Set found to true
- 2. Say the Telephone Number is Tx
- B. Add one to x.
- V. If found is false Then
- A. Say "Can't find that name"
- VI. Stop
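A Python sketch of the sequential search over the two associated lists (names in N, numbers in T, matching slot for slot; the phone numbers below are made-up placeholders):

```python
def sequential_search(names, phones, target):
    """Walk the name list from the front and report the matching phone number, if any."""
    x = 0
    found = False
    while not found and x < len(names):     # step IV
        if names[x] == target:              # step IV.A
            found = True
            print("Telephone number is", phones[x])
        x += 1                              # step IV.B
    if not found:                           # step V
        print("Can't find that name")

N = ["Adolf", "Ben", "Cheryl", "Dan", "Edith"]
T = ["555-0101", "555-0102", "555-0103", "555-0104", "555-0105"]
sequential_search(N, T, "Dan")    # Telephone number is 555-0104
```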
37. Let's Consider This
- Suppose the name we want is the very last name.
- We'd have to go through every name in the list.
- There are n names.
- The worst case has the search look at each of the n items
- Of course, it could be the first one
- One compare only
- A simple average would lead you to expect (1 + n)/2 compares per search.
- You'll do this in more detail in the labs
38. What Does It Mean?
n | Expected compares
10 | 5.5
100 | 50.5
68,000 (Sherwood Park) | 34,000.5
600,000 (Edmonton) | 300,000.5
10 million (Tokyo) | 5,000,000.5
6 billion (Earth) | 3,000,000,000.5
39. Just Suppose ...
- Suppose we could do one million comparisons per second.
- Very fast for a database
- How long would it take to look up a phone number in Sherwood Park?
- 34,000 compares / 1 million per second = 0.034 seconds
- How about Tokyo?
- 5 million compares / 1 million per second = 5 seconds
- The Earth?
- 3 billion compares / 1 million per second = 3,000 seconds (50 minutes)
40. Measures of Efficiency
- Take a large-scale view (large amounts of data, not small)
- In our last case, we had n/2 + ½ as our equation.
- After a while the ½ didn't matter when n got big enough.
- We can discard all of the values like ½ because they just don't matter when n is big enough.
- Only the multiplier (the ½ in front of the n) matters. This is the slope of the line (c in the equation)
- T = c · n
41. The Effect of C
- Bigger c means more work per data item
- If c were twice as big, the program would take twice as long.
- Shape of the graph is the same though (a line)
- Programs like this are said to be linear run-time programs
[Graph: run time vs. n for c = 1 and c = 2; both are straight lines, the c = 2 line twice as steep]
42. It's the Shape That Counts
- The factor c is important, but for large enough amounts of data it's not terribly important.
- When I'm talking billions of compares, whether I multiply by ½ or not starts to make little difference. It's way too long anyway.
- Order of magnitude is all we are really interested in. We throw away the constant c, too.
- Θ is a Greek letter that we use to tell people that we are dealing with average expected run times.
- This algorithm is said to be Θ(n)
- Read as "Order n"
43. Let's Try Another One
- Suppose we have a telephone company that counts the number of calls which originate in one district and are directed to a second district.
- Of course, the two districts can be the same (a local call)
- At month end, they'll want to display all the numbers. What would the algorithm look like?
44. Calling Results Algorithm
- I. For each Originating District (1..n) Do
- A. For each Target District (1..n) Do
- 1. Report the number of calls from the Originating District to the Target District
Keep your notes here for a couple of slides. You'll need to see the steps in the algorithm to follow what's coming.
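A Python sketch of those nested loops; the n × n table of call counts (a list of lists indexed by originating and target district) is a hypothetical stand-in for the phone company's data:

```python
def report_calls(calls):
    """Report every originating/target pair: two nested loops over n districts."""
    n = len(calls)
    for origin in range(n):            # step I: each originating district
        for target in range(n):        # step I.A: each target district
            # step I.A.1: one report per pair, so n * n reports in total
            print(f"District {origin + 1} -> District {target + 1}: "
                  f"{calls[origin][target]} calls")

# A tiny 3-district example: calls[i][j] = calls from district i+1 to district j+1
report_calls([[12, 3, 0],
              [5, 20, 1],
              [0, 7, 9]])
```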
45. Notice This!
- Suppose we can report each district in some constant amount of time. Let's say it takes c seconds.
- That's our Step 1 in the algorithm.
- Each time we handle an originating district, we have to go through each receiving district and report it separately.
- Because there are n receiving districts and each receiving district takes c time to report, we need c · n time to handle each originating district.
- This is our inner loop, Step A in the algorithm
46. Notice This!
- But we have n originating districts! The amount of time to handle the whole algorithm will be n originating districts × (c · n) to handle each originating district.
- This is the outermost loop (Step I in the algorithm)
- The algorithm's efficiency is n · (c · n) = c · n²
- Just as the constant c didn't matter with linear problems, it doesn't matter here either. This algorithm is said to be Θ(n²)
- Read as "Order n squared"
47. Θ(n²)?
- What does Θ(n²) mean?
- Suppose we have 1000 districts. We'd expect our run time to be c · n² (whatever c is).
- Suppose we double the number of districts so we now have 2n districts. Wherever we had n in our equation, we put in 2n.
- We'd end up with c · (2n)²
- This would be c · 2² · n² = c · 4 · n²
- Thus doubling the data multiplies the run time by 4.
- What would happen if we had three times the number of districts?
48. What's It Mean?
[Graph: Θ(n²) vs. Θ(n); the Θ(n²) curve climbs away from the Θ(n) line as n grows]
49. The Effect of C
For a sufficiently large n, there will come a point where the Θ(n²) algorithm always takes more time than the Θ(n) algorithm, regardless of the value of c for the Θ(n²) algorithm. For small problem sizes, though, c can be important, and an easy-to-build Θ(n²) algorithm may be the best choice even though it's not as efficient as a complex Θ(n) one.
50. A Point of Order
- Why doesn't it take forever for you to search the telephone book?
- Alternatively, how do you search the telephone book?
- What can you count on in the real telephone book?
51. Another Search
- Suppose we could arrange our names in the telephone book in alphabetical order.
- Like they are in the real telephone book.
- Can this organization help us?
52. Binary Search
- Suppose we use two fingers
- The left hand will point to the lowest name in the group we are searching
- The right hand will point to the highest name in the group we are searching.
- Initially, left will be at the first name and right at the last name.
53. Binary Search
[Names in cells 1-10: Adolf, Ben, Cheryl, Dan, Edith, Freda, Gary, Harry, Isolde, James; left at Adolf (cell 1), middle at Edith (cell 5), right at James (cell 10)]
- Suppose we split the difference between our two fingers
- We could compare against the middle name
- If we didn't find the name we were hunting for, we could move the left or right pointer to get rid of the group of names that could never contain the name we were searching for.
54. Binary Search
[Same name list; left at cell 1, middle at cell 5 (Edith), right at cell 10]
- Suppose we were hunting for Gary
- Left points to Cell 1
- Right points to Cell 10
- Middle = (Left + Right) / 2
- Points to Cell 5 (we throw away the ½)
- Cell 5 (Edith) is smaller than Gary
- Everything from Adolf to Edith can be eliminated from the next search
- Set left to middle + 1
55. Binary Search
[Same name list; left at Freda (cell 6), middle at Harry (cell 8), right at James (cell 10)]
- This time around, the middle is (6 + 10)/2 = 8
- Cell 8 (Harry) is too big.
- We know that everything from Harry to the end of the list (James) won't contain Gary.
- Set right to middle - 1
56. Binary Search
[Same name list; left at Freda (cell 6), right at Gary (cell 7)]
- Middle this time is (6 + 7)/2 = 6 (same as left)
- Freda is too small, so we set left to middle + 1 (7)
57. Binary Search
[Same name list; left, middle, and right have all converged on Gary (cell 7)]
- Middle this time is (7 + 7)/2 = 7 (same as both left and right)
- Now we match! So we're done.
- Notice that we throw away about half of the names each time we do a comparison.
- We stop when we get down to one name
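A Python sketch of the binary search just traced, written in a standard form (it reports the 1-based cell where the name sits, or None if the name is missing; the sorted name list is the one from the slides):

```python
def binary_search(names, target):
    """Compare against the middle name and throw away the half that cannot match."""
    left, right = 1, len(names)              # 1-based fingers, as in the slides
    while left <= right:
        middle = (left + right) // 2         # split the difference, throw away the 1/2
        if names[middle - 1] == target:
            return middle                    # found it: report the cell number
        elif names[middle - 1] < target:
            left = middle + 1                # middle name too small: discard the left half
        else:
            right = middle - 1               # middle name too big: discard the right half
    return None                              # the name is not in the list

names = ["Adolf", "Ben", "Cheryl", "Dan", "Edith",
         "Freda", "Gary", "Harry", "Isolde", "James"]
print(binary_search(names, "Gary"))   # 7, after comparing with Edith, Harry, Freda, and Gary
```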
58. Warning! Heavy Math Ahead!
- The next piece involves some use of logarithms and powers.
- You won't have to duplicate this on an exam.
- You will have to be able to work with the final results, though.
59. Binary Search
- n · ½ · ½ · … · ½ = 1
- Each time we do a comparison, we discard half of our available data. We multiply n (the original amount of data) by ½. This keeps going until we get down to only one location. At that point, we've found our target or determined that our target doesn't exist.
60. Binary Search
- Suppose we did k comparisons; we could rewrite the messy sequence of ½s as n · (½)^k = 1
- k is what we're interested in. It's the number of times we go through our loop.
61. Binary Search
- Here's where we use logarithms to solve our equation.
- If you've never worked with logarithms, don't worry. On an exam, I let you use a calculator.
- k = log2(n) (the 2 is the logarithm base)
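The solving step spelled out, for anyone who wants to see how the quoted result falls out of the equation on the previous slide:

```latex
n \cdot \left(\tfrac{1}{2}\right)^{k} = 1
\;\Rightarrow\; n = 2^{k}
\;\Rightarrow\; k = \log_{2}(n)
```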
62. Log2(n) on a Calculator
- Most calculators don't have log2(n). They do have natural log, though
- It's written as ln x on most calculators
- Enter your value for n and press ln x
- Press the divide key
- Enter 2 and press ln x
- Press the equals key
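The same change-of-base trick in Python; math.log with one argument is the natural log, and math.log2 gives the answer directly:

```python
import math

n = 68_000                             # Sherwood Park, from the earlier examples
print(math.log(n) / math.log(2))       # ln(n) / ln(2) ≈ 16.05 (the slides round this to 16)
print(math.log2(n))                    # the direct route, same answer
```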
63. What's It Mean?
[Graph: Θ(n²), Θ(n), and Θ(log2 n); the Θ(log2 n) curve stays nearly flat as n grows]
64. What Does It Mean?
n | Θ(n) compares | Θ(log2 n) compares
10 | 5 | 3
100 | 50 | 7
68,000 (Sherwood Park) | 34,000 | 16
600,000 (Edmonton) | 300,000 | 19
10 million (Tokyo) | 5,000,000 | 23
6 billion (Earth) | 3 billion | 32
65. Just Suppose ...
- Suppose we could do one million comparisons per second.
- Very fast for a database
- How long would it take to look up a phone number in Sherwood Park?
- 16 compares / 1 million per second = 16 microseconds (µs)
- How about Tokyo?
- 23 compares / 1 million per second = 23 µs
- The Earth?
- 32 compares / 1 million per second = 32 µs
66. Algorithm Efficiency
- Which do you think the phone companies use
(Linear Search or Binary Search)? And Why?
67. One Last Point
- Suppose I have n items in a binary search. We know it takes log2(n) comparisons to find an item.
- What happens if we have 2n items?
- Try it with 1024 items and 2048 items
- Try it with 32768 and 65536 items
- What do you notice?
68. One Last Point
- Using logarithms (again):
- log(A · B) = log(A) + log(B)
- log2(2 · n) = log2(2) + log2(n)
- log2(2) is always 1 (try it)
- log2(2 · n) = 1 + log2(n)
- Doubling the number of items to search adds only one new comparison!
- That's why theoretical CS people are always hunting for good Θ(log2 n) algorithms
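A quick Python check of the exercise from the previous slide, under the usual assumption that a binary search over n items needs about log2(n) comparisons:

```python
import math

for n in (1024, 2048, 32768, 65536):
    print(n, math.log2(n))   # 1024 -> 10.0, 2048 -> 11.0, 32768 -> 15.0, 65536 -> 16.0
```

Doubling n (1024 to 2048, or 32768 to 65536) adds exactly one comparison, just as the logarithm identity predicts.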