Title: Back to George One More Time
1. Back to George One More Time
- Before they invented drawing boards, what did they go back to?
- If all the world is a stage, where is the audience sitting?
- If the #2 pencil is the most popular, why is it still #2?
- If work is so terrific, how come they have to pay you to do it?
- If you ate pasta and antipasto, would you still be hungry?
- If you try to fail, and succeed, which have you done?
- "People who think they know everything are a great annoyance to those of us who do." - Anon
2. O() Analysis
- Reasonable vs. Unreasonable Algorithms
- Using O() Analysis in Design
- Concurrent Systems
- Parallelism
Lecture 25
3. Recipe for Determining O()
- Break algorithm down into known pieces
- We'll learn the Big-Os in this section
- Identify relationships between pieces
- Sequential is additive
- Nested (loop / recursion) is multiplicative
- Drop constants
- Keep only dominant factor for each variable
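The recipe above can be sketched on a small (hypothetical, not from the slides) function with one nested piece and one sequential piece:

```python
def count_pairs_then_scan(items):
    """Hypothetical example for applying the O() recipe."""
    # Nested loops: O(N) * O(N) = O(N^2)  (nested is multiplicative)
    pairs = 0
    for a in items:
        for b in items:
            if a + b == 0:
                pairs += 1
    # A sequential pass afterward adds O(N)  (sequential is additive)
    total = sum(items)
    # O(N^2) + O(N): drop constants, keep the dominant factor -> O(N^2)
    return pairs, total
```

Applying the recipe: multiply the nested pieces, add the sequential piece, then keep only the dominant term.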
4. Comparing Data Structures and Methods

  Data Structure    Traverse   Search   Insert
  Unsorted L List   N          N        1
  Sorted L List     N          N        N
  Unsorted Array    N          N        1
  Sorted Array      N          Log N    N
  Binary Tree       N          N        1
  BST               N          N        N
  FB BST            N          Log N    Log N
5. Reasonable vs. Unreasonable Algorithms
6. Algorithmic Performance Thus Far
- Some examples thus far
- O(1): Insert to front of linked list
- O(N): Simple/Linear Search
- O(N Log N): MergeSort
- O(N^2): BubbleSort
- But it could get worse
- O(N^5), O(N^2000), etc.
7. An O(N^5) Example
- For N = 256
- N^5 = 256^5 ≈ 1,100,000,000,000
- If we had a computer that could execute a million instructions per second
- 1,100,000 seconds ≈ 12.7 days to complete
- But it could get worse
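The slide's arithmetic checks out; a quick sketch of the calculation:

```python
# N^5 steps for N = 256, on a machine doing a million instructions/second.
N = 256
steps = N ** 5                    # 1,099,511,627,776 ~ 1.1 trillion
seconds = steps / 1_000_000       # ~ 1.1 million seconds
days = seconds / (60 * 60 * 24)
print(round(days, 1))             # 12.7
```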
8. The Power of Exponents
- A rich king and a wise peasant
9. The Wise Peasant's Pay

  Day (N)   Pieces of Grain
  1         2
  2         4
  3         8
  4         16
  ...
  63        9,223,000,000,000,000,000
  64        18,450,000,000,000,000,000
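The table's doubling is easy to verify exactly:

```python
# Grain on day n is 2^n: day 1 -> 2, day 2 -> 4, ..., day 64 -> 2^64.
day_63 = 2 ** 63    # 9,223,372,036,854,775,808
day_64 = 2 ** 64    # 18,446,744,073,709,551,616
# Total grain over all 64 days is a geometric series: 2^65 - 2.
total = sum(2 ** n for n in range(1, 65))
print(total == 2 ** 65 - 2)  # True
```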
10. How Bad is 2^N?
- Imagine being able to grow a billion (1,000,000,000) pieces of grain a second
- It would take
- 585 years to grow enough grain just for the 64th day
- Over a thousand years to fulfill the peasant's request!
11. So the King cut off the peasant's head.
12. The Towers of Hanoi
(Pegs A, B, and C)
- Goal: Move the stack of rings to another peg
- Rule 1: May move only 1 ring at a time
- Rule 2: May never have a larger ring on top of a smaller ring
13-28. The Towers of Hanoi
(Slides 13 through 28 animate the sequence of ring moves among pegs A, B, and C.)
29. Towers of Hanoi - Complexity
- For 1 ring we have 1 operation.
- For 2 rings we have 3 operations.
- For 3 rings we have 7 operations.
- For 4 rings we have 15 operations.
- In general, the cost is 2^N - 1 = O(2^N)
- Each time we increment N, we double the amount of work.
- This grows incredibly fast!
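The 2^N - 1 count can be confirmed with the standard recursive solution, sketched here in Python:

```python
def hanoi_moves(n, src="A", dst="C", via="B", moves=None):
    """Record the moves that solve an n-ring Towers of Hanoi."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi_moves(n - 1, src, via, dst, moves)  # move n-1 rings out of the way
    moves.append((src, dst))                  # move the largest ring
    hanoi_moves(n - 1, via, dst, src, moves)  # move n-1 rings back on top
    return moves

for n in range(1, 5):
    print(n, len(hanoi_moves(n)))  # 1, 3, 7, 15 -> always 2^n - 1
```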
30. Towers of Hanoi (2^N) Runtime
- For N = 64
- 2^N = 2^64 ≈ 18,450,000,000,000,000,000
- If we had a computer that could execute a million instructions per second
- It would take 584,000 years to complete
- But it could get worse
31. The Bounded Tile Problem
Match up the patterns in the tiles. Can it be done, yes or no?
32. The Bounded Tile Problem
Matching tiles
33-37. Tiling a 5x5 Area
(Slides 33 through 37 show tiles being placed one at a time, the count of available tiles dropping from 25 down toward 2.)
38. Analysis of the Bounded Tiling Problem
- Tile a 5 by 5 area (N = 25 tiles)
- 1st location: 25 choices
- 2nd location: 24 choices
- And so on
- Total number of arrangements:
- 25 x 24 x 23 x 22 x 21 x ... x 3 x 2 x 1
- 25! (Factorial) ≈ 15,500,000,000,000,000,000,000,000
- Bounded Tiling Problem is O(N!)
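A sketch of the factorial arithmetic (the slides round to 470 billion years; the exact figure depends on the year length used, but the order of magnitude is hundreds of billions either way):

```python
import math

# 25! tile placements at a million placements per second.
placements = math.factorial(25)            # ~ 1.55 * 10^25
seconds = placements / 1_000_000
years = seconds / (365 * 24 * 60 * 60)
print(f"{years:.2e}")                      # on the order of 10^11 years
```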
39. Tiling (N!) Runtime
- For N = 25
- 25! ≈ 15,500,000,000,000,000,000,000,000
- If we could place a million tiles per second
- It would take 470 billion years to complete
- Why not a faster computer?
40. A Faster Computer
- If we had a computer that could execute a trillion instructions per second (a million times faster than our MIPS computer)
- The 5x5 tiling problem would take 470,000 years
- The 64-ring Tower of Hanoi problem would take 213 days
- Why not an even faster computer!
41. The Fastest Computer Possible?
- What if...
- Instructions took ZERO time to execute
- CPU registers could be loaded at the speed of light
- These algorithms are still unreasonable!
- The speed of light is only so fast!
42. Where Does this Leave Us?
- Clearly algorithms have varying runtimes.
- We'd like a way to categorize them:
- Reasonable, so it may be useful
- Unreasonable, so why bother running?
43. Performance Categories of Algorithms
- Polynomial
  - Sub-linear: O(Log N)
  - Linear: O(N)
  - Nearly linear: O(N Log N)
  - Quadratic: O(N^2)
- Exponential
  - O(2^N)
  - O(N!)
  - O(N^N)
44. Reasonable vs. Unreasonable
- Reasonable algorithms have polynomial factors
- O(Log N)
- O(N)
- O(N^K) where K is a constant
- Unreasonable algorithms have exponential factors
- O(2^N)
- O(N!)
- O(N^N)
45. Reasonable vs. Unreasonable
- Reasonable algorithms
- May be usable depending upon the input size
- Unreasonable algorithms
- Are impractical, but useful to theorists
- Demonstrate the need for approximate solutions
- Remember, we're dealing with large N (input size)
46. Two Categories of Algorithms
(Graph: runtime vs. input size N, for N = 2 up to 1024 and runtimes from 10 up to 10^35. The curves N^N, 2^N, and N^5 lie in the "Unreasonable" region; the curve N lies in the "Reasonable" region, below the "Don't Care!" threshold.)
47. Summary
- Reasonable algorithms feature polynomial factors in their O() and may be usable depending upon input size.
- Unreasonable algorithms feature exponential factors in their O() and have no practical utility.
48. Questions?
49. Using O() Analysis in Design
50. Air Traffic Control
Conflict Alert
51. Problem Statement
- What data structure should be used to store the aircraft records for this system?
- Normal operations conducted are:
- Data Entry: adding new aircraft entering the area
- Radar Update: input from the antenna
- Coast: global traversal to verify that all aircraft have been updated; coast for 5 cycles, then drop
- Query: controller requesting data about a specific aircraft by location
- Conflict Analysis: make sure no two aircraft are too close together
52. Air Traffic Control System

  Program                 Algorithm          Freq
  1. Data Entry / Exit    Insert             15
  2. Radar Data Update    N Search           12
  3. Coast / Drop         Traverse           60
  4. Query                Search             1
  5. Conflict Analysis    Traverse Search    12

53. Questions?
54. Concurrent Systems
55. Sequential Processing
- All of the algorithms we've seen so far are sequential
- They have one thread of execution
- One step follows another in sequence
- One processor is all that is needed to run the algorithm
56. A Non-sequential Example
- Consider a house with a burglar alarm system.
- The system continually monitors:
- The front door
- The back door
- The sliding glass door
- The door to the deck
- The kitchen windows
- The living room windows
- The bedroom windows
- The burglar alarm is watching all of these at once (at the same time).
57. Another Non-sequential Example
- Your car has an onboard digital dashboard that simultaneously:
- Calculates how fast you're going and displays it on the speedometer
- Checks your oil level
- Checks your fuel level and calculates consumption
- Monitors the heat of the engine and turns on a light if it is too hot
- Monitors your alternator to make sure it is charging your battery
58. Concurrent Systems
- A system in which:
- Multiple tasks can be executed at the same time
- The tasks may be duplicates of each other, or distinct tasks
- The overall time to perform the series of tasks is reduced
59. Advantages of Concurrency
- Concurrent processes can reduce duplication in code.
- The overall runtime of the algorithm can be significantly reduced.
- More real-world problems can be solved than with sequential algorithms alone.
- Redundancy can make systems more reliable.
60. Disadvantages of Concurrency
- Runtime is not always reduced, so careful planning is required
- Concurrent algorithms can be more complex than sequential algorithms
- Shared data can be corrupted
- Communication between tasks is needed
61. Achieving Concurrency
- Many computers today have more than one processor (multiprocessor machines)
62. Achieving Concurrency
- Concurrency can also be achieved on a computer with only one processor
- The computer "juggles" jobs, swapping its attention to each in turn
- "Time slicing" allows many users to get CPU resources
- Tasks may be suspended while they wait for something, such as device I/O
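A minimal sketch of this idea in Python: two tasks that mostly wait (the `sleep` stands in for device I/O) can overlap on a single processor, so the total wall time is close to one wait, not two.

```python
import threading
import time

def io_bound_task(delay, results, key):
    time.sleep(delay)          # stands in for waiting on device I/O
    results[key] = "done"

results = {}
start = time.perf_counter()
threads = [
    threading.Thread(target=io_bound_task, args=(0.2, results, i))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# Both tasks waited concurrently: total time is ~0.2s, not ~0.4s.
print(elapsed < 0.35)  # True
```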
63. Concurrency vs. Parallelism
- Concurrency is the execution of multiple tasks at the same time, regardless of the number of processors.
- Parallelism is the execution of multiple processors on the same task.
64. Types of Concurrent Systems
- Multiprogramming
- Multiprocessing
- Multitasking
- Distributed Systems
65. Multiprogramming
- Share a single CPU among many users or tasks.
- May have a time-shared algorithm or a priority algorithm for determining which task to run next
- Give the illusion of simultaneous processing through rapid swapping of tasks (interleaving).
66. Multiprogramming
(Diagram: User 1 and User 2 share one CPU and one memory.)
67. Multiprogramming
(Chart: several tasks/users mapped onto a single CPU.)
68. Multiprocessing
- Executes multiple tasks at the same time
- Uses multiple processors to accomplish the tasks
- Each processor may also timeshare among several tasks
- Has a shared memory that is used by all the tasks
69. Multiprocessing
(Diagram: tasks from User 1 and User 2 run on separate processors against one shared memory.)
70. Multiprocessing
(Chart: several tasks/users spread across several CPUs with shared memory.)
71. Multitasking
- A single user can have multiple tasks running at the same time.
- Can be done with one or more processors.
- Used to be rare and only for expensive multiprocessing systems, but now most modern operating systems can do it.
72. Multitasking
(Diagram: three tasks belonging to one user share memory.)
73. Multitasking
(Chart: a single user's tasks spread across several CPUs.)
74. Distributed Systems
- Multiple computers working together with no central program in charge.
(Diagram: ATMs at Buford, Perimeter, Student Ctr, and North Ave operating as peers.)
75. Distributed Systems
- Advantages
- No bottlenecks from sharing processors
- No central point of failure
- Processing can be localized for efficiency
- Disadvantages
- Complexity
- Communication overhead
- Distributed control
76. Questions?
77. Parallelism
78. Parallelism
- Using multiple processors to solve a single task.
- Involves:
- Breaking the task into meaningful pieces
- Doing the work on many processors
- Coordinating and putting the pieces back together
79. Parallelism
(Diagram: one of many possible ways to divide a task.)
80. Parallelism
(Chart: pieces of a single task spread across several CPUs.)
81. Pipeline Processing
- Repeating a sequence of operations or pieces of a task.
- Allocating each piece to a separate processor and chaining them together produces a pipeline, completing tasks faster.
(Diagram: input flows through stages A, B, C, and D to output.)
82. Example
- Suppose you have a choice between a washer and a dryer, each having a 30-minute cycle, or
- A washer/dryer combo with a one-hour cycle
- The correct answer depends on how much work you have to do.
83. One Load
(Timeline: wash then dry, including transfer overhead, takes about as long as one combo cycle.)
84. Three Loads
(Timeline: with a separate washer and dryer the loads overlap in a pipeline, finishing well before three sequential combo cycles.)
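The washer/dryer trade-off above follows the standard pipeline formula: the first load takes one full pass through all stages, and every later load finishes one cycle after the previous one. A sketch of the arithmetic:

```python
def pipeline_minutes(loads, stages=2, cycle=30):
    """Time to finish `loads` loads through `stages` stages of `cycle` minutes:
    first load takes stages*cycle, each later load adds one more cycle."""
    return (stages + loads - 1) * cycle

def combo_minutes(loads, cycle=60):
    """The one-hour combo handles loads strictly one after another."""
    return loads * cycle

print(pipeline_minutes(1), combo_minutes(1))  # 60 60   (one load: a tie)
print(pipeline_minutes(3), combo_minutes(3))  # 120 180 (three loads: pipeline wins)
```

With one load the pipeline gains nothing (plus transfer overhead, it may even lose); with many loads it approaches twice the throughput.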
85. Examples of Pipelined Tasks
- Automobile manufacturing
- Instruction processing within a computer
(Diagram: instructions 1 through 7 flow through pipeline stages A, B, C, and D over time.)
86. Task Queues
- A supervisor processor maintains a queue of tasks to be performed in shared memory.
- Each processor queries the queue, dequeues the next task, and performs it.
- Task execution may involve adding more tasks to the task queue.
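A minimal sketch of this scheme using Python's thread-safe `queue.Queue` (worker threads stand in for processors; the sentinel `None` is an assumption of this sketch, not from the slides):

```python
import queue
import threading

tasks = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        n = tasks.get()
        if n is None:            # sentinel: no more work for this worker
            tasks.task_done()
            return
        with lock:
            results.append(n * n)
        if n > 1:
            tasks.put(n - 1)     # performing a task may enqueue more tasks
        tasks.task_done()

tasks.put(3)                     # the "supervisor" seeds the queue
workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
tasks.join()                     # wait until the queue drains
for _ in workers:
    tasks.put(None)
for w in workers:
    w.join()
print(sorted(results))           # [1, 4, 9]
```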
87. Parallelizing Algorithms
- How much gain can we get from parallelizing an algorithm?
88. Parallel Bubblesort
- We can use N/2 processors to do all the comparisons at once, flip-flopping the pair-wise comparisons between phases.
(Diagram: three successive phases of pair-wise comparisons over the list 93 87 74 65 57 45 33 27.)
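The scheme the slides illustrate is odd-even transposition sort. A sequential Python sketch of it: within each phase the compared pairs are disjoint, so N/2 processors could perform a whole phase in one step, and N phases always suffice.

```python
def parallel_bubblesort(values):
    """Odd-even transposition sort. Each phase's comparisons touch
    disjoint pairs, so N/2 processors could run one phase in a single step."""
    a = list(values)
    n = len(a)
    for phase in range(n):                 # N phases -> O(N) parallel time
        start = phase % 2                  # flip-flop: even pairs, then odd pairs
        for i in range(start, n - 1, 2):   # independent pair-wise comparisons
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(parallel_bubblesort([93, 87, 74, 65, 57, 45, 33, 27]))
# [27, 33, 45, 57, 65, 74, 87, 93]
```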
89. Runtime of Parallel Bubblesort
(Diagram: time steps 3 through 8 of the pair-wise exchanges on the list 93 87 74 65 57 45 33 27.)
90. Completion Time of Bubblesort
- Sequential bubblesort finishes in N^2 time: O(N^2)
- Parallel bubblesort finishes in N time: O(N)
91. Product Complexity
- Got done in O(N) time, better than O(N^2)
- Each "time chunk" does O(N) work
- There are N time chunks.
- Thus, the amount of work is still O(N^2)
- Product complexity is the amount of work per time chunk multiplied by the number of time chunks: the total work done.
92. Ceiling of Improvement
- Parallelization can reduce time, but it cannot reduce work. The product complexity cannot change or improve.
- How much improvement can parallelization provide?
- Given an O(N Log N) algorithm and Log N processors, the algorithm will take at least O(N) time.
- Given an O(N^3) algorithm and N processors, the algorithm will take at least O(N^2) time.
93. Number of Processors
- Processors are limited by hardware.
- Typically, the number of processors is a power of 2
- Usually: the number of processors is a constant factor, 2^K
- Conceivably: networked computers joined as needed (a la Borg?)
94. Adding Processors
- A program on one processor
- Runs in X time
- Adding another processor
- Runs in no more than X/2 time
- Realistically, it will run in X/2 + ε time because of overhead
- At some point, adding processors will not help and could degrade performance.
95. Overhead of Parallelization
- Parallelization is not free.
- Processors must be controlled and coordinated.
- We need a way to govern which processor does what work; this involves extra work.
- Often the program must be written in a special programming language for parallel systems.
- Often, a parallelized program for one machine (with, say, 2^K processors) doesn't work on other machines (with, say, 2^L processors).
96. What We Know about Tasks
- Relatively isolated units of computation
- Should be roughly equal in duration
- Duration of the unit of work must be much greater than overhead time
- Policy decisions and coordination required for shared data
- Simpler algorithms are the easiest to parallelize
97. Questions?
98. More?
99. Matrix Multiplication
100. Inner Product Procedure

procedure inner_prod(a, b, c isoftype in/out Matrix, i, j isoftype in Num)
  // Compute inner product of row i of a and column j of b
  sum isoftype Num
  k isoftype Num
  sum <- 0
  k <- 1
  loop
    exitif(k > N)
    sum <- sum + a[i][k] * b[k][j]
    k <- k + 1
  endloop
  c[i][j] <- sum
endprocedure // inner_prod
101. (continued)

Matrix definesa Array[1..N][1..N] of Num
N is ...  // Declare constant defining size of arrays

Algorithm P_Demo
  a, b, c isoftype Matrix Shared
  server isoftype Num
  Initialize(NUM_SERVERS)
  // Input a and b here (code not shown)
  i, j isoftype Num
102. (continued)

  i <- 1
  loop
    exitif(i > N)
    server <- (i * NUM_SERVERS) DIV N
    j <- 1
    loop
      exitif(j > N)
      RThread(server, inner_prod(a, b, c, i, j))
      j <- j + 1
    endloop
    i <- i + 1
  endloop
  Parallel_Wait(NUM_SERVERS)
  // Output c here
endalgorithm // P_Demo
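The same idea can be sketched in Python (a stand-in for the course pseudocode's `RThread`/`Parallel_Wait`, not the slides' actual runtime): each inner product becomes an independent task, every task writes a distinct cell of c, and we wait for all of them before using the result.

```python
from concurrent.futures import ThreadPoolExecutor

def inner_prod(a, b, c, i, j):
    # Inner product of row i of a and column j of b, stored into c[i][j].
    c[i][j] = sum(a[i][k] * b[k][j] for k in range(len(a)))

def parallel_matmul(a, b, num_servers=4):
    n = len(a)
    c = [[0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=num_servers) as pool:
        futures = [pool.submit(inner_prod, a, b, c, i, j)
                   for i in range(n) for j in range(n)]
        for f in futures:
            f.result()     # wait for every inner product (like Parallel_Wait)
    return c

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(parallel_matmul(a, b))   # [[19, 22], [43, 50]]
```

Because each task writes only its own c[i][j], no locking is needed; the N^2 inner products are exactly the "relatively isolated units of computation" slide 96 calls for.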
103. Questions?