Transcript and Presenter's Notes

Title: Back to George One More Time


1
Back to George One More Time
  • Before they invented drawing boards, what did
    they go back to?
  • If all the world is a stage, where is the
    audience sitting?
  • If the #2 pencil is the most popular, why is it
    still #2?
  • If work is so terrific, how come they have to pay
    you to do it?
  • If you ate pasta and antipasto, would you still
    be hungry?
  • If you try to fail, and succeed, which have you
    done?
  • "People who think they know everything are a
    great annoyance to those of us who do." - Anon

2
O() Analysis
Reasonable vs. Unreasonable Algorithms
Using O() Analysis in Design
Concurrent Systems
Parallelism
Lecture 25
3
Recipe for Determining O()
  • Break algorithm down into known pieces
  • We'll learn the Big-O's in this section
  • Identify relationships between pieces
  • Sequential is additive
  • Nested (loop / recursion) is multiplicative
  • Drop constants
  • Keep only dominant factor for each variable

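To make the recipe concrete, here is a small sketch (Python; the counting function is my own illustration, not from the slides): sequential pieces add, nested pieces multiply, then constants and non-dominant terms are dropped.

```python
def count_ops(n):
    ops = 0
    for _ in range(n):              # one O(N) piece
        ops += 1
    for _ in range(n):              # sequential piece: additive -> O(N) + O(N)
        ops += 1
    for _ in range(n):              # nested loops: multiplicative -> O(N) * O(N)
        for _ in range(n):
            ops += 1
    return ops                      # exactly N^2 + 2N operations

print(count_ops(10))                # 120; keep the dominant N^2 term, so O(N^2)
```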
4
Comparing Data Structures and Methods

  Data Structure    Traverse   Search   Insert
  Unsorted L List   N          N        1
  Sorted L List     N          N        N
  Unsorted Array    N          N        1
  Sorted Array      N          Log N    N
  Binary Tree       N          N        1
  BST               N          N        N
  FB BST            N          Log N    Log N

5
Reasonable vs. Unreasonable Algorithms
6
Algorithmic Performance Thus Far
  • Some examples thus far
  • O(1) Insert to front of linked list
  • O(N) Simple/Linear Search
  • O(N Log N) MergeSort
  • O(N^2) BubbleSort
  • But it could get worse
  • O(N^5), O(N^2000), etc.

7
An O(N^5) Example
  • For N = 256
  • N^5 = 256^5 = 1,100,000,000,000
  • If we had a computer that could execute a million
    instructions per second
  • 1,100,000 seconds = 12.7 days to complete
  • But it could get worse

8
The Power of Exponents
  • A rich king and a wise peasant

9
The Wise Peasant's Pay
  • Day (N)   Pieces of Grain (2^N)
  • 1         2
  • 2         4
  • 3         8
  • 4         16
  • ...
  • 63        9,223,000,000,000,000,000
  • 64        18,450,000,000,000,000,000
10
How Bad is 2^N?
  • Imagine being able to grow a billion
    (1,000,000,000) pieces of grain a second
  • It would take
  • 585 years to grow enough grain just for the 64th
    day
  • Over a thousand years to fulfill the peasant's
    request!

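A quick sanity check of those figures (Python; a 365-day year is assumed):

```python
rate = 1_000_000_000                     # a billion grains per second
year = 60 * 60 * 24 * 365                # seconds in a (365-day) year

day_64 = 2 ** 64                         # grain needed on the 64th day alone
total = 2 ** 65 - 2                      # 2 + 4 + ... + 2^64, the whole payment

print(round(day_64 / rate / year))       # ~585 years for day 64
print(round(total / rate / year))        # ~1170 years for the full request
```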
11
So the King cut off the peasant's head.
12
The Towers of Hanoi

(figure: three pegs A, B, and C, with the stack of rings on peg A)

  • Goal: Move the stack of rings to another peg
  • Rule 1: May move only 1 ring at a time
  • Rule 2: May never have a larger ring on top of a
    smaller ring

13-28
The Towers of Hanoi
(animation frames: the rings are moved one at a time, never larger on smaller,
until the whole stack reaches peg C)
29
Towers of Hanoi - Complexity
  • For 1 ring we have 1 operation.
  • For 2 rings we have 3 operations.
  • For 3 rings we have 7 operations.
  • For 4 rings we have 15 operations.
  • In general, the cost is 2^N - 1 = O(2^N)
  • Each time we increment N, we double the amount of
    work.
  • This grows incredibly fast!

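The move counts above can be verified with a short recursive sketch (Python; my own code, not from the slides):

```python
def hanoi(n, src, dst, spare):
    """Recursively move n rings from src to dst; returns the list of moves."""
    if n == 0:
        return []
    moves = hanoi(n - 1, src, spare, dst)      # clear the top n-1 rings aside
    moves.append((src, dst))                   # move the largest ring
    moves += hanoi(n - 1, spare, dst, src)     # put the n-1 rings back on top
    return moves

for n in range(1, 5):
    print(n, len(hanoi(n, "A", "C", "B")))     # 1, 3, 7, 15 moves: 2^n - 1
```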
30
Towers of Hanoi (2^N) Runtime
  • For N = 64
  • 2^N = 2^64 = 18,450,000,000,000,000,000
  • If we had a computer that could execute a million
    instructions per second
  • It would take 584,000 years to complete
  • But it could get worse

31
The Bounded Tile Problem
Match up the patterns in the tiles. Can it be
done, yes or no?
32
The Bounded Tile Problem
Matching tiles
33
Tiling a 5x5 Area
25 available tiles remaining
34
Tiling a 5x5 Area
24 available tiles remaining
35
Tiling a 5x5 Area
23 available tiles remaining
36
Tiling a 5x5 Area
22 available tiles remaining
37
Tiling a 5x5 Area
2 available tiles remaining
38
Analysis of the Bounded Tiling Problem
  • Tile a 5 by 5 area (N = 25 tiles)
  • 1st location: 25 choices
  • 2nd location: 24 choices
  • And so on
  • Total number of arrangements:
  • 25 x 24 x 23 x 22 x 21 x ... x 3 x 2 x 1
  • 25! (factorial) = 15,500,000,000,000,000,000,000,000
  • Bounded Tiling Problem is O(N!)

39
Tiling (N!) Runtime
  • For N = 25
  • 25! = 15,500,000,000,000,000,000,000,000
  • If we could place a million tiles per second
  • It would take 470 billion years to complete
  • Why not a faster computer?

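Those numbers can be reproduced (Python; a 365-day year is assumed, which lands near the slide's 470-billion-year figure, with the exact value depending on rounding):

```python
import math

arrangements = math.factorial(25)            # ~1.55e25 possible tile orderings
seconds = arrangements / 1_000_000           # at a million placements per second
years = seconds / (60 * 60 * 24 * 365)
print(f"{arrangements:e} arrangements, about {years:.0f} years")
```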
40
A Faster Computer
  • If we had a computer that could execute a
    trillion instructions per second (a million times
    faster than our MIPS computer)
  • 5x5 tiling problem would take 470,000 years
  • 64-ring Tower of Hanoi problem would take 213
    days
  • Why not an even faster computer!

41
The Fastest Computer Possible?
  • What if
  • Instructions took ZERO time to execute
  • CPU registers could be loaded at the speed of
    light
  • These algorithms are still unreasonable!
  • The speed of light is only so fast!

42
Where Does this Leave Us?
  • Clearly algorithms have varying runtimes.
  • We'd like a way to categorize them:
  • Reasonable, so it may be useful
  • Unreasonable, so why bother running it?

43
Performance Categories of Algorithms
  • Polynomial:
  • Sub-linear O(Log N)
  • Linear O(N)
  • Nearly linear O(N Log N)
  • Quadratic O(N^2)
  • Exponential:
  • O(2^N)
  • O(N!)
  • O(N^N)
44
Reasonable vs. Unreasonable
  • Reasonable algorithms have polynomial factors
  • O(Log N)
  • O(N)
  • O(N^K) where K is a constant
  • Unreasonable algorithms have exponential factors
  • O(2^N)
  • O(N!)
  • O(N^N)

45
Reasonable vs. Unreasonable
  • Reasonable algorithms
  • May be usable depending upon the input size
  • Unreasonable algorithms
  • Are impractical, though useful to theorists
  • Demonstrate the need for approximate solutions
  • Remember, we're dealing with large N (input size)

46
Two Categories of Algorithms

(chart: Runtime vs. Size of Input (N) from 2 to 1024, runtime axis from 10 up
to 10^35; curves for N^N, 2^N, N^5, and N, split into an "Unreasonable" and a
"Reasonable" region, with small runtimes marked "Don't Care!")
47
Summary
  • Reasonable algorithms feature polynomial factors
    in their O() and may be usable depending upon
    input size.
  • Unreasonable algorithms feature exponential
    factors in their O() and have no practical
    utility.

48
Questions?
49
Using O() Analysis in Design
50
Air Traffic Control
Conflict Alert
51
Problem Statement
  • What data structure should be used to store the
    aircraft records for this system?
  • Normal operations conducted are:
  • Data Entry: adding new aircraft entering the area
  • Radar Update: input from the antenna
  • Coast: global traversal to verify that all
    aircraft have been updated; coast for 5 cycles,
    then drop
  • Query: controller requesting data about a
    specific aircraft by location
  • Conflict Analysis: make sure no two aircraft are
    too close together
52
Air Traffic Control System
  • Program                 Algorithm          Freq
  • 1. Data Entry / Exit    Insert             15
  • 2. Radar Data Update    N * Search         12
  • 3. Coast / Drop         Traverse           60
  • 4. Query                Search             1
  • 5. Conflict Analysis    Traverse * Search  12

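One way to use this table in the design: weight each operation's cost (taken from the earlier data-structure comparison table) by its frequency and compare totals. A rough sketch (Python; N = 500 aircraft is an assumed workload, and the candidate structures are my own selection):

```python
import math

N = 500  # assumed number of tracked aircraft

# (traverse, search, insert) costs from the data-structure comparison table
costs = {
    "unsorted list": (N, N, 1),
    "sorted array":  (N, math.log2(N), N),
    "balanced BST":  (N, math.log2(N), math.log2(N)),
}

# Frequencies from the ATC table: Insert 15, N*Search 12,
# Traverse 60, Search 1, Traverse*Search 12
def total(traverse, search, insert):
    return (15 * insert + 12 * N * search + 60 * traverse
            + 1 * search + 12 * traverse * search)

for name, c in costs.items():
    print(f"{name:14s} {total(*c):14.0f}")
```

The conflict-analysis term (Traverse * Search, 12 times per cycle) dominates, so structures with Log N search come out far ahead of the unsorted list's N search.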
53
Questions?
54
Concurrent Systems
55
Sequential Processing
  • All of the algorithms we've seen so far are
    sequential
  • They have one thread of execution
  • One step follows another in sequence
  • One processor is all that is needed to run the
    algorithm

56
A Non-sequential Example
  • Consider a house with a burglar alarm system.
  • The system continually monitors
  • The front door
  • The back door
  • The sliding glass door
  • The door to the deck
  • The kitchen windows
  • The living room windows
  • The bedroom windows
  • The burglar alarm is watching all of these at
    once (at the same time).

57
Another Non-sequential Example
  • Your car has an onboard digital dashboard that
    simultaneously
  • Calculates how fast you're going and displays it
    on the speedometer
  • Checks your oil level
  • Checks your fuel level and calculates
    consumption
  • Monitors the heat of the engine and turns on a
    light if it is too hot
  • Monitors your alternator to make sure it is
    charging your battery

58
Concurrent Systems
  • A system in which
  • Multiple tasks can be executed at the same time
  • The tasks may be duplicates of each other, or
    distinct tasks
  • The overall time to perform the series of tasks
    is reduced

59
Advantages of Concurrency
  • Concurrent processes can reduce duplication in
    code.
  • The overall runtime of the algorithm can be
    significantly reduced.
  • More real-world problems can be solved than with
    sequential algorithms alone.
  • Redundancy can make systems more reliable.

60
Disadvantages of Concurrency
  • Runtime is not always reduced, so careful
    planning is required
  • Concurrent algorithms can be more complex than
    sequential algorithms
  • Shared data can be corrupted
  • Communication between tasks is needed

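The "shared data can be corrupted" point is the classic lost-update race: two tasks read, modify, and write the same value, and one update vanishes. A minimal Python illustration (the names are my own):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:               # without this lock, two threads can interleave
            counter += 1         # the read-modify-write and lose updates

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                   # 200000 with the lock; often less without it
```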
61
Achieving Concurrency
  • Many computers today have more than one processor
    (multiprocessor machines)

62
Achieving Concurrency
  • Concurrency can also be achieved on a computer
    with only one processor
  • The computer "juggles" jobs, swapping its
    attention to each in turn
  • Time slicing allows many users to get CPU
    resources
  • Tasks may be suspended while they wait for
    something, such as device I/O

63
Concurrency vs. Parallelism
  • Concurrency is the execution of multiple tasks at
    the same time, regardless of the number of
    processors.
  • Parallelism is the execution of multiple
    processors on the same task.

64
Types of Concurrent Systems
  • Multiprogramming
  • Multiprocessing
  • Multitasking
  • Distributed Systems

65
Multiprogramming
  • Share a single CPU among many users or tasks.
  • May have a time-shared algorithm or a priority
    algorithm for determining which task to run next
  • Give the illusion of simultaneous processing
    through rapid swapping of tasks (interleaving).

66
Multiprogramming
(figure: User1 and User2 sharing one CPU and one memory)
67
Multiprogramming
(chart: Tasks/Users 1-4 vs. CPUs 1-4; several tasks time-share a single CPU)
68
Multiprocessing
  • Executes multiple tasks at the same time
  • Uses multiple processors to accomplish the tasks
  • Each processor may also timeshare among several
    tasks
  • Has a shared memory that is used by all the tasks

69
Multiprocessing
(figure: User1 and User2 with multiple CPUs and a shared memory holding
User 1 Task1, User 1 Task2, and User 2 Task1)
70
Multiprocessing
(chart: Tasks/Users 1-4 vs. CPUs 1-4, all attached to a shared memory)
71
Multitasking
  • A single user can have multiple tasks running at
    the same time.
  • Can be done with one or more processors.
  • Used to be rare and for only expensive
    multiprocessing systems, but now most modern
    operating systems can do it.

72
Multitasking
(figure: a single user, User1, with memory holding User 1 Task1, Task2, and
Task3)
73
Multitasking
(chart: a single user's Tasks 1-4 vs. CPUs 1-4)
74
Distributed Systems
  • Multiple computers working together with no
    central program in charge.

(figure: four networked ATMs - Buford, Perimeter, Student Ctr, North Ave -
with no central computer in charge)
75
Distributed Systems
  • Advantages
  • No bottlenecks from sharing processors
  • No central point of failure
  • Processing can be localized for efficiency
  • Disadvantages
  • Complexity
  • Communication overhead
  • Distributed control

76
Questions?
77
Parallelism
78
Parallelism
  • Using multiple processors to solve a single task.
  • Involves
  • Breaking the task into meaningful pieces
  • Doing the work on many processors
  • Coordinating and putting the pieces back together.

79
Parallelism
One of many possible...
80
Parallelism
(chart: Tasks 1-4 vs. CPUs 1-4; multiple CPUs cooperate on a single task)
81
Pipeline Processing
  • Repeating a sequence of operations or pieces of a
    task.
  • Allocating each piece to a separate processor and
    chaining them together produces a pipeline,
    completing tasks faster.

(figure: a pipeline of processors A -> B -> C -> D between input and output)
82
Example
  • Suppose you have a choice between a washer and a
    dryer, each having a 30-minute cycle, or
  • A washer/dryer combo with a one-hour cycle
  • The correct answer depends on how much work you
    have to do.

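A quick check of the laundry arithmetic (Python; the cycle times are the slide's 30-minute washer and dryer vs. the one-hour combo, ignoring transfer overhead):

```python
def pipelined(loads, stages=2, stage_time=30):
    # The first load takes stages * stage_time; each later load adds only one
    # stage_time, because the washer starts load k+1 while the dryer runs load k.
    return stages * stage_time + (loads - 1) * stage_time

def combo(loads, cycle_time=60):
    return loads * cycle_time          # the combo handles one load at a time

for loads in (1, 3):
    print(loads, pipelined(loads), combo(loads))
# 1 load: 60 vs 60 minutes (a tie); 3 loads: 120 vs 180 (the pipeline wins)
```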
83
One Load
(timeline: wash then dry, with transfer overhead in between, finishes at about
the same time as one combo cycle)
84
Three Loads
wash
dry
wash
dry
wash
dry
combo
combo
combo
85
Examples of Pipelined Tasks
  • Automobile manufacturing
  • Instruction processing within a computer

(figure: instructions 1-5 flowing through pipeline stages A-D over time steps
0-7)
86
Task Queues
  • A supervisor processor maintains a queue of tasks
    to be performed in shared memory.
  • Each processor queries the queue, dequeues the
    next task and performs it.
  • Task execution may involve adding more tasks to
    the task queue.

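A minimal sketch of this scheme in Python (threads stand in for processors; the sentinel shutdown and the squaring "work" are my own illustration):

```python
import queue
import threading

NUM_WORKERS = 4
tasks = queue.Queue()              # shared task queue (filled by the supervisor)
results = []
res_lock = threading.Lock()

def worker():
    while True:
        n = tasks.get()            # block until a task (or sentinel) arrives
        if n is None:              # sentinel: no more work
            tasks.task_done()
            return
        if n > 1:
            tasks.put(n // 2)      # task execution may add more tasks
        with res_lock:
            results.append(n * n)  # the "work": square the number
        tasks.task_done()

workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
for n in (8, 5, 3):                # the supervisor enqueues the initial tasks
    tasks.put(n)
tasks.join()                       # wait until every task and follow-up is done
for _ in workers:
    tasks.put(None)                # shut the workers down
for w in workers:
    w.join()
print(sorted(results))
```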
87
Parallelizing Algorithms
  • How much gain can we get from parallelizing an
    algorithm?

88
Parallel Bubblesort
  • We can use N/2 processors to do all the
    comparisons at once, flip-flopping the pair-wise
    comparisons between odd and even positions.

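This scheme is known as odd-even transposition sort. A sequential simulation (Python; each inner loop's pairs are disjoint, so they are exactly the comparisons N/2 processors would perform simultaneously in one step):

```python
def odd_even_sort(a):
    """Odd-even transposition sort: within a phase no two pairs overlap,
    so all of a phase's compare-and-swaps can run in parallel."""
    a = list(a)
    n = len(a)
    for phase in range(n):                     # N phases suffice to sort
        start = phase % 2                      # flip-flop even/odd pairings
        for i in range(start, n - 1, 2):       # disjoint pairs: one parallel step
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_sort([93, 87, 74, 65, 57, 45, 33, 27]))
```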
(figure: the list 93 87 74 65 57 45 33 27, with the pair-wise comparisons
alternating position from one phase to the next)
89
Runtime of Parallel Bubblesort
(figure: phases 3 through 8 of the same list, one parallel comparison phase per
time step, until the list is sorted)
90
Completion Time of Bubblesort
  • Sequential bubblesort finishes in N^2 time.
  • Parallel bubblesort finishes in N time.

(chart: Bubble Sort runtime O(N^2) vs. the parallel version's O(N))
91
Product Complexity
  • Got done in O(N) time, better than O(N^2)
  • Each time chunk does O(N) work
  • There are N time chunks.
  • Thus, the amount of work is still O(N^2)
  • Product complexity is the amount of work per
    time chunk multiplied by the number of time
    chunks: the total work done.

92
Ceiling of Improvement
  • Parallelization can reduce time, but it cannot
    reduce work. The product complexity cannot
    change or improve.
  • How much improvement can parallelization provide?
  • Given an O(N Log N) algorithm and Log N
    processors, the algorithm will take at least O(N)
    time.
  • Given an O(N^3) algorithm and N processors, the
    algorithm will take at least O(N^2) time.
93
Number of Processors
  • Processors are limited by hardware.
  • Typically, the number of processors is a power of
    2
  • Usually: the number of processors is a constant
    factor, 2^K
  • Conceivably: networked computers joined as needed
    (a la Borg?).

94
Adding Processors
  • A program on one processor
  • Runs in X time
  • Adding another processor
  • Runs in no more than X/2 time
  • Realistically, it will run in X/2 + ε time
    because of overhead
  • At some point, adding processors will not help
    and could degrade performance.

95
Overhead of Parallelization
  • Parallelization is not free.
  • Processors must be controlled and coordinated.
  • We need a way to govern which processor does what
    work; this involves extra work.
  • Often the program must be written in a special
    programming language for parallel systems.
  • Often, a parallelized program for one machine
    (with, say, 2^K processors) doesn't work on other
    machines (with, say, 2^L processors).

96
What We Know about Tasks
  • Relatively isolated units of computation
  • Should be roughly equal in duration
  • Duration of the unit of work must be much greater
    than overhead time
  • Policy decisions and coordination required for
    shared data
  • Simpler algorithms are the easiest to parallelize

97
Questions?
98
More?
99
Matrix Multiplication
100
Inner Product Procedure

    procedure inner_prod(a, b, c isoftype in/out Matrix,
                         i, j isoftype in Num)
        // Compute inner product of row i of a and column j of b
        sum isoftype Num
        k isoftype Num
        sum <- 0
        k <- 1
        loop
            exitif(k > N)
            sum <- sum + a[i][k] * b[k][j]
            k <- k + 1
        endloop
        c[i][j] <- sum
    endprocedure // inner_prod

101
    Matrix definesa Array[1..N][1..N] of Num
    N is  // Declare constant defining size of arrays

    algorithm P_Demo
        a, b, c isoftype Matrix Shared
        server isoftype Num
        Initialize(NUM_SERVERS)
        // Input a and b here
        // (code not shown)
        i, j isoftype Num

102
        i <- 1
        loop
            exitif(i > N)
            server <- (i * NUM_SERVERS) DIV N
            j <- 1
            loop
                exitif(j > N)
                RThread(server, inner_prod(a, b, c, i, j))
                j <- j + 1
            endloop
            i <- i + 1
        endloop
        Parallel_Wait(NUM_SERVERS)
        // Output c here
    endalgorithm // P_Demo

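The same idea in runnable form (Python standing in for the slides' pseudocode; a thread pool plays the role of the RThread servers, and leaving the with-block is the Parallel_Wait):

```python
from concurrent.futures import ThreadPoolExecutor

def inner_prod(a, b, c, i, j):
    # Inner product of row i of a and column j of b, written into c[i][j].
    c[i][j] = sum(a[i][k] * b[k][j] for k in range(len(a)))

def parallel_matmul(a, b, num_servers=4):
    n = len(a)
    c = [[0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=num_servers) as pool:
        for i in range(n):
            for j in range(n):
                pool.submit(inner_prod, a, b, c, i, j)   # one task per cell
    return c   # the with-block waits for every submitted task to finish

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(parallel_matmul(a, b))   # [[19, 22], [43, 50]]
```

Each task writes a distinct cell of c, so no locking is needed: the tasks share the input matrices but never touch the same output location.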
103
Questions?
104