Title: Multithreaded and Distributed Programming -- Classes of Problems
1 Multithreaded and Distributed Programming -- Classes of Problems
- ECEN5053 Software Engineering of Distributed Systems
- University of Colorado
Foundations of Multithreaded, Parallel, and Distributed Programming, Gregory R. Andrews, Addison-Wesley, 2000
2 The Essence of Multiple Threads -- review
- Two or more processes that work together to perform a task
- Each process is a sequential program
- One thread of control per process
- Communicate using shared variables
- Need to synchronize with each other, in one of two ways
- Mutual exclusion
- Condition synchronization
3 Opportunities and Challenges
- What kinds of processes to use
- How many parts or copies
- How they should interact
- The key to developing a correct program is to ensure that the process interaction is properly synchronized
4 Focus in this course
- Imperative programs
- Programmer has to specify the actions of each process and how they communicate and synchronize (Java, Ada)
- Declarative programs (not our focus)
- Written in languages designed to make synchronization and/or concurrency implicit
- Require machines to support the languages, for example, massively parallel machines
- Asynchronous process execution
- Shared memory, distributed memory, networks of workstations (message passing)
5 Multiprocessing monkey wrench
- The solutions we addressed last semester presumed a single CPU, and therefore the concurrent processes share coherent memory
- A multiprocessor environment with shared memory introduces cache and memory consistency problems, and overhead to manage them
- A distributed-memory multiprocessor/multicomputer/network environment has additional issues of latency, bandwidth, administration, security, etc.
6 Recall from multiprogrammed systems
- A process is a sequential program that has its own thread of control when executed
- A concurrent program contains multiple processes, so every concurrent program has multiple threads, one for each process
- Multithreaded usually means a program contains more processes than there are processors to execute them
- A multithreaded software system manages multiple independent activities
7 Why write as multithreaded?
- To be cool? (wrong reason)
- Sometimes it is easier to organize the code and data as a collection of processes than as a single huge sequential program
- Each process can be scheduled and executed independently
- Other applications can continue to execute in the background
8 Many applications, 5 basic paradigms
- Iterative parallelism
- Recursive parallelism
- Producers and consumers (pipelines)
- Clients and servers
- Interacting peers
- Each of these can be accomplished in a distributed environment; some can also be used in a single-CPU environment
9 Iterative parallelism
- Example?
- Several, often identical processes
- Each contains one or more loops
- Therefore each process is iterative
- They work together to solve a single problem
- Communicate and synchronize using shared variables
- Independent computations have disjoint write sets (see the Java sketch below)
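A minimal Java sketch of iterative parallelism, not taken from the slides: each worker thread runs its own loop over a disjoint slice of the output array, so the write sets are disjoint and the result array needs no locking. The class name, array size, and worker count are illustrative assumptions.

    // Each worker iterates over its own slice of c, so write sets are disjoint.
    public class IterativeParallelism {
        public static void main(String[] args) throws InterruptedException {
            final int N = 1_000_000, WORKERS = 4;
            final double[] a = new double[N], b = new double[N], c = new double[N];
            // ... fill a and b here ...
            Thread[] workers = new Thread[WORKERS];
            for (int w = 0; w < WORKERS; w++) {
                final int lo = w * N / WORKERS, hi = (w + 1) * N / WORKERS;
                workers[w] = new Thread(() -> {
                    for (int i = lo; i < hi; i++)   // each worker is itself iterative
                        c[i] = a[i] + b[i];         // writes only its own slice of c
                });
                workers[w].start();
            }
            for (Thread t : workers) t.join();      // wait for all partial results
        }
    }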
10 Recursive parallelism
- One or more independent recursive procedures
- Recursion is the dual of iteration
- Procedure calls are independent; each works on different parts of the shared data
- Often used in imperative languages for
- Divide-and-conquer algorithms
- Backtracking algorithms (e.g., tree traversal)
- Used to solve combinatorial problems such as sorting, scheduling, and game playing
- If there are too many recursive procedures, we prune (see the fork/join sketch below)
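A hedged sketch of recursive parallelism using Java's fork/join framework (an illustration, not the book's notation): a divide-and-conquer sum whose two recursive calls work on disjoint halves of a shared array, with a threshold below which we "prune" to a sequential loop. The class, field, and threshold names are assumptions.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    class ParallelSum extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000;   // below this, prune to sequential
        private final long[] data;
        private final int lo, hi;

        ParallelSum(long[] data, int lo, int hi) { this.data = data; this.lo = lo; this.hi = hi; }

        @Override protected Long compute() {
            if (hi - lo <= THRESHOLD) {                 // small enough: plain loop
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += data[i];
                return sum;
            }
            int mid = (lo + hi) / 2;
            ParallelSum left  = new ParallelSum(data, lo, mid);
            ParallelSum right = new ParallelSum(data, mid, hi);
            left.fork();                                // run the left half concurrently
            return right.compute() + left.join();       // right half in this thread
        }

        public static void main(String[] args) {
            long[] data = new long[1_000_000];
            java.util.Arrays.fill(data, 1L);
            long total = new ForkJoinPool().invoke(new ParallelSum(data, 0, data.length));
            System.out.println(total);                  // prints 1000000
        }
    }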
11 Producers and consumers
- One-way communication between processes
- Often organized into a pipeline through which info flows
- Each process is a filter that consumes the output of its predecessor and produces output for its successor
- That is, a producer process computes and outputs a stream of results
- Sometimes implemented with a shared bounded buffer as the pipe, e.g., Unix stdin and stdout (see the Java sketch below)
- Synchronization primitives: flags, semaphores, monitors
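A minimal Java sketch of a two-stage producer/consumer pipeline, assuming a bounded BlockingQueue plays the role of the shared bounded buffer (the "pipe"); the buffer size and the -1 end-of-stream sentinel are illustrative choices, not part of the slides.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class Pipeline {
        public static void main(String[] args) {
            BlockingQueue<Integer> pipe = new ArrayBlockingQueue<>(16);      // bounded buffer

            Thread producer = new Thread(() -> {
                try {
                    for (int i = 0; i < 100; i++) pipe.put(i);   // blocks when the buffer is full
                    pipe.put(-1);                                // sentinel: end of stream
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });

            Thread consumer = new Thread(() -> {
                try {
                    int item;
                    while ((item = pipe.take()) != -1)           // blocks when the buffer is empty
                        System.out.println("consumed " + item);
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });

            producer.start();
            consumer.start();
        }
    }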
12 Clients and Servers
- Producer/consumer -- one-way flow of information
- independent processes with their own rates of progress
- The client/server relationship is the most common pattern
- A client process requests a service and waits for the reply
- A server repeatedly waits for a request, then acts upon it and sends a reply
- Two-way flow of information
13 Distributed procedures and calls
- The client and server relationship is the concurrent programming analog of the relationship between the caller of a subroutine and the subroutine itself
- Like a subroutine that can be called from many places, the server has many clients
- Each client request must be handled independently
- Multiple requests might be handled concurrently
14 Common example
- Common example of client/server interaction in operating systems, OO systems, networks, databases, and others -- reading and writing a data file
- Assume a file server module provides 2 operations, read and write; a client process calls one or the other
- Single CPU or shared-memory system
- File server implemented as a set of subroutines and data structures that represent files
- Interaction between a client process and a file typically implemented by subroutine calls
15 Client/Server example
- If the file is shared
- Probably must be written to by at most one client process at a time
- Can safely be read concurrently by multiple clients
- Example of what is called the readers/writers problem
16 Readers/Writers -- many facets
- Has a classic solution using mutexes (in chapter 2 last semester) when viewed as a mutual exclusion problem (a Java read/write-lock illustration follows this list)
- Can also be solved with
- a condition synchronization solution
- different scheduling policies
- Distributed system solutions include
- with an encapsulated database
- with replicated files
- with just remote procedure calls and local synchronization
- with just rendezvous
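A hedged Java illustration of the readers/writers discipline (not the textbook's semaphore solution): java.util.concurrent's ReentrantReadWriteLock allows many concurrent readers while giving writers exclusive access. The class and method names are assumptions for the sake of the example.

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class SharedFile {
        private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        private String contents = "";

        public String read() {
            rw.readLock().lock();             // many readers may hold this at once
            try { return contents; }
            finally { rw.readLock().unlock(); }
        }

        public void write(String newContents) {
            rw.writeLock().lock();            // exclusive: no readers, no other writers
            try { contents = newContents; }
            finally { rw.writeLock().unlock(); }
        }
    }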
17 Consider a query on the WWW
- A user opens a new URL within a Web browser
- The Web browser is a client process that executes on a user's machine
- The URL indirectly specifies another machine on which the Web page resides
- The Web page itself is accessed by a server process that executes on the other machine
- May already exist or may be created
- Reads the page specified by the URL
- Returns it to the client's machine
- Additional server processes may be visited or created at intermediate machines along the way
18 Clients/Servers -- on the same or separate machines
- Clients are processes, regardless of machines
- Server
- On a shared-memory machine, is a collection of subroutines
- With a single CPU, programmed using
- mutual exclusion to protect critical sections
- condition synchronization to ensure subroutines are executed in appropriate orders
- Distributed-memory or network -- processes executing on different machines than the clients
- Often multithreaded, with one thread per client
19 Communication in client/server app
- Shared memory --
- servers as subroutines
- use semaphores or monitors for synchronization
- Distributed --
- servers as processes
- communicate with clients using
- message passing
- remote procedure call (remote method inv.)
- rendezvous
20 Interacting peers
- Occurs in distributed programs, not single-CPU programs
- Several processes that accomplish a task
- executing copies of the same code (hence, peers)
- exchanging messages
- example: distributed matrix multiplication
- Used to implement
- Distributed parallel programs, including distributed versions of iterative parallelism
- Decentralized decision making
21 Among the 5 paradigms are certain characteristics common to distributed environments:
- Distributed memory
- Properties of parallel applications
- Concurrent computation
22 Distributed memory implications
- Each processor can access only its own local memory
- A program cannot use global variables
- Every variable must be local to some process or procedure and can be accessed only by that process or procedure
- Processes have to use message passing to communicate with each other
23 Example of a parallel application
- Remember concurrent matrix multiplication in a shared-memory environment -- last semester?
- Sequential solution first:

    for [i = 0 to n-1]
      for [j = 0 to n-1] {
        # compute inner product of a[i,*] and b[*,j]
        c[i,j] = 0.0;
        for [k = 0 to n-1]
          c[i,j] = c[i,j] + a[i,k]*b[k,j];
      }
24 Properties of parallel applications
- Two operations can be executed in parallel if they are independent
- The read set of an operation contains the variables it reads but does not alter
- The write set contains the variables it alters (and possibly also reads)
- Two operations are independent if the write set of each is disjoint from both the read and write sets of the other
25 Concurrent computation
- Computing rows of the result matrix in parallel (a Java rendering follows):

    cobegin [i = 0 to n-1]
      for [j = 0 to n-1] {
        c[i,j] = 0.0;
        for [k = 0 to n-1]
          c[i,j] = c[i,j] + a[i,k]*b[k,j];
      }
    coend
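A possible Java rendering of the cobegin above, assuming one thread per row index i; starting the threads corresponds to cobegin and joining them to coend. The 3x3 size and class name are illustrative.

    public class CoMatrixMultiply {
        public static void main(String[] args) throws InterruptedException {
            final int n = 3;
            double[][] a = new double[n][n], b = new double[n][n], c = new double[n][n];
            // ... initialize a and b here ...

            Thread[] rows = new Thread[n];
            for (int i = 0; i < n; i++) {
                final int row = i;
                rows[i] = new Thread(() -> {            // body of the cobegin for index value row
                    for (int j = 0; j < n; j++) {
                        c[row][j] = 0.0;
                        for (int k = 0; k < n; k++)
                            c[row][j] += a[row][k] * b[k][j];
                    }
                });
                rows[i].start();                        // cobegin: start all row computations
            }
            for (Thread t : rows) t.join();             // coend: wait for every row to finish
        }
    }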
26 Differences: sequential vs. concurrent
- Syntactic
- cobegin is used in place of for in the outermost loop
- Semantic
- cobegin specifies that its body should be executed concurrently -- at least conceptually -- for each value of the index i
27
- The previous example implemented matrix multiplication using shared variables
- Now -- two ways using message passing as the means of communication
- 1. A coordinator process and an array of independent worker processes
- 2. Workers are peer processes that interact by means of a circular pipeline
28
[Diagrams: (1) coordinator/workers -- the Coordinator sends data to Worker 0 ... Worker n-1 and collects their results; (2) Peers -- Worker 0 ... Worker n-1 connected in a circular pipeline]
29
- Assume n processors for simplicity
- Use an array of n worker processes, one worker on each processor; each worker computes one row of the result matrix

    process worker[i = 0 to n-1] {
      double a[n];      # row i of matrix a
      double b[n,n];    # all of matrix b
      double c[n];      # row i of matrix c
      receive initial values for vector a and matrix b;
      for [j = 0 to n-1] {
        c[j] = 0.0;
        for [k = 0 to n-1]
          c[j] = c[j] + a[k]*b[k,j];
      }
      send result vector c to the coordinator;
    }
30
- Aside -- if not standalone
- The source matrices might be produced by a prior computation, and the result matrix might be input to a subsequent computation
- Example of a distributed pipeline
31 Role of coordinator
- Initiates the computation and gathers and prints the results
- First sends each worker the appropriate row of a and all of b
- Waits to receive a row of c from every worker
32
    process coordinator {
      # source matrices a, b, and c are declared here
      initialize a and b;
      for [i = 0 to n-1] {
        send row i of a to worker[i];
        send all of b to worker[i];
      }
      for [i = 0 to n-1]
        receive row i of c from worker[i];
      print the results, which are now in matrix c;
    }
(A Java sketch of this coordinator/worker structure follows.)
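A hedged Java sketch of the coordinator/worker structure above, assuming per-worker BlockingQueues stand in for the message channels; the Work/Result record types, the 4x4 size, and the class name are illustrative assumptions rather than anything prescribed by the slides.

    import java.util.*;
    import java.util.concurrent.*;

    public class CoordinatorWorkers {
        static final int N = 4;

        record Work(double[] aRow, double[][] b) {}      // coordinator -> worker: row of a, all of b
        record Result(int row, double[] cRow) {}         // worker -> coordinator: its row of c

        public static void main(String[] args) throws Exception {
            double[][] a = new double[N][N], b = new double[N][N], c = new double[N][N];
            // ... initialize a and b here ...

            List<BlockingQueue<Work>> data = new ArrayList<>();            // one data channel per worker
            BlockingQueue<Result> results = new LinkedBlockingQueue<>();   // shared results channel

            for (int i = 0; i < N; i++) {
                data.add(new LinkedBlockingQueue<>());
                final int row = i;
                new Thread(() -> {                                // worker[row]
                    try {
                        Work w = data.get(row).take();            // receive initial values
                        double[] cRow = new double[N];
                        for (int j = 0; j < N; j++)
                            for (int k = 0; k < N; k++)
                                cRow[j] += w.aRow()[k] * w.b()[k][j];
                        results.put(new Result(row, cRow));       // send result row to coordinator
                    } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                }).start();
            }

            for (int i = 0; i < N; i++)                           // coordinator: send the work out
                data.get(i).put(new Work(a[i], b));
            for (int i = 0; i < N; i++) {                         // coordinator: gather the rows of c
                Result r = results.take();
                c[r.row()] = r.cRow();
            }
            System.out.println(Arrays.deepToString(c));
        }
    }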
33 Message passing primitives
- send packages up a message and transmits it to another process
- receive waits for a message from another process and stores it in local variables
- (a socket-based Java sketch of send and receive follows)
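A minimal Java sketch of send and receive between two threads standing in for two processes, assuming TCP sockets as the transport; the port number and message text are illustrative, and a production version would need error handling and retries.

    import java.io.*;
    import java.net.*;

    public class MessagePassingDemo {
        public static void main(String[] args) throws Exception {
            final int PORT = 5053;                              // arbitrary example port
            ServerSocket server = new ServerSocket(PORT);       // receiver is ready before the send

            Thread sender = new Thread(() -> {
                try (Socket conn = new Socket("localhost", PORT);
                     PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                    out.println("hello from the sender");       // send: package up and transmit a message
                } catch (IOException e) { e.printStackTrace(); }
            });
            sender.start();

            try (Socket conn = server.accept();                 // receive: wait for a message to arrive
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(conn.getInputStream()))) {
                String msg = in.readLine();                     // store it in a local variable
                System.out.println("received: " + msg);
            }
            sender.join();
            server.close();
        }
    }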
34 Peer approach
- Each worker has one row of a and is to compute one row of c
- Each worker has only one column of b at a time, instead of the entire matrix
- Worker i has column i of matrix b
- With this much source data, worker i can compute only the result for c[i,i]
- For worker i to compute all of row i of matrix c, it must acquire all the columns of matrix b
- We circulate the columns of b among the worker processes via the circular pipeline
- Each worker executes a series of rounds in which it sends its column of b to the next worker and receives a different column of b from the previous worker
35
- See handout
- Each worker executes the same algorithm
- Communicates with other workers in order to compute its part of the desired result
- In this case, each worker communicates with just two neighbors
- In other cases of interacting peers, each worker communicates with all the others
36 Worker algorithm

    process worker[i = 0 to n-1] {
      double a[n];          # row i of matrix a
      double b[n];          # one column of matrix b
      double c[n];          # row i of matrix c
      double sum = 0.0;     # storage for inner products
      int nextCol = i;      # next column of results
37 Worker algorithm (cont.)

      receive row i of matrix a and column i of matrix b;
      # compute c[i,i] = a[i,*] x b[*,i]
      for [k = 0 to n-1]
        sum = sum + a[k]*b[k];
      c[nextCol] = sum;
      # circulate columns and compute the rest of c[i,*]
      for [j = 1 to n-1] {
        send my column of b to the next worker;
        receive a new column of b from the previous worker;
38 Worker algorithm (cont. 2)

        sum = 0.0;
        for [k = 0 to n-1]
          sum = sum + a[k]*b[k];
        if (nextCol == 0)
          nextCol = n-1;
        else
          nextCol = nextCol - 1;
        c[nextCol] = sum;
      }
      send result vector c to the coordinator process;
    }
39 Comparisons
- In the first program, the values of matrix b are replicated
- In the second, each worker has one row of a and one column of b at any point in time
- The first requires more memory but executes faster
- This is a classic time/space tradeoff
40 Summary
- Concurrent programming paradigms in a shared-memory environment
- Iterative parallelism
- Recursive parallelism
- Producers and consumers
- Concurrent programming paradigms in a distributed-memory environment
- Client/server
- Interacting peers
41 Shared-memory programming
42 Shared-Variable Programming
- Frowned on in sequential programs, although convenient (global variables)
- Absolutely necessary in concurrent programs
- Must communicate to work together
43 Need to communicate
- Communication fosters the need for synchronization
- Mutual exclusion: processes must not access shared data at the same time
- Condition synchronization: one process needs to wait for another
- Communicate in a distributed environment via messages, remote procedure call, or rendezvous
44 Some terms
- State: the values of the program variables at a point in time, both explicit and implicit. Each process in a program executes independently and, as it executes, examines and alters the program state.
- Atomic actions: a process executes sequential statements. Each statement is implemented at the machine level by one or more atomic actions that indivisibly examine or change program state.
- Concurrent program execution interleaves sequences of atomic actions. A history is a trace of a particular interleaving.
45 Terms -- continued
- The next atomic action in any ONE of the processes could be the next one in a history. So there are many ways actions can be interleaved, and conditional statements allow even this to vary.
- The role of synchronization is to constrain the possible histories to those that are desirable.
- Mutual exclusion combines atomic actions into sequences of actions called critical sections, where the entire section appears to be atomic.
46 Terms continued further
- A property of a program is an attribute that is true of every possible history
- Safety: the program never enters a bad state
- Liveness: the program eventually enters a good state
47 How can we verify?
- How do we demonstrate that a program satisfies a property?
- A dynamic execution of a test considers just one possible history
- A limited number of tests is unlikely to demonstrate the absence of bad histories
- Operational reasoning -- exhaustive case analysis
- Assertional reasoning -- abstract analysis
- Atomic actions are predicate transformers
48 Assertional Reasoning
- Use assertions to characterize sets of states
- Allows a compact representation of states and their transformations
- More on this later in the course
49 Warning
- We must be wary of dynamic testing alone
- it can reveal only the presence of errors, not their absence
- Concurrent and distributed programs are difficult to test and debug
- Difficult (impossible) to stop all processes at once in order to examine their state!
- Each execution in general will produce a different history
50 Why synchronize?
- If processes do not interact, all interleavings are acceptable
- If processes do interact, only some interleavings are acceptable
- The role of synchronization is to prevent unacceptable interleavings
- Combine fine-grain atomic actions into coarse-grained composite actions (we call this ... what?)
- Delay process execution until the program state satisfies some predicate
51
- Unconditional atomic action
- does not contain a delay condition
- can execute immediately, as long as it executes atomically (not interleaved)
- examples
- individual machine instructions
- expressions we place in angle brackets
- await statements where the guard condition is the constant true or is omitted
52
- Conditional atomic action -- an await statement with a guard condition
- If the condition is false in a given process, it can only become true by the action of other processes
- How long will the process wait if it has a conditional atomic action? (a Java monitor-based sketch of await follows)
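One way such an await can be realized in Java is with a monitor; the sketch below is an assumption-laden illustration of <await (count > 0) count = count - 1;> using synchronized, wait, and notifyAll, with hypothetical class and field names.

    public class AwaitExample {
        private int count = 0;

        // conditional atomic action: <await (count > 0) count = count - 1;>
        public synchronized void decrementWhenPositive() throws InterruptedException {
            while (count <= 0)        // guard false: wait until another thread changes the state
                wait();
            count = count - 1;        // guard true: body runs while the lock is held
        }

        // an unconditional atomic action that can make the guard true
        public synchronized void increment() {
            count = count + 1;
            notifyAll();              // wake any threads waiting on the guard
        }
    }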
53 How to implement synchronization
- To implement mutual exclusion
- Implement atomic actions in software, using locks to protect critical sections
- Needed in most concurrent programs
- To implement condition synchronization
- Implement a synchronization point that all processes must reach before any process is allowed to proceed -- a barrier (see the Java sketch below)
- Needed in many parallel programs -- why?
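A minimal Java sketch of barrier synchronization using java.util.concurrent.CyclicBarrier: no worker starts the next phase until every worker has finished the current one. The worker and phase counts are illustrative assumptions.

    import java.util.concurrent.BrokenBarrierException;
    import java.util.concurrent.CyclicBarrier;

    public class BarrierExample {
        public static void main(String[] args) {
            final int WORKERS = 4, PHASES = 3;
            CyclicBarrier barrier = new CyclicBarrier(WORKERS,
                    () -> System.out.println("--- all workers reached the barrier ---"));

            for (int w = 0; w < WORKERS; w++) {
                final int id = w;
                new Thread(() -> {
                    try {
                        for (int phase = 0; phase < PHASES; phase++) {
                            System.out.println("worker " + id + " finished phase " + phase);
                            barrier.await();    // block here until all WORKERS arrive
                        }
                    } catch (InterruptedException | BrokenBarrierException e) {
                        Thread.currentThread().interrupt();
                    }
                }).start();
            }
        }
    }

Barriers matter in parallel programs because a phase often reads values that the previous phase wrote; the barrier guarantees those writes are complete before any process moves on.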
54 Desirable Traits and Bad States
- Mutual exclusion -- at most one process at a time is executing its critical section
- its bad state is one in which two processes are in their critical sections
- Absence of deadlock (livelock) -- if 2 or more processes are trying to enter their critical sections, at least one will succeed
- its bad state is one in which all the processes are waiting to enter but none is able to do so
- two more on the next slide
55 Desirable Traits and Bad States (cont.)
- Absence of unnecessary delay -- if a process is trying to enter its c.s. and the other processes are executing their noncritical sections or have terminated, the first process is not prevented from entering its c.s.
- its bad state is one in which the one process that wants to enter cannot do so, even though no other process is in its c.s.
- Eventual entry -- a process that is attempting to enter its c.s. will eventually succeed
- a liveness property; depends on the scheduling policy
56 Logical property of mutual exclusion
- When process1 is in its c.s., set property1 true
- Similarly for process2, where property2 is true
- The bad state is one where property1 and property2 are both true at the same time
- Therefore
- we want every state to satisfy the negation of the bad state --
- MUTEX: NOT(property1 AND property2)
- Needs to be a global invariant
- True in the initial state and after each event that affects property1 or property2
- <await (!property2) property1 = true;>
57 Coarse-grain solution

    bool property1 = false, property2 = false;
    # MUTEX: NOT(property1 AND property2) -- global invariant

    process process1 {
      while (true) {
        <await (!property2) property1 = true;>
        critical section;
        property1 = false;
        noncritical section;
      }
    }

    process process2 {
      while (true) {
        <await (!property1) property2 = true;>
        critical section;
        property2 = false;
        noncritical section;
      }
    }
58 Does it avoid the problems?
- Deadlock: if each process were blocked in its entry protocol, then both property1 and property2 would have to be true; both are false at this point in the code
- Unnecessary delay: one process blocks only if the other one is in its c.s. (it is never delayed while the other is in its noncritical section)
- Liveness -- see next slide
59 Liveness guaranteed?
- Liveness property -- a process trying to enter its critical section is eventually able to do so
- If process1 is trying to enter but cannot, then property2 is true
- therefore process2 is in its c.s., which it eventually exits, making property2 false and allowing process1's guard to become true
- If process1 is still not allowed entry, it's because the scheduler is unfair or because process2 again gains entry -- (happens infinitely often?)
- A strongly fair scheduler is required; not likely
60 Three spin lock solutions
- A spin lock solution uses busy-waiting (a Java test-and-set sketch follows)
- They ensure mutual exclusion, are deadlock free, and avoid unnecessary delay
- They require a fairly strong scheduler to ensure eventual entry
- They do not control the order in which delayed processes enter their critical sections when > 2 try
- Busy-waiting solutions were tolerated on a single CPU when the critical section was bounded
- What about busy-waiting solutions in a distributed environment? Is there such a thing?
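A hedged Java sketch of one spin lock, using AtomicBoolean as a test-and-set flag; a delayed thread busy-waits (spins) until the lock is released. The class name is an assumption, and note that, as the slide says, nothing here controls the order in which spinning threads get in.

    import java.util.concurrent.atomic.AtomicBoolean;

    public class SpinLock {
        private final AtomicBoolean locked = new AtomicBoolean(false);

        public void lock() {
            // atomic test-and-set: keep spinning while another thread holds the lock
            while (!locked.compareAndSet(false, true))
                Thread.onSpinWait();    // hint to the runtime that this is a busy-wait loop
        }

        public void unlock() {
            locked.set(false);          // release: one spinning thread's test-and-set now succeeds
        }
    }

A caller brackets its critical section with lock() and unlock(); this gives mutual exclusion and freedom from deadlock, but eventual entry still depends on the scheduler.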
61 Distributed-memory programming
62 Distributed-memory architecture
- The synchronization constructs we examined last semester were based on reading and writing shared variables
- In distributed architectures, processors
- have their own private memory
- interact using a communication network
- without a shared memory, must exchange messages
63 Necessary first steps to write programs for a dist.-memory arch.
- 1. Define the interfaces with the communication network
- If they were read and write operations like those that operate on shared variables,
- programs would have to employ busy-waiting synchronization. Why?
- Better to define special network operations that include synchronization -- message passing primitives
64 Necessary first steps to write programs for a dist.-memory arch. (cont.)
- 2. Message passing extends semaphores to convey data as well as to provide synchronization
- 3. Processes share channels -- a communication path
65 Characteristics
- A distributed program may be
- distributed across the processors of a distributed-memory architecture
- run on a shared-memory multiprocessor
- (just like a concurrent program can be run on a single, multiplexed processor)
- Channels are the only items that processes share in a distributed program
- Each variable is local to one process
66 Implications of no shared variables
- Variables are never subject to concurrent access
- No special mechanism for mutual exclusion is required
- Processes must communicate in order to interact
- The main concern of distributed programming is synchronizing interprocess communication
- How this is done depends on the pattern of process interaction
67 Patterns of process interaction
- Vary in the way channels are named and used
- Vary in the way communication is synchronized
- We'll look at asynchronous and synchronous message passing, remote procedure calls, and rendezvous
- Equivalent: a program written using one set of primitives can be rewritten using any of the others
- However, message passing is best for programming producers and consumers and interacting peers
- RPC and rendezvous are best for programming clients and servers
68 How related
[Diagram relating the mechanisms: busy waiting, semaphores, monitors, message passing, RPC, rendezvous]
69 Match Examples with Paradigms and Process Interaction categories
- ATM
- Web-based travel site
- Stock transaction processing system
- Search service