Lecture 2 (Loeng 2)

Provided by: tt895

1
Lecture 2
  • Topic of the second lecture: basic concepts, continued
  • Reminder: the topics of the first lecture were
  • Introduction to the course
  • Brief overview of the problem area
  • A start on the basic concepts: fine- and coarse-grained
    parallelism, etc.

2
Topics of the second lecture
  • Simpler things
  • Architectures (essentially a review of parts of the
    first lecture)
  • Client-server interaction
  • Transmission modes
  • Metrics
  • More complex things
  • Processes and programs and communication types
  • Execution order
  • Program properties: safety, liveness, fairness
  • Mutual exclusion (transactions)
  • Virtual time
  • Data races
  • Memory models

3
Client/Server (1-1)
4
Client/Server (1-N)
5
Example Web proxy server
6
Client-Server interaction (IV)
7
Peer-to-Peer Coordination
8
Mobile Code Example Applet
9
Client-Server interaction (I)
  • Remote procedure call

10
Client-Server interaction (II)
  • Multi-tier architectures

11
Client-Server interaction (III)
  • Asynchronous remote procedure call

12
Transmission modes
  • Simplex: traffic on the channel flows in one direction only
    (computer -> monitor)
  • Half-duplex: traffic on the channel can flow in both
    directions, but not at the same time; sometimes in one
    direction, sometimes in the other (police radio)
  • Duplex: traffic on the channel flows in both directions at
    the same time (telephone)
  • Frequency-division
  • Time-division
  • Synchronous: the channel is divided into time frames.
    Each frame has at least as many time slots as
    logical I/O lines
  • Asynchronous: n lines, m slots per frame; m is
    based on statistical analysis

13
Metrics
  • Bandwidth (Mbps or MHz, depending on context)
  • Latency: the time it takes a message to get from A to B,
    sometimes measured round-trip (A-B-A)
  • Latency = Propagation + Transmit + Queue
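The latency decomposition above can be sketched numerically. This is a minimal sketch that assumes the usual textbook definitions, which the slide does not spell out: propagation delay is distance divided by signal speed, and transmit time is message size divided by bandwidth. The function name, the parameter names and the default signal speed of 2.0e8 m/s are illustrative assumptions.

```python
def latency_seconds(distance_m, size_bits, bandwidth_bps,
                    queue_s=0.0, signal_speed_mps=2.0e8):
    """One-way latency = propagation + transmit + queue (all in seconds)."""
    propagation = distance_m / signal_speed_mps   # assumed: distance / signal speed
    transmit = size_bits / bandwidth_bps          # assumed: message size / bandwidth
    return propagation + transmit + queue_s


# Hypothetical link: 2000 km, 1 Mbps, one 8000-bit message, empty queues.
one_way = latency_seconds(distance_m=2_000_000, size_bits=8_000,
                          bandwidth_bps=1_000_000)
round_trip = 2 * one_way   # A-B-A, assuming a symmetric path and equal message sizes
print(f"one-way {one_way * 1000:.1f} ms, round-trip {round_trip * 1000:.1f} ms")
```

For short messages on long links the propagation term dominates; for large messages on slow links the transmit term does.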

14
Basic Paradigms
  • Process: a unit of sequential instruction
    execution
  • Program: a collection of processes
  • Process communication: two different ways to go
  • Shared memory: at the language level we find
  • Shared variables
  • Semaphores for synchronization
  • Mutual exclusion, critical code, monitors/locks
  • Message passing
  • Local variables for each process
  • Send/receive parameters and data
  • Remote procedure call

15
Reality is Different from Paradigm
  • In shared memory, reading and writing are
    non-atomic because of queues and caching effects.
  • Message passing works by point-to-point hops
    and packetization; there is no direct connection.
  • The OS should present to the user one of the simpler
    models. The user may assume everything works as in
    the spec.
  • More often than not the implementation is
    buggy, or exposes details of a native view
    different from the spec.
  • Sometimes the model is deliberately complicated to
    enhance performance and reduce communication:
    relaxed consistency.

16
Common Types of Parallel Systems
[Figure axes: communication efficiency (bandwidth, latency);
scalability, level of parallelism]
  • 1. Multi-threading on a uni-processor (your home
    PC)
  • 2. Multi-threading on a multi-processor (SMP)
  • 3. Tightly-coupled parallel computer
    (Compaq's Proliant, SGI's Origin 2000,
    IBM's MP/2, Cray's T3D)
  • 4. Distributed system (cluster)
  • 5. Internet computing (peer-to-peer)
  • Traditionally, 1 and 2 are programmable using shared
    memory, 3 and 4 are programmable using message
    passing, and in 5 peer processes communicate with
    central control only.
  • However, things change! Most importantly, recent
    systems in 3 move towards presenting a shared
    memory interface to a physically distributed
    system. Is this an indication for the future?
17
Execution Order
  • Process execution is asynchronous: no global beat,
    no global clock. Each process has a different
    execution speed, which may change over time. For
    an observer on the time axis, instructions
    are ordered in execution order. Any
    order is legal. (Sometimes different processes
    may observe different global orders, TBD.)
  • Execution order for a single process is called
    program order.

[Figure: execution timeline of processes P1 and P2]
18
Atomicity of Instruction Execution
Consider P1: INC(i) and P2: INC(i)
Is the result i = i + 2?
  • The atomicity model is important for answering
    the question:
  • Is my parallel program correct?
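The INC(i) question can be made concrete with a small deterministic simulation. This is a sketch under an assumed atomicity model that the slide does not fix: each INC(i) is split into a load into a private register followed by a store of register + 1, and every load and store is atomic. The `run` helper and the schedule encoding are illustrative.

```python
from itertools import permutations


def run(schedule, start=0):
    """Execute INC(i) for P1 and P2 as load/store steps in the given order.

    schedule lists which process takes the next step; the first step of a
    process is its load, the second is its store.
    """
    shared_i = start
    register = {}                      # private register of each process
    steps_done = {"P1": 0, "P2": 0}
    for p in schedule:
        if steps_done[p] == 0:
            register[p] = shared_i     # load i
        else:
            shared_i = register[p] + 1 # store i := register + 1
        steps_done[p] += 1
    return shared_i


one_after_the_other = run(["P1", "P1", "P2", "P2"])   # behaves like atomic INC
interleaved = run(["P1", "P2", "P1", "P2"])           # one update is lost
outcomes = {run(s) for s in set(permutations(["P1", "P1", "P2", "P2"]))}
print(one_after_the_other, interleaved, sorted(outcomes))
```

Under this model the final value depends on the interleaving, so "i = i + 2" is guaranteed only if INC itself is atomic.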

19
Program properties or invariants
  • Typically we are interested in
  • Safety: bad things cannot happen
  • Liveness: the program keeps working and necessary
    things will eventually happen
  • Fairness: if several processes run in parallel,
    everybody gets some resources (time and memory)

20
Program Properties: Safety Properties
  • Something bad cannot happen
  • They are kept throughout the computation: always true
  • If one does not hold, we will know within a finite
    number of steps
  • Example: deadlock freedom
  • There is always a process that can execute
    another instruction (however, it does not necessarily
    execute it).
  • Example: mutual exclusion
  • Two given code regions (in two different processes)
    are not allowed to execute concurrently.
  • Example: if x > y holds, then x > y holds for the rest
    of the execution.
  • However, mutual exclusion as above holds even if
    the program does not allow any of the processes
    to execute any of the code regions!

21
Liveness Properties
  • Something good must happen (in a finite number
    of steps)
  • Guarantee progress in the computation
  • Example: no starvation
  • Any process that wishes to execute an instruction will
    eventually be able to execute it.
  • Example: the program/process eventually terminates.
  • Example: one of the processes will enter its critical
    section.
  • (note the difference from deadlock freedom)

22
Fairness Properties
  • Liveness properties are a relatively weak guarantee
    of access to a shared resource.
  • Weak fairness: if a process waits on a certain
    request, then eventually it will be granted.
  • "Eventually" is not good enough for OS and
    real-time systems, where response time counts.
  • Strong fairness: if the process performs the
    request sufficiently frequently, then eventually
    it will be granted.
  • Linear waiting: if a process performs the
    request, it will be granted before any other
    process is granted twice.
  • FIFO: ... before granting any other process
    that asked later.
  • Easy to implement in a centralized system.
    However, in a distributed system it is not clear
    what "before" or "later" mean.

23
Mutual Exclusion
  • N processes perform an infinite loop of
    instruction sequence, which is composed of a
    critical section and a non-critical section.
  • Mutual exclusion property: instructions from
    critical sections of two or more processes must
    not be interleaved in the (global observer's)
    execution order.

[Figure: execution timelines of P1 (instructions marked x) and P2
(instructions marked o) along a time axis; parentheses delimit each
critical section]
24
Mutual exclusion: the solution
  • The solution is by way of additional instructions
    executed by every process which is to enter or
    leave its critical section.
  • The pre_protocol
  • The post_protocol

Loop
  Non_critical_section
  Pre_protocol
  Critical_section
  Post_protocol
End_loop

25
Solution must guarantee
  • A process cannot stop for an indefinite time in the
    critical_section or the protocols. The solution
    must ensure that such a stop in the
    non_critical_section by one of the processes will
    not impair the ability of the other processes to
    enter the critical section.
  • No deadlock. It may be that several processes
    are executing inside their pre_protocols. Eventually,
    one of them will succeed in entering the
    critical_section.
  • No starvation. If a process enters its
    pre_protocol with the intention of entering the
    critical section, it will eventually succeed.
  • No self exclusion. In the absence of other
    processes trying to enter the critical_section, a
    single process will always succeed in doing so in a
    very short time.

26
Solution try 1: give them a token to decide
whose turn it is

Integer Turn := 1

P1:
begin
  loop
    non_crit_1
    loop
      exit when Turn = 1
    end loop
    crit_sec_1
    Turn := 2
  end loop
end P1

P2:
begin
  loop
    non_crit_2
    loop
      exit when Turn = 2
    end loop
    crit_sec_2
    Turn := 1
  end loop
end P2

(Note: atomic Read/Write)
27
Solution try 2: let's give each process a
variable it can use to announce that it is in its
crit_sec

Integer C1 := 1, C2 := 1

P1:
Loop
  non_crit_sec_1
  loop
    exit when C2 = 1
  end loop
  C1 := 0
  crit_sec_1
  C1 := 1
End Loop

P2:
Loop
  non_crit_sec_2
  loop
    exit when C1 = 1
  end loop
  C2 := 0
  crit_sec_2
  C2 := 1
End Loop

Problem: no mutual exclusion. Execution example:
P1 sees C2 = 1; P2 sees C1 = 1; P1 sets C1 := 0;
P2 sets C2 := 0; P1 enters its critical section;
P2 enters its critical section.
28
Solution try 3: let's set the announcing variable
before the loop

Integer C1 := 1, C2 := 1

P1:
Loop
  non_crit_sec_1
  C1 := 0
  loop
    exit when C2 = 1
  end loop
  crit_sec_1
  C1 := 1
End Loop

P2:
Loop
  non_crit_sec_2
  C2 := 0
  loop
    exit when C1 = 1
  end loop
  crit_sec_2
  C2 := 1
End Loop

Problem: deadlock. Execution example: P1 sets
C1 := 0; P2 sets C2 := 0; P1 checks C2 forever;
P2 checks C1 forever.
29
Solution try 4: let's allow the other process to
enter its crit_sec if we fail to do so

Integer C1 := 1, C2 := 1

P1:
Loop
  non_crit_sec_1
  C1 := 0
  loop
    exit when C2 = 1
    C1 := 1
    C1 := 0
  end loop
  crit_sec_1
  C1 := 1
End Loop

P2:
Loop
  non_crit_sec_2
  C2 := 0
  loop
    exit when C1 = 1
    C2 := 1
    C2 := 0
  end loop
  crit_sec_2
  C2 := 1
End Loop

Can the other process enter between Ci := 1 and Ci := 0?
Problem: starvation. Between C1 := 1 and C1 := 0, P2
completed a full round. Problem: livelock.
30
Dekker's algorithm: let's give the processes a
priority token that gives its holder the right of
way when competing

Integer C1 := 1, C2 := 1, Turn := 1

P1:
Loop
  non_crit_sec_1
  C1 := 0
  loop
    exit when C2 = 1
    if Turn = 2 then
      C1 := 1
      loop
        exit when Turn = 1
      end loop
      C1 := 0
    end if
  end loop
  crit_sec_1
  C1 := 1
  Turn := 2
End Loop

P2:
Loop
  non_crit_sec_2
  C2 := 0
  loop
    exit when C1 = 1
    if Turn = 1 then
      C2 := 1
      loop
        exit when Turn = 2
      end loop
      C2 := 0
    end if
  end loop
  crit_sec_2
  C2 := 1
  Turn := 1
End Loop

  • The algorithm is correct!
  • Suppose P1 is executing inside the insisting loop.
  • If C2 = 0 then P1 knows P2 wants to enter its crit_sec.
  • If, in addition, Turn = 2, then P1 gives the turn to
    P2 and waits for P2 to finish.
  • Clearly, while P1 does all this, P2 itself will
    not give up, because it is its turn.
  • All characteristics of a valid solution hold.
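The difference between the second attempt and Dekker's algorithm can be checked mechanically by exploring every interleaving. The following is a minimal sketch, not a full model checker. It assumes that each read or write of a shared variable is one atomic step, as the earlier note on atomic Read/Write suggests. Processes are numbered 0 and 1 instead of 1 and 2, and the non-critical section is omitted. The step functions, program-counter numbering and `both_can_be_critical` are illustrative names.

```python
from collections import deque


def try2_step(me, other, pc, c, turn):
    """Second attempt: test the other flag first, then set our own."""
    c = list(c)
    if pc == 0:                       # exit when C[other] = 1
        pc = 1 if c[other] == 1 else 0
    elif pc == 1:                     # C[me] := 0
        c[me], pc = 0, 2
    elif pc == 2:                     # critical section
        pc = 3
    else:                             # C[me] := 1
        c[me], pc = 1, 0
    return pc, tuple(c), turn


def dekker_step(me, other, pc, c, turn):
    """Dekker's algorithm, one shared-variable access per step."""
    c = list(c)
    if pc == 0:                       # C[me] := 0
        c[me], pc = 0, 1
    elif pc == 1:                     # exit when C[other] = 1
        pc = 6 if c[other] == 1 else 2
    elif pc == 2:                     # if Turn = other then ...
        pc = 3 if turn == other else 1
    elif pc == 3:                     # C[me] := 1
        c[me], pc = 1, 4
    elif pc == 4:                     # loop exit when Turn = me
        pc = 5 if turn == me else 4
    elif pc == 5:                     # C[me] := 0
        c[me], pc = 0, 1
    elif pc == 6:                     # critical section
        pc = 7
    elif pc == 7:                     # C[me] := 1
        c[me], pc = 1, 8
    else:                             # Turn := other
        turn, pc = other, 0
    return pc, tuple(c), turn


def both_can_be_critical(step, crit_pc):
    """Breadth-first search over all reachable states of the two processes."""
    start = ((0, 0), (1, 1), 0)       # program counters, flags C, Turn
    seen, todo = {start}, deque([start])
    while todo:
        pcs, c, turn = todo.popleft()
        if pcs == (crit_pc, crit_pc):
            return True               # mutual exclusion is violated
        for me in (0, 1):
            pc, c2, turn2 = step(me, 1 - me, pcs[me], c, turn)
            new_pcs = (pc, pcs[1]) if me == 0 else (pcs[0], pc)
            state = (new_pcs, c2, turn2)
            if state not in seen:
                seen.add(state)
                todo.append(state)
    return False


print("try 2 violates mutual exclusion:", both_can_be_critical(try2_step, 2))
print("Dekker violates mutual exclusion:", both_can_be_critical(dekker_step, 6))
```

The search only checks the safety property (mutual exclusion); it says nothing about deadlock or starvation, which would need a separate check.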

31
Bakery Algorithm: mutual exclusion for N
processes

Shared arrays: array(1..N) of integer Choosing, Number
Process Pi performs (integer i = process id):

Loop
  non_crit_sec_i
  choosing(i) := 1
  number(i) := 1 + max(number)
  choosing(i) := 0
  for j in 1..N loop
    if j /= i then
      loop
        exit when choosing(j) = 0
      end loop
      loop
        exit when
          number(j) = 0 or
          number(i) < number(j) or
          (number(i) = number(j) and i < j)
      end loop
    end if
  end loop
  crit_sec_i
  number(i) := 0
End loop

The idea is to have processes take tickets with
numbers on them (just like in the city hall, or
health care). Other processes give the turn to the
process holding the ticket with the minimal
number (it got there first). If two tickets
happen to be the same, the process having the minimal
id enters.
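A threaded sketch of the bakery protocol follows. It is an illustration rather than a production lock. It assumes CPython, where individual list reads and writes take effect one at a time and in program order, which plays the role of the atomic read/write assumed by the algorithm. `N`, `ROUNDS` and the shared `counter` are demo values introduced here, process ids run from 0 to N-1, and `time.sleep(0)` merely yields to other threads while busy-waiting.

```python
import threading
import time

N = 3            # number of processes (assumed for the demo)
ROUNDS = 20      # critical-section entries per process (assumed)
choosing = [0] * N
number = [0] * N
counter = 0      # shared data protected only by the bakery protocol


def process(i):
    global counter
    for _ in range(ROUNDS):
        choosing[i] = 1
        number[i] = 1 + max(number)          # take a ticket
        choosing[i] = 0
        for j in range(N):
            if j == i:
                continue
            while choosing[j] != 0:          # wait while j is still choosing
                time.sleep(0)
            # wait while j holds a ticket and its (ticket, id) pair is smaller
            while number[j] != 0 and (number[j], j) < (number[i], i):
                time.sleep(0)
        seen = counter                       # critical section: a deliberately
        time.sleep(0)                        # non-atomic read-modify-write
        counter = seen + 1
        number[i] = 0                        # hand the ticket back


threads = [threading.Thread(target=process, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

Comparing the pairs (number, id) lexicographically implements the tie-break from the text: equal tickets are resolved in favour of the smaller process id.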
32
Changing the rules of the game: increasing
atomicity (load+store)
  • C: shared variable
  • Bi: Pi's private variable
  • TS (Test and Set): Bi := C; C := 1
  • CS (Compare and Swap):
  • if Bi /= C then
  • tmp := C
  • C := Bi
  • Bi := tmp
  • end if

Loop
  non_crit_sec_i
  loop
    TS(Bi)
    exit when Bi = 0
  end loop
  crit_sec_i
  C := 0
End loop

Such strong ops are usually supported by the
underlying hardware/OS.
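The Test-and-Set entry protocol can be sketched in Python. Python exposes no hardware TS instruction, so the sketch assumes an internal `threading.Lock` as a stand-in for the atomicity the hardware would provide. The class `TestAndSetCell`, the worker function and the thread and round counts are illustrative.

```python
import threading
import time


class TestAndSetCell:
    """Shared variable C with an atomic TS operation (atomicity is simulated)."""

    def __init__(self):
        self.c = 0
        self._atomic = threading.Lock()   # stand-in for hardware atomicity

    def test_and_set(self):
        with self._atomic:                # Bi := C; C := 1 as one indivisible step
            old = self.c
            self.c = 1
            return old

    def clear(self):
        self.c = 0                        # C := 0


cell = TestAndSetCell()
total = 0


def worker(rounds):
    global total
    for _ in range(rounds):
        while cell.test_and_set() != 0:   # loop TS(Bi); exit when Bi = 0
            time.sleep(0)                 # yield while busy-waiting
        total += 1                        # critical section
        cell.clear()


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)
```

Compared with Dekker's or the bakery algorithm, the entry protocol shrinks to a single loop because the read and the write of C can no longer be separated by another process.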
33
The Price of Atomic load+store, or Why not
Simply Always use Strong Operations?
  • The setting of C must be seen immediately by all
    other processors, in case they execute competing
    code. Since communication between processors is
    via the main memory, it needs to cut through the cache
    levels. Price: dozens to hundreds of clock
    cycles, and growing.

[Figure: Proc. 1, Proc. 2 and Proc. 3, each with local cache and
registers (labels B0, B2), L2/L3 caches, and main memory holding C;
arrows labelled Load/Store and TS]
34
Semaphores
  • A semaphore is a special variable.
  • After initialization, only two atomic operations
    are applicable
  • Busy-wait semaphore:
  • P(S) = WAIT(S): when S > 0 then S := S - 1
  • V(S) = SIGNAL(S): S := S + 1
  • Another definition, the blocked-set semaphore:
  • WAIT(S): if S > 0 then S := S - 1
  • else wait on S
  • SIGNAL(S): if there are processes waiting on S
  • then let one of them proceed,
  • else S := S + 1

NOTE: load+store are embedded in both WAIT and
SIGNAL. Thus, mutual exclusion using semaphores
is easy.
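A minimal sketch of mutual exclusion with a semaphore, using Python's standard `threading.Semaphore`, whose `acquire()` and `release()` correspond to WAIT(S) and SIGNAL(S). The initial value 1, the worker function and the thread and round counts are demo choices; the critical section is a deliberately non-atomic read-modify-write so that a missing semaphore would show up as a wrong total.

```python
import threading
import time

S = threading.Semaphore(1)    # initial value 1: at most one process inside
shared = 0


def worker(rounds):
    global shared
    for _ in range(rounds):
        S.acquire()           # WAIT(S)
        seen = shared         # critical section: read ...
        time.sleep(0)         # ... give other threads a chance to run ...
        shared = seen + 1     # ... write
        S.release()           # SIGNAL(S)


threads = [threading.Thread(target=worker, args=(200,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared)
```

The pre_protocol and post_protocol each reduce to one call, which is the sense in which semaphores make mutual exclusion easy.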
35
Virtual Time
  • "Virtual Time and Global States of Distributed
    Systems"
  • Friedemann Mattern, 1989
  • The model: an asynchronous distributed system, i.e. a
    set of processes having no shared memory,
    communicating by message transfer.
  • Message delay > 0, but it is not known in advance.
  • A global observer sees the global state at
    certain points in time. It can be said to take a
    snapshot of the global state.
  • A local observer (one of the processes in the
    system) sees the local state. Because of the
    asynchrony, a local observer can only combine
    local views into an approximate global view.
  • This is a serious difficulty for many management and
    control problems:
  • Mutual exclusion, deadlock detection, distributed
    contracts, leader election, load sharing,
    checkpointing, etc.

36
Solution Approaches
  • Simulating a synchronous system by an
    asynchronous one. This requires high overhead on
    global synchronization of each and every step.
  • Simulation of a global state. A snapshot, taken
    asynchronously, which is not necessarily correct
    for any specific point in time, but is in a way
    consistent with the local states of all
    processes.
  • A logical clock which is not global, but can be
    used to derive useful global information. The
    system works asynchronously, but the processes
    make sure to maintain their part of the clock.

37
Events
  • An event is a change in the process state.
  • An event happens instantly; it does not take
    time.
  • A process is a sequence of events.
  • There are 3 types of events:
  • send event: causes a message to be sent
  • receive event: causes a message to be received
  • local event: only causes an internal change of
    state
  • Events correspond to each other as follows:
  • All events in the same process happen
    sequentially, one after the other.
  • Each send event has a corresponding receive event.
  • This allows us to define the happened before
    relation among events.

38
The Happened Before Relation
We say that event e happened before event e' (and
denote it by e -> e' or e < e') if one of the
following properties holds:
Processor order: e precedes e' in the same
process.
Send-receive: e is a send and e' is the
corresponding receive.
Transitivity: there exists e'' s.t. e < e'' and e'' < e'.
Example
39
Independent/Concurrent Events
Two such diagrams are called equivalent when the
happened before relation is the same in both.
(When global time differs for certain
events, think of processor execution as if it were
a rubber band.)
Two events e, e' are said to be independent or
concurrent (denoted by e || e') if neither e < e' nor
e' < e.
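The happened before relation and the concurrency test can be computed directly from their definitions. This is a minimal sketch: the function names are illustrative, and the event names and the two messages in the example are hypothetical, chosen only to exercise all three rules.

```python
from itertools import product


def happened_before(processes, messages):
    """Return the set of pairs (a, b) such that a happened before b.

    processes: list of event-name lists, each in process order.
    messages:  list of (send_event, receive_event) pairs.
    """
    hb = set(messages)                                    # send-receive
    for events in processes:                              # processor order
        hb |= {(events[k], events[k + 1]) for k in range(len(events) - 1)}
    changed = True
    while changed:                                        # transitivity
        changed = False
        for (a, b), (c, d) in product(list(hb), repeat=2):
            if b == c and (a, d) not in hb:
                hb.add((a, d))
                changed = True
    return hb


def concurrent(hb, a, b):
    """Two distinct events are concurrent if neither happened before the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb


# Hypothetical run: three processes and two messages.
processes = [["e11", "e12", "e13"], ["e21", "e22"], ["e31"]]
messages = [("e11", "e21"), ("e22", "e13")]
hb = happened_before(processes, messages)
print(("e11", "e13") in hb, concurrent(hb, "e12", "e21"))
```

Here e12 and e21 are concurrent even though some global clock would have ordered them, which is the "rubber band" freedom mentioned above.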
40
Virtual Time (Lamport, 1978)
  • A logical clock is a function C: E -> T
  • E: a set of events; C(e): the timestamp of e
  • T: a partially ordered set s.t. e < e' =>
    C(e) < C(e')
  • (the opposite is not necessarily true, e.g. for
    concurrent events)
  • Commonly T = N, and there is a local clock Ci for
    each process Pi.
  • To meet the requirements, the clocks perform the
    following protocol:
  • (1) Just before executing a local event in Pi: Ci :=
    Ci + d (d > 0)
  • (2) Each message m, sent by event e = send(m), is
    time-stamped t(m) = C(e).
  • (3) Just before Pi receives a message with timestamp
    t(m): Ci := max(Ci, t(m)) + d (d > 0)

Usually d = 1. However, d may change arbitrarily
and dynamically, say, to reflect actual time. The
timestamp of e, C(e), is given after advancing
the clock, i.e., after (1) above was already
performed for e.
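The clock protocol above fits in a few lines. This is a minimal sketch: the class and method names are illustrative, d is fixed per clock, and a send event is assumed to advance the clock like any other event before its timestamp is read.

```python
class LamportClock:
    """Local logical clock Ci of one process Pi."""

    def __init__(self, d=1):
        self.time = 0
        self.d = d

    def local_event(self):
        self.time += self.d                       # Ci := Ci + d
        return self.time                          # C(e), read after advancing

    def send(self):
        return self.local_event()                 # message carries t(m) = C(e)

    def receive(self, t_m):
        self.time = max(self.time, t_m) + self.d  # Ci := max(Ci, t(m)) + d
        return self.time


p1, p2 = LamportClock(), LamportClock()
a = p1.local_event()     # local event on P1
t_m = p1.send()          # P1 sends m
b = p2.local_event()     # local event on P2, concurrent with a
r = p2.receive(t_m)      # P2 receives m
print(a, t_m, b, r)
```

The send happened before the receive and indeed t_m < r, while the concurrent events a and b receive the same timestamp, so equal or ordered timestamps alone do not reveal whether two events are causally related.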
41
Logical Clocks Cntd.
[Figure: example with events e11, e12, e13 on P1 (C1 = 1, 2, 3),
e21, e22 on P2 (C2 = 1, 2) and e31 on P3 (C3 = 3)]
A problem: when e and e' are concurrent, then any
of C(e) < C(e'), C(e') < C(e), C(e) = C(e') may
hold. Thus, when only the timestamps of the
events are known, there is a loss of information.
We do know that C(e) < C(e') => not(e' < e). But
we do not know whether e < e' or e || e'. In
particular, the information whether the events
are independent is most important, and
unfortunately lost.
42
What is a Data-Race?
  • A data race is an anomaly in which two or more threads
    access a shared variable concurrently and at
    least one of the accesses is a write.
  • Example (variable X is global and shared):
  • Thread 1: X = 1; Z = 2
  • Thread 2: T = Y; T = X

43
Why Are Data Races Undesirable?
  • Programs which contain data races usually
    demonstrate unexpected and even non-deterministic
    behavior.
  • The outcome might depend on the specific execution
    order (a.k.a. thread interleaving).
  • Re-running the program may not always produce the
    same results.
  • Thus, such programs are hard to debug and it is hard
    to write correct programs.

44
Why Are Data Races Undesirable? - Example
  • First interleaving:
  • 1. Thread 1: X = 0
  • 2. Thread 2: T = X
  • 3. Thread 1: X++
  • Second interleaving:
  • 1. Thread 1: X = 0
  • 2. Thread 1: X++
  • 3. Thread 2: T = X
  • T = 0 or T = 1?
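The two interleavings can be replayed step by step. This is a small sketch that assumes each listed operation executes atomically and that Thread 1's operations are X = 0 followed by an increment of X, with Thread 2 reading X into T; the `run` helper and the string encoding of the operations are illustrative.

```python
def run(order):
    """Execute the named operations one after another on shared X and T."""
    x = t = None
    for op in order:
        if op == "X=0":
            x = 0
        elif op == "X++":
            x += 1
        elif op == "T=X":
            t = x
    return t


first = run(["X=0", "T=X", "X++"])    # Thread 2 reads before the increment
second = run(["X=0", "X++", "T=X"])   # Thread 2 reads after the increment
print(first, second)
```

Both orders are legal executions of the same program, so the value of T is decided by the scheduler rather than by the code.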

45
Execution Order
  • Each thread has a different execution speed,
    which may change over time.
  • For an external observer of the time axis,
    instructions execution is ordered in execution
    order.
  • Any order is legal.
  • Execution order for a single
    thread is called program order.

46
How Can Data Races Be Prevented? Explicit
Synchronization
  • Idea In order to prevent undesired concurrent
    accesses to shared locations, we must explicitly
    synchronize between threads.
  • The means for explicit synchronization are
  • Locks, Mutexes and Critical Sections
  • Barriers
  • Binary Semaphores and Counting Semaphores
  • Monitors
  • Single-Writer/Multiple-Readers (SWMR) Locks
  • Others

47
Synchronization: Bad Bank Account Example

Thread 1:
Deposit( amount )
  balance += amount

Thread 2:
Withdraw( amount )
  if (balance < amount)
    print( "Error" )
  else
    balance -= amount

  • Deposit and Withdraw are not atomic!
  • What is the final balance after a series of
    concurrent deposits and withdrawals?

48
Synchronization: Good Bank Account Example

Thread 1:
Deposit( amount )
  Lock( m )
  balance += amount
  Unlock( m )

Thread 2:
Withdraw( amount )
  Lock( m )
  if (balance < amount)
    print( "Error" )
  else
    balance -= amount
  Unlock( m )

  • Since critical sections can never execute
    concurrently, this version exhibits no data races.
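The locked version translates almost line by line into Python. This is a sketch with illustrative names: `Account`, the starting balance of 100 and the thread counts are assumptions for the demo, and `withdraw` returns False where the slide prints an error.

```python
import threading


class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self.m = threading.Lock()

    def deposit(self, amount):
        with self.m:                     # Lock( m ) ... Unlock( m )
            self.balance += amount

    def withdraw(self, amount):
        with self.m:                     # check and update form one critical section
            if self.balance < amount:
                return False             # the slide prints "Error" here
            self.balance -= amount
            return True


account = Account(balance=100)
succeeded = [0] * 4                      # one private slot per withdrawing thread


def depositor():
    for _ in range(1000):
        account.deposit(1)


def withdrawer(slot):
    for _ in range(1000):
        if account.withdraw(1):
            succeeded[slot] += 1


threads = [threading.Thread(target=depositor) for _ in range(4)]
threads += [threading.Thread(target=withdrawer, args=(k,)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(account.balance, sum(succeeded))
```

The final balance still depends on how many withdrawals found enough money, but it always equals the starting balance plus deposits minus successful withdrawals, and it never goes negative.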

49
Is This Enough?
  • Theoretically: YES.
  • Practically: NO.
  • What if the programmer accidentally forgets to place
    correct synchronization?
  • How can all such data-race bugs be detected in
    a large program?

50
Can Data Races Be Easily Detected? No!
  • Unfortunately, the problem of deciding whether a
    given program contains potential data races is
    computationally hard!
  • There are a lot of execution orders. For t
    threads of n instructions each, the number of
    possible orders is about t^(n·t).
  • In addition to all the different schedulings, all
    possible inputs should be tested as well.
  • To compound the problem, inserting detection
    code in a program can perturb its execution
    schedule enough to make all errors disappear.
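To get a feeling for these numbers, the exact count of interleavings can be compared with the estimate, read here as t^(n·t). This sketch assumes straight-line threads with no branches or synchronization, for which the exact count is the multinomial coefficient (n·t)! / (n!)^t; the function names are illustrative.

```python
from math import factorial


def exact_orders(t, n):
    """Number of ways to interleave t threads of n instructions each."""
    return factorial(n * t) // factorial(n) ** t


def estimate(t, n):
    """Upper bound: each of the n*t steps may come from any of the t threads."""
    return t ** (n * t)


for t, n in [(2, 2), (2, 10), (4, 10)]:
    print(t, n, exact_orders(t, n), estimate(t, n))
```

Even two threads of ten instructions each already allow 184756 distinct orders, so exhaustive testing of schedules stops being practical very quickly.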

51
Feasible Data-Races
  • Feasible data races: races that are based on the
    possible behavior of the program (i.e. the semantics
    of the program's computation).
  • These are the actual (!) data races that can
    possibly happen in some specific execution.
  • Locating feasible data races requires a full
    analysis of the program's semantics to determine
    whether the execution could have allowed a and b
    (accesses to the same shared variable) to execute
    concurrently.

52
Apparent Data-Races
  • Apparent data races: approximations (!) of
    feasible data races that are based only on the
    behavior of the explicit synchronization
    performed by some feasible execution (and not on the
    semantics of the program's computation, i.e.
    ignoring all conditional statements).
  • Important, since data races are usually a result
    of improper synchronization. Thus they are easier to
    detect, but less accurate.

53
Why a Memory Model?
It answers the question: which writes by a process
are seen by which reads of the other processes?
54
Memory Consistency Models
Example program:
  Pi: R V; W V,7; R V; R V
  Pj: R V; W V,13; R V; R V
A consistency/memory model is an agreement
between the execution environment (H/W, OS,
middleware) and the processes. The runtime guarantees
to the application certain properties of the way
values written to shared variables become visible
to reads. This determines the memory model:
what's valid, what's not.
55
Memory Model Coherence
  • Coherence is the memory model in which (the
    runtime guarantees to the program that) writes
    performed by the processes to every specific
    variable are viewed by all processes in the same
    full order.

[Figure: example program and all its valid executions under Coherence]
Note: the view of a process consists of the
values it sees in its reads, and the writes it
performs. Thus, if an R V in P that comes later than
a W V,x in P sees a value different from x, then
a later R V cannot see x.
56
Formal definition of Coherence
  • Program order: the order in which instructions
    appear in each process. This is a partial order
    on all the instructions in the program.
  • A serialization: a full order on all the
    instructions (reads/writes) of all the processes,
    which is consistent with the program order.
  • A legal serialization: a serialization in which
    each read X returns the value written by the
    latest write X in the full order.
  • Let P be a program; let P_X be the sub-program
    of P which contains all the read X/write X
    operations on X only.
  • Coherence: P is said to be coherent if for every
    variable X there exists a legal serialization of
    P_X. (Note: a process cannot distinguish one such
    serialization from another for a given execution.)

57
Examples
Process 1: read x,1; write y,1
Process 2: read y,1; write x,1
Coherent. Serializations: for x: write x,1, read
x,1; for y: write y,1, read y,1
Process 1: read x,1; write x,2
Process 2: read x,2; write x,1
Not coherent. Cycle of dependencies. Cannot be
serialized.
Not Coherent. Cannot be serialized.
58
Sequential Consistency (Lamport, 1979)
  • Sequential Consistency is the memory model in
    which all reads/writes performed by the processes
    are viewed by all processes in the same full
    order.

Coherent. Not Sequentially consistent.
Coherent. Not Sequentially consistent.
59
Strict (Strong) Memory Models
Sequential Consistency: given an execution, there
exists an order of reads/writes which is
consistent with all program orders.
Coherence: for any variable x, there exists an
order of read x/write x consistent with all program orders.
60
Formal definition of Sequential Consistency
  • Let P be a program.
  • Sequential Consistency P is said to be
    sequentially consistent if there exists a legal
    serialization of all reads/writes in P.

Observation: every program which is sequentially
consistent is also coherent.
Conclusion: Sequential Consistency has stronger
requirements, and we thus say that it is stronger
than Coherence.
In general: a consistency model A is said to be
(strictly) stronger than B if all executions
which are valid under A are also valid under B.
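Both formal definitions can be tested by brute force on small executions. The sketch below searches for a legal serialization; coherence runs the same search once per variable on the operations touching that variable only. It assumes that every variable initially holds 0, which the slides do not state. The function names and both example executions are illustrative: the first has the shape of a "coherent but not sequentially consistent" execution, the second is a read/write cycle on a single variable.

```python
def legal_serialization_exists(processes, initial=0):
    """True if the operations can be merged into one full order that keeps
    every program order and lets each read return the latest write."""

    def search(positions, memory):
        if all(pos == len(p) for pos, p in zip(positions, processes)):
            return True
        for k, proc in enumerate(processes):
            if positions[k] == len(proc):
                continue
            kind, var, val = proc[positions[k]]
            if kind == "read" and memory.get(var, initial) != val:
                continue                       # this read cannot come next
            new_memory = dict(memory)
            if kind == "write":
                new_memory[var] = val
            nxt = positions[:k] + (positions[k] + 1,) + positions[k + 1:]
            if search(nxt, new_memory):
                return True
        return False

    return search((0,) * len(processes), {})


def sequentially_consistent(processes):
    return legal_serialization_exists(processes)


def coherent(processes):
    variables = {var for p in processes for _, var, _ in p}
    return all(
        legal_serialization_exists(
            [[op for op in p if op[1] == x] for p in processes])
        for x in variables)


ex1 = [[("read", "x", 1), ("write", "y", 1)],
       [("read", "y", 1), ("write", "x", 1)]]
ex2 = [[("read", "x", 1), ("write", "x", 2)],
       [("read", "x", 2), ("write", "x", 1)]]
print(coherent(ex1), sequentially_consistent(ex1))
print(coherent(ex2), sequentially_consistent(ex2))
```

The search is exponential in the number of operations, so it is only suitable for slide-sized examples, but it makes the relation between the two models visible: whenever the search succeeds on the whole execution, restricting the found order to one variable gives a legal serialization for that variable as well.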