Multiprocessor/Multicore Systems: Scheduling, Synchronization - PowerPoint PPT Presentation

1
Multiprocessor/Multicore Systems: Scheduling, Synchronization
2
Multiprocessors
  • Definition: A computer system in which two or more CPUs share full access to a common RAM

3
Multiprocessor/Multicore Hardware (ex.1)
  • Bus-based multiprocessors

4
Multiprocessor/Multicore Hardware (ex.2) UMA
(uniform memory access)
  • Not/hardly scalable
  • Bus-based architectures -> saturation
  • Crossbars too expensive (wiring constraints)
  • Possible solutions
  • Reduce network traffic by caching
  • Clustering -> non-uniform memory latency behaviour (NUMA)

5
Multiprocessor/Multicore Hardware (ex.3) NUMA
(non-uniform memory access)
  • Single address space visible to all CPUs
  • Access to remote memory slower than to local
  • Cache-controller/MMU determines whether a
    reference is local or remote
  • When caching is involved, it's called CC-NUMA
    (cache coherent NUMA)
  • Typically Read Replication (write invalidation)

6
Cache-coherence
7
Cache coherence (cont)
  • Cache coherency protocols are based on a set of (cache block) states and state transitions. Two types of protocols:
  • write-update
  • write-invalidate (suffers from false sharing)
  • Some invalidations are not necessary for correct
    program execution
  • Processor 1: while (true) do A := A + 1 od
  • Processor 2: while (true) do B := B + 1 od
  • If A and B are located in the same cache block, a cache miss occurs in each loop iteration due to a ping-pong of invalidations
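This invalidation ping-pong is the classic false-sharing pattern. A common mitigation is to pad per-processor data onto separate cache lines; the sketch below assumes a 64-byte line size (typical of x86), which is not stated on the slides:

```c
#include <stdalign.h>
#include <stddef.h>

/* Unpadded layout: a and b will normally sit in the same cache line,
 * so writes by different processors invalidate each other's copies. */
struct counters_shared {
    long a;   /* incremented by processor 1 */
    long b;   /* incremented by processor 2 */
};

/* Padded layout: aligning each counter to an (assumed 64-byte) cache-line
 * boundary puts them in distinct lines, eliminating the ping-pong. */
struct counters_padded {
    alignas(64) long a;
    alignas(64) long b;
};
```

The padding trades memory for the elimination of coherence misses; profiling should confirm the line size on the target machine.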

8
On multicores
Reason for multicores: physical limitations can cause significant heat dissipation and data synchronization problems. In addition to operating system (OS) support, adjustments to existing software are required to maximize utilization of the computing resources provided by multi-core processors. The virtual machine approach is again in focus.
Intel Core 2 dual-core processor, with CPU-local Level 1 caches and a shared, on-die Level 2 cache.
9
On multicores (cont)
  • Also possible (figure from www.microsoft.com/licensing/highlights/multicore.mspx)

10
OS Design issues (1): Who executes the OS/scheduler(s)?
  • Master/slave architecture: key kernel functions always run on a particular processor
  • The master is responsible for scheduling; a slave sends service requests to the master
  • Disadvantages
  • Failure of the master brings down the whole system
  • The master can become a performance bottleneck
  • Peer architecture: the operating system can execute on any processor
  • Each processor does self-scheduling
  • New issues for the operating system
  • Make sure two processors do not choose the same process

11
Master-Slave multiprocessor OS
Bus
12
Non-symmetric Peer Multiprocessor OS
Bus
  • Each CPU has its own operating system

13
Symmetric Peer Multiprocessor OS
Bus
  • Symmetric Multiprocessors
  • SMP multiprocessor model

14
Scheduling in Multiprocessors
  • Recall: tightly coupled multiprocessing (SMPs)
  • Processors share main memory
  • Controlled by operating system
  • Different degrees of parallelism
  • Independent and Coarse-Grained Parallelism
  • no or very limited synchronization
  • can be supported on a multiprocessor with little change (with a grain of salt)
  • Medium-Grained Parallelism
  • collection of threads usually interact
    frequently
  • Fine-Grained Parallelism
  • highly parallel applications; a specialized and fragmented area

15
Design issues (2): Assignment of Processes to Processors
  • Per-processor ready-queues vs global ready-queue
  • Permanently assign process to a processor
  • Less overhead
  • A processor could be idle while another processor
    has a backlog
  • Have a global ready queue and schedule to any
    available processor
  • can become a bottleneck
  • Task migration not cheap

16
Multiprocessor Scheduling: per-partition RQ
  • Space sharing
  • multiple threads at same time across multiple CPUs

17
Multiprocessor Scheduling: Load sharing / Global ready queue
  • Timesharing
  • note use of single data structure for scheduling

18
Multiprocessor Scheduling: Load Sharing, a problem
  • Problem with communication between two threads
  • both belong to process A
  • both running out of phase

19
Design issues (3): Multiprogramming on processors?
  • Experience shows
  • Threads running on separate processors (to the extent of dedicating a processor to a thread) yield dramatic gains in performance
  • Allocating processors to threads is analogous to allocating pages to processes (can use the working set model?)
  • The specific scheduling discipline is less important with more than one processor; the decision of how to distribute tasks is more important

20
Gang Scheduling
  • Approach to address the previous problem
  • Groups of related threads scheduled as a unit (a gang)
  • All members of gang run simultaneously
  • on different timeshared CPUs
  • All gang members start and end time slices
    together

21
Gang Scheduling: another option
22
Multiprocessor Thread Scheduling Dynamic
Scheduling
  • The number of threads in a process is altered dynamically by the application
  • Programs (through thread libraries) give info to the OS to manage parallelism
  • The OS adjusts the load to improve utilization
  • Or the OS gives info to the run-time system about available processors, so it can adjust the number of threads
  • i.e., a dynamic version of partitioning

23
Summary Multiprocessor Thread Scheduling
  • Load sharing: processes/threads are not assigned to particular processors
  • load is distributed evenly across the processors
  • needs a central queue, which may become a bottleneck
  • preempted threads are unlikely to resume execution on the same processor, so cache use is less efficient
  • Gang scheduling: assigns threads to particular processors (simultaneous scheduling of the threads that make up a process)
  • Useful where performance severely degrades when any part of the application is not running (due to synchronization)
  • Extreme version: dedicated processor assignment (no multiprogramming of processors)

24
Multiprocessor Scheduling and Synchronization
  • Priorities plus blocking synchronization may result in:
  • priority inversion: a low-priority process P holds a lock, a high-priority process waits, and medium-priority processes prevent P from completing and releasing the lock quickly (scheduling becomes less efficient). To cope with/avoid this:
  • use priority inheritance
  • use non-blocking synchronization (wait-free, lock-free, optimistic synchronization)
  • convoy effect: processes need a resource only for a short time, but the process holding it may block them for a long time (hence, poor utilization)
  • non-blocking synchronization is good here, too

25
Readers-Writers: non-blocking synchronization
  • (some slides are adapted from J. Anderson's slides on the same topic)

26
The Mutual Exclusion Problem: Locking Synchronization

while true do
    Noncritical Section;
    Entry Section;
    Critical Section;
    Exit Section
od

  • N processes, each with this structure
  • Basic Requirements
  • Exclusion: Invariant(number of processes in CS <= 1)
  • Starvation-freedom: (process i in Entry) leads-to (process i in CS)
  • Can implement by busy waiting (spin locks) or using kernel calls
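A minimal busy-waiting (spin lock) instance of this structure, sketched with C11 atomics; `entry_section`/`exit_section` are illustrative names, not from the slides:

```c
#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

/* Entry section: spin until test-and-set observes the flag clear. */
void entry_section(void) {
    while (atomic_flag_test_and_set(&lock))
        ; /* busy wait */
}

/* Exit section: clear the flag so a waiting process can proceed. */
void exit_section(void) { atomic_flag_clear(&lock); }

/* At most one process is ever between entry and exit. */
int cs_demo(void) {
    int in_cs = 0;
    entry_section();
    in_cs++;          /* critical section */
    exit_section();
    return in_cs;
}
```

This satisfies the Exclusion requirement but not Starvation-freedom: a waiting process can lose the test-and-set race indefinitely, which is one reason for the fancier constructions that follow.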

27
Non-blocking Synchronization
  • The problem
  • Implement a shared object without mutual
    exclusion.
  • Shared Object A data structure (e.g., queue)
    shared by concurrent processes.
  • Why?
  • To avoid performance problems that result when a
    lock-holding task is delayed.
  • To avoid priority inversions (more on this later).

Locking
28
Non-blocking Synchronization
  • Two variants
  • Lock-free
  • Only system-wide progress is guaranteed.
  • Usually implemented using retry loops.
  • Wait-free
  • Individual progress is guaranteed.
  • Code for object invocations is purely sequential.

29
Readers/Writers Problem
  • Courtois, et al. 1971.
  • Similar to mutual exclusion, but several readers
    can execute critical section at once.
  • If a writer is in its critical section, then no
    other process can be in its critical section.
  • no starvation, fairness

30
Solution 1
Readers have priority.

Reader:
    P(mutex); rc := rc + 1;
    if rc = 1 then P(w) fi;
    V(mutex);
    CS;
    P(mutex); rc := rc - 1;
    if rc = 0 then V(w) fi;
    V(mutex)

Writer:
    P(w); CS; V(w)

First reader executes P(w). Last one executes V(w).
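Solution 1 maps directly onto POSIX semaphores. A sketch (function names, the reader-count accessor, and the use of unnamed semaphores are choices of this example; error checks omitted):

```c
#include <semaphore.h>

static sem_t mutex;  /* protects rc */
static sem_t w;      /* held by a writer, or by the group of active readers */
static int rc = 0;   /* number of active readers */

void rw_init(void) { sem_init(&mutex, 0, 1); sem_init(&w, 0, 1); }

void start_read(void) {
    sem_wait(&mutex);               /* P(mutex) */
    if (++rc == 1) sem_wait(&w);    /* first reader locks out writers */
    sem_post(&mutex);               /* V(mutex) */
}
void end_read(void) {
    sem_wait(&mutex);
    if (--rc == 0) sem_post(&w);    /* last reader readmits writers */
    sem_post(&mutex);
}
void start_write(void) { sem_wait(&w); }
void end_write(void)   { sem_post(&w); }

int active_readers(void) { return rc; }
```

With no writer present, multiple readers pass `start_read` concurrently; only the first takes `w`, so a writer must wait until the last reader leaves, exhibiting exactly the starvation risk for writers that Solution 2 addresses.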
31
Solution 2
Writers have priority; readers should not build a long queue on r, so that writers can overtake => mutex3.

Reader:
    P(mutex3); P(r);
    P(mutex1); rc := rc + 1;
    if rc = 1 then P(w) fi;
    V(mutex1); V(r); V(mutex3);
    CS;
    P(mutex1); rc := rc - 1;
    if rc = 0 then V(w) fi;
    V(mutex1)

Writer:
    P(mutex2); wc := wc + 1;
    if wc = 1 then P(r) fi;
    V(mutex2);
    P(w); CS; V(w);
    P(mutex2); wc := wc - 1;
    if wc = 0 then V(r) fi;
    V(mutex2)
32
Properties
  • If several writers try to enter their critical
    sections, one will execute P(r), blocking
    readers.
  • Works assuming V(r) has the effect of picking a
    process waiting to execute P(r) to proceed.
  • Due to mutex3, if a reader executes V(r) and a
    writer is at P(r), then the writer is picked to
    proceed.

33
Concurrent Reading and Writing (Lamport '77)
  • Previous solutions to the readers/writers problem
    use some form of mutual exclusion.
  • Lamport considers solutions in which readers and
    writers access a shared object concurrently.
  • Motivation
  • Don't want writers to wait for readers.
  • A readers/writers solution may be needed to implement mutual exclusion (circularity problem).

34
Interesting Factoids
  • This is the first ever lock-free algorithm
    guarantees consistency without locks
  • An algorithm very similar to this is implemented
    within an embedded controller in Mercedes
    automobiles!!

35
The Problem
  • Let v be a data item consisting of one or more digits.
  • For example, v = 256 consists of three digits: 2, 5, and 6.
  • Underlying model: digits can be read and written atomically.
  • Objective: simulate atomic reads and writes of the data item v.

36
Preliminaries
  • Definition vi, where i ? 0, denotes the ith
    value written to v. (v0 is vs initial value.)
  • Note No concurrent writing of v.
  • Partitioning of v v1 ? vm.
  • vi may consist of multiple digits.
  • To read v Read each vi (in some order).
  • To write v Write each vi (in some order).

37
More Preliminaries
We say r reads v[k,l]. The value obtained is consistent if k = l.
38
Theorem 1
If v is always written from right to left, then a read from left to right obtains a value
    v1[k1,l1] v2[k2,l2] ... vm[km,lm]
where k1 <= l1 <= k2 <= l2 <= ... <= km <= lm.
Example: v = v1 v2 v3 = d1 d2 d3.
Read reads v1[0,0] v2[1,1] v3[2,2].
39
Another Example
v = v1 v2, where v1 = d1 d2 and v2 = d3 d4.
(Timing diagram: a read of v1 then v2, left to right, overlaps two writes, write0 and write1, each writing the digits right to left.)
Read reads v1[0,1] v2[1,2].
40
Theorem 2
Assume that i <= j implies v[i] <= v[j], where v = d1 ... dm.
(a) If v is always written from right to left, then a read from left to right obtains a value v[k,l] <= v[l].
(b) If v is always written from left to right, then a read from right to left obtains a value v[k,l] >= v[k].
41
Example of (a)
v = d1 d2 d3
Read obtains v[0,2] = 390 <= 400 = v[2].
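A deterministic replay of this example; the starting digits (399) and the exact interleaving are chosen here to reproduce the slide's 390, since the slide does not pin them down:

```c
/* v = d1 d2 d3, written right to left while a reader scans left to right. */
static int d[3] = {3, 9, 9};               /* v currently holds 399 */

static void write_rtl(int val) {           /* write digits right to left */
    for (int i = 2; i >= 0; i--) { d[i] = val % 10; val /= 10; }
}

/* One interleaving: the reader gets d1 and d2 of the old value, the writer
 * then completes a full write of 400, and the reader gets the new d3. */
int simulate(void) {
    int r[3];
    r[0] = d[0];      /* reader reads d1 = 3 */
    r[1] = d[1];      /* reader reads d2 = 9 */
    write_rtl(400);   /* writer completes: v becomes 400 */
    r[2] = d[2];      /* reader reads d3 = 0 */
    return r[0] * 100 + r[1] * 10 + r[2];
}
```

The read obtains 390: not a value v ever held, but still bounded above by v[l] = 400, exactly as Theorem 2(a) promises.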
42
Example of (b)
v = d1 d2 d3
Read obtains v[0,2] = 498 >= 398 = v[0].
43
Readers/Writers Solution
gt means assign larger value. ? V1 means left
to right. ? V2 means right to left.
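The V1/D/V2 protocol can be sketched with C11 atomics in the style of a seqlock, a modern descendant of this construction. The single-word data item D and the function names are simplifications of this sketch; truly concurrent use would also need D itself to be accessed atomically or fenced:

```c
#include <stdatomic.h>

static atomic_long V1 = 0, V2 = 0;
static long D = 0;

/* Writer: bump V1, write D, then copy V1 into V2 (V1 :> V1; D; V2 := V1). */
void writer_write(long val) {
    atomic_store(&V1, atomic_load(&V1) + 1);
    D = val;
    atomic_store(&V2, atomic_load(&V1));
}

/* Reader: read V2, then D, then V1; retry until V1 == V2, i.e. until the
 * snapshot is consistent (k = l in the slides' notation). */
long reader_read(void) {
    long v2, d, v1;
    do {
        v2 = atomic_load(&V2);
        d  = D;
        v1 = atomic_load(&V1);
    } while (v1 != v2);
    return d;
}
```

Note the asymmetry the proof relies on: the writer touches V1 before V2, the reader touches V2 before V1, so equality of the two version reads implies no write overlapped the read of D.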
44
Proof Obligation
  • Assume a reader reads V2[k1,l1] D[k2,l2] V1[k3,l3].
  • Proof obligation: V2[k1,l1] = V1[k3,l3] implies k2 = l2.

45
Proof
By Theorem 2,
    V2[k1,l1] <= V2[l1]  and  V1[k3] <= V1[k3,l3].      (1)
Applying Theorem 1 to V2 D V1,
    k1 <= l1 <= k2 <= l2 <= k3 <= l3.                   (2)
By the writer's program,
    l1 <= k3 implies V2[l1] <= V1[k3].                  (3)
(1), (2), and (3) imply
    V2[k1,l1] <= V2[l1] <= V1[k3] <= V1[k3,l3].
Hence, V2[k1,l1] = V1[k3,l3]
    implies V2[l1] = V1[k3]
    implies l1 = k3 (by the writer's program)
    implies k2 = l2 (by (2)).
46
Supplemental Reading
  • Check:
  • G.L. Peterson, "Concurrent Reading While Writing", ACM TOPLAS, Vol. 5, No. 1, 1983, pp. 46-55.
  • Solves the same problem in a wait-free manner:
  • guarantees consistency without locks, and
  • the unbounded reader loop is eliminated.
  • First paper on wait-free synchronization.
  • Now a very rich literature on the topic. Check also:
  • PhD thesis: A. Gidenstam, 2006, CTH
  • PhD thesis: H. Sundell, 2005, CTH

47
Useful Synchronization Primitives (Usually Necessary in Nonblocking Algorithms)

CAS(var, old, new) ==
    if var != old then return false fi;
    var := new;
    return true

CAS2 extends this to two variables.

LL(var) ==
    establish link to var;
    return var

SC(var, val) ==
    if link to var still exists then
        break all current links of all processes;
        var := val;
        return true
    else
        return false
    fi
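The CAS primitive above corresponds to C11's `atomic_compare_exchange_strong`, which additionally writes the observed value back into `expected` on failure. A sketch:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* CAS(var, old, new): if *var == old, set *var := new and return true;
 * otherwise leave *var unchanged and return false. */
bool cas(atomic_long *var, long old, long new_val) {
    return atomic_compare_exchange_strong(var, &old, new_val);
}

bool cas_demo(void) {
    atomic_long x = 5;
    bool ok1 = cas(&x, 5, 7);   /* succeeds: x was 5, becomes 7 */
    bool ok2 = cas(&x, 5, 9);   /* fails: x is 7, stays 7       */
    return ok1 && !ok2 && atomic_load(&x) == 7;
}
```

LL/SC (available on e.g. ARM and RISC-V) avoids CAS's ABA pitfall, since SC fails if *any* write to the variable occurred after the LL, not just one that changed the value.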
48
Another Lock-free Example: Shared Queue

type Qtype = record v: valtype; next: pointer to Qtype end
shared var Tail: pointer to Qtype
local var old, new: pointer to Qtype

procedure Enqueue(input: valtype)
    new := (input, NIL)
    repeat old := Tail
    until CAS2(Tail, old->next, old, NIL, new, new)   /* retry loop */
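CAS2 (double compare-and-swap) is not provided by common hardware, but the same retry-loop pattern with single-word CAS suffices for other lock-free structures, e.g. a Treiber stack. A sketch only: it ignores the ABA problem and never frees popped nodes, both of which real implementations must address (tagged pointers, hazard pointers):

```c
#include <stdatomic.h>
#include <stdlib.h>

struct node { long v; struct node *next; };
static struct node *_Atomic top = NULL;

/* Lock-free push: retry until the CAS swings top from the snapshot to n. */
void push(long v) {
    struct node *n = malloc(sizeof *n);
    n->v = v;
    do {
        n->next = atomic_load(&top);
    } while (!atomic_compare_exchange_weak(&top, &n->next, n));
}

/* Lock-free pop: returns 0 if empty, else 1 with the value in *out.
 * (The popped node is leaked here for simplicity.) */
int pop(long *out) {
    struct node *n;
    do {
        n = atomic_load(&top);
        if (n == NULL) return 0;
    } while (!atomic_compare_exchange_weak(&top, &n, n->next));
    *out = n->v;
    return 1;
}
```

This is lock-free in the slides' sense: an individual push may retry forever under contention, but every failed CAS means some other operation succeeded, so system-wide progress is guaranteed.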
49
Using Locks in Real-time Systems: The Priority Inversion Problem

Uncontrolled use of locks in RT systems can result in unbounded blocking due to priority inversions.

(Timeline figure, tasks High/Med/Low over times t0, t1, t2: Low acquires a shared object; High then blocks on an access to that object; Med's computation, not involving object accesses, preempts Low, prolonging the priority inversion.)
50
Dealing with Priority Inversions
  • Common approach: use lock-based schemes that bound the inversion's duration (as shown).
  • Examples: priority-inheritance and priority-ceiling protocols.
  • Disadvantages: require kernel support; very inefficient on multiprocessors.
  • Alternative: use non-blocking objects.
  • No priority inversions, no kernel support needed.
  • Wait-free algorithms are clearly applicable here.
  • What about lock-free algorithms?
  • Advantage: usually simpler than wait-free algorithms.
  • Disadvantage: access times are potentially unbounded.
  • But for periodic task sets, access times are also predictable! (check further-reading pointers)