Title: Multiprocessing and Distributed Systems
1. Multiprocessing and Distributed Systems
- CS-502 Operating Systems, Fall 2007
- (Slides include materials from Operating System Concepts, 7th ed., by Silberschatz, Galvin, and Gagne; Modern Operating Systems, 2nd ed., by Tanenbaum; and Operating Systems: Internals and Design Principles, 5th ed., by Stallings)
2-5. Multiprocessing ↔ Distributed Computing (a spectrum)
- Many independent problems at same time
  - Similar: e.g., banking, credit cards, airline reservations
  - Different: e.g., university computer center, your own PC
- One very big problem (too big for one computer)
  - Weather modeling, finite element analysis, drug discovery, gene modeling, weapons simulation, etc.
- Computations that are physically separated
  - Client-server
  - Dispersed and/or peer-to-peer
    - Routing tables for the Internet
    - Electric power distribution
    - International banking
- Different kinds of computers and operating systems
6. Inherently Distributed Computation (example)
- Internet routing of packets
  - Network layer: send a packet to its IP address
- Problems
  - No central authority to manage routing in the Internet
  - Nodes and links come online or fail rapidly
  - Need to be able to send a packet from any address to any address
- Solution
  - Distributed routing algorithm
7. Distributed Routing Algorithm (simplified example)
- Each node knows the networks of the other nodes directly connected to it
- Each node maintains a table of distant networks
  - (network, 1st hop, distance)
- Periodically, adjacent nodes exchange tables
- Update algorithm (for each network in the table):
  - If (neighbor's distance to network + my distance to neighbor < my distance to network), then update my table entry for that network
8. Distributed Routing Algorithm (result)
- All nodes in the Internet have reasonably up-to-date routing tables
- Rapid responses to changes in network topology, congestion, failures, etc.
- Very reliable, with no central management!
- Network management software
  - Monitoring health of network (e.g., routing tables)
  - Identifying actual or incipient problems
  - Data and statistics for planning purposes
9. Summary
- Some algorithms are inherently distributed
  - Depend upon computations at physically separate places
  - Result is the cumulative result of individual computations
- Not much impact on computer or OS architecture
10. Taxonomy of Parallel Processing
11. Client-Server Computations
- Very familiar model
- Part of the computation takes place on
  - the user's PC, a credit-card terminal, etc.
- and part takes place on a central server
  - e.g., updating a database, running a search engine, etc.
- Central issues
  - Protocols for efficient communication
  - Where to partition the problem
12. Very Big Problems
- Lots and lots of calculations
  - Highly repetitive
  - Easily parallelizable into separate subparts
  - (Usually) dependent upon adjacent subparts
- Arrays or clusters of computers
  - Independent memories
  - Very fast communication between elements → close coupling
13. Closely-Coupled Systems
- Examples
  - Beowulf clusters: 32-256 PCs
  - ASCI Red: 1000-4000 PC-like processors
  - BlueGene/L: up to 65,536 processing elements
14. Taxonomy of Parallel Processing
15. IBM BlueGene/L
16. Systems for Closely-Coupled Clusters
- Independent OS for each node
  - e.g., Linux in Beowulf
- Additional middleware for
  - Distributed process space
  - Synchronization among nodes (Silberschatz, Ch. 18)
  - Access to network and peripheral devices
  - Failure management
- Parallelizing compilers
  - For partitioning a problem into many processes
17. Cluster Computer Architecture
18. Interconnection Topologies
19. Goal
- Microsecond-level
  - messages
  - synchronization and barrier operations among processes
20. Questions?
21. Multiprocessing ↔ Distributed Computing (a spectrum)
- Many independent problems at same time
  - Similar: e.g., banking, credit cards, airline reservations
  - Different: e.g., university computer center, your own PC
- One very big problem (or a few)
- Problems that are physically separated by their very nature
- Different kinds of computers and operating systems
22. Many Separate Computations
- Similar: banking, credit cards, airline reservations
- Transaction processing
  - Thousands of small transactions per second
  - Few (if any) communications among transactions
    - Exception: locking, mutual exclusion, serialization
  - Common database
23. Requirements
- Lots of disks, enough CPU power
  - Accessible via multiple paths
- High availability
  - Fault-tolerant components; OS support
- Processor independence
  - Any transaction on any processor
- Global file system
- Load-balancing
- Scalable
24. Options
- Cluster (of a somewhat different sort)
- Shared-memory multiprocessor
25. Taxonomy of Parallel Processing
26. Cluster Computing
- Scalable to large number of processors
- Common I/O space
  - Multiple channels for disk access
    - Hyper-channel, Fiber Channel, etc.
  - Multi-headed disks
  - At least two paths from any processor to any disk
- Remote Procedure Call for communication among processors
  - DCOM, CORBA, Java RMI, etc.
27. Shared-Memory Multiprocessor
- All processors can access all memory
  - Some memory may be faster than other memory
- Any process can run on any processor
  - Multiple processors executing in same address space concurrently
- Multiple paths to disks (as with cluster)
28. Shared-Memory Multiprocessors: Bus-Based
- Bus contention limits the number of CPUs
- Per-CPU caches lower bus contention
  - Caches need to be kept in sync (a big deal)
- Compiler places data and text in private or shared memory
29. Multiprocessors (2): Crossbar
- Can support a large number of CPUs
- Non-blocking network
- Cost/performance effective up to about 100 CPUs
  - Cost grows as n² (e.g., n = 100 requires 100 × 100 = 10,000 crosspoint switches)
30. Multiprocessors (3): Multistage Switching Networks
- Omega network: blocking
- Lower cost, longer latency than a crossbar
- For n CPUs and n memories: log₂ n stages of n/2 switches
  - e.g., n = 8: 3 stages of 4 switches = 12 switches, versus 64 crosspoints for a crossbar
31. Types of Multiprocessors: UMA vs. NUMA
- UMA (Uniform Memory Access)
  - All memory is equal
  - Familiar programming model
  - Number of processors is limited
    - 8-32
  - Symmetrical: no processor is different from the others
- NUMA (Non-Uniform Memory Access)
  - Single address space visible to all CPUs
  - Access to remote memory via ordinary LOAD and STORE instructions
  - Remote memory access slower than local
32. Common SMP Implementations
- Multiple processors on same board or bus
- Dual-core and multi-core processors
  - i.e., two or more processors on same chip
  - Share same L2 cache
- Hyperthreading
  - One processor shares its circuitry among two threads or processes
  - Separate registers, PSW, etc.
- Combinations of the above
33. Hardware Issue: Cache Synchronization
- Each processor's cache monitors memory accesses on the bus (i.e., "snoops")
- Memory updates are propagated to other caches
  - Complex cache-management hardware
  - Order of operations may be ambiguous
- Some data must be marked as not cacheable
  - Visible to programming model
34. Operating System Considerations
- Master-slave vs. full symmetry
- Explicit cache flushing
  - When page is replaced or invalidated
- Processor affinity of processes
  - Context switches across processors can be expensive
- Synchronization across processors
- Interrupt handling
35. Multiprocessor OS: Master-Slave
- One CPU (the master) runs the OS and applies most policies
- Other CPUs
  - run applications
  - Minimal OS to acquire and terminate processes
- Relatively simple OS
- Master processor can become a bottleneck for a large number of slave processors
36. Multiprocessor OS: Symmetric Multi-Processor (SMP)
- Any processor can execute the OS and any application
- Synchronization within the OS is the issue
  - Locking the whole OS: poor utilization, long queues waiting to use the OS
  - OS critical regions much preferred
    - Identify independent OS critical regions that can be executed independently; protect each with a mutex
    - Identify independent critical OS tables; protect access with a mutex
    - Design OS code to avoid deadlocks
- The art of the OS designer
  - Maintenance requires great care
37. Multiprocessor OS: SMP (continued)
- Multiprocessor synchronization
  - Needs special instructions, e.g., test-and-set
  - Spinlocks are common
  - Can context switch instead if time in critical region is greater than context-switch time
- OS designer must understand the performance of OS critical regions
- Context-switch time could be onerous
  - Data cached on one processor needs to be re-cached on another
38. Multiprocessor Scheduling
- When processes are independent (e.g., timesharing)
  - Allocate CPU to highest-priority process
  - Tweaks
    - For a process holding a spinlock, let it run until it releases the lock
    - To reduce TLB and memory-cache flushes, try to run a process on the same CPU each time it runs
- For groups of related processes
  - Attempt to simultaneously allocate CPUs to all related processes (space sharing)
  - Run all threads to termination or block
  - Gang scheduling: apply a scheduling policy to related processes together
39. Linux SMP Support
- Multi-threaded kernel
- Any processor can handle any interrupt
  - Interrupts rarely disabled, and only for the shortest time possible
- Multiple processors can be active in system calls concurrently
- Process vs. interrupt context
  - Process context: processor is acting on behalf of a specific process
  - Interrupt context: processor is acting on behalf of no particular process
- Extensive use of spinlocks
- Processor-affinity properties in task_struct
40. WPI Computing &amp; Communications Center
- CCC cluster: 11 dual 2.4 GHz Xeon processors
- toth: 16-node 1.5 GHz Itanium 2
- big16: 8 dual-core 2.4 GHz Opterons
- toaster: NetApp 5.5 TByte Fiber Channel RAID NFS
- breadbox: NetApp 12 TByte RAID NFS
- Many others
- All connected together by very fast networks
41. Questions?