Parallel OS - PowerPoint PPT Presentation

About This Presentation
Title:

Parallel OS

Slides: 15
Provided by: snir4

Transcript and Presenter's Notes


1
Parallel OS
  • Goals and Constraints

2
I, too, can have a cartoon
3
OS
  • implements the virtual machine seen by application software
    – higher level of abstraction than HW
    – can be runtime!
  • provides protection and error recovery
  • performs resource management
    – because system goal function ≠ application goal function
    – resources are allocated on demand
      – works when each app uses a small fraction of total resources
    – OS predicts future application needs, based on past behavior
    – OS is autonomous
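
The slide says the OS predicts future needs from past behavior but does not say how. A minimal sketch, assuming exponential smoothing (the classic estimator used, for example, to predict CPU bursts); the class name `DemandPredictor` and the `alpha` parameter are hypothetical illustrations, not taken from the talk:

```python
class DemandPredictor:
    """Hypothetical per-application demand estimator (exponential smoothing is an assumption)."""

    def __init__(self, alpha: float = 0.5, initial: float = 0.0) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha        # weight given to the newest observation
        self.estimate = initial   # current prediction of the next demand

    def observe(self, demand: float) -> float:
        # Blend the latest observed demand with the running estimate and
        # return the updated prediction for the next interval.
        self.estimate = self.alpha * demand + (1 - self.alpha) * self.estimate
        return self.estimate
```

With `alpha=0.5` and a starting estimate of 0, three successive observations of 8 yield predictions of 4.0, 6.0 and 7.0: the estimate converges on recent behavior, which is exactly the assumption that breaks down when a parallel application's needs do not follow its past.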

4
Key Questions
  • Does the parallel OS define the right virtual machine?
  • Does it apply the right resource management policies?
  • Does it provide the required protection and recovery?

5
Cluster OS
  • Node OS is a full-fledged Unix system
  • Global OS is a set of cluster services built atop Unix networking APIs

6
Problems
  • Very limited set of global services, with low performance, since they are built atop networking APIs
  • Local resource management decisions are often wrong, since they are based on local information and on an irrelevant time-sharing model
  • No distinction between internal and external allocation
  • Resource management is too fine-grained
    – e.g., memory management, global malloc (same address everywhere), dynamic gang scheduling (communication variance)
    – IO
      – node asynchronous IO
      – global blocking IO, for long-delay events (msecs)
      – global swap, for very long-delay events (secs)
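
The IO bullets match the waiting mechanism to the expected delay: the node keeps computing during short waits, the whole application blocks for waits of milliseconds, and it is swapped out for waits of seconds. A minimal sketch of that selection, assuming hard cut-offs at 1 ms and 1 s; the slide gives only orders of magnitude, and the names `WaitStrategy` and `choose_wait_strategy` are hypothetical:

```python
from enum import Enum


class WaitStrategy(Enum):
    NODE_ASYNC_IO = "node asynchronous IO"    # one node overlaps IO with compute
    GLOBAL_BLOCKING = "global blocking IO"    # whole application blocks together
    GLOBAL_SWAP = "global swap"               # whole application is swapped out


# Hypothetical cut-offs; the slide only says "msecs" and "secs".
BLOCK_THRESHOLD_S = 1e-3
SWAP_THRESHOLD_S = 1.0


def choose_wait_strategy(expected_delay_s: float) -> WaitStrategy:
    """Pick a coarser-grained mechanism as the expected delay grows."""
    if expected_delay_s < 0:
        raise ValueError("expected delay must be non-negative")
    if expected_delay_s < BLOCK_THRESHOLD_S:
        return WaitStrategy.NODE_ASYNC_IO
    if expected_delay_s < SWAP_THRESHOLD_S:
        return WaitStrategy.GLOBAL_BLOCKING
    return WaitStrategy.GLOBAL_SWAP
```

The design point is that the decision is made for the whole parallel application, not per node, so a single slow node does not leave its peers spinning.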

7
Desired Structure
  • Each parallel application is provided with a dedicated virtual parallel machine (VPM)
    – e.g., a space partition, held for a long block of time
    – changes in VPM resources are rare and are negotiated
  • HW provides protection across VPMs
  • Resource management inside the VPM is done by the runtime
    – thread scheduler, memory manager, IO
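
The global OS side of this structure can be sketched as a space-partition allocator: each VPM owns a disjoint set of nodes, and resizing happens only through an explicit request that is granted in full or refused. This is a minimal sketch under stated assumptions: `PartitionManager`, `create_vpm`, `negotiate_grow` and `destroy_vpm` are hypothetical names, node contiguity and network topology are not modeled, and "lowest-numbered free nodes first" stands in for a real placement policy.

```python
class PartitionManager:
    """Hypothetical global-OS allocator giving each VPM a dedicated, disjoint set of nodes."""

    def __init__(self, num_nodes: int) -> None:
        self.free = set(range(num_nodes))     # nodes not owned by any VPM
        self.vpms: dict[str, set[int]] = {}   # VPM name -> its dedicated nodes

    def _take(self, count: int) -> set[int]:
        # Lowest-numbered free nodes first (placeholder placement policy).
        nodes = set(sorted(self.free)[:count])
        self.free -= nodes
        return nodes

    def create_vpm(self, name: str, size: int) -> set[int]:
        if name in self.vpms:
            raise ValueError(f"VPM {name!r} already exists")
        if size > len(self.free):
            raise RuntimeError("not enough free nodes")
        self.vpms[name] = self._take(size)
        return set(self.vpms[name])

    def negotiate_grow(self, name: str, extra: int) -> bool:
        # Rare, explicit request from the VPM's runtime: all or nothing, never partial.
        if name not in self.vpms:
            raise KeyError(name)
        if extra > len(self.free):
            return False
        self.vpms[name] |= self._take(extra)
        return True

    def destroy_vpm(self, name: str) -> None:
        # Return every node of the VPM to the free pool.
        self.free |= self.vpms.pop(name)
```

Inside each VPM, thread scheduling, memory management and IO would be left to the application's runtime; the allocator never looks inside a partition.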

8
Implementation
  • OS proxy is local representative of global OS
    – attaches/detaches resources to/from VPM
    – handles exceptions
    – does not implement local policies
    – does not manage VPM resources
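
A minimal sketch of a mechanism-only proxy, assuming the global OS is reachable through a callback: the proxy carries out attach/detach decisions made elsewhere and "handles" exceptions by reporting them upward with context, leaving recovery policy to the global OS. `OSProxy`, `Reporter` and `on_exception` are hypothetical names, and the callback stands in for whatever transport the real system would use.

```python
from typing import Callable, Optional

# Hypothetical upcall into the global OS: (node_id, owning VPM or None, exception).
Reporter = Callable[[int, Optional[str], Exception], None]


class OSProxy:
    """Node-local stand-in for the global OS: executes mechanisms, decides nothing."""

    def __init__(self, node_id: int, report: Reporter) -> None:
        self.node_id = node_id
        self.report = report
        self.attached: dict[str, str] = {}   # local resource -> owning VPM

    def attach(self, resource: str, vpm: str) -> None:
        # Carries out an attach decided by the global OS; no local policy check.
        if resource in self.attached:
            raise RuntimeError(f"{resource} already attached to {self.attached[resource]}")
        self.attached[resource] = vpm

    def detach(self, resource: str) -> str:
        # Returns the VPM the resource was taken from.
        return self.attached.pop(resource)

    def on_exception(self, resource: str, exc: Exception) -> None:
        # Forward the fault with enough context for the global OS to choose a recovery.
        self.report(self.node_id, self.attached.get(resource), exc)
```

Keeping the proxy free of policy means it holds no state beyond the current attachments, which is what makes it cheap to run on every node.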

9
Global runtime abstractions
  • Collective service invocations
    – collective IO, collective malloc
  • Scalable, associative individual invocations
    – call logically made to global server
    – actually serviced by local proxy or regional proxy
    – associative service throughput scales with number of servers
    – e.g., global queue
    – also gains locality
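
The global-queue example can be sketched as one logical queue split into per-proxy segments: `put` is served by the caller's local proxy, and `get` tries the local segment first and only then the other proxies. Because the service is treated as associative, no global FIFO order is kept, which is what lets throughput grow with the number of proxies. This is a single-process model with hypothetical names (`DistributedQueue`, `put`, `get`); a real version would need concurrent access and messaging between nodes.

```python
from collections import deque
from typing import Any, Optional


class DistributedQueue:
    """Hypothetical 'global' queue whose operations are served by per-node proxies."""

    def __init__(self, num_proxies: int) -> None:
        self.segments = [deque() for _ in range(num_proxies)]

    def put(self, node: int, item: Any) -> None:
        # Logically a call to the global queue; served entirely by the local proxy.
        self.segments[node].append(item)

    def get(self, node: int) -> Optional[Any]:
        # Local segment first (locality); otherwise take work from another proxy.
        local = self.segments[node]
        if local:
            return local.popleft()
        n = len(self.segments)
        for offset in range(1, n):
            other = self.segments[(node + offset) % n]
            if other:
                return other.popleft()
        return None  # the whole logical queue is empty
```

In the common case both operations touch only the local proxy, so adding proxies adds service capacity; cross-proxy traffic occurs only when a node runs out of local items.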

10
Where does OS run?
  • Local proxy must run on the local node
    – may mostly use semi-dedicated resources? (separate CPU in large SMP node)
  • Global OS logic may run on dedicated server nodes, be distributed across all nodes, or both
    – answer not obvious with large SMP nodes and modern interconnects

11
Shared Memory OS: is it different?
  • Many differences of detail, same global structure
    – need local proxies for performance and fault isolation
    – need recoverable/transactional communication protocol between proxies (message passing?)
    – may have more logical/flexible definition of node
    – VPM has global shared memory, protected by HW from other VPMs, and managed by runtime
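
The recoverable part of a proxy-to-proxy protocol can be sketched with sequence numbers, acknowledgements and duplicate suppression: the sender keeps every unacknowledged message for retransmission, and the receiver applies each sequence number at most once, so retries are harmless. This is a minimal single-process sketch with hypothetical names (`Sender`, `Receiver`, `pending`); it covers retry and deduplication only, not transactional atomicity across several proxies, and the receiver's `seen` set grows without bound.

```python
from typing import Any


class Sender:
    """Hypothetical sending proxy: numbers messages and keeps them until acknowledged."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.unacked: dict[int, Any] = {}   # seq -> payload awaiting acknowledgement

    def send(self, payload: Any) -> tuple[int, Any]:
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        return seq, payload                 # message handed to the (unreliable) network

    def ack(self, seq: int) -> None:
        self.unacked.pop(seq, None)         # duplicate acks are ignored

    def pending(self) -> list[tuple[int, Any]]:
        # Messages to retransmit after a timeout or a peer restart.
        return sorted(self.unacked.items())


class Receiver:
    """Hypothetical receiving proxy: applies each sequence number exactly once."""

    def __init__(self) -> None:
        self.seen: set[int] = set()
        self.applied: list[Any] = []

    def deliver(self, seq: int, payload: Any) -> int:
        # Retransmitted duplicates are acknowledged again but not re-applied.
        if seq not in self.seen:
            self.seen.add(seq)
            self.applied.append(payload)
        return seq                          # acknowledgement for the sender
```

If a message is lost, the sender simply replays `pending()`; a late copy of the original arriving afterwards is recognized by its sequence number and dropped.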

12
Why the ideal OS will not come from commercial system vendors
  • Weaker requirement for strong coupling across nodes in most commercial environments
  • Desire to avoid high-end unique technology
  • Strong reluctance to revisit established boundaries (OS/runtime/compiler)
    – which are also organizational boundaries
  • Possible commoditization of OS
    – all action is in the middleware
  • Cost of testing, even for minute changes
  • Risk avoidance and lack of expertise

13
Why the ideal OS may come from commercial vendors
  • Needs of large subsystems
    – DB, web servers
  • Needs of new HW architectures
  • Security problems in large, monolithic kernels

14
Possible path
  • Revisit Sandia design, in light of larger SMP nodes
    – heavy server node with general-purpose OS
    – light compute node with simple executive, managing a single user process
      – derived from real-time OS? from Linux?
  • Develop global OS functions incrementally
  • Ensure that future architectures provide the right HW protection mechanisms to enable user runtime VPM management
    – take advantage of HW support for hypervisor?