Title: Shrinking AIX as a compute node OS
1Fast-OS
Shrinking AIX as a compute node OS
July-10-2002 Terry Jones, Integrated Computing
Communications Dept trj_at_llnl.gov
2Outline
- Introduction
  - Today's landscape
  - Directions
- Problem Areas Ripe for Investigation
  - Parallel Aware Scheduling
  - Parallel Aware Memory Management
  - Metrics for evaluating system software
- Why would anyone want to muck with AIX?
  - Bottom-up and Top-down Approaches
  - Why AIX?
  - How AIX?
- Conclusion
4The Landscape
- Parallel applications need to span thousands of nodes
- Architectures are adding more processor state
- Applications are not mission critical
- Both interrupts and busy-waiting are bad
- Cache effects (processor affinity) cannot be ignored
- Two modes:
  - Capability mode (jobs are dedicated)
  - Capacity mode (jobs may space-share machine)
5Directions
- Continue to move from a monolithic operating system which communicates via shared memory TO a decentralized design which communicates via efficient messages
- Small kernel, process-level managers
- Modularity
- Fault-tolerance
- Extensibility
Question: How much should system software offer in terms of features?
Answer: Everything required, and as much of what is desired as possible
7Problem Areas Ripe For Investigation
- Add parallel awareness
  - CPU resource (local/global program context, scheduling)
  - Memory resource (demand paging, address space extent)
- Metrics
- Other possibilities: Fault tolerance/Membership services
- Re-visit where we insert boundaries (e.g. boundary between kernel and user-level code)
8Scheduling Is An Overloaded Word
- Spatial Scheduling
  - Assign processes to nodes
  - For example, batch schedulers, gang-schedulers
  - Coarse grain view of work to be done
- Temporal Scheduling
  - For example, native operating system scheduling
  - Fine grain view of work to be done (e.g. efficient pthread level scheduling)
  - Lacks the necessary global view
- Coscheduling
9The Need for Parallel Aware Scheduling
- Even on the most bare-bones operating systems, there can be more runnable processes than processors
- Many parallel algorithms are extremely sensitive to serializations
- A first order goal is to maximize the overlap of competing (interfering) processes during a parallel application.
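The overlap goal above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a measurement of AIX: it assumes a bulk-synchronous job with a barrier at the end of every step and one interfering daemon per node that fires once every `period` steps. The function `run_time` and all of its parameters and values are hypothetical, chosen for illustration.

```python
def run_time(nodes, steps, work, daemon_cost, period, coordinated):
    """Total wall time of a job in which every step ends in a barrier."""
    total = 0.0
    for step in range(steps):
        slowest = 0.0
        for node in range(nodes):
            # Coordinated: every node runs its daemon in the same step.
            # Uncoordinated: each node's daemon fires at a node-specific offset.
            offset = 0 if coordinated else node % period
            noise = daemon_cost if step % period == offset else 0.0
            slowest = max(slowest, work + noise)
        # The barrier makes every node wait for the slowest one.
        total += slowest
    return total


uncoordinated = run_time(nodes=1024, steps=100, work=1.0,
                         daemon_cost=0.5, period=10, coordinated=False)
coordinated = run_time(nodes=1024, steps=100, work=1.0,
                       daemon_cost=0.5, period=10, coordinated=True)
print(uncoordinated, coordinated)
```

In this model, uncoordinated interference delays every step because some node is always disturbed, while overlapped interference delays only one step in every `period`. The per-node daemon cost is identical in both runs; only its alignment across nodes differs.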
10Improving Memory Management
- Provide as much memory as possible with as little pain as possible
- Memory systems are becoming more complex
- Improved mechanisms to counter false-sharing.
11Why Demand Paging
- External storage (secondary, networked) will continue to exceed local memory
- Memory requirements for certain simulations are almost unbounded
- Removing constraints on memory is very desirable, but the cost of a page-fault is too much to have hidden from an application
- Default process-level manager provides page-cache management, as in Stanford DASH.
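Why a page-fault is too costly to hide can be seen from the standard effective-access-time formula. The sketch below is a worked arithmetic example only; the 100 ns memory access and 10 ms fault service time are assumed round numbers, not figures from this talk, and `effective_access_time` is a hypothetical helper.

```python
def effective_access_time(fault_rate, memory_ns, fault_service_ns):
    """Average cost of one memory access when a fraction of accesses fault."""
    return (1.0 - fault_rate) * memory_ns + fault_rate * fault_service_ns


MEMORY_NS = 100.0        # assumed cost of an access that hits local memory
FAULT_NS = 10_000_000.0  # assumed cost of servicing one page-fault (10 ms)

for rate in (0.0, 1e-6, 1e-4):
    eat = effective_access_time(rate, MEMORY_NS, FAULT_NS)
    print(f"fault rate {rate:g}: {eat:.2f} ns per access, "
          f"{eat / MEMORY_NS:.2f}x the no-fault cost")
```

Under these assumptions, one fault per ten thousand accesses already makes the average access roughly eleven times slower, which is why an application would want to see and manage faults rather than have them hidden.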
12Challenges For A Virtual Memory Environment
- Thought to preclude or make more difficult OS bypass communications
- An application cannot know the amount of physical memory it has available
- An application cannot efficiently control the contents of the physical memory allocated to it
- An application cannot control the read-ahead, writeback and discarding of pages within its physical memory.
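For contrast with the last point, conventional Unix-like interfaces offer only advisory hints about paging. The sketch below is not AIX-specific and is not the mechanism proposed in this talk; it assumes Python 3.8 or later on a platform where `mmap.madvise` and the `MADV_*` constants exist, and it skips the hints where they are missing. The helper `scan_with_hints` is hypothetical.

```python
import mmap
import os
import tempfile


def scan_with_hints(path):
    """Sum the bytes of a file through a mapping, passing advisory paging hints."""
    hints_applied = []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for name in ("MADV_SEQUENTIAL", "MADV_WILLNEED"):
                # madvise and the MADV_* constants are platform dependent.
                if hasattr(m, "madvise") and hasattr(mmap, name):
                    m.madvise(getattr(mmap, name))
                    hints_applied.append(name)
            total = sum(m[:])  # touch every page of the mapping
    return total, hints_applied


# Build a small four-page scratch file to scan.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x01" * (4 * mmap.PAGESIZE))
    path = tmp.name

total, hints = scan_with_hints(path)
os.remove(path)
print(total, hints)
```

The kernel is free to ignore such hints, and the application still cannot learn which pages are resident or choose which page is evicted, which is the gap the bullets above describe.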
13Metrics For Evaluating System Software
- An aid for reaching agreement on what we want
- A quantitative measure of different approaches
- Compared to the scheduler work and the virtual
memory work, may be the most difficult
15Bottom-up and Top-down Approaches
- Bottom-up
  - Start with a clean slate
  - Add features as the need arises
  - Settle on a reasonable boundary
- Top-down
  - Start with a full-featured implementation
  - Remove the unnecessary cruft
  - Settle on a reasonable boundary
16Why AIX?
- AIX is ubiquitous in supercomputer centers
- AIX already has extensive capabilities
- Not required to build everything before we try anything
- AIX is mature (read: is not in radical change mode)
- AIX scalability (32-way with AIX 5.x)
17How AIX?
- In close conjunction with IBM
- Expect successes to pay off in IBM products
- Done in an operating system independent manner
- Findings apropos and available to other operating systems
- Evaluated with real applications on very large machines
19Conclusion
- New needs arising from today's parallel machines pose new challenges for system software
- Among the key needs which emerge...
  - Parallel aware scheduling
  - Improved memory management
  - Metrics for evaluating operating systems
- These can be investigated from a bottom-up approach, or a top-down approach, or both
- AIX is a reasonable choice for a top-down approach
This work was performed under the auspices of the
U.S. Department of Energy by University
of California Lawrence Livermore National
Laboratory under contract No. W-7405-Eng-48.