Title: Shrinking AIX as a compute node OS

1
Fast-OS
Shrinking AIX as a compute node OS
July 10, 2002, Terry Jones, Integrated Computing and
Communications Dept., trj@llnl.gov
2
Outline

Introduction
Today's landscape
Directions
Problem Areas Ripe for Investigation
Parallel Aware Scaling
Parallel Aware Memory Management
Metrics for evaluating system software
Why would anyone want to muck with AIX?
Bottom-up and Top-down Approaches
Why AIX?
How AIX?
Conclusion

4
The Landscape

Parallel applications need to span thousands of
nodes
Architectures are adding more processor state
Applications are not mission critical
Both interrupts and busy-waiting are bad
Cache effects (processor affinity) cannot be
ignored
Two modes: capability mode (jobs are dedicated) and
capacity mode (jobs may space-share the machine)

5
Directions

Continue to move from a monolithic operating
system which communicates via shared-memory TO a
decentralized design which communicates via
efficient messages
Small kernel plus process-level managers
Modularity
Fault-tolerance
Extensibility

Question: How much should system software offer
in terms of features?
Answer: Everything required, and as much of what
is desired as possible

7
Problem Areas Ripe For Investigation

Add parallel awareness
CPU resource (local/global program context,
scheduling)
Memory resource (demand paging, address space
extent)
Metrics
Other possibilities: fault tolerance and membership
services
Re-visit where we insert boundaries (e.g.
boundary between kernel and user-level code)

8
Scheduling Is An Overloaded Word

Spatial Scheduling
Assign processes to nodes
For example, batch schedulers and gang schedulers
Coarse grain view of work to be done
Temporal Scheduling
For example, native operating system scheduling
Fine grain view of work to be done (e.g.
efficient pthread level scheduling)
Lacks the necessary global view
Coscheduling

9
The Need for Parallel Aware Scheduling

Even on the most bare-bones operating systems,
there can be more runnable processes than
processors
Many parallel algorithms are extremely sensitive
to serializations
A first order goal is to maximize the overlap of
competing (interfering) processes during a
parallel application.
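
A rough illustration of the overlap goal, not taken from the slides: the sketch below models a bulk-synchronous job in which every step ends at a barrier, so each step costs the compute time plus the worst interference on any node. The function names, the periodic-daemon model, and all parameter values are assumptions chosen for illustration only.

```python
import random

def run_time(noise, work=1.0):
    """Total time of a bulk-synchronous job.

    noise[step][node] is the extra delay a node suffers in that step.
    Every step ends with a barrier, so it costs work + the worst delay.
    """
    return sum(work + max(step) for step in noise)

def uncoordinated(nodes, steps, period, delay, rng):
    # Each node runs a periodic daemon at its own random phase.
    phases = [rng.randrange(period) for _ in range(nodes)]
    return [[delay if s % period == phases[n] else 0.0 for n in range(nodes)]
            for s in range(steps)]

def coordinated(nodes, steps, period, delay):
    # All nodes run the daemon in the same step, so the delays overlap.
    return [[delay if s % period == 0 else 0.0 for _ in range(nodes)]
            for s in range(steps)]

rng = random.Random(0)
nodes, steps, period, delay = 1024, 100, 10, 0.5  # hypothetical values
t_unc = run_time(uncoordinated(nodes, steps, period, delay, rng))
t_co = run_time(coordinated(nodes, steps, period, delay))
print(f"uncoordinated: {t_unc}, coordinated: {t_co}")
```

In this toy model the per-node interference is identical in both cases; only its alignment across nodes differs. When the delays overlap, the job pays for them once per period instead of in nearly every step.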

10
Improving Memory Management

Provide as much memory as possible with as
little pain as possible
Memory systems are becoming more complex
Improved mechanisms to counter false-sharing.

11
Why Demand Paging

External storage (secondary and networked) will
continue to exceed local memory
Memory requirements for certain simulations are
almost unbounded
Removing constraints on memory is very desirable,
but the cost of a page-fault is too much to have
hidden from an application
A default process-level manager provides page-cache
management, as in Stanford DASH.

12
Challenges For A Virtual Memory Environment

Thought to preclude, or at least complicate, OS-bypass
communications
An application cannot know the amount of physical
memory it has available
An application cannot efficiently control the
contents of the physical memory allocated to it
An application cannot control the read-ahead,
writeback and discarding of pages within its
physical memory.
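
To make the cost of not controlling resident pages concrete, here is a minimal simulation, not part of the original slides. It counts page faults for a cyclic sweep over a working set one page larger than the available frames. A least-recently-used policy stands in for a generic kernel default, and a most-recently-used policy stands in for what an application that knows its own access pattern might choose. Both policies, the function name `count_faults`, and the sizes are assumptions for illustration.

```python
from collections import OrderedDict

def count_faults(accesses, frames, evict_most_recent):
    """Count page faults for a fixed number of physical frames.

    evict_most_recent=False models an LRU policy.
    evict_most_recent=True models an application that knows it is
    sweeping cyclically and evicts the page it touched most recently.
    """
    resident = OrderedDict()  # page -> None, least recently used first
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
            continue
        faults += 1
        if len(resident) >= frames:
            resident.popitem(last=evict_most_recent)
        resident[page] = None
    return faults

# Hypothetical workload: sweep over 5 pages, 20 times, with 4 frames.
sweep = list(range(5)) * 20
lru_faults = count_faults(sweep, frames=4, evict_most_recent=False)
mru_faults = count_faults(sweep, frames=4, evict_most_recent=True)
print(f"LRU faults: {lru_faults}, application-chosen MRU faults: {mru_faults}")
```

With this workload LRU faults on every access, because each page is evicted just before it is needed again, while the application-chosen policy keeps most of the sweep resident.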

13
Metrics For Evaluating System Software

An aid for reaching agreement on what we want
A quantitative measure of different approaches
Compared to the scheduler work and the virtual
memory work, this may be the most difficult


15
Bottom-up and Top-down Approaches

Bottom-up
Start with a clean slate
Add features as the need arises
Settle on a reasonable boundary
Top-down
Start with a full-featured implementation
Remove the unnecessary cruft
Settle on a reasonable boundary

16
Why AIX?

AIX is ubiquitous in supercomputer centers
AIX already has extensive capabilities
Not required to build everything before we try
anything
AIX is mature (read: it is not in radical change
mode)
AIX scalability (32-way with AIX 5.x)

17
How AIX?

In close conjunction with IBM
Expect successes to pay off in IBM products
Done in an operating system independent manner
Findings applicable and available to other operating
systems
Evaluated with real applications on very large
machines


19
Conclusion

New needs arising from today's parallel machines
pose new challenges for system software
Among the key needs which emerge...
Parallel aware scheduling
Improved memory management
Metrics for evaluating operating systems
These can be investigated from a bottom-up
approach, or a top-down approach, or both
AIX is a reasonable choice for a top-down
approach

This work was performed under the auspices of the
U.S. Department of Energy by the University
of California, Lawrence Livermore National
Laboratory, under contract No. W-7405-Eng-48.
