Title: LACSI Priorities
1. LACSI Priorities and Strategies
Systems Overview
Feb 2005
Discussion Points
http://lacsi.rice.edu/.../systems_overview.ppt
2. This Year's Agenda
- Brief overview of FY05 activities.
- Evaluate with respect to long- vs. short-term, research vs. development.
  - Which will have met their goals?
  - Declare victory and move on?
  - Transfer to other organizations/funding?
  - Which should be rethought?
  - Continuations?
- Rethink long-term priorities.
  - Reassess technology trends: research, industry.
  - Reassess the needs of LANL, ASC, NNSA.
- Identify a strategy by which LACSI resources can have the most positive impact.
  - Funding constraints
  - Expertise of participants
  - Leverage other projects and funding sources.
3. History: PS 2003 (Planning for FY04)
There were six thrusts:
  1. Reliability
  2. Adaptability
  3. Commodity
  4. Compiler/System Interface
  5. Advanced Architectures (WANs)
  (6. Systems for Scalable Visualization)
Problems with this approach:
- 1-3 were too abstract.
- 4-6 were too concrete.
- Identified lots of interesting problems, but way too many for available funding.
- Poor match with academic SOW, LANL work.
4. History: PS 2004 (for FY05)
- Focus on
  - Needs of ASCI HPC at LANL for new systems.
  - Items on which this group will have a significant impact.
5. Background: The Future of Cluster Interconnects
- Commodity networks vs. MPP interconnects
  - GigE and 10GigE
  - Questions of cost and reliability for full-bandwidth interconnects.
  - Cheap (e.g. Broadcom) NICs being built into motherboards.
  - Futures of Quadrics, Myrinet are suspect.
  - Infiniband looks viable, but may never achieve commodity status.
- It will be fast.
  - Quad data rate 12X Infiniband: 120 Gb/sec (15 GB/sec)
  - Much faster than today's memories and I/O buses.
- LACSI Systems Challenge: end-point nodes that can handle very high bandwidth.
  - Processors, NICs, and interconnect architectures?
  - Strawman: very high speed NIC on a cache-coherent bus, where the NIC is a peer of the CPU, e.g. Hypertransport.
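The 120 Gb/sec and 15 GB/sec figures for quad data rate 12X Infiniband follow from simple lane arithmetic. The sketch below reproduces them under stated assumptions: a per-lane single-data-rate signaling speed of 2.5 Gb/sec and an 8b/10b line encoding, neither of which appears on the slide. The function names are illustrative only.

```python
# Back-of-the-envelope link bandwidth for a multi-lane serial interconnect.
# Assumed (not stated on the slide): 2.5 Gb/sec per lane at single data
# rate, so quad data rate is 4 x 2.5 = 10 Gb/sec per lane.

SDR_LANE_GBPS = 2.5  # assumed single-data-rate signaling per lane, Gb/sec


def link_gbps(lanes: int, rate_multiplier: int) -> float:
    """Raw signaling rate of the link in gigabits per second."""
    return lanes * rate_multiplier * SDR_LANE_GBPS


def gbps_to_gbytes(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


raw = link_gbps(lanes=12, rate_multiplier=4)  # 12X, quad data rate
print(f"raw signaling: {raw:.0f} Gb/sec = {gbps_to_gbytes(raw):.0f} GB/sec")

# Assumption: 8b/10b line encoding leaves 80% of the raw rate for data.
payload = raw * 8 / 10
print(f"after 8b/10b:  {payload:.0f} Gb/sec = {gbps_to_gbytes(payload):.0f} GB/sec")
```

Under these assumptions the slide's figures are raw signaling rates; the rate available for data would be somewhat lower, which does not change the point that the link outpaces the node's memory and I/O buses.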
6. Messaging \ Reliability
- Open-MPI: a successor to LA-MPI, LAM, and FT-MPI
- Improve MPI reliability at all levels.
  - Transport
  - Failure modes
  - Monitoring for reliability and user feedback
  - (Integration of performance monitoring/reliability framework?)
7. Network-Messaging \ Performance
- Coordinated activities among all LACSI networking researchers to address future cluster interconnects.
- Node architectures
  - Interfacing the NICs to the nodes
- Protocol design and structure
  - Assignment of work to the hardware components
- Implementation of protocols for performance
  - Assignment of work to NICs (co-processors) to offload the CPU
  - Zero-copy, zero-map implementations for latency, bandwidth, and efficiency.
8. Networking-Messaging \ Utility
- Ensure that the messaging layers correctly implement standards.
  - Subsetting is tolerable, but correctness is required.
- Tier 1 standards
  - MPI
  - MPI/IO
  - MPI-2
- Tier 2
  - Everything else
- Issues
  - Defines constraints on useful networking/messaging activities.
  - Research vs. Development vs. Deployment tar baby.
9. Clustering \ Performance
- Continued efforts on inherent performance of Clustermatic
- Build performance monitoring infrastructure into Clustermatic systems.
  - Differs from other monitoring (e.g. fault monitoring) in the continuous and pervasive nature of performance monitoring.
10. Clustering \ Reliability
- Address application reliability through support of compiler-driven (assisted) checkpointing mechanisms.
- Dynamic application reconfiguration
  - Fault prediction
  - Reliability characterization
- HAPI: Health API
  - Providing drivers for health monitoring sensors
  - Administrator/user-level tools
  - Actuators
- Fail-over
  - Compute nodes
  - Master nodes
11. Clustering \ Utility
- Improved system administration through Clustermatic
  - Work on tools needed to improve administrator/user productivity
  - Improvements in the Single System Image (SSI)
  - Scripting in SSI vs. pile-of-workstations models.
- File system issues.
  - Private namespaces, the V9 FS
- Programming models and runtime systems.
  - The right HLLs for performance and productivity.
  - Systems section or separate compilers section?
12. What was missing from the draft?
- WAN activities: IP for high-bandwidth, high-latency networks.
  - Good progress being made by Feng et al.
  - Work was added after the PS meeting.
13. FY05 Projects on the Academic SOW
- Project and task definitions tailored to 1-year contract cycle
- Efficient, Portable, and Scalable Support for MPI Messaging
  - Scott Rixner, Alan Cox
- Operating System Issues Related to Scalability
  - Arthur B. Maccabe, Patrick G. Bridges
  - Scalability of TCP
  - Application Impact of Fault-handling Placement
  - Infiniband Testbed
- OpenMPI
  - Jack Dongarra
- Highly Scalable Fault Tolerance
  - Dan Reed, Kevin Gamiel
- Clustermatic Performance Instrumentation
  - Rob Fowler, Patrick Bridges, John Mellor-Crummey
14. FY06 Issues: Whence MPI?
- Status and future of MPI extensions: Open-MPI
  - Fault tolerance
  - Performance
- Development vs. research issues.
- Alternatives and successors?
15. FY06 Issues: The Runtime Software Stack
- Kernel issues
  - Linux vs. alternatives (K42, Plan 9, BSD variants, etc.) on clusters.
  - Flexible, adaptive, rich, general-purpose execution environment vs. small, fast, surveyable/controllable special-purpose environment.
- Other
  - File systems
  - Communication interfaces, models, drivers
  - Beyond bproc: SSI on non-Linux systems
- Rethink everything?
16. FY06 Issues: System Management
- Health monitoring, reporting, actuators, etc.
  - Fundamental research to predict future behavior.
  - Actuators: reconfiguration, checkpointing, etc.
- Improving the management interface.
  - Eclipse parallel tools
- Development vs. research?
- Long-term vs. short-term?
17. FY06 Issues: Performance Instrumentation
- Processor chips and systems are becoming more difficult to understand.
  - Multi-issue, out-of-order processors with memory parallelism are difficult enough.
  - New chips/systems will have hardware multithreading (shared pipes and other CPU resources), multiple cores, more complex memory systems, other shared resources.
- Hardware instrumentation will necessarily support
  - Activity views: the performance story of one thread, where a thread may visit multiple resources at hardware speeds.
  - Resource views: measure the cost of contention for shared resources, attribute the costs in a useful way to activities.
  - Tracing: time series of activity of one activity or resource.
  - Profiling: spatial view of classes of activities and resources.
- Vendors will not implement instrumentation unless there's a business case.
  - Important customers need to demonstrate demand.
  - Software needed to justify hardware investment.
  - Counters Workshop at HPCA-11 is a step.
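The distinction between activity views, resource views, tracing, and profiling can be made concrete with a toy event log. This is only a sketch: the `Event` record, its fields, and the sample values are hypothetical and made up for illustration, not measurements or an existing instrumentation API.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """One hypothetical instrumentation sample."""
    time: int       # timestamp in cycles
    activity: str   # e.g. a thread id
    resource: str   # shared resource the activity touched
    stall: int      # cycles lost to contention on that resource


# Made-up samples for illustration only; not measurements.
EVENTS = [
    Event(0, "t0", "L2", 4),
    Event(5, "t1", "L2", 9),
    Event(7, "t0", "memctl", 30),
    Event(12, "t1", "memctl", 12),
    Event(20, "t0", "L2", 2),
]


def trace(events, *, activity=None, resource=None):
    """Tracing: time-ordered series for one activity or one resource."""
    picked = [e for e in events
              if (activity is None or e.activity == activity)
              and (resource is None or e.resource == resource)]
    return sorted(picked, key=lambda e: e.time)


def profile(events, key):
    """Profiling: total stall cycles aggregated over a class (no time axis)."""
    totals = defaultdict(int)
    for e in events:
        totals[key(e)] += e.stall
    return dict(totals)


# Activity view: the story of thread t0 across the resources it visits.
activity_view = trace(EVENTS, activity="t0")

# Resource view: contention cost per resource, attributed to activities.
resource_view = profile(EVENTS, key=lambda e: (e.resource, e.activity))

print([(e.time, e.resource, e.stall) for e in activity_view])
print(resource_view)
print(profile(EVENTS, key=lambda e: e.resource))
```

The same event stream serves all four uses; what changes is the selection key (one activity or one resource) and whether the time axis is kept (tracing) or summed away (profiling).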