Crystal Ball Panel - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Crystal Ball Panel


1
Crystal Ball Panel
ORNL Heterogeneous Distributed Computing Research
Al Geist, ORNL, March 6, 2003
2
Look into the Future
Federated Tera-clusters
Petascale systems
Adaptable software
HPC Linux
Fault Tolerance
High performance I/O
[Figure: Magic Eight Ball reading "Reply Hazy, Try Again"]
3
Scalable Systems Software for Terascale Centers
Industry: IBM, Cray, Intel, Unlimited Scale
Labs: ORNL, ANL, LBNL, PNNL, SNL, LANL, Ames
NSF centers: NCSA, PSC, SDSC
Collectively (with labs, NSF centers, and industry) define standard interfaces between systems components for interoperability
Goal
Create scalable, standardized management tools for efficiently running our large computing centers
Part of the DOE SciDAC effort
www.scidac.org/ScalableSystems
4
Progress so far on Integrated Suite
Working Components and Interfaces (shown in bold on the original slide)
Grid Interfaces
Meta Scheduler
Meta Monitor
Meta Manager
Meta Services
Accounting
Scheduler
System & Job Monitor
Node State Manager
Service Directory
Standard XML interfaces
Node Configuration & Build Manager
authentication communication
Event Manager
Important!
Allocation Management
Usage Reports
Validation & Testing
Process Manager
Job Queue Manager
Components written in any mixture of C, C++, Java, Perl, and Python
Hardware Infrastructure Manager
Checkpoint / Restart
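The slide above describes components written in different languages that talk through standard XML interfaces. The following is a minimal sketch of that idea, not the project's actual schema: the element names (request, field), the attribute names, and the "JobQueueManager" / "submit" values are all hypothetical, chosen only to show how a language-neutral XML message lets a Python component interoperate with one written in C or Java.

```python
import xml.etree.ElementTree as ET


def build_request(component: str, action: str, fields: dict) -> bytes:
    """Serialize a request as XML bytes.

    The <request>/<field> layout is a hypothetical illustration,
    not the real interface definition used by the suite.
    """
    root = ET.Element("request", {"component": component, "action": action})
    for name, value in fields.items():
        child = ET.SubElement(root, "field", {"name": name})
        child.text = str(value)
    return ET.tostring(root, encoding="utf-8")


def parse_request(payload: bytes) -> tuple:
    """Decode a request; the sender could be written in any language."""
    root = ET.fromstring(payload)
    fields = {f.get("name"): f.text for f in root.findall("field")}
    return root.get("component"), root.get("action"), fields


if __name__ == "__main__":
    # A scheduler-like client asks a (hypothetical) queue manager to submit a job.
    wire = build_request("JobQueueManager", "submit", {"user": "demo", "nodes": 64})
    print(wire.decode("utf-8"))
    print(parse_request(wire))
```

Because only the wire format is shared, each component stays free to choose its own implementation language. Note that values come back as text after parsing, so a receiver has to convert numeric fields itself.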
5
Underneath it all
Rogue OS and/or daemons cited as a problem by existing computer centers
Scalable High Performance OS
Single System Image
Adaptive O/S
Asymmetric Kernels
A scalable file system
What will it be?
Linux
Lightweight kernel (like Red, BG/L)
Scyld approach
Other?
Fast-OS effort
6
Scale up and Fall Down
Fault Tolerance serious issue when scaling to 100 TF and beyond
RAS critical
Checkpointing eventually becomes ineffective
Need a Fault Tolerance Overhaul
Needs
Adaptive runtime
MPI Fault Tolerance
New FT paradigms
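One way to see why checkpointing eventually becomes ineffective is a back-of-the-envelope model. The sketch below is not from the talk. It assumes independent node failures (so system MTBF is node MTBF divided by node count) and uses Young's first-order approximation for the checkpoint interval, tau = sqrt(2 * C * M), where C is the time to write a checkpoint and M is the system MTBF. Under that model the fraction of time lost to checkpoints and rework is roughly sqrt(2 * C / M). The node MTBF and checkpoint time below are illustrative assumptions, not measurements.

```python
import math


def system_mtbf_hours(node_mtbf_hours: float, nodes: int) -> float:
    """System MTBF, assuming independent, identically reliable nodes."""
    return node_mtbf_hours / nodes


def young_interval_hours(checkpoint_hours: float, mtbf_hours: float) -> float:
    """Young's first-order estimate of the best time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_hours * mtbf_hours)


def wasted_fraction(checkpoint_hours: float, mtbf_hours: float) -> float:
    """Approximate share of wall-clock time lost to checkpoints plus rework.

    First-order model only; capped at 1.0 because the approximation
    breaks down once checkpoint time approaches the MTBF.
    """
    return min(1.0, math.sqrt(2.0 * checkpoint_hours / mtbf_hours))


if __name__ == "__main__":
    NODE_MTBF_HOURS = 5 * 365 * 24   # assumed: one failure per node every 5 years
    CHECKPOINT_HOURS = 10 / 60       # assumed: 10 minutes to write a checkpoint
    for nodes in (1_000, 10_000, 100_000):
        mtbf = system_mtbf_hours(NODE_MTBF_HOURS, nodes)
        tau = young_interval_hours(CHECKPOINT_HOURS, mtbf)
        waste = wasted_fraction(CHECKPOINT_HOURS, mtbf)
        print(f"{nodes:>7} nodes: MTBF {mtbf:7.2f} h, "
              f"interval {tau:5.2f} h, wasted {waste:.0%}")
```

With a fixed checkpoint cost, the wasted fraction grows with the square root of the node count, which is the scaling pressure behind the call for new fault tolerance paradigms.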
7
General Purpose vs Simple and Custom
Software
Minimum OS w/ High performance but limited app support
Full OS
Tuned to hardware
adapt on the fly
Autonomic algorithms
Hardware
Customized clusters for each group
Centralized general purpose machine
Internet in a box
Or out of the box
8
Big Science
The final word - don't lose track of why we justify petascale systems
Science will ultimately be driven by computation, simulation and modeling.
Science drivers are key to success in HPC and vice versa