Palacios and Kitten: New High Performance Operating Systems For Scalable Virtualized and Native Supercomputing - PowerPoint PPT Presentation


1
Palacios and Kitten: New High Performance
Operating Systems For Scalable Virtualized and
Native Supercomputing
  • John R. Lange and Kevin Pedretti
  • Trammell Hudson, Peter Dinda,
  • Zheng Cui, Lei Xia,
  • Patrick Bridges, Andy Gocke,
  • Steven Jaconette,
  • Mike Levenhagen and Ron Brightwell
  • Northwestern University
  • Sandia National Labs
  • University of New Mexico

2
Summary
  • Palacios
      • First VMM for scalable HPC
      • Open source and available
  • Kitten
      • First open source Lightweight Kernel (LWK) for High
        Performance Computing (HPC)
      • Open source and available
  • Proved HPC virtualization is effective at scale
      • Performance within 5% of native
      • Largest scale study of virtualization

3
What is a virtual machine?
  • Run an OS as an application
  • Run multiple OS environments on a single machine
  • Start, stop, pause
  • Can easily move entire OS environments

[Figure: a native stack (Application / OS / Hardware) beside a virtualized stack in which several Application / Guest OS pairs run on a Host OS/VMM that emulates the hardware; both stacks label page tables and CPU state.]
4
What are VMMs currently used for?
  • Server Consolidation
  • Fault tolerance
  • Legacy application support
  • Debugging
  • Isolation
  • Virtual appliances
  • Failover and disaster recovery
  • Market size
      • 2007: $5.5 billion
      • 2011: $11.7 billion

[Chart values: 7.58 billion, 16.70 billion]
5
High Performance Computing (HPC)
  • Large scale simulations to solve Big Problems

6
Virtualization in HPC
  • Fault tolerance
      • Red Storm MTBI target: 50 hours
      • Red Storm Min TTR: 30 minutes to 1 hour
  • Broader usage
      • Allow applications to select the best OS
  • Only if it doesn't degrade performance
      • Tightly coupled parallel applications
      • Very large scale

A. B. Nagarajan, F. Mueller, C. Engelmann, and S. L. Scott,
"Proactive Fault Tolerance for HPC with Xen Virtualization,"
ICS 2007.
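The fault-tolerance targets above translate into machine availability. Under a simple renewal model (the machine runs for MTBI hours, then spends TTR hours recovering, with no overlap between events), availability is MTBI / (MTBI + TTR). The sketch below applies that textbook model to the Red Storm figures; the model and the `availability` helper are assumptions for illustration, not Sandia's own accounting.

```python
def availability(mtbi_hours: float, ttr_hours: float) -> float:
    """Fraction of wall-clock time the machine is up, assuming a simple
    renewal model: run for MTBI hours, then spend TTR hours recovering."""
    return mtbi_hours / (mtbi_hours + ttr_hours)


if __name__ == "__main__":
    mtbi = 50.0               # Red Storm MTBI target from the slide (hours)
    for ttr in (0.5, 1.0):    # 30 minutes and 1 hour
        print(f"TTR {ttr:.1f} h -> availability {availability(mtbi, ttr):.4f}")
```

Under these assumptions roughly 1% to 2% of machine time goes to recovery. The model does not count work lost since the last checkpoint.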
7
Palacios VMM
  • OS-independent embeddable virtual machine monitor
  • Developed at Northwestern University and the University of New
    Mexico
  • Open source and freely available
      • Downloaded over 1000 times as of July 2009
  • Users
      • Kitten: lightweight supercomputing OS from Sandia
        National Labs
      • MINIX 3
      • Modified Linux versions
  • Successfully used on supercomputers, clusters
    (Infiniband and Ethernet), and servers

http://www.v3vee.org/palacios
8
Palacios as an HPC VMM
  • Minimalist interface
      • Suitable for an LWK
  • Compile and runtime configurability
      • Create a VMM tailored to specific environments
  • Low noise
  • Contiguous memory pre-allocation
  • Passthrough resources and resource partitioning
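With contiguous pre-allocation, translating a guest physical address to a host physical address can reduce to a bounds check and an add. A minimal sketch, assuming a single contiguous host region and 4 KiB pages; `GuestMemory`, `gpa_to_hpa`, and the example base address are hypothetical and are not Palacios's API.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed 4 KiB pages


@dataclass(frozen=True)
class GuestMemory:
    """Hypothetical descriptor: the guest's entire physical memory is one
    contiguous host region reserved before the guest boots."""
    host_base: int  # host physical address backing guest physical address 0
    size: int       # bytes of guest physical memory

    def gpa_to_hpa(self, gpa: int) -> int:
        """Guest physical -> host physical is one bounds check and one add;
        no per-page table lookup and no allocation on first touch."""
        if not 0 <= gpa < self.size:
            raise ValueError(f"guest physical address {gpa:#x} out of range")
        return self.host_base + gpa

    @property
    def pages(self) -> int:
        """Number of pages backed up front."""
        return self.size // PAGE_SIZE


if __name__ == "__main__":
    mem = GuestMemory(host_base=0x4000_0000, size=256 * 1024 * 1024)  # 256 MiB
    print(hex(mem.gpa_to_hpa(0x1234)), mem.pages)  # 0x40001234 65536
```

One plausible link to "low noise": because nothing is allocated or looked up when the guest first touches a page, this scheme adds no per-page work that could surface as jitter in a tightly coupled application.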

9
Lightweight Kernel Timeline
  • 1991: Sandia/UNM OS (SUNMOS), nCube-2
  • 1991: Linux 0.02
  • 1993: SUNMOS ported to Intel Paragon (1800 nodes)
  • 1993: SUNMOS experience used to design Puma; first
    implementation of the Portals communication architecture
  • 1994: Linux 1.0
  • 1995: Puma ported to ASCI Red (4700 nodes); renamed
    Cougar, productized by Intel
  • 1997: Stripped-down Linux used on Cplant (2000 nodes);
    difficult to port Puma to COTS Alpha server; included
    Portals API
  • 2002: Cougar ported to ASC Red Storm (13000 nodes);
    renamed Catamount, productized by Cray; host- and
    NIC-based Portals implementations
  • 2004: IBM develops LWK (CNK) for BG/L/P (106000 nodes)
  • 2005: IBM and ETI develop LWK (C64) for Cyclops64
    (160 cores/die)
10
Kitten: An Open Source LWK
  • Better match for user expectations
      • Provides mostly Linux-compatible user environment
          • Including threading
      • Supports unmodified compiler toolchains and ELF
        executables
  • Better match for vendor expectations
      • Modern code base with familiar Linux-like
        organization
      • Drop-in compatible with Linux
      • Infiniband support
  • End goal is deployment on a future capability system

http://software.sandia.gov/trac/kitten
11
Complexity
  • Scalable HPC performance requires minimal
    overhead

Component   Lines of code
Kitten      33,000
Palacios    28,000
Total       61,000

Xen         580k (50k-80k core)
KVM         50k-60k, plus kernel dependencies (??) and
            user-level devices (180k)
12
HPC Performance Evaluation
  • Virtualization is very useful for HPC, but
      • Only if it doesn't hurt performance
  • Virtualized Red Storm with Palacios
      • Evaluated with Sandia's system evaluation
        benchmarks

17th fastest supercomputer: Cray XT3, 38,208 cores,
3,500 sq ft, 2.5 megawatts, $90 million
13
Virtualized Performance (Catamount)
Within 5% of native
Scalable
HPCCG conjugate gradient solver
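HPCCG exercises the conjugate gradient (CG) method. For readers unfamiliar with CG, here is a minimal dense-matrix sketch of the unpreconditioned algorithm in plain Python. It illustrates the method only and is not the HPCCG code, which works on a large sparse problem distributed across nodes.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Unpreconditioned CG for a symmetric positive-definite matrix A."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x, with the initial guess x = 0
    p = list(r)          # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old               # keeps search directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))   # approximately [1/11, 7/11]
```

In a distributed run, each `dot` becomes a global reduction across all nodes, which is one reason CG-style codes are sensitive to OS and VMM overhead at scale.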
14
Comparison of Operating Systems
Shadow Paging
Catamount
Compute Node Linux
HPCCG conjugate gradient solver
15
Comparison of Operating Systems
Catamount
Compute Node Linux
CTH multi-material, large deformation, strong
shockwave simulation
16
Large Scale Study
  • Evaluation on full Red Storm system
      • 12 hours of dedicated system time on full machine
      • Largest virtualization performance scaling study
        to date
  • Measured performance at exponentially increasing
    scales
      • Up to 4096 nodes
  • Publicity
      • New York Times
      • Slashdot
      • HPCWire
      • Communications of the ACM
      • PC World
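Measuring at exponentially increasing scales means doubling the node count for each run. Assuming the sweep starts at a single node (the slides do not say), reaching 4096 nodes takes 13 runs. The sketch below generates such a sweep and computes a weak-scaling efficiency from per-scale runtimes; the weak-scaling assumption (fixed work per node) and both function names are illustrative, not taken from the study.

```python
def doubling_scales(max_nodes: int, start: int = 1) -> list[int]:
    """Node counts start, 2*start, 4*start, ... not exceeding max_nodes."""
    scales = []
    n = start
    while n <= max_nodes:
        scales.append(n)
        n *= 2
    return scales


def weak_scaling_efficiency(runtimes: dict[int, float]) -> dict[int, float]:
    """Map node count -> efficiency relative to the smallest run.

    Assumes weak scaling (fixed work per node), so the ideal runtime is
    constant and efficiency is simply t_smallest / t_n (1.0 is perfect)."""
    base = runtimes[min(runtimes)]
    return {n: base / t for n, t in sorted(runtimes.items())}


if __name__ == "__main__":
    scales = doubling_scales(4096)
    print(len(scales), scales)   # 13 node counts, from 1 up to 4096
```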

17
Scalability at Large Scale (Catamount)
Within 3% of native
Scalable
CTH multi-material, large deformation, strong
shockwave simulation
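Figures such as "within 3%" and "within 5%" compare a virtualized run against the same run on native hardware. A minimal sketch of that comparison, assuming runtime (lower is better) as the metric; the slides do not say exactly how the percentage was computed, and the helper names and timings are hypothetical.

```python
def relative_overhead(native_time: float, virtual_time: float) -> float:
    """Slowdown of a virtualized run relative to native, as a fraction
    (0.03 means the virtualized run took 3% longer)."""
    return (virtual_time - native_time) / native_time


def within_bound(native_time: float, virtual_time: float, bound: float = 0.05) -> bool:
    """True if the virtualized run is at most `bound` slower than native."""
    return relative_overhead(native_time, virtual_time) <= bound


if __name__ == "__main__":
    # Hypothetical timings in seconds, not measurements from the study.
    print(f"{relative_overhead(100.0, 103.0):.1%}")   # 3.0%
```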
18
Commodity Systems
  • Kitten and Palacios fully support commodity
    systems
      • Infiniband clusters
      • Ethernet servers
      • Generic PC hardware
  • Palacios embeddable in many OSes
      • Kitten
      • MINIX 3
      • Linux
      • GeekOS

19
Infiniband on Commodity Linux
(Linux guest on IB cluster)
2-node Infiniband ping-pong bandwidth measurement
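In a ping-pong test, one node sends a message and the peer echoes it back; bandwidth is derived from the measured round trips. A minimal sketch of the arithmetic, assuming the common convention of reporting one-way bandwidth as message size divided by half the round-trip time. The slide does not state which convention its plot uses, and `pingpong_bandwidth` is a hypothetical helper, not part of any benchmark suite.

```python
def pingpong_bandwidth(message_bytes: int, iterations: int, elapsed_s: float) -> float:
    """One-way bandwidth (bytes/s) from a ping-pong run.

    Each iteration sends the message A -> B and B echoes it back, so the
    elapsed time covers 2 * iterations one-way transfers."""
    one_way_s = elapsed_s / (2 * iterations)
    return message_bytes / one_way_s


if __name__ == "__main__":
    # Hypothetical timing: 100 round trips of a 1 MB message in 0.2 s.
    bw = pingpong_bandwidth(1_000_000, 100, 0.2)
    print(f"{bw / 1e6:.0f} MB/s")
```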
20
Summary
  • Virtualization can scale
      • Near-native performance for optimized VMM/guest
        (within 5%)
  • VMM needs to know about guest internals
      • Should modify behavior for each guest environment
      • Example: paging method to use depends on guest
  • Black-box inference is not desirable in HPC
    environment
      • Unacceptable performance overhead
      • Convergence time
      • Mistakes have large consequences
  • Need guest cooperation
      • Guest and VMM relationship should be symbiotic
  • Paper forthcoming (4096-node scaling results and
    techniques)
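The paging example concerns how a VMM translates guest memory references. With nested paging, the hardware walks the guest page table (guest virtual to guest physical) and then the VMM's table (guest physical to host physical) on each TLB miss. With shadow paging, the VMM precomputes the composition into a shadow table that the hardware uses directly, but must resynchronize it whenever the guest edits its own page tables. The toy sketch below models single-level page tables as dictionaries of page numbers to show that composition; it is a conceptual illustration, not Palacios code, and all names and mappings are hypothetical.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def build_shadow(guest_pt: dict[int, int], p2m: dict[int, int]) -> dict[int, int]:
    """Compose the guest page table (guest-virtual page -> guest-physical page)
    with the VMM's map (guest-physical page -> host-physical page) into a
    shadow table (guest-virtual page -> host-physical page). Guest pages with
    no backing host page are left out (treated as not present)."""
    return {gvpn: p2m[gppn] for gvpn, gppn in guest_pt.items() if gppn in p2m}


def nested_translate(gva: int, guest_pt: dict[int, int], p2m: dict[int, int]) -> int:
    """Two-stage lookup, analogous to the walk nested paging hardware does on a TLB miss."""
    gppn = guest_pt[gva >> PAGE_SHIFT]
    return (p2m[gppn] << PAGE_SHIFT) | (gva & OFFSET_MASK)


def shadow_translate(gva: int, shadow: dict[int, int]) -> int:
    """Single lookup in the precomputed shadow table."""
    return (shadow[gva >> PAGE_SHIFT] << PAGE_SHIFT) | (gva & OFFSET_MASK)


if __name__ == "__main__":
    guest_pt = {0x10: 0x2, 0x11: 0x3}   # hypothetical guest mappings
    p2m = {0x2: 0x802, 0x3: 0x9A0}      # hypothetical VMM mappings
    shadow = build_shadow(guest_pt, p2m)
    gva = 0x10ABC
    assert nested_translate(gva, guest_pt, p2m) == shadow_translate(gva, shadow)
    print(hex(shadow_translate(gva, shadow)))   # 0x802abc
```

If the guest later remaps page 0x10, the shadow entry is stale until the VMM rebuilds it. Roughly, shadow paging pays for guests that modify page tables often, while nested paging pays a longer walk on every TLB miss, which is why the better choice depends on the guest.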

21
Future Work
  • Continue exploring virtualization in HPC
      • NU, UNM and SNL collaboration
  • Granted 5 million hours on Jaguar
      • Current fastest supercomputer in the world

Oak Ridge National Labs Cray XT5: 224,256 cores,
4,352 sq ft, 6.95 megawatts, $104 million
22
Conclusion
  • Palacios and Kitten
      • Two open source tools for HPC
      • Proved virtualization of HPC systems can scale
  • Contributions welcome!
      • http://www.v3vee.org
      • http://software.sandia.gov/trac/kitten