NPACI Rocks - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: NPACI Rocks


1
NPACI Rocks
  • Mason Katz
  • San Diego Supercomputer Center

2
Who is NPACI Rocks?
  • Cluster Computing Group at SDSC
  • UC Berkeley Millennium Project
  • Provide Ganglia Support
  • Linux Competency Centre in SCS Enterprise Systems
    Pte Ltd in Singapore
  • Provide PVFS Support
  • Working on user documentation for Rocks 2.2

3
What sets us apart
  • Fully automated cluster deployment
  • Get and burn ISO CD image from Rocks.npaci.edu
  • Fill out a form to build the initial kickstart file
    for your first front-end machine
  • Kickstart the naked frontend with the CD and
    kickstart file
  • Reboot the frontend machine
  • Integrate compute nodes with insert-ethers
  • Ready to go!
  • Complete out-of-the-box solution with rational
    default settings

4
Who is Using It?
  • Growing (and partial) list of users that we know
    about
  • SDSC, SIO, UCSD (8 Clusters, including CMS
    (GriPhyN) prototype)
  • Caltech
  • Burnham Cancer Institute
  • PNNL (several clusters, small, medium, large)
  • University of Texas
  • University of North Texas
  • Northwestern University
  • University of Hong Kong
  • Compaq (Working relationship with their Intel
    Standard Servers Group)
  • Singapore Bioinformatics Institute
  • Myricom (Their internal development cluster)
  • Cray (partnered with DELL)

5
Motivation
  • Care and feeding for a system isn't fun.
  • Enable non-cluster experts to run clusters
  • Essential to track software updates
  • Open source moves fast!
  • On the order of 3 updates a week for Red Hat
  • Essential to track Red Hat releases
  • Feature rot
  • Unplugged security holes
  • Run on heterogeneous, standard high-volume
    components

6
Philosophy
  • All nodes are 100% automatically installed
  • Zero hand configuration
  • Scales very well
  • NPACI Rocks is an entire cluster-aware
    distribution
  • Included packages
  • Full Red Hat release
  • De-facto standard cluster packages (MPI,
    PBS/Maui, etc.)
  • NPACI Rocks packages

7
More Philosophy
  • Use installation as common mechanism to manage
    cluster
  • Install when
  • Initial bring-up
  • Replacing a dead node
  • Adding new nodes
  • Also use installation to keep software consistent
  • If you catch yourself wondering if a node's
    software is up-to-date, reinstall!
  • In 10 minutes, all doubt is erased.

8
Basic Architecture
(Diagram: cluster hardware layout)
  • Front-end Node(s)
  • Public Ethernet
  • Fast-Ethernet Switching Complex
  • Gigabit Network Switching Complex
  • Power Distribution (net-addressable units as an option)
9
Major Components
10
Configuration Derived from Database
(Diagram) Automated node discovery: insert-ethers registers Node 0
through Node N in the mySQL DB; makehosts, makedhcp, and
pbs-config-sql then generate /etc/hosts, /etc/dhcpd.conf, and the
PBS node list from that database (a small sketch of this generation
step follows).
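
A minimal, hypothetical sketch of that generation step (not the real
makehosts): render /etc/hosts entries from rows that would come from the
nodes table. Table columns, names, and addresses below are invented for
illustration.

# Hypothetical sketch only, not the real Rocks makehosts: derive /etc/hosts
# entries from node records.  The tuples stand in for rows fetched from the
# cluster's mySQL nodes table; names, IPs, and MACs are made up.
nodes = [
    # (name, ip, mac)
    ("frontend-0-0", "10.1.1.1", "00:50:56:aa:bb:01"),
    ("compute-0-0", "10.255.255.254", "00:50:56:aa:bb:02"),
    ("compute-0-1", "10.255.255.253", "00:50:56:aa:bb:03"),
]

def hosts_file(nodes):
    """One /etc/hosts line per node: IP address followed by the node name."""
    lines = ["127.0.0.1\tlocalhost.localdomain\tlocalhost"]
    for name, ip, _mac in nodes:
        lines.append(f"{ip}\t{name}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(hosts_file(nodes))  # makedhcp and pbs-config-sql work analogously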
11
Key Tables - Nodes
12
Software Installation
(Diagram: a distribution pairs RPMs, the collection of all possible
software packages, with a Kickstart file, the descriptive information
used to configure a node; appliances built from it include Compute
Node, IO Server, and Web Server.)
13
Software Repository
(Same diagram as slide 12.)
14
Rocks-dist
  • Distribution builder
  • Rocks and Red Hat distributions are the same thing
    to rocks-dist
  • Version Manager
  • Resolves software updates (a small sketch follows
    this list)
  • Defaults to the most recent software
  • Can force package versions as needed
  • Distribution versioning
  • Allows multiple distributions at once
  • CDROM building
  • Build your own bootable Rocks CD
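
A rough, illustrative sketch of that version-manager behavior (this is
not rocks-dist code; real RPM ordering uses rpmlib's
epoch/version/release rules, and the package names and versions below
are only examples):

import re

def version_key(version):
    # Naive comparison key: compare the numeric runs in the version string.
    return [int(n) for n in re.findall(r"\d+", version)]

def resolve(packages, pins=None):
    """Pick one version per package: a pinned (forced) version wins,
    otherwise the most recent version seen in the distribution."""
    pins = pins or {}
    picked = {}
    for name, version in packages:
        if name in pins:
            picked[name] = pins[name]
        elif name not in picked or version_key(version) > version_key(picked[name]):
            picked[name] = version
    return picked

pool = [("openssh", "2.9p2"), ("openssh", "3.1p1"), ("kernel", "2.4.9")]
print(resolve(pool))                             # openssh -> 3.1p1 (newest)
print(resolve(pool, pins={"openssh": "2.9p2"}))  # openssh forced back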

15
How we use rocks-dist
16
How you use rocks-dist
17
Inheritance
  • Rocks
  • Red Hat plus updates
  • Rocks software
  • Campus
  • Rocks software
  • Campus changes
  • Cluster
  • Campus Rocks (layering sketched after this list)
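
A rough picture of this inheritance, assuming each level is simply a
package set overlaid on its parent (illustrative only; the package
names and versions are invented):

def overlay(parent, local):
    # A child distribution starts from its parent and overlays its own
    # packages, so later layers shadow earlier ones.
    merged = dict(parent)
    merged.update(local)
    return merged

red_hat = {"kernel": "2.4.9", "openssh": "2.9p2"}      # Red Hat plus updates
rocks = overlay(red_hat, {"openssh": "3.1p1"})         # Rocks software
campus = overlay(rocks, {"site-auth-config": "1.0"})   # campus changes
cluster = overlay(campus, {"myrinet-gm": "1.5"})       # cluster-specific additions
print(cluster)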

18
Installation Instructions
(Same diagram as slide 12.)
19
Kickstart
  • Description based installation
  • Manage software components not the bits on the
    disk
  • Only way to deal with heterogeneous hardware
  • System imaging (aka bit blasting) relies on
    homogeneity
  • Homogeneous clusters do not exist
  • Red Hat's Kickstart
  • Flat ASCII file
  • No macro language
  • Requires forking based on site information and
    node type
  • Rocks XML Kickstart
  • Decompose a kickstart file into nodes and graphs
  • Macros and SQL for site configuration
  • Driven from a web CGI script

20
XML Kickstart
  • Nodes
  • Describe a single set of functionality
  • Ssh
  • Apache
  • Kickstart file snippets (XML tags map to
    kickstart commands)
  • Pull site configuration from SQL Database
  • Over 80 node files in Rocks
  • Graph
  • Defines interconnections for nodes
  • Think OOP or dependencies
  • A single graph file in Rocks
  • Graph + Nodes + SQL -> Node-specific kickstart
    file (see the traversal sketch after the sample
    graph file)

21
Sample Node File
<?xml version="1.0" standalone="no"?>
<!DOCTYPE kickstart SYSTEM "@KICKSTART_DTD@" [<!ENTITY ssh "openssh">]>
<kickstart>
<description>
Enable SSH
</description>
<package>&ssh;</package>
<package>&ssh;-clients</package>
<package>&ssh;-server</package>
<package>&ssh;-askpass</package>
<post>
cat > /etc/ssh/ssh_config << 'EOF'  <!-- default client setup -->
Host *
        ForwardX11 yes
        ForwardAgent yes
EOF
chmod o+rx /root
mkdir /root/.ssh
chmod o+rx /root/.ssh
</post>
</kickstart>
22
Sample Graph File
<?xml version="1.0" standalone="no"?>
<!DOCTYPE kickstart SYSTEM "@GRAPH_DTD@">
<graph>
<description>
Default Graph for NPACI Rocks.
</description>
<edge from="base" to="scripting"/>
<edge from="base" to="ssh"/>
<edge from="base" to="ssl"/>
<edge from="base" to="lilo" arch="i386"/>
<edge from="base" to="elilo" arch="ia64"/>
<edge from="node" to="base" weight="80"/>
<edge from="node" to="accounting"/>
<edge from="slave-node" to="node"/>
<edge from="slave-node" to="nis-client"/>
<edge from="slave-node" to="autofs-client"/>
<edge from="slave-node" to="dhcp-client"/>
<edge from="slave-node" to="snmp-server"/>
<edge from="slave-node" to="node-certs"/>
<edge from="compute" to="slave-node"/>
<edge from="compute" to="usher-server"/>
<edge from="master-node" to="node"/>
<edge from="master-node" to="x11"/>
<edge from="master-node" to="usher-client"/>
</graph>
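
To make the "Graph + Nodes + SQL -> node-specific kickstart file" step
concrete, here is a hedged sketch, not the actual Rocks kickstart CGI:
walk the graph above from an appliance name, follow arch-annotated edges
only when they match the target architecture (the conditional
inheritance of a later slide), and collect the node files whose XML
snippets would be assembled into the kickstart file. Macro expansion and
the SQL lookups are omitted.

# Hedged illustration only: walk a Rocks-style configuration graph from an
# appliance (e.g. "compute") and collect the node files it reaches.  Edges
# carrying an arch attribute are followed only for that architecture.
EDGES = [
    # (from, to, arch or None) -- a subset of the sample graph above
    ("base", "scripting", None), ("base", "ssh", None), ("base", "ssl", None),
    ("base", "lilo", "i386"), ("base", "elilo", "ia64"),
    ("node", "base", None), ("node", "accounting", None),
    ("slave-node", "node", None), ("slave-node", "nis-client", None),
    ("slave-node", "autofs-client", None), ("slave-node", "dhcp-client", None),
    ("slave-node", "snmp-server", None), ("slave-node", "node-certs", None),
    ("compute", "slave-node", None), ("compute", "usher-server", None),
]

def reachable_nodes(appliance, arch, edges):
    """Depth-first walk: every node file reachable from the appliance."""
    collected, stack, seen = [], [appliance], set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        collected.append(name)
        for src, dst, edge_arch in edges:
            if src == name and edge_arch in (None, arch):
                stack.append(dst)
    return collected

# An i386 compute node pulls in lilo but not elilo.
print(reachable_nodes("compute", "i386", EDGES))
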
23
Kickstart framework
24
Composition
  • Aggregate Functionality: Scripting
  • IsA perl-development
  • IsA python-development
  • IsA tcl-development

25
Minor Differences
  • Specify only the deltas
  • Desktop IsA
  • Standalone
  • Laptop IsA
  • Standalone
  • Pcmcia

26
Architecture
  • Conditional inheritance
  • Annotate edges with target architectures

27
Payoff: Never-before-seen hardware
  • Dual Athlon, white box, 20 GB IDE, 3Com Ethernet
  • 3:00 PM: In cardboard box
  • Shook out the loose screws
  • Dropped in a Myrinet card
  • Inserted it into cabinet 0
  • Cabled it up
  • 3:25 PM: Inserted the NPACI Rocks CD
  • Ran insert-ethers (assigned node name
    compute-0-24)
  • 3:40 PM: Ran Linpack

28
Futures
  • Improve Monitoring, debugging, self-diagnosis of
    cluster-specific software
  • Improve documentation!
  • Continue tracking Red Hat updates/releases
  • Prepare for Infiniband Interconnect
  • Global file systems, I/O is an Achilles heel of
    clusters
  • Grid Tools (Development and Testing)
  • Globus
  • Grid research tools (APST)
  • GridPort toolkit
  • Integration with other SDSC projects
  • SRB
  • MiX - data mediation
  • Visualization Cluster - Display Wall

29
Summary
  • Rocks significantly lowers the bar for users to
    deploy usable compute clusters
  • Very simple hardware assumptions
  • XML module descriptions allow encapsulation
  • Graph interconnection allows appliances to share
    configuration
  • Deltas among appliances are easily visualized
  • HTTP transport is scalable in
  • Performance
  • Distance