Transcript and Presenter's Notes

Title: Leveraging Standard Core Technologies to Programmatically Build Linux Cluster Appliances


1
Leveraging Standard Core Technologies to
Programmatically Build Linux Cluster Appliances
  • University of Zurich
  • May 5, 2003

2
Outline
  • Problem definition
  • What is so hard about clusters?
  • Distinction between
  • Software Packages (bits)
  • System Configuration (functionality and state)
  • Programmatic software installation with
  • XML, SQL, HTTP, Kickstart
  • Future Work

3
Build this cluster
  • Build a 128 node cluster
  • Known configuration
  • Consistent configuration
  • Repeatable configuration
  • Do this in an afternoon
  • Problems
  • How to install software?
  • How to configure software?
  • We manage clusters with (re)installation
  • So we care a lot about this problem
  • Other strategies still must solve this

4
The Myth of the Homogeneous COTS Cluster
  • Hardware is not homogeneous
  • Different chipset revisions
  • Chipset of the day (e.g. Linksys Ethernet cards)
  • Different disk sizes (e.g. changing sector sizes)
  • Vendors do not know this is happening!
  • Entropy happens
  • Hardware components fail
  • Cannot replace with the same components past a
    single Moore cycle
  • A Cluster is not just compute nodes (appliances)
  • Fileserver Nodes
  • Management Nodes
  • Login Nodes

5
What Heterogeneity Means
  • Hardware
  • Cannot blindly replicate machine software
  • AKA system imaging / disk cloning
  • Requires patching the system after cloning
  • Need to manage system software at a higher level
  • Software
  • Subsets of a cluster have unique software
    configuration
  • One golden image cannot build a cluster
  • Multiple images replicate common configuration
  • Need to manage system software at a higher level

6
Description Based Software Installation
7
Packages vs. Configuration
(Figure: the distribution, i.e. the collection of all possible software packages (RPMs), combined with the descriptive information in a Kickstart file, yields the appliances: compute node, IO server, web server)
8
Software Packages
(Same figure as slide 7; this slide's focus is the software-package half, the RPM distribution)
9
System Configuration
(Same figure as slide 7; this slide's focus is the system-configuration half, the descriptive Kickstart file for each appliance)
10
What is a Kickstart File?
  • Setup & Packages (20%)
    cdrom
    zerombr yes
    bootloader --location=mbr --useLilo
    skipx
    auth --useshadow --enablemd5
    clearpart --all
    part /boot --size=128
    part swap --size=128
    part / --size=4096
    part /export --size=1 --grow
    lang en_US
    langsupport --default=en_US
    keyboard us
    mouse genericps/2
    timezone --utc GMT
    rootpw --iscrypted nrDq4Vb42jjQ.
    text
    install
  • Post Configuration (80%)
    %post
    cat > /etc/nsswitch.conf << 'EOF'
    passwd:     files
    shadow:     files
    group:      files
    hosts:      files dns
    bootparams: files
    ethers:     files
    EOF
    cat > /etc/ntp.conf << 'EOF'
    server ntp.ucsd.edu
    server 127.127.1.1
    fudge 127.127.1.1 stratum 10
    authenticate no
    driftfile /etc/ntp/drift
    EOF

11
Issues
  • High level description of software installation
  • List of packages (RPMs)
  • System configuration (network, disk, accounts, ...)
  • Post installation scripts
  • De facto standard for Linux
  • Single ASCII file
  • Simple, clean, and portable
  • Installer can handle simple hardware differences
  • Monolithic
  • No macro language (as of RedHat 7.3 this is
    changing)
  • Differences require forking (and code
    replication)
  • Cut-and-Paste is not a code re-use model

12
XML Kickstart
13
It looks something like this
14
Implementation
  • Nodes
  • Single purpose modules
  • Kickstart file snippets (XML tags map to
    kickstart commands)
  • Over 100 node files in Rocks
  • Graph
  • Defines interconnections for nodes
  • Think OOP or dependencies (class, include)
  • A single default graph in Rocks
  • Macros
  • SQL Database holds site and node specific state
  • Node files may contain <var name="state"/> tags (see the substitution sketch below)
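A minimal sketch (not the actual Rocks kpp implementation) of how such macro substitution can work: each <var name="..."/> tag in a node-file fragment is replaced with a value pulled from the database. The variable names and values below are hypothetical.

    import re

    # Stand-in for site/node-specific state rows from the cluster SQL database
    # (names and values here are hypothetical).
    site_state = {
        "Kickstart_PublicHostname": "frontend-0.example.org",
        "Kickstart_Timezone": "GMT",
    }

    def expand_vars(text, state):
        # Replace each <var name="X"/> with the database value for X.
        return re.sub(r'<var\s+name="([^"]+)"\s*/>',
                      lambda m: state[m.group(1)], text)

    fragment = '<post>\necho "server <var name="Kickstart_PublicHostname"/>" &gt; /etc/ntp.conf\n</post>'
    print(expand_vars(fragment, site_state))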

15
Composition
  • Aggregate Functionality
  • Scripting
  • IsA perl-development
  • IsA python-development
  • IsA tcl-development

16
Functional Differences
  • Specify only the deltas
  • Desktop IsA
  • Standalone
  • Laptop IsA
  • Standalone
  • Pcmcia

17
Architecture Differences
  • Conditional inheritance (see the sketch after this list)
  • Annotate edges with target architectures
  • if i386
  • Base IsA lilo
  • if ia64
  • Base IsA elilo
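The last three slides (composition, functional deltas, conditional inheritance) can be illustrated with a small traversal sketch. This is not the Rocks code: the edge list is a simplified subset of the default graph, and the traversal follows an arch-annotated edge only when it matches the machine being built.

    # Simplified subset of the default graph; arch=None means the edge
    # applies to every architecture.
    edges = [
        ("compute", "base", None),
        ("base", "scripting", None),
        ("base", "lilo", "i386"),
        ("base", "elilo", "ia64"),
        ("scripting", "perl-development", None),
        ("scripting", "python-development", None),
        ("scripting", "tcl-development", None),
    ]

    def traverse(start, arch):
        # Collect every node reachable from 'start' for the given arch.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            for frm, to, edge_arch in edges:
                if frm == node and edge_arch in (None, arch):
                    stack.append(to)
        return seen

    print(traverse("compute", "i386"))   # includes lilo, never elilo
    print(traverse("compute", "ia64"))   # includes elilo, never lilo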

18
Putting it all together
- Complete Appliances (compute, NFS, frontend, desktop, ...)
- Some key shared configuration nodes
(slave-node, node, base)
19
Sample Node File
lt?xml version"1.0" standalone"no"?gt lt!DOCTYPE
kickstart SYSTEM "_at_KICKSTART_DTD_at_" lt!ENTITY ssh
"openssh"gtgt ltkickstartgt ltdescriptiongt Enable
SSH lt/descriptiongt ltpackagegtsshlt/packagegt
ltpackagegtssh-clientslt/packagegt ltpackagegtssh-s
erverlt/packagegt ltpackagegtssh-askpasslt/packagegt
ltpostgt cat gt /etc/ssh/ssh_config ltlt
'EOF lt!-- default client setup --gt Host
ForwardX11 yes ForwardAgent
yes EOF chmod orx /root mkdir /root/.ssh chmod
orx /root/.ssh lt/postgt lt/kickstartgtgt
20
Sample Graph File
lt?xml version"1.0" standalone"no"?gt lt!DOCTYPE
kickstart SYSTEM "_at_GRAPH_DTD_at_"gt ltgraphgt ltdescrip
tiongt Default Graph for NPACI Rocks. lt/descripti
ongt ltedge from"base" to"scripting"/gt ltedge
from"base" to"ssh"/gt ltedge from"base"
to"ssl"/gt ltedge from"base" to"lilo"
arch"i386"/gt ltedge from"base" to"elilo"
arch"ia64"/gt ltedge from"node" to"base"
weight"80"/gt ltedge from"node"
to"accounting"/gt ltedge from"slave-node"
to"node"/gt ltedge from"slave-node"
to"nis-client"/gt ltedge from"slave-node"
to"autofs-client"/gt ltedge from"slave-node"
to"dhcp-client"/gt ltedge from"slave-node"
to"snmp-server"/gt ltedge from"slave-node"
to"node-certs"/gt ltedge from"compute"
to"slave-node"/gt ltedge from"compute"
to"usher-server"/gt ltedge from"master-node"
to"node"/gt ltedge from"master-node"
to"x11"/gt ltedge from"master-node"
to"usher-client"/gt lt/graphgt
21
Cluster SQL Database
22
Nodes and Groups
(Figure: rows in the Nodes table linked to the Memberships table)
23
Groups and Appliances
(Figure: rows in the Memberships table linked to the Appliances table)
24
Simple key/value pairs
  • Used to configure DHCP and to customize appliance kickstart files (a sketch of such a schema follows)
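A hedged sketch of the kind of schema the last three slides describe: Nodes, Memberships, and Appliances tables plus a simple key/value table. Table and column names here are illustrative, not the actual Rocks schema.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE appliances  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE memberships (id INTEGER PRIMARY KEY, name TEXT,
                              appliance INTEGER REFERENCES appliances(id));
    CREATE TABLE nodes       (id INTEGER PRIMARY KEY, name TEXT, mac TEXT,
                              membership INTEGER REFERENCES memberships(id));
    CREATE TABLE app_globals (key TEXT, value TEXT);  -- simple key/value pairs
    """)
    db.executemany("INSERT INTO appliances VALUES (?, ?)",
                   [(1, "compute"), (2, "frontend")])
    db.execute("INSERT INTO memberships VALUES (1, 'Compute', 1)")
    db.execute("INSERT INTO nodes VALUES (1, 'compute-0-0', '00:11:22:33:44:55', 1)")
    db.execute("INSERT INTO app_globals VALUES ('PublicDNSDomain', 'example.org')")

    # Which appliance should the machine with this MAC address be built as?
    row = db.execute("""
        SELECT a.name FROM nodes n
        JOIN memberships m ON n.membership = m.id
        JOIN appliances  a ON m.appliance  = a.id
        WHERE n.mac = ?""", ("00:11:22:33:44:55",)).fetchone()
    print(row[0])                         # -> compute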

25
Putting it together
26
Space-Time and HTTP
(Space-time diagram between node appliances and frontends/servers: the node broadcasts a DHCP request and receives its IP address plus a kickstart URL; it sends a kickstart request, the server generates the kickstart file (kpp and kgen, driven by the SQL DB), the node requests packages and the server serves them, then the node installs the packages, runs the post configuration, and reboots)
  • HTTP (a client-side sketch follows)
  • Kickstart URL (Generator) can be anywhere
  • Package Server can be (a different) anywhere
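A minimal client-side sketch of that HTTP flow, with an invented URL and helper functions: the installer fetches its generated kickstart file from whatever URL it was handed, and the package server is whatever host the file's url directive names, so neither has to live inside the cluster.

    from urllib.request import urlopen

    def fetch_kickstart(kickstart_url):
        # The generator behind this URL (e.g. a script on the frontend)
        # builds the kickstart file on the fly from the graph and the DB.
        with urlopen(kickstart_url) as response:
            return response.read().decode()

    def package_server(kickstart_text):
        # The 'url' directive inside the kickstart file names the package
        # server, which may be a completely different host.
        for line in kickstart_text.splitlines():
            if line.startswith("url "):
                return line.split()[-1]
        return None

    # Example with a canned kickstart fragment (no network access needed):
    sample = "install\nurl --url http://packages.example.org/7.3/\n"
    print(package_server(sample))   # -> http://packages.example.org/7.3/
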
27
Practice
28
256 Node Scaling
  • Attempt a TOP500 run on two fused 128-node PIII (1 GHz, 1 GB mem) clusters
  • 100 Mbit Ethernet, Gigabit to the frontend.
  • Myrinet 2000. 128 port switch on each cluster
  • Questions
  • What LINPACK performance could we get?
  • Would Rocks scale to 256 nodes?
  • Could we set up/teardown and run benchmarks in
    the allotted 48 hours?
  • SDSC's TeraGrid Itanium2 system is about this size

29
Setup
(Diagram: a new frontend connects the two 128-node clusters, 120 nodes of each on Myrinet, via 8 Myrinet cross connects)
  • Fri night: built the new frontend; physical rewiring of Myrinet, added an Ethernet switch
  • Sat: initial LINPACK runs and debugging of hardware failures; 240-node Myrinet run
  • Sun: submitted the 256-node Ethernet run, re-partitioned the clusters, complete re-installation (40 min)

30
Some Results
240 Dual PIII (1 GHz, 1 GB) - Myrinet
  • 285 GFlops
  • 59.5% of peak
  • Over 22 hours of continuous computing

31
Installation, Reboot, Performance
  • < 15 minutes to reinstall a 32-node subcluster (rebuilt the Myrinet driver)
  • 2.3 min for a 128-node reboot

(Chart: timeline of the 32-node re-install: start, finish, reboot, start HPL)
32
Future Work
  • Other backend targets
  • Solaris Jumpstart
  • Windows Installation
  • Supporting on-the-fly system patching
  • Cfengine approach
  • But using the XML graph for programmability
  • Traversal order
  • Subtleties with order of evaluation for XML nodes
  • Ordering requirements != Code reuse requirements
  • Dynamic cluster re-configuration
  • Node re-targets appliance type according to
    system need
  • Autonomous clusters?

33
Summary
  • Installation/Customization is done in a
    straightforward programmatic way
  • Leverages existing standard technologies
  • Scaling is excellent
  • HTTP is used as a transport for
    reliability/performance
  • Configuration Server does not have to be in the
    cluster
  • Package Server does not have to be in the cluster
  • (Sounds grid-like)

34
www.rocksclusters.org