Clustering with OSCAR - Transcript and Presenter's Notes
1
Clustering with OSCAR
Ottawa Linux Symposium (OLS'02)

June 29, 2002
Thomas Naughton <naughtont@ornl.gov>
Oak Ridge National Laboratory
2
Also presenting today
Sean Dague, IBM (SIS)
Brian Luethke, ORNL (C3)
Steve DuChene, BGS (Ganglia)
3
OSCAR: Open Source Cluster Application Resources
  • Snapshot of best known methods for building, programming, and using clusters.
  • Consortium of academic/research and industry members.

4
Project Overview
5
What does it do?
  • Wizard based cluster software installation (OS
    environment)
  • Automatically configures cluster components
  • Increases consistency among cluster builds
  • Reduces time to build/install a cluster
  • Reduces need for expertise

6
Functional Areas
  • cluster installation
  • programming environment
  • workload management
  • security
  • administration
  • maintenance
  • documentation
  • packaging

7
OCG/OSCAR Background
8
History & Organization
  • What is Open Cluster Group (OCG)?
  • How is OSCAR related to OCG?
  • When was it started?
  • Why was it started?
  • What is the industry / academic/research facet?

9
OSCAR Members
  • Dell
  • IBM
  • Intel
  • MSC.Software
  • Bald Guy Software
  • Silicon Graphics, Inc.
  • Indiana University
  • Lawrence Livermore National Lab
  • NCSA
  • Oak Ridge National Lab

blue denotes 2002 core members
10
Software releases
  oscar-1.0        RedHat 6.2             Apr 2001
  oscar-1.1        RedHat 7.1             Jul 2001
  oscar-1.2b       RedHat 7.1             Jan 2002
  oscar-1.2.1      RedHat 7.1             Feb 2002
  oscar-1.2.1rh72  RedHat 7.2             Apr 2002
  oscar-1.3beta    RH 7.1/7.2, MDK 8.2    Jun 2002
NOTE: early releases were LUI based; the latest are SIS based.
11
Installation Overview
12
Assumptions & Requirements
  • Currently assume a single headnode, multiple compute node configuration.
  • The user is able to install RedHat Linux with X Window support and set up the network for this machine (headnode).
  • Currently only a single Ethernet interface (eth0) is supported in compute nodes.
  • RedHat was selected for the current version; the design is to be distribution agnostic.

13
An OSCAR Cluster
  • Installed and configured items
  • Head node services, e.g. DHCP, NFS
  • Internal cluster networking configured
  • SIS bootstraps compute-node installation, OS
    installed via network (PXE) or floppy boot
  • OpenSSH/OpenSSL configured
  • C3 power tools set up
  • OpenPBS and MAUI installed and configured
  • Message passing libs installed: LAM/MPI, MPICH, PVM
  • Env-Switcher/Modules installed and defaults set up

14
OSCAR 1.3
  • Continue to use SIS (replaced LUI in v1.2)
  • Add Drop-in package support
  • Supports Add/Del node
  • Supports RH 7.1,7.2, MDK 8.2, and Itanium
  • Add Env-Switcher/Modules
  • Add Ganglia
  • Update packages C3, LAM/MPI, MPICH, OpenPBS,
    OpenSSH/SSL, PVM

15
OSCAR 1.3 base pkgs
Package Name Version
SIS 0.90-1/2.1.3oscar-1/1.25-1
C3 3.1
OpenPBS 2.2p11
MAUI 3.0.6p9
LAM/MPI 6.5.6
MPICH 1.2.4
PVM 3.4.46
Ganglia 2.2.3
Env-switcher/modules 1.0.4/3.1.6
16
Virtual OSCAR Install
17
Step 0
  • Install RedHat on head node (see also next
    slide)
  • Include X Window support
  • Configure external/internal networking (eth0,
    eth1)
  • Create RPM directory, and copy RPMs from CD(s)
  • Download OSCAR
  • Available at http://oscar.sourceforge.net/
  • Extract the tarball (see also next slide)
  • Print/Read the documentation
  • Run the wizard (./install_cluster ethX) to begin the install

18
Step 0.5
  • Install the headnode (standard or via KickStart, etc.)
  • Configure networking/naming (internal/external NICs)
  • Reboot, login as root, run the following commands
  • Create the RPM dir (must be this path)
  •   [root@headnode root]# mkdir -p /tftpboot/rpm
  • Insert RedHat CD1 into the drive
  •   [root@headnode root]# mount /mnt/cdrom
  •   [root@headnode root]# cp -ar /mnt/cdrom/RedHat/RPMS/* \
  •   > /tftpboot/rpm
  • wait...wait...wait...
  •   [root@headnode root]# eject /mnt/cdrom
  • Insert RedHat CD2 into the drive
  •   [root@headnode root]# mount /mnt/cdrom
  •   [root@headnode root]# cp -ar /mnt/cdrom/RedHat/RPMS/* \
  •   > /tftpboot/rpm
  • wait...wait...wait...

19
Step 0.75
  • [root@headnode root]# eject /mnt/cdrom
  • [root@headnode root]# cd
  • [root@headnode root]# pwd
  •   /root
  • [root@headnode root]# tar zxf oscar-1.3.tar.gz
  • [root@headnode root]# cd oscar-1.3
  • [root@headnode oscar-1.3]# ifconfig
  • Look at the output and determine the internal interface
  • Ex.
  •   eth1  Link encap:Ethernet  HWaddr 00:A0:CC:53:6D:F4
  •         inet addr:10.0.0.55  Bcast:10.0.0.255  Mask:255.255.255.0
  • [root@headnode oscar-1.3]# ./install_cluster eth1
  • Follow the steps in the Install Wizard...

20
Install Wizard Overview
  1. Select default MPI.
  2. Build image per client type (partition layout, HD
    type)
  3. Define clients (network info, image binding)
  4. Setup networking (collect MAC addresses,
    configure DHCP, build boot floppy)
  5. Boot clients / build
  6. Complete setup (post install)
  7. Run test suite
  8. Use cluster

21
OSCAR 1.3 Step-by-Step
  • After untarring (tar zxvf oscar-1.3b3.tar.gz)

22
OSCAR 1.3 Step-by-Step
  • NOTE: On RedHat 7.2, upgrade RPM to v4.0.4

23
OSCAR 1.3 Step-by-Step
  • Run the install script, ./install_cluster eth1

24
OSCAR 1.3 Step-by-Step
  • OSCAR Wizard

25-40
OSCAR 1.3 Step-by-Step (wizard screenshots only; no transcript text)
41
OSCAR 1.3 Step-by-Step
  • PXE-capable nodes: select (temporarily) the NIC as the boot device
  • Otherwise use the autoinstall floppy (not as quick, but reliable!)
42-59
OSCAR 1.3 Step-by-Step (wizard screenshots only; no transcript text)
60
Community Usage
61
OSCAR, over 40,000 customers served!
  • oscar.sourceforge.net
  • 41,046 downloads
  • 121,355 page hits
  • (as of May 17, 2002, 11:15am)

62
More OSCAR Stats
  • Known packages using OSCAR
  • NCSA "-in-a-box" series
  • MSC.Linux
  • Large Installations
  • LLNL - 3 clusters (236 nodes, 472 processors)
  • ORNL - 3 clusters (150 nodes, 200 processors)
  • SNS cluster
  • etc.

63
MSC.Linux
  • OSCAR based
  • Adds
  • Webmin tool
  • Commercial grade integration and testing

64
Cluster-in-a-Box
  • OSCAR based
  • www.ncsa.uiuc.edu/News/Access/Stories/IAB
  • Cluster-in-a-Box
  • Grid-in-a-Box
  • Display Wall-in-a-Box
  • Access Grid-in-a-Box
  • Presently
  • Cluster-in-a-Box OSCAR
  • Goal to add
  • Myrinet
  • Additional Alliance software
  • IA-64

65
eXtreme TORC powered by OSCAR
  • Major users
  • CSMD - SciDAC SSS scalability research and testing
  • Spallation Neutron Source Facility - codes for neutronics performance, activation analysis, shielding analysis, and design engineering data support
  • Genome Analysis and Systems Modeling - Genomic Integrated Supercomputing Toolkit
  • SciDAC fusion codes
  • CSMD - checkpoint/restart capability for out-of-core ScaLAPACK dense solvers
  • 65 P4 machines
  • peak performance: 129.7 GFLOPS
  • memory: 50.152 GB
  • disk: 2.68 TB
  • dual interconnects
  • Gigabit Ethernet
  • Fast Ethernet

66
OSCAR - most popular install management package
  • clusters.top500.org
  • (May 17, 2002, 11am)
  • OSCAR installs: 30%, counting
  • OSCAR
  • MSC.Linux
  • NCSA in-a-box
  • (Non-scientific survey)

67
Component Details
68
Cluster Management
  • Cluster Management is not a single service or package; it covers four main areas
  • System software mgmt - SIS
  • Cluster-wide monitoring - Ganglia
  • Parallel execution / command env - C3
  • Power mgmt - none presently offered

69
Components & Presenters
  • System Installation Suite (SIS)
  • Sean Dague, IBM
  • C3 Cluster Power Tools
  • Brian Luethke, ORNL
  • Ganglia monitoring system
  • Steve DuChene, BGS
  • Env-Switcher
  • Thomas Naughton, ORNL

70
BREAK
71
System Installation Suite (SIS)
Component Presenter: Sean Dague, IBM
72
System Installation Suite
Sean Dague <sldague@us.ibm.com>
Software Engineer, IBM Linux Technology Center
73
SIS = SystemImager + LUI
  • SystemImager - image based installation and maintenance tool
  • LUI - resource based cluster installation tool
  • Projects merged in April of 2001
  • Goals
  • Support all Linux distributions
  • Support a large number of architectures
  • Make it easy to add support for new distros and architectures
  • Make it so no one has to solve the massive installation issue again
  • (i.e. do it once, do it right, do it for everyone)


74
System Installation Suite at a glance
75
SIS at a glance, described
  • Many different images and versions of images may be stored on an Image Server
  • An image can be captured from an existing machine
  • An image can be created from a set of packages directly on the Image Server
  • Rsync is used to propagate the image during installation
  • Because rsync is used, maintenance is easily done (only changes are pulled across the network; see the sketch below)
  • Because replication is done at the file level and not the package level, it is very distribution agnostic
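Since the slides above attribute both installation and maintenance to rsync, here is a minimal illustration of the delta-transfer behaviour SIS relies on. The server and image names are placeholders and this is not SIS's actual invocation (SIS wraps rsync inside its own scripts):

  # Illustration only: "imageserver" and "oscarimage" are placeholder names.
  rsync -a rsync://imageserver/oscarimage/ /    # first run: the whole image comes across
  rsync -a rsync://imageserver/oscarimage/ /    # later runs: only changed files are transferred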

76
What does it do for me?
  • System Installation
  • Fast and efficient way to install machines
  • System Maintenance
  • Rsync only propagates the changes between client
    and image
  • File System Migration
  • Image an ext2 machine, image back as ext3 or
    reiserfs (XFS and JFS coming soon)
  • Migrate systems from non-RAID to Software RAID
  • Easy Machine Backup
  • Build replicas of machines

77
Capturing an Image from a Golden Client
  • getimage is the standard SystemImager way of
    capturing an image from a golden-client to the
    image server
  • On the client
  • prepareclient run on client sets up rsyncd on
    client machine
  • On the server
  • getimage rsyncs the image from the client to
    the server
  • mkautoinstallscript creates the autoinstall
    script on the server for the image
  • addclients adds client definitions for the image
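A minimal shell sketch of the capture sequence just listed. The command names come from the slide; the golden-client host name (gold1), the image name (oscarimage), and the option spellings are assumptions, so check the SystemImager man pages before use:

  # On the golden client: export its filesystem via rsyncd (interactive prompts follow)
  prepareclient

  # On the image server (option names are an assumption):
  getimage -golden-client gold1 -image oscarimage   # pull the filesystem into an image
  mkautoinstallscript -image oscarimage             # generate the autoinstall script for the image
  addclients                                        # bind client definitions to the image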

78
Creating an Image directly on the Image Server
  • buildimage is the System Installer program to
    create an image directly on the server from an
    RPM list and disk partition file
  • On the server
  • mksiimage builds the base image
  • mksidisk creates disk partition table
    information
  • mkautoinstallscript builds the autoinstall
    script for the image
  • mksimachine creates client definitions for a
    machine
  • System Installer stores all the image and client
    info in a flat file database for other
    applications to utilize

79
Tksis System Installation Suite GUI
  • Perl-Tk GUI for System Installation Suite
    available as the systeminstaller-x11 package
    (still in early stages)
  • Currently only interfaces with System Installer
    buildimage calls (will integrate with
    SystemImager calls in the near future)
  • Provides an easy to use interface for
    installation
  • Component Panels may easily be integrated into
    other Perl based installation tools

80
Installing an image part 1
  • Image can be autoinstalled via diskette, cd, or
    network
  • mkautoinstalldiskette creates autoinstall
    floppy
  • mkautoinstallcd creates autoinstall cd ISO
  • mkbootserver creates PXE autoinstall server
  • Boot steps
  • autoinstall media boots
  • looks for local.cfg (network information) or uses DHCP to get an IP
  • determines the hostname from the IP address or local.cfg
  • fetches the hostname.sh autoinstall script from the Image Server

81
Installing an image part 2
  • Autoinstall Steps (sketched after this list)
  • rsyncs over any additionally needed utilities (mkraid, raidstop, raidstart, mkreiserfs, etc.)
  • partitions the disk drives using sfdisk
  • formats and mounts all filesystems
  • rsyncs the image from the Image Server
  • runs systemconfigurator to set up networking and the bootloader
  • unmounts all filesystems
  • does the specified postinstall action (one of beep, shutdown, or reboot)
  • Autoinstall will dump to a shell if any errors
    are encountered
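The autoinstall steps above amount to roughly the following shell sequence. This is a sketch of what the generated script does, not literal SIS code; the device names, the staging mount point /a, and the image name are illustrative:

  sfdisk /dev/sda < /tmp/partition.scheme          # partition the disk
  mke2fs /dev/sda1                                 # format the new root filesystem
  mount /dev/sda1 /a                               # mount it under a staging root
  rsync -a rsync://imageserver/oscarimage/ /a/     # pull the image from the Image Server
  chroot /a systemconfigurator                     # networking and bootloader (invocation simplified)
  umount /a
  reboot                                           # or beep / shutdown, per the chosen postinstall action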

82
Maintaining a machine
  • Choice 1: Maintain the Image directly
  • The image is a full, live filesystem
  • you can chroot into the image
  • compile code in the image
  • run rpm -Uhv newpackage.rpm in the image
  • Choice 2: Maintain the Golden Client
  • Apply hot fixes to the golden client
  • Rerun getimage to recapture the image
  • updateclient resyncs a client to the image
  • because rsync is used, only the changes between image and client are propagated
  • (both choices are sketched below)
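Both maintenance choices can be sketched in a few commands. The image directory path follows SystemImager's usual layout and the updateclient option names are assumptions; treat this as illustrative only:

  # Choice 1: work on the image directly (image path is an assumption)
  chroot /var/lib/systemimager/images/oscarimage /bin/bash
  rpm -Uvh /tmp/newpackage.rpm      # run inside the chroot
  exit

  # Choice 2: patch the golden client, rerun getimage, then resync a node
  # (option names are an assumption; see the SystemImager docs)
  ssh node1 updateclient -server imageserver -image oscarimage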

83
Who's using it?
  • SIS SystemImager
  • All users of SystemImager > 2.0 are SIS users
  • OSCAR 1.2 uses SIS for installation
  • SCore, Clubmask and other clustering groups
    interested in using SIS for installation
  • SystemImager 2.0 and System Configurator 1.0
    accepted into Debian 3.0 distribution

84
Future Directions
  • System Installer 1.0
  • Debian Package support
  • IA64 arch support
  • SystemImager 2.2
  • devfs clients
  • IA64 arch support
  • SystemImager 2.4
  • PPC, S390, and HP PA-RISC architectures
  • JFS and XFS file systems
  • remote logging
  • internal API (allows for Tksis integration)
  • Inclusion in more Linux Distributions
  • Unified GUI for System Installer and SystemImager

85
Questions?
  • System Installation Suite - http://sisuite.org
  • SystemImager - http://systemimager.org
  • System Installer - http://systeminstaller.sf.net
  • OSCAR - http://oscar.sf.net
  • The team can be found on the #sisuite and #systemimager channels on irc.openprojects.net

86
Cluster Command & Control (C3)
Component Presenter: Brian Luethke, ORNL
87
C3 Cluster Power Tools - Cluster Command & Control
Presented by Brian Luethke
Brian Luethke, John Muggler, Thomas Naughton, Stephen Scott
88
Overview
  • command line based
  • single system illusion (SSI) - single machine interface
  • cluster configuration file
  • ability to rapidly deploy software and system images from the server
  • command line list option enables sub-cluster management
  • distributed file scatter and gather operations
  • execution of non-interactive commands
  • multiple cluster capability from a single entry point

89
Building Blocks
  • System administration
  • cpushimage - push an image across the cluster
  • cshutdown - remote shutdown to reboot or halt the cluster
  • User tools (a short tour follows this list)
  • cpush - push a single file or directory
  • crm - delete a single file or directory
  • cget - retrieve files from each node
  • ckill - kill a process on each node
  • cexec - execute an arbitrary command on each node
  • cexecs - serial mode, useful for debugging
  • clist - list each cluster available and its type
  • cname - returns node names for given node positions
  • cnum - returns node positions for given node names
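A short tour of a few of the user tools just listed, run against the default cluster in /etc/c3.conf; the file and process names are placeholders and the argument forms are assumptions based on the descriptions above:

  cpush /etc/hosts          # copy /etc/hosts to the same path on every compute node
  cexec uptime              # run uptime on every node, output grouped per node
  cexecs uptime             # same, but one node at a time (handy for spotting a dead node)
  cget /etc/motd /tmp/      # pull each node's copy of a file back to the head node
  ckill a.out               # kill any stray a.out processes cluster-wide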

90
Cluster Classification Scheme
  • Direct local
  • The cluster nodes are known at run time
  • The command is run from the head node
  • Direct remote
  • The cluster nodes are known at run time
  • The command is not run from the head node
  • Indirect remote
  • The cluster nodes are not known at run time
  • The command is not run from the head node
  • Notes
  • Local or remote is checked by comparing the head
    node names to the local hostname
  • Indirect clusters will execute on the default
    cluster of the head node specified.

91
Cluster Configuration File
  • default cluster configuration file: /etc/c3.conf

      cluster torc {            # direct local cluster
          torc-00:node0
          node[1-4]
          exclude 3
      }
      cluster htorc {           # indirect remote cluster
          :htorc-00
      }

  • user specified configuration file: /somewhere/list_of_nodes

      cluster auto-gen {        # direct remote cluster
          node0.csm.ornl.gov
          node1.csm.ornl.gov
          node2.csm.ornl.gov
          node3.csm.ornl.gov
          dead node4.csm.ornl.gov
      }
92
Configuration File Information
  • Offline Node Specifiers
  • The exclude tag applies to ranges
  • dead applies to single machines
  • Important for node ranges on the command line
  • Cluster Definition Blocks as Meta-clusters
  • Groups based on hardware
  • Groups based on software
  • Groups based on role
  • User specified cluster configuration files
  • Specified at runtime
  • The user can create both sub-clusters and super-clusters
  • Useful for scripting
  • Cannot have an indirect local cluster (the info has to be somewhere)
  • Infinite loop warning: when using an indirect remote cluster, the default cluster on the remote head node is executed; this could make a call back.

93
MACHINE DEFINITIONS (Ranges) on the Command Line
  • MACHINE DEFINITIONS as used on the command line
  • Position number from the configuration file
  • Begins at 0
  • Does not include the head node
  • dead and exclude maintain a node's position
  • Format on the command line
  • First the cluster name from the configuration file, followed by a colon
  • cluster2: would represent all nodes on cluster2
  • : alone signifies the default cluster
  • ranges and single nodes are separated by a comma
  • cluster2:1-5,7 executes on nodes 1, 2, 3, 4, 5, 7
  • :4 executes the node at position 4 on the default cluster
  • Example: cexec torc:1-5,7 hostname
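Two more commands applying the range rules above, using the cluster names from these examples:

  cexec cluster2:1-5,7 hostname   # nodes 1-5 and 7 of cluster2
  cexec :4 uptime                 # the node at position 4 of the default cluster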

94
Execution Model - External to Multi-Cluster
  • desktop knowledge:
  • TORC head node
  • eXtremeTORC head node
  • HighTORC head node

  cluster head-node knowledge: node 1, node 2, ..., node 7
95
Execution Model - External to Multi-Cluster
On the desktop - indirect remotes (several in one file):
---------------------------------------------------
      cluster torc { :torc }
      cluster extreme_torc { :xtorc }
      cluster high_torc { :htorc }

On eXtremeTORC - direct local:
---------------------------------------------------
      cluster xtorc {
          xtorc:node0
          node[1-7]
      }
96
cpush
cpush [OPTIONS] [MACHINE DEFINITIONS] source [target]
  -h, --help              display help message
  -f, --file <filename>   alternate cluster configuration file; default is /etc/c3.conf
  -l, --list <filename>   list of files to push (single file per line, column1=SRC column2=DEST)
  -i                      interactive mode, ask once before executing
  --head                  execute command on the head node; does not execute on compute nodes
  --nolocal               the source file or directory lies on the head node of the remote cluster
  -b, --blind             pushes the entire file (normally cpush uses rsync)
97
cpush
  • to move a single file
  • cpush /home/filename
  • This pushes the file filename to /home on each compute node
  • to move a single file, renaming it on the cluster nodes
  • cpush /home/filename1 /home/filename2
  • This pushes the file filename1 to each compute node in the cluster, renaming it to filename2 on the cluster nodes
  • to move a set of files listed in a file
  • cpush --list /home/filelist escaflowne:
  • This pushes each file in the filelist to wherever it is specified to go. The filelist format is on the next slide.

98
Notes on using a file list
  • One file per line
  • If no destination is specified, the file is pushed to the location it occupies on the local machine
  • No comments
  • Example file:
  •   /home/filename
  •   /home/filename2 /tmp
  •   /home/filename3 /tmp/filename4
  • The first line pushes the file filename to /home on each compute node
  • The second line pushes the file filename2 to /tmp on each compute node
  • The third line pushes the file filename3 to /tmp on each compute node, renaming the file to filename4
  • All options on the command line are applied to every file; in a filelist you can not specify that file one uses the --nolocal option while file two goes to the machine definition clusters:3-5. (A worked example follows this list.)
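Putting the notes above together, a hypothetical filelist and the matching cpush call might look like this (the paths are the ones from the example):

  Contents of /home/filelist:
    /home/filename
    /home/filename2 /tmp
    /home/filename3 /tmp/filename4

  cpush --list /home/filelist     # push every entry to the default cluster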

99
cexec
Usage: cexec(s) [OPTIONS] [MACHINE DEFINITIONS] command
  --help | -h              display help message
  --file | -f <filename>   alternate cluster configuration file; if one is not supplied then /etc/c3.conf will be used
  -i                       interactive mode, ask once before executing
  --head                   execute command on the head node; does not execute on the cluster
Using cexecs executes the serial version of cexec.
100
cexec
  • to simply execute a command
  • cexec mkdir temp
  • This executes mkdir temp on each node in the cluster. The working directory of the cexec command is always your home directory, thus temp would be created in ~/
  • to print the machine name and then execute the string (serial version only)
  • cexecs hostname
  • This executes hostname on each node in the cluster. It differs from cexec in that each node is executed before the next one; this is useful if a node is offline and you wish to see which one.

101
cexec
  • to execute a command with wildcards on several clusters
  • cexec cluster1: cluster2:2-5 "ls /tmp/pvmd*"
  • This executes ls /tmp/pvmd* on each compute node of cluster1 and on nodes 2, 3, 4, and 5 of cluster2. Notice the use of the quotes; this keeps the shell from interpreting the command until it reaches the compute nodes.
  • Using pipes
  • cexec "ps -A | grep a.out"
  • cexec ps -A | grep a.out
  • In the first example the | symbol is enclosed in the quotes, so ps -A | grep a.out is executed on each node; this way you get the standard cexec output format with a.out in each node's block if it exists. In the second example ps -A is executed on each node and all the a.out lines are then grepped out locally. This demonstrates that the placement of quotes is very important. Example output on the next slide.

102
cexec quotation example
  # cexec "ps -A | grep xinetd"
  local: processing node node1
  local: processing node node2
  --------- node1---------
  9738 ?        00:00:00 xinetd
  --------- node2---------
  4856 ?        00:00:00 xinetd

  # cexec ps -A | grep xinetd
  9738 ?        00:00:00 xinetd
  4856 ?        00:00:00 xinetd
103
cname
Usage: cname [OPTIONS] [MACHINE DEFINITIONS]
  --help | -h              display help message
  --file | -f <filename>   alternate cluster configuration file; if one is not supplied then /etc/c3.conf will be used
104
cname
  • To search the default cluster
  • cname :0-5
  • This returns the node names for the nodes occupying slots 0, 1, 2, 3, 4, and 5 in the default configuration file
  • To search a specific cluster
  • cname cluster1: cluster2:4-8
  • All of the nodes in cluster1 are returned, and nodes 4, 5, 6, 7, and 8 are returned from cluster2

105
cnum
Usage: cnum [OPTIONS] [MACHINE DEFINITIONS] node_name
  --help | -h              display help message
  --file | -f <filename>   alternate cluster configuration file; if one is not supplied then /etc/c3.conf will be used
106
cnum
  • To search the default cluster
  • cnum node2
  • This returns the node position (number) that node2 occupies in the default cluster configuration file
  • To search several clusters in the configuration file
  • cnum cluster1: cluster2: gundam eva
  • This returns the node positions that the nodes gundam and eva occupy in both cluster1 and cluster2. If a node does not exist in a cluster, no node number is returned for it.

107
clist
Usage: clist [OPTIONS]
  --help | -h              display help message
  --file | -f <filename>   alternate cluster configuration file; if one is not supplied then /etc/c3.conf is used
108
clist
  • To list all the clusters from the default configuration file
  • clist
  • This lists each cluster in the default configuration file and its type (direct local, direct remote, or indirect remote)
  • To list all the clusters from an alternate file
  • clist -f cluster.conf
  • This lists each cluster in the specified configuration file and its type (direct local, direct remote, or indirect remote)

109
Multiple cluster examples
  • Command line: same as single clusters, only specify several clusters
  • Example: installing an rpm on two clusters
  • First, push the rpm out to the cluster nodes
  • cpush : xtorc: example-1.0-1.rpm
  • Use RPM to install the application
  • cexec : xtorc: rpm -i example-1.0-1.rpm
  • Check for errors in the installation
  • cexec : xtorc: rpm -q example
  • Notice the addition of the xtorc: cluster specifier - the only difference from the single-cluster examples
  • All clusters in this list will participate in this command (the standalone : represents the default cluster)

110
Usage Notes
  • By default C3 does not execute commands on the head node
  • Use the --head option to execute only on the head node
  • The interactive option only asks once before execution
  • A command only needs to be homogeneous within itself
  • Example: a binary and data on an Intel cluster and an HP-UX cluster
  • The data can be pushed to both systems
  • cpush --head intel: hp: data.txt
  • A binary for each cluster
  • cpush --head intel: app.intel app
  • cpush --head hp: app.HPUX app
  • Then execute app
  • cexec --head intel: hp: app

111
Usage Notes
  • Notes on using multiple clusters
  • Very powerful, but with power comes danger
  • malformed commands can be VERY bad
  • "homogeneous within itself" becomes very important
  • a stray crm across all clusters could bring down MANY nodes
  • Extend nearly all unix/linux gotchas to multiple clusters/many nodes, and very fast
  • High level administrators can easily set policies on several clusters from a single access point.
  • Federated clusters - those within a single domain
  • Meta-clusters - wide area joined clusters

112
Contact Information
  torc@msr.csm.ornl.gov - contact the ORNL cluster team
  www.csm.ornl.gov/torc/C3 - version 3.1 (current release)
  www.csm.ornl.gov/TORC - ORNL team site
  www.openclustergroup.org - C3 v3.1 is included in OSCAR 1.3
113
Ganglia
Component Presenter: Steve DuChene, BGS
114
Ganglia
115
Overview
  • Ganglia provides a real-time cluster monitoring
    environment.
  • Communication takes place between nodes across a multicast network using XML/XDR formatted text.
  • Ganglia currently runs on Linux, FreeBSD,
    Solaris, AIX, IRIX.

116
History
  • Ganglia was developed as part of the Millennium Project at the UC Berkeley Computer Science Division.
  • Principal author is Matt Massie <massie@cs.berkeley.edu>
  • Packaged for OSCAR by Steve DuChene <linux-clusters@mindspring.com>

117
Ganglia for monitoring
  • gmond - a multithreaded daemon which acts as a server for monitoring a host
  • Additional utilities
  • gmetric - allows adding arbitrary host metrics to the monitoring data stream
  • gstat - CLI to get a cluster status report

118
Graphical Interface
  • PHP/rrdtool web client.
  • Creates history graphs of individual data streams and formats the output for web display.

119
gmond, a few specifics
  • Each gmond stores all of the information for the
    entire cluster locally in memory.
  • Opens up port 8649 and a telnet to this port will
    result in a dump of all the information stored in
    memory. (XML formatted)
  • Additionally, when a change occurs in the host
    that is being monitored, the gmond multicasts
    this information to the other gmonds.
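For example, the in-memory cluster state described above can be inspected by hand; the host name is a placeholder:

  telnet node1 8649     # gmond dumps its XML snapshot of the whole cluster, then closes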

120
gstat
  • The simplest command-line client available.
  • gstat, shows all nodes with basic load info.
  • gstat --help, shows general options
  • gstat --dead, shows dead nodes
  • gstat -m, lists the nodes from least to most
    loaded.

121
Gmetric
  • gmetric announces a metric value to all the rest of the gmond multicast channel. The main command line options are:
  • --name=String - what appears in the list of monitored metrics
  • --value=String - the value of the metric
  • --type=String - one of string, int8, uint8, int16, uint16, float, double
  • --units=String - e.g. "Degrees F" or "Kilobytes"
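A hedged example of publishing a custom metric with the options above; the metric name and value are made up for illustration:

  gmetric --name="cpu_temp" --value="43.5" --type="float" --units="Celsius"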

122
Example LM Sensor output

  W83782d-i2c-0-2d
  Adapter: SMBus Via Pro adapter at 5000
  Algorithm: Non-I2C SMBus adapter
  VCore 1:    1.40 V   (min 0.00 V, max 0.00 V)
  VCore 2:    1.42 V   (min 0.00 V, max 0.00 V)
  3.3V:       3.32 V   (min 2.97 V, max 3.63 V)
  5V:         4.94 V   (min 4.50 V, max 5.48 V)
  12V:       12.16 V   (min 10.79 V, max 13.11 V)
  -12V:     -12.29 V   (min -13.21 V, max -10.90 V)
  -5V:       -5.10 V   (min -5.51 V, max -4.51 V)
  V5SB:       4.99 V   (min 4.50 V, max 5.48 V)
  VBat:       3.15 V   (min 2.70 V, max 3.29 V)
  fan1:     10714 RPM  (min 3000 RPM, div 2)
  fan2:     10887 RPM  (min 3000 RPM, div 2)
  fan3:         0 RPM  (min 1500 RPM, div 4)
  temp1:      -48 C    (limit 60 C, hysteresis 50 C)  sensor: thermistor
  temp2:     43.5 C    (limit 60 C, hysteresis 50 C)  sensor: PII/Celeron diode
  temp3:     40.5 C    (limit 60 C, hysteresis 50 C)  sensor: PII/Celeron diode
  vid:       0.00 V
123
The web client php/rrd
124
Displaying all the metrics.
125
gmond across multiple clusters
  • gmond --trusted_host xxx.xxx.xxx.xxx
  • Allows setting up a unicast connection to another
    gmond across the Internet.
  • Must do this on each gmond, so that the
    communication is 2-way.
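A sketch of the two-way setup described above; the IP addresses are placeholders:

  gmond --trusted_host 198.51.100.20   # on cluster A's monitoring host, trusting cluster B
  gmond --trusted_host 192.0.2.10      # on cluster B's monitoring host, trusting cluster A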

126
Ganglia Summary
  • gmond is scalable because of its use of multicast.
  • gmond is useful, as it allows real-time information gathering of which hosts are alive before running a job.
  • Available at ganglia.sourceforge.net
  • Now an included package in OSCAR.

127
Ganglia / C3 Example
128
Description of sync_users
  • The sync_users script that ships with OSCAR is a very simple example usage of cpush: it distributes the files
  • /etc/{passwd,group,shadow,gshadow}
  • to the nodes, manually or via a cron entry.

129
Statement of Problem
  • The default sync_users script is very simple, and a very annoying characteristic is that it stalls when any of the nodes are down. (It stalls until the SSH timeout for that node.)
  • All available nodes roll by perfectly, but the script pushes 2-4 files and the stall happens at the end of each cpush (one per file). Therefore if the timeout is 2 minutes, it could hang for 8 minutes if no CTRL-C is applied.

130
Re-Statement of Problem
  • Need some way to dynamically determine the down
    nodes and skip them when running sync_users.
  • Also, need to display the list of missed nodes.

131
Enter Ganglia
  • The same day the sync_users discussion took place, Ganglia was demoed by a group member.
  • Ganglia maintains information about the nodes in the cluster and, most relevantly, it offers a nice tool, gstat, with options to list available nodes and their load!

132
Quick sync_users2
  • So, a quick sync_users2 was whipped up using Ganglia's gstat in conjunction with C3's cpush to make a smarter script.
  • The script uses output from cname and gstat -m
  • The output is massaged to build the cpush command line and to clearly report missed nodes.

133
Usage example
  • Things could almost be done from a command line like this:
  •   [root]# gstat -m > upnodes.tmp ; \
  •     cpush -l upnodes.tmp /etc/passwd ; \
  •     rm upnodes.tmp
  • (repeat for all files: passwd, group, shadow, gshadow)
  • Instead, just type:
  •   [root]# ./sync_users2

134
Perl Script Summary
  • Build a hash of the default cluster in the c3.conf file (using cname):
  •   %c3conf = munge_c3conf("/opt/c3-3/cname");        # name -> num
  • Get the list of up/available nodes (via Ganglia):
  •   @uplist = get_nodelist("/usr/bin/gstat -m");
  • Munge the standard nodelist into C3-3 format (nodeN -> N):
  •   @c3nodelist = c3ify_nodelist($aref_uplist, $href_c3conf);
  • Build the C3-3 command-line nodelist:
  •   $nodes = ":" . join(",", @c3nodelist);
  • Distribute the files with the above command-line nodelist:
  •   `cpush $nodes /etc/passwd`;
  • Print missed nodes info:
  •   @missed = get_missednodes($aref_uplist, $href_c3conf);
  •   print "\n Missed nodes:\n @missed \n";

135
Ganglia / C3 Comments
  • This is just a simple application of C3 and Ganglia.
  • The goal was to use these two tools to create a smarter sync_users - this has been met.
  • Since C3 and Ganglia can be used by standard users (not just root), this method could be used by anyone for user-level scripts.

136
ganglia - for specific metrics
  • A Python script, added for convenience.
  • It is both an executable and a class (library).
  • gmond monitors 15 metrics by default.
  • ganglia --help to see the metrics.
  • To run: ganglia metric [metric ...]
  • Example: ganglia cpu_nice

137
Env-Switcher
Component Presenter: Thomas Naughton, ORNL
138
Env-Switcher
  • Written by Jeff Squyres,
  • jsquyres@lam-mpi.org
  • Uses the modules package

139
The OSCAR switcher package
  • Contains 2 RPMs
  • modules
  • env-switcher
  • Each RPM has different intended uses

140
Super-short explanation
  • modules
  • Changes the current shell environment
  • Changes are non-persistent - current shell only
  • env-switcher
  • Changes future shell environments
  • Changes are persistent - all future shells
  • Controls the list of which modules are loaded at each future shell invocation

141
Design goals for OSCAR switcher package
  • Allow users an easy way to persistently control
    their shell environment without needing to edit
    their dot files
  • Strongly discourage the use of /etc/profile.d
    scripts in OSCAR
  • Use already-existing modules package
  • Contains sophisticated controls for shell
    environment manipulation
  • Uses deep voodoo to change current shell env.

142
Design goals for OSCAR switcher package
  • Cannot interfere with advanced users wanting to
    use modules without switcher
  • Two-tier system of defaults
  • System-level default
  • User-level defaults (which always override the
    system default)
  • E.g., system default to use LAM/MPI, but user
    bob wants to have MPICH as his default

143
Why doesn't switcher change the current environment?
  • Changing the current env requires deep voodoo
  • Cannot layer switcher over modules to change the current environment
  • at least, not without re-creating the entire "change the current env." mechanism
  • The modules package already does this
  • Seems redundant to re-invent this mechanism
  • Users can use the module command to change the current environment

144
Why discourage /etc/profile.d scripts?
  • Such scripts are not always loaded
  • Canonical example is rsh/ssh
  • For non-interactive remote shells, profile.d
    scripts are not loaded
  • Non-interactive remote shells are used by all MPI
    and PVM implementations
  • The modules philosophy is a fine-grained approach
    to making software packages available
  • In contrast to the monolithic /usr/bin approach

145
The modules software package
  • modules.sourceforge.net
  • At the core of modules
  • Set of TCL scripts called modulefiles
  • Each modulefile loads a single software package
    into the environment
  • Can modify anything in the environment (e.g., PATH)
  • Each modulefile is reversible - you can load and unload them from the environment

146
The modules software package
  • Loading and unloading modules requires individual commands - no persistent changes
  • Examples
  • module load lam-6.5.6
  • module unload pvm

147
The env-switcher software package
  • Controls the set of modulefiles that are loaded
    for each shell
  • Guarantees that this set is loaded for all shells
  • Including the corner cases of rsh/ssh
  • Allows users to manipulate this set via the
    command line
  • Current cmd line syntax is somewhat clunky
  • Will be made nicer by OSCAR 1.3 stable

148
OSCAR's three kinds of modulefiles
  • Normal
  • Not automatically loaded by OSCAR
  • /opt/modules/modulefiles
  • Auto-loaded
  • Guaranteed to be loaded by OSCAR for every shell
  • /opt/modules/oscar-modulefiles

149
OSCAR's three kinds of modulefiles
  • Switcher-controlled
  • May or may not be loaded by OSCAR, depending on
    system and user defaults
  • No fixed directory location for these modulefiles
  • Use the switcher command to register
    switcher-controlled modulefiles

150
What do RPM / OSCAR package authors need to do?
  • Do not provide /etc/profile.d scripts
  • Provide a modulefile instead
  • Decide how that modulefile will be used in OSCAR:
  • Normal
  • Auto-loaded
  • Switcher-controlled
  • Install the modulefile in %post, as appropriate (sketched below)
  • Uninstall the modulefile in %preun
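A minimal sketch of that packaging guidance as RPM spec scriptlets. The package name, file locations, and the use of the "normal" modulefile directory given earlier are assumptions, so treat this as illustrative only:

  %post
  # hypothetical paths: install the packaged modulefile as a normal modulefile
  cp /usr/share/mypkg/mypkg.modulefile /opt/modules/modulefiles/mypkg

  %preun
  rm -f /opt/modules/modulefiles/mypkg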

151
Still to be done in switcher
  • Add simplified command line syntax for users
  • Add a man page
  • Add some form of documentation in OSCAR for using
    switcher to change MPI implementation

152
Future Development
153
OSCAR v1.4
  • Major topics
  • Node grouping
  • GUI/CLI/Wizard
  • Publish API for OSCAR DB
  • Packages exploit DB via API
  • Security enhancements - compute & head nodes
  • User-selectable pkgs for contrib pkgs
  • Mandrake support (if not already available)

154
OSCAR v1.5 → v2.0
  • Major topics
  • Add/Delete package
  • OSCAR itself a package
  • Maintenance of nodes/images via GUI & CLI

155
Future OCG
  • Thin-OSCAR
  • UNCL?

156
Future OSCAR
  • OSCAR migration/upgrade - db migration, etc.
  • Support for non-RPM packages
  • Support for other UNIXes
  • Support for diskless nodes

157
Join OSCAR
  www.openclustergroup.org
  oscar.sourceforge.net
  sourceforge.net/projects/oscar