Title: Cluster Administration Tool
1 Jaeyoung Choi
choi_at_comp.ssu.ac.kr
Jiyeon Kim, Yongkwan Park, Sungjoo Kwon, Jaeyoung Choi
heaven, psiver, lithmmon_at_ss.ssu.ac.kr, choi_at_comp.ssu.ac.kr
School of Computing, Soongsil University
1-1 Sangdo-Dong, Dongjak-Ku, Seoul 156-743, Korea
2 Motivation
- Linux cluster systems are widely used for high-performance computing
  - They emphasize the use of commodity hardware and open source software
  - They deliver very high performance at extremely low cost
- System management is a challenging task
  - Automatic and convenient installation of the OS and application software packages
  - An effective way to navigate and interact with cluster components
  - Mechanisms and tools to perform collective commands
  - Services such as monitoring, fault detection, and recovery
3 What is CATS-i?
- Cluster Administration ToolS on the Internet
- A collection of system management tools
- Provides automatic and convenient installation of the OS and application software packages
- Provides efficient monitoring and management of cluster nodes through simple operations over the Internet
- Provides an easy-to-use GUI for PBS
- Distributed as an easy-to-install CATS-i RPM package
4 CATS-i System Architecture
- Client daemon
  - Gets system information from the local OS on each node
- Server daemon
  - Runs on the server node to collect information from the client daemons
- Setup tool
  - Implemented in Java
- Management tool
  - Implemented in Java
  - Supports access over the Internet
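As a rough illustration of the kind of information a client daemon could read from the local OS, the sketch below parses text in the Linux /proc/meminfo format. It is not CATS-i code: the function name parse_meminfo and the sample values are hypothetical, and the slides do not say which OS interfaces the daemon actually uses.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields in /proc/meminfo are reported in kB.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


# Hypothetical sample; on a real node the text would be read from /proc/meminfo.
SAMPLE = """MemTotal:       1024000 kB
MemFree:         256000 kB
Buffers:          64000 kB
"""

if __name__ == "__main__":
    mem = parse_meminfo(SAMPLE)
    used_pct = 100.0 * (mem["MemTotal"] - mem["MemFree"]) / mem["MemTotal"]
    print(f"memory in use: {used_pct:.1f}%")
```

A client daemon built along these lines would send the resulting dictionary to the server daemon at a fixed interval.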
5 Difference with CATS-i
- NodeCloner
  - CACR at Caltech
  - Makes all nodes identical
  - Uses BOOTP and NFS
  - Does not provide a GUI
  - The setup files related to NodeCloner must be edited by hand
- Beoboot
  - Rembo Technology SaRL, Switzerland
  - Boot-ROM booting
  - Uses DHCP
  - Uses a batch file interpreter
  - Drawbacks: the batch file must be written by the user, and the interface is difficult to use
6 Difference with CATS-i
- LUI (Linux Utility for cluster Installation)
  - IBM
  - Supports the BOOTP protocol and uses DHCP and PXE
  - GUI interface
  - Heterogeneous clusters
  - Resource objects must be defined
  - Uses TFTP
  - As the number of nodes increases, the I/O load increases
7 Installation using IP Multicasting
- It keeps the installation speed the same as the number of nodes grows and reduces I/O load
- The multicast client module is delivered automatically through NFS
- The server sends the slave node disk image to a class D IP address
- To make up for the unreliability of UDP
  - Timeout and retransmission
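The timeout-and-retransmission idea can be sketched without real sockets. The simulation below is only an illustration under stated assumptions: the block numbering, the function names, and the rule "after a timeout, resend only the missing blocks" are hypothetical and are not taken from the CATS-i implementation.

```python
def split_image(image: bytes, block_size: int):
    """Split a disk image into numbered blocks: {sequence number: payload}."""
    return {
        seq: image[off:off + block_size]
        for seq, off in enumerate(range(0, len(image), block_size))
    }


def multicast_with_retransmit(image, block_size, drop, max_rounds=10):
    """Simulate delivery of `image` over a lossy channel.

    `drop(round_no, seq)` returns True when a block is lost in that round
    (a stand-in for UDP loss). After each round the receiver "times out"
    on the blocks it is still missing and the sender resends only those.
    Returns (reassembled image, number of rounds used).
    """
    blocks = split_image(image, block_size)
    received = {}
    pending = sorted(blocks)  # blocks the receiver still needs
    for round_no in range(1, max_rounds + 1):
        for seq in pending:
            if not drop(round_no, seq):
                received[seq] = blocks[seq]
        pending = [seq for seq in blocks if seq not in received]
        if not pending:
            data = b"".join(received[seq] for seq in sorted(received))
            return data, round_no
    raise TimeoutError(f"blocks never delivered: {pending}")


if __name__ == "__main__":
    image = bytes(range(256)) * 4
    # Hypothetical loss pattern: every third block is lost in the first round.
    lossy = lambda round_no, seq: round_no == 1 and seq % 3 == 0
    data, rounds = multicast_with_retransmit(image, 64, lossy)
    print(data == image, rounds)
```

In a real setup tool the loop body would send UDP datagrams to the multicast group and the receiver would report missing sequence numbers back to the server.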
8 Setup tools with IP multicasting
[Figure: the master node (GUI, node DB, network configuration info, error/flow control, multicast server module) sends over UDP to a class D IP address (224.0.0.0 - 239.255.255.255), from which the data reaches Node 1, Node 2, Node 3, ..., Node N over UDP.]
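The class D range quoted on this slide (224.0.0.0 to 239.255.255.255) is the IPv4 multicast range. A small check, using Python's standard ipaddress module, could guard a configuration value before the server starts sending; the helper name is_class_d is hypothetical.

```python
import ipaddress


def is_class_d(address: str) -> bool:
    """Return True if `address` is an IPv4 class D (multicast) address."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip.is_multicast


if __name__ == "__main__":
    for addr in ("224.0.0.0", "239.255.255.255", "192.168.0.1"):
        print(addr, is_class_d(addr))
```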
9 Setup tool in CATS-i
- Disk cloning using NFS
  - A slave node must be booted with a DHCP- and NFS-enabled kernel
  - It boots in the same way as a diskless terminal, using DHCP
  - It makes a disk image of the slave node, including hard disk information
  - It stores the slave node disk image on the server disk
10 OS Setup tools Architecture - Disk cloning
- Disk cloning preparation
  - Steps 1, 2, 3
- Command operation
  - Steps 4, 5, 6, 7, 8
- Make disk image
  - Steps 9, 10, 11, 12, 13
- Save disk image
  - Steps 14, 15
11 OS Setup tools Architecture - Installation
- Installation preparation
  - Steps 1, 2, 3
- Command operation
  - Steps 4, 5, 6, 7, 8, 9
- Installation
  - Steps 10, 11, 12, 13, 14
12 OS Setup tools
[Figure: Master Node and Slave Node]
13 Related works for CMS - VACM
- A cluster administration tool that runs on VA-Linux
- Reports real-time hardware sensor data such as temperature, fan speed, and voltage
14 Related works for CMS - MAT
- Ryerson University, Canada
- It is implemented with Tcl/Tk
- It incurs a lot of overhead when displaying rapidly changing data
- Each node is managed individually
- It mainly monitors system files
15 Related works for CMS - SCMS
- Kasetsart University
- It consists of a real-time monitoring system, parallel Unix commands, and numerous system administration utilities
- It provides a Java applet to report real-time system information
- It supports a 3D interface using VRML
16 Related works for CMS - M3C
- Oak Ridge National Lab
- It is implemented in Java
- Users can manage multiple cluster groups in one interface
- It supports job scheduling and software installation
17 Management tools in CATS-i
- The management tool supports maintenance of cluster nodes.
- Characteristics of the management tool
  - Many nodes can be bound into one cluster group, and multiple cluster groups can be managed in one place.
  - The same operation can be applied efficiently to all or selected nodes.
  - It offers users real-time monitoring of resource information such as CPU and memory.
  - The console, implemented in Java, is interactive and easy to use.
  - Job scheduling using JPBS over the Internet
- CATS-i offers many resource-related functions.
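Applying one operation to all or selected nodes of a cluster group can be sketched as follows. This is a minimal illustration, not the CATS-i design: the ClusterGroup class, its apply method, and the stand-in operation are hypothetical names, and a real tool would contact each node's daemon instead of calling a local function.

```python
from concurrent.futures import ThreadPoolExecutor


class ClusterGroup:
    """A named set of nodes that can be managed as one unit."""

    def __init__(self, name, nodes):
        self.name = name
        self.nodes = list(nodes)

    def apply(self, operation, selected=None):
        """Run `operation(node)` on all nodes, or only on `selected` ones.

        Returns {node: result}. Calls run concurrently in a thread pool.
        """
        targets = [n for n in self.nodes if selected is None or n in selected]
        if not targets:
            return {}
        with ThreadPoolExecutor(max_workers=len(targets)) as pool:
            results = pool.map(operation, targets)
            return dict(zip(targets, results))


if __name__ == "__main__":
    group = ClusterGroup("group-a", ["node1", "node2", "node3"])
    # Stand-in operation; a real tool would contact the node's daemon here.
    fake_uptime = lambda node: f"{node}: up"
    print(group.apply(fake_uptime))
    print(group.apply(fake_uptime, selected={"node2"}))
```

Running the calls concurrently keeps the total time close to that of the slowest node rather than the sum over all nodes.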
18 CATS-i functions
- Node status
  - CPU, memory, process, user list, account
  - Disk space
- File management
- Alarm
- System log
- Shutdown/Reboot
- Package management
- JPBS
19 Management tools - Node status
- It shows node information for each group
- Real-time information about CPU and memory
[Figure: total view]
20 Management tools - Node status
- It enables users to monitor and manage the resources of cluster nodes: CPU, memory, account, and user information, with real-time CPU and memory monitoring and process monitoring
[Figure panels: basic info, Performance]
21 [Figure panels: process, Disk, Account, User List]
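Real-time CPU monitoring on Linux is commonly derived from two successive samples of the aggregate "cpu" line in /proc/stat. The sketch below shows that calculation; it is an assumption about how such a figure could be computed, not a description of the CATS-i code, and the function names and sample lines are hypothetical.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = [int(x) for x in line.split()[1:9]]  # user .. steal, if present
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
    total = sum(fields)
    return total - idle, total


def cpu_utilization(prev_line, curr_line):
    """Percentage of CPU time spent busy between two samples."""
    prev_busy, prev_total = parse_cpu_line(prev_line)
    curr_busy, curr_total = parse_cpu_line(curr_line)
    elapsed = curr_total - prev_total
    if elapsed <= 0:
        return 0.0  # no time passed, or counters were reset
    return 100.0 * (curr_busy - prev_busy) / elapsed


if __name__ == "__main__":
    # Hypothetical samples taken a short interval apart.
    before = "cpu  100 0 50 850 0 0 0 0"
    after = "cpu  160 0 70 970 0 0 0 0"
    print(f"CPU busy: {cpu_utilization(before, after):.1f}%")
```

The same two-sample approach works with the four-field format of older kernels, since the missing fields are simply absent from the sum.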
22 Management tools - File management
- It provides file management functions for a cluster group.
[Figure: File Management]
- It is very easy to use
  - To perform file-related jobs, users just right-click to show a pop-up menu.
23 Management tools - Alarm function
- Monitors important system parameters
  - Processor utilization, memory usage, etc.
- Notifications are sent through the system's e-mail function.
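A threshold-based alarm with e-mail notification might look like the sketch below. The metric names, thresholds, and addresses are hypothetical, and the message is only composed, not sent; the slides do not describe how CATS-i evaluates or delivers alarms.

```python
from email.message import EmailMessage


def check_alarms(metrics, thresholds):
    """Return [(name, value, limit)] for every metric above its threshold."""
    return [
        (name, metrics[name], limit)
        for name, limit in sorted(thresholds.items())
        if name in metrics and metrics[name] > limit
    ]


def build_alarm_mail(node, alarms, sender, recipient):
    """Compose (but do not send) an e-mail describing the triggered alarms."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[alarm] {node}: {len(alarms)} parameter(s) over threshold"
    msg.set_content(
        "\n".join(f"{name} = {value} (limit {limit})" for name, value, limit in alarms)
    )
    return msg


if __name__ == "__main__":
    # Hypothetical readings and limits, in percent.
    alarms = check_alarms({"cpu": 97.0, "mem": 40.0}, {"cpu": 90.0, "mem": 80.0})
    if alarms:
        print(build_alarm_mail("node1", alarms, "cats@example.org", "admin@example.org"))
```

To actually deliver the message, the returned object could be passed to smtplib.SMTP(...).send_message(msg) against a mail server reachable from the master node.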
24 Management tools - System log
- Log information is very useful in various situations
- The server daemon collects log information from each node
[Figure: Log Tree]
25 Management tools - RPM package
- Users can install, remove, and upgrade application packages with the management tool, and query installed RPMs
- Supports Red Hat Linux
- It is implemented with a thread library
[Figure: Option Dialog]
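The four package actions listed on this slide map onto standard rpm command-line modes (-i, -U, -e, -q). The sketch below only builds the argument list and does not run anything; the function name build_rpm_command and the action keywords are hypothetical, and how CATS-i invokes rpm internally is not stated in the slides.

```python
RPM_FLAGS = {
    "install": "-i",  # rpm -i <package file>
    "upgrade": "-U",  # rpm -U <package file>
    "remove": "-e",   # rpm -e <package name>
    "query": "-q",    # rpm -q <package name>
}


def build_rpm_command(action, package):
    """Return the argv list for an rpm action without executing it."""
    if action not in RPM_FLAGS:
        raise ValueError(f"unsupported action: {action}")
    return ["rpm", RPM_FLAGS[action], package]


if __name__ == "__main__":
    # Hypothetical package names.
    print(build_rpm_command("install", "example-1.0-1.i386.rpm"))
    print(build_rpm_command("query", "example"))
```

A management tool could hand each argument list to a worker thread per node, which fits the slide's note that the feature is built on a thread library.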
26 Management tools - PBS Interface
- It enables users to use general PBS through the same CATS-i interface.
[Figure: JPBS job submission dialog, main screen]
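A job submission dialog ultimately has to produce something PBS can accept, typically a shell script with #PBS directives that is handed to qsub. The sketch below generates such a script; the function name, parameter names, and example values are hypothetical, and the slides do not show what JPBS actually generates.

```python
def build_pbs_script(job_name, command, nodes=1, queue=None, walltime=None):
    """Return the text of a PBS job script for `command`."""
    lines = ["#!/bin/sh", f"#PBS -N {job_name}", f"#PBS -l nodes={nodes}"]
    if walltime:
        lines.append(f"#PBS -l walltime={walltime}")
    if queue:
        lines.append(f"#PBS -q {queue}")
    # PBS sets PBS_O_WORKDIR to the directory qsub was run from.
    lines += ["cd $PBS_O_WORKDIR", command]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: four nodes, one hour, default queue.
    print(build_pbs_script("demo", "./a.out", nodes=4, walltime="01:00:00"))
```

The generated text would be saved to a file and submitted with qsub.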
27 Conclusion and Future works
- CATS-i will offer more functions such as
  - Status of CPU temperature, voltage, and speed
  - Extended aggregation of services
    - Statistical memory and CPU information for each user
    - Statistical information can be displayed graphically
  - Network monitoring using SNMP and network analysis
    - Detect network bottlenecks in clusters
  - Enhanced alarm services
    - Administrators can specify the alarm condition and the action to be taken
    - In an emergency, CATS-i can shut down or reboot cluster nodes