Dolly for system management of PC Linux cluster

1
Dolly for system management of PC Linux cluster
  • A. Manabe (CRC@KEK)
  • Atsushi.Manabe@kek.jp
  • http://corvus.kek.jp/manabe

2
Motivation
  • System (software) installation and update on more
    than 10 PCs was boring and even hard work for me.
  • How hard will installation on over 100 PCs be?
  • If installation is very fast, you can easily
    switch the OS from an old version to a new one and
    back again. This is very convenient for
    testing a brand-new version of an OS.
  • Fast installation is also good for system recovery
    after hard-disk trouble.

3
An idea and Objective
  • The installation process:
  • First, you install a system on one PC.
  • Then you clone the disk image to the other PCs via
    the network.
  • Config files unique to each node (hostname,
    IP address and so on) are created and
    overwritten on each PC's disk.
  • Target:
  • Installation on very many PCs (100 to 1000) of
    almost the same spec.
  • Objective:
  • Very fast installation: for example, 100 PCs
    installed in 10 min.
  • Good scalability with the number of nodes.
  • As little human operation as possible.
    Whatever remains is done in a centralized way.

4
Dolly and Dolly
  • Dolly
  • A Linux application to copy/clone files
    and/or disk images among many PCs through a
    network.
  • Dolly was originally developed by the CoPs project at
    ETH (Switzerland) and is free software.
  • Dolly features
  • Transfer/copy of sequential files (no 2GB size limit)
    and/or normal files (optional decompress and untar
    on the fly) via a TCP/IP network.
  • Virtual RING network connection topology.
  • Pipeline and multi-threading mechanism for
    speed-up.
  • Fail recovery mechanism for robust operation.

5
Dolly: how do you start it on Linux?
  • Server side (which has the original file)
  • dollyS -v -f config_file
  • Nodes side
  • dollyC -v

Config file example
    iofiles 3                     (# of files to transfer)
    /dev/hda1 > /tmp/dev/hda1
    /data/file.gz >> /data/file
    boot.tar.Z >> /boot
    server n000.kek.jp            (server name)
    firstclient n001.kek.jp
    lastclient n020.kek.jp
    client 20                     (# of client nodes)
    n001                          (client names)
    n002
    ...
    n020
    endconfig                     (end code)
The left of '>' is the input file on the server.
The right is the output file on the clients. '>' means
Dolly does not modify the image. '>>' indicates that
Dolly should cook (decompress, untar ...) the
file according to the name of the file.
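A config file of this shape can also be generated rather than typed by hand. The following is only a sketch that mirrors the layout of the example on this slide, with the two copy operators (`>` for an unmodified copy, `>>` for decompress/untar); the function name `make_dolly_config` is hypothetical and the keywords are taken from the slide, not checked against the Dolly sources.

```python
def make_dolly_config(transfers, server, clients):
    """Build config text in the layout of the slide's example.

    transfers: list of (source, operator, destination) tuples, where the
    operator is ">" (copy unchanged) or ">>" (decompress/untar by name).
    clients: fully qualified client names, in ring order.
    """
    lines = [f"iofiles {len(transfers)}"]
    for src, op, dst in transfers:
        if op not in (">", ">>"):
            raise ValueError(f"unknown operator: {op!r}")
        lines.append(f"{src} {op} {dst}")
    lines.append(f"server {server}")
    lines.append(f"firstclient {clients[0]}")
    lines.append(f"lastclient {clients[-1]}")
    lines.append(f"client {len(clients)}")
    # The slide lists short names (n001 ...), so drop the domain part.
    lines.extend(name.split(".")[0] for name in clients)
    lines.append("endconfig")
    return "\n".join(lines) + "\n"


# Same values as the slide: one server and twenty clients n001 to n020.
clients = [f"n{i:03d}.kek.jp" for i in range(1, 21)]
config_text = make_dolly_config(
    [
        ("/dev/hda1", ">", "/tmp/dev/hda1"),
        ("/data/file.gz", ">>", "/data/file"),
        ("boot.tar.Z", ">>", "/boot"),
    ],
    server="n000.kek.jp",
    clients=clients,
)
print(config_text)
```

The output could be saved and passed to `dollyS -v -f config_file` as shown on this slide.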
6
Dolly: Virtual Ring Topology
  • The physical network connection can be whatever you like.
  • Logically, Dolly makes a ring chain of nodes. Its
    order is specified by Dolly's config file.
  • Although each transfer is only between two adjacent
    nodes, it can use the maximum performance of a
    switched network with full-duplex ports.
  • Good for networks made complex by many switches.

(Figure labels: server host having the original image; node PC; network hub/switch; physical connection; logical (virtual) connection.)
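The logical chain can be pictured as a list in which every node receives from the previous entry and forwards to the next. This is a minimal sketch of that idea only; `ring_neighbours` and the node names are hypothetical, and it does not model whether the last node reports back to the server.

```python
def ring_neighbours(server, clients):
    """Return {node: (upstream, downstream)} for the logical chain.

    The chain starts at the server and follows the client order given in
    the config file. The last client has no downstream node.
    """
    chain = [server] + list(clients)
    links = {}
    for i, node in enumerate(chain):
        upstream = chain[i - 1] if i > 0 else None
        downstream = chain[i + 1] if i + 1 < len(chain) else None
        links[node] = (upstream, downstream)
    return links


links = ring_neighbours("n000", ["n001", "n002", "n003"])
for node, (up, down) in links.items():
    print(f"{node}: receives from {up}, sends to {down}")
```

Each node talks to at most two neighbours, which is why a switched full-duplex network can carry every hop at full speed at the same time.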
7
Other possible network connection models, which are
not supported by Dolly
8
(Few) server - (many) client model
  • The server could be a daemon process (you don't need
    to start it by hand).
  • Performance does not scale with the number of nodes.
  • Server bottleneck. Network congestion.

Multicasting or Broadcasting
  • No server bottleneck.
  • Gets the maximum performance of a network that supports
    multicasting in its switch fabric.
  • A node failure does not affect the whole process
    very much, so it could be robust.
  • Since a failed node needs a re-transfer, speed is
    governed by the slowest node, as in the RING topology.
  • Not TCP but UDP, so the application must take care
    of transfer reliability.

9
Cascade Topology
  • The server bottleneck could be overcome.
  • Cannot get maximum network performance, but better
    than the many-to-one topology.
  • Weak against a node failure. The failure spreads
    down the cascade as well and is difficult to recover from.

10
Pipelining and multi-threading
3 threads in parallel
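The transcript does not say what the three threads do. The sketch below assumes one plausible split (receive from the upstream node, write to disk, forward to the next node) joined by bounded queues, so that a node can receive chunk N+1 while chunk N is still being written and forwarded. All names are hypothetical, and lists stand in for the disk and the network.

```python
import queue
import threading

SENTINEL = None  # marks the end of the chunk stream


def stage(work, inbox, outbox):
    """Apply work() to each chunk from inbox, then hand the chunk on."""
    while True:
        chunk = inbox.get()
        if chunk is SENTINEL:
            if outbox is not None:
                outbox.put(SENTINEL)
            break
        work(chunk)
        if outbox is not None:
            outbox.put(chunk)


def run_node(incoming_chunks):
    """Push chunks through three threads: receive, write, forward."""
    written, forwarded = [], []      # stand-ins for disk and next node
    to_disk = queue.Queue(maxsize=4)
    to_next = queue.Queue(maxsize=4)

    def receive():
        for chunk in incoming_chunks:
            to_disk.put(chunk)
        to_disk.put(SENTINEL)

    threads = [
        threading.Thread(target=receive),
        threading.Thread(target=stage, args=(written.append, to_disk, to_next)),
        threading.Thread(target=stage, args=(forwarded.append, to_next, None)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written, forwarded


written, forwarded = run_node([b"chunk-1", b"chunk-2", b"chunk-3"])
print(len(written), len(forwarded))
```

The bounded queues keep a slow disk or a slow downstream link from letting the receiver run arbitrarily far ahead.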
11
Fail recovery mechanism
  • A single node failure could be a show stopper in the
    RING (series connection) topology.
  • Dolly provides an automatic short-cut mechanism
    for node problems.
  • When a node is in trouble, the upstream node detects
    it by a send timeout.
  • The upstream node negotiates with the downstream
    node for reconnection and re-transfer of a
    file chunk.
  • The RING topology makes its implementation easy.

(Figure labels: time out; short cutting.)
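In list terms, short cutting means dropping the failed node and linking its upstream neighbour directly to its downstream neighbour. The sketch below shows only that bookkeeping; `short_cut` is a hypothetical name, and the timeout detection and the negotiation between the two nodes are not modelled.

```python
def short_cut(chain, failed):
    """Return the chain without `failed` and the new (upstream, downstream) link."""
    i = chain.index(failed)
    if i == 0:
        raise ValueError("the server holds the original image and cannot be bypassed")
    upstream = chain[i - 1]
    downstream = chain[i + 1] if i + 1 < len(chain) else None
    return chain[:i] + chain[i + 1:], (upstream, downstream)


chain = ["server", "n001", "n002", "n003"]
new_chain, link = short_cut(chain, "n002")
print(new_chain, link)
```

Because every node has exactly one upstream and one downstream neighbour, only one link has to be re-made per failure, which fits the remark that the RING topology makes the implementation easy.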
12
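Re-transfer in short cutting
(Figure: a file from BOF to EOF divided into 4MB chunks numbered 1 2 3 4 5 6 7 8 9 ..; the Server, Node 1, Node 2 and the next node are linked by the network, each shown with the chunk numbers it is currently handling.)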
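For a re-transfer to work after a short cut, the upstream node has to still hold the chunks that the new downstream node has not yet received. The sketch below assumes the upstream side keeps a small window of recently sent chunks and resends from the index the downstream node asks for. The class name, the window size and the request format are all assumptions; the slides only state that a file chunk is re-transferred.

```python
from collections import deque


class ChunkSender:
    """Upstream side: remembers the most recently sent chunks."""

    def __init__(self, keep=4):
        self.recent = deque(maxlen=keep)  # (index, data) pairs, oldest first

    def send(self, index, data, deliver):
        """Send one chunk and keep a copy for a possible re-transfer."""
        self.recent.append((index, data))
        deliver(index, data)

    def resend_from(self, first_needed, deliver):
        """Re-deliver buffered chunks with index >= first_needed."""
        if self.recent and first_needed < self.recent[0][0]:
            raise LookupError(f"chunk {first_needed} is no longer buffered")
        for index, data in self.recent:
            if index >= first_needed:
                deliver(index, data)


# Node 1 has sent chunks 1..7 to Node 2, which then fails.
sender = ChunkSender(keep=4)
sent_to_failed_node = []
for i in range(1, 8):
    sender.send(i, f"chunk-{i}".encode(), lambda idx, data: sent_to_failed_node.append(idx))

# The next node had received chunks up to 5, so it asks for 6 onwards.
received_by_next_node = []
sender.resend_from(6, lambda idx, data: received_by_next_node.append(idx))
print(received_by_next_node)
```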
13
Performance (measured and expected)
  • Measured performance (see the graph on the next page)
  • 1 server - 1 node (Pentium III 1GHz x 2 CPUs)
  • ATA 100 IDE disk / full-duplex 100Base-TX network
    MB/s
  • 2GB image copy: 8.2MB/s, elapsed time
    230 sec.
  • 1 server - 10 nodes (at the moment I have only 11
    PCs available for the test).
  • All nodes are the same type of hardware as above.
  • 2GB image copy: 720MB/s in aggregate, elapsed
    time 260 sec.
  • Thanks to the pipelining mechanism, elapsed time does
    not increase much as the number of nodes increases.
    (See the graph on the next page.)
  • 5 min for 1 node, then 10 min for 500 nodes,
    theoretically.
  • The measured time is only for image cloning.
    In practice you need around 4 more min. for the
    booting process (PXE and kickstart).
14
(No Transcript)
15
measured
16
How does Dolly start after pushing the reset button?
  • 1) You set up the kickstart, PXE and DHCP config files
    and run these servers. Prepare one installed PC.
    Connect all PCs to the network.
  • 2) Push the reset button on all nodes.
  • 3) The PXE process starts on all nodes.
  • 3.1) PXE asks the DHCP server for the booting process
    (local boot, kickstart installation or diskless
    client), the IP address and the hostname.
  • You can set the default booting process in the DHCP
    config file, so keyboard operation on each node is
    not necessary.
  • (Assume you select kickstart installation.)
  • 3.2) PXE downloads a small OS kernel with kickstart
    images (2MB) by multicast TFTP from the PXE server.
  • 3.3) Linux runs with a RAM-disk root on all nodes.

Red items: you have to operate. Blue items:
processed automatically.
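Giving each node a fixed IP address and hostname from the DHCP server usually means one host entry per node in the DHCP config file. The sketch below generates such entries in ISC dhcpd syntax. The slides do not say which DHCP or PXE server software was used, so the boot file name `pxelinux.0`, the MAC addresses, the IP addresses and the function name are all placeholders.

```python
def dhcp_host_entry(hostname, mac, ip,
                    boot_file="pxelinux.0", tftp_server="192.168.1.254"):
    """Return one ISC dhcpd 'host' stanza for a cluster node.

    boot_file and tftp_server are illustrative defaults, not values
    taken from the presentation.
    """
    return (
        f"host {hostname} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f'  option host-name "{hostname}";\n'
        f"  next-server {tftp_server};\n"
        f'  filename "{boot_file}";\n'
        f"}}\n"
    )


# Hypothetical inventory: (hostname, MAC address, IP address).
nodes = [
    ("n001", "00:11:22:33:44:01", "192.168.1.1"),
    ("n002", "00:11:22:33:44:02", "192.168.1.2"),
]
dhcp_text = "\n".join(dhcp_host_entry(*node) for node in nodes)
print(dhcp_text)
```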
17
How does Dolly start after pushing the reset button?
(2)
  • 4) The kickstart process starts.
  • 4.1) kickstart gets the IP address and hostname from
    the DHCP server and keeps them on the local RAM disk.
  • 4.2) It makes file systems on the nodes' disks according
    to the kickstart config file.
  • 4.3) kickstart invokes a post shell script on
    the nodes.
  • 4.3.1) dollyC runs on the nodes.
  • 5) Start dollyS on the pre-installed machine.
  • 5.1) After all nodes are ready, you start dollyS
    on the PC that was installed before this
    process. You can check the nodes' readiness by
    using the ping command.
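Step 5.1 can be scripted: ping every node until all of them answer, then start the server. The sketch below assumes the Linux `ping` options `-c` (count) and `-W` (timeout in seconds) and uses the `dollyS -v -f config_file` command line shown earlier; the function names and node names are hypothetical. A ping reply only shows that a node's network is up, not that dollyC is already listening.

```python
import subprocess
import time


def ping_once(host):
    """Return True if a single ICMP echo to `host` is answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_ready(hosts, probe=ping_once, interval=5.0, max_rounds=60):
    """Poll until every host answers the probe; False if rounds run out."""
    pending = list(hosts)
    for _ in range(max_rounds):
        pending = [h for h in pending if not probe(h)]
        if not pending:
            return True
        time.sleep(interval)
    return False


def main():
    """Run on the pre-installed machine once the nodes have been reset."""
    nodes = [f"n{i:03d}" for i in range(1, 21)]
    if wait_until_ready(nodes):
        subprocess.run(["dollyS", "-v", "-f", "config_file"], check=True)
    else:
        print("some nodes never answered; dollyS was not started")
```

Passing the probe as a parameter keeps the polling loop testable without a real network.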

18
How does Dolly start after pushing the reset button?
(3)
  • The kickstart post shell script continues.
  • 4.4) It overwrites individual host information (IP
    address, hostname, fstab ..) on the cloned
    local disk from the RAM disk.
  • 4.5) Run LILO.
  • 6) Re-configure the DHCP server so that nodes boot
    locally.
  • 7) Push the reset button on all nodes again.
  • PXE (Preboot eXecution Environment)
    http://developer.intel.com/ial/WfM/wfmspecs.htm
  • A standard proposed by Intel for network card
    firmware for network booting. It requires PXE
    compliant NIC hardware.
  • Kickstart (Red Hat)
  • Red Hat's batch installer.
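Step 4.4 exists because every cloned disk carries the identity of the machine it was copied from, so the per-node files have to be rewritten after cloning. The sketch below writes Red Hat style network files under the directory where the cloned disk is mounted. The file names follow the usual Red Hat layout, while the interface name `eth0`, the netmask and the function name are assumptions; fstab is left out.

```python
import os


def write_host_files(root, hostname, ip, netmask="255.255.255.0"):
    """Write per-node network settings under the mounted clone at `root`."""
    sysconfig = os.path.join(root, "etc", "sysconfig")
    scripts = os.path.join(sysconfig, "network-scripts")
    os.makedirs(scripts, exist_ok=True)

    # Host-wide settings, including the node's own hostname.
    with open(os.path.join(sysconfig, "network"), "w") as f:
        f.write(f"NETWORKING=yes\nHOSTNAME={hostname}\n")

    # Static address for the first Ethernet interface (assumed to be eth0).
    with open(os.path.join(scripts, "ifcfg-eth0"), "w") as f:
        f.write(
            "DEVICE=eth0\n"
            "BOOTPROTO=static\n"
            f"IPADDR={ip}\n"
            f"NETMASK={netmask}\n"
            "ONBOOT=yes\n"
        )
```

In the process described on the slides the hostname and IP address would come from the values kickstart saved on the RAM disk in step 4.1; here they are plain arguments.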

19
Conclusion
  • I have developed a fast installation process
    suitable for Linux clusters that consist of a
    massive number of PCs.
  • Dolly is disk image cloning software and is one
    part of the installation process proposed here.
    Cloning a disk image to 10 PCs takes almost the
    same time as cloning to 1 PC, and high
    scalability up to installations of several hundred PCs
    is expected.
  • We are using it for a 16-PC cluster and will use
    it for a cluster of 70 PCs this September.
  • Software is available from
    http://corvus.kek.jp/manabe/pcf/dolly
  • Thank you for reading!