1
Installation of a Condor Supercomputing pool
  • Brian Campbell
  • Bryce Carmichael
  • Unquiea Wade
  • Mentor
  • Dr. Eric Akers

2
Abstract
The International Polar Year was designed to study and better understand the current state of climatic change to the world's ice sheets. For the last few decades, automated weather stations and satellites in geosynchronous orbit have been generating data sets. Today, large amounts of these data remain unexplored due to insufficient funding and the scarcity of resources. For this reason, the polar grid concept was proposed to delegate the analysis of the existing data sets.

The goal of the Elizabeth City State University Polar Grid Team was to construct a model network to serve as the base for a supercomputing pool. The supercomputing pool will be built on the university's campus and linked to the overall polar grid system. Numerous software packages and protocols currently in use at other institutions around the nation were researched. From the possible options, the Condor software, created and developed at the University of Wisconsin, was chosen for its ease of use and its capacity for expansion.

An eighteen-node computing pool was constructed and tested with Condor in Dixon Hall's second-floor lab. The pool comprised seventeen desktops running on a Windows NT platform, with the pool's master, housed in Lane Hall, acting as a Linux-based server.
3
Purpose
  • The goal was to utilize all of our computers.
  • Gain knowledge about Supercomputing.
  • Set up a pool of computers that can be accessed by
    Polar Grid.
  • Familiarize team members with job submission
    and overall operation of Condor.

4
Introduction to Supercomputing
  • What is Supercomputing?
  • Supercomputing is a term given to a system capable
    of processing at speeds much greater than
    commercially available CPUs.
  • High-throughput computing is used to describe
    systems with intermediate processing abilities.

5
Distributed vs. Parallel
  • Distributed computing utilizes a network of many
    computers, each accomplishing a portion of an
    overall task, to achieve a computational result
    much more quickly than with a single computer.
  • Distributed computing also allows many users to
    interact and connect openly.
  • Parallel processing is the simultaneous
    processing of the same task on two or more
    microprocessors in order to obtain faster
    results.
  • The computer resources can include a single
    computer with multiple processors.

6
Size vs. Efficiency
  • Parallel processing allows more intimate
    communication between nodes, increasing
    efficiency.
  • As the size of the network grows, communication
    takes up a greater part of the CPUs' time.
  • This can be limited by using more than one type
    of protocol in a system.

7
Hardware/Software Options
Condor is a specialized workload management
system for compute-intensive jobs. Like other
full-featured batch systems, Condor provides a
job queueing mechanism, scheduling policy,
priority scheme, resource monitoring, and
resource management.
Beowulf is a design for high-performance parallel
computing clusters on inexpensive personal
computer hardware. A Beowulf cluster is a group of
usually identical PCs running a Free and
Open Source Software (FOSS) Unix-like operating
system, such as BSD, Linux or Solaris.
BOINC is a software platform for volunteer
computing and desktop Grid computing. BOINC is
designed to support applications that have large
computation requirements, storage requirements,
or both.
8
History of Condor
  • The Condor project was started in 1988.
  • Condor was built from the results of the Remote
    Unix project and from the continuation of
    research in the area of Distributed Resource
    Management (DRM).
  • Condor was created at the University of
    Wisconsin-Madison (UW-Madison), and it was first
    installed as a production system in the
    UW-Madison Department of Computer Science.

9
Why choose Condor?
  • Versatility
  • Capability of switching between distributed and
    parallel computing
  • Supports multiple programming languages for simple
    execution of jobs
  • Operates on multiple platforms

10
Resources Required
  • Availability: open-source software
  • Easy expansion: any number of nodes can be added
    to an existing pool
  • Cost efficiency: any CPU meeting the base
    requirements can be used efficiently

11
System Requirements
  • Windows
  • Condor for Windows requires Windows 2000 (or
    better) or Windows XP.
  • 300 megabytes of free disk space is recommended.
    Significantly more disk space may be needed to
    run jobs with large data files.
  • Condor for Windows will operate on either an NTFS
    or FAT file system. However, for security
    purposes, NTFS is preferred.
  • Unix
  • The size requirements for the downloads currently
    vary from about 20 Mbytes (statically linked
    HP-UX on PA-RISC) to more than 50 Mbytes
    (dynamically linked Irix on an SGI).
  • In addition, a large amount of disk space is
    needed in the local directory of any machine that
    submits jobs to Condor.

12
Installation
http://parrot.cs.wisc.edu/

The Condor software can be accessed through the project's main website. Condor can be downloaded for various platforms such as Solaris, Linux/Unix, Windows, and Mac. Administrative and user manuals are also available on the website.
13
Configuration
  • Installation was overseen through the Windows
    installation wizard, with changes made to the
    defaults listed below.
  • Pool master node: a Linux-based machine in Lane
    Hall (10.40.20.37). Having a Linux-based master
    will allow the eventual use of the full array of
    Condor options.
  • Read/Write access: parameters were changed to
    include 10... to allow feedback and access from
    the different nodes (see the configuration sketch
    after this list).
  • Because the CERSER labs are used during class
    hours, each node is required to be idle for 15
    minutes before it becomes available to perform
    tasks. If a task is interrupted and the original
    node is not freed within ten minutes, the task is
    restarted on a different machine.
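The settings described above might look roughly like the following excerpt from a condor_config file. This is only a sketch: the macro names come from the Condor manual, but the values (the master address, the 10.* range standing in for the "10..." above, and the idle threshold) are assumptions based on this slide, not the team's actual configuration.

    # condor_config excerpt (illustrative sketch; values are assumptions)
    CONDOR_HOST = 10.40.20.37        # Linux-based pool master in Lane Hall
    HOSTALLOW_READ  = 10.*           # allow status queries from the campus 10.x range
    HOSTALLOW_WRITE = 10.*           # allow nodes in that range to join the pool and submit
    # Only start jobs once the desktop has been idle for 15 minutes
    START = KeyboardIdle > (15 * 60)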
14
Job Submission and Tracking
Jobs can be submitted in any executable file format through the condor/bin directory. Jobs are submitted with the command condor_submit filename, and the status of the nodes within the system can be checked with the command condor_status. A sample submit description file is sketched below.
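For illustration, a minimal submit description file could look like the following. The file and program names (hello.sub, hello.exe) are hypothetical; only the keywords are standard Condor submit-description syntax.

    # hello.sub -- minimal submit description file (names are hypothetical)
    universe   = vanilla       # run an ordinary executable without relinking
    executable = hello.exe     # program built for the Windows NT nodes
    output     = hello.out     # file that receives the job's standard output
    error      = hello.err     # file that receives the job's standard error
    log        = hello.log     # event log that Condor writes automatically
    queue                      # submit one copy of the job

The job would then be submitted with condor_submit hello.sub and the nodes checked with condor_status.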
15
Condor Status Menu
The condor_status command brings up a listing that gives the current platform and availability of each node. Availability is signified by the one-word qualifiers in the fourth column:
  • Unclaimed: the node is open but is unable to perform the specified task
  • Claimed: the node is currently running a specified task
  • Matched: the node is open and can perform a specified task
  • Owner: the node has a local user demanding its attention
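The listing produced by condor_status has roughly the shape shown below. This is an illustrative mock-up rather than output captured from the ECSU pool; the node names and figures are invented, and the exact columns vary with the Condor version.

    Name       OpSys     Arch    State      Activity   LoadAv   Mem
    node01     WINNT51   INTEL   Unclaimed  Idle       0.010    512
    node02     WINNT51   INTEL   Claimed    Busy       0.980    512
    master     LINUX     INTEL   Owner      Idle       0.120    1024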
16
Job Submission and Tracking
After submission, a task can be traced through the pool using the condor_q command. The results of a task can be seen in the output files created by the executable, or in the .log file that is created automatically for each task.
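A typical tracking session might look like the following; the job number continues the hypothetical example above.

    condor_q                  # list jobs still in the queue with their current status
    condor_q -analyze 12.0    # ask Condor why job 12.0 has not been matched to a node yet

The hello.log file from the earlier sketch records the submission, execution, and termination events for the job and can be opened in any text editor.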
17
Results
A Condor pool composed of 17 nodes running on the Windows NT platform has been established in the Dixon Hall laboratory, operating under a Linux-based master housed in the Lane Hall offices. To date, simple tasks written in C have been submitted and have run successfully through the pool. Diagnostic assessment revealed two CPUs that were not connected to the network, as well as naming redundancies that hindered the installation of the Condor system.
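As a concrete illustration, a simple task of the kind mentioned here could be as small as the following C program (a hypothetical example, not the team's actual code), compiled for the Windows nodes and submitted with a description file like the one sketched on slide 14.

    /* hello.c -- trivial job used to exercise the pool (hypothetical example) */
    #include <stdio.h>

    int main(void)
    {
        printf("Hello from a Condor worker node\n");
        return 0;
    }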
18
Conclusions
  • Installation of Condor was a success.
  • Expansion of the cluster is easy and can be done
    efficiently with minimal cost in resources.
  • Management and Programming with Condor can be
    done on an undergraduate level and is encouraged.

19
Future Work
  • Familiarize more of CERSER teams with Condor
    software.
  • Continue the expansion of the Condor pool.
  • Link ECSU to the Polar Grid network.
  • Encourage the development of programs to aid
    future CERSER research projects.

20
References
  1. Andrew S. Tanenbaum, Maarten Van Steen (2002).
     Distributed Systems: Principles and Paradigms.
     New Jersey: Prentice-Hall Inc.
  2. Amza C., A.L. Cox, S. Dwarkadas, P. Keleher, R.
     Rajamony, H. Lu, W. Yu, and W. Zwaenepoel.
     TreadMarks: Shared memory computing on networks
     of workstations, to appear in IEEE Computer
     (draft copy), www.cs.rice.edu/willy/TreadMarks/papers.html.
  3. A.J. van der Steen, An evaluation of some Beowulf
     clusters, Technical Report WFI-00-07, Utrecht
     University, Dept. of Computational Physics,
     December 2000. (Also available through
     www.euroben.nl, directory reports/.)
  4. A.J. van der Steen, Overview of recent
     supercomputers and high-end servers, June 2005,
     www.euroben.nl, directory reports/.
  5. http://www.cs.wisc.edu/condor/manual/v7.0/
  6. http://boinc.berkeley.edu/trac/wiki/BoincIntro
  7. http://www.supercomputingonline.com/ads.php

21
Questions