TRIUMF Computing Services - PowerPoint PPT Presentation

About This Presentation
Title:

TRIUMF Computing Services

Description:

Steve McDonald. 4. What Type ... When no PBS jobs are queued the PBS post execution script turns on the openMosix ... Thereby dedicating node to PBS jobs only ... – PowerPoint PPT presentation

Slides: 10
Provided by: legacywe

Transcript and Presenter's Notes

1
TRIUMF Computing Services
Steven McDonald
  • TRIUMF public compute cluster
  • OpenMosix and PBS
  • Wide area networking
  • Traffic shaping and bandwidth management
  • TRIUMF wireless network

2
TRIUMF Public Compute Cluster
3
The State of Public Computing at TRIUMF
  • Since the slow and very painful demise of VMS
    clustering during the 90s, TRIUMF has not
    provided a true cluster computing alternative.
  • We began using Linux in the mid 90s. Typically a
    public server was purchased with the latest-greatest
    CPU and everyone got an account.
  • The lifetime was typically 2 years before the
    hardware needed to be replaced, disk capacity
    doubled and accounts migrated.
  • This continued for a while until we started using
    small NIS clusters, but typically the
    tear-down-replace approach continued.
  • The challenge was to find a cluster that could be
    useful to the majority of our users yet
    affordable and maintainable with our limited
    resources.

4
What Type of Cluster Do We Need?
  • Focus on the configuration of the cluster and how
    it has addressed a number of issues that were
    important to TRIUMF
  • Satisfies a spectrum of user requirements
  • Most efficient use of its resources
  • Manageable and maintainable by one or two people
  • What type of cluster is required?
  • HPC, load-balancing, fail-over, CluMPs,
    parallel, storage, database, SSI
  • What type of use is expected?
  • Interactive use (30-50 users): web browsing,
    e-mail, etc.
  • Program development
  • Batch jobs (long and short)
  • WestGrid removed the need for a large batch cluster

5
Cluster Hardware and Software Config.
  • IBM x330s, 12 x 1.4 GHz CPUs
  • 1 TB SCSI-attached IDE RAID5 disks
  • Red Hat 7.3
  • OpenMosix kernel
  • Transparent process migration
  • OpenPBS with Maui scheduler (same as WestGrid)
  • Batch queue
  • xCAT, Ganglia, openMosixview, xpbs for monitoring
    status

6
How Is It Different from Others?
  • So what is unique?
  • It is both
  • traditional batch cluster
  • Interactive load sharing cluster with transparent
    process migration
  • Configuration
  • Head node and 2 compute nodes always dedicated to
    openMosix with load balancing
  • 3 compute nodes (6 processors) allocated to PBS
    batch queues
  • When no PBS jobs are queued, the PBS post
    execution script turns on the openMosix
    properties of the kernel so the node can rejoin
    the openMosix load sharing cluster
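
As a rough illustration of the layout above, the node roles can be tabulated and the CPU split checked. The host names and the assumption that every node carries two CPUs are hypothetical; the slides only state that the 3 PBS nodes hold 6 processors and that the cluster has 12 CPUs in total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str          # hypothetical host name, not from the slides
    cpus: int          # assumed 2 per node (3 PBS nodes = 6 processors)
    pbs_enabled: bool  # False: always dedicated to openMosix load balancing


NODES = [
    Node("head", 2, False),
    Node("node1", 2, False),
    Node("node2", 2, False),
    Node("node3", 2, True),
    Node("node4", 2, True),
    Node("node5", 2, True),
]


def cpu_totals(nodes):
    """Return (CPUs reserved for openMosix, CPUs that PBS may claim)."""
    mosix_only = sum(n.cpus for n in nodes if not n.pbs_enabled)
    pbs_capable = sum(n.cpus for n in nodes if n.pbs_enabled)
    return mosix_only, pbs_capable


print(cpu_totals(NODES))
```

Under these assumptions the two pools add up to the 12 CPUs listed in the hardware configuration, with half of them never visible to PBS.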

7
(No Transcript)
8
Cluster Logic
1. Initial condition
  • All nodes running openMosix with process
    migration turned ON
  • PBS turned OFF on head and 2 compute nodes,
    which prevents PBS from ever submitting jobs to
    these nodes
2. User submits job.
3. PBS prologue script removes node from
participating in openMosix load balancing,
thereby dedicating node to PBS jobs only.
4. If more PBS jobs are submitted and a CPU is
available on an existing allocated PBS node,
submit job there.
5. Else, submit job to another PBS node and
remove that node from openMosix membership also.
6. When PBS jobs end, PBS epilogue script checks
if any other PBS jobs are running on that node;
if not, it returns the node to openMosix
membership.
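
The six steps can be sketched as a small in-memory model. This is only an illustration of the logic, not the actual prologue and epilogue scripts: the `Node` fields, the function names and the host names are hypothetical, each job is assumed to need one CPU, and no real PBS or openMosix commands are called.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpus: int
    pbs_enabled: bool      # PBS is turned OFF on the head and 2 compute nodes
    in_mosix: bool = True  # step 1: every node starts in openMosix
    running: int = 0       # PBS jobs currently on this node


def prologue(node):
    """Step 3: take the node out of openMosix load balancing."""
    node.in_mosix = False


def epilogue(node):
    """Step 6: rejoin openMosix once no PBS jobs remain on the node."""
    if node.running == 0:
        node.in_mosix = True


def submit(nodes):
    """Steps 2, 4 and 5: place one single-CPU job; return its node or None."""
    pbs_nodes = [n for n in nodes if n.pbs_enabled]
    # Step 4: prefer a node already allocated to PBS that has a free CPU.
    allocated = [n for n in pbs_nodes if not n.in_mosix and n.running < n.cpus]
    # Step 5: otherwise claim another PBS node that is still in openMosix.
    spare = [n for n in pbs_nodes if n.in_mosix]
    candidates = allocated + spare
    if not candidates:
        return None  # all PBS CPUs busy; a real scheduler would queue the job
    node = candidates[0]
    prologue(node)
    node.running += 1
    return node


def finish(node):
    """A job on `node` ends, after which PBS runs the epilogue."""
    node.running -= 1
    epilogue(node)


# Usage: one openMosix-only head node and two PBS-capable nodes.
nodes = [Node("head", 2, False), Node("node1", 2, True), Node("node2", 2, True)]
a = submit(nodes)  # node1 leaves openMosix
b = submit(nodes)  # second CPU of node1 is used
c = submit(nodes)  # node1 is full, so node2 leaves openMosix
finish(c)          # node2 has no jobs left and rejoins openMosix
```

Filling an already allocated node before claiming a new one keeps as many nodes as possible in the openMosix pool for interactive users.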
9
Final Words
  • While by no means an impressive cluster in terms
    of size, we have managed to combine a traditional
    batch queuing cluster using OpenPBS with a load
    sharing cluster using OpenMosix in a unique
    way.
  • It is scalable, thereby avoiding the
    time-consuming tear-down-replace approach of the past.
  • It satisfies a spectrum of user requirements, and
    is only one cluster to manage and maintain with
    limited resources.