1
Design Management of the JLAB Farms
  • Ian Bird, Jefferson Lab
  • May 24, 2001
  • FNAL LCCWS

2
Overview
  • JLAB clusters
  • Aims
  • Description
  • Environment
  • Batch software
  • Management
  • Configuration
  • Maintenance
  • Monitoring
  • Performance monitoring
  • Comments

3
Clusters at JLAB - 1
  • Farm
  • Supports experiments' reconstruction and analysis
  • 250 (→ 320) Intel Linux CPUs (+ 8 Sun Solaris)
  • 6400 → 8000 SPECint95
  • Goals
  • Provide 2 passes of 1st-level reconstruction at
    the average incoming data rate (10 MB/s) (sizing
    sketch below)
  • (More recently) provide analysis, simulation, and
    a general batch facility
  • Systems
  • First phase (1997) was 5 dual Ultra2 + 5 dual IBM
    43p
  • 10 dual Linux (PII 300) acquired in 1998
  • Currently 165 dual PII/III (300, 400, 450, 500,
    750, 1 GHz)
  • ASUS motherboards, 256 MB, ~40 GB SCSI, IDE, 100
    Mbit
  • First 75 systems are towers, 50 are 2U rackmount,
    40 are 1U (½U?)
  • Interactive front-ends
  • Sun E450s and 4-proc Intel Xeons (2 each), 2 GB
    RAM, Gb Ethernet
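  A back-of-envelope Python sketch of what the goal above implies for the
  reconstruction code. The per-MB cost bound is derived from this slide's
  capacity and rate numbers, not from a measured benchmark.

    # Farm capacity and incoming data rate quoted on this slide.
    CAPACITY_SPECINT95 = 8000.0   # upper end of the farm capacity
    DATA_RATE_MB_S = 10.0         # average incoming data rate
    PASSES = 2                    # two passes of 1st-level reconstruction

    # To keep up, reconstructing 1 MB must cost no more than this much work.
    max_cost = CAPACITY_SPECINT95 / (PASSES * DATA_RATE_MB_S)
    print(f"reconstruction budget: <= {max_cost:.0f} SPECint95-seconds per MB")
    # prints: reconstruction budget: <= 400 SPECint95-seconds per MB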

4
Intel Linux Farm
5
Clusters at JLAB - 2
  • Lattice QCD cluster(s)
  • Existing clusters in collaboration with MIT, at
    JLAB
  • Compaq Alpha
  • 16 XP1000 (500 MHz 21264), 256 or 512 MB, 100
    Mbit
  • 12 Dual UP2000 (667 MHz 21264), 256 MB, 100 Mbit
  • All have Myrinet interconnect
  • Front-end (login) machine has Gb Ethernet and a
    400 GB fileserver for data staging and transfers
    MIT ↔ JLAB
  • Anticipated (funded)
  • 128 CPUs (June 2001), Alpha or P4(?) in 1U
  • 128 CPUs (Dec/Jan?) identical to the 1st 128
  • Myrinet

6
LQCD Clusters
7
Environment
  • JLAB has a central computing environment (CUE)
  • NetApp fileservers (NFS and CIFS)
  • Home directories, group (software) areas, etc.
  • Centrally provided software apps
  • Available in
  • General computing environment
  • Farms and clusters
  • Managed desktops
  • Compatibility between all environments: home and
    group areas available in the farm, library
    compatibility, etc.
  • Locally written software provides access to farm
    (and mass storage) from any JLAB system
  • Campus network backbone is Gigabit Ethernet, with
    100 Mbit to physicist desktops, OC-3 to ESnet

8
[Diagram: Jefferson Lab Mass Storage and Farm Systems 2001. Tape servers, DST/cache file servers (15 TB RAID 0), farm cache file servers (4 x 400 GB), work file servers (10 TB RAID 5), a DB server, and the batch and interactive farm, fed from the CLAS and Hall A/C DAQs over 100 Mbit/s and 1000 Mbit/s links, with FCAL and SCSI storage connections.]
9
(No Transcript)
10
Batch Software
  • Farm
  • Use LSF (v 4.0.1)
  • Pricing now acceptable
  • Manage resource allocation with
  • Job queues
  • Production (reconstruction, etc.)
  • Low-priority (for simulations), high-priority
    (short jobs)
  • Idle (pre-emptable)
  • User group allocations (shares)
  • Make full use of hierarchical shares - allows a
    single undivided cluster to be used efficiently
    by many groups
  • E.g. the share-tree sketch below
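  A minimal Python sketch of the hierarchical-share idea: each group's
  entitlement is its share of its parent's entitlement, so one undivided
  cluster can be split fairly among groups and subgroups. The group names
  and share values below are hypothetical, not JLAB's actual configuration,
  and a real fairshare scheduler would also decay recent usage.

    from dataclasses import dataclass, field
    from typing import Dict

    @dataclass
    class ShareNode:
        shares: int                     # relative share within the parent
        children: Dict[str, "ShareNode"] = field(default_factory=dict)

        def entitlement(self, parent_fraction: float = 1.0) -> Dict[str, float]:
            """Map each leaf group to its fraction of the whole cluster."""
            out: Dict[str, float] = {}
            total = sum(c.shares for c in self.children.values())
            for name, child in self.children.items():
                frac = parent_fraction * child.shares / total
                if child.children:
                    out.update(child.entitlement(frac))
                else:
                    out[name] = frac
            return out

    # Hypothetical tree: two experiments split the farm 60/40, and the
    # larger one subdivides between production and simulation 3:1.
    root = ShareNode(shares=1, children={
        "hallb": ShareNode(shares=60, children={
            "hallb-prod": ShareNode(shares=3),
            "hallb-sim": ShareNode(shares=1),
        }),
        "halla": ShareNode(shares=40),
    })

    print(root.entitlement())
    # roughly {'hallb-prod': 0.45, 'hallb-sim': 0.15, 'halla': 0.4}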

11
Batch software - 2
  • Users do not use LSF directly; they use a Java
    client (jsub) that
  • Is available from any machine (does not need LSF)
  • Provides missing functionality, e.g.
  • Submit 1000 jobs in 1 command
  • Fetches files from tape, pre-staging them before
    the job is queued for execution (don't block the
    farm with jobs waiting for data)
  • Ensures efficient retrieval of files from tape -
    e.g. sort 1000 files by tape and by file no. on
    tape (see the ordering sketch below)
  • Web interface (via servlet) to monitor job status
    and progress (as well as host, queue, etc.)
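  A minimal Python sketch of the tape-ordering idea: group the requested
  files by tape volume and sort by position on the tape, so each tape is
  mounted once and read front to back. The FileRequest fields and the
  example paths are hypothetical, not jsub's actual interface.

    from collections import namedtuple
    from itertools import groupby
    from operator import attrgetter

    FileRequest = namedtuple("FileRequest", ["path", "tape", "file_no"])

    def stage_order(requests):
        """Yield requests so each tape is mounted once and read in order."""
        ordered = sorted(requests, key=attrgetter("tape", "file_no"))
        for tape, group in groupby(ordered, key=attrgetter("tape")):
            # One mount per tape; files come back in on-tape order.
            for req in group:
                yield req

    reqs = [FileRequest("/mss/run1/evt3.dat", "VOL012", 7),
            FileRequest("/mss/run1/evt1.dat", "VOL007", 2),
            FileRequest("/mss/run1/evt2.dat", "VOL012", 1)]
    for r in stage_order(reqs):
        print(r.tape, r.file_no, r.path)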

12
View job status
13
View host status
14
Batch software - 3
  • LQCD clusters use PBS
  • JLAB-written scheduler
  • 7 stages mimic LSF's hierarchical behaviour (see
    the staged-selection sketch below)
  • Users access PBS commands directly
  • Web interface (portal) with authorization based
    on certificates
  • Used to submit jobs between the JLAB and MIT
    clusters
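  A rough Python sketch of staged job selection in general, not the actual
  JLAB PBS scheduler: earlier, stricter stages favour groups that are below
  their share, and a final stage backfills so nodes never sit idle. The
  stage predicates and job fields are hypothetical.

    def pick_job(pending, stages):
        """Return the first job accepted by the earliest (strictest) stage."""
        for stage in stages:
            for job in pending:
                if stage(job):
                    return job
        return None

    # Three illustrative stages standing in for the seven real ones.
    stages = [
        lambda j: j["group_usage"] < j["group_share"] and j["priority"] == "high",
        lambda j: j["group_usage"] < j["group_share"],
        lambda j: True,            # backfill: run anything rather than idle
    ]

    pending = [
        {"name": "qcd-a", "group_share": 0.5, "group_usage": 0.7, "priority": "low"},
        {"name": "qcd-b", "group_share": 0.5, "group_usage": 0.3, "priority": "low"},
    ]
    print(pick_job(pending, stages)["name"])   # qcd-b: its group is under its share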

15
(No Transcript)
16
Batch software - 4
  • Future
  • Combine jsub and LQCD portal features to wrap
    both LSF and PBS
  • XML-based description language (sketch below)
  • Provide a web-interface toolkit to experiments to
    enable them to generate jobs based on expt. run
    data
  • In the context of PPDG
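  A purely illustrative Python sketch of generating such an XML job
  description with the standard library. The element and attribute names
  are made up for illustration; they are not a JLAB or PPDG schema.

    import xml.etree.ElementTree as ET

    def make_job_xml(name, executable, inputs, queue):
        """Build a small, hypothetical XML job description."""
        job = ET.Element("job", name=name, queue=queue)
        ET.SubElement(job, "executable").text = executable
        files = ET.SubElement(job, "input-files")
        for path in inputs:
            ET.SubElement(files, "file").text = path
        return ET.tostring(job, encoding="unicode")

    print(make_job_xml("recon-pass1", "/apps/recon",
                       ["/mss/run1/evt1.dat"], "production"))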

17
Cluster management
  • Configuration
  • Initial configuration
  • Kickstart + 2 post-install scripts for
    configuration and software install (LSF etc.),
    driven by a floppy
  • Looking at PXE + DHCP (available on newer
    motherboards)
  • Avoids the need for a floppy - just power on
  • System working (as of last week)
  • Software: the standard PXE boot PROM
    (www.nilo.org/docs/pxe.html) talks to DHCP;
  • bpbatch pre-boot shell (www.bpbatch.org)
    downloads vmlinux, kickstart, etc.
  • Alphas configured by hand + kickstart
  • Updates etc.
  • Autorpm (especially for patches)
  • New kernels by hand with scripts
  • OS upgrades
  • Rolling upgrades - use queues to manage the
    transition (see the sketch after this list)
  • Missing piece
  • Remote, network-accessible console screen access
  • Have used serial consoles, KVM switches, and a
    monitor on a cart
  • Linux Networks Alphas have remote power
    management - don't use!
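  A Python sketch of the queue-based rolling-upgrade idea: close a batch of
  hosts to new jobs, let running jobs drain, upgrade, then reopen. The host
  list, the do-upgrade.sh script, and the drain test are hypothetical;
  badmin hclose/hopen are assumed to be the LSF admin commands in use, and
  the bjobs message text may differ by LSF version.

    import subprocess, time

    def jobs_running(host):
        """Hypothetical drain test: True while LSF still reports jobs on host."""
        out = subprocess.run(["bjobs", "-r", "-m", host],
                             capture_output=True, text=True)
        return "No unfinished job found" not in (out.stdout + out.stderr)

    def upgrade_host(host):
        """Placeholder for the actual OS/software upgrade step."""
        subprocess.run(["ssh", host, "/site/admin/do-upgrade.sh"], check=True)

    def rolling_upgrade(hosts, batch_size=10):
        for i in range(0, len(hosts), batch_size):
            batch = hosts[i:i + batch_size]
            for h in batch:
                subprocess.run(["badmin", "hclose", h], check=True)  # stop dispatch
            for h in batch:
                while jobs_running(h):                               # let jobs drain
                    time.sleep(300)
                upgrade_host(h)
                subprocess.run(["badmin", "hopen", h], check=True)   # back in service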

18
System monitoring
  • Farm systems
  • LM78 to monitor temperature and fans via /proc
  • Fan/temperature failure was our largest failure
    mode for the Pentiums
  • Mon (www.kernel.org/software/mon)
  • Used extensively for all our systems; pages the
    on-call
  • For the batch farm, checks are mostly fan,
    temperature, and ping (see the sketch below)
  • Mprime (prime number search) has checks on memory
    and arithmetic integrity
  • Used in initial system burn-in
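  A minimal Python sketch of the kind of fan/temperature check mon would
  run. The /proc layout follows the lm_sensors 2.x convention of the time
  (last field of each sensor file is the current reading), but the exact
  paths, file names, and thresholds are assumptions to adapt per board.

    import glob

    TEMP_LIMIT_C = 60.0     # hypothetical alarm threshold
    FAN_MIN_RPM = 2000.0    # hypothetical minimum healthy fan speed

    def check_sensors(pattern="/proc/sys/dev/sensors/*/"):
        """Return a list of alarm strings; mon would page on-call if non-empty."""
        alarms = []
        for chip in glob.glob(pattern):
            for f in glob.glob(chip + "temp*"):
                value = float(open(f).read().split()[-1])
                if value > TEMP_LIMIT_C:
                    alarms.append(f"{f}: {value} C")
            for f in glob.glob(chip + "fan[0-9]*"):
                value = float(open(f).read().split()[-1])
                if 0 < value < FAN_MIN_RPM:
                    alarms.append(f"{f}: {value} RPM")
        return alarms

    for alarm in check_sensors():
        print("ALARM", alarm)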

19
Monitoring
20
Performance monitoring
  • Use a variety of mechanisms
  • Publish weekly tables and graphs based on LSF
    statistics (see the aggregation sketch below)
  • Graphs from mrtg/rrd
  • Network performance, jobs, utilization, etc.
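  A small Python sketch of the weekly roll-up: per-job accounting records
  aggregated into CPU hours per group per ISO week. The record fields are
  hypothetical stand-ins for whatever the LSF accounting log provides.

    from collections import defaultdict
    from datetime import datetime

    def weekly_cpu_hours(records):
        """records: dicts with 'group', 'end_time' (datetime) and 'cpu_sec'."""
        table = defaultdict(float)
        for r in records:
            year, week, _ = r["end_time"].isocalendar()
            table[(year, week, r["group"])] += r["cpu_sec"] / 3600.0
        return dict(table)

    records = [
        {"group": "hallb", "end_time": datetime(2001, 5, 21, 4), "cpu_sec": 7200},
        {"group": "halla", "end_time": datetime(2001, 5, 22, 9), "cpu_sec": 3600},
    ]
    print(weekly_cpu_hours(records))
    # {(2001, 21, 'hallb'): 2.0, (2001, 21, 'halla'): 1.0}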

21
(No Transcript)
22
(No Transcript)
23
Comments / Issues
  • Space is very limited
  • Installing a new STK silo has moved all the sys
    admins out
  • Now have no admins in same building as machine
    room
  • Plans to build a new Computer Center
  • Have always been lights-out

24
Future
  • Accelerator and experiment upgrades
  • Expect first data in 2006, full rate 2007
  • 100 MB/s data acquisition
  • 1-3 PB/year (1 PB raw, > 1 PB simulated) - see
    the back-of-envelope check below
  • Compute clusters
  • Level 3 triggers
  • Reconstruction
  • Simulation
  • Analysis: PWA can be parallelized, but needs
    access to very large reconstructed and simulated
    datasets
  • Expansion of LQCD clusters
  • 10 Tflops by 2005
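  A back-of-envelope Python check that 100 MB/s is consistent with roughly
  1-3 PB/year: continuous running gives about 3 PB, and a realistic duty
  factor brings the raw volume toward 1 PB. The duty factor used here is
  illustrative, not a quoted JLAB number.

    RATE_MB_S = 100.0
    SECONDS_PER_YEAR = 3600 * 24 * 365
    DUTY_FACTOR = 0.33          # hypothetical fraction of the year taking data

    continuous_pb = RATE_MB_S * SECONDS_PER_YEAR / 1e9    # MB -> PB
    raw_pb = continuous_pb * DUTY_FACTOR
    print(f"continuous: {continuous_pb:.1f} PB/yr, "
          f"with duty factor: {raw_pb:.1f} PB/yr")
    # continuous: 3.2 PB/yr, with duty factor: 1.0 PB/yr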