1
Farms and Datastorage
  • Andrew Sansum
  • CUF @ UCL
  • November 1999

A.Sansum@rl.ac.uk
2
What I'm Going to Talk About
  • Linux Farm
  • Sun Service/BABAR JREI
  • HP Farm
  • NT Farm
  • Storage
  • New Year Plans
  • Security
  • Lifetime of HP Service

3
HEP Unix Services
(Diagram: HEP Unix services on a 100 Megabit switched network)
4
Linux
  • Full Production Service Started in August
  • 20 Dual Processors in production
  • Stability is now very good - only a couple of
    minor problems
  • Occasional hangs on Front end - caused by ARLA
    (AFS)
  • Occasional crashes on csflnx02 - caused by
    network load
  • System monitoring and operations procedures in
    place
  • Even a User Guide!

5
H/W and Software Configuration
  • SuperMicro Motherboard (with SCSI H/W
    monitoring)
  • Dual Pentium II 450
  • 10GB 5400rpm IDE HDA
  • 256MB ECC memory
  • 100Mbit Ethernet (tulip or Intel)
  • Redhat 5.2 (2.0.36)
  • CERN ASIS (5.1 vsn)
  • Generic NQS 3.50.5
  • ARLA 0.20-1 (AFS)
  • lm_sensors (monitoring; see the sketch below)

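The nodes are watched via the motherboard sensors and lm_sensors listed above. Purely as an illustration of that kind of health check, and not the actual RAL operations scripts, a small Java program along the following lines could poll the lm_sensors "sensors" command and report any reading it flags; the ALARM check and the reporting format are assumptions.

  import java.io.BufferedReader;
  import java.io.InputStreamReader;

  // Illustrative health check for a farm node: run the lm_sensors
  // "sensors" command and report any reading it marks with ALARM.
  // This is a sketch, not the RAL operations tooling.
  public class SensorCheck {
      public static void main(String[] args) throws Exception {
          Process p = Runtime.getRuntime().exec("sensors");
          BufferedReader in = new BufferedReader(
                  new InputStreamReader(p.getInputStream()));
          String line;
          while ((line = in.readLine()) != null) {
              // lm_sensors marks readings outside their limits with ALARM
              if (line.indexOf("ALARM") >= 0) {
                  System.out.println("WARNING: " + line.trim());
              }
          }
          p.waitFor();
      }
  }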
6
Good Take Up
  • Production Service for H1 and ZEUS
  • Several theory users make heavy use of the service
  • ATLAS, CMS and ALEPH codes also running
  • CDF and Minos have code ported
  • ANTARES - code now compiles on Linux
  • OPAL Planning to use Linux service soon
  • Babar working on Linux release for New Year.
  • Runs close to capacity for significant periods.

7
(No Transcript)
8
ATHLON Benchmarking
  • An evaluation 600MHz Athlon (aka AMD K7) system
    has been available on CSF for several weeks.
  • Configuration
  • 600MHz Athlon
  • Microstar 6167 motherboard
  • 128 MB 100MHz memory
  • Athlon interesting because
  • Superior Floating Point Performance (see the
    timing sketch below)
  • Improved memory -> CPU bandwidth (200MHz
    effectively)
  • Probably cheaper - Faster CPUs at the moment
  • Multiprocessor options look interesting

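The real comparison used the experiments' own codes (credited on the next slide); purely to illustrate the sort of timing loop behind a per-clock floating point comparison, a minimal sketch might look like this, with an arbitrary kernel and iteration count chosen only for illustration.

  // Toy floating-point timing loop; the kernel and iteration count are
  // arbitrary and purely illustrative of how a per-clock comparison
  // could be timed. The real benchmarks were the experiments' codes.
  public class FpTiming {
      public static void main(String[] args) {
          final int n = 20000000;      // arbitrary iteration count
          double x = 0.0;
          long start = System.currentTimeMillis();
          for (int i = 1; i <= n; i++) {
              // mix of multiply, add, divide and sqrt to exercise the FPU
              x += Math.sqrt((double) i) / (1.0 + 0.5 * i);
          }
          long ms = System.currentTimeMillis() - start;
          System.out.println("result = " + x + "  time = " + ms + " ms");
      }
  }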
9
Benchmark Results
  • Credits: Peter Faulkner (H1), John Baines
    (ATLAS), Stefano Moretti (RAL theory), Peter
    Richardson (Oxford Theory), Glenn Patrick (CMS)
  • Caveats: We are not exploiting the full H/W
    potential
  • Compiler is optimised for Pentium H/W and does
    not exploit the multiple floating point units.
    New compilers are coming.
  • Standard 100MHz memory - not DDR (Double Data
    Rate)

10
Benchmark Results: Athlon 600MHz / Pentium 450
11
Athlon Benchmark Conclusion
  • Good speedup already for floating point
    application
  • Most HEP codes see no speedup over and above
    clock frequency - mainly integer based
  • We will wait for dual/quad processor
    motherboards, faster memory and improved
    compilers, and try again.
  • Upgrade farm with Pentium III 600MHz dual
    processors

12
PLANS
  • Order 30 Dual Pentium 600 Nodes ASAP. In service
    early next year.
  • Provided demand exists and suitable H/W is
    available - schedule the next capacity increase
    for early in FY 2000 - April/May probably.
  • O/S Upgrade. What should we upgrade to, and when?
    RH 5.2 is old but some experiments still need it.
  • Evaluate new batch system - what do people think
    about NQS?

13
Linux

(Chart: Predicted Profile of Linux Capacity - Growth at Current Spend)
14
Conclusion (Linux)
  • Farm is stable and well used
  • Further upgrade late this year.
  • Capacity will soon exceed ten times HP capacity
  • Will continue to grow
  • Big thank you to Alex Martin and Paul Dixon at
    QMW for lots of useful advice!

15
Sun Service and BaBar JREI Disk Server
  • We already run a Sun Enterprise 3500 (4 CPU),
    mainly used by BaBar but also by a few ATLAS and
    CMS users.
  • BaBar awarded a large JREI grant
  • A central server at RAL with several TB which
    will receive data from SLAC.
  • Server and disk in 10 UK universities
  • Co-operating databases across the UK
  • One extra staff member to achieve this

16
Actual Equipment
  • RAL
  • 5 Universities (Bristol, Edinburgh,Imperial,
    Liverpool, Manchester) with
  • 4 Universities (Birmingham, Brunel, QMW, RHBNC)
    with

17
Setup at RAL (early experience)
  • Equipment delivered and installed
  • Filesystems limited to 1TB
  • used 4xA1000 > 720GB
  • 7 Million events brought from SLAC
  • E3500 acts as a front-end, E4500 holds data; both
    run batch jobs
  • E4500 also AMS server to other systems.
  • LSF Batch System on 2 Suns.

18
OOSS
  • Andy Hanushevsky (SLAC) visited in September and
    installed his OOFS and OOSS
  • This provides a layer which interfaces
    Objectivity to the Atlas Datastore (cf HPSS at
    SLAC)
  • All the disk space runs under the control of
    OOSS, which acts as a cache manager (see the
    conceptual sketch below)

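OOSS itself is Andy Hanushevsky's software and its real interfaces are not reproduced here; the following is only a conceptual sketch, with invented class and method names, of the cache-manager idea: a request is served from local disk if the file is already staged, otherwise it is recalled from the mass store (the Atlas Datastore in RAL's case) into the disk cache first.

  import java.io.File;

  // Conceptual sketch of a staging disk cache in front of a mass store.
  // Class and method names are invented for illustration; they are NOT
  // the OOFS/OOSS interfaces. OOSS plays this role for Objectivity
  // databases in front of the Atlas Datastore.
  public class DiskCacheSketch {

      /** Hypothetical client interface to the back-end mass store. */
      public interface MassStore {
          void recallTo(String name, File destination) throws Exception;
      }

      private final File cacheDir;
      private final MassStore store;

      public DiskCacheSketch(File cacheDir, MassStore store) {
          this.cacheDir = cacheDir;
          this.store = store;
      }

      /** Return a local copy of the named file, staging it in on a miss. */
      public File open(String name) throws Exception {
          File local = new File(cacheDir, name);
          if (!local.exists()) {
              // Cache miss: recall the file from the mass store to disk.
              store.recallTo(name, local);
          }
          return local;
      }
  }

A real cache manager presumably also has to decide what to purge when the disk fills; the sketch only shows the stage-on-miss step.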
19
NT Farm Update
  • New hardware commissioned - October
  • 10 dual-CPU (450MHz) PCs
  • Batch capacity increased fivefold
  • 9 of the new batch of PCs
  • total 27 CPUs - 9 @ 200 MHz, 18 @ 450 MHz
  • New Front-end
  • 1 PC
  • running NT Terminal Server
  • Operating - but not yet in production service
  • Issues with application installation in
    multi-user environment

20
LHCb Work
  • LHCb now major user
  • Ported simulation environment to NT Farm
  • web front-end
  • Java servlets build the job scripts and submit
    the jobs (see the sketch below)
  • Simulation code SICB running
  • 1000 events/job written to RAL datastore
  • 100k events generated in testing
  • now entering first production phase
  • Can use full capacity of farm
  • about 100k events/week if queues kept full

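The LHCb front-end is their own code and is not reproduced here; purely to illustrate the pattern described above, a servlet that builds a job script from a form parameter and hands it to the batch system might be sketched as follows. The parameter name, script body and submit command are assumptions, not the actual LHCb implementation.

  import java.io.File;
  import java.io.FileWriter;
  import java.io.IOException;
  import java.io.PrintWriter;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  // Sketch of a web front-end servlet that builds a job script from a
  // form parameter and hands it to the batch system. The parameter name,
  // script contents and submit command are illustrative assumptions.
  public class SubmitJobServlet extends HttpServlet {
      protected void doPost(HttpServletRequest req, HttpServletResponse resp)
              throws IOException {
          String events = req.getParameter("events");   // e.g. "1000"

          // Build a simple job script for the simulation run.
          File script = File.createTempFile("sicb-job", ".cmd");
          PrintWriter out = new PrintWriter(new FileWriter(script));
          out.println("run_simulation -events " + events);  // hypothetical command
          out.close();

          // Submit the script to the farm's batch system (command assumed).
          Runtime.getRuntime().exec(new String[] { "submit_job", script.getPath() });

          resp.setContentType("text/plain");
          resp.getWriter().println("Job submitted: " + script.getName());
      }
  }

In practice the servlet would also validate the input and record the submission, but the above shows the build-script-then-submit flow the slide describes.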
21
Compute Farms HP
  • Farm remains well used; however I expect demand
    to fall shortly as the move to Linux proceeds.
  • Will discuss our plans to close the HP farm later
  • Moving HP service out from the core of the
    cluster. It no longer acts as the master
    fileserver node.
  • Recent O/S upgrade for Y2K compliance
  • More system development shortly to add disk
    mirroring for interactive service

22
(No Transcript)
23
(No Transcript)
24
Disk Space
  • Home filesystem moved to new dedicated disk
    server. Improved data integrity (RAID 5 Sun A1000
    disk array). Disk quotas expanded for everyone.
    Performance excellent.
  • Half a terabyte of disk space available for
    experiments' data storage. Contact us if you have
    a requirement.
  • Work starts shortly on a new Linux/IDE disk
    server for cheaper data storage. Testers wanted
    soon!

25
Tape Service
  • Possible major upgrades to the Tape Robots are
    being considered.
  • Aware of long term experiment requirements

26
New Year Closedown
  • CSF service will close from the afternoon of 31st
    December.
  • Service will resume on Tuesday 4th January

27
Security
  • Network security situation continues to
    deteriorate.
  • New RAL Site Security Policy
  • Places many obligations on system managers
  • Some changes to users' working practices will be
    needed.
  • Most significant change
  • LOAN OF PASSWORDS AND SHARED IDS WILL NEED TO BE
    MANAGED DIFFERENTLY

28
Conclusions
  • Lots of changes taking place in the Service
  • Linux will grow rapidly
  • Major new 5TB disk server for Babar. Also begin
    looking at cheap disk server solutions.
  • NT farm increased in capacity and running
    production work for the LHCb collaboration
  • HP Farm being moved from the heart of the
    enterprise.
  • Planning for HP farm rundown commencing

29
RUNDOWN OF HP SERVICE: WHY? WHAT? WHEN?
  • Andrew Sansum

30
WHY CLOSE THE HPs?
  • By New Year Linux capacity will be at least ten
    times HP capacity.
  • Bulk of experiments have either ported or will
    soon port their code.
  • Fewer and fewer experiments support HP-UX
  • Because of the need to maintain security, we are
    unable to freeze the service.
  • Thus while HPs continue to run they will consume
    manpower which would otherwise go into new
    developments.

31
What Should We Close
  • Irrespective of eventual closedown dates we will
    reduce batch capacity if load falls.
  • Raises average performance of cluster and reduces
    system management overhead.
  • The question is when we should switch off the
    last of the machines.

32
When should we close the HPs?
  • Not decided yet! Waiting for comments!!
  • So far a number of people have stated that their
    migration to Linux is not finished but they
    expect to complete it within 6 months to one
    year.
  • Two people have said it's useful having HPs to
    test code releases, but only if someone else has
    a real requirement.
  • One experiment states that they have a continuing
    requirement (duration unspecified)

33
Conclusion
  • We will run the service while there is a genuine
    demand that CNAP believes is worth committing the
    effort to support
  • Won't close the service if there is sufficient
    demand
  • So far very little evidence of such a need
  • Expect service to be at least severely truncated
    within 12 months.
  • Hopefully make announcement before Christmas