1
Farms and Datastorage
  • Andrew Sansum
  • CUF @ UCL
  • November 1999

A.Sansum@rl.ac.uk
2
What I'm Going to Talk About
  • Linux Farm
  • Sun Service/BABAR JREI
  • HP Farm
  • NT Farm
  • Storage
  • New Year Plans
  • Security
  • Lifetime of HP Service

3
HEP Unix Services
(Diagram: HEP Unix services on a 100 Megabit switched network)
4
Linux
  • Full Production Service Started in August
  • 20 Dual Processors in production
  • Stability is now very good - only a couple of
    minor problems
  • Occasional hangs on Front end - caused by ARLA
    (AFS)
  • Occasional crashes on csflnx02 - caused by
    network load
  • System monitoring and operations procedures in
    place
  • Even a User Guide!

5
H/W and Software Configuration
  • SuperMicro Motherboard (with SCSI H/W
    monitoring)
  • Dual Pentium II 450
  • 10GB 5400rpm IDE HDA
  • 256MB ECC memory
  • 100Mbit Ethernet (tulip or Intel)
  • Redhat 5.2 (2.0.36)
  • CERN ASIS (5.1 vsn)
  • Generic NQS 3.50.5
  • ARLA 0.20-1 (AFS)
  • lm_sensors (monitoring; see the sketch below)

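The nodes are watched via the motherboard sensors and lm_sensors listed above. Purely as an illustration of that kind of health check, and not the actual RAL operations scripts, a small Java program along the following lines could poll the lm_sensors "sensors" command and report any reading it flags; the ALARM check and the reporting format are assumptions.

  import java.io.BufferedReader;
  import java.io.InputStreamReader;

  // Illustrative health check for a farm node: run the lm_sensors
  // "sensors" command and report any reading it marks with ALARM.
  // This is a sketch, not the RAL operations tooling.
  public class SensorCheck {
      public static void main(String[] args) throws Exception {
          Process p = Runtime.getRuntime().exec("sensors");
          BufferedReader in = new BufferedReader(
                  new InputStreamReader(p.getInputStream()));
          String line;
          while ((line = in.readLine()) != null) {
              // lm_sensors marks readings outside their limits with ALARM
              if (line.indexOf("ALARM") >= 0) {
                  System.out.println("WARNING: " + line.trim());
              }
          }
          p.waitFor();
      }
  }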
6
Good Take Up
  • Production Service for H1 and ZEUS
  • Several theory users make heavy use of the service
  • ATLAS, CMS and ALEPH codes also running
  • CDF and Minos have code ported
  • ANTARES - code now compiles on Linux
  • OPAL Planning to use Linux service soon
  • Babar working on Linux release for New Year.
  • Runs close to capacity for significant periods.

7
(No Transcript)
8
ATHLON Benchmarking
  • An evaluation 600MHz Athlon (aka AMD K7) system
    has been available on CSF for several weeks.
  • Configuration
  • 600MHz Athlon
  • Microstar 6167 motherboard
  • 128 MB 100MHz memory
  • Athlon interesting because
  • Superior Floating Point Performance (see the
    timing sketch below)
  • Improved memory -> CPU bandwidth (200MHz
    effectively)
  • Probably cheaper - Faster CPUs at the moment
  • Multiprocessor options look interesting

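The real comparison used the experiments' own codes (credited on the next slide); purely to illustrate the sort of timing loop behind a per-clock floating point comparison, a minimal sketch might look like this, with an arbitrary kernel and iteration count chosen only for illustration.

  // Toy floating-point timing loop; the kernel and iteration count are
  // arbitrary and purely illustrative of how a per-clock comparison
  // could be timed. The real benchmarks were the experiments' codes.
  public class FpTiming {
      public static void main(String[] args) {
          final int n = 20000000;      // arbitrary iteration count
          double x = 0.0;
          long start = System.currentTimeMillis();
          for (int i = 1; i <= n; i++) {
              // mix of multiply, add, divide and sqrt to exercise the FPU
              x += Math.sqrt((double) i) / (1.0 + 0.5 * i);
          }
          long ms = System.currentTimeMillis() - start;
          System.out.println("result = " + x + "  time = " + ms + " ms");
      }
  }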
9
Benchmark Results
  • Credits: Peter Faulkner (H1), John Baines
    (ATLAS), Stefano Moretti (RAL theory), Peter
    Richardson (Oxford Theory), Glenn Patrick (CMS)
  • Caveats: We are not exploiting the full H/W
    potential
  • Compiler is optimised for Pentium H/W and does
    not exploit the multiple floating point units.
    New compilers are coming.
  • Standard 100MHz memory - not DDR (Double Data
    Rate)

10
Benchmark Results: Athlon 600MHz / Pentium 450
11
Athlon Benchmark Conclusion
  • Good speedup already for floating point
    application
  • Most HEP codes see no speedup over and above
    clock frequency - mainly integer based
  • We will wait for dual/quad processor
    motherboards, faster memory and improved
    compilers, and try again.
  • Upgrade farm with Pentium III 600MHz dual
    processors

12
PLANS
  • Order 30 Dual Pentium 600 Nodes ASAP. In service
    early next year.
  • Provided demand exists and suitable H/W is
    available - schedule the next capacity increase
    for early in FY 2000 - April/May probably.
  • O/S Upgrade. What should we upgrade to, and when?
    RH 5.2 is old but some experiments still need it.
  • Evaluate new batch system - what do people think
    about NQS?

13
Linux

(Chart: Predicted Profile of Linux Capacity - Growth at Current Spend)
14
Conclusion (Linux)
  • Farm is stable and well used
  • Further upgrade late this year.
  • Capacity will soon exceed ten times HP capacity
  • Will continue to grow
  • Big thank you to Alex Martin and Paul Dixon at
    QMW for lots of useful advice!

15
Sun Service and BaBar JREI Disk Server
  • We already run a Sun Enterprise 3500 (4 CPU),
    mainly used by BaBar but also by a few ATLAS and
    CMS users.
  • BaBar awarded a large JREI grant
  • A central server at RAL with several TB which
    will receive data from SLAC.
  • Server and disk in 10 UK universities
  • Co-operating databases across the UK
  • One extra staff member to achieve this

16
Actual Equipment
  • RAL
  • 5 Universities (Bristol, Edinburgh,Imperial,
    Liverpool, Manchester) with
  • 4 Universities (Birmingham, Brunel, QMW, RHBNC)
    with

17
Setup at RAL (early experience)
  • Equipment delivered and installed
  • Filesystems limited to 1TB
  • used 4xA1000 > 720GB
  • 7 Million events brought from SLAC
  • E3500 acts as a front-end, E4500 holds data; both
    run batch jobs
  • E4500 also AMS server to other systems.
  • LSF Batch System on 2 Suns.

18
OOSS
  • Andy Hanushevsky (SLAC) visited in September and
    installed his OOFS and OOSS
  • This provides a layer which interfaces
    Objectivity to the Atlas Datastore (cf HPSS at
    SLAC)
  • All the disk space runs under the control of
    OOSS, which acts as a cache manager (see the
    conceptual sketch below)

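OOSS itself is Andy Hanushevsky's software and its real interfaces are not reproduced here; the following is only a conceptual sketch, with invented class and method names, of the cache-manager idea: a request is served from local disk if the file is already staged, otherwise it is recalled from the mass store (the Atlas Datastore in RAL's case) into the disk cache first.

  import java.io.File;

  // Conceptual sketch of a staging disk cache in front of a mass store.
  // Class and method names are invented for illustration; they are NOT
  // the OOFS/OOSS interfaces. OOSS plays this role for Objectivity
  // databases in front of the Atlas Datastore.
  public class DiskCacheSketch {

      /** Hypothetical client interface to the back-end mass store. */
      public interface MassStore {
          void recallTo(String name, File destination) throws Exception;
      }

      private final File cacheDir;
      private final MassStore store;

      public DiskCacheSketch(File cacheDir, MassStore store) {
          this.cacheDir = cacheDir;
          this.store = store;
      }

      /** Return a local copy of the named file, staging it in on a miss. */
      public File open(String name) throws Exception {
          File local = new File(cacheDir, name);
          if (!local.exists()) {
              // Cache miss: recall the file from the mass store to disk.
              store.recallTo(name, local);
          }
          return local;
      }
  }

A real cache manager presumably also has to decide what to purge when the disk fills; the sketch only shows the stage-on-miss step.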
19
NT Farm Update
  • New hardware commissioned - October
  • 10 dual-CPU (450MHz) PCs
  • Batch capacity increased fivefold
  • 9 of the new batch of PCs
  • total 27 CPUs - 9 @ 200 MHz, 18 @ 450 MHz
  • New Front-end
  • 1 PC
  • running NT Terminal Server
  • Operating - but not yet in production service
  • Issues with application installation in
    multi-user environment

20
LHCb Work
  • LHCb now major user
  • Ported simulation environment to NT Farm
  • web front-end
  • Java servlets build the job scripts and submit
    the jobs (see the sketch below)
  • Simulation code SICB running
  • 1000 events/job written to RAL datastore
  • 100k events generated in testing
  • now entering first production phase
  • Can use full capacity of farm
  • about 100k events/week if queues kept full

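The LHCb front-end is their own code and is not reproduced here; purely to illustrate the pattern described above, a servlet that builds a job script from a form parameter and hands it to the batch system might be sketched as follows. The parameter name, script body and submit command are assumptions, not the actual LHCb implementation.

  import java.io.File;
  import java.io.FileWriter;
  import java.io.IOException;
  import java.io.PrintWriter;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  // Sketch of a web front-end servlet that builds a job script from a
  // form parameter and hands it to the batch system. The parameter name,
  // script contents and submit command are illustrative assumptions.
  public class SubmitJobServlet extends HttpServlet {
      protected void doPost(HttpServletRequest req, HttpServletResponse resp)
              throws IOException {
          String events = req.getParameter("events");   // e.g. "1000"

          // Build a simple job script for the simulation run.
          File script = File.createTempFile("sicb-job", ".cmd");
          PrintWriter out = new PrintWriter(new FileWriter(script));
          out.println("run_simulation -events " + events);  // hypothetical command
          out.close();

          // Submit the script to the farm's batch system (command assumed).
          Runtime.getRuntime().exec(new String[] { "submit_job", script.getPath() });

          resp.setContentType("text/plain");
          resp.getWriter().println("Job submitted: " + script.getName());
      }
  }

In practice the servlet would also validate the input and record the submission, but the above shows the build-script-then-submit flow the slide describes.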
21
Compute Farms HP
  • Farm remains well used; however I expect demand
    to fall shortly as the move to Linux proceeds.
  • Will discuss our plans to close the HP farm later
  • Moving HP service out from the core of the
    cluster. It no longer acts as the master
    fileserver node.
  • Recent O/S upgrade for Y2K compliance
  • More system development shortly to add disk
    mirroring for interactive service

22
(No Transcript)
23
(No Transcript)
24
Disk Space
  • Home filesystem moved to new dedicated disk
    server. Improved data integrity (RAID 5 Sun A1000
    disk array). Disk quotas expanded for everyone.
    Performance excellent.
  • Half a terabyte of disk space available for
    experiments' data storage. Contact us if you have
    a requirement.
  • Work starts shortly on a new Linux/IDE disk
    server for cheaper data storage. Testers wanted
    soon!

25
Tape Service
  • Possible major upgrades to the Tape Robots are
    being considered.
  • Aware of long term experiment requirements

26
New Year Closedown
  • CSF service will close from the afternoon of 31st
    December.
  • Service will resume on Tuesday 4th January

27
Security
  • Network security situation continues to
    deteriorate.
  • New RAL Site Security Policy
  • Places many obligations on system managers
  • Some changes to users' working practices will be
    needed.
  • Most significant change
  • LOAN OF PASSWORDS AND SHARED IDS WILL NEED TO BE
    MANAGED DIFFERENTLY

28
Conclusions
  • Lots of changes taking place in the Service
  • Linux will grow rapidly
  • Major new 5TB disk server for Babar. Also begin
    looking at cheap disk server solutions.
  • NT farm increased in capacity and running
    production work for the LHCb collaboration
  • HP Farm being moved from the heart of the
    enterprise.
  • Planning for HP farm rundown commencing

29
RUNDOWN OF HP SERVICE: WHY? WHAT? WHEN?
  • Andrew Sansum

30
WHY CLOSE THE HPs?
  • By New Year Linux capacity will be at least ten
    times HP capacity.
  • Bulk of experiments have either ported or will
    soon port their code.
  • Fewer and fewer experiments support HP-UX
  • Because of the need to maintain security, we are
    unable to freeze the service.
  • Thus while HPs continue to run they will consume
    manpower which would otherwise go into new
    developments.

31
What Should We Close
  • Irrespective of eventual closedown dates we will
    reduce batch capacity if load falls.
  • Raises average performance of cluster and reduces
    system management overhead.
  • The question is when we should switch off the
    last of the machines.

32
When should we close the HPs?
  • Not decided yet! Waiting for comments!!
  • So far a number of people have stated that their
    migration to Linux is not finished but they
    expect to complete it within 6 months to one
    year.
  • Two people have said it's useful having HPs to
    test code releases, but only if someone else has
    a real requirement.
  • One experiment states that they have a continuing
    requirement (duration unspecified)

33
Conclusion
  • We will run the service while there is a genuine
    demand that CNAP believes is worth committing the
    effort to support
  • Won't close the service if there is sufficient
    demand
  • So far very little evidence of such a need
  • Expect service to be at least severely truncated
    within 12 months.
  • Hopefully make announcement before Christmas