Title: HPCC Status, 4/17/2009
1. HPCC Status, 4/17/2009
2. Changes in the HPCC world
- SGI bankruptcy and subsequent purchase by Rackable
- Apparent end of Western Scientific
- Sun is rumored to be up for sale (IBM the latest)
- The economy is stressing many companies
3. Changes in OUR HPCC world
- Construction continues, should be done by May
- Working on a number of software issues, in particular networking
- Systems have been running fairly busy, in the 90-95% range recently and over 75% overall
4. Recent issues
- SGI SMP
- Green lost a memory DIMM and went down
- White (Green's frontend) went down
- Weird power loss (50 seconds, 300 kVA offline)
- Construction/planned downtime Thursday
5. Recent changes
- New Lustre file system online
- New user file system online (allows for Samba mounts!)
- NFS 4 running on Brody and infrastructure (helps with networking problems)
- Better testing suite to find problems sooner
6. Coming soon
- A single disk image (the same copy of the OS) is being developed to run on every system; it will make using the different clusters much easier
- The environment will be the same on every system (cluster, fat nodes, whatever)
- ssh test-amd05; an Intel version will be available soon as well
7. GLCPC
- www.greatlakesconsortium.org
- Recently ran a survey, which 7 MSU members filled out (thanks!)
- Will hold summer sessions, likely run remotely at the various institutions
- We really want to find out who will use the Blue Waters machine and what they need to know to do so
- Please let me know any questions
8. Staff plus issues
- Staff will present some of the current issues the Center is working on
9. Home Directory Storage (Ed Kryda, Manager)
- Currently 100 TB available / 50 GB default quota
- Customized Sun X4540
- Performance: 200 MB/s write, 1 GB/s read max
- Initial reliability issues
- NFS v4
- Samba/CIFS file sharing!
- Snapshots
10. Lustre Storage (Greg Mason, System Administrator)
- Old Lustre retirement 5/1/09 (/mnt/lustre)
- Eventually repurposed
- New shared scratch space (/mnt/ls09)
- 33 TB
- /mnt/lustre_scratch_2009
- /mnt/scratch
- ONLY TEMPORARY FILES
- Future automatic deletion (see the sketch below)
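The deletion policy has not been announced, so the following is a rough sketch only: a purge job could walk the scratch tree and remove files that have not been touched for some retention window. The 30-day window, the use of /mnt/ls09 as the root, and the dry-run default are placeholder assumptions, not HPCC policy.

    # Illustrative scratch-purge sketch; NOT the HPCC's actual policy or tooling.
    # Deletes regular files under the scratch root whose access time is older
    # than an assumed retention window.
    import os
    import time

    SCRATCH_ROOT = "/mnt/ls09"   # scratch path from the slide, used as an example
    RETENTION_DAYS = 30          # placeholder window; the real policy is TBD
    DRY_RUN = True               # report only; set to False to actually delete

    cutoff = time.time() - RETENTION_DAYS * 24 * 3600
    for dirpath, _dirnames, filenames in os.walk(SCRATCH_ROOT):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue                 # vanished or unreadable; skip it
            if st.st_atime < cutoff:
                print(("would delete " if DRY_RUN else "deleting ") + path)
                if not DRY_RUN:
                    os.remove(path)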
11. User Education and Assistance (Dirk Colbry, Academic Specialist)
http://wiki.hpcc.msu.edu/
- Research Collaborations
- System Level Debugging
- System Level Testing
- University Level Training Classes
- Research Group Level Training Classes
- Face-to-Face Individual Training and Debugging
- Up-to-date Documentation
12. Better Testing (Jim Leikert, System Administrator)
- New scripts for testing node health (a minimal sketch follows below)
- New measures to keep jobs in line
- Job state messages
- Slowly being rolled out
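The actual health-check scripts were not shown; the sketch below is only a hedged illustration of the idea, checking that expected filesystems are mounted and that the load is sane before a node takes jobs. The mount points and threshold are assumptions.

    # Minimal node health-check sketch (illustrative; not the HPCC's scripts).
    # A nonzero exit status could let a scheduler prologue mark the node offline.
    import os
    import sys

    REQUIRED_MOUNTS = ["/mnt/home", "/mnt/ls09"]   # assumed mount points
    MAX_LOAD_PER_CORE = 2.0                        # assumed load threshold

    failures = []
    if not all(os.path.ismount(m) for m in REQUIRED_MOUNTS):
        failures.append("expected filesystem not mounted")
    if os.getloadavg()[0] / (os.cpu_count() or 1) >= MAX_LOAD_PER_CORE:
        failures.append("load average too high")

    if failures:
        print("UNHEALTHY: " + "; ".join(failures))
        sys.exit(1)
    print("healthy")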
13. User Vignettes (Kelly Osborn, Administrative Assistant)
- Improves our public face
- Currently have 12 vignettes
- Looking for additional research to showcase
- kosborn@msu.edu
14. SMP and White (Andy Keen, System Administrator)
- The SMP is off support, has been down twice, and is being repaired by hand
- White was down to two processors
- The SMP's days are numbered
- Need to transition to newer fat nodes
- Recompilation will be required to use the new library links; use queue brody_4s
15. Shorter-term issues
16. Buy replacement SMP nodes
- We have previously discussed buying replacement nodes
- The sweet spot is a box with 32 cores and 256 GB of memory
- We would like to buy on the order of 4-5 of these as replacements for the SMP
- Note that you would have to recompile!
- Same OS image as the clusters, however!
- Your opinions? We'd like to buy soon.
17. More storage
- Rolling our own has been a lot of work
- The transition to NFS 4 has improved performance and reliability, but we need more storage
- Continue with the cheaper, expandable approach or go with a turnkey solution (such as NetApp)?
18. Rack
[Diagram: hardware hierarchy from rack to chassis to nodes to processors/sockets to cores, with examples]
19. Job Scheduling Example
[Diagram: a job queue (job ID, number of cores, duration, priority) mapped onto nodes over time; "Current Jobs" and "New Schedule" panels show a short job backfilled into an idle slot ahead of the current time]
20. Isolating long-running jobs
- Working now on isolating long-running jobs
- Long-running jobs clog the nodes, especially long-running, single-CPU jobs
- Users would prefer to run on a single node for better efficiency
21. Current Scheduling Problem
[Diagram: node occupancy over a one-week window from the current time]
- Long single-core jobs take over nodes
- Middle-sized jobs (8-64 cores) cannot be reliably scheduled on dedicated nodes
- Very large core-count jobs cannot be scheduled at all
22. Changing the scheduling of long jobs
- We propose grouping long-term jobs in the system
- Could involve capping their number (see the sketch after this list)
- For example, reserve ¼ of each cluster (128 256 for 384)
- Improve scheduling of larger jobs, with potentially few side effects
- Discussion?
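As a hedged illustration of the capping idea (the policy is exactly what is up for discussion here), a scheduler-side check might decline to start another long job once long jobs already occupy a quarter of a cluster's nodes. The 24-hour threshold for "long" is an assumption; the ¼ fraction comes from the slide.

    # Sketch of capping long jobs at 1/4 of a cluster (illustrative only).
    LONG_JOB_HOURS = 24          # assumed threshold for calling a job "long"
    RESERVED_FRACTION = 0.25     # quarter of the cluster, per the proposal

    def can_start_long_job(requested_hours, nodes_running_long, cluster_nodes):
        """Allow a long job only while long jobs hold under 1/4 of the nodes."""
        if requested_hours < LONG_JOB_HOURS:
            return True                              # short jobs are unaffected
        cap = int(cluster_nodes * RESERVED_FRACTION)
        return nodes_running_long < cap

    # On a 384-node cluster the cap works out to 96 nodes of long jobs.
    print(can_start_long_job(168, nodes_running_long=95, cluster_nodes=384))  # True
    print(can_start_long_job(168, nodes_running_long=96, cluster_nodes=384))  # False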
23. Discussion: buy-in priority
24. Reinstitute buy-in
- We would like to reinstitute buy-in: users buying nodes to be run by the HPCC
- The recent renovations allow for expansion of the center's facilities for shared HPCC infrastructure (no user hosting!)
- We believe there are many users with equipment money who would like to buy in
25. Rack
[Diagram: the same rack/chassis/nodes/processors/cores hierarchy shown earlier, repeated for reference]
26. Users will buy chassis
- The increment of purchase is a chassis
- Price to be determined, but roughly $1,000/core
- A box will be 8 or 16 cores, depending on deals and prices
- Example deal: 8-core Nehalem, 48 GB memory, about $8,000 (varies)
- Better deals with larger purchases
27. HPCC will provide
- Support for the hardware, networking, disks, power and cooling
- Software, OS, access
- 3 or 5 years (need feedback)
- Most support contracts are 3 years; it could be 5, but there are issues with this
28. HPCC will also purchase chassis
- HPCC does have some funds to purchase general-use nodes as well
- For the next 5 years we will continue to expand within the bounds of the ICER budget
- However, the ICER budget is a sliding scale, providing more support and less hardware over time
29. Priority scheduling of buy-in
- These are points of discussion; we need your feedback
- There are a couple of models, all of which allow unused nodes to be scheduled for larger jobs while still giving buy-in users access
30. First, really two systems
- HPCC provides public nodes for anyone with an HPCC account to schedule
- First come, first served (mostly)
- The researchers who buy in would have reserved access to their nodes, plus the slack of other buy-in users
- No general scheduling in this part of the system (mostly)
31. In the buy-in system, three issues
- How quickly
- How many
- How long
32. How quickly: the Purdue model
- Guarantee access to the number of purchased nodes within X hours (could be 1 hour, 4 hours, 8 hours); Purdue is now at 4
- Buy-in users can get more than they ask for if they don't run longer than 4 hours (or 1 hour, 8 hours, ...)
- We cannot guarantee that the big job will go within some time period, but the timeslice above provides an opportunity (a worked sketch follows below)
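A worked sketch of why the guarantee holds (the numbers are assumptions for illustration, not an adopted policy): if every general job placed on buy-in nodes is capped at the guarantee window, the owner never waits longer than that window for the nodes to drain.

    # Sketch of the "access within X hours" guarantee (illustrative only).
    GUARANTEE_HOURS = 4   # Purdue reportedly uses 4; 1 or 8 were also mentioned

    def max_owner_wait(remaining_walltimes_hours):
        """Worst-case wait for a buy-in owner to reclaim their purchased nodes.

        Because general jobs on buy-in nodes may not request more walltime than
        the guarantee, the owner's wait is bounded by that guarantee."""
        longest = max(remaining_walltimes_hours, default=0)
        return min(longest, GUARANTEE_HOURS)

    # General jobs with 1, 3, and 4 hours left occupy the owner's nodes:
    print(max_owner_wait([1, 3, 4]))   # -> 4 (never more than GUARANTEE_HOURS)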
33. How many: dial-in nodes
- Users can dial in how many of their purchased nodes they need within some time slice (1 day, 1 week, ...)
- Dialing in low gets higher priority or future credit, and frees other nodes now for larger jobs beyond what was purchased
- Must have a reasonable time slice to get good scheduling (a week?)
34. How long: the dial-in area model
- Buy-in users get their nodes 24x7
- Could use the area model under some timeslice; for example:
- You bought 100 cores x 168 hours (a 1-week timeslice)
- You could instead use 200 cores x 84 hours (then wait 84 hours before you can schedule again)
- Only resets every timeslice (a worked sketch follows below)
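A rough illustration of the area model, using the numbers from the slide (the function itself is a sketch, not a committed interface): the purchase defines a core-hour budget per timeslice, and any cores-by-hours rectangle that fits within the remaining budget is allowed.

    # Area-model sketch: a cores x hours budget that resets each timeslice.
    PURCHASED_CORES = 100
    TIMESLICE_HOURS = 168                        # one-week timeslice
    BUDGET = PURCHASED_CORES * TIMESLICE_HOURS   # 16,800 core-hours per week

    def hours_allowed(cores_requested, core_hours_used=0):
        """Hours a job of the given width can still run in this timeslice."""
        remaining = BUDGET - core_hours_used
        return max(remaining // cores_requested, 0)

    print(hours_allowed(100))                            # -> 168 (the full week)
    print(hours_allowed(200))                            # -> 84 (twice the width, half the time)
    print(hours_allowed(200, core_hours_used=200 * 84))  # -> 0 until the slice resets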
35. Others on buy-in nodes
- We would still like to keep utilization up on buy-in nodes, so it is possible that general users would get access to those nodes under two conditions:
- Very short jobs (especially single-CPU)
- Pre-emption
36. Interested?
- Contact Kelly Osborn at kosborn@msu.edu
- Required information:
- Account Number
- Approximate Amount (unit amount unknown)
- Deadlines on spending?
- Contact name
37. Short, single-CPU jobs
- Very short jobs can be used as backfill in the scheduler to fill holes (see the sketch below)
- If jobs are short, no one has to wait very long (5 minutes, say)
- Only if there is slack in the schedule
- Low priority
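A hedged sketch of the backfill test (not the scheduler's real algorithm): a short job may start in an otherwise idle window only if it finishes before the node is needed again, and only if it stays under the short-job cap.

    # Backfill eligibility sketch (illustrative only).
    def can_backfill(job_minutes, idle_window_minutes, max_backfill_minutes=5):
        """True if the job fits the idle window and respects the short-job cap."""
        return (job_minutes <= max_backfill_minutes
                and job_minutes <= idle_window_minutes)

    print(can_backfill(4, 30))    # True: a 4-minute job fills a 30-minute hole
    print(can_backfill(20, 30))   # False: too long to count as backfill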
38. Preemption
- Jobs that label themselves as preemptible get very high priority and can run anywhere at any time
- Preemptible means the job can be stopped at any time
- Once stopped, the job is re-queued at high priority
- The user must recover the state of a stopped job! (see the sketch below)
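Since the user must recover the state of a stopped job, preemptible jobs usually checkpoint themselves. The sketch below only illustrates that pattern; the file name, step loop, and interval are assumptions, not an HPCC-provided interface.

    # Checkpoint/restart sketch for a preemptible job (illustrative only).
    # The job saves progress periodically; after preemption and re-queueing,
    # the next run resumes from the last checkpoint instead of starting over.
    import json
    import os

    CHECKPOINT = "checkpoint.json"   # assumed file in the job's working directory

    def load_step():
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                return json.load(f)["step"]
        return 0                     # first run: start from the beginning

    def save_step(step):
        with open(CHECKPOINT, "w") as f:
            json.dump({"step": step}, f)

    for step in range(load_step(), 1000):
        # ... one unit of the job's real work would go here ...
        if step % 50 == 0:
            save_step(step)          # cheap insurance against preemption
    save_step(1000)                  # record completion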
39. What about non-buy-in researchers?