Title: The Sharing and Training of HPC Resources at the University of Arkansas
- Amy Apon, Ph.D.
- Oklahoma Supercomputing Symposium
- October 4, 2006
Outline of Talk
- HPC at the University of Arkansas
- Current status
- A New Mechanism for Sharing Resources
- AREON
- HPC Training
- New course delivery via HDTV in collaboration with LSU
- Collaboration opportunities and challenges
- GPNGrid and SURAGrid
- Resource allocation issues
High Performance Computing Resources at the University of Arkansas
- Red Diamond supercomputer
- NSF MRI grant, August, 2004
- Substantial University match
- Substantial gift from Dell
- First supercomputer in Arkansas
- Number 379 on the Top 500 list, June 2005
- 128 nodes (256 processors), 1.349 TFLOPS
More Resources
- Prospero cluster
- 30 dual processor PIII nodes
- SURAGrid resource
- Ace cluster
- 4 dual processor Opteron
- Our entry point to the GPNGrid/Open Science Grid
- Trillion cluster
- 48 dual processor Opteron
- Owned by Mechanical Engineering
- About 1 TFLOPS
We are seeing research results
- Computational Chemistry and Materials Science (NSF)
- New formulas for new drugs
- Nanomaterials
- Chemistry, Physics, Mechanical Engineering
- Over 95% of our usage is in these areas
Research results in other areas, too
- Multiscale Modeling
- DNA Computing
- Middleware and HPC Infrastructure
- Tools for managing data for large-scale applications (NSF)
- Performance modeling of grid systems (Acxiom)
We have done some significant upgrades
- For the first year we ran SGE on half the machine; the other half ran self-scheduled PVM jobs
- LSF scheduler installed in May 2006
- About 60 users, about 10 very active users
Thanks to LSF, we are busy
[Chart: LSF daily pending parallel job statistics by queue (jobs waiting)]
And jobs have to wait
[Chart: LSF hourly turnaround time for the normal queue]
We have something exciting to share
AREON
Arkansas Research and Education Optical Network
- We ARE ON! (25-July-2006)
[Map: AREON nodes at Fayetteville, Jonesboro, Russellville, Fort Smith, Conway, Little Rock, Pine Bluff, Arkadelphia, Monticello, and Magnolia, with external connections to Tulsa, Memphis, Dallas, and Monroe]
AREON
- The first bond issue (last fall) failed
- Governor Huckabee of Arkansas granted $6.4M (PI Zimmerman)
- The MBO fiber loop between Tulsa and Fayetteville is in place; network hardware is being shipped
- The campus (last-mile) connections are in progress
- All is on target for a demo to the Governor on 12/5/06!
AREON
- This fall, UArk will have connectivity to Internet2 and the National Lambda Rail
- The bond issue is on the ballot again this coming fall
- If it passes, the other research institutions will be connected to AREON
- We hope this happens!
- The timeframe for this is about a year and a half
Opportunities for collaboration with OneNet, LEARN, LONI, GPN, and others
A Demonstration Application
- High Performance Computing: a new course in Spring 2007
- In collaboration with LSU and Dr. Thomas Sterling
- We are exploring new methods of course delivery using streaming high-definition TV
- We expect about 40 students at five locations this time
- Taught live via Access Grid and HDTV, over AREON and LONI
- A test run for future delivery of HPC education
Collaboration via GPN Grid
- Active middleware collaboration for almost 3 years
- GPNGrid is applying to become a new Virtual Organization in the Open Science Grid
- Sponsored by the University of Nebraska-Lincoln; includes participants from Arkansas, UNL, Missouri, KU, KSU, and OU
- A hardware grant from Sun and NSF provides 4 small Opteron clusters for the starting grid environment
- Applications are in the process of being defined
Collaboration via SURA Grid
- UArk has a 30-node Pentium cluster in SURAGrid
- Some differences with GPN:
- The CA is different
- Account management and discovery stacks are different
- The AUP policy is different
- SURA Grid applications are increasing; UArk can run coastal modeling and is open to running other SURA applications
More Collaboration Mechanisms
- Arkansas is participating in the recently awarded CI-TEAM award to OU (PI Neeman)
- Will deploy Condor across Oklahoma and with participating collaborators
- LSF MultiCluster provides another mechanism for collaboration
- AREON will give the University of Arkansas great bandwidth
UofA Current HPC Challenges
- We have some I/O infrastructure challenges
- The system was designed to have a large amount of storage, but it is not fast
- Supercomputing operations: AC, power, and UPS need to be upgraded
- Funding models for ongoing operations
- How will basic systems administration and the project director be funded?
Collaboration and sharing bring a challenge
- Usage policies
- How do you partition usage fairly among existing users?
- How do you incorporate usage from new faculty?
- Current policy uses fair-share scheduling
- Dynamic priority = (# shares) / (slots * F1 + cpu_time * F2 + run_time * F3)
- Shares are divided among the largest user groups: chem, phys, others
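The fair-share rule above can be sketched in a few lines. This is a minimal illustration of the general shape of an LSF-style dynamic priority (shares divided by weighted resource consumption), not the exact formula or factor values used on Red Diamond; the function name and defaults are assumptions for illustration.

```python
def dynamic_priority(shares, slots, cpu_time, run_time,
                     f1=1.0, f2=1.0, f3=1.0):
    """Illustrative fair-share priority:
    (# shares) / (slots*F1 + cpu_time*F2 + run_time*F3).
    A group that has consumed nothing yet gets the highest priority."""
    denom = slots * f1 + cpu_time * f2 + run_time * f3
    return shares / denom if denom > 0 else float("inf")

# Example: two groups with equal shares; the one that has used
# more CPU time gets a lower dynamic priority and waits longer.
chem = dynamic_priority(shares=10, slots=2, cpu_time=100.0, run_time=50.0)
phys = dynamic_priority(shares=10, slots=2, cpu_time=500.0, run_time=50.0)
```

Under this rule, priorities decay as a group consumes resources, so scheduling drifts back toward the configured share ratios over time.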
Collaboration and sharing bring a challenge (cont.)
- Are max run times needed?
- Almost everyone has them
- They require checkpointing of jobs, which is hard to do with our current I/O infrastructure
- They require user education and a change of culture
- Are user allocations and accounting of usage needed?
- Your suggestions here
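Since max run times only work if jobs can resume after being killed, the pattern users need is application-level checkpointing: periodically write the job's state to disk and restart from the last saved state. A minimal sketch (the file format, function names, and step counts here are illustrative assumptions, not a site-provided tool):

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    # Write to a temp file and atomically rename, so a job killed
    # mid-write never leaves behind a corrupt checkpoint.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path, default):
    # Resume from the checkpoint if one exists, else start fresh.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return default

def run(path="sim.ckpt", total_steps=1000, interval=100):
    state = load_checkpoint(path, {"step": 0, "value": 0.0})
    for step in range(state["step"], total_steps):
        state["value"] += 0.5  # stand-in for one real simulation step
        state["step"] = step + 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

If the scheduler kills the job at its run-time limit, resubmitting it simply picks up from the last checkpoint. The cost is the periodic I/O, which is why slow storage makes this pattern painful in practice.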
Questions?
- Contact information
- http://hpc.uark.edu
- http://comp.uark.edu/aapon
- aapon@uark.edu