1
The Sharing and Training of HPC Resources at the
University of Arkansas
  • Amy Apon, Ph.D.
  • Oklahoma Supercomputing Symposium
  • October 4, 2006

2
Outline of Talk
  • HPC at the University of Arkansas
  • Current status
  • A New Mechanism for Sharing Resources
  • AREON
  • HPC Training
  • New course delivery via HDTV collaboration with
    LSU
  • Collaboration opportunities and challenges
  • GPNGrid and SURAGrid
  • Resource allocation issues

3
High Performance Computing Resources at the University of Arkansas
  • Red Diamond supercomputer
  • NSF MRI grant, August 2004
  • Substantial University match
  • Substantial gift from Dell
  • First supercomputer in Arkansas
  • Number 379 on the Top 500 list, June 2005
  • 128 nodes (256 processors), 1.349 TFlops

4
More Resources
  • Prospero cluster
  • 30 dual-processor Pentium III nodes
  • SURAGrid resource
  • Ace cluster
  • 4 dual-processor Opteron nodes
  • Our entry point to the GPNGrid/Open Science Grid
  • Trillion cluster
  • 48 dual-processor Opteron nodes
  • Owned by Mechanical Engineering
  • About 1 TFlops

5
  • How are we doing?

6
We are seeing research results
  • Computational Chemistry and Materials Science
    (NSF)
  • New formulas for new drugs
  • Nanomaterials
  • Chemistry, Physics, Mechanical Engineering
  • Over 95% of our usage is in these areas

7
Research results in other areas, too
  • Multiscale Modeling
  • DNA Computing
  • Middleware and HPC Infrastructure
  • Tools for managing data for large-scale
    applications (NSF)
  • Performance modeling of grid systems (Acxiom)

8
We have done some significant upgrades
  • For the first year we ran SGE on half of the machine; the other half ran self-scheduled PVM jobs
  • LSF scheduler installed, May 2006 (a sample submission is sketched below)
  • About 60 users, about 10 very active users
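With the scheduler in place, users submit work through LSF rather than self-scheduling it. A minimal sketch of what a submission might look like, wrapped in Python; the queue name, core count, and ./my_app binary are hypothetical, while bsub's -q, -n, and -o options are standard LSF:

    import subprocess

    # Submit a hypothetical 16-process MPI job to an LSF queue named "normal".
    # "%J" in the output file name is expanded by LSF to the job ID.
    subprocess.run(
        ["bsub", "-q", "normal", "-n", "16", "-o", "job.%J.out",
         "mpirun", "-np", "16", "./my_app"],
        check=True,
    )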

9
Thanks to LSF, we are busy
  • [Chart: LSF daily pending parallel job statistics by queue (jobs waiting)]

10
And jobs have to wait
  • [Chart: LSF hourly turnaround time for the normal queue]

11
  • We have something exciting to share

12
AREON
Arkansas Research and Education Optical Network
[Map, 25-July-2006: the AREON fiber ring linking Fayetteville, Jonesboro, Russellville, Fort Smith, Conway, Little Rock, Pine Bluff, Arkadelphia, Monticello, and Magnolia, with external connections toward Tulsa, Dallas, Memphis, and Monroe. "We ARE ON!"]
13
AREON
Arkansas Research and Education Optical Network
  • The first bond issue (last fall) failed
  • Governor Huckabee of Arkansas granted $6.4M (PI Zimmerman)
  • Fiber for the MBO loop between Tulsa and Fayetteville is in place; network hardware is being shipped
  • The campus (last-mile) connections are in progress
  • All is on target for a demo to the Governor on 12/5/06!

14
AREON
Arkansas Research and Education Optical Network
  • This fall, Uark will have connectivity to Internet2 and the National Lambda Rail
  • The bond issue is on the ballot again this coming fall
  • If it passes, the other research institutions will be connected to AREON
  • We hope this happens!
  • The timeframe for this is about a year and a half

15
Opportunities for collaboration with OneNet,
LEARN, LONI, GPN, and others
16
A Demonstration Application
  • High Performance Computing
  • New course in Spring 2007
  • In collaboration with LSU and Dr.
    Thomas Sterling
  • We are exploring new methods of course delivery
    using streaming high-definition TV
  • We expect about 40 students at five locations
    this time
  • Taught live via Access Grid and HDTV over AREON and LONI
  • A test run for future delivery of HPC education

17
Collaboration via GPN Grid
  • Active middleware collaboration for almost 3
    years
  • GPNGrid is applying to become a new Virtual Organization in the Open Science Grid
  • Sponsored by the University of Nebraska-Lincoln; includes participants from Arkansas, UNL, Missouri, KU, KSU, and OU
  • A hardware grant from Sun and NSF provides 4 small Opteron clusters for the starting grid environment
  • Applications are in the process of being defined

18
Collaboration via SURA Grid
  • Uark has a 30-node Pentium cluster in SURAGrid
  • Some differences from GPN:
  • The CA is different
  • Account management and discovery stacks are different
  • The AUP is different
  • SURAGrid applications are increasing; Uark can run coastal modeling and is open to running other SURA applications

19
More Collaboration Mechanisms
  • Arkansas is participating in the recently awarded CI-TEAM project at OU (PI Neeman)
  • It will deploy Condor across Oklahoma and among participating collaborators
  • LSF MultiCluster provides another mechanism for collaboration
  • AREON will give the University of Arkansas great bandwidth

20
UofA Current HPC Challenges
  • We have some I/O infrastructure challenges
  • The system was designed to have a large amount of
    storage, but it is not fast
  • Supercomputing operations
  • AC, power, and UPS need to be upgraded
  • Funding models for on-going operations
  • How will basic systems administration and the project director be funded?

21
Collaboration and sharing bring a challenge
  • Usage policies
  • How do you partition usage fairly among existing users?
  • How do you incorporate usage from new faculty?
  • The current policy uses fair-share scheduling
  • Dynamic Priority = (# shares) / (slots × F1 + cpu_time × F2 + run_time × F3)
  • Shares are divided among the largest user groups: chem, phys, and others (a worked example follows)
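To make the scheduling formula concrete, here is a small illustrative calculation in Python. Only the formula comes from the slide; the factor values F1-F3 and the per-group share and usage numbers below are invented for the example:

    # Fair-share dynamic priority: heavier recent usage lowers a group's priority.
    def dynamic_priority(shares, slots, cpu_time, run_time,
                         f1=1.0, f2=0.1, f3=0.01):
        return shares / (slots * f1 + cpu_time * f2 + run_time * f3)

    # Hypothetical groups mirroring the slide's chem / phys / others split.
    groups = {
        "chem":   dict(shares=40, slots=64, cpu_time=5000.0, run_time=300.0),
        "phys":   dict(shares=40, slots=32, cpu_time=2000.0, run_time=150.0),
        "others": dict(shares=20, slots=8,  cpu_time=100.0,  run_time=20.0),
    }
    for name, g in groups.items():
        print(f"{name:>7}: priority = {dynamic_priority(**g):.4f}")

Running this shows the effect the slide describes: the group that has consumed the most slots and CPU time ends up with the lowest dynamic priority, so lighter users move ahead in the queue.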

22
Collaboration and sharing bring a challenge
  • Are max run times needed?
  • Almost everyone has them
  • They require checkpointing of jobs, which is hard with our current I/O infrastructure (see the sketch below)
  • They require user education and a change of culture
  • Are user allocations and accounting of usage needed?
  • Your suggestions here
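For reference, application-level checkpointing usually amounts to periodically writing the solver state to disk and resuming from the newest checkpoint after a job is killed at its run-time limit. A minimal Python sketch, with init_state and advance as hypothetical stand-ins for the real computation:

    import os
    import pickle

    CKPT = "state.ckpt"     # checkpoint file name (arbitrary)
    TOTAL_STEPS = 100_000
    CKPT_INTERVAL = 10_000  # steps between checkpoints

    def init_state():       # stand-in for real solver initialization
        return 0.0

    def advance(state):     # stand-in for one unit of real work
        return state + 1.0

    # Resume from the last checkpoint if a previous run was cut short.
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            step, state = pickle.load(f)
    else:
        step, state = 0, init_state()

    while step < TOTAL_STEPS:
        state = advance(state)
        step += 1
        if step % CKPT_INTERVAL == 0:
            with open(CKPT + ".tmp", "wb") as f:
                pickle.dump((step, state), f)
            os.replace(CKPT + ".tmp", CKPT)  # atomic replace avoids a torn checkpoint

The write-to-a-temp-file-then-rename step matters: if the job dies mid-write, the previous checkpoint stays intact.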

23
Questions?
  • Contact information
  • http://hpc.uark.edu
  • http://comp.uark.edu/aapon
  • aapon@uark.edu