Title: SHARCNET 2
Partner Institutions
- Academic
- Brock University
- McMaster University
- University of Guelph
- University of Ontario Institute of Technology
- University of Waterloo
- University of Western Ontario
- University of Windsor
- Wilfrid Laurier University
- York University
- Research Institutes
- Robarts Institute
- Fields Institute
- Perimeter Institute
- Private Sector
- Hewlett Packard
- SGI
- Quadrics Supercomputing World
- Platform Computing
- Nortel Networks
- Bell Canada
- Government
- Canada Foundation for Innovation
- Ontario Innovation Trust
- Ontario R&D Challenge Fund
- Optical Regional Advanced Network of Ontario (ORANO)
Philosophy
- A multi-university and college, interdisciplinary institute with academic-industry-government partnerships, enabling computational research in critical areas of science, engineering and business.
- SHARCNET provides access to and support for high performance computing resources for the research community
- Goals
- reduce time to science
- provide otherwise unattainable compute resources
- enable remote collaboration
SHARCNET Resources: Three Perspectives
- People
- User support
- System Administrator, HPC Analyst
- Administrative support
- Site Leader
- Hardware
- machines, processors, networking
- Software
- design, compilers, libraries, development tools
High Performance Computing Analyst
- A point of contact for development support and education
- central resource
- analysts have natural areas of expertise; address issues to the one with the requisite knowledge to best assist you
- http://www.sharcnet.ca
- Analyst's role
- development support
- analysis of requirements
- development/delivery of educational resources
- research computing consultations
System Administrator
- Administration and maintenance of installations
- responsible for specific cluster(s)
- typically focus on particular clusters or packages
- Administrator's role
- user accounts
- system software and middleware
- hardware and software maintenance
- research computing consultations
Site Leader
- Liaison between SHARCNET and the user community at a specific site
- primary point of contact for the research community at a site
- Site Leader's role
- site coordination
- representative for local researchers
- user comments and questions
- event organization
- political intrigue
Hardware Resources: Networking
- Sites are interconnected by dedicated high bandwidth fiber links
- fast access to all hardware regardless of physical location
- common file access (distributed file systems)
- shared resources
- dedicated channel for Access Grid
- 10Gbps/1Gbps dedicated connection between all sites
Installation: All sites
ETA: Q4 2005
Hardware: Capability Cluster
- Architecture
- substantial number of 64-bit CPUs emphasizing large, fast memory and a high bandwidth/low latency interconnect
- dual-processor systems (2-way nodes, 1500 compute cores)
- Opteron processors, 4GB RAM per CPU, 70TB onsite disk storage
- Interconnect
- fast, extremely low latency, high bandwidth (Quadrics)
- Intended use
- large scale, fine-grained parallel, memory-intensive MPI jobs
Installation: McMaster University
ETA: Q3 2005
Hardware: Utility Parallel Cluster
- Architecture
- reasonable number of 64-bit CPUs with mid-range performance across the board for general purpose parallel applications
- dual-processor/core systems (4-way nodes, 1000 compute cores)
- Opteron processors, 2GB RAM per CPU, 70TB onsite disk storage
- Interconnect
- low latency, good bandwidth (InfiniBand/Myrinet/Quadrics)
- Intended use
- small to medium scale MPI, arbitrary parallel jobs
- small-scale SMP
Installation: University of Guelph
ETA: Q4 2005
Hardware: Throughput Cluster
- Architecture
- large number of 64-bit CPUs in a standard configuration
- dual-processor/core systems (4-way nodes, 3000 compute cores)
- Opteron processors, 2GB RAM per CPU, 70TB onsite disk storage
- Interconnect
- standard network (gigabit Ethernet)
- Intended use
- serial or loosely-coupled, latency-tolerant parallel jobs
- small-scale SMP
Installation: University of Waterloo
ETA: Q3 2005
Hardware: SMP-Friendly Cluster
- Architecture
- moderate number of 64-bit CPUs in 'fat' nodes to suit small-scale SMP jobs
- quad-processor systems (4-way nodes, 384 compute cores)
- Opteron processors, 8GB RAM per CPU, 70TB onsite disk storage
- Interconnect
- good latency, high bandwidth (Quadrics)
- Intended use
- small to medium scale MPI, high memory/bandwidth parallel jobs
- small-scale, high memory demand SMP
Installation: University of Western Ontario
ETA: Q3 2005
Hardware: Mid-range SMP System
- Architecture
- moderate number of CPUs with shared memory
- 128 processors in a single system image (NUMA SMP)
- Itanium2 processors, 256GB RAM, 4TB local disk storage
- Interconnect
- extremely low latency, high bandwidth (NUMAlink)
- makes all memory shared among all processors
- Intended use
- moderate sized SMP jobs (OpenMP, pthreads, etc.); see the sketch after this slide
- jobs with very large memory requirements
Installation: Wilfrid Laurier University
ETA: Q3 2005
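To make the shared-memory use case concrete, here is a minimal OpenMP sketch in C of the kind of SMP job this system targets; the array size and the -fopenmp flag are illustrative assumptions, and build commands vary by compiler.

```c
#include <stdio.h>
#include <omp.h>

/* Minimal OpenMP sketch: a parallel reduction over a large array,
 * the kind of moderate-sized SMP job described above.
 * Illustrative build (flags vary by compiler): cc -fopenmp sum.c -o sum */
int main(void)
{
    enum { N = 1000000 };
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    /* Threads split the loop; reduction(+:sum) combines partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.1f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```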
Hardware: Point of Presence Clusters
- Architecture
- modest number of 64-bit CPUs configured as a general purpose cluster
- dual-processor systems (2-way nodes, 32 compute cores)
- Opteron processors, 2GB RAM per CPU, 4TB onsite disk storage
- small number of visualization workstations
- Interconnect
- probably InfiniBand or Myrinet
- Intended use
- local storage, access and development
- visualization, AccessGrid node
Installation: All sites
ETA: 2005
Software Resources
- Compilers
- C, C++, Fortran
- Key parallel development support
- MPI (Message Passing Interface); a minimal example follows this slide
- Multi-threading (pthreads, OpenMP)
- Libraries and Tools
- BLAS, LAPACK, FFTW, PETSc, ...
- debugging, profiling, performance tools
- Common between clusters
- Some cluster-specific tools
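Since MPI is the key parallel development route listed above, here is a minimal MPI program in C; mpicc and mpirun are the conventional MPI wrapper commands, and the exact invocation on a given cluster is an assumption.

```c
#include <stdio.h>
#include <mpi.h>

/* Minimal MPI sketch: every process reports its rank.
 * Conventional build/run (details vary by cluster):
 *   mpicc hello.c -o hello
 *   mpirun -np 4 ./hello */
int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```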
Unified Account System
- User accounts unified across all clusters and the web
- Your files are available no matter which cluster you log into
- PI accounts disabled if no research is reported
- SHARCNET must report results to maintain our funding
- Sponsors must re-enable subsidiary accounts annually
- Prevents old student accounts from building up
- Files will be archived
- One account per person (no sharing!)
- Accounts are free and easy to obtain
Filesystems
- Single home directory visible on any machine
- Per-cluster /work and /scratch; per-node /tmp
- /home quota is 200 MB (tentative)
- Source code only
- On a RAID file system
- Will be backed up and replicated
- put/get interface for archiving other files to long term storage
- Environment variables to help users organize their work (see the sketch below)
- $ARCH, $CLUSTER, $SCRATCH, $WORK
Running Jobs
- Unified user commands: submit, show, kill, ...
- Same interface to the scheduler on every cluster
- Fairshare based on usage across all clusters
- Ensures all users get fair access to all resources
- Large projects can apply for cycle grants
- Increased priority for a period of time
- Priority bias for particular jobs, depending on the specialty of the cluster
- Jobs should be directed to the cluster best suited to their requirements
Conclusion
- Huge increase in resources
- Common tools and interface on all clusters
- Efficient access to all resources regardless of actual location