

1
SHARCNET 2
  • Moving Forward

2
Partner Institutions
  • Academic
  • Brock University
  • McMaster University
  • University of Guelph
  • University of Ontario Institute of Technology
  • University of Waterloo
  • University of Western Ontario
  • University of Windsor
  • Wilfrid Laurier University
  • York University
  • Research Institutes
  • Robarts Institute
  • Fields Institute
  • Perimeter Institute
  • Private Sector
  • Hewlett Packard
  • SGI
  • Quadrics Supercomputing World
  • Platform Computing
  • Nortel Networks
  • Bell Canada
  • Government
  • Canada Foundation for Innovation
  • Ontario Innovation Trust
  • Ontario R&D Challenge Fund
  • Optical Regional Advanced Network of Ontario
    (ORANO)

3
Philosophy
  • A multi-university and college,
    interdisciplinary institute with
    academic-industry-government partnerships,
    enabling computational research in critical areas
    of science, engineering and business.
  • SHARCNET provides access to and support for high
    performance computing resources for the
    researcher community
  • Goals
  • reduce time to science
  • provision of otherwise unattainable compute
    resources
  • remote collaboration

4
SHARCNET Resources: Three Perspectives
  • People
  • User support
  • System Administrator, HPC Analyst
  • Administrative support
  • Site Leader
  • Hardware
  • machines, processors, networking
  • Software
  • design, compilers, libraries, development tools

5
High Performance Computing Analyst
  • A point of contact for development support and
    education
  • central resource
  • analysts have natural areas of expertise; direct issues to the one with the requisite knowledge to best assist you
  • http://www.sharcnet.ca
  • Analyst's role
  • development support
  • analysis of requirements
  • development/delivery of educational resources
  • research computing consultations

6
System Administrator
  • Administration and maintenance of installations
  • responsible for specific cluster(s)
  • typically focus on particular clusters or
    packages
  • Administrator's role
  • user accounts
  • system software and middleware
  • hardware and software maintenance
  • research computing consultations

7
Site Leader
  • Liaison between SHARCNET and user community at a
    specific site
  • primary point of contact for the research
    community at a site
  • Site Leader's role
  • site coordination
  • representative for local researchers
  • user comments and questions
  • event organization
  • political intrigue

8
Hardware Resources: Networking
  • Sites are interconnected by dedicated high
    bandwidth fiber links
  • fast access to all hardware regardless of
    physical location
  • common file access (distributed file systems)
  • shared resources
  • dedicated channel for Access Grid
  • 10Gbps/1Gbps dedicated connection between all
    sites

Installation: All sites

ETA: Q4 2005
9
Hardware: Capability Cluster
  • Architecture
  • substantial number of 64-bit CPUs emphasizing
    large, fast memory and high bandwidth/low latency
    interconnect
  • dual-processor systems (2-way nodes --- 1500
    compute cores)
  • Opteron processors, 4GB RAM per CPU, 70TB onsite
    disk storage
  • Interconnect
  • fast, extremely low latency, high bandwidth
    (Quadrics)
  • Intended use
  • large scale, fine grained parallel,
    memory-intensive MPI jobs

Installation: McMaster University

ETA: Q3 2005
10
Hardware: Utility Parallel Cluster
  • Architecture
  • reasonable number of 64-bit CPUs with mid-range
    performance across the board for general purpose
    parallel applications
  • dual-processor/core systems (4-way nodes -- 1000
    compute cores)
  • Opteron processors, 2GB RAM per CPU, 70TB onsite
    disk storage
  • Interconnect
  • low latency, good bandwidth (InfiniBand/Myrinet/Quadrics)
  • Intended use
  • small to medium scale MPI, arbitrary parallel jobs; small-scale SMP

Installation: University of Guelph

ETA: Q4 2005
11
Hardware: Throughput Cluster
  • Architecture
  • large number of 64-bit CPUs in a standard
    configuration
  • dual-processor/core systems (4-way nodes ---
    3000 compute cores)
  • Opteron processors, 2GB RAM per CPU, 70TB onsite
    disk storage
  • Interconnect
  • standard network (gigabit Ethernet)
  • Intended use
  • serial or loosely-coupled, latency-tolerant
    parallel jobs
  • small-scale SMP

Installation: University of Waterloo

ETA: Q3 2005
12
Hardware: SMP-Friendly Cluster
  • Architecture
  • moderate number of 64-bit CPUs in 'fat' nodes to
    suit small-scale SMP jobs
  • quad-processor systems (4-way nodes --- 384
    compute cores)
  • Opteron processors, 8GB RAM per CPU, 70TB onsite
    disk storage
  • Interconnect
  • good latency, high bandwidth (Quadrics)
  • Intended use
  • small to medium scale MPI, high memory/bandwidth
    parallel jobs
  • small-scale, high memory demand SMP

Installation: University of Western Ontario

ETA: Q3 2005
13
Hardware: Mid-range SMP System
  • Architecture
  • moderate number of CPUs with shared memory
  • 128-processor single system image (NUMA SMP)
  • Itanium2 processors, 256GB RAM, 4TB local disk
    storage
  • Interconnect
  • extremely low latency, high bandwidth (NUMAlink)
  • makes all memory shared among all processors
  • Intended use
  • moderate sized SMP jobs (OpenMP, pthreads, etc.), as sketched below
  • jobs with very large memory requirements

Installation: Wilfrid Laurier University

ETA: Q3 2005
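
As a concrete illustration of the small-scale SMP work this system targets, the following is a minimal OpenMP sketch in C. The array size and the example compile command are illustrative assumptions, not SHARCNET-specific settings.

    /* Minimal OpenMP sketch: a parallel reduction over a large in-memory
     * array, the kind of small-scale SMP job described above.  Compile
     * with an OpenMP-capable compiler, e.g. gcc -fopenmp sum.c -o sum. */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const long n = 50000000;              /* ~400 MB of doubles */
        double *a = malloc(n * sizeof *a);
        if (a == NULL) { perror("malloc"); return 1; }

        double sum = 0.0;
        long i;

        /* Each thread fills and sums its own chunk of the array; the
         * reduction clause combines the per-thread partial sums. */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++) {
            a[i] = (double)i;
            sum += a[i];
        }

        printf("threads available: %d, sum = %e\n",
               omp_get_max_threads(), sum);
        free(a);
        return 0;
    }
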
14
Hardware: Point of Presence Clusters
  • Architecture
  • modest number of 64-bit CPUs configured as a
    general purpose cluster
  • dual-processor systems (2-way nodes --- 32
    compute cores)
  • Opteron processors, 2GB RAM per CPU, 4TB onsite
    disk storage
  • small number of visualization workstations
  • Interconnect
  • probably InfiniBand or Myrinet
  • Intended use
  • local storage, access and development
  • visualization, AccessGrid node

Installation: All sites

ETA: 2005
15
Software Resources
  • Compilers
  • C, C++, Fortran
  • Key parallel development support
  • MPI (Message Passing Interface); see the sketch after this list
  • Multi-threading (pthreads, OpenMP)
  • Libraries and Tools
  • BLAS, LAPACK, FFTW, PETSc, ...
  • debugging, profiling, performance tools
  • Common between clusters
  • Some cluster specific tools
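
To make the MPI item above concrete, here is a minimal MPI program in C. It uses only standard MPI calls; compiler wrappers, modules and batch submission differ from cluster to cluster, so treat it as a sketch rather than a SHARCNET-specific recipe.

    /* Minimal MPI sketch: every rank reports in and rank 0 collects a
     * reduction.  Standard MPI calls only; build and launch details are
     * cluster-specific and not shown here. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, total = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        printf("hello from rank %d of %d\n", rank, size);

        /* Sum the rank numbers onto rank 0. */
        MPI_Reduce(&rank, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks 0..%d = %d\n", size - 1, total);

        MPI_Finalize();
        return 0;
    }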

16
Unified Account System
  • User accounts unified across all clusters and the web
  • Your files are available no matter which cluster
    you log into
  • PI accounts disabled if no research is reported
  • SHARCNET must report results to maintain our
    funding
  • Sponsor must re-enable subsidiary accounts
    annually
  • Prevent old student accounts from building up
  • Files will be archived
  • One account per person (no sharing!)
  • Accounts are free and easy to obtain

17
Filesystems
  • Single home directory visible on any machine
  • Per-cluster /work and /scratch; per-node /tmp
  • /home quota is 200 MB (tentative)
  • Source code only
  • On a RAID file system
  • Will be backed up and replicated
  • put/get interface for archiving other files to
    long term storage
  • Environment variables to help users organize their work (see the sketch below)
  • ARCH, CLUSTER, SCRATCH, WORK
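
As a sketch of how a program might use these variables, the following C snippet writes its output under the directory named by SCRATCH, falling back to the per-node /tmp if the variable is unset. The exact variable names and their meanings on each cluster are assumptions here and should be confirmed in the SHARCNET documentation.

    /* Sketch: place a run's output file under $SCRATCH.  Assumes SCRATCH
     * holds a writable directory path (an assumption); falls back to the
     * per-node /tmp if it is unset. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const char *scratch = getenv("SCRATCH");
        if (scratch == NULL)
            scratch = "/tmp";                  /* per-node fallback */

        char path[4096];
        snprintf(path, sizeof path, "%s/run_output.dat", scratch);

        FILE *out = fopen(path, "w");
        if (out == NULL) { perror(path); return 1; }

        fprintf(out, "results go here\n");
        fclose(out);
        printf("wrote %s\n", path);
        return 0;
    }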

18
Running jobs
  • Unified user commands: submit, show, kill, ...
  • Same interface to scheduler on every cluster
  • Fairshare based on usage across all clusters
  • Ensures all users get fair access to all
    resources
  • Large projects can apply for cycle grants
  • Increased priority for a period of time
  • Priority bias for particular jobs, depending on
    the specialty of the cluster
  • Jobs should be directed to a cluster best suited
    to their requirements

19
Conclusion
  • Huge increase in resources
  • Common tools and interface on all clusters
  • Efficient access to all resources regardless of
    actual location