Title: High-Performance Computing on the Windows Server Platform
1. High-Performance Computing on the Windows Server Platform
- Marvin Theimer
- Software Architect, Windows Server HPC Group
- hpcinfo@microsoft.com
- Microsoft Corporation
2. Session Outline
- Brief introduction to HPC
  - Definitions
  - Market trends
- Overview of V1 of Windows Server 2003 CCE
  - Features
  - System architecture
- Key challenges for future HPC systems
  - Too many factors affect performance
  - Grid computing economics
  - Data management
3. Brief Introduction to HPC
4. Defining High-Performance Computing (HPC)
- HPC definition: using compute resources to solve computationally intensive problems
- Different platforms for achieving results: technical and scientific computing
[Diagram: HPC's role in science, shown as a pipeline from sensors through computational modeling, persisting the results (DB, FS, ...), and on to mining and interpretation of the data.]
5. Cluster HPC Scenario
[Diagram: users submit jobs (sensors, workflow, computation) via web page, web service, or command line; an admin handles management input, policy, and reports. The head node provides user, cluster, resource, and job management. Each cluster node runs a job manager, node manager, resource manager, MPI, and the user application; nodes are connected by a high-speed, low-latency interconnect (1GE, InfiniBand, Myricom). Job data flows to a DB or file system for data mining, visualization, workflow, and remote query.]
6. Top 500 Supercomputer Trends
- Clusters: over 50% of systems
- Industry usage is rising
- GigE is gaining
- IA is winning
7. Commoditized HPC Systems Are Affecting Every Vertical
- Leverage volume markets of industry-standard hardware and software
- Rapid procurement, installation, and integration of systems
- Cluster-ready applications accelerating market growth:
  - Engineering
  - Bioinformatics
  - Oil & Gas
  - Finance
  - Entertainment
  - Government/Research
- The convergence of affordable high-performance hardware and commercial apps is making supercomputing a mainstream market
8. Supercomputing Yesterday vs. Today
9. Cheap, Interactive HPC Systems Are Making Supercomputing Personal
- Grids of personal & departmental clusters
- Personal workstations & departmental servers
- Minicomputers
- Mainframes
10. The Evolving Nature of HPC
11. Windows Server HPC
12. Windows-based HPC Today
- Technical solution
- Partner-driven solution stack:
  - Management: LSF, PBSPro, DataSynapse, MSTI
  - Applications: parallel applications
  - Middleware: MPI/Pro, MPICH-1.2, WMPI, MPI-NT
  - OS: Windows, Visual Studio
  - Protocol: TCP
  - Interconnect: Gigabit Ethernet, Fast Ethernet
  - Platform: Intel (32-bit & 64-bit), AMD x64
- Ecosystem
  - Partnerships with ISVs to develop on Windows
  - Partnership with Cornell Theory Center
13. What Windows-based HPC Needs to Provide
- Users require:
  - An integrated, supported solution stack leveraging the Windows infrastructure
  - Simplified job submission, status, and progress monitoring
  - Maximum compute performance and scalability
  - A simplified environment from desktops to HPC clusters
- Administrators require:
  - Ease of setup and deployment
  - Better cluster monitoring and management for maximum resource utilization
  - Flexible, extensible, policy-driven job scheduling and resource allocation
  - High availability
  - Secure process startup and complete cleanup
- Developers require:
  - A programming environment that enables high productivity
  - Availability of optimized compilers (Fortran) and math libraries
  - Parallel debugger, profiler, and visualization tools
  - Parallel programming models (MPI)
14. V1 Plans
- Introduce a compute cluster solution
  - Windows Server 2003 Compute Cluster Edition, based on Windows Server 2003 SP1 x64 Standard Edition
  - Features for job management, IT admins, and developers
- Build a partner ecosystem around Windows Server Compute Cluster Edition from day one
- Establish Microsoft credibility in the HPC community
- Create worldwide Centers of Innovation
15. Technologies
- Platform
  - Windows Server 2003 SP1 64-bit Edition
  - x64 processors (Intel EM64T, AMD Opteron)
  - Ethernet, Ethernet over RDMA, and InfiniBand support
- Administration
  - Prescriptive, simplified cluster setup and administration
  - Scripted, image-based compute node management
  - Active Directory-based security, impersonation, and delegation
  - Cluster-wide job scheduling and resource management
- Development
  - MPICH-2 from Argonne National Labs (see the sketch after this list)
  - Cluster scheduler accessible via DCOM, HTTP, and Web Services
  - Visual Studio 2005 compilers and parallel debugger
  - Partner-delivered compilers and libraries
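To ground the development bullets, here is a minimal sketch of the kind of C program a developer would build against MPICH-2 and hand to the cluster scheduler. Nothing here is CCE-specific; it is plain MPI:

    /* Minimal MPI "hello world"; builds against MPICH-2 or any MPI
       implementation.  Compile with mpicc, launch with mpiexec.   */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id     */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();                         /* shut down cleanly     */
        return 0;
    }

For example: mpicc hello.c -o hello, then mpiexec -n 4 hello; node allocation is the scheduler's job.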
16. Windows HPC Environment
[Diagram: the same cluster scenario realized on Windows Server 2003, Compute Cluster Edition. Microsoft Operations Manager and Active Directory back the head node, which provides user, cluster, resource, and job management. Users submit jobs (sensors, workflow, computation) via web page, web service, or command line; an admin handles management input, policy, and reports. Each cluster node runs a job manager, node manager, resource manager, MPI, and the user application over a high-speed, low-latency interconnect (Ethernet over RDMA, InfiniBand). Job data flows to a DB or file system for data mining, visualization, workflow, and remote query.]
17. Architectural Overview
[Diagram: a user workstation (Windows XP on x86/x64 with GigE and disk; application, job scripts, data, Job Sched UI; WSE and COM) submits jobs over HTTP to the head node. The head node runs Windows Server 2003 CCE with the Job Scheduler layered on IIS6, WSE3, MSDE, RIS, and AD, plus Whidbey tooling. A developer workstation (Windows XP, Whidbey, compilers, libraries, SFU, WSE, COM) targets the HPC SDK, which exposes MPI, scheduler, Web Service, and policy APIs. Cluster nodes run Windows Server 2003 CCE with a Node Manager; HPC applications use MPI-2, which runs over TCP, shared memory (SHM), or WSD/SDP across GigE/RDMA and InfiniBand interconnects. The legend distinguishes applications, 3rd-party components, the Windows OS, MS components, and HPC components.]
18. Key Challenges for Future HPC Systems
19. Difficult to Tune Performance
- Example: tightly coupled MPI applications
- Very sensitive to network performance characteristics
  - Communication times are measured in microseconds: O(10 µs) for interconnects such as InfiniBand, O(100 µs) for GigE
  - The OS network stack is a significant factor; things like RDMA can make a big difference
- Excited about the prospects of industry-standard RDMA hardware
  - We are working with InfiniBand and GigE vendors to ensure our stack supports them
  - Driver quality is an important facet
  - We are supporting the OpenIB initiative
  - Considering the creation of a WHQL program for InfiniBand
- Very sensitive to mismatched node performance
  - Random OS activities can add millisecond delays to microsecond communication times (see the ping-pong sketch below)
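A minimal sketch of how such microsecond-scale latencies are typically measured: a standard MPI ping-pong between two ranks (plain MPI, nothing Windows-specific assumed; run with at least two ranks):

    /* Ping-pong microbenchmark: time round trips between ranks 0 and 1.
       Half the round-trip time approximates the one-way latency the
       slide cites: O(10 µs) on InfiniBand, O(100 µs) on GigE.        */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const int ITERS = 10000;
        char byte = 0;
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("one-way latency ~ %.1f usec\n",
                   (t1 - t0) / (2.0 * ITERS) * 1e6);
        MPI_Finalize();
        return 0;
    }

A single OS-induced millisecond stall during the loop visibly skews the average, which is exactly the node-mismatch sensitivity described above.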
20. Need for Self-Tuning Systems
- Application configuration has a significant impact
  - Incorrect assumptions about the hardware/communications architecture can dramatically affect performance
  - Choice of communication strategy (a sketch of one such choice follows this list)
  - Choice of communication granularity
  - ...
- Tuning is an end-to-end issue
  - OS support
  - ISV library support
  - ISV application support
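As one concrete instance of a communication-strategy choice, a hedged sketch of a non-blocking exchange that overlaps communication with computation; whether the overlap actually pays off depends on the MPI library and interconnect, which is why tuning is end to end. Run with an even number of ranks:

    /* Strategy choice: post non-blocking sends/receives, compute on
       data that needs no remote input, then wait.  The benefit varies
       by interconnect and MPI library; that variance is the point.  */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1024                           /* toy message size */

    int main(int argc, char **argv)
    {
        int rank;
        double send[N], recv[N];
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int peer = rank ^ 1;                 /* pair ranks 0<->1, 2<->3, ...
                                                (requires an even rank count) */
        for (int i = 0; i < N; i++) send[i] = rank;

        /* Post the exchange first ...                                  */
        MPI_Irecv(recv, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(send, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ... do local work here that does not depend on remote data ... */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* remote data usable */
        if (rank == 0) printf("got %.0f from peer\n", recv[0]);
        MPI_Finalize();
        return 0;
    }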
21. Computational Grid Economics
- What $1 will buy you (roughly):
  - Computers cost ~$1,000, so 1 CPU-day (~10 Tera-ops) ≈ $1 (roughly, assuming a 3-year use cycle)
  - ⇒ 10 TB of intra-cluster network transfer costs ≈ $1 (roughly, assuming a 1 Gbps interconnect)
  - Internet bandwidth costs roughly $100/Mbps/month (not including routers and management)
  - ⇒ 1 GB of network transfer over the Internet costs ≈ $1 (roughly)
- Some observations (reproduced arithmetically in the sketch below):
  - HPC cluster communication is 10,000x cheaper than WAN communication
  - Break-even point for instructions computed per byte transferred:
    - Cluster: O(1) instrs/byte
    - WAN: O(10,000) instrs/byte
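The observations follow arithmetically from the prices above; a small sketch that reproduces them (the constants are the slide's own rough figures, not measured values):

    /* Back-of-envelope check of the break-even arithmetic, using the
       slide's rough prices: $1 per GB over the WAN, $1 per 10 TB
       within a cluster, and ~10 Tera-ops of CPU per $1.            */
    #include <stdio.h>

    int main(void)
    {
        double ops_per_usd       = 10e12;    /* ~1 CPU-day per $1 */
        double wan_bytes_per_usd = 1e9;      /* 1 GB per $1       */
        double lan_bytes_per_usd = 10e12;    /* 10 TB per $1      */

        printf("cluster vs WAN cost ratio: %.0fx\n",
               lan_bytes_per_usd / wan_bytes_per_usd);   /* 10000x */
        printf("WAN break-even: %.0f instrs/byte\n",
               ops_per_usd / wan_bytes_per_usd);         /* 10000  */
        printf("cluster break-even: %.0f instrs/byte\n",
               ops_per_usd / lan_bytes_per_usd);         /* 1      */
        return 0;
    }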
22. Computational Grid Economics: Implications
- Small-data, high-compute applications work well across the Internet, such as SETI@home and Folding@home
- MPI-style parallel, distributed applications work well in clusters and across LANs, but are uneconomic and do not work well in wide-area settings
- Data analysis is usually best done by moving the programs to the data, not the data to the programs
  - Move questions and answers, not petabyte-scale datasets
- The Internet is NOT the CPU backplane (Internet2 will not change this)
23. Exploding Data Sizes
- Experimental data: TBs → PBs
- Modeling data
  - Today: 10s to 100s of GB is the common case
  - Tomorrow: TBs
- Near-future example: CFD simulation of a turbine engine (the arithmetic is spelled out in the sketch below)
  - 10^9 mesh nodes, each containing 16 double-precision variables
  - ⇒ 128 GB per time step
  - Simulating 1000s of time steps ⇒ 100s of TBs per simulation
  - Archived for future reference
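The example's numbers check out directly: 10^9 nodes x 16 variables x 8 bytes per double-precision value. A small sketch of the arithmetic:

    /* Verify the slide's data sizes for the CFD turbine example. */
    #include <stdio.h>

    int main(void)
    {
        double nodes = 1e9, vars = 16, bytes_per_var = 8;
        double step_bytes = nodes * vars * bytes_per_var;
        double sim_bytes  = step_bytes * 1000;         /* 1000 time steps */

        printf("per time step: %.0f GB\n", step_bytes / 1e9);  /* 128 GB */
        printf("per simulation: %.0f TB\n", sim_bytes / 1e12); /* 128 TB */
        return 0;
    }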
24. Whole-System Modeling and Workflow
- Today: mostly about computation
  - Stand-alone, static simulations of individual parts/phenomena
  - Mostly batch
  - Simple workflows: short, deterministic pipelines (though some are massively parallel)
- Future: mostly about data that is produced and consumed by computational steps
  - Dynamic whole-system modeling via multiple, interacting simulations
  - More complex workflows (we don't yet know how complex)
  - More interactive analysis
  - More sharing
25. Whole-System Modeling Example: Turbine Engine
- Interacting simulations
  - CFD simulation of dynamic airflow through the turbine
  - FE stress analysis of engine & wing parts
  - "Impedance" issues between the various simulations (time steps, meshes, ...)
- Serial workflow steps
  - Crack analysis of engine & wing parts
  - Visualization of results
26. Interactive Workflow Example
- The base CFD simulation produces huge output
  - Points of interest may not be easy to find
- Find, and then focus on, the important details:
  - Data analysis/mining of the output
  - Restart the simulation at a desired point in time/space (see the restart sketch below)
  - Visualize the simulation from that point forward
  - Modify the simulation from that point forward (e.g., at higher fidelity)
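A hedged sketch of the restart pattern this implies: persist state every time step, then resume from any saved step. The file naming, state layout, and stand-in "physics" here are invented placeholders, not anything CCE-specific:

    /* Checkpoint/restart skeleton: save state each step; on startup,
       resume from a chosen checkpoint if it exists.                 */
    #include <stdio.h>

    #define N 1000                               /* toy state size */

    static void save(const double *s, int step)
    {
        char name[32];
        sprintf(name, "step_%04d.ckpt", step);
        FILE *f = fopen(name, "wb");
        if (!f) return;
        fwrite(s, sizeof(double), N, f);
        fclose(f);
    }

    static int load(double *s, int step)
    {
        char name[32];
        sprintf(name, "step_%04d.ckpt", step);
        FILE *f = fopen(name, "rb");
        if (!f) return -1;                       /* no such checkpoint */
        size_t got = fread(s, sizeof(double), N, f);
        fclose(f);
        return got == N ? 0 : -1;
    }

    int main(void)
    {
        double state[N] = {0};
        /* Resume after step 500 if that checkpoint exists, else start cold. */
        int start = (load(state, 500) == 0) ? 501 : 0;
        for (int step = start; step < 1000; step++) {
            for (int i = 0; i < N; i++)
                state[i] += 1.0;                 /* stand-in "physics"      */
            save(state, step);                   /* persist every time step */
        }
        return 0;
    }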
27. Data Analysis and Mining
- Traditional approach
  - Keep data in flat files
  - Write C or Perl programs to compute specific analysis queries (a sketch of this pattern follows this list)
- Problems with this approach
  - Imposes significant development times
  - Scientists must reinvent DB indexing and query technologies
- Results from the astronomy community
  - Relational databases can yield speed-ups of one to two orders of magnitude
  - SQL with application/domain-specific stored procedures greatly simplifies the creation of analysis queries
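For contrast, a sketch of the "traditional approach" bullet: a one-off C scan over a flat file that hard-codes a single query. The record schema is hypothetical, chosen to echo the astronomy example:

    /* The traditional approach the slide describes: a one-off C scan
       over a flat file of records, re-implementing by hand what a
       database index and query engine would provide.               */
    #include <stdio.h>

    struct record { double ra, dec, magnitude; };   /* hypothetical schema */

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s <datafile>\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        struct record r;
        long hits = 0;
        while (fread(&r, sizeof r, 1, f) == 1)      /* full sequential scan   */
            if (r.magnitude < 20.0)                 /* one hard-coded "query" */
                hits++;
        fclose(f);
        printf("%ld records matched\n", hits);
        return 0;
    }

A relational engine answers the same predicate through an index rather than a full sequential scan, which is where the order-of-magnitude speed-ups reported by the astronomy community come from.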
28. Combining Simulation with Experimental Data: Drug Discovery
- A clinical-trial database describes the toxicity side effects observed for tested drugs
- A simulation searches for candidate compounds that have a desired effect on a biological system
- The clinical data is searched for drugs that contain a candidate compound or a "near neighbor"; the toxicity results are retrieved and used to decide whether the candidate compound should be rejected
29. Sharing
- Simulations (or ensembles of simulations) are mostly done in isolation
  - No sharing except for archival output
- Some coarse-grained sharing
  - Check-out/check-in of large components
  - Example: automotive design
    - Check out a component
    - CAE-based design simulation of the component
    - Check in, with a design-rule-checking step
- Data warehouses typically only need coarse-grained update granularity
  - Bulk or coarse-grained updates
  - Modeling simulations are done in the context of particular versions of the data
- Audit trails and reproducible workflows are becoming increasingly important
30. Data Management Needs
- Cluster file systems and/or parallel DBs to handle the I/O bandwidth needs of large, parallel, distributed applications
- Data warehouses for experimental data and archived simulation output
- Coarse-grained geographic replication to accommodate distributed workforces and workflows
- Indexing and query capabilities for data mining and analysis
- Audit trails, workflow recorders, etc.
31. Windows HPC Roadmap
32. Call to Action
- IHVs
  - Develop Winsock Direct drivers for your RDMA cards
    - Automatically lets our MPI stack take advantage of low latency
  - Develop support for diskless scenarios (e.g., iSCSI)
- OEMs
  - Offer turn-key clusters
    - Pre-wired for management and RDMA networks
  - Support boot-from-net diskless scenarios
  - Support WS-Management
  - Consider noise and power requirements for personal and workgroup configurations
33. Community Resources
- Windows Hardware Driver Central (WHDC)
  - www.microsoft.com/whdc/default.mspx
- Technical Communities
  - www.microsoft.com/communities/products/default.mspx
- Non-Microsoft Community Sites
  - www.microsoft.com/communities/related/default.mspx
- Microsoft Public Newsgroups
  - www.microsoft.com/communities/newsgroups
- Technical Chats and Webcasts
  - www.microsoft.com/communities/chats/default.mspx
  - www.microsoft.com/webcasts
- Microsoft Blogs
  - www.microsoft.com/communities/blogs
34. Related WinHEC Sessions
- TWNE05005: Winsock Direct Value Proposition - Partner Concepts
- TWNE05006: Implementing Convergent Networking - Partner Concepts
35. To Learn More
- Microsoft
  - Microsoft HPC website: http://www.microsoft.com/hpc/
- Other sites
  - CTC activities: http://cmssrv.tc.cornell.edu/ctc/winhpc/
  - 3rd Party Windows Cluster Resource Centre: www.windowsclusters.org
  - HPC-related links: http://www.microsoft.com/windows2000/hpc/miscresources.asp
- Some useful articles and presentations
  - "Supercomputing in the Third Millennium," by George Spix: http://www.microsoft.com/windows2000/hpc/supercom.asp
  - Introduction to the book "Beowulf Cluster Computing with Windows" by Thomas Sterling, Gordon Bell, and Janusz Kowalik
  - "Distributed Computing Economics," by Jim Gray, MSR-TR-2003-24: http://research.microsoft.com/research/pubs/view.aspx?tr_id=655
  - "Web Services, Large Databases, and what Microsoft is doing in the Grid Computing Space," a presentation by Jim Gray: http://research.microsoft.com/Gray/talks/WebServices_Grid.ppt
- Send questions to hpcinfo@microsoft.com