Title: Datacentre Management
1 Datacentre Management: HPC Practices
- Dr. P. Sambath Narayanan
- Senior Technology Architect
- Customer Experience Centre
- Sun Microsystems
2 The Networked Data Center
- No longer about just deploying more technology
- Computer Centre has a strategic role in the organization
- IT infrastructure must provide competitive advantage
- Data is perceived as an asset
- The right data, right place, right time, right cost
3 The Growing Challenge
- Decreasing costs of HPC-class computing allow more systems-level research to be done in all fields
- However, the complexity of today's HPC environments keeps many from fully participating
- Open standards allow innovation across communities of researchers
4 Leveraging Innovation Throughout Sun
(Diagram labels)
- Technologies: Sun Grid, Sun Java StorEdge(TM) Software, Solaris(TM) 10 Operating System, Ubiquity, Java Enterprise System, Data Services Platform, Sun StorEdge(TM) SAM-FS/QFS Software, JXTA, RISC, Network Identity, J2EE, J2ME, NFS, XML, TCP/IP in Every System
- Themes: Customize, Standardize, Utilize
- Timeline: 1980, 1990, 2000
5 Sun's Open Source Initiatives
- J2SE 6: now on Java.net with monthly snapshots; new JRL license, open dialog
- $500M and 3,000 person-years: largest EVER contributed body of code
- Over 1,300 projects, 18 communities; hosts JSRs, over 110 user groups
- 850 members, 250 JSRs; 3 complete J2EE/J2SE versions; 2 complete J2ME versions
- 7.5M lines of code, 2nd largest contribution EVER (after Solaris); translated into 45 languages
- First Java IDE to support J2SE 5.0 language features
- Over 150 members; Sun 1st Liberty-enabled Identity Management offering; 400M Liberty-enabled identities and clients forecast by Y.E. 2005
6 The Growing Challenge
- Grand challenge problems are too big for any single institution
10 Towards a Single Campus Grid
Untapped resources are available for everyone.
15 63% Less Power Consumption
550W (Sun Fire X4100) vs. 1470W
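The headline figure can be checked from the two wattages on this slide. The sketch below is only a worked calculation; it assumes the lower wattage belongs to the Sun Fire X4100 and the higher one to the system it is compared against, and the function and constant names are illustrative.

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Return how much smaller `improved` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0

# Wattages quoted on the slide; which figure belongs to which system is assumed.
COMPARISON_WATTS = 1470.0
X4100_WATTS = 550.0

power_saving = percent_reduction(COMPARISON_WATTS, X4100_WATTS)
print(f"{power_saving:.0f}% less power")
```

The result rounds to the 63% quoted in the slide title.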
17 75% Smaller Rackmount Size
1U (Sun Fire X4100) vs. 4U
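To see what the 1U versus 4U comparison means for density, the following sketch counts how many servers of each height fit in one rack. The 42U rack height is an assumption (a common full-height rack, not stated on the slide), and the count ignores space that switches, power distribution, and cabling would take in practice.

```python
RACK_UNITS = 42  # assumption: a common full-height rack; not stated on the slide

def servers_per_rack(server_height_u: int, rack_units: int = RACK_UNITS) -> int:
    """Return how many servers of the given height fit in the rack."""
    return rack_units // server_height_u

def size_reduction_percent(small_u: int, large_u: int) -> float:
    """Return how much smaller the small chassis is, in percent."""
    return (large_u - small_u) / large_u * 100.0

reduction = size_reduction_percent(1, 4)
print(f"{reduction:.0f}% smaller chassis")
print(f"1U servers per rack: {servers_per_rack(1)}")
print(f"4U servers per rack: {servers_per_rack(4)}")
```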
19 What is InfiniBand (IB)?
- High performance interconnect
- High bandwidth: 8/16 Gb/s today, roadmap to 96 Gb/s
- Low latency: less than 10 microseconds
- Low overhead: RDMA transport engine moves data reliably between applications
- Becoming standard in HPC
- Datacenter: distributed DBs (Oracle RAC), other apps, rack systems
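The bandwidth and latency figures on this slide can be combined into a first-order cost model: transfer time is roughly the fixed latency plus the message size divided by the bandwidth. The sketch below is an illustration only; it uses the slide's 10 microsecond figure as an upper bound and ignores protocol overhead and congestion, and the function name is hypothetical.

```python
def transfer_time_us(message_bytes: int, bandwidth_gbps: float, latency_us: float) -> float:
    """Estimate one-way transfer time in microseconds: latency plus serialization."""
    bits = message_bytes * 8
    # 1 Gb/s is 1e9 bits per second, which is 1e3 bits per microsecond.
    serialization_us = bits / (bandwidth_gbps * 1e3)
    return latency_us + serialization_us

# Slide figures: 8 or 16 Gb/s, latency below 10 microseconds (upper bound used here).
for size in (1_000, 1_000_000):
    for gbps in (8.0, 16.0):
        t = transfer_time_us(size, gbps, latency_us=10.0)
        print(f"{size:>9} bytes at {gbps:>4} Gb/s: {t:8.1f} us")
```

In this model small messages are dominated by latency, so doubling the bandwidth barely helps them, while large messages are dominated by bandwidth.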
20 IB in Solaris
- Basic IB infrastructure in Solaris 10
- Continuing development driven by
- New interface HW with improved capabilities
- RAS requirements
- New services, e.g. storage connection
- Parity with IB on Linux - OpenIB
21 N1 Definition
Software and services for lifecycle management of
compute services and infrastructure
22 N1 Methodology
Release Management
Billing
Service Management
Change Management
Orchestration
Business Services
Network
Storage
Servers
23 N1 System Manager: Provision
- Discover bare metal servers
- Discover systems with OS
- Provision Solaris, Linux
- Create multiple OS profiles
- Provision Solaris patches
- Provision RedHat RPMs
- Provision and update firmware
24 N1 System Manager: Monitor
- Hardware monitoring
- OS monitoring
- Monitor server reachability
- Define thresholds
- Log events
- Send notifications
- Industry standards (SNMP, IPMI)
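The "define thresholds, log events, send notifications" flow on this slide can be sketched generically. The code below is not the N1 System Manager interface; every name in it (`Threshold`, `check_thresholds`, the metric names and limits) is hypothetical and only illustrates the pattern of comparing readings against limits and reporting breaches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Threshold:
    metric: str    # name of the monitored value (hypothetical)
    limit: float   # a reading above this limit raises an event

def check_thresholds(
    readings: Dict[str, float],
    thresholds: List[Threshold],
    notify: Callable[[str], None],
) -> List[str]:
    """Compare readings with thresholds; log and notify on each breach."""
    events: List[str] = []
    for threshold in thresholds:
        value = readings.get(threshold.metric)
        if value is not None and value > threshold.limit:
            message = f"{threshold.metric}={value} exceeds threshold {threshold.limit}"
            events.append(message)  # log the event
            notify(message)         # send a notification
    return events

# Example readings and limits are made up for illustration.
sent: List[str] = []
events = check_thresholds(
    {"cpu_temp_c": 82.0, "fan_rpm": 4200.0},
    [Threshold("cpu_temp_c", 75.0), Threshold("fan_rpm", 9000.0)],
    notify=sent.append,
)
print(events)
```

Passing the notifier in as a callable keeps the check independent of the delivery channel, so the same logic could feed e-mail, SNMP traps, or a log file.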
25 N1 System Manager: Manage
- Remote power on/off
- Remote command execution
- Hybrid UI
- Scriptable CLI
- Role-based access control
- Remote serial console
26 Hybrid UI
27 ILOM: Integrated Lights Out Manager
- Lights Out Management for Sun Fire systems
- Provides full local or remote access for setup, maintenance and on-going monitoring/management of a single system
- Full remote KVM functionality
- Including remote media support
- Browser-based UI and full CLI
- Access via Management Ethernet port, Serial port or Host OS (with suitable driver)
- Standards supported include HTTPS, LDAP, SSH 2.0, SNMP v1, v2c, v3, IPMI 2.0, DMTF 'SMASH' CLI
28 OS Options
- The No. 1 Unix Operating System: available for all Sun systems
- Red Hat and SuSE Enterprise Linux: available for all Sun x64 systems
- All Sun x64 systems are certified to run Microsoft Windows
29 Solaris 10: Blazing Performance
- 27 Performance World Records
- V20z's to E25K's
- Multiple Workloads
- Single Core Platforms
- Multi Core Platforms
30 Solaris 10: Enterprise-class Ecosystem
- 2,300,000 Licenses
- 1,000's of Applications
- 400 Supported Platforms
- Follow the Sun Support
- Guaranteed Compatibility
- Vibrant Open Source Community
31 Solaris 10 and Sun's Opteron Servers
- Performance
- Platform specific optimizations
- Optimized Memory management
- 20 years of Multi-thread-tuning
- Near-linear scalability
- DTrace for massive performance opportunities
- Consolidation
- Limitless partitioning with one license
- Multi-Core Support
- Tools, Predictive Self-Healing, Scheduler
- Compatibility Guaranteed
32 Enterprise-class x64 Features
- Dynamic Tracing (DTrace)
- Solaris Containers
- Predictive Self-Healing
- ZFS
- Secure Execution
- Open Source Application Stack
33 Sun Grid Rack System
Grids Made Easy
- Easy and fast way to buy and deploy grids
- Integrated Racks directly from Sun's factory
- Any combination of x64 servers
- Sample configurations with rules derived from real grid experience
- Several HPTC sample configurations
34 Sun Grid Rack System
Updated for Sun Fire X2100, Sun Fire X4100, and Sun Fire X4200 Servers
- Easy-to-use web configurator
- Sample configurations for industry applications
- Server nodes
- Sun Fire X2100, Sun Fire X4100, Sun Fire X4200 servers
- Solaris 10 OS for x86, or Linux
- Infrastructure
- Interconnect: 3rd party (Cisco and others)
- Software: Sun N1 System Manager, Sun N1 Grid Engine
- Web Services option: N1000 series switches, Sun N1 Service Provisioning System, Sun Java Enterprise System
35 Sun Grid Rack System
Sun Customer Ready Systems Program: Sun CRS Delivered
- Higher Quality and Lower Risk
- Helps reduce cost and risk in the deployment of horizontally scaled architectures
- Agile Deployment
- Accelerate the deployment of grid-enabled applications by up to 90%, and reduce initial installation issues by up to 80%
- Higher Utilization
- With Sun N1 Grid Engine software, customers experience up to 90% system utilization rate
- Easier to Manage
- Redefining the entire rack system as the building block for the grid
- Lower Power and Cooling Costs
36 Faster risk analysis, 30% less heat
New Energy uses Sun Fire x64 servers and Solaris 10 to create a compute grid for faster Monte Carlo analysis, while generating 30% less heat than competing alternatives.
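Monte Carlo analysis suits a compute grid because the trials are independent, so they can be split into chunks, run on separate nodes, and merged afterwards. The sketch below illustrates that structure on one machine. The risk model (standard normal profit and loss), the chunk count, and all function names are assumptions for illustration, not the workload described on the slide.

```python
import random

def simulate_chunk(seed: int, trials: int) -> list:
    """Simulate `trials` losses with its own random stream, as one grid node might."""
    rng = random.Random(seed)
    # Assumed toy model: profit and loss is standard normal; loss is its negative.
    return [-rng.gauss(0.0, 1.0) for _ in range(trials)]

def value_at_risk(losses: list, confidence: float = 0.95) -> float:
    """Return the loss that is not exceeded in `confidence` of the trials."""
    ordered = sorted(losses)
    index = min(int(confidence * len(ordered)), len(ordered) - 1)
    return ordered[index]

# Each chunk is independent, so chunks could run on separate nodes and be merged.
CHUNKS = 8
TRIALS_PER_CHUNK = 5000
all_losses = []
for chunk_id in range(CHUNKS):
    all_losses.extend(simulate_chunk(seed=chunk_id, trials=TRIALS_PER_CHUNK))

var_95 = value_at_risk(all_losses)
print(f"95% value at risk from {len(all_losses)} trials: {var_95:.2f}")
```

Because no chunk depends on another, adding nodes shortens the wall-clock time roughly in proportion, until the merge step or scheduling overhead dominates.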
37 HPC Cluster based on Opteron
Industry Standard Design
Lower Power and Cooling Costs
Enterprise-Class Features
HPC cluster based on Opteron architecture
High Performance
Advanced Remote Management
Flexible Choices of Multiple OSes
38 Sun Top500 Systems, 6-2005
- 37, USC, V60x, 2640 Xeon CPU (also some IBM Xeon and Dell x64)
- Fell from 31 despite increase from 5.7 to 7.2 TF
- 109, Nottingham, UK, V20z, 1024 Opteron CPU
- 172, Aachen, Germany, SF25K, 672 USIV CPU
- 347, Idaho National Labs, V20z, 460 Opteron CPU
- 404, DLR Germany, V20z, 384 Opteron CPU
- 446, Cambridge UK, SF15K, 900 USIII CPU
39 You Can Participate Too
- Sun HPC Consortium at SC05
- Seattle, November 2005
- Sun Application Tuning Seminar HPCC
- Aachen, Germany, Spring 2006
- Sun HPC Consortium at CCGrid
- Singapore, May 2006
40 Sun's HPC Solution Center in Oregon: Ribbon-Cutting, November 2005
- Demonstrates renewed Sun focus on Technical Computing and return to TOP500 leadership
- 6 TFLOPS (1536 cores of x64) for large-scale cluster testing and benchmarks
- Deployed in record time with Sun Grid Rack System
- Offers customers more flexibility and choice
- Puts Sun on short list of Terascale compute vendors
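The 6 TFLOPS and 1536-core figures imply a per-core rate that can be worked out directly. In the sketch below, the per-core result follows from the slide alone; the implied clock rate additionally assumes 2 floating-point operations per cycle per core and that the 6 TFLOPS is a peak figure, neither of which is stated on the slide.

```python
TOTAL_TFLOPS = 6.0   # from the slide
CORES = 1536         # from the slide

# Per-core rate in GFLOPS (1 TFLOPS = 1000 GFLOPS).
per_core_gflops = TOTAL_TFLOPS * 1000.0 / CORES

# Assumption, not from the slide: 2 floating-point operations per cycle per core.
FLOPS_PER_CYCLE = 2
implied_clock_ghz = per_core_gflops / FLOPS_PER_CYCLE

print(f"{per_core_gflops:.2f} GFLOPS per core")
print(f"Implied clock at {FLOPS_PER_CYCLE} FLOPs/cycle: {implied_clock_ghz:.2f} GHz")
```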
41 Datacentre Management: HPC Practices
- Dr. P. Sambath Narayanan
- sambath.narayanan_at_sun.com
42 FS/Volume Model vs. ZFS
FS/Volume I/O Stack
FS
- Block Device Interface
- "Write this block, then that block, ..."
- Loss of power = loss of on-disk consistency
- Workaround: journaling, which is slow and complex
Volume
- Block Device Interface
- Write each block to each disk immediately to keep mirrors in sync
- Loss of power = resync
- Synchronous and slow
ZFS I/O Stack
ZFS
- Object-Based Transactions
- "Make these 7 changes to these 3 objects"
- All-or-nothing
DMU
- Transaction Group Commit
- Again, all-or-nothing
- Always consistent on disk
- No journal: not needed
Storage Pool
- Transaction Group Batch I/O
- Schedule, aggregate, and issue I/O at will
- No resync if power lost
- Runs at platter speed
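The all-or-nothing behaviour of a transaction group can be illustrated with a toy model: changes are staged on a private copy, and the committed state is replaced in a single step only when the whole group is ready. This is a conceptual sketch, not ZFS code; `ToyPool`, `commit_group`, and the `fail_before_commit` flag are hypothetical names used to simulate a power loss.

```python
import copy

class ToyPool:
    """Toy model: committed state changes only by swapping in a complete new version."""

    def __init__(self):
        self.committed = {}  # the state that would be visible after a crash

    def commit_group(self, changes, fail_before_commit=False):
        """Apply a group of (object, value) changes atomically; return True on commit."""
        # Stage every change on a private copy; committed data is never edited in place.
        working = copy.deepcopy(self.committed)
        for obj, value in changes:
            working[obj] = value
        if fail_before_commit:
            return False  # simulated power loss: the staged copy is simply discarded
        self.committed = working  # single switch to the new, complete state
        return True

pool = ToyPool()
pool.commit_group([("a", 1), ("b", 2)])
pool.commit_group([("a", 10), ("c", 3)], fail_before_commit=True)
print(pool.committed)
```

After the simulated failure the pool still holds the first group's result in full, with no partial second group, which is why this style of commit needs neither a journal nor a resync.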
43 The Sun x64 Advantage
- Enterprise focus
- World record performance w/ near linear scalability
- Design features streamlining RAS for the x64 space
- Superior energy-efficiency advantage over Intel-based systems
- Dual core ready
- Only x64 single RU server with a list price below 750 USD
- An integrated system solution from a single vendor
- Hardware
- Networking
- Middleware
- Solaris OS