Title: Windows Server Platform: Overview and Roadmap
1. Windows Server Platform: Overview and Roadmap
- Sean McGrane
- Program Manager
- Windows Server Platform Architecture Group
- smcgrane _at_ microsoft.com
- Microsoft Corporation
2. Session Outline
- Server Hardware Trends
- Technology and Processor Trends
- Form Factors Blade Servers
- Windows Longhorn Server Direction
- Reliability
- Hardware error handling
- Hardware partitioning
- Application Consolidation
- Virtualization
- Call to Action
- Resources
3. Server Technology Trends
- Processors
- More processing units per physical package
- In this presentation 1P means one physical processor
- Point-to-point bus architecture
- Direct attached memory
- Memory capacity continues to increase
- Memory technology is a feature of the processor
- Fully Buffered DIMM (FBD) by 2007/2008
- I/O moves to PCI Express
- Increased IO bandwidth and reliability
- Firmware
- Increased adoption of Extensible Firmware Interface (EFI)
- Platforms
- Increased adoption of blades for 1P/2P application loads
- Scale up moves to the commodity space
- Large number of processing units on high-end servers (256 or more)
4. Processor Trends
[Diagram: processor performance and core count over time, progressing from dual-thread to dual-core to quad-core parts with on-package cache, aligned with Server 2003 SP1, Server 2003 Compute Cluster Edition, and Windows Longhorn Server]
- Higher number of cores per processor
- All new server processors are 64-bit capable
5. What will customers do with Multi-Core?
- Typical application scaling can't keep up
- 1P and 2P servers are often under-utilized today
- Future 1P servers will be more compute capable than today's 8P
- Few customer loads fully utilize an 8P server today
- Application consolidation will be the volume solution
- Multiple application loads deployed to each processor
- Scale-up apps can be accommodated on volume servers
- How will form factors be affected?
- IO and memory capability must match compute capability
- IO expansion isn't available in today's volume server form factors
- Larger form factors may be required for these servers
- Can RAS scale with performance?
- Consolidation and scale-up apps raise RAS requirements
- Mid- to high-end RAS features are needed on volume servers
6. Typical Blade Platform Today
[Diagram: chassis midplane connecting compute blades, network switches, FC switches, and the Chassis Management Module (CMM)]
- Current models are typically 6U to 7U chassis with 10 to 14 1P/2P x64 blades
- Each blade is like a server motherboard
- IDE/SCSI attached disks, network and IO daughter card on the blade
- Midplane is passive; routing is very complex; IO switches provided in the chassis
- SAN attach rate is high (~40%)
- Initial problems with adoption
- Costs were too high
- Limited vendor network switches available
- Data center infrastructure not ready: cabling, management, power, etc.
- Aggregated server management potential not achieved
- Proprietary interfaces to the management module
- Static blade configuration
- OS state on the blade complicates repurposing
7. Future Blade Platform
[Diagram: chassis midplane with a PCIe switch connecting compute blades to network IO/switches, FC IO/switches, and the Chassis Management Module (CMM)]
- Similar chassis configuration, e.g. 6U to 7U chassis with 10 to 14 1P/2P x64 blades
- The compute blade becomes stateless
- All IO and direct attached disks are removed
- Consolidated storage on FC or iSCSI SAN
- More reliable storage solution; reduces cost and simplifies management
- Simplifies blade failover and repurposing
- The chassis contains a set of configurable components
- The midplane is PCIe only and contains a programmable PCIe switch
- All IO devices and switches are at the far end of the midplane
- The CMM programs the PCIe switch to assign IO to compute blades, i.e. configure servers
- Aggregated server management potential is realized
- Standardized management interfaces implemented in the CMM
- Flexible and dynamic configuration of blade servers
- Simplified server repurposing on error; failed components can be configured out
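The CMM's role of programming the PCIe switch can be pictured with a toy model. The class and method names below are illustrative assumptions, not a real CMM interface; the point is that "configuring a server" reduces to routing shared IO to a stateless blade, so failover is just re-routing.

```python
# Toy model of a CMM assigning chassis IO devices to stateless compute
# blades through a programmable PCIe switch. All names are illustrative.
class PCIeSwitch:
    def __init__(self):
        self.routes = {}  # io_device -> blade slot

    def program(self, io_device, blade_slot):
        self.routes[io_device] = blade_slot

class ChassisManagementModule:
    def __init__(self, switch):
        self.switch = switch

    def configure_server(self, blade_slot, io_devices):
        # "Configuring a server" = routing a set of IO devices at the
        # far end of the midplane to one compute blade.
        for dev in io_devices:
            self.switch.program(dev, blade_slot)

    def repurpose(self, from_slot, to_slot):
        # Because blades hold no OS state, failover/repurposing is
        # just re-routing the failed blade's IO to a spare.
        for dev, slot in self.switch.routes.items():
            if slot == from_slot:
                self.switch.routes[dev] = to_slot

switch = PCIeSwitch()
cmm = ChassisManagementModule(switch)
cmm.configure_server(blade_slot=3, io_devices=["fc-hba-0", "nic-0"])
cmm.repurpose(from_slot=3, to_slot=7)   # e.g. blade 3 failed
print(switch.routes)  # {'fc-hba-0': 7, 'nic-0': 7}
```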
8. Blade Support: Remote Boot
- Microsoft supports remote boot with Server 2003
- Supported for both FC and iSCSI SAN
- SAN boot requires a Host Bus Adapter (HBA)
- Windows install processes work with this configuration
- iSCSI creates a new low-end SAN market
- Software-initiated install and boot is complex
- A low-cost HBA is a simpler approach
- Enables a faster time-to-market solution
- Provides a solution for existing OSs, e.g. Server 2003
- SAN management is too complex
- Must be simplified to create a volume solution
- Simple SAN program addresses this simplification
- Packaged SAN solutions with a single point of management
- Initial focus is simplifying SAN deployment
- SAN boot simplification is a longer term goal
9. Power and Cooling
- Processor power ratings and server density continue to rise
- High-end processors will have a 130W footprint
- Blade servers can populate up to 168 processors per rack
- Existing data center infrastructure can't cope
- At 65-95W per sq foot, can supply about 6-7KW per rack
- A single fully loaded blade chassis can be rated at >5KW
- Power management can help
- Processor p-states supported in Server 2003 SP1
- Balances power consumption to real-time utilization
- Transparent to the user and applications
- Can lower processor power consumption up to 30%
- More is needed; new power initiatives are emerging
- More efficient power supplies with monitoring capability
- Silicon advances to reduce processor power leakage
- Tools to accurately rate server power
- Power and cooling are a huge customer problem
- Power management alone can't solve the problem
- Upgrades to legacy data center infrastructure will be required
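A quick back-of-the-envelope calculation with the figures above makes the gap concrete: even counting processor power alone, a fully bladed rack far exceeds what a typical data center can deliver, with or without p-state savings.

```python
# Back-of-the-envelope check of the slide's figures: 168 processors
# per rack at a 130W footprint each, versus the ~6-7KW a typical data
# center can supply per rack, with and without the ~30% p-state
# saving. (Processor power only; whole-server draw is higher still.)
procs_per_rack = 168
watts_per_proc = 130          # high-end processor footprint
rack_supply_kw = 7            # optimistic end of the 6-7KW range

demand_kw = procs_per_rack * watts_per_proc / 1000
with_pstates_kw = demand_kw * (1 - 0.30)   # best-case p-state saving

print(f"processor demand: {demand_kw:.1f} KW")        # 21.8 KW
print(f"with p-states:    {with_pstates_kw:.1f} KW")  # 15.3 KW
print(f"rack supply:      {rack_supply_kw} KW")
```

Even the best case leaves demand more than double the supply, which is why the slide concludes that power management alone can't solve the problem.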
10. Longhorn Server Platform Direction
- Move the industry to 64-bit (x64) Windows
- Compatibility for 32-bit apps on x64
- Broad coverage for 64-bit drivers
- Enable Windows on Itanium for scale-up solutions
- Consolidate multiple applications per server
- Homogeneous consolidation for file, print, web, email, etc.
- Virtualization for heterogeneous low- to mid-scale application loads
- Hardware partitions for heterogeneous scale-up application loads
- Improve Reliability, Availability, and Serviceability
- Hardware error handling infrastructure
- Enhanced error prediction and redundant hardware features
- Continue progress on Windows performance
- Improved support for Windows operation on an iSCSI or FC SAN
11. Windows Hardware Error Architecture (WHEA)
- Motivation: improve reliability of the server
- Consolidation raises server RAS requirements
- Server 2003 bugcheck analysis
- ~10% are diagnosed as hardware errors
- Others exhibit corruption that could be hardware related
- Hardware errors are a substantial problem on servers
- Silent hardware errors are a big concern
- OS participation in error handling is inconsistent
- Improved OS integration can raise server RAS level
- Goals
- Provide information for all hardware error events
- Make the information available to management software
- Reduce mean time to recovery for fatal errors
- Enable preventative maintenance using health monitoring
- Reduce crashes using error prediction and recovery
- Utilize standards-based hardware, e.g. PCIe AER
12. WHEA: The Problem
- Lack of coordinated hardware error handling
- Disparate error sources with distinct mechanisms
- Error signaling and processing is architecture specific
- Poor I/O error handling capability; improved with PCIe AER
- Lack of OS integration lowers server RAS
- Lack of a common data format restricts OS participation
- No mechanism to discover error sources
- Some hardware errors are not reported to the OS
- No way to effectively utilize platform-specific capabilities
- WHEA is a common hardware error handling infrastructure for Windows
- Error source identification, configuration and management
- Common hardware error flow in Windows
- Platform driver model to provide hardware/firmware abstraction
- Common hardware error record format for all platforms
- Standard interface to persist error records
- Hardware error events provided to management software
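The idea of a common error record format can be illustrated with a toy model: one record carries a header plus one section per error source, so management software can parse records from any platform the same way. The field names and severity scale below are illustrative assumptions, not the actual WHEA record layout.

```python
# Illustrative sketch of a common hardware error record: a header plus
# one section per error source, letting platform-specific payloads
# travel opaquely inside a format every consumer understands.
# Field names are assumptions, not the real WHEA layout.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class ErrorSection:
    source: str       # e.g. "processor", "memory", "pcie-aer"
    severity: str     # "corrected", "recoverable", or "fatal"
    raw_data: bytes   # platform-specific payload, opaque to consumers

@dataclass
class ErrorRecord:
    record_id: int
    timestamp: datetime
    sections: List[ErrorSection] = field(default_factory=list)

    def worst_severity(self) -> str:
        # Management software can triage any record the same way,
        # regardless of which platform produced it.
        order = {"corrected": 0, "recoverable": 1, "fatal": 2}
        return max((s.severity for s in self.sections),
                   key=lambda sev: order[sev], default="corrected")

record = ErrorRecord(1, datetime.now(timezone.utc))
record.sections.append(ErrorSection("pcie-aer", "corrected", b"\x00"))
record.sections.append(ErrorSection("memory", "recoverable", b"\x01"))
print(record.worst_severity())  # recoverable
```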
13. Dynamic Hardware Partitioning (DHP)
Longhorn dynamic hardware partitioning features are focused on improving server RAS
[Diagram: future hardware partitionable server with direct-attached memory per processor, PCI Express IO, a Service Processor, and a Partition Manager]
1. Partition Manager provides the UI for partition creation and management
2. Service Processor controls the inter-processor and IO connections
3. Hardware partitioning to the socket level; virtualization for sub-socket partitioning
4. Support for dynamic hardware addition and replacement in Longhorn Server
14. DHP Hot Addition
- Addition of hardware to a running partition with no downtime
- Processors, memory and IO subsystems may be added
- Scenarios supported by Hot Addition
- Expansion of server compute resources
- Addition of I/O extension units
- Enabling unused capacity in the server
- Hot Addition sequence
- Hardware is physically plugged into the server
- Administrator or management software initiates a Hot Addition
- The firmware initiates an ACPI Hot Add notify to the OS in the partition
- The OS reads the ACPI tables and utilizes the unit described by the notify
- Operations are not transparent to applications or device drivers
- A notification API will be made available for both user and kernel mode
- Drivers cannot assume hardware resources are static
- Units are added permanently
- Subsequently removing the unit requires a reboot of the partition
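The Hot Addition sequence above can be sketched as a toy simulation: firmware raises a notify, the OS updates its resource view, and registered listeners (standing in for the planned user/kernel notification API, whose shape is an assumption here) learn that resources changed.

```python
# Toy model of the Hot Addition flow: a firmware notify grows the
# partition's resource view, and subscribers are told about it,
# because drivers and apps cannot assume resources are static.
# Class and method names are illustrative, not a real Windows API.
class Partition:
    def __init__(self, cpus, memory_gb):
        self.cpus = cpus
        self.memory_gb = memory_gb
        self.listeners = []

    def register(self, callback):
        # Stand-in for the planned user/kernel notification API.
        self.listeners.append(callback)

    def acpi_hot_add_notify(self, cpus=0, memory_gb=0):
        # Firmware-initiated notify: the OS reads the (simulated) ACPI
        # description and brings the new unit into use permanently.
        self.cpus += cpus
        self.memory_gb += memory_gb
        for cb in self.listeners:
            cb(self.cpus, self.memory_gb)

events = []
part = Partition(cpus=4, memory_gb=16)
part.register(lambda c, m: events.append((c, m)))
part.acpi_hot_add_notify(cpus=4, memory_gb=16)  # add one CPU/memory unit
print(events)  # [(8, 32)]
```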
15. DHP Hot Replace
- A processor/memory unit is replaced with a redundant spare
- Implemented with no OS downtime
- The details of the Hot Replace sequence are being defined
- System requirements
- One or more spare units in the server
- Hardware assistance can improve efficiency of the swap process
- Scenarios supported with no downtime
- Replacement of a unit initiated by hardware failure prediction
- Replacement of a unit by service engineers during maintenance
- Hot Replace sequence
- Administrator or management software initiates a Hot Replace
- A spare unit is brought online and mapped into the partition view
- Firmware initiates an ACPI replace notify to the OS, which identifies the unit
- The context of the unit to be replaced is migrated to the spare unit
- The OS provides notification once the operation is completed
- Firmware maps out the replaced hardware without interruption to the OS
- The OS completes the initialization of the new processors and continues
- The operation is transparent to applications and device drivers
16. Microsoft View on Partitioning
- Used for server consolidation
- Server consolidation: hosting multiple application loads on a single server
- Microsoft offers homogeneous consolidation programs for file, print, email, web, database, etc.
- Heterogeneous side-by-side application execution is problematic
- Applications tend to collide with each other
- Testing is required to validate different application combinations
- Partitioning offers out-of-the-box server consolidation solutions
- Hardware Partitions
- High levels of isolation and reliability with low perf overhead
- Ideal for scale-up application consolidation
- Granularity of hardware is large; removal of hardware is very complex
- Software Partitions (Virtualization)
- Preferred direction for application consolidation
- Flexible partition configuration; granular dynamic resource management
- Ideal solution for consolidation of volume Windows applications
- Future Direction
- Provide a hypervisor-based virtualization solution
- Expand the application environments supported under virtualization
17. Virtualization and Hardware Partitions
[Diagram: software partitions using Virtual Server (VS) 2005 — NT4, Win2K, and Win2K3 guests with their apps running over Virtual Server on a Windows Host OS on a Windows compliant server — side by side with hardware partitions on a Windows compliant partitionable server]
- Software Partitions (Virtual Server 2005)
- Volume 32-bit application solution
- Out-of-the-box consolidation
- Heterogeneous OS/App consolidation
- Supported on standard servers
- Highly flexible and configurable solution
- 64-bit Host support with VS 2005 SP1
- Host OS model not preferred for production deployment
- Hardware Partitions
- Hardware partitioning provides physical isolation
- Software partitions may be used within a hardware partition
- Enables software partitions and scale-up application consolidation on a single server
- Requires partitionable hardware
18. Virtualization Futures
- OS virtualization layer replaced by a thinner hypervisor layer
- Significant reduction in performance overhead and maintenance
- Multi-processor support in the guest environment
- 64-bit hypervisor to enable scaling
- Devices can be assigned to a partition
- Requires isolation protection support in the hardware (IO virtualization)
- Partitions can share assigned device resources with other partitions
- Higher levels of reliability and availability
- Snapshot of guest environment with no downtime enables high availability solutions
- WHEA provides hardware health monitoring and higher levels of RAS
- Guests can be moved between physical servers with no downtime
- Granular and dynamic management of hardware resources
- Management becomes a key differentiator in this environment
- Enables heterogeneous high-availability and legacy production application consolidation on a non-hardware-partitioned server
[Diagram: Win2K, Win2K3, and Longhorn guests with their apps running over a hypervisor on a Windows compliant server, with storage and network devices assigned to partitions]
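The no-downtime guest movement described above can be sketched as a toy handoff: snapshot the guest's state, restore it on the target host, then release the source. Real live migration iteratively copies dirty memory pages while the guest keeps running; this simplified model (with invented names throughout) captures only the final switch-over.

```python
# Toy sketch of moving a guest between physical servers: snapshot,
# restore on the target, release the source. Real hypervisors copy
# memory iteratively to keep downtime near zero; this model only
# shows the state handoff. All names are illustrative.
import copy

class Host:
    def __init__(self, name):
        self.name = name
        self.guests = {}   # guest_id -> guest state

def migrate(guest_id, source, target):
    snapshot = copy.deepcopy(source.guests[guest_id])  # capture state
    target.guests[guest_id] = snapshot                 # restore on target
    del source.guests[guest_id]                        # release source
    return target.guests[guest_id]

a, b = Host("blade-a"), Host("blade-b")
a.guests["web01"] = {"os": "Win2K3", "memory_gb": 2}
migrate("web01", a, b)
print(sorted(b.guests), sorted(a.guests))  # ['web01'] []
```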
19. Call to Action
- Server vendors
- Consider the effect of multi-core on volume servers
- Consider hardware partitions on mid-range servers
- Provide management flexibility in blade chassis
- Implement power saving technologies
- Provide WHEA extensions to improve server RAS
- Implement dynamic hardware partitioning features to improve RAS
- Implement emerging virtualization hardware assists
- Device vendors
- Provide 64-bit drivers for all devices
- Validate compatibility in a dynamic hardware environment
- ISVs: hardware management
- Implement to emerging standards-based management interfaces
- Provide flexible blade chassis management
- Utilize emerging power management standards
- Provide enhanced RAS features based on WHEA information
20. Community Resources
- Windows Hardware Driver Central (WHDC)
- www.microsoft.com/whdc/default.mspx
- Technical Communities
- www.microsoft.com/communities/products/default.mspx
- Non-Microsoft Community Sites
- www.microsoft.com/communities/related/default.mspx
- Microsoft Public Newsgroups
- www.microsoft.com/communities/newsgroups
- Technical Chats and Webcasts
- www.microsoft.com/communities/chats/default.mspx
- www.microsoft.com/webcasts
- Microsoft Blogs
- www.microsoft.com/communities/blogs
21. Resources
- Blades and SAN
- Storage track: Storage Platform Leadership
- Storage track: Simplifying SAN Deployments on Windows
- Networking track: Implementing Convergent Networking
- Networking track: Network IO Architectures
- http://www.microsoft.com/windowsserversystem/storage/simplesan.mspx
- Reliability: Fundamentals track
- Windows Hardware Error Architecture (WHEA)
- Error management solutions synergy with WHEA
- Dynamic Hardware Partitioning
- Virtualization
- Server track: Virtual Server Overview and Roadmap
- Fundamentals track: Windows Virtualization Architecture
- Fundamentals track: Virtualization Technology for AMD Architecture
- Fundamentals track: Virtualization Technology for Intel Architecture
- http://www.microsoft.com/windowsserversystem/virtualserver/default.mspx