1. Clustering Technology In Windows NT Server, Enterprise Edition
Jim Gray, Microsoft Research
Gray@Microsoft.com
research.Microsoft.com/~gray
2. Today's Agenda
- Windows NT clustering
- MSCS (Microsoft Cluster Server) demo
- MSCS background
- Design goals
- Terminology
- Architectural details
- Setting up an MSCS cluster
- Hardware considerations
- Cluster application issues
- Q&A
3. Extra Credit
- Included in your presentation materials but not covered in this session
- Reference materials
- SCSI primer
- Speaker's notes included
- Hardware certification
4. MSCS In Action
5. High Availability Versus Fault Tolerance
- High availability masks outages through service restoration
- Fault tolerance masks local faults
- RAID disks
- Uninterruptible power supplies
- Cluster failover
- Disaster tolerance masks site failures
- Protects against fire, flood, sabotage, ...
- Redundant system and service at a remote site
6. Windows NT Clusters: What Is Clustering To Microsoft?
- A group of independent systems that appear as a single system
- Managed as a single system
- Common namespace
- Services are cluster-wide
- Ability to tolerate component failures
- Components can be added transparently to users
- Existing client connectivity is not affected by clustered applications
7. Microsoft Cluster Server
- Two-node version available 97Q3
- Commoditize fault tolerance (high availability)
- Commodity hardware (no special hardware)
- Easy to set up and manage
- Lots of applications work out of the box
- Multi-node scalability in the NT 5 timeframe
8. MSCS Initial Goals
- Manageability
- Manage nodes as a single system
- Perform server maintenance without affecting users
- Mask faults, so repair is non-disruptive
- Availability
- Unavailability ≈ MTTR / MTBF, so quick repair keeps it low
- Restart failed applications and servers
- Detect failures and warn administrators
- Reliability
- Accommodate hardware and software failures
- Redundant system without mandating a dedicated standby solution
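The availability goal can be put in rough numbers: steady-state availability is MTBF / (MTBF + MTTR), so unavailability is approximately MTTR / MTBF when repairs are much faster than failures. A back-of-envelope sketch (the failure and repair times below are illustrative, not measured MSCS figures):

```python
# Steady-state availability: fraction of time the service is up.
# Unavailability ~= MTTR / MTBF when MTTR << MTBF.

def availability(mtbf_hours, mttr_hours):
    """Fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A server that fails once a month (720 h) and takes 8 h to repair by hand:
manual = availability(720, 8)          # roughly "two nines"

# The same server when cluster failover cuts repair to about 2 minutes:
clustered = availability(720, 2 / 60)  # roughly "four nines"

print(f"manual repair:    {manual:.5f}")
print(f"cluster failover: {clustered:.5f}")
```

The point of the restart-based design falls out of the arithmetic: shrinking MTTR improves availability just as effectively as stretching MTBF.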
9. MSCS Cluster
[Diagram: client PCs connect to Server A and Server B; the two servers exchange a heartbeat and cluster management traffic over a private interconnect, and both attach to shared disk cabinets A and B.]
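The heartbeat between the two servers is how each node decides whether its partner is still up. A minimal sketch of the idea, with made-up interval and timeout values (the actual MSCS parameters differ):

```python
import time

# Minimal sketch of heartbeat failure detection between two nodes.
# Interval and miss limit are illustrative, not MSCS defaults.
HEARTBEAT_INTERVAL = 1.2   # seconds between heartbeats
MISSED_LIMIT = 3           # missed beats before declaring the peer dead

class HeartbeatMonitor:
    def __init__(self, now=time.monotonic):
        self.now = now             # injectable clock for testing
        self.last_seen = now()

    def beat(self):
        """Called whenever a heartbeat arrives from the peer."""
        self.last_seen = self.now()

    def peer_alive(self):
        """Peer is presumed dead after MISSED_LIMIT missed intervals."""
        return self.now() - self.last_seen < HEARTBEAT_INTERVAL * MISSED_LIMIT

# Simulated clock so the example is deterministic:
clock = [0.0]
mon = HeartbeatMonitor(now=lambda: clock[0])
mon.beat()
clock[0] += 2.0
print(mon.peer_alive())   # True: only about one beat missed
clock[0] += 3.0
print(mon.peer_alive())   # False: peer declared dead, failover can begin
```

In a real cluster the declaration of death triggers the failover of the dead node's groups to the survivor.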
10. Failover Example
[Diagram: Server 1 runs a Web site and Server 2 runs a database; each service can fail over to the other server, and both servers share access to the Web site files and database files on the common disks.]
11. Basic MSCS Terms
- Resource - basic unit of failover
- Group - collection of resources
- Node - Windows NT Server running cluster software
- Cluster - one or more closely coupled nodes, managed as a single entity
12. MSCS Namespace: Cluster View
[Diagram: the cluster name contains two node names and four virtual server names.]
13. MSCS Namespace: Outside World View
[Diagram: the cluster exposes Node 1 and Node 2 plus three virtual servers hosting Internet Information Server; SQL Server, MTS, and Falcon; and Microsoft Exchange.]

Name               IP address   Network name
Cluster            1.1.1.1      WHECCLUS
Node 1             1.1.1.2      WHECNode1
Node 2             1.1.1.3      WHECNode2
Virtual server 1   1.1.1.4      WHEC-VS1
Virtual server 2   1.1.1.5      WHEC-VS2
Virtual server 3   1.1.1.6      WHEC-VS3
14. Windows NT Clusters: Target Applications
- Application and database servers
- E-mail, groupware, and productivity application servers
- Transaction processing servers
- Internet Web servers
- File and print servers
15. MSCS Design Philosophy
- Shared nothing
- Simplified hardware configuration
- Remoteable tools
- Windows NT manageability enhancements
- Never take a cluster down - "shell game" rolling upgrade
- Microsoft BackOffice product support
- Provide clustering solutions for all levels of customer requirements
- Eliminate cost and complexity barriers
16. MSCS Design Philosophy
- Availability is core for all releases
- Single-server image for administration and client interaction
- Failover provided for unmodified server applications and unmodified clients (cluster-aware server applications get richer features)
- Failover for file and print by default
- Scalability is the phase 2 focus
17. Non-Features Of MSCS
- Not lock-step fault tolerance
- Not able to move running applications
- MSCS restarts applications that are failed over to other cluster members
- Not able to recover shared state between client and server (i.e., file position)
- All client/server transactions should be atomic
- Standard client/server development rules still apply
- ACID always wins
18. Setting Up MSCS Applications
19. Attributes Of Cluster-Aware Applications
- A persistence model that supports orderly state transition
- Database example
- ACID transactions
- Database log recovery
- Client application support
- IP clients only
- How are retries supported?
- No name-service location dependencies
- A custom resource DLL is a good thing
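Because MSCS cannot recover in-flight client state, the client side does the rest: reissue an atomic request against the virtual server name until a surviving node answers. A hedged sketch of that retry discipline (the request function, its failure pattern, and the timing values are all hypothetical):

```python
import time

# Sketch of client-side retry against a virtual server. During failover the
# virtual IP disappears and reappears on the surviving node; the client's
# job is to retry an *atomic* request until the name answers again.

def call_with_retry(send_request, attempts=5, backoff=0.01):
    last_error = None
    for i in range(attempts):
        try:
            return send_request()           # must be atomic/idempotent
        except ConnectionError as e:
            last_error = e
            time.sleep(backoff * (2 ** i))  # back off while failover runs
    raise last_error

# Simulate a failover that drops the first two attempts:
state = {"calls": 0}
def flaky_request():
    state["calls"] += 1
    if state["calls"] <= 2:
        raise ConnectionError("virtual server failing over")
    return "committed"

print(call_with_retry(flaky_request))  # committed (on the third attempt)
```

The atomicity requirement matters: a retried request must be safe to replay, which is exactly why the deck insists that ACID always wins.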
20. MSCS Services For Application Support
- Name service mapper
- GetComputerName resolves to the virtual server name
- Registry replication
- A key and its underlying keys and values are replicated to the other node
- Atomic
- Logged to ensure partitions in time are handled
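Registry replication can be pictured as a small replicated store with a catch-up log: updates apply to both nodes while both are up, and are logged and replayed when a down partner rejoins. This is an illustrative sketch of the logged, atomic behavior described above, not the actual MSCS implementation:

```python
# Toy replicated registry: live replication when the partner is up,
# log-and-replay when it is down, so a partition in time loses nothing.

class ReplicatedRegistry:
    def __init__(self):
        self.local = {}
        self.partner = {}        # stand-in for the other node's registry
        self.partner_up = True
        self.log = []            # pending updates for a down partner

    def set(self, key, value):
        self.local[key] = value
        if self.partner_up:
            self.partner[key] = value
        else:
            self.log.append((key, value))

    def partner_rejoin(self):
        """Replay logged updates in order, then resume live replication."""
        for key, value in self.log:
            self.partner[key] = value
        self.log.clear()
        self.partner_up = True

reg = ReplicatedRegistry()
reg.set("SQL\\Checkpoint", 1)
reg.partner_up = False           # partner node goes down
reg.set("SQL\\Checkpoint", 2)    # logged, not replicated
reg.partner_rejoin()
print(reg.partner["SQL\\Checkpoint"])  # 2
```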
21. Application Deployment Planning
- System configuration is crucial
- Adequate hardware configuration
- You can't run Microsoft BackOffice on a 32-MB, 75-MHz Pentium
- Planning of preferred group owners
- A good understanding of single-server performance is critical
- See the Windows NT Resource Kit performance planning section
- Understand working set size
- What is acceptable performance to the business units?
22. Evolution Of Cluster-Aware Applications
- Active/passive - general out-of-the-box applications
- Active/active - applications that can run simultaneously on multiple nodes
- Highly scalable - extending active/active through I/O shipping, process groups, and other techniques
23. Application Evolution
[Table: on a two-node cluster, each application - Microsoft SQL Server, Microsoft Transaction Server (MTS), Internet Information Server (IIS), Microsoft Exchange Server - is active on a single node at a time.]
24. Evolution Of Cluster-Aware Applications
[Table: on a four-node cluster, Microsoft SQL Server, Microsoft Transaction Server (MTS), Internet Information Server (IIS), and Microsoft Exchange Server each run on all four nodes.]
25. Resources: What Are They?
- Resources are basic system components - physical disks, processes, databases, IP addresses, etc. - that provide a service to clients in a client/server environment
- They are online in only one place in the cluster at a time
- They can fail over from one system in the cluster to another system in the cluster
26. Resources
- MSCS includes resource DLL support for:
- Physical and logical disk
- IP address and network name
- Generic service or application
- File share
- Print queue
- Internet Information Server virtual roots
- Distributed Transaction Coordinator (DTC)
- Microsoft Message Queue (MSMQ)
- Supports resource dependencies
- Controlled via a well-defined interface
- A group offers a virtual server
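The well-defined interface is the Resource API: a resource DLL exposes entry points such as Online, Offline, LooksAlive (a cheap, frequent check), and IsAlive (a thorough, infrequent check). The sketch below mimics that contract in Python as a stand-in for the real C DLL interface; the file-share class and its behavior are illustrative only:

```python
# Python stand-in for the Resource API contract a resource DLL implements.
# The cluster service polls looks_alive often (cheap) and is_alive rarely
# (thorough), and calls online/offline to move the resource between nodes.

class Resource:
    def online(self): ...
    def offline(self): ...
    def looks_alive(self): ...   # cheap, frequent health check
    def is_alive(self): ...      # thorough, infrequent health check

class FileShareResource(Resource):
    def __init__(self, path):
        self.path = path
        self.state = "offline"

    def online(self):
        self.state = "online"    # a real DLL would create the share here

    def offline(self):
        self.state = "offline"   # a real DLL would remove the share here

    def looks_alive(self):
        return self.state == "online"

    def is_alive(self):
        # A real thorough check might actually open a file on the share.
        return self.state == "online"

share = FileShareResource(r"E:\payroll")
share.online()
print(share.looks_alive())  # True
```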
27. Cluster Service To Resource
[Diagram: the Windows NT cluster service initiates changes through a resource monitor and receives resource events back; the resource monitor drives the physical disk, IP address, generic application, and database resource DLLs, which in turn control the disk, the network, the application, and the database.]
28. Cluster Abstractions
- Resource - a program or device managed by a cluster, e.g., a file service, print service, or database server; can depend on other resources (startup ordering); can be online, offline, paused, or failed
- Resource Group - a collection of related resources; hosts resources; belongs to a cluster; the unit of co-location; involved in naming resources
- Cluster - a collection of nodes, resources, and groups; cooperation for authentication, administration, and naming
29. Resources
- Resources have...
- A type - what it does (file, DB, print, Web)
- An operational state (online/offline/failed)
- Current and possible nodes
- A containing resource group
- Dependencies on other resources
- Restart parameters (in case of resource failure)
30. Resource
- Fails over (moves) from one machine to another
- Logical disk
- IP address
- Server application
- Database
- May depend on another resource
- Well-defined properties controlling its behavior
31. Resource Dependencies
- A resource may depend on other resources
- A resource is brought online after any resources it depends on
- A resource is taken offline before any resources it depends on
- All dependent resources must fail over together
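These ordering rules amount to a topological sort of the dependency graph: bring a resource online only after everything it depends on, and take resources offline in the reverse order. A minimal sketch with made-up resource names (no cycle detection, for brevity):

```python
# Dependency-ordered startup: depth-first walk of the dependency graph.
# deps maps each resource to the list of resources it depends on.

def online_order(deps):
    order, done = [], set()
    def visit(r):
        if r in done:
            return
        for d in deps.get(r, []):
            visit(d)              # dependencies come online first
        done.add(r)
        order.append(r)
    for r in deps:
        visit(r)
    return order

deps = {
    "payroll_app": ["database", "ip_address"],
    "database": ["drive_e", "drive_f"],
    "drive_e": [], "drive_f": [], "ip_address": [],
}
print(online_order(deps))
# ['drive_e', 'drive_f', 'database', 'ip_address', 'payroll_app']
# Offline order is simply reversed(online_order(deps)).
```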
32. Dependency Example
[Diagram: a generic application resource depends on a database resource and an IP address resource; the database resource depends on the drive E and drive F resources.]
33. Group Example
[Diagram: the payroll group contains the generic application, database, drive E, drive F, and IP address resources, with the same dependencies as the previous slide.]
34. MSCS Architecture
[Diagram: Cluster Administrator and Cluster.exe drive the Cluster API DLL and its stub; behind the Cluster API sit the Global Update Manager, Log Manager, Database Manager, Membership Manager, Event Processor, Checkpoint Manager, Object Manager, Node Manager, Failover Manager, and Resource Manager; the Resource API connects to application, physical, and logical resource DLLs; a reliable cluster transport carries the heartbeat over the network.]
35. MSCS Architecture
- The cluster service comprises the following objects:
- Failover Manager (FM)
- Resource Manager (RM)
- Node Manager (NM)
- Membership Manager (MM)
- Event Processor (EP)
- Database Manager (DM)
- Object Manager (OM)
- Global Update Manager (GUM)
- Checkpoint Manager (CM)
- More about these in the next session
36. Setting Up An MSCS Cluster
37. MSCS Key Components
- Two servers
- Multiprocessor versus uniprocessor
- Heterogeneous servers
- Shared SCSI bus
- SCSI HBAs, SCSI RAID HBAs, hardware RAID boxes
- Interconnect
- Many types can be supported
- Remember: two NICs per node
- PCI for the cluster interconnect
- Complete MSCS HCL configuration
38. MSCS Setup
- Most common problems:
- Duplicate SCSI IDs on adapters
- Incorrect SCSI cabling
- SCSI card order on the PCI bus
- Configuration of SCSI firmware
- Let's walk through getting a cluster operational
39. Test Before You Build
- Bring each system up independently
- Network adapters
- Cluster interconnect
- Organization interconnect
- SCSI and disk function
- NTFS volume(s)
40. Top Ten Setup Concerns
- 10. SCSI is not well known. Please use the MSCS and IHV setup documentation, and consider the SCSI book referenced in this session
- 9. Build a support model that meets clustering requirements. For example, in clustering, components are paired exactly (e.g., SCSI BIOS revision levels); include this in your plans
- 8. Build extra time into your deployment planning to accommodate cluster setup, both for hardware and software. Hardware examples include SCSI setup; software issues include installation across cluster nodes
- 7. Know the certification process and its support implications
41. Top Ten Setup Concerns
- 6. Applications will become more cluster-aware through time, with better setup, diagnostics, and documentation. In the meantime, plan and test accordingly
- 5. Clustering will impact your server maintenance and upgrade methodologies. Plan accordingly
- 4. Use multiple network adapters and hubs to eliminate single points of failure (everywhere possible)
- 3. Today's clustering solutions are more complex to install and configure than single servers. Plan your deployments accordingly
- 2. Make sure that your cabinet solutions and peripherals both fit and function well. Consider the serviceability implications
- 1. Cabling is a nightmare. Color-coded, heavily documented, Y-cable-inclusive, maintenance-designed products are highly desirable
42. Cluster Management Tools
- Cluster Administrator
- Monitor and manage the cluster
- Cluster CLI/COM
- Command-line and COM interfaces
- Minor modifications to existing tools:
- Performance Monitor - adds the ability to watch the entire cluster
- Disk Administrator - adds understanding of shared disks
- Event logger - broadcasts events to all nodes
43. MSCS Reference Materials
- In Search of Clusters: The Coming Battle in Lowly Parallel Computing, Gregory F. Pfister, ISBN 0-13-437625-0
- The Book of SCSI, Peter M. Ridge, ISBN 1-886411-02-6
44. The Basics Of SCSI
- Why SCSI?
- Types of interfaces?
- Caching and performance
- RAID
- The future
45. Why SCSI?
- Faster than IDE - intelligent card/drive
- Uses less processor time
- Can transfer data at up to 100 MB/s
- More devices on a single chain - up to 15
- Wider variety of devices:
- DASD
- Scanners
- CD-ROM writers and optical drives
- Tape drives
46. Types Of Interfaces
- SCSI and SCSI-2: 50-pin, 8-bit, max transfer 10 MB/s (early versions 1.5 to 5 MB/s); internal transfer rate 4 to 8 MB/s
- Wide SCSI: 68-pin, 16-bit, max transfer 20 MB/s; internal transfer rate 7 to 15.5 MB/s
- Ultra SCSI: 50-pin, 8-bit, higher transfer rate, max transfer 20 MB/s; internal transfer rate 7 to 15.5 MB/s
- Ultra Wide SCSI: 68-pin, 16-bit, max transfer 40 MB/s; internal transfer rate 7 to 30 MB/s
47. Performance Factors
- Cache on the drive or controller
- Caching in the OS
- Different variables
- Seek time
- Transfer rates
48. Redundant Array Of Inexpensive Disks (RAID)
- Developed from a paper published in 1987 at the University of California, Berkeley
- The idea is to combine multiple inexpensive drives (eliminating the SLED - single large expensive drive)
- Provides redundancy by storing parity information
49. The Future For SCSI
- Faster interfaces - why?
- Fibre Channel
- Optical standard
- Proposed as part of SCSI-3 (not final)
- Up to 100 MB/s transfer
- Still using Ultra Wide SCSI inside enclosures
- Drives with optical interfaces not yet available in quantity; higher cost than SCSI
50. The Future Of SCSI
- Fibre Channel Arbitrated Loop
- Ring instead of bus architecture
- Can support up to 126 devices/hosts
- Hot-pluggable through the use of a port bypass circuit
- No disruption of the loop as devices are added/removed
- Generally implemented using a backplane design
51. HCL List For MSCS
- Servers on the normal Windows NT HCL
- Self-test of MP machines soon
- MSCS SCSI component HCL
- Tested by WHQL
- Must pass the Windows NT HCT as well
- MSCS interconnect HCL
- Tested by WHQL
- Not required to pass 100% of the HCT
- E.g., point-to-point adapters
52. MSCS System Certification Process
[Diagram: components come from the Windows NT 4.0 Server, SCSI, and Network HCLs; SCSI components pass onto the Windows NT 4.0 MSCS SCSI HCL; together these yield a complete MSCS configuration ready for self-test.]
53. Testing Phases
- Hardware compatibility (24 hours)
- SCSI and interconnect testing
- One-node testing (24 hours)
- Eight clients
- Two-node testing with failover (72 hours)
- Eight clients with asynchronous failovers
- Stress testing (24 hours)
- Dual-initiator I/O, split-brain problems
- Simultaneous reboots
54. Final MSCS HCL
- Only complete configurations are supported
- Self-test results sent to Microsoft
- Logs checked and configuration reviewed
- HCL updated on the Web and for the next major Windows NT release
- For more details, see the MSCS Certification document