Title: Exchange 2003 High Availability
1. Exchange 2003 High Availability: Site Redundancy
- Wil Westwick, Dedicated Supportability Services
- EMEA Exchange Center of Excellence (UK Competence Centre Lead)
- Microsoft Corporation
2. Welcome to this TechNet Event
- FREE bi-weekly technical newsletter
- FREE regular technical events hosted across the UK
- FREE weekly UK and US-led technical webcasts
- FREE comprehensive technical web site
- Monthly CD/DVD subscription with the latest technical tools and resources
- FREE quarterly technical magazine
We would like to bring your attention to the key elements of the TechNet programme, the central information and community resource for IT professionals in the UK.
To subscribe to the newsletter or just to find out more, please visit www.microsoft.com/uk/technet or speak to a Microsoft representative during the break.
3. Progression in Messaging
- History of Exchange (5.0, 5.5, 2000, 2003, E12)
- Mission Critical
- Email Evolution
- Cornerstone of many businesses and industries
- Primary professional communication mechanism
- Increased investment (?)
- Increased development
- (Third-party investments)
4. Provision of Service
- Greater demands placed upon the service
- High Availability
- Business Continuity
- High Availability
- A highly available system is usable when the customer needs it.
- Planned and unplanned downtime
- Business Continuity
- Providing continuous, uninterrupted provision of operations and services, but more importantly the ability to achieve BUSINESS RECOVERY!
- Microsoft's current and future investments
- Alignment to business Service Level Agreements and a measurement of availability (% of uptime)
- What are the "four nines" (99.99%) and how do I measure them?
- Microsoft Exchange 2003 and partner technologies help IT professionals provide solutions to these modern-day business requirements.
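The "four nines" question is simple arithmetic: an availability percentage fixes how much downtime per year the SLA tolerates. This Python sketch assumes a 365-day year and that every minute of outage, planned or unplanned, counts against the target. Many SLAs exclude agreed maintenance windows, so adjust the downtime you feed in to match your own SLA definition.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Maximum yearly downtime (minutes) permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def measured_availability_pct(downtime_minutes: float) -> float:
    """Availability (%) achieved over a year with the given total downtime."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes/year")
```

Four nines leaves roughly 52.6 minutes of downtime per year. A single 5-10 minute switch to a standby system therefore already consumes a noticeable share of that budget.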
5. Exchange 2003 High Availability
- Clustering solutions
- Shared Nothing
- Models (Active/Active, Active/Passive, multi-node, Majority Node Set (MNS))
- End-to-end service provision (application dependencies)
- ExRes.dll architectural changes (MSExchangeSA)
- Interoperability with the Storage Abstraction Layer (CLX, Geo-Span)
6. Exchange 2003 High Availability (cont.)
- Operational Excellence (ITIL, MOF)
- People, process, technology
- Non-Clustered Solutions
- Outlook 2003 Cached Mode
- Portable Databases (Replication Technology)
- Clone Technology
- Microsoft Windows Server 2003 Volume Shadow Copy Service (VSS) framework
7. Exchange 2003 Business Continuity
- Solutions
- Geographically Distributed Clusters
- Non-Geo-Clustered Site-Resilient Designs
- Design Scenario (A)
- Design Scenario (B)
- Design Scenario (C)
- Design Fundamental: Multi-Site Data Availability
  - Replication
  - Clones (VSS)
8. Multi-site Data Replication
- What is multi-site data replication?
- Replication Mechanisms
- Asynchronous Replication
- Data Loss
- Data Integrity
- Synchronous Replication
- Distance
- Handling of replication link failure
- Solutions
- Geographically Distributed Clusters
- Others (Standby Solutions)
- Exchange Data to Replicate
- .edb, .stm, .chk, .log (Mandatory)
- SMTP Queue Data and MTA Queue Data (Recommended)
- Tracking Logs (Optional)
- Best Practices for Configuring Replication Mechanisms
- Configure replication at the logical/mount point volume level.
- Create many replication points.
- Keep transaction logs on different logical volumes from the database files.
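The data categories and volume guidance above can be turned into a simple pre-deployment checklist. The following Python sketch is illustrative only. `PlannedFile` and `check_plan` are hypothetical names, and the plan is a hand-written description of files and volumes, not something read from a storage array. The sketch checks only two things from this slide: whether the mandatory file types are replicated, and whether logs are kept on separate volumes from the databases.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath

# File types the slide lists as mandatory to replicate.
MANDATORY_EXTENSIONS = {".edb", ".stm", ".chk", ".log"}
DATABASE_EXTENSIONS = {".edb", ".stm"}


@dataclass
class PlannedFile:
    """One file in a (hypothetical) replication plan."""
    path: str         # e.g. r"F:\SG1\priv1.edb"
    volume: str       # logical volume or mount point holding the file, e.g. "F:"
    replicated: bool  # whether that volume is covered by a replication point


def check_plan(files: list) -> list:
    """Return human-readable problems found in a replication plan."""
    problems = []
    present = set()
    db_volumes, log_volumes = set(), set()
    for f in files:
        ext = PureWindowsPath(f.path).suffix.lower()
        present.add(ext)
        if ext in MANDATORY_EXTENSIONS and not f.replicated:
            problems.append(f"mandatory file not replicated: {f.path}")
        if ext in DATABASE_EXTENSIONS:
            db_volumes.add(f.volume)
        elif ext == ".log":
            log_volumes.add(f.volume)
    # Every mandatory file type should appear somewhere in the plan.
    for ext in sorted(MANDATORY_EXTENSIONS - present):
        problems.append(f"no {ext} files in plan")
    # Best practice from the slide: logs on different volumes from the databases.
    for vol in sorted(db_volumes & log_volumes):
        problems.append(f"transaction logs share volume {vol} with database files")
    return problems


if __name__ == "__main__":
    plan = [
        PlannedFile(r"F:\SG1\priv1.edb", "F:", True),
        PlannedFile(r"F:\SG1\priv1.stm", "F:", True),
        PlannedFile(r"G:\SG1Logs\E00.log", "G:", True),
        PlannedFile(r"G:\SG1Logs\E00.chk", "G:", False),
    ]
    for problem in check_plan(plan):
        print(problem)
```

For the example plan, only the unreplicated checkpoint file is reported. The optional and recommended categories (tracking logs, SMTP and MTA queues) are left out to keep the sketch short.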
9. Multi-site Data Replication
- Exchange Product Group Support Policy
- In summary, Microsoft Exchange supports data that is replicated synchronously, whereas in an asynchronous replication environment the third-party vendors provide support for the replicated data.
- Short Common Questions
- 1. Does Microsoft discourage customers from deploying an Exchange asynchronous replication solution?
- Microsoft Exchange neither encourages nor discourages customers from deploying asynchronous data replication solutions.
- 2. What are the important tests that need to be covered before deployment?
- Testing should be done in each of these categories:
- Storage Reliability
- Performance
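The support policy above depends on how each replication mode behaves when the primary site is lost. The following toy Python model shows why asynchronous replication risks data loss: writes are acknowledged to Exchange before they reach the remote copy. `ReplicatedVolume` is a hypothetical class, not any vendor's API. The model deliberately preserves write order. A product that does not preserve ordering across the database and log volumes can also leave the remote copy inconsistent, which is the data-integrity concern listed earlier.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a production volume mirrored to a remote site."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = []          # writes acknowledged to Exchange
        self.remote = []           # writes present at the DR site
        self.in_flight = deque()   # acknowledged but not yet replicated (async only)

    def write(self, record: str) -> None:
        self.primary.append(record)
        if self.synchronous:
            # Acknowledged only once the remote copy also holds the write.
            self.remote.append(record)
        else:
            # Acknowledged immediately; shipped to the remote site later.
            self.in_flight.append(record)

    def replicate(self, n: int) -> None:
        """Ship up to n queued writes, oldest first (preserves write order)."""
        for _ in range(min(n, len(self.in_flight))):
            self.remote.append(self.in_flight.popleft())

    def lose_primary_site(self) -> list:
        """Simulate a site failure; return acknowledged writes the DR copy never got."""
        lost = list(self.in_flight)
        self.in_flight.clear()
        return lost


if __name__ == "__main__":
    for sync in (True, False):
        vol = ReplicatedVolume(synchronous=sync)
        for i in range(5):
            vol.write(f"txn-{i}")
        vol.replicate(3)
        mode = "synchronous" if sync else "asynchronous"
        print(mode, "lost:", vol.lose_primary_site())
```

In this run the synchronous volume loses nothing, while the asynchronous volume loses `txn-3` and `txn-4`. The synchronous guarantee is paid for with write latency, which grows with distance.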
10. Multi-site Data Replication
- Backup strategy: replication is not a backup solution.
- Disaster recovery plan: replicating data is only the first step in a disaster recovery plan. You need a disaster recovery plan that describes, step by step, how to bring the replicated data online within the time window defined by your SLA.
- What tools can I use for the testing?
- Jetstress, LoadSim
- Hot and Cold Data
- What is Microsoft's support for replicating cold data?
11. Exchange 2003 Business Continuity
- Geographically Distributed Clusters
- What is a true stretch cluster?
- Qualification
- Storage Abstraction Layer
- (CLX, Geo-Span)
- Connections
- Latency
- Multi-Node
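Distance matters for stretch clusters and synchronous replication because every write waits for the remote acknowledgement. The following back-of-the-envelope Python sketch gives a lower bound for the distances used in the design scenarios later in this deck. It assumes a fibre propagation delay of about 5 µs per km and ignores all equipment delays. The `round_trips` parameter is an assumption for protocols that need more than one round trip per write.

```python
# Assumption: light in optical fibre covers roughly 200,000 km/s,
# i.e. about 5 microseconds per kilometre one way.
FIBRE_DELAY_US_PER_KM = 5.0
KM_PER_MILE = 1.609344


def added_write_latency_ms(distance_km: float, round_trips: int = 1) -> float:
    """Propagation delay (ms) that synchronous replication adds to every write.

    Only the speed of light in fibre is modelled; switch, extender and array
    processing time come on top of this figure.
    """
    one_way_us = distance_km * FIBRE_DELAY_US_PER_KM
    return 2 * one_way_us * round_trips / 1000


if __name__ == "__main__":
    print(f"100 km: {added_write_latency_ms(100):.2f} ms per write")
    print(f"40 miles: {added_write_latency_ms(40 * KM_PER_MILE):.2f} ms per write")
```

100 km adds at least 1 ms to each replicated write, and 40 miles adds about 0.64 ms. Transaction-log writes are latency-sensitive, so measure real links under load (for example with Jetstress) rather than relying on this estimate.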
12. Exchange 2003 Business Continuity (cont.)
13. The Alternate Designs (Others)
- Design Scenario (A): Single-Leg DR Clusters (Dial-Tone)
- Environment
- 20 Exchange 2003 A/P clusters in the production environment
- 30K mailboxes (Outlook 97/98/2000/XP)
- Disaster recovery site located 100 km away from the production datacenter
- 100 Mbit link between the two sites
- Disaster recovery requirements for Exchange
- To provide email service continuity in case of a complete cluster failure OR in case of temporary unavailability of the production datacenter.
- Data recovery is not required; users can work with empty mailboxes (dial-tone).
- Geo-clustering cannot be considered because the storage replication infrastructure cannot be afforded.
- Solution (standby dial-tone clusters)
- The network is configured so that the VLANs are extended to the DR site, giving the same IP subnets on the DR site.
- DC/GC/DNS servers are installed on the DR site, are members of the same domain and site, and are online.
- Public Folder servers are installed in the same AG/RG, are replicating data, and are online.
- Bridgehead and connector servers are deployed on the DR site. Secondary connectors are
14. The Alternate Designs (Others)
- How the standby cluster is configured
- Installed on the same subnet as the production cluster.
- Distinct computer name and IP, and online.
- Distinct cluster name and cluster IP, and online.
- Physical disk configuration that corresponds to the same drive letters as the production cluster, but smaller, since we don't need space for data restore.
- Exchange 2003 binaries pre-installed and service packs applied.
- How we switch from production to DR
- Let's say EVS1 is running on CLUSTER1, which is composed of SRV1 and SRV2, located in the production datacenter.
- The entire EVS1 cluster goes down.
- The standby cluster for CLUSTER1 is CLUSTER11, composed of SRV11 only, which is online.
- On CLUSTER11, we create the Exchange IP Address and Exchange Network Name resources with the same values as the production cluster (same IP and same name, EVS1).
- Bring the resources online.
- Create the Exchange System Attendant resource. That will bring EVS1 back online on CLUSTER11.
- In Exchange System Manager, manually mount the mailbox stores, forcing the creation of empty databases.
15. The Alternate Designs (Others)
- How we switch back from DR to production
- Take all resources offline on CLUSTER11.
- Restore CLUSTER1 to its original state (whatever the cause of the failure was).
- Bring all the resources online on CLUSTER1.
- EVS1 will be back online on CLUSTER1.
- Users are back to the state they were in at the moment of the failure.
- Use ExMerge to export the data out of the standby clusters and import it into the production mailboxes.
- Solutions such as this are in place today, provide a 5-10 minute switch time from the production to the standby cluster, and meet customer requirements.
- For Exchange 2003 and Outlook 2003 in Cached Mode, we see the following behavior when switching back and forth between the production and standby clusters.
- OL2003 users are working in Cached Mode against EVS1.
- EVS1 goes down.
- OL2003 is now in a Disconnected state and the user continues to work normally offline.
- We switch EVS1 to the standby cluster and bring the empty databases online.
- OL2003 users see a popup saying that there has been a change and Outlook needs to be restarted.
- The user restarts OL2003 and sees a dialog saying that Exchange is currently running in recovery mode and that they can either Connect or Work Offline.
- If you choose to Work Offline, you will see your regular Cached Mode OST, with all your data, and work offline as usual.
- If you choose to Connect, you are going to see your empty mailbox and will begin to send and receive new mail on the new
16. The Alternate Designs (Others)
- Design Scenario (B): Single-Leg DR Clusters (Data Available)
- Disaster recovery requirements for Exchange
- To provide email service continuity in case of a complete cluster failure OR in case of temporary unavailability of the production datacenter.
- Data recovery IS required.
- Geo-clustering cannot be considered because the storage replication infrastructure cannot be afforded/qualified.
- Solution
- Introduction of synchronous replication and clone-based copies (VSS).
- Async/sync replication of transaction logs to the DR site.
- Clone presentation to the DR site.
- Log shipping. Is this log shipping?
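Scenario (B) raises the question of whether shipping transaction logs to a clone at the DR site amounts to log shipping. Whatever the label, the mechanics are the same: the clone can only be rolled forward by replaying closed logs in unbroken generation order. The following Python sketch checks which shipped logs are usable. The file-naming pattern and the `first_needed` starting generation are assumptions for illustration. The actual replay would be done by Exchange's own recovery tooling (for example `eseutil /r`), not by this code.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumed Exchange 2003 log naming: storage-group prefix (E00-E03) followed by a
# hexadecimal generation number, e.g. E0000001.log. The current, still-open
# log (E00.log) has no generation number and is never shipped.
LOG_NAME = re.compile(r"^E\d{2}([0-9A-F]+)\.log$", re.IGNORECASE)


def shipped_generations(filenames: Iterable[str]) -> Set[int]:
    """Generation numbers of closed log files that reached the DR site."""
    gens = set()
    for name in filenames:
        match = LOG_NAME.match(name)
        if match:
            gens.add(int(match.group(1), 16))
    return gens


def replay_plan(first_needed: int, filenames: Iterable[str]) -> Tuple[List[int], List[int]]:
    """Split shipped logs into those that can be replayed and those stranded by a gap.

    Logs must be replayed in strict generation order starting at first_needed
    (the oldest generation the clone requires); replay stops at the first gap.
    """
    gens = shipped_generations(filenames)
    replayable = []
    gen = first_needed
    while gen in gens:
        replayable.append(gen)
        gen += 1
    stranded = sorted(g for g in gens if g > gen)
    return replayable, stranded


if __name__ == "__main__":
    files = ["E0000001.log", "E0000002.log", "E0000004.log", "E00.log"]
    print(replay_plan(1, files))  # ([1, 2], [4]): generation 3 is missing
```

A missing generation strands every later log, so a log-shipping design needs monitoring that detects gaps as soon as they appear, not at failover time.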
17. The Alternate Designs (Others)
- Design Scenario (C): Non-Clustered (Dial-Tone or Data Available)
- Current Environment
- 3 Exchange 2003 servers in the production environment (Site A)
- 3 Exchange 2003 servers in the DR environment (Site B)
- 15K mailboxes (Outlook 97/98/2000/XP)
- Site (A) located 50M away from Site (B)
- Dark-fiber link between the two sites
- Disaster recovery requirements for Exchange
- To provide email service continuity in case of a complete server failure OR in case of temporary unavailability of one of the datacenters.
- Data recovery IS required (but dial-tone is also possible).
- Geo-clustering cannot be considered because of internal political issues and qualification difficulties.
18. The Alternate Designs (Others)
- Solution
- 3 Exchange servers, all located in Site A.
- Synchronous replication will replicate Exchange I/O to the remote datacenter (Site B), 40 miles apart.
- Site B will provide business continuity in the event of a primary site failure by offering three additional Exchange 2003 servers.
- DB and log file paths
- Org and Admin Group membership
- Upon failure of Site A, the replicated database and log volumes of each of the three production Exchange servers will be presented to their corresponding standby server in Site B. For example:
- EXC1 -> EXC04, EXC2 -> EXC05, EXC3 -> EXC06
- The Exchange servers in Site B will take ownership of, and responsibility for, serving all corporate messaging requirements and will provide users with access to all mailbox data with no data loss.
- AD attributes such as homeMDB and homeMTA will become incorrect. The following process details the steps required.
- SCENARIO (1.0)
- 1. Site A goes down.
- 2. Open ADUC and use the multiple-select option to select each of the mail-enabled user objects that were homed on the failed Site A Exchange servers.
- 3. Right-click the combined selection and choose Exchange Tasks. The Exchange Task Wizard will launch. Follow the wizard through to delete the mailboxes of each of the selected user objects.
- NOTE: Pre-defined LDAP queries (querying AD for homeMDB and homeMTA) can be created and saved in the ADUC MMC. This will
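Step 2 depends on finding every user whose homeMDB points at one of the failed Site A servers. The following Python sketch performs that selection over exported (user, homeMDB) pairs and pairs each user with the standby server from the production-to-standby example above. The DN layout and example values are assumptions for illustration, and `server_from_home_mdb` and `users_to_rehome` are hypothetical helpers. The code neither reads nor modifies Active Directory, and the supported procedure remains the ADUC and Exchange Task Wizard steps listed above.

```python
from typing import Dict, Optional, Tuple

# Production -> standby pairing taken from the example on this slide.
STANDBY_FOR = {"EXC1": "EXC04", "EXC2": "EXC05", "EXC3": "EXC06"}


def server_from_home_mdb(home_mdb_dn: str) -> Optional[str]:
    """Return the server name from a homeMDB DN, or None if it cannot be found.

    Assumes the usual layout '...,CN=<server>,CN=Servers,...'. The naive
    comma split ignores escaped commas, which is acceptable for a sketch only.
    """
    parts = [p.strip() for p in home_mdb_dn.split(",")]
    for i, part in enumerate(parts):
        if part.lower() == "cn=servers" and i > 0 and parts[i - 1].lower().startswith("cn="):
            return parts[i - 1][3:]
    return None


def users_to_rehome(home_mdbs: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Map each affected user to (failed production server, standby server)."""
    affected = {}
    for user, dn in home_mdbs.items():
        server = server_from_home_mdb(dn)
        if server is not None and server.upper() in STANDBY_FOR:
            affected[user] = (server, STANDBY_FOR[server.upper()])
    return affected


if __name__ == "__main__":
    # Hypothetical homeMDB value for illustration.
    dn = ("CN=Mailbox Store (EXC1),CN=First Storage Group,CN=InformationStore,"
          "CN=EXC1,CN=Servers,CN=First Administrative Group,CN=Administrative Groups,"
          "CN=Contoso,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=contoso,DC=com")
    print(users_to_rehome({"alice": dn, "bob": dn.replace("EXC1", "EXC9")}))
```

Only `alice` is reported, because `bob` is homed on a server outside the failed set. For fail-back, the same mapping is used in reverse.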
19. The Alternate Designs (Others)
- SCENARIO (2.0): Fail Back
- Fail back (return messaging service and data to the primary site, Site A).
- The procedure is identical to fail-over, but in reverse.
- Ensure each of the standby servers is shut down before beginning the fail-back procedure.
- The process should occur during managed/planned downtime.
20. Product Roadmap: Futures
- E12
- Improve cluster failover operation
- Log Shipping Support
- Out of the Box Local Replication
- I/O Operations Management
21. Questions from the Audience
Recommended Links
Multi-site data replication support for Exchange 2003 and Exchange 2000
(http://support.microsoft.com/default.aspx?scid=kb;en-us;895847)
Deployment Guidelines for Exchange Server Multi-Site Data Replication
(http://www.microsoft.com/technet/prodtechnol/exchange/guides/E2k3DataRepl/bedf62a9-dff7-49a8-bd27-b2f1c46d5651.mspx)
Jetstress Tool
(http://go.microsoft.com/fwlink/?LinkId=27883)
Achieving High Availability with Exchange Server at Microsoft
(http://www.microsoft.com/technet/itsolutions/msit/operations/exchhighavailTSB.mspx)
Windows Server Catalogue: Geographically Dispersed Cluster Solutions
(http://www.microsoft.com/windows/catalog/server/default.aspx?subID=22&xslt=categoryProduct&pgn=b55095f4-71f3-4b26-98b1-05f3a9506d0d)