Title: Advanced Management Technologies for Exchange 5.5
1Advanced Management Technologies For Exchange 5.5
Greg Todd
Program Manager
NT Solutions Group
BMC Software, Inc.
3Agenda
- Current issues with problem diagnosis
- Application availability timeline
- Theory of root cause analysis (RCA)
- Primer on RCA
- How RCA can help you today
- Demos of RCA on Exchange 5.5
- Systems management vision
- Management maturity curve
- The future of Exchange management
4The Business Problem
- Event automation is the #1 priority of IT executives
- Problem diagnosis is a critical aspect that requires attention
- Wasted Time: 80% of downtime spent diagnosing, 20% of time spent fixing
- Wasted Resources: Diagnosis often a finger-pointing exercise
- Frustrated Users: Users have no idea what to expect
Gartner, 1998
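A quick worked example of what the 80/20 split implies, as a minimal sketch. The 100-minute outage and the 4x diagnosis speed-up are illustrative assumptions, not figures from the Gartner data; the function name is hypothetical.

```python
def downtime_after_faster_diagnosis(total_minutes, diagnosis_share=0.80, speedup=4.0):
    """Return total downtime when only the diagnosis portion gets faster."""
    diagnosing = total_minutes * diagnosis_share
    fixing = total_minutes - diagnosing
    # The fix itself takes as long as before; only diagnosis shrinks.
    return diagnosing / speedup + fixing

# Hypothetical outage: 100 minutes, of which 80 go to diagnosis and 20 to the fix.
before = 100.0
after = downtime_after_faster_diagnosis(before)
print(f"before: {before:.0f} min, after: {after:.0f} min")
```

Because diagnosis dominates the outage, speeding it up shortens the outage far more than speeding up the fix by the same factor would.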
5Application Availability Timeline
6Application Availability Timeline
[Figure: timeline along a time axis marked PoF, PoN, PoD, PoR, PoP, with phases labeled Root Cause Analysis, Monitoring, Recovery, Evolution]
7Application Availability Timeline
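[Figure: timeline along a time axis marked PoF, PoN, PoD, PoR, PoP, with phases labeled Root Cause Analysis, Monitoring, Recovery, Evolution. Annotations: "Application Violating Service Level: Significant Decrease", "Faster Service Restoration", "Root Cause Analysis: Diagnosis Time Reduced"]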
8Benefits Of RCA
- Based on well-established theories
- Quicker problem resolution
- Problem isolation saves resources to address the real problem
- Symptom filtering allows administrator to ignore sympathetic events
- Performs tests to find the root cause
- Far superior to rules-based approach
- Key enabler to make systems self-sufficient
- Provides impact analysis capability
9RCA: Key concepts
- Symptoms are problems to be investigated
- Faults are the root causes of these symptoms
- Tests are active tasks which gather information
RCA is a problem analysis methodology geared
towards finding the real cause of a problem and
preventing it from happening again.
10Rules-Based Approach Vs. RCA
- Rules-Based
- Symptom received
- Possible causes looked up in a fixed table of rules
- Set of possible causes presented to user
- Only suggested actions can be provided to user
- Root Cause Analysis
- Symptom received
- Possible causes determined from a generic fault model
- Each cause is tested against suspects
- Actual root cause is presented to user after suspects are eliminated
- Specific actions can be provided to user
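The contrast between the two approaches can be sketched in a few lines. This is a toy illustration only, not how PATROL implements either one: the symptom and fault names, the thresholds, and the `snapshot` dictionary standing in for live tests are all assumptions made for the example.

```python
# Rules-based: a fixed table maps a symptom to every cause it might have.
RULES = {
    "queue_growth": ["link_down", "disk_full", "cpu_overload"],
}

def rules_based(symptom):
    """Return all possible causes for a symptom; nothing is verified."""
    return RULES.get(symptom, [])

# RCA: candidates come from a fault model, and each one carries a test.
# The tests read a hypothetical snapshot dict; real tests would probe the system.
FAULT_MODEL = {
    "queue_growth": {
        "link_down": lambda s: not s["link_up"],
        "disk_full": lambda s: s["disk_free_pct"] < 5,
        "cpu_overload": lambda s: s["cpu_pct"] > 90,
    },
}

def root_cause_analysis(symptom, snapshot):
    """Test every suspect and keep only those whose test confirms the fault."""
    suspects = FAULT_MODEL.get(symptom, {})
    return [fault for fault, test in suspects.items() if test(snapshot)]

snapshot = {"link_up": True, "disk_free_pct": 40, "cpu_pct": 97}
print(rules_based("queue_growth"))
print(root_cause_analysis("queue_growth", snapshot))
```

The rules-based call hands back the whole candidate list, while the RCA call eliminates the suspects whose tests fail and leaves only the confirmed fault.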
11Root Cause Analysis For Exchange Server
Three components that work synergistically
Exchange Server
Windows NT
IP Network
12High Level RCA Architecture
[Figure: Enterprise Console, Mid-Level Manager, and three Managed Nodes]
13RCA Architecture BMC PATROL
Exchange Server and OS KMs
14Root Cause Analysis: Sample problem
[Figure: mail-flow diagram. Labels: Remote Office Exchange Server, Inbound Server, T1 Link to Remote Office, Exchange Server D, Inbound Messages, Bridgehead Server, To Internet, Bridgehead Server, Firewall, Exchange Server A, Outbound Messages, Exchange Server B, Outbound Server, Exchange Server C. Legend: Internal Mail, Internal Internet Mail, Internet Mail]
15PATROL RCA: Sample problem
- Symptom received by model
- Queue Growth Alarms from multiple Exchange Servers
- Suspected root causes found in model
16PATROL RCA: Sample problem
- Suspected root causes tested
- Root cause isolated
- CPU usage high on bridgehead
17Demo
18Sample Generic Fault Model
19Sample Specific Fault Model
20Sample Specific Fault Model: Close-up
21Demo
- RCA Engine
- Causal Directed Graphs
22Demo
- Root Cause Analysis
- Exchange, NT, IP Network
23Demo
- Impact Analysis
- Exchange, NT, IP Network
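One way to picture a causal directed graph serving both root cause analysis and impact analysis is the sketch below. The graph is a hypothetical fragment loosely modeled on the sample problem (queue growth on several servers traced to a busy bridgehead); the node names and helper functions are assumptions for illustration, not the RCA engine's actual model.

```python
from collections import deque

# Hypothetical causal directed graph: an edge A -> B means "a fault at A can cause B".
CAUSES = {
    "bridgehead_cpu_high": ["bridgehead_queue_growth"],
    "bridgehead_queue_growth": ["server_a_queue_growth", "server_b_queue_growth"],
    "t1_link_down": ["remote_office_queue_growth"],
}

def impact_of(fault, graph=CAUSES):
    """Impact analysis: every effect reachable downstream of a fault (breadth-first)."""
    seen, queue = [], deque([fault])
    while queue:
        node = queue.popleft()
        for effect in graph.get(node, []):
            if effect not in seen:
                seen.append(effect)
                queue.append(effect)
    return seen

def suspects_for(symptom, graph=CAUSES):
    """Root cause suspects: every node from which the symptom is reachable."""
    return [node for node in graph if symptom in impact_of(node, graph)]

print(impact_of("bridgehead_cpu_high"))
print(suspects_for("server_a_queue_growth"))
```

Walking the edges forward answers "what breaks if this fails?", and walking them backward yields the suspects that tests then confirm or eliminate.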
24Benefits Of RCA
- Based on well-researched theories
- Quicker problem resolution
- Problem isolation saves resources to address the real problem
- Symptom filtering allows administrator to ignore sympathetic events
- Performs tests to find the root cause
- Far superior to rules-based approach
- Key enabler to make systems self-sufficient
- Provides impact analysis capability
25Systems Management Vision
- Where's all this stuff going?
26Phases Of Management Maturity
Based on commonly known process control theory
Applies directly to management of complex
software systems
27Maturity Phases
MONITOR
- Monitoring is plumbing
- Included with Windows 2000 and Exchange 2000
- Server-centric data and event collection
- Monitors component and system data
- No awareness of other systems or apps
- Basic alerting, scripting, and actions
- WMI, PerfMon, HealthMon, Exchange 2000 monitoring
28Maturity Phases
MANAGE
- Application-specific and server-centric
- View and take action on components
- Availability and performance monitoring
- Rich reporting
- Application SLA definition
- ASAP resolution when out of compliance
- Most correlation done in your head
- Some tools have reached this level
- Key enabler to Control phase
29Maturity Phases
CONTROL
- Places system automation in control
- Provides holistic view of systems
- Enables high level of SLA compliance
- Quick problem diagnosis
- Action Reaction
- Proactive correction before users feel impact
- Management automation maturing
30Maturity Phases
STABILIZE
- Provides utility-level service
- Reliable as electric, telephone, water
- Assures continuous application service
- Clusters
- Built-in fault tolerance, re-routing, workload management
- Failure does not impact service
- Prediction / impact analysis
- Awareness of impact on SLAs caused by planned changes
31Maturity Phases
VIRTUALIZE
- The system learns how to intelligently deal with various issues
- Automatic everything
- Actions and responses for the IT group
- Alerts and communications
- Acquires and stores knowledge for future reference
- Uses policy engines to control actions
- Systems become truly self-sufficient
- User becomes self-serviced
32Virtualization Example: Problem Research Assistant
- Correlates problem root cause diagnoses with
- Previous resolutions - presents the user with previous remedies based on exact matches or best guess
- On-line technical documentation - integrates with vendor-supplied support documentation (e.g. Microsoft Knowledge Base articles)
- Technical Support Request Generator - formats required user information and diagnosed fault into a support request, according to vendor-specific templates
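The "exact matches or best guess" lookup of previous resolutions might look something like the sketch below. The history entries, remedies, and function name are invented for illustration, and fuzzy string matching via Python's standard `difflib` is a stand-in technique; the slides do not say how the assistant actually correlates diagnoses.

```python
import difflib

# Hypothetical history: diagnosed fault -> remedy that resolved it last time.
HISTORY = {
    "bridgehead cpu usage high": "Move connector load to a second bridgehead",
    "t1 link to remote office down": "Fail over to the backup link",
}

def previous_remedy(diagnosed_fault, history=HISTORY, cutoff=0.6):
    """Return (match_kind, remedy), trying an exact match before a best guess."""
    key = diagnosed_fault.lower().strip()
    if key in history:
        return "exact", history[key]
    # Best guess: closest stored fault description above the similarity cutoff.
    close = difflib.get_close_matches(key, list(history), n=1, cutoff=cutoff)
    if close:
        return "best guess", history[close[0]]
    return "none", None

print(previous_remedy("Bridgehead CPU usage high"))
print(previous_remedy("bridgehead cpu usage is high"))
```

Labeling the match kind lets the user see whether a remedy comes from an identical past diagnosis or only a similar one.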
33Virtualization Example: Problem Research Assistant
[Figure: architecture diagram. Labels: Support Requests, Diagnosed Faults, Problem Research Assistant, Correlation Backend, Bridge, Previous Resolutions, Help, RCA Server, Domain Model (three instances), Online Technical Articles, Problem Response History Repository, IP Reachability Analyzer]
34RCA Takes Management To The Next Level
VIRTUALIZE
STABILIZE
Many Players Many Choices
CONTROL
MANAGE
Root Cause Analysis
MONITOR
35Summary
- GOAL: No interruptions in service
- RCA is key to Exchange availability
- Accelerates the diagnosis process
- Can assess impact of failures beforehand
- Not unreasonable to achieve five 9s
- RCA paves the way to virtualization
- Managed systems that learn and adapt
- You never have to intervene
- Free to invest more time in pro-activity
- RCA is in beta now!!
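For a sense of scale on the "five 9s" goal above, a short calculation of the yearly downtime budget at each availability level. The helper name is hypothetical and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def allowed_downtime_minutes(nines):
    """Yearly downtime budget for an availability of N nines (5 -> 99.999%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability

for n in (3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes per year")
```

Five nines leaves a little over five minutes of downtime per year, which is why time spent on diagnosis matters so much.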
36Call To Action
- Demand sophistication and simplicity in Exchange management solutions
- Solutions that learn
- Solutions that are easy to use
- Start thinking of Exchange availability in terms of utility-level service
- Consider where to implement RCA in your current environment
- Bring along those whom you service
- Take care of your users
- Communicate with them as you progress