Title: Performance Management (Best Practices)
1 Performance Management (Best Practices)
- REF: www.cisco.com
- Document ID: 15115
2 Introduction
- Performance management involves optimizing network response time and managing the consistency and quality of individual and overall network services
- Need to measure the user/application response time
3 Performance management issues
- User performance
- Application performance
- Capacity planning
- Proactive fault management
- With newer applications such as voice and video, performance management is key to success
4 Indicators for performance management (1/3)
- Document the network management business objectives
- Create detailed and measurable service level objectives
- Document the service level agreements (SLAs), with charts or graphs that show how well these agreements are met over time
5 Indicators for performance management (2/3)
- Collect a list of the variables for the baseline, such as polling interval, network management overhead incurred, and possible trigger thresholds
- Hold periodic meetings to review the analysis of the baseline and trends
6 Indicators for performance management (3/3)
- Have a what-if analysis methodology documented
- When thresholds are exceeded, document the methodology used to increase network resources
7 Performance management process flow (1/3)
8 Performance management process flow (1/3)
- 1. Develop a network management concept of operation
- Define the required features: services, scalability and availability objectives
- Define availability and network management objectives
- Define performance SLAs and metrics
- Define SLAs
9 Performance management process flow (2/3)
- 2. Measure performance
- Gather network baseline data
- Measure availability
- Measure response time
- Measure accuracy
- Measure utilization
- Capacity planning
10 Performance management process flow (3/3)
- 3. Perform a proactive fault analysis
- Use thresholds for proactive fault management
- Network management implementation
- Network operation metrics
11 Performance management process flow
12 Develop a network management concept of operation (1/3)
- The purpose is to describe the overall desired system characteristics from an operational standpoint
- This document is used to coordinate the overall business goals of network operations, engineering, design, other business units and the end users
13 Define the required features: services, scalability objectives (1/2)
- Define service objectives: what services the network provides
- To understand applications, basic traffic flows, and user and site counts
- Define scalability objectives: how many users will use the network, and the capacity they consume on it
- Media capacity, number of routes and number of users
14 Define the required features: services, scalability objectives (2/2)
- These are the standard performance goals
- Response time
- Utilization
- Throughput
- Capacity (maximum throughput rate)
15 Define availability and network management objectives (1/2)
- Define availability objectives
- Define the level of service (service level requirements)
- Define different classes of service for a particular organization
- Higher availability objectives might necessitate increased redundancy and support procedures
16 Define availability and network management objectives (2/2)
- Define manageability objectives
- To ensure that overall network management does not lack management functionality
- Must understand the processes and tools used by the organization
- Uncover all important MIB or network tool information
17 Define performance SLAs and metrics
- Performance SLA metrics include:
- Average expected volume of traffic
- Peak volume of traffic
- Average response time and maximum response time allowed
- Availability
- Downtime
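As a rough illustration of how an availability target and downtime relate, the sketch below converts a target percentage into a downtime budget. The function name and the 30-day default period are assumptions made for this example, not values from the source.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Return the downtime budget, in minutes, implied by an availability target.

    availability_pct: target availability as a percentage (e.g. 99.9).
    period_days: length of the SLA reporting period (30 days is an assumption).
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return (100.0 - availability_pct) / 100.0 * period_minutes


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days allows {minutes:.1f} minutes of downtime")
```

A tighter target shrinks the budget quickly, which is one reason higher availability objectives tend to require more redundancy.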
18 Define SLAs
- SLA (Service Level Agreement): enterprise
- SLM (Service Level Management): service provider
- SLM includes definitions for problem types and severity, and help desk responsibilities
- Escalation path, and time before escalation at each tier of support
- Time to start work on the problem
- Time-to-close target based on priority
- Services to provide in the areas of capacity planning and hardware replacement
19 Performance management process flow
20 Measure performance
- Gather network baseline data
- Perform a baseline of the network before and after a new solution deployment
- A typical router/switch baseline report includes capacity issues related to CPU, memory, buffers, link/media utilization and throughput
- Application baseline: bandwidth used by each application per time period
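One possible way to build the application baseline is to total traffic per application for each time period. The sketch below assumes traffic records are already available as (timestamp, application, bytes) tuples; that record layout, the function name and the one-hour default period are illustrative assumptions, not part of the source.

```python
from collections import defaultdict


def application_baseline(records, period_seconds=3600):
    """Sum bytes per (period start, application).

    records: iterable of (timestamp_seconds, app_name, byte_count) tuples
             (a hypothetical record layout chosen for this example).
    Returns a dict mapping (period_start_seconds, app_name) to total bytes.
    """
    totals = defaultdict(int)
    for timestamp, app, byte_count in records:
        # Align each record to the start of the period it falls in.
        period_start = int(timestamp // period_seconds) * period_seconds
        totals[(period_start, app)] += byte_count
    return dict(totals)


if __name__ == "__main__":
    sample = [(0, "http", 100), (10, "http", 50), (3600, "http", 7), (20, "voice", 5)]
    for key, total in sorted(application_baseline(sample).items()):
        print(key, total)
```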
21 Measure availability
- Availability is the measure of time for which a network system or application is available to a user
- Correlate help desk phone calls with the statistics collected from managed devices
- Check scheduled outages
- Etc.
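A minimal sketch of an availability calculation follows. It assumes outages are recorded as (duration, is_scheduled) pairs and that scheduled outages are excluded from both the downtime and the measurement period. That exclusion is a policy choice made for this example, not something the source prescribes.

```python
def availability_percent(period_seconds, outages):
    """Return availability as a percentage of the measured period.

    outages: iterable of (duration_seconds, is_scheduled) pairs
             (a hypothetical record format for this example).
    Scheduled outages are removed from the period before computing the ratio.
    """
    outages = list(outages)  # allow generators; the list is scanned twice
    scheduled = sum(duration for duration, is_scheduled in outages if is_scheduled)
    unscheduled = sum(duration for duration, is_scheduled in outages if not is_scheduled)
    measured = period_seconds - scheduled
    if measured <= 0:
        raise ValueError("scheduled outages cover the whole period")
    return 100.0 * (measured - unscheduled) / measured


if __name__ == "__main__":
    # One 100 s maintenance window and one 90 s failure in a 1000 s period.
    print(availability_percent(1000, [(100, True), (90, False)]))
```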
22 Measure response time
- Network response time is the time traffic requires to travel between two points
- Simple level: pings from the network management station to key points in the network (not accurate)
- Server-centric polling: SAA (Service Assurance Agent) on a Cisco router measures response time to a destination device
- Generate traffic that resembles the particular application or technology of interest
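The idea of timing a probe that resembles the traffic of interest can be sketched as below. This is not ICMP ping and not SAA. It swaps in a plain TCP connection setup as the probe, and the helper names are assumptions made for this example.

```python
import socket
import time


def tcp_connect_probe(host, port, timeout=2.0):
    """Return a zero-argument probe that opens and closes one TCP connection."""
    def probe():
        with socket.create_connection((host, port), timeout=timeout):
            pass
    return probe


def measure_response_time(probe, samples=5):
    """Call `probe` several times and return (average_ms, maximum_ms)."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    timings_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        probe()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return sum(timings_ms) / len(timings_ms), max(timings_ms)
```

A call such as `measure_response_time(tcp_connect_probe("server.example", 80))` would time connection setup to a placeholder host. Any callable that exercises the application of interest can be passed instead.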
23 Measure accuracy
- Accuracy is the measure of interface traffic that does not result in errors, and can be expressed as a percentage
- Accuracy = 100 - error rate
- Error rate = ifInErrors * 100 / (ifInUcastPkts + ifInNUcastPkts)
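The accuracy formula above can be sketched in a few lines. The inputs are assumed to be counter differences between two polls rather than raw counter values, and the function names are illustrative.

```python
def error_rate_percent(in_errors, in_ucast_pkts, in_nucast_pkts):
    """Error rate = ifInErrors * 100 / (ifInUcastPkts + ifInNUcastPkts).

    All arguments are assumed to be deltas over the same polling interval.
    """
    total_pkts = in_ucast_pkts + in_nucast_pkts
    if total_pkts == 0:
        return 0.0  # no traffic in the interval, so no measurable errors
    return in_errors * 100.0 / total_pkts


def accuracy_percent(in_errors, in_ucast_pkts, in_nucast_pkts):
    """Accuracy = 100 - error rate."""
    return 100.0 - error_rate_percent(in_errors, in_ucast_pkts, in_nucast_pkts)


if __name__ == "__main__":
    # 5 errored packets out of 1000 received in the interval.
    print(accuracy_percent(5, 900, 100))
```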
24 Measure utilization (1)
- Utilization measures the use of a particular resource over time
- Expressed as a percentage that compares the usage of a resource with its maximum operational capacity
- High utilization is not necessarily bad
- A sudden jump in utilization can indicate an abnormal condition
25 Measure utilization (2)
- Input utilization
- ifInOctets * 8 * 100 / ((time in seconds) * ifSpeed)
- Output utilization
- ifOutOctets * 8 * 100 / ((time in seconds) * ifSpeed)
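A small sketch of the utilization formulas follows. It assumes the octet values are differences between two polls of a 32-bit counter, so a helper handles a single counter wrap. The helper and function names are assumptions made for this example.

```python
COUNTER32_MODULUS = 2 ** 32  # assumes 32-bit SNMP counters


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter readings, allowing for one wrap."""
    return (current - previous) % modulus


def utilization_percent(octets_delta, interval_seconds, if_speed_bps):
    """Utilization = octets * 8 * 100 / (seconds * ifSpeed).

    octets_delta: change in ifInOctets or ifOutOctets over the interval.
    if_speed_bps: interface speed in bits per second (ifSpeed).
    """
    if interval_seconds <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    return octets_delta * 8 * 100.0 / (interval_seconds * if_speed_bps)


if __name__ == "__main__":
    # 3,750,000 octets in 300 s on a 1 Mb/s link.
    print(utilization_percent(3_750_000, 300, 1_000_000))
```

The same function serves input and output utilization. Only the counter that is polled changes.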
26 Capacity planning
- The following are potential areas for concern:
- CPU
- Backplane or I/O
- Memory
- Interface and pipe sizes
- Queuing, latency and jitter
- Speed and distance
- Application characteristics
27 Performance management process flow
28 Perform a proactive fault analysis
- One method to perform fault management is through the use of the RMON alarm and event groups
- A distributed management system enables polling at a local level, with aggregation of data at a manager of managers
29 Use thresholds for proactive fault management (1/2)
- A threshold is a point of interest in a specific data stream; an event is generated when the threshold is triggered
- 2 classes of threshold for numeric data
- Continuous thresholds apply to continuous or time series data, such as data stored in SNMP counters or gauges
- Discrete thresholds apply to enumerated objects or discrete numeric data, such as Boolean objects
30 Use thresholds for proactive fault management (2/2)
- 2 different forms of continuous threshold
- Absolute: use with gauges
- Relative (delta): use with counters
- Steps to determine thresholds
- 1. Select the objects
- 2. Select the devices and interfaces
- 3. Determine the threshold values for each object or interface
- 4. Determine the severity for the event generated by each threshold
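The two forms of continuous threshold and the four steps above can be sketched as a small data structure plus a check. This is an illustrative model, not the RMON alarm group itself. The class, field names and the `cpuUtil` object name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    object_name: str        # step 1: the object to watch
    target: str             # step 2: the device or interface
    limit: float            # step 3: the threshold value
    severity: str           # step 4: severity of the generated event
    relative: bool = False  # False: absolute (gauges); True: delta (counters)


def check(threshold: Threshold, current: float,
          previous: Optional[float] = None) -> Optional[str]:
    """Return an event string if the threshold is exceeded, else None."""
    if threshold.relative:
        if previous is None:
            return None  # a delta needs two samples
        value = current - previous
    else:
        value = current
    if value > threshold.limit:
        return (f"{threshold.severity}: {threshold.object_name} on "
                f"{threshold.target} = {value} exceeds {threshold.limit}")
    return None


if __name__ == "__main__":
    cpu = Threshold("cpuUtil", "router1", 80, "major")
    errors = Threshold("ifInErrors", "router1:if1", 50, "minor", relative=True)
    print(check(cpu, 85))
    print(check(errors, 1060, previous=1000))
```

The absolute form compares the sampled value directly, which suits gauges. The relative form compares the change between two samples, which suits counters that only ever increase.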
31 Network management implementation
- The organization should have an implemented network management system
- SNMP/RMON or other network management system tools