Title: Network Availability Management and Reporting
1Network Availability Management and Reporting
- ESnet Network Management
- Mike O'Connor
- ESnet Network Engineering Group
- Lawrence Berkeley National Lab
- moc_at_es.net
2Objective
- Develop and deploy a system which will track
ESnet network availability, producing clear and
concise reports.
- Accurately reflect both planned and unplanned
network outages for ESnet, its customer sites,
contracted carriers and network peers.
- Provide a measure of "uptime" for any given site
and for the ESnet Backbone.
- Improve maintenance planning and eliminate or
improve response time to repetitive systematic
network failures.
- Increase network availability for ESnet customers.
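The slides do not define how "uptime" is computed. A minimal sketch, assuming the common definition (the share of a reporting period not covered by outages), might look like the following. The function name and the sample durations are illustrative, not taken from the ESnet system.

```python
from datetime import timedelta


def availability_percent(period, outages):
    """Return the percentage of `period` not covered by the given outages.

    `period` is a timedelta; `outages` is a list of timedelta durations that
    are assumed not to overlap one another.
    """
    downtime = sum(outages, timedelta())
    if downtime > period:
        raise ValueError("total outage time exceeds the reporting period")
    # Dividing one timedelta by another yields a plain float.
    return 100.0 * (period - downtime) / period


if __name__ == "__main__":
    month = timedelta(days=31)               # illustrative reporting period
    planned = [timedelta(minutes=10)]        # illustrative planned outage
    unplanned = [timedelta(minutes=11, seconds=21)]
    print(f"all outages:    {availability_percent(month, planned + unplanned):.4f}%")
    print(f"unplanned only: {availability_percent(month, unplanned):.4f}%")
```

Reporting the figure both with and without planned maintenance keeps the planned vs. unplanned distinction visible in the uptime number itself.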
3Network Management Systems Fall Short
- Commercial off-the-shelf tools don't provide the
level of integration necessary to correlate and
categorize planned maintenance with the high
volume of alarms reported by an NMS.
- Producing accurate, customer-centric availability
reports based on alarm data requires a
comprehensive and well-integrated planning,
scheduling and reporting system.
- An NMS will report what alarmed and how long it
alarmed, but almost never why it alarmed.
4Over Reporting Outages
- Network management systems typically over-report
events.
- Outages are usually reported from multiple
devices and multiple subsystems of the same
device.
- Device-centric: no correlation with maintenance
or effect on specific customers.
- Where and when? Yes
- Why? No
5
- Formulate objective
- Devise a solution
- Allocate resources
- Select a maintenance window and record in calendar
- Schedule resources
- Send out advance Notification
- Execute the plan
- Verify results
- Review planned outages
- Investigate unplanned outages
- Feed results back into the planning process
(contracted, external, internal)
- Capture network alarms
- Correlate with planned events
- Format report
6Prior to report and review integration
8Planned Maintenance Calendar Month View
9Planned Maintenance Calendar Day View
10Planning and Notification Tools
- Event Input Forms
- Outage Footprint Calculator (OFC)
- Email Templates
- Contact Database
- Automatically referenced by the OFC.
11Outage Footprint Calculator
14Email Templates
ESnet Outage Notification <TTS>-<Title>
Begin: <Date> <GMT>
End: <End> <EndGMT>
Location: <Location>
This outage has been estimated to be <EOD-hour>(hrs) <EOD-min>(min)
within the above maintenance window
Description: <description>
Affected Devices: <affected>
-----------------------------------------------------------------------------
ENERGY SCIENCES NETWORK (ESnet) 24x7 NOC (510)486-7607
Email trouble_at_es.net http://www.es.net
-----------------------------------------------------------------------------
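How the templates are filled in is not shown in the slides. As a rough sketch, assuming placeholders are written in angle brackets (for example <TTS> and <Title>), a small substitution helper could render a notification. The `render` function, the shortened template, and all sample values below are hypothetical.

```python
import re

# Shortened, hypothetical version of the notification template.
TEMPLATE = """ESnet Outage Notification <TTS>-<Title>
Begin: <Date> <GMT>
End: <End> <EndGMT>
Location: <Location>
Affected Devices: <affected>"""


def render(template, values):
    """Replace each <Name> placeholder with values["Name"].

    Raises KeyError if the template uses a placeholder with no value, so an
    incomplete notification is never sent by accident.
    """
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder <{key}>")
        return str(values[key])

    return re.sub(r"<([A-Za-z-]+)>", substitute, template)


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    print(render(TEMPLATE, {
        "TTS": "0000",
        "Title": "Example circuit maintenance",
        "Date": "01/01/2003 10:00 US/Pacific",
        "GMT": "(18:00 GMT)",
        "End": "01/01/2003 10:30 US/Pacific",
        "EndGMT": "(18:30 GMT)",
        "Location": "Example site",
        "affected": "router-a router-b",
    }))
```

Failing on a missing placeholder, rather than leaving it blank, is a design choice that suits customer-facing mail.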
15Planned Maintenance Calendar Queued Email View
16Event Correlation Example: Sunnyvale Circuit Maintenance
- Bechtel Path to ESnet Core
- bechtel-rt1 - bechtel-ga.es.net
- bechtel-rt1, Serial0/0
- gac-rt2, Serial5/0/0
- gac-rt2 - gac-pos-snv.es.net
- gac-rt2, POS0/1/0
- snv-cr1, so-3/1/0.0 (maintenance)
- snv-cr1: ESnet core router in Sunnyvale, CA.
17Event Correlation: SNV so-3/1/0 example
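The slides do not describe how the Outage Footprint Calculator works internally. One plausible sketch, assuming the footprint is the set of routers that lose every path to the core while the maintenance interface is down, is a reachability search over the link list. The data structures and the `outage_footprint` name are assumptions. The links are the two shown on this slide, with the interface written as so-3/1/0.

```python
from collections import deque

# Links from the slide: (router_a, interface_a, router_b, interface_b).
LINKS = [
    ("bechtel-rt1", "Serial0/0", "gac-rt2", "Serial5/0/0"),
    ("gac-rt2", "POS0/1/0", "snv-cr1", "so-3/1/0"),
]
CORE = {"snv-cr1"}


def outage_footprint(links, core, maintenance):
    """Return the sorted routers cut off from the core during maintenance.

    `maintenance` is a set of (router, interface) pairs taken out of service.
    """
    adjacency = {}
    routers = set(core)
    for a, if_a, b, if_b in links:
        routers.update((a, b))
        # A link is unusable if either of its endpoints is under maintenance.
        if (a, if_a) in maintenance or (b, if_b) in maintenance:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search outward from the core over the surviving links.
    reachable = set(core)
    queue = deque(core)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in reachable:
                reachable.add(neighbor)
                queue.append(neighbor)
    return sorted(routers - reachable)


if __name__ == "__main__":
    print(outage_footprint(LINKS, CORE, {("snv-cr1", "so-3/1/0")}))
```

With only these two links, taking snv-cr1 so-3/1/0 out of service isolates both bechtel-rt1 and gac-rt2 from the core.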
18Event Correlation Engine
EVENT 08/01/2003 22:58:57 08/01/2003 23:38:57 000:00:40:00 snv-cr1 so-0/1/0
EVENT 08/02/2003 08:21:30 08/02/2003 10:00:17 000:01:38:47 snll-rt1 rtr_cisco
START 08/02/2003 10:00:00 US/Pacific 2003 SNV-to-GACmaintenance
EVENT 08/02/2003 10:01:53 08/02/2003 10:11:53 000:00:10:00 snv-cr1 so-3/1/0
EVENT 08/02/2003 10:01:59 08/02/2003 10:08:11 000:00:06:12 gac-rt2 POS0/1/0
EVENT 08/02/2003 10:01:59 08/02/2003 10:08:11 000:00:06:12 gac-rt2 Serial5/0/0
EVENT 08/02/2003 10:02:35 08/02/2003 10:08:10 000:00:05:35 bechtel-rt1 rtr_cisco
EVENT 08/02/2003 10:02:35 08/02/2003 10:08:11 000:00:05:36 gac-rt2 rtr_cisco
EVENT 08/02/2003 10:03:11 08/02/2003 10:08:11 000:00:05:00 bechtel-rt1 Serial0/0
EVENT 08/02/2003 10:27:51 08/02/2003 10:39:12 000:00:11:21 snll-rt1 rtr_cisco
STOP 08/02/2003 10:30:01 US/Pacific 2003 SNV-to-GACmaintenance
EVENT 08/02/2003 10:47:39 08/02/2003 10:50:27 000:00:02:48 llnl-rt3 rtr_cisco
EVENT 08/02/2003 10:47:46 08/02/2003 10:50:27 000:00:02:41 llnl-rt3 gen_if_port

Outage Footprint Calculation
affected -i snv-cr1,so-3/1/0
Maintenance Interfaces: snv-cr1,so-3/1/0
Affected Routers: bechtel-rt1 gac-rt2
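The correlation rule itself is not spelled out in the slides. A minimal sketch, assuming an alarm counts as planned when it overlaps the START/STOP maintenance window and comes from a router or interface in the outage footprint, could look like this. The field layout (begin, end, duration, device, subsystem) is inferred from the records above, the sample log is a subset of them, and the sketch handles only a single window per log.

```python
from datetime import datetime

LOG = """\
EVENT 08/02/2003 08:21:30 08/02/2003 10:00:17 000:01:38:47 snll-rt1 rtr_cisco
START 08/02/2003 10:00:00 US/Pacific 2003 SNV-to-GACmaintenance
EVENT 08/02/2003 10:01:53 08/02/2003 10:11:53 000:00:10:00 snv-cr1 so-3/1/0
EVENT 08/02/2003 10:01:59 08/02/2003 10:08:11 000:00:06:12 gac-rt2 POS0/1/0
EVENT 08/02/2003 10:03:11 08/02/2003 10:08:11 000:00:05:00 bechtel-rt1 Serial0/0
EVENT 08/02/2003 10:27:51 08/02/2003 10:39:12 000:00:11:21 snll-rt1 rtr_cisco
STOP 08/02/2003 10:30:01 US/Pacific 2003 SNV-to-GACmaintenance
EVENT 08/02/2003 10:47:39 08/02/2003 10:50:27 000:00:02:48 llnl-rt3 rtr_cisco
"""


def parse_time(date, clock):
    """Parse MM/DD/YYYY plus HH:MM:SS; also accepts HHMMSS without colons."""
    return datetime.strptime(f"{date} {clock.replace(':', '')}", "%m/%d/%Y %H%M%S")


def correlate(log, affected_routers, maintenance_interfaces):
    """Label each EVENT record as 'planned' or 'unplanned'."""
    events, window_start, window_stop = [], None, None
    for line in log.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "START":
            window_start = parse_time(fields[1], fields[2])
        elif fields[0] == "STOP":
            window_stop = parse_time(fields[1], fields[2])
        elif fields[0] == "EVENT":
            begin = parse_time(fields[1], fields[2])
            end = parse_time(fields[3], fields[4])
            events.append((begin, end, fields[6], fields[7]))

    results = []
    for begin, end, device, subsystem in events:
        in_window = (window_start is not None and window_stop is not None
                     and begin < window_stop and end > window_start)
        in_footprint = (device in affected_routers
                        or (device, subsystem) in maintenance_interfaces)
        label = "planned" if in_window and in_footprint else "unplanned"
        results.append((device, subsystem, label))
    return results


if __name__ == "__main__":
    for row in correlate(LOG, {"bechtel-rt1", "gac-rt2"}, {("snv-cr1", "so-3/1/0")}):
        print(*row)
```

Under this rule the snll-rt1 alarms stay unplanned even though they overlap the window, because snll-rt1 is outside the calculated footprint.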
19Event Review
- Review planned events
- Maintenance window
- Expected outage duration
- Outage footprint
- Investigate unplanned events
- Circuit providers
- Hardware Vendors
- The NMS
20Event Categorization Outage Reports
- Accurate availability statistics in a clear,
concise format suitable for distribution
- Planned vs. Unplanned
- ESnet, Site, Carrier, Peer categorization
- Consolidation of multiple reporters
- Network regional reports
- Backbone specific report
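Consolidating multiple reporters can be pictured as merging overlapping alarm intervals, so that one outage seen by several devices is counted once. The following is a sketch of that idea, not the actual report generator. The `consolidate` name is an assumption, and the sample intervals are the 08/02/2003 alarm times from the event correlation example.

```python
from datetime import datetime


def consolidate(intervals):
    """Merge overlapping (start, end) intervals into distinct outages."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous outage: extend it rather than add a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def at(clock):
    """Helper: a time of day on 08/02/2003, the date used in the example."""
    return datetime.strptime(f"08/02/2003 {clock}", "%m/%d/%Y %H:%M:%S")


ALARMS = [
    (at("10:01:53"), at("10:11:53")),  # snv-cr1 so-3/1/0
    (at("10:01:59"), at("10:08:11")),  # gac-rt2 POS0/1/0
    (at("10:01:59"), at("10:08:11")),  # gac-rt2 Serial5/0/0
    (at("10:02:35"), at("10:08:10")),  # bechtel-rt1 rtr_cisco
    (at("10:03:11"), at("10:08:11")),  # bechtel-rt1 Serial0/0
    (at("10:47:39"), at("10:50:27")),  # llnl-rt3 rtr_cisco
    (at("10:47:46"), at("10:50:27")),  # llnl-rt3 gen_if_port
]

if __name__ == "__main__":
    for start, end in consolidate(ALARMS):
        print(start.time(), "to", end.time(), "lasting", end - start)
```

In practice the merge would be applied per site or per category, so that unrelated outages that happen to overlap in time are not folded together.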
21Future Work
- Unplanned outage categorization tools.
- Customer version of the planned maintenance
calendar.
- Achieve 100% event categorization.
- Uptime metrics
22Questions?