Enterprise Network Troubleshooting

1
Enterprise Network Troubleshooting
  • Nick Feamster, Georgia Tech (joint with Russ
    Clark, Yiyi Huang, Anukool Lakhina, Manas
    Khadilkar, Aditi Thanekar)

2
Three Disjoint Views of the Network
[Diagram: three views (Policy, Static, Dynamic), with stages labeled "Generation" and "Error Checking and Deployment"]
  • ping, traceroute
  • rancid/rcc, FIREMAN/Lumeta

Independent analyses!
  • Policy: The operator's "wish list"
  • Static: What the configurations say
  • Dynamic: The behavior that users witness

3
A Closer Look
  • Proactive analysis
  • Fault avoidance
  • Policy conformance
  • Reactive diagnosis
  • Correcting network faults
  • Detection
  • Localization
  • Active and passive measurements
  • Need users' perspective
  • Two studies
  • Routing
  • Firewalls

Idea: These analyses should inform each other
4
Catastrophic Configuration Faults
"...a glitch at a small ISP triggered a major outage in Internet access across the country. The problem started when MAI Network Services...passed bad router information from one of its customers onto Sprint." -- news.com, April 25, 1997

"Microsoft's websites were offline for up to 23 hours...because of a router misconfiguration...it took nearly a day to determine what was wrong and undo the changes." -- wired.com, January 25, 2001

"WorldCom Inc...suffered a widespread outage on its Internet backbone that affected roughly 20 percent of its U.S. customer base. The network problems...affected millions of computer users worldwide. A spokeswoman attributed the outage to 'a route table issue.'" -- cnn.com, October 3, 2002

"A number of Covad customers went out from 5pm today due to, supposedly, a DDOS (distributed denial of service attack) on a key Level3 data center, which later was described as a route leak (misconfiguration)." -- dslreports.com, February 23, 2004
5
Case 1: Network-Wide Routing Analysis
  • Proactive routing configuration analysis
  • Idea: Analyze configuration before deployment

Many faults can be detected with static analysis.
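
As one illustration of what static analysis before deployment can catch, the sketch below checks that every configured routing session is declared on both ends. It is not taken from rcc or any tool named in these slides; the function name `find_one_sided_sessions` and the dictionary-of-neighbors input are assumptions made for the example, standing in for data parsed out of real router configurations.

```python
def find_one_sided_sessions(neighbors):
    """Return sorted (router, peer) pairs where router lists peer
    as a neighbor but peer does not list router back.

    neighbors: dict mapping a router name to the set of peers named
    in its configuration (a hypothetical, pre-parsed representation).
    """
    faults = []
    for router, peers in neighbors.items():
        for peer in peers:
            # A peer missing from the dict is treated as having no sessions.
            if router not in neighbors.get(peer, set()):
                faults.append((router, peer))
    return sorted(faults)


# Hypothetical three-router network: r1 expects a session with r3,
# but r3's configuration never mentions r1.
configs = {"r1": {"r2", "r3"}, "r2": {"r1"}, "r3": set()}
print(find_one_sided_sessions(configs))  # [('r1', 'r3')]
```

The check needs only the configurations themselves, so it can run before anything is pushed to a live router.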
6
Operators Find Static Analysis Useful
"That's wicked!" -- Nicolas Strina, ip-man.net

"Thanks again for a great tool." -- Paul Piecuch, IT Manager

"...good to finally see more coverage of routing as distributed programming. From my experience, the principles of software engineering eliminate a vast majority of errors." -- Joe Provo, rcn.com

"I find your approach useful, it is really not fun (but critical for the health of the network) to keep track of the inconsistencies among different routers...a configuration verifier like yours can give the operator a degree of confidence that the sky won't fall on his head real soon now." -- Arnaud Le Tallanter, clara.net
7
Yes, but Surprises Happen!
  • Link failures
  • Node failures
  • Traffic volumes shift
  • Network devices wedged
  • Two problems
  • Detection
  • Localization

8
Detection: Analyze Routing Dynamics
  • Idea: Routers exhibit correlated behavior

Blips across signals may be more operationally
interesting than any spike in one.
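
The contrast between a spike in one signal and small blips across many can be sketched as follows. This is a simplified stand-in using per-router z-scores, not the analysis behind this slide; the names `threshold_alarms` and `correlated_blips`, the parameters `limit`, `blip_z`, and `min_routers`, and the update counts are all invented for the example.

```python
from statistics import mean, pstdev


def zscores(series):
    """Standardize one router's series; a flat series maps to all zeros."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def threshold_alarms(counts, limit):
    """Per-router fixed threshold: (bin, router) pairs where count > limit."""
    return sorted((t, r) for r, s in counts.items()
                  for t, x in enumerate(s) if x > limit)


def correlated_blips(counts, blip_z=1.5, min_routers=3):
    """Time bins where at least min_routers routers deviate together."""
    scores = {r: zscores(s) for r, s in counts.items()}
    n_bins = len(next(iter(counts.values())))
    events = []
    for t in range(n_bins):
        hit = sorted(r for r, z in scores.items() if z[t] >= blip_z)
        if len(hit) >= min_routers:
            events.append((t, hit))
    return events


# Hypothetical routing-update counts per time bin for four routers.
counts = {
    "r1": [10, 11, 9, 10, 10, 16, 10, 11],
    "r2": [20, 19, 21, 20, 20, 27, 19, 20],
    "r3": [5, 5, 6, 5, 4, 9, 5, 5],
    "r4": [10, 10, 500, 10, 10, 11, 10, 10],
}
print(threshold_alarms(counts, limit=100))  # [(2, 'r4')]
print(correlated_blips(counts))             # [(5, ['r1', 'r2', 'r3'])]
```

With these numbers the fixed threshold sees only the large spike on r4 in bin 2, while the small simultaneous bumps on r1, r2, and r3 in bin 5 surface only when the routers are examined together.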
9
Detection: Three Types of Events
  • Single-router bursts
  • Correlated bursts
  • Multi-router bursts
  • Common
  • Commonly missed using thresholds

10
Localization: Joint Dynamic/Static
  • Which routers are border routers for that burst
  • Topological properties of routers in the burst

[Diagram: Proactive Analysis, Deployment, Static, Dynamic, Reactive Detection, Diagnosis/Correction]
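
A minimal sketch of joining the two views for localization: take the set of routers involved in a burst (dynamic) and look them up in facts derived from configuration (static). The function `localize_burst`, its inputs, and the choice of "common neighbors" as the topological property are assumptions for illustration; the slides do not specify which properties are used.

```python
def localize_burst(burst_routers, border_routers, adjacency):
    """Combine a dynamic observation with static topology.

    burst_routers:  routers that took part in the burst (dynamic view)
    border_routers: routers marked as border routers in configs (static view)
    adjacency:      dict router -> iterable of directly connected routers
    """
    burst = set(burst_routers)
    borders_in_burst = sorted(burst & set(border_routers))

    # Routers outside the burst that are adjacent to every burst router:
    # one possible hint at a shared point of failure.
    common = None
    for router in burst:
        nbrs = set(adjacency.get(router, ())) - burst
        common = nbrs if common is None else common & nbrs

    return {
        "border_routers": borders_in_burst,
        "common_neighbors": sorted(common or set()),
    }


# Hypothetical topology and burst.
adjacency = {
    "r1": ["r2", "core1"],
    "r2": ["r1", "core1"],
    "r3": ["core1", "core2"],
}
result = localize_burst(["r1", "r2", "r3"], {"r1", "r3", "r9"}, adjacency)
print(result)
```

Here the burst touches two border routers (r1 and r3), and every router in it hangs off core1, which would be a natural first place to look.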
11
Case 2: Firewalls
  • Georgia Tech Campus Network
  • Research and Administrative Network
  • 180 buildings
  • 130 firewalls
  • 1700 switches
  • 55,000 ports
  • Problem: Availability/Reachability
  • Flux in firewall, router, switch configurations
  • No common authority over changes made

12
Specific Focus: Firewall Configuration
  • Difficult to understand and audit configs
  • Subject to continual modifications
  • Roughly 1-2 touches per day
  • Federated policy, distributed dependencies
  • Each department has independent policies
  • Local changes may affect global behavior
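
To make "local changes may affect global behavior" concrete, the sketch below models each firewall as an ordered rule list with first-match semantics and a default deny, then checks reachability along a path of firewalls. The rule format, the firewall names `campus` and `dept`, and all addresses are hypothetical; real firewall configurations carry much more detail.

```python
import ipaddress


def first_match(rules, src, dst, port):
    """Return the action of the first rule matching the packet.

    Each rule is (action, src_prefix, dst_prefix, port); a port of None
    matches any port. Packets matching no rule are denied.
    """
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for action, src_net, dst_net, rule_port in rules:
        if (s in ipaddress.ip_network(src_net)
                and d in ipaddress.ip_network(dst_net)
                and rule_port in (port, None)):
            return action
    return "deny"


def path_allows(firewalls, path, src, dst, port):
    """True only if every firewall on the path allows the packet."""
    return all(first_match(firewalls[name], src, dst, port) == "allow"
               for name in path)


# Hypothetical federated setup: a campus firewall and a department firewall.
before = {
    "campus": [("allow", "10.0.0.0/8", "10.2.0.0/16", 443)],
    "dept": [("allow", "10.0.0.0/8", "10.2.5.0/24", 443)],
}
# The department prepends a deny rule for one source block; campus is unchanged.
after = dict(before)
after["dept"] = [("deny", "10.1.0.0/16", "10.2.0.0/16", None)] + before["dept"]

flow = ("10.1.1.1", "10.2.5.10", 443)
print(path_allows(before, ["campus", "dept"], *flow))  # True
print(path_allows(after, ["campus", "dept"], *flow))   # False
```

The campus policy never changed, yet the end-to-end flow breaks after the department's edit, which is why auditing each firewall in isolation is not enough.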

13
(Immediate) Open Issues
  • Reachability and reliability of controller
  • Service-level probes
  • Diagnostic tools -> Service-level "Happiness"
  • Policy conformance