Fault Localization via Analysis of Network Dependency
http://pmon
Victor Bahl, Ranveer Chandra, Albert Greenberg,
Dave Maltz, Ming Zhang (MSR Redmond)
Mission
  • What we have today
    • Interdependent distributed systems with hidden
      and unknown dependencies
    • Plethora of tools for graphing SNMP values,
      paucity of tools for tracking relationships
    • Little visibility into effect of network on
      applications
  • What we want
    • Method to map the IT infrastructure - determining
      which components affect a given client activity
    • Method to localize problems that affect users

Failure of Management Systems
[Figures: response time of 17 servers; response time of 1 web server]
  • 10% of requests to internal servers take 10x
    longer than normal
  • Persistent user frustration and high care costs
  • Invisible to current management systems

Automatically Localizing Faults

Automatically Creating Models of Dependencies
Challenges
  • A typical large enterprise
    • 100,000 client desktops
    • 10,000 servers
    • 10,000 apps/services
    • 10,000 network devices
  • Service alerts for 10 days
    • 120,000 housekeeping
    • 2,000 missed heartbeats from 160 servers
    • 18,000 alerts from 194 categories and 877 hosts

Results
  • Algorithm for extraction of dependency models
    • Sniffs and correlates packets between hosts
  • Algorithm for flexible, accurate fault
    localization
    • Scalable to size of large enterprises
    • Localizes both hard and performance faults
    • Finds problems in network, even without data
      from network routers
  • Deployed and evaluated on testbed and several
    MSIT applications (e.g., msw, itweb)
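
The poster says only that the extraction algorithm "sniffs and correlates packets between hosts". One simple way such a correlation could work is a time-window co-occurrence count, sketched below. The function name, the (timestamp, source, destination) trace format, and the `window` parameter are all assumptions made for illustration; this is not the authors' algorithm.

```python
from collections import defaultdict

def cooccurrence_dependencies(packets, window=0.1):
    """Estimate how often a host's access to one service coincides
    with an access to another service (hypothetical heuristic).

    packets: iterable of (timestamp_seconds, src, dst) tuples.
    Returns {(src, service, other): fraction of src's accesses to
    `service` that had an access to `other` within `window` seconds}.
    """
    # Group observed packets by the host that sent them.
    by_src = defaultdict(list)
    for ts, src, dst in packets:
        by_src[src].append((ts, dst))

    scores = {}
    for src, events in by_src.items():
        total = defaultdict(int)      # accesses per service
        together = defaultdict(int)   # accesses that co-occurred
        for ts, dst in events:
            total[dst] += 1
            # Distinct other services contacted close in time.
            nearby = {d for t, d in events
                      if d != dst and abs(t - ts) <= window}
            for other in nearby:
                together[(dst, other)] += 1
        for (dst, other), count in together.items():
            scores[(src, dst, other)] = count / total[dst]
    return scores

# Made-up trace: every web access is preceded by a DNS lookup,
# but one DNS lookup happens on its own.
trace = [
    (0.00, "client", "dns"), (0.01, "client", "web"),
    (5.00, "client", "dns"), (5.02, "client", "web"),
    (9.00, "client", "dns"),
]
deps = cooccurrence_dependencies(trace)
```

In this made-up trace the web accesses always coincide with a DNS lookup, while only two of the three DNS lookups coincide with a web access, so the estimated dependency is asymmetric.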

State of the Art
  • Management systems do not provide a big picture
  • Tools are box-centric, not service-centric
  • Relationships among servers often undocumented
  • Fragmentation results in more mistakes and outages
  • Tools do not directly measure user experience

Example Extracted Dependencies
On-Going Work
  • Read/Write SML models of applications
    • Automatically generate SML for legacy apps
    • Complement expert-generated SML
  • Explore other applications of Inference Graph
    • Upgrade management (who will be affected)
    • Availability analysis (who is being impacted)
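
The "who will be affected" question for upgrade management can be read as a reachability query over a dependency graph. The sketch below assumes a plain directed graph stored as a dictionary; the poster does not describe the Inference Graph's actual structure, so the `depends_on` format and the function name are hypothetical.

```python
from collections import defaultdict, deque

def affected_by(depends_on, component):
    """Return every node that directly or transitively depends on
    `component`.

    depends_on: {node: iterable of nodes it depends on}
    (an assumed representation, not the poster's Inference Graph).
    """
    # Invert the edges: for each node, who depends on it?
    dependents = defaultdict(set)
    for node, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(node)

    # Breadth-first walk up the inverted edges.
    seen = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for node in dependents[current]:
            if node not in seen:
                seen.add(node)
                queue.append(node)
    seen.discard(component)  # a cycle should not list the component itself
    return seen

# Made-up example graph.
graph = {
    "client": ["web"],
    "web": ["sql", "dns"],
    "sql": [],
    "dns": [],
}
impacted = affected_by(graph, "sql")
```

Here upgrading the hypothetical "sql" node would touch "web" directly and "client" through it.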

Model is probabilistic to cope with caching, load
balancing and failover techniques
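
To illustrate why a probabilistic model helps with caching, load balancing and failover, the sketch below uses a noisy-OR combination: each dependency carries the probability that a given request actually uses it, so a failed component only sometimes breaks the request. Noisy-OR is an assumption chosen for illustration; the poster does not say which probabilistic formulation is used, and the names and numbers are invented.

```python
def failure_probability(dependencies, failed):
    """Probability that a request is affected, under a noisy-OR
    assumption.

    dependencies: {component: probability a request uses it}
    failed: set of components currently faulty
    """
    p_unaffected = 1.0
    for component, p_used in dependencies.items():
        if component in failed:
            # The request survives only if it did not use this component.
            p_unaffected *= 1.0 - p_used
    return 1.0 - p_unaffected

# Invented numbers: DNS answers are usually cached, and a load
# balancer sends half of the requests to each of two back ends.
deps = {"dns": 0.2, "backend_a": 0.5, "backend_b": 0.5}
p_dns_down = failure_probability(deps, {"dns"})
p_two_down = failure_probability(deps, {"dns", "backend_a"})
```

A deterministic model would have to mark the request as either fully dependent on DNS or not at all; the weighted edge captures that most requests are served from cache.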