Automated Fault Diagnosis in VoIP

31st March, 2006, Vishal Kumar Singh and Henning Schulzrinne

1
Automated Fault Diagnosis in VoIP
  • 31st March, 2006
  • Vishal Kumar Singh and Henning Schulzrinne

2
VoIP Diagnosis
  • What is automated VoIP diagnosis?
      • Determining failures in the network
      • Automatically finding the root cause of the
        failure
  • Why VoIP diagnosis?
      • Networks are complex, making it difficult to
        troubleshoot problems
      • Automatic fault diagnosis reduces human
        intervention
  • Issues in VoIP diagnosis
      • Detecting failures/faults
      • Finding the cause of failure, determining
        dependency relationships among different
        components for diagnosis
  • Solution steps and approaches

3
Issues in Automated VoIP Diagnosis
  • Increasingly complex and diverse network elements
  • Complex interactions/relationships between
    different network elements
  • Different run-time bindings for each application
    usage instance, e.g., different calls may use
    different DNS servers, SIP proxy servers, and
    media paths
  • A problem in one network element may manifest
    itself as a user-perceived failure of another
    element

4
Fault Identification
  • Service unavailability reporting
  • Node/device/UA generates faults (failure events),
    e.g., SNMP traps, failure messages
  • Monitoring application, e.g., an SNMP-based
    application, detects service unavailability and
    reports the failure event
  • Affected user reports service unavailability,
    e.g., by e-mail, by calling the helpdesk, or
    automatically by pressing a button on the phone
    while in a call and experiencing echo
  • Dependent application detects service
    unavailability and generates faults (failure
    events)

5
Fault Localization: Determining the Source of the
Problem
  • Fault classification: local vs. global
    (Does it affect only me, or does it also affect
    others?)
  • Global failures
      • Server failures, e.g., SIP proxy, DNS, DB
        failures
      • Network failures
  • Local failures
      • Specific source failure, e.g., node A cannot
        call anyone
      • Specific destination or participant failure,
        e.g., no one can call node B
  • Locally observed but global failures, e.g., the
    DNS service failed, but only B observed it
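
The local vs. global distinction above can be sketched as a comparison of what the reporting node saw against what its peers see for the same target. This is only an illustration: the `Observation` record, the `classify_fault` function, and the "partial" and "unknown" labels are assumptions, not part of the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One peer's answer to 'can you reach this target?' (hypothetical record)."""
    peer: str
    target: str
    ok: bool


def classify_fault(reporter: str, target: str, observations: list) -> str:
    """Classify a failure that `reporter` saw when contacting `target`.

    Only observations made by other peers about the same target are used.
    """
    others = [o for o in observations
              if o.target == target and o.peer != reporter]
    if not others:
        return "unknown"      # no peer has tested this target yet
    failed = sum(1 for o in others if not o.ok)
    if failed == 0:
        return "local"        # only the reporter is affected
    if failed == len(others):
        return "global"       # every peer that tested sees the failure
    return "partial"          # some peers affected, e.g. one subnet


reports = [
    Observation("B", "dns", ok=False),
    Observation("C", "dns", ok=False),
]
print(classify_fault("A", "dns", reports))
```

A failure that initially looks local, because only one node reported it, is reclassified as global once the peers' own tests also fail.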

6
Solution Approach
  • DYSWIS: "Do you see what I see?" [1]
  • Peers (nodes) perform diagnostic tests when
    another peer reports or detects a failure
  • Nodes choose the diagnostic test based on
    dependencies encoded as a decision tree
  • Nodes (at least some) will be initially preloaded
    with the dependency relationship in some format
    (e.g., XML based)
  • Nodes (at least some) may build and update the
    dependency relationship based on statistical and
    temporal analysis of failure events which they
    receive and diagnostic tests which they perform
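
As a rough illustration of choosing tests from a dependency relationship encoded as a decision tree, the sketch below walks a small hand-built tree. The `DecisionNode` class, the test names, and the verdict strings are all hypothetical; the presentation does not specify the tree format.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DecisionNode:
    """Internal nodes name a test; leaves carry a verdict (hypothetical layout)."""
    name: str = ""
    on_pass: Optional["DecisionNode"] = None
    on_fail: Optional["DecisionNode"] = None
    verdict: Optional[str] = None


def leaf(verdict: str) -> DecisionNode:
    return DecisionNode(verdict=verdict)


def diagnose(node: DecisionNode, run_test: Callable[[str], bool]):
    """Run tests along one root-to-leaf path; return (verdict, tests run)."""
    executed = []
    while node.verdict is None:
        executed.append(node.name)
        node = node.on_pass if run_test(node.name) else node.on_fail
    return node.verdict, executed


# Assumed tree: check DNS first, then the SIP proxy, then plain TCP reachability.
TREE = DecisionNode(
    name="dns-lookup",
    on_fail=leaf("DNS failure"),
    on_pass=DecisionNode(
        name="sip-ping",
        on_pass=leaf("proxy reachable, check media path"),
        on_fail=DecisionNode(
            name="tcp-connect",
            on_pass=leaf("SIP service failure on proxy"),
            on_fail=leaf("proxy host or network failure"),
        ),
    ),
)

# Stubbed test results stand in for real probes.
results = {"dns-lookup": True, "sip-ping": False, "tcp-connect": True}
print(diagnose(TREE, lambda name: results[name]))
```

Only the tests on the chosen path are executed, which is the point of encoding the dependencies as a tree rather than running every probe.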

7
Solution Approach
  • Store context information of past failures
    experienced by each node
  • E.g., specific server that was acting as the
    proxy server (for my call which failed)
  • Store the locality of past failure instances
  • LAN, domain, subnet
  • First hop at each layer, e.g., switch (MAC),
    default gateway (IP), domain's proxy (application
    layer)
  • Failure count for each network element
    (statistical)
  • Last failure timestamp for each network element
  • Last successfully seen timestamp for each network
    element (why do I need to test the proxy for you,
    my call just went through)
  • Temporal correlation of past failures (proxy
    seems to be failing after DNS fails)
  • Each node has a runtime dependency list based on
    past failures and diagnostic tests
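
The per-element bookkeeping listed above (failure count, last failure, last success) could look roughly like the following. The class names, the `fresh_for` threshold, and the rule for skipping a test are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementHistory:
    failure_count: int = 0
    last_failure: Optional[float] = None   # timestamp of last observed failure
    last_success: Optional[float] = None   # timestamp element was last seen working


class FailureContext:
    """Per-node store of past observations (hypothetical structure)."""

    def __init__(self, fresh_for: float = 30.0):
        self.fresh_for = fresh_for          # assumed freshness window, in seconds
        self.elements = {}

    def record(self, element: str, ok: bool, now: float) -> None:
        history = self.elements.setdefault(element, ElementHistory())
        if ok:
            history.last_success = now
        else:
            history.failure_count += 1
            history.last_failure = now

    def needs_test(self, element: str, now: float) -> bool:
        """Skip the test only if the element was recently seen working."""
        history = self.elements.get(element)
        if history is None or history.last_success is None:
            return True
        if (history.last_failure is not None
                and history.last_failure >= history.last_success):
            return True
        return now - history.last_success > self.fresh_for


ctx = FailureContext(fresh_for=30.0)
ctx.record("proxy", ok=True, now=100.0)   # my call just went through
ctx.record("dns", ok=False, now=105.0)
print(ctx.needs_test("proxy", now=110.0), ctx.needs_test("dns", now=110.0))
```

Timestamps are passed in explicitly so the behaviour is easy to check; a real node would presumably read a clock instead.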

8
Solution Architecture
[Figure: nodes in different domains cooperating to
determine the cause of failure]

9
Solution Architecture: Logical View
[Figure labels: failures in the network; dependency
graph generation (Bayesian-network based, inference,
other models); test results; decision tree updates;
triggers to perform tests (peer selection and probe
selection); dependency relationships and tests (XML)]
The figure shows the logical entities and the
separation between dependency graph generation and
the distributed diagnostic infrastructure (enclosed
in blue).

10
Solution Requirements
  • Request-response protocol between the node that
    experiences the failure and its peer nodes
  • Node capability to perform diagnostic tests
    (probes); probe selection based on cost/result
  • Encoding the dependency relationships into a
    decision tree (given as input by an expert,
    e.g., as XML)
  • Peer node discovery, based on
      • Location (local network, domain)
      • Capability to perform specific tests
  • Dependency graph generation and updating, based
    on
      • Network failure events
      • Diagnostic test results correlated with failures

11
Test/Probe Selection
  • Which diagnostic probe to run (network layer or
    application layer), and for what kind of failures
  • A probe covering a broad range of failures gives
    faster but cruder, less accurate results
  • E.g., PING vs. TCP connect vs. SIP PING tests
  • Cost of the probe
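
One way to picture the trade-off between probe cost and precision is to record, for each probe, which layers a passing result confirms and what it costs to run. The cost figures and the `verifies` sets below are invented for illustration; they are not measurements from the presentation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Probe:
    name: str
    cost: float           # hypothetical relative cost, not a measured value
    verifies: frozenset   # layers a passing probe confirms as working


PROBES = [
    Probe("ping", 1.0, frozenset({"network"})),
    Probe("tcp-connect", 2.0, frozenset({"network", "transport"})),
    Probe("sip-ping", 5.0, frozenset({"network", "transport", "sip"})),
]


def cheapest_probe(required: set, probes=PROBES) -> Probe:
    """Pick the lowest-cost probe whose success would confirm `required`."""
    candidates = [p for p in probes if required <= p.verifies]
    return min(candidates, key=lambda p: p.cost)


def first_broken_layer(run: Callable[[str], bool], probes=PROBES) -> list:
    """Run probes from cheapest to most expensive; report what the first
    failing probe adds beyond the layers already confirmed."""
    verified = set()
    for probe in sorted(probes, key=lambda p: p.cost):
        if not run(probe.name):
            return sorted(probe.verifies - verified)
        verified |= probe.verifies
    return []


print(cheapest_probe({"network"}).name)
print(first_broken_layer(lambda name: name == "ping"))
```

Here a passing ping followed by a failing TCP connect narrows the fault to the transport layer, without ever paying for the SIP-level probe.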

12
Dependency Classifications
  • Functional dependency
      • At the generic service level, e.g., SIP proxy
        depends on DB service, DNS service
  • Structural dependency
      • Configuration time, e.g., the Columbia CS SIP
        proxy is configured to use a MySQL database on
        metro-north
  • Operational dependency
      • Runtime dependencies or run-time bindings, e.g.,
        the call that failed was using a failover SIP
        server obtained from DNS, which was running on
        host a.b.c.d in the IRT lab
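
The three kinds of dependency can be read as successive refinements: functional dependencies name the roles, structural ones name the configured instances, and operational ones name what a particular call actually used. The sketch below resolves them in that order of precedence; the dictionary layout, the "call" service, and the "primary-proxy" name are assumptions, while the a.b.c.d host is the placeholder used in the slide.

```python
def resolve_bindings(service, functional, structural, operational):
    """Map each role a service depends on to the most specific known binding.

    Precedence (an assumption of this sketch): operational, then structural,
    then an unresolved functional dependency.
    """
    bindings = {}
    for role in functional.get(service, []):
        if role in operational:
            bindings[role] = (operational[role], "operational")
        elif (service, role) in structural:
            bindings[role] = (structural[(service, role)], "structural")
        else:
            bindings[role] = (None, "functional")
    return bindings


# Functional: generic service-level dependencies.
functional = {"call": ["sip-proxy", "dns"]}
# Structural: configuration-time choice (hypothetical instance name).
structural = {("call", "sip-proxy"): "primary-proxy"}
# Operational: what the failed call really used at run time.
operational = {"sip-proxy": "failover proxy at a.b.c.d"}

print(resolve_bindings("call", functional, structural, operational))
```

For the failed call, the run-time binding overrides the configured proxy, so diagnosis targets the failover server rather than the primary one.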

13
Dependency Classifications: Layered Approach
  • Vertical and lateral dependencies: applications
    depend on other application-layer services
    (e.g., SIP service depends on DB, DNS service) as
    well as on lower-layer services
  • OSI layers as service dependency layers
      • An application-layer service also depends on
        a transport-layer service, which in turn depends
        on a network-layer service
      • MAC layer: access point, switch
      • Network layer: router
      • Application layer: DNS, SIP, database
  • Topology-based dependency
      • E.g., calls from the CS domain depend on a
        specific SIP server; calls from lab phones
        depend on specific switches and routers

14
Dependency Graph

15
Dependency Graph Encoded to Decision Tree

16
Diagnostic Tests
  • SIP proxy
  • Proxy server availability
  • SIP PING
  • Call Routing availability
  • INVITE tests
  • Call path determination
  • SIP traceroute
  • Media path
  • Quality related
  • Speech quality degradation: MOS
  • Echo
  • Jitter: MOS, PESQ
  • QoS: RTCP
  • NAT/firewall
  • Checking binding expiration
  • Firewall failure to open a port: one-way media
  • How to determine which firewall in the path?
    SIP signaling?

17
Diagnostic Tests
  • DNS tests
  • DHCP
  • Switch/Router
  • ARP/RARP/Multicast
  • BGP failures
  • Conference mixers
  • Gateway
  • Echo return loss: readings, analysis
  • DB
  • XCAP server tests
  • Presence service availability tests

18
Example
  • Call failure: possible causes
  • SIP Proxy server
  • Database
  • Authentication
  • Media path failure
  • Gateway
  • Specific call legs: ERL, authentication, etc.
  • DNS server failure
  • End station failure
  • Network failure, e.g., router, switch failure
  • Different calls will have different run time
    dependencies

19
Mapping to a Human Medical System
  • Doctors perform diagnostic tests to find the
    cause of a disease when symptoms are reported.
    They may learn new things about the disease as
    part of the diagnostic tests
  • Failures and triggered tests update the
    dependency graph
  • Medical researchers do different types of tests
    to learn about new diseases and to determine the
    cause of a disease and its relationship with
    other physiological systems
  • Set of tests that can run periodically and can be
    used to build dependency graph independent of
    failures

20
Solution Evolution
  • Learning the dependency graph from failure events
    and diagnostic tests
  • Learning using random/periodic testing to
    identify failures and determine relationships
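
Learning relationships from failure events could start with something as simple as counting how often one element's failure is followed by another's within a short window, as in the earlier "proxy seems to be failing after DNS fails" example. The function name, the event format, and the window size are assumptions; this is a counting sketch, not the Bayesian model mentioned in the architecture.

```python
from collections import Counter


def temporal_correlations(events, window):
    """Count ordered pairs (earlier, later) of distinct elements whose
    failures occur within `window` time units of each other.

    `events` is a list of (timestamp, element) failure events.
    """
    ordered = sorted(events)
    pairs = Counter()
    for i, (t1, first) in enumerate(ordered):
        for t2, second in ordered[i + 1:]:
            if t2 - t1 > window:
                break                     # later events are even further away
            if first != second:
                pairs[(first, second)] += 1
    return pairs


# Hypothetical failure log: the proxy fails shortly after DNS, twice.
log = [(0, "dns"), (2, "proxy"), (100, "dns"), (103, "proxy"), (200, "proxy")]
counts = temporal_correlations(log, window=5)
print(counts.most_common(1))
```

A high count for (dns, proxy) and none for (proxy, dns) hints that the proxy depends on DNS, which is the kind of edge a learned dependency graph would record.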

21
Future Directions
  • Self healing
  • Predicting failures
  • Protocols for labeling failure events, which
    would allow new devices/applications to be
    incorporated into the dependency system
    automatically
  • Decision tree (dependency graph) based event
    correlation

22
Reference
  • [1] User-oriented Management of VoIP Applications
    (http://www.ibr.cs.tu-bs.de/projects/nmrg/meetings/2005/nancy/dyswis.pdf)