ABC Co. Network Implementation - PowerPoint PPT Presentation

About This Presentation
Title:

ABC Co. Network Implementation

Slides: 16
Provided by: supportco9

Transcript and Presenter's Notes

Title: ABC Co. Network Implementation


1
ABC Co. Network Implementation
  • High reliability is primary concern
  • Near 100% uptime required
  • Customer SLA has stiff penalty clauses
  • Everything is designed in a redundant fashion
  • Network redundancy not integrated with system
    design or application design.
  • Application and system design not integrated
  • Management added last (to fix problems)

2
The challenge is always politics
  • Politics prevents different parts of the company
    from working together.
  • Networking, Systems, and Applications are three
    different groups.
  • The Systems group owns the management issues.
  • Some requirements get in the way
  • e.g. Management station must keep its data on the
    database server.

3
Network design
  • "Dual everything" is the design rule
  • Dual Routers/hubs (Cisco 5500s)
  • Dual Ethernet
  • Dual attached systems

4
A simple picture
[Figure: redundant network to customers feeding two Rtr/Hub units, which connect over dual-rail Ethernet to Server a through Server n, TNG, and DNS/WINS]

5
More detail
  • No actual Ethernet bus
  • Systems connect to 5500 via UTP
  • Each system connects to both 5500s
  • One connection is to the primary LAN, the other to
    the secondary LAN
  • Half the systems have the left 5500 as primary; the
    other half have the right as primary.
  • 5500s run OSPF and router cluster software

6
Problems...
  • Server operating systems (NT and Unix) do not switch
    away from the primary interface if it fails and
    will keep trying to use it. Applications hang and
    connections time out.
  • DNS points only to one interface on each server.
  • No automatic failover built into applications.

7
Management software must
  • Detect NIC failures
  • Continue to monitor system agents in presence of
    network failures
  • Correct server routing tables if primary
    interface fails (or the hub fails)
  • Update DNS
  • Notify operations as required.

8
Challenges
  • Get each system to report all status via both
    NICs.
  • Monitor system over both NICs.
  • Prevent duplicate notifications.
  • Fail over as fast as possible.
  • Show connectivity of each system to both
    networks.

9
What needs to be done to do this?
  • Modify auto discovery scripts to add each system
    twice as independent systems.
  • Requires private host file for name/address
    translation (cannot depend on access to DNS)
  • Invent a mechanism to recognize which interface is
    active and block messages from the other NIC(s)
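
The "add each system twice" idea can be illustrated with a minimal sketch. It assumes a private host table kept on the management station; the table layout, the server names, and the addresses (taken from the reserved documentation ranges) are hypothetical, and the -p / -s suffixes follow the naming trick described on the "Tricks used" slide.

```python
# Hypothetical private host table: server name -> (primary address, secondary address).
# Names and addresses are illustrative only; they do not come from the presentation.
PRIVATE_HOSTS = {
    "servera": ("192.0.2.10", "198.51.100.10"),
    "servern": ("192.0.2.20", "198.51.100.20"),
}


def discovery_entries(hosts):
    """Return one managed entry per interface, so each server appears twice."""
    entries = {}
    for name, (primary, secondary) in hosts.items():
        entries[f"{name}-p"] = primary    # object reached over the primary LAN
        entries[f"{name}-s"] = secondary  # object reached over the secondary LAN
    return entries


entries = discovery_entries(PRIVATE_HOSTS)
```

Because the lookup is a local table, name/address translation keeps working even when the DNS server itself is unreachable.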

10
More work...
  • Duplicate any information in Object Repository
    that is needed to manage failover onto local
    system (cannot trust access to SQL server)
  • Store current connectivity state for all servers
    (added ILPs to class definitions).

11
Tricks used
  • Each system name in messages has code added to
    end to indicate interface address (-p or -s)
  • Most of the work is done in event message
    processing.
  • Each raw message is suppressed and a script
    invoked to process it.
  • Ping success/failures used to switch state
  • Agent messages dropped based on state and p/s flag
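
As an illustration of the naming trick, the following sketch splits a tagged name into the real server name and its interface flag. The function name is hypothetical, and the original work was done with REXX scripts and Basic programs rather than Python.

```python
def split_interface_name(tagged_name):
    """Split 'servera-p' into ('servera', 'P'); raise ValueError if untagged."""
    base, sep, flag = tagged_name.rpartition("-")
    if not sep or not base or flag.lower() not in ("p", "s"):
        raise ValueError(f"no -p/-s interface suffix in {tagged_name!r}")
    return base, flag.upper()
```

Splitting on the last hyphen only means that server names which themselves contain hyphens are still handled.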

12
Basic set of flows
  • For each event (other than pings):
  • If the mode is P or S (kept in the NT Registry) and
    the message came from the other interface (S or P,
    respectively), discard it.
  • Else, reformat the message with the real server
    name, improve the content (system class, etc.), and
    send it back to the event console as a new message.
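
The event flow above can be sketched roughly as follows. This is an assumption-laden illustration: the mode is passed in as a parameter instead of being read from the NT Registry, the output format is invented, and untagged names are simply passed through.

```python
def process_event(mode, source_name, text, system_class="unknown"):
    """Return the reformatted message, or None if the event is discarded.

    mode is the node's current state, 'P' or 'S'.
    """
    base, sep, flag = source_name.rpartition("-")
    if sep and base and flag.upper() in ("P", "S"):
        if flag.upper() != mode:
            return None  # arrived via the inactive interface: discard
        real_name = base
    else:
        real_name = source_name  # assumption: untagged names pass through
    # Re-issue the event under the real server name with extra context.
    return f"{real_name} [{system_class}]: {text}"
```

For example, with the node in mode P, an event from `servera-s` is dropped, while the same event from `servera-p` is reissued under the name `servera`. This is what prevents duplicate notifications when an agent reports over both NICs.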

13
More Flow
  • For each ping success/failure reported:
  • Remember the DSM has already done the retries.
  • On a failure, check whether the other port fails,
    too. If the other port is also dead, declare the
    node down and reset the state to primary.
  • If the failed port is the primary, fail over to the
    secondary. If it is the secondary, fail back to
    the primary.
  • Update DNS in all cases.
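
A minimal sketch of the ping-failure handling, under stated assumptions: the function names and the `update_dns` callback are hypothetical, the state is kept in a plain dictionary, and handling of ping successes is left out because the slides do not spell it out.

```python
def next_state(failed_port, other_port_alive):
    """Return (new_mode, node_down) after a ping failure on 'P' or 'S'."""
    if not other_port_alive:
        return "P", True  # both ports dead: node is down, state resets to primary
    # Primary failed -> use secondary; secondary failed -> use primary.
    return ("S" if failed_port == "P" else "P"), False


def handle_ping_failure(node, failed_port, other_port_alive, modes, update_dns):
    """Apply the state change for one node and report whether it is down."""
    mode, down = next_state(failed_port, other_port_alive)
    modes[node] = mode
    update_dns(node, mode)  # DNS is updated in all cases
    return down
```

No retry logic appears here because the DSM has already retried before reporting the failure.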

14
Router / Hub failure
  • If the router/hub fails, invoke the primary
    failover script for each node connected to the
    primary side, and the secondary failover script
    for each node connected to the secondary side.
  • This is effectively all the nodes, so we don't
    have to wait for each to have a ping failure.
    The system will stabilize faster.
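
One way to read the router/hub rule is sketched below. It assumes a hypothetical map from each node to the hub it uses as primary ("left" or "right") and a hypothetical `fail_over` callback standing in for the failover scripts; "primary failover" is interpreted as moving the node to its secondary interface, and "secondary failover" as keeping or returning it to primary.

```python
def on_hub_failure(failed_hub, primary_hub_of, fail_over):
    """Fail over every node at once; return the new mode chosen per node.

    primary_hub_of maps node name -> 'left' or 'right' (its primary hub).
    """
    new_modes = {}
    for node, primary_hub in primary_hub_of.items():
        # Nodes whose primary side is the failed hub move to secondary;
        # nodes that only used it as secondary stay on (or return to) primary.
        new_mode = "S" if primary_hub == failed_hub else "P"
        fail_over(node, new_mode)
        new_modes[node] = new_mode
    return new_modes
```

Handling all nodes in one pass avoids waiting for a separate ping failure from each of them.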

15
Does it work?
  • You bet! It required
  • Some special REXX scripts for failover
  • A few Basic programs
  • A hack to the auto discovery scripts.
  • Some magic with Trix and a few more Basic
    programs.