Automatic Reconfiguration in Autonet Research by Thomas L. Rodeheffer and Michael D. Schroeder - PowerPoint PPT Presentation

Learn more at: http://www.cs.sjsu.edu

Transcript and Presenter's Notes


1
Automatic Reconfiguration in Autonet Research
by Thomas L. Rodeheffer and Michael D. Schroeder
  • Presented by
  • Arnold Suvatne and Michael Shiau

2
What is Autonet?
  • Switch-based LAN that automatically reconfigures
    itself when network changes are detected
  • Responds to both failures and additions of
    network components
  • Works well in a LAN configuration but cannot
    effectively scale to a WAN

3
Autonet Goals
  • Quick Reconfiguration
  • Ensure reconfiguration is quick enough so that
    upper level layers do not notice any disruption
  • High Availability
  • Utilize redundant topology to ensure full
    interconnection

4
Automatic Reconfiguration Mechanism
  • Processor in each switch performs 3 main tasks
  • Monitoring Task
  • Periodically check directly connected links to
    determine whether they are broken or working
  • Topology Acquisition Task
  • Collect and distribute topology description of
    the entire network to every node
  • Routing
  • Each switch uses the new topology description to
    compute their respective forwarding tables

5
Monitoring Task
  • Determines if neighborhood links are useful
  • What does it mean to be useful?
  • Allows packet transfer between two distinct nodes
    with low error rates
  • Two types of events
  • Link Failure: responds quickly
  • Link Recovery: responds more slowly
  • Monitoring Task utilizes a method called the
    Skeptic

6
Skeptic
  • What does the Skeptic Do?
  • Prevents constant reconfiguration due to
    intermittent links or bursts of failures
  • Delays recovery of a link that has a history of
    failures
  • Skepticism Level
  • The higher the skepticism level the more
    skeptical we are about the reliability or quality
    of a link
  • Generally numbered from 0-20
  • Used in calculation of wait time (wtime) and good
    time (gtime)

7
Skeptic
  • Wait Time
  • wtime = wbase + wmult × 2^level
  • wbase and wmult are policy parameters and level
    is the skepticism level
  • Time that a switch must wait before transitioning
    to a good state
  • Good Time (gtime)
  • gtime = gbase + gmult × 2^level
  • gbase and gmult are policy parameters
  • Time that a switch must stay in a good state in
    order to forgive a skepticism level
  • "Forgive" means reducing the skepticism level by
    1
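
The two timing rules can be sketched in a few lines of Python. This is only an illustration: it assumes the formulas read wtime = wbase + wmult × 2^level and gtime = gbase + gmult × 2^level, that each failure raises the skepticism level by one, and that the level is capped at 20. The class name `Skeptic`, its methods, and the numeric policy values in the usage note are hypothetical, not taken from the Autonet report.

```python
MAX_LEVEL = 20  # upper end of the 0-20 range quoted on the slide


def wait_time(level, wbase, wmult):
    """Time a recovering link must wait before it may be declared good."""
    return wbase + wmult * 2 ** level


def good_time(level, gbase, gmult):
    """Time a link must stay good before one skepticism level is forgiven."""
    return gbase + gmult * 2 ** level


class Skeptic:
    """Tracks the skepticism level of one link (illustrative model only)."""

    def __init__(self, wbase, wmult, gbase, gmult):
        self.wbase, self.wmult = wbase, wmult
        self.gbase, self.gmult = gbase, gmult
        self.level = 0

    def on_failure(self):
        # Assumption: every failure raises the level by one, up to MAX_LEVEL.
        self.level = min(self.level + 1, MAX_LEVEL)
        return wait_time(self.level, self.wbase, self.wmult)

    def on_good_period(self):
        # Called once the link has stayed good for good_time(level):
        # forgive by lowering the level by one, never below zero.
        self.level = max(self.level - 1, 0)
        return self.level
```

With hypothetical policy values `Skeptic(1.0, 0.5, 60.0, 10.0)`, two failures in a row give wait times of 2.0 and 3.0 time units, so a link that keeps failing waits exponentially longer before it is trusted again.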

8
Skeptic Internals
  [Figure: skeptic internals, from [1]]
9
Skeptic
  • Skeptic Design Requirements [1]
  • A link with a good history must be allowed to
    fail and recover several times without
    significant penalty
  • In the worst case, a link's average long term
    failure rate must not be allowed to exceed some
    low rate
  • Common behaviors shown by bad links should result
    in exceedingly low average long term failure
    rates
  • A link that stops being bad must eventually be
    forgiven its bad history

10
Monitoring Task Layers
  • Monitoring Task has 2 Layers
  • Transmission Layer
  • Deals with failure and recovery events at the
    hardware level
  • Connectivity Layer
  • Deals with failure and recovery events at the
    network layer

11
Transmission Layer
  • Purpose
  • Watches link hardware to determine if link is
    successful at sending and receiving data
  • Does not care where packets are coming from or
    going to
  • Passes its conclusion to the connectivity layer
  • Isolates broken link by setting switch to discard
    all incoming and outgoing packets on that link
  • Composed of
  • Fault Monitor
  • Skeptic
  • Three Error Detectors
  • Round Trip Verifier

12
Transmission Layer
  [Figure: transmission layer structure, from [1]]
13
Fault Monitor
  • Structure
  • Lowest level of the transmission layer
  • Takes three inputs
  • From error detectors (failure)
  • From round trip verifier (failure)
  • From dummy object (working)
  • Purpose
  • To pass input into skeptic for checking or
    rechecking link status

14
Corrupt Packet Detector
  • Examines packets received by switch control
  • Checks for invalid packet lengths and corruptions
    via CRC
  • Allows for some corruption
  • Uses Leaky Bucket Mechanism
  • Each time corrupt packet is detected, a token is
    placed inside bucket
  • Tokens leak out of bucket every 10 min
  • If adding a token leaves the bucket with more than
    5 tokens, a fault is declared.
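
The leaky bucket described above can be sketched as follows. The limit of 5 tokens and the 10 minute leak period come from the slide; the assumption that exactly one token leaks per period, the class name `LeakyBucket`, and the use of caller-supplied timestamps in seconds are illustrative choices, not details from the report.

```python
class LeakyBucket:
    """Tolerates occasional errors but declares a fault on a burst."""

    def __init__(self, limit=5, leak_interval=600.0):
        self.limit = limit                  # fault when tokens exceed this
        self.leak_interval = leak_interval  # seconds between leaks (10 min)
        self.tokens = 0
        self.last_leak = 0.0

    def _leak(self, now):
        # Assumption: one token leaks out per elapsed interval.
        leaked = int((now - self.last_leak) // self.leak_interval)
        if leaked > 0:
            self.tokens = max(0, self.tokens - leaked)
            self.last_leak += leaked * self.leak_interval

    def record_error(self, now):
        """Add a token for one detected error; return True if a fault is declared."""
        self._leak(now)
        self.tokens += 1
        return self.tokens > self.limit
```

Six errors within a few seconds overflow the bucket on the sixth, while the same six errors spread over more than ten minutes do not, because a token has leaked out in between.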

15
Stuck Link Detector
  • Discover links that become stuck in a state that
    prevents data transmission
  • This happens when a command code is
    mistransmitted
  • Allow occasional occurrences of stuck links
  • Impose stuck link occurrence quota via Leaky
    Bucket Method

16
Violation Detector
  • Detects coding violations which are caused by
    static on the line
  • Detects formation violations which are caused by
    lost or ill-formed information
  • Should allow for isolated instances of both types
    of errors as they are quite common
  • Acceptable rate depends on skeptic
  • Working link can have 3 errors
  • Broken links cannot have any errors

17
Round Trip Verifier
  [Figure: three-state machine of the round trip verifier, from [1]]
  • Purpose
  • Allows confirmation between two nodes to decide
    whether a link is working or not
  • Filters the output of the skeptic by delaying
    working messages until confirmation
  • Implementation
  • Implemented as the three-state machine shown above

18
Connectivity Layer
  • Reached when Transmission Layer declares a link
    as working
  • Does care where a packet is coming from or going
    to
  • Packets are communicated between the nodes to
    determine identity and usefulness.
  • Filters out links that do not connect anywhere or
    that connect back to the same switch

19
Connectivity Layer Structure
  • Fault Monitor receives working or broken result
    from transmission layer
  • Round trip verifier will delay working messages
    until it has exchanged packets and determined the
    identity of the remote node
  • Distinct Node verifier will make sure that the
    two nodes communicating are indeed different

  [Figure: connectivity layer structure, from [1]]
20
Round Trip Verifier
  • Filters the output of the skeptic until it has
    determined the identity of the remote node
  • Exchanges connectivity packets with the remote node
    to test communication across the link
  • Tests a link vigorously if there is reason to
    believe that testing might change result
  • How can it be done?
  • If a previously working link stops responding,
    test packets are sent at an exponentially
    decreasing rate until communication resumes
  • This exponential backoff allows for occasional
    losses or delays in response without hindering
    system performance
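
One common way to realize this kind of exponential backoff is to double the interval between test packets after every missed response and to reset it as soon as a response arrives. The sketch below assumes that scheme; the class name `ProbeScheduler`, the doubling factor, and the `base` and `cap` parameters are hypothetical and do not come from the Autonet report.

```python
class ProbeScheduler:
    """Decides how long to wait before sending the next test packet."""

    def __init__(self, base, cap):
        self.base = base      # interval used while the remote end responds
        self.cap = cap        # longest interval the backoff may reach
        self.interval = base

    def on_timeout(self):
        # No response: test less vigorously by doubling the interval.
        self.interval = min(self.interval * 2, self.cap)
        return self.interval

    def on_response(self):
        # A response arrived: return to vigorous testing.
        self.interval = self.base
        return self.interval
```

A single lost response only doubles the interval once, so occasional losses cost little, while a link that stays silent is probed less and less often.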

21
Distinct Node Verifier
  • Makes sure that the identity of the remote node
    is not the same as the local node.
  • A link whose two ends are the same node should not
    be considered working
  • Change in status from working to broken or vice
    versa will cause a re-computation of topology.

22
Topology Acquisition Task
  • Responsibility of Topology Acquisition Task
  • Collect and distribute the new topology
    description of the entire network to every node
  • Three scenarios
  • Single node initiates the task and runs to
    completion
  • Multiple initiators
  • Topology changes while performing task

23
Basic method
  • Topology Acquisition Task consists of 3 phases
  • Propagation
  • From top down, construct a rooted spanning tree
    over the set of all reachable nodes
  • Collection
  • From bottom up, discover and merge descriptions
    of larger and larger sub-trees
  • Distribution
  • From root node down, send complete description of
    the network topology to every node

24
Propagation Phase
  • Starting at the initiator (root node)
  • Each node N offers each of its neighbors, M, an
    option to join the spanning tree as a child of N
  • If M has not already joined the tree, it accepts
    N as its parent and joins the tree
  • Otherwise it rejects the offer
  • The spanning tree is complete when all nodes have
    received responses, accept or reject, from all
    their neighbors
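
The offer, accept, and reject exchange can be simulated sequentially. The real protocol runs concurrently on every switch; this sketch assumes offers are delivered one at a time in first-in, first-out order, and the function name `build_spanning_tree` and the adjacency-dictionary representation are illustrative.

```python
from collections import deque


def build_spanning_tree(adjacency, root):
    """Return a {node: parent} map for the tree rooted at the initiator."""
    parent = {root: None}
    # Each pending offer is (N, M): node N invites neighbor M to be its child.
    offers = deque((root, m) for m in adjacency[root])
    while offers:
        n, m = offers.popleft()
        if m in parent:
            continue  # M has already joined the tree, so it rejects the offer
        parent[m] = n  # M accepts N as its parent
        # Having joined, M now makes the same offer to each of its neighbors.
        offers.extend((m, x) for x in adjacency[m])
    return parent
```

For a network with links A-B, A-C, B-C, and C-D and initiator A, nodes B and C join as children of A, and D joins as a child of C; every other offer is rejected.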

25
Collection Phase
  • Each node that accepted an offer to be a child
    during the propagation phase
  • Commits to providing to its parent a description
    of its neighborhood (sub-tree)
  • Collection phase begins at the leaves of the
    spanning tree and rises up to the root
  • A node knows it is a leaf node when all of its
    neighbors refuse to be children

26
Collection Phase
  • Non-leaf nodes
  • Wait for all of their children to send their
    neighborhood descriptions
  • Merge the descriptions
  • Send the result to their parent
  • Eventually the collection phase reaches the root
    node
  • A final merge produces the entire description of
    the network
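
The bottom-up merge can be sketched as a recursive function in which returning from a call plays the role of sending a description to the parent. The representation is an assumption made for illustration: a description is a set of links, each link a `frozenset` of its two endpoint names, and `children` maps each node to the children that accepted its offer.

```python
def collect(node, children, adjacency):
    """Return the merged description of the sub-tree rooted at node."""
    # Start with the node's own neighborhood: one entry per attached link.
    description = {frozenset((node, m)) for m in adjacency[node]}
    # A non-leaf node waits for every child, then merges what they report.
    for child in children.get(node, []):
        description |= collect(child, children, adjacency)
    return description  # "sent" to the parent by returning
```

Calling `collect` on the root performs the final merge. For the network with links A-B, A-C, B-C, and C-D, the root ends up with all four links even though B-C is not part of the spanning tree.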

27
Distribution Phase
  • The root node sends the entire network
    description to all of its children
  • Those nodes would then send it to their children,
    and so on
  • In the end, every node gets a full description of
    the network topology
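
Distribution is the mirror image of collection: the description travels down the tree instead of up. A minimal sketch, assuming the same hypothetical `children` map used to describe the spanning tree and a dictionary `received` that stands in for each node's local copy:

```python
def distribute(node, children, topology, received):
    """Deliver the full topology to node and then to its whole sub-tree."""
    received[node] = topology
    for child in children.get(node, []):
        distribute(child, children, topology, received)
    return received
```

Starting the call at the root leaves every node in the tree holding the same description.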

28
Multiple Initiators
  • What if more than one node is running a Topology
    Acquisition Task at the same time?
  • Confusion!!
  • Problem is solved by using unique identifiers for
    each initiator
  • This identifier is included in each task packet
  • To keep things efficient, we do not want to have
    all the tasks run to completion
  • So we conduct a competition!

29
Multiple Initiators
  • Competition Rules
  • Each node is allowed to belong to at most one
    topology task at a time
  • Each time a node is offered a place in a task, it
    joins the one with the lowest identifier that it
    has seen so far
  • When a topology task discovers that a node has
    already joined a task with a lower identifier,
    that topology task simply dies
  • Eventually, the task with the lowest identifier
    is the only task that runs to completion
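
The competition rules reduce to a small decision made by each node whenever an offer arrives. The sketch below assumes initiator identifiers are comparable integers; the class name `TaskMember` and the three reply strings are hypothetical labels for the outcomes the rules describe.

```python
class TaskMember:
    """Per-node state for the multiple-initiator competition."""

    def __init__(self):
        self.task_id = None  # identifier of the one task this node belongs to

    def on_offer(self, initiator_id):
        if self.task_id is None or initiator_id < self.task_id:
            # Lowest identifier seen so far: join it, abandoning any other.
            self.task_id = initiator_id
            return "accept"
        if initiator_id == self.task_id:
            # Same task, but this node is already in its tree.
            return "reject"
        # The offering task has a higher identifier than one already joined,
        # so the offering task has lost the competition.
        return "die"
```

Because a node only ever moves to a lower identifier, the task with the lowest identifier is never told to die, which is why it is the one that runs to completion.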

30
Topology Changes
  • Topology changes during a topology task cause
    confusion as well
  • Problem is solved by using epoch numbers
  • Each node keeps an epoch number
  • The epoch number identifies the epoch in which the
    node's topology task is running
  • When a node detects a change in a direct link it
  • Forgets all of its old topology task state
  • Increments the epoch number
  • Starts a new topology task

31
Topology Changes
  • When any node receives a topology task packet it
    compares the epoch number to its own epoch
    number
  • If older it ignores the packet
  • If the same it processes the packet
  • If newer it forgets its task state and processes
    the packet using the new epoch number
  • If any topology task runs to completion then you
    must have reached a stable network state
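
The epoch rules can be sketched as a per-node packet filter. Everything beyond the three comparisons is an assumption for illustration: the class name `EpochNode`, the list used as task state, and the `process` placeholder that merely records the payload.

```python
class EpochNode:
    """Per-node epoch handling for topology task packets."""

    def __init__(self):
        self.epoch = 0
        self.task_state = []

    def on_link_change(self):
        # A direct link changed: forget old state and move to a new epoch.
        self.task_state = []
        self.epoch += 1
        # A real switch would start a new topology task here (not modeled).

    def on_packet(self, packet_epoch, payload):
        if packet_epoch < self.epoch:
            return "ignored"  # packet from an older epoch
        if packet_epoch > self.epoch:
            # Newer epoch: forget the current task state and adopt it.
            self.task_state = []
            self.epoch = packet_epoch
        self.process(payload)
        return "processed"

    def process(self, payload):
        # Placeholder for the real propagation and collection handling.
        self.task_state.append(payload)
```

A packet from an older epoch is dropped without touching local state, so any task that does run to completion was built entirely within one epoch.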

32
Conclusion
  • Autonet has shown, in a moderate-sized LAN, that
    it can run itself with very little intervention
    by humans
  • Autonet succeeds in hiding network failures very
    well
  • Even allows switch repairs without network
    disruption
  • Skeptic mechanism prevents constant
    reconfiguration due to unreliable hardware
  • At the same time it allows quick recovery from
    isolated failures

33
Conclusion
  • Automatic reconfiguration becomes prohibitively
    slow in large WANs, even with today's
    processing power and bandwidth
  • It is clear that Autonet needs some serious
    optimizations
  • Further research has been done towards optimizing
    automatic reconfiguration [2]

34
References
  • [1] T. L. Rodeheffer and M. D. Schroeder,
    "Automatic Reconfiguration in Autonet," Technical
    Report 77, SRC Research, Sept. 1991.
  • [2] T. M. Pinkston, R. Pang, and J. Duato,
    "Deadlock-Free Dynamic Reconfiguration Schemes
    for Increased Network Dependability," IEEE
    Transactions on Parallel and Distributed Systems,
    vol. 14, no. 8, pp. 780-794, Aug. 2003.