Automatic Reconfiguration in Autonet Research by Thomas L. Rodeheffer and Michael D. Schroeder - PowerPoint PPT Presentation

Provided by: sjs64

Learn more at: http://www.cs.sjsu.edu


1
Automatic Reconfiguration in Autonet Research
by Thomas L. Rodeheffer and Michael D. Schroeder

Presented by
Arnold Suvatne and Michael Shiau

2
What is Autonet?

Switch-based LAN that automatically reconfigures
itself when network changes are detected
Responds to both failures and additions of
network components
Works well in a LAN configuration but cannot
effectively scale to a WAN

3
Autonet Goals

Quick Reconfiguration
Ensure reconfiguration is quick enough so that
upper level layers do not notice any disruption
High Availability
Utilize redundant topology to ensure full
interconnection

4
Automatic Reconfiguration Mechanism

Processor in each switch performs 3 main tasks
Monitoring Task
Periodically check directly connected links to
determine whether they are broken or working
Topology Acquisition Task
Collect and distribute topology description of
the entire network to every node
Routing
Each switch uses the new topology description to
compute its own forwarding table

5
Monitoring Task

Determines if neighborhood links are useful
What does it mean to be useful?
Allows packet transfer between two distinct nodes
with low error rates
Two types of events
Link Failure: the response is quick
Link Recovery: the response is not as quick
Monitoring Task utilizes a method called the
Skeptic

6
Skeptic

What does the Skeptic Do?
Prevents constant reconfiguration due to
intermittent links or bursts of failures
Delays recovery of a link that has a history of
failures
Skepticism Level
The higher the skepticism level the more
skeptical we are about the reliability or quality
of a link
Generally numbered from 0-20
Used in calculation of wait time (wtime) and good
time (gtime)

7
Skeptic

Wait Time
wtime = wbase + wmult * 2^level
wbase and wmult are policy parameters and level
is the skepticism level
Time that a switch must wait before transitioning
to a good state
Good Time (gtime)
gtime = gbase + gmult * 2^level
gbase and gmult are policy parameters
Time that a switch must stay in a good state in
order to forgive a skepticism level
"Forgive" means reducing the skepticism level by
1
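
The two formulas on this slide can be sketched in Python. This is only an illustration: it assumes the formulas read wtime = wbase + wmult * 2^level and gtime = gbase + gmult * 2^level, and the function names and parameter values are hypothetical, not taken from Autonet.

```python
def wait_time(level: int, wbase: float, wmult: float) -> float:
    """Wait time before a link may be declared good again."""
    return wbase + wmult * 2 ** level


def good_time(level: int, gbase: float, gmult: float) -> float:
    """Time a link must stay good before one skepticism level is forgiven."""
    return gbase + gmult * 2 ** level


if __name__ == "__main__":
    # Hypothetical policy parameters (seconds), chosen only for illustration.
    for level in (0, 5, 10, 20):
        print(level, wait_time(level, wbase=1.0, wmult=0.5))
```

Because the level appears as an exponent, each additional failure roughly doubles the wait, which is what keeps an intermittent link from triggering constant reconfiguration.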

8
Skeptic Internals [1]

9
Skeptic

Skeptic Design Requirements [1]
A link with a good history must be allowed to
fail and recover several times without
significant penalty
In the worst case, a link's average long-term
failure rate must not be allowed to exceed some
low rate
Common behaviors shown by bad links should result
in exceedingly low average long-term failure
rates
A link that stops being bad must eventually be
forgiven its bad history

10
Monitoring Task Layers

Monitoring Task has 2 Layers
Transmission Layer
Deals with failure and recovery events at the
hardware level
Connectivity Layer
Deals with failure and recovery events at the
network layer

11
Transmission Layer

Purpose
Watches link hardware to determine if link is
successful at sending and receiving data
Doesn't care where packets are coming from or
going to
Passes its conclusion to the connectivity layer
Isolates broken link by setting switch to discard
all incoming and outgoing packets on that link
Composed of
Fault Monitor
Skeptic
Three Error Detectors
Round Trip Verifier

12
Transmission Layer [1]

13
Fault Monitor

Structure
Lowest level of the transmission layer
Takes three inputs
From error detectors (failure)
From round trip verifier (failure)
From dummy object (working)
Purpose
To pass input into skeptic for checking or
rechecking link status

14
Corrupt Packet Detector

Examines packets received by switch control
Checks for invalid packet lengths and corruptions
via CRC
Allows for some corruption
Uses Leaky Bucket Mechanism
Each time corrupt packet is detected, a token is
placed inside bucket
Tokens leak out of bucket every 10 min
If adding a token causes the bucket to hold more than 5 tokens,
a fault is declared.
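
A minimal Python sketch of the leaky bucket described above. The class and method names are hypothetical, and it assumes that exactly one token leaks out per 10-minute interval, which the slide does not state explicitly.

```python
class LeakyBucket:
    """Fault detector that tolerates a bounded rate of errors."""

    def __init__(self, limit: int = 5, leak_interval: float = 600.0):
        self.limit = limit                  # fault when tokens exceed this
        self.leak_interval = leak_interval  # seconds per leaked token (10 min)
        self.tokens = 0
        self.last_leak = 0.0                # time of the most recent leak

    def _leak(self, now: float) -> None:
        # Assumption: one token leaks out per elapsed interval.
        leaked = int((now - self.last_leak) // self.leak_interval)
        if leaked > 0:
            self.tokens = max(0, self.tokens - leaked)
            self.last_leak += leaked * self.leak_interval

    def record_error(self, now: float) -> bool:
        """Add one token for a corrupt packet; return True if a fault is declared."""
        self._leak(now)
        self.tokens += 1
        return self.tokens > self.limit
```

With these defaults, six corrupt packets in quick succession declare a fault, while one corrupt packet every 700 seconds never does.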

15
Stuck Link Detector

Discover links that become stuck in a state that
prevents data transmission
This happens when a command code is
mistransmitted
Allow occasional occurrences of stuck links
Impose stuck link occurrence quota via Leaky
Bucket Method

16
Violation Detector

Detects coding Violations which are caused by
static on the line
Detects formation violations which are caused by
lost or ill-formed information
Should allow for isolated instances of both types
of errors as they are quite common
Acceptable rate depends on skeptic
A working link can have 3 errors
A broken link cannot have any errors

17
Round Trip Verifier [1]

Purpose
Allows confirmation between two nodes to decide
whether a link is working or not
Filters the output of the skeptic by delaying
working messages until confirmation
Implementation
Implemented as a 3-state machine (diagram on the original slide)

18
Connectivity Layer

Reached when Transmission Layer declares a link
as working
Does care where a packet is coming from or going
to
Packets are communicated between the nodes to
determine identity and usefulness.
Filters out links that do not connect anywhere or
that connect back to the same switch

19
Connectivity Layer Structure

Fault Monitor receives working or broken result
from transmission layer
Round trip verifier will delay working messages
until it has exchanged packets and determined the
identity of the remote node
Distinct Node verifier will make sure that the
two nodes communicating are indeed different

[1]
20
Round Trip Verifier

Filters the output of the skeptic until it has
determined the identity of the remote node
Exchanges connectivity packet with remote link to
test communication between two links
Tests a link vigorously if there is reason to
believe that testing might change result
How can it be done?
If a previously working link stops responding,
test packets are sent at an exponentially
decreasing rate until communication resumes
This exponential back off allows for occasional
losses or delays in response without hindering
the system performance
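
The backoff idea can be sketched as a generator of gaps between test packets. This is an illustration only: the function name, the base interval, the doubling factor, and the cap are all assumptions, not values from the paper.

```python
from itertools import islice


def probe_intervals(base: float = 0.1, factor: float = 2.0, cap: float = 10.0):
    """Yield successive gaps (seconds) between test packets on a silent link."""
    interval = base
    while True:
        yield interval
        # Each unanswered probe stretches the gap, up to a fixed ceiling.
        interval = min(interval * factor, cap)


if __name__ == "__main__":
    # First six gaps for a link that never answers.
    print(list(islice(probe_intervals(1.0, 2.0, 8.0), 6)))
```

A verifier built this way would restart the generator whenever a reply arrives, so a link that has just shown signs of life is tested vigorously again.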

21
Distinct Node Verifier

Makes sure that the identity of the remote node
is not the same as the local node.
A link that connects a node back to itself should not be considered working
Change in status from working to broken or vice
versa will cause a re-computation of topology.

22
Topology Acquisition Task

Responsibility of Topology Acquisition Task
Collect and distribute the new topology
description of the entire network to every node
Three scenarios
Single node initiates the task and runs to
completion
Multiple initiators
Topology changes while performing task

23
Basic method

Topology Acquisition Task consists of 3 phases
Propagation
From top down, construct a rooted spanning tree
over the set of all reachable nodes
Collection
From bottom up, discover and merge descriptions
of larger and larger sub-trees
Distribution
From root node down, send complete description of
the network topology to every node

24
Propagation Phase

Starting at the initiator (root node)
Each node N offers each of its neighbors, M, an
option to join the spanning tree as a child of N
If M has not already joined the tree, it accepts
N as its parent and joins the tree
Otherwise it rejects the offer
The spanning tree is complete when all nodes have
received responses, accept or reject, from all
their neighbors
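
A sequential Python sketch of the propagation phase. In Autonet this runs as a distributed exchange of offer, accept, and reject messages; the breadth-first loop below only simulates that exchange on one machine. The function name and the adjacency-dictionary representation are assumptions made for illustration.

```python
from collections import deque


def build_spanning_tree(adjacency: dict, root: str) -> dict:
    """Return {node: parent} for every node reachable from root."""
    parent = {root: None}          # the initiator is the root of the tree
    queue = deque([root])
    while queue:
        n = queue.popleft()
        for m in adjacency[n]:     # N offers each neighbor M a place as its child
            if m not in parent:    # M has not joined yet, so it accepts
                parent[m] = n
                queue.append(m)
            # otherwise M has already joined and rejects the offer
    return parent


if __name__ == "__main__":
    net = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"], "D": ["B"]}
    print(build_spanning_tree(net, "A"))
```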

25
Collection Phase

Each node that accepted an offer to be a child
during the propagation phase
Commits to providing to its parent a description
of its neighborhood (sub-tree)
Collection phase begins at the leaves of the
spanning tree and rises up to the root
A node knows it is a leaf node when all of its
neighbors refuse to be children

26
Collection Phase

Non-leaf nodes
Wait for all of their children to send their
neighborhood descriptions
Merge the descriptions
Send the result to their parent
Eventually the collection phase reaches the root
node
A final merge produces the entire description of
the network
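
The bottom-up merge can be sketched recursively in Python. This is a single-machine illustration under assumed data structures: `children` maps each node to the children it gained during propagation, `adjacency` maps each node to its neighbors, and a description is represented as a set of undirected links. None of these names come from the paper.

```python
def collect(node: str, children: dict, adjacency: dict) -> set:
    """Return the merged description (a set of links) of node's sub-tree."""
    # A node starts with the links in its own neighborhood.
    description = {frozenset((node, m)) for m in adjacency[node]}
    # A leaf has no children, so the loop body never runs for it.
    for child in children.get(node, []):
        description |= collect(child, children, adjacency)
    return description


if __name__ == "__main__":
    net = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"], "D": ["B"]}
    tree = {"A": ["B", "C"], "B": ["D"]}
    print(len(collect("A", tree, net)))   # number of distinct links found
```

Calling `collect` on the root corresponds to the final merge: the result covers every link in the reachable network, including links such as B-C that are not part of the spanning tree.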

27
Distribution Phase

The root node sends the entire network
description to all of its children
Those nodes would then send it to their children,
and so on
In the end, every node gets a full description of
the network topology

28
Multiple Initiators

What if more than one node is running a Topology
Acquisition Task at the same time?
Confusion!!
Problem is solved by using unique identifiers for
each initiator
This identifier is included in each task packet
To keep things efficient, we do not want to have
all the tasks run to completion
So we conduct a competition!

29
Multiple Initiators

Competition Rules
Each node is allowed to belong to at most one
topology task at a time
Each time a node is offered the chance to join a task, it
joins the one with the lowest identifier that it
has seen so far
When a topology task discovers that a node has
already joined a task with a lower
identifier, that topology task simply dies
Eventually, the task with the lowest identifier
is the only task that runs to completion
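
The competition rules can be sketched as the decision a single node makes when it receives an offer. The class and method names are hypothetical, and identifiers are assumed to be comparable integers.

```python
class Node:
    """A node that belongs to at most one topology task at a time."""

    def __init__(self):
        self.task_id = None  # identifier of the task this node has joined

    def offer(self, task_id: int) -> bool:
        """Handle an offer to join task_id; return True if the node joins."""
        if self.task_id is None or task_id < self.task_id:
            # Lowest identifier seen so far: abandon any other task and join.
            self.task_id = task_id
            return True
        # Already in this task or a lower-numbered one: reject the offer.
        return False
```

A task whose offer is rejected in favor of a lower identifier makes no further progress at that node, so only the lowest-numbered task can span the whole network.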

30
Topology Changes

Topology changes during a topology task cause
confusion as well
Problem is solved by using epoch numbers
Each node keeps an epoch number
The epoch number identifies the epoch in which the
node's topology task is running
When a node detects a change in a direct link it
Forgets all of its old topology task state
Increments the epoch number
Starts a new topology task

31
Topology Changes

When any node receives a topology task packet it
compares the epoch number to its own epoch
number
If older it ignores the packet
If the same it processes the packet
If newer it forgets its task state and processes
the packet using the new epoch number
If any topology task runs to completion then you
must have reached a stable network state
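
The epoch rules from the last two slides can be sketched as follows. The class, its fields, and the string return values are hypothetical; starting the new topology task is left as a comment.

```python
class TopologyNode:
    """Tracks the epoch number and per-task state of one node."""

    def __init__(self):
        self.epoch = 0
        self.task_state = {}

    def on_link_change(self) -> None:
        """A directly connected link changed status."""
        self.task_state = {}   # forget all old topology task state
        self.epoch += 1        # move to a new epoch
        # A new topology task would be started here.

    def on_packet(self, packet_epoch: int) -> str:
        """Apply the epoch comparison to an incoming topology task packet."""
        if packet_epoch < self.epoch:
            return "ignored"               # older epoch: drop the packet
        if packet_epoch > self.epoch:
            self.task_state = {}           # newer epoch: forget task state
            self.epoch = packet_epoch      # and adopt the new epoch number
        return "processed"                 # same (or newly adopted) epoch
```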

32
Conclusion

Autonet has shown, in a moderate-sized LAN, that
it can run itself with very little intervention
by humans
Autonet succeeds in hiding network failures very
well
Even allows switch repairs without network
disruption
Skeptic mechanism prevents constant
reconfiguration due to unreliable hardware
At the same time it allows quick recovery from
isolated failures

33
Conclusion

Automatic reconfiguration becomes prohibitively
slow in large WANs, even with today's
processing power and bandwidth
It is clear that Autonet needs some serious
optimizations
Further research has been done towards optimizing
automatic reconfiguration [2]

34
References

[1] T. L. Rodeheffer and M. D. Schroeder,
"Automatic Reconfiguration in Autonet," Technical
Report 77, SRC Research, Sept. 1991.
[2] T. M. Pinkston, R. Pang, and J. Duato,
"Deadlock-Free Dynamic Reconfiguration Schemes
for Increased Network Dependability," IEEE
Transactions on Parallel and Distributed Systems,
vol. 14, no. 8, pp. 780-794, August 2003.
