1
Virtual and Redundant Switches
  • IRAM Retreat Winter 2001
  • Sam Williams

2
Outline
  • Motivation
  • Existing Products
  • Arrayed Commodity Switches
  • Adding Redundancy
  • Optimizing
  • Generalization
  • Conclusions

3
Motivation
  • Switch cost grows very quickly with port count
  • O(Ports²) for crossbar-based designs
  • Additionally, address tables and buffers must
    grow
  • Industry-leading MTBF for a single switch is
    about 50K hours
  • and typical is perhaps only 25K
  • Modular switches provide redundancy for
    management and power, but not for the data
    transport fabric.
  • MTTR is typically over 1 hour
  • Can the money saved by cascading commodity
    switches be applied toward improved performance
    or redundancy?
  • The goals are to improve MTBF, improve
    performance, and simplify the work that must be
    done to replace a failed switch.

4
Existing Products
  • Existing modular aggregators can merge several
    smaller switches (modules) into a single large
    virtual switch.
  • In this case, each 36-port switch module has a
    pair of gigabit uplinks to the switching fabric,
    which has either 6 or 24 gigabit ports (full
    duplex).
  • Redundancy is also provided for management
    modules, fans, and power supplies —
  • however, not for switch modules or the
    switching fabric.
  • So if the switching fabric fails, the entire
    device fails; but if an individual switching
    module fails, then only that sub-network
    fails.
  • Management modules can assign priority to
    improve performance for critical activity.

3Com Switch 4007
5
Existing Products (Analysis)
  • The cost analysis here is based on either
    18 or 48 Gbps switching fabrics, 36-port
    switching modules, and either a 7- or 13-bay
    chassis.
  • Performance is the slowdown in the time to send
    from every node to every other node, compared
    to a true n×36-port switch.
  • MTBF is for a failure anywhere in the network.
  • MTTR was at least 1 hour.
  • Repair cost is about $4000/failure;
    modularization helps to keep this low, but
    yearly maintenance cost will grow with the
    number of ports.

6
Examples of failure
  • A switching module fails: each of the
    nodes/sub-networks attached to it is now
    disconnected from all other nodes.
  • This is the more likely case.
  • The switching fabric fails: each of the switches
    is now disconnected from the others, but nodes
    attached to the same switch can still
    communicate with each other.

7
Examples of failure (continued)
  • Redundancy allows for this failure, with reduced
    performance.
  • These are not commodity switches, and are
    considerably more expensive.
  • However, in this case, the failure does cause a
    network split.
  • This is the more likely case, so why not allow
    the extra switch to be used to cover any other
    switch's failure?
  • This could be extended to nodes, but then you
    pay double for NICs and ports.

8
Virtual switch from commodity switches
  • Although without the management functions and
    performance, cheaper virtual switches can be
    built by doing nothing more than cascading
    commodity switches.
  • This is based on 5-, 8-, 16-, and 24-port
    switches, each with the last port of MDI type,
    from 5 different companies.
  • Performance is poor since the uplinks are only
    100 Mbps.
  • Adding a second uplink port only moderately
    alleviates this deficiency.
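A rough bottleneck model (my own illustration, with assumptions not in the slides) shows why 100 Mbps uplinks dominate all-to-all performance: most of each node's traffic is destined off-switch and must squeeze through the shared uplinks.

```python
# Illustrative model, assumptions mine: `switches` leaf switches of
# `ports` nodes each, node ports at `port_bw`, and `uplinks` shared
# uplinks per leaf at `uplink_bw`. In all-to-all traffic, the fraction
# of a node's destinations that sit off-switch must cross the uplinks.

def all_to_all_slowdown(switches, ports, port_bw, uplinks, uplink_bw):
    total = switches * ports
    off_switch_frac = (total - ports) / (total - 1)  # off-switch share
    demand = ports * port_bw * off_switch_frac       # offered uplink load
    supply = uplinks * uplink_bw
    return max(1.0, demand / supply)                 # 1.0 = no slowdown

# 8 cascaded 24-port 100 Mbps switches, one 100 Mbps uplink each:
print(all_to_all_slowdown(8, 24, 100, 1, 100))   # roughly a 21x slowdown
# The same topology with one gigabit uplink per leaf:
print(all_to_all_slowdown(8, 24, 100, 1, 1000))  # close to 2x
```

Under this model a second 100 Mbps uplink only halves the bottleneck, matching the observation that it merely moderates the deficiency.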

9
Virtual switch from mid-range switches
  • By using switches more suited to this design
    (with higher-speed uplinks), we can improve
    performance.
  • These designs use an 8- or 24-port switch at the
    bottom, each with 1 or 2 gigabit uplink modules,
    and a 4-, 8-, or 12-port gigabit switch at the
    top.
  • The gigabit uplinks and gigabit switches drive
    cost to at least twice that of the commodity
    solution, but with 10x better performance.
  • Performance is near that of a monolithic switch
    if 2 uplinks are used.
  • Compared to the packaged solution, it is about
    half the cost, with slightly less performance,
    but no management functionality.

10
Port Virtualization for Redundancy
  • The re-mapping stage is much simpler than a full
    n×m-port switch. Essentially, each of the m
    n-bit busses is mapped to one of the k n-bit
    internal busses, which are connected directly to
    the switches.
  • In this example, each of the 4 groups of 8
    virtual ports is mapped to one of the 5 groups
    of physical ports. The uplinks of the
    first-stage switches are sent back and into one
    of the top-level switches.
  • An even simpler solution, for single redundancy,
    would be to map either directly or to the spare.
  • In this design, the single point of failure is
    the re-mapping block, since the first- and
    second-level switches have redundancy.
  • For the example below, MTBF improves from about
    208 days to 347 days.
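The remapping logic described above can be sketched in a few lines. This is an illustration under my own naming, not the actual design: virtual port groups start on an identity mapping, and a failed physical group is swapped for a spare.

```python
# Hedged sketch of the re-mapping stage: m virtual port groups steered
# onto k physical groups (k > m; the extras are spares). Class and
# method names are mine, for illustration only.

class PortRemapper:
    def __init__(self, virtual_groups, physical_groups):
        assert physical_groups >= virtual_groups
        self.spares = list(range(virtual_groups, physical_groups))
        # Identity mapping to start: virtual group i -> physical group i.
        self.mapping = {v: v for v in range(virtual_groups)}

    def on_switch_failure(self, physical_group):
        """Remap whatever virtual group used the failed switch to a spare."""
        for v, p in self.mapping.items():
            if p == physical_group:
                if not self.spares:
                    raise RuntimeError("no spare left: network splits")
                self.mapping[v] = self.spares.pop()
                return self.mapping[v]
        return None  # the failed switch was an unused spare

# 4 groups of 8 virtual ports over 5 physical groups (1 spare), as above:
r = PortRemapper(4, 5)
r.on_switch_failure(2)  # physical group 2 dies; its load moves to group 4
print(r.mapping)        # {0: 0, 1: 1, 2: 4, 3: 3}
```

Note that once the lone spare is consumed, any further failure splits the network, matching the operational slides that follow.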

11
Operation (Homogeneous Switches)
  • In this somewhat rigid example, there are 6
    bays: 4 map either directly or to the spare,
    one slot holds the switching fabric, and one
    slot holds the redundant switch, which can
    replace either of the other two classes.
  • In this case, the switching-fabric switch
    failed, and the uplink ports were remapped to
    the spare.
  • At this point the admin must replace the failed
    switch. If any other switch fails before then,
    the network will be partially split.

12
Operation - continued
  • In this case, one of the first-level switches
    failed. Instead of those nodes losing their
    connection to the rest of the network, they are
    remapped to the spare.
  • Once again, the admin must replace the failed
    switch. If any other switch fails before then,
    the network will be partially split.
  • If instead the spare itself had gone down, it
    would need to be replaced to restore redundancy.

13
Port Virtualization for Higher Performance
  • The previous performance analysis was based on
    1-to-all messaging.
  • However, it is likely that network access
    patterns can be broken into groups with high
    inter-node communication.
  • Thus, monitoring can be performed, and the
    network can be periodically partitioned into
    activity groups.
  • Create a graph weighted by the bandwidth used
    between nodes, then use something like
    Kernighan-Lin partitioning to separate it into
    a number of partitions equal to the number of
    first-stage switches (a power of 2).
  • The re-mapping stage is only slightly simpler
    than a full n×m-port switch (no buffers, never
    any contention, etc.).

Logical view: 3 switches reserved as spares; 1
failed, and the network was repartitioned.
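The bandwidth-based partitioning step can be illustrated with a toy stand-in. The slides suggest something like Kernighan-Lin; the greedy bisection below is a deliberately simplified substitute of my own, just to show the shape of the problem (traffic graph in, balanced node groups out).

```python
# Hedged illustration: split nodes into two balanced groups so that
# heavily communicating pairs land on the same switch. A real design
# would use Kernighan-Lin; this greedy pass is a simple stand-in.

def greedy_bisect(bandwidth):
    """bandwidth: {(a, b): observed_traffic}. Returns two balanced sets."""
    nodes = sorted({n for pair in bandwidth for n in pair})
    half = len(nodes) // 2
    left, right = set(), set()

    def affinity(n, side):
        # Traffic between node n and the nodes already placed on `side`.
        return sum(w for (a, b), w in bandwidth.items()
                   if (a == n and b in side) or (b == n and a in side))

    for n in nodes:
        if len(left) >= half:                  # keep the halves balanced
            right.add(n)
        elif len(right) >= len(nodes) - half:
            left.add(n)
        elif affinity(n, left) >= affinity(n, right):
            left.add(n)
        else:
            right.add(n)
    return left, right

# a<->b and c<->d talk heavily; cross traffic is light:
traffic = {("a", "b"): 900, ("c", "d"): 800, ("a", "c"): 10, ("b", "d"): 5}
left, right = greedy_bisect(traffic)
print(sorted(left), sorted(right))  # ['a', 'b'] ['c', 'd']
```

Recursing on each half would yield the power-of-2 partition count the slide calls for, one partition per first-stage switch.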
14
Performance / Availability
  • MTTR for aggregators was typically over an hour,
    on top of the time to detect the failure.
  • By automating recovery, the downtime can be
    significantly reduced.
  • This is dependent on timely detection of a
    failed switch, which could be handled via
    packet injection.
  • Once the failing switch is identified, a new
    mapping can quickly be determined.
  • In the performance-optimizing case, restoring
    connectivity is the top priority; a previously
    scheduled performance repartitioning can be
    done later.
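Detection by packet injection might look like the loop below. This is a sketch under my own assumptions: `probe` stands in for whatever mechanism actually injects a packet through a switch and observes whether it returns, and the miss threshold is arbitrary.

```python
# Hedged sketch of failure detection via packet injection: poll every
# switch each round and declare a failure after several consecutive
# missed probes. `probe` and `on_failure` are illustrative stand-ins.

def monitor(switches, probe, on_failure, rounds, misses_allowed=3):
    """Poll each switch `rounds` times; report a switch once it has
    missed `misses_allowed` probes in a row."""
    misses = {s: 0 for s in switches}
    for _ in range(rounds):
        for s in switches:
            if probe(s):               # True if the injected packet came back
                misses[s] = 0
            else:
                misses[s] += 1
                if misses[s] == misses_allowed:
                    on_failure(s)      # e.g. trigger a remap to the spare

# Simulate switch "B" going silent:
failures = []
monitor(["A", "B", "C"], lambda s: s != "B", failures.append, rounds=5)
print(failures)  # ['B']
```

The consecutive-miss threshold trades detection latency against false positives from transient probe loss; a real deployment would tune both it and the polling interval.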

15
Generalization
  • Use homogeneous switches, with a mapping layer
    that maps physical to virtual ports. This can
    range from a simple 1-to-2 mapping to a complex
    1-to-n mapping with performance monitoring and
    repartitioning. Performance can be gained by
    using some faster switches where needed.

16
Conclusion
  • It is possible to make a larger virtual switch
    out of smaller switches, and still get
    reasonable performance.
  • With little additional hardware and a monitoring
    agent, it is possible to make it fault tolerant,
    with several spare switches that can be
    automatically swapped in. The simple case costs
    O(Spares × Ports);
  • more complex designs make it O(Ports²).
  • With a very simple but large switch, it is also
    possible to optimize for performance by
    balancing network bandwidth among the switches
    in the pool. This is a much more costly
    solution.
  • A generalization would provide a pool of
    switches connected by the port mapper, with
    some (or none) reserved as spares.
  • Both of these concepts and their functionality
    could be integrated into a single ASIC, or even
    implemented with a network processor.

17
Future Work
  • How do switches fail? This determines the
    failure detection method.
  • Implementation of a type 1 or 2 switch would be
    possible given the relatively simple mapper.
  • Type 3, 4, or 5 would require a complex ASIC,
    which should be replaced with a network
    processor and software.