Cluster Based Scalable Routing CBSR - PowerPoint PPT Presentation

1
Cluster Based Scalable Routing(CBSR)
  • Yi Yang
  • Jiuliu Lu
  • Jing Li
  • Dan Fowlkes

2
Presentation Outline
  • Introduction
  • Objectives
  • Motivation
  • Related Work: SPINE
  • Myrinet and GM

3
Presentation Outline (cont.)
  • Cluster-Based Scalable Router (CBSR)
  • System Architecture
  • System Set-up
  • Implementation and test
  • Summary & Future Work

4
Introduction
  • network transmission speeds continue to
    improve
  • demand for network usage is also increasing as
    more and more people (and their toasters and,
    pretty soon, we imagine, their pets) get on-line.

5
Introduction (cont.)
  • network performance depends on more than
    transmission speed alone: to maximize overall
    network performance, other components of the
    network must be fine-tuned as well.
  • routers used to be the bottleneck
  • now specially designed high-speed routers are
    available -- and EXPENSIVE

6
Introduction (cont.)
  • with PC prices falling, there may be a
    cost-effective alternative to shelling out
    the big bucks for one of these routers -->
    perform high-speed routing using clusters of
    workstations!

7
Objectives
  • demonstrate feasibility of using a cluster of
    workstations to do routing
  • to simplify, we concentrated on creating an
    implementation that works rather than one that
    could compete with commercial, specially-designed
    high-speed routers

8
Objectives (cont.)
  • ensure that the system is scalable (non-scalable
    routers are of rather limited use)
  • gauge the performance of the network

9
Motivation
  • Yi Yang: Get a good grade.
  • Jiuliu Lu: Get a good grade.
  • Jing Li: Curiosity (auditing the class).
  • Dan: Ooooh... gummy bears..... err, I mean,
    Get a good grade.

10
Motivation
  • To keep from having to shell out the big money
    for a specially designed high-speed router by
    using clusters of workstations to achieve the
    same functionality.

11
Related Work: SPINE
  • Guiding principle: move application-specific
    functionality directly onto the network
    interface
  • should improve overall system performance by
    reducing the I/O related data and control
    transfers to the host system

12
Related Work: SPINE (cont.)
  • they migrate an application's I/O-specific
    functionality into device extensions
  • extension: "code that is logically part of the
    application, but runs directly on the network
    interface."

13
Related Work: SPINE (cont.)
  • defines interfaces that enable OSs to compute
    directly on an intelligent network interface
  • Aim: efficiently implement three methods crucial
    to efficient I/O, in order to offer developers
    an architecture geared towards I/O-intensive
    applications

14
RW SPINE, Method 1
  • Device-to-device transfers.
  • avoid extra copies of data, so bandwidth needs in
    and out of host memory and over a shared bus are
    significantly reduced
  • intelligent devices can process data prior to
    transferring it to a peer device in order to
    avoid unnecessary control transfers to the host
    system
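A toy model of why device-to-device transfers save bandwidth, assuming a packet staged through the host crosses the shared bus twice (device into host memory, then host memory out to the peer device) while a direct peer transfer crosses it once. The function names are hypothetical and the model ignores control traffic.

```python
def bus_bytes(packet_size: int, via_host: bool) -> int:
    """Bytes crossing the shared I/O bus for one packet (hypothetical model).

    via_host=True:  device A -> host memory -> device B (two bus crossings)
    via_host=False: device A -> device B directly     (one bus crossing)
    """
    crossings = 2 if via_host else 1
    return crossings * packet_size


def host_memory_bytes(packet_size: int, via_host: bool) -> int:
    # Staging through the host writes the packet into memory once and
    # reads it back once; a peer-to-peer transfer never touches host memory.
    return 2 * packet_size if via_host else 0


print(bus_bytes(1000, via_host=True), bus_bytes(1000, via_host=False))
```

For a 1,000-byte packet this gives 2,000 versus 1,000 bytes on the bus, which is the halving of bus traffic the slide alludes to.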

15
RW SPINE, Method 2
  • Host/Device protocol partitioning.
  • system performance can be improved through
    quality of service, packet filtering, and
    low-level protocol support for application-specific
    multicast.
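As an illustration of pushing packet filtering down to the device, here is a minimal sketch assuming a filter that runs on an intelligent NIC and forwards only packets whose (protocol, destination port) pair the host has asked for. The Packet type, the rule format and device_filter are hypothetical, not SPINE interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    protocol: str   # e.g. "udp" or "tcp" (illustrative only)
    dst_port: int
    payload: bytes


def device_filter(packets, allowed):
    """Runs 'on the NIC': only packets matching an allowed
    (protocol, dst_port) pair are transferred to the host.
    Returns (packets delivered to the host, number dropped on the device)."""
    delivered = [p for p in packets if (p.protocol, p.dst_port) in allowed]
    return delivered, len(packets) - len(delivered)
```

Every dropped packet is one DMA transfer and one host interrupt that never happen, which is where the performance gain comes from.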

16
RW SPINE, Method 3
  • Device-level memory management.
  • allow direct transfers between network interface
    and application buffers
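A minimal sketch of the idea behind direct transfers into application buffers, assuming the application registers a buffer with the device ahead of time. Python's memoryview stands in for the device's DMA window: writes through it land in the application's own bytearray with no intermediate staging copy. RegisteredBuffer and device_write are hypothetical names.

```python
class RegisteredBuffer:
    """Hypothetical application buffer registered with the device, so that
    incoming data is placed directly into it."""

    def __init__(self, size: int):
        self.storage = bytearray(size)        # owned by the application
        self.view = memoryview(self.storage)  # window the "device" writes through

    def device_write(self, offset: int, data: bytes) -> None:
        # Simulates the NIC's DMA engine writing straight into the
        # registered region; no kernel buffer sits in between.
        self.view[offset:offset + len(data)] = data


buf = RegisteredBuffer(16)
buf.device_write(4, b"abcd")
print(bytes(buf.storage[4:8]))
```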

17
Myrinet
  • "Myrinet is a cost-effective, high-performance,
    packet-communication and switching technology
    that is widely used to interconnect clusters of
    workstations, PCs, or single-board computers."
  • two-fold benefit

18
Myrinet, Benefit 1
  • high performance
  • distribute demanding computational tasks across
    array of cost-effective hosts
  • given a good-sized array, performance is
    competitive with high-speed routers
  • provide both high data-rate and low latency
    communication between host processes in order to
    support tightly coupled distributed computations

19
Myrinet, Benefit 2
  • high availability
  • achieved by allowing each computation to proceed
    with a subset of the hosts.

20
Myrinet (cont.)
  • can construct router out of cluster of
    workstations using conventional network such as
    Ethernet, but this "router" would provide neither
    the performance nor features necessary for
    high-performance / high-availability clustering.

21
Myrinet (cont.)
  • packets used by Myrinet are not of fixed length
  • may be used to encapsulate other types of
    packets without the need for an adaptation layer
    (including IP packets)

22
Myrinet (cont.)
  • can carry packets of many types and protocols
    concurrently b/c each of these packets is
    identified by type
  • in this way, Myrinet has support for several
    software interfaces.
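The two slides above (variable-length packets, each identified by type) amount to type-tagged encapsulation. A minimal sketch, assuming a hypothetical 2-byte big-endian type field in front of an unmodified payload; this is not the actual Myrinet header layout, and the type codes are made up (TYPE_IP borrows the EtherType value for IPv4).

```python
import struct

TYPE_IP = 0x0800   # hypothetical type code, borrowed from EtherType
TYPE_MSG = 0x0001  # hypothetical type code for native messages


def encapsulate(ptype: int, payload: bytes) -> bytes:
    # 2-byte type tag, then the payload untouched and of any length:
    # no fragmentation or adaptation layer is needed.
    return struct.pack("!H", ptype) + payload


def demultiplex(frame: bytes, handlers: dict):
    # Read the type tag and hand the payload to the matching protocol.
    (ptype,) = struct.unpack_from("!H", frame, 0)
    handler = handlers.get(ptype)
    if handler is None:
        raise ValueError(f"unknown packet type 0x{ptype:04x}")
    return handler(frame[2:])
```

Because the type travels with every packet, IP traffic and native messages can share the same links concurrently, each routed to its own software interface.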

23
GM
  • message-based communication system for Myrinet
  • designed to keep the CPU overhead and latency
    low, the bandwidth high, and to be portable

24
GM Advantages over other Messaging Sys.
  • extremely low overhead (approximately 1 µs per
    packet) on all architectures
  • on systems supporting memory protection,
    provides simultaneous, memory-protected,
    user-level OS-bypass network interface access
    to several user-level applications

25
GM Advantages over other Messaging Sys. (cont.)
  • provides hosts with reliable, ordered delivery
    despite possible faults in the network
  • able to detect and retransmit both lost and
    corrupted packets
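The slides do not describe how GM achieves reliable, ordered delivery. The sketch below only illustrates the general technique (sequence numbers, CRC-32 error detection, retransmission) with a simple stop-and-wait scheme over a simulated faulty channel. It is not GM's protocol, and all names are hypothetical.

```python
import itertools
import zlib


def make_packet(seq, data):
    # Sender attaches a sequence number and a CRC-32 of the payload.
    return (seq, data, zlib.crc32(data))


class Receiver:
    def __init__(self):
        self.expected = 0    # next in-order sequence number
        self.delivered = []  # payloads handed to the host, in order

    def on_packet(self, pkt):
        """Return the cumulative ack (highest in-order seq received),
        or None if the packet is corrupted and silently dropped."""
        seq, data, crc = pkt
        if zlib.crc32(data) != crc:
            return None
        if seq == self.expected:
            self.delivered.append(data)
            self.expected += 1
        # duplicates and out-of-order packets are never delivered twice
        return self.expected - 1


def make_faulty_channel(pattern):
    """Deterministic channel cycling through 'lose', 'corrupt' and 'ok'.
    Payloads are assumed to be non-empty."""
    actions = itertools.cycle(pattern)

    def channel(pkt):
        action = next(actions)
        if action == "lose":
            return None
        if action == "corrupt":
            seq, data, crc = pkt
            return (seq, bytes([data[0] ^ 0xFF]) + data[1:], crc)  # flip a byte
        return pkt

    return channel


def send_all(messages, channel):
    """Stop-and-wait: retransmit each packet until it is acknowledged."""
    rx = Receiver()
    for seq, data in enumerate(messages):
        pkt = make_packet(seq, data)
        while True:
            arrived = channel(pkt)
            ack = rx.on_packet(arrived) if arrived is not None else None
            if ack == seq:
                break  # acknowledged: move on to the next packet
            # lost or corrupted: loop around and retransmit
    return rx.delivered
```

With a channel that loses, then corrupts, then passes every packet, all messages still arrive exactly once and in order.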

26
GM Advantages over other Messaging Sys. (cont.)
  • reroutes packets around any network faults when
    there exists an alternate route
  • catastrophic network errors are nonfatal:
    undeliverable packets are returned to the client
    with an error indication

27
GM Advantages over other Messaging Sys. (cont.)
  • able to support clusters of over 10,000 nodes
  • allows efficient deadlock-free bounded-memory
    forwarding through two levels of message priority
  • automatically maps Myrinet networks.

28
System Architecture
29
System Architecture (cont.)
  • Scalability
  • A CBSR may have a variable number of workstations
    (routing machines).
  • A workstation may have a variable number of NICs
    (network interfaces).
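One way to model this two-level scalability is a cluster of workstations, each with its own NIC count, plus a global interface numbering that maps back to (workstation, local NIC). The Cluster and Workstation classes and the numbering scheme are hypothetical, a sketch of the idea rather than CBSR's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Workstation:
    name: str
    nic_count: int


@dataclass
class Cluster:
    workstations: list = field(default_factory=list)

    def add_workstation(self, name: str, nic_count: int) -> None:
        # Growing the router = adding a machine; each may bring any number of NICs.
        self.workstations.append(Workstation(name, nic_count))

    def interfaces(self):
        """Yield (global_if_id, workstation_name, local_nic_index)."""
        gid = 0
        for ws in self.workstations:
            for nic in range(ws.nic_count):
                yield gid, ws.name, nic
                gid += 1

    def locate(self, global_if_id: int):
        # Map a router-wide interface number back to (machine, local NIC).
        for gid, name, nic in self.interfaces():
            if gid == global_if_id:
                return name, nic
        raise KeyError(global_if_id)
```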

30
System Architecture (cont.)
31
System Architecture (cont.)
  • IP reads the IP headers of datagrams.
  • The SAN doesn't extract information from IP
    datagrams; IP tells it which interface a
    datagram is sent to.
  • IP datagrams are transmitted over the SAN intact.
  • The Myrinet SAN forwards messages with DMA.
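A minimal sketch of this division of labor, assuming a hypothetical 4-byte SAN header carrying (target node, egress interface) prepended to the unmodified IP datagram. The header layout and function names are illustrative only.

```python
import struct

SAN_HDR = struct.Struct("!HH")  # hypothetical: (target node, egress interface)


def wrap_for_san(datagram: bytes, target_node: int, egress_if: int) -> bytes:
    # The SAN layer never parses the IP header; it only prepends the
    # forwarding decision that IP has already made.
    return SAN_HDR.pack(target_node, egress_if) + datagram


def unwrap_from_san(message: bytes):
    # Receiving side: read the decision, recover the datagram byte-for-byte.
    target_node, egress_if = SAN_HDR.unpack_from(message, 0)
    return target_node, egress_if, message[SAN_HDR.size:]
```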

32
System Architecture (cont.)
33
System Architecture (cont.)
  • IP treats datagrams from the NICs and from the
    SAN equally. This approach needs to look up the
    routing table twice.
  • The SAN informs IP which interface a datagram
    should go to. The routing table is consulted only
    once.
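The difference between the two approaches can be sketched with a longest-prefix-match lookup that counts how often the table is consulted. The routing table contents, the workstation names, and the Router class are hypothetical; the linear scan is for clarity, not speed.

```python
import ipaddress

# Hypothetical routing table: prefix -> (egress workstation, local NIC index)
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): ("ws2", 0),
    ipaddress.ip_network("10.0.0.0/8"): ("ws0", 1),
    ipaddress.ip_network("10.1.0.0/16"): ("ws1", 0),
}


class Router:
    def __init__(self, routes):
        self.routes = routes
        self.lookups = 0  # how many times the table was consulted

    def lookup(self, dst: str):
        """Longest-prefix match (the default route always matches)."""
        self.lookups += 1
        addr = ipaddress.ip_address(dst)
        best = max((net for net in self.routes if addr in net),
                   key=lambda net: net.prefixlen)
        return self.routes[best]

    def forward_lookup_twice(self, dst: str):
        # Ingress IP finds the egress workstation; the egress workstation's
        # IP treats the arrival like any other datagram and looks up again.
        egress_ws, _ = self.lookup(dst)
        _, nic = self.lookup(dst)
        return egress_ws, nic

    def forward_lookup_once(self, dst: str):
        # The single (workstation, NIC) result travels with the datagram
        # over the SAN, so the egress side needs no second lookup.
        return self.lookup(dst)
```

Both approaches reach the same egress interface; the second simply halves the table lookups per forwarded datagram.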

34
System Architecture (cont.)
  • Transmission example 1

35
System Architecture (cont.)
  • Transmission example 2

36
System Set-up
37
System Set-up (cont.)
  • FreeBSD 4.0-CURRENT
  • GM API 1.1.1
  • Assume the availability of IP and the IP routing
    table
  • Implement forwarding over the Myrinet SAN

38
More about GM
  • How does GM transmit packets?

39
More about GM(cont.)

40
More about GM(cont.)

41
Implementation of CBSR

[Figure: CBSR implementation; labels: Myrinet, Keys, Receiving Event Processing, Token Based, Host, Interface]
42
Flow of Code
  • GM setup: initialize, open port
  • Prepare buffers: sending buffer, receiving
    buffer, sending content (for test)
  • Receiving event processing
  • Sending and sending event processing
  • GM shutdown
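The "Token Based" label in the implementation figure refers to token-based flow control. Below is a toy model of that loop, written as an assumption-laden sketch: a send consumes a send token that only comes back when the matching "sent" event is processed, and an arriving message needs a receive buffer the host has provided. TokenPort and its methods are hypothetical names; this is not the GM API.

```python
from collections import deque


class TokenPort:
    """Toy model of token-based flow control (hypothetical; not the GM API)."""

    def __init__(self, send_tokens: int, recv_buffers: int):
        self.send_tokens = send_tokens
        self.recv_buffers = recv_buffers  # receive buffers provided to the NIC
        self.events = deque()
        self.received = []

    def send(self, msg) -> bool:
        if self.send_tokens == 0:
            return False                   # caller must process events first
        self.send_tokens -= 1
        self.events.append(("sent", msg))  # completion is reported later
        return True

    def deliver(self, msg) -> bool:
        # A message arriving from the network needs a provided buffer.
        if self.recv_buffers == 0:
            return False
        self.recv_buffers -= 1
        self.events.append(("recv", msg))
        return True

    def poll(self):
        """Process one pending event (the 'event processing' step)."""
        if not self.events:
            return None
        kind, msg = self.events.popleft()
        if kind == "sent":
            self.send_tokens += 1    # sending event: token returned
        else:
            self.received.append(msg)
            self.recv_buffers += 1   # buffer handed back to the NIC
        return kind
```

The point of the design is that the host can never overrun the interface: once tokens or buffers run out, the only way forward is to process events, which is why the flow of code interleaves sending with sending-event and receiving-event processing.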

43
Result of Test

44
Result of Test (cont.)

45
Result of Test (cont.)
  • Small size: 20,000 packets/s, 2.5 MB/s (20 Mb/s)
  • Middle size: 15,000 packets/s, 15 MB/s (120 Mb/s)
  • Big size: 8,000 packets/s, 32 MB/s (256 Mb/s)
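The three results can be cross-checked with a little arithmetic: each Mb/s figure is the MB/s figure times 8, and dividing bytes/s by packets/s gives the average packet size the test must have used. The slides do not state the packet sizes; the values below are only implied by the numbers, assuming 1 MB = 10^6 bytes (with 1 MB = 2^20 bytes they come out slightly larger).

```python
RESULTS = {  # size class -> (packets/s, MB/s), as reported on the slide
    "small": (20_000, 2.5),
    "middle": (15_000, 15),
    "big": (8_000, 32),
}


def implied_packet_bytes(packets_per_s, megabytes_per_s, mb=1_000_000):
    # Average packet size implied by the two measurements.
    return megabytes_per_s * mb / packets_per_s


def to_megabits(megabytes_per_s):
    return megabytes_per_s * 8


for name, (pps, mbytes) in RESULTS.items():
    print(f"{name}: ~{implied_packet_bytes(pps, mbytes):.0f} bytes/packet, "
          f"{to_megabits(mbytes):g} Mb/s")
# small: ~125 bytes/packet, 20 Mb/s
# middle: ~1000 bytes/packet, 120 Mb/s
# big: ~4000 bytes/packet, 256 Mb/s
```

The trend is the usual one: per-packet overhead limits the packet rate, so larger packets trade packets/s for much higher bytes/s.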

46
Result of Test (cont.)
  • Comparison to SPINE: 11,800 packets/s, 0 load
    on the CPU, including routing

47
Next step
  • Integration of routing & forwarding
  • Multiple ports
  • Threads?
  • Test of scalability

48
Issue & challenge
  • Low efficiency: sending side

[Figure: sending-side data path; labels: CPU, Myrinet, Internet, NIC, LANai, Memory, Memory]
49
Issue & challenge (cont.)
  • Low efficiency: receiving side

[Figure: receiving-side data path; labels: CPU, Internet, Myrinet, NIC, LANai, Memory, Memory]
50
Issue & challenge (cont.)
[Figure: labels: CPU, CPU, Internet, Myrinet, Internet]
51
Summary
  • General idea of scalable routing and related work
  • Myrinet and GM
  • CBSR
  • Architecture
  • Implementation
  • Test Result
  • Future work

52
Thanks
  • Prof. Vahdat
  • Andrew Gallatin
  • Prachi, Marty, Kisley
  • Marc Fiuczynski (U. of Washington)
