Transcript and Presenter's Notes

Title: Towards Gigabit


1
Towards Gigabit
  • David Wei
  • Netlab @ Caltech

2
Potential Problems
  • Hardware / Driver / OS
  • Protocol Stack Overhead
  • Scalability of the protocol specification
  • TCP Stability / Utilization (new congestion
    control algorithm)
  • Related Experiments & Measurements

3
Hardware / Drivers / OS
  • NIC Driver
  • Device Management (Interrupts)
  • Redundant Copies
  • Device Polling (http://info.iet.unipi.it/~luigi/polling/)
  • Zero-Copy TCP

www.cs.duke.edu/ari/publications/talks/freebsdcon
4
Device Polling
  • Current process for a NIC driver in FreeBSD
  • Packet arrives at the NIC
  • NIC -> hardware interrupt
  • CPU jumps to the interrupt handler for that NIC
  • MAC-layer processing reads data from the NIC into a queue
  • Upper layers process the data in the queue (lower
    priority)
  • Drawbacks
  • CPU checks the NIC for every packet -- context
    switching
  • Frequent interrupts for a high-speed device
  • Live-lock
  • CPU is too busy servicing NIC interrupts to
    process the data in the queue.

5
Device Polling
  • Device Polling
  • Polling: the CPU checks the device when it has time.
  • Scheduling: the user specifies a time ratio for the CPU
    to spend on devices vs. non-device processing.
  • Advantages
  • Balances device service against non-device
    processing
  • Improves performance for fast devices (see the sketch below)
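
A minimal sketch of the polling idea (hypothetical nic and protocol_queue objects, not the FreeBSD code): on each clock tick the kernel polls the NIC and processes at most a user-controlled budget of packets, leaving the rest of the tick for non-device work.

    def polling_tick(nic, protocol_queue, cpu_share=0.5, tick_us=1000):
        # cpu_share is the user-specified ratio of CPU time for device work.
        budget_us = cpu_share * tick_us
        spent_us = 0
        while spent_us < budget_us and nic.has_packet():
            pkt, cost_us = nic.read_packet()   # pull one frame from the NIC ring
            protocol_queue.append(pkt)         # hand it to the upper layers
            spent_us += cost_us
        # The rest of the tick is left for non-device processing.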

6
Protocol Stack Overhead
  • Per-packet overhead
  • Ethernet header / checksum
  • IP header / checksum
  • TCP header / checksum
  • Copying / interrupt processing
  • Solution: increase the packet size (see the sketch below)
  • Optimal packet size = minimum MTU along the path
    (fragmentation hurts performance too)
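
A rough back-of-the-envelope sketch of why larger packets help; the header sizes (18 B Ethernet framing, 20 B IP, 20 B TCP) are illustrative:

    ETH_HDR, IP_HDR, TCP_HDR = 18, 20, 20   # bytes of per-packet overhead (illustrative)

    def wire_efficiency(payload):
        return payload / (payload + ETH_HDR + IP_HDR + TCP_HDR)

    for mss in (512, 1460, 8960, 65000):
        print(mss, round(wire_efficiency(mss), 3))
    # 512 -> 0.898, 1460 -> 0.962, 8960 -> 0.994, 65000 -> 0.999
    # Per-packet copy and interrupt costs are amortized the same way.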

7
Path MTU Discovery (RFC 1191)
  • Current Method
  • Don't Fragment (DF) bit
  • (Router: drop or fragment; Host: test and enforce)
  • MTU = min(576, first-hop MTU)
  • MSS = MTU - 40
  • MTU < 65535 (architecture limit)
  • MSS < 65495 (IP sign-bit bugs)
  • Drawback: usually too small

8
Path MTU Discovery
  • How to discover the PMTU?
  • Current
  • Search (proportional decrease / binary search)
  • Update (periodically increase; reset to the MTU
    of the first hop)
  • Proposed
  • Search/update with typical MTU values
  • Routers suggest an MTU in the Datagram Too Big (DTB)
    message that reports the DF packet drop.

9
Path MTU Discovery
  • Implementation
  • Host
  • Packetization layer (TCP / connections over UDP):
    sets DF and the packet size
  • IP: stores the PMTU for each known path (routing
    table)
  • ICMP: Datagram Too Big message
  • Router
  • Sends an ICMP message when a datagram is too big
  • Implementation problems: RFC 2923
    (a host-side sketch follows this list)
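
A host-side sketch of the search, assuming a hypothetical send_df_probe(size) callback; the MTU list is an illustrative set of common values rather than the full RFC 1191 plateau table:

    COMMON_MTUS = [65535, 17914, 8166, 4352, 2002, 1492, 1006, 576, 296, 68]

    def discover_pmtu(send_df_probe, first_hop_mtu):
        # send_df_probe(size) returns None if the DF probe got through,
        # otherwise the next-hop MTU reported in the Datagram Too Big message
        # (0 if an old router leaves that field empty).
        pmtu = first_hop_mtu
        while pmtu > 68:
            reported = send_df_probe(pmtu)
            if reported is None:
                break                          # probe fit: pmtu is the path MTU
            # Use the router's suggestion, else step down the table of common MTUs.
            pmtu = reported or next(m for m in COMMON_MTUS if m < pmtu)
        return pmtu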

10
Scalability of Protocol Specifications
  • Window size space (< 64 KB)
  • Sequence number space (wrap-around, < 2 GB)
  • Inadequate frequency of RTT sampling (1 sample
    per window)

11-18
Sequence Number Space (figure slides)

19
Sequence Number Space
  • MSL (Maximum Segment Lifetime) > variance of IP delay
  • MSL < Sequence Number Space / Bandwidth

20
Sequence Number Space
  • MSL (Maximum Segment Lifetime) > variance of IP delay
  • MSL < 8 × Sequence Number Space / Bandwidth
    (the factor 8 converts the byte-counted sequence space to bits)
  • SN space = 2^31 bytes = 2 GB
  • Bandwidth = 1 Gbps
  • MSL < 16 sec
  • Variance of IP delay < 16 sec
  • Current TCP: MSL = 3 min.
  • Not scalable with bandwidth growth (see the calculation below)
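
The 16-second figure follows from a one-line calculation (sequence space counted in bytes, bandwidth in bits per second):

    seq_space_bytes = 2**31            # usable sequence space, ~2 GB
    bandwidth_bps = 1e9                # 1 Gbps
    wrap_time = 8 * seq_space_bytes / bandwidth_bps
    print(round(wrap_time, 1))         # ~17.2 s, so MSL (and IP delay variance) must stay below ~16 s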

21
TCP Extensions (RFC 1323)
  • Window scaling: a scale factor S carried in the SYN
    extends the 16-bit window field: Win = Win × 2^S
  • RTT measurement: a timestamp on each packet
    (generated by the sender, echoed by the receiver)
  • PAWS (Protect Against Wrapped Sequence numbers):
    use the timestamp to extend the sequence space (so the
    timestamp clock should tick neither too fast nor too
    slow: between 1 ms and 1 sec)
  • Header prediction: simplifies processing
    (a window-scaling example follows this list)
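
A quick window-scaling example; the scale factor is arbitrary, chosen only to show the arithmetic:

    advertised = 0xFFFF            # largest value of the 16-bit window field (65535)
    scale = 7                      # scale factor S carried in the SYN option
    effective = advertised << scale
    print(effective)               # 8388480 bytes (~8 MB), enough for a gigabit long-fat pipe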

22
High Speed TCP
  • Floyd '02. Goals:
  • Achieve a large window size with a realistic loss
    rate (use the current window size in the AIMD parameters)
  • High speed in a single connection (10 Gbps)
  • Easy to achieve a high sending rate for a given
    loss rate. How to achieve TCP-friendliness?
  • Incrementally deployable (no router support
    required)

23
High Speed TCP
  • Problem in Steady State
  • TCP response function
  • Large congestion window requires a very low loss
    rate.
  • Problem in Recovery
  • Congestion Avoidance takes too long to recover
    (Consecutive Time-outs)

24-27
Consecutive Time-out (figure slides)
28
High Speed TCP
  • Change the TCP response function
  • p is high (above the maxP corresponding to the
    default cwnd size W): standard TCP
  • p is low (cwnd > W): use a(w), b(w) instead of
    the constants a, b in the adjustment of cwnd.
  • For a given loss rate P and desired window size
    W_1 at P, derive a(w) and b(w), keeping the response
    function linear on a log-log scale (Δ log W ∝ Δ log P);
    a sketch follows this list.
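
A minimal sketch of deriving a(w) and b(w), assuming the example parameters from Floyd's proposal (RFC 3649: Low_Window = 38 at loss rate 1e-3, High_Window = 83000 at 1e-7, High_Decrease = 0.1) and interpolating linearly on the log-log scale:

    import math

    LOW_W, LOW_P = 38, 1e-3          # below this window, behave like standard TCP
    HIGH_W, HIGH_P = 83000, 1e-7     # target window / loss rate for ~10 Gbps
    HIGH_DECREASE = 0.1              # multiplicative decrease used at HIGH_W

    def _frac(w):
        return (math.log(w) - math.log(LOW_W)) / (math.log(HIGH_W) - math.log(LOW_W))

    def p_of_w(w):
        # Loss rate at which the new response function gives window w (log-log linear).
        return math.exp(math.log(LOW_P) + _frac(w) * (math.log(HIGH_P) - math.log(LOW_P)))

    def b(w):
        # Decrease factor: 0.5 at LOW_W, shrinking to HIGH_DECREASE at HIGH_W.
        return 0.5 if w <= LOW_W else 0.5 + _frac(w) * (HIGH_DECREASE - 0.5)

    def a(w):
        # Per-RTT increase chosen so the response function holds in steady state.
        return 1.0 if w <= LOW_W else w * w * p_of_w(w) * 2 * b(w) / (2 - b(w))

    print(round(a(1000), 1), round(b(1000), 2))   # about 7.9 and 0.33 at w = 1000 segments

A real implementation would pre-compute these values into a look-up table, as the Expectations slide notes.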

29
Change TCP Function
  • Standard TCP

30-32
Change TCP Function (figure slides)
33
Expectations
  • Achieve a large window with a realistic loss rate
  • Relative fairness between standard TCP and HighSpeed
    TCP (acquired bandwidth ∝ cwnd)
  • Moderate decrease instead of halving the window
    when congestion is detected (0.33 at cwnd = 1000)
  • Pre-computed look-up table
    to implement a(w) and b(w)

34
Slow Start
  • Modification of Slow Start
  • Problem: doubling cwnd every RTT is too
    aggressive for a large cwnd
  • Proposal: limit the growth of cwnd per RTT during Slow Start.

35
Limited Slow Start
  • For each ACK:
  • cwnd <= max_ssthresh:
  • Δcwnd = MSS
  • (standard TCP Slow Start)
  • cwnd > max_ssthresh:
  • Δcwnd = 0.5 × max_ssthresh / cwnd
  • (at most 0.5 × max_ssthresh per RTT; a sketch follows this list)
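
A per-ACK sketch of the rule above, with windows counted in segments; max_ssthresh is the new threshold parameter from the proposal:

    def on_ack_limited_slow_start(cwnd, max_ssthresh):
        if cwnd <= max_ssthresh:
            return cwnd + 1                       # standard slow start: +1 segment per ACK
        # Above max_ssthresh: roughly cwnd ACKs arrive per RTT, so the window grows
        # by at most max_ssthresh/2 segments per RTT instead of doubling.
        return cwnd + 0.5 * max_ssthresh / cwnd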

36
Related Projects
  • Cray Research ('92)
  • CASA Testbed ('94)
  • Duke ('99)
  • Pittsburgh Supercomputing Center
  • Portland State Univ. ('00)
  • Internet2 ('01)
  • Web100
  • Net100 (built on Web100)

37
Cray Research '92
  • TCP/IP Performance at Cray Research (Dave
    Borman)
  • Configuration
  • HIPPI between two dedicated Y-MPs with Model E
    IOS and UNICOS 8.0
  • Memory-to-memory transfer
  • Results
  • Direct channel-to-channel
  • MTU 64K: 781 Mbps
  • Through a HIPPI switch
  • MTU 33K: 416 Mbps
  • MTU 49K: 525 Mbps
  • MTU 64K: 605 Mbps

38
CASA Testbed 94
  • Applied Network Research, San Diego
    Supercomputer Center / UCSD
  • Goal: delay and loss characteristics of a
    HIPPI-based gigabit testbed
  • Link feature: blocking (HIPPI); tradeoff between
    high loss rate and high delay
  • Conclusion: avoiding packet loss is more
    important than reducing delay
  • Performance (delay × bandwidth = 2 MB, RFC 1323, on Cray
    machines): 500 Mbps sustained TCP throughput
    (TTCP/Netperf)

39
Trapeze/IP (Duke)
  • Goals
  • Which optimizations are most useful for reducing host
    overhead for fast TCP?
  • How fast does TCP really go, and at what cost?
  • Approaches
  • Zero-copy
  • Checksum offloading
  • Result
  • > 900 Mbps for MTU > 8K

40
Trapeze/IP (Duke)
  • Zero-copy

www.cs.duke.edu/ari/publications/talks/freebsdcon
41-43
Trapeze/IP (Duke) (figure slides)
www.cs.duke.edu/ari/publications/talks/freebsdcon
44
Enabling High Performance Data Transfers on Hosts
  • By the Pittsburgh Supercomputing Center
  • Enable RFC 1191 Path MTU Discovery
  • Enable RFC 1323 large windows
  • OS kernel: large enough socket buffers
  • Application: set its send and receive socket
    buffer sizes
  • Detailed methods for tuning various OSes

45
PSU Experiment
  • Goals
  • Round-trip delay and TCP throughput with
    different window sizes
  • Influence of different devices (Cisco
    3508/3524/5500) and different NICs
  • Environment
  • OS: FreeBSD 4.0/4.1 (without RFC 1323?), Linux,
    Solaris
  • WAN: 155 Mbps OC-3 over SONET MAN
  • Measurement tools: ping, TTCP

46
PSU Experiment
  • "smaller" switches and low-level routers can
    easily muck things up.
  • bugs in Linux 2.2 kernels
  • Different NICs have different performance.
  • Fast PCI bus (64 bits 66mhz) is necessary
  • Switch MTU size can make a difference (giant
    packets are better).
  • Bigger TCP window sizes can help but there seems
    to be a knee around 4MB that is not remarked upon
    in the literature.

47
Internet2 Experiment
  • Goal: a single TCP connection at 700-800 Mbps over
    the WAN; relations among window size, MTU, and
    throughput
  • Back-to-Back
  • OS: FreeBSD 4.3 release
  • Architecture: 64-bit / 66 MHz PCI
  • Configuration: sendspace = recvspace = 102400
  • Setup: direct connection (back-to-back) and WAN
  • WAN: symmetric path host1 - Abilene - host2
  • Measurement: ping, Iperf

48
Internet2 Experiment
  • Back-to-Back
  • No loss
  • Found some bugs in FreeBSD 4.3

Window   4 KB MTU (Mbps)   8 KB MTU (Mbps)
512K     690               855-986
1M       658               986
2M       562               986
4M       217               987
8M       93                987
16M      86                985
  • WAN
  • < 200 Mbps
  • Asymmetry between directions (MTU cache)

49
Web100
  • Goal: make it easy for non-experts to achieve
    high bandwidth
  • Method: get more information out of TCP
  • Software
  • Measurement embedded in the kernel TCP stack
  • Application layer: diagnostics / auto-tuning
  • Proposal
  • RFC 2012 (TCP MIB)

50
Net100
  • Built on Web100
  • Auto-tunes parameters for non-experts
  • Network-aware OS
  • Bulk file transport for ORNL
  • Implementation of Floyd's HighSpeed TCP

51
Floyd's TCP Slow Start on Net100
  • www.csm.ornl.gov/~dunigan/net100/floyd.html
  • RTT = 80 ms
  • sndwnd = 1 MB
  • rcvwnd = 2 MB
  • cwnd from Web100

52
Floyd's TCP AIMD on Net100
  • www.csm.ornl.gov/~dunigan/net100/floyd.html
  • RTT = 87 ms
  • wnd = 1000 segments
  • max_ssthresh = 100 segments
  • Slow start: 1.8 sec
  • MD at cwnd = 1000:
  • 0.33 per timeout
  • AI at cwnd = 700:
  • 8 per RTT
  • Old TCP:
  • 45 sec recovery

53
Trend (Mathis Oct 2001)
54
Trend (Mathis Oct 2001)
  • TCP over Long Path

Year   Wizard     Non-Wizard   Ratio
1988   1 Mbps     300 kbps     3:1
1991   10 Mbps
1995   100 Mbps
1999   1 Gbps     3 Mbps       300:1
55
Related Tools
  • Measurement
  • IPerf
  • TCP Dump
  • Web100
  • Emulation
  • Dummynet

56
NLANR-Iperf
  • Features
  • Sends test data from user space
  • Supports IPv4/IPv6
  • Supports TCP/UDP/multicast
  • Similar software: auto-tuning-enabled FTP
    client/server
  • Concern
  • Preemption by other processes in a gigabit test?
    (observed in the Internet2 experiment)

57
Dummynet
  • Now embedded in FreeBSD
  • Delay: adds delay at the IP layer
  • Loss: random loss at the IP layer
  • Concerns
  • Overhead
  • Pattern of packet loss

58
Current Status in Netlab @ Caltech
  • 100Mbps Testbed in netlab

59
Next Step
  • 1Gbps Testbed in lab

60
Q & A