Title: Discussion of Transport Protocol and Network Monitoring Advances
- Deb Agarwal (DAAgarwal_at_lbl.gov)
- Lawrence Berkeley National Laboratory
- Berkeley, CA USA
- NOTE: The views expressed in this presentation are the author's and not necessarily those of the United States Government.
Outline
- Reliable Multicast status
- TCP flow and congestion control advances
- Network performance testing and monitoring
What is Multicast Communication?
- Group communication mechanism
  - Provides one-to-many and many-to-many communication
- Efficient dissemination of messages
  - Network-based duplication (when needed)
  - Targeted retransmissions
  - Bandwidth savings
  - Parallel delivery at multiple locations
- Components
  - IP Multicast (hosts and routers)
  - Reliable multicast (application level)
IP Multicast Communication
(Figure: multicast versus unicast delivery.)
Internet Protocol (IP) Multicast
- Efficient group communication mechanism
  - Provides one-to-many communication
- Best-effort delivery to the group members (unreliable)
- Implemented in the network routers and hosts
- Class D addresses used for multicast (224.x.x.x - 239.x.x.x)
- Network components manage routing and duplicate the message as needed
- Co-exists with TCP and UDP communication mechanisms
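Here is a minimal sketch of how an end host takes part in IP multicast. The code checks that an address lies in the Class D range. It then builds the `ip_mreq` membership request that is passed to the standard `IP_ADD_MEMBERSHIP` socket option. The helper names are hypothetical, and the group address and port in the usage note are arbitrary examples.

```python
import ipaddress
import socket


def is_class_d(addr: str) -> bool:
    """Return True if addr is an IPv4 Class D (multicast) address, 224.0.0.0-239.255.255.255."""
    # Class D addresses begin with the four high-order bits 1110.
    return (int(ipaddress.IPv4Address(addr)) >> 28) == 0b1110


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq structure: the group address followed by the local interface address."""
    if not is_class_d(group):
        raise ValueError(f"{group} is not a Class D multicast address")
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_group_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket that receives datagrams sent to the multicast group on port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining makes the host report group membership so routers forward the group's
    # traffic here; the network, not the sender, duplicates packets toward each member.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock
```

For example, `open_group_receiver("239.1.2.3", 5000)` returns a socket whose `recvfrom()` yields datagrams multicast to that group. Delivery is still best-effort, so any reliability must be added above this layer.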
Example IP Multicast Use (Access Grid)
What is Reliable Multicast?
- Properties similar to TCP
- Application-level program (runs on end systems)
- Uses IP Multicast as the underlying communication mechanism
- Reliable and ordered delivery of messages within a group (negative acknowledgments and retransmissions)
- Tracks group membership
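Negative-acknowledgment (NACK) reliability can be sketched from the receiver's side. The receiver tracks the next expected sequence number, buffers packets that arrive out of order, and delivers packets in order. It reports gaps rather than acknowledging every packet. This is a minimal hypothetical sketch. The class and method names are illustrative and are not taken from any particular reliable multicast protocol.

```python
from dataclasses import dataclass, field


@dataclass
class NackReceiver:
    """Receiver state for NACK-based, in-order delivery over an unreliable multicast channel."""
    next_expected: int = 0
    buffer: dict = field(default_factory=dict)    # out-of-order packets held until gaps fill
    delivered: list = field(default_factory=list)  # payloads handed to the application, in order

    def on_packet(self, seq: int, payload: str) -> list:
        """Accept one packet and return the sequence numbers to NACK (the current gaps)."""
        if seq >= self.next_expected:  # duplicates of already-delivered packets are ignored
            self.buffer[seq] = payload
        # Deliver every contiguous packet starting at next_expected, preserving order.
        while self.next_expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_expected))
            self.next_expected += 1
        # Anything between next_expected and the highest buffered packet is missing.
        if self.buffer:
            highest = max(self.buffer)
            return [s for s in range(self.next_expected, highest) if s not in self.buffer]
        return []
```

Suppose packets 0, 2, 3, and 1 arrive in that order. The receiver NACKs packet 1 after packets 2 and 3 arrive. Once the retransmission arrives, it delivers all four packets in order.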
Example Reliable Multicast Use
CD-1.1 Multicast Capability
- Motivation
  - Efficient transmission of data to multiple sites
  - Reliability (minimize single points of failure)
  - Allow as-needed use of multicast
- Designed and implemented by SAIC
  - Data provider multicasts data and provides retransmissions (small look-back window only)
  - Reliability host requests any required catch-up from data provider (unicast)
  - Reliability host responsible for catch-up of data consumers (unicast)
  - Easy to use either unicast or multicast
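The data provider's limited retransmission capability can be pictured as a fixed-size look-back buffer. Packets older than the window are gone, and a consumer that misses them falls back to a reliability host. This is a hypothetical sketch. The class name, and the choice to measure the window in packets rather than time or bytes, are assumptions.

```python
from collections import OrderedDict
from typing import Optional


class LookBackBuffer:
    """Provider-side store of the most recently multicast packets, kept for retransmission."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # configurable window size, in packets
        self.packets = OrderedDict()  # sequence number -> payload, oldest first

    def record(self, seq: int, payload: bytes) -> None:
        """Remember a packet as it is multicast, evicting the oldest beyond capacity."""
        self.packets[seq] = payload
        while len(self.packets) > self.capacity:
            self.packets.popitem(last=False)

    def retransmit(self, seq: int) -> Optional[bytes]:
        """Return the payload if still buffered. None means it fell out of the window,
        so the consumer must obtain the frame through unicast catch-up instead."""
        return self.packets.get(seq)
```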
Design Constraints
- All system-level CD-1.1 unicast requirements apply
- The 99.99% reliability requirement has been retained
- Application-level reliability is provided by a combination of multicast and point-to-point transmission mechanisms
- Minimize perturbations to CD-1.1 Formats and Protocols (IDC 3.4.3 Rev. 0.2)
- Compatible with CDS CD-1.1 unicast
- A data provider must be able to service both multicast and point-to-point data consumers simultaneously
1. Adapted from Cordova and Bowman, CD-1.1 Reliable Multicasting: Requirements and Preliminary Design.
Design Features for the Multicast Capabilities
- Up to 20 data consumers in a single multicast group
- Supports increases and decreases in the size of the multicast group without the need to restart the sending activity of the data provider
- Transmission rates are constant and configurable to mitigate network congestion
- The size of multicast data packets is configurable to support small-MTU networks (frames broken into packets)
- Retransmission of missing multicast data packets; availability is limited by the configurable data buffer size
- Reliability hosts provide unicast catch-up of frames missed by multicast data consumers
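Two of these features are sketched below: configurable packet size, and a constant, configurable transmission rate. The 8-byte header layout (frame number, packet index, packet count) and all function names are hypothetical. They are not the CD-1.1 wire format.

```python
import struct

# Hypothetical 8-byte packet header: frame number, packet index, packets in the frame.
HEADER = struct.Struct("!IHH")


def fragment_frame(frame_no: int, frame: bytes, max_packet: int) -> list:
    """Split one frame into packets of at most max_packet bytes, header included."""
    payload_size = max_packet - HEADER.size
    if payload_size <= 0:
        raise ValueError("max_packet must exceed the header size")
    chunks = [frame[i:i + payload_size] for i in range(0, len(frame), payload_size)] or [b""]
    return [HEADER.pack(frame_no, idx, len(chunks)) + chunk for idx, chunk in enumerate(chunks)]


def reassemble(packets: list) -> bytes:
    """Rebuild a frame from its packets, in any order; raise if any packet is missing.

    Assumes every packet carries the same frame number."""
    parts = {}
    count = 0
    for pkt in packets:
        _, idx, count = HEADER.unpack_from(pkt)
        parts[idx] = pkt[HEADER.size:]
    if not packets or len(parts) != count:
        raise ValueError("frame incomplete; request a retransmission")
    return b"".join(parts[i] for i in range(count))


def send_interval_s(rate_bps: float, packet_bytes: int) -> float:
    """Gap between packet sends that holds the configured constant transmission rate."""
    return packet_bytes * 8 / rate_bps
```

Pacing packets at a fixed interval avoids the bursts a sender would otherwise emit when a whole frame becomes ready at once.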
CD-1.1 Reliable Multicast - Normal Operation
(Figure: the IMS Station multicasts to Reliability Host 1 (e.g., Data Center), Reliability Host 2 (e.g., Backup Data Center), Consumer 3, and Consumer 4.)
1. Adapted from Cordova and Bowman, CD-1.1 Reliable Multicasting: Requirements and Preliminary Design.
CD-1.1 Reliable Multicast - Catch-Up Operation
(Figure: IMS Station, Reliability Host 1 (e.g., Data Center), Reliability Host 2 (e.g., Backup Data Center), Consumer 3, and Consumer 4, connected by multicast links and point-to-point catch-up links.)
1. Adapted from Cordova and Bowman, CD-1.1 Reliable Multicasting: Requirements and Preliminary Design.
Conceptual Design
(Figure: Data Provider, Reliability Hosts, and Data Consumers, built from Multicast Sender, Multicast Receiver, Point-to-Point Sender, and Point-to-Point Receiver components.)
1. Adapted from Cordova and Bowman, CD-1.1 Reliable Multicasting: Requirements and Preliminary Design.
Reliability Hosts
- Policies
  - Any site may provide a reliability host for catch-up data transmission
  - Data consumers may select the reliability host to connect to
  - CD-1.1 Access Control
1. Adapted from Cordova and Bowman, CD-1.1 Reliable Multicasting: Requirements and Preliminary Design.
Design Approach
- System level
  - Custom reliable multicast solution (based on CD-1.1)
  - Point-to-point mechanism for application-level reliability
  - Separate multicast (real-time) and point-to-point (catch-up) transmission into separate subsystems
- Multicast subsystem
  - Modify connection sequence to generalize initiation of a connection after an outage
  - Multicast transmission initiated by pull from data consumer
  - Use CD-1.1 procedure (in reverse) to establish connection
  - Multicast transmission begins at data provider time (current time minus a small look-back); no attempt to catch up
  - Data provider provides packet-level reliability
1. Adapted from Cordova and Bowman, CD-1.1 Reliable Multicasting: Requirements and Preliminary Design.
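The split between real-time multicast and point-to-point catch-up after an outage can be sketched as follows. The sketch assumes frames carry consecutive integer numbers, which is a simplification, and the helper names are hypothetical. Multicast picks up near the provider's current frame. Everything older that is missing becomes a catch-up request to a reliability host.

```python
from typing import Optional, Tuple


def plan_reconnect(last_delivered: int, provider_current: int,
                   look_back: int) -> Tuple[int, Optional[Tuple[int, int]]]:
    """Divide recovery after an outage between the two subsystems.

    Multicast resumes at the provider's current frame minus a small look-back (no attempt
    to catch up); older missing frames are left to point-to-point catch-up.
    Returns (first frame expected by multicast, (first, last) catch-up range or None).
    """
    multicast_start = max(last_delivered + 1, provider_current - look_back)
    if multicast_start > last_delivered + 1:
        return multicast_start, (last_delivered + 1, multicast_start - 1)
    return multicast_start, None


def missing_ranges(received: set, first: int, last: int) -> list:
    """Collapse frame numbers in [first, last] absent from received into (start, end) ranges."""
    ranges = []
    start = None
    for frame in range(first, last + 1):
        if frame not in received:
            if start is None:
                start = frame
        elif start is not None:
            ranges.append((start, frame - 1))
            start = None
    if start is not None:
        ranges.append((start, last))
    return ranges
```

For example, a consumer that last delivered frame 100 reconnects while the provider is at frame 500 with a look-back of 10. It resumes multicast at frame 490 and requests frames 101 through 489 point-to-point.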
CD-1.1 Formats and Protocols
- Extension to connection options
  - Data consumer requests connections for
    - multicast transmission
    - point-to-point catch-up of missing frames
  - Data provider-initiated connections for unicast unchanged
- Minor changes required to IDC 3.4.3 Rev. 0.2
  - Use some fields that had been reserved for multicasting
  - Defined a new type of Option Request Frame
  - Changed miscellaneous text descriptions to be valid for both unicast and multicast
1. Adapted from Cordova and Bowman, CD-1.1 Reliable Multicasting: Requirements and Preliminary Design.
SAIC - CD-1.1 Reliable Multicast Testing
- Test phases
  1. Local-area testing on San Diego Testbed (complete)
  2. Wide-area testing between San Diego Testbed and CMR (complete)
  3. Wide-area testing between I56US, San Diego Testbed, and CMR (complete)
  4. Wide-area testing between I56US, AFTAC, CMR, and San Diego Testbed (ongoing)
- Example test cases
  - Multicast Data Transmission
  - Gap Notification and Unicast Catch-up
  - Change of Multicast Group Address
  - Multicast Layer Network Usage
  - Data Timeliness
  - Multicast Connectivity Fails
Outline
- Reliable Multicast status
- TCP flow and congestion control advances
- Network performance testing and monitoring
Standard TCP - Background
- Sliding-window-based flow and congestion control
  - Number of outstanding packets in the path
- Slow start
  - Probe the network at start-up
  - Quickly find operating throughput range (minimal congestion)
- Additive Increase Multiplicative Decrease (AIMD)
  - Steady-state algorithm
  - Continually strive to improve throughput
  - React quickly to congestion
  - Congestion measured by lost packets
Standard TCP Algorithms
- Slow start window size
  - Initially one segment
  - Size increased by one segment for each acknowledgement
  - Exponential growth (the window doubles every round trip)
  - Continues until a packet is lost or a set threshold is reached
- AIMD
  - Loss of a packet cuts the window size in half
  - Each acknowledgement increases the window by 1/cwnd (about one segment per round trip)
- Acknowledgements
  - Receiver acknowledges the highest in-sequence packet received (3 duplicate acknowledgements trigger a retransmission)
  - Selective acknowledgement (SACK)
    - Allows acknowledgement of packets beyond a gap
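The slow start and AIMD rules above can be traced one round trip at a time. This sketch is idealized and Reno-style. The window is counted in segments, losses are injected at chosen rounds, and slow start is capped exactly at the threshold. It is for illustration only and does not model timeouts, fast-recovery details, or SACK.

```python
def simulate_cwnd(rounds: int, ssthresh: float, loss_rounds: set) -> list:
    """Congestion window (segments) at the start of each round trip, idealized Reno-style."""
    cwnd = 1.0  # slow start begins with one segment
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            # Multiplicative decrease: a loss (e.g. three duplicate ACKs) halves the window.
            ssthresh = max(cwnd / 2, 2.0)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: +1 segment per ACK, so the window doubles each round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: +1/cwnd per ACK, i.e. about +1 segment per round trip.
            cwnd += 1
    return history
```

With a threshold of 16 and a loss in round 6, `simulate_cwnd(8, 16, {6})` produces windows of 1, 2, 4, 8, 16, 17, 18, and 9 segments. The trace shows exponential growth to the threshold, then linear growth, then a halving.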
Standard TCP Congestion Control
Problems With Standard TCP
- Large bandwidth-delay product
  - A connection must keep bandwidth × delay worth of packets in flight to fully utilize the path
- Random loss
  - Interpreted as congestion
- Bursty transmission
  - Window size jumps cause bursts of packets
- Small MTU (Maximum Transmission Unit)
  - Window increases and decreases are in units of MSS (Maximum Segment Size)
- Small default buffer sizes in end hosts (8 KB versus 4 MB)
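The arithmetic behind these problems is short enough to sketch. The helper names are hypothetical, and the 1 Gb/s, 100 ms path and 1460-byte segment size in the note are illustrative values, not measurements.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """At most one window of data can be delivered per round trip."""
    return window_bytes * 8 / rtt_s


def packets_in_flight(window_bytes: float, mss_bytes: int) -> float:
    """Window expressed in segments; AIMD changes the window in these units."""
    return window_bytes / mss_bytes


def aimd_recovery_s(window_bytes: float, mss_bytes: int, rtt_s: float) -> float:
    """Time to regrow from half the window back to full at about one segment per RTT."""
    return packets_in_flight(window_bytes, mss_bytes) / 2 * rtt_s
```

Consider a 1 Gb/s path with a 100 ms round trip. The BDP is 12.5 MB, and an 8 KB window caps throughput near 0.66 Mb/s. With 1460-byte segments, a single halving takes about 4,281 round trips (roughly 7 minutes) to recover. A jumbo-frame segment of about 8960 bytes shortens that by roughly a factor of six.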
Currently Proposed Solutions
- TCP Tuning
  - Appropriate values for the TCP buffers at the sender and receiver
- HighSpeed TCP/Scalable TCP
  - Makes AIMD more aggressive
  - Non-linear increase and decrease parameters based on current window size
- FAST TCP
  - TCP Vegas-like algorithm
  - Congestion measured as a function of round-trip time
- Jumbo Frames
  - Increase the MTU of the network
- Web100 Work Around Daemon
  - Provides work-arounds for several TCP problems
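FAST TCP, like TCP Vegas, treats rising round-trip time as the congestion signal instead of waiting for loss. The sketch below shows two pieces. The first is the Vegas-style estimate of how many of a flow's packets sit in queues. The second is one window update in the form published for FAST TCP: w ← min(2w, (1 − γ)w + γ(baseRTT/RTT · w + α)). The function names are hypothetical. The parameter values in the checks (α = 200 packets, γ = 0.5) are illustrative assumptions, not recommended settings.

```python
def queued_packets(cwnd: float, base_rtt: float, rtt: float) -> float:
    """Vegas-style estimate of this flow's packets queued in the network.

    expected rate = cwnd / base_rtt and actual rate = cwnd / rtt, so
    (expected - actual) * base_rtt = cwnd * (1 - base_rtt / rtt).
    """
    return cwnd * (1 - base_rtt / rtt)


def fast_update(cwnd: float, base_rtt: float, rtt: float,
                alpha: float, gamma: float = 0.5) -> float:
    """One FAST TCP-style update: w <- min(2w, (1 - gamma) w + gamma (base_rtt/rtt * w + alpha))."""
    target = base_rtt / rtt * cwnd + alpha
    return min(2 * cwnd, (1 - gamma) * cwnd + gamma * target)
```

The window stops changing when the flow keeps about `alpha` packets queued. For example, with 400 packets in flight and the RTT at twice its base value, `queued_packets` returns 200 and `fast_update` leaves the window at 400.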
TCP Tuning
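TCP tuning here mostly means sizing the socket buffers to the path's bandwidth-delay product. Below is a minimal sketch using the standard `SO_SNDBUF` and `SO_RCVBUF` socket options. The helper names are hypothetical.

```python
import socket


def buffer_size_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Socket buffer needed to keep a path full: its bandwidth-delay product in bytes."""
    return int(bandwidth_bps * rtt_s / 8)


def tuned_tcp_socket(bandwidth_bps: float, rtt_s: float) -> socket.socket:
    """Create a TCP socket whose send and receive buffers match the bandwidth-delay product."""
    size = buffer_size_bytes(bandwidth_bps, rtt_s)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set both buffers before connect()/listen(); the receive buffer size influences
    # the TCP window scale negotiated during the handshake.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock
```

Operating systems may cap, or on some systems reject, values above their configured maximum socket buffer size. Linux also reports roughly double the requested value from `getsockopt`. Check the effective size after setting it.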
TCP/HSTCP Response Function
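A response function relates a TCP flow's average congestion window to its packet drop rate p. This slide's figure is not reproduced here. As a sketch, the two curves can be computed from the formulas in RFC 3649 (HighSpeed TCP). Standard TCP follows w = 1.2/sqrt(p). HighSpeed TCP follows w = 0.12/p^0.835, and it reverts to the standard curve at drop rates of 10^-3 and above, which corresponds to windows of about 38 segments or less. The constant and function names are ours.

```python
import math

LOW_P = 1e-3  # at or above this drop rate HighSpeed TCP behaves like standard TCP


def standard_tcp_window(p: float) -> float:
    """Average congestion window (segments) of standard TCP at drop rate p: 1.2 / sqrt(p)."""
    return 1.2 / math.sqrt(p)


def hstcp_window(p: float) -> float:
    """HighSpeed TCP response function (RFC 3649): 0.12 / p**0.835 below LOW_P."""
    if p >= LOW_P:
        return standard_tcp_window(p)
    return 0.12 / p ** 0.835
```

At p = 10^-7, the standard function gives about 3,800 segments, while HighSpeed TCP gives roughly 84,000. This is close to the 83,000-segment window that RFC 3649 targets at that drop rate.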
Performance Comparison
Outline
- Reliable Multicast status
- TCP flow and congestion control advances
- Network performance testing, monitoring, and diagnosis
Network Testing Goals
- Network path to destination
- Round-trip time/one-way delay
- Capacity of each segment of a path
- Bottleneck in the path
- Available bandwidth
- Achievable bandwidth
- TCP throughput
- Appropriate TCP parameters
- UDP throughput
- Loss rate
- IP Multicast capabilities
- Identify router and host misconfiguration
- Jitter
- Packet reordering
Network Testing Tools
- Ping
  - ICMP packets
  - Reachability and response time
- Traceroute
  - ICMP packets
  - List of routers along the path and response time for each router
- Iperf
  - TCP and UDP throughput
  - Delay, jitter, and loss
  - Client/server
- Pchar/pathchar
  - Series of UDP packets of varying sizes
  - Bottleneck link capacity
  - Per-hop bandwidth, propagation delay, queue time, and drop rate
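The core idea behind pathchar can be sketched briefly. For each hop, it probes with packets of several sizes and keeps the minimum RTT per size, which filters out queueing delay. The minimum RTT grows linearly with size, and the slope to hop n is the sum of 1/bandwidth over links 1 to n. The difference between consecutive hops' slopes therefore gives each link's capacity. This sketch works on already-collected (size, RTT) samples. The function names are hypothetical, and it omits pathchar's probing and its delay and loss estimates.

```python
def min_rtt_by_size(samples):
    """Keep the minimum RTT seen for each probe size; minima exclude queueing delay."""
    best = {}
    for size, rtt in samples:
        best[size] = min(rtt, best.get(size, rtt))
    return best


def fit_slope(points):
    """Least-squares slope of RTT (seconds) versus packet size (bytes)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def per_hop_bandwidth_bps(samples_by_hop):
    """samples_by_hop[i] holds (size_bytes, rtt_s) probes that expired at hop i + 1."""
    slopes = [fit_slope(sorted(min_rtt_by_size(s).items())) for s in samples_by_hop]
    result = []
    previous = 0.0
    for slope in slopes:
        # Seconds per byte added by this link alone -> bits per second.
        result.append(8 / (slope - previous))
        previous = slope
    return result
```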
Network Testing Tools (Cont.)
- Treno
  - UDP packets to an unused port
  - Expected TCP throughput
- Pathrate
  - UDP packet trains
  - Capacity of a path
- Pathload
  - UDP packet trains
  - Available bandwidth of a path
- Network Characterization Service (NCS)
  - UDP packet trains
  - Available bandwidth, maximum burst size, and bottleneck
  - TCP and UDP achievable throughput
- National Internet Measurement Infrastructure (NIMI)
  - Framework for launching network testing tools
- Internet2 Performance Improvement Performance Environment
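Pathrate and similar tools build on packet-pair and packet-train dispersion. Two packets sent back to back leave the bottleneck link separated by the time it takes to transmit one packet there. Each pair therefore suggests a capacity of packet size divided by dispersion. Cross traffic spreads these estimates, so a simple approach takes the most common value. This is a minimal sketch of the packet-pair idea, not Pathrate's actual algorithm. The function name and the 1 Mb/s histogram bin width are assumptions.

```python
from collections import Counter


def packet_pair_capacity_bps(packet_bytes: int, dispersions_s: list,
                             bin_bps: float = 1e6) -> float:
    """Estimate path capacity from packet-pair dispersions using the modal estimate."""
    estimates = [packet_bytes * 8 / d for d in dispersions_s if d > 0]
    # Bin the per-pair estimates and pick the most frequent bin.
    bins = Counter(round(e / bin_bps) for e in estimates)
    modal_bin, _ = bins.most_common(1)[0]
    return modal_bin * bin_bps
```

Take 1500-byte pairs on a 100 Mb/s bottleneck, which are spaced 120 microseconds apart. Most pairs yield 100 Mb/s, while pairs delayed by cross traffic yield scattered lower values that the mode ignores.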
Network Monitoring Goals
- Track performance of network paths
- Detect problems
- Understand application and protocol behaviors
- Test new applications and protocols
- Predict future performance
- Resource scheduling
Network Monitoring Tools - Active
- PingER
  - ICMP ping-based (periodically sends a set of pings)
  - Response time
  - Packet loss
  - Reachability
  - TCP bulk transfer rate
  - Traceroute
  - Used to track High Energy Physics sites
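Each periodic set of pings reduces to a few numbers: reachability, loss, and response-time statistics. The sketch below summarizes one batch, with `None` marking a ping that got no reply. PingER's actual metrics, sampling schedule, and data formats are not reproduced here, and the function name is hypothetical.

```python
from statistics import mean


def summarize_pings(rtts_ms: list) -> dict:
    """Summarize one batch of pings; None marks a request that received no reply."""
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    if not replies:
        return {"reachable": False, "loss_pct": loss_pct}
    return {
        "reachable": True,
        "loss_pct": loss_pct,
        "min_ms": min(replies),
        "avg_ms": mean(replies),
        "max_ms": max(replies),
    }
```

Stored over time, these summaries let a monitor track trends and flag sites whose loss or response time changes.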
Network Monitoring Tools - Active (Cont.)
- Multicast Beacon
  - Periodic IP multicasts from all sites
  - Multicast connectivity/loss matrix
  - Used by Access Grid community
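A beacon's connectivity/loss matrix comes from comparing what each site sent with what every other site heard. This sketch assumes every site both sends and receives, and that per-pair receive counts have already been collected. The function name and data layout are hypothetical.

```python
def loss_matrix(sent: dict, received: dict) -> dict:
    """Loss fraction for every ordered (sender, receiver) pair of distinct sites.

    sent[s] is the number of packets multicast by site s; received[(s, r)] is how
    many of them site r heard. A loss of 1.0 means no multicast connectivity s -> r.
    """
    sites = sorted(sent)
    matrix = {}
    for s in sites:
        for r in sites:
            if s == r or sent[s] == 0:
                continue
            got = received.get((s, r), 0)
            matrix[(s, r)] = 1.0 - got / sent[s]
    return matrix
```

Asymmetric entries, such as A reaching B while B cannot reach A, are typical of multicast routing problems that unicast tests miss.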
Network Monitoring Tools - Passive
- Tcpdump/tcptrace/xplot
  - Capture traffic headers at an end host
  - Protocol and application behavior
- CoralReef
  - Capture traffic headers from inside the network
- Self-Configuring Network Monitor
  - Secure capture of traffic headers inside the network
  - Configuration based on request
  - Protocol and network behavior
- NeTraMet: A Network Traffic Flow Measurement Tool
  - CoralReef packet capture
  - Provides a flow meter
- NetFlow
  - Flow statistics exported by routers
- Multi Router Traffic Grapher (MRTG)
  - Graph statistics from routers (SNMP)
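The flow meters above turn captured packet headers into per-flow records. This minimal sketch aggregates headers by 5-tuple into packet and byte counts. Real meters such as NeTraMet or NetFlow also handle flow timeouts, sampling, and export, which are omitted. The `PacketHeader` type and function name are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class PacketHeader(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    length: int  # bytes on the wire


def aggregate_flows(headers) -> dict:
    """Group packet headers by 5-tuple and count packets and bytes per flow."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for h in headers:
        key = (h.src, h.dst, h.sport, h.dport, h.proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += h.length
    return dict(flows)
```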
Self-Configuring Network Monitor
Network Diagnosis
- Netlogger
  - Event-triggered monitoring
  - User-specified monitoring points
  - End-to-end monitoring of application performance
- Net100
  - TCP parameter visibility and recording
  - Can be analyzed for problematic behaviors
- Traffic Analysis and Automatic Diagnosis (TAAD)
  - CoralReef flow collection
  - Analyze aggregated traffic flows for problematic signatures
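End-to-end event monitoring of this kind comes down to timestamped events logged at chosen points, which are later paired up to show where time goes. Netlogger defines its own log format and client API, which are not reproduced here. The `Event` record and the event names in the check are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds, from a synchronized clock
    name: str         # hypothetical monitoring-point name, e.g. "send.start"
    transfer_id: int


def stage_durations(events, start: str, end: str) -> dict:
    """Pair start/end events per transfer_id and return the elapsed seconds for each."""
    starts = {}
    durations = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.name == start:
            starts[ev.transfer_id] = ev.timestamp
        elif ev.name == end and ev.transfer_id in starts:
            durations[ev.transfer_id] = ev.timestamp - starts.pop(ev.transfer_id)
    return durations
```

Comparing such durations across stages, for example disk read, network send, and remote write, points to the component responsible for a slowdown.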
Web100/Iperf/Netlogger
Netlogger Debugging of an Application
Network Intrusion Detection
- Bro
  - Passively monitors a network link
  - Filters traffic to produce higher-level events
  - Uses a policy language to express site security policy
  - Real-time detection and notification of attacks
  - Ability to contact the border router and block an attacker
  - Used extensively at Lawrence Berkeley National Laboratory (LBNL) and the National Energy Research Scientific Computing Center (NERSC)
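Bro turns traffic into higher-level events, such as connection attempts, and a site policy then reacts to those events. The sketch below shows that pattern with a simple scan-detection rule. It is written in Python rather than Bro's own policy language. The class name and the threshold are hypothetical, and the router-blocking step is represented only by a flag.

```python
from collections import defaultdict


class ScanPolicy:
    """Hypothetical site policy: flag a source that contacts too many distinct hosts."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.targets = defaultdict(set)  # source -> distinct destinations contacted
        self.blocked = set()

    def on_connection_attempt(self, src: str, dst: str) -> bool:
        """Process one connection event; return True only when src newly crosses the threshold."""
        if src in self.blocked:
            return False
        self.targets[src].add(dst)
        if len(self.targets[src]) >= self.threshold:
            # A deployment would notify operators and ask the border router to block src here.
            self.blocked.add(src)
            return True
        return False
```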
Relevant Working Groups
- Reliable Multicast
  - IETF Reliable Multicast Transport Working Group
  - IRTF Reliable Multicast Research Group
- Network Measurement/Monitoring
  - IETF IP Performance Metrics Working Group
  - IRTF Internet Measurement Research Group
  - GGF Grid High-Performance Networking Research Working Group
  - GGF Network Measurements Working Group
URLs
- AMP - http://amp.nlanr.net/AMP/
- CoralReef - http://www.caida.org/tools/measurement/coralreef/
- FAST TCP - http://netlab.caltech.edu/FAST/
- GGF Network Measurements Working Group - http://www-didc.lbl.gov/NMWG/
- HSTCP - http://www.icir.org/floyd/hstcp.html and http://www-itg.lbl.gov/evandro/hstcp/index.html
- IETF IP Performance Metrics working group - http://www.ietf.org/html.charters/ippm-charter.html
- Iperf - http://dast.nlanr.net/Projects/Iperf/
- NCS - http://www-didc.lbl.gov/NCS/
- Net100 - http://www.net100.org/
- Netlogger - http://www.itg.lbl.gov/Netlogger/homepage.html
- Netperf - http://www.netperf.org/netperf/NetperfPage.html
- NeTraMet - http://www2.auckland.ac.nz/net/NeTraMet/
- NIMI - http://www.ncne.nlanr.net/nimi/
- Pathrate/pathload - http://www.cc.gatech.edu/fac/Constantinos.Dovrolis/bw.html
- Pathchar - ftp://ftp.ee.lbl.gov/pathchar/
- PingER - http://www-iepm.slac.stanford.edu/pinger/
- SCNM - http://www-itg.lbl.gov/Net-Mon/Self-Config.html
- TAAD - http://ncne.nlanr.net/research/taad/
- TCP tuning guide - http://www-didc.lbl.gov/TCP-tuning/TCP-tuning.html