Title: Enabling Network-Aware Applications
1. Enabling Network-Aware Applications
- Brian Tierney
- Dan Gunter
- Jason Lee
- Martin Stoufer
- Lawrence Berkeley National Laboratory
- Joseph Evans
- University of Kansas
2. Outline
- Motivation
- WAN basics
- Enable approach
- Use cases
- Enable components
- Sample results
- Future work
3. Motivation
- Many high-performance distributed applications use only a (small) fraction of their available bandwidth
- For, e.g., bulk data transfer, TCP tuning techniques can make a large difference
- Hand-tuning is time-consuming, error-prone, and often an administrative nightmare
4. Motivation (2)
- We need to understand where and how to put
intelligence into the network
5. WAN basics (1)
- Proper selection of TCP buffer size depends upon the bandwidth-delay product of the link
  - buffer size = delay x bandwidth
- On an uncongested network, parallel streams can help with TCP's conservative (congestion-avoidance) behavior
  - however, too many streams can stress the CPU
  - unfair on a congested network, and may interact badly with routers
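The bandwidth-delay product rule above can be sketched in a few lines of Python. The 45 Mbit/s and 180 ms figures are taken from the LBL-CERN path in the results slide later in this deck; the function name is illustrative, not part of Enable.

```python
def tcp_buffer_size(bandwidth_bps, rtt_sec):
    """Bandwidth-delay product: the number of bytes that must be
    'in flight' to keep the pipe full (bandwidth in bits/sec,
    round-trip time in seconds, result in bytes)."""
    return int(bandwidth_bps / 8 * rtt_sec)

# Example: a 45 Mbit/sec bottleneck with a 180 ms RTT needs
# roughly a 1 MB TCP buffer.
buf = tcp_buffer_size(45e6, 0.180)
```

Note that this is the buffer needed to fill the path with a single stream; the parallel-streams point above is one way to compensate when the OS caps the buffer below this value.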
6. WAN basics (2)
7. Network Awareness
- Paths are dynamic and sometimes (20%) asymmetric; accurately gauging network capabilities is hard
8. Current Approaches
- Put network tests (like a ping and a small timed data transfer) into the application
- Provide configuration options, and require users to run tests before running the application
- Make a best guess as to the likely network characteristics
- Do nothing; use system defaults
9. Enable Approach
- Provide an advice service with a simple high-level API
- Focus on bulk data transfer
- Co-locate the advice service with the data server (e.g., the FTP server)
- Recognize new data clients and schedule measurements automatically
- Calculate statistical metrics based on the results of many measurements
10. Enable Architecture
11. Use-Case (1): Single client
12. Use-Case (2): Automatic testing
Later that day...
13. Use-Case (3): Multiple clients
14. Use-Case (4): Tool Comparison
15. Enable Components
16. Enable server functionality
- Schedule tests
  - Current network tests: ping, pipechar, iperf
  - consider conflicts: iperf and pipechar need to run serially; multiple pings can run in parallel
- Record last N results in the database
- Perform analysis needed for advice
  - keep a running average of all tests to track long-term changes in characteristics
  - compute trimmed mean and stddev of the last N results to calculate TCP buffers, etc.
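The trimmed-mean summary mentioned above can be sketched as follows. The trim fraction and the flat list of samples are assumptions for illustration; this is not the actual Enable analysis code.

```python
import statistics

def summarize(results, trim_frac=0.1):
    """Trimmed mean and stddev of a window of measurements:
    drop the lowest and highest trim_frac of samples before
    averaging, which damps transient outliers in tools like
    iperf or ping."""
    data = sorted(results)
    k = int(len(data) * trim_frac)
    trimmed = data[k:len(data) - k] if k else data
    return statistics.mean(trimmed), statistics.stdev(trimmed)

# A burst of congestion (the 100 below) barely moves the
# trimmed mean, whereas it would dominate a plain average.
mean_mbits, stddev_mbits = summarize(
    [10, 11, 11, 11, 12, 12, 12, 13, 13, 100])
```

Trimming is a simple robustness measure; the "Comparing Results" slide below notes that the right summary statistic can differ per tool.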
17. Enable API
- Uses XML-RPC, so the API is cross-platform and cross-language
  - C and Python are implemented; Java is next
- Primary goal is simplicity: requesting advice is a single function call
  - sz = srv.getBufferSize("my.host.org")
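The single-call advice request can be illustrated with Python's standard XML-RPC support. The in-process server, port, and return value below are stand-ins so the example is self-contained; only the getBufferSize call mirrors the slide, and the real Enable server would run alongside the data server.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Toy stand-in for an Enable server: exposes getBufferSize and
# returns a fixed bandwidth-delay product (45 Mbit/s x 180 ms / 8).
def getBufferSize(client_host):
    return 1012500  # bytes; a real server would look up measurements

server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(getBufferSize)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: requesting advice is a single function call,
# as on the slide.
srv = xmlrpc.client.ServerProxy("http://localhost:%d" % port)
sz = srv.getBufferSize("my.host.org")
server.shutdown()
```

Because the transport is XML-RPC, an equivalent client is a few lines in any language with an XML-RPC library, which is the point of the cross-language claim above.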
18. Enable DB
- Simply makes its test/result objects persistent, so little additional infrastructure is required
  - everything stored in a single file
  - transactions/locking considered overkill
- Last N (configurable) results stored
- Long-term archiving done with NetLogger (to flat files or a relational DB)
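One way to get single-file object persistence with no transaction layer, in the spirit of the slide, is Python's standard shelve module. The key scheme, record format, and window size below are illustrative assumptions, not the actual Enable schema.

```python
import os
import shelve
import tempfile

N = 5  # keep only the last N results per (host, test) pair

# Single file on disk; no locking layer, matching the slide's
# "transactions/locking considered overkill" design choice.
path = os.path.join(tempfile.mkdtemp(), "enable-db")

with shelve.open(path) as db:
    key = "my.host.org/iperf"          # hypothetical key scheme
    results = db.get(key, [])
    results.append({"mbits": 26.0})    # hypothetical result record
    db[key] = results[-N:]             # truncate to the last N

# Reopening the shelf sees the persisted window of results.
with shelve.open(path) as db:
    stored = db["my.host.org/iperf"]
```

This trades concurrency safety for simplicity, which is reasonable when a single server process owns the file; the long-term archive is handled separately by NetLogger, as noted above.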
19. Sample Results (1)
- iperf over four network paths

  Path          RTT      Bottleneck link bandwidth
  LBL-CERN      180 ms   45 Mbits/sec
  LBL-ISI East  80 ms    1000 Mbits/sec
  LBL-ANL       60 ms    45 Mbits/sec
  LBL-KU        50 ms    45 Mbits/sec

  Tuning method   LBL-CERN       LBL-ISI East    LBL-ANL        LBL-KU
  no tuning       2 Mbits/sec    5 Mbits/sec     5 Mbits/sec    6 Mbits/sec
  Linux 2.4 auto  6 Mbits/sec    110 Mbits/sec   9 Mbits/sec    9 Mbits/sec
  hand tuning     18 Mbits/sec   266 Mbits/sec   17 Mbits/sec   27 Mbits/sec
  Enable tuning   18 Mbits/sec   264 Mbits/sec   16 Mbits/sec   26 Mbits/sec
20. Sample results (2)
- Tests over a 5- to 30-hour period between a host at LBNL and a host at the University of Kansas
  - ping
  - pipechar
  - iperf
21. iperf results (1)
22. iperf results (2)
- tightly clustered
- longer left tail
23. pipechar results (1)
24. pipechar results (2)
- tightly clustered
- very long left tail
25. ping results (1)
26. ping results (2)
- very tightly clustered
- much longer right tail
27. Comparing Results
- Enable makes it easy to add new tools
- This is useful for comparing different tools
  - how pipechar, iperf, and nettest measure bandwidth
  - how good is web100 autotuning?
- It also provides better insight into how to provide a good summary statistic for any given tool
28. Future Work
- Better advice
  - more tools/tests (NWS, asymmetric paths?)
  - better analysis (confidence intervals)
  - QoS
- Fancier scheduling (stochastic intervals)
- Scalability issues
  - Aggregation of clients
  - Aging/purging of stored results
  - Multiple Enable hosts on the same subnet to overcome OS limitations on parallel tests (e.g., pipechar)
29. Getting Enable
- http://www-didc.lbl.gov/ENABLE
- Current version: 1.0-Beta
  - Functional
  - Requires Python (2.1)
  - Codebase is small and easily installed