Title: Improving the Performance of the Linux Network Subsystem
1Improving the Performance of the Linux Network
Subsystem
- King Fahd University of Petroleum and Minerals
(KFUPM) - INFORMATION AND COMPUTER SCIENCE DEPARTMENT
- Dr. K. Salah
- April 22, 2007
- Dhahran, Saudi Arabia
2Agenda
- Introduction
- Receive-livelock Phenomenon
- Existing Schemes
- Previous Work. Why Hybrid Scheme?
- Problem Statement
- Project Objectives
- Equipment
- Project Phases and Scheduling
- Benefits and Utilizations
- Budget
- Summary
3Introduction
- High-speed network devices are widely deployed.
- Gigabit Ethernet technology supports 1 Gb/s and
10 Gb/s raw bandwidth.
- The network performance bottleneck has shifted to
servers and end hosts.
- The large increase in bandwidth can negatively
impact OS performance because of the interrupt
overhead caused by incoming gigabit traffic.
- Because interrupt handling has higher priority
than other processing, this leads to the
receive-livelock phenomenon.
4Typical Architecture Model
5Packet Arrival Rate - Slow
[Figure: host system diagram showing network traffic, the protocol stack, and applications]
6Packet Arrival Rate - Fast
[Figure: host system diagram showing network traffic, the protocol stack, and applications, with two points marked X]
7Receive-livelock Phenomenon
[Figure: throughput versus offered load, with Ideal, Acceptable, and Livelock curves and the MLFRR point marked. Source: K. K. Ramakrishnan, 1993]
8Existing Schemes
- Normal Interruption
- Interrupt Disabling and Enabling
- Polling
- Pure Polling vs. NAPI Polling
- Interrupt Coalescing (IC)
- Hybrid Scheme
9Interrupt Disabling and Enabling
- The idea of the pure interrupt disable-enable
scheme is to keep the interrupts of incoming
packets turned off (disabled) as long as there
are packets to be processed by the kernel's
protocol stack, i.e., while the protocol buffer
is not empty.
- When the buffer is empty, the interrupts are
turned on again (re-enabled).
- Any packets that arrive while the interrupts are
disabled are DMA'd quietly to the protocol buffer
without incurring any interrupt overhead.
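The disable-enable behaviour described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel or driver code: the class name `DisableEnableReceiver` and its methods are hypothetical, and a Python deque stands in for the protocol buffer.

```python
from collections import deque


class DisableEnableReceiver:
    """Toy model of the interrupt disable-enable scheme (hypothetical names)."""

    def __init__(self):
        self.buffer = deque()           # stands in for the protocol buffer
        self.interrupts_enabled = True
        self.interrupt_count = 0

    def packet_arrives(self, packet):
        # The NIC always DMAs the packet into the buffer.
        self.buffer.append(packet)
        if self.interrupts_enabled:
            # Only the first packet of a busy period raises an interrupt;
            # the handler then disables further receive interrupts.
            self.interrupt_count += 1
            self.interrupts_enabled = False

    def process_one(self):
        """Protocol stack handles one packet; re-enable when the buffer drains."""
        if not self.buffer:
            return None
        packet = self.buffer.popleft()
        if not self.buffer:
            self.interrupts_enabled = True
        return packet
```

In this model, a burst of packets that arrives while the buffer is non-empty costs a single interrupt, and a new interrupt is taken only after the buffer has been drained.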
10Polling
- Disable interrupts of incoming packets altogether,
thus eliminating interrupt overhead completely.
- The OS periodically polls its host system memory
(i.e., the protocol processing buffer or DMA Rx
ring) to find packets to process.
- In general, exhaustive polling is rarely
implemented. Polling with a quota is the usual
case, whereby at most a fixed number of packets
is processed in each poll in order to leave some
CPU power for application processing.
- Polling has two drawbacks.
- First, polls can be unsuccessful, because packets
are not guaranteed to be present in host memory
at all times, and thus CPU power is wasted.
- Second, incoming packets are not processed
immediately, because they stay queued until they
are polled.
- Selecting the polling period is crucial.
- Very frequent polling can be detrimental to
performance, as significant overhead can be
incurred at each poll.
- On the other hand, if polling is performed
infrequently, packets may encounter long delays.
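Polling with a quota, and the two drawbacks listed above, can be sketched as follows. This is an illustrative model only: the function names `poll` and `run_polls` are hypothetical, and the Rx ring is modelled as a Python deque that is filled between polls.

```python
from collections import deque


def poll(rx_ring, quota):
    """Process at most `quota` packets from the ring; return those processed."""
    processed = []
    while rx_ring and len(processed) < quota:
        processed.append(rx_ring.popleft())
    return processed


def run_polls(arrivals_per_period, quota):
    """arrivals_per_period[i] is the number of packets DMA'd before poll i.

    Returns (packets handled, unsuccessful polls, packets still queued).
    """
    rx_ring = deque()
    handled = 0
    empty_polls = 0
    for n in arrivals_per_period:
        rx_ring.extend(range(n))      # packets arrive between polls
        batch = poll(rx_ring, quota)
        handled += len(batch)
        if not batch:
            empty_polls += 1          # a poll that found nothing: wasted CPU
    return handled, empty_polls, len(rx_ring)
```

With arrivals of 0, 5, 0, and 1 packets and a quota of 3, the first poll is unsuccessful, and two of the five packets from the second period wait one extra polling period before they are processed.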
11Pure Polling vs. NAPI Polling
12Pure Polling vs. NAPI Polling
13Shortcomings of NAPI
- Rotten Packets
- When NAPI re-enables interrupts, one or more
packets may sneak in during that time and go
undetected until a fresh packet arrives. These
packets are known as rotten packets.
- Poor Performance with CPU-bound Applications
- NAPI was reported not to perform well on hosts
that are heavily loaded with CPU-bound
applications. The cause is that polling is
scheduled using Linux softIRQs: CPU-bound user
applications compete with softIRQs for the CPU,
and therefore softIRQs (and NAPI) get less chance
to run.
14Interrupt Coalescing
- Most network adapters (NICs) are manufactured
with support for interrupt coalescing (IC).
- In IC, the NIC generates a single interrupt for a
group of incoming packets.
- This is opposed to normal interruption mode, in
which the NIC generates an interrupt for every
incoming packet.
- There are two schemes to mitigate the rate of
interrupts:
- Count-based IC
- The NIC generates an interrupt when a predefined
number of packets has been received.
- Time-based IC
- The NIC waits a predefined time period before it
generates an interrupt. During this time period,
multiple packets can be received.
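The two coalescing schemes can be compared with a short model that computes when interrupts would fire for a given list of packet arrival times. This is a sketch under stated assumptions: the function names are hypothetical, times are plain integers (for example microseconds), and the time-based timer is assumed to start at the first packet of each batch.

```python
def count_based_interrupts(arrival_times, count):
    """One interrupt at the arrival of every `count`-th packet."""
    return [t for i, t in enumerate(arrival_times, start=1) if i % count == 0]


def time_based_interrupts(arrival_times, period):
    """One interrupt `period` time units after the first packet of each batch."""
    interrupts = []
    deadline = None
    for t in arrival_times:
        if deadline is None or t > deadline:
            # The previous timer (if any) has already fired, so this
            # packet starts a new batch and a new timer.
            deadline = t + period
            interrupts.append(deadline)
    return interrupts
```

For arrivals at times 0, 10, 20, 100, and 105, count-based IC with a count of 2 fires at times 10 and 100, and in this toy model the fifth packet waits until another packet arrives. Time-based IC with a period of 50 fires at times 50 and 150, so every packet is eventually signalled, at the cost of added delay.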
15Hybrid Scheme
- A combination of
- Interrupt Disabling and Enabling
- Polling
16Why?
17Problem Statement
- In this research we intend
- to implement a novel hybrid interrupt-handling
scheme that improves the performance of the Linux
networking subsystem and overcomes the
shortcomings of NAPI.
- to prove experimentally that our proposed scheme
outperforms NAPI under different system
configurations and load conditions.
18Project Objectives
- Devise a novel scheme for the Linux platform to
enhance packet reception on links at Gigabit
speed.
- The scheme is expected to outperform, in terms of
latency, throughput, and CPU availability, the
NAPI scheme currently implemented in the latest
Linux 2.6.
- The novel scheme should include a proper solution
to measure and forecast the traffic rate.
- The novel scheme should also work for a host with
single and multiple interfaces.
- More importantly, the scheme should work for SMP
(Symmetric Multi-Processing) architectures, where
the host's motherboard has multiple processors.
19Project Objectives (contd)
- Find solutions to shortcomings and open issues of
NAPI (other than latency, throughput, and CPU
availability). These shortcomings include rotten
packets and poor network performance when the
system is heavily loaded with CPU-bound
applications.
- Devise a novel generic benchmark for Linux hosts
to find the switching point (cliff point).
20Project Objectives (contd)
- Develop an experimental testbed to examine and
compare the performance of the new modified Linux
version against the latest Linux NAPI.
- The experiment takes into account numerous
different test conditions and variables.
- Linux host with single and multiple network
interfaces
- Different types of input traffic (bursty,
constant, Poisson)
- Different packet sizes
- Various types of system loads, including CPU-bound
and I/O-bound applications
- Hosts with single and multiple processors (i.e.,
SMP).
- The experiment should follow the guidelines for
testing and benchmarking laid out in RFC 2544.
21Experimental Equipment
22Project Phases and Scheduling
- Phase I (Period of six months)
- This is primarily a Linux network stack re-design
and modification phase.
- Phase II (Period of twelve months)
- This phase is concerned with the testbed and
experimental setup as well as running the
performance evaluation of NAPI and our proposed
hybrid scheme.
- Phase III (Period of six months)
- This phase is concerned with the performance of
our hybrid scheme for hosts with SMP support.
23Phase I
- Devise an appropriate technique to measure the
traffic arrival rate in real time. This task
includes the following subtasks:
- Perform an extensive review of techniques to
measure and forecast the arrival traffic rate.
Devise a forecast technique that meets the
following requirements:
- (1) computationally simple and optimized, with
minimal overhead and operations,
- (2) accurate, in terms of being comparable to the
actual data rate,
- (3) stable, in terms of ignoring short traffic
spikes, and
- (4) responsive, in terms of following changes in
the actual traffic rate.
- Examine the effectiveness of the proposed
technique to forecast the traffic arrival rate
and compare it with other techniques proposed in
the literature. The technique must be
appropriate for different types of traffic,
including bursty traffic with empirical packet
sizes. Discrete Event Simulation (DES) will be
used to assess the performance and effectiveness
of our proposed technique.
- Plot, analyze, and compare the performance of the
proposed technique for forecasting the arrival
traffic rate.
- Determine (using simulation and fine tuning of
parameters) the minimum and maximum values (i.e.,
confidence interval) of the forecasted/estimated
traffic rate. These values will be used as the
upper and lower thresholds of the cliff point and
will be used by the hybrid scheme for switching
between interrupt disable-enable and polling.
They will also be used to prevent frequent
oscillation between the interrupt disable-enable
scheme and polling, thereby minimizing the
overall overhead.
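One possible shape for the rate estimator and the two-threshold switch described above is sketched below. It is not the proposed technique itself: an exponentially weighted moving average (EWMA) is assumed here only as one simple candidate that trades stability against responsiveness through a single weight, and the names `RateEstimator`, `HybridSwitch`, and `alpha` are hypothetical. The sketch also assumes that polling is used above the cliff point and interrupt disable-enable below it.

```python
class RateEstimator:
    """EWMA of the packet arrival rate (assumed technique, for illustration)."""

    def __init__(self, alpha=0.25):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha   # small alpha: more stable; large alpha: more responsive
        self.rate = 0.0

    def update(self, packets, interval):
        """Fold one measurement (packets seen during `interval`) into the estimate."""
        sample = packets / interval
        self.rate += self.alpha * (sample - self.rate)
        return self.rate


class HybridSwitch:
    """Choose a scheme from the estimated rate using two thresholds."""

    def __init__(self, lower, upper):
        if lower >= upper:
            raise ValueError("lower threshold must be below upper threshold")
        self.lower = lower
        self.upper = upper
        self.mode = "interrupt"   # interrupt disable-enable

    def update(self, rate):
        # Switch only when the rate leaves the band between the thresholds,
        # so small fluctuations around the cliff point cause no oscillation.
        if self.mode == "interrupt" and rate > self.upper:
            self.mode = "polling"
        elif self.mode == "polling" and rate < self.lower:
            self.mode = "interrupt"
        return self.mode
```

Each update costs one division, one subtraction, one multiplication, and one addition, which is in the spirit of requirement (1). The gap between the lower and upper thresholds is what keeps the scheme from switching back and forth when the estimated rate hovers near the cliff point.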
24Phase I contd
- Understand thoroughly the Linux kernel and the
complex NAPI code. This requires the following
subtasks:
- Perform an extensive review and study of the
Linux 2.6 network stack (NAPI) and the NIC
network drivers.
- Set up a utility called cscope or kscope to
navigate and browse the actual Linux code and
understand it thoroughly.
- Identify exactly what code needs to be changed in
both the Linux kernel and the network driver.
- Identify how the code should differ to support
single-processor and multi-processor (i.e., SMP)
hosts.
- Investigate known open issues or shortcomings
of NAPI (other than expected latency at low
traffic rate) and critique solutions proposed in
the literature.
- These shortcomings include rotten packets and
poor network performance under heavy CPU-bound
applications.
- More importantly, investigate how our proposed
hybrid scheme will resolve these known open
issues.
25Phase II
- Modify, test, and recompile the code of Linux 2.6
to implement our proposed hybrid scheme and the
scheme to forecast the traffic arrival rate. In
addition, the code has to include solutions to
rotten packets and to the poor performance of
the network stack on a system heavily loaded
with CPU-bound applications.
- Learn how to use the IXIA 400T traffic
generator/analyzer. Configure a simple experiment
that generates and receives packets.
- Identify the proper cliff point for the system.
This can be accomplished only by determining the
interrupt overhead and protocol processing time,
both of which will be determined by measurement.
- Using IXIA or some other technique, devise a
generic and useful way to measure interrupt
overhead. Determine the distribution of the
interrupt overhead.
- Using IXIA or some other technique, devise a way
to measure protocol processing at the OS level.
Determine the distribution of the kernel's
protocol processing time.
26Phase II contd
- Using the IXIA 400T and a PC with Linux 2.6 and
NAPI enabled, measure and plot the following
performance metrics:
- Packet forwarding latency
- Packet forwarding throughput
- CPU utilization with packet forwarding
- The above experiment will consider the following
configurations and conditions:
- Different packet sizes
- Traffic distribution: Poisson vs. bursty
- Traffic reception and transfer on a single NIC
- Traffic reception and transfer on multiple NICs
- Using the IXIA 400T and a PC with our proposed
hybrid scheme, perform the same performance
measurements as in Task 7 and Task 8.
- Plot and compare the performance of NAPI and our
proposed hybrid scheme. Draw proper conclusions.
- Compare and evaluate the performance of our
solutions for the NAPI shortcomings of rotten
packets and poor network performance under
CPU-bound applications. Consider the performance
conditions and configurations of Task 7 and
Task 8.
27Phase III
- Examine the performance impact described for the
previous tasks (Tasks 6-11) under Linux support
for SMP with a dual-processor motherboard.
- Compare SMP performance to the performance when
using only a single processor. This is a huge
phase, as six tasks are to be carried out again.
It is to be noted that, according to RFC 2544
recommendations, in order to obtain a reported
value for a single performance point, a test has
to be repeated at least 20 times and the
reported value must be the average of these 20
recorded values. The recommendations and
guidelines also state that the test has to run
for at least 20 minutes to obtain one single
reported value.
- Ensure that the novel scheme preserves the order
of packets, i.e., there is no need for packet
re-ordering.
- Prepare and deliver the final report.
28Work Plan
29Personnel Requirement
- The project team will consist of the primary
investigator and two graduate students (PhD or MS
degree candidates).
- The graduate students will be computer
science/engineering graduates and will work under
the supervision and guidance of the PI.
30Benefits and Utilization
- contribute to the advancement of open-source
operating systems (such as Linux) by providing
a step-up version that improves the performance
of its networking subsystem to suit Gigabit
network traffic.
- This will lead to better Linux-based routers,
firewalls, servers, and proxies.
- utilize the previous theoretical work of [24] to
devise a new hybrid interrupt-handling scheme to
improve the networking performance of Linux or
any operating system.
- provide adequate solutions to the NAPI
shortcomings of the current Linux 2.6 networking
subsystem.
31Benefits and Utilization -- contd
- prove and demonstrate that the proposed hybrid
scheme is a big enhancement in terms of
performance over current versions when
considering many different configurations and
load conditions.
- provide an algorithm and computationally
optimized technique to forecast the traffic
arrival rate. Such an algorithm or technique
should have no or minimal impact on Linux
performance.
- provide a generic methodology and benchmark to
identify the switching point.
- The research community at large can benefit
substantially from the experimental work in terms
of methodology, testbed, experimental setup, and
configuration. The experimental methodology and
techniques can be employed for similar systems to
conduct performance comparisons.
32Benefits and Utilization -- contd
- major beneficiaries may include almost all Saudi
companies, as well as governmental and
non-governmental institutions, that show keen
interest in using Linux.
- GbE deployment
- Linux's wide popularity
- will benefit KFUPM in general and the Department
of Information and Computer Science in
particular.
- It is anticipated that a modified version of
Linux that best suits Gigabit traffic will carry
the name of KFUPM and the ICS department on it.
- KFUPM can be seen as an active contributor to
open-source code and the open-source community.
- results of general interest to the research
community will be published at key international
conferences, such as those of IEEE and ACM. It is
also anticipated that this research work will
lead to publications in reputable refereed
journals.
- There are no network traffic generators or
analyzers at KFUPM.
- Such a project can lay the ground for further
research and development by making such
equipment available. The equipment can be
utilized for research.
- The IT center at the university can also use such
equipment for diagnosing and troubleshooting
network problems related to performance
bottlenecks.
33Budget
34Summary
- In this research we intend to improve the
performance of the Linux networking subsystem and
overcome the shortcomings of NAPI.
- The project will be of great benefit to the
research and open-source communities, KFUPM, and
the public at large.