Improving the Performance of the Linux Network Subsystem

1
Improving the Performance of the Linux Network
Subsystem

King Fahd University of Petroleum and Minerals
(KFUPM)
INFORMATION AND COMPUTER SCIENCE DEPARTMENT
Dr. K. Salah
April 22, 2007
Dhahran, Saudi Arabia

2
Agenda

Introduction
Receive-livelock Phenomenon
Existing Schemes
Previous Work. Why Hybrid Scheme?
Problem Statement
Project Objectives
Equipment
Project Phases and Scheduling
Benefits and Utilizations
Budget
Summary

3
Introduction

High-Speed Network devices are widely deployed
Gigabit Ethernet Technology supports 1 Gb/s and
10 Gb/s raw bandwidth
The network performance bottleneck has shifted to servers
and end hosts
The large increase in bandwidth can negatively impact
OS performance due to the interrupt overhead
caused by incoming gigabit traffic.
Because interrupt handling has higher priority than
other processing, this leads to the receive-livelock
phenomenon

4
Typical Architecture Model
5
Packet Arrival Rate - Slow
[Figure: Network traffic entering the Host system (Protocol Stack, Applications)]
6
Packet Arrival Rate - Fast
[Figure: Network traffic entering the Host system (Protocol Stack, Applications), with two points marked X]
7
Receive-livelock Phenomenon

[Figure: Throughput versus Offered load, showing the Ideal, Acceptable, and Livelock curves and the MLFRR point]

(Source: K. K. Ramakrishnan, 1993)
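
The shape of the livelock curve can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions, not the model used in this project: every arriving packet is assumed to raise one interrupt costing t_interrupt seconds of CPU, interrupts preempt all other work, and each delivered packet needs t_protocol seconds of protocol processing. The function and parameter names are hypothetical.

```python
def delivered_throughput(offered_load, t_interrupt, t_protocol):
    """Packets/s delivered to applications in a toy interrupt-priority model.

    Assumptions (not from the slides): one interrupt per arriving packet,
    interrupts always preempt protocol processing, single CPU.
    """
    # Fraction of the CPU consumed by interrupt handling (capped at 100%).
    interrupt_share = min(1.0, offered_load * t_interrupt)
    # Whatever CPU time is left over can be spent on protocol processing.
    processing_capacity = (1.0 - interrupt_share) / t_protocol
    return min(offered_load, processing_capacity)


def mlfrr(t_interrupt, t_protocol):
    """Maximum loss-free receive rate of the toy model, in packets/s."""
    return 1.0 / (t_interrupt + t_protocol)


if __name__ == "__main__":
    # Hypothetical costs: 10 us per interrupt, 40 us of protocol processing.
    for load in (10_000, 20_000, 50_000, 100_000):
        print(load, delivered_throughput(load, 10e-6, 40e-6))
```

With these hypothetical costs the model tracks the offered load up to the MLFRR, then falls as interrupts consume more of the CPU, and reaches zero once interrupt handling alone saturates the processor.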
8
Existing Schemes

Normal Interruption
Interrupt Disabling and Enabling
Polling
Pure Polling vs. NAPI Polling
Interrupt Coalescing (IC)
Hybrid Scheme

9
Interrupt Disabling and Enabling

The idea of the pure interrupt disable-enable scheme
is to keep the interrupts for incoming packets
turned off (disabled) as long as there are
packets to be processed by the kernel's protocol
stack, i.e., the protocol buffer is not empty.
When the buffer is empty, the interrupts are
turned on again (re-enabled).
Any packets that arrive while the interrupts are
disabled are DMA'd quietly to the protocol buffer
without incurring any interrupt overhead.
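
A minimal sketch of this behavior, assuming a single CPU, a fixed per-packet processing time, and zero cost for the interrupt itself (all simplifications introduced here for illustration). The function name count_interrupts is hypothetical.

```python
def count_interrupts(arrival_times, service_time):
    """Count interrupts raised under a simplified disable-enable scheme.

    arrival_times must be sorted in increasing order. An interrupt is raised
    only when a packet arrives while the protocol buffer is empty; packets
    arriving while the kernel is still busy are queued without an interrupt.
    """
    interrupts = 0
    busy_until = 0.0  # time at which the protocol buffer becomes empty
    for t in arrival_times:
        if t >= busy_until:
            # Buffer empty, interrupts enabled: this packet interrupts the CPU.
            interrupts += 1
            busy_until = t + service_time
        else:
            # Interrupts disabled: the packet is DMA'd quietly and queued.
            busy_until += service_time
    return interrupts


if __name__ == "__main__":
    arrivals = [0, 1, 2, 10, 11]
    print(count_interrupts(arrivals, service_time=3))    # closely spaced packets share an interrupt
    print(count_interrupts(arrivals, service_time=0.5))  # fast processing: one interrupt per packet
```

The sketch shows why the scheme helps most under load: the busier the kernel is, the more packets are absorbed without raising an interrupt.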

10
Polling

Disable interrupts for incoming packets altogether,
thus eliminating interrupt overhead
completely.
The OS periodically polls its host system memory
(i.e., protocol processing buffer or DMA Rx Ring)
to find packets to process.
In general, exhaustive polling is rarely
implemented. Polling with quota is usually the
case whereby only a maximum number of packets is
processed in each poll in order to leave some CPU
power for application processing.
Polling has two drawbacks.
First, unsuccessful polls can be encountered as
packets are not guaranteed to be present at all
times in the host memory, and thus CPU power is
wasted.
Second, processing of incoming packets is not
performed immediately as the packets get queued
until they are polled.
Selecting the polling period is crucial.
Very frequent polling can be detrimental to
performance as significant overhead can be
encountered at each poll.
On the other hand, if polling is performed
infrequently, packets may encounter long delays.
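
Polling with quota can be sketched as follows. This is a user-space illustration only, not kernel code: rx_ring stands in for the DMA Rx ring, and the names poll, quota, and handle_packet are hypothetical.

```python
from collections import deque


def poll(rx_ring, quota, handle_packet):
    """Process at most `quota` packets from rx_ring; return how many were handled.

    A return value of 0 corresponds to an unsuccessful poll (wasted CPU).
    """
    processed = 0
    while rx_ring and processed < quota:
        handle_packet(rx_ring.popleft())
        processed += 1
    return processed


if __name__ == "__main__":
    ring = deque(range(10))  # ten queued packets, represented by integers
    handled = []
    print(poll(ring, quota=4, handle_packet=handled.append))  # stops at the quota
    print(len(ring))                                          # packets left for the next poll
    print(poll(deque(), quota=4, handle_packet=handled.append))  # unsuccessful poll
```

The quota bounds the work done per poll, which is what leaves CPU time for applications; the packets left in the ring wait until the next polling period, which is the queuing delay noted above.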

11
Pure Polling vs. NAPI Polling
12
Pure Polling vs. NAPI Polling
13
Shortcomings of NAPI

Rotten Packets
When NAPI re-enables interrupts, there is the
possibility that one or more packets sneak in
during that time and go undetected until a fresh
packet arrives. These packets are known as
rotten packets.
Poor Performance with CPU-bound Applications
NAPI was reported not to perform well for hosts
that are heavily loaded with CPU-bound applications.
This is caused by scheduling polling using
Linux softIRQs, whereby CPU-bound user
applications compete with softIRQs for the CPU, and
therefore softIRQs (and NAPI) get less
chance to run.

14
Interrupt Coalescing

Most network adapters (NICs) are manufactured with
support for interrupt coalescing.
In IC, the NIC generates a single interrupt for a
group of incoming packets.
This is in contrast to normal interruption mode, in
which the NIC generates an interrupt for every
incoming packet.
Two schemes to mitigate the rate of interrupts:
Count-based IC
NIC generates an interrupt when a predefined
number of packets has been received.
Time-based IC
NIC waits a predefined time period before it
generates an interrupt. During this time period
multiple packets can be received.
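
The two coalescing schemes can be compared with a small sketch. The timer behavior assumed here (the wait period starts at the first packet after the previous interrupt) is one plausible interpretation, and real NICs differ in detail; both function names are hypothetical.

```python
def count_based_interrupts(num_packets, batch_size):
    """Interrupts raised when the NIC interrupts once per `batch_size` packets.

    Packets left over after the last full batch have not yet caused an interrupt.
    """
    return num_packets // batch_size


def time_based_interrupts(arrival_times, wait):
    """Interrupts raised when the NIC waits `wait` seconds before interrupting.

    Assumption: the timer starts at the first packet that arrives after the
    previous interrupt, and all packets arriving within the window share it.
    arrival_times must be sorted in increasing order.
    """
    interrupts = 0
    window_end = None
    for t in arrival_times:
        if window_end is None or t > window_end:
            # First packet of a new window: one interrupt will cover the window.
            interrupts += 1
            window_end = t + wait
    return interrupts


if __name__ == "__main__":
    arrivals = [0.0, 0.1, 0.2, 1.0, 1.05, 3.0]
    print(len(arrivals))                         # normal mode: one interrupt per packet
    print(count_based_interrupts(len(arrivals), 3))
    print(time_based_interrupts(arrivals, wait=0.5))
```

Note the trade-off visible in the count-based case: leftover packets wait for the batch to fill, which is why count-based coalescing can add latency at low arrival rates.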

15
Hybrid Scheme

A combination of
Interrupt Disabling and Enabling
Polling

16
Why?
17
Problem Statement

In this research we intend
to implement a novel hybrid interrupt-handling
scheme that improves the performance of the Linux
networking subsystem and overcomes the
shortcomings of NAPI.
to prove experimentally that our proposed scheme
outperforms NAPI under different system
configurations and load conditions.

18
Project Objectives

Devise a novel scheme for the Linux platform to
enhance packet reception on links at Gigabit
speed.
The scheme is expected to outperform, in terms of
latency, throughput, and CPU availability, the
NAPI scheme currently implemented in the
latest Linux 2.6.
The novel scheme should include a proper
solution to measure and forecast the traffic
rate.
The novel scheme should also work for a host with
single and multiple interfaces.
More importantly, the scheme should work for SMP
(Symmetric Multi-Processing) architectures, where
the host's motherboard has multiple processors.

19
Project Objectives (cont'd)

Find solutions to shortcomings and open issues of
NAPI (other than latency, throughput, and CPU
availability). These shortcomings include rotten
packets and poor network performance when the
system is heavily loaded with CPU-bound
applications.
Devise a novel generic benchmark for Linux hosts
to find the switching point (cliff
point).

20
Project Objectives (cont'd)

Develop an experimental testbed to examine and
compare the performance of the modified Linux
version against the latest Linux with NAPI.
The experiment takes into account numerous and
different test conditions and variables.
Linux host with single and multiple network
interfaces
Different types of input traffic (bursty,
constant, Poisson)
Different packet sizes
Various types of system loads including CPU-bound
and I/O bound applications
Hosts with single and multiple processors (i.e.,
SMP).
The experiment should follow the guidelines for
testing and benchmarking laid out in RFC 2544.
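
The three input traffic types can be sketched as arrival-time generators. This is an illustration only: the on/off burst pattern is one simple assumed definition of bursty traffic, and all function and parameter names are hypothetical rather than taken from the testbed.

```python
import random


def constant_arrivals(rate, count):
    """Arrival times (seconds) for `count` packets at a fixed rate in packets/s."""
    return [i / rate for i in range(count)]


def poisson_arrivals(rate, count, seed=0):
    """Arrival times with exponentially distributed gaps (a Poisson process)."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    t, times = 0.0, []
    for _ in range(count):
        t += rng.expovariate(rate)
        times.append(t)
    return times


def bursty_arrivals(burst_size, burst_gap, packet_gap, bursts):
    """Arrival times for back-to-back bursts separated by idle periods.

    Assumed pattern: a burst starts every `burst_gap` seconds and contains
    `burst_size` packets spaced `packet_gap` seconds apart.
    """
    times = []
    for b in range(bursts):
        start = b * burst_gap
        times.extend(start + i * packet_gap for i in range(burst_size))
    return times


if __name__ == "__main__":
    print(constant_arrivals(1000, 3))
    print(poisson_arrivals(1000, 3))
    print(bursty_arrivals(3, 1.0, 0.25, 2))
```

In the actual experiment the traffic would come from a hardware generator; generators like these are mainly useful for driving a simulation with the same traffic types.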

21
Experimental Equipment
22
Project Phases and Scheduling

Phase I (Period of six months)
This is primarily a Linux network stack re-design
and modification phase
Phase II (Period of twelve months)
This phase is concerned with the testbed and
experimental setup as well as running performance
evaluation of NAPI and our proposed hybrid
scheme.
Phase III (Period of six months)
This phase is concerned with the performance of
our hybrid scheme for hosts with SMP support.

23
Phase I

Devise an appropriate technique to measure the
traffic arrival rate in real time. This task
includes the following subtasks
Perform an extensive review of techniques to measure
and forecast the arrival traffic rate. Devise a forecast
technique that meets the following requirements
(1) computationally simplified and optimized with
minimal overhead and operations,
(2) accurate in terms of being comparable to
actual data rate,
(3) stable in terms of ignoring short traffic
spikes, and
(4) responsive in terms of following changes in
actual traffic rate.
Examine the effectiveness of the proposed
technique to forecast the traffic arrival rate
and compare it with other techniques proposed in
the literature. The technique must be
appropriate for different types of traffic,
including bursty traffic with empirical packet
sizes. Discrete Event Simulation (DES) will be
used to assess the performance and effectiveness
of our proposed technique.
Plot, analyze, and compare the performance of the
proposed technique for forecasting the arrival
traffic rate.
Determine (using simulation and fine tuning of
parameters) the minimum and maximum values (i.e.,
confidence interval) of forecasted/estimated
traffic rate. These values will be used as the
upper and lower thresholds of the cliff point and
will be used by the hybrid scheme for switching
between interrupt disable-enable and polling.
They will also be used to prevent frequent
oscillation and switching between the interrupt
disable-enable and polling schemes, thereby
minimizing the overall overhead.
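
One way these pieces could fit together is sketched below. It is a sketch under stated assumptions, not the technique the project will devise: an exponentially weighted moving average (EWMA) is used as a stand-in rate estimator, and the switch assumes interrupt disable-enable is used below the cliff point and polling above it. The class names, the alpha parameter, and the threshold values are all hypothetical.

```python
class RateEstimator:
    """EWMA estimate of the traffic arrival rate (an assumed stand-in technique)."""

    def __init__(self, alpha=0.125):
        self.alpha = alpha  # smaller alpha: more stable; larger alpha: more responsive
        self.rate = 0.0

    def update(self, sample):
        """Fold one measured rate sample (packets/s) into the estimate."""
        self.rate += self.alpha * (sample - self.rate)
        return self.rate


class ModeSwitch:
    """Switch schemes using two thresholds around the cliff point (hysteresis)."""

    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper
        self.mode = "interrupt"  # assumed: disable-enable scheme at low rates

    def update(self, estimated_rate):
        # Switch only when the estimate leaves the band between the thresholds,
        # so a rate hovering near the cliff point does not cause oscillation.
        if self.mode == "interrupt" and estimated_rate > self.upper:
            self.mode = "polling"
        elif self.mode == "polling" and estimated_rate < self.lower:
            self.mode = "interrupt"
        return self.mode


if __name__ == "__main__":
    estimator = RateEstimator(alpha=0.5)
    switch = ModeSwitch(lower=8_000, upper=12_000)
    for sample in (5_000, 20_000, 20_000, 20_000, 9_000, 2_000, 2_000):
        rate = estimator.update(sample)
        print(sample, round(rate), switch.update(rate))
```

The single alpha parameter exposes the tension between requirements (3) and (4): a small value ignores short spikes but follows real changes slowly, while a large value does the opposite.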

24
Phase I (cont'd)

Thoroughly understand the Linux kernel and the
complex NAPI code. This would require the
following subtasks
Perform an extensive review and study
of the Linux 2.6 network stack (NAPI) and the NIC
network drivers.
Set up a utility such as cscope or kscope to
navigate and browse the actual Linux code and
understand it thoroughly.
Identify exactly what code needs to be changed in
both the Linux kernel and the network driver
Identify how the code should differ to
support single-processor and multi-processor
hosts, i.e., SMP.
Investigate open known issues or shortcomings
with NAPI (other than expected latency at low
traffic rate) and critique proposed solutions in
the literature.
These shortcomings include rotten packets and
poor network performance under heavy CPU-bound
applications.
More importantly, investigate how our proposed
hybrid scheme will resolve these
known open issues.

25
Phase II

Modify, test, and recompile the code of Linux 2.6
to implement our proposed hybrid scheme and the
scheme to forecast the traffic arrival rate. In
addition, the code has to include solutions to
rotten packets and to the problem of poor
network stack performance on a system
heavily loaded with CPU-bound applications.
Learn how to use the IXIA 400T traffic
generator/analyzer. Configure a simple experiment
of generating and receiving packets.
Identify the proper cliff point for the system.
This can be accomplished only by determining the
interrupt overhead and protocol processing time.
The interrupt overhead and protocol processing
time will be determined using measurement.
Using IXIA or some other technique, devise a
generic and useful way to measure interrupt
overhead. Determine the distribution of the
interrupt overhead.
Using IXIA or some other technique, devise a way
to measure protocol processing at the OS level.
Determine the distribution of the kernel's protocol
processing.

26
Phase II (cont'd)

Using IXIA 400T and a PC with Linux 2.6 and NAPI
enabled, measure and plot the following
performance metrics
Packet forwarding latency
Packet forwarding throughput
CPU utilization with packet forwarding
The above experiment will consider the following
different configurations and conditions
Different packet sizes
Traffic distribution: Poisson vs. bursty
Traffic reception and transfer on a single NIC
Traffic reception and transfer on multiple NICs
Using IXIA 400T and a PC with our proposed hybrid
scheme, do the same performance measurements as
in Task 7 and Task 8.
Plot and compare performance of NAPI and our
proposed hybrid scheme. Make proper conclusions.
Compare and evaluate the performance of our
solutions for NAPI shortcomings of rotten packets
and poor network performance under CPU-bound
applications. Consider performance conditions
and configurations of Task 7 and Task 8.

27
Phase III

Examine the performance impact described for
the previous tasks (Tasks 6-11) under Linux support
for SMP with a dual-processor motherboard.
Compare SMP performance to the performance when
using only a single processor. This is a huge
phase, as six tasks are to be carried out again.
It is to be noted that, according to the RFC 2544
recommendations, in order to obtain a
reported value for a single performance point, a
test has to be repeated at least 20 times and the
reported value must be the average of these 20
recorded values. The recommendations and
guidelines also state that the test has to run for at
least 20 minutes to obtain one single
reported value.
Ensure that the novel scheme preserves the order
of packets, i.e., there is no need for packet
re-ordering.
Prepare and deliver the final report

28
Work Plan
29
Personnel Requirements

The project team will consist of the primary
investigator and two graduate students (PhD or MS
degree candidates).
The graduate students will be computer
science/engineering graduates and will work under
the supervision and guidance of the PI.

30
Benefits and Utilization

contribute to the advancement of open-source
operating systems (such as Linux) by providing
a step-up version that improves the performance
of its networking subsystem to suit Gigabit
network traffic.
This will lead to having better Linux-based
routers, firewalls, servers, and proxies.
utilize the previous theoretical work of [24] to
devise a new hybrid interrupt handling scheme to
improve the networking performance of Linux or
any other operating system.
provide adequate solutions to NAPI shortcomings
of the current Linux 2.6 networking subsystem.

31
Benefits and Utilization (cont'd)

prove and demonstrate that the proposed hybrid
scheme is a big enhancement in terms of
performance over current versions when
considering many different configurations and
load conditions.
provide an algorithm and computationally
optimized technique to forecast the traffic
arrival rate. Such an algorithm or technique
should have no or minimal impact on Linux
performance.
provide a generic methodology and benchmark to
identify the switching point.
The research community at large can benefit
substantially from the experimental work in terms
of methodology, testbed, experimental setup and
configuration. The experimental methodology and
techniques can be employed for similar systems to
conduct performance comparison.

32
Benefits and Utilization (cont'd)

major beneficiaries may include almost all Saudi
companies, as well as governmental and
non-governmental institutions, that show keen
interest in using Linux.
GbE deployment
Linux's wide popularity
will benefit KFUPM in general and the department
of Information and Computer Science in
particular.
It is anticipated that a modified version of
Linux that best suits Gigabit traffic will carry
the name of KFUPM and the ICS department on it.
KFUPM can be seen as an active contributor to
open-source code and community.
results of general interest to the research
community will be published at key international
conferences, such as those of IEEE and ACM. It is
also anticipated that this research work will
lead to publications in reputable refereed
journals.
There are currently no network traffic generators
or analyzers at KFUPM.
Such a project can definitely lay the ground for
further research and development by having such
equipment available. The equipment can be
utilized for research.
Also the IT center at the university can use such
equipment for diagnosing and troubleshooting
network problems related to performance
bottlenecks.

33
Budget
34
Summary

In this research we intend to improve the
performance of the Linux networking subsystem and
overcome the shortcomings of NAPI.
The project will be of great benefit to the research
and open-source communities, KFUPM, and the
public at large.
