Title: Improving the Performance of the Linux Network Subsystem


1
Improving the Performance of the Linux Network
Subsystem
  • King Fahd University of Petroleum and Minerals
    (KFUPM)
  • INFORMATION AND COMPUTER SCIENCE DEPARTMENT
  • Dr. K. Salah
  • April 22, 2007
  • Dhahran, Saudi Arabia

2
Agenda
  • Introduction
  • Receive-livelock Phenomenon
  • Existing Schemes
  • Previous Work. Why Hybrid Scheme?
  • Problem Statement
  • Project Objectives
  • Equipment
  • Project Phases and Scheduling
  • Benefits and Utilizations
  • Budget
  • Summary

3
Introduction
  • High-Speed Network devices are widely deployed
  • Gigabit Ethernet Technology supports 1 Gb/s and
    10 Gb/s raw bandwidth
  • The network performance bottleneck has shifted
    to servers and end hosts
  • The large increase in bandwidth can negatively
    impact OS performance due to the interrupt
    overhead caused by incoming gigabit traffic.
  • Because interrupt handling has higher priority
    than other processing, this leads to the
    receive-livelock phenomenon

4
Typical Architecture Model
5
Packet Arrival Rate - Slow
[Figure: network traffic arriving at the host system; labeled components: Protocol Stack, Applications]
6
Packet Arrival Rate - Fast
[Figure: network traffic arriving at the host system; labeled components: Protocol Stack, Applications; two points marked X]
7
Receive-livelock Phenomenon

[Figure: throughput versus offered load, with curves labeled Ideal, Acceptable, and Livelock, and the MLFRR marked. Source: K. K. Ramakrishnan, 1993]
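
The livelock behaviour named on this slide can be illustrated with a very rough CPU-budget model. This is only a sketch under stated assumptions, not a model taken from the presentation: each packet is assumed to cost a fixed interrupt time and a fixed protocol-processing time, interrupts always win the CPU, and the function name and the cost values in the usage note are hypothetical.

```python
def delivered_throughput(offered_pps, interrupt_cost_s, protocol_cost_s):
    """Packets per second delivered to applications under a simple CPU-budget model.

    Assumption: every arriving packet raises one interrupt, and interrupt
    handling preempts protocol processing.
    """
    # Interrupt handling takes its share of the CPU first.
    cpu_left = max(0.0, 1.0 - offered_pps * interrupt_cost_s)
    # Whatever CPU time remains bounds how many packets the stack can finish.
    capacity_pps = cpu_left / protocol_cost_s
    return min(offered_pps, capacity_pps)


if __name__ == "__main__":
    # Hypothetical costs: 5 microseconds each for interrupt and protocol work.
    for load in (10_000, 50_000, 100_000, 150_000, 200_000):
        print(load, round(delivered_throughput(load, 5e-6, 5e-6)))
```

With these hypothetical costs, delivered throughput follows the offered load up to about 100,000 packets/s and then falls toward zero as interrupts consume the whole CPU, which is the shape of the livelock curve.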
8
Existing Schemes
  • Normal Interruption
  • Interrupt Disabling and Enabling
  • Polling
  • Pure Polling vs. NAPI Polling
  • Interrupt Coalescing (IC)
  • Hybrid Scheme

9
Interrupt Disabling and Enabling
  • The idea of the pure interrupt disable-enable
    scheme is to keep the interrupts for incoming
    packets turned off (disabled) as long as there
    are packets to be processed by the kernel's
    protocol stack, i.e., while the protocol buffer
    is not empty.
  • When the buffer is empty, the interrupts are
    turned on again (re-enabled).
  • Any packets that arrive while the interrupts are
    disabled are DMA'd quietly to the protocol buffer
    without incurring any interrupt overhead.

10
Polling
  • Disable interrupts for incoming packets
    altogether, thus eliminating interrupt overhead
    completely.
  • OS periodically polls its host system memory
    (i.e., protocol processing buffer or DMA Rx Ring)
    to find packets to process.
  • In general, exhaustive polling is rarely
    implemented. Polling with quota is usually the
    case whereby only a maximum number of packets is
    processed in each poll in order to leave some CPU
    power for application processing.
  • Polling has two drawbacks.
  • First, unsuccessful polls can be encountered as
    packets are not guaranteed to be present at all
    times in the host memory, and thus CPU power is
    wasted.
  • Second, processing of incoming packets is not
    performed immediately as the packets get queued
    until they are polled.
  • Selecting the polling period is crucial.
  • Very frequent polling can be detrimental to
    performance as significant overhead can be
    encountered at each poll.
  • On the other hand, if polling is performed
    infrequently, packets may encounter long delays.
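
Polling with quota, as described above, can be sketched as follows. This is a minimal user-space illustration, not kernel code: the ring is modelled as a `collections.deque`, and `poll` and `handle_packet` are hypothetical names.

```python
from collections import deque


def poll(rx_ring, quota, handle_packet):
    """Process at most `quota` packets from the ring; return how many were handled."""
    handled = 0
    while rx_ring and handled < quota:
        handle_packet(rx_ring.popleft())
        handled += 1
    return handled


if __name__ == "__main__":
    ring = deque(range(5))   # five queued packets
    processed = []
    print(poll(ring, 3, processed.append))  # quota reached, two packets remain
    print(poll(ring, 3, processed.append))  # ring drained
    print(poll(ring, 3, processed.append))  # unsuccessful poll: nothing to do
```

The third call shows the first drawback listed above: a poll that finds no packets still costs CPU time.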

11
Pure Polling vs. NAPI Polling
12
Pure Polling vs. NAPI Polling
13
Shortcomings of NAPI
  • Rotten Packets
  • When NAPI re-enables interrupts, one or more
    packets may sneak in during that time and go
    undetected until a fresh packet arrives. These
    packets are known as rotten packets.
  • Poor Performance with CPU-bound Applications
  • NAPI was reported not to perform well on hosts
    that are heavily loaded with CPU-bound
    applications. This is because polling is
    scheduled using Linux softIRQs: CPU-bound user
    applications compete with softIRQs for the CPU,
    so softIRQs (and NAPI) get less chance to run.

14
Interrupt Coalescing
  • Most network adapters or NICs are manufactured to
    have interrupt coalescing.
  • In IC, the NIC generates a single interrupt for a
    group of incoming packets.
  • This is opposed to normal interruption mode in
    which the NIC generates an interrupt for every
    incoming packet.
  • Two schemes to mitigate the rate of interrupts
  • Count-based IC
  • NIC generates an interrupt when a predefined
    number of packets has been received.
  • Time-based IC
  • NIC waits a predefined time period before it
    generates an interrupt. During this time period
    multiple packets can be received.
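
The two coalescing schemes can be compared by counting the interrupts each would generate. The sketch below makes simplifying assumptions that real NICs may not share: count-based IC leaves an incomplete final batch waiting, and time-based IC starts its timer at the first packet that arrives after the previous window has expired. Both function names are hypothetical.

```python
def count_based_interrupts(num_packets, batch_size):
    """One interrupt per full batch; leftover packets are still waiting."""
    return num_packets // batch_size


def time_based_interrupts(arrival_times, window):
    """One interrupt per timer window, for sorted arrival times."""
    interrupts = 0
    window_end = None
    for t in arrival_times:
        if window_end is None or t >= window_end:
            # This packet starts a new window; every packet that arrives
            # before the window expires is covered by the same interrupt.
            interrupts += 1
            window_end = t + window
    return interrupts


if __name__ == "__main__":
    print(count_based_interrupts(10, 4))
    print(time_based_interrupts([0.0, 0.1, 0.2, 1.5, 1.6, 3.0], window=1.0))
```

In this example, ten packets with a batch size of four produce two interrupts (two packets remain pending), and the six timed arrivals produce three interrupts instead of six.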

15
Hybrid Scheme
  • A combination of
  • Interrupt Disabling and Enabling
  • Polling

16
Why?
17
Problem Statement
  • In this research we intend
  • to implement a novel hybrid interrupt-handling
    scheme that improves the performance of the Linux
    networking subsystem and overcomes the
    shortcomings of NAPI.
  • to prove experimentally that our proposed scheme
    outperforms NAPI under different system
    configurations and load conditions.

18
Project Objectives
  • Devise a novel scheme for the Linux platform to
    enhance packet reception on links at Gigabit
    speed.
  • The scheme is expected to outperform NAPI, as
    currently implemented in the latest Linux 2.6,
    in terms of latency, throughput, and CPU
    availability.
  • The novel scheme should include a proper
    solution to measure and forecast the traffic
    rate.
  • The novel scheme should also work for hosts with
    single and multiple interfaces.
  • More importantly, the scheme should work for SMP
    (Symmetric Multi-Processing) architectures, where
    the host's motherboard has multiple processors.

19
Project Objectives (contd)
  • Find solutions to shortcomings and open issues of
    NAPI (other than latency, throughput, and CPU
    availability). These shortcomings include rotten
    packets and poor network performance when the
    system is heavily loaded with CPU-bound
    applications.
  • Devise a novel generic benchmark for Linux hosts
    to find the switching point (cliff point).

20
Project Objectives (contd)
  • Develop an experimental testbed to examine and
    compare the performance of the modified Linux
    version with that of the latest Linux NAPI.
  • The experiment takes into account numerous and
    different test conditions and variables.
  • Linux host with single and multiple network
    interfaces
  • Different types of input traffic (bursty,
    constant, Poisson)
  • Different packet sizes
  • Various types of system loads including CPU-bound
    and I/O bound applications
  • Hosts with single and multiple processors (i.e.
    SMP).
  • The experiment should follow the testing and
    benchmarking guidelines laid out in RFC 2544.
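
The three input traffic types listed above (constant, Poisson, bursty) can be sketched as arrival-time generators, for example to drive a simulation before hardware tests. This is an illustration only: the presentation uses a hardware traffic generator, and the function names and the simple fixed-gap burst model are assumptions.

```python
import random


def constant_arrivals(rate_pps, n):
    """n arrival times with a fixed gap of 1/rate_pps seconds."""
    gap = 1.0 / rate_pps
    return [i * gap for i in range(n)]


def poisson_arrivals(rate_pps, n, rng):
    """n arrival times with exponentially distributed gaps (Poisson process)."""
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate_pps)
        times.append(t)
    return times


def bursty_arrivals(burst_size, intra_gap, inter_gap, n):
    """n arrival times in bursts: short gaps inside a burst, long gaps between."""
    t, times = 0.0, []
    for i in range(n):
        times.append(t)
        # The last packet of a burst is followed by the long inter-burst gap.
        t += inter_gap if (i + 1) % burst_size == 0 else intra_gap
    return times


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so runs are repeatable
    print(constant_arrivals(1000, 3))
    print(poisson_arrivals(1000, 3, rng))
    print(bursty_arrivals(2, 1.0, 10.0, 4))
```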

21
Experimental Equipment
22
Project Phases and Scheduling
  • Phase I (Period of six months)
  • This is primarily a Linux network stack re-design
    and modification phase
  • Phase II (Period of twelve months)
  • This phase is concerned with the testbed and
    experimental setup as well as running performance
    evaluation of NAPI and our proposed hybrid
    scheme.
  • Phase III (Period of six months)
  • This phase is concerned with the performance of
    our hybrid scheme for hosts with SMP support.

23
Phase I
  • Devise an appropriate technique to measure in
    real-time the traffic arrival rate. This task
    includes the following subtasks
  • Perform an extensive review of techniques to
    measure and forecast the arrival traffic rate.
    Devise a forecast technique that meets the
    following requirements
  • (1) computationally simple and optimized, with
    minimal overhead and operations,
  • (2) accurate in terms of being comparable to
    actual data rate,
  • (3) stable in terms of ignoring short traffic
    spikes, and
  • (4) responsive in terms of following changes in
    actual traffic rate.
  • Examine the effectiveness of the proposed
    technique to forecast the traffic arrival rate
    and compare it with other techniques proposed in
    the literature. The technique must be
    appropriate for different types of traffic,
    including bursty traffic with empirical packet
    sizes. Discrete Event Simulation (DES) will be
    used to assess the performance and effectiveness
    of our proposed technique.
  • Plot, analyze, and compare performance of
    proposed technique for forecasting arrival
    traffic rate.
  • Determine (using simulation and fine tuning of
    parameters) the minimum and maximum values (i.e.,
    confidence interval) of the forecasted/estimated
    traffic rate. These values will be used as the
    upper and lower thresholds of the cliff point,
    which the hybrid scheme uses for switching
    between interrupt disable-enable and polling.
    The two thresholds also prevent frequent
    oscillation between interrupt disable-enable and
    polling, thereby minimizing the overall overhead.
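
The rate forecasting and two-threshold switching described above can be sketched as follows. The slides do not name a forecasting technique, so an exponentially weighted moving average (EWMA) is used here purely as a stand-in. The sketch also assumes that interrupt disable-enable is used below the cliff point and polling above it. The class names, the smoothing factor, and the threshold values are all hypothetical.

```python
class RateEstimator:
    """EWMA of per-interval packet-rate samples (stand-in forecasting technique)."""

    def __init__(self, alpha):
        self.alpha = alpha  # weight of the newest sample, between 0 and 1
        self.rate = None

    def update(self, sample_pps):
        if self.rate is None:
            self.rate = float(sample_pps)
        else:
            # A small alpha ignores short spikes; a large alpha reacts faster.
            self.rate = self.alpha * sample_pps + (1 - self.alpha) * self.rate
        return self.rate


class HybridSwitch:
    """Switch modes using lower and upper thresholds around the cliff point."""

    def __init__(self, lower_pps, upper_pps):
        self.lower = lower_pps
        self.upper = upper_pps
        self.mode = "interrupt"  # assumed starting mode: interrupt disable-enable

    def update(self, estimated_pps):
        if self.mode == "interrupt" and estimated_pps > self.upper:
            self.mode = "polling"
        elif self.mode == "polling" and estimated_pps < self.lower:
            self.mode = "interrupt"
        # Rates between the two thresholds never cause a switch (hysteresis).
        return self.mode


if __name__ == "__main__":
    estimator = RateEstimator(alpha=0.5)
    switch = HybridSwitch(lower_pps=80, upper_pps=120)
    for sample in (100, 200, 200, 60, 40, 40):
        rate = estimator.update(sample)
        print(sample, round(rate, 1), switch.update(rate))
```

The gap between the two thresholds is what prevents oscillation: an estimate that hovers near the cliff point stays in whichever mode is already active.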

24
Phase I contd
  • Thoroughly understand the Linux kernel and the
    complex NAPI code. This requires the following
    subtasks
  • Perform an extensive review and study of the
    Linux 2.6 network stack (NAPI) and the NIC
    network drivers.
  • Set up a utility such as cscope or kscope to
    navigate and browse the actual Linux code and
    understand it thoroughly.
  • Identify exactly what code needs to be changed in
    both the Linux kernel and the network driver
  • Identify how the code must differ to support
    single-processor and multiprocessor (SMP) hosts.
  • Investigate known open issues or shortcomings
    of NAPI (other than expected latency at low
    traffic rate) and critique the solutions
    proposed in the literature.
  • These shortcomings include rotten packets and
    poor network performance under heavy CPU-bound
    applications.
  • More importantly, investigate how our proposed
    solution of hybrid scheme will resolve these
    known open issues.

25
Phase II
  • Modify, test, and recompile the code of Linux 2.6
    to implement our proposed hybrid scheme and the
    scheme to forecast the traffic arrival rate. In
    addition, the code has to address rotten packets
    and the poor performance of the network stack on
    a system heavily loaded with CPU-bound
    applications.
  • Learn how to use the IXIA 400T traffic
    generator/analyzer. Configure a simple
    experiment of generating and receiving packets.
  • Identify the proper cliff point for the system.
    This can be accomplished only by determining the
    interrupt overhead and protocol processing time.
    The interrupt overhead and protocol processing
    time will be determined using measurement.
  • Using IXIA or some other technique, devise a
    generic and useful way to measure interrupt
    overhead. Determine the distribution of the
    interrupt overhead.
  • Using IXIA or some other technique, devise a way
    to measure protocol processing at the OS level.
    Determine the distribution of the kernel's
    protocol processing time.
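
Once overhead or processing-time samples have been measured, their distribution can be summarized with standard statistics. The sketch below is only one way to do that, using Python's `statistics` module. The function name and the sample values are hypothetical, and the measurement technique itself is not shown.

```python
import statistics


def summarize(samples_us):
    """Summarize measured times (in microseconds) with basic statistics."""
    q1, median, q3 = statistics.quantiles(samples_us, n=4)
    return {
        "mean": statistics.mean(samples_us),
        "stdev": statistics.stdev(samples_us),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(samples_us),
    }


if __name__ == "__main__":
    # Hypothetical interrupt-overhead samples, in microseconds.
    overhead_us = [4.8, 5.1, 5.0, 5.3, 4.9, 7.2, 5.0, 5.1]
    for name, value in summarize(overhead_us).items():
        print(f"{name}: {value:.2f}")
```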

26
Phase II contd
  • Using IXIA 400T and a PC with Linux 2.6 and NAPI
    enabled, measure and plot the following
    performance metrics
  • Packet forwarding latency
  • Packet forwarding throughput
  • CPU utilization with packet forwarding
  • The above experiment will consider the following
    different configurations and conditions
  • Different packet sizes
  • Traffic distribution: Poisson vs. bursty
  • Traffic reception and transfer on a single NIC
  • Traffic reception and transfer on multiple NICs
  • Using the IXIA 400T and a PC with our proposed
    hybrid scheme, perform the same performance
    measurements as in Task 7 and Task 8.
  • Plot and compare the performance of NAPI and our
    proposed hybrid scheme. Draw appropriate
    conclusions.
  • Compare and evaluate the performance of our
    solutions for NAPI shortcomings of rotten packets
    and poor network performance under CPU-bound
    applications. Consider performance conditions
    and configurations of Task 7 and Task 8.

27
Phase III
  • Examine the performance impact described in the
    previous tasks (Tasks 6-11) under Linux SMP
    support with a dual-processor motherboard.
  • Compare SMP performance to the performance when
    using only a single processor. This is a huge
    phase, as six tasks are to be carried out again.
    Note that, according to the RFC 2544
    recommendations, to obtain a reported value for
    a single performance point, a test has to be
    repeated at least 20 times and the reported
    value must be the average of these 20 recorded
    values. The recommendations and guidelines also
    state that the test has to run for at least 20
    minutes to obtain a single reported value.
  • Ensure that the novel scheme preserves the order
    of packets, i.e., there is no need for packet
    re-ordering.
  • Prepare and deliver the final report

28
Work Plan
29
Personnel Requirements
  • The project team will consist of the primary
    investigator and two graduate students (PhD or MS
    degree candidates).
  • The graduate students will be computer
    science/engineering graduates and will work under
    the supervision and guidance of the PI.

30
Benefits and Utilization
  • contribute to the advancement of open-source
    operating systems (such as Linux) by providing
    an improved version whose networking subsystem
    performs better under Gigabit network traffic.
  • This will lead to having better Linux-based
    routers, firewalls, servers, and proxies.
  • utilize the previous theoretical work of [24] to
    devise a new hybrid interrupt-handling scheme to
    improve the networking performance of Linux or
    any other operating system.
  • provide adequate solutions to NAPI shortcomings
    of the current Linux 2.6 networking subsystem.

31
Benefits and Utilization -- contd
  • prove and demonstrate that the proposed hybrid
    scheme is a substantial performance improvement
    over current versions across many different
    configurations and load conditions.
  • provide an algorithm and computationally
    optimized technique to forecast the traffic
    arrival rate. Such an algorithm or technique
    should have no or minimal impact on Linux
    performance.
  • provide a generic methodology and benchmark to
    identify the switching point.
  • Research community at large can benefit
    substantially from the experimental work in terms
    of methodology, testbed, experimental setup and
    configuration. The experimental methodology and
    techniques can be employed for similar systems to
    conduct performance comparison.

32
Benefits and Utilization -- contd
  • major beneficiaries may include almost all Saudi
    companies, as well as governmental and
    non-governmental institutions, that show keen
    interest in using Linux.
  • GbE deployment
  • Linux's wide popularity
  • will benefit KFUPM in general and the department
    of Information and Computer Science in
    particular.
  • It is anticipated that a modified version of
    Linux that best suits Gigabit traffic will carry
    the name of KFUPM and the ICS department on it.
  • KFUPM can be seen as an active contributor to
    open-source code and community.
  • results of general interest to the research
    community will be published at key international
    conferences, such as those of IEEE and ACM. It
    is also anticipated that this research work will
    lead to publications in reputable refereed
    journals.
  • KFUPM currently has no network traffic
    generators or analyzers.
  • Such a project can definitely lay the ground for
    further research and development by having such
    equipment available. The equipment can be
    utilized for research.
  • Also the IT center at the university can use such
    equipment for diagnosing and troubleshooting
    network problems related to performance
    bottlenecks.

33
Budget
34
Summary
  • In this research we intend to improve the
    performance of the Linux networking subsystem and
    overcome the shortcomings of NAPI.
  • The project will be of great benefit to the
    research and open-source communities, KFUPM, and
    the public at large