Title: ND: The research group on Networks and Distributed Systems
1. ND: The research group on Networks and Distributed Systems
2. ND activities
- ICON: Interconnection Networks
  - Interconnection networks are tightly coupled / short-distance networks with extreme demands on bandwidth, latency, and delivery
  - Problem areas: effective routing/topologies, fault tolerance/dynamic reconfiguration, and Quality of Service
- VINNER: End-to-end Internet communications
  - Problem area: network resilience as a set of methods and techniques that improve the user's perception of network robustness and reliability
3. ND activities
- QuA: Support of Quality of Service in component architectures
  - Problem area: how to develop QoS-sensitive applications on a component-architecture platform, and how dynamic QoS management and adaptation can be supported
- Relay: Resource utilization in time-dependent distributed systems
  - Problem area: reduce the effects of resource limitations and geographical distance in interactive distributed applications through a toolkit of kernel extensions, programmable subsystems, protocols, and decision methods
4. Assessment of Data Path Implementations for Download and Streaming
- Pål Halvorsen (1,2), Tom Anders Dalseng (1) and Carsten Griwodz (1,2)
- (1) Department of Informatics, University of Oslo, Norway
- (2) Simula Research Laboratory, Norway
5. Overview
- Motivation
- Existing mechanisms in Linux
- Possible enhancements
- Summary and Conclusions
6. Delivery Systems
[Diagram: a delivery system, with servers and clients connected through a network]
7. Delivery Systems
[Diagram: inside an end system, data moves between disk, memory, and the network interface across the machine's bus(es)]
8. Intel Hub Architecture
- several in-memory data movements and context switches
[Diagram: Pentium 4 processor (registers, caches) connected through the hub architecture to RDRAM memory banks and to the PCI slots]
9. Motivation
- Data copy operations are expensive
  - they consume CPU, memory, hub, bus, and interface resources (proportional to data size)
  - profiling shows that 40% of CPU time is consumed by copying data between user and kernel space
- The gap between memory and CPU speeds increases
  - different access times to different banks
- System calls cause many switches between user and kernel space
10. Zero-Copy Data Paths
[Diagram: a zero-copy data path: subsystems hand over data pointers instead of copying the payload, which crosses the bus(es) only via DMA]
11. Motivation
- Data copy operations are expensive
  - they consume CPU, memory, hub, bus, and interface resources (proportional to data size)
  - profiling shows that 40% of CPU time is consumed by copying data between user and kernel space
- The gap between memory and CPU speeds increases
  - different access times to different banks
- System calls cause many switches between user and kernel space
- A lot of research has been performed in this area
  - BUT what is the status in today's commodity operating systems?
12. Existing Linux Data Paths
13. Content Download
[Diagram: download data path from disk, across the bus(es), to the network interface]
14. Content Download: read / send
[Diagram: DMA transfer from disk into the kernel page cache; copy from the page cache into the application buffer; copy from the application buffer into the socket buffer; DMA transfer to the network interface]
- 2n copy operations
- 2n system calls (see the C sketch below)
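A minimal user-space sketch of this path in C; fd and sock are assumed to be an open file and a connected socket, and the block size is an arbitrary choice for illustration:

    /* read/send download path: each block is DMA'ed into the page
     * cache, copied into the application buffer by read(), and copied
     * into the socket buffer by send() -- two copies and two system
     * calls per block. */
    #include <unistd.h>
    #include <sys/socket.h>

    #define BLOCK_SIZE 4096

    ssize_t download_read_send(int fd, int sock)
    {
        char buf[BLOCK_SIZE];       /* application buffer in user space */
        ssize_t n, total = 0;

        while ((n = read(fd, buf, sizeof(buf))) > 0) {  /* copy 1: page cache -> buf */
            if (send(sock, buf, (size_t)n, 0) < 0)      /* copy 2: buf -> socket buffer */
                return -1;
            total += n;
        }
        return n < 0 ? -1 : total;
    }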
15. Content Download: mmap / send
[Diagram: DMA transfer from disk into the page cache, which is mapped into the application's address space; copy from the page cache into the socket buffer; DMA transfer to the network interface]
- n copy operations
- 1 + n system calls (see the C sketch below)
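A corresponding sketch of the mmap/send path; the 4 KB chunk size is an assumption for illustration (the final munmap adds one call beyond the 1 + n count):

    /* mmap/send download path: the file is mapped once, so the
     * per-block read() copy disappears; send() still copies from the
     * mapped page-cache pages into the socket buffer. */
    #include <sys/mman.h>
    #include <sys/socket.h>

    int download_mmap_send(int fd, int sock, size_t file_size)
    {
        size_t off, chunk = 4096;
        char *p = mmap(NULL, file_size, PROT_READ, MAP_SHARED, fd, 0); /* 1 call */
        if (p == MAP_FAILED)
            return -1;

        for (off = 0; off < file_size; off += chunk) {  /* n send calls */
            size_t len = file_size - off < chunk ? file_size - off : chunk;
            if (send(sock, p + off, len, 0) < 0) {      /* copy: page cache -> socket buffer */
                munmap(p, file_size);
                return -1;
            }
        }
        return munmap(p, file_size);
    }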
16. Content Download: sendfile
[Diagram: DMA transfer from disk into the page cache; a descriptor is appended to the socket buffer; gather DMA transfer sends the data directly from the page cache to the network interface]
- 0 copy operations
- 1 system call (see the C sketch below)
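A sketch of the sendfile path; a single call can in principle cover the whole file, the loop only handles short transfers:

    /* sendfile download path: the kernel appends a descriptor to the
     * socket buffer and the NIC gathers the payload straight from the
     * page cache, so no CPU copy is needed. */
    #include <sys/sendfile.h>

    int download_sendfile(int fd, int sock, size_t file_size)
    {
        off_t off = 0;
        while ((size_t)off < file_size) {
            ssize_t n = sendfile(sock, fd, &off, file_size - off);
            if (n <= 0)
                return -1;
        }
        return 0;
    }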
17. Content Download: Results
- Tested transfer of a 1 GB file on Linux 2.6
- Both UDP (with enhancements) and TCP
[Plot: measured results for UDP and TCP]
18. Streaming
[Diagram: streaming data path from disk, across the bus(es), to the network interface]
19. Streaming: read / send
[Diagram: DMA transfer from disk into the page cache; copy into the application buffer; copy into the socket buffer; DMA transfer to the network interface]
- 2n copy operations
- 2n system calls (see the C sketch below)
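A sketch of this per-packet loop; RTP_HDR_LEN, PAYLOAD_LEN, and the header-filling callback are illustrative placeholders, not values from the slides:

    /* streaming read/send: the RTP header is written into the
     * application buffer, read() copies the payload in behind it, and
     * send() copies header + payload into the socket buffer -- two
     * copies and two calls per packet. */
    #include <unistd.h>
    #include <sys/socket.h>

    #define RTP_HDR_LEN 12
    #define PAYLOAD_LEN 1448

    ssize_t stream_read_send(int fd, int sock, void (*fill_rtp_hdr)(char *))
    {
        char pkt[RTP_HDR_LEN + PAYLOAD_LEN];
        ssize_t n;

        while ((n = read(fd, pkt + RTP_HDR_LEN, PAYLOAD_LEN)) > 0) { /* copy 1 */
            fill_rtp_hdr(pkt);                        /* per-packet RTP header */
            if (send(sock, pkt, RTP_HDR_LEN + (size_t)n, 0) < 0)     /* copy 2 */
                return -1;
        }
        return n;
    }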
20. Streaming: read / writev
[Diagram: DMA transfer from disk into the page cache; copy into the application buffer; separate copies of the RTP header and the payload into the socket buffer; DMA transfer to the network interface]
- 3n copy operations
- 2n system calls
⇒ one copy more than the previous solution (see the C sketch below)
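A sketch of the read/writev variant: writev() gathers the RTP header and the payload from separate user buffers, but the kernel still copies both into the socket buffer:

    /* streaming read/writev: one read() copy plus two writev() copies
     * (header and payload) into the socket buffer -- three copies and
     * two calls per packet. */
    #include <unistd.h>
    #include <sys/uio.h>

    ssize_t stream_read_writev(int fd, int sock, void (*fill_rtp_hdr)(char *))
    {
        char hdr[12], payload[1448];
        struct iovec iov[2] = {
            { .iov_base = hdr,     .iov_len = sizeof(hdr) },
            { .iov_base = payload, .iov_len = 0 },
        };
        ssize_t n;

        while ((n = read(fd, payload, sizeof(payload))) > 0) {  /* copy 1 */
            fill_rtp_hdr(hdr);
            iov[1].iov_len = (size_t)n;
            if (writev(sock, iov, 2) < 0)                       /* copies 2 and 3 */
                return -1;
        }
        return n;
    }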
21. Streaming: mmap / send
[Diagram: DMA transfer from disk into the mapped page cache; copy of the RTP header from the application buffer and copy of the payload from the page cache into the socket buffer; DMA transfer to the network interface]
- 2n copy operations
- 1 + 4n system calls (one possible decomposition is sketched below)
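The slides count 1 + 4n system calls but do not name the four per-packet calls; a plausible reading, assuming UDP_CORK is used to merge the RTP header and the payload into one datagram, is:

    /* Per-packet helper under the UDP_CORK assumption (an assumption,
     * not stated on the slides): four system calls per packet. */
    #include <netinet/in.h>
    #include <netinet/udp.h>
    #include <stddef.h>
    #include <sys/socket.h>

    static int send_rtp_packet(int sock, const char *hdr, size_t hlen,
                               const char *payload, size_t plen)
    {
        int on = 1, off = 0;
        if (setsockopt(sock, IPPROTO_UDP, UDP_CORK, &on, sizeof(on)) < 0)
            return -1;                              /* call 1: cork */
        if (send(sock, hdr, hlen, 0) < 0)           /* call 2: header copy */
            return -1;
        if (send(sock, payload, plen, 0) < 0)       /* call 3: payload copy */
            return -1;
        return setsockopt(sock, IPPROTO_UDP, UDP_CORK,
                          &off, sizeof(off));       /* call 4: uncork/flush */
    }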
22. Streaming: mmap / writev
[Diagram: DMA transfer from disk into the mapped page cache; writev copies the RTP header from the application buffer and the payload from the page cache into the socket buffer; DMA transfer to the network interface]
- 2n copy operations
- 1 + n system calls
⇒ three calls fewer per packet than the previous solution (see the C sketch below)
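A sketch combining mmap with writev, so each packet costs exactly one system call after the initial mapping; chunk size and header length are illustrative placeholders:

    /* streaming mmap/writev: one mmap() up front, then one writev()
     * per packet gathers the RTP header from user space and the
     * payload straight out of the mapped page-cache pages -- still two
     * copies, but only one call per packet. */
    #include <sys/mman.h>
    #include <sys/uio.h>

    int stream_mmap_writev(int fd, int sock, size_t file_size,
                           void (*fill_rtp_hdr)(char *))
    {
        char hdr[12];
        size_t off, chunk = 1448;
        char *p = mmap(NULL, file_size, PROT_READ, MAP_SHARED, fd, 0); /* 1 call */
        if (p == MAP_FAILED)
            return -1;

        for (off = 0; off < file_size; off += chunk) {  /* n writev calls */
            size_t len = file_size - off < chunk ? file_size - off : chunk;
            struct iovec iov[2] = {
                { .iov_base = hdr,     .iov_len = sizeof(hdr) },
                { .iov_base = p + off, .iov_len = len },
            };
            fill_rtp_hdr(hdr);
            if (writev(sock, iov, 2) < 0) {             /* copies header + payload */
                munmap(p, file_size);
                return -1;
            }
        }
        return munmap(p, file_size);
    }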
23. Streaming: sendfile
[Diagram: DMA transfer from disk into the page cache; the RTP header is copied from the application buffer into the socket buffer while a payload descriptor is appended; gather DMA transfer to the network interface]
- n copy operations
- 4n system calls (one possible decomposition is sketched below)
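Again assuming a UDP_CORK-based packetization (and the UDP sendfile enhancement mentioned with the download results), the 4n calls could decompose as:

    /* Per-packet helper for sendfile-based streaming under the same
     * UDP_CORK assumption: only the RTP header is copied; the payload
     * goes by descriptor and gather DMA. */
    #include <netinet/in.h>
    #include <netinet/udp.h>
    #include <stddef.h>
    #include <sys/sendfile.h>
    #include <sys/socket.h>

    static int stream_packet(int sock, int fd, off_t *off, size_t plen,
                             const char *hdr, size_t hlen)
    {
        int on = 1, flush = 0;
        if (setsockopt(sock, IPPROTO_UDP, UDP_CORK, &on, sizeof(on)) < 0)
            return -1;                            /* call 1: cork */
        if (send(sock, hdr, hlen, 0) < 0)         /* call 2: the one copy (header) */
            return -1;
        if (sendfile(sock, fd, off, plen) < 0)    /* call 3: descriptor append, no copy */
            return -1;
        return setsockopt(sock, IPPROTO_UDP, UDP_CORK,
                          &flush, sizeof(flush)); /* call 4: uncork/flush */
    }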
24. Streaming: Results
- Tested streaming of a 1 GB file on Linux 2.6
- RTP over UDP
- Compared to not sending an RTP header over UDP, we see an increase of 29% (the additional send call)
- More copy operations and system calls required ⇒ potential for improvement
[Plot: streaming results, with TCP sendfile (content download) shown for comparison]
25. Enhanced Streaming Data Paths
26. Enhanced Streaming: mmap / msend
- msend allows sending data from an mmap'ed file without copying
[Diagram: DMA transfer from disk into the mapped page cache; the RTP header is copied from the application buffer into the socket buffer while msend appends a payload descriptor; gather DMA transfer to the network interface]
- n copy operations
- 1 + 4n system calls
⇒ one copy fewer than the previous solution (a hypothetical prototype is sketched below)
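msend is the authors' proposed extension and the slides do not give its prototype; the following is a purely hypothetical sketch of how such a call might look from user space:

    /* HYPOTHETICAL -- illustrative guess, not the authors' actual
     * interface. The idea: like send(), but for data living in an
     * mmap'ed, page-cache-backed region, so the kernel can append a
     * descriptor to the socket buffer instead of copying the payload. */
    #include <stddef.h>
    #include <sys/types.h>

    ssize_t msend(int sock, const void *mapped_addr, size_t len, int flags);

    /* Per-packet pattern matching the 1 + 4n count, under the same
     * UDP_CORK assumption as before:
     *   cork; send(rtp_hdr)   -- the one remaining copy per packet;
     *   msend(mapped payload) -- descriptor append, no copy;
     *   uncork                -- flush as one datagram. */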
27. Enhanced Streaming: mmap / rtpmsend
- RTP header copy integrated into the msend system call
[Diagram: DMA transfer from disk into the mapped page cache; rtpmsend copies the RTP header into the socket buffer and appends a payload descriptor; gather DMA transfer to the network interface]
- n copy operations
- 1 + n system calls
⇒ three calls fewer per packet than the previous solution
28. Enhanced Streaming: mmap / krtpmsend
- an RTP engine in the kernel adds the RTP headers
[Diagram: DMA transfer from disk into the mapped page cache; the in-kernel RTP engine writes headers into the socket buffer and appends payload descriptors; gather DMA transfer to the network interface]
- 0 copy operations
- 1 system call
⇒ one copy fewer than the previous solution
⇒ one call fewer than the previous solution
29. Enhanced Streaming: rtpsendfile
- RTP header copy integrated into the sendfile system call
[Diagram: DMA transfer from disk into the page cache; rtpsendfile copies the RTP header into the socket buffer and appends a payload descriptor; gather DMA transfer to the network interface]
- n copy operations
- n system calls
⇒ the existing solution requires three more calls per packet (a hypothetical prototype is sketched below)
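rtpsendfile is likewise a proposed extension without a stated prototype; a hypothetical sketch:

    /* HYPOTHETICAL -- illustrative guess, not the authors' actual
     * interface. Folding the RTP header into the call lets one system
     * call per packet copy only the header and append a payload
     * descriptor for gather DMA. */
    #include <stddef.h>
    #include <sys/types.h>

    ssize_t rtpsendfile(int sock, int in_fd, off_t *offset, size_t count,
                        const void *rtp_hdr, size_t hdr_len);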
30. Enhanced Streaming: krtpsendfile
- an RTP engine in the kernel adds the RTP headers
[Diagram: DMA transfer from disk into the page cache; the in-kernel RTP engine writes headers into the socket buffer and appends payload descriptors; gather DMA transfer to the network interface]
- 0 copy operations
- 1 system call
⇒ one copy fewer than the previous solution
⇒ one call fewer than the previous solution
31. Enhanced Streaming: Results
- Tested streaming of a 1 GB file on Linux 2.6
- RTP over UDP
[Plot: mmap-based and sendfile-based mechanisms versus the existing streaming mechanism, showing improvements of 27% and 25%; TCP sendfile (content download) shown for comparison]
32. Conclusions
- Current commodity operating systems still pay a high price for streaming services
- However, small changes in the system call layer might be sufficient to remove most of the overhead
- In conclusion, commodity operating systems still have potential for improvement with respect to streaming support
- What can we hope will be supported?
- Road ahead: optimize the code, create a patch, and submit it to kernel.org
33. Questions??