Title: Offloading Multimedia Proxies using Network Processors
1Offloading Multimedia Proxies using Network
Processors
- A presentation by Øyvind Hvamstad
- 19. Nov. 2004
2Domain overview
- Distributed media on demand (MoD)
- Standardized protocols (RTSP/RTP)
- Utilizing proxies
- with network processing units
OUR FOCUS
Stream setup with RTSP
Stream transport with RTP
3Multimedia stream characteristics
A one-hour MPEG-2 movie at an average
bit rate of 3.5 Mbps takes up about 1.6 GB of
storage. DivX can reduce this by a factor of
6.5, bringing the size down to about 246 MB.
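The storage figures can be checked with a few lines of C. This is only a sketch of the arithmetic: the function name is hypothetical, and it assumes decimal units (1 MB = 10^6 bytes, 1 Mbit = 10^6 bits).

```c
/* Storage needed, in megabytes (10^6 bytes), for a stream with the
   given average bit rate (Mbit/s) and duration (seconds).
   Hypothetical helper; decimal units are an assumption. */
double stream_size_mb(double mbit_per_s, double seconds)
{
    /* Mbit/s * s = Mbit; 8 bits per byte gives megabytes. */
    return mbit_per_s * seconds / 8.0;
}
```

For 3.5 Mbit/s over 3600 s this gives 1575 MB, which rounds to the 1.6 GB quoted above. Dividing the rounded 1600 MB by 6.5 gives roughly 246 MB; dividing the unrounded 1575 MB gives roughly 242 MB.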
- High data-rates
- Requires much bandwidth and storage space.
- Depending on codec, quality and length.
- Soft real-time requirements
- Perceived quality is sensitive to jitter.
- Access Patterns
- Zipf distribution (10% of the content is requested 90% of the time)
- Newly published material tends to be popular.
- Consumed from start to end or quickly aborted.
- write-once-read-many
4The MoD Proxy
- Deployed in client vicinity to
- Reduce client startup latency
- Reduce server load
- Reduce network load
- Must face the challenges of
- Many concurrent clients
- Possibly high aggregate network load
- CPU intensive tasks
5MoD proxy tasks
- Forwarding data and requests
- Caching
- Protocol translation
- Transcoding
- Re-encoding
- Encryption
- General functions
- Access control
- QoS mechanisms
- Traffic engineering
OUR FOCUS
6Traditional proxy data-path
Application (RTSP/RTP)
- Reception
- Interrupt and bus transfer
- Processing
- Memory copy
- Transmission
- Memory copy
- Processing
- Bus transfer
- Application layer forwarding
- Reception and transmission for every packet, plus application processing.
User space
Kernel space
Communication system
Transport (TCP/UDP)
Network (IP)
Link
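The data path on this slide can be illustrated with a minimal user-space forwarder in C. This is a sketch under assumptions, not the code measured in the thesis: `forward_one` and `MAX_PKT` are hypothetical names, and plain POSIX `recv()`/`send()` stand in for whatever socket calls a real RTP proxy would use.

```c
#include <sys/types.h>
#include <sys/socket.h>

enum { MAX_PKT = 2048 };  /* assumed upper bound on one datagram */

/* Forward one datagram from in_fd to out_fd through user space.
   Returns the number of bytes forwarded, or -1 on error. */
ssize_t forward_one(int in_fd, int out_fd)
{
    char buf[MAX_PKT];

    /* Reception: the kernel copies the payload into our buffer. */
    ssize_t n = recv(in_fd, buf, sizeof buf, 0);
    if (n < 0)
        return -1;

    /* Application processing (for example header rewriting) would go here. */

    /* Transmission: the kernel copies the payload out of our buffer again. */
    return send(out_fd, buf, (size_t)n, 0);
}
```

Every packet crosses the user/kernel boundary twice and is copied on both crossings, which is the per-packet cost the offloaded design aims to avoid.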
7IXP1200 Network processor and the ENP-2505 card
- IXP1200 Features
- One embedded StrongARM RISC processor
- Six Microengines
- Interfaces for external memory and I/O devices
- On-chip scratchpad memory
- ENP-2505 Devices
- 256 MB SDRAM
- 8 MB SRAM
- 8 MB FLASH ROM
- 4 Ethernet (10/100 Mbit/s)
8Exactly what?
- Offload application layer packet forwarding.
- Free cycles on the host CPU for other tasks.
- Show improvements compared to a traditional architecture.
- Reduced latency
- Provide a basis for future work in the area.
- Extensions
- Caching
- Zero copy
9Design
[Figure: proxy design, split into a control plane and a data plane]
- Control plane: cache control with lookup(); RTSP client and RTSP server (to/from server, to/from client); session management with insert(), remove(), lookup() and signal()
- Data plane: lookup() and fast_forward(); RTP server and RTP client (from server, to client); cacher with write_through(async) and fetch()
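The insert(), remove() and lookup() operations in the design suggest a session table shared between the control plane and the data plane. The sketch below is one possible shape for such a table, not the thesis implementation. The struct fields, the `session_` prefixed names, the table size and the choice of the UDP port as key are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum { TABLE_SIZE = 64 };          /* hypothetical capacity */

struct session {
    uint16_t proxy_port;           /* key: UDP port on which RTP arrives (assumed) */
    uint32_t client_ip;            /* forwarding target (assumed fields) */
    uint16_t client_port;
    int      in_use;
};

static struct session table[TABLE_SIZE];

/* Data-plane lookup: probe every slot, starting at the hashed index. */
struct session *session_lookup(uint16_t port)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        struct session *s = &table[(port + i) % TABLE_SIZE];
        if (s->in_use && s->proxy_port == port)
            return s;
    }
    return NULL;
}

/* Control-plane insert, e.g. when an RTSP session is set up.
   Updates an existing entry or claims the first free slot. */
int session_insert(uint16_t port, uint32_t client_ip, uint16_t client_port)
{
    struct session *s = session_lookup(port);
    for (size_t i = 0; s == NULL && i < TABLE_SIZE; i++) {
        struct session *c = &table[(port + i) % TABLE_SIZE];
        if (!c->in_use)
            s = c;
    }
    if (s == NULL)
        return -1;                 /* table full */
    s->proxy_port = port;
    s->client_ip = client_ip;
    s->client_port = client_port;
    s->in_use = 1;
    return 0;
}

/* Control-plane remove, e.g. on RTSP teardown. */
int session_remove(uint16_t port)
{
    struct session *s = session_lookup(port);
    if (s == NULL)
        return -1;
    s->in_use = 0;
    return 0;
}
```

With a table like this, the data plane only reads: a packet whose port is found by `session_lookup()` can be handed straight to the fast-forward path, and anything else goes to the control plane.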
10Prototype implementation
Intel ACEs
RTSP Proxy
StackACE
Linux run-time IXA run-time
Ingress coreACE
Classifier/RTP-fwd coreACE
Egress coreACE
StrongARM Microengines
control-plane data-plane
Ingress microACE
Classifier/RTP-fwd microACE
Egress microACE
µe 0
µe 1
11Experiments
- Measure the processing overhead during RTP-forwarding.
- Cycle precision
- Probes at different locations in the code.
- Minimal probe overhead.
Switch
Dell GX260
Darwin streaming server
rclient.py
eth0
IXP1200
Proxy
12Results
- Offloading effect
- 100% of all network traffic is processed by the StrongARM and the microengines.
- Prototype performance
- Processes every RTP packet using about a tenth of
the cycles of a traditional
architecture (delay reduced from 80 µs to 8 µs
at 232 MHz).
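The delays can be converted to cycle counts with one multiplication. The helper below is a hypothetical sketch, and it assumes that both delays refer to the same 232 MHz clock.

```c
/* Cycles elapsed in a time interval at a given clock rate.
   1 MHz is one cycle per microsecond, so MHz * microseconds = cycles.
   Hypothetical helper for illustration only. */
unsigned long cycles_for(unsigned long clock_mhz, unsigned long microseconds)
{
    return clock_mhz * microseconds;
}
```

Under that assumption, 80 µs corresponds to 18560 cycles and 8 µs to 1856 cycles per RTP packet.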
13Extensions
- Write-through caching
- Make a multicaster by copying packets
- Use a lazy-copy strategy to reduce copy operations per packet.
- Send payload copy to the host.
- Zero-copy-path
- Batched transfer to the host.
- Scatter-gather DMA to assemble packet payloads in host memory.
- Large disk requests.
14Conclusion
- Prototype relevance
- Data will always flow through the proxy
- The forwarder processes an RTP packet efficiently compared to a traditional architecture.
- Is thus an orthogonal way to improve MoD proxies.
- Low resource utilization leaves room for
extensions.
15Other applications
- The idea might be more applicable in other, more real-time areas.
- Online games
- Proxy holds game state
- Might also need to forward real-time data while playing.
- Live voice communication
- Other urgent game data
- Video conferences
- Node that handles overlay multicasting.
- Real-time data forwarded with low latency.
16Linear modulo operator
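Intel Assembler macro
    #macro Mod[out_z, in_x, in_y]
    .local xm
        alu[xm, --, B, in_x]        ; xm = x
    Loop#:
        alu[out_z, xm, -, in_y]     ; z = xm - y
        br<0[End#]                  ; went negative: xm is the remainder
        alu[xm, --, B, out_z]       ; xm = z
        alu[--, xm, -, in_y]        ; test xm - y
        br>=0[Loop#]                ; >= so that exact multiples of y yield 0
    End#:
        alu[out_z, --, B, xm]       ; z = xm
    .endlocal
    #endm

ANSI C function
    int mod(int x, int y)
    {
        int z;
    top:
        z = x - y;
        if (z < 0)
            z = x;
        x = z;
        if ((x - y) >= 0)   /* >= so that exact multiples of y yield 0 */
            goto top;
        return z;
    }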
17Basic hash function
Intel Assembler macro
    #macro Hash[out_z, in_x, in_seed]
    .local x start
        immed[start, START]
        alu[x, in_x, -, start]          ; x = x - START
        alu_shf[x, --, B, x, >>1]       ; x = x / 2
        Mod[out_z, x, in_seed]          ; z = x mod seed
    .endlocal
    #endm

ANSI C function
    int hash(int x, int seed)
    {
        x = x - START;
        x = x / 2;
        return x % seed;
    }
18Incremental checksumming
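    #macro IncrementCksum[out_newsum, oldsum, old, new]
    .local sum tmp mask
        immed[mask, 0xffff]
        alu[sum, --, ~B, oldsum]            ; sum = ~oldsum
        alu[sum, sum, -, old]               ; sum = sum - old
        alu[sum, sum, +, new]               ; sum = sum + new
        alu[tmp, sum, AND, mask]            ; fold the carry, first pass
        alu_shf[sum, tmp, +, sum, >>16]
        alu[tmp, sum, AND, mask]            ; fold the carry, second pass
        alu_shf[sum, tmp, +, sum, >>16]
        alu[sum, --, ~B, sum]               ; sum = ~sum
        alu[out_newsum, sum, AND, mask]     ; keep the low 16 bits
    .endlocal
    #endm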

    sum = ~oldsum;
    sum = sum - old;
    sum = sum + new;
    tmp = sum & 0xffff;
    sum = tmp + (sum >> 16);
    tmp = sum & 0xffff;
    sum = tmp + (sum >> 16);
    newsum = ~sum & 0xffff;
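To check an incremental update like the one on this slide, its result can be compared against a full recomputation. The sketch below is not the thesis code. It uses the RFC 1624 formulation HC' = ~(~HC + ~m + m') with unsigned 32-bit arithmetic, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
uint16_t fold16(uint32_t sum)
{
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words. */
uint16_t cksum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~fold16(sum);
}

/* Incremental update after one word changes from old_word to new_word:
   HC' = ~(~HC + ~m + m'), as in RFC 1624. */
uint16_t cksum_update(uint16_t oldsum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~oldsum;
    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~fold16(sum);
}
```

For the words 0x1234, 0xabcd, 0x0050, 0x0001 the full checksum is 0x41ad. Changing the third word to 0x1f90 gives 0x226d both by recomputation and by `cksum_update(0x41ad, 0x0050, 0x1f90)`, so a forwarder that rewrites one header field only needs to touch that field and the checksum.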
19Just can't get enough, huh?
- /hom/oyvindh/thesis.pdf
- CVS repository
- pserverhic.no/cvs
- Module: thesis
- User: anonymous
- Pwd: <empty>
- Should be up in a few days
- Have just moved to a new location