Shangri-La: Achieving High Performance from Compiled Network Applications while Enabling Ease of Programming Michael K. Chen, Xiao Feng Li, Ruiqi Lian, Jason H. Lin, Lixia Liu, Tao Liu, Roy Ju - PowerPoint PPT Presentation


Provided by: Jennifer924

Learn more at: https://cseweb.ucsd.edu


Transcript and Presenter's Notes


1
Shangri-La: Achieving High Performance from Compiled Network Applications while Enabling Ease of Programming
Michael K. Chen, Xiao Feng Li, Ruiqi Lian, Jason H. Lin, Lixia Liu, Tao Liu, Roy Ju

Discussion Prepared by Jennifer Chiang

2
Shangri-La: Some Insight

A synonym for paradise
Legendary place from James Hilton's novel Lost Horizon
Goal: achieve a perfect compiler

3
Introduction

Problem: Programming network processors is challenging.
Tight memory access and instruction budgets are needed to sustain high line rates.
Traditionally done in hand-coded assembly.
Solution: Researchers have recently proposed high-level programming languages for packet processing.
Challenge: Can these languages be compiled into code as competitive as hand-tuned assembly?
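To make "tight instruction budgets" concrete, the arithmetic below divides the available processor cycles by the packet arrival rate. It is only an illustration: the 2.5 Gbps line rate and the 8 microengines come from these slides, but the 600 MHz engine clock, the 64-byte minimum packet size, and ignoring framing overhead are assumptions, and `cycles_per_packet` is a hypothetical helper.

```python
def cycles_per_packet(clock_hz, num_engines, line_rate_bps, packet_bytes):
    """Aggregate processor cycles available per packet when running at full line rate."""
    packets_per_second = line_rate_bps / (packet_bytes * 8)
    return clock_hz * num_engines / packets_per_second


# Assumed values: 600 MHz engine clock (assumption), 8 engines, 2.5 Gbps,
# minimum-size 64-byte packets (assumption), framing overhead ignored.
budget = cycles_per_packet(600e6, 8, 2.5e9, 64)
per_engine = budget / 8
print(f"{budget:.1f} cycles per packet across all engines")  # 983.0 cycles per packet across all engines
print(f"{per_engine:.2f} cycles per packet per engine")
```

Under these assumptions, roughly a thousand cycles per packet are shared by all engines, and every memory access must fit within that budget, which is why the compiler optimizations below focus on cutting instruction and memory access counts.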

4
Shangri-La Compiler from a 10,000-Foot View

Consists of a programming language, a compiler, and a runtime system targeted at the Intel IXP multi-core network processor.
Accepts packet programs written in Baker.
Maximizes processor utilization
Hot code paths are mapped across processing elements.
No hardware caches on the target
Delayed-update software-controlled caches hold frequently accessed data.
Packet handling optimizations
Reduce per-packet memory access and instruction counts.
Custom stack model
Maps stack frames to the fastest levels of the target processor's memory hierarchy.

5
Baker Programming Language

Baker programs are structured as a dataflow of packets from Rx to Tx.
Module: container holding related PPFs, wirings, support code, and shared data.
PPFs (packet processing functions)
C-like code that performs the actual packet processing.
Hold temporary local state and access global data structures.
CCs (communication channels)
Wire together the input and output channel endpoints of PPFs.
Asynchronous FIFO queues.
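The dataflow structure described on this slide can be modeled in a few lines. This is a Python sketch of the idea, not Baker syntax: `Channel`, `make_ppf`, `classify`, `decrement_ttl`, and `run` are hypothetical names, a CC is modeled as a FIFO queue, a PPF as a function that consumes one packet from its input channel and optionally emits it on its output channel, and a simple round-robin loop stands in for the runtime.

```python
from collections import deque


class Channel:
    """Hypothetical model of a Baker CC: an asynchronous FIFO queue between PPFs."""

    def __init__(self):
        self.queue = deque()

    def put(self, pkt):
        self.queue.append(pkt)

    def get(self):
        return self.queue.popleft() if self.queue else None


def make_ppf(fn, inp, out):
    """Wrap a per-packet function as a PPF step wired from channel inp to channel out."""
    def step():
        pkt = inp.get()
        if pkt is None:
            return False           # nothing to do this round
        result = fn(pkt)
        if result is not None:     # None means the PPF dropped the packet
            out.put(result)
        return True
    return step


def classify(pkt):
    # Keep IPv4 packets only (EtherType 0x0800).
    return pkt if pkt["ethertype"] == 0x0800 else None


def decrement_ttl(pkt):
    # Drop packets whose TTL would expire; otherwise forward with TTL - 1.
    if pkt["ttl"] <= 1:
        return None
    return {**pkt, "ttl": pkt["ttl"] - 1}


def run(ppfs):
    """Round-robin scheduler: keep stepping PPFs until none makes progress."""
    progress = True
    while progress:
        progress = False
        for step in ppfs:
            progress = step() or progress


# Module wiring: Rx -> classify -> decrement_ttl -> Tx
rx, mid, tx = Channel(), Channel(), Channel()
for pkt in ({"ethertype": 0x0800, "ttl": 64},
            {"ethertype": 0x86DD, "ttl": 64},
            {"ethertype": 0x0800, "ttl": 1}):
    rx.put(pkt)
run([make_ppf(classify, rx, mid), make_ppf(decrement_ttl, mid, tx)])
print(list(tx.queue))  # [{'ethertype': 2048, 'ttl': 63}]
```

Because the channels decouple the PPFs, each one only sees packets in FIFO order on its input, which is the property a compiler can exploit when it later maps PPFs onto different processing elements.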

6
Baker Program Example
[Figure: example Baker program with its module, PPFs, and CCs labeled]
7
Packet Support

Protocols are specified using Baker's protocol construct.
Metadata: stores state associated with a packet but not contained in the packet.
Useful for passing state computed in one PPF to a later PPF.
Packet_handle
Used to manipulate packets.

[Figure: packet data, metadata, and packet_handle]
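The three packet abstractions can be sketched as follows. This is an illustrative Python model under stated assumptions, not Baker's actual protocol construct: the protocol is described as a hypothetical table of IPv4 field offsets and `struct` formats, `PacketHandle` is a hypothetical class bundling the packet bytes with a metadata dictionary, and `route_ppf`/`tx_ppf` are hypothetical PPFs showing metadata written by one PPF and read by a later one.

```python
import struct
from dataclasses import dataclass, field

# Hypothetical protocol description: field name -> (byte offset, struct format), big-endian.
IPV4 = {"ttl": (8, "B"), "protocol": (9, "B"), "src": (12, "4s"), "dst": (16, "4s")}


@dataclass
class PacketHandle:
    data: bytearray                               # packet bytes (kept in DRAM on the IXP)
    metadata: dict = field(default_factory=dict)  # per-packet state not carried in the packet

    def get(self, proto, name):
        off, fmt = proto[name]
        return struct.unpack_from(">" + fmt, self.data, off)[0]

    def set(self, proto, name, value):
        off, fmt = proto[name]
        struct.pack_into(">" + fmt, self.data, off, value)


def route_ppf(ph):
    # Earlier PPF: pick an output port and record it as metadata.
    ph.metadata["out_port"] = 1 if ph.get(IPV4, "dst")[0] == 10 else 0


def tx_ppf(ph):
    # Later PPF: reads the metadata instead of re-deriving it from the packet.
    return ph.metadata["out_port"]


header = bytearray(20)
header[8] = 64                      # TTL
header[16:20] = bytes([10, 0, 0, 1])  # destination 10.0.0.1
ph = PacketHandle(header)
route_ppf(ph)
ph.set(IPV4, "ttl", ph.get(IPV4, "ttl") - 1)
print(tx_ppf(ph), ph.get(IPV4, "ttl"))  # 1 63
```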
8
IXP2400 Network Processor

Intel XScale core: processes control packets, executes noncritical application code, and handles initialization and management of the network processor.
8 MEs (microengines): lightweight, multithreaded, pipelined processors running a special ISA designed for packet processing.
4 levels of memory: Local Memory, Scratch Memory, SRAM, DRAM

[Figure: IXP2400 memory hierarchy: XScale Core, Local Memory, Scratch Memory, SRAM, DRAM]
9
Compiler Details
10
Aggregation

Throughput model: $t = \frac{n}{p} \times k$
n: number of MEs
p: total number of pipeline stages
k: throughput of the slowest pipeline stage
t: throughput
Packet latency through the system can be tolerated, but minimum forwarding rates must be guaranteed.
To maximize throughput, the compiler pipelines or duplicates code across multiple processing elements.
Techniques: pipelining, merging, duplication
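The throughput model can be used to compare aggregation choices. The sketch below assumes that each stage's throughput on one ME is the reciprocal of a per-packet cycle cost, that the n MEs are split evenly across the p stages, and that merging two adjacent stages simply adds their costs; the stage costs are hypothetical numbers and `throughput`/`merge` are illustrative helpers, not the compiler's actual cost model.

```python
from fractions import Fraction


def throughput(n, stage_costs):
    """t = n / p * k: n MEs split over p stages; k = throughput of the slowest stage."""
    p = len(stage_costs)
    k = Fraction(1, max(stage_costs))  # packets per cycle on one ME for the slowest stage
    return Fraction(n, p) * k


def merge(stage_costs, i):
    """Merge stage i with stage i + 1; their per-packet cycle costs add."""
    return stage_costs[:i] + [stage_costs[i] + stage_costs[i + 1]] + stage_costs[i + 2:]


n = 8                           # MEs on the IXP2400
costs = [100, 300, 200]         # hypothetical cycles per packet for each pipeline stage
candidates = [costs, merge(costs, 0), merge(merge(costs, 0), 0)]
for candidate in candidates:
    print(candidate, float(throughput(n, candidate)))
```

With these numbers, merging everything into one stage and duplicating it on all 8 MEs gives the highest modeled throughput, because it removes the imbalance between stages. The sketch ignores real constraints, such as the limited instruction store of each ME, that can make full merging impossible and force a pipelined layout.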

11
Delayed-Update Software Controlled Caching

Caching candidates: frequently read data structures with high hit rates that are infrequently written.
Updates to these structures rely only on the coherency of a single atomic write to guarantee correctness.
Reduces the frequency and cost of coherency checks.
Penalty of late updates: packet delivery errors
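A minimal sketch of delayed-update caching, assuming a shared table whose updates are published by replacing one (version, entries) snapshot in a single write, and a local cache that performs a coherency check only every `check_interval` lookups. `SharedTable`, `DelayedUpdateCache`, and `check_interval` are hypothetical names; the real compiler's check frequency and data layout are not given on the slide.

```python
class SharedTable:
    """Hypothetical shared table (e.g. a route table in SRAM).

    The (version, entries) snapshot is replaced by a single assignment, which
    models the single atomic write that delayed-update caching relies on.
    """

    def __init__(self, entries):
        self.snapshot = (0, dict(entries))

    def update(self, key, value):
        version, entries = self.snapshot
        new_entries = dict(entries)
        new_entries[key] = value
        self.snapshot = (version + 1, new_entries)  # one write publishes the update


class DelayedUpdateCache:
    """Local copy that checks the shared table only every check_interval lookups."""

    def __init__(self, shared, check_interval):
        self.shared = shared
        self.check_interval = check_interval
        self.version, self.local = shared.snapshot
        self.lookups = 0

    def lookup(self, key):
        self.lookups += 1
        if self.lookups % self.check_interval == 0:
            # Coherency check: a single read of the shared snapshot.
            version, entries = self.shared.snapshot
            if version != self.version:
                self.version, self.local = version, entries
        return self.local.get(key)


table = SharedTable({"10.0.0.0/8": 1})
cache = DelayedUpdateCache(table, check_interval=4)
results = [cache.lookup("10.0.0.0/8")]
table.update("10.0.0.0/8", 2)
results += [cache.lookup("10.0.0.0/8") for _ in range(3)]
print(results)  # [1, 1, 1, 2]
```

The two stale lookups after the update illustrate the penalty on the slide: until the next check, packets can be handled using outdated data, which is acceptable only for rarely written structures.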

12
PAC: Packet Access Combining

Packet data is always stored in DRAM.
If every packet access mapped to a DRAM access, packet forwarding rates would quickly be limited by DRAM bandwidth.
In the compiler's code generation stage, multiple protocol field accesses are combined into a single wide DRAM access.
The same can be done for metadata accesses in SRAM.
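The effect of access combining can be shown by counting memory accesses. In this sketch, `CountingMemory` is a hypothetical stand-in for packet DRAM, the field offsets are standard IPv4 header offsets, and the combined version issues one read spanning all requested fields. It ignores real hardware limits on access width and alignment, which would constrain how far fields can actually be combined.

```python
class CountingMemory:
    """Stand-in for packet DRAM that counts how many accesses are issued."""

    def __init__(self, data: bytes):
        self.data = data
        self.reads = 0

    def read(self, offset: int, length: int) -> bytes:
        self.reads += 1
        return self.data[offset:offset + length]


# IPv4 header fields used by a hypothetical forwarding PPF: name -> (offset, length).
FIELDS = {"ttl": (8, 1), "protocol": (9, 1), "src": (12, 4), "dst": (16, 4)}


def read_fields_naive(mem, fields):
    """One memory access per protocol field."""
    return {name: mem.read(off, length) for name, (off, length) in fields.items()}


def read_fields_combined(mem, fields):
    """One wide access covering every field, then extraction from the local buffer."""
    start = min(off for off, _ in fields.values())
    end = max(off + length for off, length in fields.values())
    buf = mem.read(start, end - start)
    return {name: buf[off - start:off - start + length]
            for name, (off, length) in fields.items()}


header = bytes(range(20))  # dummy 20-byte IPv4 header
naive_mem, combined_mem = CountingMemory(header), CountingMemory(header)
assert read_fields_naive(naive_mem, FIELDS) == read_fields_combined(combined_mem, FIELDS)
print(naive_mem.reads, combined_mem.reads)  # 4 1
```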

13
Stack Layout Optimization

Goal: allocate as many stack frames as possible to the limited amount of fast memory.
The stack can grow into SRAM, but SRAM's high latency hurts performance.
Assign Local Memory to procedures higher in the program call graph.
Assign SRAM only when Local Memory is completely exhausted.
Utilize physical and virtual stack pointers.
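One way to realize "Local Memory for procedures higher in the call graph" is to give every frame an offset on a single virtual stack and place frames that fit below the Local Memory capacity there. This sketch assumes an acyclic call graph (no recursion), hypothetical frame sizes, and a whole-frame placement rule; frames of procedures that are never live at the same time (siblings in the call graph) share offsets. `layout_stack` is a hypothetical helper, not the compiler's actual algorithm.

```python
from graphlib import TopologicalSorter


def layout_stack(call_graph, frame_size, lm_capacity):
    """Return proc -> (region, virtual stack offset); region is "LM" or "SRAM".

    Assumes an acyclic call graph. A frame goes to Local Memory only if it fits
    entirely below lm_capacity; otherwise it is placed in SRAM.
    """
    callers = {proc: set() for proc in frame_size}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers[callee].add(caller)
    offset, placement = {}, {}
    # static_order() yields every caller before its callees.
    for proc in TopologicalSorter(callers).static_order():
        # A frame starts above the deepest frame of any caller on a path to it.
        offset[proc] = max((offset[c] + frame_size[c] for c in callers[proc]), default=0)
        region = "LM" if offset[proc] + frame_size[proc] <= lm_capacity else "SRAM"
        placement[proc] = (region, offset[proc])
    return placement


frames = {"main": 16, "parse": 24, "classify": 32, "checksum": 16, "lookup": 40}
calls = {"main": ["parse", "classify"], "parse": ["checksum"], "classify": ["lookup"]}
layout = layout_stack(calls, frames, lm_capacity=64)
for proc, (region, off) in layout.items():
    print(proc, region, off)
```

With a hypothetical 64-byte Local Memory budget, every frame except the deepest one (`lookup`) fits in Local Memory, which matches the slide's rule of spilling to SRAM only after Local Memory is exhausted.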

14
Experimental Results

3 benchmarks: L3-Switch, Firewall, MPLS
The significant impact of PAC is evident in the large reduction in packet-handling SRAM and DRAM accesses.
Code generated by Shangri-La for all 3 benchmarks achieved 100% forwarding rates at 2.5 Gbps, which meets the design spec of the IXP2400.
Hand-coded assembly written specifically for these processors achieves the same throughput target.

15
Conclusions

Shangri-La provides a complete framework for aggressively compiling network programs.
Reduces both instruction and memory access counts.
Achieved the goal of a 100% packet forwarding rate at 2.5 Gbps.
