Shangri-La: Achieving High Performance from Compiled Network Applications while Enabling Ease of Programming
Michael K. Chen, Xiao Feng Li, Ruiqi Lian, Jason H. Lin, Lixia Liu, Tao Liu, Roy Ju


1
Shangri-La: Achieving High Performance from Compiled Network Applications while Enabling Ease of Programming
Michael K. Chen, Xiao Feng Li, Ruiqi Lian, Jason H. Lin, Lixia Liu, Tao Liu, Roy Ju
  • Discussion Prepared by Jennifer Chiang

2
Shangri-La: Some Insight
  • A synonym for paradise
  • A legendary place from James Hilton's novel Lost
    Horizon
  • Goal: achieve a perfect compiler

3
Introduction
  • Problem: programming network processors is
    challenging.
  • Tight memory access and instruction budgets must
    be met to sustain high line rates.
  • Traditionally done in hand-coded assembly.
  • Solution: researchers have recently proposed
    high-level programming languages for packet
    processing.
  • Challenge: can these languages be compiled into
    code as competitive as hand-tuned assembly?

4
Shangri-La Compiler from a 10,000-Foot View
  • Consists of a programming language, compiler, and
    runtime system targeted at the Intel IXP
    multi-core network processor.
  • Accepts packet programs written in Baker.
  • Maximizes processor utilization
  • Hot code paths are mapped across processing
    elements.
  • No hardware caches on the target
  • Delayed-update software-controlled caches hold
    frequently accessed data.
  • Packet handling optimizations
  • Reduce per-packet memory access and instruction
    counts.
  • Custom stack model
  • Maps stack frames to the fastest levels of the
    target processor's memory hierarchy.

5
Baker Programming Language
  • Baker programs are structured as a dataflow of
    packets from Rx to Tx.
  • Module: container holding related PPFs, wirings,
    support code, and shared data.
  • PPF (packet processing function)
  • C-like code that performs the actual packet
    processing.
  • Holds temporary local state; accesses global data
    structures.
  • CC (communication channel)
  • Input and output channel endpoints of PPFs wired
    together.
  • Asynchronous queues with FIFO ordering.

6
Baker Program Example
[Figure: diagram of a Baker program, showing a module that contains PPFs wired together by CCs]
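
Since the slide's diagram does not survive in this transcript, below is a minimal C-style sketch of the structure it illustrates: a PPF with one input and one output channel endpoint, as a module would wire them between Rx and Tx. Baker's actual syntax differs, and every name here (channel_t, packet_handle_t, channel_get, channel_put, classify_ppf) is invented for illustration.

#include <stddef.h>

/* Hypothetical stand-ins for Baker's built-in packet handle and
 * asynchronous FIFO communication channel (CC). */
typedef struct packet *packet_handle_t;
typedef struct channel { packet_handle_t slot; } channel_t;

/* Invented channel operations; Baker provides these natively. */
static packet_handle_t channel_get(channel_t *c) { return c->slot; }
static void channel_put(channel_t *c, packet_handle_t p) { c->slot = p; }

/* A PPF: C-like code reading from an input endpoint and writing to an
 * output endpoint. A module wires PPFs into a dataflow such as
 * Rx -> classify -> Tx. */
static void classify_ppf(channel_t *from_rx, channel_t *to_tx)
{
    packet_handle_t pkt = channel_get(from_rx); /* dequeue next packet  */
    /* ... inspect headers, update per-packet metadata ...              */
    channel_put(to_tx, pkt);                    /* enqueue for next PPF */
}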
7
Packet Support
  • Specify protocols using Baker's protocol
    construct (a sketch follows this slide).
  • Metadata: stores state associated with a packet
    but not contained in the packet itself.
  • Useful for storing state produced for a packet by
    one PPF and used later by another PPF.
  • packet_handle
  • Used to manipulate packets.

[Figure: a packet_handle referencing packet data and its associated metadata]
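
The slide names the protocol construct but does not show its syntax, so here is a hedged analogy in plain C: declare the header layout once, then access fields by name, with per-packet metadata kept in a separate structure. Baker has dedicated syntax for this; the structs and field names below are assumptions for illustration only.

#include <stdint.h>

/* Analogy (not Baker syntax) for a protocol declaration: the header
 * layout is declared once and fields are then accessed by name. */
struct ipv4_hdr {
    uint8_t  version_ihl;   /* version (4 bits) + header length (4 bits) */
    uint8_t  tos;
    uint16_t total_len;
    uint16_t id;
    uint16_t flags_frag;
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t checksum;
    uint32_t src;
    uint32_t dst;
};

/* Metadata: per-packet state carried alongside, not inside, the packet,
 * so one PPF can record a result (e.g., an output port) that a later
 * PPF reads. Fields here are invented. */
struct pkt_metadata {
    uint16_t out_port;
    uint8_t  drop;
};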
8
IXP2400 Network Processor
  • Intel XScale core: processes control packets,
    executes noncritical application code, and handles
    initialization and management of the network
    processor.
  • 8 MEs (microengines): lightweight, multi-threaded,
    pipelined processors running a special ISA
    designed for processing packets.
  • 4 levels of memory: Local Memory, Scratch Memory,
    SRAM, DRAM

[Figure: IXP2400 memory hierarchy, fastest to slowest: Local Memory, Scratch Memory, SRAM, DRAM, alongside the XScale core]
9
Compiler Details
10
Aggregation
  • Throughput model: t = (n / P) × k (see the worked
    example after this list)
  • n = number of MEs
  • k = throughput of the slowest pipeline stage
  • t = overall throughput
  • P = total number of pipeline stages
  • Latency of a packet through the system can be
    tolerated, but minimum forwarding rates must be
    guaranteed.
  • To maximize throughput, the compiler pipelines or
    duplicates code across multiple processing
    elements.
  • Techniques: pipelining, merging, duplication
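
A quick worked example with illustrative numbers (mine, not from the slides): with n = 8 MEs and P = 4 pipeline stages, each stage can be duplicated across n / P = 2 MEs, so the aggregate throughput is t = 2k, twice the throughput of the slowest stage on a single ME. Balancing this gain against the cost of splitting code into stages is why the compiler chooses among pipelining, merging, and duplication.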

11
Delayed-Update Software-Controlled Caching
  • Caching candidates: data structures that are
    frequently read with high hit rates, but
    infrequently written.
  • Updates to these structures rely only on the
    coherency of a single atomic write to guarantee
    correctness (see the sketch after this list).
  • Reduces the frequency and cost of coherency
    checks.
  • Cost of delaying updates: transient packet
    delivery errors.
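
A minimal sketch of the mechanism in C11 (my illustration, not the paper's implementation): the writer builds a complete new table and publishes it with one atomic pointer store, the single atomic write on which correctness rests. Readers may keep using a stale copy until their next, infrequent coherency check; that delay is what can briefly misdeliver packets. All names and sizes are invented.

#include <stdatomic.h>

/* Illustrative route table; fields and size are invented. */
struct route_table { unsigned num_entries; unsigned next_hop[256]; };

static _Atomic(struct route_table *) current_table;

/* Writer: publish an entirely new table with ONE atomic store. */
void publish_table(struct route_table *new_table)
{
    atomic_store_explicit(&current_table, new_table, memory_order_release);
}

/* Reader: an ME thread could cache this pointer in local memory and
 * refresh it only every N packets (the delayed update), rather than
 * performing a coherency check on every single access. */
unsigned lookup_next_hop(unsigned dst_index)
{
    struct route_table *t =
        atomic_load_explicit(&current_table, memory_order_acquire);
    return t->next_hop[dst_index % t->num_entries];
}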

12
PAC
  • Packet access combining
  • Packet data is always stored in DRAM.
  • If every packet field access mapped to a DRAM
    access, packet forwarding rates would quickly be
    limited by DRAM bandwidth.
  • In the code generation stage of the compiler,
    multiple protocol field accesses are combined into
    a single wide DRAM access (see the before/after
    sketch following this list).
  • The same can be done for SRAM metadata accesses.
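
To make the transformation concrete, here is a hedged before/after sketch in C. The compiler performs this combining automatically during code generation; dram_read, the field offsets, and the struct below are my own stand-ins for the IXP's wide DRAM access.

#include <stdint.h>
#include <string.h>

struct ipv4_fields { uint8_t ttl; uint16_t checksum; uint32_t dst; };

/* Hypothetical primitive standing in for one hardware DRAM access. */
extern void dram_read(void *dst, uint64_t addr, unsigned bytes);

/* BEFORE combining: three separate DRAM accesses per packet. */
void read_fields_naive(uint64_t pkt, struct ipv4_fields *out)
{
    dram_read(&out->ttl,      pkt + 8,  1);   /* TTL at header offset 8   */
    dram_read(&out->checksum, pkt + 10, 2);   /* checksum at offset 10    */
    dram_read(&out->dst,      pkt + 16, 4);   /* dst address at offset 16 */
}

/* AFTER combining: one wide DRAM access covering bytes 8..19, then
 * cheap field extraction from a fast local buffer. */
void read_fields_combined(uint64_t pkt, struct ipv4_fields *out)
{
    uint8_t buf[12];
    dram_read(buf, pkt + 8, sizeof buf);
    out->ttl = buf[0];
    memcpy(&out->checksum, buf + 2, 2);
    memcpy(&out->dst,      buf + 8, 4);
}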

13
Stack Layout Optimization
  • Goal: allocate as many stack frames as possible
    to the limited amount of fast memory.
  • The stack can grow into SRAM, but SRAM has high
    latency and impacts performance.
  • Assign Local Memory to procedures higher in the
    program call graph.
  • Assign SRAM only when Local Memory is completely
    exhausted (a sketch of this policy follows below).
  • Utilize physical and virtual stack pointers.
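
A hedged sketch of the allocation policy as the slide describes it: walk procedures from the top of the call graph down, placing frames in Local Memory until it is exhausted, then spilling to SRAM. The budget, procedure names, and frame sizes are invented; the real compiler works on the program's call graph, not a hard-coded list.

#include <stdio.h>

#define LOCAL_MEM_BYTES 640   /* illustrative fast-memory budget */

struct proc { const char *name; unsigned frame_bytes; };

int main(void)
{
    /* Procedures ordered from the top of the call graph downward,
     * so callers get first claim on fast memory. */
    struct proc procs[] = {
        { "main_ppf",     256 },
        { "parse_hdr",    192 },
        { "route_lookup", 256 },
        { "emit_pkt",     128 },
    };
    unsigned local_used = 0;

    for (unsigned i = 0; i < sizeof procs / sizeof procs[0]; i++) {
        if (local_used + procs[i].frame_bytes <= LOCAL_MEM_BYTES) {
            local_used += procs[i].frame_bytes;
            printf("%-12s -> Local Memory\n", procs[i].name);
        } else {
            /* Fast memory exhausted for this frame: it spills to SRAM,
             * reached through a separate (virtual) stack pointer. */
            printf("%-12s -> SRAM\n", procs[i].name);
        }
    }
    return 0;
}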

14
Experimental Results
  • 3 benchmarks: L3-Switch, Firewall, MPLS
  • The impact of PAC is evident in the large
    reduction in per-packet SRAM and DRAM accesses.
  • Code generated by Shangri-La for all 3 benchmarks
    achieved 100% forwarding rates at 2.5 Gbps, which
    meets the designed spec of the IXP2400.
  • The same throughput target is achieved by
    hand-coded assembly written specifically for these
    processors.

15
Conclusions
  • Shangri-La provides a complete framework for
    aggressively compiling network programs.
  • Reduces both instruction and memory access counts.
  • Achieved the goal of a 100% packet forwarding rate
    at 2.5 Gbps.