Dynamic FPGA Routing for Just-in-Time Compilation

Transcript and Presenter's Notes


1
Dynamic FPGA Routing for Just-in-Time Compilation
  • Roman Lysecky (a), Frank Vahid (a), Sheldon X.-D. Tan (b)
  • (a) Department of Computer Science and Engineering
  • (b) Department of Electrical Engineering
  • University of California, Riverside
  • Also with the Center for Embedded Computer
    Systems at UC Irvine
  • This work was supported in part by the National
    Science Foundation, the Semiconductor Research
    Corporation, and a Department of Education GAANN
    fellowship

2
Introduction: Just-in-Time Compilation Has Become
Commonplace
  • Just-in-Time Compilation
  • Modern Pentium processors
  • Dynamically translate instructions onto
    underlying RISC architecture
  • Transmeta Crusoe & Efficeon
  • Dynamic code morphing
  • Translate x86 instructions to underlying VLIW
    processor
  • Interpreted languages
  • Distribute SW as processor-independent
    bytecode/source
  • SW typically executed on a virtual machine
  • JIT compile bytecode to the processor's native
    instructions
  • Java, Python, etc.

3
Introduction: Just-in-Time Compilation Also
Performs Optimization
  • Dynamic optimizations are increasingly common
  • Dynamically recompile binary during execution
  • Dynamo [Bala, et al., 2000]: dynamic software
    optimizations
  • Identify frequently executed code segments
    (hotpaths)
  • Recompile with higher optimization
  • BOA [Gschwind, et al., 2000]: dynamic optimizer
    for PowerPC
  • Advantages
  • Transparent optimizations
  • No designer effort
  • No tool restrictions
  • Adapts to actual usage
  • Speedups of up to 20-30% (1.3X)
  • JIT compilation operates on software binaries

4
Introduction: But Today's Binaries Are More than
just Software
5
Introduction: Just-in-Time FPGA Compilation?
  • JIT FPGA compilation
  • Idea: standard binary for FPGA
  • Similar benefits as standard binary for
    microprocessor
  • Portability, transparency, standard tools
  • Embedded JIT compilation tools optimized for each
    FPGA

6-8
Introduction: One Use of JIT FPGA Compilation
(Cable TV Company)
9
Introduction: Another Use - Warp Processors
(Dynamic HW/SW Partitioning)
[Figure: warp processor block diagram. Labeled blocks: Profiler, µP, I, D, Warp Config. Logic Architecture, Dynamic Part. Module (DPM)]
[Lysecky/Vahid, DATE'04; Stitt/Lysecky/Vahid, DAC'03; Stitt/Vahid, ICCAD'02]
10
Introduction: Another Use - Warp Processors
(Dynamic HW/SW Partitioning)
[Figure: warp processor block diagram. Labeled blocks: Profiler, ARM, I, D, WCLA, DPM]
[Lysecky/Vahid, DATE'04; Stitt/Lysecky/Vahid, DAC'03; Stitt/Vahid, ICCAD'02]
11
Introduction: All that CAD On-Chip?
  • CAD people may first think Just-in-Time FPGA
    compilation is absurd
  • CAD tools are extremely complex
  • Require long execution times on powerful desktop
    workstations
  • Require very large memory resources
  • Usually require GBytes of hard drive space
  • Cost of a complete CAD tools package can exceed
    $1 million
  • All that CAD on-chip?

12
Simultaneous FPGA/CAD Design
  • Careful simultaneous design of configurable logic
    fabric and CAD tools
  • Analyze architectural features for their
    impact on on-chip Just-in-Time CAD tools
  • Fast execution time
  • Very low data memory
  • Produce reasonable (good) hardware circuits

13
Simultaneous FPGA/CAD Design: Configurable Logic
Fabric
  • Array of configurable logic blocks (CLBs)
    surrounded by switch matrices (SMs)
  • Each CLB is directly connected to a SM
  • Switch matrix connections
  • Four short wires connect adjacent SMs
  • Four long wires connect every other SM together
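
The connectivity described above can be sketched in a few lines of Python. This is only an illustration: the function name `sm_neighbors`, the (row, column) coordinate scheme, and the reading of "every other SM" as a distance-2 hop along the same row or column are assumptions, not details taken from the slides.

```python
def sm_neighbors(row, col, rows, cols):
    """List the SMs reachable from the SM at (row, col) in one hop.

    Assumption: short wires reach the SM at distance 1, long wires the
    SM at distance 2, in each of the four grid directions.
    Returns a list of ((row, col), wire_kind) pairs.
    """
    reachable = []
    for kind, step in (("short", 1), ("long", 2)):
        for dr, dc in ((-step, 0), (step, 0), (0, -step), (0, step)):
            r, c = row + dr, col + dc
            # Skip hops that would leave the fabric.
            if 0 <= r < rows and 0 <= c < cols:
                reachable.append(((r, c), kind))
    return reachable
```

Under these assumptions, an SM in the middle of a large enough fabric has eight one-hop neighbors (four short, four long), while a corner SM has only four.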

[Figure: configurable logic fabric, showing CLBs surrounded by SMs]
[Lysecky/Vahid, DATE'04]
14
Simultaneous FPGA/CAD Design: Combinational Logic
Block Design
  • Incorporate two 3-input 2-output LUTs
  • Equivalent to four 3-input single-output LUTs
  • Allows for good quality circuits while reducing
    on-chip CAD tool complexity
  • Provide routing resources between adjacent CLBs
    to support carry chains
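
To make the "3-input 2-output LUT" idea concrete, here is a small Python model. It is a sketch under stated assumptions: the class name `Lut3x2`, the truth-table layout, and the choice of a full adder as the example function are illustrative and do not come from the slides.

```python
class Lut3x2:
    """A 3-input, 2-output LUT modeled as two 8-entry truth tables
    that share the same three inputs (hypothetical model)."""

    def __init__(self, table0, table1):
        if len(table0) != 8 or len(table1) != 8:
            raise ValueError("each truth table needs 2**3 = 8 entries")
        self.tables = (tuple(table0), tuple(table1))

    def evaluate(self, a, b, c):
        # The three input bits form the index into both tables.
        index = (a << 2) | (b << 1) | c
        return self.tables[0][index], self.tables[1][index]


def full_adder_lut():
    """Program one LUT so output 0 is the sum bit and output 1 the carry."""
    inputs = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    sum_bits = [a ^ b ^ c for a, b, c in inputs]
    carry_bits = [(a & b) | (a & c) | (b & c) for a, b, c in inputs]
    return Lut3x2(sum_bits, carry_bits)
```

One such LUT computes two functions of the same inputs, which is why two of them per CLB can be counted as four single-output 3-input LUTs. A one-bit adder stage like this also shows why a carry output routed to the adjacent CLB is useful.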

[Lysecky/Vahid, DATE'04]
15
Simultaneous FPGA/CAD Design: Switch Matrix
  • Switch Matrix
  • SM connected using eight channels per side
  • Four short channels
  • Four long channels
  • Routes wires from different sides using the same
    channel
  • Each short channel is associated with a single
    long channel
  • Wires are routed using a single pair of channels
    through configurable logic fabric

[Lysecky/Vahid, DATE'04]
16
FPGA Routing
  • FPGA Routing
  • Find a path within FPGA to connect source and
    sinks of each net within our hardware circuit
  • Typically use a form of maze routing [Lee, 1961]
  • Routes each net using Dijkstra's shortest path
    algorithm
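
As a minimal sketch of maze routing, the following Python function expands a wavefront from the source until it reaches the sink and then backtraces the path. With unit step costs this breadth-first expansion finds the same shortest paths Dijkstra's algorithm would. The grid representation and the name `lee_route` are assumptions made for illustration.

```python
from collections import deque


def lee_route(blocked, source, sink):
    """Route one two-terminal net on a grid.

    blocked[r][c] is True where a cell cannot be used.
    Returns the list of cells from source to sink, or None if unroutable.
    """
    rows, cols = len(blocked), len(blocked[0])
    parent = {source: None}          # also serves as the visited set
    frontier = deque([source])
    while frontier:
        cell = frontier.popleft()
        if cell == sink:
            # Backtrace from sink to source through the parent links.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and not blocked[nr][nc] and nxt not in parent):
                parent[nxt] = cell
                frontier.append(nxt)
    return None
```

The `parent` dictionary grows with the number of cells explored, which hints at why wavefront-style routers are memory-hungry on large routing graphs.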

17
FPGA Routing
  • Pathfinder [Ebeling, et al., 1995]
  • Introduced negotiated congestion
  • During each routing iteration, route nets using
    shortest path
  • Allows overuse (congestion) of routing resources
  • If congestion exists (illegal routing)
  • Update cost of congested resources based on the
    amount of overuse
  • Rip-up all routes and reroute all nets
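
The negotiated-congestion loop above can be sketched as follows. This is a simplified illustration, not the published Pathfinder implementation: it handles only two-terminal nets, puts capacity on nodes, and uses a node cost of (base + history) * present-congestion penalty. The names `shortest_path` and `negotiate` and the gain parameters are assumptions.

```python
import heapq


def shortest_path(adj, cost, source, sink):
    """Dijkstra over node costs; adj maps node -> list of neighbors."""
    dist, parent = {source: 0.0}, {source: None}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == sink:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nxt in adj[node]:
            nd = d + cost(nxt)
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], parent[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    if sink not in parent:
        return None
    path, node = [], sink
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def negotiate(adj, nets, capacity=1, max_iters=30, hist_gain=1.0, pres_gain=1.0):
    """nets maps net name -> (source, sink). Returns routes or None."""
    history = {n: 0.0 for n in adj}
    for _ in range(max_iters):
        usage = {n: 0 for n in adj}   # rip up every route
        routes = {}
        for name, (src, dst) in nets.items():
            def cost(n):
                over = max(0, usage[n] + 1 - capacity)
                return (1.0 + history[n]) * (1.0 + pres_gain * over)
            path = shortest_path(adj, cost, src, dst)
            if path is None:
                return None
            routes[name] = path
            for n in path:
                usage[n] += 1
        congested = [n for n in adj if usage[n] > capacity]
        if not congested:
            return routes             # legal routing found
        for n in congested:           # make overused nodes more expensive
            history[n] += hist_gain * (usage[n] - capacity)
    return None
```

Overuse is allowed while routing, and the growing history cost gradually pushes nets with cheap alternatives away from contested resources.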

18
FPGA Routing
  • VPR (Versatile Place and Route) [Betz, et al.,
    1997]
  • Uses modified Pathfinder algorithm
  • Increased performance over original Pathfinder
    algorithm
  • Routability-driven routing
  • Goal: use fewest tracks possible
  • Timing-driven routing
  • Goal: optimize circuit speed

19
JIT FPGA Routing
  • Riverside On-Chip Router (ROCR)
  • Represent routing nets between CLBs as routing
    between SMs
  • Resource Graph
  • Nodes correspond to SMs
  • Edges correspond to channels between SMs
  • Capacity of edge equal to the number of wires
    within the channel
  • Requires much less memory than VPR as resource
    graph is much smaller
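
A resource graph of this kind can be modeled very compactly. The sketch below is an assumption-laden illustration: it only includes channels between adjacent SMs (long channels are omitted for brevity), and the names `build_resource_graph`, `overused_edges`, and `wires_per_channel` are hypothetical.

```python
def build_resource_graph(rows, cols, wires_per_channel=4):
    """Map each channel (an unordered pair of adjacent SMs) to its capacity."""
    capacity = {}
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                capacity[frozenset({(r, c), (r, c + 1)})] = wires_per_channel
            if r + 1 < rows:
                capacity[frozenset({(r, c), (r + 1, c)})] = wires_per_channel
    return capacity


def overused_edges(capacity, routes):
    """Return {edge: amount of overuse} for a list of SM-to-SM paths."""
    usage = {}
    for path in routes:
        for a, b in zip(path, path[1:]):
            edge = frozenset({a, b})
            usage[edge] = usage.get(edge, 0) + 1
    return {e: n - capacity[e] for e, n in usage.items() if n > capacity[e]}
```

Because one edge stands for a whole channel rather than a single wire, the graph has far fewer entries than a wire-level graph of the same fabric, which is the memory saving the slide refers to.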

20
JIT FPGA Routing
  • Riverside On-Chip Router (ROCR) - Global Routing
  • Based on VPR's routability-driven router
  • Utilizes similar cost model consisting of base,
    historical congestion, and current congestion
    costs
  • Routes nets between SMs using greedy, depth-first
    routing algorithm
  • Faster than VPR's traditional breadth-first
    routing method
  • Requires addition of adjustment cost to direct
    ROCR to re-route illegal nets using different
    initial routing path
  • Ignores illegal routing within SMs
  • If congestion exists, rip-up and re-route only
    the illegal routes
  • Reduces computation time during successive
    routing iterations
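
The global routing loop above might look roughly like the following Python sketch. It is not ROCR itself: the slides do not say how the cost terms are combined or how the depth-first search picks its next step, so the additive cost, the Manhattan-distance tie-in, and all function names here are assumptions. The adjustment cost mentioned above is omitted.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def grid_neighbors(node, rows, cols):
    r, c = node
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)


def greedy_dfs(source, sink, rows, cols, edge_cost):
    """Depth-first search that always tries the cheapest-looking step first."""
    path, visited = [source], {source}

    def extend(node):
        if node == sink:
            return True
        options = [n for n in grid_neighbors(node, rows, cols) if n not in visited]
        options.sort(key=lambda n: edge_cost(node, n) + manhattan(n, sink))
        for nxt in options:
            visited.add(nxt)
            path.append(nxt)
            if extend(nxt):
                return True
            path.pop()               # dead end: backtrack
        return False

    return path if extend(source) else None


def route_nets(nets, rows, cols, capacity=1, max_iters=20):
    """nets maps name -> (source SM, sink SM). Returns routes or None."""
    history, usage, routes = {}, {}, {}

    def edges(path):
        return [frozenset(pair) for pair in zip(path, path[1:])]

    def edge_cost(a, b):
        e = frozenset((a, b))
        over = max(0, usage.get(e, 0) + 1 - capacity)
        return 1.0 + history.get(e, 0.0) + over   # base + historical + current

    pending = list(nets)
    for _ in range(max_iters):
        for name in pending:
            path = greedy_dfs(*nets[name], rows, cols, edge_cost)
            if path is None:
                return None
            routes[name] = path
            for e in edges(path):
                usage[e] = usage.get(e, 0) + 1
        over_edges = {e for e, n in usage.items() if n > capacity}
        if not over_edges:
            return routes
        for e in over_edges:
            history[e] = history.get(e, 0.0) + 1.0
        # Rip up only the nets that cross an overused channel.
        pending = [n for n, p in routes.items() if over_edges & set(edges(p))]
        for name in pending:
            for e in edges(routes[name]):
                usage[e] -= 1
    return None
```

Legal routes are left in place between iterations, so later iterations only pay for the nets that actually conflict.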

21
JIT FPGA Routing
  • Riverside On-Chip Router (ROCR) - Detailed
    Routing
  • Assign specific channels to each route
  • Construct routing conflict graph
  • Routes conflict if assigning same channel results
    in an illegal routing within any SM
  • Use Brelaz's greedy vertex coloring algorithm
    [Brelaz, 1979]
  • If illegal routes exist, rip-up illegal routes
    and repeat global routing
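
Channel assignment by vertex coloring can be sketched as below. Brelaz's heuristic (often called DSATUR) colors the vertex whose neighbors already use the most distinct colors. The function name `assign_channels`, the dictionary-based conflict graph, and the handling of uncolorable routes are assumptions made for this illustration.

```python
def assign_channels(conflicts, num_channels):
    """Color a routing conflict graph with at most num_channels colors.

    conflicts maps each route name to the set of routes it conflicts with.
    Returns (channel_of, failed): the channel chosen per route, and the
    routes that could not get a channel and would need to be ripped up.
    """
    channel_of, failed = {}, []
    remaining = set(conflicts)

    def saturation(route):
        # Number of distinct channels already taken by conflicting routes.
        return len({channel_of[n] for n in conflicts[route] if n in channel_of})

    while remaining:
        # Highest saturation first; break ties by degree, then by name.
        route = max(sorted(remaining),
                    key=lambda v: (saturation(v), len(conflicts[v])))
        remaining.remove(route)
        taken = {channel_of[n] for n in conflicts[route] if n in channel_of}
        free = [ch for ch in range(num_channels) if ch not in taken]
        if free:
            channel_of[route] = free[0]
        else:
            failed.append(route)
    return channel_of, failed
```

Any route left in `failed` corresponds to an illegal route in the last bullet: it is ripped up and handed back to global routing.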

22
Experiments: Memory Usage
23
Experiments: Algorithm Performance
24
Experiments: Critical Path Results
But 10% shorter critical path than VPR (RD)
25
Experiments: Wire Segments
26
Conclusions
  • Developed Riverside On-Chip Router (ROCR)
  • Fast, lean on-chip router for JIT FPGA
    compilation
  • Order of magnitude less memory required
  • On average 10X faster than VPR's faster routing
    algorithm
  • Produces acceptable circuit quality
  • Uses only 10% more routing resources
  • Critical path 10% shorter than VPR's
    routability-driven router
  • JIT FPGA Compilation
  • Enables development of a standard HW binary
  • Brings portability of SW design to HW designers
  • Presently requires custom FPGA fabric
  • Future work - Overhead of mapping simple fabric
    onto commercial fabric?