Configurable Soft Processor Arrays Using the OpenFire Processor

1
Configurable Soft Processor Arrays Using the
OpenFire Processor
  • Stephen Craven
  • September 30, 2005
  • Configurable Computing Lab
  • 3015 Torgersen Hall
  • Virginia Tech

2
Outline
  • Motivation
  • Single Chip Multi-Processors
  • Application-Specific Instruction set Processors
  • OpenFire Processor
  • Features and Configurability
  • Performance
  • Configurable Array Example: Median Image
    Filtering
  • Optimizations
  • Performance Comparisons

3
Problem: Design is Difficult!
  • System complexity makes design difficult
  • Dual-core Opteron: 233 million
    transistors
  • Montecito dual-core Itanium: 1.7 billion
    transistors
  • Complexity growing 58% / year
  • Productivity growing only 21% / year

Source: ITRS 1999
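
The two growth rates above compound, so the gap between what can be built and what can be designed widens every year. A minimal sketch of that arithmetic, assuming both figures are annual compound rates (the function names are illustrative and not from the presentation):

```c
#include <assert.h>

/* Value after `years` of compound growth at `rate` per year,
 * starting from 1.0 (rate = 0.58 means 58% per year). */
double compound_growth(double rate, int years)
{
    assert(years >= 0);
    double value = 1.0;
    for (int i = 0; i < years; i++) {
        value *= 1.0 + rate;
    }
    return value;
}

/* Ratio of available complexity to design productivity after `years`,
 * using the 58% and 21% figures quoted on the slide. */
double design_gap(int years)
{
    return compound_growth(0.58, years) / compound_growth(0.21, years);
}
```

Under these assumptions the gap grows by about 30% per year, which works out to roughly 14x after ten years.
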
4
Question: How do we use these transistors?
  • Intel's approach: Make the processor bigger!
  • Branch prediction
  • Larger caches
  • More functional units
  • Specialized instruction extensions

Source: UC Berkeley HERC and CPUscorecard.com
Answer: Not very well!
5
Gigahertz Race is Over
  • We're out of tricks!
  • Architectural improvements diminishing
  • Processor timing budget dominated by wire delays
  • Little remaining ILP to exploit
  • Intel's focus now on power dissipation

Source: Bennett, Xilinx Research Labs
Presentation, 2005
6
What about the Embedded World?
  • Designs comprise a collection of:
  • Processors
  • Busses
  • Hardware accelerators
  • Memories
  • Interfaces
  • Etc.
  • Power and cost primary drivers

NEC uPD61126 MPEG decoder
7
Design Difficulties
  • Verification
  • Diverse skill requirements
  • Interfacing IP
  • Simulation
  • HW / SW Partitioning
  • Time-to-market pressures
  • Rapid prototyping

8
One Solution: Single Chip Multiprocessors
  • Moving towards Single Chip Multi-Processors
    (SCMP) because:
  • Underutilized silicon budget
  • Diminishing ROI on Instruction Level Parallelism
  • Design and verification too costly
  • SCMPs more energy efficient
  • SCMPs can leverage existing IP
  • SCMPs by nature are easily scalable
  • Fast, on-chip inter-processor communication
  • Companies with SCMP products:
  • IBM, Sun, Intel, and AMD
  • PicoChip, Rapport, Boston Circuits, and Cradle
  • Xilinx and Altera

9
ASIPs: An Improvement
  • Application-Specific Instruction set Processors
    (ASIP) allow:
  • Optimum match of instruction set to application
  • Performance benefits approaching ASICs while
    retaining programmability
  • Architectural features customized to application
  • Significantly reduced verification and design
    effort
  • Available commercially through Tensilica and ARC
  • Complete design flows and generated custom
    toolsets
  • Academic/Research use through ASIPMeister
  • Closed source
  • GUI Only

10
Configurable Arrays
  • Merging SCMP with ASIP combines benefits of both
  • Reduced design time utilizing existing IP
  • Programmability of SCMP with performance
    improvements of ASIP
  • Attractive for applications that combine
    computation and control
  • Maintain constant, simple interface between
    processors
  • Eases optimizations
  • Provides verification points
  • FPGAs ideal platform for research and
    implementation

11
Creation Process
  • Start with homogeneous array
  • Apply optimizations
  • Instruction removal
  • Instruction-set extension
  • Datapath sizing
  • Direct ALU-to-ALU instructions
  • Custom datapath generation
  • Interfaces remain unchanged

12
Proposed Design Methodology
  • Designer extracts task-level parallelism
  • Advantages:
  • Starts with C
  • Every iteration results in simulated and
    implementable design
  • HW / SW partitioning avoided
  • Trade-off between run-time and performance

13
OpenFire
  • Configurable 32-bit RISC processor
  • Specialized for processor arrays
  • Instructions based on Xilinx MicroBlaze
  • Not burdened by features unused in arrays
    (interrupts, exceptions, caches, interfaces)
  • Open source
  • Released under MIT license
  • Support utilities provided (C simulator, BRAM
    loaders, etc.)

14
Performance
  • Cycle accurate with MicroBlaze except for:
  • Multiply has 5 cycle latency (3 for MicroBlaze)
  • Single cycle instruction fetches (2 cycles for
    MicroBlaze)
  • 100 MHz on a Xilinx Virtex-II Pro 30, speed grade
    6
  • OpenFire: 641 slices, 58.47 DMIPS
  • MicroBlaze: 734 slices, 58.98 DMIPS
  • Performance variable depending on configuration
  • 16-bit datapath implementation reduces area to
    402 slices, speed increases to 106 MHz
  • Minimal MicroBlaze implementation (no OPB,
    division unit, barrel shifter, or cache) at 100
    MHz
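
The slice and DMIPS figures on this slide can be combined into a performance-per-area comparison. The sketch below is only arithmetic on the quoted numbers; the helper names are illustrative and not part of the OpenFire distribution:

```c
#include <assert.h>

/* Dhrystone MIPS delivered per FPGA slice: a simple performance/area metric. */
double dmips_per_slice(double dmips, int slices)
{
    assert(slices > 0);
    return dmips / (double)slices;
}

/* Percentage area saved when going from `before` slices to `after` slices. */
double percent_reduction(int before, int after)
{
    assert(before > 0 && after >= 0 && after <= before);
    return 100.0 * (double)(before - after) / (double)before;
}
```

With the numbers above, `dmips_per_slice(58.47, 641)` is roughly 0.091 for OpenFire and `dmips_per_slice(58.98, 734)` is roughly 0.080 for MicroBlaze. `percent_reduction(641, 402)` gives about 37% for the 16-bit OpenFire relative to the 32-bit one.
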

15
Extensibility
  • Planned extensions include:
  • Increasing number of Fast Simplex Link (FSL) bus
    I/Os
  • Fast ALU-to-FSL and FSL-to-ALU operations
  • Additional debugging capabilities

16
Case Study: Image Filtering
  • 3x3 Median Image Filter written in C
  • Soft Processor Arrays created
  • Master node: MicroBlaze with DDR SDRAM
  • Slave nodes: OpenFires connected in ring network
    with master
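
For readers unfamiliar with the kernel, a 3x3 median filter replaces each pixel with the median of its 3x3 neighborhood. The following is a generic sketch in C, not the code used in the case study. It assumes an 8-bit grayscale, row-major image and copies border pixels unchanged, neither of which is stated in the slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Median of nine 8-bit samples, found by insertion-sorting a local copy. */
uint8_t median9(const uint8_t window[9])
{
    uint8_t sorted[9];
    memcpy(sorted, window, sizeof sorted);
    for (int i = 1; i < 9; i++) {
        uint8_t key = sorted[i];
        int j = i - 1;
        while (j >= 0 && sorted[j] > key) {
            sorted[j + 1] = sorted[j];
            j--;
        }
        sorted[j + 1] = key;
    }
    return sorted[4]; /* middle element of the sorted nine */
}

/* 3x3 median filter over a row-major 8-bit image.
 * Assumption: border pixels are copied through unfiltered. */
void median_filter_3x3(const uint8_t *src, uint8_t *dst, int width, int height)
{
    assert(src != NULL && dst != NULL && src != dst);
    assert(width >= 3 && height >= 3);
    memcpy(dst, src, (size_t)width * (size_t)height);
    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            uint8_t window[9];
            int k = 0;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    window[k++] = src[(y + dy) * width + (x + dx)];
                }
            }
            dst[y * width + x] = median9(window);
        }
    }
}
```

Each output pixel depends only on a small input neighborhood, so rows can be processed independently, which is what makes the kernel a natural fit for an array of small processors.
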

17
Array Results
  • Slave processor area reduced 45% by downsizing
    datapath to 16 bits
  • Required only slight modifications to original C
    code
  • Allows more OpenFires on chip, increasing
    throughput
  • Near-linear speedup with increasing array size
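
The slides do not say how image data was divided among the slave processors. One common way to obtain near-linear scaling for a neighborhood filter is to give each worker a contiguous band of rows. The sketch below illustrates that assumed scheme only; `row_band` and `rows_for_slave` are hypothetical names.

```c
#include <assert.h>

/* Half-open range [first, last) of interior image rows given to one slave. */
struct row_band {
    int first;
    int last;
};

/* Split the filterable rows (1 .. height-2) as evenly as possible
 * among `num_slaves` workers and return the band for worker `slave`. */
struct row_band rows_for_slave(int height, int num_slaves, int slave)
{
    assert(height >= 3 && num_slaves > 0);
    assert(slave >= 0 && slave < num_slaves);
    int interior = height - 2;
    int base = interior / num_slaves;
    int extra = interior % num_slaves; /* first `extra` slaves take one more row */
    struct row_band band;
    band.first = 1 + slave * base + (slave < extra ? slave : extra);
    band.last = band.first + base + (slave < extra ? 1 : 0);
    return band;
}
```

For a 3x3 window, a slave filtering rows `[first, last)` also needs input rows `first - 1` and `last` as a one-row halo. In a ring network the master would have to forward those extra rows along with each band.
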

18
Conclusion
  • Configurable soft processor arrays offer the best
    of SCMPs and ASIPs
  • Simplified design
  • Improved performance
  • OpenFire processor designed for use in processor
    arrays
  • Excellent performance / area
  • Highly configurable
  • Datapath width adjustment can produce noticeable
    performance improvement
  • Future goals: replace processors with custom
    datapath where needed

19
Questions?
  • Comments, opinions, and suggestions are
    appreciated!
  • http://www.ccm.ece.vt.edu/scraven