A Reconfigurable Signal Processing IC with embedded FPGA and Multi-Port Flash Memory

1
A Reconfigurable Signal Processing IC with embedded FPGA and Multi-Port Flash Memory
M. Borgatti, L. Calì, G. De Sandre, B. Forêt, D. Iezzi, F. Lertora, G. Muzzi, M. Pasotti, M. Poles, P.L. Rolandi
STMicroelectronics - Central R&D - Italy
2
Outline of Presentation
  • Project motivation and background
  • System architecture
  • Reconfigurable core
  • Memory subsystem
  • System performance
  • Application example: embedded face recognition
    system
  • Energy efficiency, measurements
  • SoC integration and design flow
  • System-to-RTL and RTL-to-layout
  • Summary
3
Project motivation and background
  • Conflicting industry trends
  • Economics of system integration
  • Even more complex SoC
  • More integration
  • Cost effectiveness and performance (per unit)
  • Increasing design complexity and risks
  • Increasing NREs
  • Shorter time-to-market and product life
  • Strong need for
  • Faster project turnaround
  • Lower risk
  • Use of reconfigurable silicon fabrics
4
Project motivation and background
  • Pragmatic approach proposed
  • Reconfigurable architecture
  • Joins a statically extensible processor with
    e-FPGA
  • Tight connection to Flash memory subsystem
  • Open architecture with flexible programmable I/O
  • Programmable platform approach
  • Simple model for programmers
5
Programmable Platform Approach
[Figure: layered platform diagram. Labels: System Applications Family; System Application; Application Compilation; Platform Compilation; Config. Proc. + e-FPGA; Silicon process / Enabling technologies; Programmable platform]
6
System Architecture
[Figure: system block diagram. Labels: Extensible MPU with 8KB I and 8KB D; 48 kB SRAM; bus bridge; 64 bit AHB BUS; M/S AHB I/F; DMA; FPGA Prog. I/F; e-FPGA; INTs; Instr. Ext. and Inst. Ext I/F; Flash Mem with FP, CP and DP ports; Buffer I/F; 1kB Buffer; AHB/APB Bridge; 64 bit APB BUS; GP I/O; I/O registers; General Purpose I/O Lines; I2C Master; I2C BUS]
7
e-FPGA Purposes
  • Processor ISA extensions
  • Simplest programmer's model
  • Specific interface to the MPU datapath
  • Impact on processor performance
  • Impact on processor energy efficiency
  • Efficiency limited by instruction stream decoding
  • Bus-mapped co-processor
  • Maximum benefits in speed/power
  • Flexible I/O
8
e-FPGA Microprocessor interface
[Figure: e-FPGA to microprocessor interface. Labels: e-FPGA Clock; Microprocessor clock; Clock Ctrl; Instruction; Decode; Pipe Control; Register File (R); Instruction extension (E); Result; Other FPGA Purposes]
9
Flash Memory Architecture
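[Figure: Flash memory block diagram. Labels: four 2Mb modules (0 to 3), each with a 128-bit connection to the 128-bit Memory Sub-System Crossbar; DFT; PMA; Power Block; uP I/F to an 8-bit uP; three ports: DP (Data Port, 64 bit), CP (Code Port, 64 bit), FP (FPGA Port, 32 bit)]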
10
Flash Memory Subsystem
  • Modular approach
  • Customizable array of N independent 2Mb modules
  • 3 content-specific ports (CP, DP, FP)
  • HW support for file-system implementation (DP)
  • Defrag
  • Compression
  • Virtual erase
  • 2Mb Module features
  • 128b I/O
  • 40ns access time (400MB/s peak throughput)
  • Power management and arbitration
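
The 400MB/s peak figure for one module is consistent with its width and access time: one 128-bit word (16 bytes) every 40ns. The short Python check below is a sketch under two assumptions that the slide does not state explicitly: decimal megabytes (1 MB = 10^6 bytes) and one full-width word delivered per access. The function name is illustrative only.

```python
def peak_throughput_mb_s(word_bits: int, access_time_ns: float) -> float:
    """Peak read throughput in MB/s (1 MB = 10**6 bytes), one word per access."""
    bytes_per_access = word_bits / 8
    accesses_per_second = 1e9 / access_time_ns
    return bytes_per_access * accesses_per_second / 1e6


# Figures quoted on the slide: 128b I/O, 40ns access time.
module_peak = peak_throughput_mb_s(word_bits=128, access_time_ns=40)
print(f"{module_peak:.0f} MB/s per module")
```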
11
System Memory Hierarchy
  • AHB peak throughput: 800MB/s
  • e-FPGA: 400MB/s (50MB/s sustained)
  • Total aggregate peak: 1.2GB/s
[Figure: memory hierarchy diagram. Labels: 32-bit uP Register File; AHB Bridge; 64-bit AHB Bus; 32-bit FPGA P I/F; DMA; 64-bit CP I/F; 64-bit DP I/F; 64-bit Port CP; 32-bit Port FP; 64-bit Port DP; 512-B Buffer; 2 x 64-bit + 1 x 32-bit Memory Port I/Fs; 6x4 128-bit Crossbar; 4 x Flash Memory Controller Logic; 4 x 16384 x 128-bit Memory Module]
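
The throughput and capacity figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes a 100 MHz clock (the frequency used on the performance slides) and one bus-width transfer per cycle; neither assumption is stated on this slide. The slide also does not say whether the 400MB/s e-FPGA figure comes from the 32-bit port at 100 MHz or from the 128-bit, 40ns flash module; both happen to give the same number.

```python
CLOCK_HZ = 100e6  # assumption: 100 MHz, taken from the performance slides


def bus_peak_mb_s(width_bits: int, clock_hz: float = CLOCK_HZ) -> float:
    """Peak throughput in MB/s for one transfer of width_bits per clock cycle."""
    return width_bits / 8 * clock_hz / 1e6


ahb_peak = bus_peak_mb_s(64)    # 64-bit AHB bus
fpga_peak = bus_peak_mb_s(32)   # 32-bit FPGA port
aggregate_gb_s = (ahb_peak + fpga_peak) / 1e3

# Flash capacity: 4 modules of 16384 words x 128 bits each.
module_bits = 16384 * 128
total_bytes = 4 * module_bits // 8

print(ahb_peak, fpga_peak, aggregate_gb_s)
print(module_bits // 2**20, "Mbit per module,", total_bytes // 2**20, "MB total")
```

Under these assumptions the sum of the two peaks reproduces the 1.2GB/s aggregate, and the array size agrees with the four 2Mb modules described earlier.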
12
Application Example: Face Recognition
  • Target application
  • Recognize a face out of twenty
  • Low-resolution images from CMOS cameras
  • Potential applications
  • Low cost smart toys
  • Advanced human-machine interfaces
  • Color CMOS camera processors
  • Image preprocessing: Bayer filter
  • Face location based on Hough transform
  • Face recognition: line-based
  • Recognition rates over 90%
  • Scale-invariant
  • Tolerant to changes in illumination intensity
13
Processor Extension (I)
  • 8-issue, 8-bit L2 distance
  • Complexity
  • 23 8-bit OPS
  • 6 64-bit OPS
  • 1GOPS peak throughput
  • Distance computation
  • 10k equiv. ASIC gates
  • Mapped to e-FPGA
[Figure: datapath diagram. Labels: uProcessor Load Unit; 64-bit register; 8- and 16-bit signal widths; subtract (-) and multiply (x) stages; two 4-segm. blocks; Result]
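
As a software reference for what an 8-lane, 8-bit distance instruction might compute, the sketch below treats two 64-bit words as eight packed unsigned bytes and returns the sum of squared differences. This is an illustration, not the authors' datapath: the lane ordering, the unsigned interpretation, and the choice to return the squared distance are all assumptions, and the function names are hypothetical. One decomposition that happens to give 23 narrow operations is 8 subtractions, 8 multiplications and 7 additions; the slide does not confirm that this is how its "23 8-bit OPS" were counted.

```python
def unpack_lanes(word: int) -> list:
    """Split a 64-bit word into eight unsigned 8-bit lanes (lane 0 = least significant byte)."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def l2_squared_8x8(a: int, b: int) -> int:
    """Sum of squared differences between the eight byte lanes of a and b."""
    diffs = [x - y for x, y in zip(unpack_lanes(a), unpack_lanes(b))]  # 8 subtractions
    squares = [d * d for d in diffs]                                   # 8 multiplications
    return sum(squares)  # in hardware, 7 adders suffice to reduce 8 terms


a = 0x0807060504030201  # lanes 1, 2, ..., 8
b = 0x0101010101010101  # lanes all equal to 1
print(l2_squared_8x8(a, b))  # 0 + 1 + 4 + 9 + 16 + 25 + 36 + 49 = 140
```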
14
Processor Extension (II)
  • Fixed-point square root kernel
  • Complexity
  • 12 32-bit OPS
  • 2k equiv. ASIC gates
  • Mapped to e-FPGA
[Figure: datapath diagram. Labels: Number; Remaind.; root; shifts >>1, <<2, >>2, >>30, <<1; constants 1 and 2; subtract (-) and compare (>) stages; Result]
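
The slide does not spell out the square root algorithm. The labels that survive (Number, Remaind., root, and shifts by 1, 2 and 30) resemble the well-known shift-and-subtract (digit-by-digit) integer square root, so the sketch below shows that method for an unsigned 32-bit input. It is offered as a plausible software reference under that assumption, not as the kernel actually mapped to the e-FPGA; the function name is hypothetical.

```python
import math


def isqrt32(number: int) -> int:
    """Floor of the square root of an unsigned 32-bit integer, using only shifts, adds and compares."""
    if not 0 <= number < (1 << 32):
        raise ValueError("expected an unsigned 32-bit value")
    remainder = number
    root = 0
    bit = 1 << 30  # highest power of four representable in 32 bits
    while bit > remainder:
        bit >>= 2
    while bit:
        if remainder >= root + bit:
            remainder -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root


# Cross-check against Python's exact integer square root.
for n in (0, 1, 2, 24, 25, 26, 1 << 16, (1 << 32) - 1):
    assert isqrt32(n) == math.isqrt(n)
```

For a fixed-point input with an even number of fractional bits, the same routine can be applied to the raw integer; the result then carries half as many fractional bits.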
15
Performance: Processing Time @ 100 MHz
Algorithm Stage                       RISC w/ basic DSP   RISC w/ basic DSP + uP Ext.   Speed-Up
Bayer Filter                          58 msec             24.7 msec                     x 2.3
Edge Detection                        4.5 msec            2.5 msec                      x 1.8
Face Detection                        1.5 sec             382 msec                      x 4
Face Recognition (20-face database)   9.15 sec            860 msec                      x 10.6
Totals                                10.7 sec            1.26 sec                      x 8.5
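
The speed-up column is the ratio of the two processing times. The sketch below recomputes it from the per-stage times in the table; the dictionary layout and variable names are illustrative. Because the slide values are rounded, the recomputed figures differ slightly in places: face detection comes out near 3.9 (shown as x 4), and the overall ratio from the summed stage times is about 8.4, whereas the slide's x 8.5 matches its rounded totals (10.7 / 1.26).

```python
# (baseline seconds, seconds with uP extensions), as read from the table
stages = {
    "Bayer Filter": (58e-3, 24.7e-3),
    "Edge Detection": (4.5e-3, 2.5e-3),
    "Face Detection": (1.5, 382e-3),
    "Face Recognition": (9.15, 860e-3),
}

speedups = {name: base / ext for name, (base, ext) in stages.items()}
total_base = sum(base for base, _ in stages.values())
total_ext = sum(ext for _, ext in stages.values())
overall = total_base / total_ext

for name, s in speedups.items():
    print(f"{name:18s} x {s:.1f}")
print(f"{'Overall':18s} x {overall:.1f}")
```

Face recognition accounts for most of the baseline time, so its speed-up dominates the overall figure.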
16
Energy Efficiency vs. Flexibility
[Figure: log-scale chart of Energy Efficiency (MOPS/mW, axis from 0.1 to 1000) versus Flexibility (Coverage). Regions: Dedicated HW; FPGA-mapped CoProcessors; uP FPGA Instructions; ASIPs, DSPs; Embedded Processors. Annotation: Energy-Flexibility Gap! From Zhang et al., ISSCC 2000.]
17
Performance: Energy Efficiency
Algorithm Stage                       Speed-Up   Energy Gain   Energy x Delay Gain
Bayer Filter                          x 2.3      x 1.4         x 3.2
Edge Detection                        x 1.8      x 0.95        x 1.7
Face Detection                        x 4        x 2.9         x 11.6
Face Recognition (20-face database)   x 10.6     x 9           x 95.4
Totals                                x 8.5      x 6.7         x 57
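
The last column of the table is consistent with the product of the other two: an energy x delay gain equals the speed-up multiplied by the energy gain. The check below uses the values as printed on the slide; variable names are illustrative. Note that an energy gain below 1 (edge detection, x 0.95) means the extended processor uses slightly more energy for that stage even though it finishes sooner.

```python
# (speed-up, energy gain), as read from the table
rows = {
    "Bayer Filter": (2.3, 1.4),
    "Edge Detection": (1.8, 0.95),
    "Face Detection": (4.0, 2.9),
    "Face Recognition": (10.6, 9.0),
    "Totals": (8.5, 6.7),
}

energy_delay_gain = {name: speedup * energy for name, (speedup, energy) in rows.items()}

for name, gain in energy_delay_gain.items():
    print(f"{name:18s} x {gain:.2f}")
```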
18
[Figure: design-flow diagram, system level to RTL. Labels: Functional model (untimed); Partitioning / I/F Synthesis / Refinement; uP ISS; Cycle Accurate Simulation, Performance Analysis; Libraries HW/SW; VHDL (e-FPGA); Inst. Ext.; Verilog; HW (RTL): uP, AHB/APB Bus, Peripherals; C; Soft Hardware (eFPGA); SW Apps; eFPGA mapping; eFPGA HARD MACRO; SoC Integration]
19
[Figure: design-flow diagram, RTL to layout. Labels: CPU core, IPs; Interface RTL code; Flash; RAM; eFPGA core; Inst. Ext.; Coproc.; I/O I/F; Synthesis (two instances); Floorplanning / P&R; Static Timing Analysis, Dynamic Verification; Con.; Mapping (P&R); FPGA Timing DB; Bit-stream; Netlist; Timing Database; Static Timing Analysis (SoC + eFPGA); Silicon fab]
20
Chip Layout
[Figure: die photo. Block labels: DFT; 1MB FLASH Memory; Flash Ports Buffers; Embedded FPGA; uP AHB/APB; 8+8 KB I/D; TAGS; 48 KB SRAM; BUFFER]
Process            0.18um CMOS, 2P/6M, Embedded Flash
Flash Memory       (x4) 256kB x 9 sectors, 128-bit word; 1MB/s write throughput, 400MB/s read throughput
SRAM Memory        Main 48kB (64-bit); I 8kB (64-bit); D 8kB (64-bit); Buffers 4x256B
32b uP, AHB, APB   250k gates
Chip size          8.4 x 8.4 mm2 (e-FPGA size 8.2 mm2)
I/O                24 inputs, 24 outputs (tristate), 8 bidirs
Supply             2.7-3.6V (external), 1.8V (core)
21
Chip Performances and Power Consumption
Processor maximum speed          125MHz (WCMIL)
Reconfiguration speed            500us @ 100MHz clock
Chip average power consumption   300mW @ 100MHz, 1.8V
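
To put the reconfiguration time in perspective, it can be converted to clock cycles and, very roughly, to energy. The cycle count follows directly from the slide. The energy figure is only an order-of-magnitude estimate: it assumes the chip draws its quoted average power (300mW) during reconfiguration, which the slide does not state.

```python
CLOCK_HZ = 100e6          # 100MHz clock, from the slide
RECONFIG_TIME_S = 500e-6  # 500us reconfiguration time, from the slide
AVG_POWER_W = 0.300       # 300mW average; ASSUMED to hold during reconfiguration

reconfig_cycles = round(RECONFIG_TIME_S * CLOCK_HZ)
reconfig_energy_uj = AVG_POWER_W * RECONFIG_TIME_S * 1e6  # microjoules

print(f"{reconfig_cycles} cycles, roughly {reconfig_energy_uj:.0f} uJ per reconfiguration")
```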
22
Summary
  • e-FPGAs allow architectural tradeoffs for
    reconfigurable embedded systems
  • Processor ISA extensions
  • Bus-mapped co-processor
  • Flexible I/O
  • Modular, content-specific, multiport e-Flash
  • Performance figures
  • Up to 10x speedup
  • Up to 9x energy reduction
  • Dynamic reconfiguration in 500 us
  • Specific design-flow for system and RTL
23
Acknowledgements
The authors thank all the colleagues of the NVM-DP
Dept., A. Maurelli, F. Piazza and L. Fumagalli.