Title: HW/SW Co-design
HW/SW Co-design
- Lecture 4
- Lab 2: Passive HW Accelerator Design
Course material designed by Professor Yarsun Hsu, EE Dept., NTHU
RA: Yi-Chiun Fang, EE Dept., NTHU
Outline
- Introduction to AMBA Bus System
- Passive Hardware Design
- Interrupt Service Routine
- Environment Configuration
- Co-designed System with GHDL Simulation
- Co-designed System on FPGA
INTRODUCTION TO AMBA BUS SYSTEM
AMBA 2.0 Bus System (1/7)
- Established by ARM
- Advanced High-performance Bus (AHB)
  - For high-performance, high-clock-frequency system modules such as the embedded processor, DMA controller, and memory controller
- Advanced Peripheral Bus (APB)
  - Optimized for minimal power consumption and reduced interface complexity to support peripheral functions
- For more details, please refer to the following documents:
  - AMBA 2.0 Specification
  - Introduction to AMBA Bus System
  - GRLIB AHBCTRL: AMBA AHB controller with plug&play support
AMBA 2.0 Bus System (2/7)
[Figure: typical AMBA system; the APB bridge is a slave on AHB and the only master on APB]
AMBA 2.0 Bus System (3/7)
- AMBA AHB is designed to be used with a central multiplexor interconnection scheme
  - Avoids a tri-state bus
AMBA 2.0 Bus System (4/7)
- An AHB transfer consists of two distinct sections
  - The address phase, which lasts only a single cycle
  - The data phase, which may require several cycles
    - This is achieved using the HREADY signal
AMBA 2.0 Bus System (5/7)
- A slave may insert wait states into any transfer
  - For write operations, the bus master will hold the data stable throughout the extended cycles
  - For read transfers, the slave does not have to provide valid data until the transfer is about to complete
[Figure: timing diagram of a transfer with wait states]
AMBA 2.0 Bus System (6/7)
- GRLIB implements AMBA AHB with slight modifications
  - Please refer to the GRLIB User's Manual and GRLIB IP Cores Manual for detailed information
AMBA 2.0 Bus System (7/7)
- The GRLIB implementation of AHB includes a mechanism to provide plug&play support
  - The implementation is located at grlib-gpl-1.0.19-b3188/lib/grlib/amba/
- The configuration record from each AHB unit is sent to the AHB bus controller via the HCONFIG signal, and is used for:
  - identification of attached units
  - interrupt routing
  - address mapping of slaves

```vhdl
type ahb_config_type is array (0 to NAHBCFG-1) of amba_config_word;
```
PASSIVE HARDWARE DESIGN
Passive HW Accelerators
- The accelerator (bus slave) does not actively send signals to the bus
  - It only responds to the master
- The master gives commands to the slave via its control registers and probes its status registers
[Figure: master/slave interaction]
Passive 1-D IDCT HW Acc. (1/4)
- A simple 2-stage design
  - Gate delay
    - Stage 1: 1 multiplier
    - Stage 2: 3 adders
- Action register
  - Write 1 to start; it is reset to 0 automatically by the accelerator when done
- Mode register
  - Row/column mode
- No wait states
  - Immediate response
[Figure: accelerator block diagram with the action and mode registers]
Passive 1-D IDCT HW Acc. (2/4)
- Data packing
  - Since the 8x8 blocks are of type short (16-bit), each value occupies only half of the data bus (32-bit)
  - We pack two values together to increase data bus utilization and reduce the communication overhead
  - The action bit and mode bit are also packed together
[Figure: control word layout: bit 0 = action, bit 1 = mode, bits 31..2 UNUSED]
Passive 1-D IDCT HW Acc. (3/4)
- 1-D IDCT calculation
  - STEP 1: Write the Y registers (4 transfers of two packed values each)
  - STEP 2: Write the mode bit and the action bit (a single transfer)
  - STEP 3: Poll the action bit
  - STEP 4: Read the x registers after the action bit has been reset
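Passive 1-D IDCT HW Acc. (4/4)
```c
/* Y_array_base, x_array_base and c_reg are assumed to be volatile pointers
   to the accelerator's memory-mapped registers, declared elsewhere. */
static void hw_idct_1d(short *dst, short *src, unsigned int mode)
{
    long *long_ptr = (long *)src;
    Y_array_base[0] = long_ptr[0];        /* two packed shorts per write */
    Y_array_base[1] = long_ptr[1];
    /* ... */
    *c_reg = (long)((mode << 1) | 0x1);   /* mode = bit 1, action = bit 0 */
    while (*c_reg & 0x1)
        ; /* busy waiting loop */
    dst[0] = ((short *)x_array_base)[0];  /* results stored with a stride of 8 */
    dst[8] = ((short *)x_array_base)[1];
    /* ... */
}
```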
INTERRUPT SERVICE ROUTINE
GRLIB GPTIMER (1/2)
- General Purpose Timer Unit
  - Timers are present in almost any electronic device that needs timing functions (e.g. timekeeping, time measurement)
  - Acts as a slave on AMBA APB
  - Provides a common decrementing prescaler (clocked by the system clock) and decrementing timers
  - Capable of asserting an interrupt on timer underflow
- We initialize timer 2 for 1 ms resolution (i.e. an interrupt will be asserted every 1 ms)
GRLIB GPTIMER (2/2)
- Please refer to the GRLIB IP Cores Manual for detailed information
eCos ISR (1/3)
- When an interrupt occurs, the processor jumps to a specific address to execute the Interrupt Service Routine (ISR)
- One of the key concerns in embedded systems with respect to interrupts is latency: the interval of time from when an interrupt occurs until the ISR begins to execute
[Figure: interrupt latency]
eCos ISR (2/3)
- Basic API for implementing an ISR
  - Please refer to the eCos Reference Manual for detailed information
```c
#include <cyg/kernel/kapi.h>

void cyg_interrupt_create(cyg_vector_t vector, cyg_priority_t priority,
                          cyg_addrword_t data, cyg_ISR_t *isr, cyg_DSR_t *dsr,
                          cyg_handle_t *handle, cyg_interrupt *intr);
void cyg_interrupt_delete(cyg_handle_t interrupt);
void cyg_interrupt_attach(cyg_handle_t interrupt);
void cyg_interrupt_detach(cyg_handle_t interrupt);
void cyg_interrupt_acknowledge(cyg_vector_t vector);
void cyg_interrupt_mask(cyg_vector_t vector);
void cyg_interrupt_unmask(cyg_vector_t vector);
```
eCos ISR (3/3)
- An ISR is a C function that takes the following form
  - An ISR should complete as soon as possible
```c
cyg_uint32 isr_function(cyg_vector_t vector, cyg_addrword_t data)
{
    /* ... do the service routine ... */
    return CYG_ISR_HANDLED;
}
```
Program Profiling (1/2)
- We use GPTIMER for time measurement
- Every time the timer asserts an interrupt, the timer ISR increments a global variable, time_tick
```c
cyg_uint32 timer_isr(cyg_vector_t vector, cyg_addrword_t data)
{
    /* data is the value passed to cyg_interrupt_create(): &time_tick */
    unsigned long *time_tick = (unsigned long *) data;
    (*time_tick)++;
    cyg_interrupt_acknowledge(vector);
    return CYG_ISR_HANDLED;
}
```
Program Profiling (2/2)
- We record the latency of every function block by monitoring the time_tick variable
```c
void func()
{
    unsigned long local_timer = time_tick;      /* tick count on entry */
    /* ... */
    time_elapsed += (time_tick - local_timer);  /* ticks spent in this block */
}
```
ENVIRONMENT CONFIGURATION
Build SW Application
- Copy the files in lab_pkg/lab2/sw to your original Lab 1 directory
- Replace the Makefile and modify the path for ECOSDIR in the Makefile
- Type make to build
- The -D_HW_ACC_ flag will link the co-designed version of hw_idct_2d() in idct_hw.c with the testbench
  - Without this flag, hw_idct_2d() will be identical to sw_idct_2d()
- The -D_PROFILING_ flag will enable profiling using the timer interrupt and report the results at the end
Install IDCT Accelerator
- Copy lab_pkg/lab2/hw/devices.vhd to grlib-gpl-1.0.19-b3188/lib/grlib/amba/ and replace the original file
- Copy lab_pkg/lab2/hw/libs.txt and the whole lab_pkg/lab2/hw/esw folder to grlib-gpl-1.0.19-b3188/lib/
  - The 1-D IDCT passive accelerator is located at lab_pkg/lab2/hw/esw/idct_acc/idct_1x8.vhd
- Copy lab_pkg/lab2/hw/leon3mp.vhd to grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ and replace the original file
CO-DESIGNED SYSTEM WITH GHDL SIMULATION
GHDL Simulation (1/6)
- We compile our program into a virtual SDRAM image for the LEON3 processor
- LEON3 will fetch the instructions and perform the corresponding operations
- All the hardware signals can be recorded and dumped by GHDL
GHDL Simulation (2/6)
- In order to perform GHDL simulation, our program must not be linked with eCos
  - Remove -D__ECOS -I$(ECOSDIR)/include from CFLAGS
  - Remove -Ttarget.ld, -nostdlib, -L$(ECOSDIR)/lib from LFLAGS
  - Remove the -D_PROFILING_ flag
- You can remove -D_VERBOSE_ for faster simulation
- You can modify the NUM_BLKS macro in idct_test.c to reduce the number of testbench iterations
- Type make to build
  - You should see a file named sdram.srec
GHDL Simulation (3/6)
- Start Cygwin
- cd grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/
- make distclean
- make soft
- Copy the sdram.srec we built into this directory and replace the original one
- make ghdl
  - You can check for syntax errors through GHDL
GHDL Simulation (4/6)
- Type ./testbench.exe --vcd=waveform.vcd after compilation to begin simulation
- You should see an AHB slave with "Unknown vendor" appear, which is our IDCT accelerator
GHDL Simulation (5/6)
- The dump file waveform.vcd can be viewed on-the-fly using GTKWave
  - Drag waveform.vcd and drop it onto the gtkwave.exe icon to open it
  - You can also open it from the Windows command prompt (cmd)
  - Use File > Reload Waveform in GTKWave to load the updated dump file
GHDL Simulation (6/6)
[Figure: GTKWave waveform of the accelerator, annotated with stage1, stage2, the AHB addr phase and data phase, and the probe of the control reg]
CO-DESIGNED SYSTEM ON FPGA
Build FPGA Bitstream (1/2)
- Type make ise | tee ise_log under grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ after you install the accelerator
  - It is strongly suggested that you verify the hardware with GHDL simulation first
  - It is also suggested that you take a look at ise_log for more information
- Configure your FPGA with leon3mp.bit after generating the bitstream
Build FPGA Bitstream (2/2)
- After entering GRMON, check the system configuration using info sys
  - You should see a device with "Unknown vendor" appear
Profiling Results
- Build the program with the -D_PROFILING_ flag on
- Compare the computation results of sw_idct_2d() and hw_idct_2d()
- Compare the computation results with and without the -D_VERBOSE_ flag