Title: HW/SW Co-design


1
HW/SW Co-design
  • Lecture 4
  • Lab 2: Passive HW Accelerator Design

Course material designed by Professor Yarsun Hsu,
EE Dept., NTHU, and RA Yi-Chiun Fang, EE Dept., NTHU
2
Outline
  • Introduction to AMBA Bus System
  • Passive Hardware Design
  • Interrupt Service Routine
  • Environment Configuration
  • Co-designed System with GHDL Simulation
  • Co-designed System on FPGA

3
INTRODUCTION TO AMBA BUS SYSTEM
4
AMBA 2.0 Bus System (1/7)
  • Established by ARM
  • Advanced High-performance Bus (AHB)
  • For high-performance, high-clock-frequency system
    modules such as embedded processors, DMA
    controllers, and memory controllers
  • Advanced Peripheral Bus (APB)
  • Optimized for minimal power consumption and
    reduced interface complexity to support
    peripheral functions
  • For more details, please refer to the following
    documents
  • AMBA 2.0 Specification
  • Introduction to AMBA Bus System
  • GRLIB AHBCTRL - AMBA AHB controller with
    plug&play support

5
AMBA 2.0 Bus System (2/7)
(Figure: the AHB/APB bridge acts as a slave on the AHB and as the only master on the APB)
6
AMBA 2.0 Bus System (3/7)
  • AMBA AHB is designed to be used with a central
    multiplexor interconnection scheme
  • Avoids tri-state bus

7
AMBA 2.0 Bus System (4/7)
  • An AHB transfer consists of two distinct sections
  • The address phase, which lasts only a single
    cycle
  • The data phase, which may require several cycles
  • The slave extends the data phase by driving the
    HREADY signal low

8
AMBA 2.0 Bus System (5/7)
  • A slave may insert wait states into any transfer
  • For write operations, the bus master will hold
    the data stable throughout the extended cycles
  • For read transfers, the slave does not have to
    provide valid data until the transfer is about to
    complete

(Figure: a transfer extended by wait states)
9
AMBA 2.0 Bus System (6/7)
  • GRLIB implements AMBA AHB with slight
    modifications
  • Please refer to the GRLIB User's Manual and GRLIB
    IP Cores Manual for detailed information

10
AMBA 2.0 Bus System (7/7)
  • The GRLIB implementation of AHB includes a
    mechanism to provide plug&play support
  • The implementation is located at
    grlib-gpl-1.0.19-b3188/lib/grlib/amba/
  • The configuration record from each AHB unit is
    sent to the AHB bus controller via the HCONFIG
    signal

  • The plug&play information provides identification
    of attached units, interrupt routing, and address
    mapping of slaves

type ahb_config_type is array (0 to NAHBCFG-1) of amba_config_word;
11
PASSIVE HARDWARE DESIGN
12
Passive HW Accelerators
  • The accelerator (bus slave) does not actively
    send signals to the bus
  • It only responds to the master
  • The master gives commands to the slave via its
    control registers and probes its status registers

(Figure: the master and the passive slave on the bus)
13
Passive 1-D IDCT HW Acc. (1/4)
  • A simple 2-stage design
  • Gate delay
  • Stage 1: one multiplier delay
  • Stage 2: three adder delays
  • Action register
  • Write 1 to start; reset to 0 automatically by
    the accelerator when done
  • Mode register
  • Row/column mode
  • No wait states
  • Immediate response

(Figure: the action and mode registers of the accelerator)
14
Passive 1-D IDCT HW Acc. (2/4)
  • Data packing
  • Since the 8x8 blocks are of type short (16-bit),
    each value occupies only half of the data bus
    (32-bit)
  • We pack two values together to increase data bus
    utilization and reduce the communication overhead
  • The action bit and mode bit are also packed
    together

(Figure: control register layout: bit 0 = action, bit 1 = mode, bits 31..2 unused)
15
Passive 1-D IDCT HW Acc. (3/4)
  • 1-D IDCT calculation
  • STEP1: Write the Y registers (4 transfers)
  • STEP2: Write the mode bit and the action bit
  • STEP3: Poll the action bit
  • STEP4: Read the x registers after the action bit
    is reset

16
Passive 1-D IDCT HW Acc. (4/4)
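static void hw_idct_1d(short *dst, short *src, unsigned int mode)
{
    /* Y_array_base, c_reg, x_array_base: accelerator registers, declared elsewhere */
    long *long_ptr = (long *)src;           /* two shorts per 32-bit word */
    Y_array_base[0] = long_ptr[0];          /* STEP1: write Y registers */
    Y_array_base[1] = long_ptr[1];
    ...
    *c_reg = (long)((mode << 1) | 0x1);     /* STEP2: mode bit 1, action bit 0 */
    while (*c_reg & 0x1)                    /* STEP3: busy waiting loop */
        ;
    dst[0] = ((short *)x_array_base)[0];    /* STEP4: read x registers */
    dst[8] = ((short *)x_array_base)[1];
    ...
}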
17
INTERRUPT SERVICE ROUTINE
18
GRLIB GPTIMER (1/2)
  • General Purpose Timer Unit
  • Timers are present in almost any electronic
    device that needs timing functions (e.g.,
    timekeeping and time measurement)
  • Acts as a slave on AMBA APB
  • Provides a common decrementing prescaler (clocked
    by the system clock) and decrementing timers
  • Capable of asserting an interrupt on timer underflow
  • We initialize timer 2 for 1 ms resolution (i.e.,
    an interrupt will be asserted every 1 ms)

19
GRLIB GPTIMER (2/2)
  • Please refer to the GRLIB IP Cores Manual for
    detailed information

20
eCos ISR (1/3)
  • When an interrupt occurs, the processor jumps to
    a specific address for execution of the Interrupt
    Service Routine (ISR)
  • One of the key concerns in embedded systems with
    respect to interrupts is latency, which is the
    interval of time from when an interrupt occurs
    until the ISR begins to execute

(Figure: interrupt latency, from the occurrence of the interrupt to the start of the ISR)
21
eCos ISR (2/3)
  • Basic API for implementing ISR
  • Please refer to the eCos Reference Manual for
    detailed information

#include <cyg/kernel/kapi.h>

void cyg_interrupt_create(cyg_vector_t vector, cyg_priority_t priority,
                          cyg_addrword_t data, cyg_ISR_t *isr, cyg_DSR_t *dsr,
                          cyg_handle_t *handle, cyg_interrupt *intr);
void cyg_interrupt_delete(cyg_handle_t interrupt);
void cyg_interrupt_attach(cyg_handle_t interrupt);
void cyg_interrupt_detach(cyg_handle_t interrupt);
void cyg_interrupt_acknowledge(cyg_vector_t vector);
void cyg_interrupt_mask(cyg_vector_t vector);
void cyg_interrupt_unmask(cyg_vector_t vector);
22
eCos ISR (3/3)
  • An ISR is a C function which takes the following
    form
  • An ISR should complete as soon as possible

cyg_uint32 isr_function(cyg_vector_t vector, cyg_addrword_t data)
{
    ...
    /* do the service routine */
    return CYG_ISR_HANDLED;
}
23
Program Profiling (1/2)
  • We use GPTIMER for time measurement
  • Every time the timer asserts an interrupt, the
    timer ISR increments a global variable
    time_tick

cyg_uint32 timer_isr(cyg_vector_t vector, cyg_addrword_t data)
{
    unsigned long *time_tick = (unsigned long *)data;  /* data points to the counter */
    (*time_tick)++;
    cyg_interrupt_acknowledge(vector);
    return CYG_ISR_HANDLED;
}
24
Program Profiling (2/2)
  • We record the latency of every function block by
    monitoring the time_tick variable

void func()
{
    unsigned long local_timer = time_tick;
    ...
    time_elapsed += (time_tick - local_timer);
}
25
ENVIRONMENT CONFIGURATION
26
Build SW Application
  • Copy the files in lab_pkg/lab2/sw to your
    original Lab 1 directory
  • Replace the Makefile and set the ECOSDIR path in
    the Makefile
  • Type make to build
  • -D_HW_ACC_ flag will link the co-designed version
    of hw_idct_2d() in idct_hw.c with the testbench
  • Without this flag, hw_idct_2d() will be identical
    to sw_idct_2d()
  • The -D_PROFILING_ flag enables profiling using the
    timer interrupt and reports the results at the end

27
Install IDCT Accelerator
  • Copy lab_pkg/lab2/hw/devices.vhd to
    grlib-gpl-1.0.19-b3188/lib/grlib/amba/ and
    replace the original file
  • Copy lab_pkg/lab2/hw/libs.txt and the whole
    lab_pkg/lab2/hw/esw folder to
    grlib-gpl-1.0.19-b3188/lib/
  • The 1-D IDCT passive accelerator is located at
    lab_pkg/lab2/hw/esw/idct_acc/idct_1x8.vhd
  • Copy lab_pkg/lab2/hw/leon3mp.vhd to
    grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/
    and replace the original file

28
CO-DESIGNED SYSTEM WITH GHDL SIMULATION
29
GHDL Simulation (1/6)
  • We compile our program into a virtual SDRAM image
    for the LEON3 processor
  • LEON3 will fetch the instructions and perform the
    corresponding operations
  • All the hardware signals can be recorded and
    dumped by GHDL

30
GHDL Simulation (2/6)
  • To perform GHDL simulation, our program must not
    be linked with eCos
  • Remove -D__ECOS -I$(ECOSDIR)/include from
    CFLAGS
  • Remove -Ttarget.ld, -nostdlib, -L$(ECOSDIR)/lib
    from LFLAGS
  • Remove the -D_PROFILING_ flag
  • You can remove -D_VERBOSE_ for faster simulation
  • You can modify the NUM_BLKS macro in idct_test.c
    to reduce the number of testbench iterations
  • Type make to build
  • You should see a file named sdram.srec

31
GHDL Simulation (3/6)
  • Start Cygwin
  • cd grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/
  • make distclean
  • make soft
  • Copy the sdram.srec we built into this directory and
    replace the original one
  • make ghdl
  • You can check for syntax errors through GHDL

32
GHDL Simulation (4/6)
  • Type ./testbench.exe --vcd=waveform.vcd after
    compilation to begin simulation
  • You should see an AHB slave with Unknown vendor
    appear, which is our IDCT accelerator

33
GHDL Simulation (5/6)
  • The dump file waveform.vcd can be viewed
    on-the-fly using GTKWave
  • Drag waveform.vcd and drop it over the
    gtkwave.exe icon to open
  • You can also open it from the Windows command
    prompt
  • Use File > Reload Waveform in GTKWave to update
    the dump file

34
GHDL Simulation (6/6)
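(Figure: simulated waveform showing stage 1 and stage 2 of the accelerator, the AHB address and data phases, and the master probing the control register)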
35
CO-DESIGNED SYSTEM ON FPGA
36
Build FPGA Bitstream (1/2)
  • Type make ise | tee ise_log under
    grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/
    after you install the accelerator
  • It is strongly suggested that you verify the
    hardware with GHDL simulation first
  • It is also suggested that you take a look at
    ise_log for more information
  • Configure your FPGA with leon3mp.bit after
    generating the bitstream

37
Build FPGA Bitstream (2/2)
  • After entering GRMON, check the system
    configuration using info sys
  • You should see a device with Unknown vendor
    appear

38
Profiling Results
  • Build the program with the -D_PROFILING_ flag on
  • Compare the computation results of sw_idct_2d()
    and hw_idct_2d()
  • Compare the computation results with and
    without the -D_VERBOSE_ flag