Hardware/Software Managed Scratchpad Memory for Embedded System

1
Hardware/Software Managed Scratchpad Memory for
Embedded System
  • Andhi Janapsatya, Sri Parameswaran, Aleksandar
    Ignjatovic
  • School of Computer Science and Engineering
  • UNSW, Australia
  • Presented By Andhi Janapsatya

2
Outline
  • Background
  • Embedded System
  • Instruction Cache
  • Scratchpad System
  • Design
  • Implementation (Graph Partitioning)
  • Experimental Setup and Results

3
Design Challenge
  • Energy Efficient Embedded System
  • Benefits
  • Longer battery life
  • Cheaper packaging cost
  • Better performance

4
Typical Embedded System
  • Embedded Processor with Level-1 cache.

8
Aim of this work
  • In this work, we focus on the optimization of
    instruction memory.
  • The I-cache consumes up to 27% of total processor
    energy [Montanaro, 1996].

9
Cache Architecture
  • Hardware managed Tag-RAM.
  • Automatic checking of cache hit/miss.
  • Every instruction fetch has to go via cache.

10
Instruction Cache
  • Energy consumption breakdown for instruction
    cache (direct-mapped).
  • The CACTI energy estimation tool shows that 36% of
    direct-mapped cache access energy is due to
    tag-RAM access (0.18 µm technology).

11
Embedded Application
  • Embedded application and embedded processor are
    known prior to execution.
  • Profiling can identify the hot-spots in
    applications.
  • We aim to take advantage of profiling
    information, knowledge of the application, and
    the processor architecture for system
    optimization.
  • We propose the use of instruction scratchpad
    memory (SPM) as a replacement of the instruction
    cache.

12
Cache Architecture
  • Hardware managed Tag-RAM.
  • Automatic checking of cache hit/miss.
  • Every instruction fetch has to go via cache.

13
SPM Architecture
  • No Tag-RAM.

14
Scratchpad System
  • Embedded Processor with SPM.

22
Motivational Example
  • Basic blocks A, B, and C are hot-spots.
  • Program size: 4 basic blocks; cache size: 2 basic blocks.
  • Approximately 50% fewer memory accesses.

23
Scratchpad System
  • Lower energy cost per SPM access than per
    cache access (no tag checking).
  • Reduced instruction pollution within the SPM
    (fewer memory accesses than in a cache system).

24
Scratchpad System
  • Pre-determine whether to fetch an instruction
    from SPM or memory.
  • Requires a mechanism to determine when to load to
    scratchpad.

25
Existing Scratchpad Systems
  • [Panda 1997] and [Avissar 2002] presented schemes
    for static management of data SPM.
  • [Kandemir 2002] introduced dynamic management of
    SPM for data memory.
  • [Udayakumaran 2003] improved the dynamic management
    scheme for data SPM.

26
Existing Scratchpad Systems
  • Static management means the content of the SPM does
    not change during program run-time.
  • Dynamic management means the content of the SPM
    is updated during program run-time.
  • [Steinke 2002] and [Angiolini 2003] presented
    schemes for static instruction scratchpad memory.

27
Existing Scratchpad Systems
  • [Steinke 2002] presented a dynamic management scheme
    for instruction scratchpad memory.
  • A series of load and store instructions is added to
    copy basic blocks into the I-SPM.

28
Scratchpad System
  • Use a special instruction, called SMI, to
    initiate the process of loading instructions into
    the scratchpad memory.
  • SMI Instruction format.

29
SPM Controller
  • SMI initiates the SPM Controller
  • The BBT stores information on where to copy blocks in the I-SPM.

30
SPM Controller
  • SMI being executed.
  • The CPU passes the SMI operand to the SPM controller.

31
SPM Controller
  • The SPM controller sends a signal to stall the CPU
    while copying is in progress.

32
SPM Controller
  • Basic blocks are copied from instruction memory
    to the I-SPM by the Memory Controller.
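As a rough illustration of the controller behaviour on slides 29 to 32, the sketch below models one SMI in Python. The BBT layout (a map from SMI operand to entries of source address, length, and SPM address), the one-word-per-cycle stall count, and every name used here are assumptions made for illustration; the slides do not specify them.

```python
def execute_smi(operand, bbt, instr_mem, spm):
    """Toy model of one SMI: copy the blocks listed in the BBT entry
    selected by `operand` from instruction memory into the I-SPM.

    Returns the number of cycles the CPU is stalled, assuming
    (hypothetically) that one word is copied per cycle.
    """
    stall_cycles = 0
    for src, length, dst in bbt[operand]:
        # Copy `length` words starting at `src` into the SPM at `dst`.
        spm[dst:dst + length] = instr_mem[src:src + length]
        stall_cycles += length
    return stall_cycles


# Hypothetical data: a 16-word instruction memory and an 8-word I-SPM.
instr_mem = list(range(16))
spm = [None] * 8
# Hypothetical BBT: SMI operand 0 copies two basic blocks.
bbt = {0: [(4, 3, 0), (10, 2, 3)]}

stall = execute_smi(0, bbt, instr_mem, spm)
print(stall, spm)
```

The CPU stall is modelled only as a returned cycle count; a real controller would hold the pipeline until the memory controller finishes the copy.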

33
Scratchpad System
  • Each instruction is either executed from the SPM
    or from the memory.
  • Copying is triggered by SMIs inserted
    within the program.

34
Scratchpad System
  • Two Questions
  • Where should custom instructions (SMI) be
    inserted?
  • Which basic blocks are to be copied by each SMI?

35
Embedded System
  • Profile the application to obtain
  • Instruction execution frequency.
  • Instruction execution path.
  • Embedded Processor
  • Scratchpad size.

36
Control Flow Graph
  • Each vertex represents a basic block.
  • Each edge is weighted by the execution frequency
    of the corresponding path.
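A minimal sketch of how such a weighted control flow graph might be built from profiling output, assuming the profiler yields a trace of executed basic-block names. The trace and the function name `build_cfg` are hypothetical.

```python
from collections import Counter


def build_cfg(trace):
    """Return (vertex_counts, edge_counts) for a basic-block trace.

    vertex_counts[b]    = number of times block b executed
    edge_counts[(u, v)] = number of times control passed from u to v
    """
    vertex_counts = Counter(trace)
    edge_counts = Counter(zip(trace, trace[1:]))
    return vertex_counts, edge_counts


# Hypothetical trace: A, then a loop over B and C, then D.
trace = ["A", "B", "C", "B", "C", "B", "C", "D"]
vertices, edges = build_cfg(trace)
for (u, v), freq in sorted(edges.items()):
    print(f"{u} -> {v}: {freq}")
```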

37
Insertion of SMI
  • An SMI inserted on an edge is executed each time that edge is taken.

38
Graph Partitioning
  • Partition the graph into sub-graphs such that the
    edges connecting the sub-graphs have minimal
    execution frequency.
  • Edges connecting sub-graphs indicate positions
    where SMIs can be inserted.

39
Graph Partitioning
  • The size of each sub-graph should be less than the
    size of the scratchpad.

40
Graph Partitioning
  • Existing Graph Partitioning tools
  • Chaco (Bruce Hendrickson and Robert Leland)
  • Metis (George Karypis and Vipin Kumar)
  • All existing tools require the user to specify
    how many parts a graph is to be partitioned into.

41
Graph Partitioning
  • Remove least executed edges to induce the
    creation of sub-graphs.
  • Repeat until total size of each sub-graph is less
    than the scratchpad size.
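The edge-removal loop described above can be sketched as follows. This is a simplified reading of the slide, not the authors' implementation: sub-graphs are taken to be weakly connected components, one edge is removed per iteration, and the block sizes, edge frequencies, and function names are hypothetical.

```python
def components(nodes, edges):
    """Weakly connected components of a graph, via union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def partition(sizes, edge_freq, spm_size):
    """Remove the least executed edge until every sub-graph is
    smaller than the scratchpad (or no edges remain)."""
    remaining = dict(edge_freq)
    while True:
        parts = components(sizes, remaining)
        too_big = any(sum(sizes[b] for b in p) >= spm_size for p in parts)
        if not too_big or not remaining:
            return parts
        coldest = min(remaining, key=remaining.get)
        del remaining[coldest]


# Hypothetical block sizes (in words) and profiled edge frequencies.
sizes = {"A": 2, "B": 3, "C": 3, "D": 2}
edge_freq = {("A", "B"): 1, ("B", "C"): 100, ("C", "B"): 99, ("C", "D"): 2}
parts = partition(sizes, edge_freq, spm_size=8)
print(sorted(sorted(p) for p in parts))
```

Unlike Chaco or Metis, this loop does not need the number of parts up front; the count falls out of the size constraint. A single basic block larger than the scratchpad cannot be split by this sketch and is returned as is.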

49
Graph Partitioning
  • Where should custom instructions (SMI) be
    inserted?
  • Green edges indicate locations for inserting SMI.

50
Graph Partitioning
  • Which basic blocks are to be copied by each SMI?
  • Each SMI copies the basic blocks of the sub-graph
    that its edge points to.
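Given a partition, the two answers above can be sketched in a few lines: an SMI goes on every edge whose endpoints lie in different sub-graphs, and it copies the sub-graph at the edge's target. The partition, edge frequencies, and the name `place_smis` are hypothetical.

```python
def place_smis(parts, edge_freq):
    """Map each cut edge (src, dst) to the list of basic blocks
    that the SMI on that edge must copy into the SPM."""
    part_of = {block: i for i, p in enumerate(parts) for block in p}
    smis = {}
    for (src, dst) in edge_freq:
        if part_of[src] != part_of[dst]:
            # The SMI copies the whole sub-graph the edge points to.
            smis[(src, dst)] = sorted(parts[part_of[dst]])
    return smis


# Hypothetical partition and profiled edge frequencies.
parts = [{"A"}, {"B", "C"}, {"D"}]
edge_freq = {("A", "B"): 1, ("B", "C"): 100, ("C", "B"): 99, ("C", "D"): 2}

smis = place_smis(parts, edge_freq)
# Number of SMI executions at run time = total frequency of cut edges.
smi_executions = sum(edge_freq[e] for e in smis)
print(smis, smi_executions)
```

Because cut edges are the coldest ones, the run-time count of SMI executions (3 here) stays small compared with the traffic inside the hot sub-graph.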

51
Graph Partitioning
  • Select which sub-graphs are to be executed from
    SPM and which from memory.

52
Graph Partitioning
  • Calculate the energy cost of executing sub-graphs
    from SPM plus the copying energy cost.
  • Calculate energy cost of executing sub-graphs
    from DRAM.
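The comparison above can be sketched as a simple per-sub-graph cost model. The per-access energies, the linear copy cost, and all names below are illustrative assumptions, not figures from the presentation.

```python
from dataclasses import dataclass


@dataclass
class SubGraph:
    name: str
    size_words: int  # instruction words in the sub-graph
    fetches: int     # profiled instruction fetches inside the sub-graph
    copies: int      # profiled executions of SMIs that load it


def choose_location(sg, e_spm, e_dram, e_copy):
    """Compare SPM execution (plus copying) against DRAM execution.

    e_spm, e_dram: energy per instruction fetch from SPM or DRAM
    e_copy: energy to copy one word from DRAM into the SPM
    """
    cost_spm = sg.fetches * e_spm + sg.copies * sg.size_words * e_copy
    cost_dram = sg.fetches * e_dram
    location = "SPM" if cost_spm < cost_dram else "DRAM"
    return location, cost_spm, cost_dram


# Hypothetical energy units and profile numbers.
hot = SubGraph("loop", size_words=6, fetches=1200, copies=1)
cold = SubGraph("exit", size_words=2, fetches=2, copies=1)
for sg in (hot, cold):
    print(sg.name, choose_location(sg, e_spm=1, e_dram=10, e_copy=12))
```

With these numbers the hot loop is worth copying, while the rarely executed block costs more to copy than it would to run directly from DRAM.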

53
Experimental Setup

54
Cost of adding SMI

55
Cost of SPM Controller
  • At 500 MHz, 1.8 V, 0.18 µm technology: average
    power consumption of 2.94 mW.
  • Synthesis results on an 800K-gate Xilinx Virtex
    FPGA show that the controller occupies less than
    1% of total FPGA area, while the processor
    occupies 83%.

56
Performance (SPM / Cache)
  • Negative results are due to SPM capacity misses.

57
Energy (SPM / Cache)

58
Conclusion
  • Profiling allows frequently executed
    code segments to be identified within an embedded
    application.
  • Using an SPM as a replacement for the I-cache can
    reduce the energy consumption of an embedded system.
  • Experimental results show a 51% energy reduction
    and a 53% performance improvement for embedded
    systems with SPM when compared to systems with
    I-cache.

59
  • Thank You

60
SPM Capacity Miss