Title: Efficient Dynamic Heap Allocation of Scratch-Pad Memory


1
Efficient Dynamic Heap Allocation of Scratch-Pad Memory
Carnegie Trust for the Universities of Scotland
  • Ross McIlroy, Peter Dickman and Joe Sventek

2
Scratch-Pad Memory Allocator
  • SMA: a dynamic memory allocator targeting
    extremely small memories (< 1MB in size)
  • Why target such tiny memories?
  • Why provide dynamic memory allocation for such
    small memories?

3
Outline
  • Rationale for SMA
  • SMA Approach
  • Results
  • Concurrent SMA
  • Conclusion / Future work

4
Outline
  • Rationale for SMA
  • SMA Approach
  • Results
  • Concurrent SMA
  • Conclusion / Future work

5
What Tiny Memories?
  • Embedded Systems
  • Sensor Network Motes
  • Vehicular Devices
  • Scratch-Pad Memories
  • Network Processors
  • Heterogeneous Multi-Core Processors

6
Scratch-Pad Memories
  • Memory structured as a hierarchy
  • Small fast memories, large slow memories
  • Usually hidden by hardware caches
  • Some processor architectures employ scratch-pad
    memories instead
  • Similar in size and speed to caches, but
    explicitly accessible by software
  • Examples: IBM Cell processor, Intel IXP network
    processors, Intel PXA mobile phone processors

7
Why Dynamic Management?
  • Developers want as much useful data in the fast
    Scratch-Pad memory as possible
  • They don't want to deal with the fragmented
    memory hierarchy

                               Manual   Static   Dynamic
Developer ease                   ?        ?        ?
Make full use of Scratch-Pad     ?        ?        ?
8
Why SMA?
Resource                     Doug Lea malloc   SMA
State Memory (bytes)         516               40
Code Memory (instructions)   1634              297
Avg. Alloc Time (cycles)     70.7              72.8
Avg. Free Time (cycles)      95.2              52.4
Managing 4kB Scratch-Pad memory on an Intel IXP processor
9
Outline
  • Rationale for SMA
  • SMA Approach
  • Results
  • Concurrent SMA
  • Conclusion / Future work

10
Basic Approach
  • By default, represent memory coarsely as a series
    of fixed-size blocks
  • Can employ a very simple bitmap-based allocation
    / free algorithm
  • When required, split blocks into variable-sized
    regions
  • Prevents excessive internal fragmentation

11
Large Block Allocation
  • Each block in memory represented by a bit in a
    free-block bitmap

[Figure: searching the free-block bitmap. Labels: rem_blocks, blocks_bm, mask, next_pos, ffs(rem_blocks), in_use, fls(in_use)]
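The bitmap search can be sketched in C as below. This is only an illustration under stated assumptions, not the authors' code: it assumes 32 blocks tracked in one word, a set bit meaning "free", and a portable loop standing in for the ffs instruction. The names alloc_blocks, free_blocks and lowest_set are hypothetical; blocks_bm and mask are borrowed from the figure labels, but the exact expressions used on the slide did not survive extraction.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BLOCKS 32u                      /* assumption: one 32-bit word of state */

static uint32_t blocks_bm = 0xFFFFFFFFu;    /* assumption: bit i set => block i is free */

/* Index of the lowest set bit, or -1 if x is zero (stand-in for ffs). */
static int lowest_set(uint32_t x)
{
    int i = 0;
    if (x == 0) return -1;
    while (!(x & 1u)) { x >>= 1; i++; }
    return i;
}

/* Allocate n contiguous blocks; returns the first block index or -1. */
int alloc_blocks(unsigned n)
{
    assert(n >= 1 && n <= NUM_BLOCKS);
    uint32_t mask = (n == 32u) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    unsigned pos = 0;
    while (pos + n <= NUM_BLOCKS) {
        int first = lowest_set(blocks_bm >> pos);   /* next free block at or after pos */
        if (first < 0) return -1;
        pos += (unsigned)first;
        if (pos + n > NUM_BLOCKS) return -1;
        uint32_t window = mask << pos;
        if ((blocks_bm & window) == window) {       /* all n blocks are free */
            blocks_bm &= ~window;                   /* mark them in use */
            return (int)pos;
        }
        pos++;                                      /* simple retry one block further on */
    }
    return -1;
}

/* Return n contiguous blocks starting at index first to the free pool. */
void free_blocks(unsigned first, unsigned n)
{
    assert(n >= 1 && first + n <= NUM_BLOCKS);
    uint32_t mask = (n == 32u) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    blocks_bm |= mask << first;
}
```

The whole allocator state here is a single word, which is in keeping with the small state footprint reported for SMA. A tuned version would presumably skip past the in-use run rather than advancing one block at a time.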
12
Small Region Allocation
  • Unused parts of an allocated block can be reused
    by sub-block sized allocations
  • Blocks are split into power-of-two-sized regions,
    in a Binary Buddy style approach
  • Free regions are stored in per-size free lists
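A minimal C sketch of binary-buddy splitting with per-size free lists follows. The block size (128 bytes), smallest region (4 bytes), list capacity, and every function name are assumptions made for illustration; the slides do not give SMA's actual parameters or data layout, and real free lists would more likely be linked through the free regions themselves.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE  128u   /* assumption: block size in bytes */
#define MIN_REGION  4u     /* assumption: smallest region size */
#define NUM_CLASSES 5u     /* region sizes 4, 8, 16, 32, 64 */
#define MAX_FREE    16u    /* capacity of each illustrative free list */

/* Per-size free lists, modelled as small stacks of byte offsets. */
static uint16_t free_list[NUM_CLASSES][MAX_FREE];
static unsigned free_count[NUM_CLASSES];

/* Size class k holds regions of MIN_REGION << k bytes. */
static unsigned class_of(unsigned size)
{
    unsigned k = 0;
    while ((MIN_REGION << k) < size) k++;
    return k;
}

static void push_free(unsigned k, unsigned offset)
{
    assert(k < NUM_CLASSES && free_count[k] < MAX_FREE);
    free_list[k][free_count[k]++] = (uint16_t)offset;
}

/* A region and its buddy differ only in the address bit equal to their size. */
unsigned buddy_of(unsigned offset, unsigned size)
{
    return offset ^ size;
}

/* Carve a region for `size` bytes from the block at block_offset (which must
 * be BLOCK_SIZE-aligned), halving repeatedly and putting each unused upper
 * half on the free list for its size. Returns the offset of the region. */
unsigned split_block(unsigned block_offset, unsigned size)
{
    assert(size >= 1 && size <= BLOCK_SIZE / 2);
    unsigned want = MIN_REGION << class_of(size);   /* round up to a power of two */
    unsigned cur = BLOCK_SIZE;
    while (cur > want) {
        cur /= 2;
        push_free(class_of(cur), block_offset + cur);
    }
    return block_offset;
}
```

For example, a 20-byte request against the block at offset 256 is rounded up to 32 bytes and leaves a free 64-byte region at offset 320 and a free 32-byte region at offset 288.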

13
Coalescing Freed Regions
  • We wanted to avoid boundary tags
  • Instead, the orderly way in which regions are
    split is exploited
  • A word-sized coalesce tag stores the coalesce
    details for all regions in a block
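The slides do not say how the coalesce tag is encoded, so the following C sketch uses a purely hypothetical encoding to show that one word per block can be enough: a 128-byte block is divided into 32 units of 4 bytes, and bit i of the tag is set while unit i is in use. All names and sizes are assumptions, and the sketch relies on immediate coalescing, so that an all-clear buddy span is always a single free region.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 128u   /* assumption: block size in bytes */
#define UNIT       4u     /* assumption: 128 / 4 = 32 units, one tag bit each */

typedef uint32_t coalesce_tag;   /* hypothetical: bit i set => unit i is in use */

/* Tag bits covered by the region [offset, offset + size) within a block. */
static uint32_t span_mask(unsigned offset, unsigned size)
{
    assert(size >= UNIT && offset + size <= BLOCK_SIZE);
    unsigned n = size / UNIT;
    uint32_t ones = (n >= 32u) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    return ones << (offset / UNIT);
}

void tag_alloc(coalesce_tag *t, unsigned offset, unsigned size)
{
    *t |= span_mask(offset, size);
}

void tag_free(coalesce_tag *t, unsigned offset, unsigned size)
{
    *t &= ~span_mask(offset, size);
}

/* After a region has been freed, merge it with its buddy for as long as the
 * buddy is entirely free. On return *offset and *size describe the merged
 * region. Removing the buddy from its free list is omitted here. */
void tag_coalesce(coalesce_tag t, unsigned *offset, unsigned *size)
{
    while (*size < BLOCK_SIZE) {
        unsigned buddy = *offset ^ *size;
        if (t & span_mask(buddy, *size)) break;   /* buddy at least partly in use */
        if (buddy < *offset) *offset = buddy;
        *size *= 2;
    }
}
```

Because regions are only ever produced by halving, the buddy's position and extent follow from the offset and size alone, which is why no per-region boundary tag is needed in this sketch.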

14
Deferred Coalescing
  • SMA (CAM): any size can have coalescing deferred;
    content-addressable memory is used to associate
    the size of each deferred region with the region
    itself
  • SMA (LM): the sizes for which coalescing can be
    deferred are chosen at compile time; deferred
    regions are stored in an array in local memory
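The SMA (LM) variant can be pictured roughly as follows. This is a sketch under assumptions, not the presented implementation: the deferred sizes (16 and 32 bytes), the four slots per size, and the names defer_free and take_deferred are all invented for illustration. A freed region of a deferred size is parked in a small array instead of being coalesced, and a later request of the same size takes it straight back.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFERRED_SLOTS 4u   /* assumption: slots per deferred size */

/* Assumption: these sizes were picked at compile time for deferral. */
static const unsigned deferred_sizes[] = { 16u, 32u };
#define NUM_DEFERRED (sizeof deferred_sizes / sizeof deferred_sizes[0])

/* Parked regions, stored as byte offsets (stand-in for the local-memory array). */
static uint16_t deferred[NUM_DEFERRED][DEFERRED_SLOTS];
static unsigned deferred_count[NUM_DEFERRED];

static int deferred_index(unsigned size)
{
    for (unsigned i = 0; i < NUM_DEFERRED; i++)
        if (deferred_sizes[i] == size) return (int)i;
    return -1;
}

/* Park a freed region without coalescing it. Returns false when the size is
 * not deferrable or the array is full, in which case the caller would fall
 * back to the normal coalescing free path. */
bool defer_free(unsigned offset, unsigned size)
{
    int i = deferred_index(size);
    if (i < 0 || deferred_count[i] == DEFERRED_SLOTS) return false;
    deferred[i][deferred_count[i]++] = (uint16_t)offset;
    return true;
}

/* Reuse a parked region of exactly this size; returns its offset or -1. */
int take_deferred(unsigned size)
{
    int i = deferred_index(size);
    if (i < 0 || deferred_count[i] == 0) return -1;
    return deferred[i][--deferred_count[i]];
}
```

The intent is that a program which repeatedly frees and reallocates the same few sizes avoids the cost of merging regions only to split them again.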

15
Outline
  • Rationale for SMA
  • SMA Approach
  • Results
  • Concurrent SMA
  • Conclusion / Future work

16
Experimental Setup
  • Intel IXP 2350
  • Network processor
  • 4 microengine cores with 4kB local scratch-pad
    each
  • Access to another 16kB of shared scratch-pad
  • Compared against Doug Lea's malloc

Benchmark  Description
a2p        Conversion of a 15kB text file to PostScript
gcc        Compilation of the file combine.c in the gcc source, using gcc
gst        Ghostscript extraction of a 682kB PostScript file
cvt        Application of the charcoal filter to a 1024x768 JPEG image using ImageMagick
ogg        Encoding of a 20 second wav file using the ogg encoder
pyt        Execution of the python example file md5driver.py
tar        Archive and gzip compression of 27 files in 4 directories into a 1MB archive
17
Allocation Performance
18
Free Performance
19
Memory Wastage
20
Memory Wastage
21
Outline
  • Rationale for SMA
  • SMA Approach
  • Results
  • Concurrent SMA
  • Conclusion / Future work

22
Lock-Free Block Allocation
  • State for large blocks is stored in the
    free-block bitmap
  • A simple lock-free update algorithm can be used
    to protect this bitmap
  • Uses the test-and-clear primitive

[Figure: Thread 1 and Thread 2 updating the global free-block bitmap using Test & Clear and Atomic Set operations]
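One way to read the test-and-clear scheme is sketched below using C11 atomics in place of the IXP hardware primitive. This is an interpretation, not the authors' code: it assumes a set bit means "free", models test-and-clear with atomic_fetch_and, and models the atomic set with atomic_fetch_or. The names try_claim and release_blocks are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: bit i set => block i is free; shared by all threads. */
static _Atomic uint32_t blocks_bm = 0xFFFFFFFFu;

/* Try to claim every block whose bit is set in `window`.
 * Returns 1 on success, 0 if another thread already held any of them. */
int try_claim(uint32_t window)
{
    assert(window != 0);
    /* Test and clear: atomically clear our bits and see what they were. */
    uint32_t before = atomic_fetch_and(&blocks_bm, ~window);
    uint32_t got = before & window;          /* the bits this call actually took */
    if (got == window) return 1;
    /* Partial claim: atomically set back only the bits we cleared. */
    if (got != 0) atomic_fetch_or(&blocks_bm, got);
    return 0;
}

/* Free previously claimed blocks with a single atomic set. */
void release_blocks(uint32_t window)
{
    atomic_fetch_or(&blocks_bm, window);
}
```

No thread ever waits for another here. A failed claim costs at most one extra atomic operation to hand back the bits it took, after which the caller can retry with a different window.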
23
Protecting Small Region Lists
  • Locks are used to protect the free-lists used for
    small size allocation
  • SMA Coarse uses one lock
  • SMA Fine uses one lock per size class
  • In SMA Fine, when regions are being coalesced,
    two locks must be held briefly
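The fine-grained scheme might look roughly like this in C. It is a simplified sketch under assumptions: spinlocks built from C11 atomic_flag stand in for whatever lock the hardware provides, each free list is reduced to a counter, and the rule of always taking the lower size class first is one conventional way to avoid deadlock when two locks are held, not something the slides state. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define NUM_CLASSES 5u   /* assumption: five region size classes */

/* One lock per size class, as in "SMA Fine". */
static atomic_flag class_lock[NUM_CLASSES] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT,
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Each free list is reduced to a count of free regions for brevity. */
static unsigned free_count[NUM_CLASSES];

static void lock_class(unsigned k)
{
    while (atomic_flag_test_and_set_explicit(&class_lock[k], memory_order_acquire)) {
        /* spin until the holder releases the lock */
    }
}

static void unlock_class(unsigned k)
{
    atomic_flag_clear_explicit(&class_lock[k], memory_order_release);
}

/* Freeing a region touches only its own size class. */
void free_region(unsigned k)
{
    assert(k < NUM_CLASSES);
    lock_class(k);
    free_count[k]++;
    unlock_class(k);
}

/* Merge two free regions of class k into one region of class k + 1.
 * The count check stands in for "the freed region's buddy is on list k".
 * Both locks are held briefly, always lower class first. Returns 1 if merged. */
int coalesce_pair(unsigned k)
{
    assert(k + 1 < NUM_CLASSES);
    int merged = 0;
    lock_class(k);
    lock_class(k + 1);
    if (free_count[k] >= 2) {
        free_count[k] -= 2;
        free_count[k + 1] += 1;
        merged = 1;
    }
    unlock_class(k + 1);
    unlock_class(k);
    return merged;
}
```

A coarse variant would replace the array with a single lock taken around every operation, trading concurrency for less lock state.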

24
Concurrency Scaling
25
Outline
  • Rationale for SMA
  • SMA Approach
  • Results
  • Concurrent SMA
  • Conclusion / Future work

26
Future Work
  • Provide the illusion of a single memory
  • Let runtime worry about data placement
  • Data can be annotated to give hints to the
    runtime system

27
Conclusion
  • Tiny memories need to be managed too
  • SMA is a simple and efficient algorithm for
    dynamic management of small memories
  • Fixed size block allocation is simple and has low
    state overheads
  • Splitting partially used blocks to be reused by
    small allocations limits fragmentation
  • SMA can be augmented to support concurrent
    requests from multiple cores

28
Questions?
29
16kB Management: Allocation
30
16kB Management: Free
31
16kB Management: Waste