Getting Real, Getting Dirty, without getting real dirty

Transcript and Presenter's Notes
1
Fast and Bounded-Time Storage Allocation
Funded by the National Science Foundation under grant ITR-008124
Steven M. Donahue, Matt Hampton, Ron K. Cytron, Mark Franklin
Center for Distributed Object Computing, Department of Computer Science, Washington University
http://deuce.doc.wustl.edu/doc/
Joint work with Krishna Kavi, University of North Texas
WMPI May 2002
2
Outline
  • Motivation
  • Background
  • Our approach
  • Analysis and results
  • Conclusions and future work

3
Motivation: Real Time
  • Application
  • Language: Java, Timber, Ada
  • Operating System: LynxOS, VxWorks, QNX
  • Hardware
4
Motivation: Real-Time Java
  • Specify time properties
  • Start
  • Period
  • Cost
  • Deadline (relative)
  • Storage allocation must be predictable and
    reasonably bounded

new Foo()
5
Motivation: Intelligent RAM

(Figure: IRAM storage management: the CPU, cache, and L2 cache connect over the data bus to memory modules; logic handling alloc and dealloc sits with the RAM)
6
Storage Allocation for Real Time
  • Fast is good but predictable is required
  • Develop an allocation system that provides
    performance, robustness, portability, and quick
    development time

7
Common Storage Allocation: Free-list Algorithm
  • Linked list of free blocks
  • Search for the desired fit and return the block
  • Allocation worst-case?
  • O(n) for n blocks in the list

(Chart: Allocation Times for List Allocator (ns))
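A minimal C sketch of the free-list scheme this slide describes (an assumed illustration, not the presenters' code): a first-fit search over a linked list of free blocks, which in the worst case visits every block.

```c
#include <stdlib.h>

/* Each free block records only its size here; a real allocator would
   also track addresses and split oversized blocks. */
typedef struct Block {
    size_t size;
    struct Block *next;
} Block;

/* Walk the list for the first block large enough; unlink and return it.
   Worst case visits all n blocks: O(n). Returns NULL if nothing fits. */
Block *first_fit(Block **head, size_t want) {
    Block **prev = head;
    for (Block *b = *head; b != NULL; prev = &b->next, b = b->next) {
        if (b->size >= want) {
            *prev = b->next;   /* unlink from the free list */
            b->next = NULL;
            return b;
        }
    }
    return NULL;
}

/* Push a freed block back on the front of the list: O(1). */
void release(Block **head, Block *b) {
    b->next = *head;
    *head = b;
}
```

The unbounded search is exactly what makes this allocator's worst case hard to provision for in real time.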

8
Application Specific Allocator
  • Example: suppose an application allocates only
    blocks of size 135, 65, 23, and 5
  • Have n free-list allocators, one for each size
  • Allocation worst-case?
  • O(1)
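The per-size scheme can be sketched in C (a hypothetical illustration using the slide's four sizes; the function names are mine, not the talk's):

```c
#include <stddef.h>
#include <stdlib.h>

/* One free list per known request size gives O(1) allocation,
   at the cost of a priori knowledge of the sizes. */
#define NSIZES 4
static const size_t sizes[NSIZES] = {135, 65, 23, 5};

typedef struct Node { struct Node *next; } Node;
static Node *freelist[NSIZES];           /* one list per size class */

static int class_of(size_t n) {          /* exact-match lookup */
    for (int i = 0; i < NSIZES; i++)
        if (sizes[i] == n) return i;
    return -1;                           /* size not known a priori */
}

void *app_alloc(size_t n) {
    int c = class_of(n);
    if (c < 0) return NULL;              /* not general purpose */
    if (freelist[c]) {                   /* O(1): pop the head */
        Node *b = freelist[c];
        freelist[c] = b->next;
        return b;
    }
    /* Refill path; allocate at least a Node so the link fits. */
    return malloc(n < sizeof(Node) ? sizeof(Node) : n);
}

void app_free(void *p, size_t n) {       /* O(1): push on the size's list */
    int c = class_of(n);
    Node *b = (Node *)p;
    b->next = freelist[c];
    freelist[c] = b;
}
```

Note how an unknown size simply fails: this is the "not general purpose" disadvantage the next slide lists.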

9
Application Specific Disadvantages
  • Not general purpose
  • A priori knowledge of allocated blocks
  • Makes code ugly and nonportable
  • Can require extra storage
  • Number of allocators times maxlive

10
Ideal Allocator
  • General Purpose
  • Minimal impact on app footprint
  • Ratio of worst-case and average-case close to 1
  • Knuth's Buddy Algorithm
  • Overall speed is as fast as possible
  • Hardware and optimizations

11
Knuth's Buddy System
  • Free-list segregated by size
  • All requests input to the system are rounded up
    to a power of 2
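The round-up step is a standard bit trick; a short C sketch (mine, not the talk's code):

```c
#include <stddef.h>

/* Round a request up to the next power of two, as the buddy
   system requires. O(log n) doublings. */
size_t round_up_pow2(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}
```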

12
Knuth's Buddy System (1)
  • System begins with one large block of size 256
  • Example: allocate a block of size 16
  • Three operations
  • Find
  • Wait
  • Return
  • First, Find a chunk of free storage

(Figure: free lists for block sizes 256, 128, 64, 32, 16, 8, 4, 2, 1)
13
Knuth's Buddy System (2)
  • Wait: recursively subdivide

(Figure: the 256 block begins to split; free lists for sizes 256 down to 1)
14
Knuth's Buddy System (3)
  • Wait: recursively subdivide

(Figure: subdivision continues; a 128 block is split)
15
Knuth's Buddy System (4)
  • Wait: recursively subdivide

(Figure: subdivision continues down the free lists)
16
Knuth's Buddy System (5)
  • Yields two blocks of size 16

(Figure: free lists now hold one block each at 128, 64, and 32, and two at 16)
17
Knuth's Buddy System (6)
  • Yields two blocks of size 16
  • Return: one of those blocks can be given to the
    program

(Figure: one size-16 block goes to the program; its buddy stays on the free list)
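The Find/Wait/Return walkthrough above can be modeled in a few lines of C (a simplified sketch of my own that tracks only the count of free blocks per level, not actual memory):

```c
/* Levels 0..8 hold free blocks of size 2^level on a 256-byte heap,
   which starts as one free 256 block. */
#define LEVELS 9
static int nfree[LEVELS] = {0, 0, 0, 0, 0, 0, 0, 0, 1};

/* Find: smallest nonempty level that can satisfy the request. */
static int find_level(int want) {
    for (int i = want; i < LEVELS; i++)
        if (nfree[i] > 0) return i;
    return -1;
}

/* Wait: recursively subdivide down to the requested level.
   Each split turns one block into two buddies one level below. */
static void subdivide(int from, int want) {
    for (int i = from; i > want; i--) {
        nfree[i] -= 1;
        nfree[i - 1] += 2;
    }
}

/* Return: hand one block at the requested level to the program.
   Returns the block's size, or -1 if the heap cannot satisfy it. */
int buddy_alloc(int want_level) {
    int from = find_level(want_level);
    if (from < 0) return -1;
    subdivide(from, want_level);
    nfree[want_level] -= 1;
    return 1 << want_level;
}
```

Allocating a size-16 block (level 4) splits 256 into 128+128, 128 into 64+64, 64 into 32+32, and 32 into 16+16, then hands one 16 to the program, exactly the sequence in slides 12 through 17.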
18
General Architecture
19
Finding a Block
Allocate a block of size 16
  • Software approach: Find is O( log(N) )
  • Our hardware trick: Find is O( 1 ) in practice

(Figure: side-by-side free-list ladders, sizes 256 down to 1, for the software and hardware approaches)
20
Return
Allocate a block of size 16
  • Software approach: Find, Wait, then Return
  • Our hardware trick: Find, Return, then Wait

(Figure: the application issues new Foo() to the allocator; free lists for sizes 64, 32, 16 shown for both approaches)
21
Results and Analysis
  • Two versions of Buddy System
  • Reference version
  • Optimized version with Fast Find and Return
  • Experiments compared the Hardware Buddy Systems
    to two other systems
  • Java VM 1.1.18 free-list allocator
  • glibc malloc (black-box allocator)
  • Two suites of benchmarks
  • SPEC Java Benchmarks
  • C Malloc Benchmarks

22
Overview of Java Benchmarks
Allocation Time as a percentage of Execution Time
23
Overview of C Benchmarks
Allocation Time as a percentage of Execution Time
24
How Much Does Reference Version Save Java?
  • The savings that the reference version provides
    are not substantial compared to overall application
    execution time
  • It is, however, significantly more efficient in
    allocation itself

25
Fast Find Results
  • Compare ratio of average-case to worst-case
  • Optimized version has much better ratio
  • A system using the optimized version would suffer
    less over-provisioning

26
Fast Return Results
  • Could Fast Return affect performance?
  • In C Malloc Benchmarks, minimum inter-arrival
    time of allocation requests is greater than Wait
    time of Buddy System
  • Wait time could complete in parallel

(Figure: allocation request inter-arrival time compared with the allocator's Wait time)
27
Fast Return Results
  • To what extent could Fast Return affect
    performance?
  • Factor Wait time out of allocation time
  • Could save 96% of allocation time
  • Affects overall application efficiency to a
    limited extent

28
Results Analysis
  • Fast Return effectiveness depends on
    inter-arrival time of allocations
  • Some Java allocations came too quickly
  • But how often?
  • In one benchmark, less than 15% of the requests
    arrived before the Wait time could complete in
    parallel

29
Conclusions
  • Let's revisit the ideal allocator
  • Used without modification by any application
  • Does not require an unreasonable amount of storage
  • Difference between worst-case and average-case
    performance is small
  • Overall speed is as fast as possible

30
Future Work
  • Improve Wait time, similar to the improvements for
    Find and Return
  • Algorithms for defragmenting the heap under the
    Buddy System
  • Use allocator as a building block
  • Garbage Collection
  • Goal of IRAM (intelligent storage).

31
Questions?
32
Motivation
  • Bring modern, high-level languages to the arena
    of real-time applications
  • Allow embedded developers to author applications
    with high-level languages
  • RTSJ?
  • IRAM?
  • Somehow tie in storage allocation here!

33
Overview of Java Benchmarks (note: maybe nuke)
  • Compress: data compression utility
  • Jess: Java Expert System Shell
  • Raytrace: graphics ray tracer
  • DB: database engine
  • Javac: Java compiler
  • Mpegaudio: MPEG decompression
  • MTRT: multi-threaded ray tracer
  • Jack: parser generator

34
Overview of C Benchmarks (note: maybe nuke)
  • Cfrac: factoring program
  • Gawk: GNU Awk interpreter
  • GS: Ghostscript image interpreter
  • P2C: Pascal-to-C converter
  • PTC: another Pascal-to-C converter

35
Optimizations
  • RKC: use these terms when first introducing Knuth's
    buddy system; then you don't need this slide, and
    the pictures there will help
  • Think of allocation as having three parts
  • Find an appropriately sized block to return, perhaps
    breaking one apart
  • Wait while the block is broken apart or the lists
    are manipulated
  • Return the requested block
  • Fast Find
  • Fast Return

36
Current Allocators
  • Unorganized Free-List of blocks
  • Organized Free-list
  • Segregate available blocks by size
  • Application Specific Allocator
  • Application knows what sizes of blocks it
    consumes
  • Use a free-list for each block size

37
Worst-case Free-list Behavior
  • The longer the free-list, the more pronounced the
    effect
  • No a priori bound on how much worse the list-based
    scheme could get
  • Over-provisioning is costly when worst-case
    behavior is relatively unbounded

(Chart: Allocation Times for List Allocator (ns))
38
Organized Free-list
  • Example: Knuth's Buddy System
  • Free-list segregated by size
  • Sizes are powers of two
  • Allocation worst-case?
  • O(log(N)), where N is the size of the heap

39
General Architecture
  • 32 bit general purpose registers
  • 32 bit registers for head pointers of segregated
    free lists
  • Shift registers to track size

40
General Architecture
  • Simple ALU for address calculation and
    comparisons
  • Memory I/O registers
  • Controller for algorithm

41
Fast Find Results
(Charts: average-case and worst-case allocation times)
  • Optimized average-case suffers a little for Fast
    Find compared to the reference version
  • But worst-case times are significantly improved

42
How Much Does Reference Version Save C?
  • Results similar to Java results
  • Reference version can save a substantial amount
    of allocation time, but not overall application
    time

43
Finding a Block -- Fast
  • In the FPGA, each level has a bit to indicate its
    status
  • Suppose we want a block of size 16
  • Levels too small for a given request are masked
    out
  • A leading-ones detector Finds the desired level
  • Allocation completes by recursively subdividing
    and returning the block

(Figure: free-list levels 256 down to 1, each with a status bit)
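A software analogue of this hardware Find (an assumed emulation of mine; the real design is an FPGA priority detector, not this loop):

```c
#include <stdint.h>

/* Bit i of `avail` is 1 when the free list for blocks of size 2^i is
   nonempty; a 256-byte heap uses levels 0..8. */

/* Return the smallest level >= want_level with a free block, or -1. */
int fast_find(uint16_t avail, int want_level) {
    /* Mask out the levels too small for the request... */
    uint16_t masked = avail & (uint16_t)~((1u << want_level) - 1);
    if (masked == 0) return -1;
    /* ...then detect the lowest remaining set bit, i.e. the
       smallest adequate nonempty level. */
    int level = 0;
    while (((masked >> level) & 1u) == 0) level++;
    return level;
}
```

In hardware, the mask and the detector are combinational logic, which is why Find becomes O(1) in practice.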
44
Fast Return
  • Want 16 bytes
  • Use Fast Find to get to the size-64 level
  • Return the first part of that block immediately
    to the requestor
  • Software would have to reorganize first

(Figure: free lists for sizes 64, 32, 16)
45
Fast Return
  • Want 16 bytes
  • Use Fast Find to get to the size-64 level
  • Return the first part of that block immediately
    to the requestor
  • Adjustment to the structures happens in parallel
    with the return

(Figure: free lists for sizes 64, 32, 16)
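The idea can be sketched in C (a hypothetical software analogue of mine; in the actual design the deferred Wait work proceeds in hardware, in parallel with the application):

```c
#include <stddef.h>

/* Return-before-Wait: hand the requestor the leading bytes of the
   oversized block at once, and record the list fix-up still owed. */
typedef struct {
    char  *block;     /* start of the oversized block Find located */
    size_t have;      /* its size, e.g. 64 */
    size_t want;      /* the requested size, e.g. 16 */
    int    pending;   /* nonzero while list fix-up is still owed */
} DeferredSplit;

/* Return first: caller gets the leading `want` bytes right away;
   no list manipulation happens yet. */
char *fast_return(char *block, size_t have, size_t want, DeferredSplit *d) {
    d->block = block;
    d->have = have;
    d->want = want;
    d->pending = 1;
    return block;
}

/* Wait later: carve the remainder into buddy halves (64 -> 32 + 16
   here), counting one freed buddy per split. */
int finish_split(DeferredSplit *d) {
    int splits = 0;
    for (size_t s = d->have; s > d->want; s /= 2)
        splits++;
    d->pending = 0;
    return splits;
}
```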
46
Effect of Optimizations
  • Fast Find improves Find portion of allocation to
    O( 1 ) in practice
  • However, Wait time is still O(log(N))
  • But Fast Return could allow Wait time to execute
    in parallel with application execution

new Foo()