Title: Getting Real, Getting Dirty, without getting real dirty
1. Fast and Bounded-Time Storage Allocation

Funded by the National Science Foundation under grant ITR-008124

Steven M. Donahue, Matt Hampton, Ron K. Cytron, Mark Franklin
Center for Distributed Object Computing
Department of Computer Science, Washington University
http://deuce.doc.wustl.edu/doc/

Joint work with Krishna Kavi, University of North Texas

WMPI, May 2002
2. Outline
- Motivation
- Background
- Our approach
- Analysis and results
- Conclusions and future work
3. Motivation: Real Time

[Figure: the real-time software stack]
- Application
- Language: Java, Timber, Ada
- Operating System: LynxOS, VxWorks, QNX
- Hardware
4. Motivation: Real-Time Java

- Specify time properties:
- Start
- Period
- Cost
- Deadline (relative)
- Storage allocation must be predictable and reasonably bounded

new Foo()
5. Motivation: Intelligent RAM

[Figure: IRAM storage management — a CPU with cache and L2 cache connected over the data bus to RAM modules; the allocation logic (alloc, dealloc) sits with the memory]
6. Storage Allocation for Real Time

- Fast is good, but predictable is required
- Develop an allocation system that provides performance, robustness, portability, and quick development time
7. Common Storage Allocation: Free-list Algorithm

- Linked list of free blocks

[Figure: Allocation Times for List Allocator (ns)]

- Allocation worst-case?
- O(n) for n blocks in the list
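The free-list algorithm above can be sketched in a few lines. This is a minimal, hypothetical first-fit version for illustration, not the allocator measured in the talk; the worst case is visible in the loop, which may visit every block before finding a fit.

```c
#include <stddef.h>

/* A free block carries its size and a link to the next free block. */
typedef struct Block {
    size_t size;
    struct Block *next;
} Block;

static Block *free_list = NULL;

/* First-fit allocation: O(n) worst case, since the scan may traverse
   the entire list before finding a large-enough block (or failing). */
Block *list_alloc(size_t size) {
    Block **prev = &free_list;
    for (Block *b = free_list; b != NULL; prev = &b->next, b = b->next) {
        if (b->size >= size) {
            *prev = b->next;   /* unlink the block and hand it out */
            return b;
        }
    }
    return NULL;               /* no block fits */
}

/* Deallocation is O(1): push the block back on the list head. */
void list_free(Block *b) {
    b->next = free_list;
    free_list = b;
}
```

The unbounded scan is exactly why the worst case grows with list length, motivating the segregated schemes on the next slides.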
8. Application-Specific Allocator

- Example: suppose the application allocates only blocks of size 135, 65, 23, and 5
- Have n free-list allocators, one for each size
- Allocation worst-case?
- O(1)
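A minimal sketch of the scheme above, using the example sizes from the slide: one free list per known request size, so allocation is a single list-head pop. The names and the table of sizes are illustrative assumptions, not code from the talk.

```c
#include <stddef.h>

#define NSIZES 4
/* The only block sizes this hypothetical application ever requests. */
static const size_t sizes[NSIZES] = {135, 65, 23, 5};
static void *heads[NSIZES];          /* one free-list head per size */

static int size_index(size_t size) {
    for (int i = 0; i < NSIZES; i++)
        if (sizes[i] == size) return i;
    return -1;                       /* not a size this application uses */
}

/* O(1) given the size index: pop the head of that size's list. */
void *sized_alloc(size_t size) {
    int i = size_index(size);
    if (i < 0 || heads[i] == NULL) return NULL;
    void *b = heads[i];
    heads[i] = *(void **)b;          /* a free block's first word is its next link */
    return b;
}

void sized_free(void *b, size_t size) {
    int i = size_index(size);
    *(void **)b = heads[i];
    heads[i] = b;
}
```

Note the disadvantage listed on the next slide: the size table is baked into the allocator, so the code is neither general-purpose nor portable across applications.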
9. Application-Specific Disadvantages

- Not general purpose
- Requires a priori knowledge of allocated block sizes
- Makes code ugly and nonportable
- Can require extra storage:
- Number of allocators times maxlive
10. Ideal Allocator

- General purpose
- Minimal impact on application footprint
- Ratio of worst-case to average-case close to 1
- Knuth's Buddy Algorithm
- Overall speed is as fast as possible
- Hardware and optimizations
11. Knuth's Buddy System

- Free lists segregated by size
- All requests input to the system are rounded up to a power of 2
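The rounding rule above also determines which free list a request maps to: a block of size 2^k lives at level k. A small sketch (function names are illustrative):

```c
#include <stddef.h>

/* Round a request up to the next power of two, as the buddy system does. */
size_t round_up_pow2(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* The free-list level of a rounded size: size 2^k sits at level k. */
int level_of(size_t pow2) {
    int k = 0;
    while ((pow2 >> k) > 1) k++;
    return k;
}
```

So a request for 16 bytes stays at 16 (level 4), while a request for 17 bytes is served from the 32-byte list (level 5).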
12. Knuth's Buddy System (1)

- The system begins with one large block of size 256
- Example: allocate a block of size 16
- Three operations:
- Find
- Wait
- Return
- First, Find a chunk of free storage

[Figure: free-list levels for sizes 256 down to 1, with the single 256 block free]
13. Knuth's Buddy System (2)

- Wait: recursively subdivide

[Figure: the 256 block is split, leaving a free 128 block]
14. Knuth's Buddy System (3)

- Wait: recursively subdivide

[Figure: the subdivision continues one level further down]
15. Knuth's Buddy System (4)

- Wait: recursively subdivide

[Figure: the subdivision continues one level further down]
16. Knuth's Buddy System (5)

[Figure: free-list levels after subdividing down to size 16]
17. Knuth's Buddy System (6)

- Return: one of the size-16 blocks can be given to the program

[Figure: free-list levels with free blocks left at sizes 128, 64, 32, and 16]
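The Find/Wait/Return walkthrough above can be sketched in software. To keep the sketch self-contained, free lists are modeled as per-level counters of available blocks (an assumption for brevity; the real system links actual blocks), with level k holding blocks of size 2^k.

```c
#define LEVELS 9            /* levels 0..8: sizes 1 through 256, as in the example */
static int avail[LEVELS];   /* number of free blocks at each level */

/* Allocate a block at level `want` (size 2^want).
   Returns `want` on success, or -1 if the heap is exhausted. */
int buddy_alloc_level(int want) {
    int k = want;
    while (k < LEVELS && avail[k] == 0) k++;   /* Find: smallest sufficient level */
    if (k == LEVELS) return -1;
    avail[k]--;                                /* take the block at level k */
    while (k > want) {                         /* Wait: recursively subdivide */
        k--;
        avail[k]++;                            /* one buddy stays free per level */
    }
    return want;                               /* Return: block of size 2^want */
}
```

Starting from one free 256 block, a request for 16 leaves one free block each at 128, 64, 32, and 16, matching the figures above.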
18. General Architecture
19. Finding a Block

Allocate a block of size 16

[Figure: side-by-side free-list ladders (256 down to 1) for the software approach and our hardware trick]

- Software approach: Find is O(log(N))
- Our hardware trick: Find is O(1) in practice
20. Return

Allocate a block of size 16

[Figure: the application issues new Foo() to the allocator; the software approach performs Find, Wait, Return in sequence, while our hardware trick performs Find, then Return, with the Wait overlapped]
21. Results and Analysis

- Two versions of the Buddy System:
- Reference version
- Optimized version with Fast Find and Fast Return
- Experiments compared the hardware Buddy Systems to two other systems:
- Java VM 1.1.18 free-list allocator
- glibc malloc (black-box allocator)
- Two suites of benchmarks:
- SPEC Java benchmarks
- C malloc benchmarks
22. Overview of Java Benchmarks

[Figure: allocation time as a percentage of execution time]
23. Overview of C Benchmarks

[Figure: allocation time as a percentage of execution time]
24. How Much Does the Reference Version Save for Java?

- The savings the reference version provides are not substantial compared to overall application execution time
- But it is significantly more efficient in allocation itself!
25. Fast Find Results

- Compare the ratio of average-case to worst-case times
- The optimized version has a much better ratio
- A system using the optimized version would suffer from less over-provisioning
26. Fast Return Results

- Could Fast Return affect performance?
- In the C malloc benchmarks, the minimum inter-arrival time of allocation requests is greater than the Wait time of the Buddy System
- The Wait time could therefore complete in parallel within the inter-arrival time
27. Fast Return Results

- To what extent could Fast Return affect performance?
- Factor the Wait time out of the allocation time
- Could save 96% of allocation time
- Affects overall application efficiency only to a limited extent
28. Results Analysis

- Fast Return's effectiveness depends on the inter-arrival time of allocations
- Some Java allocations came too quickly
- But how often?
- In one benchmark, less than 15% of the requests arrived before the Wait time could complete in parallel
29. Conclusions

- Let's revisit the ideal allocator:
- Can be used without modification by any application
- Does not require an unreasonable amount of storage
- The difference between worst-case and average-case performance is small
- Overall speed is as fast as possible
30. Future Work

- Improve the Wait time, similar to the improvements for Find and Return
- Algorithms for defragmenting the heap with the Buddy System
- Use the allocator as a building block:
- Garbage collection
- Goal of IRAM (intelligent storage)
31. Questions?
32. Motivation

- Bring modern, high-level languages to the arena of real-time applications
- Allow embedded developers to author applications with high-level languages
- RTSJ?
- IRAM?
33. Overview of Java Benchmarks

- Compress: data compression utility
- Jess: Java Expert System Shell
- Raytrace: graphics ray tracer
- DB: database engine
- Javac: Java compiler
- Mpegaudio: MPEG decompressor
- MTRT: multi-threaded ray tracer
- Jack: parser generator
34. Overview of C Benchmarks

- Cfrac: factoring program
- Gawk: GNU Awk interpreter
- GS: Ghostscript image interpreter
- P2C: Pascal-to-C converter
- PTC: another Pascal-to-C converter
35. Optimizations

- Think of allocation as having three parts:
- Find: locate an appropriately sized block to return and perhaps break apart
- Wait: while the block is broken apart and the lists are manipulated
- Return: hand the requested-size block to the requestor
- Fast Find
- Fast Return
36. Current Allocators

- Unorganized free list of blocks
- Organized free list:
- Segregate available blocks by size
- Application-specific allocator:
- The application knows what sizes of blocks it consumes
- Use a free list for each block size
37. Worst-case Free-list Behavior

- The longer the free list, the more pronounced the effect
- No a priori bound on how much worse the list-based scheme could get
- Over-provisioning must be based on this relatively unbounded worst-case behavior

[Figure: Allocation Times for List Allocator (ns)]
38. Organized Free List

- Example: Knuth's Buddy System
- Free lists segregated by size
- Sizes are powers of two
- Allocation worst-case?
- O(log(N)), where N is the size of the heap
39. General Architecture

- 32-bit general-purpose registers
- 32-bit registers for the head pointers of the segregated free lists
- Shift registers to track size
40. General Architecture

- Simple ALU for address calculation and comparisons
- Memory I/O registers
- Controller for the algorithm
41. Fast Find Results

[Figure: average-case vs. worst-case allocation times]

- The optimized average case suffers a little from Fast Find compared to the reference version
- But worst-case times are significantly improved
42. How Much Does the Reference Version Save for C?

- Results are similar to the Java results
- The reference version can save a substantial amount of allocation time, but not overall application time
43. Finding a Block -- Fast

- In the FPGA, each level has a bit to indicate its status
- Suppose we want a block of size 16
- Levels too small for a given request are masked out
- A leading-ones detector Finds the desired level
- Allocation completes by recursively subdividing and returning the block

[Figure: free-list levels 256 down to 1, each with a status bit]
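A software analogue of the bit-per-level trick above: keep one status bit per level (bit k set when level k's free list is non-empty), mask off the levels that are too small, and take the lowest remaining set bit. The bit-scan plays the role of the leading-ones detector, finding the target level in one step instead of a loop. `__builtin_ctz` is a GCC/Clang intrinsic, assumed available here.

```c
#include <stdint.h>

/* Returns the smallest non-empty level >= want, or -1 if none exists.
   level_bits: bit k is set when level k (blocks of size 2^k) has a free block. */
int fast_find(uint32_t level_bits, int want) {
    /* Mask out levels too small for the request. */
    uint32_t eligible = level_bits & ~((1u << want) - 1u);
    if (eligible == 0) return -1;
    /* Lowest remaining set bit = smallest sufficient level. */
    return __builtin_ctz(eligible);
}
```

For a size-16 request (level 4) with free blocks only at levels 6 and 8, the detector lands on level 6, and allocation would complete by subdividing from there.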
44. Fast Return

- Want 16 bytes
- Use Fast Find to get to the size-64 level
- Return the first part of that block immediately to the requestor
- Software would have to reorganize first

[Figure: a size-64 block split into 32 and 16, with the first 16 bytes returned]
45. Fast Return

- Want 16 bytes
- Use Fast Find to get to the size-64 level
- Return the first part of that block immediately to the requestor
- Adjustment to the structures happens in parallel with the return

[Figure: a size-64 block split into 32 and 16, with the first 16 bytes returned]
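The Fast Return idea above can be sketched as follows: the requestor's address comes out of the found block immediately, while the leftover buddies are recorded for later insertion into the free lists. This sketch runs sequentially; the hardware's point is that the bookkeeping overlaps the return. Names and the leftover-array interface are illustrative assumptions.

```c
#include <stddef.h>

/* Given a free block of `have` bytes at `addr` and a request for `want`
   bytes (both powers of two, want <= have), return the granted address
   and record each leftover buddy's address in `leftover` (one per level),
   with the count written to *nleft. */
size_t fast_return(size_t addr, size_t have, size_t want,
                   size_t leftover[], int *nleft) {
    size_t granted = addr;          /* Return: the first part goes out at once */
    *nleft = 0;
    while (have > want) {           /* Wait: this bookkeeping can be overlapped */
        have >>= 1;
        leftover[(*nleft)++] = addr + have;  /* the upper buddy stays free */
    }
    return granted;
}
```

For the example above (a 64-byte block serving a 16-byte request at address 0), the caller gets address 0 immediately, and buddies of size 32 (at offset 32) and 16 (at offset 16) are queued for the free lists.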
46. Effect of Optimizations

- Fast Find improves the Find portion of allocation to O(1) in practice
- However, the Wait time is still O(log(N))
- But Fast Return could allow the Wait time to execute in parallel with application execution

new Foo()