Title: Designing a Trace Format for Heap Allocation Events
- Trishul Chilimbi, Microsoft Research
- Richard Jones, University of Kent
- Ben Zorn, Microsoft Research
2. Is Heap Allocation a Solved Problem?
- Yes?
  - Numerous techniques, 40 years of research
  - Fragmentation not an issue? (Johnstone et al., ISMM '98)
  - How much faster can it get?
- No!
  - Arenas, regions, user-defined heaps, etc.
  - Scalability of MP allocators
  - Data locality
3. Are our Evaluation Methods Sound?
- Heap allocation is important in:
  - Streaming media applications
    - Long-running, quasi-real-time
  - Server applications (Larson & Krishnan, ISMM '98)
    - Heavy load, complex structure, multi-threaded
  - Large applications (OS, word processors, etc.)
- Current benchmarks (BZ's especially) are:
  - Small, single-threaded, short-running
4. Are Traces a Solution?
- Yes?
  - Easy to share, portable
  - Capture real behaviors: real programs under real loads
  - Easy to use for experimental evaluations
- No?
  - A fixed format may omit needed information
    - E.g., capturing references is problematic
  - Trace size is a significant issue
  - MP interleaving is non-deterministic
5. Contributions
- HATF: an allocation trace format
  - Trace contents focus on important issues
  - Representation is flexible and portable
  - Traces are compact and efficient to process
- MetaTF: a language for describing trace formats
- Raise awareness of issues
  - What should be in a trace?
  - Do you care about how a trace is represented?
6. Assumptions and HATF Design Goals
- We assume:
  - Long traces (100M events) are necessary
  - Consumers will read and process events sequentially
  - Ease of consumption is critical
    - Minimal dependencies and resource requirements
- HATF design goals:
  - Expressiveness: contents must be useful
  - Compactness: a 10× space reduction is valuable
  - Flexibility: allow limited extension (see MetaTF)
7. Trace Content
- Standard allocation events
  - Allocate, reallocate, free
- Context
  - In a specific region
  - In a specific thread
  - At a specific time
- Attributes allow additional information
8. Trace Representation
- Fixed formats have obvious weaknesses
  - Multiple address sizes (32- vs. 64-bit)
  - Fields are often empty (e.g., thread, heap)
- Fields have exploitable properties
  - Skewed or predictable distributions of values
  - Sizes are often small; time increases monotonically
- HATF includes dynamic metadata
  - Dynamically varies field width and interpretation
9. Changing Field Size with Metadata
HATF format                Fixed binary format
(size  address  tag)       (size  address  tag)
setWidth size 1
32     0x4a0    alloc      32     0x4a0    alloc
setWidth size 2
1024   0xa10    alloc      1024   0xa10    alloc
1024   0xc10    alloc      1024   0xc10    alloc
setWidth size 1
16     0xf10    alloc      16     0xf10    alloc
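A simple greedy encoder can reproduce the HATF column of this slide. It emits a `setWidth` command only when the number of bytes needed for the size field changes. This is a minimal sketch, not the HATF writer. It assumes each width is the fewest whole bytes that hold the value. The names `bytes_needed` and `plan_widths`, and the tuple used to represent a command, are hypothetical.

```python
def bytes_needed(value: int) -> int:
    """Fewest whole bytes (at least 1) that can hold a non-negative value."""
    return max(1, (value.bit_length() + 7) // 8)


def plan_widths(sizes):
    """Interleave hypothetical ("setWidth", "size", n) commands with size values.

    A command is emitted only when the required width differs from the width
    currently in effect, since a width stays in effect until changed.
    """
    stream = []
    current = None
    for size in sizes:
        width = bytes_needed(size)
        if width != current:
            stream.append(("setWidth", "size", width))
            current = width
        stream.append(size)
    return stream


print(plan_widths([32, 1024, 1024, 16]))
```

This greedy policy can emit a command on every record when sizes alternate between small and large. A practical encoder would probably weigh the cost of each command against the bytes it saves.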
10. Changing Field Interpretation
HATF format                    Fixed binary format
                               (size  address  tag)
setInterp size default 32
setInterp addr stride 0 100
alloc                          32     100      alloc
alloc                          32     200      alloc
alloc                          32     300      alloc
alloc                          32     400      alloc
setInterp size none, addr none
1024  5000  alloc              1024   5000     alloc
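The decoding side of this slide can be sketched as a small state machine. Each `setInterp` command updates per-field state that persists until it is changed. A record carries explicit values only for fields whose interpretation is `none`. This is a sketch under several assumptions. It assumes the stream has already been tokenized into the hypothetical tuple shapes shown. It splits the slide's combined `setInterp size none, addr none` into two commands. It also reads `stride base step` as a running value that starts at `base` and grows by `step` per record, which reproduces the addresses 100 to 400 above.

```python
def decode(stream):
    """Expand a tokenized HATF-like stream into full (size, addr, tag) rows.

    Hypothetical token shapes:
      ("setInterp", field, "none")               value is stored in the record
      ("setInterp", field, "default", v)         value is always v
      ("setInterp", field, "stride", base, step) base+step, base+2*step, ...
      (tag, {field: value, ...})                 a record with explicit fields
    """
    interp = {"size": ("none",), "addr": ("none",)}
    last = {}  # running value for stride-interpreted fields
    rows = []
    for token in stream:
        if token[0] == "setInterp":
            _, field, kind, *args = token
            interp[field] = (kind, *args)
            if kind == "stride":
                last[field] = args[0]  # start from the base
            continue
        tag, explicit = token
        row = {}
        for field, (kind, *args) in interp.items():
            if kind == "none":
                row[field] = explicit[field]
            elif kind == "default":
                row[field] = args[0]
            elif kind == "stride":
                last[field] += args[1]
                row[field] = last[field]
        rows.append((row["size"], row["addr"], tag))
    return rows


stream = [
    ("setInterp", "size", "default", 32),
    ("setInterp", "addr", "stride", 0, 100),
    ("alloc", {}), ("alloc", {}), ("alloc", {}), ("alloc", {}),
    ("setInterp", "size", "none"),
    ("setInterp", "addr", "none"),
    ("alloc", {"size": 1024, "addr": 5000}),
]
for row in decode(stream):
    print(row)
```

The decoded rows should match the fixed binary column. The HATF column, however, stores only tags for the first four records.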
11. Representation Effectiveness
- Is HATF necessary? Is it useful?
- Comparison
  - Alternate representations
    - HATF (size- or time-optimized), fixed-width binary, ASCII
    - With and without gzip compression
  - Applications
    - Single-threaded, single-heap benchmarks
    - Multi-threaded, multi-heap MS apps
  - Trace size, reading/writing costs
12. Trace Compression (Benchmark Avg.)
[Chart not reproduced]
13. Trace Compression (MS Apps, with gzip)
14. Read Processing Time (Benchmark Avg.)
15. HATF Evaluation Summary
- Space
  - Without compression, HATF is smallest
  - With compression, ASCII and HATF are close
  - Representing 64-bit timestamps is expensive
- Time
  - The current implementation is limited by I/O
  - Compression overhead is small by comparison
  - ASCII is marginally slower to decode
16. Contributions
- HATF: an allocation trace format
  - Trace contents focus on important issues
  - Representation is flexible and portable
  - Traces are compact and efficient to process
- MetaTF: a language for describing trace formats
  - MetaTF is to HATF as XML is to HTML
  - HATF reader/writer generated automatically
- Raise awareness of issues
17. MetaTF Beyond HATF
- Aim to facilitate exchange of trace data sets
- Generalise HATF
  - An expressive way of specifying traces
  - Allow easy construction of readers and writers
  - Generate readers and writers automatically from the specification
  - Separate representation from content
18. Component approach
- Idea: provide traces and API as a unit
  - Separate representation from content
  - A trace contains event types
  - Each event has a concrete representation
  - Reveal content, hide representation
- Implementation: jar files?
  - Good for the reader
    - Simple interface, e.g. Event getNextEvent()
  - Doesn't help the writer
    - Design trace format
    - Implement interfaces
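The "reveal content, hide representation" idea can be sketched as one abstract reader interface with two interchangeable encodings behind it. The slides only suggest an interface of the form `Event getNextEvent()`. All other details here are hypothetical choices for illustration: the class names `Event`, `TraceReader`, `AsciiTraceReader` and `FixedBinaryReader`, the ASCII line layout, and the 9-byte little-endian binary record. The tag value 1 for `alloc` follows the HATF1.0 listing.

```python
import struct
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    tag: str
    size: int
    address: int


class TraceReader(ABC):
    """Consumers see only Events; the encoding stays hidden behind this API."""

    @abstractmethod
    def get_next_event(self) -> Optional[Event]:
        """Return the next event, or None at end of trace."""


class AsciiTraceReader(TraceReader):
    """Hypothetical text layout: one 'tag size address' line per event."""

    def __init__(self, text: str):
        self._lines = iter(text.splitlines())

    def get_next_event(self):
        line = next(self._lines, None)
        if line is None:
            return None
        tag, size, address = line.split()
        return Event(tag, int(size), int(address, 16))


class FixedBinaryReader(TraceReader):
    """Hypothetical fixed record: 1-byte tag, 4-byte size, 4-byte address."""

    _RECORD = struct.Struct("<BII")
    _TAGS = {1: "alloc"}

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def get_next_event(self):
        if self._pos + self._RECORD.size > len(self._data):
            return None
        tag, size, address = self._RECORD.unpack_from(self._data, self._pos)
        self._pos += self._RECORD.size
        return Event(self._TAGS[tag], size, address)


def drain(reader: TraceReader):
    """Collect all events; works the same for any representation."""
    events = []
    while (event := reader.get_next_event()) is not None:
        events.append(event)
    return events


ascii_trace = "alloc 32 0x4a0\nalloc 1024 0xa10"
binary_trace = struct.pack("<BII", 1, 32, 0x4A0) + struct.pack("<BII", 1, 1024, 0xA10)
print(drain(AsciiTraceReader(ascii_trace)) == drain(FixedBinaryReader(binary_trace)))
```

As the slide notes, this style helps the reader but not the writer. Someone still has to design the encoding and implement both classes.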
19. MetaTF approach
- A trace comprises
  - A document type definition (DTD)
  - Trace event data
- Meta-approach: say how to specify events, i.e., the DTD
- Abstract syntax notations
  - SGML
    - Ride the XML wave
    - Verbose, ASCII only
  - ASN.1
    - Obese, inflexible
20. MetaTF DTD example
- Section heap 1
    alloc (tag, size, address)
      tag.value 4
      size.width 4
      size.interpretation none
      address.size 4
      address.interpretation none
- Metadata can change the representation of an event, e.g.:
    Tag       Event  Field  Interpretation  Value
    Metadata  alloc  2      Width           2
    Metadata  alloc  2      Delta           310004
21. MetaTF effectiveness
- Auto-generation of readers and writers from the MetaTF DTD
- Simple interface
  - Class for each event type, inherited from Event
  - Event getNextEvent()
  - void Event.putEvent()
- Separation of content and representation, to some degree
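Python's `dataclasses.make_dataclass` can sketch the step of generating one class per event type, each inherited from `Event`. The `SPEC` dictionary is a hypothetical stand-in for an already-parsed MetaTF description. It maps each event name to a tag value and field list taken from the HATF1.0 listing, omitting the tag and vfield fields. A real MetaTF generator would parse the specification text and would also emit the byte-level reader and writer. This sketch covers only the class generation.

```python
from dataclasses import fields, make_dataclass


class Event:
    """Common base class; each generated event type inherits from it."""
    tag_value = 0


# Hypothetical parsed specification: event name -> (tag value, field names).
SPEC = {
    "alloc": (1, ["size", "address", "time", "thread", "heap"]),
    "free": (2, ["address", "time", "thread", "heap"]),
    "createThread": (9, ["thread"]),
}


def generate_event_classes(spec):
    """Build one dataclass per event type, keyed by event name."""
    classes = {}
    for name, (tag_value, field_names) in spec.items():
        classes[name] = make_dataclass(
            name[0].upper() + name[1:],       # e.g. "alloc" -> "Alloc"
            [(f, int) for f in field_names],  # every field is an integer here
            bases=(Event,),
            namespace={"tag_value": tag_value},  # class attribute, not a field
        )
    return classes


EVENTS = generate_event_classes(SPEC)
a = EVENTS["alloc"](size=32, address=0x4A0, time=0, thread=0, heap=0)
print(type(a).__name__, a.tag_value, [f.name for f in fields(a)])
```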
22. Architecture
[Diagram: a layer that understands field interpretations, above a client-supplied layer that reads/writes n bytes]
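The layering can be sketched as a decoding layer that asks a client-supplied object only to "read n bytes". Swapping in a compressed byte source then needs no change to the decoder. The following are assumptions for illustration: the helpers `read_field` and `read_alloc`, the 4-byte little-endian fields, and the use of `gzip` as one possible client layer.

```python
import gzip
import io


def read_field(source, width: int) -> int:
    """Decode one unsigned little-endian field of `width` bytes.

    `source` is any client-supplied object with a read(n) method; this layer
    knows field widths but not where the bytes come from.
    """
    raw = source.read(width)
    if len(raw) != width:
        raise EOFError("truncated trace")
    return int.from_bytes(raw, "little")


def read_alloc(source):
    """Read a hypothetical alloc body: 4-byte size, then 4-byte address."""
    return read_field(source, 4), read_field(source, 4)


payload = (1024).to_bytes(4, "little") + (0xA10).to_bytes(4, "little")

plain = io.BytesIO(payload)  # client-supplied layer: raw bytes in memory
compressed = gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(payload)))  # or gzip

print(read_alloc(plain), read_alloc(compressed))
```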
23. MetaTF Evaluation summary
- Simple but expressive syntax, familiar to programmers
- Generated readers and writers, comparable performance
- Separation of representation and content?
  - Field properties
  - User-supplied, low-level readers/writers
- Interface
  - Event classes
  - getNextEvent, putEvent methods
  - What else?
24. Preliminary Results: XML Compression
[Chart not reproduced]
25. Summary
- Heap allocation research faces challenges
- We want to support easy, effective research
- HATF and MetaTF are suggestions
- Content issues
  - What is the minimum content?
  - How do we define extensible formats?
- Representation issues
  - Is HATF sufficiently better than ASCII?
  - How do we separate and hide representation?
- Organisation issues
  - What other meta-information should be stored?
- What do you think?
26. Status
- HATF
  - Preliminary implementation complete
  - Trying to make code and traces available
  - Hoping a third party will develop an implementation from the specification
    - This will help fix the specification and implementation
- MetaTF
  - Preliminary implementation in progress
  - Definition converging
27. Feedback: You tell us
- What else does HATF need to contain?
  - How important are references?
- Does anybody really care about representation?
  - Should we just pick one, and everybody will be happy?
28. Backup Slides
29. Talk Overview
- Motivation
- HATF: Heap Allocation Trace Format
  - Design goals
  - Trace content
  - Trace representation
  - Representation effectiveness
- MetaTF: specifying trace formats
  - Design
  - Generating readers and writers
  - Traces as components
30. Separating Content and Representation
- Ideally, representation and content would be entirely separate
  - Users could use a trace via a standard API with no external dependencies
  - The trace API would be delivered as a unit
  - Similar in spirit to components (JavaBeans, COM)
- No standard off-the-shelf way to achieve this
  - The best we can think of is to make readers/writers easy to acquire and use
31. ASCII versus Binary Representation
- ASCII
  - Portable, easy to examine and debug
  - Manipulated via text scripting tools (Perl)
  - Potential to ride the XML wave
- Binary
  - More compact representation (more later)
  - Faster to read
  - Contents exported to ASCII on demand
- We chose binary
32. HATF Metadata
- Metadata commands are embedded in the data
  - Field sizes range from 0 to 8 bytes
  - Field interpretations (mini compression operations)
    - Compute the field value as some function
    - Examples: none, default, base/offset, delta, stride
- A size or interpretation stays in effect until changed again
- The reader interprets field values on the fly
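One plausible reading of the five interpretations is sketched below as a per-field decoder whose state persists across records. The exact semantics are assumptions, not taken from the HATF specification. `base_offset` is taken to add a stored offset to a fixed base. `delta` adds a stored difference to the previous value. `stride` adds a fixed step with no stored bytes. The class `FieldDecoder` and its method names are hypothetical.

```python
from typing import Optional


class FieldDecoder:
    """Decoding state for one field; an interpretation persists until changed.

    Assumed semantics (illustrative, not taken from the HATF specification):
      none         value = stored
      default      value = param                (no bytes stored)
      base_offset  value = param + stored
      delta        value = previous + stored
      stride       value = previous + param     (no bytes stored)
    """

    KINDS = {"none", "default", "base_offset", "delta", "stride"}

    def __init__(self):
        self.previous = 0
        self.set_interp("none")

    def set_interp(self, kind: str, param: int = 0, start: Optional[int] = None):
        if kind not in self.KINDS:
            raise ValueError(f"unknown interpretation: {kind}")
        self.kind, self.param = kind, param
        if start is not None:
            self.previous = start  # e.g. the base of a stride sequence

    def stores_bytes(self) -> bool:
        return self.kind in {"none", "base_offset", "delta"}

    def decode(self, stored: Optional[int] = None) -> int:
        if self.stores_bytes() and stored is None:
            raise ValueError(f"{self.kind} needs a stored value")
        if self.kind == "none":
            value = stored
        elif self.kind == "default":
            value = self.param
        elif self.kind == "base_offset":
            value = self.param + stored
        elif self.kind == "delta":
            value = self.previous + stored
        else:  # stride
            value = self.previous + self.param
        self.previous = value
        return value


clock = FieldDecoder()
clock.set_interp("delta")  # timestamps increase, so store small differences
print([clock.decode(d) for d in (1000, 5, 7)])

addr = FieldDecoder()
addr.set_interp("stride", 100, start=0)
print([addr.decode() for _ in range(4)])
```

Under these assumptions, `delta` suits monotonically increasing timestamps. `stride` suits the regularly spaced addresses of the "Changing Field Interpretation" slide.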
33. Metadata Example
- Goal: encode most allocation sizes in 1 byte
- Example trace contents:
  - Metadata: setWidth field=size width=1
  - Data: allocate size=40, addr=0x3ff, …
  - Metadata: setWidth field=size width=2
  - Data: allocate size=1024, addr=0xa10, …
  - Data: allocate size=1024, addr=0xc10, …
  - Metadata: setWidth field=size width=1
  - Data: allocate size=16, addr=0xf00, …
  - Data: allocate size=24, addr=0xf10, …
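To check the arithmetic behind this goal, pack each size at the width in effect for its record and compare the total with a fixed 4-byte size field. This sketch covers only the size field. It assumes little-endian packing and an initial width of 4 (following `size.width 4` in the HATF1.0 listing). It ignores the cost of the metadata commands themselves, which the slides do not specify. The helpers `pack_sizes` and `unpack_sizes` are hypothetical.

```python
def pack_sizes(stream):
    """Pack size values at the width set by the most recent setWidth command."""
    width = 4  # assumed initial width (size.width 4 in the HATF1.0 listing)
    out = bytearray()
    for item in stream:
        if isinstance(item, tuple):  # ("setWidth", "size", n)
            width = item[2]
        else:
            # int.to_bytes raises OverflowError if the value does not fit
            out += item.to_bytes(width, "little")
    return bytes(out)


def unpack_sizes(data, widths):
    """Inverse of pack_sizes, given the width used for each record."""
    values, pos = [], 0
    for w in widths:
        values.append(int.from_bytes(data[pos:pos + w], "little"))
        pos += w
    return values


stream = [("setWidth", "size", 1), 40,
          ("setWidth", "size", 2), 1024, 1024,
          ("setWidth", "size", 1), 16, 24]
packed = pack_sizes(stream)
print(len(packed), "bytes vs", 4 * 5, "bytes at a fixed 4-byte width")
print(unpack_sizes(packed, [1, 2, 2, 1, 1]))
```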
34. Trace Compression (MS Apps, without gzip)
35. HATF Compression across Apps
36. XML
    <!-- DTD for HATF-1.0 -->
    <!ELEMENT size (#PCDATA)>
    <!ELEMENT address (#PCDATA)>
    <!ELEMENT time (#PCDATA)>
    <!ELEMENT thread (#PCDATA)>
    <!ELEMENT heap (#PCDATA)>
    <!ELEMENT attributes (#PCDATA)>
    <!ELEMENT alloc (size, address, time, thread, heap, attributes)>
    <!ELEMENT reallocNoAlloc (address, address, time, thread, heap, attributes)>
    <!ELEMENT reallocAllocFree (address, address, time, thread, heap, attributes)>
    <!ELEMENT reallocAlloc (address, address, time, thread, heap, attributes)>
    <!ELEMENT reallocFree (address, address, time, thread, heap, attributes)>
    <!ELEMENT free (address, time, thread, heap, attributes)>
    <!ELEMENT createHeap (thread, heap, attributes)>
    <!ELEMENT destroyHeap (thread, heap, attributes)>
    <!ELEMENT createThread (thread, attributes)>
    <!ELEMENT destroyThread (thread, attributes)>
    <!ELEMENT comment (attributes)>
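To illustrate the "ride the XML wave" option, an allocation event can be written as an `alloc` element whose children follow the order in the `alloc` declaration, using Python's standard `xml.etree.ElementTree`. The DTD declares the leaf elements as character data without fixing their textual format. Writing addresses in hex and leaving `attributes` empty are therefore assumptions. The helpers `alloc_to_xml` and `xml_to_alloc` are hypothetical.

```python
import xml.etree.ElementTree as ET

ALLOC_CHILDREN = ["size", "address", "time", "thread", "heap", "attributes"]


def alloc_to_xml(size, address, time=0, thread=0, heap=0, attributes=""):
    """Serialize one alloc event with children in the DTD's declared order."""
    values = [str(size), hex(address), str(time), str(thread), str(heap), attributes]
    elem = ET.Element("alloc")
    for name, text in zip(ALLOC_CHILDREN, values):
        ET.SubElement(elem, name).text = text
    return ET.tostring(elem, encoding="unicode")


def xml_to_alloc(text):
    """Recover (size, address) from an alloc element produced above."""
    elem = ET.fromstring(text)
    return int(elem.findtext("size")), int(elem.findtext("address"), 16)


doc = alloc_to_xml(32, 0x4A0)
print(doc)
print(len(doc), "characters for one event")
print(xml_to_alloc(doc))
```

Printing the length shows the verbosity that the "SGML: verbose, ASCII only" point refers to. A single event takes on the order of a hundred characters before compression.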
37. HATF1.0 specified in MetaTF1.1
- tag.width 1
  size.width 4
  size.interpretation none
  address.width 4
  address.interpretation none
  attributes.width 0
  attributes.interpretation none
  time.interpretation default 0
  thread.interpretation default 0
  heap.interpretation default 0
- section heap 1
  alloc (tag, size, address, time, thread, heap, vfield)                tag.value 1
  free (tag, address, time, thread, heap, vfield)                       tag.value 2
  reallocNoAlloc (tag, address, address, time, thread, heap, vfield)    tag.value 3
  reallocAllocFree (tag, address, address, time, thread, heap, vfield)  tag.value 4
  reallocAlloc (tag, address, address, time, thread, heap, vfield)      tag.value 5
  reallocFree (tag, address, address, time, thread, heap, vfield)       tag.value 6
  createHeap (tag, thread, heap, vfield)                                tag.value 7
  destroyHeap (tag, thread, heap, vfield)                               tag.value 8
  createThread (tag, thread, vfield)                                    tag.value 9
  destroyThread (tag, thread, vfield)                                   tag.value 10
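Read literally, the HATF1.0 listing sets a 1-byte tag, 4-byte size and address fields, zero-width attributes, and `default 0` for time, thread and heap. With no metadata in effect, an `alloc` record would therefore occupy 9 bytes, a `free` record 5, and a `createThread` record only its tag byte. The decoder below is a sketch under those defaults. Several points are assumptions: the little-endian byte order, the handling of traces that contain no metadata commands or variable-field data, and the names `LAYOUTS` and `decode_records`.

```python
# Tag values and stored fields, read from the HATF1.0 listing. Under its
# defaults, time/thread/heap are "default 0" and attributes have width 0, so
# they occupy no bytes; only size and address fields (4 bytes each) are stored.
LAYOUTS = {
    1: ("alloc", ["size", "address"]),
    2: ("free", ["address"]),
    3: ("reallocNoAlloc", ["address", "address"]),
    4: ("reallocAllocFree", ["address", "address"]),
    5: ("reallocAlloc", ["address", "address"]),
    6: ("reallocFree", ["address", "address"]),
    7: ("createHeap", []),
    8: ("destroyHeap", []),
    9: ("createThread", []),
    10: ("destroyThread", []),
}
FIELD_WIDTH = 4
BYTE_ORDER = "little"  # assumption: the listing does not fix a byte order


def decode_records(data: bytes):
    """Yield (event name, stored field values) for a trace with no metadata commands."""
    pos = 0
    while pos < len(data):
        tag = data[pos]
        pos += 1
        if tag not in LAYOUTS:
            raise ValueError(f"unknown tag {tag} at offset {pos - 1}")
        name, stored = LAYOUTS[tag]
        values = []
        for _ in stored:
            chunk = data[pos:pos + FIELD_WIDTH]
            if len(chunk) < FIELD_WIDTH:
                raise ValueError("truncated record")
            values.append(int.from_bytes(chunk, BYTE_ORDER))
            pos += FIELD_WIDTH
        yield name, values


def u32(value: int) -> bytes:
    return value.to_bytes(FIELD_WIDTH, BYTE_ORDER)


trace = bytes([1]) + u32(32) + u32(0x4A0) + bytes([2]) + u32(0x4A0) + bytes([9])
for name, values in decode_records(trace):
    print(name, values)
```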