Recursive Data Structure Profiling

1
Recursive Data Structure Profiling
  • Easwaran Raman
  • David I. August
  • Princeton University

2
Motivation
  • Huge processor-memory performance gap
  • Latency > 100 cycles
  • significant fraction of memory operations in
    typical programs
  • In many applications, Recursive Data Structures
    (RDS) constitute a large fraction of memory usage

[Figure: plot with a logarithmic vertical axis (10, 100, 1000) against Year]
3
Motivation
  • Techniques to minimize the performance impact of
    this gap
  • Caching, prefetching, out-of-order execution
  • Not very successful for RDS
  • Difficult to statically determine many RDS
    properties
  • Accesses are irregular and usually lie in
    critical path of execution

Traversal Code
    while (valid(node)) {
        // do something with node->data
        node = next(node);
    }
Short loop body prevents efficient OoO execution

An RDS layout example
Nodes at 0x1000, 0x2000, 0x3000, 0x4000
Non-contiguous layout results in irregular access patterns
4
Motivation
  • Linearization [Clark76, Luk99]
  • Speculation recovery cost outweighs the benefits if
    the next pointer field gets overwritten
    frequently
  • Information on the dynamic behavior of entire RDS
    structure is important

[Figure: list reached from head, with nodes labeled 1008, 1012, 1004, 1016 and 1000, and an array pos]
    index = 0;
    head = pos[index];
    while (head) {
        foo(head);
        head = pos[++index];
        check(head);
    }
Placement of the nodes in the figure corresponds
to their placement in memory
5
RDS Profile
  • RDS profiling gives a logical understanding of
    runtime behavior
  • "Application creates 100 trees" instead of
    "application allocates 2 MB in heap"
  • "Linked list traversed 10 times" instead of
    "address 0x10004000 accessed 200 times"
  • Profile for linearization: "next pointer field in
    list L is modified n times"

6
RDS Discovery
C function for creating a tree
    node *tree_create() {
        /* termination condition not shown */
        node *n = (node *)malloc(sizeof(node));
        n->left = tree_create();
        n->right = tree_create();
        return n;
    }

Execution trace in (pseudo) assembly
    call malloc                    // id 1
    mov r10 = r8
    call tree_create
    call malloc                    // id 2
    mov r11 = r8
    store [r10+offset1] = r11      // create 1->2
    call tree_create
    call malloc                    // id 3
    mov r12 = r8
    store [r10+offset2] = r12      // create 1->3

Dynamic Shape Graph
    1 -> 2, 1 -> 3

  • Assign a unique id to each value returned by malloc and
    create a node labeled by that id
  • Connect nodes by a directed edge if both the
    address and the value of a store have valid ids
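
The two rules above can be sketched in C. This is only an illustration, not the profiler's actual code: the names dsg, dsg_on_malloc, dsg_lookup and dsg_on_store, the fixed-size arrays, and the linear address lookup are all assumptions made to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 64
#define MAX_EDGES 256

/* One heap object seen at run time (hypothetical record layout). */
typedef struct { uintptr_t base; size_t size; } dsg_node;
typedef struct { int from, to; } dsg_edge;

typedef struct {
    dsg_node nodes[MAX_NODES];
    int node_count;
    dsg_edge edges[MAX_EDGES];
    int edge_count;
} dsg;

/* Rule 1: every value returned by malloc gets a fresh id (1-based). */
int dsg_on_malloc(dsg *g, uintptr_t base, size_t size) {
    assert(g->node_count < MAX_NODES);
    g->nodes[g->node_count].base = base;
    g->nodes[g->node_count].size = size;
    return ++g->node_count;
}

/* Returns the id of the object containing addr, or 0 if there is none.
   A linear scan is used here only for brevity. */
int dsg_lookup(const dsg *g, uintptr_t addr) {
    for (int i = 0; i < g->node_count; i++) {
        if (addr >= g->nodes[i].base &&
            addr - g->nodes[i].base < g->nodes[i].size)
            return i + 1;
    }
    return 0;
}

/* Rule 2: a store adds an edge only if both the stored-to address and
   the stored value map to valid ids. Returns 1 if an edge was added. */
int dsg_on_store(dsg *g, uintptr_t addr, uintptr_t value) {
    int from = dsg_lookup(g, addr);
    int to = dsg_lookup(g, value);
    if (from == 0 || to == 0)
        return 0;
    assert(g->edge_count < MAX_EDGES);
    g->edges[g->edge_count].from = from;
    g->edges[g->edge_count].to = to;
    g->edge_count++;
    return 1;
}
```

Feeding it three allocations and two pointer stores from the first object into the other two yields the edges 1->2 and 1->3; a store of a non-pointer value adds nothing.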

7
RDS Discovery
  • Multiple RDS instances can be connected together
    in the DSG!
  • To separate them, we use properties of the static
    code
  • Use another graph called Static Shape Graph (SSG)

8
RDS Discovery
Execution trace in (pseudo) assembly
    call malloc                    // id 1
    mov r20 = r8
    call malloc                    // id 2
    mov r10 = r8
    call tree_create
    call malloc                    // id 3
    mov r11 = r8
    store [r10+offset1] = r11      // create 2->3
    call tree_create
    call malloc                    // id 4
    mov r12 = r8
    store [r10+offset2] = r12      // create 2->4
    store [r20+0] = r10            // create 1->2

  • For every static call to malloc, create a node
    with a unique id in the Static Shape Graph (SSG)
  • If a store creates an edge, connect the
    corresponding static nodes
  • Check for strongly connected components (SCCs) in the SSG
  • Connect two dynamic nodes only if their
    corresponding static nodes are in the same SCC

[Figure: SSG with static nodes A and T; DSG with dynamic nodes 1 to 7]
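
The SCC test on the SSG can be sketched as follows. The slides do not say which SCC algorithm is used; this sketch swaps in a transitive closure (Warshall's algorithm) because it is short, and treats two allocation sites as being in the same SCC when each reaches the other. The names ssg, ssg_init, ssg_add_edge, ssg_close and ssg_same_scc are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_SITES 32

/* Static Shape Graph over malloc call sites, stored as a
   reachability matrix (hypothetical representation). */
typedef struct {
    int site_count;
    unsigned char reach[MAX_SITES][MAX_SITES];
} ssg;

void ssg_init(ssg *g, int site_count) {
    assert(site_count >= 0 && site_count <= MAX_SITES);
    memset(g, 0, sizeof *g);
    g->site_count = site_count;
}

/* Record that a store linked an object from site `from`
   to an object from site `to`. */
void ssg_add_edge(ssg *g, int from, int to) {
    assert(from >= 0 && from < g->site_count);
    assert(to >= 0 && to < g->site_count);
    g->reach[from][to] = 1;
}

/* Warshall's algorithm: afterwards reach[i][j] is 1 iff a
   path of one or more edges leads from i to j. */
void ssg_close(ssg *g) {
    for (int k = 0; k < g->site_count; k++)
        for (int i = 0; i < g->site_count; i++)
            for (int j = 0; j < g->site_count; j++)
                if (g->reach[i][k] && g->reach[k][j])
                    g->reach[i][j] = 1;
}

/* Two sites share an SCC iff each reaches the other.
   Call ssg_close() before using this. */
int ssg_same_scc(const ssg *g, int a, int b) {
    if (a == b)
        return 1;
    return g->reach[a][b] && g->reach[b][a];
}
```

A dynamic edge would then be kept only when ssg_same_scc holds for the allocation sites of its two endpoints. With one site whose objects point into tree nodes but are never pointed back to, the tree site forms its own SCC and the cross-site edge is dropped.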
9
Experimental setup
  • Uses Pin, a dynamic instrumentation tool for
    Itanium
  • Mappings between address ranges and dynamic ids
    are stored in an AVL tree
  • The most recent mapping is cached
  • A mix of benchmarks from SPEC, Olden and other
    pointer-intensive applications
  • Dynamic instruction count varies from a few
    million (ks) to over 300 billion (mesa)
  • All experiments run on a 900 MHz Itanium 2 with
    2 GB RAM running Red Hat 7.1
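
The address-range lookup with a one-entry cache can be sketched like this. Note the substitution: the slide says the mappings live in an AVL tree, while this sketch uses a sorted array with binary search to stay short. The cache of the most recent hit is the part being illustrated. All names (range_map, range_map_insert, range_map_lookup) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 1024

typedef struct { uintptr_t base; size_t size; int id; } range;

typedef struct {
    range ranges[MAX_RANGES];  /* sorted by base, assumed non-overlapping */
    int count;
    int last_hit;              /* index of the most recent match, or -1 */
} range_map;

void range_map_init(range_map *m) {
    m->count = 0;
    m->last_hit = -1;
}

static int range_contains(const range *r, uintptr_t addr) {
    return addr >= r->base && addr - r->base < r->size;
}

/* Insert a new [base, base + size) -> id mapping, keeping the array sorted. */
void range_map_insert(range_map *m, uintptr_t base, size_t size, int id) {
    assert(m->count < MAX_RANGES);
    int pos = m->count;
    while (pos > 0 && m->ranges[pos - 1].base > base) {
        m->ranges[pos] = m->ranges[pos - 1];
        pos--;
    }
    m->ranges[pos] = (range){ base, size, id };
    m->count++;
    m->last_hit = -1;  /* indices may have shifted, so drop the cache */
}

/* Returns the id whose range contains addr, or 0 if none does.
   The cached entry is checked first; consecutive accesses to the
   same object then skip the search entirely. */
int range_map_lookup(range_map *m, uintptr_t addr) {
    if (m->last_hit >= 0 && range_contains(&m->ranges[m->last_hit], addr))
        return m->ranges[m->last_hit].id;
    int lo = 0, hi = m->count - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        const range *r = &m->ranges[mid];
        if (range_contains(r, addr)) {
            m->last_hit = mid;
            return r->id;
        }
        if (addr < r->base)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return 0;
}
```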

10
Profiler Performance
  • Profile RDS size, lifetime, access count
  • Memory < 16 MB for all but 3 applications

Baseline Execution using Pin ( 10 times slower
than native)
11
RDS usage statistics
  • SCCs in static shape graph (RDS types)
  • Usually a few (< 5) per benchmark, a maximum of 31
    in parser
  • RDS instances (connected components in DSG)
  • Exhibit a wide range (1 in mcf to around a million
    in parser)
  • Tend to be live for long if the program creates
    only a few of them
  • Sizes of RDS instances
  • Vary from a single-node self-loop (parser) to a
    few hundred thousand nodes (mcf, parser)
  • Pointer-chasing loads
  • Significant in many benchmarks
  • Applications show vast diversity in RDS usage
  • A good reason for profiling them!
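
Counting RDS instances as connected components of the DSG can be sketched with a union-find structure. The slides do not say how the components are computed, so this is one possible approach; the names components, components_add_edge and components_count are hypothetical, and edge direction is ignored because only connectivity matters here.

```c
#include <assert.h>

#define MAX_DSG_NODES 1024

typedef struct {
    int parent[MAX_DSG_NODES];
    int node_count;
} components;

/* Start with every dynamic node in its own component. */
void components_init(components *c, int node_count) {
    assert(node_count >= 0 && node_count <= MAX_DSG_NODES);
    c->node_count = node_count;
    for (int i = 0; i < node_count; i++)
        c->parent[i] = i;
}

/* Find the representative of x, halving the path on the way up. */
int components_find(components *c, int x) {
    assert(x >= 0 && x < c->node_count);
    while (c->parent[x] != x) {
        c->parent[x] = c->parent[c->parent[x]];
        x = c->parent[x];
    }
    return x;
}

/* Merge the components of the two endpoints of a DSG edge. */
void components_add_edge(components *c, int a, int b) {
    int ra = components_find(c, a);
    int rb = components_find(c, b);
    if (ra != rb)
        c->parent[ra] = rb;
}

/* Number of RDS instances = number of component representatives. */
int components_count(const components *c) {
    int n = 0;
    for (int i = 0; i < c->node_count; i++)
        if (c->parent[i] == i)
            n++;
    return n;
}
```

A single node with a self-loop stays one component of size one, which matches the smallest instance size reported above.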

12
Temporal distribution
13
Cumulative distribution of RDS lifetimes
14
RDS Stability
  • Stability of an RDS: a notion of how
    'array-like' an RDS is
  • Stability index: an attempt to quantify this
    notion
  • Identify the time instances (alteration points)
    when changes occur to the RDS structure (by
    stores that replace existing pointers)
  • Count the traversals between successive
    alteration points
  • Stability index = number of intervals that account for
    most of the traversals
  • Lower index means higher stability
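
One way to turn the definition above into code is sketched below. The slide says "most of the traversals" without giving a threshold, so the percent parameter (for example 90) is an assumption, as are the function name stability_index and the choice to take the busiest intervals first.

```c
#include <assert.h>
#include <stdlib.h>

/* Sort traversal counts in descending order. */
static int cmp_desc(const void *a, const void *b) {
    long long x = *(const long long *)a;
    long long y = *(const long long *)b;
    return (x < y) - (x > y);
}

/* traversals[i] is the number of traversals seen in the i-th interval
   between successive alteration points. Returns the smallest number of
   intervals whose traversals add up to at least `percent` of the total.
   The array is sorted in place. */
int stability_index(long long *traversals, int n, int percent) {
    assert(n >= 0 && percent > 0 && percent <= 100);
    long long total = 0;
    for (int i = 0; i < n; i++)
        total += traversals[i];
    if (total == 0)
        return 0;
    qsort(traversals, (size_t)n, sizeof traversals[0], cmp_desc);
    long long covered = 0;
    for (int i = 0; i < n; i++) {
        covered += traversals[i];
        /* Integer comparison avoids floating-point rounding. */
        if (covered * 100 >= total * percent)
            return i + 1;
    }
    return n;
}
```

Under these assumptions, an RDS whose traversals almost all fall in one long unaltered interval gets index 1, while one whose traversals are spread evenly over four intervals gets index 4 at a 90 percent threshold.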

15
Cumulative distribution of stability index
16
Conclusion
  • Aggressive data structure level optimization
    techniques for RDS need profile information for
    improved performance
  • RDS profiling gives a better understanding of the
    runtime behavior of RDS
  • RDS usage varies widely across benchmarks

17
Extra Slides
18
RDS Profiling Definitions
  • RDS type: the abstract form of the logical data
    structure that is manipulated by the program
  • Examples: list, binary tree, graph, etc.
  • Can be mutually recursive (nodes point to their
    incident edges and vice versa to form a graph)
  • RDS instance: a concrete realization of the RDS
    type
  • Example: the tree created in function foo, the
    list pointed to by the first entry of the hash
    table.
