J. Bradley Chen and Bradley D. D. Leupen - PowerPoint PPT Presentation

1
Improving Instruction Locality with Just-In-Time
Code Layout
  • J. Bradley Chen and Bradley D. D. Leupen
  • Division of Engineering and Applied Sciences
  • Harvard University

2
Goals
  • Improve instruction reference locality
    • big problem for commodity applications
  • Eliminate need for profile information
    • required by current compiler-based solutions

3
How?
  • Implement layout dynamically using Activation
    Order
  • A new heuristic for code layout.
  • Locate procedures in order of use.

4
Requirements
  • No special hardware support.
  • Minimal changes to the operating system.
  • Minimal system overhead.

5
Optimizing Procedure Layout
[Figure: two procedure layouts compared, "Bad Layout" and "Better Layout"]
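To make the difference between a bad and a better layout concrete, here is a minimal sketch that counts misses in a toy direct-mapped instruction cache. The cache model (one address per line, four lines), the procedure names, and the sizes are all assumptions chosen for illustration; they are not taken from the presentation.

```python
def count_misses(layout, sizes, trace, cache_lines=4):
    """Count misses in a toy direct-mapped cache (one address per line).

    layout: procedure names in address order (hypothetical)
    sizes:  number of addresses each procedure occupies
    trace:  sequence of activations; each one executes the whole body
    """
    # Pack procedures back to back in layout order.
    start, addr = {}, 0
    for name in layout:
        start[name] = addr
        addr += sizes[name]

    cache = [None] * cache_lines  # cache[i] = address currently held in line i
    misses = 0
    for name in trace:
        for a in range(start[name], start[name] + sizes[name]):
            slot = a % cache_lines
            if cache[slot] != a:
                cache[slot] = a
                misses += 1
    return misses


sizes = {"hot_a": 2, "cold": 2, "hot_b": 2}
trace = ["hot_a", "hot_b"] * 10

# The cold procedure separates the two hot ones, so they map to the same lines.
bad = count_misses(["hot_a", "cold", "hot_b"], sizes, trace)
# The hot procedures are adjacent and fit in the cache together.
better = count_misses(["hot_a", "hot_b", "cold"], sizes, trace)
print(bad, better)
```

In this toy setup the two hot procedures evict each other on every activation under the first layout, while under the second layout they only miss when first loaded.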
6
Current Practice: Pettis and Hansen
  • Nodes are procedures.
  • Edges are caller/callee pairs.
  • Weights are call frequency.

7
Pettis and Hansen Layout
layout
layout GetEvent, CheckForInputErrors
layout EventLoop, GetEvent, CheckForInputErrors
layout React, EventLoop, GetEvent, CheckForInputErrors
layout HandleCommonCase, React, EventLoop, GetEvent, CheckForInputErrors
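The step-by-step growth of the layout can be sketched as a greedy merge over a weighted call graph. This is a deliberately simplified version: it only concatenates chains in the order the edge endpoints are listed, whereas the full Pettis and Hansen algorithm also chooses how to orient the chains it merges. The edge weights and the endpoint order below are hypothetical, picked so the result matches the slide.

```python
def greedy_chain_layout(edges):
    """Simplified greedy layout. edges is a list of (proc_a, proc_b, weight)."""
    chain_of = {}  # procedure name -> the chain (list) that currently contains it
    for a, b, _ in edges:
        chain_of.setdefault(a, [a])
        chain_of.setdefault(b, [b])

    # Visit the heaviest edges first and merge the chains at their endpoints.
    for a, b, _ in sorted(edges, key=lambda e: e[2], reverse=True):
        left, right = chain_of[a], chain_of[b]
        if left is right:
            continue  # both procedures are already in the same chain
        merged = left + right
        for proc in merged:
            chain_of[proc] = merged

    # Concatenate the distinct chains that remain (disconnected components).
    layout, seen = [], set()
    for chain in chain_of.values():
        if id(chain) not in seen:
            seen.add(id(chain))
            layout.extend(chain)
    return layout


# Hypothetical call frequencies for the procedures named on the slide.
edges = [
    ("GetEvent", "CheckForInputErrors", 100),
    ("EventLoop", "GetEvent", 80),
    ("React", "EventLoop", 60),
    ("HandleCommonCase", "React", 40),
]
layout = greedy_chain_layout(edges)
print(layout)
```

Note that the weights are exactly the profile information this approach depends on: without a profiling run there is nothing to sort the edges by.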
8
A New Heuristic
Activation Order: Co-locate procedures that are
activated sequentially.
[Example figure not preserved in transcript]
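As a rough illustration of the Activation Order idea, the sketch below assigns addresses to procedures in the order they are first activated. The trace, procedure names, and sizes are hypothetical, and the function works on a recorded trace only for demonstration; the point of the technique is that the order is discovered while the program runs.

```python
def activation_order_layout(trace, sizes):
    """Return {procedure: start_address}, placing procedures in order of first activation."""
    addresses, next_free = {}, 0
    for proc in trace:
        if proc not in addresses:  # first activation: place it at the next free address
            addresses[proc] = next_free
            next_free += sizes[proc]
    return addresses


# Hypothetical procedure sizes in bytes and a hypothetical activation trace.
sizes = {"main": 64, "parse": 128, "report": 32, "unused_helper": 256}
trace = ["main", "parse", "parse", "report", "parse"]

layout = activation_order_layout(trace, sizes)
footprint = sum(sizes[p] for p in layout)
print(layout, footprint)
```

Procedures that are never activated, such as `unused_helper` here, are never placed at all, so the laid-out code can be smaller than the full program text.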
9
Implementing JITCL
__start:      perform initializations
              call thunk_main
thunk_main:   . . .
thunk_foo:    . . .
__InstructionMemory
Thunk routines implement code layout on-the-fly.
10
Thunk routines
// Global variables
//   ProcPointers - one element per procedure
//   INDEX_proc and LENGTH_proc for each procedure
thunk_main:
    if (!InCodeSegment(ProcPointers[INDEX_main]))
        ProcPointers[INDEX_main] =
            CopyToTextSegment(ProcPointers[INDEX_main], LENGTH_main);
    PatchCallSite(ProcPointers[INDEX_main],
                  ComputeCallSiteFromReturnAddress(RA));
    jmp ProcPointers[INDEX_main]
The thunk routines copy procedures into the
text segment and update call sites at run-time.
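The thunk mechanism can be imitated in a few lines of Python. This is only a behavioral model under stated assumptions: the `JitLayout` class, its fields, and the call-site labels are hypothetical, and real thunks operate on machine code and return addresses rather than on Python callables.

```python
class JitLayout:
    """Toy model of thunk-driven layout (all names here are hypothetical)."""

    def __init__(self, procedures):
        self.procedures = procedures  # name -> callable standing in for the procedure body
        self.text_segment = []        # names in the order they were copied in
        self.patched = set()          # call sites that now jump straight to the procedure
        self.thunk_calls = 0

    def thunk(self, name, call_site):
        self.thunk_calls += 1
        if name not in self.text_segment:   # plays the role of the InCodeSegment test
            self.text_segment.append(name)  # plays the role of CopyToTextSegment
        self.patched.add(call_site)         # plays the role of PatchCallSite
        return self.procedures[name](self)  # plays the role of the final jmp

    def call(self, name, call_site):
        if call_site in self.patched:
            return self.procedures[name](self)  # patched site bypasses the thunk
        return self.thunk(name, call_site)


def foo(rt):
    return "foo"


def main(rt):
    # One call site executed three times, then a second, distinct call site.
    results = [rt.call("foo", call_site="main+1") for _ in range(3)]
    results.append(rt.call("foo", call_site="main+2"))
    return results


rt = JitLayout({"main": main, "foo": foo, "never_called": lambda rt: None})
rt.call("main", call_site="__start")
print(rt.text_segment, rt.thunk_calls)
```

Each call site pays for the thunk once. After that it is patched, which is consistent with the low overhead reported later in the talk. The procedure that is never called is never copied.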
11
Simulation Methodology
12
Workloads
13
Results
  • The AO heuristic is effective.
  • The overhead of JITCL is negligible.
  • JITCL improves procedure layout without requiring
    profile information.
  • JITCL reduces program memory requirements.

14
Results: The AO Heuristic
[Figure: Improvement in I-Cache Miss Rate]
Conclusion: Effectiveness of the heuristic is
comparable to Pettis and Hansen (PH).
15
Overhead of JITCL
  • Copy overhead
    • instruction overhead
    • cache overhead
  • Cache consistency
  • Disk overhead - comparable to demand-loaded text;
    not evaluated.

16
Results: Overhead
[Figure: Overhead Instructions (%)]
Conclusion: JITCL overhead is less than 0.1% in
all cases.
17
Results: Performance
[Figure: Saved Cycles per Instruction]
Conclusion: Overall performance is comparable to
PH.
18
JITCL for Win32 Applications
  • Windows applications are composed of multiple
    executable modules.
  • When transitions between modules are frequent,
    intra-module code layout is less effective.
  • With JITCL, inter-module code layout is possible
    and beneficial.

19
Win32 Cache Miss Rates
Conclusion: Careful layout did not help Win32
applications.
20
Text Segment Size
[Figure: Text size in megabytes]
Conclusion: JITCL typically reduces text size by
50%.
21
JITCL vs. PBO
  • JITCL provides an alternative to feedback-based
    procedure layout.
  • Many important optimizations still require
    profile information.
    • instruction scheduling
    • register allocation
    • other intra-procedural optimizations
  • Don't expect profile-based optimization to go
    away!

22
Conclusions
  • Just-In-Time code layout achieves comparable
    benefit to profile-based code layout without the
    need for profiles.
  • The AO heuristic is effective.
  • The overhead of procedure copying is low.
  • Benefit in I-Cache is comparable to Pettis and
    Hansen layout.
  • JITCL can reduce working set size.

23
The Morph Project
For more information: http://www.eecs.harvard.edu/morph/