The Potential for Variable-Granularity Access Tracking for Optimistic Parallelism

1
The Potential for Variable-Granularity Access
Tracking for Optimistic Parallelism
  • Mihai Burcea, J. Gregory Steffan, Cristiana Amza
  • University of Toronto
  • MSPC 2008

2
Getting the Most Out of Your CPUs
  • Ubiquitous CMPs
  • How do we exploit all this parallelism?
  • How do we improve sequential applications?

[Images: AMD Barcelona quad-core, Intel Kentsfield quad-core]
3
Optimistic Parallelism
  • Flavors
    • Transactional Memory (TM)
    • Thread-Level Speculation (TLS)
  • Implementations: hardware, software, hybrid
  • Common required support
    • Buffering speculative memory changes
    • Tracking and detecting memory access conflicts

4
Traditional Access Tracking
  • Most approaches use some fixed granularity
    • Hardware TM/TLS: cache-line size
      • Typically 32/64/128 bytes
    • Software TLS: word- or object-level
    • Software TM: word, page, or object granularity
    • Hybrid TM: mixture of the above (in HW/SW)

Is fixed granularity the best approach?
5
Can We Reduce The Overhead of Dependence Tracking?
[Figure: granularity spectrum from fine (too much overhead) to coarse (too many false conflicts)]
  • Key intuition: the best granularity likely
    varies within and across benchmarks

6
False Conflicts when Using Uniform Coarse Granularity
Measured in a TLS simulator with cache-line sizes of 32/64/128 bytes
A uniform coarse-grain approach suffers false conflicts
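
To make the notion of a false conflict concrete, here is a minimal sketch (not the simulator's code). It assumes a simple model in which two accesses conflict whenever their byte addresses fall into the same aligned tracking block; the helper names and the two addresses are hypothetical.

```python
def block_of(addr: int, granularity: int) -> int:
    """Index of the aligned tracking block that contains byte address addr."""
    return addr // granularity


def conflicts(write_addr: int, read_addr: int, granularity: int) -> bool:
    """A conflict is flagged when both accesses map to the same tracking block."""
    return block_of(write_addr, granularity) == block_of(read_addr, granularity)


# Hypothetical accesses: an earlier thread writes the word at 0x1000,
# a later speculative thread reads a different word at 0x1010.
WRITE_ADDR, READ_ADDR = 0x1000, 0x1010

results = {g: conflicts(WRITE_ADDR, READ_ADDR, g) for g in (4, 16, 32, 64, 128)}
for g, flagged in results.items():
    print(f"{g:>3}-byte tracking: {'conflict (false)' if flagged else 'no conflict'}")
```

The two words never overlap, so any conflict reported here is a false one: in this model it appears as soon as the tracking block grows to 32 bytes.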
7
  • Is there potential for a variable-granularity
    approach?

8
Goals Of Our Work
  • Show potential for Variable-Granularity Access
    Tracking (VGAT)
    • Finest grain is too expensive, but which coarse grain?
  • Show that ideal granularity varies across and
    within applications
    • Suggests need for a dynamic, adaptive scheme
  • Show significant reduction in number of tracked
    memory ranges when using VGAT

9
Related Work
  • Hardware TLS / TM: track accesses at cache-line
    size (32/64/128 bytes)
    • Stampede (Steffan et al., ACM Trans. 2005),
      Speculative Versioning Cache (Vijaykumar et al.,
      HPCA 1998)
    • Unbounded TM (Ananian et al., HPCA 2005), LogTM
      (Moore et al., HPCA 2006)
  • Software TLS
    • Word (Cintra et al., PPoPP 2003)
    • Object (Pickett et al., LCPC 2005)
  • Software TM
    • Word: McRT-STM (Saha et al., PPoPP 2006)
    • Page (Manassiev et al., PPoPP 2006)
    • Object: RSTM (Marathe et al., PLDI 2006), DSTM
      (Herlihy et al., PODC 2003)

Most systems use a fixed or object grain, which is not
necessarily the best
10
Related Work: Bulk Disambiguation
  • Ceze et al., ISCA 2006
  • Encode read/write sets into signatures
  • Detect conflicts by performing operations on
    signatures (fast)
  • Hashing (encoding) addresses into signatures
    introduces false positives
  • Reduces conflict-detection traffic, but increases
    false conflicts

Our goal: minimize false conflicts
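
The signature idea can be illustrated with a toy sketch. This is not the hashing scheme from the Bulk paper, which these slides do not describe; the 64-bit width, the modulo hash, and all names below are assumptions chosen only to show how a fast bitwise test can report a conflict between disjoint address sets.

```python
SIG_BITS = 64  # assumed signature width, for illustration only


def encode(addresses, granularity=4):
    """Fold a set of byte addresses into a SIG_BITS-wide bit vector."""
    sig = 0
    for addr in addresses:
        # Assumed hash: tracking-block index modulo the signature width.
        sig |= 1 << ((addr // granularity) % SIG_BITS)
    return sig


def may_conflict(write_sig: int, read_sig: int) -> bool:
    """Signature intersection: one AND, but aliasing can cause false positives."""
    return (write_sig & read_sig) != 0


writes = {0x1000}
reads_disjoint = {0x1004}  # maps to a different signature bit
reads_aliased = {0x1100}   # different address, same signature bit as 0x1000

print(may_conflict(encode(writes), encode(reads_disjoint)))  # no conflict reported
print(may_conflict(encode(writes), encode(reads_aliased)))   # false positive
```

The second check reports a conflict although the read and write sets share no address, which is the kind of false conflict a compact encoding trades for cheaper detection.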
11
Variable Granularity Access Tracking
  • Approach: vary granularity across
    • Time: parts of applications (speculative code regions)
    • Space: ranges of memory
  • Can potentially reduce
    • Tracking storage
    • Tracking traffic
    • Commit latency
    • False conflicts

12
Impact On Conflicts Of Increasing Granularity

Granularity (bytes)   Number of conflicts
4                     100
8                     100
16                    103
32                    120

  • 4 bytes: true (actual) conflicts
  • 8 bytes: same number of conflicts, still OK
  • 16 and 32 bytes: extra (false) conflicts!

Ideal granularity: the coarsest granularity that incurs no
false conflicts (8 bytes in this example)
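
The definition above can be written down as a small sketch. It assumes that the conflicts counted at the finest measured granularity are all true conflicts and that counts never decrease as blocks get coarser; the function names are hypothetical, and the data is the example from this slide.

```python
def ideal_granularity(conflicts_by_granularity: dict) -> int:
    """Coarsest granularity whose conflict count equals the finest-grain count."""
    finest = min(conflicts_by_granularity)
    true_conflicts = conflicts_by_granularity[finest]  # assumed to be all true
    ideal = finest
    for g in sorted(conflicts_by_granularity):
        if conflicts_by_granularity[g] > true_conflicts:
            break  # first granularity that adds false conflicts
        ideal = g
    return ideal


def false_conflicts(conflicts_by_granularity: dict) -> dict:
    """Extra conflicts at each granularity relative to the finest one."""
    base = conflicts_by_granularity[min(conflicts_by_granularity)]
    return {g: n - base for g, n in sorted(conflicts_by_granularity.items())}


measured = {4: 100, 8: 100, 16: 103, 32: 120}  # table from the slide
print(ideal_granularity(measured))  # 8
print(false_conflicts(measured))    # {4: 0, 8: 0, 16: 3, 32: 20}
```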
13
  • Measuring the Potential for VGAT

14
Experimental Framework
  • TLS simulator (CMU)
  • Subset of SPECint2000 benchmarks
    • Instrumented for TLS
    • TLS regions mostly loop-based
    • TLS regions pre-selected based on 32-byte read
      and 4-byte write granularity
  • Focus on specific aspects
    • Simulate first billion instructions
    • Track only Read-After-Write dependences

Speculative code regions were pre-selected for 32
bytes -> our results are conservative!
15
Variable Granularity at Code Region Level
[Figure: three speculative code regions, each running between a fork and a join, with the memory accessed by each region tracked at a single per-region granularity]
  • Speculative Code Region 1: granularity 4 bytes
  • Speculative Code Region 2: granularity 32 bytes
  • Speculative Code Region 3: granularity 8 bytes
16
Ideal Granularity at Code Region Level
[Figure: ideal granularity per code region, on a scale marked word-level, cache-line level, and page-level (4 KB)]
Code regions with no conflicts not shown in
figure (in parentheses)
Ideal Granularity varies significantly between
code regions
17
Variable Granularity Across Memory Ranges
[Figure: three speculative code regions, each running between a fork and a join; the memory accessed by each region is divided into ranges tracked at 4, 8, or 32 bytes]
18
Ideal Granularity Across Memory Ranges
Cache-line size sometimes good, sometimes not
Word-level rarely necessary
Page-level often sufficient
Ideal Granularity varies widely across memory
ranges
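
One way to picture a per-range measurement is the sketch below. It is not the authors' tooling: it assumes read-after-write conflicts are counted per read, that candidate granularities are powers of two between a 4-byte word and a 4 KB page, and that memory is split into page-aligned ranges. All names and addresses are hypothetical.

```python
WORD, PAGE = 4, 4096


def raw_conflicts(writes, reads, g):
    """Count reads that land in a g-byte block written by the earlier thread."""
    written_blocks = {w // g for w in writes}
    return sum(1 for r in reads if r // g in written_blocks)


def ideal_granularity_for_range(writes, reads):
    """Coarsest power-of-two granularity in [WORD, PAGE] adding no conflicts."""
    true_conflicts = raw_conflicts(writes, reads, WORD)
    ideal, g = WORD, WORD * 2
    while g <= PAGE and raw_conflicts(writes, reads, g) == true_conflicts:
        ideal, g = g, g * 2
    return ideal


def per_range(writes, reads, range_size=PAGE):
    """Ideal granularity for each range_size-aligned memory range that is touched."""
    ranges = {a // range_size for a in writes | reads}
    return {
        rng * range_size: ideal_granularity_for_range(
            {w for w in writes if w // range_size == rng},
            {r for r in reads if r // range_size == rng},
        )
        for rng in sorted(ranges)
    }


writes = {0x1000, 0x2000, 0x3000}
reads = {0x1004, 0x2000, 0x3040}
for base, g in per_range(writes, reads).items():
    print(f"range {base:#x}: ideal granularity {g} bytes")
```

In this made-up trace the three ranges end up at word level, page level, and 64 bytes respectively, mirroring the observation that no single granularity suits every memory range.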
19
  • Can VGAT improve performance?

20
Reducing the Number of Tracked Elements by using
Variable Granularity
[Figure: number of tracked elements; data labels 51, 50, 35, 458, 61, 31, 9, 5, 3]
VGAT can reduce the number of tracked elements by more
than 3x!
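
The effect on tracked elements can be sketched by counting distinct tracking blocks under different granularity policies. The access pattern and the per-page granularities below are invented for illustration and do not reproduce the measurements in the figure; `tracked_elements` and the policy functions are hypothetical names.

```python
def tracked_elements(addresses, granularity_of):
    """Number of distinct tracking blocks needed to cover the accessed addresses."""
    return len({(granularity_of(a), a // granularity_of(a)) for a in addresses})


# Hypothetical access set: two words in page 0x1000, 64 contiguous words in page 0x2000.
accessed = {0x1000, 0x1004} | set(range(0x2000, 0x2100, 4))

# Assumed per-page ideal granularities: word-level for 0x1000, page-level for 0x2000.
PER_PAGE = {0x1000: 4, 0x2000: 4096}


def variable(addr):
    """Look up the granularity assigned to the page that contains addr."""
    return PER_PAGE[addr // 4096 * 4096]


fixed_word = tracked_elements(accessed, lambda a: 4)
fixed_line = tracked_elements(accessed, lambda a: 32)
variable_grain = tracked_elements(accessed, variable)

print(fixed_word, fixed_line, variable_grain)  # 66 9 3
```

Here variable granularity tracks fewer elements than either fixed choice while still keeping word-level precision where the accesses need it.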
21
Ongoing Work
  • Should memory-centric or code-centric accesses
    determine granularity?
  • Dynamic, adaptive system for deciding granularity
    based on iterative sampling
    • How best to use and store profile information
    • May tolerate some percentage of false conflicts
  • Hardware TLS
    • Reduce conflict-detection traffic, possibly power
  • Software TM (lock-based)
    • Reduce number of locks: save space and time
    • Reduce lock contention

22
Conclusions (for Stampede TLS)
  • TM/TLS systems with only fixed coarse granularity
    may suffer many false conflicts
    • 2x to 4x on average
  • Variable granularity can reduce false conflicts
    and tracking overhead
    • 3x to 35x reduction in tracked ranges
  • Ideal granularity varies widely across memory
    ranges and speculative code regions

23
  • Thank you!
  • Questions?