1
Memory Consistency in Vector IRAM
  • David Martin

2
The Memory Consistency Model
  • Consistency model applies to instructions in a
    single instruction stream (different from
    multiprocessor consistency!).

Notation: a = after, S = scalar, V = vector, VP = virtual
processor, R = read, W = write
[Figure: operation orderings marked "sync required" or "no sync required"]
  • Definition of an XaY sync:
  • All operations of type Y occurring before the
    sync in program order appear to execute before
    any operation of type X occurring after the sync
    in program order.
  • Definition of an XaY sync to vector register
    vr_i:
  • The most recent operation of type Y to vr_i
    appears to execute before any operation of type X
    occurring after the sync in program order.
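The XaY definition above can be read as a check on an observed execution order. The sketch below encodes only the general (non-register) definition; the `Op` type, the `respects_xay_sync` helper and the "S"/"V" type labels are hypothetical names chosen for illustration, not part of any VIRAM tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One memory operation in a single instruction stream (hypothetical model)."""
    name: str  # illustrative label, e.g. "s1"
    kind: str  # operation type: "S" (scalar) or "V" (vector) in this sketch


def respects_xay_sync(program, sync_index, x, y, exec_order):
    """Return True if exec_order satisfies an XaY sync placed in program.

    program:    operations in program order (the sync itself is not listed).
    sync_index: number of operations that precede the sync in program order.
    exec_order: the same operations in the order they appear to execute.
    Check: every type-y operation before the sync appears to execute
    before every type-x operation after the sync.
    """
    position = {op: i for i, op in enumerate(exec_order)}
    before = [op for op in program[:sync_index] if op.kind == y]
    after = [op for op in program[sync_index:] if op.kind == x]
    return all(position[b] < position[a] for b in before for a in after)


# A VaS sync ("vector after scalar") sits between v1 and v2 in program order.
s1, v1, v2, s2 = Op("s1", "S"), Op("v1", "V"), Op("v2", "V"), Op("s2", "S")
program = [s1, v1, v2, s2]
print(respects_xay_sync(program, 2, "V", "S", [v1, s1, v2, s2]))  # True
print(respects_xay_sync(program, 2, "V", "S", [v2, s1, v1, s2]))  # False: v2 overtook s1
print(respects_xay_sync(program, 2, "V", "S", [s2, s1, v1, v2]))  # True: VaS says nothing about s2
```

The third call shows how weak a single sync is: VaS constrains only earlier scalar operations against later vector operations, so the later scalar operation s2 may still appear to execute first.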

3
Why Relax Memory Consistency?
  • Natural micro-architecture has multiple paths to
    memory
  • Want to decouple scalar and vector units without
    complex hardware

[Figure: block diagram with Fetch, Scalar Core, Vector Unit, Sync, and Memory blocks]
  • Trade-off between more complex hardware
    (speculation, disambiguation, cache coherence)
    and more complex software (sync instructions)
  • Should explore solutions to this trade-off that
    involve more hardware, e.g., hardware that guarantees
    SaV and VaS ordering but leaves VaV and VP orderings
    to software.

4
Software Conventions for Syncs
Vector Function Conventions:
  1. Execute VaS and VaV syncs on entry to vector code.
  2. Execute an SaV sync on exit from vector code.
[Figure: Scalar Code → VaS, VaV → Vector Code → SaV]
  • Vector code is responsible for not breaking memory
    ordering for the scalar code around it.
  • Allows us to vectorize libraries to speed up
    existing programs.
  • Don't want to assume that our compiler will
    compile and globally optimize all non-vector code
    that we run.
  • Alternative model: pass around flags to
    communicate sync requirements or history.
  • Must assume that our compiler compiles all code
    run on IRAM.
  • Not sure we want to accept that restriction.
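The two entry/exit conventions can be pictured as a wrapper around every vector routine, so that scalar callers never issue syncs themselves. This is a toy model only: `sync`, `vector_function`, `vadd` and the `issued` log are hypothetical stand-ins, and a list comprehension takes the place of real vector instructions.

```python
import functools

issued = []  # log of sync instructions "issued" by this toy model


def sync(kind):
    """Stand-in for issuing a sync instruction; here it only records the kind."""
    issued.append(kind)


def vector_function(body):
    """Wrap a vector routine with the two software conventions."""
    @functools.wraps(body)
    def wrapper(*args, **kwargs):
        sync("VaS")  # convention 1: vector ops below follow earlier scalar ops
        sync("VaV")  # convention 1: ...and earlier vector ops
        try:
            return body(*args, **kwargs)
        finally:
            sync("SaV")  # convention 2: later scalar ops follow this routine's vector ops
    return wrapper


@vector_function
def vadd(a, b):
    """Hypothetical vector add; the comprehension stands in for vector instructions."""
    return [x + y for x, y in zip(a, b)]


print(vadd([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
print(issued)                          # ['VaS', 'VaV', 'SaV']
```

Because the syncs live inside the wrapper, an existing scalar program can call the vectorized library routine unchanged, which is what lets libraries be vectorized without recompiling their callers.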

5
Sync Implementations and Costs
  • SaV: stall the fetch unit until the vector unit has
    committed all vector memory instructions.
  • Could take 1000s of cycles with many indexed
    vector memory operations in flight!
  • Very difficult to delay issue, since the SaV sync is
    often issued at the end of a vector routine.
  • VaS: stall the fetch unit until the scalar unit has
    committed all scalar memory instructions.
  • Not too expensive (10s of cycles?) because the scalar
    unit runs ahead of the vector unit, the scalar core
    is simple, and the data cache is write-through.
  • Easy to delay issue, because the VaS sync is often
    issued at the start of a vector routine.
  • VaV and VPaVP: no operation.
  • A no-op because there is only one vector memory unit
    and there are no vector caches.
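The stall rules above reduce to "wait until the last outstanding memory instruction of the other unit commits". A minimal sketch, assuming the commit cycle of every in-flight instruction is known; the `sync_stall` name and the cycle numbers in the example are illustrative inputs, not measurements from VIRAM-1.

```python
def sync_stall(kind, now, scalar_commits, vector_commits):
    """Cycles the fetch unit stalls for a sync issued at cycle `now`.

    scalar_commits / vector_commits: cycles at which in-flight scalar /
    vector memory instructions will commit (hypothetical inputs).
    """
    if kind == "SaV":
        pending = vector_commits   # wait for all vector memory instructions
    elif kind == "VaS":
        pending = scalar_commits   # wait for all scalar memory instructions
    elif kind in ("VaV", "VPaVP"):
        return 0                   # no-op: one vector memory unit, no vector caches
    else:
        raise ValueError(f"unknown sync kind: {kind}")
    return max([0] + [c - now for c in pending])


# Illustrative numbers only: a long indexed vector access is still in flight.
print(sync_stall("SaV", 100, [105, 112], [400, 1800]))  # 1700
print(sync_stall("VaS", 100, [105, 112], [400, 1800]))  # 12
print(sync_stall("VaV", 100, [105, 112], [400, 1800]))  # 0
```

The asymmetry in the slide falls out directly: SaV waits on the slowest vector access, while VaS waits only on a few scalar stores that a write-through cache retires quickly.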

6
Current Sync Analysis Tool
  • Executes a program and reports:
    1. Whenever two memory references are not ordered
       by architectural guarantees, by register
       dependencies, or by an intervening sync
       instruction.
    2. Whenever a sync instruction is not used to
       resolve any hazard, as described in (1).
  • Caveats:
    • Hazards are detected from a single program
      execution; the information may not hold for all
      possible executions of the program.
    • Hazard detection is conservative in the presence
      of synchronization chains.
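A trace-driven checker in the spirit of the tool can be sketched in a few lines. This is not the actual tool. It assumes a hypothetical trace format, treats only scalar/scalar pairs as architecturally ordered, ignores register dependencies, and credits only a sync that directly matches a pair, which makes it conservative about chains in the same way as the caveat above.

```python
def analyze(trace):
    """Report unordered reference pairs and unused syncs in one execution trace.

    trace: list of events, either ("mem", unit, rw, addr) with unit in
    {"S", "V"} and rw in {"R", "W"}, or ("sync", kind) with kind like "VaS".
    Assumptions: only S/S pairs are architecturally ordered; register
    dependencies are ignored; sync chains are not followed.
    """
    hazards, used = [], set()
    for j, later in enumerate(trace):
        if later[0] != "mem":
            continue
        for i in range(j):
            earlier = trace[i]
            if earlier[0] != "mem" or earlier[3] != later[3]:
                continue  # a sync, or a different address: no conflict
            if earlier[2] == "R" and later[2] == "R":
                continue  # read/read pairs never conflict
            if earlier[1] == "S" and later[1] == "S":
                continue  # assumed ordered by the architecture
            needed = f"{later[1]}a{earlier[1]}"  # e.g. later V, earlier S -> "VaS"
            syncs = [k for k in range(i + 1, j) if trace[k] == ("sync", needed)]
            if syncs:
                used.add(syncs[-1])  # credit the closest matching sync
            else:
                hazards.append((i, j))
    unused = [k for k, e in enumerate(trace) if e[0] == "sync" and k not in used]
    return hazards, unused


trace = [
    ("mem", "S", "W", 0x100),
    ("sync", "VaS"),           # orders event 0 before event 2
    ("mem", "V", "R", 0x100),
    ("sync", "VaV"),           # resolves nothing in this trace
    ("mem", "V", "W", 0x200),
    ("mem", "S", "R", 0x200),  # no SaV between events 4 and 5
]
print(analyze(trace))  # ([(4, 5)], [3])
```

Since the result comes from one trace, a sync reported as unused here might still be needed on a different input, matching the first caveat.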

Two Examples of Synchronization Chains
  Example 1:
    Write(A) <- r1
    RAW SYNC
    Read(A) <- r2
    WAR SYNC
    Write(A) <- r3
    Hazard?
  Example 2:
    Write(A) <- r1
    RAW SYNC
    Read(A) <- r2
    Write(A) <- r2
    Hazard?
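In the first chain (Write(A), RAW sync, Read(A), WAR sync, Write(A)) no single sync orders the two writes, so a pairwise check flags a write/write hazard. Under the XaY definition, though, the RAW sync places the read after the first write and the WAR sync places the second write after the read, so the writes are ordered transitively. The sketch below shows that reachability argument; the node labels W1, R2, W3 and the helper name are hypothetical.

```python
from collections import defaultdict


def ordered_transitively(edges, a, b):
    """True if a must appear to execute before b, following ordering edges."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    stack, seen = [a], {a}
    while stack:  # depth-first search from a
        node = stack.pop()
        if node == b:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# First chain: Write(A) <- r1; RAW sync; Read(A) <- r2; WAR sync; Write(A) <- r3
edges = [
    ("W1", "R2"),  # RAW sync: the read appears to execute after the earlier write
    ("R2", "W3"),  # WAR sync: the later write appears to execute after the read
]
print(ordered_transitively(edges, "W1", "W3"))  # True: ordered through the chain
print(ordered_transitively(edges, "W3", "W1"))  # False
```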
7
Optimizing Code
  • Basic problem
  • Vector unit requires setup: VL, VPW, mask,
    exceptions
  • Vector code is responsible for issuing syncs
  • Both of these are required in a vector routine if
    nothing is known about the calling context!
  • All solutions share the notion of giving control
    of the calling context to the compiler. Two
    options:
  • (1) Pass around flags so that syncs and setup
    code can be avoided at run-time
  • (2) Do global optimizations so that syncs and
    setup code can be eliminated at compile-time

[Figure: . . . → Scalar code → Vector setup → VaS and VaV sync → Vector function → SaV sync → Scalar code → Vector setup → VaS and VaV sync → Vector function → SaV sync → Scalar code → . . .]
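Option (1), passing flags so setup and syncs are skipped at run time, might look like the sketch below. The `Context` flags and both routines are hypothetical. Setup is reduced to remembering the vector length, and VaV is left out because the earlier sync-cost slide notes it is a no-op on VIRAM-1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Context:
    """Hypothetical run-time flags passed between routines (option 1)."""
    vl: Optional[int] = None  # vector length of current setup; None = not set up
    need_vas: bool = True     # scalar memory ops may precede; VaS needed first
    need_sav: bool = False    # vector memory ops outstanding; SaV needed first
    log: List[str] = field(default_factory=list)


def vector_routine(ctx: Context, vl: int) -> None:
    """Vector routine that skips setup and syncs the flags say are unneeded."""
    if ctx.vl != vl:
        ctx.log.append(f"setup VL={vl}")  # stands in for VL, VPW, mask, exceptions
        ctx.vl = vl
    if ctx.need_vas:
        ctx.log.append("VaS")
        ctx.need_vas = False
    ctx.log.append("vector body")
    ctx.need_sav = True  # defer the SaV until scalar code actually needs it


def scalar_memory_op(ctx: Context) -> None:
    """Scalar memory access: issue SaV only if vector memory ops are outstanding."""
    if ctx.need_sav:
        ctx.log.append("SaV")
        ctx.need_sav = False
    ctx.log.append("scalar mem")
    ctx.need_vas = True


ctx = Context()
vector_routine(ctx, 64)
vector_routine(ctx, 64)
scalar_memory_op(ctx)
print(ctx.log)
# ['setup VL=64', 'VaS', 'vector body', 'vector body', 'SaV', 'scalar mem']
```

Compared with the fixed sequence in the figure, the second call pays for neither setup nor syncs, and the SaV moves to the point where scalar code touches memory. The price is that every routine must agree to read and update the flags.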
8
Optimization Example
  • Demonstrates potential benefit from optimizing
    scalar-vector communication
  • Code computes A+B+C+D+E+F in the following manner:

[Figure: how the operands A, B, C, D, E, F are combined]
  • Unoptimized code calls a general vector add
    routine 5 times
  • First optimization inlines the 5 routines and
    removes vector initialization sequences
  • Second optimization also removes unnecessary sync
    instructions





  • The optimization goal is to avoid the sawtooth in
    instantaneous-performance graphs caused by draining
    the vector pipelines between vector loops.
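The three code versions can be compared by counting the overhead instructions each would emit for the five vector adds described above. This is a symbolic sketch with hypothetical names. It assumes the inlined version keeps one setup sequence, and that the fully optimized version keeps only one entry VaS/VaV pair and one exit SaV. It counts instructions, not cycles.

```python
from collections import Counter


def add_chain(variant, n_adds=5):
    """Emit a symbolic instruction stream for n_adds chained vector adds."""
    ops = []
    if variant == "unoptimized":
        for _ in range(n_adds):  # general routine called n_adds times
            ops += ["setup", "VaS", "VaV", "vadd", "SaV"]
    elif variant == "inlined":
        ops.append("setup")      # assumed: one initialization sequence remains
        for _ in range(n_adds):
            ops += ["VaS", "VaV", "vadd", "SaV"]  # syncs still around every add
    elif variant == "inlined+syncs":
        ops += ["setup", "VaS", "VaV"]  # assumed: entry syncs only
        ops += ["vadd"] * n_adds
        ops.append("SaV")               # assumed: one exit sync
    else:
        raise ValueError(f"unknown variant: {variant}")
    return ops


for v in ("unoptimized", "inlined", "inlined+syncs"):
    c = Counter(add_chain(v))
    print(v, c["setup"], c["VaS"] + c["VaV"] + c["SaV"])
# unoptimized 5 15
# inlined 1 15
# inlined+syncs 1 3
```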

9
  • Large optimization potential for short vector
    loops.
  • SaV syncs are most important to eliminate or
    delay.
  • VaS sync performance impact is unclear.
  • VaV syncs are virtually free in VIRAM-1.
  • Setup code is expensive. For this example, it is
    as expensive as the SaV syncs.