Verification of cache-coherence protocols with TLA+

Original file title: The Standard Theory of Wildfire. Author: CRL. Last modified by: Mark R. Tuttle. Created: 6/17/1995.

1
Verification of cache-coherence protocols with TLA+
  • Homayoon Akhiani, Damien Doligez, Paul Harter, Leslie Lamport, Joshua Scheid, Mark Tuttle, Yuan Yu
  • Compaq Computer Corporation

2
TLA+
  • A formal specification language based on set theory, first-order logic, and temporal logic
  • Engineers find reading easy, writing not too hard

CacheUnmodified(adr) == \/ SharedMode(adr)
                        \/ /\ ExclusiveMode(adr)
                           /\ ~DirtyBitSet(adr)

Cache' = [Cache EXCEPT ![adr].state = Invalid]
3
Used TLA+ to demonstrate formal methods to engineering
  • Analyzed cache-coherence protocols for
    • EV6: Alpha 21264 processor
    • EV7: Alpha 21364 processor
  • Built TLC, a model checker for TLA+
  • Analyzed proposals for industry standards
    • PCI-X,

4
EV6 cache coherence
[Figure: processors P1 to P4, memory, and a directory whose entry for x records the owner and the copies of x.]
  • To get x, go to x's directory to see who owns x.

5
Shared read, data in memory
[Figure: requester R issues Rd(x); the directory entry for adr x shows copies S,S,S and owner none.]
6
Shared read, remote owner
[Figure: requester R issues Rd(x); the directory entry for adr x shows copies S,S,S and owner O.]
7
Exclusive read, data in memory
[Figure: requester R issues RdEx(x); the directory entry for adr x shows copies S,S,S and owner none.]
8
Exclusive read, remote owner
[Figure: requester R issues RdEx(x); the directory entry for adr x shows copies S,S,S and owner O.]
9
No InvalAcks
[Figure: messages RdEx(x), Inval, and InvalAck among R, Dir, and S; InvalAck is marked "NO!".]
Fewer messages sent, and R is not blocked waiting for InvalAck. Now correctness depends on network message ordering.
10
No dirty write backs required
[Figure: messages RdEx(x), FwdRdEx(x), Data, and WriteBack among R, Dir, and owner O; WriteBack is marked "NO!".]
Fewer messages sent. Now correctness depends on the owner always holding the data.
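The transactions on slides 5 to 8, with the two optimizations on slides 9 and 10, can be sketched as a directory handler. This is a minimal Python sketch under stated assumptions, not the EV6 logic: `DirEntry`, `handle_request`, and the (name, destination) message pairs are hypothetical, and whether a shared read changes the owner is not stated on the slides (here it does not). Only the message names Rd, RdEx, FwdRdEx, Inval, and Data come from the slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirEntry:
    """Directory state for one address (hypothetical layout)."""
    owner: Optional[str] = None               # None: memory holds the data
    copies: set = field(default_factory=set)  # processors with shared copies


def handle_request(entry: DirEntry, kind: str, requester: str) -> list:
    """Return the (message, destination) pairs sent for a Rd or RdEx."""
    if kind not in ("Rd", "RdEx"):
        raise ValueError(f"unknown request kind: {kind}")
    msgs = []
    if entry.owner is None:
        # Data is in memory: answer the requester directly.
        msgs.append(("Data", requester))
    else:
        # Remote owner: forward the request. The owner supplies the data,
        # so no write-back to the directory is needed (slide 10).
        msgs.append(("Fwd" + kind, entry.owner))
    if kind == "RdEx":
        # Invalidate the other copies; no InvalAck is awaited (slide 9).
        for proc in sorted(entry.copies - {requester}):
            msgs.append(("Inval", proc))
        entry.copies = set()
        entry.owner = requester
    else:
        entry.copies.add(requester)
    return msgs
```

Because no acknowledgement is collected, nothing in this sketch tells the requester when the invalidations have landed; that is the point at which correctness starts to depend on network message ordering.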
11
Chains of requests
[Figure: requesters R1, R2, and R3, and Dir.]
12
Memory barriers
All memory ordering is imposed by memory barriers.
[Figure: read flag; MB; read data.]
How do we know when this ordering has been determined? The answer is highly optimized.
13
Separate commit/data responses
[Figure: messages Rd(x), FwdEx(x), Data, and Commit among R, Dir, and owner O.]
MB is passed when all outstanding commits are received. Commits are generated as early as possible!
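The rule that a memory barrier passes once every outstanding commit has arrived can be sketched with a counter. This is an illustrative Python sketch only; `CommitTracker` and its method names are hypothetical and say nothing about how the hardware tracks commits.

```python
class CommitTracker:
    """Counts outstanding commits for one processor (hypothetical model)."""

    def __init__(self):
        self.outstanding = 0   # requests issued whose commit has not arrived
        self.data = {}         # adr -> value, filled in by data responses

    def issue(self, adr):
        """A request for adr leaves the processor."""
        self.outstanding += 1

    def receive_commit(self, adr):
        """The commit response for a request arrives (possibly before data)."""
        if self.outstanding == 0:
            raise RuntimeError("commit received with no outstanding request")
        self.outstanding -= 1

    def receive_data(self, adr, value):
        """The data response arrives; it does not affect the barrier."""
        self.data[adr] = value

    def mb_can_pass(self):
        """An MB may pass only when no commits are outstanding."""
        return self.outstanding == 0
```

In this model the barrier can pass while `data` is still empty, which is the effect of separating the commit response from the data response.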
14
Significant speed ups
Data can be returned faster.
[Figure: R, with messages Inval(y), Inval(z), Commit, and Data.]
MB can be passed faster.
[Figure: R executes read flag, MB, read data; Data and commit messages from Dir.]
But now verification is much harder.
15
Hierarchical network
[Figure: processors attached to local switches, joined by a global switch; memory and directory.]
At the home node, always satisfy requests locally if possible...
16
Deadlock: the deadly embrace
[Figure: home x and home y.]
Deadlock: FwdRds are stalled waiting for data to arrive.
17
Shadow mode
[Figure: Rd(x) and FwdRd(x).]
FwdRd is a shadow starter (when the reader is on the home node).
Subsequent messages are shadowed in shadow mode (bounced off the global switch).
18
Shadow mode solves deadlock
[Figure: FwdRd(x) and FwdRd(y) between home x and home y.]
Data travels in a separate channel: other messages don't block data. Deadlock gone.
19
This is not your father's cache coherence protocol!
  • Protocol is highly optimized
    • No InvalAcks or NoAcks, no Dirty Write Backs
    • Long chains of data forwarding
    • Separate commit/data messages
    • Aggressive early commit generation
    • Shadow mode
  • Protocol was the largest to be analyzed with formal methods (to our knowledge as of 1997).

20
EV6 cache coherence in three easy steps: two man-years
  • Model Alpha memory model (200 lines)
  • Prove implementation (550 lines, 2 months, informal)
  • Model abstract protocol (500 lines)
  • Prove implementation (5500 lines, 4 months, incomplete)
  • Model complete protocol (2000 lines, 3 months)
21
Step 1: Alpha memory model
  • We specified the Alpha memory model
    • The official specification is an informal description of the allowed sequences of reads and writes.
    • We needed a precise, state-based specification.
  • We specified a slightly simplified memory model
    • (whole cache line access, common point of synchronization)
  • Compare the specifications
    • Official, English specification: 12 pages
    • Logical, precise specification: 200 lines

22
Key definition: read/write ordering
The Before order for an execution orders reads/writes and determines what values are returned by reads. GoodExecutionOrder defines the good Before orders, namely the orders allowed by the memory model.
23
State machine actions
  • ReceiveRequest(proc, req): receive a request
  • ChooseNewData(proc, idx): choose the return value for a request
  • Respond(proc, idx): return the value to a request
  • ExtendBefore: expand the Before relation
  • Actions preserve GoodExecutionOrder.

24
GoodExecutionOrder
  • This is the hard part, but look how short it is!

GoodExecutionOrder ==
  LET (* some definitions deleted *)
  IN  /\ (* Before is a partial order. *)
         /\ Before \subseteq ReqId \X ReqId
         /\ \A r1, r2 \in ReqId :
              IsBefore(r1, r2) => ~IsBefore(r2, r1)
         /\ \A r1, r2, r3 \in ReqId :
              IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
      /\ (* SourceOrder implies the Before order. *)
         \A r1, r2 \in ReqId :
           SourceOrder(r1, r2) => IsBefore(r1, r2)
      /\ (* RequestOrder implies the Before order. *)
         \A r1, r2 \in ReqId :
           RequestOrder(r1, r2) => IsBefore(r1, r2)
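The first conjuncts of GoodExecutionOrder can be checked on a small finite relation. The following is a minimal Python sketch, assuming that "partial order" is read as a strict order (no two requests precede each other, plus transitivity) and that Before, SourceOrder, and RequestOrder are given as sets of pairs; the function name and parameters are hypothetical.

```python
from itertools import product


def is_good_before(req_ids, before, source_order=(), request_order=()):
    """Check the partial-order and implied-order conjuncts on finite sets."""
    ids = set(req_ids)

    def is_before(a, b):
        return (a, b) in before

    # Before is a subset of ReqId x ReqId.
    in_domain = all(a in ids and b in ids for (a, b) in before)
    # No pair of requests is ordered both ways (this also rules out a < a).
    asymmetric = all(
        not (is_before(a, b) and is_before(b, a))
        for a, b in product(ids, repeat=2)
    )
    # a < b and b < c imply a < c.
    transitive = all(
        is_before(a, c)
        for a, b, c in product(ids, repeat=3)
        if is_before(a, b) and is_before(b, c)
    )
    # SourceOrder and RequestOrder are both contained in Before.
    implied = all(
        is_before(a, b) for (a, b) in list(source_order) + list(request_order)
    )
    return in_domain and asymmetric and transitive and implied
```

For example, with requests 1, 2, 3 the relation {(1,2), (2,3), (1,3)} passes, while {(1,2), (2,3)} fails because the transitive pair (1,3) is missing.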
25
      /\ (* Writes and successful SCs to the same location that *)
         (* have issued a response are totally ordered.         *)
         \A r1, r2 \in ReqId :
           /\ ReqIdQ[r1].req.type \in {"Wr", "SC"}
           /\ ReqIdQ[r1].req.newData # "Failed"
           /\ ReqIdQ[r1].req.responded
           /\ ReqIdQ[r2].req.type \in {"Wr", "SC"}
           /\ ReqIdQ[r2].req.newData # "Failed"
           /\ ReqIdQ[r2].req.responded
           /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
           => IsBefore(r1, r2) \/ IsBefore(r2, r1)
26
      /\ (* LL/SC Axiom: For each successful SC, there is a matching LL and *)
         (* there is no write to the same address from a different          *)
         (* processor between the LL and SC in the Before order.            *)
         \A r2 \in ReqId :
           /\ ReqIdQ[r2].req.type = "SC"
           /\ ReqIdQ[r2].newData \notin {Failed, NotChosen}
           => \E r1 \in ReqId :
                /\ LLSCPair(r1, r2)
                /\ \A r \in ReqId :
                     /\ \/ ReqIdQ[r].req.type = "Wr"
                        \/ /\ ReqIdQ[r].req.type = "SC"
                           /\ ReqIdQ[r].newData \notin {NotChosen, Failed}
                     /\ r[1] # r2[1]
                     /\ ReqIdQ[r2].req.adr = ReqIdQ[r].req.adr
                     => ~IsBefore(r1, r) \/ ~IsBefore(r, r2)
27
      /\ (* Value Axiom: A read reads from the preceding write in the *)
         (* Before order.                                             *)
         \A r1, r2 \in ReqId :
           /\ ReqIdQ[r2].source # NoSource
           /\ ReqIdQ[r1].req.type = "Wr"
           /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
           => IF ReqIdQ[r2].source = FromInitMem
                THEN ~IsBefore(r1, r2)
                ELSE \/ ~IsBefore(ReqIdQ[r2].source, r1)
                     \/ ~IsBefore(r1, r2)
28
Step 2: Model abstract protocol
  • protocol = abstract protocol + implementation junk
  • Surprisingly,
    • abstract protocol's correctness was far from obvious
    • we discovered a bug in the memory model
  • Proved hardest part of correctness
    • 35-line invariant based on 300 lines of definitions
    • 550-line proof, cases nested 10 levels deep

29
Step 3: Model complete protocol
  • Protocol: 9 man-months, 1900 lines of TLA+
  • Partial proof: 7 man-months, 1000-line (partial) invariant

30
Obstacle: multiple descriptions
  • English documents: 10 documents, 2-inch stack
  • Lisp code: crucial to understanding some details
  • None compact, none mathematically tractable
  • Solution: write our own model
    • We used TLA+

31
Obstacle: algorithm complexity
  • ChangeToDirty DummyRdVic FailedChangeToDirty
    Fetch InvalToDirty InvalToDirtyVic Rd RdMod RdVic
    RdVicMod QV_Fetch QV_Rd QV_RdMod WrVic
    ChangeToDirtyFailure ChangeToDirtySuccess
    FetchFillMarker FillMarker FillMarkerMod
    ForwardFetch ForwardFetchWithFetchFillMarker
    ForwardRd ForwardRdMod ForwardRdWithFillMarker
    ForwardRdModWithFillMarkerMod InvalAck
    InvalToDirtySuccess Invalidate LoopComsig
    LoopComsigWithInvalAck LoopComsigWithShadowClear
    LoopComsigWithShadowInvalAndShadowClear
    ShadowChangeToDirtySuccess ShadowForwardFetch
    ShadowForwardRd ShadowForwardRdMod
    ShadowInvalToDirtySuccess ShadowInvalidate
    ShadowShortFillMod ShadowSnap ShortFetchFill
    ShortFill ShortFillMod VictimAck FetchFill Fill
    FillMod VCFetchFill VCFill VCFillMod

32
Solution: Quarks
  • Ack
  • ChangeToDirty
  • Clear
  • Comsig
  • Fill
  • ForwardedGet
  • GetValue
  • InvalidToDirty
  • QuadInvalidate
  • ReleaseMAF
  • ReleaseVDB
  • SetCacheLineState
  • Victimize
  • Write

Quarks combine to form messages.
33
Protocol example
If a processor receives a Fill quark carrying cacheable data, then how is the cache updated?

ProcFieldsMessage(proc, msg) ==
  /\ ...
  /\ Cache' = CASE ...
       [] ("Fill" \in msg) /\ (subtype("Fill") # "Fetch")
            -> [Cache EXCEPT
                  ![proc, cacheIndex].state =
                      IF subtype("Fill") = "Mod"
                        THEN "ExclusiveDirty"
                        ELSE "Clean",
                  ![proc, cacheIndex].tag = AddressToTag(msg.adr),
                  ![proc, cacheIndex].data = msg.data]
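The cache update in the ProcFieldsMessage fragment above can be mirrored in a few lines of Python. This is a hedged sketch, not the specification: it assumes the guard excludes the "Fetch" subtype (read here as non-cacheable), represents a message as a dict whose "quarks" entry maps quark names to subtypes, and takes `address_to_tag` as a caller-supplied stand-in for AddressToTag. All of those shapes are hypothetical.

```python
def apply_fill(cache, proc, cache_index, msg, address_to_tag):
    """Return a new cache with one line updated by a cacheable Fill quark.

    cache maps (proc, cache_index) to a dict with state, tag, and data.
    Like TLA+ EXCEPT, the original cache is left unchanged.
    """
    subtype = msg["quarks"].get("Fill")
    if subtype is None or subtype == "Fetch":
        # Other CASE arms are elided on the slide; leave the cache as is.
        return cache
    new_cache = dict(cache)
    new_cache[(proc, cache_index)] = {
        "state": "ExclusiveDirty" if subtype == "Mod" else "Clean",
        "tag": address_to_tag(msg["adr"]),
        "data": msg["data"],
    }
    return new_cache
```

Returning a fresh dict keeps the old and new cache values side by side, which matches how an action relates Cache and Cache'.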

34
The low-level invariant
  • Define protocol in terms of quarks.
  • Define an invariant describing all reachable
    states.

We considered only the most difficult parts.
[Figure: messages, cache, dtag, and directory; on quad and off quad.]
35
Dir - Dtag Invariant
DirDTagInvariant ==
  \A adr \in MemBlockAddress, proc \in Processor :
    a.\/ (* local address *) ...
    b.\/ (* nonlocal address *)
         1./\ ProcToQuad(proc) # AddressToQuad(adr)
         2./\ a.\/ (* proc is the owner of adr *)
                   1./\ Dir[adr].owner = proc
              b.\/ (* proc is not the owner of adr *) ...

2./\ a.\/ (* dtag is dirty *)
          1./\ DTagState(adr, proc) = Dirty ...
     b.\/ (* dtag is invalid *) ...
     c.\/ (* dtag is clean *) ...

2./\ Proj(HomeToArbQ) FG QFI QI AckWrite QI AGV(mod,1) FG AckCTD(Success) FG

DTagCacheInvariant == ...
Mother == DirDTagInvariant /\ DTagCacheInvariant /\ ...
36
DTag-Cache Invariance
ASSUME /\ Mother /\ Wildfire
       /\ DTagCacheInvariant(proc, adr)
PROVE  DTagCacheInvariant(proc, adr)'
<1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)
<1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)
<1>3. QED

37
DTag-Cache Invariance
ASSUME /\ Mother /\ Wildfire
       /\ DTagCacheInvariant(proc, adr)
PROVE  DTagCacheInvariant(proc, adr)'
<1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)
  <2>1. CASE a2a (* AddressCache(proc, adr).state' = "Invalid" *)
  <2>2. CASE a2b (* AddressCache(proc, adr).state' # "Invalid" *)
  <2>3. QED
<1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)
<1>3. QED

38
DTag-Cache Invariance
ASSUME /\ Mother /\ Wildfire
       /\ DTagCacheInvariant(proc, adr)
PROVE  DTagCacheInvariant(proc, adr)'
<1>1. CASE a (* 1./\ DTagState(proc, adr) = "Invalid" *)
  <2>1. CASE a2a (* 1. AddressCache(proc, adr).state' = "Invalid" *)
    ...
      <14>1. CASE doing something at the proc
        Pf: ....
      <14>2. CASE doing something at the arb
      <14>3. QED
    ...
  <2>2. CASE a2b (* 1. AddressCache(proc, adr).state' # "Invalid" *)
  <2>3. QED
<1>2. CASE b (* 1./\ DTagState(proc, adr) # "Invalid" *)
<1>3. QED

39
The low-level refinement
  • For the abstract protocol, we defined the Before
    ordering for the protocol.
  • For the low-level protocol, we defined an
    invariant describing the reachable states.
  • Now use the invariant to prove that the Before
    ordering is the actual low-level ordering.
  • This refinement proof has not been done.

40
One bug found
  • Quite unexpected to find only one bug!
  • Fix was an easy bookkeeping modification.
  • Demonstrating the bug requires
    • four processors
    • two memory locations
    • fifteen messages
  • Hand proof appears essential to finding this bug
    • extensive simulation did not find it
    • state space too large for exhaustive model checking

41
Wildfire challenge problem
http://www.research.digital.com/SRC/personal/lamport/tla/wildfire-challenge.html
  • We give you TLA+ models of
    • the Alpha memory model
    • the abstract protocol with one bug inserted
  • and challenge you to find the bug.
  • Incredibly, Georges Gonthier found it by inspection (plus a memory model mistake)!

42
TLC model checker
[Figure: TLC inputs and output.]
  • Input: state machine in a rich subset of TLA+ (Initial, NextState)
  • Input: configuration file making the state machine finite
  • Input: invariant
  • Output: minimal state trace from an initial state to a bad state
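The inputs and output listed above fit an explicit-state breadth-first search, which naturally yields a shortest trace to a bad state. The following Python sketch illustrates that idea only; it is not TLC's implementation, and `check_invariant` with its three parameters is a hypothetical interface standing in for Initial, NextState, and the invariant.

```python
from collections import deque


def check_invariant(initial_states, next_states, invariant):
    """Breadth-first search over a finite state machine.

    Returns None if the invariant holds in every reachable state, else a
    shortest list of states from an initial state to a violating state.
    States must be hashable.
    """
    parent = {}
    queue = deque()
    for s in initial_states:
        if s not in parent:
            parent[s] = None
            queue.append(s)
    while queue:
        s = queue.popleft()
        if not invariant(s):
            # Walk the parent links back to an initial state.
            trace = []
            while s is not None:
                trace.append(s)
                s = parent[s]
            return trace[::-1]
        for t in next_states(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None


# Usage: a counter modulo 8 that may step by 1 or by 3; "bad" state is 6.
trace = check_invariant(
    [0],
    lambda s: [(s + 1) % 8, (s + 3) % 8],
    lambda s: s != 6,
)
print(trace)  # [0, 3, 6]
```

Breadth-first order is what makes the reported trace minimal: a state is first reached along a path with the fewest steps.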
43
TLC implementation
  • Require no changes to TLA+ specifications
    • use the richness of TLA+, no primitive language
    • use configuration files instead
  • Interpret specifications, don't compile them
    • better user interaction possible
  • Use explicit state representation, not BDDs
    • BDD encoding of TLA+ formulas difficult
    • use canonical state representation and fingerprinting
    • use efficient disk-based state set and queue implementation
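Canonical state representation plus fingerprinting can be sketched as follows: put each state in a canonical form, hash it to a fixed-width fingerprint, and remember only the fingerprints. This Python sketch is illustrative; the truncated SHA-256 hash and the `FingerprintSet` class are stand-ins chosen here, not TLC's fingerprint function or its disk-based state set.

```python
import hashlib


def fingerprint(state) -> int:
    """64-bit fingerprint of a state given as a dict of variable -> value.

    Sorting the items gives a canonical form, so two dicts with the same
    contents hash alike regardless of insertion order.
    """
    canonical = repr(sorted(state.items())).encode("utf-8")
    return int.from_bytes(hashlib.sha256(canonical).digest()[:8], "big")


class FingerprintSet:
    """Set of seen states that stores fingerprints instead of states."""

    def __init__(self):
        self._seen = set()

    def add(self, state) -> bool:
        """Record the state; return True if it was not seen before."""
        fp = fingerprint(state)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True

    def __len__(self):
        return len(self._seen)
```

Storing fingerprints trades a very small chance of treating two distinct states as equal for a much smaller memory footprint per state.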

44
TLC status
  • 20,000 lines of Java
  • Available to alpha testers under nondisclosure
  • Performance is good, sometimes slow; threaded and distributed implementations now exist.
  • Liveness checking/livelock detection coming
  • Coverage analysis is desired: what does lack of an error mean, a correct spec or a buggy spec?

45
EV7 cache coherence
  • First intense application of TLC model checker
  • First TLA+ specification written by engineers
  • Specification is 1800 lines
  • Specification accepted by TLC w/o modification
  • State space reduced 50% by adding 15 lines to remove a lot of symmetry in state space
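Removing symmetry means treating states that differ only by renaming interchangeable processors as one state. The sketch below shows the general idea in Python under stated assumptions; it is not how the EV7 specification did it. A state is assumed to be a pair (owner index or None, tuple of per-processor cache states), and `permute` and `canonical` are hypothetical helpers.

```python
from itertools import permutations


def permute(state, perm):
    """Rename processors: processor i becomes processor perm[i]."""
    owner, caches = state
    new_caches = [None] * len(caches)
    for i, line_state in enumerate(caches):
        new_caches[perm[i]] = line_state
    new_owner = None if owner is None else perm[owner]
    return (new_owner, tuple(new_caches))


def canonical(state):
    """Pick one representative of the state's orbit under all renamings."""
    n = len(state[1])
    # key=repr gives a total order even when owner is None.
    return min((permute(state, p) for p in permutations(range(n))), key=repr)
```

A model checker that stores `canonical(state)` instead of `state` visits one representative per symmetry class. Trying every permutation is only practical for a handful of processors.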

46
Results
  • 73 bugs found (90% found by TLC)
    • 37 minor: typos, type errors, etc.
    • 12 bugs: wrong message/wrong state
    • 14 missing cases
    • 7 spurious cases (dead code)
    • 3 miscellaneous (1 TLA+, 1 MC, 1 spec design)
  • War story: find bug B by hand; find bug B' like B by simulation; find bug B'' in bug-fix for B'; find ??? written in original documentation!

47
Lessons learned
  • Learning TLA+ is not a major task, but writing good specifications still requires experience
  • EV6 verification was
    • humbling: only one error actually found
    • encouraging: the basic method works as expected
  • EV7 verification was very satisfying
    • TLA+ specifications can be written by engineers
    • TLC can handle industrial-sized specifications
  • Formal specification belongs in design process
