Title: Verification of cache-coherence protocols with TLA+
1. Verification of cache-coherence protocols with TLA+
- Homayoon Akhiani, Damien Doligez, Paul Harter, Leslie Lamport, Joshua Scheid, Mark Tuttle, Yuan Yu
- Compaq Computer Corporation
2. TLA+
- A formal specification language based on set theory, first-order logic, temporal logic
- Engineers find reading easy, writing not too hard
CacheUnmodified(adr) == \/ SharedMode(adr)
                        \/ /\ ExclusiveMode(adr)
                           /\ ~DirtyBitSet(adr)
Cache' = [Cache EXCEPT ![adr].state = Invalid]
3. Used TLA+ to demonstrate formal methods to engineering
- Analyzed cache-coherence protocols for
  - EV6 (Alpha 21264 processor)
  - EV7 (Alpha 21364 processor)
- Built TLC, a model checker for TLA+
- Analyzed proposals for industry standards
  - PCI-X,
4. EV6 cache coherence
[Diagram: processors P1 to P4, memory, and a directory whose entry for x records the owner and the copies.]
- To get x, go to x's directory to see who owns x.
5. Shared read, data in memory
[Diagram: R issues Rd(x); the directory entry for adr x lists owner none and copies S,S,S.]
6. Shared read, remote owner
[Diagram: R issues Rd(x); the directory entry for adr x lists owner O and copies S,S,S.]
7. Exclusive read, data in memory
[Diagram: R issues RdEx(x); the directory entry for adr x lists owner none and copies S,S,S.]
8. Exclusive read, remote owner
[Diagram: R issues RdEx(x); the directory entry for adr x lists owner O and copies S,S,S.]
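The four read cases above (shared or exclusive, data in memory or at a remote owner) can be sketched as a single directory handler. This is an illustrative Python sketch, not the EV6 logic: the `DirEntry` class, the `handle` function, and the `(kind, destination)` message tuples are hypothetical names, and the sketch assumes that an exclusive read invalidates every other copy and makes the requester the new owner.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class DirEntry:
    """Directory state for one address (hypothetical layout)."""
    owner: Optional[str] = None                 # None: memory holds the data
    copies: Set[str] = field(default_factory=set)


def handle(entry: DirEntry, requester: str, exclusive: bool) -> List[Tuple[str, str]]:
    """Return the (kind, destination) messages sent for Rd or RdEx."""
    msgs: List[Tuple[str, str]] = []
    if entry.owner is None:
        # Data in memory: answer the requester directly.
        msgs.append(("Data", requester))
    else:
        # Remote owner: forward the request to the owner.
        msgs.append(("FwdRdEx" if exclusive else "FwdRd", entry.owner))
    if exclusive:
        # Assumption: invalidate all other copies, then transfer ownership.
        for proc in sorted(entry.copies - {requester}):
            msgs.append(("Inval", proc))
        entry.copies = set()
        entry.owner = requester
    else:
        entry.copies.add(requester)
    return msgs
```

For example, a shared read with no owner produces a single `Data` message and adds the requester to `copies`, while an exclusive read with a remote owner produces one forwarded request plus one `Inval` per sharer.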
9. No InvalAcks
[Diagram: messages RdEx(x), Inval, and InvalAck among R, Dir, and S, with InvalAck marked "NO!".]
Fewer messages sent, and R is not blocked waiting for InvalAck. Now correctness depends on network message ordering.
10. No dirty write-backs required
[Diagram: messages RdEx(x), FwdRdEx(x), Data, and WriteBack among R, Dir, and owner O, with WriteBack marked "NO!".]
Fewer messages sent. Now correctness depends on the owner always holding the data.
11. Chains of requests
[Diagram: requesters R1, R2, R3 and Dir.]
12. Memory barriers
All memory ordering is imposed by memory barriers.
read flag
MB
read data
How do we know when this ordering has been determined? The answer is highly optimized.
13. Separate commit/data responses
[Diagram: messages Rd(x), FwdEx(x), Data, and Commit among R, Dir, and owner O.]
MB is passed when all outstanding commits are received. Commits are generated as early as possible!
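The rule that an MB is passed once all outstanding commits have arrived, independently of the data responses, can be sketched as simple bookkeeping. This is a hypothetical Python illustration: the `Processor` class and its method names are assumptions made for the example, not part of the protocol description.

```python
class Processor:
    """Tracks outstanding commits separately from data arrival."""

    def __init__(self):
        self.outstanding = set()   # request ids whose Commit has not arrived
        self.data = {}             # request id -> value, once Data arrives

    def issue(self, req_id):
        """Record a new request that still awaits its Commit."""
        self.outstanding.add(req_id)

    def on_commit(self, req_id):
        """A Commit fixes the request's place in the memory order."""
        self.outstanding.discard(req_id)

    def on_data(self, req_id, value):
        """Data may arrive before or after the matching Commit."""
        self.data[req_id] = value

    def can_pass_mb(self):
        """The barrier depends only on commits, not on data."""
        return not self.outstanding
```

In this sketch a processor can pass the barrier even while some data is still in flight, which is the source of the speed-up and also of the extra verification difficulty.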
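14. Significant speed-ups
- Data can be returned faster.
- MB can be passed faster.
[Diagrams: the message sequence Inval(y), Inval(z), Commit and a separate Data message arriving at R; R's instruction sequence read flag, MB, read data, with Data and Commit messages from Dir.]
But now verification is much harder.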
15. Hierarchical network
[Diagram: processors attached to local switches, joined by a global switch, with memory and directory.]
At the home node, always satisfy requests locally if possible...
16. Deadlock: the deadly embrace
[Diagram: two nodes, home x and home y.]
Deadlock: FwdRds are stalled waiting for data to arrive.
17. Shadow mode
[Diagram: messages Rd(x) and FwdRd(x).]
- FwdRd is a shadow starter (when the reader is on the home node).
- Subsequent messages are shadowed in shadow mode (bounced off the global switch).
18. Shadow mode solves deadlock
[Diagram: FwdRd(x) and FwdRd(y) messages between home x and home y.]
Data travels in a separate channel; other messages don't block data. Deadlock gone.
19. This is not your father's cache-coherence protocol!
- Protocol is highly optimized
  - No InvalAcks or NoAcks, no Dirty Write Backs
  - Long chains of data forwarding
  - Separate commit/data messages
  - Aggressive early commit generation
  - Shadow mode
- Protocol was the largest to be analyzed with formal methods (to our knowledge as of 1997).
20. EV6 cache coherence in three easy steps (two man-years)
- Model Alpha memory model (200 lines)
- Prove implementation (550 lines, 2 months, informal)
- Model abstract protocol (500 lines)
- Prove implementation (5500 lines, 4 months, incomplete)
- Model complete protocol (2000 lines, 3 months)
21. Step 1: Alpha memory model
- We specified the Alpha memory model
  - The official specification is an informal description of the allowed sequences of reads and writes.
  - We needed a precise, state-based specification.
- We specified a slightly simplified memory model
  - (whole cache line access, common point of synchronization)
- Compare the specifications
  - Official, English specification: 12 pages
  - Logical, precise specification: 200 lines
22. Key definition: read/write ordering
The Before order for an execution orders the reads and writes and determines what values are returned by reads. GoodExecutionOrder defines the good Before orders, namely the orders allowed by the memory model.
23. State machine actions
- ReceiveRequest(proc, req): Receive a request
- ChooseNewData(proc, idx): Choose the return value for a request
- Respond(proc, idx): Return the value to a request
- ExtendBefore: Expand the Before relation
- Actions preserve GoodExecutionOrder.
24. GoodExecutionOrder
- This is the hard part, but look how short it is!
GoodExecutionOrder ==
  LET (* some definitions deleted *)
  IN /\ (* Before is a partial order. *)
        /\ Before \subseteq ReqId \X ReqId
        /\ \A r1, r2 \in ReqId : IsBefore(r1, r2) => ~IsBefore(r2, r1)
        /\ \A r1, r2, r3 \in ReqId :
              IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
     /\ (* SourceOrder implies the Before order. *)
        \A r1, r2 \in ReqId : SourceOrder(r1, r2) => IsBefore(r1, r2)
     /\ (* RequestOrder implies the Before order. *)
        \A r1, r2 \in ReqId : RequestOrder(r1, r2) => IsBefore(r1, r2)
25. GoodExecutionOrder (continued)
     /\ (* Writes and successful SCs to the same location that *)
        (* have issued a response are totally ordered.         *)
        \A r1, r2 \in ReqId :
           /\ ReqIdQ[r1].req.type \in {"Wr", "SC"}
           /\ ReqIdQ[r1].req.newData # "Failed"
           /\ ReqIdQ[r1].req.responded
           /\ ReqIdQ[r2].req.type \in {"Wr", "SC"}
           /\ ReqIdQ[r2].req.newData # "Failed"
           /\ ReqIdQ[r2].req.responded
           /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
           => IsBefore(r1, r2) \/ IsBefore(r2, r1)
26. GoodExecutionOrder (continued)
     /\ (* LL/SC Axiom: For each successful SC, there is a matching LL and *)
        (* there is no write to the same address from a different         *)
        (* processor between the LL and SC in the Before order.           *)
        \A r2 \in ReqId :
           /\ ReqIdQ[r2].req.type = "SC"
           /\ ReqIdQ[r2].newData \notin {Failed, NotChosen}
           => \E r1 \in ReqId :
                 /\ LLSCPair(r1, r2)
                 /\ \A r \in ReqId :
                       /\ \/ ReqIdQ[r].req.type = "Wr"
                          \/ /\ ReqIdQ[r].req.type = "SC"
                             /\ ReqIdQ[r].newData \notin {NotChosen, Failed}
                       /\ r[1] # r2[1]
                       /\ ReqIdQ[r2].req.adr = ReqIdQ[r].req.adr
                       => ~IsBefore(r1, r) \/ ~IsBefore(r, r2)
27. GoodExecutionOrder (continued)
     /\ (* Value Axiom: A read reads from the preceding write in the *)
        (* Before order.                                             *)
        \A r1, r2 \in ReqId :
           /\ ReqIdQ[r2].source # NoSource
           /\ ReqIdQ[r1].req.type = "Wr"
           /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
           => IF ReqIdQ[r2].source = FromInitMem
                 THEN ~IsBefore(r1, r2)
                 ELSE \/ ~IsBefore(ReqIdQ[r2].source, r1)
                      \/ ~IsBefore(r1, r2)
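Two of the conditions in GoodExecutionOrder can be checked mechanically on a small, explicit execution. The following Python sketch is an illustration under stated assumptions, not a translation of the specification: it assumes the partial-order conjuncts amount to asymmetry and transitivity over pairs of request ids, and it reads the Value Axiom as "no write to the same address lies between a read's source and the read in the Before order". The function names, the dictionary layout of `reqs`, and the `"init"` marker for initial memory are hypothetical.

```python
from itertools import product


def is_partial_order(before, req_ids):
    """Check that `before` (a set of (r1, r2) pairs) is a strict partial order."""
    ids = set(req_ids)
    within = all(a in ids and b in ids for a, b in before)
    # Asymmetry also rules out (a, a), so the order is irreflexive.
    asymmetric = all((b, a) not in before for a, b in before)
    transitive = all(
        (a, c) in before
        for (a, b), (b2, c) in product(before, repeat=2)
        if b == b2
    )
    return within and asymmetric and transitive


def value_axiom_ok(reqs, before):
    """Check that no write to the same address sits between source and read.

    reqs maps a request id to {"type", "adr", "source"}; "source" is None
    (no source chosen), "init" (initial memory), or the id of a write.
    """
    for r2, read in reqs.items():
        src = read["source"]
        if src is None:
            continue
        for r1, req in reqs.items():
            if req["type"] != "Wr" or req["adr"] != read["adr"]:
                continue
            if src == "init":
                # A read of initial memory must not follow any such write.
                if (r1, r2) in before:
                    return False
            elif (src, r1) in before and (r1, r2) in before:
                # Write r1 lies strictly between the source and the read.
                return False
    return True
```

Checks of this kind only test one given execution; the specification instead requires every action of the state machine to preserve the condition.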
28. Step 2: Model abstract protocol
- protocol = abstract protocol + implementation junk
- Surprisingly,
  - abstract protocol's correctness was far from obvious
  - we discovered a bug in the memory model
- Proved hardest part of correctness
  - 35-line invariant based on 300 lines of definitions
  - 550-line proof, cases nested 10 levels deep
29. Step 3: Model complete protocol
- Protocol: 9 man-months, 1900 lines of TLA+
- Partial proof: 7 man-months, 1000-line (partial) invariant
30. Obstacle: multiple descriptions
- English documents: 10 documents, 2-inch stack
- Lisp code: crucial to understanding some details
- None compact, none mathematically tractable
- Solution: write our own model
  - We used TLA+
31. Obstacle: algorithm complexity
- ChangeToDirty DummyRdVic FailedChangeToDirty
Fetch InvalToDirty InvalToDirtyVic Rd RdMod RdVic
RdVicMod QV_Fetch QV_Rd QV_RdMod WrVic
ChangeToDirtyFailure ChangeToDirtySuccess
FetchFillMarker FillMarker FillMarkerMod
ForwardFetch ForwardFetchWithFetchFillMarker
ForwardRd ForwardRdMod ForwardRdWithFillMarker
ForwardRdModWithFillMarkerMod InvalAck
InvalToDirtySuccess Invalidate LoopComsig
LoopComsigWithInvalAck LoopComsigWithShadowClear
LoopComsigWithShadowInvalAndShadowClear
ShadowChangeToDirtySuccess ShadowForwardFetch
ShadowForwardRd ShadowForwardRdMod
ShadowInvalToDirtySuccess ShadowInvalidate
ShadowShortFillMod ShadowSnap ShortFetchFill
ShortFill ShortFillMod VictimAck FetchFill Fill
FillMod VCFetchFill VCFill VCFillMod
32. Solution: Quarks
- Ack
- ChangeToDirty
- Clear
- Comsig
- Fill
- ForwardedGet
- GetValue
- InvalidToDirty
- QuadInvalidate
- ReleaseMAF
- ReleaseVDB
- SetCacheLineState
- Victimize
- Write
Quarks combine to form messages.
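The idea that a small set of quarks combines to form the many protocol messages can be sketched by modeling a message as a set of quarks plus data fields and processing it one quark at a time. This Python sketch is hypothetical: `make_message`, `process`, and the handler table are illustrative names, and processing quarks in alphabetical order is an arbitrary choice made for the example, not something stated in the source.

```python
QUARKS = frozenset({
    "Ack", "ChangeToDirty", "Clear", "Comsig", "Fill", "ForwardedGet",
    "GetValue", "InvalidToDirty", "QuadInvalidate", "ReleaseMAF",
    "ReleaseVDB", "SetCacheLineState", "Victimize", "Write",
})


def make_message(*quarks, **fields):
    """Build a message as a set of quarks plus named data fields."""
    unknown = set(quarks) - QUARKS
    if unknown:
        raise ValueError(f"unknown quarks: {sorted(unknown)}")
    return {"quarks": frozenset(quarks), "fields": dict(fields)}


def process(msg, handlers):
    """Apply one handler per quark; return the list of handler results.

    Only one handler per quark is needed, rather than one per message type.
    """
    return [
        handlers[q](msg["fields"])
        for q in sorted(msg["quarks"])   # arbitrary but deterministic order
        if q in handlers
    ]
```

A usage example: `process(make_message("Fill", "Ack", adr=1), handlers)` calls the `Ack` handler and then the `Fill` handler with the same fields, so each new message type costs nothing once its quarks are defined.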
33. Protocol example
If a processor receives a Fill quark carrying cacheable data, then how is the cache updated?
ProcFieldsMessage(proc, msg) ==
  /\ ...
  /\ Cache' = CASE ...
       [] ("Fill" \in msg) /\ (subtype("Fill") # "Fetch") ->
            [Cache EXCEPT
               ![proc, cacheIndex].state =
                  IF subtype("Fill") = "Mod"
                    THEN "ExclusiveDirty"
                    ELSE "Clean",
               ![proc, cacheIndex].tag = AddressToTag(msg.adr),
               ![proc, cacheIndex].data = msg.data]
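The same cache update can be written as a pure function that returns a new cache and leaves the old one untouched, which mirrors how EXCEPT builds a modified copy of a function. This Python sketch rests on assumptions: it treats a Fill whose subtype is "Fetch" as carrying non-cacheable data, it leaves the cache unchanged in that case because the other CASE arms are elided in the source, and `apply_fill` with its parameter list is a hypothetical name.

```python
def apply_fill(cache, proc, cache_index, subtype, adr, data, address_to_tag):
    """Return a new cache with the line at (proc, cache_index) filled.

    cache maps (proc, cache_index) to a dict with keys state, tag, data.
    """
    if subtype == "Fetch":
        # Assumption: not covered by this CASE arm, so nothing changes here.
        return cache
    line = dict(cache[(proc, cache_index)])
    line["state"] = "ExclusiveDirty" if subtype == "Mod" else "Clean"
    line["tag"] = address_to_tag(adr)
    line["data"] = data
    new_cache = dict(cache)              # copy, as EXCEPT does
    new_cache[(proc, cache_index)] = line
    return new_cache
```

Returning a fresh mapping keeps the old and new states available side by side, which is what a next-state relation over Cache and Cache' describes.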
34. The low-level invariant
- Define protocol in terms of quarks.
- Define an invariant describing all reachable states.
We considered only the most difficult parts.
[Diagram: cache, dtag, and directory connected by messages, on quad and off quad.]
35. Dir-DTag Invariant
DirDTagInvariant ==
  \A adr \in MemBlockAddress, proc \in Processor :
    a.\/ (* local address *) ...
    b.\/ (* nonlocal address *)
         1./\ ProcToQuad(proc) # AddressToQuad(adr)
         2./\ a.\/ (* proc is the owner of adr *)
                   1./\ Dir[adr].owner = proc
                   2./\ a.\/ (* dtag is dirty *)
                             1./\ DTagState(adr, proc) = Dirty ...
                             2./\ Proj(HomeToArbQ) FG QFI QI AckWrite QI AGV(mod,1) FG AckCTD(Success) FG
                        b.\/ (* dtag is invalid *) ...
                        c.\/ (* dtag is clean *) ...
              b.\/ (* proc is not the owner of adr *) ...
DTagCacheInvariant == ...
Mother == DirDTagInvariant /\ DTagCacheInvariant /\ ...
36. DTag-Cache Invariance
ASSUME /\ Mother /\ Wildfire
       /\ DTagCacheInvariant(proc,adr)
PROVE  DTagCacheInvariant(proc,adr)'
<1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)
<1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)
<1>3. QED
37. DTag-Cache Invariance
ASSUME /\ Mother /\ Wildfire
       /\ DTagCacheInvariant(proc,adr)
PROVE  DTagCacheInvariant(proc,adr)'
<1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)
  <2>1. CASE a2a (* AddressCache(proc, adr).state' = "Invalid" *)
  <2>2. CASE a2b (* AddressCache(proc, adr).state' # "Invalid" *)
  <2>3. QED
<1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)
<1>3. QED
38. DTag-Cache Invariance
ASSUME /\ Mother /\ Wildfire
       /\ DTagCacheInvariant(proc,adr)
PROVE  DTagCacheInvariant(proc,adr)'
<1>1. CASE a (* 1./\ DTagState(proc, adr) = "Invalid" *)
  <2>1. CASE a2a (* 1. AddressCache(proc, adr).state' = "Invalid" *)
    ...
      <14>1. CASE doing something at the proc
        Pf: ....
      <14>2. CASE doing something at the arb
      <14>3. QED
    ...
  <2>2. CASE a2b (* 1. AddressCache(proc, adr).state' # "Invalid" *)
  <2>3. QED
<1>2. CASE b (* 1./\ DTagState(proc, adr) # "Invalid" *)
<1>3. QED
39. The low-level refinement
- For the abstract protocol, we defined the Before ordering for the protocol.
- For the low-level protocol, we defined an invariant describing the reachable states.
- Now use the invariant to prove that the Before ordering is the actual low-level ordering.
- This refinement proof has not been done.
40. One bug found
- Quite unexpected to find only one bug!
- Fix was an easy bookkeeping modification.
- Demonstrating the bug requires
- four processors
- two memory locations
- fifteen messages
- Hand proof appears essential to finding this bug
- extensive simulation did not find it
- state space too large for exhaustive model
checking
41. Wildfire challenge problem
http://www.research.digital.com/SRC/personal/lamport/tla/wildfire-challenge.html
- We give you TLA+ models of
  - the Alpha memory model
  - the abstract protocol with one bug inserted
- and challenge you to find the bug.
- Incredibly, Georges Gonthier found it by inspection (plus a memory model mistake)!
42. TLC model checker
- Input: state machine in a rich subset of TLA+ (Initial, NextState)
- Input: configuration file making the state machine finite
- Input: invariant
- Output: minimal state trace from an initial state to a bad state
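The inputs and output listed above match the shape of a breadth-first explicit-state search: start from the initial states, apply the next-state relation, and report a shortest path to the first state that violates the invariant. The Python sketch below illustrates that general technique only; it is not TLC's implementation, and `check_invariant` and its parameters are hypothetical names. States must be hashable.

```python
from collections import deque


def check_invariant(initial_states, next_states, invariant):
    """Breadth-first search for a shortest trace to a state violating invariant.

    Returns the trace as a list of states, or None if every reachable
    state satisfies the invariant.
    """
    parent = {}                 # state -> predecessor (None for initial states)
    queue = deque()
    for s in initial_states:
        if s not in parent:
            parent[s] = None
            queue.append(s)
    while queue:
        s = queue.popleft()
        if not invariant(s):
            trace = []
            while s is not None:        # walk back to an initial state
                trace.append(s)
                s = parent[s]
            return trace[::-1]
        for t in next_states(s):
            if t not in parent:         # visit each state only once
                parent[t] = s
                queue.append(t)
    return None
```

Because the search proceeds level by level, the first violating state it dequeues is at minimal distance from an initial state, so the returned trace is as short as possible.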
43. TLC implementation
- Require no changes to TLA+ specifications
  - use the richness of TLA+, no primitive language
  - use configuration files instead
- Interpret specifications, don't compile them
  - better user interaction possible
- Use explicit state representation, not BDDs
  - BDD encoding of TLA+ formulas difficult
  - use canonical state representation, fingerprinting
  - use efficient disk-based state set and queue implementations
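Fingerprinting over a canonical state representation can be illustrated as follows: serialize each state in one fixed form, hash it to a 64-bit value, and keep only the hashes in the visited set. This Python sketch shows the general idea and makes no claim about TLC's actual hash function or storage layout; `fingerprint` and `FingerprintSet` are hypothetical names, and states are assumed to be dictionaries with sortable keys and values whose `repr` is deterministic.

```python
import hashlib


def fingerprint(state):
    """Hash a canonical serialization of the state to a 64-bit integer."""
    # Sorting the items makes the serialization independent of insertion order.
    canonical = repr(sorted(state.items())).encode("utf-8")
    digest = hashlib.blake2b(canonical, digest_size=8).digest()
    return int.from_bytes(digest, "big")


class FingerprintSet:
    """Visited-state set that stores fingerprints instead of full states."""

    def __init__(self):
        self._seen = set()

    def add(self, state):
        """Return True if the state is new, False if already seen."""
        fp = fingerprint(state)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True

    def __len__(self):
        return len(self._seen)
```

The trade-off is a small probability that two distinct states share a fingerprint and one of them is wrongly skipped, in exchange for a fixed, compact amount of storage per state.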
44. TLC status
- 20,000 lines of Java
- Available to alpha testers under nondisclosure
- Performance is good, sometimes slow; threaded and distributed implementations now exist.
- Liveness checking/livelock detection coming
- Coverage analysis is desired: does lack of an error mean a correct spec or a buggy spec?
45. EV7 cache coherence
- First intense application of TLC model checker
- First TLA+ specification written by engineers
  - Specification is 1800 lines
  - Specification accepted by TLC w/o modification
- State space reduced 50% by adding 15 lines to remove a lot of symmetry in state space
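Why removing symmetry shrinks the state space can be seen in a toy setting where processors are interchangeable: states that differ only by a permutation of processors are collapsed to one representative. The Python sketch below is a hypothetical illustration and does not reproduce the 15 lines added to the EV7 specification; it assumes a state is just a tuple of per-processor local states, so sorting the tuple yields a canonical form.

```python
from itertools import product


def canonical(state):
    """Pick one representative per class of processor permutations."""
    return tuple(sorted(state))


def count_states(local_states, num_procs):
    """Return (raw count, count after symmetry reduction)."""
    raw = list(product(local_states, repeat=num_procs))
    reduced = {canonical(s) for s in raw}
    return len(raw), len(reduced)
```

With three local states and three processors, `count_states("ISE", 3)` compares the full product of local states against the number of distinct sorted tuples; the gap widens quickly as processors are added.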
46. Results
- 73 bugs found (90% found by TLC)
  - 37 minor: typos, type errors, etc.
  - 12 bugs: wrong message/wrong state
  - 14 missing cases
  - 7 spurious cases (dead code)
  - 3 miscellaneous (1 TLA+, 1 MC, 1 spec design)
- War story: find bug B by hand; find bug B' like B by simulation; find bug B'' in bug-fix for B; find ??? written in original documentation!
47. Lessons learned
- Learning TLA+ is not a major task, but writing good specifications still requires experience
- EV6 verification was
  - humbling: only one error actually found
  - encouraging: the basic method works as expected
- EV7 verification was very satisfying
  - TLA+ specifications can be written by engineers
  - TLC can handle industrial-sized specifications
- Formal specification belongs in design process