Verification of cache-coherence protocols with TLA+

Description:

Title: The Standard Theory of Wildfire Author: CRL Last modified by: Mark R. Tuttle Created Date: 6/17/1995 11:31:02 PM Document presentation format – PowerPoint PPT presentation

1
Verification of cache-coherence protocols with TLA+

Homayoon Akhiani, Damien Doligez, Paul Harter, Leslie Lamport, Joshua Scheid, Mark Tuttle, Yuan Yu
Compaq Computer Corporation

2
TLA+

A formal specification language based on set theory, first-order logic, temporal logic
Engineers find reading easy, writing not too hard

CacheUnmodified(adr) == \/ SharedMode(adr)
                        \/ /\ ExclusiveMode(adr)
                           /\ ~DirtyBitSet(adr)

Cache' = [Cache EXCEPT ![adr].state = Invalid]
3
Used TLA+ to demonstrate formal methods to engineering

Analyzed cache-coherence protocols for
  EV6: Alpha 21264 processor
  EV7: Alpha 21364 processor
Built TLC, a model-checker for TLA+
Analyzed proposals for industry standards
  PCI-X,

4
EV6 cache coherence
[Figure: processors P1 to P4 with memory and directory; the directory entry for x records its owner and copies.]

To get x, go to x's directory to see who owns x.

5
Shared read, data in memory
[Figure: requester R issues Rd(x); the directory entry for adr x has owner none and copies S,S,S.]
6
Shared read, remote owner
[Figure: requester R issues Rd(x); the directory entry for adr x has owner O and copies S,S,S.]
7
Exclusive read, data in memory
[Figure: requester R issues RdEx(x); the directory entry for adr x has owner none and copies S,S,S.]
8
Exclusive read, remote owner
[Figure: requester R issues RdEx(x); the directory entry for adr x has owner O and copies S,S,S.]
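
The four read cases above can be sketched as a toy directory. This is a minimal illustration, not the EV6 protocol: the `DirEntry` class, the `read` function, and the exact message routing (who sends `Data`, what happens to the owner after a shared read) are assumptions made for the example; only the message names Rd, RdEx, FwdRd, FwdRdEx, Inval, and Data come from the slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirEntry:
    owner: Optional[str] = None              # None models "data in memory"
    copies: set = field(default_factory=set)  # processors holding shared copies


def read(directory, adr, requester, exclusive=False):
    """Return the messages a toy directory sends for Rd(adr) or RdEx(adr)."""
    entry = directory.setdefault(adr, DirEntry())
    msgs = []
    if entry.owner is None:
        # Data in memory: the directory answers directly.
        msgs.append(("Data", "Dir", requester))
    else:
        # Remote owner: forward the request, and the owner supplies the data.
        kind = "FwdRdEx" if exclusive else "FwdRd"
        msgs.append((kind, "Dir", entry.owner))
        msgs.append(("Data", entry.owner, requester))
    if exclusive:
        # Exclusive read: invalidate the other copies, requester becomes owner.
        for p in sorted(entry.copies - {requester}):
            msgs.append(("Inval", "Dir", p))
        entry.owner, entry.copies = requester, set()
    else:
        entry.copies.add(requester)
    return msgs
```

For example, a shared read of an unowned line returns a single `Data` message, while an exclusive read of a line with two sharers returns `Data` followed by two `Inval` messages.
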
9
No InvalAcks
[Figure: messages RdEx(x) and Inval among R, Dir, and S; the InvalAck is marked NO!]
Fewer messages sent, and R not blocked waiting for InvalAck. Now correctness depends on network message ordering.
10
No dirty write backs required
[Figure: messages RdEx(x), FwdRdEx(x), and Data among R, Dir, and owner O; the WriteBack is marked NO!]
Fewer messages sent. Now correctness depends on the owner always holding the data.
11
Chains of requests
[Figure: requesters R1, R2, R3 and Dir.]
12
Memory barriers
All memory ordering imposed by memory barriers.
  read flag
  MB
  read data
How do we know when this ordering has been determined? The answer is highly optimized.
13
Separate commit/data responses
[Figure: messages Rd(x), FwdEx(x), Data, and Commit among R, Dir, and owner O.]
MB passed when all outstanding commits are received. Commits generated as early as possible!
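
The rule "MB passed when all outstanding commits are received" can be sketched as simple bookkeeping. The `Processor` class and its method names are hypothetical, chosen only for this illustration; the point is that the barrier waits on commits, not on data.

```python
class Processor:
    """Toy model: a memory barrier waits for commits, not for data."""

    def __init__(self):
        self.outstanding = set()  # addresses requested, commit not yet received
        self.data = {}            # address -> value, filled by Data messages

    def issue(self, adr):
        self.outstanding.add(adr)

    def on_commit(self, adr):
        # A Commit fixes the request's place in the memory order.
        self.outstanding.discard(adr)

    def on_data(self, adr, value):
        # Data may arrive before or after the matching Commit.
        self.data[adr] = value

    def can_pass_mb(self):
        return not self.outstanding
```

In this sketch a processor that has received the Commit for `flag` can pass the MB even if the Data for `flag` is still in flight, which is the speed-up that separate commit and data messages are meant to give.
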
14
Significant speed ups
Data can be returned faster.
[Figure: R, Data, and the message sequence Inval(y) Inval(z) Commit.]
MB can be passed faster.
[Figure: R, Dir, Data, and commit alongside the sequence read flag, MB, read data.]
But now verification is much harder.
15
Hierarchical network
[Figure: global switch; memory and directory; local switches; processors.]
At the home node, always satisfy requests locally if possible...
16
Deadlock: the deadly embrace
[Figure: home x and home y.]
Deadlock: FwdRds are stalled waiting for data to arrive.
17
Shadow mode
[Figure: Rd(x) and FwdRd(x).]
FwdRd is a shadow starter (when the reader is on the home node)
Subsequent messages are shadowed in shadow mode (bounced off the global switch)
18
Shadow mode solves deadlock
[Figure: FwdRd(x) and FwdRd(y) messages between home x and home y.]
Data travels in a separate channel; other messages don't block data. Deadlock gone.
19
This is not your father's cache-coherence protocol!

Protocol is highly optimized
  No InvalAcks or NoAcks, no Dirty Write Backs
  Long chains of data forwarding
  Separate commit/data messages
  Aggressive early commit generation
  Shadow mode
Protocol was the largest to be analyzed with formal methods (to our knowledge as of 1997).

20
EV6 cache coherence in three easy steps: two man-years
Model Alpha memory model. (200 lines)
  Prove implementation (550 lines, 2 months, informal)
Model abstract protocol. (500 lines)
  Prove implementation (5500 lines, 4 months, incomplete)
Model complete protocol. (2000 lines, 3 months)
21
Step 1: Alpha memory model

We specified the Alpha memory model
  The official specification is an informal description of the allowed sequences of reads and writes.
  We needed a precise, state-based specification.
  We specified a slightly simplified memory model (whole cache line access, common point of synchronization).
Compare the specifications
  Official, English specification: 12 pages
  Logical, precise specification: 200 lines

22
Key definition: read/write ordering
Before order for an execution orders reads/writes and determines what values are returned by reads. GoodExecutionOrder defines good Before orders, namely the orders allowed by the memory model.
23
State machine actions

ReceiveRequest(proc, req): Receive a request
ChooseNewData(proc, idx): Choose the return value for a request
Respond(proc, idx): Return the value to a request
ExtendBefore: Expand the Before relation
Actions preserve GoodExecutionOrder.

24
GoodExecutionOrder

This is the hard part --- but look how short it is!

GoodExecutionOrder ==
  LET (* some definitions deleted *)
  IN  /\ (* Before is a partial order. *)
         /\ Before \subseteq ReqId \X ReqId
         /\ \A r1, r2 \in ReqId :
               IsBefore(r1, r2) => ~IsBefore(r2, r1)
         /\ \A r1, r2, r3 \in ReqId :
               IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
      /\ (* SourceOrder implies the Before order. *)
         \A r1, r2 \in ReqId : SourceOrder(r1, r2) => IsBefore(r1, r2)
      /\ (* RequestOrder implies the Before order. *)
         \A r1, r2 \in ReqId : RequestOrder(r1, r2) => IsBefore(r1, r2)
25
      /\ (* Writes and successful SCs to the same location that *)
         (* have issued a response are totally ordered.         *)
         \A r1, r2 \in ReqId :
            /\ ReqIdQ[r1].req.type \in {"Wr", "SC"}
            /\ ReqIdQ[r1].req.newData # "Failed"
            /\ ReqIdQ[r1].req.responded
            /\ ReqIdQ[r2].req.type \in {"Wr", "SC"}
            /\ ReqIdQ[r2].req.newData # "Failed"
            /\ ReqIdQ[r2].req.responded
            /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
            => IsBefore(r1, r2) \/ IsBefore(r2, r1)
26
      /\ (* LL/SC Axiom: For each successful SC, there is a matching LL and *)
         (* there is no write to the same address from a different          *)
         (* processor between the LL and SC in the Before order.            *)
         \A r2 \in ReqId :
            /\ ReqIdQ[r2].req.type = "SC"
            /\ ReqIdQ[r2].newData \notin {Failed, NotChosen}
            => \E r1 \in ReqId :
                  /\ LLSCPair(r1, r2)
                  /\ \A r \in ReqId :
                        /\ \/ ReqIdQ[r].req.type = "Wr"
                           \/ /\ ReqIdQ[r].req.type = "SC"
                              /\ ReqIdQ[r].newData \notin {NotChosen, Failed}
                        /\ r[1] # r2[1]
                        /\ ReqIdQ[r2].req.adr = ReqIdQ[r].req.adr
                        => ~IsBefore(r1, r) \/ ~IsBefore(r, r2)
27
      /\ (* Value Axiom: A read reads from the preceding write in the *)
         (* Before order.                                             *)
         \A r1, r2 \in ReqId :
            /\ ReqIdQ[r2].source # NoSource
            /\ ReqIdQ[r1].req.type = "Wr"
            /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
            => IF ReqIdQ[r2].source = FromInitMem
                 THEN ~IsBefore(r1, r2)
                 ELSE \/ ~IsBefore(ReqIdQ[r2].source, r1)
                      \/ ~IsBefore(r1, r2)
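
Two of the conditions above, that Before is a partial order and that a read takes its value from the preceding write, can be checked by brute force on a small execution. The sketch below is one reading of those conditions and not a translation of the specification: the dictionary layout of requests, the `FROM_INIT_MEM` marker, and the function names are assumptions made for the example.

```python
from itertools import product

FROM_INIT_MEM = "FromInitMem"  # marker for a read served from initial memory


def is_strict_partial_order(req_ids, before):
    """before is a set of (r1, r2) pairs meaning r1 precedes r2."""
    def is_before(a, b):
        return (a, b) in before

    in_domain = all(a in req_ids and b in req_ids for a, b in before)
    asymmetric = all(not (is_before(a, b) and is_before(b, a))
                     for a, b in product(req_ids, repeat=2))
    transitive = all(is_before(a, c)
                     for a, b, c in product(req_ids, repeat=3)
                     if is_before(a, b) and is_before(b, c))
    return in_domain and asymmetric and transitive


def value_axiom_holds(requests, before):
    """requests maps id -> {"type", "adr", "source"}.

    "source" is the id of the write a read took its value from,
    FROM_INIT_MEM, or None when no source has been chosen yet.
    """
    def is_before(a, b):
        return (a, b) in before

    for r2, reader in requests.items():
        src = reader["source"]
        if src is None:
            continue
        for r1, writer in requests.items():
            if writer["type"] != "Wr" or writer["adr"] != reader["adr"]:
                continue
            if src == FROM_INIT_MEM:
                if is_before(r1, r2):      # a write precedes an initial-memory read
                    return False
            elif is_before(src, r1) and is_before(r1, r2):
                return False               # a write sits between source and read
    return True
```

With two writes `a` then `b` to the same address and a read `c` after both, the check fails if `c` claims `a` as its source and passes if it claims `b`.
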
28
Step 2: Model abstract protocol

protocol = abstract protocol + implementation junk
Surprisingly,
  abstract protocol's correctness was far from obvious
  we discovered a bug in the memory model
Proved hardest part of correctness
  35-line invariant based on 300 lines of definitions
  550-line proof, cases nested 10 levels deep

29
Step 3: Model complete protocol

Protocol: 9 man-months, 1900 lines of TLA+
Partial proof: 7 man-months, 1000-line (partial) invariant

30
Obstacle: multiple descriptions

English documents: 10 documents, 2-inch stack
Lisp code: crucial to understanding some details
None compact, none mathematically tractable
Solution: write our own model
We used TLA+

31
Obstacle: algorithm complexity

ChangeToDirty DummyRdVic FailedChangeToDirty
Fetch InvalToDirty InvalToDirtyVic Rd RdMod RdVic
RdVicMod QV_Fetch QV_Rd QV_RdMod WrVic
ChangeToDirtyFailure ChangeToDirtySuccess
FetchFillMarker FillMarker FillMarkerMod
ForwardFetch ForwardFetchWithFetchFillMarker
ForwardRd ForwardRdMod ForwardRdWithFillMarker
ForwardRdModWithFillMarkerMod InvalAck
InvalToDirtySuccess Invalidate LoopComsig
LoopComsigWithInvalAck LoopComsigWithShadowClear
LoopComsigWithShadowInvalAndShadowClear
ShadowChangeToDirtySuccess ShadowForwardFetch
ShadowForwardRd ShadowForwardRdMod
ShadowInvalToDirtySuccess ShadowInvalidate
ShadowShortFillMod ShadowSnap ShortFetchFill
ShortFill ShortFillMod VictimAck FetchFill Fill
FillMod VCFetchFill VCFill VCFillMod

32
Solution: Quarks

Ack
ChangeToDirty
Clear
Comsig
Fill
ForwardedGet
GetValue

InvalidToDirty
QuadInvalidate
ReleaseMAF
ReleaseVDB
SetCacheLineState
Victimize
Write

Quarks combine to form messages.
33
Protocol example
If a processor receives a Fill quark carrying cacheable data, then how is the cache updated?

ProcFieldsMessage(proc, msg) ==
  /\ ...
  /\ Cache' = CASE ...
       [] ("Fill" \in msg) /\ (subtype("Fill") # "Fetch")
            -> [Cache EXCEPT
                  ![proc, cacheIndex].state =
                      IF subtype("Fill") = "Mod"
                        THEN "ExclusiveDirty"
                        ELSE "Clean",
                  ![proc, cacheIndex].tag =
                      AddressToTag(msg.adr),
                  ![proc, cacheIndex].data =
                      msg.data]
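
The cache update for a Fill quark can be mirrored in ordinary code. This is a sketch under stated assumptions: a message is modeled as a dictionary whose `quarks` entry maps quark names to subtypes, a Fill with subtype "Fetch" is taken to be the non-cacheable case, and `address_to_tag` is a hypothetical stand-in for the tag computation. Like EXCEPT, the function returns an updated copy and leaves the old cache untouched.

```python
def address_to_tag(adr):
    # Hypothetical tag function: drop the low 6 bits (64-byte lines assumed).
    return adr >> 6


def apply_fill(cache, proc, cache_index, msg):
    """Return the cache after processing msg; cache maps (proc, index) -> line."""
    subtype = msg["quarks"].get("Fill")
    if subtype is None or subtype == "Fetch":
        return cache  # no Fill quark, or data that is not cacheable
    line = {
        "state": "ExclusiveDirty" if subtype == "Mod" else "Clean",
        "tag": address_to_tag(msg["adr"]),
        "data": msg["data"],
    }
    updated = dict(cache)  # functional update, in the spirit of EXCEPT
    updated[(proc, cache_index)] = line
    return updated
```
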

34
The low-level invariant

Define protocol in terms of quarks.
Define an invariant describing all reachable states.

We considered only the most difficult parts
[Figure: cache, dtag, and directory connected by messages, on quad and off quad.]
35
Dir - Dtag Invariant
DirDTagInvariant ==
  \A adr \in MemBlockAddress, proc \in Processor :
    a.\/ (* local address *) ...
    b.\/ (* nonlocal address *)
         1./\ ProcToQuad(proc) # AddressToQuad(adr)

         2./\ a.\/ (* proc is the owner of adr *)
                   1./\ Dir[adr].owner = proc
              b.\/ (* proc is not the owner of adr *) ...

         2./\ a.\/ (* dtag is dirty *)
                   1./\ DTagState(adr, proc) = Dirty ...
              b.\/ (* dtag is invalid *) ...
              c.\/ (* dtag is clean *) ...
         2./\ Proj(HomeToArbQ)
              FG QFI QI AckWrite QI AGV(mod,1) FG AckCTD(Success) FG
DTagCacheInvariant == ...
Mother == DirDTagInvariant /\ DTagCacheInvariant /\ ...
36
DTag-Cache Invariance

ASSUME /\ Mother /\ Wildfire
       /\ DTagCacheInvariant(proc,adr)
PROVE  DTagCacheInvariant(proc,adr)'
<1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)
<1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)
<1>3. QED

37
DTag-Cache Invariance

ASSUME /\ Mother /\ Wildfire
       /\ DTagCacheInvariant(proc,adr)
PROVE  DTagCacheInvariant(proc,adr)'
<1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)
  <2>1. CASE a2a (* AddressCache(proc, adr).state' = "Invalid" *)
  <2>2. CASE a2b (* AddressCache(proc, adr).state' # "Invalid" *)
  <2>3. QED
<1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)
<1>3. QED

38
DTag-Cache Invariance

ASSUME /\ Mother /\ Wildfire
       /\ DTagCacheInvariant(proc,adr)
PROVE  DTagCacheInvariant(proc,adr)'
<1>1. CASE a (* 1./\ DTagState(proc, adr) = "Invalid" *)
  <2>1. CASE a2a (* 1. AddressCache(proc, adr).state' = "Invalid" *)
    ...
      <14>1. CASE doing something at the proc
        Pf: ....
      <14>2. CASE doing something at the arb
      <14>3. QED
    ...
  <2>2. CASE a2b (* 1. AddressCache(proc, adr).state' # "Invalid" *)
  <2>3. QED
<1>2. CASE b (* 1./\ DTagState(proc, adr) # "Invalid" *)
<1>3. QED
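
The proof obligation has the shape "invariant plus one step implies invariant in the next state". On a state space small enough to enumerate, that shape can be checked by brute force instead of by hand. The two-field state, the three transitions, and the invariant below are all hypothetical and far simpler than the real dtag and cache; they only illustrate what it means for an invariant to be inductive.

```python
from itertools import product

VALUES = ("Invalid", "Clean", "Dirty")
STATES = list(product(VALUES, VALUES))  # (dtag state, cache state)


def inv(state):
    # Toy invariant: an invalid dtag implies an invalid cache line.
    dtag, cache = state
    return dtag != "Invalid" or cache == "Invalid"


def next_states(state, guarded=True):
    dtag, cache = state
    succ = [(dtag, "Invalid")]              # invalidate the cache line
    if dtag == "Invalid":
        succ.append(("Clean", "Clean"))     # fill: dtag and cache updated together
    if cache == "Invalid" or not guarded:
        succ.append(("Invalid", cache))     # drop the dtag entry
    return succ


def is_inductive(states, step, invariant):
    """Check Inv /\\ Next => Inv' over every state, reachable or not."""
    return all(invariant(t) for s in states if invariant(s) for t in step(s))
```

With the guard in place `is_inductive(STATES, next_states, inv)` holds; removing the guard on the last transition lets ("Clean", "Clean") step to ("Invalid", "Clean"), and the check fails.
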

39
The low-level refinement

For the abstract protocol, we defined the Before
ordering for the protocol.
For the low-level protocol, we defined an
invariant describing the reachable states.
Now use the invariant to prove that the Before
ordering is the actual low-level ordering.
This refinement proof has not been done.

40
One bug found

Quite unexpected to find only one bug!
Fix was an easy bookkeeping modification.
Demonstrating the bug requires
  four processors
  two memory locations
  fifteen messages
Hand proof appears essential to finding this bug
  extensive simulation did not find it
  state space too large for exhaustive model checking

41
Wildfire challenge problem
http://www.research.digital.com/SRC/personal/lamport/tla/wildfire-challenge.html

We give you TLA+ models of
  the Alpha memory model
  the abstract protocol with one bug inserted
and challenge you to find the bug.

Incredibly, Georges Gonthier found it by inspection (plus a memory model mistake)!

42
TLC model checker
Inputs:
  State machine in rich subset of TLA+ (Initial, NextState)
  Configuration file making state machine finite
  Invariant
Output:
  Minimal state trace from an initial state to a bad state
43
TLC implementation

Require no changes to TLA+ specifications
  use the richness of TLA+, no primitive language
  use configuration files instead
Interpret specifications, don't compile them
  better user interaction possible
Use explicit state representation, not BDDs
  BDD encoding of TLA+ formulas difficult
  use canonical state representation
  fingerprinting
  use efficient disk-based state set and queue implementation
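
The combination of explicit states, fingerprinting, and a queue amounts to breadth-first search over the state graph. The following is a minimal in-memory sketch of that idea, not TLC's implementation: the function names are hypothetical, the fingerprint is simply a truncated SHA-256 of the state's `repr`, and full states are kept alongside fingerprints so the trace can be rebuilt without re-execution. Because the search is breadth-first, the returned trace is a shortest one.

```python
import hashlib
from collections import deque


def fingerprint(state):
    # States must have a canonical repr (for example tuples of primitives).
    return hashlib.sha256(repr(state).encode()).digest()[:8]


def check_invariant(init_states, next_states, invariant):
    """Return a shortest trace to a state violating invariant, or None."""
    parent = {}      # fingerprint -> (state, fingerprint of predecessor)
    queue = deque()
    for s in init_states:
        fp = fingerprint(s)
        if fp not in parent:
            parent[fp] = (s, None)
            queue.append(s)
    while queue:
        s = queue.popleft()
        fp = fingerprint(s)
        if not invariant(s):
            trace = []
            while fp is not None:          # walk predecessor links back to Init
                state, fp = parent[fp]
                trace.append(state)
            return trace[::-1]
        for t in next_states(s):
            ft = fingerprint(t)
            if ft not in parent:           # unseen state: record and enqueue
                parent[ft] = (t, fp)
                queue.append(t)
    return None
```

As a usage example, a counter that starts at 0 and may step by 1 or 2 violates the invariant "never equal to 5" after three steps, and the function returns a four-state trace ending in 5.
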

44
TLC status

20,000 lines of Java
Available to alpha testers under nondisclosure
Performance is good, sometimes slow; threaded and distributed implementations now exist.
Liveness checking/livelock detection coming
Coverage analysis is desired: what does lack of an error mean, a correct spec or a buggy spec?

45
EV7 cache coherence

First intense application of TLC model checker
First TLA+ specification written by engineers
Specification is 1800 lines
Specification accepted by TLC w/o modification
State space reduced 50% by adding 15 lines to remove a lot of symmetry in state space
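
Removing symmetry means treating states that differ only by renaming interchangeable components as one state. The sketch below shows the general idea on a made-up example and makes no claim about how the EV7 specification did it: each state is a tuple of per-processor line states, the three state names are hypothetical, and sorting the tuple picks one representative per equivalence class. This is only sound when the specification treats all processors identically.

```python
from itertools import product

LINE_STATES = ("Invalid", "Shared", "Exclusive")  # hypothetical per-processor states


def canonical(state):
    """Choose one representative among all permutations of the processors."""
    return tuple(sorted(state))


def count_states(num_procs, reduce_symmetry):
    seen = set()
    for state in product(LINE_STATES, repeat=num_procs):
        seen.add(canonical(state) if reduce_symmetry else state)
    return len(seen)
```

For three processors the unreduced count is 27 and the reduced count is 10; the gap widens as processors are added.
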

46
Results

73 bugs found (90% found by TLC)
  37 minor: typos, type errors, etc.
  12 bugs: wrong message/wrong state
  14 missing cases
  7 spurious cases (dead code)
  3 miscellaneous (1 TLA+, 1 MC, 1 spec design)
War story: Find bug B by hand; find bug B' like B by simulation; find bug B'' in bug-fix for B'; find ??? written in original documentation!

47
Lessons learned

Learning TLA+ is not a major task, but writing good specifications still requires experience
EV6 verification was
  humbling: only one error actually found
  encouraging: the basic method works as expected
EV7 verification was very satisfying
  TLA+ specifications can be written by engineers
  TLC can handle industrial-sized specifications
Formal specification belongs in design process
