Formal Verification of Shared Memory Systems During their Design

About This Presentation

Title:

Formal Verification of Shared Memory Systems During their Design

Description:

FM and shared-memory system design. Shared-memory systems are complex! Designers need 'safety net' when exploring optimizations formal verification ... – PowerPoint PPT presentation

Slides: 65

Provided by: ganeshgopa

Learn more at: http://formalverification.cs.utah.edu

Transcript and Presenter's Notes

1
Formal Verification of Shared Memory
Systems During their Design

Ganesh Gopalakrishnan
Department of Computer Science
University of Utah
http://www.cs.utah.edu/ganesh

2
FM and shared-memory system design

Processor speed increasing at 55% per year -
memory speeds at 7%
Mismatch exacerbated by shared memory
multiprocessors
Complex protocols employed to hide memory
latencies
Need for formal verification techniques that can
be employed during design

3
Our Project: Utah Verifier
4
A Shared Memory Multiprocessor (a shared memory
system)
[Figure: CPUs and memories connected through an interconnect]
5
Classification: Symmetric Multi-Processors (SMP)
[Figure: CPUs and memory attached to a coherent snooping bus]

Potential bugs in complex bus designs
Deadlocks, lack of forward progress
Lack of coherency
Incorrect shared memory consistency model

6
2. Distributed Shared Memory (DSM) systems

[Figure: SMP nodes attached to a high-speed network]

Problems due to complex DSM protocols
Deadlocks, lack of forward progress,
Incorrect shared memory consistency models

7
Formal Methods for Shared Memory System Design
Verification
Provably-correct Synthesis
Theorem-proving
Finite-state Reachability
Model-checking
Protocol
Low-level concerns (e.g. deadlocks, progress,...)
Higher-level concerns (e.g. shared memory
consistency models)
8
Results of the UV group

New Partial Order reduction algorithm
Realized in verifier called PV
Outperforms SPIN 10 to 1 on most examples
Selective state-caching is available for free
A DSM Protocol synthesis algorithm
Safety of synthesis proved correct using PVS
Derives realistic (hand-quality) DSM protocols
Incorporates a scalable buffer-reservation scheme
Verifying Formal Memory Models

9
Protocol Refinement
10
Motivations

Distributed directory based coherence protocols
difficult to understand and debug
low-level requests / acks / nacks don't reveal
what is being implemented
transient states are introduced and handled in an
ad-hoc way
buffer allocation is not tied to desired
high-level properties (e.g. progress)
verification is tedious

11
Example of problems due to unexpected msgs
[Figure: message exchange between Cache Ctrlr and Directory Ctrlr]
12
Our approach

Based on synthesis
Transient states introduced automatically
Buffer allocation is tied to desired high-level
properties (e.g. progress)
Verification becomes much easier
Synthesized protocols seem efficient

13
Overview of Synthesis Method
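[Figure: Cache Ctrlr (states I, E) and Dir Ctrlr (states F, E), shown twice; in the second version the controllers exchange Req and (N)ack messages]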
14
Model-checking Efficiency
15
An Illustration: Migratory Protocol (i)
[Figure: state machines of the migratory protocol.
Process h: states F, E, I1, I2; actions r(i)?req, r(i)!gr(data), r(j)?req, r(o)!inv, r(o)?LR(data), r(o)?ID(data), r(j)!gr(data).
Process r(i): states I, V, V1, V2, I3; actions h!req, h?gr(data), rw, evict, h!LR(data), h?inv, h!ID(data).]
16
An Illustration: Migratory Protocol (ii)
[Figure: the same state machines for Process h and Process r(i) as in part (i)]
17
A Generic Example
[Figure: processes P, Q and R with rendezvous actions P?x, R!b, Q!a and Q!c]
18
Async Implementation of Example (i)
[Figure: processes P, Q and R with rendezvous actions R!b and Q!a and asynchronous sends R!!b and Q!!a]
1 msg buffer location for Ack/Nack
19
Async Implementation of Example (ii)
[Figure: processes P, Q and R with rendezvous actions R!b and Q!a, a Progress Buffer, and asynchronous sends Q!!c, R!!b, Q!!a and P!!ack]
20
Organization of Protocol - per Cache Line
Remote Nodes
Home Node
- Remote nodes (cache ctrlrs) communicate w. home directory controller only
- If Remote and Home requests cross in medium,
  . Remote request treated as Nack by Home
  . Home request is dropped by Remote
- Pt-to-pt order-preserving error-free communication
21
General Nature of Communication States
[Figure: communication states. Remote: transient state T with actions h!msg, h?m1, h?m2. Home: transient state T with actions r(i)?m1, r(j)!m2.]
22
Summary: Remote node rules
23
Summary: Home node (i)
24
Summary: Home node (ii)
25
Status of Work

Correctness of Protocol Synthesis Proved in PVS
Write-invalidate protocol also synthesized
Offers a general synthesis method for protocols
(not necessarily for DSM)
Related work: Buckley and Silberschatz, Chandra
et al., Park and Dill, Gribomont, ...

26
Verifying Conformance to Formal Memory Models
27
FM and shared-memory system design

Shared-memory systems are complex!
Designers need a safety net when exploring
optimizations: formal verification
We focus on verifying that a (finite-state model
of a) shared memory system provides the required
memory model (mainly Sequential Consistency)
E.g. verify a Cache Coherence Protocol for SC
Our approach: finite-state reachability analysis

28
Importance of Memory Models -- An Example

Peterson's algorithm for mutex under a memory
model called TSO
P1: A = 1; turn = 2; while (B /\ turn == 2) ; ..CS..
P2: B = 1; turn = 1; while (A /\ turn == 1) ; ..CS..
Must Specify Synchronization Routines and the
Shared Memory Consistency Model(s) under which
they work!
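
To make the point concrete, here is a minimal Python sketch that exhaustively explores a two-CPU model of Peterson's entry protocol, once with stores going straight to memory (SC) and once with a per-CPU FIFO store buffer (a common operational reading of TSO). The encoding is an assumption made for illustration: flags stand for A and B, turn holds the id of the other CPU, each CPU only tries to enter once, and the names `step` and `mutex_can_fail` are hypothetical.

```python
from collections import deque

FLAG = (0, 1)   # memory indices of the two flags (A and B on the slide)
TURN = 2        # memory index of the shared turn variable
CS = 4          # program counter value meaning "inside the critical section"

def read(addr, buf, mem):
    """Load: the newest matching entry in the CPU's own store buffer wins."""
    for a, v in reversed(buf):
        if a == addr:
            return v
    return mem[addr]

def step(state, me, tso):
    """Yield every state reachable by one move of CPU `me`."""
    pcs, bufs, mem = state
    pc, buf, other = pcs[me], bufs[me], 1 - me

    def build(new_pc, new_buf, new_mem):
        p, b = list(pcs), list(bufs)
        p[me], b[me] = new_pc, new_buf
        return (tuple(p), tuple(b), new_mem)

    def store(addr, val, new_pc):
        if tso:   # assumed TSO model: the store waits in a FIFO buffer
            return build(new_pc, buf + ((addr, val),), mem)
        return build(new_pc, buf, mem[:addr] + (val,) + mem[addr + 1:])

    if buf:       # drain the oldest buffered store to memory
        a, v = buf[0]
        yield build(pc, buf[1:], mem[:a] + (v,) + mem[a + 1:])
    if pc == 0:   # flag[me] = 1
        yield store(FLAG[me], 1, 1)
    elif pc == 1: # turn = other
        yield store(TURN, other, 2)
    elif pc == 2: # while (flag[other] ...
        yield build(3 if read(FLAG[other], buf, mem) else CS, buf, mem)
    elif pc == 3: # ... /\ turn == other)
        yield build(2 if read(TURN, buf, mem) == other else CS, buf, mem)

def mutex_can_fail(tso):
    """Breadth-first search for a state with both CPUs in the critical section."""
    init = ((0, 0), ((), ()), (0, 0, 0))
    seen, work = {init}, deque([init])
    while work:
        state = work.popleft()
        if state[0] == (CS, CS):
            return True
        for me in (0, 1):
            for nxt in step(state, me, tso):
                if nxt not in seen:
                    seen.add(nxt)
                    work.append(nxt)
    return False
```

In this model `mutex_can_fail(tso=False)` finds no violation, while `mutex_can_fail(tso=True)` does: both CPUs can read the other's flag from memory while the flag stores are still sitting in the store buffers.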
29
Impact on CPU design -- Do Read-Speculation Right!
MEM: ..wr(a,2).. wr(b,3)..
bus
CPU1: wr(a,2) - Miss; rd(b, 0) - Speculate; Snoop wr(a) - Spec OK
CPU2: wr(b,3) - Miss; rd(a, 0) - Speculate; Snoop wr(a) - Spec not OK, reissue rd(a, 2)
Without reissue, results are inconsistent with SC
30
Basis for our work: ARCHTEST (Collier)

Multi-threaded C programs
Used to debug actual multiprocessor machines
unavailable at design-time
Based on the theory of graph-sets
used in our work also
Our CAV98 work: adapt Collier's tests for
model-checking
incomplete
This work: a complete verification method (sound
too!)

31
What is a shared memory model?
Captured by the set of all executions of a
concurrent program!
[Figure: Execution 1 and Execution 2, placed within the sets of executions allowed by SC and by TSO]
TSO allows more executions than SC (hence
weaker)
32
An Operational Definition of SC and TSO
[Figure: SC: cpu1 and cpu2 reach Memory through a MUX. TSO: cpu1 and cpu2 each reach Memory through a fifo.]
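
The two operational pictures can be turned into a small outcome enumerator. The sketch below is a hedged illustration, not the talk's formal model: the program encoding, the function name `outcomes`, the initial value 0 for every address, and the rule that a read first checks its own fifo are all assumptions.

```python
def outcomes(programs, tso):
    """All tuples of read results reachable by interleaving `programs`.

    programs: one list per CPU of ("wr", addr, value) or ("rd", addr) ops.
    tso=False: every store goes straight to memory (the MUX picture, SC).
    tso=True : every store enters a per-CPU FIFO first (the fifo picture, TSO).
    """
    n = len(programs)
    addrs = sorted({op[1] for prog in programs for op in prog})
    # state = (program counters, store buffers, memory, reads seen per CPU)
    start = ((0,) * n, ((),) * n, tuple((a, 0) for a in addrs), ((),) * n)
    seen, stack, finals = {start}, [start], set()

    def put(t, i, x):               # copy of tuple t with slot i replaced by x
        return t[:i] + (x,) + t[i + 1:]

    def write(mem, addr, val):
        return tuple((a, val if a == addr else v) for a, v in mem)

    while stack:
        pcs, bufs, mem, reads = stack.pop()
        nxt = []
        for cpu in range(n):
            buf = bufs[cpu]
            if buf:                 # memory accepts the oldest buffered store
                nxt.append((pcs, put(bufs, cpu, buf[1:]), write(mem, *buf[0]), reads))
            if pcs[cpu] < len(programs[cpu]):
                op = programs[cpu][pcs[cpu]]
                new_pcs = put(pcs, cpu, pcs[cpu] + 1)
                if op[0] == "wr" and tso:
                    nxt.append((new_pcs, put(bufs, cpu, buf + (op[1:],)), mem, reads))
                elif op[0] == "wr":
                    nxt.append((new_pcs, bufs, write(mem, op[1], op[2]), reads))
                else:               # read: own newest buffered store, else memory
                    hit = [v for a, v in buf if a == op[1]]
                    val = hit[-1] if hit else dict(mem)[op[1]]
                    nxt.append((new_pcs, bufs, mem, put(reads, cpu, reads[cpu] + (val,))))
        if not nxt:                 # all programs finished and all fifos empty
            finals.add(reads)
        for s in nxt:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return finals

# Store-buffering test: each CPU writes one address, then reads the other.
sb = [[("wr", "a", 1), ("rd", "b")], [("wr", "b", 1), ("rd", "a")]]
sc_runs, tso_runs = outcomes(sb, tso=False), outcomes(sb, tso=True)
```

For this test program `sc_runs` is a strict subset of `tso_runs`; the extra TSO outcome is the one where both reads return 0, which matches the statement that TSO allows more executions than SC.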
33
How are allowed executions specified?
As constraints on events generated by the
execution!
Constraints are expressed in terms of ordering
rules:
RO - Read Ordering
ROA - RO over the same address
WOS - Write Ordering by Storage
POS - Program Ordering by Storage
CMP - Computational Ordering
WA - Write Atomicity
Ordering rules specify constraints on EVENTS
Memory Model = Collier Cocktail! - e.g. (CMP,
RO, WOS)
34
Definition of POS (and also RO and WOS)
PO includes RR, RW, WR, and WW orders
35
Definition of CMP (defined per CPU per address)
CPU_j
STORE_j
36
Assumptions in defining CMP... and in the rest
of this talk

We are interested in more than SC
We would like to set-up a general framework for
defining and verifying memory models
Assume that RO is obeyed by every memory model of
interest to us
We Assume
Projectability,
Data Independence
Unambiguous executions

37
Assume Projectability, Data Independence, and
consider only Unambiguous executions
Projectable
Data independent
Same datum never written twice (so we can
uniquely trace source of data!)
Unambiguous
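
As a small illustration of why unambiguous executions help, the following sketch maps every read to the unique write that produced its datum. The event encoding and the name `trace_sources` are hypothetical, and uniqueness is checked per address, which is an assumption about how "same datum never written twice" should be read.

```python
def trace_sources(events):
    """Map each read to the unique write that produced its datum.

    events: list of (cpu, kind, addr, value) tuples, kind is "wr" or "rd".
    Returns {read_index: write_index or None}; None means no write of that
    datum exists (for example, the read returned the initial value).
    Raises ValueError if some datum is written twice to the same address,
    i.e. the execution is ambiguous.
    """
    writer = {}
    for index, (cpu, kind, addr, value) in enumerate(events):
        if kind == "wr":
            if (addr, value) in writer:
                raise ValueError(f"datum {value} written twice to {addr}")
            writer[(addr, value)] = index
    return {
        index: writer.get((addr, value))
        for index, (cpu, kind, addr, value) in enumerate(events)
        if kind == "rd"
    }

execution = [(1, "wr", "A", 2), (1, "wr", "A", 3), (2, "rd", "A", 3), (2, "rd", "A", 2)]
sources = trace_sources(execution)
```

Here `sources` links the read of 3 to the second write and the read of 2 to the first write; with a repeated datum the lookup would no longer be unique, which is why such executions are excluded.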
38
Definition of CMP for CPU i for address d
CMP includes ROA also is an implied edge
[Figure: events for address d at CPU_j and STORE_j: reads R1(d,T), R2(d,2), R3(d,2), R4(d,5) and writes W2(d,4), W3(d,5), W4(d,2), with ROA edges between reads]
39
Let's study (CMP, RO, WOS) - a useful drosophila!
Initially a = 0; R1(a,1); W2(a,1)
Even this execution is possible under
(CMP,RO,WOS)
..no writes to a..
CPU_j
STORE_j
40
An execution satisfying (CMP, RO, WOS)
Execution satisfies (CMP, RO, WOS) as there
are no cycles created by adding their arcs!
41
An execution that violates (CMP,RO,WOS)
[Figure: one CPU issues rd(A,3) then rd(A,2); another issues wr(A,2) then wr(A,3). A WOS arc links wr(A,2) to wr(A,3) and an ROA arc links rd(A,3) to rd(A,2).]
42
Verification Techniques for Memory Models

Consider all possible executions
involving all possible addresses A
and all possible data D
for all possible concurrent programs P
Introduce the arcs due to the ordering rules
Look for cycles
Impractical!
So, look for ways to limit A, D, and P
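
The "introduce arcs, look for cycles" step can be sketched for a single execution as a depth-first cycle search over labelled arcs. This is an illustrative sketch only: `find_cycle` is a hypothetical helper, the arcs are supplied by hand, and the CMP arc from rd(A,2) to wr(A,3) (a read of a datum precedes the write that overwrites it) is an assumed reading of the CMP rule.

```python
def find_cycle(arcs):
    """Return a list of events forming a cycle, or None if the graph is acyclic.

    arcs: iterable of (earlier_event, later_event, rule_name) triples.
    """
    graph = {}
    for src, dst, _rule in arcs:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = dict.fromkeys(graph, WHITE)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:        # back edge: the path closes a cycle
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# The violating execution of (CMP,RO,WOS) over address A shown earlier.
violating = [
    ("wr(A,2)", "wr(A,3)", "WOS"),
    ("rd(A,3)", "rd(A,2)", "ROA"),
    ("wr(A,2)", "rd(A,2)", "CMP"),
    ("wr(A,3)", "rd(A,3)", "CMP"),
    ("rd(A,2)", "wr(A,3)", "CMP"),   # assumed: read precedes the overwriting write
]
# Same writes, but the reads happen in the order rd(A,2), rd(A,3).
allowed = [arc for arc in violating if arc[2] != "ROA"] + [("rd(A,2)", "rd(A,3)", "ROA")]
```

`find_cycle(violating)` returns a cycle through wr(A,3), rd(A,3) and rd(A,2), while `find_cycle(allowed)` returns None. Doing this for every execution of every program is what the slide calls impractical, which motivates limiting A, D, and P.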

43
Our approach

Assume address projectability (or
projectability)
and data independence
Prove limited address theorems (helps limit A)
Characterize all violating executions E_i
over A
Come up with finite-state abstractions for each
E_i
using data independence to limit D, and
using non-determinism
to arrive at a finite number of test automata
aut_i
Explore state-space of each aut_i composed with
the memory-system
Look for entry into error-states
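
The last two steps can be sketched as a reachability search over the product of a test automaton and a memory-system model. Everything concrete below is an assumption made for illustration: the toy memory system (writes queued independently to each replica), the hand-written test automaton (error after reading 3 and then 2 while the writer issues wr(A,2), wr(A,3)), and the name `reaches_error`. It is not one of the talk's automata.

```python
from collections import deque

def reaches_error(num_replicas):
    """Explore (test automaton x toy memory system); True if an error state is hit.

    Toy memory system: each write is queued to every replica and delivered
    independently; a read may hit any replica. One replica behaves like a
    single atomic memory.
    Test automaton: the writer issues wr(A,2) then wr(A,3); the observer
    enters its error state if it reads 3 and later reads 2.
    """
    def put(t, i, x):                        # tuple t with slot i replaced by x
        return t[:i] + (x,) + t[i + 1:]

    # state = (writes issued, pending queue per replica, replica contents, saw3)
    init = (0, ((),) * num_replicas, (0,) * num_replicas, False)
    seen, work = {init}, deque([init])
    while work:
        issued, queues, replicas, saw3 = work.popleft()
        succ = []
        if issued < 2:                       # writer: wr(A,2), then wr(A,3)
            value = (2, 3)[issued]
            succ.append((issued + 1, tuple(q + (value,) for q in queues), replicas, saw3))
        for i, q in enumerate(queues):       # memory system: deliver one write
            if q:
                succ.append((issued, put(queues, i, q[1:]), put(replicas, i, q[0]), saw3))
        for value in replicas:               # observer: read from any replica
            if saw3 and value == 2:
                return True                  # error state: rd(A,3) then rd(A,2)
            succ.append((issued, queues, replicas, saw3 or value == 3))
        for state in succ:
            if state not in seen:
                seen.add(state)
                work.append(state)
    return False
```

With one replica the error state is unreachable; with two replicas the search finds it, because one replica can already hold 3 while the other still holds 2. Only two data values are needed here, which is the kind of saving data independence is meant to provide.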

44
Use of data abstraction and non-determinism
45
Limited Address Theorem for (CMP,RO,WOS)
Two addresses suffice!
46
PowerPoint proof of the limited address theorem
for (CMP,RO,WOS)
RO
RO
R

Involves two addrs!
47
Exhaustive characterization of violations of
(CMP, RO, WOS) over one address, a
48
Test automata for 1-address (CMP,RO,WOS)
violations
Error states E1, E2
49
Exhaustive characterization of two-address
violations of (CMP, RO, WOS)
50
Test automata for 2-address (CMP,RO,WOS)
violations
Error states E1, E2
51
Limited Address Theorem for (CMP,POS)

2 addresses suffice

52
1-address (CMP,POS) verification
Error states E1, E2
53
2-address (CMP,POS) verification
Error states E1, E2
54
SC = (CMP, POS, WA)
55
Definition of WA - by showing what is not WA!
56
The limited-address theorem for SC = (CMP, POS,
WA)

In an N-processor system, N addresses are
sufficient (and necessary)
IF concurrent program P using M > N addresses
shows a violation
THEN there exists a subset A of N addresses
such that P projected onto A yields concurrent
program P' that also shows a
violation.
PowerPoint proof to follow
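
A minimal sketch of the projection used in the theorem, assuming that projecting a program onto an address set simply drops every operation on other addresses while keeping per-CPU order. The encoding and the name `project` are hypothetical.

```python
def project(program, addresses):
    """Keep only the operations that touch an address in `addresses`.

    program: one list per CPU of ("wr", addr, value) or ("rd", addr) ops.
    The relative order of the surviving operations on each CPU is unchanged.
    """
    keep = set(addresses)
    return [[op for op in thread if op[1] in keep] for thread in program]

# A hypothetical 2-CPU program over three addresses, projected onto {a, b}.
p = [
    [("wr", "a", 1), ("wr", "c", 7), ("rd", "b")],
    [("wr", "b", 1), ("rd", "c"), ("rd", "a")],
]
p_projected = project(p, {"a", "b"})
```

The theorem says that when P shows a violation, some choice of N addresses gives a projected program that still shows one, so the verifier never needs more than N addresses.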

57
PowerPoint proof of the limited address theorem
for SC = (CMP, POS, WA)
- Suppose C is the cycle containing the smallest number of events that involves more than N <pos edges.
- Then two <pos edges connect events generated by the same processor, say g, and observed by a and b.
- If a = b, we can eliminate one of these POS edges
- if a <> b, consider g <> a, and possibly equal to b.
- a0 and a1 are writes. Find corresp events in b.
[Figure: one linearization of events a0, a1, b0, b2, b3, connected by Pos(g) edges and a wa edge]
58
All N-address (CMP, POS, WA) violations
(2)
Two processors see two writes w1 and w2 in
different orders
(CMP, POS) violations
59
Complete test for SC for 1-address programs
Error states
- <P14, Q41>
- P41a, P41b x Q14a, Q14b
60
Complete test for SC for 2-address programs
Error states
- <P14, Q41>
- P41a, P41b x Q14a, Q14b
61
Case Studies

Runway/PA system model
Bus based design
An aggressive split transaction protocol
Out-of-order (speculative) completion of
transactions on Runway for high-performance
not modeled in current experiments
In-order completion of instructions in PA for
sequential consistency

62
SC verification of the HP/Runway model
63
Conclusions

Promising
Violations caught very quickly
Need to try larger examples
Currently studying weaker memory models
Future work
Combatting state-explosion
Symmetries
Better automata
Integrate into design cycle of CPUs
Support performance optimizations
and verification regressions

64
Related Work

Graf (CAV94)
for more than SC (hence unsound for SC)
properties depend on design
Alur, McMillan, Peled (LICS96)
undecidable if data can be compared
Nalumasu, Ghughal, Mokkedem, Gopalakrishnan
(CAV98)
incomplete
Henzinger, Qadeer, Rajamani (CAV99)
needs invariants
invariants depend on design
assumes address-symmetry
Collier (80s)
not available at design-time
