Formal Verification of Shared Memory Systems During their Design

1
Formal Verification of Shared Memory
Systems During their Design
  • Ganesh Gopalakrishnan
  • Department of Computer Science
  • University of Utah
  • http://www.cs.utah.edu/ganesh

2
FM and shared-memory system design
  • Processor speed increasing at 55% per year;
    memory speed at 7%
  • Mismatch exacerbated by shared memory
    multiprocessors
  • Complex protocols employed to hide memory
    latencies
  • Need for formal verification techniques that can
    be employed during design
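To see why the mismatch matters, the two growth rates can be compounded. The sketch below assumes both rates stay constant year over year (a simplification); `speed_gap` is a hypothetical helper, not part of the talk. Under that assumption the processor/memory gap grows to roughly 40x over a decade.

```python
def speed_gap(years, cpu_growth=0.55, mem_growth=0.07):
    """Factor by which processor speed outgrows memory speed after `years`,
    assuming both grow at constant compound annual rates (an assumption)."""
    return ((1 + cpu_growth) / (1 + mem_growth)) ** years


for years in (1, 5, 10):
    print(f"after {years:2d} years: processor/memory gap x{speed_gap(years):.1f}")
```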

3
Our Project: Utah Verifier
4
A Shared Memory Multiprocessor (a shared memory
system)
[Figure: CPUs and memory modules connected by an interconnect]
5
Classification: 1. Symmetric Multi-Processors (SMP)
[Figure: three CPUs and memory on a coherent snooping bus]
  • Potential bugs in complex bus designs
    • Deadlocks, lack of forward progress
    • Lack of coherency
    • Incorrect shared memory consistency model

6
2. Distributed Shared Memory (DSM) systems

[Figure: SMP nodes connected by a high-speed network]
  • Problems due to complex DSM protocols
    • Deadlocks, lack of forward progress
    • Incorrect shared memory consistency models

7
Formal Methods for Shared Memory System Design
[Figure: Verification; Provably-correct Synthesis; Theorem-proving; Finite-state Reachability; Model-checking; Protocol; low-level concerns (e.g. deadlocks, progress, ...); higher-level concerns (e.g. shared memory consistency models)]
8
Results of the UV group
  • New Partial Order reduction algorithm
    • Realized in a verifier called PV
    • Outperforms SPIN 10 to 1 on most examples
    • Selective state-caching is available for free
  • A DSM Protocol synthesis algorithm
    • Safety of synthesis proved using PVS
    • Derives realistic (hand-quality) DSM protocols
    • Incorporates a scalable buffer-reservation scheme
  • Verifying Formal Memory Models
  • Verifying Formal Memory Models

9
Protocol Refinement
10
Motivations
  • Distributed directory-based coherence protocols are
    difficult to understand and debug
    • low-level requests / acks / nacks don't reveal
      what is being implemented
    • transient states are introduced and handled in an
      ad-hoc way
    • buffer allocation is not tied to desired
      high-level properties (e.g. progress)
    • verification is tedious

11
Example of problems due to unexpected msgs
[Figure: messages exchanged between a cache controller (Cache Ctrlr) and a directory controller (Directory Ctrlr)]
12
Our approach
  • Based on synthesis
  • Transient states introduced automatically
  • Buffer allocation is tied to desired high-level
    properties (e.g. progress)
  • Verification becomes much easier
  • Synthesized protocols seem efficient

13
Overview of Synthesis Method
[Figure: Cache Ctrlr with states I and E; Dir Ctrlr with states F and E; Req and (N)ack messages between them]
14
Model-checking Efficiency
15
An Illustration Migratory Protocol (i)
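[Figure: Process h (home) with states F and E; Process r(i) (remote) with states I and V; transient states I1, I2, I3, V1, V2. Actions of h: r(i)?req, r(j)?req, r(i)!gr(data), r(j)!gr(data), r(o)!inv, r(o)?LR(data), r(o)?ID(data). Actions of r(i): h!req, h?gr(data), h?inv, h!ID(data), h!LR(data), rw, evict.]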
16
An Illustration Migratory Protocol (ii)
[Figure: the same Process h and Process r(i) state machines as in part (i)]
17
A Generic Example
[Figure: processes P, Q, and R with rendezvous actions P?x, R!b, Q!a, and Q!c]
18
Async Implementation of Example (i)
[Figure: processes P, Q, and R; rendezvous actions R!b and Q!a; asynchronous messages R!!b and Q!!a]
1 message buffer location for Ack/Nack
19
Async Implementation of Example (ii)
[Figure: processes P, Q, and R; rendezvous actions R!b and Q!a; asynchronous messages Q!!c, R!!b, Q!!a, and P!!ack; a Progress Buffer]
20
Organization of Protocol - per Cache Line
[Figure: Remote Nodes and a Home Node]
  • Remote nodes (cache controllers) communicate with the
    home directory controller only
  • If Remote and Home requests cross in the medium:
    • the Remote request is treated as a Nack by Home
    • the Home request is dropped by Remote
  • Point-to-point, order-preserving, error-free communication
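The crossing rule can be sketched as follows. This is a hypothetical model, not the synthesized protocol: each node holds at most one outstanding request, channels are FIFO deques (point-to-point, order-preserving, lossless), and the message names `inv` and `req` are illustrative only. What Home does with the Remote request after treating it as a Nack is not modeled.

```python
from collections import deque


class Node:
    """Hypothetical node with at most one outstanding request."""

    def __init__(self, is_home):
        self.is_home = is_home
        self.pending = None  # our own outstanding request, if any
        self.events = []     # trace of decisions, for inspection

    def send_request(self, channel, what):
        self.pending = what
        channel.append(what)  # FIFO: order-preserving, error-free medium

    def receive_request(self, what):
        if self.pending is None:
            self.events.append(("handle", what))        # no crossing
        elif self.is_home:
            self.events.append(("nack", self.pending))  # Remote request acts as Nack
            self.pending = None
        else:
            self.events.append(("drop", what))          # Remote drops Home request


home, remote = Node(is_home=True), Node(is_home=False)
to_remote, to_home = deque(), deque()
home.send_request(to_remote, "inv")  # Home's request ...
remote.send_request(to_home, "req")  # ... crosses Remote's request in the medium
remote.receive_request(to_remote.popleft())
home.receive_request(to_home.popleft())
print(home.events, remote.events)
```

After the crossing, Remote is still waiting for a reply to its own request, while Home's request has been resolved as nacked.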
21
General Nature of Communication States
[Figure: transient communication state T for a Remote node (actions h!msg, h?m1, h?m2) and for the Home node (actions r(i)?m1, r(j)!m2)]
22
Summary Remote node rules
23
Summary Home node (i)
24
Summary Home node (ii)
25
Status of Work
  • Correctness of Protocol Synthesis Proved in PVS
  • Write-invalidate protocol also synthesized
  • Offers a general synthesis method for protocols
    (not necessarily for DSM)
  • Related work: Buckley and Silberschatz, Chandra
    et al., Park and Dill, Gribomont, ...

26
Verifying Conformance to Formal Memory Models
27
FM and shared-memory system design
  • Shared-memory systems are complex!
  • Designers need a 'safety net' when exploring
    optimizations: formal verification
  • We focus on verifying that a (finite-state model
    of a) shared memory system provides the required
    memory model (mainly Sequential Consistency)
    • E.g., verify a cache coherence protocol for SC
  • Our approach: finite-state reachability analysis

28
Importance of Memory Models -- An Example

Peterson's algorithm for mutex under a memory
model called TSO
P1: A := 1; turn := 2; while (B /\ turn = 2); ..CS..
P2: B := 1; turn := 1; while (A /\ turn = 1); ..CS..
Must Specify Synchronization Routines and the
Shared Memory Consistency Model(s) under which
they work!
29
Impact on CPU design -- Do Read-Speculation Right!
[Figure: CPU1 and CPU2 share a bus to MEM; the bus carries ..wr(a,2).. wr(b,3)..]
  • wr(a,2) misses; rd(b,0) is speculated; snooping wr(a):
    speculation OK
  • wr(b,3) misses; rd(a,0) is speculated; snooping wr(a):
    speculation not OK, so reissue, giving rd(a,2)
Without reissue, results are inconsistent with SC
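The scenario on this slide can be replayed with a toy model. Everything here is a hypothetical sketch: each CPU has one outstanding store miss, loads behind it are performed speculatively, and a snooped write to a speculatively loaded address forces a reissue. Speculative loads retire when the CPU's own store is performed on the bus. The outcome `(0, 0)` is the one SC forbids, since each CPU's write precedes its read in program order.

```python
class Cpu:
    """Hypothetical CPU: one store miss outstanding, later loads speculative."""

    def __init__(self):
        self.pending_store = None  # (addr, value) waiting for the bus
        self.spec_loads = {}       # addr -> value read speculatively
        self.retired = {}          # addr -> value of loads that retired

    def snoop(self, addr, memory, reissue):
        # Another CPU wrote an address we read early: the early value may
        # break SC, so squash the load and read the location again.
        if reissue and addr in self.spec_loads:
            self.spec_loads[addr] = memory[addr]

    def store_performed(self):
        # Our own store is now visible on the bus; speculative loads retire.
        self.retired.update(self.spec_loads)
        self.spec_loads.clear()
        self.pending_store = None


def run(reissue):
    memory = {"a": 0, "b": 0}
    cpu1, cpu2 = Cpu(), Cpu()
    cpu1.pending_store, cpu2.pending_store = ("a", 2), ("b", 3)  # both miss
    cpu1.spec_loads["b"] = memory["b"]  # rd(b,0), speculated
    cpu2.spec_loads["a"] = memory["a"]  # rd(a,0), speculated
    # Bus order: wr(a,2) then wr(b,3); the other CPU snoops each write.
    for writer, other in ((cpu1, cpu2), (cpu2, cpu1)):
        addr, value = writer.pending_store
        memory[addr] = value
        other.snoop(addr, memory, reissue)
        writer.store_performed()
    return cpu1.retired["b"], cpu2.retired["a"]


print("with reissue:   ", run(reissue=True))
print("without reissue:", run(reissue=False))
```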
30
Basis for our work: ARCHTEST (Collier)
  • Multi-threaded C programs
    • used to debug actual multiprocessor machines
    • unavailable at design time
  • Based on the theory of graph-sets
    • used in our work also
  • Our CAV98 work: adapted Collier's tests for
    model-checking
    • incomplete
  • This work: a complete verification method (sound
    too!)

31
What is a shared memory model?
Captured by the set of all executions of a
concurrent program!
[Figure: Execution 1, Execution 2, and the sets of executions allowed by SC and TSO]
TSO allows more executions than SC (hence
weaker)
32
An Operational Definition of SC and TSO
[Figure: SC: cpu1 and cpu2 reach Memory through a MUX. TSO: cpu1 and cpu2 each reach Memory through a fifo.]
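The operational picture can be explored exhaustively for a small program. This sketch uses the store-buffering litmus test (our choice of example, not from the slides): each CPU writes one variable and then reads the other. Under SC every write goes straight to memory (the MUX); under TSO each write enters the CPU's fifo, the oldest entry may drain to memory at any time, and a read first checks the CPU's own fifo. Read forwarding from the own fifo is an assumption of this model, since the figure does not show it.

```python
PROGRAMS = (
    (("W", "x", 1), ("R", "y", "r1")),  # cpu1
    (("W", "y", 1), ("R", "x", "r2")),  # cpu2
)


def outcomes(tso):
    """Return the set of final (r1, r2) values reachable under SC or TSO."""
    init = ((0, 0), ((), ()), (("x", 0), ("y", 0)), ())
    seen, stack, results = set(), [init], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, fifos, mem, regs = state
        memory = dict(mem)
        finished = all(pc == len(p) for pc, p in zip(pcs, PROGRAMS))
        if finished and not any(fifos):
            r = dict(regs)
            results.add((r["r1"], r["r2"]))
            continue
        for cpu, prog in enumerate(PROGRAMS):
            if fifos[cpu]:  # TSO: the oldest buffered write may drain
                (addr, val), rest = fifos[cpu][0], fifos[cpu][1:]
                new_mem = tuple(sorted({**memory, addr: val}.items()))
                new_fifos = fifos[:cpu] + (rest,) + fifos[cpu + 1:]
                stack.append((pcs, new_fifos, new_mem, regs))
            if pcs[cpu] == len(prog):
                continue
            op, addr, arg = prog[pcs[cpu]]
            new_pcs = pcs[:cpu] + (pcs[cpu] + 1,) + pcs[cpu + 1:]
            if op == "W" and tso:  # TSO: the write enters this CPU's fifo
                new_fifos = fifos[:cpu] + (fifos[cpu] + ((addr, arg),),) + fifos[cpu + 1:]
                stack.append((new_pcs, new_fifos, mem, regs))
            elif op == "W":  # SC: the write goes straight through the MUX
                new_mem = tuple(sorted({**memory, addr: arg}.items()))
                stack.append((new_pcs, fifos, new_mem, regs))
            else:  # read: newest own buffered write to addr, else memory
                own = [v for a, v in fifos[cpu] if a == addr]
                val = own[-1] if own else memory[addr]
                stack.append((new_pcs, fifos, mem, regs + ((arg, val),)))
    return results


print("SC :", sorted(outcomes(tso=False)))
print("TSO:", sorted(outcomes(tso=True)))
```

The TSO set strictly contains the SC set: the extra outcome r1 = r2 = 0 is exactly what breaks Peterson's algorithm on the earlier slide.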
33
How are allowed executions specified?
As constraints on events generated by the
execution!
Constraints are expressed in terms of ordering rules:
  • RO - Read Ordering
  • ROA - RO over the same address
  • WOS - Write Ordering by Storage
  • POS - Program Ordering by Storage
  • CMP - Computational Ordering
  • WA - Write Atomicity
Ordering rules specify constraints on EVENTS
Memory Model = Collier Cocktail! e.g. (CMP,
RO, WOS)
34
Definition of POS (and also RO and WOS)
PO includes RR, RW, WR, and WW orders
35
Definition of CMP (defined per CPU per address)
[Figure: CPU_j and STORE_j]
36
Assumptions in defining CMP... and in the rest
of this talk
  • We are interested in more than SC
  • We would like to set up a general framework for
    defining and verifying memory models
  • Assume that RO is obeyed by every memory model of
    interest to us
  • We assume:
    • Projectability
    • Data Independence
    • Unambiguous executions

37
Assume Projectability and Data Independence, and
consider only Unambiguous executions
[Figure: examples labeled Projectable, Data independent, and Unambiguous]
Unambiguous: the same datum is never written twice (so we can
uniquely trace the source of data!)
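Because no datum is written twice, each read's source write can be recovered from the value alone. A minimal sketch, assuming a hypothetical event encoding `(kind, cpu, addr, value)` and a dictionary of initial values; writes are assumed not to rewrite an initial value, and an ambiguous execution raises an error.

```python
INITIAL = "init"  # marker: the read returned the location's initial value


def reads_from(events, initial):
    """Map each read (by index) to the index of the write it read from,
    or INITIAL. events: list of (kind, cpu, addr, value), kind in {"W", "R"}.
    Raises ValueError if the execution is ambiguous or a read has no source."""
    source = {(addr, val): INITIAL for addr, val in initial.items()}
    for i, (kind, _cpu, addr, val) in enumerate(events):
        if kind == "W":
            if (addr, val) in source:
                raise ValueError(f"ambiguous: {addr}={val} written more than once")
            source[(addr, val)] = i
    rf = {}
    for i, (kind, _cpu, addr, val) in enumerate(events):
        if kind == "R":
            if (addr, val) not in source:
                raise ValueError(f"read of {addr}={val} has no source")
            rf[i] = source[(addr, val)]
    return rf


events = [("W", "P1", "A", 2), ("W", "P1", "A", 3),
          ("R", "P2", "A", 3), ("R", "P2", "A", 0)]
print(reads_from(events, {"A": 0}))
```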
38
Definition of CMP for CPU i for address d
CMP includes ROA also is an implied edge
[Figure: events on address d, including R1(d,T), R2(d,2), R3(d,2), R4(d,5), W2(d,4), W3(d,5), W4(d,2), with ROA edges, CPU_j, and STORE_j]
39
Let's study (CMP, RO, WOS) - a useful drosophila!
Initially a = 0. Execution: R1(a,1) W2(a,1)
Even this execution is possible under
(CMP, RO, WOS)
[Figure: ..no writes to a.., CPU_j, STORE_j]
40
An execution satisfying (CMP, RO, WOS)
The execution satisfies (CMP, RO, WOS), as adding
their arcs creates no cycles!
41
An execution that violates (CMP,RO,WOS)
Execution: rd(A,3) rd(A,2); wr(A,2) wr(A,3)
[Figure: events wr(A,2), wr(A,3), rd(A,2), rd(A,3) connected by WOS and ROA arcs]
42
Verification Techniques for Memory Models
  • Consider all possible executions
    • involving all possible addresses A
    • and all possible data D
    • for all possible concurrent programs P
  • Introduce the arcs due to the ordering rules
  • Look for cycles
  • Impractical!
    • So, look for ways to limit A, D, and P
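For a single execution, "introduce the arcs and look for cycles" is ordinary cycle detection. The sketch below encodes the violating execution of slide 41. The WOS and ROA arcs follow the slide; the read-from (`rf`) and "from-read" (`fr`, a read precedes the write that overwrites the value it read) arcs are an assumption of this sketch, borrowed from the common axiomatic encoding rather than taken from Collier's exact construction.

```python
from collections import defaultdict


def has_cycle(edges):
    """Detect a cycle in a directed graph given as (src, dst, label) triples."""
    graph = defaultdict(list)
    for src, dst, _label in edges:
        graph[src].append(dst)
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: cycle found
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))


def edges_for(first_read, second_read):
    """Hypothetical encoding: one CPU reads A twice (ROA), and storage
    orders wr(A,2) before wr(A,3) (WOS)."""
    return [
        ("wr(A,2)", "wr(A,3)", "WOS"),
        (first_read, second_read, "ROA"),
        ("wr(A,2)", "rd(A,2)", "rf"),
        ("wr(A,3)", "rd(A,3)", "rf"),
        ("rd(A,2)", "wr(A,3)", "fr"),  # rd(A,2) precedes the write that overwrote 2
    ]


print(has_cycle(edges_for("rd(A,3)", "rd(A,2)")))  # the slide-41 order
print(has_cycle(edges_for("rd(A,2)", "rd(A,3)")))  # reads in storage order
```

The violating order closes the cycle wr(A,3), rd(A,3), rd(A,2), wr(A,3); swapping the two reads removes it.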

43
Our approach
  • Assume address projectability (or projectability)
    • and data independence
  • Prove limited address theorems (helps limit A)
  • Characterize all violating executions E_i
    over A
  • Come up with finite-state abstractions for each
    E_i
    • using data independence to limit D, and
    • using non-determinism
    • to arrive at a finite number of test automata
      aut_i
  • Explore the state space of each aut_i composed with
    the memory system
  • Look for entry into error states
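The last two steps amount to breadth-first reachability over the product of a test automaton and a memory-system model. Everything below is a hypothetical toy: a one-address system in which a writer issues wr(A,2) then wr(A,3), a "buggy" variant that may also serve a stale copy to the reader, and a test automaton that enters an error state if the reader sees 3 and later 2 (the pattern of slide 41). By data independence, only the values 2 and 3 need to be tracked.

```python
from collections import deque


def system_moves(state, buggy):
    """Yield (label, next_state) for a toy one-address memory system.
    state = (writer_pc, current_value, stale_copy_or_None)."""
    pc, mem, stale = state
    if pc < 2:
        val = (2, 3)[pc]  # the writer issues wr(A,2), then wr(A,3)
        # The buggy system keeps the overwritten value in a stale copy.
        yield ("W", val), (pc + 1, val, mem if buggy else None)
    yield ("R", mem), state
    if buggy and stale is not None:
        yield ("R", stale), state  # stale read: the bug we want to catch


def automaton_step(q, label):
    """Test automaton: reading 3 and then 2 is an error."""
    kind, val = label
    if kind == "R" and val == 3 and q == "start":
        return "saw3"
    if kind == "R" and val == 2 and q == "saw3":
        return "ERR"
    return q


def reaches_error(buggy):
    init = ((0, 0, None), "start")
    seen, frontier = {init}, deque([init])
    while frontier:
        sys_state, q = frontier.popleft()
        if q == "ERR":
            return True
        for label, nxt in system_moves(sys_state, buggy):
            pair = (nxt, automaton_step(q, label))
            if pair not in seen:
                seen.add(pair)
                frontier.append(pair)
    return False


print("buggy system reaches error:  ", reaches_error(buggy=True))
print("correct system reaches error:", reaches_error(buggy=False))
```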

44
Use of data abstraction and non-determinism
45
Limited Address Theorem for (CMP,RO,WOS)
Two addresses suffice!
46
PowerPoint proof of the limited address theorem
for (CMP,RO,WOS)
[Figure: cycle of events connected by RO arcs]
Involves two addresses!
47
Exhaustive characterization of violations of
(CMP, RO, WOS) over one address, a
48
Test automata for 1-address (CMP,RO,WOS)
violations
Error states E1, E2
49
Exhaustive characterization of two addresses
violations of (CMP, RO, WOS)
50
Test automata for 2-address (CMP,RO,WOS)
violations
Error states E1, E2
51
Limited Address Theorem for (CMP,POS)
  • 2 addresses suffice

52
1-address (CMP,POS) verification
Error states E1, E2
53
2-address (CMP,POS) verification
Error states E1, E2
54
SC (CMP, POS, WA)
55
Definition of WA - by showing what is not WA!
56
The limited-address theorem for SC (CMP, POS,
WA)
  • In an N-processor system, N addresses are
    • sufficient
      • IF a concurrent program P using M > N addresses
        shows a violation,
      • THEN there exists a subset A of N addresses
        such that P projected onto A yields a concurrent
        program P' that also shows a violation.
      • PowerPoint proof to follow
    • and necessary
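The theorem lets a checker consider only projections of a program onto N-address subsets. A minimal sketch of that projection step, assuming a hypothetical encoding of a concurrent program as one list of operation tuples `(kind, address, ...)` per processor; checking each projected program for a violation is left to the model checker and is not shown.

```python
from itertools import combinations


def project(program, addresses):
    """Keep only the operations of each thread that touch `addresses`."""
    return [[op for op in thread if op[1] in addresses] for thread in program]


def projections(program, n):
    """Yield (subset, projected program) for every n-address subset."""
    addrs = sorted({op[1] for thread in program for op in thread})
    for subset in combinations(addrs, n):
        yield set(subset), project(program, set(subset))


# Two processors (N = 2) using three addresses (M = 3 > N).
program = [
    [("W", "x", 1), ("R", "y"), ("W", "z", 1)],
    [("W", "y", 1), ("R", "x"), ("R", "z")],
]
for subset, p in projections(program, n=2):
    print(sorted(subset), p)
```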

57
PowerPoint proof of the limited address theorem
for SC (CMP, POS, WA)
- Suppose C is the cycle containing the smallest
number of events that involves more than N
<pos edges.
- Then two <pos edges connect events
generated by the same processor, say g, and
observed by a and b.
- If a = b, we can eliminate one of these POS edges.
- If a ≠ b, consider g ≠ a, and possibly equal to b.
- a0 and a1 are writes. Find the corresponding events in b.
[Figure: one linearization, with events a0, a1, b0, b2, b3, Pos(g) edges, and a wa edge]
58
All N-address (CMP, POS, WA) violations
(2)
  • WA violations: two processors see two writes w1 and w2
    in different orders
  • (CMP, POS) violations
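The first kind of violation can be checked directly once each processor's observed order of writes is known (e.g. derived from reads-from in an unambiguous execution; that derivation is assumed here). `wa_violation` is a hypothetical helper returning a witness pair of processors and writes, or `None`.

```python
from itertools import combinations


def wa_violation(observed):
    """observed: dict cpu -> list of write ids in the order that cpu saw them.
    Return (cpu1, cpu2, w1, w2) if two CPUs saw w1 and w2 in opposite orders,
    else None."""
    for (c1, o1), (c2, o2) in combinations(observed.items(), 2):
        pos1 = {w: i for i, w in enumerate(o1)}
        pos2 = {w: i for i, w in enumerate(o2)}
        common = [w for w in o1 if w in pos2]
        for w1, w2 in combinations(common, 2):
            if (pos1[w1] < pos1[w2]) != (pos2[w1] < pos2[w2]):
                return (c1, c2, w1, w2)
    return None


print(wa_violation({"P1": ["w1", "w2"], "P2": ["w2", "w1"]}))
print(wa_violation({"P1": ["w1", "w2"], "P2": ["w1", "w2"]}))
```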
59
Complete test for SC for 1-address programs
Error states: <P14, Q41>; P41a, P41b × Q14a, Q14b
60
Complete test for SC for 2-address programs
Error states: <P14, Q41>; P41a, P41b × Q14a, Q14b
61
Case Studies
  • Runway/PA system model
    • Bus-based design
    • An aggressive split-transaction protocol
    • Out-of-order (speculative) completion of
      transactions on Runway for high performance
      • not modeled in current experiments
    • In-order completion of instructions in PA for
      sequential consistency

62
SC verification of the HP/Runway model
63
Conclusions
  • Promising
    • Violations caught very quickly
    • Need to try larger examples
    • Currently studying weaker memory models
  • Future work
    • Combating state explosion
      • Symmetries
      • Better automata
    • Integrate into design cycle of CPUs
      • Support performance optimizations
      • and verification regressions

64
Related Work
  • Graf (CAV94)
    • for more than SC (hence unsound for SC)
    • properties depend on design
  • Alur, McMillan, Peled (LICS96)
    • undecidable if data can be compared
  • Nalumasu, Ghughal, Mokkedem, Gopalakrishnan
    (CAV98)
    • incomplete
  • Henzinger, Qadeer, Rajamani (CAV99)
    • needs invariants
    • invariants depend on design
    • assumes address-symmetry
  • Collier (80s)
    • not available at design time