Algorithm-Based Fault Tolerance Matrix Multiplication - PowerPoint PPT Presentation

About This Presentation

Title:

Algorithm-Based Fault Tolerance Matrix Multiplication

Description:

Ask a matrix-matrix-multiply (MMM) implementation to compute product ... No error can morph one codeword into another. May correct errors in (dmin-1)/2 spots ... – PowerPoint PPT presentation

Slides: 36

Provided by: gregBron

Transcript and Presenter's Notes

Title: Algorithm-Based Fault Tolerance Matrix Multiplication

1
Algorithm-Based Fault Tolerance: Matrix Multiplication

Greg Bronevetsky

2
Problem at Hand

Have matrices A and B
Want to compute their product AB
Ask a matrix-matrix-multiply (MMM) implementation
to compute product
Answer: C
Question: Is C the correct answer?
How could we know for sure?

3
Algorithm-Based Fault Tolerance

Encode input matrices via error-correcting code
Run regular MMM algorithm on encoded matrices
Encoding invariant under MMM
Naturally outputs encoded matrices
Encoding guarantees
If up to t errors in output, will detect error
If up to c < t errors in output, can decode correct
output matrix

4
Outline
Linear Error Correcting Codes
ABFT Linear Encoding of Matrices
Algorithm-Based Fault Tolerance
5
Error Correcting Codes

Map f: Σ^k → Σ^n
k-long data words → n-long codewords
We use Σ = {0, 1}
Code of length n is a sparse subset of Σ^n
Very few possible words are valid codewords
Rate of code: k/n
Amount of information communicated by each
codeword

6
Minimum Distance

Minimum distance dmin: smallest d(x, y) over all pairs of distinct codewords x, y
d() = Hamming distance
Hamming distance: number of spots where words
differ
Measures difficulty of decoding/correcting
corrupted codewords
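
As a minimal sketch of the two definitions above, the following computes the Hamming distance and the minimum distance of a small code by brute force. The repetition code used here is a hypothetical toy example, not one from the slides.

```python
from itertools import combinations

def hamming_distance(x, y):
    """Number of spots where two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))

def minimum_distance(code):
    """Smallest Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

# Hypothetical toy code: 3-fold repetition of one bit (k=1, n=3).
repetition_code = [(0, 0, 0), (1, 1, 1)]
d_min = minimum_distance(repetition_code)
```

The brute-force search over pairs is only practical for tiny codes; it is meant to make the definition concrete.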

7
Detection and Correction

Code may detect errors in < dmin spots
No error can morph one codeword into another
May correct errors in ≤ (dmin-1)/2 spots
Can still find closest codeword
More details later

Each codeword defines circle around itself of
radius dmin/2
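
The "closest codeword" idea can be sketched as a brute-force nearest-neighbour search. The 5-fold repetition code below is an assumed toy example (dmin = 5, so up to 2 errors are correctable), chosen only to illustrate the bounds on this slide.

```python
def hamming_distance(x, y):
    """Number of spots where two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y))

def detect_error(word, code):
    """An error is detected whenever the received word is not a codeword."""
    return tuple(word) not in set(code)

def nearest_codeword(word, code):
    """Decode by picking the codeword closest in Hamming distance."""
    return min(code, key=lambda c: hamming_distance(word, c))

# Hypothetical toy code: 5-fold repetition, dmin = 5.
code = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]

# Two flipped bits: (dmin-1)/2 = 2, so correction still succeeds.
received = (1, 0, 1, 0, 1)
decoded = nearest_codeword(received, code)
```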
8
Linear Codes

Codewords form linear subspace inside Σ^n
In rowspace of generator matrix G

an (n=7, k=3) code
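
A sketch of encoding with a generator matrix over {0, 1}. The slide's own matrix did not survive in the transcript, so the (n=7, k=3) matrix `G` below is a hypothetical stand-in chosen for illustration.

```python
from itertools import product

# Hypothetical generator matrix of a binary (n=7, k=3) code (not from the slides).
G = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]

def encode(message, g):
    """Codeword = message * G, with arithmetic mod 2."""
    n = len(g[0])
    return tuple(sum(m * row[j] for m, row in zip(message, g)) % 2 for j in range(n))

# All 2^k codewords: the row space of G.
codewords = [encode(m, G) for m in product((0, 1), repeat=len(G))]
```

Because this `G` starts with an identity block, the first k bits of each codeword are the message itself.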
9
Property 1

Linear combination of any codewords is also a
codeword
For any x, y ∈ C, (x + y) ∈ C
Codeword × constant is codeword
For any z ∈ C, kz ∈ C
<0, 0, …, 0> always a codeword
Proof: basic properties of linear spaces

10
Property 2

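Minimum distance of linear code: dmin = min weight(x) over nonzero codewords x
Where weight(x) = number of nonzero entries of x
Proof: d(x, y) = weight(x − y), and x − y is itself a codeword (Property 1)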
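
Properties 1 and 2 can be checked by brute force on a small code. The sketch below assumes a hypothetical binary (n=7, k=3) generator matrix `G`; it verifies closure under addition and compares the minimum pairwise distance with the minimum nonzero weight.

```python
from itertools import product

# Hypothetical generator matrix of a binary (n=7, k=3) code (not from the slides).
G = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]

def encode(message, g):
    """Codeword = message * G, with arithmetic mod 2."""
    return tuple(sum(m * row[j] for m, row in zip(message, g)) % 2
                 for j in range(len(g[0])))

def weight(word):
    """Number of nonzero entries."""
    return sum(1 for bit in word if bit != 0)

def distance(x, y):
    """Hamming distance."""
    return sum(a != b for a, b in zip(x, y))

codewords = [encode(m, G) for m in product((0, 1), repeat=3)]

# Property 1: the sum of any two codewords is again a codeword.
closed = all(tuple((a + b) % 2 for a, b in zip(x, y)) in codewords
             for x in codewords for y in codewords)

# Property 2: minimum pairwise distance equals minimum nonzero weight.
min_pairwise = min(distance(x, y) for x in codewords for y in codewords if x != y)
min_weight = min(weight(c) for c in codewords if any(c))
```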

11
Parity Check Matrix

H: dual matrix to G
Contains basis of space orthogonal to G's row
space
(n-k)-dimensional space
H is (n-k)×n
Code space defined as C = {x : Hx = 0}
Note: H also defines a linear code
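
A sketch of building H from G, under the assumption that G is in systematic form [I | P], in which case H = [P^T | I] over {0, 1}. The matrix `G` is a hypothetical (n=7, k=3) example, and the function name is illustrative.

```python
from itertools import product

# Hypothetical systematic generator matrix G = [I | P] (not from the slides).
G = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]

def parity_check_from_systematic(g):
    """Return H = [P^T | I] for a binary generator matrix g = [I | P]."""
    k, n = len(g), len(g[0])
    p = [row[k:] for row in g]  # the k x (n-k) block P
    return [[p[i][r] for i in range(k)] +
            [1 if c == r else 0 for c in range(n - k)]
            for r in range(n - k)]

def times_h(word, h):
    """Compute H * word mod 2."""
    return tuple(sum(a * b for a, b in zip(row, word)) % 2 for row in h)

def encode(message, g):
    return tuple(sum(m * row[j] for m, row in zip(message, g)) % 2
                 for j in range(len(g[0])))

H = parity_check_from_systematic(G)
all_orthogonal = all(not any(times_h(encode(m, G), H))
                     for m in product((0, 1), repeat=3))
```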

12
Property 3

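dmin = min # of columns of H that can sum to 0
Proof: Hx = 0 for every codeword x, so the columns of H at the nonzero spots of x sum to 0; the lowest-weight nonzero codeword uses the fewest columns (Property 2)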
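
Property 3 can be illustrated by searching for the smallest set of columns of H that sums to 0 mod 2. The matrix `H` below is a hypothetical parity check matrix of a binary (7, 3) code with dmin = 4, used only as an example.

```python
from itertools import combinations

# Hypothetical parity check matrix of a binary (n=7, k=3) code (not from the slides).
H = [
    [1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1],
]

def min_dependent_columns(h):
    """Smallest number of columns of h whose sum is 0 mod 2 (None if no such set)."""
    n = len(h[0])
    columns = [tuple(row[j] for row in h) for j in range(n)]
    for size in range(1, n + 1):
        for subset in combinations(columns, size):
            if all(sum(bits) % 2 == 0 for bits in zip(*subset)):
                return size
    return None

d_min = min_dependent_columns(H)
```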

13
Property 4

Minimum distance of linear code ≤ n-k+1
Proof
Total n dimensions (since codewords are
n-vectors)
G's rowspace: rank k
Thus, H's column space: rank n-k
Thus, any n-k+1 columns will be linearly dependent
Some of them add up to 0
By Property 3, this is ≥ dmin

14
Outline
Linear Error Correcting Codes
ABFT Linear Encoding of Matrices
Algorithm-Based Fault Tolerance
15
Encoding a Matrix

Algorithm-Based Fault Tolerance introduced by
Huang and Abraham in 1984
Encode each row of matrix via extra column
Column entries: sums of matrix rows

16
Encoding a Matrix

Encode each column of matrix via extra row
Row entries: sums of matrix columns
Full Encoding
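
A minimal sketch of the checksum encodings on plain Python lists. The function names are illustrative, not from the slides; matrices are lists of rows with exact integer entries.

```python
def add_checksum_column(a):
    """Encode each row via an extra column: the sum of that row."""
    return [list(row) + [sum(row)] for row in a]

def add_checksum_row(a):
    """Encode each column via an extra row: the sum of that column."""
    return [list(row) for row in a] + [[sum(col) for col in zip(*a)]]

def full_encode(a):
    """Full encoding: checksum column and checksum row together."""
    return add_checksum_row(add_checksum_column(a))

A = [[1, 2],
     [3, 4]]
A_full = full_encode(A)
```

The bottom-right corner of the full encoding is the sum of all entries, so it is consistent with both the checksum row and the checksum column.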

17
Detecting Errors

Suppose matrix A is corrupted to matrix Â
Entry â_i,j is wrong
Can detect error's exact position <i,j>: row checksum i and column checksum j both fail

18
Correcting Errors

Can correct error using row or col checksum
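
Detection and correction of a single wrong entry can be sketched as follows. This assumes a fully encoded matrix with exact integer arithmetic and an error in a data entry (not in a checksum); the function names are hypothetical.

```python
def locate_error(f):
    """Return (i, j) of a single corrupted data entry, or None if all checksums hold."""
    n_rows, n_cols = len(f) - 1, len(f[0]) - 1
    bad_rows = [i for i in range(n_rows) if sum(f[i][:n_cols]) != f[i][n_cols]]
    bad_cols = [j for j in range(n_cols)
                if sum(f[i][j] for i in range(n_rows)) != f[n_rows][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("not a single data-entry error")

def correct_error(f):
    """Repair a single corrupted data entry using its row checksum."""
    fixed = [list(row) for row in f]
    position = locate_error(f)
    if position is not None:
        i, j = position
        n_cols = len(f[0]) - 1
        others = sum(f[i][:n_cols]) - f[i][j]
        fixed[i][j] = f[i][n_cols] - others
    return fixed

good = [[1, 2, 3], [3, 4, 7], [4, 6, 10]]
bad = [list(row) for row in good]
bad[0][1] = 9  # inject a single error
```

The column checksum would serve equally well for the repair step; only one of the two is needed once the position is known.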

19
Big Trick Preservation of Encoding

Column-encoded mtx × Row-encoded mtx =
Fully-encoded mtx
Can check MMM computation by checking encoding of
output
If product matrix has an erroneous entry
Can detect
Can correct
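
The preservation property can be checked on a small example: multiplying A with an extra checksum row by B with an extra checksum column gives the full encoding of AB. This is a sketch with illustrative function names and integer entries.

```python
def matmul(a, b):
    """Plain triple-loop matrix product on lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add_checksum_column(a):
    return [list(row) + [sum(row)] for row in a]

def add_checksum_row(a):
    return [list(row) for row in a] + [[sum(col) for col in zip(*a)]]

def full_encode(a):
    return add_checksum_row(add_checksum_column(a))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Run the ordinary MMM on the encoded inputs.
C_encoded = matmul(add_checksum_row(A), add_checksum_column(B))
# Compare with encoding the true product.
C_expected = full_encode(matmul(A, B))
```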

20
Applications

Matrix Multiplication
Given encoded A and B,
Check whether MMM result C (= AB) has valid
encoding
Matrix Factorization
Given a factorization A = WZ
Verify correctness by verifying encodings of
factors
Factors row- OR column-encoded
Can only detect, not correct errors

21
Weighted ABFT

Oftentimes need to check row- or column-encoded
matrices
Ex: factorization, data integrity check
Can only detect errors in such matrices
Can we also correct?
Yes, by generalizing to weighted checking
rows/columns

22
Weighting

Suppose we have d n-vectors w1, …, wd
Can column-encode matrix A
Let's try out

23
Weighted Error Detection
24
Weighted Error Correction

Weighted encoding Detects and Corrects single
errors
Even for non full-encoding
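
A sketch of weighted column encoding with two check rows. The weight vectors are an assumption, since the slide's formulas are missing from the transcript: w1 = (1, 1, …, 1) and w2 = (1, 2, …, n). With these, the ratio of the two checksum mismatches reveals the row of a single error in a column. Integer entries and an error in a data entry are assumed.

```python
def weighted_encode(a):
    """Append two check rows: plain column sums and sums weighted by 1..n (assumed weights)."""
    cols = list(zip(*a))
    plain = [sum(col) for col in cols]
    weighted = [sum((i + 1) * v for i, v in enumerate(col)) for col in cols]
    return [list(row) for row in a] + [plain, weighted]

def correct_columns(encoded):
    """Fix at most one corrupted data entry per column."""
    n = len(encoded) - 2  # number of data rows
    fixed = [list(row) for row in encoded]
    for j in range(len(encoded[0])):
        col = [encoded[i][j] for i in range(n)]
        s1 = sum(col) - encoded[n][j]  # equals the error size
        s2 = sum((i + 1) * v for i, v in enumerate(col)) - encoded[n + 1][j]
        if s1 == 0 and s2 == 0:
            continue  # column is consistent
        if s1 == 0 or s2 % s1 != 0 or not 1 <= s2 // s1 <= n:
            raise ValueError("not a single data-entry error in column %d" % j)
        fixed[s2 // s1 - 1][j] -= s1  # s2 / s1 is the 1-based row index
    return fixed

A = [[1, 2], [3, 4], [5, 6]]
encoded = weighted_encode(A)
corrupted = [list(row) for row in encoded]
corrupted[2][0] = 50  # inject a single error
```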

25
Outline
Linear Error Correcting Codes
ABFT Linear Encoding of Matrices
Algorithm-Based Fault Tolerance
26
Surprise

But this is all just a linear code!
Generator matrix for above scheme

27
Generating Encodings

Given m = <a_i,1, a_i,2, …, a_i,k> as message word
(or matrix row/column)
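
To illustrate the point that checksum encoding is a linear code, the sketch below builds a generator matrix of the form [I | 1] (identity followed by a column of ones, an assumed form for the unweighted scheme) and encodes a matrix row with it. Arithmetic is over the integers here.

```python
def checksum_generator(k):
    """k x (k+1) generator matrix: identity block followed by a column of ones."""
    return [[1 if i == j else 0 for j in range(k)] + [1] for i in range(k)]

def encode(message, g):
    """Codeword = message * G."""
    return [sum(message[i] * g[i][j] for i in range(len(message)))
            for j in range(len(g[0]))]

row = [3, 4, 5]
codeword = encode(row, checksum_generator(len(row)))
```

The last entry of the codeword is the row checksum, so multiplying by this generator matrix reproduces the "extra column" encoding.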

28
Surprise??

Not too surprising really
Why else would MMM preserve encoding?
Another possibility
Efficient: can be implemented via bit shifts
Room open for using any linear code!

29
Error Detection/Correction in General

To show for linear codes:
Can detect < dmin errors
Can correct ≤ (dmin-1)/2 errors
Let x be the original codeword
Let x̂ = x + e be the corrupted codeword
e = error vector

30
Error Detection in General

s = Hx̂ = H(x + e) = Hx + He = He is called the syndrome vector
Independent of original codeword x
Note: weight(e) < dmin since < dmin errors
Thus, He ≠ 0 (Property 3)
Detection: if s ≠ 0, then ERROR
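
Syndrome-based detection can be sketched as below. The parity check matrix `H` and the codeword are hypothetical examples for a binary (7, 3) code; the point shown is that the syndrome of the corrupted word equals the syndrome of the error alone.

```python
# Hypothetical parity check matrix of a binary (n=7, k=3) code (not from the slides).
H = [
    [1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1],
]

def syndrome(word, h):
    """s = H * word mod 2."""
    return tuple(sum(a * b for a, b in zip(row, word)) % 2 for row in h)

def has_error(word, h):
    """Detection rule: a nonzero syndrome means ERROR."""
    return any(syndrome(word, h))

codeword = (1, 1, 0, 0, 1, 1, 0)   # a valid codeword of this code
error = (0, 0, 1, 0, 0, 0, 0)      # single flipped bit
received = tuple((c + e) % 2 for c, e in zip(codeword, error))
```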

31
Error Correction in General

Clearly e is correction vector
x̂ − e corrects error in corrupted codeword x̂
Sufficient to prove:
weight(e) ≤ (dmin-1)/2 ⇒ H is isomorphism
correction vectors ↔
syndrome vectors
i.e. for each correction vector (want to know)
∃ unique syndrome vector
Thus, possible to correct any error
may not be efficient
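
One simple, not necessarily efficient, way to realize this correction is a lookup table from syndromes to low-weight correction vectors. The sketch assumes a hypothetical binary (7, 3) code with dmin = 4, so only errors of weight ≤ 1 are tabulated; building the table also checks that those syndromes are all distinct.

```python
from itertools import combinations

# Hypothetical parity check matrix of a binary (n=7, k=3) code with dmin = 4.
H = [
    [1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1],
]

def syndrome(word, h):
    return tuple(sum(a * b for a, b in zip(row, word)) % 2 for row in h)

def build_table(h, max_weight):
    """Map each syndrome to the unique error vector of weight <= max_weight."""
    n = len(h[0])
    table = {}
    for w in range(max_weight + 1):
        for positions in combinations(range(n), w):
            e = tuple(1 if i in positions else 0 for i in range(n))
            s = syndrome(e, h)
            if s in table:
                raise ValueError("two correction vectors share a syndrome")
            table[s] = e
    return table

def correct(received, h, table):
    """Look up the correction vector for the syndrome and add it back (mod 2)."""
    e = table.get(syndrome(received, h))
    if e is None:
        raise ValueError("error is outside the correctable range")
    return tuple((r + b) % 2 for r, b in zip(received, e))

table = build_table(H, max_weight=(4 - 1) // 2)
received = (1, 1, 1, 0, 1, 1, 0)  # codeword (1,1,0,0,1,1,0) with bit 2 flipped
```

The table grows quickly with n and with the number of correctable errors, which is one reason the general procedure may not be efficient.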

32
H is Onto

weight(e) ≤ (dmin-1)/2 < dmin
rank(H) = n-k ≥ (dmin-1)/2
Thus, rank(H) ≥ weight(e) and He ≠ 0
Not enough 1s in e to sum H's columns to 0
H maps onto its range
Thus,

33
H is 1-1

Let e1 and e2 be correction vectors, e1 ≠ e2
Suppose that
weight(e1), weight(e2) ≤ (dmin-1)/2
He1 = He2 = s
He1 − He2 = H(e1 − e2) = s − s = 0
And so, (e1 − e2) is a codeword
Thus, weight(e1 − e2) ≥ dmin
But weight(e1), weight(e2) ≤ (dmin-1)/2 and so
weight(e1 − e2) ≤ dmin-1
Contradiction! e1 = e2

34
Other Encoding Schemes

Linear codes preserved by matrix multiplication
Presumably, fancier codes might be preserved by
fancier computations
Limit
S. Winograd showed in 1962 that any code s.t.
f(x ∗ y) = f(x) ∗ f(y) has rate (k/n) or minimum
weight → 0 as k → ∞
How general can we get?
Do good solutions exist for small k?
k = 64 bits should be good enough

35
Summary