Algorithm-Based Fault Tolerance Matrix Multiplication

Provided by: gregBron
1
Algorithm-Based Fault Tolerance Matrix
Multiplication
  • Greg Bronevetsky

2
Problem at Hand
  • Have matrices A and B
  • Want to compute their product AB
  • Ask a matrix-matrix-multiply (MMM) implementation
    to compute product
  • Answer: C
  • Question: Is C the correct answer?
    How could we know for sure?

3
Algorithm-Based Fault Tolerance
  • Encode input matrices via error-correcting code
  • Run regular MMM algorithm on encoded matrices
  • Encoding invariant under MMM
  • Naturally outputs encoded matrices
  • Encoding guarantees:
  • If up to t errors in output, will detect error
  • If up to c < t errors in output, can decode correct
    output matrix

4
Outline
Linear Error Correcting Codes
ABFT Linear Encoding of Matrices
Algorithm-Based Fault Tolerance
5
Error Correcting Codes
  • Map $f: \Sigma^k \to \Sigma^n$
  • k-long data words → n-long codewords
  • We use $\Sigma = \{0, 1\}$
  • Code of length n is a sparse subset of $\Sigma^n$
  • Very few possible words are valid codewords
  • Rate of code: $k/n$
  • Amount of information communicated by each
    codeword

6
Minimum Distance
  • Minimum distance: $d_{min} = \min_{x \ne y \in C} d(x, y)$, where $d()$ is Hamming distance
  • Hamming distance: number of spots where words
    differ
  • Measures difficulty of decoding/correcting
    corrupted codewords
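
The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the function names and the example code `C` (a length-3 repetition code) are assumptions chosen for brevity.

```python
from itertools import combinations

def hamming_distance(x, y):
    """Number of spots where two equal-length words differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def minimum_distance(code):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

# Hypothetical example: binary repetition code with k = 1, n = 3.
C = [(0, 0, 0), (1, 1, 1)]
d_min = minimum_distance(C)
```

The brute-force pairwise search is only practical for tiny codes; it is meant to mirror the definition, not to be efficient.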

7
Detection and Correction
  • Code may detect errors in $< d_{min}$ spots
  • No error can morph one codeword into another
  • May correct errors in $\le (d_{min}-1)/2$ spots
  • Can still find closest codeword
  • More details later

Each codeword defines a circle around itself of
radius $d_{min}/2$
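
A rough sketch of "find the closest codeword", assuming a hypothetical length-5 repetition code (so $d_{min} = 5$ and up to 2 errors are correctable). The names `decode_nearest` and `C` are illustrative, not from the slides.

```python
def hamming_distance(x, y):
    """Number of spots where two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y))

def decode_nearest(word, code):
    """Return the codeword closest to `word` in Hamming distance."""
    return min(code, key=lambda c: hamming_distance(word, c))

# Hypothetical code: length-5 repetition code, d_min = 5.
C = [(0,) * 5, (1,) * 5]

received = (1, 0, 1, 1, 1)      # (1,1,1,1,1) with one flipped bit
corrected = decode_nearest(received, C)
```

With two flipped bits the received word still lies inside the radius of the original codeword, so decoding succeeds; with three it would land closer to the other codeword.
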
8
Linear Codes
  • Codewords form linear subspace inside $\Sigma^n$
  • In rowspace of generator matrix G

Example: an (n=7, k=3) code
9
Property 1
  • Linear combination of any codewords is also a
    codeword
  • For any $x, y \in C$, $(x + y) \in C$
  • Codeword × constant is a codeword
  • For any $z \in C$, $kz \in C$
  • $\langle 0, 0, \ldots, 0 \rangle$ is always a codeword
  • Proof: basic properties of linear spaces
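
Property 1 can be checked by brute force on a small binary code. The generator matrix `G` below is a made-up (n=7, k=3) example, since the matrix shown on the original slide did not survive in this transcript; arithmetic is over GF(2).

```python
from itertools import product

# Hypothetical generator matrix for a binary (n=7, k=3) code.
G = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]

def encode(m, G):
    """Codeword = m * G, with arithmetic mod 2."""
    n = len(G[0])
    return tuple(sum(m[i] * G[i][j] for i in range(len(G))) % 2
                 for j in range(n))

def add(x, y):
    """Component-wise sum of two words over GF(2)."""
    return tuple((a + b) % 2 for a, b in zip(x, y))

# All 2^k codewords: the row space of G.
codewords = {encode(m, G) for m in product([0, 1], repeat=len(G))}

# Closure under addition (Property 1).
closed = all(add(x, y) in codewords for x in codewords for y in codewords)
```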

10
Property 2
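  • Minimum distance of linear code: $d_{min} = \min_{c \in C, c \ne 0} \mathrm{weight}(c)$
  • Where $\mathrm{weight}(c)$ = number of nonzero spots in $c$
  • Proof: $d(x, y) = \mathrm{weight}(x - y)$, and $x - y$ is a codeword by Property 1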

11
Parity Check Matrix
  • H: dual matrix to G
  • Contains basis of space orthogonal to G's row
    space
  • (n-k)-dimensional space
  • H is $(n-k) \times n$
  • Space defined as $\{y : G y^T = 0\}$
  • Note: H also defines a linear code

12
Property 3
  • $d_{min}$ = min # of columns of H that can sum to 0
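  • Proof: $Hc = 0$ for every codeword $c$, so the columns of H at the nonzero spots of $c$ sum to 0; the fewest such columns correspond to a minimum-weight codeword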

13
Property 4
  • Minimum distance of linear code $\le n-k+1$
  • Proof:
  • Total n dimensions (since codewords are
    n-vectors)
  • G's rowspace: rank k
  • Thus, H's column space: rank n-k
  • Thus, any n-k+1 columns will be linearly dependent
  • Add up to 0
  • By Property 3, this is $\ge d_{min}$
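
Properties 3 and 4 can be checked numerically on a small binary code. The sketch below assumes the binary Hamming (n=7, k=4) code as the example (the slides do not name a specific code here) and finds $d_{min}$ as the smallest set of columns of H that sums to 0 over GF(2).

```python
from itertools import combinations

# Parity-check matrix of the binary Hamming (n=7, k=4) code, stored by
# columns: column j (1-based) is the 3-bit binary representation of j.
n, k = 7, 4
H_cols = [tuple((j >> b) & 1 for b in (2, 1, 0)) for j in range(1, n + 1)]

def sums_to_zero(cols):
    """True if the given columns add up to the zero vector over GF(2)."""
    return all(sum(bits) % 2 == 0 for bits in zip(*cols))

def min_dependent_columns(cols):
    """Smallest number of columns that sum to 0 (d_min by Property 3)."""
    for size in range(1, len(cols) + 1):
        if any(sums_to_zero(c) for c in combinations(cols, size)):
            return size
    return None

d_min = min_dependent_columns(H_cols)
singleton_bound = n - k + 1   # Property 4: d_min <= n - k + 1
```

For this code the search stops at three columns (for example columns 1, 2 and 3), which is below the bound of 4.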

14
Outline
Linear Error Correcting Codes
ABFT Linear Encoding of Matrices
Algorithm-Based Fault Tolerance
15
Encoding a Matrix
  • Algorithm-Based Fault Tolerance introduced by
    Huang and Abraham in 1984
  • Encode each row of matrix via extra column
  • Column entries: sums of matrix rows

16
Encoding a Matrix
  • Encode each column of matrix via extra row
  • Row entries: sums of matrix columns
  • Full Encoding
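
A minimal sketch of the three encodings, using plain Python lists of integers. The function names are illustrative; they follow the slides' wording, where encoding each row adds a column and encoding each column adds a row.

```python
def row_encode(A):
    """Append a checksum column: each row gets the sum of its entries."""
    return [row + [sum(row)] for row in A]

def column_encode(A):
    """Append a checksum row: each column gets the sum of its entries."""
    return [row[:] for row in A] + [[sum(col) for col in zip(*A)]]

def full_encode(A):
    """Apply both encodings; the corner entry is the sum of all entries."""
    return column_encode(row_encode(A))

A = [[1, 2],
     [3, 4]]
A_full = full_encode(A)
# A_full is [[1, 2, 3],
#            [3, 4, 7],
#            [4, 6, 10]]
```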

17
Detecting Errors
  • Suppose matrix A is corrupted to matrix Â
  • Entry $\hat{a}_{i,j}$ is wrong
  • Can detect error's exact position $\langle i, j \rangle$

18
Correcting Errors
  • Can correct error using row or col checksum
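
Detection and correction on a fully-encoded matrix might look like the sketch below. It assumes integer entries, at most one wrong entry, and that the wrong entry is in the data part rather than in a checksum; with floating-point data the equality tests would need a tolerance. All names are hypothetical.

```python
def locate_error(M):
    """Find the single corrupted data entry of a fully-encoded matrix.

    M has a checksum column (last) and a checksum row (last).
    Returns (i, j, delta), or None if all checksums agree.
    Assumes at most one wrong entry, located in the data part.
    """
    rows, cols = len(M) - 1, len(M[0]) - 1
    bad_rows = [i for i in range(rows) if sum(M[i][:cols]) != M[i][cols]]
    bad_cols = [j for j in range(cols)
                if sum(M[i][j] for i in range(rows)) != M[rows][j]]
    if not bad_rows and not bad_cols:
        return None
    i, j = bad_rows[0], bad_cols[0]
    delta = sum(M[i][:cols]) - M[i][cols]   # how much the entry is too large
    return i, j, delta

def correct(M):
    """Return a corrected copy of M (row checksum used for the fix)."""
    fixed = [row[:] for row in M]
    hit = locate_error(fixed)
    if hit is not None:
        i, j, delta = hit
        fixed[i][j] -= delta
    return fixed

M = [[1, 2, 3],
     [3, 9, 7],    # entry (1, 1) should be 4
     [4, 6, 10]]
fixed = correct(M)
```

The failing row checksum and failing column checksum intersect at the bad entry, and either checksum alone gives the size of the error.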

19
Big Trick: Preservation of Encoding
  • Column-encoded mtx × Row-encoded mtx =
    Fully-encoded mtx
  • Can check MMM computation by checking encoding of
    output
  • If product matrix has an erroneous entry
  • Can detect
  • Can correct
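
The preservation property can be seen on a 2×2 example. This is a sketch with illustrative names and a naive triple-loop multiply standing in for "the MMM implementation"; the multiply itself knows nothing about checksums.

```python
def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def column_encode(A):
    """Extra row holding the column sums."""
    return [row[:] for row in A] + [[sum(col) for col in zip(*A)]]

def row_encode(B):
    """Extra column holding the row sums."""
    return [row + [sum(row)] for row in B]

def is_fully_encoded(C):
    """Check every row checksum and every column checksum of C."""
    rows, cols = len(C) - 1, len(C[0]) - 1
    rows_ok = all(sum(C[i][:cols]) == C[i][cols] for i in range(rows + 1))
    cols_ok = all(sum(C[i][j] for i in range(rows)) == C[rows][j]
                  for j in range(cols + 1))
    return rows_ok and cols_ok

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_full = matmul(column_encode(A), row_encode(B))
# The top-left 2x2 block of C_full is AB; the last row and column are checksums.
```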

20
Applications
  • Matrix Multiplication
  • Given encoded A and B,
  • Check whether MMM result C (= AB?) has valid
    encoding
  • Matrix Factorization
  • Given a factorization A = WZ
  • Verify correctness by verifying encodings of
    factors
  • Factors row- OR column-encoded
  • Can only detect, not correct errors

21
Weighted ABFT
  • Oftentimes need to check row- or column-encoded
    matrices
  • Ex: factorization, data integrity check
  • Can only detect errors in such matrices
  • Can we also correct?
  • Yes, by generalizing to weighted checking
    rows/columns

22
Weighting
  • Suppose we have d n-vectors $w_1, \ldots, w_d$
  • Can column-encode matrix A
  • Let's try it out

23
Weighted Error Detection
24
Weighted Error Correction
  • Weighted encoding detects and corrects single
    errors
  • Even without full encoding
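
A sketch of single-error correction with a column-only weighted encoding. The weight vectors used on the original slides are not recoverable from this transcript, so the choice below ($w_1 = \langle 1, \ldots, 1 \rangle$ and $w_2 = \langle 1, 2, \ldots, n \rangle$) is an assumption, as are the function names. It also assumes integer data and intact checksum rows.

```python
def weighted_column_encode(A, weights):
    """Append one checksum row per weight vector w:
    entry j of that row is sum_i w[i] * A[i][j]."""
    checks = [[sum(w[i] * A[i][j] for i in range(len(A)))
               for j in range(len(A[0]))]
              for w in weights]
    return [row[:] for row in A] + checks

def correct_single_error(M, weights):
    """Locate and fix one wrong data entry using two weighted checksum rows.

    Assumes weights[0] is all ones and weights[1] = <1, 2, ..., n>,
    and that at most one data entry is wrong.
    Returns the (row, column) that was fixed, or None.
    """
    n = len(M) - len(weights)
    for j in range(len(M[0])):
        d1 = sum(weights[0][i] * M[i][j] for i in range(n)) - M[n][j]
        d2 = sum(weights[1][i] * M[i][j] for i in range(n)) - M[n + 1][j]
        if d1 != 0:
            i = d2 // d1 - 1      # d2 = (i + 1) * d1 for an error in row i
            M[i][j] -= d1
            return i, j
    return None

A = [[1, 2], [3, 4], [5, 6]]
W = [[1, 1, 1], [1, 2, 3]]
M = weighted_column_encode(A, W)
M[2][1] = 60                      # inject an error into row 2, column 1
fixed_at = correct_single_error(M, W)
```

The unweighted checksum gives the size of the error and the failing column; the ratio of the two checksum discrepancies gives the row, which is what a plain column encoding alone cannot provide.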

25
Outline
Linear Error Correcting Codes
ABFT Linear Encoding of Matrices
Algorithm-Based Fault Tolerance
26
Surprise
  • But this is all just a linear code!
  • Generator matrix for above scheme

27
Generating Encodings
  • Given $m = \langle a_{i,1}, a_{i,2}, \ldots, a_{i,k} \rangle$ as message word
    (or matrix row/column)
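
One plausible way to write the checksum scheme as a generator matrix is $G = [I_k \mid w_1^T \mid \cdots \mid w_d^T]$, so that $mG$ is the message followed by its weighted checksums. The sketch below assumes that form and the same hypothetical weights as before; the slide's own matrix is not preserved in this transcript.

```python
def checksum_generator(k, weights):
    """Build G = [I_k | w_1^T | ... | w_d^T] as a list of k rows."""
    return [[1 if i == j else 0 for j in range(k)] + [w[i] for w in weights]
            for i in range(k)]

def encode(m, G):
    """Codeword = m * G, using ordinary integer arithmetic."""
    return [sum(m[i] * G[i][j] for i in range(len(m)))
            for j in range(len(G[0]))]

k = 3
G = checksum_generator(k, weights=[[1, 1, 1], [1, 2, 3]])
codeword = encode([4, 5, 6], G)
# codeword is the message <4, 5, 6> followed by its plain sum (15)
# and its weighted sum 1*4 + 2*5 + 3*6 = 32.
```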

28
Surprise??
  • Not too surprising really
  • Why else would MMM preserve encoding?
  • Another possibility
  • Efficient: can be implemented via bit shifts
  • Room open for using any linear code!

29
Error Detection/Correction in General
  • To show for linear codes:
  • Can detect $< d_{min}$ errors
  • Can correct $\le (d_{min}-1)/2$ errors
  • Let $c$ be original codeword
  • Let $\hat{c}$ be the corrupted codeword
  • $e = \hat{c} - c$: error vector

30
Error Detection in General
  • $s = H\hat{c} = H(c + e) = He$ is called the syndrome vector
  • Independent of original codeword
  • Note: $\mathrm{weight}(e) < d_{min}$ since $< d_{min}$ errors
  • Thus $s = He \ne 0$ (for $e \ne 0$)
  • Detection: if $s \ne 0$, then ERROR
31
Error Correction in General
  • Clearly e is correction vector
  • $\hat{c} - e$ corrects error in $\hat{c}$
  • Sufficient to prove:
  • $\mathrm{weight}(e) \le (d_{min}-1)/2$ ⇒ H is isomorphism:
    correction vectors ↔
    syndrome vectors
  • i.e. for each correction vector (want to know)
    ∃ unique syndrome vector
  • Thus, possible to correct any error
  • May not be efficient

32
H is Onto
  • $\mathrm{weight}(e) \le (d_{min}-1)/2 < d_{min}$
  • $\mathrm{rank}(H) = n-k \ge (d_{min}-1)/2$
  • Thus, $\mathrm{rank}(H) \ge \mathrm{weight}(e)$ and $He \ne 0$
  • Not enough 1s in e to sum H's columns to 0
  • H maps onto its range
  • Thus,

33
H is 1-1
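  • Let $e_1$ and $e_2$ be correction vectors, $e_1 \ne e_2$
  • Suppose that
  • $\mathrm{weight}(e_1), \mathrm{weight}(e_2) \le (d_{min}-1)/2$
  • $He_1 = He_2 = s$
  • $He_1 - He_2 = H(e_1 - e_2) = s - s = 0$
  • And so, $(e_1 - e_2)$ is a nonzero codeword
  • Thus, $\mathrm{weight}(e_1 - e_2) \ge d_{min}$
  • But $\mathrm{weight}(e_1), \mathrm{weight}(e_2) \le (d_{min}-1)/2$ and so
    $\mathrm{weight}(e_1 - e_2) \le d_{min}-1$
  • Contradiction! $e_1 = e_2$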
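
The one-to-one correspondence between low-weight correction vectors and syndromes is what makes table-based correction possible. The sketch below builds that table by brute force, again assuming the binary Hamming (7, 4) code with $d_{min} = 3$ as the example; it is an illustration of the argument, not an efficient decoder.

```python
from itertools import combinations

n, d_min = 7, 3
# Assumed example: parity-check matrix of the binary Hamming (7, 4) code.
H = [[(j >> b) & 1 for j in range(1, n + 1)] for b in (2, 1, 0)]

def syndrome(word, H):
    """s = H * word over GF(2)."""
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)

def low_weight_vectors(n, t):
    """All binary n-vectors of weight <= t."""
    for w in range(t + 1):
        for spots in combinations(range(n), w):
            yield tuple(1 if i in spots else 0 for i in range(n))

t = (d_min - 1) // 2
table = {}
for e in low_weight_vectors(n, t):
    s = syndrome(e, H)
    assert s not in table         # H is 1-1 on correction vectors
    table[s] = e

def correct(word):
    """Look up the correction vector for word's syndrome and apply it."""
    e = table[syndrome(word, H)]
    return tuple((a + b) % 2 for a, b in zip(word, e))   # over GF(2), - is +

c_hat = (1, 1, 1, 0, 1, 0, 0)     # codeword (1,1,1,0,0,0,0) with bit 5 flipped
c = correct(c_hat)
```

The table has one entry per correctable error pattern, which grows quickly with n and t; this is the sense in which correction is always possible but may not be efficient.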

34
Other Encoding Schemes
  • Linear codes preserved by matrix multiplication
  • Presumably, fancier codes might be preserved by
    fancier computations
  • Limit
  • S. Winograd showed in 1962 that any code s.t.
    $f(x \circ y) = f(x) \circ f(y)$ has rate (k/n) or minimum
    weight → 0 as k → ∞
  • How general can we get?
  • Do good solutions exist for small k?
  • k = 64 bits should be good enough

35
Summary
  • For Matrix Multiplication: can encode input via
    linear codes
  • Solutions exist for more complex codes
  • Ex: Fourier Transforms
  • On parallel systems must ensure:
  • No processor touches >1 element per row/column
  • Else, if one processor fails, encoding
    overwhelmed with errors
  • To ensure this, must modify algorithm
  • Separate check placement theory