Title: Review last class
1Review last class
- Fundamental concepts in Fault Tolerant Computing
- Importance of FTC
- Reliability
- Availability
- Faults, Errors, Failure
- Modeling
2Today's Topics
- Information redundancy and temporal redundancy
- Error Detection and Correction Codes
- Applications
3Information Redundancy
- Idea: add redundant information to data to allow
- Fault detection
- Fault masking
- Fault tolerance
- Mechanisms
- Error detecting codes and error correcting codes (ECC)
4Information Redundancy
- Useful to distinguish
- Data words: the actual information contents
- Code words: the transmitted information (redundant)
- Codes can be
- Separable: if the code word contains all original data bits plus additional check bits
- Non-separable: otherwise
- Example: ASCII coding of a single digit (separable)
- Data word: 9 (binary 1001)
- Code word: 57 (hex 39, binary 0011 1001; the low four bits are the data word)
5Information Redundancy
- A dataword with d bits is encoded into a codeword with c bits, where $c > d$
- Not all $2^c$ combinations are valid codewords
- If the c bits received are not a valid codeword, an error is detected
- Extra bits may be used to correct errors
- Overhead: time to encode and decode
6Data Communication
- Error coding provides reliable digital data
transmission (and/or storage) when the
communication medium used has an unacceptable bit
error rate (BER) and a low signal-to-noise ratio
(SNR)
[Figure: communication channel subject to noise, with ECC applied at the sender and at the receiver]
7Shannon's Theorem
- Shannon's theorem [1] states the maximum amount of error-free data (i.e., information) that can be transmitted over a communication link with a specified bandwidth in the presence of noise interference:
$$C = BW \log_2\left(1 + \frac{S}{N}\right)$$
- C is the channel capacity in bits per second (including error correction)
- BW is the bandwidth of the channel in hertz
- S/N is the signal-to-noise ratio (as a linear power ratio, not in dB)
[1] C. E. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, vol. 27, pp. 379-423 and pp. 623-656, 1948.
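The capacity formula can be evaluated directly. The following is a minimal sketch; the function names and the example channel (3000 Hz, 30 dB) are illustrative assumptions, not values from the slides.

```python
import math

def db_to_linear(snr_db: float) -> float:
    """Convert a signal-to-noise ratio from decibels to a linear power ratio."""
    return 10 ** (snr_db / 10)

def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Channel capacity in bits per second: C = BW * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# Hypothetical example channel: 3000 Hz bandwidth, 30 dB signal-to-noise ratio.
capacity = shannon_capacity(3000, db_to_linear(30))
print(f"capacity is about {capacity:.0f} bit/s")
```

Because the S/N term sits inside a logarithm, doubling the bandwidth doubles the capacity, while doubling the signal power adds far less.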
8Error-Detection System using Check Bits
9Parity Code
- The simplest separable codes are the parity codes.
- Parity codes are used to transmit information.
- A parity-coded word includes n data bits and an extra bit which holds the parity
- In an even (odd) parity code, the extra bit is set so that the total number of 1's in the (n+1)-bit word (including the parity bit) is even (odd)
- The overhead fraction of this parity code is 1/n.
- Example (odd parity; the last bit is the parity bit)
- 0100 1001 0
- 0110 1001 1
- HD = 2 (will detect all single-bit errors)
10Encoding-Decoding Parity Code
- The encoder is a modulo-2 adder, generating a 0 if the number of 1's is even
- The output is the parity signal
- The decoder generates the parity from the received data bits and compares it with the received parity bit
- Double-bit errors can't be detected by a parity check
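The encoder and decoder described above can be sketched in a few lines. This is an illustrative sketch, not the circuit from the slides; the function names are assumptions. Bits are held in Python lists, and the two example words from the parity-code slide (0100 1001 and 0110 1001) are checked against odd parity, which is what their parity bits correspond to.

```python
def parity_bit(data_bits, odd=False):
    """Modulo-2 sum of the data bits; complemented for odd parity."""
    p = sum(data_bits) % 2
    return p ^ 1 if odd else p

def encode(data_bits, odd=False):
    """Append the parity bit to the data bits (separable code)."""
    return list(data_bits) + [parity_bit(data_bits, odd)]

def check(word, odd=False):
    """Return True if the (n+1)-bit word has the expected overall parity."""
    return sum(word) % 2 == (1 if odd else 0)

word = encode([0, 1, 0, 0, 1, 0, 0, 1], odd=True)

single = list(word)
single[3] ^= 1            # one flipped bit: detected
double = list(single)
double[5] ^= 1            # a second flipped bit: parity looks valid again

print(check(word, odd=True), check(single, odd=True), check(double, odd=True))
```

The last check passing despite two corrupted bits is the limitation stated on the slide: a single parity bit only gives a Hamming distance of 2.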
11Error Correcting Parity Codes
- Simplest scheme: data is organized in a 2-dimensional array
- Bits at the end of a row: parity over that row
- Bits at the bottom of a column: parity over that column
- A single-bit error anywhere will cause a row and a column to be erroneous
- This identifies a unique erroneous bit
- This is an example of overlapping parity: each bit is covered by more than one parity bit
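A minimal sketch of the row/column scheme follows. It uses even parity and assumes that only a data bit is corrupted while the stored parity bits arrive intact; both choices, and the function names, are illustrative assumptions.

```python
def parities(rows):
    """Even parity over each row and each column of a 2-D bit array."""
    row_parity = [sum(r) % 2 for r in rows]
    col_parity = [sum(c) % 2 for c in zip(*rows)]
    return row_parity, col_parity

def correct_single_error(rows, row_parity, col_parity):
    """Locate and flip a single erroneous data bit in place.

    Returns the (row, column) of the corrected bit, or None if no
    parity check fails. Assumes the parity bits themselves are intact.
    """
    new_rows, new_cols = parities(rows)
    bad_rows = [i for i, (a, b) in enumerate(zip(row_parity, new_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(col_parity, new_cols)) if a != b]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        r, c = bad_rows[0], bad_cols[0]
        rows[r][c] ^= 1
        return r, c
    raise ValueError("not a single data-bit error")

data = [[1, 0, 1, 1],
        [0, 1, 0, 0],
        [1, 1, 1, 0]]
row_p, col_p = parities(data)

received = [row[:] for row in data]
received[1][2] ^= 1                       # inject one bit error
location = correct_single_error(received, row_p, col_p)
```

The failing row and the failing column intersect at exactly one bit, which is why the single error can be corrected rather than just detected.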
12CheckSum
- Primarily used to detect errors in data transmission on communication networks
- Also used in memory systems
- Basic idea: add up the block of data being transmitted and transmit this sum as well
- Receiver adds up the data it received and compares it with the checksum it received
- If the two do not match, an error is indicated and the data is sent again (temporal redundancy)
13Versions of Checksum
- Data words are d bits long
- Versions
- Single-precision: checksum is a modulo-$2^d$ addition
- Double-precision: checksum is a modulo-$2^{2d}$ addition
- In general, a single-precision checksum catches fewer errors than double-precision, since it only keeps the rightmost d bits of the sum
- Residue checksum takes into account the carry out of the d-th bit as an end-around carry; somewhat more reliable
- The Honeywell checksum concatenates words into pairs for the checksum calculation (done modulo $2^{2d}$); guards against errors in the same position
14Comparing Versions of Checksum
- All checksum schemes allow error detection but not error location: the entire block of data must be retransmitted if an error is detected
[Figure: example in which the single-precision checksum does not detect the error but the Honeywell method does]
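The four checksum versions can be compared on a small block. The sketch below is illustrative: the function names, the 4-bit words, and the injected fault (the most significant bit flipped in two words) are assumptions chosen to reproduce the kind of case described, not the data from the original figure.

```python
def single_precision(words, d):
    """Sum of the words, keeping only the rightmost d bits."""
    return sum(words) % (1 << d)

def double_precision(words, d):
    """Sum of the words, keeping 2d bits."""
    return sum(words) % (1 << (2 * d))

def residue(words, d):
    """Sum with the carry out of the d-th bit added back in (end-around carry)."""
    mask = (1 << d) - 1
    total = 0
    for w in words:
        total += w
        total = (total & mask) + (total >> d)
    return total

def honeywell(words, d):
    """Concatenate consecutive words into 2d-bit pairs, then sum modulo 2^(2d).

    Assumes an even number of words.
    """
    pairs = [(words[i] << d) | words[i + 1] for i in range(0, len(words), 2)]
    return sum(pairs) % (1 << (2 * d))

d = 4
good = [0b0000, 0b0101, 0b1111, 0b0010]
bad = [0b1000, 0b1101, 0b1111, 0b0010]   # same bit position flipped in two words

for name, fn in [("single", single_precision), ("double", double_precision),
                 ("residue", residue), ("honeywell", honeywell)]:
    print(name, "detects error:", fn(good, d) != fn(bad, d))
```

In this block the two flipped bits add a multiple of $2^d$ to the sum, so the single-precision checksum is unchanged. The Honeywell version places the two words in different halves of a 2d-bit pair, so the errors no longer line up.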
15Data Communication: Poor Solutions
- Repeats
- Data: 1 1 1 1
- Message: 1 1 1 1, 1 1 1 1, 1 1 1 1
- Single checksum
- Data: 1 1 1 1
- Message: 1 1 1 1 0
[Figure: truth table and general form of the checksum bit]
16Why they are poor
- Repeat 3 times
- This divides W (the bandwidth) by 3
- It divides overall capacity by at least a factor of 3.
- Single checksum
- Allows an error to be detected but requires the message to be discarded and resent.
- Each error reduces the channel capacity by at least a factor of 2 because of the thrown-away message.
Shannon Efficiency
17Information Redundancy
- More code bits
- More error tolerance
- Less bandwidth available for real information
18Hamming Distance
- Hamming distance for a pair of code words
- The number of bits that are different between the two code words: $HD(v_1, v_2) = HW(v_1 \oplus v_2)$, where HW is the Hamming weight (number of 1 bits)
- E.g. 0000, 0001: HD = 1
- E.g. 0100, 0011: HD = 3
- Minimum Hamming distance for a code
- $MinHD(code) = \min_{x \ne y} HD(x, y)$
19Hamming Distance
- A Hamming distance of 2 means that a single-bit error will not change one of the codewords into another
- The code {001, 010, 100, 111} has distance 2: it can detect a single-bit error
- The code {000, 111} has distance 3: it can detect a single- or double-bit error
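The distance definitions can be checked mechanically. In this minimal sketch, codewords are stored as Python integers and the function names are illustrative assumptions; the two small codes are the ones quoted above.

```python
from itertools import combinations

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits: the Hamming weight of a XOR b."""
    return bin(a ^ b).count("1")

def min_distance(code) -> int:
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

print(hamming_distance(0b0000, 0b0001))
print(hamming_distance(0b0100, 0b0011))
print(min_distance([0b001, 0b010, 0b100, 0b111]))
print(min_distance([0b000, 0b111]))
```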
20Error Detection/Correction
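- To detect up to D bit errors, the code distance should be at least $D + 1$
- To correct up to C bit errors, the code distance should be at least $2C + 1$
[Figure: codewords a and b at distance 2C+1; an erroneous word e within distance C of a is at least C+1 away from b]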
21Coding and Redundancy
- The code {000, 111} can be used to encode a single data bit
- 0 can be encoded as 000 and 1 as 111
- Many redundancy techniques can be considered as coding schemes
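The {000, 111} code has distance 3, so it satisfies the $2C + 1$ condition for C = 1. The sketch below decodes by choosing the nearest valid codeword, which for this code amounts to a majority vote; the function names are illustrative assumptions.

```python
CODE = {0: (0, 0, 0), 1: (1, 1, 1)}

def encode_bit(bit):
    """Encode one data bit as three identical bits."""
    return CODE[bit]

def decode_nearest(word):
    """Return the data bit whose codeword is closest in Hamming distance."""
    def distance(bit):
        return sum(a != b for a, b in zip(word, CODE[bit]))
    return min(CODE, key=distance)

received = list(encode_bit(1))
received[0] ^= 1                 # single-bit error: 111 becomes 011
print(decode_nearest(received))
```

A second error in the same word would move it closer to the wrong codeword, which is why a distance-3 code corrects one error but not two.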
22Hamming Codes
- We write the equations as follows (easy to remember)
p1 p2 i1 p4 i2 i3 i4
 1  0  1  0  1  0  1
 0  1  1  0  0  1  1
 0  0  0  1  1  1  1
 1  2  3  4  5  6  7   (bit position)
- This encodes a 4-bit information word into a 7-bit codeword (called a (7,4) code)
23Hamming(7,4) Code
- The Hamming(7,4) code may be defined with the use of a Venn diagram.
- Place the four digits of the un-encoded binary word in sections 1, 2, 3, and 4 of the diagram.
- Choose digits 5, 6, and 7 so that the parity of each circle is even.
24Hamming Codes Single Error Correcting (SEC)
- Properties of the code
- If there is no error, all parity equations will be satisfied
- Denote the outcomes of these equation checks as $c_1, c_2, c_4$
- If there is exactly one error, then $c_1, c_2, c_4$ point to the error
- The vector $(c_1, c_2, c_4)$ is called the syndrome
- The previous (7,4) Hamming code is an SEC code
25Hamming Codes
- The previous method of construction can be generalized to construct an (n,k) Hamming code
- Simple bound
- k = number of information bits
- r = number of check bits
- n = k + r = total number of bits
- n + 1 = number of single errors or no error
- Each error (including no error) must have a distinct syndrome
- With r check bits, the maximum number of possible syndromes is $2^r$
- Hence $2^r \ge n + 1$
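The bound $2^r \ge n + 1$ with $n = k + r$ determines the smallest number of check bits for a given number of information bits. A minimal sketch, with illustrative function names:

```python
def min_check_bits(k: int) -> int:
    """Smallest r such that 2**r >= k + r + 1."""
    r = 1
    while (1 << r) < k + r + 1:
        r += 1
    return r

def meets_bound_with_equality(k: int) -> bool:
    """True when 2**r == n + 1, i.e. every syndrome value is used."""
    r = min_check_bits(k)
    return (1 << r) == k + r + 1

for k in (4, 8, 11, 32, 64):
    r = min_check_bits(k)
    print(f"k={k:2d}  r={r}  n={k + r}")
```

The relative overhead r/k shrinks as the word gets longer, which is one reason wide memory words are attractive for this kind of code.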
26Hamming Codes by example (contd.)
- Simple bound
- When $2^r = n + 1$, the corresponding Hamming code is a perfect code
- Perfect Hamming codes can be constructed as follows
p1   p2   i1  p4   i2  i3  i4  p8   i5  . . .
2^0  2^1  3   2^2  5   6   7   2^3  9   . . .
- Parity equations can be written as before from the above matrix representation
27Hamming Single Error correcting Code (SEC)
- Check bits are calculated by computing the exclusive-OR of 3 appropriate message bits
- $c_1 = b_1 \oplus b_2 \oplus b_4$
- $c_2 = b_1 \oplus b_3 \oplus b_4$
- $c_3 = b_2 \oplus b_3 \oplus b_4$
[Figure: pattern of parity bit checks for a Hamming (7,4) SEC code]
- Message $b_1 b_2 b_3 b_4 = 1010$ with $c_1 c_2 c_3 = 101$ produces $c_1 c_2 b_1 c_3 b_2 b_3 b_4 = 1011010$
28Hamming Single Error correcting Code (SEC)
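- Message $b_1 b_2 b_3 b_4 = 1010$ with $c_1 c_2 c_3 = 101$ produces $c_1 c_2 b_1 c_3 b_2 b_3 b_4 = 1011010$
- To check the transmitted word we recalculate the check bits, giving $c_i'$
- Suppose $b_3$ changes from 1 to 0; then $c_1 c_2 b_1 c_3 b_2 b_3 b_4 = 1011000$
- $c_1' = b_1 \oplus b_2 \oplus b_4 = 1 \oplus 0 \oplus 0 = 1$
- $c_2' = b_1 \oplus b_3 \oplus b_4 = 1 \oplus 0 \oplus 0 = 1$
- $c_3' = b_2 \oplus b_3 \oplus b_4 = 0 \oplus 0 \oplus 0 = 0$
- Comparing the received and recalculated check bits we get
- $e_1 = c_1 \oplus c_1' = 1 \oplus 1 = 0$
- $e_2 = c_2 \oplus c_2' = 0 \oplus 1 = 1$
- $e_3 = c_3 \oplus c_3' = 1 \oplus 0 = 1$
- The binary address of the error bit is given by $e_3 e_2 e_1 = 110$, or position 6 in our example. We replace the error bit with its complement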
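The worked example can be reproduced in code. This sketch follows the bit layout c1 c2 b1 c3 b2 b3 b4 and the check equations used above; the function names are illustrative assumptions.

```python
def hamming74_encode(message):
    """Encode [b1, b2, b3, b4] as [c1, c2, b1, c3, b2, b3, b4]."""
    b1, b2, b3, b4 = message
    c1 = b1 ^ b2 ^ b4
    c2 = b1 ^ b3 ^ b4
    c3 = b2 ^ b3 ^ b4
    return [c1, c2, b1, c3, b2, b3, b4]

def hamming74_syndrome(word):
    """Return e3 e2 e1 as an integer: 0 for no error, else the 1-based error position."""
    c1, c2, b1, c3, b2, b3, b4 = word
    e1 = c1 ^ b1 ^ b2 ^ b4
    e2 = c2 ^ b1 ^ b3 ^ b4
    e3 = c3 ^ b2 ^ b3 ^ b4
    return 4 * e3 + 2 * e2 + e1

def hamming74_correct(word):
    """Return (corrected word, error position); assumes at most one bit error."""
    position = hamming74_syndrome(word)
    fixed = list(word)
    if position:
        fixed[position - 1] ^= 1
    return fixed, position

codeword = hamming74_encode([1, 0, 1, 0])
received = list(codeword)
received[5] ^= 1                          # b3 (position 6) flips from 1 to 0
fixed, position = hamming74_correct(received)
```

With two bit errors the syndrome still points at some position, so the decoder would silently flip a third bit; that weakness is what an extra overall parity bit addresses.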
29Hamming SECDED
- It's a distance-4 code, which can be seen as a distance-3 code with an additional check bit
- We can first design an SEC code and then append a check bit, which is a parity bit over the other message and check bits.
- $c_4 = c_1 \oplus c_2 \oplus b_1 \oplus c_3 \oplus b_2 \oplus b_3 \oplus b_4$
- $e_4 = c_4 \oplus c_4'$
- The new coded word is $c_1 c_2 b_1 c_3 b_2 b_3 b_4 c_4$
- The syndrome is interpreted as
[Table: interpretation of the syndrome]
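A minimal sketch of a SECDED decoder for this 8-bit word follows. The interpretation coded here is the conventional one for an extended Hamming code and is an assumption, since it does not come from the original slide: all checks zero means no error, a failing overall parity check ($e_4 = 1$) means a single correctable error, and a non-zero $e_3 e_2 e_1$ with $e_4 = 0$ means an uncorrectable double error. Function names and status strings are illustrative.

```python
def secded_encode(message):
    """Encode [b1, b2, b3, b4] as [c1, c2, b1, c3, b2, b3, b4, c4]."""
    b1, b2, b3, b4 = message
    c1 = b1 ^ b2 ^ b4
    c2 = b1 ^ b3 ^ b4
    c3 = b2 ^ b3 ^ b4
    word = [c1, c2, b1, c3, b2, b3, b4]
    c4 = sum(word) % 2                     # parity over all other bits
    return word + [c4]

def secded_decode(word):
    """Return (status, corrected word or None)."""
    c1, c2, b1, c3, b2, b3, b4, c4 = word
    e1 = c1 ^ b1 ^ b2 ^ b4
    e2 = c2 ^ b1 ^ b3 ^ b4
    e3 = c3 ^ b2 ^ b3 ^ b4
    e4 = c4 ^ c1 ^ c2 ^ b1 ^ c3 ^ b2 ^ b3 ^ b4
    position = 4 * e3 + 2 * e2 + e1
    if e4 == 0 and position == 0:
        return "no error", list(word)
    if e4 == 1:
        fixed = list(word)
        # position 0 with a failing overall parity means c4 itself is wrong
        index = position - 1 if position else 7
        fixed[index] ^= 1
        return "single error corrected", fixed
    return "double error detected", None

codeword = secded_encode([1, 0, 1, 0])
two_errors = list(codeword)
two_errors[1] ^= 1
two_errors[5] ^= 1
print(secded_decode(two_errors)[0])
```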
30Cyclic Codes
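- A code C is cyclic if every cyclic shift of every codeword c in C also belongs to C. That is, if C is cyclic and $(c_0, c_1, \dots, c_{n-1}) \in C$, then $(c_{n-1}, c_0, \dots, c_{n-2}) \in C$
- Example: a 5-bit cyclic code
- Cyclic codes are easy to generate (with a shift register)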
31Cyclic Codes
- Cyclic codes are often non-separable, although separable cyclic codes do exist
- Encoding consists of multiplying (modulo-2) the data word by a constant number
- The coded word is the product
- Decoding is dividing by the same constant: if the remainder is non-zero, an error has occurred
- Cyclic codes are widely used in data storage and communication
32Cyclic Code Theory
- k: number of bits of data that are encoded
- Encoded word of length n bits: obtained by multiplying the given k data bits by a number that is n-k+1 bits long
- The multiplier is represented as a polynomial, the generator polynomial
- 1s and 0s in the (n-k+1)-bit multiplier are treated as coefficients of an (n-k)-degree polynomial
- Example: the multiplier is 11001, so the generator polynomial is $G(X) = 1 \cdot X^0 + 0 \cdot X^1 + 0 \cdot X^2 + 1 \cdot X^3 + 1 \cdot X^4 = 1 + X^3 + X^4$
33Cyclic Code Theory
- (n,k) cyclic code
- A cyclic code using a generator polynomial of degree n-k and a total number of encoded bits n
- An (n,k) cyclic code can detect all single errors and all runs of adjacent bit errors shorter than n-k
- Useful in applications like wireless communication: channels are frequently noisy and have bursts of interference resulting in runs of adjacent bit errors
34Cyclic Codes Theory
- For a polynomial to be a generator polynomial for an (n,k) cyclic code it must be a factor (divides evenly) of $X^n - 1$
- $1 + X^3 + X^4$ is a factor of $X^{15} - 1$, giving a (15,11) code
- For the 5-bit code, $X + 1$ is the generator polynomial
- $X^5 - 1 = (X + 1)(X^4 + X^3 + X^2 + X + 1)$
- Multiply 0000, ..., 1111 by $(X + 1)$ to obtain all codewords of the (5,4) code
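The (5,4) construction can be checked with modulo-2 polynomial arithmetic. In this sketch a polynomial is stored as a Python integer whose bit i is the coefficient of $X^i$; that representation and the function names are illustrative assumptions. Over the integers modulo 2, $X^n - 1$ and $X^n + 1$ are the same polynomial.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two polynomials with coefficients modulo 2 (carry-less multiply)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_mod(a: int, g: int) -> int:
    """Remainder of polynomial a divided by polynomial g, coefficients modulo 2."""
    while a.bit_length() >= g.bit_length():
        a ^= g << (a.bit_length() - g.bit_length())
    return a

def rotate(word: int, n: int) -> int:
    """Cyclic shift of an n-bit word by one position."""
    return ((word << 1) | (word >> (n - 1))) & ((1 << n) - 1)

G = 0b11                                   # generator X + 1
codewords = {gf2_mul(data, G) for data in range(16)}

is_cyclic = all(rotate(c, 5) in codewords for c in codewords)
divides_x5 = gf2_mod((1 << 5) | 1, G) == 0             # X^5 - 1
divides_x15 = gf2_mod((1 << 15) | 1, 0b11001) == 0     # 1 + X^3 + X^4 into X^15 - 1
```

Every codeword produced this way has an even number of 1s, so this particular cyclic code behaves like a 4-bit even-parity code, though the data bits do not appear unchanged in the codeword (it is non-separable).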
35Cyclic Redundancy Code (CRC)
- Basic idea
- Treat the message as a large binary number, divide it by another fixed binary number, and make the remainder from this division the error-checking information
- Upon receipt of the message, the receiver can perform the same division and compare the remainder with the transmitted remainder
- CRC calculations are based on
- polynomial division
- arithmetic over the field of integers mod 2.
36CRC
- The divisor is the generator polynomial
- Given a message to be transmitted: $b_n b_{n-1} b_{n-2} \dots b_2 b_1 b_0$
- View the bits of the message as the coefficients of a polynomial
- $B(x) = b_n x^n + b_{n-1} x^{n-1} + b_{n-2} x^{n-2} + \dots + b_2 x^2 + b_1 x + b_0$
- Multiply the polynomial corresponding to the message by $x^k$, where k is the degree of the generator polynomial, and then divide this product by the generator to obtain polynomials Q(x) and R(x) such that
- $x^k B(x) = Q(x) G(x) + R(x)$
- All the coefficients are treated not as integers but as integers modulo 2.
- Finally, treat the coefficients of the remainder polynomial R(x) as "parity bits". That is, append them to the message before actually transmitting it.
37CRC
- $x^k B(x) - R(x) = Q(x) G(x)$
- In other words, if the transmitted message's bits are viewed as the coefficients of a polynomial, then that polynomial will be divisible by G(x).
- When a message is received, the corresponding polynomial is divided by G(x). If the remainder is non-zero, an error is detected. Otherwise, the message is assumed to be correct.
38CRC Example
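- Suppose we want to send the short message 11010111 using the CRC with the polynomial $x^3 + x^2 + 1$ as our generator
- The message corresponds to the polynomial $x^7 + x^6 + x^4 + x^2 + x + 1$
- Given G(x) is of degree 3, we need to multiply this polynomial by $x^3$ and then divide the result, $x^{10} + x^9 + x^7 + x^5 + x^4 + x^3$, by G(x)
- Long division, with quotient $x^7 + x^2 + 1$:
- $(x^{10} + x^9 + x^7 + x^5 + x^4 + x^3) - x^7 (x^3 + x^2 + 1) = x^5 + x^4 + x^3$
- $(x^5 + x^4 + x^3) - x^2 (x^3 + x^2 + 1) = x^3 - x^2$
- $(x^3 - x^2) - 1 \cdot (x^3 + x^2 + 1) = -2x^2 - 1$
- The residue is $-2x^2 - 1$, but in modulo-2 arithmetic $-2x^2 - 1 \equiv 1$. Therefore the parity will be 001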
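The same division can be carried out on bit strings, where subtracting modulo 2 is an exclusive-OR. This is a minimal sketch with illustrative function names; it reproduces the example message 11010111 and generator 1101 (that is, $x^3 + x^2 + 1$).

```python
def gf2_remainder(value: int, generator: int) -> int:
    """Remainder of modulo-2 polynomial division; bit i is the coefficient of x**i."""
    while value.bit_length() >= generator.bit_length():
        value ^= generator << (value.bit_length() - generator.bit_length())
    return value

def crc_parity(message_bits: str, generator_bits: str) -> str:
    """Parity bits R(x) for the message: remainder of x**k * B(x) divided by G(x)."""
    k = len(generator_bits) - 1            # degree of the generator
    shifted = int(message_bits, 2) << k    # multiply B(x) by x**k
    remainder = gf2_remainder(shifted, int(generator_bits, 2))
    return format(remainder, f"0{k}b")

def crc_is_valid(received_bits: str, generator_bits: str) -> bool:
    """A received word is accepted when it is divisible by G(x)."""
    return gf2_remainder(int(received_bits, 2), int(generator_bits, 2)) == 0

message, generator = "11010111", "1101"
parity = crc_parity(message, generator)
transmitted = message + parity
```

Appending the parity bits makes the transmitted word an exact multiple of the generator, so the receiver only needs to test for a zero remainder.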
39Generating Polynomials
- CRC-16: $G(x) = x^{16} + x^{15} + x^2 + 1$
- Detects single and double bit errors
- All errors with an odd number of bits
- Burst errors of length 16 or less
- Most errors for longer bursts
- CRC-32: $G(x) = x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1$
- Used in Ethernet
- Also, 32 bits of 1s are added on the front of the message
- Initialize the LFSR to all 1s
40Reed-Solomon (RS) Codes
- RS codes are block-based error correcting codes with a wide range of applications in digital communications and storage.
- Storage devices (including tape, Compact Disc, DVD, barcodes, etc.)
- Wireless or mobile communications (including cellular telephones, microwave links, etc.)
- Satellite communications
- Digital television / DVB
- High-speed modems such as ADSL, xDSL, etc.
41Reed-Solomon (RS) Codes
42Reed-Solomon (RS) Codes
- A Reed-Solomon code is specified as RS(n,k) with s-bit symbols.
- The encoder takes k data symbols of s bits each and adds parity symbols to make an n-symbol codeword.
- There are n-k parity symbols of s bits each.
- A Reed-Solomon decoder can correct up to t symbols that contain errors in a codeword, where $2t = n - k$.
43Reed-Solomon (RS) Codes
- Typical Reed-Solomon codeword
- Example: a popular Reed-Solomon code is RS(255,223) with 8-bit symbols. Each codeword contains 255 codeword bytes, of which 223 bytes are data and 32 bytes are parity.
- For this code: n = 255, k = 223, s = 8, 2t = 32, t = 16
- The decoder can correct any 16 symbol errors in the codeword, i.e. errors in up to 16 bytes anywhere in the codeword can be automatically corrected.
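The parameter arithmetic for an RS(n,k) code is simple enough to sketch. The function name and the returned dictionary keys are illustrative assumptions; the length check uses the standard constraint that a codeword with s-bit symbols has at most $2^s - 1$ symbols, which is not stated on the slides.

```python
def rs_parameters(n: int, k: int, s: int) -> dict:
    """Derive parity count and correction capability for RS(n, k) with s-bit symbols."""
    if not 0 < k < n <= (1 << s) - 1:
        raise ValueError("need 0 < k < n <= 2**s - 1")
    parity_symbols = n - k
    return {
        "parity_symbols": parity_symbols,
        "t": parity_symbols // 2,          # correctable symbol errors, 2t = n - k
        "codeword_bits": n * s,
        "code_rate": k / n,
    }

params = rs_parameters(255, 223, 8)
print(params)
```

Because correction works on whole symbols, a burst that damages many bits inside a few bytes costs only a few of the t correctable symbols.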
44Reed-Solomon (RS) Codes
- A Reed-Solomon codeword is generated using a special polynomial. All valid codewords are exactly divisible by the generator polynomial.
- The general form of the generator polynomial is $g(x) = (x - \alpha^i)(x - \alpha^{i+1}) \cdots (x - \alpha^{i+2t-1})$
- The codeword is constructed using $c(x) = g(x) \cdot i(x)$
- g(x) is the generator polynomial, i(x) is the information block, c(x) is a valid codeword, and $\alpha$ is referred to as a primitive element of the field
45Reed-Solomon (RS) Codes
- Example: generator for RS(255,249)