Title: CS252 Graduate Computer Architecture Lecture 17 ECC (continued), CRC
1 CS252 Graduate Computer Architecture, Lecture 17: ECC (continued), CRC
- John Kubiatowicz
- Electrical Engineering and Computer Sciences
- University of California, Berkeley
- http://www.eecs.berkeley.edu/kubitron/cs252
2 Review: Code Vector Space
[Figure: code space; data vector v0 maps to code word C0 = f(v0); code distance (Hamming distance)]
- Not every vector in the code space is valid
- Hamming Distance ($d$): minimum number of bit flips to turn one code word into another
- Number of errors that we can detect: $d-1$
- Number of errors that we can fix: $\lfloor (d-1)/2 \rfloor$
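As a rough illustration of the distance rules above, the following Python sketch finds the minimum Hamming distance of a small code by brute force. The 3-bit repetition code and the helper names (`hamming_distance`, `code_distance`) are illustrative choices, not taken from the lecture.

```python
from itertools import combinations

def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def code_distance(code_words) -> int:
    """Minimum pairwise Hamming distance over all distinct code words."""
    return min(hamming_distance(a, b) for a, b in combinations(code_words, 2))

# Hypothetical example code: 3-bit repetition code (two valid code words).
repetition = [0b000, 0b111]
d = code_distance(repetition)
detectable = d - 1          # errors we can detect
correctable = (d - 1) // 2  # errors we can fix
```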
3 Review: How to Generate Code Words?
- Consider a linear code. Need a generator matrix $G$.
- Let $v_i$ be the data value ($k$ bits) and $C_i$ the resulting code word ($n$ bits): $C_i = G \cdot v_i$, where $G$ is an $n \times k$ matrix
- Are there $2^k$ unique code values?
- Only if the $k$ columns of $G$ are linearly independent!
- Of course, need some way of decoding as well.
- Is this linear??? Why or why not?
- A code is systematic if the data is directly encoded within the code words.
- Means the generator has the form $G = \begin{bmatrix} I \\ P \end{bmatrix}$
- Can always turn a non-systematic code into a systematic one (row ops)
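A minimal sketch of encoding with a systematic generator matrix over GF(2), assuming an $n \times k$ generator built as an identity block stacked on a parity block. The parity block `P` below is a hypothetical choice for a (7,4) code and does not come from the slides.

```python
def mat_vec_gf2(M, v):
    """Multiply matrix M (list of rows) by vector v, with arithmetic mod 2."""
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]

k = 4
# Hypothetical parity block P, size (n-k) x k; chosen only for illustration.
P = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]
I_k = [[1 if r == c else 0 for c in range(k)] for r in range(k)]
G = I_k + P  # identity stacked on top of P: an n x k matrix with n = 7

def encode(v):
    """Code word C = G . v; the first k bits are the data itself."""
    return mat_vec_gf2(G, v)

# All 2^k data values map to distinct code words because the columns of G
# are linearly independent.
code_words = {tuple(encode([(i >> b) & 1 for b in range(k)])) for i in range(2 ** k)}
```

Because the top block of `G` is the identity, the data bits appear unchanged at the front of each code word, which is what "systematic" means here.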
4 Implicitly Defining Codes by Check Matrix
- But what is the distance of the code? Not obvious
- Instead, consider a parity-check matrix $H$ of size $(n-k) \times n$
- Compute the following syndrome $S_i$ given code element $C_i$: $S_i = H \cdot C_i$
- Define valid code words $C_i$ as those that give $S_i = 0$ (null space of $H$)
- Size of null space? $(n - \operatorname{rank} H) = k$ if there are $(n-k)$ linearly independent columns in $H$
- Suppose you transmit code word $C$, and there is an error. Model this as a vector $E$ which flips selected bits of $C$ to get $R$ (received): $R = C + E$
- Consider what happens when we multiply by $H$: $S = H \cdot R = H \cdot C + H \cdot E = H \cdot E$
- What is the distance of the code?
- Code has distance $d$ if no sum of $d-1$ or fewer columns yields 0
- I.e. no error vectors $E$ of weight $< d$ have zero syndromes
- Code design: design the $H$ matrix with these properties
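The column argument above can be checked by brute force on a small matrix. The sketch below assumes a hypothetical $3 \times 7$ check matrix `H` (not from the slides), computes syndromes, and searches for the smallest set of columns that sums to zero, which is the distance of the code.

```python
from itertools import combinations

# Hypothetical (n-k) x n = 3 x 7 parity-check matrix, for illustration only.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def syndrome(word):
    """S = H . word over GF(2); all zeros means word is a valid code word."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

def distance_from_check_matrix(H):
    """Smallest number of columns of H whose GF(2) sum is the zero vector."""
    n = len(H[0])
    cols = [tuple(row[j] for row in H) for j in range(n)]
    for weight in range(1, n + 1):
        for subset in combinations(cols, weight):
            if all(sum(bits) % 2 == 0 for bits in zip(*subset)):
                return weight
    return None  # only reachable if no set of columns sums to zero
```

The search is exponential in $n$, so it is only meant for toy matrices like this one.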
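5 How to Relate G and H (Binary Codes)
- Defining $H$ makes it easy to understand the distance of the code, but hard to generate code words ($H$ defines the code implicitly!)
- However, let $H$ be of the following form: $H = [\, P \mid I \,]$, where $P$ is $(n-k) \times k$ and $I$ is $(n-k) \times (n-k)$
- Then $G$ can be of the following form (maximal code size): $G = \begin{bmatrix} I \\ P \end{bmatrix}$, where $I$ is $k \times k$
- Notice that $G$ generates values in the null space of $H$: $H \cdot G = P + P = 0$ over GF(2)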
6 Simple Example: Parity ($d = 2$)
- Parity code (8-bits)
- Note: complexity of logic depends on the number of 1s in each row!
7 Simple Example: Repetition (voting, $d = 3$)
- Repetition code (1-bit)
- Positives: simple
- Negatives:
- Expensive: only 33% of the code word is data
- Not packed in the Hamming-bound sense (only $d = 3$). Could get much more efficient coding by encoding multiple bits at a time
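8 Simple Example: Hamming Code ($d = 3$)
- Example: (7,4) code
- Protect 4 data bits with 3 parity bits
- Bit position number and contents:
    Position:  1   2   3   4   5   6   7
    Contents:  p1  p2  d1  p3  d2  d3  d4
- p1 covers positions with the low bit of the position number set: $001_2 = 1_{10}$, $011_2 = 3_{10}$, $101_2 = 5_{10}$, $111_2 = 7_{10}$
- p2 covers positions with the middle bit set: $010_2 = 2_{10}$, $011_2 = 3_{10}$, $110_2 = 6_{10}$, $111_2 = 7_{10}$
- p3 covers positions with the high bit set: $100_2 = 4_{10}$, $101_2 = 5_{10}$, $110_2 = 6_{10}$, $111_2 = 7_{10}$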
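9 How to Correct Errors?
- But what is the distance of the code? Not obvious
- Instead, consider a parity-check matrix $H$ of size $(n-k) \times n$
- Compute the following syndrome $S_i$ given code element $C_i$: $S_i = H \cdot C_i$
- Suppose that two correctable error vectors $E_1$ and $E_2$ produce the same syndrome: $H \cdot E_1 = H \cdot E_2$, so $H \cdot (E_1 + E_2) = 0$ and $E_1 + E_2$ would be a non-zero code word with at least $d$ bits set
- But, since both $E_1$ and $E_2$ have $\le (d-1)/2$ bits set, $E_1 + E_2$ has $\le d-1$ bits set, so this cannot be true!
- So, the syndrome is a unique indicator of correctable error vectors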
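A sketch of single-error correction by syndrome for the (7,4) Hamming code, assuming the bit layout p1 p2 d1 p3 d2 d3 d4 in positions 1 to 7. The function names are illustrative. With this layout the syndrome, read as a binary number, is the position of the flipped bit.

```python
def hamming74_encode(d1, d2, d3, d4):
    """Return the 7-bit code word [p1, p2, d1, p3, d2, d3, d4]."""
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(word):
    """Return (corrected word, syndrome); syndrome 0 means no error seen."""
    word = list(word)
    s = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            s ^= position  # XOR of the positions holding a 1
    if s:
        word[s - 1] ^= 1   # a single error sits at position s
    return word, s
```

Each single-bit error produces a different non-zero syndrome, so the decoder can undo it; two errors would be miscorrected, consistent with $d = 3$.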
10 Example: $d = 4$ Code (SEC-DED)
- Design $H$ with:
- All columns non-zero, odd-weight, distinct
- Note that odd-weight refers to Hamming weight, i.e. the number of ones
- Why does this generate $d = 4$?
- Any single-bit error will generate a distinct, non-zero value
- Any double error will generate a distinct, non-zero value
- Why? Add together two distinct columns, get a non-zero, even-weight result, which cannot match any (odd-weight) single-error syndrome
- Any triple error will generate a non-zero value
- Why? Add together three odd-weight values, get an odd-weight value
- So need four errors before indistinguishable from a code word
- Because $d = 4$:
- Can correct 1 error (Single Error Correction, i.e. SEC)
- Can detect 2 errors (Double Error Detection, i.e. DED)
- Example (a $4 \times 8$ matrix $H$):
- Note: log of the size of the null space will be (columns − rank) = 4, so:
- Rank = 4, since rows independent, 4 cols indpt
- Clearly, 8 bits in code word
- Thus an (8,4) code
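The odd-weight-column argument can be tried out on a small (8,4) code. The column order in `H_COLS` and the helper names below are assumptions made for this sketch; the lecture's own example matrix is not reproduced here.

```python
# Columns of a hypothetical 4 x 8 check matrix, written as 4-bit integers.
# All are distinct, non-zero, and of odd Hamming weight.
H_COLS = [0b0111, 0b1011, 0b1101, 0b1110, 0b1000, 0b0100, 0b0010, 0b0001]

def syndrome(word_bits):
    """XOR together the columns of H selected by the 1 bits of the word."""
    s = 0
    for bit, col in zip(word_bits, H_COLS):
        if bit:
            s ^= col
    return s

def encode(data_bits):
    """4 data bits followed by 4 check bits that zero the syndrome."""
    p = 0
    for bit, col in zip(data_bits, H_COLS[:4]):
        if bit:
            p ^= col
    return list(data_bits) + [(p >> 3) & 1, (p >> 2) & 1, (p >> 1) & 1, p & 1]

def classify(word_bits):
    """Return ('ok', None), ('single', index) or ('double', None)."""
    s = syndrome(word_bits)
    if s == 0:
        return ("ok", None)
    if bin(s).count("1") % 2 == 1:
        return ("single", H_COLS.index(s))  # odd weight: correctable
    return ("double", None)                 # even weight: detect only
```

A triple error would also give an odd-weight syndrome and be miscorrected as a single error, which is why the code is SEC-DED and not more.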
11 Tweaks
- No reason cannot make code shorter than required
- Suppose $n-k = 8$ bits of parity. What is max code size ($n$) for $d = 4$?
- Maximum number of unique, odd-weight columns: $2^7 = 128$
- So, $n = 128$. But then $k = n - (n-k) = 120$. Weird!
- Just throw out columns of high weight and make a (72, 64) code!
- But shortened codes like this might have $d > 4$ in some special directions
- Example: Kaneda paper, catches failures of groups of 4 bits
- Good for catching chip failures when DRAM has groups of 4 bits
- What about EVENODD code?
- Can be used to handle two erasures
- What about two dead DRAMs? Yes, if you can really know they are dead
13 Galois Field
- Definition: Field: a complete group of elements with:
- Addition, subtraction, multiplication, division
- Completely closed under these operations
- Every element has an additive inverse
- Every element except zero has a multiplicative inverse
- Examples:
- Real numbers
- Binary, called GF(2): Galois Field with base 2
- Values 0, 1. Addition/subtraction use xor. Multiplicative inverse of 1 is 1
- Prime field, GF($p$): Galois Field with base $p$
- Values $0, \ldots, p-1$
- Addition/subtraction/multiplication: modulo $p$
- Multiplicative inverse: every value except 0 has an inverse
- Example: GF(5): $1 \cdot 1 \equiv 1 \bmod 5$, $2 \cdot 3 \equiv 1 \bmod 5$, $4 \cdot 4 \equiv 1 \bmod 5$
- General Galois Field: GF($p^m$): base $p$ (prime!), dimension $m$
- Values are vectors of elements of GF($p$) of dimension $m$
- Add/subtract: vector addition/subtraction
- Multiply/divide: more complex
- Just like real numbers but finite!
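For a prime field the inverses can simply be found by search. This small sketch (the function name is an illustrative choice) reproduces the GF(5) example.

```python
def inverse_mod(a: int, p: int) -> int:
    """Multiplicative inverse of a in GF(p), found by exhaustive search."""
    if a % p == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return next(b for b in range(1, p) if (a * b) % p == 1)

# Inverse of every non-zero element of GF(5).
gf5_inverses = {a: inverse_mod(a, 5) for a in range(1, 5)}
```

The search always succeeds when `p` is prime; for a composite modulus some non-zero values have no inverse, which is why the base must be prime.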
14 Specific Example: Galois Fields GF($2^n$)
- Consider polynomials whose coefficients come from GF(2).
- Each term of the form $x^n$ is either present or absent.
- Examples: $0$, $1$, $x$, $x^2$, and $x^7 + x^6 + 1$
- $x^7 + x^6 + 1 = 1 \cdot x^7 + 1 \cdot x^6 + 0 \cdot x^5 + 0 \cdot x^4 + 0 \cdot x^3 + 0 \cdot x^2 + 0 \cdot x^1 + 1 \cdot x^0$
- With addition and multiplication these form a field
- Add: XOR each element individually with no carry: $(x^4 + x^3 + x + 1) + (x^4 + x^2 + x) = x^3 + x^2 + 1$
- Multiply: multiplying by $x$ is like shifting to the left: $(x^2 + x + 1) \times (x + 1) = (x^2 + x + 1) + (x^3 + x^2 + x) = x^3 + 1$
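These two operations map directly onto integer bit tricks. In the sketch below a polynomial over GF(2) is stored as an integer whose bit $i$ is the coefficient of $x^i$; that encoding and the function names are assumptions of this example.

```python
def gf2_add(a: int, b: int) -> int:
    """Polynomial addition over GF(2): coefficient-wise XOR, no carries."""
    return a ^ b

def gf2_mul(a: int, b: int) -> int:
    """Carry-less polynomial multiplication (no modular reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # add the current shifted copy of a
        a <<= 1          # multiplying by x is a left shift
        b >>= 1
    return result

# (x^4 + x^3 + x + 1) + (x^4 + x^2 + x) = x^3 + x^2 + 1
total = gf2_add(0b11011, 0b10110)
# (x^2 + x + 1) * (x + 1) = x^3 + 1
product = gf2_mul(0b111, 0b11)
```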
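15 So What About Division (mod)?
- $(x^4 + x^2) / x = x^3 + x$ with remainder 0
- $(x^4 + x^2 + 1) / (x + 1) = x^3 + x^2$ with remainder 1
- Long division for the second example:

                 x^3 +  x^2 + 0x + 0
          ---------------------------
  x + 1 ) x^4 + 0x^3 + x^2 + 0x + 1
          x^4 +  x^3
          ----------
                 x^3 + x^2
                 x^3 + x^2
                 ---------
                       0x^2 + 0x
                              0x + 1

  Remainder 1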
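The long division above can be mirrored in a few lines, again assuming polynomials over GF(2) are stored as integers with bit $i$ holding the coefficient of $x^i$. The function name `gf2_divmod` is an illustrative choice.

```python
def gf2_divmod(dividend: int, divisor: int):
    """Quotient and remainder of polynomial division over GF(2)."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient = 0
    remainder = dividend
    divisor_bits = divisor.bit_length()
    while remainder.bit_length() >= divisor_bits:
        shift = remainder.bit_length() - divisor_bits
        quotient |= 1 << shift         # next quotient term is x^shift
        remainder ^= divisor << shift  # subtracting is XOR in GF(2)
    return quotient, remainder

# (x^4 + x^2) / x            -> x^3 + x,   remainder 0
first = gf2_divmod(0b10100, 0b10)
# (x^4 + x^2 + 1) / (x + 1)  -> x^3 + x^2, remainder 1
second = gf2_divmod(0b10101, 0b11)
```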
16 Producing Galois Fields
- These polynomials form a Galois (finite) field if we take the results of this multiplication modulo a prime polynomial $p(x)$.
- A prime polynomial is one that cannot be written as the product of two non-trivial polynomials $q(x) r(x)$
- Perform the modulo operation by subtracting a (polynomial) multiple of $p(x)$ from the result. If the multiple is 1, this corresponds to XOR-ing the result with $p(x)$.
- For any degree, there exists at least one prime polynomial.
- With it we can form GF($2^n$)
- Additionally:
- Every Galois field has a primitive element, $\alpha$, such that all non-zero elements of the field can be expressed as a power of $\alpha$. By raising $\alpha$ to powers (modulo $p(x)$), all non-zero field elements can be formed.
- Certain choices of $p(x)$ make the simple polynomial $x$ the primitive element. These polynomials are called primitive, and one exists for every degree.
- For example, $x^4 + x + 1$ is primitive. So $\alpha = x$ is a primitive element and successive powers of $\alpha$ will generate all non-zero elements of GF(16). Example on next slide.
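17 Galois Fields: Primitives
- $\alpha^0 = 1$
- $\alpha^1 = x$
- $\alpha^2 = x^2$
- $\alpha^3 = x^3$
- $\alpha^4 = x + 1$
- $\alpha^5 = x^2 + x$
- $\alpha^6 = x^3 + x^2$
- $\alpha^7 = x^3 + x + 1$
- $\alpha^8 = x^2 + 1$
- $\alpha^9 = x^3 + x$
- $\alpha^{10} = x^2 + x + 1$
- $\alpha^{11} = x^3 + x^2 + x$
- $\alpha^{12} = x^3 + x^2 + x + 1$
- $\alpha^{13} = x^3 + x^2 + 1$
- $\alpha^{14} = x^3 + 1$
- $\alpha^{15} = 1$
- Note: this pattern of coefficients matches the bits from our 4-bit LFSR example.
- In general, finding primitive polynomials is difficult. Most people just look them up in a table, such as the one that follows.
- $\alpha^4 = x^4 \bmod (x^4 + x + 1) = x^4 \text{ xor } (x^4 + x + 1) = x + 1$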
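The table of powers can be regenerated mechanically: multiply by $x$ (shift left) and, when the $x^4$ coefficient appears, XOR with $x^4 + x + 1$. The integer encoding (bit $i$ is the coefficient of $x^i$) and the names below are choices made for this sketch.

```python
PRIMITIVE_POLY = 0b10011  # x^4 + x + 1

def times_x(a: int) -> int:
    """Multiply a GF(16) element by x, reducing modulo x^4 + x + 1."""
    a <<= 1
    if a & 0b10000:          # an x^4 term appeared
        a ^= PRIMITIVE_POLY  # subtract (XOR) p(x) to reduce
    return a

# powers[i] holds alpha^i for i = 0 .. 15, with alpha = x.
powers = [1]
for _ in range(15):
    powers.append(times_x(powers[-1]))
```

The first 15 entries are all distinct and non-zero, and the 16th wraps back to 1, which is what it means for $x$ to be primitive here.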
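18 Primitive Polynomials
- $x^2 + x + 1$
- $x^3 + x + 1$
- $x^4 + x + 1$
- $x^5 + x^2 + 1$
- $x^6 + x + 1$
- $x^7 + x^3 + 1$
- $x^8 + x^4 + x^3 + x^2 + 1$
- $x^9 + x^4 + 1$
- $x^{10} + x^3 + 1$
- $x^{11} + x^2 + 1$
- $x^{12} + x^6 + x^4 + x + 1$
- $x^{13} + x^4 + x^3 + x + 1$
- $x^{14} + x^{10} + x^6 + x + 1$
- $x^{15} + x + 1$
- $x^{16} + x^{12} + x^3 + x + 1$
- $x^{17} + x^3 + 1$
- $x^{18} + x^7 + 1$
- $x^{19} + x^5 + x^2 + x + 1$
- $x^{20} + x^3 + 1$
- $x^{21} + x^2 + 1$
- $x^{22} + x + 1$
- $x^{23} + x^5 + 1$
- $x^{24} + x^7 + x^2 + x + 1$
- $x^{25} + x^3 + 1$
- $x^{26} + x^6 + x^2 + x + 1$
- $x^{27} + x^5 + x^2 + x + 1$
- $x^{28} + x^3 + 1$
- $x^{29} + x + 1$
- $x^{30} + x^6 + x^4 + x + 1$
- $x^{31} + x^3 + 1$
- $x^{32} + x^7 + x^6 + x^2 + 1$
Galois Field operation ⇔ Hardware equivalent:
- Multiplication by $x$ ⇔ shift left
- Taking the result mod $p(x)$ ⇔ XOR-ing with the coefficients of $p(x)$ when the most significant coefficient is 1
- Obtaining all $2^n - 1$ non-zero elements by evaluating $x^k$ for $k = 1, \ldots, 2^n - 1$ ⇔ shifting and XOR-ing $2^n - 1$ times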
19 Building an LFSR from a Primitive Poly (Cycle through all non-zero values)
- For a $k$-bit LFSR, number the flip-flops with FF1 on the right.
- The feedback path comes from the Q output of the leftmost FF.
- Find the primitive polynomial of the form $x^k + \ldots + 1$.
- The $x^0 = 1$ term corresponds to connecting the feedback directly to the D input of FF1.
- Each term of the form $x^n$ corresponds to connecting an xor between FF $n$ and FF $n+1$.
- 4-bit example, uses $x^4 + x + 1$:
- $x^4$ ⇔ FF4's Q output
- $x$ ⇔ xor between FF1 and FF2
- $1$ ⇔ FF1's D input
- To build an 8-bit LFSR, use the primitive polynomial $x^8 + x^4 + x^3 + x^2 + 1$ and connect xors between FF2 and FF3, FF3 and FF4, and FF4 and FF5.
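A software model of such an LFSR, as a sketch: bit $i-1$ of the integer `state` stands for flip-flop FF $i$, and the polynomial is passed as an integer that includes its $x^k$ term. This encoding and the function names are assumptions of the example, not part of the lecture.

```python
def lfsr_step(state: int, poly: int, k: int) -> int:
    """One clock of a k-bit LFSR whose taps are given by poly."""
    state <<= 1              # every FF passes its value to the left
    if (state >> k) & 1:     # the leftmost FF held a 1: feedback is active
        state ^= poly        # XOR feedback in at each tap, clear bit k
    return state

def lfsr_period(poly: int, k: int, seed: int = 1) -> int:
    """Number of clocks until the LFSR returns to its (non-zero) seed."""
    state = lfsr_step(seed, poly, k)
    count = 1
    while state != seed:
        state = lfsr_step(state, poly, k)
        count += 1
    return count

period_4bit = lfsr_period(0b10011, 4)      # x^4 + x + 1
period_8bit = lfsr_period(0b100011101, 8)  # x^8 + x^4 + x^3 + x^2 + 1
```

A primitive polynomial gives the full period $2^k - 1$. A polynomial that is not primitive, such as $x^4 + x^3 + x^2 + x + 1$, cycles through only some of the non-zero states.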
20 Reed-Solomon Codes
- Galois field codes: code words consist of symbols
- Rather than bits
- Reed-Solomon codes:
- Based on polynomials in GF($2^k$) (i.e. $k$-bit symbols)
- Data as coefficients, code space as values of polynomial:
- $P(x) = a_0 + a_1 x^1 + \ldots + a_{k-1} x^{k-1}$
- Coded: $P(0), P(1), P(2), \ldots, P(n-1)$
- Can recover polynomial as long as get any $k$ of $n$
- Properties: can choose number of check symbols
- Reed-Solomon codes are maximum distance separable (MDS)
- Can add $d$ symbols for distance $d+1$ code
- Often used in erasure code mode: as long as no more than $n-k$ coded symbols erased, can recover data
- Side note: multiplication by a constant in GF($2^k$) can be represented by a $k \times k$ matrix: $a \cdot x$
- Decompose unknown vector into $k$ bits: $x = x_0 + 2 x_1 + \ldots + 2^{k-1} x_{k-1}$
- Each column is the result of multiplying $a$ by $2^i$
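A small erasure-recovery sketch in the non-systematic style described above: data symbols are the coefficients of $P(x)$, code symbols are $P(0), \ldots, P(n-1)$, and any $k$ survivors determine $P$. It assumes GF(16) built from $x^4 + x + 1$ and uses Lagrange interpolation for recovery; both are choices made for illustration, and practical decoders use faster methods.

```python
PRIME_POLY = 0b10011  # x^4 + x + 1, so symbols are 4 bits wide

def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(16) elements (shift-and-XOR with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= PRIME_POLY
        b >>= 1
    return r

def gf_inv(a: int) -> int:
    """Inverse by search; a must be non-zero."""
    return next(b for b in range(1, 16) if gf_mul(a, b) == 1)

def evaluate(coeffs, x: int) -> int:
    """P(x) by Horner's rule; coeffs are a_0 .. a_{k-1}."""
    y = 0
    for c in reversed(coeffs):
        y = gf_mul(y, x) ^ c
    return y

def rs_encode(data, n: int):
    """Code word P(0), ..., P(n-1); needs n <= 16 distinct points."""
    return [evaluate(data, x) for x in range(n)]

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out

def rs_recover(points, k: int):
    """Rebuild a_0 .. a_{k-1} from any k surviving (x, P(x)) pairs."""
    points = points[:k]
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = poly_mul(basis, [xj, 1])  # (x - xj) equals (x + xj)
                denom = gf_mul(denom, xi ^ xj)
        scale = gf_mul(yi, gf_inv(denom))
        for t, b in enumerate(basis):
            coeffs[t] ^= gf_mul(scale, b)
    return coeffs
```

For example, with 4 data symbols encoded into 7 code symbols, any 3 symbols may be erased and `rs_recover` still returns the original data from the remaining 4.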
21 Reed-Solomon Codes (cont)
- Reed-Solomon codes (non-systematic):
- Data as coefficients, code space as values of polynomial:
- $P(x) = a_0 + a_1 x^1 + \ldots + a_6 x^6$
- Coded: $P(0), P(1), P(2), \ldots, P(6)$
- Called Vandermonde matrix: maximum rank
- Different representation (this $H$ and $G$ not related)
- Clear that all combinations of two or fewer columns are independent ⇒ $d = 3$
- Very easy to pick whatever $d$ you happen to want: add more rows
- Fast, systematic version of Reed-Solomon:
- Cauchy Reed-Solomon, others
22 Another Example: Redundant Check
- Send a message $M$ and a check word $C$
- Simple function on $\langle M, C \rangle$ to determine if both received correctly (with high probability)
- Example: XOR all the bytes in $M$ and append the checksum byte, $C$, at the end
- Receiver XORs $\langle M, C \rangle$
- What should result be?
- What errors are caught?
- Bit $i$ is the XOR of the $i$th bit of each byte
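The byte-wise XOR check is short enough to try directly. The sketch below (function names are illustrative) also shows one answer to "what errors are caught": a single flipped bit is detected, but two flips in the same bit position of different bytes cancel out.

```python
from functools import reduce

def xor_checksum(message: bytes) -> int:
    """XOR of all bytes; bit i is the parity of bit i across the bytes."""
    return reduce(lambda acc, byte: acc ^ byte, message, 0)

def append_checksum(message: bytes) -> bytes:
    """Sender side: message followed by its one-byte XOR checksum."""
    return message + bytes([xor_checksum(message)])

frame = append_checksum(b"ECC and CRC")
# Receiver XORs the whole frame; an undamaged frame gives 0.
received_ok = xor_checksum(frame) == 0
```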
23 Example: TCP Checksum
[Figure: TCP packet format, next to the network layer stack: 7 Application (HTTP, FTP, DNS); 4 Transport (TCP, UDP); 3 Network (IP); 2 Data Link (Ethernet, 802.11b); 1 Physical]
- TCP Checksum: a 16-bit checksum, consisting of the one's complement of the one's complement sum of the contents of the TCP segment header and data, is computed by a sender, and included in a segment transmission. (Note: end-around carry)
- Summing all the words, including the checksum word, should yield zero (in one's complement arithmetic this is the all-ones word, i.e. negative zero)
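A simplified sketch of the one's complement checksum with end-around carry. It sums only the bytes it is given; the real TCP checksum also covers a pseudo-header, which is left out here, and the function names are illustrative.

```python
def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum of big-endian words in data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum."""
    return ~ones_complement_sum16(data) & 0xFFFF

payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
segment = payload + checksum.to_bytes(2, "big")
```

Summing `segment`, checksum word included, gives `0xFFFF`; complementing that gives 0, which is the check the receiver performs.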
24 Example: Ethernet CRC-32
[Figure: network layer stack: 7 Application (HTTP, FTP, DNS); 4 Transport (TCP, UDP); 3 Network (IP); 2 Data Link (Ethernet, 802.11b); 1 Physical]
25 CRC Concept
- I have a msg polynomial $M(x)$ of degree $m$
- We both have a generator poly $G(x)$ of degree $n$
- Let $r(x)$ = remainder of $M(x) x^n / G(x)$
- $M(x) x^n = G(x) p(x) + r(x)$
- $r(x)$ is of degree less than $n$
- What is $(M(x) x^n + r(x)) / G(x)$? Over GF(2), adding $r(x)$ cancels the remainder, so the result is $p(x)$ with remainder 0
- So I send you $M(x) x^n + r(x)$
- An $(m+n)$-degree polynomial
- You divide by $G(x)$ to check
- $M(x)$ is just the most significant coefficients, $r(x)$ the lower $n$
- A message is viewed as the coefficients of a polynomial over binary numbers, one coefficient per bit
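26 Polynomial Division
- Divide 1 0 1 1 0 0 1 0 0 0 0 by 1 0 0 1 1
- When MSB is zero, just shift left, bringing in next bit
- When MSB is 1, XOR with divisor and shift left
- First steps (quotient bits so far: 1 0 1):

    1 0 1 1 0     MSB is 1: XOR with divisor
    1 0 0 1 1
    ---------
    0 0 1 0 1     result of XOR
    0 1 0 1 0     shifted left, next bit (0) brought in; MSB is 0
    1 0 1 0 1     shifted left, next bit (1) brought in; MSB is 1: XOR with divisor
    1 0 0 1 1
    ---------
    0 0 1 1 0     result of XOR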
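The shift-and-XOR rule can be sketched on bit strings. The example reuses this lecture's numbers (message 1011001, generator 10011); the function names and the string representation are choices made for the sketch.

```python
def mod2_remainder(bits: str, generator: str) -> str:
    """Remainder of dividing bits by generator using XOR (mod-2) division."""
    n = len(generator) - 1              # degree of the generator
    work = [int(b) for b in bits]
    gen = [int(b) for b in generator]
    for i in range(len(work) - n):
        if work[i]:                     # MSB of the window is 1: XOR divisor
            for j, g in enumerate(gen):
                work[i + j] ^= g
        # MSB is 0: nothing to do, just move on to the next bit
    return "".join(str(b) for b in work[-n:])

def crc_encode(message: str, generator: str) -> str:
    """Append n zeros, divide, and send message followed by the remainder."""
    n = len(generator) - 1
    return message + mod2_remainder(message + "0" * n, generator)

sent = crc_encode("1011001", "10011")
check = mod2_remainder(sent, "10011")  # receiver divides the whole frame
```

Here `sent` is `10110011010` and `check` is `0000`. Any single flipped bit in `sent` leaves a non-zero remainder.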
27 CRC Encoding
[Figure: shift-register animation dividing 1 0 1 1 0 0 1 0 0 0 0 (the message followed by four 0s) by the generator, one shift per step, starting from an all-zero 4-bit register and ending with 1 0 1 0 in the register]
- Message sent: 1 0 1 1 0 0 1 1 0 1 0
28 CRC Decoding
[Figure: the same shift-register animation applied to the received word 1 0 1 1 0 0 1 1 0 1 0; the register ends at 0 0 0 0, so no error is detected]
29 Generating Polynomials
- CRC-16: $G(x) = x^{16} + x^{15} + x^2 + 1$
- Detects single and double bit errors
- All errors with an odd number of bits
- Burst errors of length 16 or less
- Most errors for longer bursts
- CRC-32: $G(x) = x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1$
- Used in Ethernet
- Also 32 bits of 1 added on front of the message
- Initialize the LFSR to all 1s
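A bit-at-a-time sketch of CRC-32 as it is commonly computed in software, compared against Python's `zlib.crc32`. The constant `0xEDB88320` is the CRC-32 polynomial above with its bits reversed. The least-significant-bit-first processing and the final inversion are conventions of that common software form and are assumptions here, since the slide only mentions the all-ones initial value.

```python
import zlib

REFLECTED_POLY = 0xEDB88320  # bit-reversed form of the CRC-32 polynomial

def crc32_bitwise(data: bytes) -> int:
    """CRC-32, one bit per step, register initialised to all 1s."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ REFLECTED_POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final inversion (assumed convention)

sample = b"123456789"
matches_zlib = crc32_bitwise(sample) == zlib.crc32(sample)
```

Table-driven implementations process a byte per step and are much faster; this version is only meant to show the LFSR-style shift and XOR.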
30 Conclusion
- ECC: add redundancy to correct for errors
- $(n,k,d)$: $n$ code bits, $k$ data bits, distance $d$
- Linear codes: code vectors computed by linear transformation
- Erasure code: after identifying erasures, can correct
- Reed-Solomon codes
- Based on GF($p^n$), often GF($2^n$)
- Easy to get distance $d+1$ code with $d$ extra symbols
- Often used in erasure mode
- Redundancy useful to gain reliability
- Redundant disks+controllers+etc (RAID)
- Geographical scale systems (OceanStore)
- Disk technology
- Two innovations: GMR, vertical recording
- Disk Latency = Queuing Time + Seek Time + Rotation Time + Xfer Time + Ctrl Time