Title: Week

IT-101: Introduction to Information Technology

Overview
- Chapter 3.5.3 Introduction to error detection and correction
  - Parity checking
  - Repetition coding
  - Redundancy checking
- Chapter 4 Protocols
- Chapter 7 Compressing Information
  - Why can information be compressed?
  - Variable length coding
  - Universal coding
Error Detection and Correction
- Many factors can lead to errors during transmission or storage of bits.
- When binary information is transmitted across a physical channel such as wire, coax cable, optical fiber, or air, there is always the possibility that some bits will be received in error due to interference from other sources, signal attenuation over long distances, or events like rain or snow.
- When binary information is stored on some form of media such as magnetic disks or CDs, there is the possibility that some bits will be read in error due to smudged or scratched disks.
- Clever code construction or additional information added to the data can increase the odds of the information being retrieved correctly. This is called error control coding. We will discuss three methods:
  - Parity checking (error detection only)
  - Repetition coding (error detection and correction)
  - Redundancy code word checking (error detection and correction)
- Adding redundancy to a code increases the number of bits that need to be transmitted or stored, but allows errors to be detected and significantly reduces the number of errors remaining after the information is retrieved.
- Just about any system that uses digital information employs some form of error detection and/or correction. Recall that a major advantage of digital representation is the ability to detect and correct errors.
- For example, information on a CD is encoded to allow the CD player to detect and correct errors that might arise due to smudged or scratched disks.
- Almost all digital communication systems employ some form of error control coding to correctly decode information that has been corrupted by the channel during transmission.
Parity Checking
- A simple method for error detection can be accomplished by appending an extra bit, called a parity bit, at the end of the code.
- Parity checking allows for the detection of errors, but not correction.
- Even parity: the parity bit is set so that the total number of 1s in the word is even.
  - 11 → 11 0
  - 10 → 10 1
- Odd parity: the parity bit is set so that the total number of 1s in the word is odd.
  - 11 → 11 1
  - 10 → 10 0
Parity Checking
- Even parity is set, and 1 0 0 is received.
  - An error is present, but we don't know which bit is in error.
  - Parity checking can detect errors, but can't correct them.
- Odd parity is set, and 1111011 0 is received. Error?
- Even parity is set, and 1111011 0 is received. Error?
- Parity checking has a major disadvantage: it cannot detect an even number of errors. For example, 0011 1001 would go undetected even if the original transmission was 0011 1111, since there are 2 errors.
Repetition Coding
- Repetition coding provides extra bits, which are repetitions of the original data, to help ensure the information is received correctly.
- Each bit in the original data is repeated a certain number of times.
- Repetition coding allows for the detection and correction of errors.
- For example, in 3-bit repetition coding:
  - Original data: 1 0 0 1
  - Transmitted: 111 000 000 111 (each bit is repeated 3 times)
  - Received: 011 000 001 111
  - Errors in the first and third bits are detected
  - Errors in the first and third bits can be corrected (by majority vote within each group of 3)
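The encode and majority-vote decode steps above can be sketched as follows (names are illustrative):

```python
def repeat_encode(bits, n=3):
    """n-bit repetition coding: repeat each data bit n times."""
    return "".join(b * n for b in bits)

def repeat_decode(received, n=3):
    """Majority vote within each n-bit group corrects up to (n-1)//2 flips per group."""
    groups = [received[i:i + n] for i in range(0, len(received), n)]
    return "".join("1" if g.count("1") > n // 2 else "0" for g in groups)
```

Applied to the slide's example, `repeat_decode("011000001111")` recovers the original data 1001 despite the two bit errors.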
Redundancy Checking
- Redundancy checking is a more complex error control coding scheme than the previous two methods.
- It uses parity checking in an interleaved way.
- Redundancy checking allows for the detection and correction of errors.
- Redundancy check coding may be accomplished a number of ways; only one approach will be discussed.
Redundancy Check
- Each symbol is given a parity bit.
- The total word is given a redundancy check:
  - The first bit of each symbol in the word is given parity, then the second bit of each symbol in the word is given parity.
  - The additional parity symbol is given its own parity and then appended to the transmitted information.
Redundancy Check Coding

- Information to be sent: 00 01 10 11
- First, each 2-bit symbol is given an even parity bit:
  - 00 0, 01 1, 10 1, 11 0
- Next, the first bits of each symbol (0 0 1 1) are given odd parity:
  - Odd parity bit: 1
- Next, the second bits of each symbol (0 1 0 1) are given odd parity:
  - Odd parity bit: 1
- Next, the parity bits are given even parity:
  - Parity bits 1 1 → even parity bit 0
- The redundancy check codeword is appended at the end of the bit stream:
  - Transmitted: 000 011 101 110 110 (the last symbol, 110, is the redundancy check codeword)
Redundancy Check Error
- Original bit stream: 00 01 10 11
- Coded and transmitted bit stream: 000 011 101 110 110
- Received bit stream: 000 111 101 110 110
- The even parity check tells us that the second symbol has an error.
- Comparing the odd parity bit with the first bit of each symbol shows us that the first bit of the second symbol should be a 0.
- Comparing the odd parity bit with the second bit of each symbol shows us that everything is OK.
- So, the error is detected and can be corrected. Since we know that the first bit of the second symbol should be 0 instead of 1, this is corrected, resulting in the decoded bit stream at the receiver: 00 01 10 11
In-class Examples

- The following binary stream uses 2-bit repetition coding to help detect errors. Find the erroneous 2-bit symbol.
  - 001100100011
- If an even-parity bit has been added to the end of each 3-bit symbol for error detection, which of the following symbols has been received in error?
  - 0110
  - 0010
  - 0011
- Generate the redundancy check codeword for the following stream of 2-bit symbols:
  - 10 00 11 10
- Find and correct the error in the following bit stream, which is terminated by a redundancy check codeword.
  - 000 101 010 101
Next Topic: Ch. 4 Protocols
- Objectives:
  - Protocols are common to human as well as digital worlds
  - Sample protocols for transmission, storage, and data processing
Protocols
Agreed-upon sets of rules that provide order to different systems and situations.

- Believe it or not, you use protocols every day:
  - When you see a red, octagonal sign, you _____
  - When you pick up the telephone when it rings, you say _____
  - You know to wait in line at the DMV
  - You understand how to mail a letter
- Protocols give structure and provide rules, but they aren't based on anything other than human convention, agreement, and understanding.
Protocols are a vital component of IT
Interoperability requires sets of rules for communicating between various devices.

- What type of connector or voltage level should be used by a device?
- How can information be formatted in a standard manner?
- Where in a bit stream do you begin?
- Which bits comprise the destination address?
- How can a document include bold, italics, different font sizes, etc.?

Without agreed-upon formats, we'd be drowning in a sea of 1s and 0s:
010100100101001011110101010110010101010101010
IT Protocols
To achieve interoperability, digital systems must organize, manage, and interpret information according to a set of widely accepted standards.

- We've already discussed many IT standards:
  - Internet (IPv4) addresses have 32 bits.
  - The byte is commonly accepted as the smallest division of information for storage and manipulation.
  - ASCII is the standard protocol for alphanumeric data.
- IT is built on protocols:
  - Hypertext Markup Language (HTML)
  - Hypertext Transfer Protocol (HTTP)
  - Simple Mail Transfer Protocol (SMTP)
  - Internet Protocol (IP)
  - And just about everything else we will talk about
Who Sets Standards and Protocols?
- Technology consortiums
  - Internet Engineering Task Force (IETF)
  - World Wide Web Consortium (W3C)
- International organizations
  - International Telecommunication Union (ITU)
  - International Organization for Standardization (ISO)
- National organizations
  - American National Standards Institute (ANSI)
- Professional organizations
  - Institute of Electrical and Electronics Engineers (IEEE)
- Companies: Microsoft, Cisco, 3Com, and others
Protocol Example
- How do we know where the bit stream begins?
  - Start bits and stop bits (these are protocols)
  - Flags, used in Ethernet and High-Level Data Link Control (HDLC)
HDLC Protocol Challenges
- A protocol procedure like the HDLC flag byte would fail if that byte also occurred somewhere in the content bit stream.
- Recall that the flag byte is 01111110 ('~' in ASCII).
- Furthermore, because the bit stream is read on a bit-by-bit basis, this pattern could appear under other circumstances.
- To fix this problem, a rule had to be incorporated into the protocol:
  - Transmitter: whenever you have five 1s in a row, insert an unneeded 0.
  - Receiver: whenever you receive five 1s in a row followed by a 0, discard the 0.
- Note that this only happens in the data (or content) field AFTER the flag byte is transmitted.
- This procedure is known as bit stuffing, or zero bit insertion.
Bit Stuffing Example

Original data:    0110 1111 1111 1111 1111 0010
Transmitted data: 0110 1111 1011 1110 1111 1010 010
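The transmitter and receiver rules can be sketched directly from the protocol description (a sketch covering the data field only, where five 1s are always followed by a stuffed 0):

```python
def bit_stuff(bits):
    """Transmitter rule: after five consecutive 1s, insert a 0."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")
            run = 0
    return "".join(out)

def bit_unstuff(bits):
    """Receiver rule: after five consecutive 1s, discard the following 0."""
    out, run, i = [], 0, 0
    while i < len(bits):
        out.append(bits[i])
        run = run + 1 if bits[i] == "1" else 0
        i += 1
        if run == 5:
            i += 1  # skip the stuffed 0
            run = 0
    return "".join(out)
```

Applied to the example above, `bit_stuff` inserts a 0 after each of the three runs of five 1s, growing the 24-bit stream to 27 bits, and `bit_unstuff` recovers the original exactly.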
Next Topic
- Chapter 7 Compressing Information
- Why can information be compressed?
- Variable length coding
- Universal coding
- Huffman coding
Compressing Information

"World Wide Web, not World Wide Wait"
- Compression techniques can significantly reduce the bandwidth and memory required for sending, receiving, and storing data.
- Most computers are equipped with modems that compress or decompress all information leaving or entering via the phone line.
- With a mutually recognized system (e.g. WinZip), the amount of data can be significantly diminished.
- Examples of compression techniques:
  - Compressing BINARY DATA STREAMS
    - Variable length coding (e.g. Huffman coding)
    - Universal coding (e.g. WinZip)
  - IMAGE-SPECIFIC COMPRESSION
    - GIF and JPEG
  - VIDEO COMPRESSION
    - MPEG
Why Can We Compress Information?

REDUNDANCY
- Compression is possible because information usually contains redundancies, i.e. information that is often repeated.
- For example, two still images from a video sequence are often similar. This fact can be exploited by transmitting only the changes from one image to the next.
- For example, a line of text often contains redundancies:
  "Ask not what your country can do for you - ask what you can do for your country."
- File compression programs remove this redundancy.
FREQUENCY
- Some characters/events occur more frequently than others.
- It's possible to represent frequently occurring characters with a smaller number of bits during transmission.
- This may be accomplished by a variable length code, as opposed to a fixed length code like ASCII.
- An example of a simple variable length code is Morse code.
- E occurs more frequently than Z, so we represent E with a shorter code:
  - E: .
  - T: -
  - Z: - - . .
  - Q: - - . -
Information Theory
- Variable length coding exploits the fact that some information occurs more frequently than other information.
- The mathematical theory behind this concept is known as INFORMATION THEORY.
- Claude E. Shannon developed modern information theory at Bell Labs in 1948.
- He saw the relationship between the probability of appearance of a transmitted signal and its information content.
- This realization enabled the development of techniques that could achieve compression.
A Little Probability
- Shannon found that information can be related to probability.
- An event has a probability of 1 (or 100%) if we believe it will occur.
- An event has a probability of 0 (or 0%) if we believe it will not occur.
- The probability that an event will occur takes on values anywhere from 0 to 1.
- Consider a coin toss: heads and tails each have a probability of .50.
- In two tosses, the probability of tossing two heads is
  - 1/2 x 1/2 = 1/4, or .25
- In three tosses, the probability of tossing all tails is
  - 1/2 x 1/2 x 1/2 = 1/8, or .125
- We compute probability this way because the result of each toss is independent of the results of the other tosses.
Example from the Text

A MEN'S SPECIALTY STORE
- The probability of male patrons is .8
- The probability of female patrons is .2
- Assume for this example that patrons enter the store in groups of two. Calculate the probabilities of the different pairings:
  - Event A, Male-Male: P(MM) = .8 x .8 = .64
  - Event B, Male-Female: P(MF) = .8 x .2 = .16
  - Event C, Female-Male: P(FM) = .2 x .8 = .16
  - Event D, Female-Female: P(FF) = .2 x .2 = .04
- We could assign the longest codes to the most infrequent events while maintaining unique decodability.
Example (cont.)
- Let's assign a unique string of bits to each event based on the probability of that event occurring:

  Event  Name           Code
  A      Male-Male      0
  B      Male-Female    10
  C      Female-Male    110
  D      Female-Female  111

- Given a received code of 01010110100, determine the events.
- The above example has used a variable length code.
Variable Length Coding
Takes advantage of the probabilistic nature of information.

- Unlike fixed length codes, variable length codes:
  - Assign the longest codes to the most infrequent events.
  - Assign the shortest codes to the most frequent events.
  - Each code must be uniquely identifiable regardless of length.
- Examples of variable length coding:
  - Morse code
  - Huffman coding

If we have total uncertainty about the information we are conveying, fixed length codes are preferred.
Morse Code
- Characters are represented by patterns of dots and dashes.
- More frequently used letters use short code symbols.
- Short pauses are used to separate the letters.
- Represent "Hello" using Morse code:
  - H: . . . .
  - E: .
  - L: . - . .
  - L: . - . .
  - O: - - -
  - Hello: . . . .   .   . - . .   . - . .   - - -
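The letter-by-letter translation above can be sketched with a small lookup table (only the four letters used on this slide are included):

```python
# Codes for the letters used on this slide
MORSE = {"H": "....", "E": ".", "L": ".-..", "O": "---"}

def to_morse(word):
    """Join letter codes with a space, the 'short pause' separator."""
    return " ".join(MORSE[ch] for ch in word.upper())

print(to_morse("Hello"))  # .... . .-.. .-.. ---
```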
Huffman Coding
A procedure for finding the optimum, uniquely decodable, variable length code associated with a set of events, given their probabilities of occurrence.

- Huffman coding is a variable length coding scheme: it assigns the shortest codes to the most frequently occurring events, and the longest codes to the least frequent events.
- You must know the probability of occurrence of the events beforehand.
- To determine the assignment of codes to events, a Huffman code tree needs to be constructed.
- To decode a Huffman-encoded bit stream, you need the code table that was generated by the Huffman code tree.
Constructing a Huffman Code Tree
- First, list all events in descending order of probability.
- Pair the two events with the lowest probabilities and add their probabilities.

  Events: A (.3), B (.3), C (.13), D (.12), E (.1), F (.05)
  Step 1: combine E (.1) and F (.05) into a node with probability 0.15
Constructing a Huffman Code Tree (cont.)

- Repeat for the pair with the next lowest probabilities.

  Step 2: combine C (.13) and D (.12) into a node with probability 0.25
Constructing a Huffman Code Tree (cont.)

- Repeat for the pair with the next lowest probabilities.

  Step 3: combine the 0.15 and 0.25 nodes into a node with probability 0.4
Constructing a Huffman Code Tree (cont.)

- Repeat for the pair with the next lowest probabilities.

  Step 4: combine A (.3) and B (.3) into a node with probability 0.6
Constructing a Huffman Code Tree (cont.)

- Repeat for the last pair, and add 0s to the left branches and 1s to the right branches.

  Step 5: combine the 0.6 and 0.4 nodes into the root (probability 1.0)
  Resulting codes: A = 00, B = 01, C = 100, D = 101, E = 110, F = 111
- Given the tree we just constructed, we can assign a unique code to each event (this is the Huffman code table):
  - Event A: 00   Event B: 01
  - Event C: 100  Event D: 101
  - Event E: 110  Event F: 111
- How can you decode the string 0000111010110001000000111?
- Starting from the leftmost bit, find the shortest bit pattern that matches one of the codes in the table. The first bit is 0, but we don't have an event represented by 0. We do have one represented by 00, which is event A. Continue applying this procedure.
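The tree-building procedure of the preceding slides can be sketched with a priority queue. This is an illustrative implementation, not the text's; because ties (e.g. between A and B) can be broken either way, the exact bit patterns may differ from the slides, but the code lengths match the tree above:

```python
import heapq
from itertools import count

def huffman(probs):
    """Build a Huffman code table {event: bitstring} by repeatedly
    merging the two least probable subtrees."""
    tick = count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(tick), {e: ""}) for e, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        merged = {e: "0" + c for e, c in left.items()}         # 0s on left branches
        merged.update({e: "1" + c for e, c in right.items()})  # 1s on right branches
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]

codes = huffman({"A": .3, "B": .3, "C": .13, "D": .12, "E": .1, "F": .05})
print({e: len(c) for e, c in sorted(codes.items())})
# A and B get 2-bit codes, C through F get 3-bit codes, as in the tree above
```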
Exercise

- Construct a Huffman code tree and code table given events A through H with the following probabilities:
  - 0.3 0.2 0.15 0.10 0.08 0.07 0.06 0.04
- Decode the following bit stream using this Huffman code tree:
  - 110100000000100101000010
Universal Coding
- Huffman coding has its limits:
  - You must know a priori (beforehand) the probabilities of the characters or symbols you are encoding.
  - What if a document is one of a kind, i.e. you don't know the probabilities of the events?
- Universal coding schemes do not require knowledge of the statistics (probabilities) of the events to be coded.
- Universal coding is based on the realization that any stream of data contains some repetition.
- Lempel-Ziv coding is one form of universal coding presented in the text.
  - Compression results from reusing frequently occurring strings.
  - It works better for long data streams and is inefficient for short strings.
  - It is used by WinZip to compress information.
An Important Distinction!
- If, for example, you have 4 events/characters/symbols/things, such as A, B, C, and D, and you want to convert each of them to binary, traditionally you would determine the number of bits required to encode 4 different things.
- Since you have 4 different things, you would need to use 2 bits (2^2 = 4).
- The assignment of codes could be A = 00, B = 01, C = 10, D = 11.
- This is a fixed length coding approach (just like ASCII), since 2 bits are used per code.
- Then, for example, if you wanted to send information such as AABCDBC, you would need to send 00 00 01 10 11 01 10.
- Before sending this, however, you could apply compression techniques such as universal coding (WinZip) to reduce the amount of data that needs to be sent.
- These types of compression techniques are applied after the information has been converted to binary.
- Note that Huffman coding, in contrast, is a technique that achieves compression through the binary conversion process itself.
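The fixed length scheme described above can be sketched in a few lines (the table is the one from the bullet points):

```python
FIXED = {"A": "00", "B": "01", "C": "10", "D": "11"}  # 2 bits, since 2**2 = 4

def encode_fixed(message):
    """Fixed length coding: every symbol costs the same 2 bits."""
    return "".join(FIXED[ch] for ch in message)

print(encode_fixed("AABCDBC"))  # 00000110110110 (14 bits)
```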
Comments for Next Class
- Please start reading chapters 10, 11 and 12 for
next class
Exercise Solution

Huffman code table for events A through H (probabilities 0.3, 0.2, 0.15, 0.10, 0.08, 0.07, 0.06, 0.04):

  A = 00    B = 10    C = 010   D = 110
  E = 0110  F = 0111  G = 1110  H = 1111

Decoding the bit stream 110100000000100101000010 with this table yields DBAAACCBAC.