Data Coding - PowerPoint PPT Presentation

About This Presentation
Title:

Data Coding

Description:

A code is a predetermined set of symbols that have specific meanings. ... Examples include Arabic, Latin, Greek, Gothic, and Cyrillic scripts. ... – PowerPoint PPT presentation

Slides: 26
Provided by: admi1307

Transcript and Presenter's Notes



1
Data Coding
2
Introduction
  • Computing devices use energy to represent data
    and power the devices.
  • Consequently, computing devices are electrical
    (power to operate) and electronic (energy to
    represent data). The data can be represented with
    electricity, light, radio waves, microwaves, or
    other energy in the electromagnetic spectrum.
  • The data are represented in a two-state form
    called binary: the light or electricity is either
    on or off. Symbolically, these states are
    represented by 0s and 1s called bits, the
    smallest units of information a computer can
    store.

3
Introduction
  • By itself, a single bit is not very useful
    because it can represent only two states.
  • Grouping the bits, however, allows for numerous
    combinations.
  • A group of three bits has eight combinations and
    a group of eight bits has 256 combinations.
  • Grouping, therefore, allows associations to be
    created with specific items such as characters,
    numbers, or commands. This association is called
    coding.
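
The counts on this slide follow from the rule that a group of n bits has 2^n combinations. A minimal sketch of that arithmetic; the function names pattern_count and patterns are illustrative and not from the slides:

```python
from itertools import product

def pattern_count(n_bits):
    """Number of distinct patterns a group of n_bits bits can hold."""
    return 2 ** n_bits

def patterns(n_bits):
    """List every n-bit pattern as a string of 0s and 1s."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]

print(pattern_count(3))  # 8
print(pattern_count(8))  # 256
print(patterns(3))       # every 3-bit pattern, from '000' to '111'
```

Each pattern can then be associated with one character, number, or command, which is the association the slide calls coding.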

4
Early Codes
  • A code is a predetermined set of symbols that
    have specific meanings.
  • Morse code- Developed by Samuel Morse in 1838
    for use in telegraph communications.
  • The code is a sequence of dots and dashes.
  • A unique aspect of this system is that the letter
    codes have varying lengths: the letter E
    corresponds to a single dot and the letter H
    has four dots.
  • Because the most common letters have the shortest
    codes, the varied code length allows messages to
    be sent more quickly than codes with the same
    length for each letter.
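
The varying code lengths can be seen with a handful of letters. This is a small sketch, not a full Morse table; the dictionary MORSE and the function to_morse are illustrative names, and only six letters are included:

```python
# A few International Morse code letters (partial table, for illustration).
MORSE = {"E": ".", "T": "-", "A": ".-", "S": "...", "H": "....", "O": "---"}

def to_morse(text):
    """Encode text letter by letter, separating letters with a space."""
    return " ".join(MORSE[ch] for ch in text.upper())

print(to_morse("SOS"))   # ... --- ...
print(to_morse("THE"))   # - .... .
print(len(MORSE["E"]), len(MORSE["H"]))  # 1 4
```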

5
Baudot, Morse, and BCD Codes
6
Early Codes
  • Baudot code- It uses 5 bits for each character
    and was developed by Jean Baudot for the French
    telegraph.
  • How many combinations do 5 bits allow?
  • How many letters and digits are there?
  • Are there duplicates?
  • If yes, how can you determine the difference
    between a letter and a digit?
  • A shift down (11111) and a shift up (11011) are
    used. Upon receiving a shift down, all subsequent
    codes are interpreted as letters until a shift up
    is received; after a shift up, they are
    interpreted as digits until the next shift down.
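
The questions on this slide can be worked through in code: 5 bits give 2^5 = 32 combinations, but 26 letters plus 10 digits need 36, so some codes must be shared and the shift codes tell the receiver which meaning applies. The sketch below uses the two shift codes from the slide; the LETTERS and FIGURES tables are hypothetical 5-bit assignments made up for illustration, not the real Baudot table, and starting in letter mode is also an assumption:

```python
SHIFT_DOWN = "11111"  # from the slide: letters follow
SHIFT_UP = "11011"    # from the slide: digits follow

# Hypothetical assignments for illustration only (NOT the real Baudot table).
LETTERS = {"00001": "A", "00010": "B", "00011": "C"}
FIGURES = {"00001": "1", "00010": "2", "00011": "3"}

def decode(codes):
    """Decode a list of 5-bit codes, tracking the current shift state."""
    table = LETTERS  # assumption: the receiver starts in letter mode
    out = []
    for code in codes:
        if code == SHIFT_DOWN:
            table = LETTERS
        elif code == SHIFT_UP:
            table = FIGURES
        else:
            out.append(table[code])
    return "".join(out)

print(2 ** 5, 26 + 10)  # 32 36
message = ["11111", "00001", "00010", "11011", "00001", "00011", "11111", "00011"]
print(decode(message))  # AB13C
```

The same 5-bit code 00001 decodes as A or as 1 depending only on the most recent shift code.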

7
Machine Codes
  • The telegraph codes are satisfactory for human
    use, but not for computers. Computer codes need
    the following attributes:
  • binary format
  • all characters have the same length
  • all the bits are perfectly formed
  • all bits are of the same duration

8
Specific Computer Codes
  • Binary-Coded Decimal (BCD)
  • American Standard Code for Information
    Interchange (ASCII)
  • Extended Binary Coded Decimal Interchange
    (EBCDIC)
  • Unicode (originally 16-bit)

9
Specific Computer Codes
  • Binary-coded decimal- developed to facilitate the
    entry and computation of numeric data.
  • Base 10 digits one (0001) through nine (1001)
    were represented in 4 bits in binary format and
    zero was represented as 1010.
  • The digits were stored in BCD and calculations
    were conducted in BCD.
  • As character data became more important, the BCD
    code was expanded to include other characters.
    The expanded code was eventually called
    binary-coded decimal interchange code (BCDIC).
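
A short sketch of the digit mapping described on this slide: each decimal digit becomes its own 4-bit group. The function to_bcd is an illustrative name. By default it follows the slide and encodes zero as 1010; the now-common 8421 form of BCD uses 0000 for zero, so that choice is left as a parameter. Non-negative integers are assumed.

```python
def to_bcd(number, zero="1010"):
    """Encode each decimal digit of a non-negative integer as 4 bits.

    zero defaults to 1010, as stated on the slide; pass "0000" for
    the common 8421 convention.
    """
    groups = []
    for digit in str(number):
        d = int(digit)
        groups.append(zero if d == 0 else format(d, "04b"))
    return " ".join(groups)

print(to_bcd(1905))               # 0001 1001 1010 0101
print(to_bcd(1905, zero="0000"))  # 0001 1001 0000 0101
```

Unlike plain binary, the number is never converted as a whole: 1905 stays four separate digits, which is what makes digit-by-digit entry and decimal arithmetic straightforward.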

10
Specific Computer Codes
  • ASCII code- The most widely used code today.
  • It has 7-bit and 8-bit versions that are used to
    assign unique combinations of bits to each
    keyboard key stroke and to special printable and
    non-printable characters. The non-printable
    characters include the line feed, tab, and
    carriage return. In the 8-bit or extended version
    of ASCII, special characters such as the corner
    of a box are included.
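
The 7-bit codes can be inspected directly with Python's built-in ord. The helper ascii_bits is an illustrative name. The last line assumes that the "extended" set the slide has in mind is the IBM PC code page 437, which is one of several 8-bit extensions of ASCII:

```python
def ascii_bits(ch, width=7):
    """Return the code of ch as a bit string of the given width."""
    code = ord(ch)
    if code >= 2 ** width:
        raise ValueError(f"{ch!r} does not fit in {width} bits")
    return format(code, f"0{width}b")

samples = [("A", "A"), ("a", "a"), ("tab", "\t"),
           ("line feed", "\n"), ("carriage return", "\r")]
for name, ch in samples:
    print(f"{name:16} {ord(ch):3} {ascii_bits(ch)}")

# Assumption: code page 437 as the 8-bit extension. Its box corner
# U+250C has a code above 127, so it needs the eighth bit.
corner = "\u250c".encode("cp437")
print(corner.hex(), format(corner[0], "08b"))  # da 11011010
```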

11
Specific Computer Codes
  • EBCDIC code- used primarily on IBM mainframe
    computers and peripherals.
  • It is an 8-bit code that allows 256 characters.
    Like ASCII, it has printable and non-printable
    characters.
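
To see that EBCDIC assigns different bit patterns from ASCII, the same text can be encoded both ways. Python ships codecs for several EBCDIC code pages; picking cp037 here is an assumption, since the slide does not name a specific code page:

```python
text = "A1"

ebcdic = text.encode("cp037")  # one EBCDIC code page (assumed choice)
ascii_ = text.encode("ascii")

print(ebcdic.hex())  # c1f1
print(ascii_.hex())  # 4131

# Decoding with the matching code page recovers the original text.
print(ebcdic.decode("cp037"))  # A1
```

The bytes differ for every character, so data moved between an EBCDIC system and an ASCII system has to be translated rather than copied.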

12
Specific Computer Codes
  • Unicode- With the internationalization of
    networking applications, the ASCII and EBCDIC
    codes have become too inflexible. Unicode was
    developed to address these issues.
  • It was originally a 16-bit coding scheme that
    supports many scripts, collections of
    mathematical symbols, and special characters that
    exist in particular languages.
  • Examples include Arabic, Latin, Greek, Gothic,
    and Cyrillic scripts. Unicode has already defined
    codes for more than 90,000 characters.
  • A special extension mechanism has been created
    to encode more than 1 million characters.
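
A 16-bit code has only 65,536 combinations, which is why an extension mechanism is needed. The sketch below assumes the mechanism meant here is the UTF-16 surrogate pair, in which characters beyond the 16-bit range are sent as two 16-bit units; the function utf16_units is an illustrative name:

```python
def utf16_units(ch):
    """Number of 16-bit units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2

samples = [
    ("Latin A", "A"),
    ("Greek omega", "\u03a9"),
    ("Cyrillic ya", "\u042f"),
    ("Arabic beh", "\u0628"),
    ("Gothic ahsa", "\U00010330"),  # beyond the 16-bit range
]
for name, ch in samples:
    print(f"{name:12} U+{ord(ch):04X} {utf16_units(ch)} unit(s)")

print("\U00010330".encode("utf-16-be").hex())  # d800df30
print(17 * 65536)  # 1114112 code points in total
```

The Gothic letter is the only sample that needs two units, and the total of 17 planes of 65,536 code points is where "more than 1 million" comes from.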

13
Data Compression/Compaction
  • Definition: Data compression is reducing the
    amount of data or bits moving across a network
    connection. The compression improves bandwidth
    utilization and the speed of transmission.
  • The key to data compression is to determine if
    there is redundancy in the original data and then
    eliminate it. Redundancy can exist in almost any
    type of data.
  • Examples include recurring letters, numbers, or
    pixels. Compression codes that attempt to reduce
    redundancy are frequency-dependent codes.

14
Data Compression/Compaction: Frequency-Dependent Codes
  • Huffman Code-
  • Redundant data are counted and frequencies
    calculated.
  • Then a Huffman code is assigned to each piece of
    data such as a character. Characters with high
    frequencies are represented with short bit
    strings and those with lower frequencies are
    represented with longer bit strings.

15
Frequencies for the Letters A Through E
  • Letter Frequency (%)
  • A 25
  • B 15
  • C 10
  • D 20
  • E 30

16
Rules for Creating a Huffman Tree
  • Associate a one-node binary tree with each
    character and assign it the character's frequency
    as its weight.
  • Look for the 2 lightest-weight trees. If more
    than 2 trees tie for lightest, choose among them
    arbitrarily. Merge the 2 selected trees into a
    new tree with a new root node whose left and
    right subtrees are the selected trees. Assign the
    sum of the weights to the new tree.
  • Repeat step 2 until only one tree remains.
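
The three rules can be sketched with a priority queue that always hands back the two lightest trees. This is one possible implementation, not the presenter's; the function name huffman_codes, the use of heapq, and labelling left branches 0 and right branches 1 are all choices made here. Ties are broken by insertion order rather than randomly, so the resulting codewords can differ from those on the later slide while having the same lengths.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build Huffman codes from a {symbol: weight} mapping."""
    order = count()  # tie-breaker so the heap never compares trees
    # Rule 1: one single-node tree per character, weighted by frequency.
    heap = [(w, next(order), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    # Rules 2 and 3: merge the two lightest trees until one remains.
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(order), (t1, t2)))
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a symbol
            codes[node] = prefix or "0"
    walk(root, "")
    return codes

freqs = {"A": 25, "B": 15, "C": 10, "D": 20, "E": 30}
codes = huffman_codes(freqs)
print({sym: len(code) for sym, code in sorted(codes.items())})
# {'A': 2, 'B': 3, 'C': 3, 'D': 2, 'E': 2}
```

With the frequencies for A through E, the two rarest letters (B and C) end up with 3-bit codes and the others with 2-bit codes, whichever way the ties are resolved.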

17
Merging Huffman Trees
18
Huffman Codes for the Letters A Through E
  • Letter Code
  • A 01
  • B 110
  • C 111
  • D 10
  • E 00
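
Combining this code table with the frequencies for A through E shows how much is saved. The sketch below compares it with a fixed-length code, which needs 3 bits to distinguish 5 letters; the variable names are illustrative:

```python
FREQ = {"A": 25, "B": 15, "C": 10, "D": 20, "E": 30}   # percent
CODES = {"A": "01", "B": "110", "C": "111", "D": "10", "E": "00"}

# Bits needed for a typical run of 100 letters.
huffman_bits = sum(FREQ[s] * len(CODES[s]) for s in FREQ)
fixed_width = (len(FREQ) - 1).bit_length()  # 3 bits cover 5 symbols
fixed_bits = 100 * fixed_width

print(huffman_bits / 100)  # 2.25 bits per letter on average
print(fixed_bits / 100)    # 3.0 bits per letter
print(f"{100 * (1 - huffman_bits / fixed_bits):.0f}% fewer bits")  # 25% fewer bits
```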

19
Receiving and Interpreting a Huffman-Coded Message
  • Example
  • Bit Stream Transmission
  • ←-----------------------------
  • (01110001110110110111)
  • First sent → A B E C A D B C ← Last sent
  • Does it work?
  • (Notice that there is no space or any separator
    in the bit stream.)
  • Why?
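
The bit stream can be checked by decoding it from the first bit sent, using the code table for A through E. It works without separators because no codeword is the beginning of another codeword (the prefix property), so the receiver knows a letter is complete as soon as the bits read so far match a codeword. A minimal sketch; the function names are illustrative:

```python
CODES = {"A": "01", "B": "110", "C": "111", "D": "10", "E": "00"}

def is_prefix_free(codes):
    """True if no codeword is a prefix of a different codeword."""
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode(bits, codes=CODES):
    """Read bits left to right, emitting a symbol at each codeword match."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError(f"leftover bits: {current}")
    return "".join(out)

print(is_prefix_free(CODES))           # True
print(decode("01110001110110110111"))  # ABECADBC
```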

20
Summary of Huffman Trees
  • Huffman coding assumes that both the sending and
    the receiving computers have the table that maps
    each Huffman codeword to its character (for
    example, to its ASCII code).
  • This information must be sent before the
    transmission, which is an overhead of this coding
    scheme.
  • Huffman codes reduce the number of bits to send,
    but they require that frequency values be known.
    These codes work best with repeatable patterns of
    bits such as those found in character data.

21
Run-Length Encoding
  • Many items that are transmitted over networks,
    such as binary files, fax data, and video
    signals, are not like character data; therefore,
    some other form of encoding is needed.
  • An alternative is run-length encoding.
  • It uses a simple approach that analyses bit
    strings looking for long runs of 0s or 1s.
    Instead of sending all the bits, it sends only
    how many bits are in the run.
  • This technique is especially useful for fax
    transmissions, which have 70 to 80% white space.
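
At the bit level, the idea can be sketched as turning a bit string into a list of runs and back. The pair format (bit, length) and the function names are assumptions made for illustration; real fax and modem schemes pack the run lengths far more compactly:

```python
from itertools import groupby

def run_lengths(bits):
    """Split a bit string into (bit, run length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

def expand(runs):
    """Rebuild the original bit string from (bit, run length) pairs."""
    return "".join(bit * n for bit, n in runs)

bits = "0" * 14 + "1" + "0" * 9 + "11" + "0" * 20
runs = run_lengths(bits)
print(runs)  # [('0', 14), ('1', 1), ('0', 9), ('1', 2), ('0', 20)]
print(expand(runs) == bits)  # True
```

Here 46 bits collapse to five runs; the saving disappears when the data has few long runs, which is why the technique suits mostly white pages.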

22
1st Example of Run-length Encoding
  • Consider a screen containing plain black text on
    a solid white background. There will be many long
    runs of white pixels in the blank space, and many
    short runs of black pixels within the text.
  • Let us take a single scan line, with B
    representing a black pixel and W representing a
    white pixel:
  • WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWW
    WWBWWWWWWWWWWWWWW
  • If we apply a simple run-length code to the above
    scan line, we get the following:
  • 12WB12W3B24WB14W
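
The encoding of this scan line can be reproduced in a few lines. The sketch follows the convention visible in the slide's result, where a run of one pixel is written without a count; the function names rle_encode and rle_decode are illustrative, and the scan line is built from its run lengths rather than typed out:

```python
import re
from itertools import groupby

def rle_encode(line):
    """Write each run as <count><pixel>, omitting the count for runs of 1."""
    parts = []
    for pixel, run in groupby(line):
        n = len(list(run))
        parts.append(pixel if n == 1 else f"{n}{pixel}")
    return "".join(parts)

def rle_decode(encoded):
    """Invert rle_encode; a missing count means a run of 1."""
    pairs = re.findall(r"(\d*)([A-Z])", encoded)
    return "".join(pixel * int(n or 1) for n, pixel in pairs)

line = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
encoded = rle_encode(line)
print(encoded)                      # 12WB12W3B24WB14W
print(len(line), len(encoded))      # 67 16
print(rle_decode(encoded) == line)  # True
```

The 67-pixel line is represented by 16 characters, and decoding restores it exactly.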

23
2nd Example of Run-length Encoding
24
Character Stripping
  • Removes the leading and trailing control
    characters from a message, and inserts them again
    at the receiving device.
  • This technique is often used with character
    compression and run-length encoding.

25
MNP5
  • MNP5 is a compression algorithm that is a
    combination of Huffman coding and run-length
    encoding.
  • It is often implemented in the hardware of
    transmission equipment.