Title: CSE 1520 Computer Use: Fundamentals
1- Week 4 Data Representation PART II
- READING Chapter 3
2Representing Real Numbers
EECS 1520 -- Computer Use Fundamentals
- Real numbers have a whole part and a fractional
part
- Example of real numbers in base 10 12.05, 35.75,
.
- We have learned how to convert the whole part to
binary digits, the question now is, how do
convert the fractional part to binary numbers?
3Representing Real Numbers
EECS 1520 -- Computer Use Fundamentals
12.45 1 101 2 100 4 10-1 5
10-2
6.451 6 100 4 10-1 5 10-2 1
10-3
- The positions to the right of the decimal point
work the same way, except that the powers are
negative.
4Representing Real Numbers
EECS 1520 -- Computer Use Fundamentals
- Same rules apply to base 2
- Example
What is the decimal representation of the binary
number 01010.110?
1 23 1 20 1 2-1 1 2-2
8 2 0.5 0.25 10.75
5Representing Real Numbers
EECS 1520 -- Computer Use Fundamentals
- How do we convert a real number in base 10 to its
binary representation?
What is the binary representation of the decimal
number 24.25?
2 steps 1. find the binary number of the
integer part 2. find the binary number of the
fractional part
In step 2, we multiply by the new base, the
fractional part of the result is then multiplied
by the new base. The process continues until the
fractional part of the result is zero
6Number systems Twos Complement
EECS 1520 -- Computer Use Fundamentals
Step 1 convert 24 to binary
Q R 24/2 12 0 12/2 6 0 6/2 3 0 3/2 1 1 1/2
0 1
11000
Step 2 convert 0.25 to binary
0.252 0.5 0.52 1.0
01
24.25 in binary is 11000.01
7Representing Real Numbers
EECS 1520 -- Computer Use Fundamentals
- What is the binary representation of the decimal
number 0.4?
0.42 0.8 0.82 1.6 0.62 1.2 0.22 0.4 0.42
0.8 0.82 1.6 . . . . . .
Starts repeating
- We can say the binary representation of 0.4 is
0.011001..
- The more bits that you get, the closer you will
get to 0.4
8Representing Real Numbers
EECS 1520 -- Computer Use Fundamentals
- Adding two unsigned binary numbers with the
fractional part
6.375
3.25
9.625
9Representing Real Numbers
EECS 1520 -- Computer Use Fundamentals
- Subtracting two unsigned binary numbers with the
fractional part
5.625
3.25
2.375
- We can look at the corresponding decimal
representation to confirm the final results.
10Representing Text
EECS 1520 -- Computer Use Fundamentals
- English language includes 26 letters
- Uppercase and lowercase letter have to be treated
separately - so 52 unique characters will be required
- Punctuation characters (i.e. ! ,)
- Numeric digits (i.e. actual character 0, 1,
9) - Etc
- Two character sets
- 1. ASCII Character Set
- 2. The Unicode Character Set
11ASCII Character Set
EECS 1520 -- Computer Use Fundamentals
- ASCII (American Standard Code for Information
Interchange) - Each character is coded as 1 byte (8 bits)
- The codes are expressed as decimal numbers, these
values are translated to their binary equivalent
for storage in the computer.
12ASCII Character Set
EECS 1520 -- Computer Use Fundamentals
- ASCII (American Standard Code for Information
Interchange) - Each character is coded as 1 byte (8 bits)
- For example, the character 5 is represented as
ASCII value 53 - 8-bit binary string 0011 0101
13ASCII Character Set
EECS 1520 -- Computer Use Fundamentals
- Given that the ASCII code for B is 66, expressed
as a decimal vale, what is the ASCII code, in
hexadecimal, for the letter G?
Characters in the ASCII table are arranged in
alphabetical order, hence If the character B
is 66 in decimal, .G will be 71 in decimal
71 in decimal is 47 in hexadecimal
14Unicode Character Set
EECS 1520 -- Computer Use Fundamentals
- ASCII character set provides 256 characters (i.e.
8 bits) - Only enough to cover the characters in English
- Unicode was designed to be a superset of ASCII,
that is, the first 256 characters in the Unicode
character set correspond exactly to the extended
ASCII character set
- 16-bit standard
- 65,536 possible codes ( 216)
15Representing Text
EECS 1520 -- Computer Use Fundamentals
- It is important to find ways to store and
transmit text efficiently
- Data compression is a reduction in the amount of
space needed to store a piece of data - Compression ratio is the size of the compressed
data divided by the size of the original data
16Representing Text
EECS 1520 -- Computer Use Fundamentals
- It is important to find ways to store and
transmit text efficiently
- 3 types of text compression
- Keyword encoding
- Run-length encoding
- Huffman encoding
17Representing Text Keyword Encoding
EECS 1520 -- Computer Use Fundamentals
- Idea substitute a frequently used word with a
single character
Word Symbol
as
the -
and
that
must
well
To do well on the test
22 characters (including space)
To do on - test
17 characters (including space)
Compression ratio 17/22 0.773
18Representing Text Keyword Encoding
EECS 1520 -- Computer Use Fundamentals
- Idea substitute a frequently used word with a
single character
Limitations - these characters can't be part of
the text - frequently used words tend to be
short, so not much compression - word variations
not handled The vs. the
19Representing Text Run-Length Encoding
EECS 1520 -- Computer Use Fundamentals
- Also called recurrence coding
- A single character may be repeated over and over
again in a long sentence. This type of
repetition doesnt generally take place in
English text, but can occur in large data streams
- Idea replace long series of a repeated character
with a count of the repetition - A sequence of repeated characters is replaced by
- a flag character
- Followed by the repeated character
- Followed by a single digit that indicates how
many times the character is repeated - Example AAAAAAA can be replaced with A7
20Representing Text Run-Length Encoding
EECS 1520 -- Computer Use Fundamentals
- Example replace AAAAAAA with A7
- The character A is represented as ASCII value
65 - So replace 01000001 01000001 01000001 01000001
01000001 01000001 01000001 - With 00101010 01000001 00000111
A
A
A
A
A
A
A
A
7
21Representing Text Run-Length Encoding
EECS 1520 -- Computer Use Fundamentals
- Idea substitute a frequently used word with a
single character
- Limitations
- Its not worth it to encode strings of two or
three (i.e. it takes 3 characters to encode a
repetition sequence) - In the case of two repeated characters, encoding
would actually make the string longer
22Representing Text Huffman Encoding
EECS 1520 -- Computer Use Fundamentals
- Idea Using variable-length bit strings to
represent each character
- The approach is to
- Use only a few bits to represent characters that
appear often and - Use longer bit strings for character that dont
appear often
Huffman Code Character
00 A
01 E
100 L
110 O
111 R
1010 B
1011 D
- The word BELL would be
- 1010 01 100 100
- Only 12 bits are required
- Compared to the fixed-size bit string, for
example, if 8 bits are required to represent each
character, it would need 32 bits - A compression ratio (vs ASCII) of 12/32 0.375!
23Representing Text Huffman Encoding
EECS 1520 -- Computer Use Fundamentals
- Idea Using variable-length bit strings to
represent each character
Huffman Code Character
00 A
01 E
100 L
110 O
111 R
1010 B
1011 D
How about decoding the bit string with Huffman
Encoding? For example, can we decode 0100111?
- 0100111 would be decoded into the word EAR
-
24Representing Audio Data
EECS 1520 -- Computer Use Fundamentals
- A stereo sends an electrical signal to a speaker
to produce sound, this signal is an analog
representation of the sound wave.
The magnitude of the voltage signal varies in
direct proportion to the sound wave
magnitude
25Representing Audio Data
EECS 1520 -- Computer Use Fundamentals
- To represent audio data on a computer, we must
digitize the sound wave data.
- That is, we take the electric signal that
represents the sound wave and represent it as a
series of discrete numeric values
Analog signal
Sampling period
To digitize the signal, we periodically measure
the voltage of the signal, a process called
Sampling
Higher sampling rate produces better-quality sound
26Representing Audio Data
EECS 1520 -- Computer Use Fundamentals
- A sampling rate of around 40,000 times per second
is enough to create a reasonable sound
reproduction
Part of the data is lost
Low sampling rate
High sampling rate
27Representing Audio Data
EECS 1520 -- Computer Use Fundamentals
- Popular audio data formats WAV, AU, MP3
- All use some form of compression
- MP3 offers the strongest compression ratio than
other formats
- MP3
- MP3 is short for MPEG-2, audio layer 3 file
- Uses both lossy and lossless compression
- Lossless bit stream compressed by a form of
Huffman encoding - Lossy uses mathematical models of human
psychoacoustics to discard information the
human ear cant hear
28Representing Images and Graphics
EECS 1520 -- Computer Use Fundamentals
- Representing Color
- 3 basic colors red, green, and blue
- Color is often expressed as an RGB
(red-green-blue) value, which is actually three
numbers that indicate the relative contribution
of each of these three primary colors - If each number in the triple is given on a scale
of 0 to 255, 0 means no contribution of that
color and 255 means full contribution
29Representing Images and Graphics
EECS 1520 -- Computer Use Fundamentals
- The amount of data that is used to represent a
color is called the color depth - For example, TrueColor indicates a 24-bit color
depth. With this scheme, each number in an RGB
value gets 8 bits, which gives the range of 0 to
255 for each - This results in the ability to represent more
than 16.7million unique colors!
30Representing Images and Graphics
EECS 1520 -- Computer Use Fundamentals
- For example, (255,255,0) means no contribution
from blue, and this corresponds to bright
yellow
TrueColor RGB values
Three-dimensional color space
31Representing Images and Graphics
EECS 1520 -- Computer Use Fundamentals
- A photograph is an analog representation of an
image, it is continuous across its surface - Digitizing a picture means representing the
picture as a collection of individual dots called
pixels - Each pixel is composed of a single color
- The number of pixels used to represent a picture
is called the resolution
32Representing Images and Graphics
EECS 1520 -- Computer Use Fundamentals
- Digitized Images and Graphics
- High resolution image
- (i.e. more pixels)
- Low resolution image
- (i.e. less pixels)
33Representing Images and Graphics
EECS 1520 -- Computer Use Fundamentals
- The storage of image information on a
pixel-by-pixel basis is called a raster-graphics
format.
- Several popular raster-graphics file formats are
bitmap (BMP), JPEG
34Representing Images and Graphics
EECS 1520 -- Computer Use Fundamentals
- Another technique for representing images is
called vector graphics. - Instead of assigning colors to pixels, vector
graphics format describes an image in terms of
lines and geometric shapes - For example, a vector graphics can be a series of
commands that describe a lines direction,
thickness and color. - Suitable for line art and cartoon-style drawing
35Representing Videos
EECS 1520 -- Computer Use Fundamentals
- Video (film) is a stream of images/frames (at 24
or 30 fps frame per second) - A video codec (COmpressor/DECompressor) refers to
the methods used to shrink the size of a movie
- Two types of compression in video codec
- Temporal compression keyframe series of delta
frames that record only changes from keyframe
(good if image changes little, i.e. such as a
scene that has little movement) - Spatial compression removes redundant info
within a frame (essentially jpeg like compression
on each frame). This compression often groups
pixels into blocks that have the same color (for
example, a portion of a clear blue sky). Instead
of storing each pixel, the color and the
coordinates of the area are stored.