Title: Introduction to IT 1, 3 Lecture 3: Data Representation
1Introduction to IT (1), (3) Lecture 3: Data Representation
Dr. Haipeng Guo United International College
Fall, 2006
2Outline
- Distinguish between analog and digital information
- Explain data compression and calculate compression ratios
- Explain the binary formats for negative values
- Describe the characteristics of the ASCII and Unicode character sets
- Explain the nature of sound and its representation
- Explain how RGB values define a color
- Explain how to represent images and graphics
- Explain how to represent video
3Data Representation
- Data comes in many forms:
- Numbers: 235, 11.01, -24, ...
- Text: hello, world!
- Audio: .mp3
- Images and graphics: .bmp, .gif, .jpeg
- Video: .avi
- All of the data is stored in computers as binary digits.
- Data must be represented in a way that captures the essence of the information and in a form that is convenient for computer processing.
4Data Compression
- Data compression: reduction in the amount of space needed to store a piece of data.
- Compression ratio: the size of the compressed data divided by the size of the original data.
- A data compression technique can be:
- lossless, which means the data can be retrieved without any loss of the original information, or
- lossy, which means some information may be lost in the process of compaction.
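The compression ratio defined above can be computed directly. The following is a minimal sketch, assuming Python's standard zlib module as the lossless compressor; the sample data and the function name compression_ratio are illustrative, not part of the lecture.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Size of the compressed data divided by the size of the original data."""
    return len(compressed) / len(original)


# Hypothetical sample data: highly repetitive text compresses well.
original = b"data representation " * 200
compressed = zlib.compress(original)  # zlib (DEFLATE) is a lossless technique

ratio = compression_ratio(original, compressed)
restored = zlib.decompress(compressed)

print(f"original: {len(original)} bytes, compressed: {len(compressed)} bytes")
print(f"compression ratio: {ratio:.3f}")
print("lossless round trip:", restored == original)
```

A ratio below 1 means the compressed data is smaller than the original. Because the technique is lossless, decompressing returns exactly the original bytes.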
5WinRAR
- Currently the best archiver
- WinRAR Tutorial
- http://users.pandora.be/soulmaniacs/winrar.html
6Analog and Digital Information
- Computers are finite. Computer memory and other hardware devices have only so much room to store and manipulate a certain amount of data.
- The goal is to represent enough of the world to satisfy our computational needs and our senses of sight and sound.
7Analog and Digital Information
- Information can be represented in one of two ways: analog or digital.
- Analog data: a continuous representation, analogous to the actual information it represents.
- Digital data: a discrete representation, breaking the information up into separate elements.
- A mercury thermometer is an analog device. The mercury rises in a continuous flow in the tube in direct proportion to the temperature.
8Analog Data
9Analog and Digital Information
- Computers cannot work well with analog information, so we digitize information by breaking it into pieces and representing those pieces separately.
- Why do we use binary?
- Modern computers are designed to use and manage binary values because the devices that store and manage the data are far less expensive and far more reliable if they only have to represent one of two possible values.
10Electronic Signals (Contd)
- An analog signal continually fluctuates in voltage up and down, but a digital signal has only a high or low state, corresponding to the two binary digits.
- All electronic signals (both analog and digital) degrade as they move down a line. That is, the voltage of the signal fluctuates due to environmental effects.
11Analog and Digital Information
- Periodically, a digital signal is reclocked to
regain its original shape.
Figure: An analog and a digital signal
Figure: Degradation of analog and digital signals
12Binary Representation
- One bit can be either 0 or 1. Therefore, one bit can represent only two things.
- To represent more than two things, we need multiple bits. Two bits can represent four things because there are four combinations of 0 and 1 that can be made from two bits: 00, 01, 10, 11.
13Binary Representation
14Binary Representation
- In general, n bits can represent 2^n things because there are 2^n combinations of 0 and 1 that can be made from n bits. Note that every time we increase the number of bits by 1, we double the number of things we can represent.
- Questions:
- How many bits are needed to represent 128 things?
- How many bits are needed to represent 67 things?
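One way to check answers to questions like these is to find the smallest n for which 2^n is at least the number of things. The sketch below is illustrative only; the function name bits_needed is an assumption, not something from the lecture.

```python
def bits_needed(things: int) -> int:
    """Return the smallest n such that 2**n >= things."""
    if things < 1:
        raise ValueError("need at least one thing to represent")
    # (things - 1).bit_length() is the number of bits in the largest
    # index when the things are numbered 0 .. things - 1.
    return (things - 1).bit_length()


for count in (2, 4, 67, 128):
    print(count, "things need", bits_needed(count), "bits")
```

Note that 67 things need as many bits as 128 things, because 6 bits cover only 64 combinations.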
15Representing Negative Values
- You have used the signed-magnitude representation of numbers since grade school.
- The sign represents the ordering, and the digits represent the magnitude of the number.
16Representing Negative Values
- There is a problem with the signed-magnitude representation: there are two representations of zero, plus zero and minus zero. Two representations of zero within a computer can cause unnecessary complexity.
- If we allow only a fixed number of values, we can represent numbers as just integer values, where half of them represent negative numbers.
17Representing Negative Values
- For example, if the maximum number of decimal digits we can represent is two, we can let 1 through 49 represent the positive numbers 1 through 49 and let 50 through 99 represent the negative numbers -50 through -1.
- This representation of negative numbers is called ten's complement.
18Advantages of Using 10s Complement
- To perform addition within this scheme, you just
add the numbers together and discard any carry.
19Advantages of Using 10s Complement
- A - B = A + (-B). We can subtract one number from another by adding the negative of the second to the first.
- Addition and subtraction become the same operation.
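The two-digit ten's complement scheme can be sketched in a few lines. This is an illustration under the assumptions of the example above (codes 0 through 99, values -50 through 49); the function names are hypothetical, and results that fall outside the range are not detected.

```python
MODULUS = 100  # two decimal digits, as in the lecture's example


def encode(value: int) -> int:
    """Map an integer in -50..49 to its two-digit ten's complement code."""
    if not -50 <= value <= 49:
        raise ValueError("value does not fit in two digits")
    return value % MODULUS


def decode(code: int) -> int:
    """Map a code in 0..99 back to the integer it represents."""
    return code if code < 50 else code - MODULUS


def add(code_a: int, code_b: int) -> int:
    """Add two codes and discard any carry out of the two digits."""
    return (code_a + code_b) % MODULUS


def subtract(code_a: int, code_b: int) -> int:
    """Compute A - B as A + (-B), reusing the same addition."""
    negated_b = (MODULUS - code_b) % MODULUS
    return add(code_a, negated_b)


print(encode(-5))                            # code for -5
print(decode(add(encode(-5), encode(3))))    # -5 + 3
print(decode(subtract(encode(5), encode(8))))  # 5 - 8
```

Here -5 is stored as 95, so -5 + 3 becomes 95 + 3 = 98, which decodes to -2. Subtraction needs no separate procedure.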
20Two's Complement
- With 3 bits:
- 000 = 0
- 001 = 1
- 010 = 2
- 011 = 3
- 100 = -4
- 101 = -3
- 110 = -2
- 111 = -1
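The 3-bit table above can be reproduced with a short sketch. The helper names to_twos_complement and from_twos_complement are illustrative assumptions, and the bit width is a parameter so that the same code works for other sizes.

```python
def to_twos_complement(value: int, bits: int) -> str:
    """Return the two's complement bit pattern of value using the given width."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking keeps only the lowest `bits` bits, which wraps negatives around.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Return the integer represented by a two's complement bit pattern."""
    raw = int(pattern, 2)
    # A leading 1 marks a negative number: subtract 2**bits.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


for value in range(-4, 4):
    print(to_twos_complement(value, 3), value)
```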
21Overflow
- Overflow occurs when the value that we compute cannot fit into the number of bits we have allocated for the result.
- For example, if each value is stored using eight bits in two's complement (range -128 to 127), adding 127 to 3 overflows.
- Overflow is a classic example of the type of problems we encounter by mapping an infinite world onto a finite machine.
22Overflow
  01111111   (127)
+ 00000011   (3)
= 10000010   (130 does not fit; read as 8-bit two's complement, this pattern is -126)
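Python integers do not overflow on their own, so the sketch below simulates 8-bit two's complement storage by keeping only the low eight bits of the sum. The function name add_8bit_signed is an illustrative assumption.

```python
def add_8bit_signed(a: int, b: int) -> tuple:
    """Add two values as 8-bit two's complement.

    Returns (stored_result, overflowed).
    """
    raw = (a + b) & 0xFF                       # keep only eight bits
    stored = raw - 256 if raw >= 128 else raw  # reinterpret as signed
    overflowed = stored != a + b               # true result did not fit
    return stored, overflowed


print(add_8bit_signed(127, 3))   # the lecture's example
print(add_8bit_signed(100, 27))  # fits exactly at the upper limit
```

For 127 + 3 the true sum is 130, but the stored bit pattern 10000010 reads back as -126, so the overflow flag is set.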
23Representing Text
- A text document can be decomposed into chapters, paragraphs, sentences, words, and ultimately individual characters.
- To represent a text document in digital form, we simply need to be able to represent every character that may appear.
- In English: a, b, ..., z, A, B, ..., Z
- The general approach for representing characters is to list them all and assign each a binary string.
- a → (01100001)_2 → (97)_10 → 61h
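The mapping from a character to its binary, decimal, and hexadecimal code can be inspected with Python's built-in ord. This is a small sketch; the helper name describe is hypothetical.

```python
def describe(ch: str) -> tuple:
    """Return (binary string, decimal code, hex code) for one character."""
    code = ord(ch)  # numeric code assigned to the character
    return format(code, "08b"), code, format(code, "02x") + "h"


for ch in "abzABZ":
    binary, decimal, hexadecimal = describe(ch)
    print(ch, binary, decimal, hexadecimal)
```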
24Character Set
- A character set is a list of characters and the codes used to represent each one.
- By agreeing to use a particular character set, computer manufacturers have made the processing of text data easier.
- Examples: ASCII, Unicode, etc.
25ASCII
- ASCII stands for American Standard Code for Information Interchange.
- The ASCII character set originally used seven bits to represent each character, allowing for 128 unique characters.
- Later, extended versions of ASCII used all eight bits, which allows for 256 characters.
26ASCII
27ASCII
- Note that the first 32 characters in the ASCII
character chart do not have a simple character
representation that you could print to the
screen.
28The Unicode Character Set
- The extended version of the ASCII character set is not enough for international use.
- The Unicode character set was originally designed to use 16 bits per character, which can represent 2^16, or over 65 thousand, characters. The standard has since been extended beyond 16 bits.
- Unicode was designed to be a superset of ASCII. That is, the first 128 characters in the Unicode character set correspond exactly to ASCII, and the first 256 correspond to the ISO 8859-1 (Latin-1) extension of ASCII.
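The relationship between ASCII and Unicode can be seen by comparing code values. The sketch below assumes the big-endian UTF-16 encoding as a stand-in for "16 bits per character"; the sample character 中 (U+4E2D) is chosen only for illustration.

```python
def utf16_hex(ch: str) -> str:
    """Return the big-endian UTF-16 bytes of a character as a hex string."""
    return ch.encode("utf-16-be").hex()


# An ASCII letter keeps the same code value in Unicode.
ascii_code = "A".encode("ascii")[0]
unicode_code = ord("A")
print(ascii_code, unicode_code, utf16_hex("A"))

# A character outside ASCII still fits in 16 bits here.
print(hex(ord("中")), utf16_hex("中"))

# Number of distinct values available in 16 bits.
print(2 ** 16)
```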
29Unicode
31Representing Audio Information
- We perceive sound when a series of air compressions vibrate a membrane in our ear, which sends signals to our brain.
- A stereo sends an electrical signal to a speaker to produce sound. This signal is an analog representation of the sound wave. The voltage in the signal varies in direct proportion to the sound wave.
32Representing Audio Information
- To digitize the signal we periodically measure the voltage of the signal and record the appropriate numeric value. The process is called sampling.
- In general, a sampling rate of around 40,000 times per second is enough to create a reasonable sound reproduction.
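Sampling can be sketched by measuring a mathematical wave at regular intervals in place of a real voltage. The 440 Hz test tone and the 16-bit sample range below are assumptions made for illustration; only the 40,000 samples-per-second rate comes from the text above.

```python
import math

SAMPLE_RATE = 40_000    # samples per second, the rate quoted above
TONE_HZ = 440           # hypothetical test tone
MAX_AMPLITUDE = 32_767  # assumes 16-bit signed samples


def sample_tone(duration_s: float) -> list:
    """Measure a sine wave at regular intervals and record integer values."""
    count = round(SAMPLE_RATE * duration_s)
    return [
        round(MAX_AMPLITUDE * math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE))
        for i in range(count)
    ]


samples = sample_tone(0.01)  # one hundredth of a second
print(len(samples), "samples")
print(samples[:5])
```

Each recorded number is a discrete stand-in for the continuous signal at one instant, which is what makes the result digital rather than analog.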
33Representing Audio Information
34Representing Audio Information
- A compact disc (CD) stores audio information digitally.
- On the surface of the CD are microscopic pits that represent binary digits.
- A low-intensity laser is pointed at the disc.
- The laser light reflects strongly if the surface is smooth and reflects poorly if the surface is pitted.
35Representing Audio Information
- Audio formats: WAV, AU, AIFF, VQF, and MP3.
- MP3 is dominant.
- MP3 is short for MPEG-1 (and MPEG-2) Audio Layer III.
- MP3 employs both lossy and lossless compression.
- First it analyzes the frequency spread and compares it to mathematical models of human psychoacoustics (the study of the interrelation between the ear and the brain), then it discards information that can't be heard by humans. Then the bit stream is compressed to achieve additional compression.
36Representing Color
- Color is our perception of the various frequencies of light that reach the retinas of our eyes.
- Our retinas have three types of color photoreceptor cone cells that respond to different sets of frequencies.
- These photoreceptor categories correspond to the colors of red, green, and blue.
37Representing Color
- Color is often expressed in a computer as an RGB (red-green-blue) value, which is actually three numbers that indicate the relative contribution of each of these three primary colors.
- For example, an RGB value of (255, 255, 0) maximizes the contribution of red and green, and minimizes the contribution of blue, which results in a bright yellow.
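An RGB value with components from 0 to 255 fits in three bytes. The sketch below writes it in the hexadecimal notation commonly used on the web; that notation and the function name rgb_to_hex are assumptions beyond the lecture text.

```python
def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Format an RGB value (each component 0..255) as #rrggbb."""
    for component in (red, green, blue):
        if not 0 <= component <= 255:
            raise ValueError("each component must be between 0 and 255")
    return f"#{red:02x}{green:02x}{blue:02x}"


print(rgb_to_hex(255, 255, 0))      # bright yellow: full red and green, no blue
print(rgb_to_hex(0, 0, 0))          # no contribution from any color: black
print(rgb_to_hex(255, 255, 255))    # full contribution from all three: white
```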
38Three Dimension Color Space
Figure: RGB color cube, with corners labeled (0,0,0) and (1,1,1)
39Representing Images and Graphics
- The amount of data that is used to represent a color is called the color depth.
- HiColor is a term that indicates a 16-bit color depth. Five bits are used for each number in an RGB value and the extra bit is sometimes used to represent transparency.
- TrueColor indicates a 24-bit color depth. Therefore, each number in an RGB value gets eight bits.
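The color depth determines how many distinct colors can be represented. The sketch below counts them and packs a HiColor value into 16 bits; the bit layout (extra bit first, then red, green, blue) is one possible arrangement assumed for illustration, and the function names are hypothetical.

```python
def colors_for_depth(color_bits: int) -> int:
    """Number of distinct colors representable with the given number of bits."""
    return 2 ** color_bits


def pack_hicolor(red5: int, green5: int, blue5: int, extra: int = 0) -> int:
    """Pack three 5-bit components and one extra bit into a 16-bit value.

    Assumed layout: extra bit, then red, green, blue (5 bits each).
    """
    for component in (red5, green5, blue5):
        if not 0 <= component <= 31:
            raise ValueError("each component must fit in five bits")
    return (extra << 15) | (red5 << 10) | (green5 << 5) | blue5


print(colors_for_depth(15))  # HiColor: 3 x 5 bits of color
print(colors_for_depth(24))  # TrueColor: 3 x 8 bits of color
print(format(pack_hicolor(31, 31, 0), "016b"))  # yellow in the assumed layout
```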
40Indexed Color
- A particular application, such as a browser, may support only a certain number of specific colors, creating a palette from which to choose.
41Digitized Images and Graphics
- Digitizing a picture is the act of representing it as a collection of individual dots called pixels.
- The number of pixels used to represent a picture is called the resolution.
- The storage of image information on a pixel-by-pixel basis is called a raster-graphics format.
- There are several popular raster file formats, including bitmap (BMP), GIF, and JPEG.
42BMP
43Digitized Images and Graphics
High Resolution
44Digitized Images and Graphics
Low Resolution
45Representing Video
- A video codec (COmpressor/DECompressor) refers to the methods used to shrink the size of a movie to allow it to be played on a computer or over a network.
- Almost all video codecs use lossy compression to minimize the huge amounts of data associated with video.
- The goal is not to lose information that affects the viewer's senses.
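A rough calculation shows why uncompressed video involves so much data. The frame size, color depth, and frame rate below are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def raw_video_bytes_per_second(width: int, height: int,
                               bits_per_pixel: int, frames_per_second: int) -> int:
    """Uncompressed video data rate in bytes per second."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * frames_per_second


# Hypothetical clip: 640 x 480 pixels, 24-bit color, 30 frames per second.
rate = raw_video_bytes_per_second(640, 480, 24, 30)
print(rate, "bytes per second")
print(rate * 60, "bytes per minute")
```

At these settings a single minute of raw video exceeds 1.6 billion bytes, which is why codecs accept some loss of information.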
46Video Players
- QuickTime Player (Apple)
- RealPlayer
- VLC media player
- Windows Media Player (Microsoft)
47Summary
- Distinguish between analog and digital information
- Explain the binary formats for negative values
- Describe the characteristics of the ASCII and Unicode character sets
- Explain the nature of sound and its representation
- Explain how RGB values define a color
- Representing Audio Information
- Representing Images and Graphics
- Representing Video Information