Title: Data Formats
1Data Formats
Textbook Chapter 3
2Figure 3.1 Data conversion and representation
3Introduction
Input device
4Format must be Appropriate
- The internal representation must be appropriate
for the type of processing to take place (e.g.,
text, images, sound)
- Problem: Since computers store everything in
binary code, how does a computer know what a particular
stored item is?
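To make the problem concrete, here is a minimal Python sketch (the byte values and variable names are chosen only for illustration). It reads the same stored bytes three different ways; nothing in the bytes themselves says which reading is intended.

```python
import struct

# Four stored bytes; their meaning is not yet decided.
raw = bytes([0x48, 0x69, 0x21, 0x00])

# Reading 1: the first three bytes as ASCII characters.
as_text = raw[:3].decode("ascii")

# Reading 2: all four bytes as one little-endian unsigned integer.
as_int = struct.unpack("<I", raw)[0]

# Reading 3: the same bytes as two 16-bit unsigned integers.
as_shorts = struct.unpack("<2H", raw)

print(as_text)    # Hi!
print(as_int)     # 2189640
print(as_shorts)  # (26952, 33)
```

The format (a rule or convention) is what tells the program which reading applies.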
5Rules/Conventions
- Proprietary formats
- Unique to a product or company
- E.g., Microsoft Word, Corel WordPerfect, IBM
Lotus Notes
- Standards
- Evolve two ways
- Proprietary formats become de facto standards
(e.g., Adobe PostScript, Apple QuickTime)
- A committee is struck to solve a problem (e.g., Moving
Picture Experts Group, MPEG)
Text pg 63-64
6Standards Organizations
- ISO International Organization for Standardization
- CSA Canadian Standards Association
- ANSI American National Standards Institute
- IEEE Institute of Electrical and
Electronics Engineers
- Etc. Et Cetera (Everybody Else)
7Examples of Standards
Hint - Learn What kind is which!
8Why Standards?
- Standards are arbitrary
- They exist because they are
convenient, efficient, flexible, and appropriate
- Plus, they provide some consistency
and predictability for applications.
9Alphanumeric Data
- Problem: Distinguishing between the number 123
(one hundred twenty-three) and the characters
"123" (one, two, three)
- In software, data is given a type
- Four standards for representing letters (alpha)
and numbers
- BCD Binary-coded decimal
- ASCII American Standard Code for Information
Interchange
- EBCDIC Extended Binary-Coded Decimal
Interchange Code
- Unicode
pp. Old 63-69 Rev 65-72
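The difference between the number 123 and the characters "123" can be sketched in Python as follows (variable names are illustrative). The type decides both the stored bit pattern and what an operation such as + means.

```python
number = 123   # integer type: stored as a single binary value
text = "123"   # string type: stored as three character codes

number_bits = format(number, "08b")
text_codes = [format(b, "08b") for b in text.encode("ascii")]

print(number_bits)  # 01111011
print(text_codes)   # ['00110001', '00110010', '00110011']

# The same operator behaves differently depending on the type.
print(number + 1)   # 124
print(text + "1")   # 1231
```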
10Standard Alphanumeric Formats
Next 2 slides
11Binary-Coded Decimal (BCD)
Note: the following bit patterns are not
used: 1010, 1011, 1100, 1101, 1110, 1111
12Example
Decimal  7    0    9    3
BCD      0111 0000 1001 0011
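The example above can be reproduced with a short Python sketch. The function names to_bcd and from_bcd are hypothetical helpers written for this illustration, not part of any standard library.

```python
def to_bcd(decimal_digits: str) -> str:
    """Encode each decimal digit as its own 4-bit group."""
    if not (decimal_digits.isascii() and decimal_digits.isdigit()):
        raise ValueError("expected decimal digits only")
    return " ".join(format(int(d), "04b") for d in decimal_digits)


def from_bcd(bcd: str) -> str:
    """Decode space-separated 4-bit groups back to decimal digits."""
    digits = []
    for group in bcd.split():
        value = int(group, 2)
        if value > 9:
            # 1010 through 1111 are the six unused bit patterns.
            raise ValueError(f"{group} is not a valid BCD digit")
        digits.append(str(value))
    return "".join(digits)


print(to_bcd("7093"))                     # 0111 0000 1001 0011
print(from_bcd("0111 0000 1001 0011"))    # 7093
```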
13Standard Alphanumeric Formats
Next 22 slides
14The Problem
- Representing text strings, such as "Hello,
world", in a computer
After all, computers store binary digits, not
letters!
15Codes and Characters
- Each character is coded as a byte
- Most common coding system is ASCII (pronounced
ass-key)
- ASCII American National Standard Code for
Information Interchange
- Defined in ANSI document X3.4-1977
16ASCII Features
- 7-bit code
- 8th bit is unused (or used for a parity bit or
to indicate extended character set)
- 2^7 = 128 codes
- Two general types of codes
- 95 are Graphic codes (displayable on a console)
- 33 are Control codes (control features of the
console or communications channel)
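The 95/33 split can be checked with a few lines of Python. This sketch assumes the usual ASCII layout: codes 00 to 1F hex plus 7F (DEL) are control codes, and 20 to 7E hex are graphic codes (counting the blank space as graphic).

```python
ALL_CODES = range(2 ** 7)  # a 7-bit code has 128 values, 0 through 127

control = [c for c in ALL_CODES if c < 0x20 or c == 0x7F]
graphic = [c for c in ALL_CODES if 0x20 <= c <= 0x7E]

print(len(ALL_CODES))  # 128
print(len(graphic))    # 95
print(len(control))    # 33
```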
17Hint Memorize codes for blank space,
period, digit zero (0), capital
A, small a, carriage return (CR)
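As a study aid, the codes named in the hint can be listed with Python's built-in ord function. The dictionary name and layout here are just one way to print them.

```python
KEY_CHARACTERS = {
    "blank space": " ",
    "period": ".",
    "digit zero": "0",
    "capital A": "A",
    "small a": "a",
    "carriage return": "\r",
}

# Map each name to its ASCII code.
codes = {name: ord(ch) for name, ch in KEY_CHARACTERS.items()}

for name, code in codes.items():
    # Show each code in decimal, hexadecimal, and 7-bit binary.
    print(f"{name:16} dec {code:3}  hex {code:02X}  bin {code:07b}")
```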
18ASCII Chart
Book - page 67, Figure 3.3 (In decimal)
20Most significant bit
Least significant bit
21e.g., a = 1100001
22 95 Graphic codes
23 33 Control codes
See text page 69 / 71 for details
24Alphabetic codes
25Numeric codes
26Punctuation, etc.
27"Hello, world" Example
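A minimal Python sketch of this example: each character of the string is replaced by its ASCII code, shown here in hexadecimal. The variable names are illustrative.

```python
message = "Hello, world"

# One byte per character, using the ASCII code of each character.
codes = message.encode("ascii")
hex_codes = " ".join(f"{b:02X}" for b in codes)

print(hex_codes)  # 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64
```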
28Common Control Codes
Name, hexadecimal code, meaning
- CR 0D carriage return
- LF 0A line feed
- HT 09 horizontal tab
- DEL 7F delete
- NUL 00 null
30Terminology
- Learn the names of the special symbols
- [ ] brackets
- { } braces
- ( ) parentheses
- @ commercial at sign
- & ampersand
- ~ tilde
32Escape Sequences
- Extend the capability of the ASCII code set
- For controlling terminals and formatting output
- Defined by ANSI in documents X3.41-1974 and
X3.64-1977
- The escape code is ESC (1B hexadecimal)
- An escape sequence begins with two codes: ESC
(1B hexadecimal) followed by [ (5B hexadecimal)
33Examples
- Erase display: ESC [ 2 J
- Erase line: ESC [ K
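The two examples can be built up byte by byte in Python. This is a sketch: the constant names are illustrative, and whether a given terminal honours these sequences depends on the terminal.

```python
ESC = "\x1b"      # escape code, 1B hexadecimal
CSI = ESC + "["   # the two codes that begin a sequence: 1B, 5B

ERASE_DISPLAY = CSI + "2J"
ERASE_LINE = CSI + "K"


def show_codes(sequence: str) -> str:
    """Return the hexadecimal code of each character in the sequence."""
    return " ".join(f"{ord(c):02X}" for c in sequence)


print(show_codes(ERASE_DISPLAY))  # 1B 5B 32 4A
print(show_codes(ERASE_LINE))     # 1B 5B 4B

# To try it on a terminal that supports these sequences:
# import sys; sys.stdout.write(ERASE_DISPLAY)
```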
34Standard Alphanumeric Formats
Next slide
35EBCDIC
- Extended Binary-Coded Decimal Interchange Code (pronounced
ebb-se-dick)
- 8-bit code
- Developed by IBM
- Rarely used today
- IBM mainframes only
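To see that EBCDIC assigns different codes than ASCII, the following sketch uses Python's built-in cp037 codec. That codec is one common EBCDIC code page; treating it as representative is an assumption made for illustration.

```python
comparison = {}
for ch in ("A", "a", "0", " "):
    ascii_code = ch.encode("ascii")[0]
    ebcdic_code = ch.encode("cp037")[0]   # cp037: one EBCDIC code page
    comparison[ch] = (ascii_code, ebcdic_code)
    print(f"{ch!r}: ASCII {ascii_code:02X}  EBCDIC {ebcdic_code:02X}")
```

The same character has a different bit pattern in each code, so data moved between the two systems must be translated.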
36Standard Alphanumeric Formats
Next 2 slides
37Unicode
- 16-bit standard
- Developed by a consortium
- Intended to supersede older 7- and 8-bit codes
38Unicode Version 2.1
- 1998
- Improves on version 2.0
- Includes the Euro sign (20AC hexadecimal)
- From the standard:
"contains 38,887 distinct coded characters
derived from the supported scripts. These
characters cover the principal written languages
of the Americas, Europe, the Middle East, Africa,
India, Asia, and Pacifica."
http://www.unicode.org
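The Euro sign makes a convenient check of the code value quoted above. This Python sketch (variable names are illustrative) shows its code point, its 16-bit encoding, and the fact that 7-bit ASCII cannot represent it.

```python
euro = "\u20ac"   # the Euro sign

code_point = ord(euro)
utf16 = euro.encode("utf-16-be")   # 16-bit form, most significant byte first

print(f"{code_point:04X}")  # 20AC
print(utf16.hex().upper())  # 20AC

try:
    euro.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False   # no 7-bit code exists for this character

print(fits_in_ascii)  # False
```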
39Keyboard Input
- Key (scan) codes are converted to ASCII
- ASCII code sent to host computer
- Received by the host as a stream of data
- Stored in buffer
- Processed
- Etc.
pp. Old 69 Rev 72
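The steps listed above can be sketched as a small Python pipeline. The scan-code table below is hypothetical and exists only for illustration (real keyboards use their own, larger tables), and the function names are invented for this sketch.

```python
from collections import deque

# Hypothetical scan-code table, for illustration only.
SCAN_TO_ASCII = {0x01: "h", 0x02: "i", 0x03: "\r"}


def keyboard_to_host(scan_codes, buffer):
    """Convert each key (scan) code to ASCII and store it in the buffer."""
    for scan in scan_codes:
        ascii_code = ord(SCAN_TO_ASCII[scan])  # key code converted to ASCII
        buffer.append(ascii_code)              # received by host and buffered


def process(buffer):
    """Drain the buffer in arrival order and return the text it held."""
    chars = []
    while buffer:
        chars.append(chr(buffer.popleft()))
    return "".join(chars)


buffer = deque()
keyboard_to_host([0x01, 0x02, 0x03], buffer)
stored = list(buffer)
line = process(buffer)

print(stored)      # [104, 105, 13]
print(repr(line))  # 'hi\r'
```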
40Figure 3.7 Keyboard operation
41Shift Key
- inhibits (clears) bit 5 in the ASCII code
(bits numbered from 0 at the least significant bit)
Example: a with Shift becomes A
42Control Key
- inhibits (clears) bits 5 and 6 in the ASCII code
Example: c with Ctrl becomes a control code
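The effect of the Shift and Control keys on a letter's code can be sketched with bit masks in Python. This models letter keys only, with bits numbered from 0 at the least significant end; the function names are illustrative.

```python
BIT5 = 1 << 5   # 0100000
BIT6 = 1 << 6   # 1000000


def shift(code: int) -> int:
    """Clear bit 5: a lowercase letter becomes uppercase."""
    return code & ~BIT5


def ctrl(code: int) -> int:
    """Clear bits 5 and 6: a letter becomes a control code."""
    return code & ~(BIT5 | BIT6)


print(f"{ord('a'):07b} -> {shift(ord('a')):07b}")  # 1100001 -> 1000001
print(chr(shift(ord("a"))))                        # A
print(f"{ord('c'):07b} -> {ctrl(ord('c')):07b}")   # 1100011 -> 0000011
```

Ctrl with c therefore produces code 03 hexadecimal, one of the 33 control codes.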
43Other Input
- OCR optical character recognition
- Bar code readers
- Voice/audio input
- Punched cards
- Images / objects
- Pointing devices
pp. Old 69-86 Rev 72-89
44OCR
Figure: a page of text is optically scanned into binary data (10110110) and stored as a computer file.
46Bar Codes
- An automatic identification (Auto ID) technology
that streamlines identification and data
collection
- See http://www.digital.net/barcoder/barcode.html
48Voice/audio Input
- Input device: microphone
- Audio input is digitized and stored
- Processed in two ways
- As is (no recognition)
- Recognized and converted to alphanumeric data
(ASCII)
Figure: audio input is digitized into binary data (10110010)
49Figure 3.15 Digitizing an audio waveform
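Digitizing can be sketched in Python as sampling a waveform at regular intervals and rounding each sample to one of a fixed number of levels. The 1 kHz tone, the 8000 samples per second rate, and the 8-bit depth are assumptions chosen for illustration, not values taken from the figure.

```python
import math


def tone(t: float) -> float:
    """A 1 kHz sine wave with amplitude between -1.0 and 1.0."""
    return math.sin(2 * math.pi * 1000 * t)


def digitize(signal, duration_s, sample_rate_hz, bits):
    """Sample the signal and quantize each sample to an integer level."""
    levels = 2 ** bits
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                  # time of the n-th sample
        amplitude = signal(t)                   # assumed to lie in -1.0 .. 1.0
        level = round((amplitude + 1.0) / 2.0 * (levels - 1))
        samples.append(level)
    return samples


samples = digitize(tone, duration_s=0.001, sample_rate_hz=8000, bits=8)
for level in samples:
    print(format(level, "08b"))   # each sample stored as one byte
```

More samples per second and more bits per sample give a closer copy of the original waveform, at the cost of more stored data.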
50Figure 3.16 .WAV sound format
52Punched Cards
- Invented by Herman Hollerith (whose company later became part of IBM)
- Each card holds 80 characters
55Images
- Typically images are pictures that are optically
scanned and saved as a bit map or in some other
format
- Many formats
- GIF, JPEG, etc.
Note: animated GIFs are often used on the web.
56Typical Save As Dialog
57Figure 3.10 GIF screen layout
58Figure 3.11 GIF file format layout
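As a rough illustration of how a file format lays out its fields, the following Python sketch reads the first 13 bytes of a GIF file: the signature and version, then the screen width and height. The field layout follows the published GIF89a specification rather than the figure itself, the function name is hypothetical, and the sample bytes are constructed in the code instead of being read from a real file.

```python
import struct


def parse_gif_header(data: bytes) -> dict:
    """Decode the GIF signature and logical screen descriptor."""
    signature, version = data[0:3], data[3:6]
    if signature != b"GIF":
        raise ValueError("not a GIF file")
    # Width and height are 16-bit little-endian; three single bytes follow.
    width, height, packed, background, aspect = struct.unpack(
        "<HHBBB", data[6:13]
    )
    return {
        "version": version.decode("ascii"),
        "width": width,
        "height": height,
        "has_global_color_table": bool(packed & 0x80),
        "global_color_table_size": 2 ** ((packed & 0x07) + 1),
    }


# A constructed 13-byte header: 320 x 200 screen, 256-entry color table.
sample = b"GIF89a" + struct.pack("<HHBBB", 320, 200, 0xF7, 0, 0)
info = parse_gif_header(sample)
print(info["version"], info["width"], info["height"])  # 89a 320 200
```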
59Objects
- Images made of geometrically definable shapes
(example: MS Paint software)
- Offer efficiency, flexibility, small size, etc.
60Figure 3.12 An object image
61Figure 3.13 A PostScript program
62Figure 3.14 Another PostScript program
64Pointing Devices
- Originally used for specifying coordinates (x, y)
for graphical input
- Today used as a general-purpose device for
graphical user interfaces (GUIs)
65Thank you