Title: Data Formats
1. Data Formats
Textbook Chapter 3
2. Introduction
Input device
3. Format Must Be Appropriate
- The internal representation must be appropriate for the type of processing that will take place (e.g., text, images, sound)
- Problem: since computers store everything in binary code, how does a computer know what a particular stored item is?
4. Rules/Conventions
- Proprietary formats
  - Unique to a product or company
  - E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes
- Standards
  - Evolve in two ways
  - Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
  - A committee is formed to solve a problem (e.g., the Moving Picture Experts Group, MPEG)
Text pp. 63-64
5. Standards Organizations
- ISO: International Organization for Standardization
- CSA: Canadian Standards Association
- ANSI: American National Standards Institute
- IEEE: Institute of Electrical and Electronics Engineers
- Etc.: Et Cetera (Everybody Else)
6. Examples of Standards
Type of data              Standards
Alphanumeric              ASCII, EBCDIC, Unicode
Image                     JPEG, GIF, PCX, TIFF
Motion picture            MPEG-2, QuickTime
Sound                     Sound Blaster, WAV, AU
Outline graphics/fonts    PostScript, TrueType, PDF
Hint: learn which kind is which!
7. Why Standards?
- Standards are arbitrary
- They exist because they are convenient, efficient, flexible, and appropriate
Plus, they provide some consistency and predictability for applications.
8. Alphanumeric Data
- Problem: distinguishing between the number 123 (one hundred twenty-three) and the characters 123 (one, two, three)
- In software, data is given a type
- Four standards for representing letters (alpha) and numbers:
  - BCD: Binary-Coded Decimal
  - ASCII: American Standard Code for Information Interchange
  - EBCDIC: Extended Binary-Coded Decimal Interchange Code
  - Unicode
pp. 63-69 (old edition), 65-72 (revised edition)
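A minimal Python sketch of the 123 problem, showing the same digits stored as one binary integer versus as three ASCII character codes. The variable names are illustrative only.

```python
# The number 123 versus the three characters "1", "2", "3".
number = 123
text = "123"

# As a binary integer, 123 fits in one byte.
number_bits = format(number, "08b")

# As ASCII text, each character gets its own code (shown in hex).
text_codes = [format(b, "02X") for b in text.encode("ascii")]

print(number_bits)  # 01111011
print(text_codes)   # ['31', '32', '33']
```

The integer form is what arithmetic needs; the character form is what a display or printer needs, which is why software must give each stored item a type.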
9. Standard Alphanumeric Formats (next 2 slides)
10. Binary-Coded Decimal (BCD)
Digit Bit pattern
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
Note: the following bit patterns are not used: 1010 1011 1100 1101 1110 1111
11. Example: 7093 in BCD
Digit:     7    0    9    3
BCD:    0111 0000 1001 0011
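The encoding in the example can be sketched in Python: each decimal digit becomes its own 4-bit group, and decoding rejects the six unused patterns. The function names `to_bcd` and `from_bcd` are hypothetical, and the sketch handles only non-negative integers.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative integer in BCD: 4 bits per decimal digit."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return " ".join(format(int(digit), "04b") for digit in str(number))


def from_bcd(bits: str) -> int:
    """Decode space-separated 4-bit BCD groups back into an integer."""
    digits = []
    for group in bits.split():
        value = int(group, 2)
        if value > 9:  # 1010 through 1111 are not valid BCD digits
            raise ValueError(f"invalid BCD group: {group}")
        digits.append(str(value))
    return int("".join(digits))


print(to_bcd(7093))                     # 0111 0000 1001 0011
print(from_bcd("0111 0000 1001 0011"))  # 7093
```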
12. Standard Alphanumeric Formats (next 22 slides)
13. The Problem
- Representing text strings, such as "Hello, world", in a computer
After all, computers store binary digits, not letters!
14. Codes and Characters
- Each character is coded as a byte
- The most common coding system is ASCII (pronounced "ass-key")
- ASCII: American National Standard Code for Information Interchange
- Defined in ANSI document X3.4-1977
15. ASCII Features
- 7-bit code
- 8th bit is unused (or used for a parity bit or to indicate an extended character set)
- 2^7 = 128 codes
- Two general types of codes:
  - 95 are graphic codes (displayable on a console)
  - 33 are control codes (control features of the console or communications channel)
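The 95/33 split can be checked with a short Python sketch. It assumes the standard ASCII layout: control codes occupy 0-31 (00-1F hex) plus 127 (7F hex, DEL), and graphic codes occupy 32-126 (20-7E hex), starting with the blank space.

```python
# Classify all 2**7 = 128 seven-bit ASCII codes.
codes = range(2 ** 7)

# Control codes: 0-31 plus 127 (DEL).
control = [c for c in codes if c < 32 or c == 127]

# Graphic (displayable) codes: 32 (space) through 126 (~).
graphic = [c for c in codes if 32 <= c <= 126]

print(len(codes), len(graphic), len(control))  # 128 95 33
```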
16. Hint: Memorize the codes for blank space, period, digit zero (0), capital A, small a, and carriage return (CR)
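A quick way to look up the codes in the hint is Python's built-in `ord`. It returns a Unicode code point, which equals the ASCII code for these characters because Unicode's first 128 code points match ASCII. The dictionary name is illustrative.

```python
# Characters the hint suggests memorizing.
memorize = {
    "blank space": " ",
    "period": ".",
    "digit zero": "0",
    "capital A": "A",
    "small a": "a",
    "carriage return (CR)": "\r",
}

for name, char in memorize.items():
    code = ord(char)  # equals the ASCII code for these characters
    print(f"{name:22} hex {code:02X}  decimal {code:3d}  binary {code:07b}")
```

The printed values are: space 20 hex (32), period 2E hex (46), zero 30 hex (48), A 41 hex (65), a 61 hex (97), and CR 0D hex (13).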
17. ASCII Chart
Book: page 67, Figure 3.3 (in decimal)
19. [Figure: ASCII code layout, labeled from most significant bit to least significant bit]
20. Example: a = 1100001
21. 95 Graphic codes
22. 33 Control codes
See text page 69 / 71 for details
23. Alphabetic codes
24. Numeric codes
25. Punctuation, etc.
26. "Hello, world" Example
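The "Hello, world" string can be encoded in a couple of lines of Python, one byte per character, with each code shown in hexadecimal. The variable names are illustrative.

```python
message = "Hello, world"
encoded = message.encode("ascii")  # one byte per character
hex_codes = " ".join(f"{b:02X}" for b in encoded)
print(hex_codes)  # 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64
```

Note that the comma (2C) and the blank space (20) are characters with codes of their own, just like the letters.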
27. Common Control Codes (codes in hexadecimal)
- CR   0D   carriage return
- LF   0A   line feed
- HT   09   horizontal tab
- DEL  7F   delete
- NUL  00   null
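A small Python sketch relating the control codes in the list to the string escapes many languages use for them, and showing a line that ends with CR followed by LF (the pairing used, for example, by Windows text files). The dictionary names are illustrative.

```python
# Common ASCII control codes (hexadecimal values).
CONTROL_CODES = {"CR": 0x0D, "LF": 0x0A, "HT": 0x09, "DEL": 0x7F, "NUL": 0x00}

# Python string escapes for three of them.
ESCAPES = {"CR": "\r", "LF": "\n", "HT": "\t"}
for name, escape in ESCAPES.items():
    print(name, f"{ord(escape):02X}", ord(escape) == CONTROL_CODES[name])

# A text line terminated by CR followed by LF.
line = "Hello, world\r\n".encode("ascii")
print(" ".join(f"{b:02X}" for b in line))
```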
29. Terminology
- Learn the names of the special symbols
- [ ] brackets
- { } braces
- ( ) parentheses
- @ commercial at sign
- & ampersand
- ~ tilde
31. Escape Sequences
- Extend the capability of the ASCII code set
- Used for controlling terminals and formatting output
- Defined by ANSI in documents X3.41-1974 and X3.64-1977
- The escape code is ESC (1B₁₆)
- An escape sequence of this kind begins with two codes: ESC (1B₁₆) followed by [ (5B₁₆)
32. Examples
- Erase display: ESC [ 2 J
- Erase line: ESC [ K
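The two example sequences can be built as Python strings. This is a sketch assuming an ANSI-compatible terminal; `CSI` ("control sequence introducer") is the conventional name for the ESC [ pair, and the other constant names are illustrative. The clear-screen sequence is sent only when output goes to a terminal.

```python
import sys

ESC = "\x1b"     # the escape code, 1B hex
CSI = ESC + "["  # ESC followed by [ (5B hex)

ERASE_DISPLAY = CSI + "2J"  # ESC [ 2 J
ERASE_LINE = CSI + "K"      # ESC [ K

# Only send the sequence to a real terminal, so redirected
# output is not cluttered with control codes.
if sys.stdout.isatty():
    sys.stdout.write(ERASE_DISPLAY)
    sys.stdout.flush()

print([f"{ord(ch):02X}" for ch in ERASE_DISPLAY])  # ['1B', '5B', '32', '4A']
```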
33. Standard Alphanumeric Formats (next slide)
34. EBCDIC
- Extended Binary-Coded Decimal Interchange Code (pronounced "ebb-se-dick")
- 8-bit code
- Developed by IBM
- Rarely used today
- Found mainly on IBM mainframes
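To see how EBCDIC differs from ASCII, Python's standard `cp037` codec can be used. It is one EBCDIC code page (US/Canada); choosing it here is an assumption for illustration, since EBCDIC exists in several national variants.

```python
text = "A a 0"
ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # cp037: an EBCDIC code page (US/Canada)

print(" ".join(f"{b:02X}" for b in ascii_bytes))   # 41 20 61 20 30
print(" ".join(f"{b:02X}" for b in ebcdic_bytes))  # C1 40 81 40 F0
```

The same characters get entirely different 8-bit codes, so data moved between an IBM mainframe and an ASCII system must be translated.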
35. Standard Alphanumeric Formats (next 2 slides)
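36. Unicode
- Originally designed as a 16-bit standard (the code space was later extended beyond 16 bits)
- Developed by a consortium (the Unicode Consortium)
- Intended to supersede older 7- and 8-bit codes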
37. Unicode Version 2.1
- 1998
- Improves on version 2.0
- Includes the Euro sign (20AC₁₆)
- From the standard: "... contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica."
http://www.unicode.org
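Python strings are sequences of Unicode code points, so the Euro sign can be inspected directly. This sketch shows that its code point (20AC hex) lies outside 7-bit ASCII, while a letter such as "A" keeps its ASCII code in Unicode. The variable name is illustrative.

```python
euro = "\u20ac"  # the Euro sign, code point U+20AC

print(f"U+{ord(euro):04X}")  # U+20AC
print(ord(euro) > 0x7F)      # True: outside the 7-bit ASCII range

# Unicode's first 128 code points match ASCII, so "A" is 41 hex in both.
print(f"{ord('A'):02X}")     # 41

try:
    euro.encode("ascii")
except UnicodeEncodeError:
    print("The Euro sign has no ASCII code")
```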
38. Keyboard Input
- Key (scan) codes are converted to ASCII
- The ASCII code is sent to the host computer
- Received by the host as a stream of data
- Stored in a buffer
- Processed
- Etc.
p. 69 (old edition), p. 72 (revised edition)
39. Shift Key
- Inhibits (clears) bit 5 in the ASCII code
Key(s)       ASCII code (bits 6 5 4 3 2 1 0)   Character
a            1 1 0 0 0 0 1                     a
Shift + a    1 0 0 0 0 0 1                     A
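The Shift behavior in the table can be modeled as a single bit operation. This is a minimal sketch covering letter keys only; the function name `shift` is hypothetical, and a real keyboard maps scan codes through a layout table rather than computing bits this way.

```python
BIT_5 = 1 << 5  # 0100000 binary (20 hex)


def shift(code: int) -> int:
    """Model Shift on a letter key: inhibit (clear) bit 5."""
    return code & ~BIT_5


lower = ord("a")      # 1100001
upper = shift(lower)  # 1000001
print(f"{lower:07b} -> {upper:07b} = {chr(upper)!r}")  # 1100001 -> 1000001 = 'A'
```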
40. Control Key
- Inhibits (clears) bits 5 and 6 in the ASCII code
Key(s)       ASCII code (bits 6 5 4 3 2 1 0)   Character
c            1 1 0 0 0 1 1                     c
Ctrl + c     0 0 0 0 0 1 1                     ETX (control code)
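Likewise, the Ctrl key can be modeled by clearing bits 5 and 6, which maps a letter onto one of the control codes 01-1A hex. The function name `control` is hypothetical, and the sketch covers letter keys only.

```python
BITS_5_AND_6 = (1 << 5) | (1 << 6)  # 1100000 binary (60 hex)


def control(code: int) -> int:
    """Model Ctrl on a letter key: inhibit (clear) bits 5 and 6."""
    return code & ~BITS_5_AND_6


c = ord("c")         # 1100011
ctrl_c = control(c)  # 0000011 = 3 = ETX (end of text)
print(f"{c:07b} -> {ctrl_c:07b} = {ctrl_c}")  # 1100011 -> 0000011 = 3
```

Because bit 5 is cleared too, Ctrl + c and Ctrl + Shift + c (capital C) produce the same control code.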
41. Other Input
- OCR: optical character recognition
- Bar code readers
- Voice/audio input
- Punched cards
- Images / objects
- Pointing devices
pp. 69-86 (old edition), 72-89 (revised edition)
42. OCR
[Figure: a page of text is optically scanned and stored as binary data (e.g., 10110110) in a computer file]
44. Bar Codes
- An automatic identification (Auto ID) technology that streamlines identification and data collection
- See http://www.digital.net/barcoder/barcode.html
46. Voice/Audio Input
- Input device: microphone
- Audio input is digitized and stored
- Processed in two ways:
  - As is (no recognition)
  - Recognized and converted to alphanumeric data (ASCII)
[Figure: audio input is digitized into binary data, e.g., 10110010]
48. Punched Cards
- Punched-card data processing was pioneered by Herman Hollerith, whose Tabulating Machine Company was a forerunner of IBM
- Each standard (80-column) card holds 80 characters
50. Images
- Typically, images are pictures that are optically scanned and saved as a bitmap or in some other format
- Many formats
  - GIF, JPEG, etc.
Note: animated GIFs are often used on the World Wide Web.
51. Typical "Save As" Dialog
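52. Objects
- Images made of geometrically definable shapes (e.g., lines, rectangles, ellipses), such as those drawn with the shape tools in MS Paint (although Paint itself saves the finished picture as a bitmap)
- Offer efficiency, flexibility, small size, etc.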
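A minimal Python sketch of why objects are compact: a rectangle stored as an object needs four numbers, while the same shape converted to a bitmap needs one value per pixel. The `Rectangle` class and `rasterize` function are hypothetical illustrations, not any program's actual format, and the bitmap here stores one 0/1 value per pixel.

```python
from dataclasses import dataclass


@dataclass
class Rectangle:
    """A hypothetical 'object': a shape stored as its geometry, not as pixels."""
    x: int
    y: int
    width: int
    height: int


def rasterize(shapes, width, height):
    """Convert shapes into a bitmap: a grid holding one 0/1 value per pixel."""
    bitmap = [[0] * width for _ in range(height)]
    for shape in shapes:
        for row in range(max(shape.y, 0), min(shape.y + shape.height, height)):
            for col in range(max(shape.x, 0), min(shape.x + shape.width, width)):
                bitmap[row][col] = 1
    return bitmap


shapes = [Rectangle(x=10, y=10, width=50, height=30)]
bitmap = rasterize(shapes, width=100, height=100)

numbers_in_object = 4 * len(shapes)                 # x, y, width, height
pixels_in_bitmap = sum(len(row) for row in bitmap)  # 100 x 100 values
pixels_set = sum(sum(row) for row in bitmap)        # pixels inside the rectangle
print(numbers_in_object, pixels_in_bitmap, pixels_set)  # 4 10000 1500
```

The object form is also more flexible: doubling the rectangle's size changes two numbers, whereas the bitmap would have to be redrawn pixel by pixel.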
54. Pointing Devices
- Originally used for specifying coordinates (x, y) for graphical input
- Today used as general-purpose devices for graphical user interfaces (GUIs)
55. Thank you