Data Formats - PowerPoint PPT Presentation

About This Presentation
Title:

Data Formats

Description:

Define the different ways human data may be represented, stored and processed by ... Plus small set of accents and other European special characters (Latin-I ASCII) 13 ... – PowerPoint PPT presentation

Slides: 72
Provided by: Khai


Transcript and Presenter's Notes

Title: Data Formats


1
Lecture 3
ITEC 1000 Introduction to Information Technology
  • Data Formats

objectimagegallery.com
Prof. Peter Khaiter
2
Lecture Template
  • Data Forms
  • Data conversion and representation
  • Data Formats
  • Alphanumeric Data
  • Image Data
  • Audio Data
  • Data Input
  • Data Compression
  • Internal Computer Data Format

3
Data Forms
  • Human communication
  • Includes language, images and sounds
  • Computers
  • Process and store all forms of data in binary
    format
  • Conversion to computer-usable representation
    using data formats
  • Define the different ways human data may be
    represented, stored and processed by a computer

4
Data conversion and representation
5
Data formats
  • Proprietary formats
  • Unique to a product or company
  • E.g., Microsoft Word, WordPerfect
  • Standards (evolve in two ways)
  • Proprietary formats become de facto standards
    (e.g., Adobe PostScript)
  • Developed by an international standards
    organization (e.g., Moving Picture Experts
    Group, MPEG)

6
Common Data Representations
7
Alphanumeric Data
  • Characters (r, T), number digits (0..9),
    punctuation (e.g., !), and special-purpose
    characters
  • Four codes/standards to represent letters and
    numbers
  • BCD (Binary-Coded Decimal)
  • Unicode
  • ASCII (American Standard Code for Information
    Interchange)
  • EBCDIC (Extended Binary Coded Decimal Interchange
    Code)

8
Standard Alphanumeric Formats
  • BCD
  • ASCII
  • EBCDIC
  • Unicode

Next 2 slides
9
Binary-Coded Decimal (BCD)
  • Four bits per digit

Note: the following six 4-bit patterns are not
used: 1010 1011 1100 1101 1110 1111
10
BCD Example
  • 7093₁₀ = ? (in BCD)

7 → 0111, 0 → 0000, 9 → 1001, 3 → 0011
7093₁₀ = 0111 0000 1001 0011 (BCD)
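The digit-by-digit rule above can be sketched in a few lines of Python. This is a minimal illustration; `to_bcd` and `from_bcd` are hypothetical helper names, and the decoder rejects the six unused patterns 1010 to 1111.

```python
def to_bcd(n: int) -> str:
    """Encode a non-negative decimal integer as BCD: 4 bits per decimal digit."""
    return " ".join(format(int(d), "04b") for d in str(n))

def from_bcd(bits: str) -> int:
    """Decode space-separated 4-bit BCD groups back into an integer."""
    digits = []
    for group in bits.split():
        value = int(group, 2)
        if value > 9:  # 1010..1111 are not valid BCD digits
            raise ValueError(f"invalid BCD group: {group}")
        digits.append(str(value))
    return int("".join(digits))

print(to_bcd(7093))                      # 0111 0000 1001 0011
print(from_bcd("0111 0000 1001 0011"))   # 7093
```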
11
Standard Alphanumeric Formats
  • BCD
  • ASCII
  • EBCDIC
  • Unicode

Next 13 slides
12
ASCII Features
  • Developed by ANSI (American National Standards
    Institute)
  • Defined in ANSI document X3.4-1977
  • 7-bit code
  • 8th bit is unused (or used for a parity bit or to
    indicate extended character set)
  • 2⁷ = 128 different codes
  • Two general types of codes
  • 95 are Printing codes (displayable on a
    console)
  • 33 are Control codes (control features of the
    console or communications channel)
  • Represents
  • Latin alphabet, Arabic numerals, standard
    punctuation characters
  • Plus a small set of accents and other European
    special characters (Latin-I, an 8-bit extension of ASCII)
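The 95/33 split of the 128 codes can be checked directly. This sketch assumes the usual convention that codes 0 to 31 and 127 (DEL) are control codes and codes 32 (space) to 126 are printing codes.

```python
# Control codes: 0-31 plus 127 (DEL). Printing codes: 32 (space) through 126 (~).
control = [c for c in range(128) if c < 32 or c == 127]
printing = [c for c in range(128) if 32 <= c <= 126]

print(len(control), len(printing))   # 33 95
assert 2 ** 7 == len(control) + len(printing) == 128
```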

13
ASCII Table
14
ASCII Table
Most significant bit
Least significant bit
15
ASCII Table
e.g., a = 1100001
16
ASCII Table
95 Printing codes
17
ASCII Table
33 Control codes
18
ASCII Table
Alphabetic codes
19
ASCII Table
Numeric codes
20
ASCII Table
Punctuation, etc.
21
ASCII Table
74₁₆ = 111 0100 (the letter t)
22
Example: Hello, world
7-bit ASCII codes:
1001000 1100101 1101100 1101100 1101111 0101100
0100000 1110111 1101111 1110010 1101100 1100100
Same bit stream in hex (4 bits per hex digit):
9 1 9 7 6 6 C D E B 1 0 7 7 D F C B 6 6 4
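The example can be reproduced by concatenating the 7-bit codes and regrouping the bits four at a time. A small sketch; the helper names are hypothetical. The 12 characters give 84 bits, which divide evenly into 21 hex digits.

```python
def ascii_bit_stream(text: str) -> str:
    """Concatenate the 7-bit ASCII code of every character."""
    return "".join(format(ord(ch), "07b") for ch in text)

def bits_to_hex(bits: str) -> str:
    """Render a bit string in hex, 4 bits per digit (length must be a multiple of 4)."""
    return "".join(format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4))

bits = ascii_bit_stream("Hello, world")
print(len(bits))          # 84 (12 characters x 7 bits)
print(bits_to_hex(bits))  # 919766CDEB1077DFCB664
```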
23
Common Control Codes
  • CR 0D carriage return
  • LF 0A line feed
  • HT 09 horizontal tab
  • DEL 7F delete
  • NUL 00 null

(Codes shown in hexadecimal)
24
ASCII Table Common Control Codes
25
Standard Alphanumeric Formats
  • BCD
  • ASCII
  • EBCDIC
  • Unicode

Next 3 slides
26
EBCDIC
  • 8-bit code
  • Developed by IBM
  • IBM and compatible mainframes only
  • Rarely used today (common in archival data)
  • Character codes differ from ASCII
  • Conversion software to/from ASCII available
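Python's standard library ships EBCDIC codecs, so the difference in character codes is easy to see. This sketch assumes code page 037 (`cp037`, US/Canada EBCDIC); other EBCDIC code pages assign some characters differently.

```python
text = "A1a"

ebcdic = text.encode("cp037")    # IBM EBCDIC, code page 037
ascii_ = text.encode("ascii")

print(ebcdic.hex(" ").upper())   # C1 F1 81
print(ascii_.hex(" ").upper())   # 41 31 61

# Converting EBCDIC back to text, then to ASCII, is a decode/encode round trip.
assert ebcdic.decode("cp037").encode("ascii") == ascii_
```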

27
EBCDIC Table (1 out of 2)
28
EBCDIC Table (2 out of 2)
29
Standard Alphanumeric Formats
  • BCD
  • ASCII
  • EBCDIC
  • Unicode

Next 2 slides
30
Unicode
  • Most common 16-bit form represents 65,536
    characters
  • ASCII and Latin-I are a subset of Unicode
  • Values 0 to 255 in the Unicode table
  • Multilingual: defines codes for
  • Nearly every character-based alphabet
  • Large set of ideographs for Chinese, Japanese and
    Korean
  • Composite characters for vowels and syllabic
    clusters required by some languages
  • Allows software to be modified for local languages
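A short sketch of the points above: each character gets a code point, ASCII and Latin-I characters keep their old values (0 to 255), and characters from other scripts still fit in one 16-bit unit. It assumes UTF-16 (big-endian) as the concrete 16-bit form; `unicode_info` is a hypothetical helper.

```python
def unicode_info(ch: str) -> tuple[str, str]:
    """Return the code point (U+XXXX) and the UTF-16 big-endian bytes in hex."""
    return f"U+{ord(ch):04X}", ch.encode("utf-16-be").hex().upper()

# A, e-acute (Latin-I), Greek capital omega, Chinese ideograph "middle"
for ch in ["A", "\u00e9", "\u03a9", "\u4e2d"]:
    print(*unicode_info(ch))

# Values 0-255 coincide with Latin-I, and ASCII characters keep their ASCII codes.
assert ord("\u00e9") == "\u00e9".encode("latin-1")[0] == 0xE9
assert all(ord(c) == c.encode("ascii")[0] for c in "Hello, world")
```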

31
Two-byte Unicode Assignment Table
32
Collating Sequence
  • Collating sequence: the order of the codes in
    the representation table
  • Determines sorting and selection of
    alphanumeric data
  • Collating sequences are different in ASCII and
    EBCDIC
  • Lowercase letters precede capitals in EBCDIC; the
    reverse holds in ASCII
  • Digits collate first in ASCII and last in EBCDIC
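Sorting the same words by their byte codes shows how the collating sequence changes the result. A minimal sketch, again assuming EBCDIC code page 037 (`cp037`).

```python
words = ["apple", "Zebra", "42"]

ascii_order = sorted(words, key=lambda w: w.encode("ascii"))
ebcdic_order = sorted(words, key=lambda w: w.encode("cp037"))

print(ascii_order)    # ['42', 'Zebra', 'apple']  digits, capitals, lowercase
print(ebcdic_order)   # ['apple', 'Zebra', '42']  lowercase, capitals, digits
```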

33
Two Classes of Codes
  • Printing characters
  • Produce output on the screen or printer
  • Control characters
  • Control position of output on screen or printer
  • Cause action to occur
  • Communicate status between computer and I/O
    device

34
Control Code Definitions (ASCII Table)
35
Escape Sequences
  • Extend the capability of the ASCII code set
  • For controlling terminals and formatting output
  • Defined by ANSI in documents X3.41-1974 and
    X3.64-1977
  • The escape code is ESC (1B₁₆)
  • An escape sequence begins with two codes: ESC
    (1B₁₆) followed by [ (5B₁₆)
36
Escape Sequences Examples
  • Erase display: ESC [ 2 J
  • Erase line: ESC [ K
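The two examples can be built as byte strings to show the exact codes sent to the terminal. A sketch; the constant names are hypothetical, and writing these strings to an ANSI-compatible terminal would perform the erase.

```python
ESC = "\x1b"                  # 1B hex
CSI = ESC + "["               # two-code introducer: ESC (1B) then [ (5B)

ERASE_DISPLAY = CSI + "2J"
ERASE_LINE = CSI + "K"

print(ERASE_DISPLAY.encode("ascii").hex(" ").upper())   # 1B 5B 32 4A
print(ERASE_LINE.encode("ascii").hex(" ").upper())      # 1B 5B 4B
```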

37
Alphanumeric Input Keyboard
  • Scan code
  • Two different binary scan codes are generated:
    one when a key is struck and one when it is released
  • Converted to Unicode, ASCII or EBCDIC by software
    in terminal or PC
  • Received by the host as a stream of text and
    other characters, i.e. in the sequence typed
  • Advantage
  • Easily adapted to different languages or keyboard
    layout
  • Separate scan codes for key press and release
    allow multiple-key combinations
  • Examples: Shift and Control keys

38
Shift Key
  • inhibits (clears) bit 5 in the ASCII code, turning a
    lowercase letter into its capital

(Figure: the code for a, with and without Shift)
39
Control Key
  • inhibits (clears) bits 5 and 6 in the ASCII code,
    producing a control code

(Figure: c pressed with Ctrl produces a control code)
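The Shift and Ctrl effects are simple bit masks. This sketch assumes bits are numbered from 0 at the least significant end, so bit 5 has value 20 hex and bit 6 has value 40 hex; the helper names are hypothetical.

```python
BIT5 = 1 << 5   # 0x20
BIT6 = 1 << 6   # 0x40

def with_shift(ch: str) -> str:
    """Clear bit 5: lowercase letter -> capital letter."""
    return chr(ord(ch) & ~BIT5)

def with_ctrl(ch: str) -> int:
    """Clear bits 5 and 6: letter -> control code 1..26."""
    return ord(ch) & ~(BIT5 | BIT6)

print(with_shift("a"))       # A    (1100001 -> 1000001)
print(hex(with_ctrl("c")))   # 0x3  (1100011 -> 0000011, ETX, i.e. Ctrl-C)
```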
40
Keyboard Input
  • Three letters are typed D, I, R, followed
    by the carriage return
  • Four scan codes translated to ASCII binary codes
    1000100, 1001001, 1010010, 0001101
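The final translation step can be checked by printing the 7-bit ASCII code of each key. The scan-code stage itself is not modeled here; the sketch starts from the characters the keyboard software has already produced.

```python
def ascii_codes(keys: str) -> list[str]:
    """7-bit binary ASCII code of each character."""
    return [format(ord(k), "07b") for k in keys]

# D, I, R, then carriage return (CR, 0D hex)
print(", ".join(ascii_codes("DIR\r")))   # 1000100, 1001001, 1010010, 0001101
```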

41
OCR (optical character recognition)
  • Scans text and inputs it as character data
  • Special OCR software required
  • Used to read specially encoded characters
  • Example: magnetically printed check numbers
  • Attempts to recognize hand-written input
    (limited, only carefully printed)

42
Bar Code Readers
  • Used in applications that require fast, accurate
    and repetitive input with minimal employee
    training
  • Examples: supermarket checkout counters and
    inventory control
  • Alphanumeric data in a bar code (e.g., 780471
    108801 90000) is read optically by a wand that
    converts it into electrical binary signals
  • A bar code translation module converts the binary
    input into a sequence of number codes, one code
    per digit, which are then translated to Unicode or ASCII.
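The last step, from one number code per digit to ASCII, can be sketched as below. It assumes the translation module has already produced digit values 0 to 9; the bar-and-space symbology itself is not modeled, and `digit_codes_to_ascii` is a hypothetical name.

```python
def digit_codes_to_ascii(digit_codes: list[int]) -> str:
    """Map digit values 0-9 to ASCII characters '0'-'9' (codes 30-39 hex)."""
    if any(not 0 <= d <= 9 for d in digit_codes):
        raise ValueError("bar code digits must be 0..9")
    return bytes(0x30 + d for d in digit_codes).decode("ascii")

# Digits of the slide's example 780471 108801 90000
digits = [7, 8, 0, 4, 7, 1, 1, 0, 8, 8, 0, 1, 9, 0, 0, 0, 0]
print(digit_codes_to_ascii(digits))   # 78047110880190000
```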

43
Other Alphanumeric Input
  • Magnetic stripe reader: alphanumeric data from
    credit cards
  • Voice
  • Digitized audio recording common but conversion
    to alphanumeric data difficult
  • Requires knowledge of sound patterns in a
    language (phonemes) plus rules for pronunciation,
    grammar, and syntax

44
Image Data
  • Photographs, figures, icons, drawings, charts and
    graphs
  • Two approaches
  • Bitmap or raster images of photos and paintings
    with continuous variation (e.g., GIF, JPEG)
  • Object or vector images composed of graphical
    shapes like lines and curves defined
    geometrically
  • Differences include
  • Quality of the image
  • Storage space required
  • Time to transmit
  • Ease of modification

45
Image Input
  • Image scanning (moves over the image, converting
    it dot by dot into a stream of binary numbers,
    pixels, representing black or white, levels of
    gray, or colors): bitmap image
  • Digital/video cameras: bitmap image
  • Pointing devices (mouse, pen): object image

46
Bitmap Images
  • Each individual pixel (picture element) in a
    graphic is stored as a binary number
  • Pixel: a small area with an associated coordinate
    location
  • Example: each point below is represented by a 4-bit
    code corresponding to 1 of 16 shades of gray
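With 4 bits per pixel, two pixels fit in one byte. A minimal sketch, assuming the high nibble holds the first pixel and odd-length rows are padded with shade 0; both are illustrative choices, not a particular file format's rules.

```python
def pack_4bit(pixels: list[int]) -> bytes:
    """Pack 4-bit gray levels (0-15) two per byte, first pixel in the high nibble."""
    if any(not 0 <= p <= 15 for p in pixels):
        raise ValueError("4-bit pixels must be 0..15")
    if len(pixels) % 2:
        pixels = pixels + [0]            # pad an odd-length row
    return bytes((pixels[i] << 4) | pixels[i + 1] for i in range(0, len(pixels), 2))

def unpack_4bit(data: bytes, count: int) -> list[int]:
    """Inverse of pack_4bit; count drops any padding pixel."""
    out = []
    for b in data:
        out.extend((b >> 4, b & 0x0F))
    return out[:count]

row = [0, 15, 7, 8, 3]
packed = pack_4bit(row)
print(packed.hex(" "))   # 0f 78 30
```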

47
Bitmap Display
  • Monochrome: black or white
  • 1 bit per pixel
  • Gray scale: black, white, or 254 shades of gray
  • 1 byte per pixel
  • Color graphics: 16 colors, 256 colors, or 24-bit
    true color (16.7 million colors)
  • 4, 8, and 24 bits per pixel, respectively
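The number of distinct values a pixel can take is 2 raised to the bits per pixel. For gray scale, 8 bits give 256 levels: black, white, and 254 grays in between.

```python
def values_for_bits(bits_per_pixel: int) -> int:
    """Number of distinct colors or gray levels a pixel can encode."""
    return 2 ** bits_per_pixel

for label, bits in [("monochrome", 1), ("16 colors", 4),
                    ("256 colors or gray levels", 8), ("24-bit true color", 24)]:
    print(f"{label}: {bits} bit(s) per pixel -> {values_for_bits(bits):,} values")
```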

48
Storing Bitmap Images
  • Frequently large files
  • Example: 600 rows of 800 pixels with 1 byte for
    each of 3 colors = 1,440,000 bytes (about 1.5 MB)
  • File size affected by
  • Resolution (the number of pixels per inch)
  • Amount of detail affecting clarity and sharpness
    of an image
  • Levels: number of bits for displaying shades of
    gray or multiple colors
  • Palette: color translation table; each pixel stores
    a code into the table rather than the actual color value
  • Data compression
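The size arithmetic, and the saving a palette gives, can be sketched as follows. This counts raw pixel data plus the palette only, assuming 3 bytes (R, G, B) per palette entry; real formats add headers and possibly row padding, which are ignored here.

```python
def bitmap_bytes(width: int, height: int, bits_per_pixel: int,
                 palette_entries: int = 0, bytes_per_entry: int = 3) -> int:
    """Uncompressed pixel data size plus an optional color palette."""
    pixel_bytes = (width * height * bits_per_pixel + 7) // 8
    return pixel_bytes + palette_entries * bytes_per_entry

true_color = bitmap_bytes(800, 600, 24)                   # 1 byte each for R, G, B
paletted = bitmap_bytes(800, 600, 8, palette_entries=256) # 1-byte palette index per pixel

print(true_color)   # 1440000
print(paletted)     # 480768
```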

49
GIF (Graphics Interchange Format)
  • First developed by CompuServe in 1987
  • GIF89a enabled animated images
  • allows images to be displayed sequentially at
    fixed time intervals
  • Limited to 256 colors
  • Image compressed by the LZW (Lempel-Ziv-Welch)
    algorithm
  • Preferred for line drawings, clip art and
    pictures with large blocks of solid color
  • Lossless compression
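A compact sketch of the LZW idea: build a dictionary of byte strings on the fly and emit dictionary codes. This is the textbook algorithm only; GIF's variable-width codes, clear and end-of-information codes, and bit packing are not modeled.

```python
def lzw_compress(data: bytes) -> list[int]:
    dictionary = {bytes([i]): i for i in range(256)}   # start with all single bytes
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                                     # keep extending the match
        else:
            out.append(dictionary[w])
            dictionary[wc] = len(dictionary)           # learn the new pattern
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out

def lzw_decompress(codes: list[int]) -> bytes:
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):                  # pattern defined by this very step
            entry = w + w[:1]
        else:
            raise ValueError(f"bad LZW code {code}")
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[:1]
        w = entry
    return b"".join(out)

data = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(data)
print(len(data), "bytes ->", len(codes), "codes")
```

Because decompression rebuilds the same dictionary, the original bytes come back exactly, which is why GIF is lossless.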

50
GIF (Graphics Interchange Format)
51
JPEG (Joint Photographic Experts Group)
  • Allows more than 16 million colors
  • Suitable for highly detailed photographs and
    paintings
  • Employs a special compression algorithm that
  • Discards data to decrease file size and
    transmission time
  • May reduce image resolution, tends to distort
    sharp lines

52
Other Bitmap Formats
  • TIFF (Tagged Image File Format) .tif (pronounced
    tif)
  • Used in high-quality image processing,
    particularly in publishing
  • BMP (BitMaPped) .bmp (pronounced dot bmp)
  • Device-independent format for the Microsoft Windows
    environment: pixel colors stored independent of
    output device
  • PCX .pcx (pronounced dot p c x)
  • Windows Paintbrush software
  • PNG (Portable Network Graphics) .png
    (pronounced ping)
  • Designed mainly to replace GIF for Internet
    applications
  • Patent-free
  • Improved lossless compression
  • No animation support

53
Object Images
  • Created by drawing packages or output from
    spreadsheet data graphs
  • Composed of lines and shapes in various colors
  • Computer translates geometric formulas to create
    the graphic
  • Storage space depends on image complexity
  • number of instructions to create lines, shapes,
    fill patterns
  • Example: the movies Shrek and Toy Story use object images

54
Object Images
  • Based on mathematical formulas
  • Easy to move, scale and rotate without losing
    shape and identity, as bitmap images may
  • Require less storage space than bitmap images
  • Cannot represent photos or paintings
  • Cannot be displayed or printed directly
  • Must be converted to bitmap, since all output devices
    except plotters are bitmap devices
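The contrast between the two forms can be sketched with a single line segment: the object form is just two endpoints and scales exactly, while output requires converting it to pixels. The conversion below uses Bresenham's line algorithm, one common rasterization technique chosen here for illustration; actual devices and drivers may use other methods.

```python
def rasterize_line(x0: int, y0: int, x1: int, y1: int) -> list[tuple[int, int]]:
    """Bresenham's line algorithm: the pixels that approximate a line segment."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:      # step in x
            err += dy
            x0 += sx
        if e2 <= dx:      # step in y
            err += dx
            y0 += sy
    return points

def render(points: list[tuple[int, int]], width: int, height: int) -> list[str]:
    """Rows of text: '#' = pixel on, '.' = pixel off."""
    grid = [["."] * width for _ in range(height)]
    for x, y in points:
        grid[y][x] = "#"
    return ["".join(row) for row in grid]

segment = (0, 0, 6, 3)                 # object form: two endpoints
for scale in (1, 2):                   # scaling the object is exact arithmetic
    x0, y0, x1, y1 = (scale * v for v in segment)
    print(f"scale {scale}:")
    print("\n".join(render(rasterize_line(x0, y0, x1, y1), 6 * scale + 1, 3 * scale + 1)))
```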

55
Popular Object Graphics Software
  • Most object image formats are proprietary
  • File extensions include .wmf, .dxf, .mgx, and
    .cgm
  • Macromedia Flash: low-bandwidth animation
  • Micrografx Designer: technical drawings to
    illustrate products
  • CorelDraw: vector illustration, layout, bitmap
    creation, image-editing, painting and animation
    software
  • Autodesk AutoCAD: for architects, engineers,
    drafters, and design-related professionals
  • W3C SVG (Scalable Vector Graphics): based on the XML
    Web description language
  • Not proprietary

56
PostScript
  • Page description language: a list of procedures and
    statements that describe each of the objects to
    be printed on a page
  • Stored in ASCII or Unicode text file
  • Interpreter program in computer or output device
    reads PostScript to generate image
  • Scalable font support
  • Font outline objects specified like other objects

57
PostScript Program
58
Representing Characters as Images
  • Characters stored in format like Unicode or ASCII
  • Text processed and stored primarily for content
  • Presentation requirements like font stored with
    the character
  • Text appearance is primary factor
  • Example: screen fonts in Windows
  • Glyphs: Macintosh coding scheme that includes
    both identification and presentation requirements
    for characters

59
Bitmap vs. Object Images
60
Video Images
  • Require massive amount of data
  • Video camera producing a full-screen 640 x 480
    pixel true color image at 30 frames/sec:
    27.65 MB of data/sec
  • 1-minute film clip: about 1.66 GB of storage
  • Options for reducing file size: decrease size of
    image, limit number of colors, reduce frame rate
  • Method depends on how video is delivered to users
  • Streaming video: video displayed as it is
    downloaded from the Web server
  • Example: video conferencing
  • Local data (file on DVD or downloaded onto
    system) for higher quality
  • MPEG-2: movie-quality images with high
    compression require substantial processing
    capability
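The data-rate figures follow from multiplying pixels, bytes per pixel, and frames. A sketch with a hypothetical helper; the reduced case applies all three reduction options together (half width and height, 256 colors at 1 byte per pixel, 15 frames/sec) purely as an illustration.

```python
def raw_video_bytes_per_second(width: int, height: int,
                               bytes_per_pixel: int, frames_per_second: int) -> int:
    """Uncompressed video data rate."""
    return width * height * bytes_per_pixel * frames_per_second

full = raw_video_bytes_per_second(640, 480, 3, 30)
print(full)        # 27648000 bytes/sec (about 27.65 MB/sec)
print(full * 60)   # 1658880000 bytes per minute (about 1.66 GB)

reduced = raw_video_bytes_per_second(320, 240, 1, 15)
print(full // reduced)   # 24 times less data
```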

61
Audio Data
  • Transmission and processing requirements less
    demanding than those for video
  • Waveform audio: digital representation of sound
  • MIDI (Musical Instrument Digital Interface):
    instructions to recreate or synthesize sounds
  • Analog sound converted to digital values by
    A-to-D converter

62
Waveform Audio
Sampling rate normally 50 kHz
63
Sampling Rate
  • Number of times per second that sound is measured
    during the recording process.
  • 1000 samples per second = 1 kHz (kilohertz)
  • Example: audio CD sampling rate is 44.1 kHz
  • Height of each sample saved as
  • 8-bit number for radio-quality recordings
  • 16-bit number for high-fidelity recordings
  • 2 x 16-bits for stereo
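Two small sketches of the points above: sampling a tone by measuring its height at regular intervals and storing each height as a signed integer, and the resulting data rate for CD-quality stereo. The pure sine tone and the helper names are illustrative assumptions.

```python
import math

def sample_tone(freq_hz: float, sample_rate_hz: int, n: int, bits: int = 16) -> list[int]:
    """Measure a sine wave n times at the given rate; store each height as a signed integer."""
    peak = 2 ** (bits - 1) - 1                    # 32767 for 16-bit, 127 for 8-bit
    return [round(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz))
            for i in range(n)]

def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    return sample_rate_hz * bits_per_sample * channels // 8

cd = pcm_bytes_per_second(44_100, 16, 2)          # 44.1 kHz, 16 bits, stereo
print(cd)        # 176400 bytes/sec
print(cd * 60)   # 10584000 bytes per minute (about 10.6 MB)
```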

64
MIDI
  • Music notation system that allows computers to
    communicate with music synthesizers
  • Instructions that MIDI instruments and MIDI sound
    cards use to recreate or synthesize sounds.
  • Do not store or recreate speaking or singing
    voices
  • More compact than waveform
  • Example: 3 minutes of music is about 10 KB
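MIDI stores events rather than sound samples, which is why it is so compact. A sketch of the standard Note On and Note Off channel messages (status byte plus two 7-bit data bytes); MIDI file framing such as headers and delta times is omitted, and the function names are hypothetical.

```python
def note_on(channel: int, note: int, velocity: int) -> bytes:
    """MIDI Note On: status 0x90 | channel, then note number and velocity."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("value out of range")
    return bytes([0x90 | channel, note, velocity])

def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """MIDI Note Off: status 0x80 | channel."""
    return bytes([0x80 | channel, note, velocity])

msg = note_on(0, 60, 100)   # middle C (note 60) on the first channel
print(msg.hex(" "))         # 90 3c 64
```

Starting a note costs 3 bytes, while one second of CD-quality waveform audio takes 176,400 bytes.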

65
Audio Formats
  • MP3
  • MPEG-1/MPEG-2 Audio Layer III (ISO Moving Picture
    Experts Group)
  • Uses psychoacoustic compression techniques to
    reduce storage requirements
  • Discards sounds outside the human hearing range:
    lossy compression
  • WAV
  • Developed by Microsoft as part of its multimedia
    specification
  • General-purpose format for storing and
    reproducing small snippets of sound

66
.WAV Sound Format
67
Data Compression
  • Compression: recoding data so that it requires
    fewer bytes of storage space
  • Compression ratio: the amount the file is shrunk
    (original size ÷ compressed size)
  • Lossless: inverse algorithm restores data to
    exact original form
  • Examples: GIF, PCX, TIFF
  • Lossy: trades off data degradation for file size
    and download speed
  • Much higher compression ratios, often 10:1
  • Example: JPEG
  • Common in multimedia
  • MPEG-2 uses both forms for ratios of 100:1
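The lossless property and the compression ratio can be demonstrated with Python's built-in `zlib` module. Note that zlib uses DEFLATE, a lossless method different from the formats listed above; the sample data is made deliberately repetitive.

```python
import zlib

original = b"ABABABABAB" * 1000 + b"a less repetitive tail of text"
compressed = zlib.compress(original, 9)   # level 9 = best compression
restored = zlib.decompress(compressed)

ratio = len(original) / len(compressed)
print(len(original), "->", len(compressed), f"bytes, ratio {ratio:.0f}:1")
assert restored == original               # lossless: exact original restored
```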

68
Compression Algorithms
  • Repetition
  • 0 5 8 7 0 0 0 0 3 4 0 0 0 0 1 5 8 7 0 4 3
    4 0 3
  • Example: large blocks of the same color
  • Pattern Substitution
  • Scans data for patterns
  • Substitutes a new pattern and makes a dictionary entry
  • Example: 45 bytes reduced to 30 bytes plus a dictionary
  • Peter Piper picked a peck of pickled peppers.
  • ? t ? ? p ? ?? ? a ? ? of ? ?l ? ? pp ?
    s.
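Repetition-based compression can be sketched as run-length encoding: each run of identical values becomes one (value, count) pair. The slide's exact coding scheme is not specified, so this pair format is an assumption for illustration.

```python
def rle_encode(values: list[int]) -> list[tuple[int, int]]:
    """Run-length encode a sequence into (value, run_length) pairs."""
    runs: list[list[int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    out: list[int] = []
    for v, n in runs:
        out.extend([v] * n)
    return out

row = [7, 7, 7, 7, 7, 0, 0, 3, 3, 3]   # e.g. a pixel row with blocks of one color
print(rle_encode(row))                  # [(7, 5), (0, 2), (3, 3)]
```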

69
Internal Computer Data Format
  • All data stored as binary numbers
  • Interpreted based on
  • Operations computer can perform
  • Data types supported by programming language used
    to create application

70
Five Simple Data Types
  • Boolean: 2-valued variables or constants with
    values of true or false
  • Char: variable or constant that holds an
    alphanumeric character
  • Enumerated
  • User-defined data types with possible values
    listed in the definition
  • Type DayOfWeek = Mon, Tues, Wed, Thurs, Fri, Sat,
    Sun
  • Integer: positive or negative whole numbers
  • Real
  • Numbers with a decimal point
  • Numbers whose magnitude, large or small, exceeds
    the computer's capability to store as an integer
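The five simple types map onto Python roughly as below. Python has no separate char type (a 1-character `str` stands in), and its `int` is unbounded, so the sketch uses `struct` to show the limit of a 32-bit integer, assuming that as a typical machine integer size.

```python
import struct
from enum import Enum

class DayOfWeek(Enum):        # Enumerated: possible values listed in the definition
    MON = 1
    TUES = 2
    WED = 3
    THURS = 4
    FRI = 5
    SAT = 6
    SUN = 7

is_open: bool = True          # Boolean
grade: str = "A"              # Char (a 1-character string in Python)
count: int = -42              # Integer
avogadro: float = 6.02e23     # Real
today = DayOfWeek.WED

def fits_int32(n: int) -> bool:
    """True if n can be stored as a 32-bit signed integer."""
    try:
        struct.pack("<i", n)
        return True
    except struct.error:
        return False

print(fits_int32(2**31 - 1), fits_int32(2**31))   # True False
print(len(struct.pack("<f", avogadro)))           # 4: fits a 32-bit real
```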

71
Thank you!
Reading: lecture slides and notes, Chapter 4