Information Representation: Characters and Images - PowerPoint PPT Presentation


1
Information Representation: Characters and Images
Department of Computer and Information Science, School of Science, IUPUI
CSCI 230
Dale Roberts, Lecturer, IUPUI, droberts_at_cs.iupui.edu
2
Information Representation Review

All information must be rendered into binary in order to be stored on a computer.
Earlier examples of binary information representations include positive integers, negative integers, and floating-point numbers.
Besides numbers, almost all applications must store characters and string information.
Images are pervasive in today's Internet world and must be rendered in binary to be handled by Internet browsers.
The idea that a program is just data is crucial to building general-purpose computers, that is, computers that can easily perform many different tasks. Like any other information, programs must be rendered into binary in order to be stored within a computer.

3
Character Representations

ASCII: PCs and workstations
EBCDIC: IBM mainframes
Unicode: international character sets

4
ASCII

ASCII
Expanded name: American Standard Code for Information Interchange
Area covered: 7-bit coded character set for information interchange
Sponsoring body: American National Standards Institute (ANSI)
Source documents: Information Systems - Coded Character Sets - 7-Bit American National Standard Code for Information Interchange (7-Bit ASCII)
Characteristics/description: Specifies coding of space and a set of 94 characters (letters, digits, and punctuation or mathematical symbols) suitable for the interchange of basic English-language documents. Forms the basis for most computer code sets and is the American national version of ISO/IEC 646.
Usage: Used as the basic US code set for personal and workstation computers.
Further details available from: ANSI, 25 West 43rd Street, New York, NY 10036, USA
Other references: A list of ASCII codes can be obtained from http://www.dkuug.dk/i18n/charmaps/ANSI_X3.4-1968.

5
ASCII Code Set
6
EBCDIC

EBCDIC
Expanded name: Extended Binary Coded Decimal Interchange Code
Area covered: 8-bit coded character set for information interchange between IBM computers
Sponsoring body: Proprietary specification developed by IBM
Characteristics/description: A set of national character sets for interchange of documents between IBM mainframes. Most EBCDIC character sets do not contain all of the characters defined in the ASCII code set, but there is a special International Reference Version (IRV) code set that contains all of the characters in ISO/IEC 646 (and, therefore, ASCII). Several national versions have been updated to support the encoding of the euro sign (in lieu of the currency sign).
Usage: Not much used outside of IBM and similar mainframe environments. When transmitting EBCDIC files between systems, care needs to be taken to ensure that the systems are set up for the relevant national code set.
Further details available from: your local IBM office.
Other references: Details of the most commonly used sets of EBCDIC codes can be obtained from http://www.dkuug.dk/i18n/charmaps, which, however, has not necessarily been updated to cover the new code pages that also support the euro sign.

7
EBCDIC Code Table
8
Unicode

From MSDN: Unicode can represent all of the world's characters in modern computer use, including technical symbols and special characters used in publishing. Because each Unicode code value is 16 bits wide, it is possible to have separate values for up to 65,536 characters. Unicode-enabled functions are often referred to as "wide-character" functions. Note that the implementation of Unicode in 16-bit values is referred to as UTF-16. For compatibility with 8- and 7-bit environments, UTF-8 and UTF-7 are two transformations of 16-bit Unicode values. For more information, see The Unicode Standard, Version 2.0.
The 16-bit description above reflects Unicode's original design. Since Unicode 2.0 the code space extends from U+0000 to U+10FFFF (1,114,112 code points), and UTF-16 represents characters above U+FFFF as surrogate pairs, that is, two 16-bit code units.

9
Unicode: The Wide-Character Set

From VB Online Help:
A wide character is a 2-byte multilingual character code. Any character in use in modern computing worldwide, including technical symbols and special publishing characters, can be represented according to the Unicode specification as a wide character. Developed and maintained by a large consortium that includes Microsoft, the Unicode standard is now widely accepted. Because every wide character is always represented in a fixed size of 16 bits, using wide characters simplifies programming with international character sets.
The 2-byte size is specific to Microsoft's implementation: the C standard leaves the size of wchar_t implementation-defined, and most Unix-like systems use 4 bytes. A 2-byte wchar_t also cannot hold a character above U+FFFF in a single unit.
A wide character is of type wchar_t. A wide-character string is represented as a wchar_t array and is pointed to by a wchar_t pointer. You can represent any ASCII character as a wide character by prefixing the letter L to the character. For example, L'\0' is the terminating wide (16-bit) null character. Similarly, you can represent any ASCII string literal as a wide-character string literal simply by prefixing the letter L to the ASCII literal (L"Hello").
Generally, wide characters take up more space in memory than multibyte characters but are faster to process. In addition, only one locale can be represented at a time in multibyte encoding, whereas all character sets in the world are represented simultaneously by the Unicode representation.

10
Universal Character Set (Unicode)

ISO/IEC 10646
Expanded name: ISO/IEC 10646 Universal Multiple-Octet Coded Character Set (UCS)
Area covered: Multilingual, multi-octet character set covering all major trading languages. The intent is to provide coding for all the characters of all the scripts of the world.
Sponsoring body: ISO/IEC JTC1/SC2 and ISO/IEC JTC1/SC22 WG20
Source documents:
ISO/IEC 10646-1 Information technology -- Universal Multiple-Octet Coded Character Set (UCS)
Part 1: Architecture and Basic Multilingual Plane
Part 2: Supplementary Planes
ISO/IEC DIS 14651 International string ordering and comparison -- Method for comparing character strings and description of the common template tailorable ordering
ISO/IEC PRF TR 14652 Information technology -- Specification method for cultural conventions
ISO/IEC 14755:1997 Information technology -- Input methods to enter characters from the repertoire of ISO/IEC 10646 with a keyboard or other input devices
Unicode 3.2
RFC 2279 UTF-8, a transformation format of ISO 10646
Characteristics/description: Integrates previous internationally and nationally agreed character sets into a single code set, together with additional characters for previously encoded scripts and new scripts, both current and ancient. ISO/IEC 10646 is based on a 4-octet (32-bit) coding scheme known as the "canonical form" (UCS-4), but a 2-octet (16-bit) form (UCS-2) is used for the Basic Multilingual Plane (BMP), where the missing two high-order octets are assumed to be 00 00. The code set is split into 128 "groups" of 256 "planes", each containing 256 "rows" with 256 "cells" for characters. Each character is given a code position using multiple octets: in UCS-4 the third octet (in UCS-2 the first) identifies the row containing the character, and the fourth octet (in UCS-2 the second) identifies its cell number.
Usage: This standard has become the basic coding form for modern 16- and 32-bit computer systems.
Further details available from: ISO and national standards bodies.
Other references: Details of the Unicode standard, the repertoire and coding of which are identical to those of the ISO/IEC 10646 code set, can be obtained from http://www.unicode.org.

11
Unicode Latin Set
12
Additional Unicode Pages
13
Comparing Characters: Collating Sequence

If you look at the ASCII character code table, the ASCII binary number for 'A' is 1000001, which is 65 decimal. The ASCII binary number for 'a' is 1100001, which is 97 decimal. Therefore, 'A' is less than 'a'. A blank is stored as 0100000, or 32 decimal. The blank has the smallest value of all the printable characters (digits, letters, and punctuation).
Rules:
Uppercase < lowercase
Space < any other printable character

14
Comparing Strings

A useful operation is the comparison of two strings. Two strings are related in the same three basic ways as numeric values: one string is either less than, equal to, or greater than the other. String comparison is usually based on the positions of the characters in the character set.
Scanning along both strings and comparing corresponding characters establishes the relationship between two strings. The strings are equal as long as corresponding characters are equal. If two characters are different, the comparison is based on their relative order in the character set: the string whose character has the smaller code is the lesser string.
Ex. "abcd" < "abcz"
If the two strings are of different lengths but identical up to the end of the shorter one, then the shorter string is the lesser of the two.
Ex. "abc" < "abcd"
If the strings contain both uppercase and lowercase letters, uppercase letters come before lowercase letters, and a blank has a lower value than all other characters.
Ex. "AZZZ" < "Aaaah" (at the second character, 'Z' is 90 and 'a' is 97)
Below is an example of a comparison of strings that contain blanks. Scanning along both strings and comparing corresponding characters, you see that the strings are equal for the first two characters. You then compare the blank with the 't'; because the blank has the smaller code, the first string is the lesser.
Ex. "hi there" < "hit a ball"

15
Image Data

Because of the number of different shapes, colors, textures, sizes, and shadings of images, there is no single standard representational format for images, as there is for alphanumeric codes.
There are two ways of representing images:
1. Bitmap or raster images
2. Object or vector images, which are made up of simple geometrical elements. Each element is specified by its geometric parameters, its location in the picture, and other details.
Common Graphics Formats

16
Raster Images

Bitmap or raster images consist of an array of pixel values (pixel stands for 'picture element'). Each pixel represents the sampling of a small area of the picture. In its simplest form, an image is represented as a long string of bits representing the rows of pixels in the image, where each bit is either 1 or 0 depending on whether the corresponding pixel is black or white.
Color images are only slightly more complicated, since each pixel can be represented by a combination of bits indicating the color of that pixel. It is common to record the color of each pixel as three components: red, green, and blue. One byte is typically used to represent the intensity of each color component.

17
Acknowledgements

The list of character standards was obtained from www.diffuse.org.
A portion of the discussion regarding character and string comparisons was obtained from Emad Hayajneh.
A portion of the discussion regarding images was obtained from Dr. Robert Stephens.
