Title: Computer Science 340
1 Computer Science 340
- Chapter 6 Sebesta Notes
- Evans J. Adams Fall 03
2 Evolution of Data Types
- FORTRAN I (1956) - INTEGER, REAL, arrays
- Ada (1983) - User can create a unique type for every category of variables in the problem space and have the system enforce the types
- Def: A descriptor is the collection of the attributes of a variable
3 Design Issues for all data types
- What is the syntax of references to variables?
- What operations are defined and how are they specified?
- What values are allowed?
4 Primitive Data Types (those not defined in terms of other data types)
Integer
- Almost always an exact reflection of the hardware, so the mapping is trivial
- One bit for the sign (+, -), other bits for binary data
    2^7 = 128
    2^15 = 32,768
    2^31 = 2,147,483,648
    2^63 = 9,223,372,036,854,775,808
- There may be as many as eight different integer types in a language
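The powers of two above give the value ranges of common integer widths. A minimal sketch, assuming two's-complement representation (an assumption; the slide only says one bit holds the sign) and a hypothetical helper name `signed_range`:

```python
def signed_range(bits):
    """Return (min, max) of a two's-complement integer that is `bits` wide."""
    # One bit is the sign, so bits - 1 bits remain for the magnitude.
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for bits in (8, 16, 32, 64):
    low, high = signed_range(bits)
    print(f"{bits:2d} bits: {low} .. {high}")
```

The upper limit is always one less than the power of two listed on the slide, because zero takes one of the non-negative bit patterns.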
5 Floating Point
- Model real numbers, but only as approximations
- Languages for scientific use support at least two floating-point types; sometimes more
- Usually exactly like the hardware, but not always; some languages allow accuracy specs in code
    e.g. (Ada)
    type SPEED is digits 7 range 0.0..1000.0;
    type VOLTAGE is delta 0.1 range -12.0..24.0;
- Floating Point Representation (next slide)
6 Example: 2.01152E3 = 2011.52
- Fraction: 201152 (in binary)
- Exponent: 3 (in binary)
- Sign: 0 (positive)
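The slide's example uses a decimal fraction and exponent for readability; hardware formats store both in base 2. A minimal sketch, assuming Python floats are IEEE 754 doubles (true on common platforms) and using a hypothetical helper `float_fields`, that pulls the three fields out of 2011.52:

```python
import struct


def float_fields(x):
    """Split an IEEE 754 double into (sign, biased exponent, fraction bits)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                      # 1 bit
    exponent = (bits >> 52) & 0x7FF        # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)      # 52 stored fraction bits
    return sign, exponent, fraction


sign, exponent, fraction = float_fields(2011.52)
print("sign:", sign)
print("unbiased exponent:", exponent - 1023)
print("fraction bits:", format(fraction, "052b"))

# Rebuild the value from its fields: (1 + fraction / 2^52) * 2^(exponent - 1023)
rebuilt = (1 + fraction / 2 ** 52) * 2.0 ** (exponent - 1023)
print("rebuilt:", rebuilt)
```

The binary exponent differs from the decimal exponent 3 on the slide because the same value is being scaled by powers of 2 rather than powers of 10.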
7 Decimal
- For business applications (money) - IBM 360 / COBOL
- Store a fixed number of decimal digits (coded)
- Advantage: accuracy
- Disadvantages: limited range, wastes memory
Boolean
- Could be implemented as bits, but often as bytes
- Advantage: readability
8 Character String Types
- Values are sequences of characters
Design issues
1. Is it a primitive type or just a special kind of array?
2. Is the length of objects static or dynamic? (Java String type is static, StringBuffer type is dynamic)
Operations
- Assignment
- Comparison (=, >, etc.)
- Catenation
- Substring reference
- Pattern matching
9 Examples
- Pascal
  - Not primitive; assignment and comparison only (of packed arrays)
- Ada, FORTRAN 77, FORTRAN 90 and BASIC
  - Somewhat primitive; single-dimension array of CHARACTER in Ada
  - Assignment, comparison, catenation, substring reference
  - FORTRAN has an intrinsic for pattern matching
10 Example (Ada)
    N := N1 & N2 (catenation)
    N(2..4) (substring reference)
- C and C++
  - Not primitive
  - char *str = "apples"; creates a char pointer which points to apples\0
  - Use char arrays and a library of functions (string.h) that provides operations (strcpy, strcat, strcmp, strlen)
11
- SNOBOL4 (a string manipulation language)
  - Primitive
  - Many operations, including elaborate pattern matching
- Perl
  - Patterns are defined in terms of regular expressions
  - A very powerful facility!
  - e.g., /[A-Za-z][A-Za-z\d]+/ (Perl's syntax for identifier pattern)
- Java
  - String class (not arrays of char)
12 String Length Options
1. Static - FORTRAN 77, Ada, COBOL
   e.g. (FORTRAN 90) CHARACTER (LEN = 15) NAME
2. Limited Dynamic Length - C and C++; actual length is indicated by a null character, but limited by maximum declared length
3. Dynamic - SNOBOL4, Perl
Evaluation (of character string types)
- Aid to writability
- As a primitive type with static length, they are inexpensive to provide--why not have them?
- Dynamic length is nice, but is it worth the expense?
13 Implementation
- Static length - compile-time descriptor (stringName, length, address)
- Limited dynamic length - may need a run-time descriptor for length (but not in C and C++) (stringName, maxLength, currentLength, address)
- Dynamic length - need run-time descriptor; allocation/deallocation is the biggest implementation problem; probably a linked list of characters (easy to grow, shrink)
14 Ordinal Types (user defined)
An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers
1. Enumeration Types - one in which the user enumerates all of the possible values, which are symbolic constants
Design Issue: Should a symbolic constant be allowed to be in more than one type definition?
15 Pascal Declaration
    type colortype = (red, blue, green, yellow);
    var color : colortype;
    ...
    color := blue;
    if color > red then ...
16 Examples
- Pascal - cannot reuse constants; they can be used for array subscripts, for variables, case selectors; NO input or output; can be compared
- Ada - constants can be reused (overloaded literals); disambiguate with context or type_name'(one of them); can be used as in Pascal; CAN be input and output
- C and C++ - like Pascal, except they can be input and output as integers
- Java (before version 5) does not include an enumeration type (can define a class with constants instead)
17 Examples
- Java (before version 5) does not include an enumeration type (can define a class with constants instead)
    class Colors {
        public static final int red = 0;
        public static final int blue = 1;
    }
    int mycolor = Colors.red;
18 Evaluation (of enumeration types)
a. Aid to readability--e.g. no need to code a color as a number
b. Aid to reliability--e.g. compiler can check operations and ranges of values
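Both points can be seen in a short example. This is a sketch in Python rather than Pascal or Ada, mirroring the colortype declaration from the Pascal slide; the class name `Color` is an assumption, and `IntEnum` is chosen so that the values stay ordered and comparable like Pascal enumeration constants.

```python
from enum import IntEnum


class Color(IntEnum):
    """Symbolic constants in declaration order, as in the Pascal colortype."""
    RED = 0
    BLUE = 1
    GREEN = 2
    YELLOW = 3


color = Color.BLUE

# Readability: the code names the color instead of using a bare number.
if color > Color.RED:
    print(color.name, "comes after RED")

# Reliability: a value outside the enumeration is rejected at run time.
try:
    Color(7)
except ValueError as err:
    print("rejected:", err)
```

Python performs the range check at run time, whereas the slide describes checks a compiler can make.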
19
2. Subrange Type - an ordered contiguous subsequence of an ordinal type
- Design Issue: How can they be used?
- Examples
  - Pascal
    - Subrange types behave as their parent types; can be used for variables and array indices
    - e.g. type pos = 0 .. MAXINT;
20 Ada
- Subtypes are not new types, just constrained existing types (so they are compatible)
- Can be used as in Pascal, plus case constants
    e.g. subtype POS_TYPE is INTEGER range 0..INTEGER'LAST;
21 More Ada Examples
    type Days is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);   -- Type Declaration
    subtype Weekdays is Days range Mon..Fri;            -- Subtype Declaration
    Day1 : Days;                                        -- Variable Declaration
    Weekday : Weekdays;                                 -- Variable Declaration
    Day1 := Mon;
    Weekday := Day1;   -- Legal unless Day1 equals Sat or Sun
22 Evaluation of subrange types
- Aid to readability
- Reliability - restricted ranges aid error detection
Implementation of user-defined ordinal types
- Enumeration types are implemented as integers
- Subrange types are the parent types with code inserted (by the compiler) to restrict assignments to subrange variables
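The inserted range check can be imitated by hand. A minimal sketch, assuming a hypothetical `Subrange` class that plays the role of the compiler-generated check on every assignment; here the days Mon..Sun are coded as 0..6 and the weekday subrange as 0..4.

```python
class Subrange:
    """An integer variable restricted to low..high, checked on each assignment."""

    def __init__(self, low, high, value):
        self.low, self.high = low, high
        self.value = value               # initial value goes through the check

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        # This is the code a compiler would insert before the store.
        if not (self.low <= new_value <= self.high):
            raise ValueError(f"{new_value} is outside {self.low}..{self.high}")
        self._value = new_value


weekday = Subrange(0, 4, 0)      # Mon..Fri
weekday.value = 4                # Fri: accepted
try:
    weekday.value = 5            # Sat: rejected by the range check
except ValueError as err:
    print("range error:", err)
```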
23 Arrays
An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element.
24 Array Design Issues
1. What types are legal for subscripts?
2. Are subscripting expressions in element references range checked?
3. When are subscript ranges bound?
4. When does allocation take place?
5. What is the maximum number of subscripts?
6. Can array objects be initialized?
7. Are any kinds of slices allowed?
25 Indexing is a mapping from indices to elements
    map(array_name, index_value_list) -> an element
Syntax
- FORTRAN, PL/I, Ada use parentheses
- Most others use brackets
Subscript Types
- FORTRAN, C - int only
- Pascal - any ordinal type (int, boolean, char, enum)
- Ada - int or enum (includes boolean and char)
- Java - integer types only
26 Specifying Subscripts
- Pascal/Ada - LowerBound..UpperBound
    var B : array [1..2] of integer;
- VB - (LowerBound To UpperBound)
    Dim B (1 To 2, 1 To 3, 1 To 2) As Integer
- C, C++, Java - integers, zero-based
    e.g. (C) int B[2][3][2];   with subscripts 0..1, 0..2, 0..1
27 Four Categories of Arrays (based on subscript binding and binding to storage)
1. Static - range of subscripts and storage bindings are static
   e.g. FORTRAN 77, some arrays in Ada
   Advantage: execution efficiency (no allocation or deallocation)
2. Fixed stack dynamic - range of subscripts is statically bound, but storage is bound at elaboration time
   e.g. Pascal locals and C locals that are not static
   Advantage: space efficiency
28
3. Stack-dynamic - range and storage are dynamic, but fixed from then on for the variable's lifetime
   e.g. Ada declare blocks
       declare
         STUFF : array (1..N) of FLOAT;
       begin
         ...
       end;
   Advantage: flexibility - size need not be known until the array is about to be used
29
4. Heap-dynamic - subscript range and storage bindings are dynamic and not fixed
   e.g. (FORTRAN 90)
       INTEGER, ALLOCATABLE, ARRAY (:,:) :: MAT
         (Declares MAT to be a dynamic 2-dim array)
       ALLOCATE (MAT (10, NUMBER_OF_COLS))
         (Allocates MAT to have 10 rows and NUMBER_OF_COLS columns)
       DEALLOCATE (MAT)
         (Deallocates MAT's storage)
- In APL and Perl, arrays grow and shrink as needed
- In Java, all arrays are objects (heap-dynamic)
30 Number of subscripts (dimensions)
- FORTRAN I allowed up to three
- FORTRAN 77 allows up to seven
- C, C++, and Java allow just one, but elements can be arrays
- Others - no limit
Array Initialization
- Usually just a list of values that are put in the array in the order in which the array elements are stored in memory
Examples
1. FORTRAN - uses the DATA statement, or put the values in / ... / on the declaration
31
2. C and C++ - put the values in braces; can let the compiler count them
   e.g. int stuff [] = {2, 4, 6, 8};
3. Ada - positions for the values can be specified
   e.g. SCORE : array (1..14, 1..2) := (1 => (24, 10), 2 => (10, 7), 3 => (12, 30), others => (0, 0));
4. Pascal and Modula-2 do not allow array initialization
32 Array Operations
1. APL - many, see book (p. 216-217)
2. Ada
   - assignment; RHS can be an aggregate constant or an array name
   - catenation; for all single-dimensioned arrays
   - relational operators (= and /= only)
3. FORTRAN 90 - intrinsics (subprograms) for a wide variety of array operations (e.g., matrix multiplication, vector dot product)
33 Slices
A slice is some substructure of an array; nothing more than a referencing mechanism
Slice Examples (Show Transparency Fig 6.4)
1. FORTRAN 90
       INTEGER MAT (1:4, 1:4)
       MAT(1:4, 1) - the first column
       MAT(2, 1:4) - the second row
2. Ada - single-dimensioned arrays only
       LIST(4..10)
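The same three slices can be written in Python as a rough analogue. This sketch assumes a 4x4 matrix held as a list of rows, with element values chosen (hypothetically) so that each value encodes its 1-based row and column. Note that Python list slices copy the selected elements, whereas the slide describes a slice as purely a referencing mechanism.

```python
# mat[r - 1][c - 1] holds 10 * r + c, so 23 is row 2, column 3.
mat = [[10 * r + c for c in range(1, 5)] for r in range(1, 5)]

first_column = [row[0] for row in mat]   # like FORTRAN 90 MAT(1:4, 1)
second_row = mat[1][:]                   # like FORTRAN 90 MAT(2, 1:4)

lst = list(range(1, 13))                 # 1, 2, ..., 12
sub = lst[3:10]                          # like Ada LIST(4..10): 1-based, inclusive

print(first_column)
print(second_row)
print(sub)
```

The index shift in `lst[3:10]` comes from Python's zero-based, end-exclusive slices versus Ada's inclusive bounds.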
34 Associative Arrays
- An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys
- Design Issues
  1. What is the form of references to elements?
  2. Is the size static or dynamic?
35 Structure and Operations in Perl
- Names begin with %
- Literals are delimited by parentheses
    e.g., %hi_temps = ("Monday" => 77, "Tuesday" => 79, ...);
- Subscripting is done using braces and keys
    e.g., $hi_temps{"Wednesday"} = 83;
- Elements can be removed with delete
    e.g., delete $hi_temps{"Tuesday"};
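For comparison, the same three operations in Python, whose built-in dict is also an associative array with dynamic size. This is a sketch of the Perl example above, not a statement about how Perl implements hashes.

```python
# Literal: keys map to values.
hi_temps = {"Monday": 77, "Tuesday": 79}

# Subscripting with a key adds or updates an element.
hi_temps["Wednesday"] = 83

# Removing an element by key.
del hi_temps["Tuesday"]

for day, temp in hi_temps.items():
    print(day, temp)
```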
36 Implementation of Arrays
- Access function maps subscript expressions to an address in the array
- Row major (by rows) or column major order (by columns)
37 Arrays
- An ABSTRACT DATA TYPE which is built into most programming languages
- Programmer (user) manipulates the abstraction, and not the actual storage implementation of the array, to store and retrieve individual elements.
38 Arrays
- An Address Mapping Algorithm, which maps an array subscript onto its actual main memory storage address, is executed by the memory management subsystem of the programming language.
39 VECTOR (a one-dimensional array)
B = Base Address, L = Size of Each Element, I = Subscript

    Subscript   Memory address   Contents
    1           B                Element 1
    2           B + L            Element 2
    3           B + 2L           Element 3
    4           B + 3L           Element 4
    ...         ...              Rest of Array
40 Address Mapping Function - Vectors
- Given
  - L = size of each array element (in bytes)
  - B = Base storage address of first array element
  - I = subscript (desired element in the array)
- Absolute Storage Address (I) = B + (I - 1) * L
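The vector mapping function translates directly into code. A minimal sketch with a hypothetical function name `vector_address`; the base address 1000 and element size 4 in the usage lines are made-up illustration values.

```python
def vector_address(base, elem_size, i):
    """Absolute address of element i (1-based) of a vector.

    base      -- B, the address of the first element
    elem_size -- L, the size of each element in bytes
    i         -- I, the subscript of the desired element
    """
    return base + (i - 1) * elem_size


# Addresses of the first four elements of a vector of 4-byte elements.
for i in range(1, 5):
    print(i, vector_address(1000, 4, i))
```

Subscript 1 maps to B itself, which is why the formula subtracts 1 before scaling by the element size.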
41 Two Dimensional Arrays
- Two-Dimensional Arrays are typically implemented in Row-Major Order by most language compilers, except FORTRAN, which uses Column-Major Order
- See Row-Major Diagram (Next Slide)
42 Row-Major Mapping
[Figure: Logical Array Structure and its row-major layout in Memory, starting at base address B. Values shown: 1062, .99, 2.84, 10.50, 1048, 1.88, 5.60, 22.38, ..., 80.90]
43 Three Dimensional Arrays
- Typically implemented in Plane-Row-Major Order
- Storage is row-major order within planes
- Three-D Array Declarations
  - Pascal/Ada - var B : array [1..2, 1..3, 1..2] of integer;
  - VB - Dim B (1 To 2, 1 To 3, 1 To 2) As Integer
- See Three-D Array Diagrams (next slides)
44 3D Array with declaration var B : array [1..2, 1..3, 1..2] of integer;
Logical Storage Structure
[Figure: two planes, Plane 1 and Plane 2, each with 3 rows and 2 columns]
45 3D Array with declaration var B : array [1..2, 1..3, 1..2] of integer;
Physical Storage in Plane, Row-Major Order
[Figure: the elements of the 1st plane stored first, followed by the elements of the 2nd plane]
46 Relative Address Calculation Formula (3D Arrays in plane-row-major order)
- Given an array DECLARATION with num_planes, num_rows and num_cols, such as
    X : ARRAY [1..num_planes, 1..num_rows, 1..num_cols] of INTEGER
- and an array REFERENCE such as
    Y := X[plane_subscript, row_subscript, col_subscript]
- Relative Address (plane_subscript, row_subscript, col_subscript) =
      (plane_subscript - 1) * num_rows * num_cols
    + (row_subscript - 1) * num_cols
    + col_subscript
47 Example 1: Find Relative Address (RA) of B[1,1,2] given var B : ARRAY [1..2, 1..3, 1..2] OF INTEGER
    RA(1,1,2) = (1 - 1) * 3 * 2 + (1 - 1) * 2 + 2 = 0 + 0 + 2 = 2
48 Example 2: Find RA of B[2,2,2] given same array definition
    RA(2,2,2) = (2 - 1) * 3 * 2 + (2 - 1) * 2 + 2 = 6 + 2 + 2 = 10
49
- The relative address is then used to compute the ABSOLUTE ADDRESS using the same formula introduced for vectors
- Given
  - L = size of each array element (in bytes)
  - B = Base storage address of first array element
  - RA = Relative address of desired array element
- Absolute Storage Address (RA) = B + (RA - 1) * L
50 Example 3: Given B = 100, L = 1 byte, RA = 10 (RA computed in Example 2 above)
    Absolute Address (10) = 100 + (10 - 1) * 1 = 100 + 9 = 109
51 Example 4: Given B = 1000, L = 4 bytes, RA = 2 (RA computed in Example 1 above)
    Absolute Address (2) = 1000 + (2 - 1) * 4 = 1000 + 4 = 1004
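Examples 1 through 4 can be checked mechanically. A minimal sketch of the two formulas, using hypothetical function names `relative_address` and `absolute_address` and the array B declared as [1..2, 1..3, 1..2] (so num_rows = 3 and num_cols = 2):

```python
def relative_address(num_rows, num_cols, plane, row, col):
    """1-based position of X[plane, row, col] in plane-row-major order."""
    return (plane - 1) * num_rows * num_cols + (row - 1) * num_cols + col


def absolute_address(base, elem_size, ra):
    """Byte address of the element whose relative address is ra."""
    return base + (ra - 1) * elem_size


ra1 = relative_address(3, 2, 1, 1, 2)        # Example 1: B[1,1,2]
ra2 = relative_address(3, 2, 2, 2, 2)        # Example 2: B[2,2,2]
addr3 = absolute_address(100, 1, ra2)        # Example 3: B = 100, L = 1
addr4 = absolute_address(1000, 4, ra1)       # Example 4: B = 1000, L = 4
print(ra1, ra2, addr3, addr4)
```

Walking all 12 subscript triples in plane, row, column order yields relative addresses 1 through 12 with no gaps, which is a quick sanity check on the formula.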
52 Generalized Relative Address Formula (for an N-Dimensional Array)
- Given array A defined as
    VAR A : ARRAY [1..U1, 1..U2, ..., 1..Un] OF INTEGER
- and a reference to an element of A, such as
    Y := A[S1, S2, ..., Sn]
- Note
  - The Ui, 1 <= i <= n, represent the upper bounds for each dimension of the n-dimensional array
  - The Si, 1 <= i <= n, represent the subscripts for a specific reference to an element of the array A
53 Generalized Relative Address Formula
$$RA(S_1, S_2, \ldots, S_n) = \left( \sum_{1 \le i \le n} (S_i - 1) \, P_i \right) + 1$$
$$\text{where } P_i = \prod_{i < r \le n} U_r$$
- i.e., P_i is the product of the upper bounds of all subsequent dimensions to the right of i (so P_n = 1, the empty product)
54 Example
- Generate formulas for two- and three-dimensional arrays from the General Formula
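One way to approach this exercise is to code the general formula and compare it against the special cases. A minimal sketch, assuming all lower bounds are 1 as in the slides; the function name `general_relative_address` is hypothetical.

```python
from math import prod


def general_relative_address(upper_bounds, subscripts):
    """RA(S1..Sn) = sum((Si - 1) * Pi) + 1, Pi = product of Ur for r > i."""
    if len(upper_bounds) != len(subscripts):
        raise ValueError("need one subscript per dimension")
    ra = 1
    for i, s in enumerate(subscripts):
        p_i = prod(upper_bounds[i + 1:])   # empty product is 1 for the last dimension
        ra += (s - 1) * p_i
    return ra


# n = 2 gives (S1 - 1) * U2 + (S2 - 1) + 1 = (S1 - 1) * U2 + S2
print(general_relative_address([3, 4], [2, 3]))

# n = 3 gives (S1 - 1) * U2 * U3 + (S2 - 1) * U3 + S3, the plane-row-major formula
print(general_relative_address([2, 3, 2], [2, 2, 2]))
```

Note that U1 never appears in any Pi: the first upper bound affects how much storage is needed, not where an element sits.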
55 Records
A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names
Design Issues
1. What is the form of references?
2. What unit operations are defined?
56 Records
- Hierarchical, Heterogeneous Data Structure
- No Regular Pattern like Arrays
- Physical Storage is via offsets from the Base Address of the Record
- Offsets are computed according to the size (in bytes) of the Record Components
- Examples on Next Slides
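The offset computation is a running sum of component sizes. A minimal sketch, assuming fields are packed one after another with no alignment padding (real compilers may insert padding); the function name `field_offsets` and the byte sizes below are hypothetical illustration values, not taken from the slides.

```python
def field_offsets(fields):
    """Map each field name to its offset from the record's base address.

    fields -- list of (name, size_in_bytes) in declaration order
    Returns (offsets dict, total record size).
    """
    offsets = {}
    offset = 0
    for name, size in fields:
        offsets[name] = offset
        offset += size          # next field starts right after this one
    return offsets, offset


# Hypothetical sizes for the first three fields of a name record.
fields = [("FIRST", 10), ("MIDINT", 1), ("LAST", 15)]
offsets, total = field_offsets(fields)
print(offsets, total)

# Address of a field = base address of the record + the field's offset.
base = 5000
print(base + offsets["LAST"])
```

Unlike the array mapping, there is no formula in terms of a subscript: each field's offset is fixed when the record type is declared.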
57 PL/I Record Declaration
58 Hierarchical Representation of PL/I Record Structure
    EMP_REC
      NAME_ADDR
        NAME
          FIRST
          MIDINT
          LAST
        ADDRESS
          STRADDR
          CITY
          STATE
          ZIP
      POSITION
        DEPTNO
        JOBTITLE
      SALARY
      NUM_DEP
      HEALTH_PLAN
      DATE_HIRED
59 Physical Storage Representation of PL/I Structure
[Figure: EMP_REC stored contiguously starting at base address B, in the order FIRST, MIDINT, LAST (NAME), STRADDR, CITY, STATE, ZIP (ADDRESS), DEPTNO, JOBTITLE (POSITION), SALARY, NUM_DEP, HEALTH_PLAN, DATE_HIRED]
60 Record Definition Syntax
- COBOL uses level numbers to show nested records; others use recursive definitions
Record Field References
1. COBOL
    field_name OF record_name_1 OF ... OF record_name_n
2. Others (dot notation)
    record_name_1.record_name_2. ... .record_name_n.field_name
61
- Fully qualified references must include all record names
- Elliptical references allow leaving out record names as long as the reference is unambiguous
- VB, Pascal and Modula-2 provide a with clause to abbreviate references
Record Operations
1. Assignment
   - Pascal, Ada, and C allow it if the types are identical
   - In Ada, the RHS can be an aggregate constant
62
2. Initialization
   - Allowed in Ada, using an aggregate constant
3. Comparison
   - In Ada, = and /=; one operand can be an aggregate constant
4. MOVE CORRESPONDING
   - In COBOL - it moves all fields in the source record to fields with the same names in the destination record
63 Sets
A set is a type whose variables can store unordered collections of distinct values from some ordinal type
Design Issue: What is the maximum number of elements in any set base type?
Examples
1. Pascal
   - No maximum size in the language definition (not portable, poor writability if max is too small)
   - Operations: union (+), intersection (*), difference (-), =, <>, superset (>=), subset (<=), in
64 Set Examples
    type colors = (red, blue, green, yellow, black);
    type colorset = set of colors;
    var set1, set2, set3, set4 : colorset;
        x : colors;
    set1 := [red, blue, green];
    set2 := [black, blue];
    set3 := [green]; set4 := [];
    x := blue;
- set3 set1 set2 set4 set3 set1
    if x in set1 ...
    if ch in vowels ... { from example in book }
65
2. Modula-2 and Modula-3 - Additional operations INCL, EXCL, / (symmetric set difference (elements in one but not both operands))
3. Ada - does not include sets, but defines in as set membership operator for all enumeration types
4. Java includes a class for set operations
66 Evaluation
- If a language does not have sets, they must be simulated, either with enumerated types or with arrays
- Arrays are more flexible than sets, but have much slower operations
Implementation
- Usually stored as bit strings and use logical operations for the set operations
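The bit-string implementation can be sketched with ordinary integers, one bit per value of the base type. The color names follow the Pascal set example; the helper names `make_set` and `members` and the bit assignment (red is bit 0, and so on) are assumptions for illustration.

```python
COLORS = ["red", "blue", "green", "yellow", "black"]
BIT = {name: 1 << i for i, name in enumerate(COLORS)}   # one bit per color


def make_set(*names):
    """Build a bit string with the bit of each named color turned on."""
    bits = 0
    for name in names:
        bits |= BIT[name]
    return bits


def members(bits):
    """List the colors whose bits are on, in declaration order."""
    return [name for name in COLORS if bits & BIT[name]]


set1 = make_set("red", "blue", "green")
set2 = make_set("black", "blue")

union = set1 | set2                       # logical OR
intersection = set1 & set2                # logical AND
difference = set1 & ~set2                 # AND with the complement
is_member = bool(set1 & BIT["blue"])      # membership test: "blue in set1"

print(members(union), members(intersection), members(difference), is_member)
```

Each set operation is a single logical operation on a machine word, which is why sets built this way are much faster than sets simulated with arrays.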
67 Pointers
A pointer type is a type in which the range of values consists of memory addresses and a special value, nil (or null)
Uses
1. Addressing flexibility
2. Dynamic storage management
68 Design Issues
1. What is the scope and lifetime of pointer variables?
2. What is the lifetime of heap-dynamic variables?
3. Are pointers restricted to pointing at a particular type?
4. Are pointers used for dynamic storage management, indirect addressing, or both?
5. Should a language support pointer types, reference types, or both?
Fundamental Pointer Operations
1. Assignment of an address to a pointer
2. References (explicit versus implicit dereferencing)
69
    program sample;
    var x : integer;            { Globals defined here }
    procedure sub1;
    var x : integer;
        intPtr : ^integer;      { define pointer variable }
    begin { sub1 }
      x := 10;
      new(intPtr);              { allocate storage in heap for an integer and set intPtr to its address }
      intPtr^ := 111;           { set value of integer in heap }
      dispose(intPtr)           { free storage in heap and set intPtr to null }
    end; { sub1 }
    begin { big }
      x := 15;
      sub1
    end. { sample }
70 Problems with pointers
1. Dangling pointers (dangerous)
   - A pointer points to a heap-dynamic variable that has been deallocated
   - Creating one
     a. Allocate a heap-dynamic variable and set a pointer to point at it
     b. Set a second pointer to the value of the first pointer
     c. Deallocate the heap-dynamic variable, using the first pointer
71
2. Lost Heap-Dynamic Variables (wasteful)
   - A heap-dynamic variable that is no longer referenced by any program pointer
   - Creating one
     a. Pointer p1 is set to point to a newly created heap-dynamic variable
     b. p1 is later set to point to another newly created heap-dynamic variable
   - The process of losing heap-dynamic variables is called memory leakage
72 Examples
1. Pascal - used for dynamic storage management only
   - Explicit dereferencing
   - Dangling pointers are possible (dispose)
   - Dangling objects are also possible
2. Ada - a little better than Pascal and Modula-2
   - Some dangling pointers are disallowed because dynamic objects can be automatically deallocated at the end of the pointer's scope
   - All pointers are initialized to null
   - Similar dangling object problem (but rarely happens)
73
3. C and C++
   - Used for dynamic storage management and addressing
   - Explicit dereferencing and address-of operator
   - Can do address arithmetic in restricted forms
   - Domain type need not be fixed (void *)
     e.g. float stuff[100];
          float *p;
          p = stuff;   /* &stuff[0] by default */
          *(p+5) is equivalent to stuff[5] and p[5]
          *(p+i) is equivalent to stuff[i] and p[i]
   - void * - can point to any type and can be type checked
74
5. C++ Reference Types
   - Constant pointers that are implicitly dereferenced
   - Used for parameters
   - Advantages of both pass-by-reference and pass-by-value
6. Java - Only references
   - No pointer arithmetic
   - Can only point at objects (which are all on the heap)
   - No explicit deallocator (garbage collection is used)
     - Means there can be no dangling references
   - Dereferencing is always implicit
75 Evaluation of pointers
- Dangling pointers and dangling objects are problems, as is heap management
- Pointers are like goto's--they widen the range of cells that can be accessed by a variable
- Pointers are necessary--so we can't design a language without them??
- Hoare: "Their introduction into 3GLs has been a step backward from which we may never recover"
- Java does not have a pointer data type, but relies heavily on the addresses of objects