Title: Data Types
Lecture 8
Chapter 6 Topics
- Introduction
- Primitive Data Types
- Character String Types
- User-Defined Ordinal Types
- Array Types
- Associative Arrays
- Record Types
- Union Types
- Pointer and Reference Types
Introduction
- A data type defines a collection of data objects and a set of predefined operations on those objects
- PL/I extended accuracy specification from decimal values to integer and floating-point types, and included many other data types
- ALGOL 68 provided a few basic types and operators from which the programmer could build tailored user-defined types
- Users should be able to create a unique type for each unique class of variables
  - Improves readability and modifiability
  - Later extended to the concept of abstract data types (ADTs)
Introduction
- Abstract data type: a specification of data and operations whose use is separated from its implementation
  - Structured types are expressed using type operators (e.g., *, (), and [] in C)
- A descriptor is the collection of the attributes of a variable
- An object represents an instance of a user-defined abstract data type
  - In an object-oriented language, an object is an instance of a class
- One design issue for all data types: what operations are defined, and how are they specified?
Primitive Data Types
- Almost all programming languages provide a set of primitive data types
- Primitive data types: those not defined in terms of other data types
- Some primitive data types are merely reflections of the hardware
- Others require only a little non-hardware support
Primitive Data Types: Integer
- Almost always an exact reflection of the hardware, so the mapping is trivial
- There may be as many as eight different integer types in a language
  - Java's signed integer sizes: byte, short, int, long
  - Ada: SHORT_INTEGER, INTEGER, LONG_INTEGER
  - C: unsigned integer types
- Represented as strings of bits
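As a small illustration of the Java sizes listed above, the wrapper classes expose each type's width in bits and its range. The class name IntegerSizes and the widths() helper are made up for this sketch.

```java
class IntegerSizes {
    // Width in bits of byte, short, int and long, read from the wrapper-class constants.
    static int[] widths() {
        return new int[] { Byte.SIZE, Short.SIZE, Integer.SIZE, Long.SIZE };
    }

    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.SIZE + " bits, " + Byte.MIN_VALUE + " .. " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.SIZE + " bits, " + Short.MIN_VALUE + " .. " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.SIZE + " bits, " + Integer.MIN_VALUE + " .. " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.SIZE + " bits, " + Long.MIN_VALUE + " .. " + Long.MAX_VALUE);
    }
}
```

All four types are signed, which is why each range is asymmetric by one around zero.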
Primitive Data Types: Floating Point
- Model real numbers, but only as approximations
  - One of the reasons is the binary representation: many decimal fractions, such as 0.1, have no exact binary form
- Languages for scientific use support at least two floating-point types: float (single precision) and double (double precision)
- IEEE Floating-Point Standard 754
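The approximation mentioned above is easy to observe. The following is a minimal Java sketch; the class name, the helper names, and the tolerance value are assumptions chosen for illustration.

```java
class FloatApproximation {
    // 0.1 and 0.2 have no exact binary representation, so their sum is not exactly 0.3.
    static boolean sumEqualsThreeTenths() {
        return 0.1 + 0.2 == 0.3;
    }

    // Comparing within a tolerance is a common workaround; the tolerance is the caller's choice.
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);                          // prints 0.30000000000000004
        System.out.println(sumEqualsThreeTenths());             // prints false
        System.out.println(nearlyEqual(0.1 + 0.2, 0.3, 1e-9));  // prints true
    }
}
```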
Primitive Data Types: Decimal
- For business applications (money)
  - Essential to COBOL
  - C# offers a decimal data type
- Store a fixed number of decimal digits
- Advantage: accuracy
- Disadvantages: limited range, wastes memory
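The accuracy advantage can be sketched in Java. Java has no primitive decimal type, so this example uses the library class java.math.BigDecimal as a stand-in; it is not the fixed-digit hardware-style decimal the slide describes. The class and method names are hypothetical.

```java
import java.math.BigDecimal;

class DecimalMoney {
    // Adds ten 10-cent amounts using binary floating point.
    static double sumDouble() {
        double total = 0.0;
        for (int i = 0; i < 10; i++) {
            total += 0.10;
        }
        return total;
    }

    // Adds the same amounts using decimal arithmetic, which represents 0.10 exactly.
    static BigDecimal sumDecimal() {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal dime = new BigDecimal("0.10");
        for (int i = 0; i < 10; i++) {
            total = total.add(dime);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumDouble());   // prints 0.9999999999999999
        System.out.println(sumDecimal());  // prints 1.00
    }
}
```

Constructing the BigDecimal from the string "0.10" matters: building it from the double literal 0.10 would carry the binary rounding error into the decimal value.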
Primitive Data Types: Boolean
- Simplest of all
- Range of values: two elements, one for true and one for false
- Could be implemented as bits, but often as bytes
- Advantage: readability
- First introduced in ALGOL 60
- C uses numeric expressions as a replacement
  - Nonzero values are true; zero is false
- C++ and Java provide a primitive Boolean type (bool in C++, boolean in Java)
Primitive Data Types: Character
- Stored as numeric codings
- Most commonly used coding: ASCII
  - Uses 0 to 127 to code 128 different characters
- An alternative, 16-bit coding: Unicode
  - Includes characters from most natural languages
  - First 128 characters are identical to ASCII
  - Originally used in Java
  - C# and JavaScript also support Unicode
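Because characters are stored as numeric codes, Java lets a char be widened to an int to reveal its code. A minimal sketch, with a hypothetical class and helper name:

```java
class CharCodes {
    // Widening conversion from char to int exposes the numeric code of the character.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));        // prints 65, the same code as in ASCII
        System.out.println((char) 97);          // prints a
        System.out.println(codeOf('\u00e9'));   // prints 233, a code outside the ASCII range
        System.out.println(Character.SIZE);     // prints 16, the width of a Java char in bits
    }
}
```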
Character String Types
- Values are sequences of characters
- Design issues
  - Is it a primitive type or just a special kind of array?
  - Should the length of strings be static or dynamic?
Character String Type in Certain Languages
- C and C++
  - Not primitive
  - Use char arrays and a library of functions that provide operations
  - Character strings are terminated with a special character, null, which is represented with zero
  - The library operations simply carry on until the null character appears in the string being operated on. Library functions that produce strings often supply the null character.
  - Common library functions: strcpy, strcat, strcmp, strlen
Character String Type in Certain Languages
- C and C++
  - The string manipulation functions of the C standard library are unsafe, as they don't guard against overflowing the destination. E.g.
    - strcpy(dest, src)
    - If the length of dest is 20 and the length of src is 50, strcpy will write over the 30 bytes that follow dest
  - C++ programmers should use the string class from the standard library
Character String Types: Operations
- Java
  - Strings are supported by the String class, which the language treats almost like a primitive type: String a = "abcd"; gives a the same value as String a = new String("abcd");
  - String values are constant strings (each time the value is changed, a new String object is created)
  - StringBuffer class: the value of the string is changeable
- C#
  - Similar to Java
- C++
  - C-style strings, plus strings in its standard class library, which is similar to that of Java
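The difference between constant String values and changeable StringBuffer values can be checked directly. This is a minimal sketch; the class and method names are made up for illustration.

```java
class StringMutability {
    // String values are constant: concat returns a new object and leaves the original untouched.
    static boolean concatCreatesNewObject() {
        String a = "abcd";
        String b = a.concat("e");
        return b != a && a.equals("abcd") && b.equals("abcde");
    }

    // StringBuffer values are changeable: append modifies the buffer in place and returns the same object.
    static boolean appendKeepsSameObject() {
        StringBuffer buf = new StringBuffer("abcd");
        StringBuffer same = buf.append("e");
        return same == buf && buf.toString().equals("abcde");
    }

    public static void main(String[] args) {
        System.out.println(concatCreatesNewObject());  // prints true
        System.out.println(appendKeepsSameObject());   // prints true
    }
}
```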
Character String Types: Operations
- Pattern matching
  - Fundamental character string operation
  - Patterns are often called regular expressions
  - E.g., /[A-Za-z][A-Za-z\d]+/ matches strings that begin with a letter, followed by one or more letters or digits
- Perl, JavaScript and PHP have built-in pattern matching operations
- Java, C++ and C# have pattern matching capabilities in their class libraries, e.g. in Java
  - Pattern p = Pattern.compile("a*b");
  - Matcher m = p.matcher("aaaaab");
  - boolean b = m.matches();
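The description above (a letter followed by one or more letters or digits) can be tried with Java's java.util.regex library. This sketch assumes the pattern is written [A-Za-z][A-Za-z\d]+; the class name IdentifierPattern and the helper isName are hypothetical.

```java
import java.util.regex.Pattern;

class IdentifierPattern {
    // A letter followed by one or more letters or digits; the backslash is doubled inside a Java string.
    private static final Pattern NAME = Pattern.compile("[A-Za-z][A-Za-z\\d]+");

    // matches() succeeds only if the entire input matches the pattern.
    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isName("count2"));  // prints true
        System.out.println(isName("2count"));  // prints false: starts with a digit
        System.out.println(isName("x"));       // prints false: nothing follows the first letter
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the expression on every call.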
Character String Length Options
- Static length: COBOL, Java's String class
  - Length is set when the string is created and stays fixed
- Limited dynamic length: C and C++
  - Length varies up to a declared maximum
- Dynamic length (no maximum): Perl, JavaScript
  - No length limit, varying length. Requires the overhead of dynamic storage allocation and deallocation but provides maximum flexibility
- Ada supports all three string length options
  - String: static
  - Bounded_String: limited dynamic
  - Unbounded_String: dynamic
Character String Type Evaluation
- Aid to writability
- As a primitive type with static length, they are inexpensive to provide
- Simple pattern matching and catenation are essential and should be included
- Dynamic length is most flexible, but the overhead of implementation must be weighed. Often included only in languages that are interpreted.
Character String Implementation
- Static length: compile-time descriptor
- Limited dynamic length: may need a run-time descriptor for length (but not in C and C++)
[Figure: string descriptors; fields shown include the type of string and the address of the first character]
Character String Implementation
- Dynamic length: needs a run-time descriptor; allocation/deallocation is the biggest implementation problem. Two approaches:
  - Store the string in a linked list, so that when a string grows, the newly required cells can come from anywhere in the heap. Drawbacks: extra storage occupied by the links in the list representation, and more complex string operations. The allocation and deallocation process, however, is simple.
  - Store complete strings in adjacent storage cells. When adjacent storage is not available as the string grows, a new area of memory is found to hold the complete new string. String operations are faster and less storage is required, but allocation and deallocation are slower. This approach is typically used.
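The second approach (adjacent storage cells, moving the whole string when it outgrows its area) can be sketched as follows. This is an illustration only, not how any particular language runtime does it: the class ContiguousString, its fields, and the capacity-doubling policy are assumptions.

```java
class ContiguousString {
    private char[] cells;       // adjacent storage cells holding the characters
    private int length;         // current length, as a run-time descriptor would record it
    private int reallocations;  // how many times the string had to move to a new area

    ContiguousString(int initialCapacity) {
        cells = new char[Math.max(1, initialCapacity)];
    }

    void append(String s) {
        int needed = length + s.length();
        if (needed > cells.length) {
            // No adjacent room left: find a larger area and copy the complete string into it.
            char[] bigger = new char[Math.max(needed, cells.length * 2)];
            System.arraycopy(cells, 0, bigger, 0, length);
            cells = bigger;
            reallocations++;
        }
        s.getChars(0, s.length(), cells, length);  // copy the new characters after the old ones
        length = needed;
    }

    int length() { return length; }

    int reallocations() { return reallocations; }

    @Override
    public String toString() { return new String(cells, 0, length); }

    public static void main(String[] args) {
        ContiguousString s = new ContiguousString(4);
        s.append("abc");     // fits in the original area
        s.append("defgh");   // forces a move to a new area
        System.out.println(s + " moved " + s.reallocations() + " time(s)");
    }
}
```

The copy in append is the slower allocation step the slide mentions; in exchange, every character access is a plain array index.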
User-Defined Ordinal Types
- An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers
- Examples of primitive ordinal types in Java
  - integer
  - char
  - boolean
- Generally two kinds of user-defined ordinal types
  - enumeration
  - subrange
Enumeration Types
- All possible values, which are named constants, are provided in the definition
- C example
  - enum days {mon, tue, wed, thu, fri, sat, sun};
- The enumeration constants are typically implicitly assigned the integer values 0, 1, ..., but can be explicitly assigned any integer literal in the type's definition
- Design issues
  - Is an enumeration constant allowed to appear in more than one type definition, and if so, how is the type of an occurrence of that constant checked?
  - Are enumeration values coerced to integer?
  - Is any other type coerced to an enumeration type?
Design
- In languages that do not have enumeration types, programmers usually simulate them with integer values. E.g. in Fortran 77, use 0 to represent red and 1 to represent blue
  - INTEGER RED, BLUE
  - DATA RED, BLUE/0,1/
- Problem: there is no type checking when they are used. It would be legal to add the two together.
- They can be assigned any integer value, thus destroying the relationship with the colors.
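The same problem appears in any language when integers stand in for colors. The sketch below restates the Fortran 77 idea in Java rather than Fortran; the class and method names are hypothetical.

```java
class IntColors {
    // Integer constants standing in for colors, mirroring the Fortran 77 example.
    static final int RED = 0;
    static final int BLUE = 1;

    // Nothing stops arithmetic on the "colors": the compiler sees only integers.
    static int addColors() {
        return RED + BLUE;
    }

    // Nothing stops storing an integer that corresponds to no color at all.
    static int bogusColor() {
        int color = 42;
        return color;
    }

    public static void main(String[] args) {
        System.out.println(addColors());   // prints 1, which happens to equal BLUE
        System.out.println(bogusColor());  // prints 42, which is not a color
    }
}
```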
Design
- Pascal and C/C++ do not allow literal constants to be used in more than one enumeration type definition
- Ada allows overloaded literals
  - Overloading is resolved from the context in which the literal appears
Design
- In C++, we could have
  - enum colors {red, blue, green, yellow, black};
  - colors myColor = blue, yourColor = red;
- The enumeration values are coerced to int when they are put in integer context. E.g. myColor++ would assign green to myColor (this compiles in C; standard C++ requires an overloaded ++ for the enumeration type).
Design
- In Java, all enumeration types are implicitly subclasses of the predefined class Enum. They can have instance data fields, constructors and methods
- Java example (this one uses the older java.util.Enumeration interface to step through a collection, not an enum type)
    Enumeration days;
    Vector dayNames = new Vector();
    dayNames.add("Monday");
    dayNames.add("Friday");
    days = dayNames.elements();
    while (days.hasMoreElements())
        System.out.println(days.nextElement());
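For comparison, here is a minimal sketch of a Java enum type with an instance field, a constructor, and a method, as described above. The enum Day, its weekday flag, and the wrapper class EnumDemo are invented for illustration.

```java
class EnumDemo {
    // Each constant calls the constructor with its own argument.
    enum Day {
        MON(true), TUE(true), WED(true), THU(true), FRI(true), SAT(false), SUN(false);

        private final boolean weekday;   // instance data field

        Day(boolean weekday) {           // enum constructors are implicitly private
            this.weekday = weekday;
        }

        boolean isWeekday() {            // ordinary instance method
            return weekday;
        }
    }

    public static void main(String[] args) {
        Day d = Day.WED;
        System.out.println(d + " " + d.ordinal() + " " + d.isWeekday());  // prints WED 2 true
        System.out.println(d instanceof Enum<?>);                         // prints true
        // int n = d;   // would not compile: enum values are not coerced to int
    }
}
```

The position of a constant is still available through ordinal(), but only by an explicit call, never by implicit coercion.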
Design
- C# enumeration types are like those of C++ except that they are never coerced to integer.
  - Operations are restricted to those that make sense.
  - The range of values is restricted to that of the particular enumeration type.
Evaluation of Enumerated Type
- Aid to readability, e.g., no need to code a color as a number
- Aid to reliability, e.g., the compiler can check
  - Operations (don't allow colors to be added to integers)
  - No enumeration variable can be assigned a value outside its defined range, e.g. if the colors type has 10 enumeration constants and uses 0..9 as its internal values, no number greater than 9 can be assigned to a colors type variable.
- Ada, C#, and Java 5.0 provide better support for enumeration than C++ because enumeration type variables in these languages are not coerced into integer types
Evaluation of Enumerated Type
- C treats enumeration variables like integer variables, so it does not provide the advantage of reliability.
- C++ is better. Numeric values can be assigned to enumeration type variables only if they are cast to the type of the assigned variable. Numeric values are checked to determine if they are in the range of the internal values. However, if the user uses a wide range of explicitly assigned values, this checking is not effective. E.g.
  - enum colors {red = 1, blue = 100, green = 100000}
  - A value assigned to a variable of colors type will only be checked to determine whether it is in the range 1..100000.
- Java 5.0, C# and Ada are better, as variables are never coerced to integer types
Subrange Types
- An ordered contiguous subsequence of an ordinal type
- Not a new type, but a restricted existing type
  - Example: 12..18 is a subrange of integer type
- Ada's design
  - type Days is (mon, tue, wed, thu, fri, sat, sun);
  - subtype Weekdays is Days range mon..fri;
  - subtype Index is Integer range 1..100;
  - Day1: Days;
  - Day2: Weekdays;
  - Day2 := Day1; -- legal if Day1 is not sat or sun
- Compatible with its parent type.
Subrange Evaluation
- Aid to readability
  - Makes it clear to readers that variables of a subrange can store only a certain range of values
- Reliability
  - Assigning a value to a subrange variable that is outside the specified range is detected as an error
Implementation of User-Defined Ordinal Types
- Enumeration types are implemented as integers
- Subrange types are implemented like the parent types, with code inserted (by the compiler) to restrict assignments to subrange variables
  - The inserted checks increase code size and execution time
  - The range information may help in compiler optimization
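Java has no subrange types, so the check that a compiler would insert can only be imitated by hand. The sketch below models a 1..100 integer subrange; the class IndexSubrange, its bounds, and the accepts helper are assumptions for illustration, and the explicit test in assign stands in for the compiler-generated code.

```java
class IndexSubrange {
    static final int LOW = 1;     // lower bound of the modeled subrange
    static final int HIGH = 100;  // upper bound of the modeled subrange

    private int value;            // stored exactly like the parent type, a plain int

    IndexSubrange(int initial) {
        assign(initial);
    }

    // Stands in for the range check a compiler would insert before each assignment.
    void assign(int candidate) {
        if (candidate < LOW || candidate > HIGH) {
            throw new IllegalArgumentException(candidate + " is outside " + LOW + ".." + HIGH);
        }
        value = candidate;
    }

    int get() {
        return value;
    }

    // Reports whether an assignment of the candidate would pass the check.
    static boolean accepts(int candidate) {
        try {
            new IndexSubrange(candidate);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(42));   // prints true
        System.out.println(accepts(101));  // prints false
    }
}
```

Every assignment pays for one comparison pair, which is the code-size and execution-time cost noted above.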