Title: Database Design
1Chapter 12
2Learning Objectives
- Describe the differences and similarities between
relational and object-oriented database
management systems - Design a relational database schema based on an
entity-relationship diagram - High Level
- Detailed
- Design an object database schema based on a class
diagram
3Learning Objectives (continued)
- Design a relational schema to implement a hybrid
object-relational database - Describe the different architectural models for
distributed databases
4Overview
- This chapter describes design of relational and
OO data models - Developers transform conceptual data models into
detailed database models - Entity-relationship diagrams (ERDs) for
traditional analysis - Class diagrams for object-oriented (OO) analysis
- Detailed database models are implemented with
database management system (DBMS)
5Databases and Database Management Systems
- Databases (DB) integrated collections of stored
data that are centrally managed and controlled - Database management system (DBMS) system
software that manages and controls access to
database - Databases described by a schema description of
structure, content, and access controls - XML Data and Schema are integrated
6XML Example
7Database Models
- Impacted by technology changes since 1960s
- Model types
- Hierarchical
- Network
- Relational
- Object-oriented
- Most current systems use relational and (rarely)
object-oriented data models
8Components of a DB and DBMS
Virtually all actions in a relational DB managed
through Structured Query Language
9Important DBMS Capabilities
- Simultaneous access by multiple users and
applications - Access to data without application programs (via
a query language) - Organizational data management with uniform
access and content controls
10Objectives of SQL
- Ideally, database language should allow user to
- create the database and relation structures
- perform insertion, modification, deletion of
data - perform simple and complex queries.
- Should perform these tasks with minimal user
effort and command structure/syntax must be easy
to learn. - Should be portable.
- The relational model provides Structured Query
Language (SQL) to meet the above objectives
11SQL Benefits
- SQL is relatively easy to learn
- it is non-procedural - you specify what
information you require, rather than how to get
it - it is essentially free-format.
- Relatively easy to learn and use but also very
powerful - Consists of standard English words
- CREATE TABLE Staff(staffNo VARCHAR(5),
- lName VARCHAR(15),
- salary DECIMAL(7,2))
- INSERT INTO Staff VALUES ('SG16', 'Brown',
8300)
12SQL
- SQL has 3 major components
- A Data Definition Language (DDL) for defining
database structure. - A Data Manipulation Language (DML) for retrieving
and updating data. - Commands to control access to data (security)
13Relational Databases
- Relational database management system (RDBMS)
organizes data into tables or relations - Tables are two dimensional data structures
- Tuples rows or records represents a specific
occurrence of an entity - Fields columns or attributes
- Tables have primary key field(s) that can be used
to identify unique records - Keys relate tables to each other foreign keys
14Partial Display of Relational Database Table
15Database design
- High Level logical
- Detailed physical
16High Level Design
- Create table for each entity type
- Choose or invent primary key for each table
- Add foreign keys to represent one-to-many
relationships - Create new tables to represent many-to-many
relationships
17High Level Design
- Define referential integrity constraints
- Evaluate schema quality and make necessary
improvements. normalization - Choose appropriate data types and value
restrictions (if necessary) for each field - Valid values
- Valid ranges
18RMO Entity-Relationship Diagram
19Relationship Between Data in Two Tables
20Representing Relationships
- Relational databases use foreign keys to
represent relationships - One-to-many relationship
- Add primary key field of one entity type as
foreign key in table that represents many
entity type - Many-to-many relationship
- Use the primary key field(s) of both entity types
- Use (or create) an associative entity table to
represent relationship
21Entity Tables with Primary Keys (Figure 12-7)
22Represent One-to-Many Relationships by Adding
Foreign Keys (in italics) (Figure 12-8)
23Enforcing Referential Integrity
- Consistent relational database state
- Every foreign key value also exists as a primary
key value - DBMS enforces referential integrity automatically
after schema designer identifies primary and
foreign keys
24DBMS Referential Integrity Enforcement
- When rows containing foreign keys are created
- DBMS ensures that value also exists as a primary
key in a related table - When row is deleted
- DBMS ensures no foreign keys in related tables
have same value as primary key of deleted row - When primary key value is changed
- DBMS ensures no foreign key values in related
tables contain the same value
25RI in DDL
RI is defined when defining the foreign keys
associated with a table
26Evaluating Schema Quality
- High-quality data model has
- Uniqueness of table rows and primary keys
- Ease of implementing future data model changes
(flexibility and maintainability) - Lack of redundant data (database normalization)
- Database design is not objective or
quantitatively measured it is experience and
judgment based
27Normalization
- A good data model
- Is simple all attributes in an entity describe
only that entity. - Is non-redundant. All attributes other than
foreign keys describe one entity - Is flexible and adaptive.
- Normalization is a data analysis technique that
organizes data attributes such that they are
grouped to form non-redundant, stable, flexible,
and adaptive entities.
28Database Normalization
- Normal forms minimize data redundancy
- First normal form (1NF) no repeating fields or
groups of fields - Functional dependency one-to-one relationship
between the values of two fields - 2NF in 1NF and if each non-key element is
functionally dependent on entire primary key - 3NF in 2NF and if no non-key element is
functionally dependent on any other non-key
element
29First Normal Form
The first normal form states that each field is
granular and there are no repeating groups
AND
30Second/Third Normal Form
The second normal form states that each field in
a multiple field primary key table must be
directly related to the entire primary key. Third
normal form is the same as second normal form
except that it only refers to tables that have a
single field as their primary key.Â
31Exercise
- Define what would need to be done place data in
first normal form (look carefully at the data) - Place this data in 3rd normal form
32Detailed Design
- Design physical representation
- Analyze transactions
- Choose file organizations
- Choose indexes
- Estimate disk space requirements
- Design user views
- Design security mechanisms
- Consider the introduction of controlled
redundancy - Monitor and tune the operational system
33Analyze Transactions
- Assume that a transaction use case
- Use this information to identify the parts of the
database that may cause performance problems. - To select appropriate file organizations and
indexes, also need to know high-level
functionality of the transactions, such as - attributes that are updated in an update
transaction - criteria used to restrict rows that are retrieved
in a query.
34Analyze Transactions
- Often not possible to analyze all expected
transactions, so investigate most important
ones use Paretto Principle - To help identify which transactions to
investigate, can use - transaction/relation cross-reference matrix,
showing relations that each transaction accesses,
and/or - Access frequency analysis.
35Cross-referencing transactions and relations
Which relations require the most attention?
36(2) Determine frequency information
- We need to identify the key transactions that
access those relations - Key information includes
- Data volumes anticipated in each relation
- Average and maximum number of times per period
that a transaction executes - Peak access times for each transaction
Why is peak access time important to know?
37(2) Determine frequency information
- Peak times are key to understanding if
transactions conflict that is, have similar peak
times. If so, potential for performance issues
is greater.
38Transaction usage map for some sample transactions
39Choose File Organizations
- Purpose To determine an efficient file
organization for each table. - Determines how records physically ordered on disk
- File organizations type unordered, ordered,
hash. - Rule of Thumb Use DBMS default organization
40Choose Indexes
- Purpose To determine whether adding indexes
will improve the performance of the system. - Need to consider both primary and secondary
indexes - Primary Index how data will be physically
ordered on disk - Secondary Index additional indexes created to
speed access to to data
41Why do indexes improve performance?
- Without an index, need to scan tables
- Index table with very few attributes.
- Use the index to find the record(s) that meet
query criteria.
42Estimate Disk Space Requirements
- Purpose To estimate the amount of disk space
that will be required by the database. - We need to do this step to ensure that we have
sufficient hardware and disk capacity both now
and in the future to manage the database. - To calculate storage, need to consider
- Amount of data required by each table (width of
each row count of rows) - Indexes to be maintained
- Log files to be maintained
- Growth over a defined time period (1 2 years)
43Design User Views
- Purpose To design the user views that were
identified during the requirements stage of the
relational database application lifecycle. - On multi-user systems, views are often used to
manage security, reduce complexity for users
44Other Detailed Design Tasks
- Design security mechanisms
- Consider the introduction of controlled
redundancy - Monitor and tune the operational system
45Object-Oriented Databases
- Direct extension of OO design and programming
paradigm - ODBMS stores data as objects
- Direct support for method storage, inheritance,
nested objects, object linking, and
programmer-defined data types - Object Definition Language (ODL)
- Standard language for describing structure and
content of an object database
46Designing Object Databases
- Determine which classes require persistent
storage - Define persistent classes
- Represent relationships among persistent classes
- Choose appropriate data types and value
restrictions (if necessary) for each field
47Representing Classes
- Transient classes
- Objects exist only during lifetime of program or
process - Examples view layer window, pop-up menu
- Persistent classes
- Objects not destroyed when program or process
ceases execution. State must be remembered. - Exist independently of program or process
- Examples customer information, employee
information
48Representing Relationships
- Object identifiers
- Used to identify objects uniquely
- Physical storage address or reference
- Relate objects of one class to another
- ODBMS uses attributes containing object
identifiers to find objects that are related to
other objects - Similar to..?
- Keyword relationship can be used to declare
relationships between classes
49Representing Relationships (continued)
- Advantages include
- ODBMS assumes responsibility for determining
connection among objects - ODBMS assumes responsibility for maintaining
referential integrity - Type of relationships
- 11, 1M, MM (one-to-one, one-to-many,
many-to-many) - Association class used with MM
50RMO Domain Model Class Diagram (Figure 12-15)
51One-to-One Relationship Represented with
Attributes Containing Object Identifiers
52One-to-Many Relationship Between Customer and
Order Classes
53One-to-Many Relationship Represented with
Attributes Containing Object Identifiers
54Generalization Hierarchy within the RMO Class
Diagram (Figure 12-21)
55Object DBMS Realities (from Wikipedia)
- Object databases based on persistent programming
acquired a niche in application areas such as - engineering and spatial databases,
telecommunications, and scientific areas such as
high energy physics and molecular biology. - They have made little impact on mainstream
commercial data processing,
56Object DBMS Realities (from Wikipedia)
- ODBMS suggest that pointer-based techniques are
optimized for very specific "search routes" or
viewpoints. However, for general-purpose queries
on the same information, pointer-based techniques
will tend to be slower and more difficult to
formulate than relational. - Other things that work against ODBMS
- lack of interoperability with a great number of
tools/features that are taken for granted in the
SQL world including but not limited to industry
standard connectivity, reporting tools, OLAP
tools and backup and recovery standards. - Lack a formal mathematical foundation, unlike the
relational model,
57Hybrid Object-Relational Database Design
- RDBMS (hybrid DBMS) used to store object
attributes and relationships - Design complete relational schema and
simultaneously design equivalent set of classes - Mismatches between relational data and OO
- Class methods cannot be directly stored or
automatically executed - Relationships are restricted compared to ODBMS
- ODBMS can represent wider range of data types
58Classes and Attributes
- Designers store classes and object attributes in
RDBMS by table definition - Relational schema can be designed based on class
diagram - Table is created for each class
- Fields of each table same as attributes of class
- Row holds attribute values of single object
- Key field is chosen for each table
59Relationships
- Relationships are represented with foreign keys
- Foreign key values serve same purpose as object
identifiers in ODBMS - 1M relationship add primary key field of class
on one side of the relationship to table
representing class on many side - MM relationship create new table that contains
primary key fields of related class tables and
attributes of the relationship itself
60Data Access Classes
- OO design based on a three-layer architecture
- Data access classes are implementation bridge
between data stored in program objects and data
in relational database - Methods add, update, find, and delete fields and
rows in table or tables that represent the class - Methods encapsulate logic needed to copy data
values from problem domain class to database and
vice versa
61Interaction Among a Domain Class, a Data Access
Class, and the DBMS(Figure 12-25)
62Data Types
- Storage format and allowable content of program
variable, object state variable, or database
field or attribute - Primitive data types directly implemented
- Memory address (pointer), Boolean, integer, and
so on - Complex data types user-defined
- Dates, times, audio streams, video images, URLs
63Relational DBMS Data Types
- Designer must choose appropriate data type for
each field in relational database schema - Choice for many fields is straightforward
- Names and addresses use a set of fixed- or
variable-length character arrays - Inventory quantities can use integers
- Item prices can use real numbers
- Complex data types (DATE, LONG, LONGRAW)
64Subset of Oracle RDBMS Data Types
65Object DBMS Data Types
- Use set of primitive and complex data types
comparable to RDBMS data types - Schema designer can create new data types and
associated constraints - Classes are complex user-defined data types that
combine traditional concept of data with
processes (methods) to manipulate data
66Distributed Databases
- Rare for all organizational data to be stored in
a single database in one location - Different information systems in an organization
are developed at different times - Parts of an organizations data may be owned and
managed by different units - System performance is improved when data is near
primary applications
67Single Database Server Architecture (Figure
12-27)
68Replicated Database Server Architecture(Figure
12-28)
69Partitioned Database Server Architecture
70Advantages of DDBMSs
- Reflects organizational structure
- Improved shareability and local autonomy
- Improved availability
- Improved reliability
- Improved performance
- Economics
- Modular growth
71Disadvantages of DDBMSs
- Complexity
- Cost
- Security
- Integrity control more difficult
- Lack of standards
- Lack of experience
- Database design more complex
72Functions of a DDBMS
- Expect DDBMS to have at least the functionality
of a DBMS. - Also to have following functionality
- Extended communication services.
- Extended Data Dictionary.
- Distributed query processing.
- Extended concurrency control.
- Extended recovery services.