Title: CS586 Fall 2004
1CS586 Fall 2004
Day 4
- Introduction to Database Management Systems
- Sid Kitchel
2Plan of the Day
- Questions
- Data Modeling
- The Entity-Relationship Model
- The Relational Model
3Modeling
- What is modeling?
- Producing a controlled representation of a real
world object or process
- The representation is usually an abstraction or a
miniaturization of the real thing
- Types of modeling
- Physical
- Model train or model aircraft
- Engineering prototype or analog computer
- Operational
- Mathematical models
- Data
- Choosing and recording data that represents
aspects of a real world object or process
4Data Modeling
- What is data modeling?
- Producing a controlled representation of a real
world object or process through data
- The representation is done by
- Choosing elements of data from the real world
thing that characterize it and are useful for
applications about the thing
- Discovering the natural relationships of the
clusters of modeling data
- For example
- Suppose Im helping the PSU basketball coach
prepare to recruit next years freshman players
- Id want to collect basic info on players like
name, address, phone number, high school
- Also I need to collect stats like field goal
percentage, rebounds, points scored, etc. by
playing year
- The data models the players and data has
relationships
5Data Modeling Documentation
- Data modeling produces what?
- We have to record our ideas about a proper data
model
- In this course, we will do it in two ways
- Entity-Relationship Diagrams (ERD)
- Database schemas
- Today well learn about how to construct an
Entity-Relationship Diagram
- Later we will learn how to record a data model
using the SQL language so a DBMS can use it
6Overview of Database Development
- Applications may have to use data in the
database
- The data may be already stored there or it could
be new
- The application design and the database design
must agree to be successful
- Often the designer should be aware that data will
be shared and design beyond the current
application and for the future
- To succeed the developer must think both in terms
of how to make an application operate and how to
use data appropriately
7Phases of Development
Collect Requirements And Analyze
Functional Spec.
Database Requirements
Designing
Data Modeling
Data Model Spec.
ERD
Application Design Doc.
Create the schema
Implement the Design
Application Code
Database Schema
8Steps Toward a Database Design
- Learn what the user wants to do in an
application
- Learn about the real-world objects and processes
that the application is about
- Understand the measurable in them
- Select the set of data items that can
successfully represent those real world objects
and processes in the database
- Capture both objects and their relationships
- This is the essence of data modeling
- Make sure that the database requirements satisfy
the data needs of the functional specification
- Add to the database requirements any likely data
elements that will be needed by future
applications or future investigations dont
forget metadata
9Lets Create an Example
- You are in a university nowso lets use that
- Lets imagine you are at a large university
- You are participating in developing a new class
scheduling application
- You have to create a database design for the
project
- How do you start???
- How about requirements gathering?
- What people/things are needed for class
scheduling?
- What operations or activities happen?
10The Fruits of Analysis
- What objects are in the university world that are
involved in scheduling?
- People are involved students and faculty
- Courses exist and are described in the catalog
- Sections occur in time and space
- Colleges are broken into departments
- How do objects relate and interact?
- Sections must be able to accept assignment of
students to them and be an instance of a course
- Courses are given by departments
- Students have departments as their majors and
minors
11Representing the Object
- Objects need to be described
- This is done through object attributes that tell
about an object in data
- Objects need to be identified
- One or more attributes must act as a key that
uniquely identifies the object
- Objects must be independent or owned
- Independent objects must be uniquely identified
- Owned objects use a concatenated key the
owners ID plus some owned object attribute that
allows the object to be distinguished from the
other owned objects
12Capturing Relationships
- Which objects participate?
- There is a need to identify the objects
- There is a need to identify how the participate
totally or partially
- What is the nature of the relationship?
- There is a need to show the cardinality
- One to one
- One to many
- Many to many
- There is a need to show whether objects are
independent or owned
13Relationship Cardinalities
- One to one
- If there is a transcript object showing
graduation, then it will be one to one with the
student object
- One to many
- For each single department, there will be a
group of students in it
- Many to many
- A student may take multiple sections, while
- a single section will hold multiple students
14Overview of Database Design
- Conceptual design (Entity-Relationship Model is
used at this stage and is an example of a
Semantic Data Model.)
- What are the entities and relationships in your
enterprise?
- What information about these entities and
relationships should we store in the database?
- What are the integrity constraints or business
rules that hold for the data and its
relationships?
- An instance of data modeling in the ER Model can
be represented pictorially using ER diagrams.
- More practically you can map an ER diagram into a
relational schema.
15ER Model Basics
- Entity Real-world object distinguishable from
other objects. An entity is described (in the DB)
using a set of attributes.
- Entity Set A collection of similar entities.
E.g., all employees.
- All entities in an entity set have the same set
of attributes. (Until we consider ISA
hierarchies, anyway!)
- Each entity set has a key.
- Each attribute has a domain.
16name
ER Model Basics (Cont.)
ssn
lot
since
Employee
name
dname
subor-dinate
super-visor
ssn
lot
budget
did
Reports_To
Works_In
Department
Employee
- Relationship Association among two or more
entities. E.g., Jill works in the Pharmacy
department.
- Relationship Set Collection of similar
relationships.
- An n-ary relationship set R relates n entity
sets E1 ... En each relationship in R involves
entities e1 ? E1, ... , en ? En
- The same entity set could participate in
different relationship sets, or in different
roles in same set.
17Key Constraints
budget
did
- Consider Works_In An employee can work in many
departments a dept can have many employees.
- In contrast, each dept has at most one manager,
according to the key constraint on Manages.
Departments
1-to-1
1-to Many
Many-to-1
Many-to-Many
Relationship set cardinalities
18Weak Entities
- A weak entity can be identified uniquely only by
considering the primary key of another (owner)
entity.
- Owner entity set and weak entity set must
participate in a one-to-many relationship set
(one owner, many weak entities).
- Weak entity set must have total participation in
this identifying relationship set.
name
cost
pname
age
ssn
lot
Dependents
Policy
Employees
19Alternate Representations
middle
last
first
weak entity
composite attribute
name
R
E1
E2
E2 participates totally in R
1
N
E1
E2
R
relationship set cardinality
derived attribute
multivalued attribute
20More examples
- The following slides show examples using the
alternate representation
- The alternate representation is often used
- Also computer or web based ER tools often use
even simpler representations
- You might see
Student
SSN Name Address Enroll_date
21Cardinalities
1
n
Faculty
teaches
Class
1
1
Chair
Department
heads
m
n
takes
Student
Class
22Offering and Enrollment
Is this correct??
1
Faculty
teaches
n
m
n
takes
Student
Class
Does this model how PSU does scheduling?
23Offering and Enrollment 2
Is this correct??
1
Faculty
teaches
n
m
n
takes
Student
Section
1
n
has
Course
24Ternary Relationship
1
Faculty
teaches
n
1
isbn
author
m
n
assigned
Textbook
Section
publisher
1
n
has
Course
title
25The Beginning
- This paper is concerned with the application of
elementary relation theory to systems which
provide shared access to large banks of formatted
data. - It provides a means of describing data with its
natural structure only that is, without
superimposing any additional structure for
machine representation purposes. - Accordingly, it provides a basis for a high
level data language which will yield maximal
independence between programs on the one hand and
machine representations and organization of data
on the other. A further advantage of the
relational view is that it forms a sound basis
for treating derivability, redundancy, and
consistency of relations - -- E.F. Codd, A Relational Model of Data for
Large Shared Data Banks, CACM, no. 6 (June
1970), 13377-387.
26Why Study the Relational Model?
- Most widely used model.
- Vendors IBM, Informix, Microsoft, Oracle,
Sybase, etc.
- Legacy systems in older models
- e.g., IBMs IMS, IDS, IDMS, ADABAS
- Recent competitor object-oriented model
- GemStone/S, ObjectStore, Versant, Ontos
- A synthesis emerging object-relational model
- Informix Universal Server, UniSQL, O2, Oracle, DB2
27Big Picture
- The Relational Model has a mathematical
foundation
- It offers a fundamental change in architecture
and implementation
Network
Hierarchical
Relational
28Relational Database Definitions
- Relational database a set of relations
- Relation made up of 2 parts
- Instance a table, with rows and columns. Rows
cardinality, fields degree / arity.
- Schema specifies name of relation, plus name
and type of each column.
- E.G. Student (sid string, name string, login
string, age integer, gpa
real).
- You can think of a relation as a set of rows or
tuples (i.e., all rows are distinct).
29Relational Database Definitions (more formally)
- Domain a set of atomic values each domain has a
name, a datatype, and format a domain is
semantically unified.
- Relation r, is a set of n-tuples denoted by r
t1, t2,, tm in which each ti is a list of
values ti
- and each vi is a member of
domi, the domain associated with attribute ai
30Relational Terminology
- Relational theory and relational practice do not
talk the same
Theory talk Relation Tuple Attribute name Attr
ibute value
Domain
What is it? Main storage unit Repeating subunit
Aspect of the subunit Data chunk Possible
values
DBMS Practice Table Row Column name / field nam
e
Data value / field value Data types
31Example Instance of Students Relation
- Cardinality 3, degree 5, all rows distinct
- Do all columns in a relation instance have to
- be distinct?
32Relational Query Languages
- A major strength of the relational model
supports simple, powerful querying of data.
- Queries can be written intuitively, and the DBMS
is responsible for efficient evaluation.
- The key precise semantics for relational
queries.
- Allows the optimizer to extensively re-order
operations, and still ensure that the answer does
not change.
33Integrity Constraints (ICs)
- IC a condition that must be true for any
instance of the database e.g., domain
constraints.
- ICs are specified when schema is defined.
- ICs are checked when relations are modified.
- A legal instance of a relation is one that
satisfies all specified ICs.
- The DBMS should not allow illegal instances.
- If the DBMS checks ICs, stored data is more
faithful to real-world meaning.
- Avoids data entry errors, too!when you use check
constraints.
34Primary Key Constraints
- A set of fields is a key for a relation if
- 1. No two distinct tuples can have the same
values in all key fields, and
- 2. This is not true for any subset of the key.
- If condition 2 is false? You have a superkey.
- If theres 1 key for a relation, one of the keys
is chosen (by the DBA) to be the primary key.
- Each of these keys is often referred to as a
candidate key, because they are a candidate for
being primary.
- E.g., sid is a key for Student. (What about
name?) The set sid, gpa is a superkey.
35Primary and Candidate Keys in SQL
- Possibly many candidate keys (specified using
UNIQUE), one of which is chosen as the primary
key.
CREATE TABLE Enrolled (sid CHAR(20)
cid CHAR(20), grade CHAR(2),
PRIMARY KEY (sid,cid) )
- For a given student and course, there is a
single grade. vs. Students can take only one
course, and receive a single grade for that
course further, no two students in a course
receive the same grade. - Used carelessly, an IC can prevent the storage of
tuples that arise in practice!
vs.
CREATE TABLE Enrolled (sid CHAR(20)
cid CHAR(20), grade CHAR(2),
PRIMARY KEY (sid), UNIQUE (cid, grade)
)
36Foreign Keys, Referential Integrity
- Foreign key Set of fields in one relation that
is used to refer to a tuple in another
relation. (Must correspond to primary key of the
second relation.) Like a logical pointer. - E.g. sid is a foreign key referring to
Student
- Enrolled(sid string, cid string, grade
string)
- If all foreign key constraints are enforced,
referential integrity is achieved, i.e., no
dangling references.
- Can you name a data model w/o referential
integrity?
- Links in HTML!
37Foreign Keys in SQL
CREATE TABLE Enrolled (sid CHAR(20),
cid CHAR(20), grade CHAR(2),
PRIMARY KEY (sid,cid), FOREIGN KEY (sid
) REFERENCES Students )
- Only students listed in the Student relation
should be allowed to enroll for courses.
Enrolled
Student
38Enforcing Referential Integrity
- Consider Student and Enrolled sid in Enrolled
is a foreign key that references Student.
- What should be done if an Enrolled tuple with a
non-existent student id is inserted? (Reject
it!)
- What should be done if a Student tuple is
deleted?
- Also delete all Enrolled tuples that refer to
it.
- Disallow deletion of a Student tuple that is
referred to.
- Set sid in Enrolled tuples that refer to it to a
default sid.
- (In SQL, also Set sid in Enrolled tuples that
refer to it to a special value null, denoting
unknown or inapplicable.)
- Similar if primary key of Student tuple is
updated.
39Referential Integrity in SQL
- SQL/92 and SQL1999 support all 4 options on
deletes and updates.
- Default is NO ACTION (delete/update is
rejected)
- CASCADE (also delete all tuples that refer to
deleted tuple)
- SET NULL / SET DEFAULT (sets foreign key value
of referencing tuple)
CREATE TABLE Enrolled (sid CHAR(20),
cid CHAR(20), grade
CHAR(2), PRIMARY KEY (sid,cid), FOREIGN
KEY (sid) REFERENCES Students ON DELETE
CASCADE ON UPDATE SET DEFAULT )
40Where do ICs Come From?
- ICs are based upon the semantics of the
real-world enterprise that is being described in
the database relations.
- We can check a database instance to see if an IC
is violated, but we can NEVER infer that an IC is
true by looking at an instance.
- An IC is a statement about all possible
instances!
- From example, we know name is not a key, but the
assertion that sid is a key is given to us.
- Key and foreign key ICs are the most common more
general ICs supported too.
41Logical DB Design ER to Relational
CREATE TABLE Employee
(ssn CHAR(11),
name CHAR(20),
lot INTEGER,
PRIMARY KEY (ssn))
42Relationship Sets to Tables
- In translating a relationship set to a relation,
attributes of the relation must include
- Keys for each participating entity set (as
foreign keys).
- This set of attributes forms a superkey for the
relation.
- All descriptive attributes.
CREATE TABLE Works_In( ssn CHAR(11),
did INTEGER, since DATE, PRI
MARY KEY (ssn, did), FOREIGN KEY (ssn)
REFERENCES Employees, FOREIGN KEY (did)
REFERENCES Departments)
43Review Key Constraints
- Each dept has at most one manager, according to
the key constraint on Manages.
budget
did
Department
Translation to relational model?
Many-to-Many
1-to-1
1-to Many
Many-to-1
44Translating ER Diagrams with Key Constraints
CREATE TABLE Manages( ssn CH
AR(11), did INTEGER, sinc
e DATE, PRIMARY KEY (did),
FOREIGN KEY (ssn) REFERENCES Employee,
FOREIGN KEY (did) REFERENCES Departments)
- Map relationship to a table
- Note that did is the key now!
- Separate tables for Employees and Departments.
- Since each department has a unique manager, we
could instead combine Manages and Departments.
CREATE TABLE Dept_Mgr( did
INTEGER, dname CHAR(20), bud
get REAL, ssn
CHAR(11), since DATE, PR
IMARY KEY (did), FOREIGN KEY (ssn) REFERENCES
Employee)
45Review Participation Constraints
- Does every department have a manager?
- If so, this is a participation constraint the
participation of Departments in Manages is said
to be total (vs. partial).
- Every did value in Department table must appear
in a row of the Manages table (with a non-null
ssn value!)
since
since
name
name
dname
dname
ssn
lot
budget
did
budget
did
Department
Employee
Manages
Works_In
since
46Participation Constraints in SQL
- We can capture participation constraints
involving one entity set in a binary
relationship, but little else (without resorting
to CHECK constraints).
CREATE TABLE Dept_Mgr( did
INTEGER, dname CHAR(20), budge
t REAL, ssn CHA
R(11) NOT NULL, since DATE,
PRIMARY KEY (did), FOREIGN KEY (ssn) REFER
ENCES Employee, ON DELETE NO ACTION)
47Weak Entities
- A weak entity can be identified uniquely only by
considering the primary key of another (owner)
entity.
- Owner entity set and weak entity set must
participate in a one-to-many relationship set (1
owner, many weak entities).
- Weak entity set must have total participation in
this identifying relationship set.
name
cost
pname
age
ssn
lot
Dependent
Policy
Employee
48Translating Weak Entity Sets
- Weak entity set and identifying relationship set
are translated into a single table.
- When the owner entity is deleted, all owned weak
entities must also be deleted.
CREATE TABLE Dep_Policy ( pname CHAR(20),
age INTEGER, cost
REAL, ssn CHAR(11) NOT NULL, PRIMARY KEY
(pname, ssn), FOREIGN KEY (ssn) REFERENCES E
mployee, ON DELETE CASCADE)
49Review Binary vs.Ternary Relationships
pname
age
Dependent
Covers
- What are the additional constraints in the 2nd
diagram?
Bad design
pname
age
Dependent
Purchaser
Better design
50Binary vs. Ternary Relationships (Cont.)
CREATE TABLE Policy ( policyid INTEGER,
cost REAL, ssn CHA
R(11) NOT NULL, PRIMARY KEY (policyid). F
OREIGN KEY (ssn) REFERENCES Employee,
ON DELETE CASCADE)
- The key constraints allow us to combine Purchaser
with Policies and Beneficiary with Dependents.
- Participation constraints lead to NOT NULL
constraints.
- What if Policies is a weak entity set?
CREATE TABLE Dependent ( pname CHAR(20
), age INTEGER, policyid
INTEGER, PRIMARY KEY (pname, policyid).
FOREIGN KEY (policyid) REFERENCES Policy,
ON DELETE CASCADE)
51Views
- A view is just a relation, but we store a
definition, rather than a set of tuples.
CREATE VIEW YoungActiveStudent (name, grade)
AS SELECT S.name, E.grade FROM Studen
t S, Enrolled E WHERE S.sid E.sid
AND S.age
- Views can be dropped using the DROP VIEW
command.
- How to handle DROP TABLE if theres a view on the
table?
- DROP TABLE command has options to let the user
specify this.
52Views and Security
- Views can be used to present necessary
information (or a summary), while hiding details
in underlying relation(s).
- Given YoungStudent, but not Student or Enrolled,
we can find students who are enrolled, but not
the cids of the courses they are enrolled in.
53Relational Model Summary
- A tabular representation of data.
- Simple and intuitive, currently the most widely
used.
- Integrity constraints can be specified by the
DBA, based on application semantics. DBMS checks
for violations.
- Two important ICs primary and foreign keys
- In addition, we always have domain constraints.
- Powerful and natural query languages exist.
- Rules to translate ER to relational model