Title: Data Models
1Data Models
2What is a Data Model?
- Having formed a model of the enterprise, we now
need to represent the data. - The data model tells us the structure of the
database. - Historically, three data models
- Hierarchical data model
- Network data model
- Relational data model
3Hierarchical and Network Data Models
- Hierarchical and network data models have been
superseded by the relational data model. - Reasons
- Lack of expressive power
- E.g., one cannot express many-to-many
relationships in the hierarchical model - More closely tied to the underlying
implementation. Hence, less data independence. - Relational data model has a clean mathematical
basis.
4The Relational Model
- Due to Codd.
- Everything is represented as a relation in the
mathematical sense. Also called tables. - A database therefore is a collection of tables,
each of which has a unique name, and each of
which is described by a schema. - In addition, Codd defined a data manipulation
language.
5Example of Schemas in the Relational Model
- Example of a representation of entity sets
- Student(sid,name,addr)
- Course(cid,title,eid)
- Empl(eid, ename, deptid)
- Dept(deptid, dname, loc)
- Primary keys are underlined.
- Recall that a primary key is one that uniquely
identifies an entity. - An entity is a row in a table.
6More Example Schemas
- Relationship sets between entity sets are also
represented in tables. - Example of a table corresponding to a
relationship - Enrol(sid, cid, grade)
- Again, a relationship is represented by a row (or
a tuple) in a relation.
7Relational DatabasesBasic Concepts I
- Attribute
- A column in a table
- Domain
- The set of values from which the values of an
attribute are drawn. - Null value
- A special value, meaning not known or not
applicable. - Relation schema
- A set of attribute names
8Relational Databases Basic Concepts II
- Tuple
- A set of values, one for each attribute in the
relation scheme over which the tuple is defined,
i.e. a mapping from attributes to the appropriate
domains - Relation instance
- A set of tuples over the scheme of the relation
9Relational DatabasesBasic Concepts III
- Relational Database
- A set of relations, each with a unique name
- Normalized Relation
- A relation in which every value is atomic
(non-decomposable). Hence, every attribute in
every tuple has a single value.
10Keys
- Candidate Key
- A minimal set of attributes that uniquely
identifies a tuple - Primary Key
- The candidate key chosen as the identifying key
of the relation - Alternate Key
- Candidate keys which are not primary keys
11- Foreign Key
- An attribute (or set of attributes) in table R1
which also occurs as the primary key of relation
R2. - R2 is called the referenced relation.
- Foreign keys are also called connection keys or
reference attributes.
12Integrity Rules Entity Constraint
- Entity constraint
- All attributes in a primary key must be non-null.
- Motivation If the primary key uniquely
identifies an entity in an entity set, then we
must ensure that we have all the relevant
information
13Integrity RulesReferential Integrity
- Referential integrity
- A database cannot contain a tuple with a value
for a foreign key that does not match a primary
key value in the referenced relation. - Or, a foreign key must refer to a tuple that
exists. - Motivation If referential integrity were
violated, we could have relationships between
entities that we do not have any information
about.
14Data Manipulation Languages
- In order for a database to be useful, it should
be possible to store and retrieve information
from it. This is the role of the data
manipulation language. - One of the attractions of the relational data
model is that it comes with a well-defined data
manipulation language.
15Types of DML
- Two types of data manipulation languages
- Navigational (procedural)
- The query specifies (to some extent) the strategy
used to find the desired result e.g. relational
algebra. - Non-navigational(non-procedural)
- The query only specifies what data is wanted, not
how to find it e.g. relational calculus.
16Relational Algebra
- Codd defined a number of algebraic operations for
the relational model. - Unary operations take as input a single table and
produce as output another table. - Binary operations take as input two tables and
produce as output another table.
17Unary Operations Select
- Select produces a table that only contains the
tuples that satisfy a particular condition, in
other words a horizontal subset. - Appearance
- sC(R)
- where C is a selection condition
- and R is the relation over which the selection
takes place
18Example of Select
- Student
- sid name addr
- 123 Fred 3 Oxford
- 345 John 6 Hope Rd.
- 567 Ann 5 Garden
- s sid gt 300(Student) yields
- 345 John 6 Hope Rd.
- 567 Ann 5 Garden
19Unary OperationsProject
- Project produces a table consisting of only some
of the attributes. It creates a vertical
subset. - Note that a project eliminates duplicates.
- Appearance
- ?A(R)
- where A is a set of attributes of R
- and R is the relation over which the project
takes place.
20Example of Project
- Enrol
- sid cid grade
- 123 CS51T 76
- 234 CS52S 50
- 345 CS52S 55
- ?cid(Enrol) yields
- CS51T
- CS52S
21Binary Operations
- Two relations are (union) compatible if they have
the same set of attributes. - Example, one table may represent suppliers in one
country, while another table with same schema
represents suppliers in another country. - For the union, intersection and set-difference
operations, the relations must be compatible.
22Union, Intersection, Set-difference
- R1 ? R2
- The union is the table comprised of all tuples in
R1 or R2. - R1 ? R2
- The intersection is the table comprised of all
tuples in R1 and R2 - R1 - R2
- The set-difference between R1 and R2 is the table
consisting of all tuples in R1 but not in R2.
23Cartesian Product
- R1 ? R2
- The Cartesian product is the table consisting of
all tuples formed by concatenating each tuple in
R1 with a tuple in R2, for all tuples in R2.
24Example of a Cartesian Product
- R1 A B
- 1 x
- 2 y
- R2 C D
- a s
- b t
- c u
- R1 ? R2 A B C D
- 1 x a s
- 1 x b t
- 1 x c u
- 2 y a s
- 2 y b t
- 2 y c u
25Natural Join
- R1 R2
- Assume R1 and R2 have attributes A in common.
Natural join is formed by concatenating all
tuples from R1 and R2 with same values for A, and
dropping the occurrences of A in R2 - R1 R2 ?A(sC(R1 ? R2))
- where C is the condition that the values for R1
and R2 are the same for all attributes in A and
A is all attributes in R1 and R2 apart from the
occurrences of A in R2. - hence, natural join is syntactic sugar
26Example of a Natural Join I
- Course
- cid title eid
- CS51T DBMS 123
- CS52S OS 345
- CS52T Networking 345
- CS51S ES 456
- Instructor
- eid ename
- 123 Rao
- 345 Allen
- 456 Mansingh
27Example of a Natural Join II
- Course Instructor
- cid title eid ename
- CS51T DBMS 123 Rao
- CS52S OS 345 Allen
- CS52T Net... 345 Allen
- CS51S ES 456 Mansingh
28Division
- R1 ?R2
- Assume that the schema for R2 is a proper subset
of the one for R1. - We form the division by
- Ordering the tuples in R1 so that all the tuples
with the same value for the non-common attributes
are grouped together. - Each group contributes a tuple to the result if
the groups values on the common attributes form
a superset of the values of these attributes in
R2.
29Example of Division I
- Enrol cid sid grade
- CS51T 123 A
- CS52S 123 A
- CS51T 234 C
- CS52S 234 B
- CS51T 345 C
- CS52S 345 C
- Temp sid grade
- 123 A
- 234 B
-
30Example of Division II
- Enrol cid sid grade
- CS51T 123 A
- CS51T 234 C
- CS51T 345 C
- CS52S 123 A
- CS52S 234 B
- CS52S 345 C
-
- Enrol ? Temp cid
- CS52S
- Thus, the division gives all courses for which
123 got an A and 234 a B.
31Assignment
- Allows the expression to be written in parts.
- Assigns the part to a temporary variable.
- This variable can be used in subsequent
expressions. - E.g.
- ?sid(?title DBMS (Enrol Course)
- Could be re-written as
- r Enrol Course
- ?sid(?title DBMS(r))
32Rename Operation
- Names the result of an expression.
- ?x(A1,A2,,An) (E)
- returns the result of expression E under the name
x with the attributes renamed as A1,A2,,An. - E.g. ?S (Student)
- Renames Student table to S.
33Database Modification
- Insert
- r r E
- e.g.
- Course Course (CS51T,DBMS)
- Delete
- r r - E
- e.g.
- Student Student - ?sid1(Student)
- Update
- r ?F1,F2,,Fn (r)
- e.g.
- Enrol ?sid,cid,grade grade
2 (Enrol)
?
?
34Examples
- Assume the following schema
- Student(sid,sname,saddr)
- Course(cid,title,lid)
- Enrol(sid, cid, grade)
- Lecturer(lid,lname,deptname)
- Query 1 Find the name of all students that have
taken the course entitled Expert Systems. - Query 2 Find the titles of all courses that
student Mark Smith has done. - Query 3 Find the id of students that have
enrolled in all the courses that lecturer with
id. 234 has taught. - Query 4 Find the highest grade for CS51T.
35Relational Calculus
- A relational calculus expression defines a new
relation in terms of other relations. - A tuple variable ranges over a named relation.
So, its values are tuples from that relation. - Example
- Get grades for CS51T
- ?e(Enrol)
- lte.gradegt e.cid CS51T
36Basic Syntax for Relational Calculus Expressions
- ?r(R),,?s(S)
- lttargetgt predicate
- where
- R,..,S are tables
- r,..,s are tuple variables
- target specifies the attributes of the resulting
relation - predicate is a formula giving a condition that
tuples must satisfy to qualify for the resulting
relation.
37The Predicate
- Predicate is constructed from
- attribute names
- constants
- comparison operators
- ? ? ? ? ? ?
- logical connectives
- ? ? ? ? ?
- quantified tuple variables
- ?t(R), ?t(R)
38Examples of Relational Calculus
- Example 2
- Get names and grades for students enrolled in
CS51T - ?e(Enrol), s(Student)
- lts.name, e.gradegt
- e.cid CS51T ?
- s.sid e.sid
- In relation algebra
- ?cid, name( s CID CS51T(Grade
Student))
39Example 3
- Give the names of all students who got at least
one A. - ?s(Student)
- lts.namegt
- ?e(Enrol)
- (e.grade A ?
- s.sid e.sid)
- Tuple variables not mentioned in the target list
must be bound in the predicate.
40Example 4
- Get the names of all students who only got As
- ?s(Student)
- lts.namegt
- ? e(Enrol)( s.sid e.sid ? e.grade
A) - ?
- ?e2(Enrol) (s.sid e2.sid)
41Example 5
- Get the names of all students who got an A and a
B - ?s(Student)
- lts.namegt
- ?e(Enrol) (e.grade B ?
- s.sid e.sid)
- ?
- ?e2(Enrol) (e2.grade A ?
- s.sid e2.sid)
42Example 6
- Get the course titles and names for the courses
for which the student did not get an A - ?c(Course), s(Student)
- lts.name, c.titlegt
- ?g(Enrol) s.sid g.sid ?
- g.cid c.cid ?
- g.grade ? A