Title: Fundamentals of CS 2: Databases REVISION WEEK 2
1Fundamentals of CS 2DatabasesREVISION WEEK (2)
- John Barnden
- Professor of Artificial Intelligence
- School of Computer Science
- University of Birmingham, UK
2What We Mainly Studied
- The nature of relational databases, the central
modern type of database. Entity types represented
as tables, holding relations. - Some basic mathematical concepts underpinning
relational databases, and useful also in many
other branches of CS. - Key aspects of how to develop the
conceptual/logical design of relational
databases. - In particular, how to achieve certain types of
good structuring, to help achieve certain types
of correctness and efficiency. - How to create and query databases using a
particular database language, PostgreSQL (a
version of SQL very widely used in various
forms).
3Overall Nature of Exam
- Half of a three-hour exam. Other half SE.
Equally weighted. - Five questions. Equally weighted.
- Questions range from precise technical things to
more general considerations. - One question will mainly ask for SQL expressions.
SQL may crop up in other questions. - The mathematical and relational algebra material
may be needed for the exam, but the finest
technical detail will not be expected. - Anything in the required textbook reading may be
needed in the exam, except of course that a
detailed memory of specific, data-full examples
is not expected, and except for some SQL detail
(see next slide).
4Textbook Chapters (7th Ed)
- Chapters 1-9, except that
- On SQL (Chs 7,8) the exam doesnt rest on fine
detail beyond whats in the handouts (and
occasional lecture) - In Chapter 8 only up to section 8.4 inclusive
- Chapter 9 note the important concepts of the
Systems Development Life Cycle and the Database
Life Cycle.
5Initial Considerations
- What a database is, and how it relates to other
types of data structure/repository in CS. - Data integrity, data redundancy, data anomalies.
- Associative links between parts of a database, as
opposed to pointing. - Ways data is stored/linked in physical human
media such as diaries, address books and
timetables. - Various complications in tables in human
documents. - Restricted type of table used in relational DBs.
6Entities and Relationships
- Any relational DB as consisting of entity types
and relationships between them Entity
Relationship Model (ERM) in general. - Specific ERMs for specific applications, and
distinction from Entity Relationship Diagrams
(ERDs). - Entity Types as represented by tables.
- The question of what types of thing should
correspond to entity types, and hence tables,
depends on the application and your design
judgment.
7Attribute Determination and Keys
- One or more attributes determining another
attribute. Can also be described as that
attribute being functionally dependent on the
former attributes. - Various notions of key, especially superkeys,
candidate keys and primary keys - And foreign keys as (primary) implementation of
relationships between entity types.
8Strength and Weakness
- Strong and weak relationships. (Also called
identifying and non-identifying relationships
respectively.) - Weak entity types, as defined according to the
strength of their relationships to other entity
types and existence-dependence with respect to
those types. - Depiction of strength or weakness in different
styles of ERD.
9Connectivity, Cardinality Participation
- Connectivity uniqueness or multiplicity of
entities at either end of a relationship. - Cardinality precise numerical info about how
many entities allowed or required at either end
of a relationship. - Participation optionality or mandatoriness of a
relationship, in either direction. - Overlap between these notions.
- Notation in ERDs.
10Table Representation of Relationships of
Different Connectivities
- Basic case is 1M non-recursive. (Recursive is
when two or more entity types in a relationship
are the same.) - MN, MNP, etc. standardly handled by breaking
down into two, three, etc. 1M relationships
going to a new entity type a bridging or
linking type. - 11 recursivevarious different methods according
to circumstances, one involving a bridging type. - 1M recursivecan often be handled within a
single table, but may be reasonable to introduce
a bridging type.
11Other Representation Issues
- Multivalued attributes. OK in themselves in early
stages of design, but should eventually be broken
down into single-valued attributes in some way. - A main divergence in ways of doing this is based
on whether the different values are for stably
identifiable subattributes. - Generalization hierarchies. Exhaustiveness,
disjointness.
12Normalization
- What normalization is and what role it plays in
the database design process - The normal forms 1NF, 2NF, 3NF, BCNF, and 4NF.
- How normal forms can be transformed from lower
normal forms to higher normal forms. - That normalization and ER modeling are used
concurrently to produce a good database design,
helping to eliminate data redundancies
anomalies. - That some situations require denormalization to
generate information efficiently.
13Creating ER Models/Diagrams
- Designing an ER model for a database is an
iterative process, because, e.g. - As you proceed, you think of new ways of
conceiving whats going on (much as in ordinary
programming) - Multivalued attributes need to be re-represented
- MN relationships can be included as such at an
early stage, but generally need to be replaced by
bridging entity types at some point - 11 relationships raise a red flag may indicate
poor design (though standard in supertype/subtype
representation) - Special supertype/subtype notation may need to be
converted into more standard diagram notation - Conversion to a Normal Form.
14SQL
- Mainly, the module only covers how to query
tables and how to create tables. - See manual and textbook for much more if you
want!
15MATHEMATICAL VIEW
16Tuples, Relations and Tables
- A relation is simply a set of tuples drawn from
some sets. - A relation is therefore a subset of the Cartesian
product of those sets. - A row is a tuple. Hence a table at any given
moment induces a relation over the value domains
of the table. - The table consists of not just the induced
relation but also the attributes themselves,
their domains, specification of primary and
foreign keys, etc.
17Totality, Partiality and Functionality of
Relations
- Totality and partiality in general.
- Relations induced by tables as almost certainly
being merely partial (i.e., not total). - Functionality of relations. Functions and partial
functions. - Determination relationships (i.e., functional
dependence relationships) within a table as
inducing partial functions from one or more of
the tables value domains to others.
18Some Categories of Relation
- One-to-one
- One-to-many and many-to-one
- Many-to-many
19Relations from Entity Relationships
- The connection between relationships in ERMs and
mathematical relations. - E.g., the EMPLOYED-BY relationship from the
People entity type to the Organizations entity
type says that - the database (at any moment) stores a relation
from the People entity set to the Organizations
entity set. - The connection between connectivity of a
relationship between entity types and the issue
of whether the corresponding relation is
one-to-one, one-to-many, etc. - The connection between mandatoriness of a
relationship from entity set E to entity set F
and a restricted notion of relation totality.
(The relation is total on the set of current E
things.)
20Some Operations on Sets in General
- Union, intersection, difference and Cartesian
product of two sets A and B (of any sort). - When A and B are relations the set of all
possible concatenations of a tuple within A and a
tuple within B. - I have called this the flattened Cartesian
product of A and B, notated as A ? B as opposed
to A ? B. - This name is intended to emphasize that the
operation is the mathematical basis of the
product of two tables.
21Relational DB Operators Relational Algebra
- Defines theoretical way of manipulating tables
using relational DB operators that mainly
manipulate the relations in the tables. - SELECT
- PROJECT
- JOIN (various sorts)
- INTERSECT
- Use of relational DB operators on existing tables
produces new tables. Strong connection to SQL
commands/operators. - Relational algebra puts relational DB operators
into a mathematical notation that is more
convenient than, e.g., SQL operators.
- UNION
- DIFFERENCE
- PRODUCT
- DIVIDE
22QUESTIONS?
23What Is a Database? contd.
- A database is generally a regularly (but often
complexly) structured body of information about
entities of various specific, precisely defined
types. - The entities are generally in various specific
types of relationship to each other - Each entity has a specific set of (intrinsic)
attributes of interest. - The values of intrinsic attributes are generally
of fairly basic, simple sorts (e.g., numbers,
dates, names). - Generally there are many entities of some of the
types or rather the expectation is that the
numbers could get large. - The entities of a given type are typically not in
any special order.
24What Is a Database(?) contd.
- The individual data elements held, though of
simple sorts, are generally close to end-user
concepts and concerns -- they are directly
meaningful interesting to users. E.g., price of
a car. - Contrast the detailed meteorological
measurements across the country and beyond, used
in a weather forecasting system for the UK. - The data held and retrieved is generally of exact
form (no vagueness expressed) and of definite
form (no uncertainty expressed or expected). - The operations provided to users for extracting,
inserting and updating data are of conceptually
straightforward sorts, not requiring elaborate
reasoning, problem-solving or analysis. - However, aggregate/overview/statistical
information (counts, averages, maxima, graphs,
histograms, etc.) often needs to be computed from
the data.
25Data Redundancy, Data Anomalies, and Data
Integrity
- See Ch.1 and many other parts of book.
- Data Redundancy replicating data in different
places in a data repository. - E.g., in a recipe book, saying how to fry onions
every time fried onions are needed in a recipe. - Encourages data anomalies and lack of
integrity basically, inconsistency between
the different places. - Such problems arise with insertions, deletions
and modifications in general. - Also causes a type of inefficiency replicated
updates.
26Redundancy, etc., contd. 1
- Redundancy implies that if you want to
modify/delete a piece of information, you need to - know that there is (or is not) replication, or
check for possible replications - go to the effort of repeating changes when the
item is replicated - avoid errors in such repeated changes.
27Restrictions on our Tables
- We will apply the following restrictions amongst
others - Regular overall shape rows all same length,
similarly columns. No division into different
regions (with a certain exception). - No labels for rows, as opposed to columns. No
significance to the order or number of rows (the
number can change). - All cells in any one column given same intuitive
interpretation. - One data item per cell (but it can be a
variable-length character string, containing
anything). - Each cells item restricted to a pre-specified,
fairly simple format, and all cells in any given
column restricted to same format. No exceptional
entries. - Uncertainty markers not recognized though
tolerated in principle. - Some columns not supposed to have empty or vague
entries. - No additional comments, footnotes, etc.
28A Bad Table
29Extra, Crucial Restriction
- No row can be repeated in a table. (I.e., no two
rows can contain be exactly the same in terms of
the values they contain.) Exception
dynamically-created, temporary tables. - Equivalent to saying
- Rows are uniquely determined (picked out) by the
values in some set of columns. That is, given
some values for those columns, there is at most
one row that has those values in those columns.
30A Conceptual Model
- Represents global view of the database
- Enterprise-wide representation of data as viewed
by high-level managers - Basis for identification and description of main
data objects and relationships, avoiding
details
31The Entity Relationship Model
- Introduced by Chen in 1976
- Most widely used conceptual model of DBs
- The ER model strictly speaking is just the
approach of thinking of things as composed of
entities, attributes and relationships it has
nothing intrinsically to do with diagrams
J.A.B. - We also say that applying this approach to a
particular body of data gives rise to an ER model
of the specific intended database - But diagrams based on the model are a widely
accepted and adapted graphical approach to data
modeling
32Entity Relationship Diagrams (ERDs)
- The ER model of a database forms the basis of an
ER diagram (or several diagrams). - The ERDs represent the database as viewed by end
users. - There are several markedly different styles of
ERD, and for each main style there are several
variants. - And the style in the module handouts will differ
somewhat from that in the textbook and these
lectures - An ERD depicts (some of) the ER models entities,
attributes and relationships, and (depending on
the diagram style) varying amounts of other info
such as connectivities, cardinalities, keys,
weakness,
33What Is Represented as Tables?
- The question of what types of thing should
correspond to entity types and hence tables
depends on the application and your design
judgment. - It all depends on things like
- what variety of data is needed about something
- how separate the pieces of data about a given
thing are - what operations are needed
- how often theyre needed.
34Attribute Determination
- A collection of one or more attributes determines
another attribute A if only one value for A is
possible given the values for the former
attributes. - E.g., the collection DAY-NUMBER, MONTH and YEAR
could determine DAY-NAME. - Also write this as DAY-NUMBER, MONTH, YEAR ?
DAY-NAME - We alternatively say that DAY-NAME is
functionally dependent on DAY-NUMBER, MONTH and
YEAR.
35Keys(RC pp.82/3)
- A key for a table is a collection of one or more
attributes that determines some other
attribute(s) in that same table. - A superkey for a table is a collection of one or
more attributes that determines all the other
attributes in the table, i.e. determines a whole
row. - Trivially, the collection of all the attributes
is a superkey. - A candidate key is a minimal superkey (i.e., you
cant remove attributes from it and still have a
superkey.)
36Primary Keys
- A primary key for a table (entity type) is a
candidate key that the DB designer has chosen as
being the main way of uniquely identifying a row
(entity). Extra restriction Its attributes are
not allowed to have null values. - It could be that theres only one candidate key
in practice anyway, such as a persons ID number. - Primary keys are the main way of identifying
target entities in entity relationships, e.g.,
the way to identify someones employing
organization. - For efficiency reasons, the simpler primary keys
are the better. - Identity numbers (of people, companies, products,
courses, etc.), or combinations of them with one
or two other attributes, are the typical primary
keys in examples in RC.
37Strong and Weak Entity Types
- A weak entity type is one to which there is at
least one strong relationship going to it from
another entity type. - So on previous slide, Dependents is weak, because
there is a strong relationship to it from People. - A strong entity type is one that is not weak!
i.e. there are no strong relationships going to
it from another type. - A weak entity is existence-dependent on every
entity type that has a strong relationship going
to it. - Marys existence in the database as a member of
Dependents relies on the existence of person
1698674 in the database. (Doesnt mean Mary would
vanish from the planet if person 1698674 left the
database! And indeed Mary could herself be an
entity in type People even if 1698674 leaves.)
38Strong and Weak Relationships
- Also called identifying and non-identifying
relationships respectively. - A relationship from entity type A to entity type
B, mediated by having As primary key (PK) as an
attribute of B, is strong when Bs PK contains
As. - E.g., A People, B Dependents, where each
entity in People is identified by a PERS_ID (As
PK) and each entity in Dependents is identified
by a PERS_ID plus a first name and a
kinship/friendship attribute. - So a PK value in B could be (1698674, Mary,
child), meaning that this entity is the child
called Mary of person 1698674. - A relationship is weak when it isnt strong.
39Connectivity Cardinality
- Relationships are importantly categorized as to
uniqueness or multiplicity of entities at either
end connectivity. Has big effect on DB design. - Relationships can be further specified as to how
many entities allowed or required at either end
cardinality. - Also has significant effect on DB design.
40Relationship Participation
- Optional in a particular direction, X to Y
- an X entity occurrence does not require a
corresponding Y entity occurrence - i.e. the minimum number of Ys per X is 0
- Mandatory in a particular direction, X to Y
- an X entity occurrence requires a corresponding Y
entity occurrence - i.e. the minimum number of Ys per X is 1 or more
41Relationships and Foreign Keys
- A foreign key in a table T is a chosen collection
of attributes intended to match the attributes in
the primary key in another table. In essence, the
foreign key is that primary keys ambassador in
T. - Standardly, a relationship is represented by
means of foreign keys.
421M Connectivity between Tables
More than one employee allowed per organization,
but no more than one employer per person. NOTE
direction of use of a foreign key. Why so??
People
Organizations
43MN Connectivity between Tablesusing a Bridging
Entity Type
People
Employments
Organizations
44Examples in Two Styles of Diagram
45Multivalued Attributes in ERMs and ERDs
46Splitting the Multivalued Attribute into New
Namable Component Attributes
47OR, Replace the Attribute by a New Entity Set
with an Attribute identifying the Original
Attributes Separate Components
48Generalization Hierarchies in ERMs and ERDs
49A Generalization Hierarchy with Overlapping
Subtypes (Gs)and Disjoint Subtypes (G)
50Supertypes/Subtypes in ERDs
A supertype maintains a 11 relationship with
each subtype optional in the super-to-sub
direction and mandatory in the other
51Non-Binary and Recursive Relationships in ERMs
and ERDs
52Tables for a Ternary Relationship
CFR is just like a bridging entity type for a
binary MN relationship, but has 3 foreign keys
instead of 2
53Recursive Relationships
- Relationship can exist between occurrences of the
same entity set - E.g. marriage, management, parthood,
54Table for the 1M EMPLOYEE Manages EMPLOYEE
Recursive Relationship
55Tables for the MN Recursive PART Contains PART
Relationship
The COMPONENT entity type is just a bridging type
56Alternative Implementations of a 11 Recursive
Relationship
- As previously.
- MARRIED_VI is just a bridging entity type.
- MARRIAGE together with MARPART act similarly as
bridge.
57NORMALIZATION(Ch. 5 of RC)
58First Normal Form
- Tabular format in which
- All attributes are dependent on a chosen
primary key - Primary-key attributes do not have NULL values
- There are no repeating groups in the table
- All relational tables satisfy 1NF requirements
59A Table that is Not in 1NF because of repeating
groups
60Second Normal Form
- Table is in second normal form (2NF) if
- It is in 1NF and
- It includes no partial dependencies
- Convert to 2NF by creating a new table for each
part of the primary key that provides partial
dependencies, and moving out the corresponding
dependent attributes into the new tables.
61A Dependency Diagram for a Table that is merely
in 1NF (by having a partial dependency)
62Second Normal Form (2NF) Conversion Results
63Third Normal Form
- A table is in third normal form (3NF) if
- It is in 2NF and
- It contains no transitive dependencies
64Conversion to Third Normal Form
- For every transitive dependency, take its
determinant (the attribute or collection of
attributes on which the dependency rests) as a
PK for a new table. - Move the attributes dependent on the determinant
to that new table.
65Third Normal Form (3NF) Conversion Results
66The Boyce-Codd Normal Form (BCNF)
- A table is in BCNF if every one-attribute
determinant in the table is a candidate key - i.e., every attribute that determines any other
attribute determines all the attributes, and
therefore determines a unique row, so theres no
redundancy problem
67A Table in 3NF but not in BCNF
68Decomposition to BCNF
The middle diagram shows that changing the PK so
as to include C doesnt work
69Plusses/Minusses of Normalization
- PLUS Unnormalized tables in a production
database tend to have these defects - Data updates are less efficient because programs
that read and update tables must deal with larger
tables - Indexing is much more cumbersome
- MINUS Joining larger number of tables takes
additional disk input/output (I/O) operations and
processing logic - Reduces system speed
- Conflicts among design efficiency, information
requirements, and processing speed are often
resolved through compromises that may include
denormalization
70Tuples in a Table
People
- The tuples are
- ?9568876A, Chopples, 37gt
- ?2544799Z, Blurp, 21gt
- ?1698674F, Rumpel, 88gt
71Cartesian Product of Domains
- The set of all possible tuples formed from the
domains is called the Cartesian product of the
domains. - (This applies to any sets, not just domains in
tables.) - Notation, e.g. D ? E ? F ? G ? H
- if D, E, F, G, H are the sets (domains)not
necessarily different. - In discrete mathematics, any subset at all of
that Cartesian product is called a relation on
the sets in question (D, E, ) - even the whole of the product (even if infinite)
- and even the empty set.
- (The notion of relation is very useful in CS as a
whole.)
72Relation from a Table
People
- The relation at the moment is
- ? ?9568876A, Chopples, 37gt
- ?2544799Z, Blurp, 21gt
- ?1698674F, Rumpel, 88gt ?
73Relations from Somewhere to Somewhere
- Some convenient terminology of my own
- A relation R from A1, A2, A3, to B1, B2, B3,
is a subset of A1 ? A2 ? A3 ? B1 ? B2 ? B3 . - i.e., R ? A1 ? A2 ? A3 ? B1 ? B2 ? B3 .
- Same thing as a relation on A1, A2, A3, B1, B2,
B3, just different terminology.
74Partiality of Tables
- The relation in a table, considered as a relation
from its first attribute domain to the remaining
attribute domains, will almost always be merely
partial. - Similarly, any attribute-reordered version of the
tables relation will be merely partial. - We can sum this up by saying that the relation is
partial on each of its attribute domains.
75Functional Relations
- A relation from A to B is functional if, for any
a in A, there is at most one b in B such that ?a,
bgt is in R. - A total functional relation from A to B is called
a function from A to B. - A (merely) partial functional relation from A to
B is called a (merely) partial function from A to
B. - Can generalize a relation from A1, A2, A3 to
B1, B2, B3, is functional if, for any a1, a2,
a3, in A1, A2, A3, respectively, there is at
most one b1, b2, b3, in B1, B2, B3,
respectively such that ?a1, a2, a3, , b1, b2,
b3, gt is in R.
76Functional Relations arising from Dependency
Relationships
- Suppose attribute X is functionally dependent on
( determined by) attributes A, B, in a table. - Restrict to the subtable that just has attributes
X and A, B, , with A, B, first. - Then the relation in this subtable is a partial
function from the A, B, domains to the X
domain. - In particular, each attribute X outside the
primary key of a table is functionally dependent
on the primary key, so we have a partial function
from the PK domains to the X domain.
77Other Categories of Relation
- A relation R from A to B is one-to-one (1-1) if,
for any a in A, there is at most one b in B such
that ?a, bgt is in R, AND for any b in B, there is
at most one a in A such that ?a, bgt is in R. - That is, both the relation and its inverse from B
to A are functional. - To put it another way it is functional and
different members of A map to different members
of B. - Or again Different members of A map to different
members of B and different members of B map to
different members of A.
78 - A relation R from A to B is many-to-one if it is
functional but not necessarily one-to-one i.e.,
there may be at least one case of different
members of A mapping to the same member of B. - A relation R from A to B is one-to-many if it is
NOT necessarily functional but its inverse from B
to A is functional i.e., there may be at least
one case of a member of A mapping to more than
one member of B, but each member of B can map to
at most one member of A. - A relation R from A to B is many-to-many if
neither it nor its inverse is necessarily
functional i.e., there may be at least one case
of a member of A mapping to more than one member
of B, and there may be at least one case of a
member of B mapping to more than one member of A.
79Example Continued
- So at any given moment the relation might be
- ?Person1, Org1gt, ?Person2, Org1gt, ?Person3,
Org1gt, - ?Person4, Org2gt, ?Person3, Org2gt
- Each Person.. and Org.. is an entity ...
therefore represented as a row of the
corresponding table ... therefore itself
mathematically represented as a tuple of
attribute values - So ?Person1, Org1gt could be, in more detail,
- ? ?E156, Sam, Finks, I678gt, ?I678, IBM,
USAgt gt - Note the nested tuples.
80Bridging Entity Types
- Recall that bridging entity types are brought in
to represent MN relationships (and similarly
MNP relationships, etc.) - People/Organizations again the relation in the
bridging table would look like ? E156, I678gt, ?
E257, I996gt, ?E714, I678gt, . - This relation can also be said to correspond to
the original People-Organization relationship,
but is abstracted from the above relation by
replacing tuples representing entities, such as
?E156, Sam, Finks, I678gt, by the PK values in
them, such as E156.
81Connectivities
- If a relationship from an entity type to another
is 11, then at any moment the actual relation
must be one-to-one (1-1). - If the relationship is 1M then the relation at
any moment is allowed to be one-to-many (but may
by chance be one-to-one). - If the relationship is MN then the relation at
any moment is allowed to be many-to-many (but may
by chance be one-to-one, one-to-many, or
many-to-one).
82Optionality/Mandatoriness
- If a relationship from an entity type E to
another type F is mandatory then the relation at
any moment is total when restricted to the set
of entities currently in E (and NOT on the whole
entity set of E, unless all possible entities of
type E are in the E table!!). - Observation So if it is also one-to-one or
many-to-one it is in fact a function on the
current set of E entities to the set of
organizations. - If a relationship from an entity type E to
another type F is optional then the relation is
not required to be total in the above sense (but
may happen to be).
83Some Operations on Sets in General
- Union of sets A and B
- A ? B the set of things that are in A or B (or
both). - NB no repetitions allowed!
- Intersection of sets A and B
- A ? B the set of things that are in both A and
B. - Difference of sets A and B
- A ? B the set of things that are in A but not
B. - Note also notated by a backslash instead of a
minus sign.
84General Set Operations contd.
- I need to define also the non-standard notion of
flattened Cartesian product of two sets A and
B, applicable when the members of A and B are
already tuples. Notated by the symbol ?
(underlined multiplication symbol). - A ? B the set of tuples that are the
concatenations of members of A and members of B.
E.g., if lta,b,cgt is in A and ltd,e,fgt is in B then
lta,b,c,d,e,fgt is in A ? B.
85General Set Operations contd.
- So if A is the People relation and B is the
Organizations relation, - A ? B has members of form
- ? ?E156, Sam, Finks, I678gt, ?I459,
Dell, UKgt gt - BUT
- A ? B has members of form
- ?E156, Sam, Finks, I678, I459, Dell,
UKgt
86Select
- SQL
- SELECT FROM WHERE
- Note its the WHERE part that is actually doing
the selection according to a criterion. - Maths of relations
- No particular notation for result relation. Could
express as - t ?R t satisfies C
- where R is the relation in the given table and C
is the criterion. - Relational algebra notation in handout
- Result table is ?C(T) where T is the given table.
- More compact than SQL notation.
- Problem relations and RA dont account for
duplicates of rows, if there are any in the given
table.
87Project
- SQL
- SELECT FROM
- Note its the after the SELECT that is
actually doing the projection (selection of
specified attributes)!! - Maths of relations
- No particular concise notation for result
relation. Could express as - t ?P there is a tuple u in R such that t is u
with such and such components removed - where R is the relation in the given table and P
is the Cartesian product of the domains of the
selected attributes. - Relational algebra notation in handout
- Result table is ?X(T) where T is the given table
and X is the list of selected attributes. - Problem relations and RA dont account for
duplicates of rows, if there are any in the given
or result tables.
88Union and Intersection
- SQL
- UNION, INTERSECT (UNION ALL, INTERSECT ALL)
- Maths of relations
- Result relations are R1 ? R2 and R1 ? R2
- where R1and R2 are the relations in the given
tables (under the assumption that they have their
attributes listed in the same order). - Relational algebra notation in handout
- Result tables are T1 ? T2 and T1 ? T2 where T1
and T2 are the given tables. - Problem relations and RA dont account for
duplicates of rows, so dont handle UNION ALL or
INTERSECT ALL.
89Difference
- SQL
- MINUS or EXCEPT (MINUS ALL or EXCEPT ALL)
- Maths of relations
- Result relation is R1 ? R2
- where R1and R2 are the relations in the given
tables (under the assumption that they have their
attributes listed in the same order). - Relational algebra notation in handout
- Result table is T1 ? T2 where T1 and T2 are the
given tables. - Problem relations and RA dont handle EXCEPT ALL.
90Product or Cross Join
- SQL
- SELECT FROM two or more tables
- NB its the mere listing of the tables that does
the Product, but its possible also to write - SELECT FROM T1 CROSS JOIN T2 CROSS JOIN ...
- Maths of relations
- Result relation is R1 ? R2 where R1and R2 are the
relations in the given tables. - Relational algebra notation
- Result table is T1 ? T2 where T1 and T2 are the
given tables. - Problem relations and RA dont account for
duplicates of rows, if there are any in the given
tables.
91Natural Join
- Links two tables by finding rows in each that
match on the attributes the two tables have in
common (if any), and then joining the matching
rows together in a natural way. - The common attributes or columns are called the
join attributes or columns) just the AGENT_CODE
attribute in above example - Can be thought of as the result of a three-stage
process - the PRODUCT of the tables is created
- a SELECT is performed on the resulting table to
yield only the rows for which the join-attribute
values (e.g. AGENT_CODE values) are equal - a PROJECT is now performed to yield a single copy
of each join attribute, thereby eliminating
duplicate columns
92Natural Join (continued)
- SQL
- SELECT FROM T1, T2 WHERE explicit
- statements of equality of ALL the shared
attributes ... - NB its possible also to write
- SELECT FROM T1 NATURAL JOIN T2
- and see book section 7.2 for more.
- Maths of relations
- No standard concise notation. Result relation
could be expressed as the result of Project and
Select operations on R1 ? R2 where R1 and R2 are
the relations in the given tables. - Relational algebra notation
- Result table is T1 ... T2 where T1 and T2 are the
given tables and the is a symbol I cant find
in PowerPoint!
93Other Forms of Join
- Equijoin (often what you have been doing in SQL
work) - Links tables on the basis of an equality
condition that compares SPECIFIED attributes of
each table, rather than automatically taking the
common attributes. - These specified join attributes do not have to
have the same name. - Result does not eliminate duplicate columns
- Outer Join (various forms) includes (some)
non-matching rows