Title: CMSC 424
1- CMSC 424
- Database Design
- Section 401
- Dr. David Kuijt
2Contact Info
- Professor David Kuijt
- Office AVW 3205
- Phone 5-0534
- Email kuijt_at_cs.umd.edu
- Office Hours T/Th 315-400 (or by appointment)
- TA Debbie Heisler
- Office AVW 3270
- Phone 405-7027
- Email heisler_at_cs.umd.edu
- Office Hours TBA
3Basic Information
- Required text
- Korth Silberschatz Database System Concepts,
Fourth Edition, McGraw Hill 2001. - Warnings
- Late homework or projects are not acceptable.
Hand in what you have finished. Exceptions will
be made only for emergencies or medical reasons
with a doctor's note. - No makeup exams. Exceptions as above.
- Cheating will result in an immediate grade of XF
("failure through academic dishonesty" -- this
goes on your permanent transcript), and may
result in suspension or expulsion from
University. This is your only warning. Don't do
it.
4Motivation
- We live in a database world. The simplest acts
are tied to databases. The last time I called
out for delivery pizza, it involved at least four
enormous databases. What were they? - Pizza Hut knew what I had ordered before -- they
asked if I wanted the same pizza as last time.
They probably store lots more information than
that -- perhaps all my old order information.
They could use this information to make corporate
decisions (quantities of materials to order
forecasting pizza trends) as well as a local aid.
5Motivation (2)
- Every delivery food place around here uses caller
ID. That's a relatively simple database, just
giving names and incoming telephone number, but
it helps them avoid some types of fraudulent
orders and errors when writing down names,
addresses, and stuff. - I used a telephone. When you use a telephone,
all the details about the call are stored in a
database. Call length, what number you called,
time of call, billing information, and so on.
Cell phone databases are even more complex.
6Motivation (3)
- I paid by credit card. Huge databases are
involved. - Every time somebody runs your credit card through
a swipe reader or types in the number, they're
checking information in a database. - Is this a valid credit account?
- Does it have enough money to cover the bill?
- Is the credit card stolen?
- Debit the merchant account, credit the consumer
account - Whether the transaction is accepted or rejected,
all the details are recorded in a database
somewhere.
7More Motivation
- Every time you go to an ATM, use a credit card,
buy something with a UPC bar code at a
supermarket or department store, go to a movie,
concert, or Caps game, register for classes, or
get a parking ticket in Lot 4, you are working
with databases. - Everything in your wallet that isn't a photograph
is an entry in a database somewhere - medical plan cards
- credit cards
- student ID
- driver's license
- membership cards in clubs or interest groups
- everything. Even the currency!
8Final Motivation
- Databases are all around us.
- Knowledge is power.
- Databases give us power (the ability to do things
we couldnt otherwise do) - they give other people power over us, and
knowledge about what we are doing. - This class is about Databases.
- So what is a Database?
9What is a Database?
- At the simplest level, a database architecture
has two components. - (1) Data.
- Usually a whole lot of it
- Representing multiple types of different objects
- Each type may be related to itself and to other
types in multiple ways - (2) A set of methods to access and manipulate the
data.
10Data
- For any reasonable-size database the data may be
quite complex. - It is an attempt to record or model all the
aspects of the real world that are important to
one specific purpose -- telephone calls, for
example, or credit card accounts. - Lots of different objects need to be stored as
data, and they need to be stored in such a way as
to reflect the ways that the objects can interact
with each other.
11Database Example
- For example, consider a local telephone system
database. Types of data stored include - Account information
- customers individuals, groups, companies that
have leased numbers - billing addresses, payment history, calling plans
and billing contracts - Hardware information
- network structure (call routing),
- hardware age, reliability, and maintenance
information, - system load tracking,
- network billing pattern (what numbers are long
distance from what other numbers, and what ones
are local)
12Database Example (2)
- Local telephone system database continued.
Additional types of data would include - Call information
- start and end time,
- telephone number that initiated the call
- telephone number(s) that received the call
- All that information could be stored in files
with much less fuss and bother. Why use a
database? Why not just store the information in
flat files?
13Why Not Flat Files?
- Why use a database? Why not just use a flat
file? - Databases have a number of advantages over flat
files. - Data Access. The set of programs that provide
access to a database allow much more complex and
flexible queries to the database with greater
efficiency and convenience. - Reduced duplication and better control over data
consistency. Data redundancy is bad. Which item
to change in an update? How do you know that
you've found all the copies? Data inconsistency
(disagreement between various copies of the same
data) is a serious problem. - Integrity Constraints can be enforced inside a
database -- telephone numbers all 10 digits
phone numbers in Maryland all have the first
three numbers being 410 or 301.
14Why Not Flat Files? (2)
- Uniform access and control of data using a
standard language - Data Independence. We want the data to be
independent of the representation chosen for it
within the system. Tying the data to a given
representation is what caused the Y2K fuss --
only two digits were used for a "year" field. - Concurrency control. Multiple users on a single
database is a big advantage. - Recovery.
- Security. Different users of the database may
need different levels of access to information. - Centralized Control
- Platform independence (portability). Since the
internal file structure and access program
details are hidden from the user, it is much
easier to use the database on multiple platforms.
15Data Abstraction
- Most users don't need to understand all the
details of the implementation and data design of
a complex database. To make a database
convenient to use, the system provides users with
an abstract view of the data, limiting the
information available to them. There are usually
three levels of data abstraction. - Physical Level
- Conceptual Level
- View Level
16Data Abstraction (2)
- Physical level. The actual implementation
details of low-level data structures are
described at this level. - Conceptual level. This level describes all the
different data types that exist by defining a
relatively small number of simple structures,
including all the relationships that these data
types have with each other. Implementation of
these objects might be complex, but it is hidden
from the user at this level. Database
administrators are usually the only ones who have
access at this level.
17Data Abstraction (3)
- View level. There may be multiple different
views, each of which represents a simpler subset
of the functions and data available at the
conceptual level. Different user types may
require different parts of the database (for
example, a bank account database might be
accessed by cashiers, account holders, credit
card companies, and the bank's payroll manager.
Each of them can only access a small part of the
full database of bank account information).
Creating a number of restricted views makes the
database more useful for the individual user
types, giving each type access according to the
needs of that type.
18Data Abstraction (4)
- Definition a Schema is a specification of a
particular database using a particular data
model. - The three levels of data abstraction are often
referred to as - External Schema(s) (for the view level(s)).
- Conceptual Schema (for the conceptual level)
- Internal Schema (for the physical level)
19Database as Model
- A model represents a perception of a real system
- Models help us manage or understand the real
world system they represent. - When modeling a system we select aspects and
characteristics we want to represent we abstract
them to form a simple(r) system - examples a map, an airplane flight simulator,
computer weather analysis program - A database is a model of reality
20Data Models underlying the Database
- The data model is a collection of conceptual
tools for describing data and its attributes - data objects
- interrelationships of the data
- data semantics and consistency constraints
- There are two well-established data models used
in database design - Entity-Relationship (E-R) model
- Relational model
- older methods included the Network and
Hierarchical data models - Each was tied closely to the underlying
implementation, which made it more difficult to
model data and to modify or update the database.
As a result they arent much used any more
21Entity-Relationship Model
- Diagram based model
- Two primitives
- Entities -- each represents a unique real-world
object - Relationships -- each represents an association
among several entities - Each are associated in sets of the same type (for
example, one entity set might be customer,
representing the set of all entities that
represent customers at a given bank) - Third important notion Attributes
- Entities are associated with a set of attributes
22Diagram-based Model
23Entities
- Entity a distinguishable object we want to model
- e.g., room CSI 3120, Celine Dion, Elizabeth I of
England - Entities have attributes (single-valued
properties) - e.g., a person has a name, SSN, gender,
- if an attribute has more than a single value, we
should model it as an Entity - Entity Set a set of entities of the same type
- e.g., CLASSROOMs, SINGERs, HISTORICAL MONARCHs
- Entity Sets may overlap
- CSI 3120 is a member of CLASSROOMs and also a
member of CSI BUILDING ROOMs.
24Relationships
- Relationship is an association among entities
- David Kuijt teaches-in CSI 3120
- Relationship Set is a collection of relationships
of the same type - FACULTY teach-in CLASSROOMs
- Relationships may also have attributes
- e.g., the relationship teach-in has an attribute
weekday and another attribute time to store
the day and time in which a given Entity of the
set FACULTY teaches in a given Entity of the type
CLASSROOM
25Example Database Design (1)
- Application library database. Authors have
written books about various subjects different
libraries in the system may carry these books. - Entities (with attributes in parentheses)
- Authors (SS, name, tel, birthdate)
- Books (ISDN, title)
- Subjects (sname)
- Libraries (lname)
- Relations associating entities in square
brackets - Wrote-on Authors, Subjects
- Carry Libraries, Subjects
- Index Subjects, Books
26Diagram of Initial Database Design
27Poor Initial Design
- Our first design is a poor model of the
real-world system we are examining. Problems in
our first design - no relationship associating authors and books
- no relationship associating libraries and books
- common queries will be complex and difficult
- Q what libraries carry books by a given author?
- Q what books has a given author written?
- Q who is the author of a given book?
- Q how many copies of a given book exist at each
library? - Q what edition of a book does the library have?
28Example Database Design (2)
- Application library database as before
- Entities (with attributes in parentheses)
- Authors (SS, name, tel, birthdate)
- Books (ISDN, title)
- Subjects (sname)
- Libraries (lname)
- Relations associating entities in square
brackets (attributes in parentheses) - Wrote Authors, Books
- In-stock Libraries, Books (quantity, edition)
- Index Subjects, Books
29Diagram of Improved Database Design
30Keys
- Fundamental concept for databases
- Must be able to uniquely identify things within a
database (in the E-R model, Entities and
Relationships) - Avoid duplication of results in a search
identify data redundancy in other operations - Halt search on positive results
- Quick lookup in underlying data structures used
at the Physical Level of abstraction - Examples of possible keys
- Student ID number (SS) is used as a key for most
UMD databases having to do with students
31Entity Keys
- Superkey set of attributes whose values uniquely
identify the entity - candidate key a minimal superkey (a minimal
subset of a superkey whose values still uniquely
identify the entity) - primary key if there is more than one possible
candidate key, one is chosen as the primary one
used for most entity-identification purposes - weak entity has no primary key instead it
depends upon another strong entitys primary key
to exist - e.g., CHILDren of EMPLOYEEs are weak the primary
key of EMPLOYEE in addition to the attributes of
the CHILD are used for identification - weak entities are existent dependent on a
strong entity -- when the strong entity gets
deleted, so does the weak one
32Relationship Keys
- Depend upon the entity mapping of the
relationship - one-one the primary key of any of the entities
can be used to uniquely distinguish a given
relationship between two unique entities. - one-many the primary key of the many entity,
plus possibly a subset of the attributes of the
relationship, will uniquely identify a given
relationship - e.g., MOTHER gave-birth-to CHILD to identify a
specific gave-birth-to relationship requires the
primary key of MOTHER and possibly the (date) and
(time) attributes of gave-birth-to - many-many the union of the primary keys of the
entities associated, plus possibly a subset of
the attributes of the relationship, will uniquely
identify a given relationship - e.g., PERSON married PERSON SS of both and
possibly date
33Special Cases
- Relationships may associate different entities of
the same type - Ternary versions of the above
- M-N relationships many-one mappings are often
more useful in practice than many-many mappings. - DUMMY Entities can be used to convert an M-N
mapping relationship to a pair of relationships,
one M-1 and one N-1.
34Specialization-Generalization(ISA Hierarchy)
- This is a way to represent entity complexity
- specialization top-down refinement of entities
with distinct attributes - Entity type BANK ACCOUNT might be subdivided into
related but different types CHECKING ACCT and
SAVINGS ACCT - generalization bottom-up abstraction of common
attributes - Course types DATABASE, SYSTEM, and NETWORK all
have common attribute (project). From them we
can abstract a new course type PRACTICAL COURSE - other common course attributes are included
(e.g., course number)
35ISA Hierarchy Example Top-down Refinement
- Account entity with attributes balance and number
- additional complexity we want to represent two
subtypes of account - Savings Account with attribute Interest Rate
- Checking Account with attribute Overdraft Limit
36ISA Hierarchy Example Bottom-up Abstraction
- Three related entities with similar attribute
project - we abstract a new type of super entity Practical
Course and link the three entities as subtypes - other shared attributes (e.g., course number) are
also promoted to the upper level entity
37Aggregation(Part-of Hierarchy)
- This is a way to represent relationship
complexity - relationships among relationships are not
supported by the E-R model - often we want to model lower-level relationships
differently - Groups of entities and relationships can be
abstracted into higher level entities
38Part-of Hierarchy Example
- Entities driver, car, tires, doors, engine,
seats, piston, valves - Relationship drives is insufficient to model the
complexity of this system - Part-of relationships allow abstraction into
higher level entities (piston and valves as parts
of engine engine, tires, doors, seats aggregated
into car)
39Mapping an E-R Schema to Tables
- Motivation - translating E-R database designs
into Relational designs - Both models are abstract, logical representations
of a real-world enterprise - Both models employ similar design principles
- Converting an E-R diagram to tables is the way we
translate an E-R schema to a Relational schema. - Later on well examine how to convert a
Relational schema to an E-R schema
40Mapping an E-R Schema to Tables (2)
- Strong Entity E with primary key PK and
attributes A, B, E(PK, A, B, ) - Weak Entity F with (non-primary) key WK and
attributes C, D, depending upon E above for
primary key
F(PK, WK, C, D, ) - Relationship R with attributes L, M, and
associating Entities E (with primary key PK), E2
(PK2), E3 (PK3),
R(PK, PK2, PK3, , L, M, ) - Relationships between weak entities and the
strong one on which they are dependent usually do
not require representation because it is usually
a many-one relationship with no attributes on the
relationship (they are on the weak entity) and so
the resulting table R(PK, WK) is a subset of the
weak entity itself.
41Table Details
- The whole table represents a single Entity Set or
Relationship Set. - Each entry (row) in the table corresponds to a
single instance (member in that set) - For an Entity Set each column in the table
represents an attribute in the E-R diagram - For a Relationship Set each column in the table
represents either an attribute of the
Relationship or one of the parts of the primary
key of the Entity Sets it associates
42Mapping an E-R Schema to Tables (3)
- ISA relationships choose either to
- Represent the super class entity, then represent
each subclass with the primary key of the super
class and its own attribute set. This is very
similar to the way weak entities are treated. - Or, map the subclasses to separate relations and
ignore the whole super class. This is good when
the subclasses partition the whole superclasses
between them (the subclasses are disjoint and the
union of the subclasses covers the whole super
class). - Aggregate (part-of) relationship
- Translation is straightforward -- just treat the
aggregate as an entity and use the methods
defined above. - With last weeks lecture, this covers the
material of chapter 2.
43Relational Database Model
- Most popular logical data model
- Relations (also called tables) represent both
Entity Sets and Relationship Sets. - Attributes form the columns of the table (column
and attribute are synonymous) - Each row represents a single entity or
relationship (called a row or tuple) - Each instance of an attribute takes values from a
specific set called the domain of the column (the
domain defines the type)
44Relational Database Model (cont)
- A relation schema is made up of the name and
attributes of a relation with their underlying
domains - A database schema is a set of all relation
schemas. - The notions of keys, primary keys, superkeys are
all as previously described
45Query Languages
- a language in which a user requests information
from the database - a higher level language than standard programming
languages - query languages may be procedural or
non-procedural - procedural languages specify a series of
operations on the database to generate the
desired result - non-procedural languages do not specify how the
information is generated - most commercial relational database systems offer
a query language that includes procedural and
non-procedural elements
46Relational Algebra
- procedural query language
- set of operators that map one or more relations
into another relation - closed algebraic system
- best feature - operations on operations
- form relational algebraic expressions
- two types of operations set-theoretic and
database specific
47Relational Algebra Operations
- database specific
- (horizontal) selection (?)
- (vertical) projection (?)
- join
- outer join
- semijoin
- division
- set operators
- union
- difference
- intersection
- cartesian (cross) product
48Example Relations
- EMP ename salary dept
- Gary 30K toy
- Shirley 35K candy
- Christos 37K shoe
- Robin 22K toy
- Uma 30K shoe
- Tim 12K (null)
- DEPT dept floor mgr
- candy 1 Irene
- toy 2 Jim
- men 2 John
- shoe 1 George
49Database Specific Operators
- (horizontal) selection (?)
- picks a subset of the rows
- (vertical) projection (?)
- picks a subset of the columns
- join
- creates a new relation (table) out of two
- equijoin (based upon equality of attributes)
- natural join (equijoin plus projection to
eliminate duplicated columns)
50Set Operators
- union
- both relations must be union-compatible -- same
degree and same domains - set difference
- both relations must be union-compatible as above
- intersection
- same deal
- cartesian (cross) product
- note similarity to join operation join can be
defined as a cross product followed by a
selection criteria
51More Operators
- rename (?)
- results of operations in the relational algebra
do not have names - it is often useful to be able to name such
results for use in further expressions later on - conceptually similar to an assignment operator in
most programming languages - semijoin
- very useful in practical implementation of large
queries - semijoin of R and S is equivalent to the join of
R and S projected onto the attributes of R.
52Still More Operators Outer Join
- outer join is an extension of the join operation
to deal with missing information - three types left outer join, right outer join,
and full outer join - left outer join computes the natural join, then
takes all tuples (rows) in the left relation that
did not match on the join attribute and includes
them in the result, with all attributes of the
right relation padded with null values - right outer join is the same, except non-matching
tuples in the right relation are included in the
result padded with null values - full outer join includes all non-matching tuples
of both relations appropriately padded - see examples in text, p108-109
53Still More Operators
- division
- R/S given R(A,B) and S(B), then a given tuple t
is in R/S if for all s in S there exists an r in
R such that r.Bs.B and t.Ar.A. - So tuple t with attribute t.A is in the result if
and only if R contained tuples (t.A, B1), (t.A,
B2), (t.A, B3), for every possible value Bi
contained in S. - Note that S must be defined on a subset of the
attributes of R for the operation to be
meaningful.
54A Short Interlude Integrity
- the preceding slides covered chapter three up to
section 3.3 - before attacking chapter 4 (SQL), were going to
make a brief excursion up to chapter 6, touching
sections 6.1 - 6.4 - Integrity constraints attempt to enforce data
consistency and prevent accidental damage to the
database during updates - Weve already seen two forms of integrity
constraints - key declarations (stipulating that certain
attributes form a candidate key for a given
entity set) - mapping form of a relationship (one-one,
one-many, many-many)
55Integrity Constraints
- Domain Constraints
- simplest form of integrity constraint
- type declarations are one such domain constraint
(e.g., integer, floating point, double-precision,
fixed length character string). - domains can be further restricted (e.g., check
clause in SQL can ensure that hourly wages are ?
4.00 dollars) - easily tested whenever a new data item is entered
into the database - extensions like date or currency can be easily
supported on a strongly typed programming
language - Null values can be useful for values to be filled
in later, but some attributes may need to be
specified as not Null (e.g., primary keys
cannot have a null value)
56Integrity Constraints (2)
- Key Constraints
- keys must have unique values
- primary key -- a candidate key declared primary
- unique key -- a candidate key
- foreign key -- a set of attributes that are a
primary key for some other relations - foreign keys are an important concept because we
need to treat foreign keys differently from other
attributes (for example, protecting their
uniqueness and insuring referential integrity)
even though they arent a primary key in the
current relation
57Referential Integrity
- We often want to be able to ensure that an
attribute value in a tuple of a relation appears
in at least one tuple of another relation. For
example - EMP(eno, ename, salary)
- DEPT(dno, dname, floor)
- WORKS-IN(eno, dno, hours)
- note that eno is a foreign key in WORKS-IN
- We want the following to be true
- ?eno(WORKS-IN) ? ?eno(EMP) (every eno is a real
employee) - ?dno(WORKS-IN) ? ?dno(DEPT) (every dno is a real
department) - SQL allows the declaration of domain/key/referenti
al integrity constraints with the clause check in
its DDL
58Referential Integrity SQL DDL Example
- Create table customer
- (cust-name char(20) not null,
- cust-street char(30),
- cust-city char(30),
- primary key (cust-name))
- Create table branch
- (branch-name char(15) not null,
- branch-city char(30),
- assets number,
- primary key (branch-name),
- check (assets ?0))
- Create table account
- (account-no char(10) not null,
- branch-name char(15),
- balance number,
- primary key (account-no),
- foreign key (branch-name) references branch,
- check (balance ?0))
- Create table depositor
- (cust-name char(20) not null,
- account-no char(10) not null,
- primary key (cust-name),
- foreign key (cust-name) references customer,
- foreign key (account-no) references account)
59Referential Integrity and Database Modifications
- Database modifications may violate referential
integrity - Insertion inserting a value into the referencing
relation that is not in the referenced relation - Deletion deleting the last example of a given
value in the referenced relation and leaving that
value in the referencing one - proper handling may lead to cascading deletions
- Update to the referencing relation (constraints
as Insertion) - Update to the referenced relation (constraints as
Deletion)
60Assertions
- An assertion is an arbitrary expression that the
database must always satisfy - e.g., student GPA 2.8, or sum(all-charges) credit-line
- Domain constraints and referential integrity
constraints are special forms of assertion that
are easy to test - SQL supports assertions as follows
- create assertion check
- When an assertion is made the system checks it
for validity. If it is validated, every future
modification of the database is checked against
the assertion and allowed only if it is not
violated. - This can be very expensive if assertions are
complex or numerous
61Triggers
- A trigger is a statement that the system executes
automatically as a side effect of an update to
the database. - A trigger has two parts
- condition under which it is executed
- actions to be taken if it is executed
- Example instead of having an assertion balance
?0 for a checking account, use a trigger on
negative balances that sets the balance to zero
and creates a new loan for the amount of the
overdraft - Triggers make the system reactive
- Triggers are also called active rules
- Like Assertions, Triggers can be very expensive.
62Trigger Example
- define trigger overdraft on update of account T
- (if new T.balance values
- (T.branch.name, T.account-number,
- T-customer-name, - new T.balance)
- update deposit S
- set S.balance 0
- where S.account-number T.account-number))
- (note SQL syntax given here is slightly
different from that in the text, p235)
63SQL (Structured Query Language)(Astrahan, Gray,
Lindsay, Selinger, )
- Most common and influential commercial query
language well established as the industry
standard query language for relational databases - Developed (as Sequel) at the IBM Research Lab
in San Jose in the early 70s - Four basic commands
- select
- insert
- delete
- update
- Result of each query is a relation
64SQL Example
- select e.name
- from emp e
- where e.age 30
- e is a tuple variable ranging over the emp
relation - a tuple variable followed by a . and an
attribute is an indexed tuple variable and
specifies the corresponding attribute of the
tuple, very similarly to in many programming
languages - what follows the keyword select is the target
list - what follows from is the tuple variable list and
consists of a list of relations and variable
names - what follows where is the qualification clause
an arbitrary boolean expression
65SQL
- Basic format of the select command
- select distinct target_list
- from tuple_variable_list
- where qualification
- order by target_list_subset
- Semantics
- evaluate qualification select the subset of the
cartesian product of the ranges of the tuple
variables that satisfy the qualification - evaluate target list eliminate columns that are
not in the target list - prepare the result as a relation with columns
according to the target list - if distinct is used, eliminate duplicate tuples
- if order by is used, sort the result accordingly
66SQL some example queries
- We will give a number of simple query examples
using the following relational schema - sailors(sid, sname, rating)
- boats(bid, bname, colour)
- reserve(sid, bid, date)
- (1) Find the names of sailors who have reserved
boat 2 - select s.sname
- from sailors s, reserve r
- where s.sidr.sid and r.bid2
67SQL example queries (2)
- (2) Find the names of sailors who have reserved a
red boat - select s.sname
- from sailors s, reserve r, boats b
- where s.sidr.sid and r.bidb.bid and
b.colourred - (3) Find the colours of all boats reserved by Pat
- select b.colour
- from sailors s, reserve r, boats b
- where s.snamePat and s.sidr.sid and
r.bidb.bid
68SQL example queries (3)
- (4) Find the names of sailors who have reserved
at least one boat - select s.sname
- from sailors s, reserve r
- where s.sidr.sid
- (5) Find the names of sailors who have reserved a
red or a green boat - select s.sname
- from sailors s, reserve r, boats b
- where s.sidr.sid and r.bidb.bid and
- (b.colourred or b.colourgreen)
69SQL example queries (4)
- (6) Find the names of sailors who have reserved a
red and a green boat - select s.sname
- from sailors s, reserve r, boats b, reserve r2,
boats b2 - where s.sidr.sid and r.bidb.bid and
b.colourred - and s.sidr2.sid and r2.bidb2.bid and
- b2.colourgreen)
- Note in the above query if sailor Pat has
reserved one green boat and two red ones, the
name Pat will appear twice in the results. To
avoid that, use the keyword distinct in the
select line, as in - select distinct s.sname
70SQL
- Basic format of the select command
- select distinct target_list
- from tuple_variable_list
- where qualification
- order by target_list_subset
- Simple query examples use this relational schema
- sailors(sid, sname, rating)
- boats(bid, bname, colour)
- reserve(sid, bid, date)
71SQL target list
- is an abbreviation for all attributes in the
from list - select
- from sailors s
- where order by s.rating
- Each item in the target list can be as general as
attribute_name expression, where the expression
is any arithmetic or string expression over
indexed tuple variables and constants. It can
also contain some built-in functions like sqrt,
sin, mod, etc. as well as aggregates (coming up
later)
72SQL target list expression example
- With rating an integer from 1 to 10, this query
gives a rating bonus to sailors who sailed two
different boats on the same day. - select s.sid, s.sname, ratings.rating2
- from sailors s, reserve r, reserve r2
- where s.sidr.sid and s.sidr2.sid and
r.dater2.date and - r.bid ! r2.bid
- Whats wrong with the above?
- What happens if s.rating 9 before this query?
- Domain constraints might take care of this, but
we need to be careful
73SQL
- Qualifications each item in a qualification
(where clause) can be as general as
expressionexpression - Example
- select name1 s1.sname, name2 s2.sname
- from sailors s1, sailors s2
- where 2s1.rating s2.rating-1
74SQL
- Further elaboration
- tuple variables can be implicit if the system can
figure out which relation each attribute belongs
to - table names can be used as tuple variables
- Example find names, ages, and departments of
employees who are over 40 and work on the first
floor. - select ename, age, emp.dname
- from emp, dept
- where age40 and floor1 and emp.dnamedept.dname
75SQL
- SQL provides set operators union, intersect, and
minus - Example find the names of employees who work in
the toy department and make at most 60K - (select ename
- from emp
- where dnametoy)
- minus
- (select ename
- from emp
- where sal60K)
76SQL
- Note that it is usually possible to phrase a
single query in multiple ways. The previous
query could have also been written - (select ename
- from emp
- where dnametoy)
- intersect
- (select ename
- from emp
- where sal?60K)
77SQL
- Or also (even simpler)
- select ename
- from emp
- where dnametoy and sal?60K
- Writing a query in different ways will usually
change how efficient the query is -- the above
query is very likely to be faster than the
example using intersects, and that one is likely
to be faster than the one using minus.
78SQL
- SQL also provides set operators contains (a set
being a superset of another) and exists (a set
not being empty). Both return Boolean results,
so may be negated (using not).
79SQL
- Example find the names of employees who manage
all the departments on the first floor. - select mgr
- from dept d1
- where (select d2.dname
- from dept d2
- where d1.mgrd2.mgr)
- contains
- (select dname
- from dept
- where floor1)
80SQL
- SQL allows nested queries using the keyword in
- Example find the names of employees who work on
the first floor. - select ename
- from emp
- where dname in
- (select dname
- from dept
- where floor 1)
- The same query in flat form is
- select dname
- from emp, dept
- where emp.dnamedept.dname and floor1
81SQL
- The connective in tests for set membership.
Similar connectives are - not in (set non membership)
- op any (op relationship with some tuple in the
set) - op all (op relationship with all tuples in the
set) - where op is one of (, !, , )
- Example find the names of employees who make
more than everybody on the first floor. - select ename
- from emp
- where sal all
- (select sal
- from emp, dept
- where emp.dnamedept.dname and floor 1)
82SQL
- Scoping of variables works exactly as in Pascal
or C - Example find the names of students who take a
course from their advisor. - select sname
- from student
- where s in
- (select s
- from enroll
- where c in
- (select c
- from class
- where profstudent.advisor))
83Recap SQL
- Four basic commands
- select
- insert
- delete
- update
84SQL Insert
- Insert command format
- insert into relation_name values (value_list)
- or
- insert into relation_name select_statement
- Semantics of insert
- format one add the tuple corresponding to
value_list into relation_name - format two execute the select statement, then
add all the resulting tuples into relation_name - Example
- insert into student values (1, Carey, CS,
Stonebraker)
85SQL Insert
- Example relation register(S, name, paid)
- in which registered students are recorded. After
the end of registration week, we execute - insert into student
- select r.s, r.name
- from register r
- where r.paidyes
86SQL Delete
- Delete command format
- delete relation_name where qualification
- Semantics of delete execute the corresponding
select command - select full_target_list (or )
- from relation_name
- where qualification
- and then remove the resulting tuples from
relation_name
87SQL Delete
- Example with the following schema
- student(s, name, major, advisor)
- enroll(s, c, grade)
- course(c, dept)
- The following command expels CS majors who
received a grade of less than 2.5 in a CS course - delete student
- where majorCS and s in
- (select s
- from enroll, course
- where enroll.sstudent.s and grade
- and enroll.ccourse.c and deptCS)
88SQL Update
- Update format
- update relation_name
- set target_list
- where qualification
- Semantics of update it is equivalent to
executing - insert into del_temp
- select
- from relation_name
- where qualification
89SQL Update
- Semantics of update (cont) then executing
- insert into app_temp
- select ext_target_list
- from relation_name
- where qualification
- delete the tuples in del_temp from relation_name
- add the tuples in app_temp to relation_name
Ext_target_list is identical to target_list in
the original update command, but augmented with
tuple_variable.attribute_name for all attributes
of the range of tuple_variable that dont appear
in target_list.
90SQL Update
- Example give a 10 grade raise to every CS major
in CS564 - update enroll
- set grade1.1grade
- where cCS564 and s in
- (select s
- from student
- where majorCS)
91SQL Update
- Which is equivalent to
- insert into del_temp
- select s, c, grade
- from enroll
- where cCS564 and s in
- (select s
- from student
- where majorCS)
- insert into app_temp
- select s, c, grade1.1grade
- from enroll
- where cCS564 and s in
- (select s
- from student
- where majorCS)
92SQL Aggregates
- Aggregate functions are functions that take a
collection of values as input and return a single
value. SQL supports five built-in aggregate
functions - average avg
- minimum min
- maximum max
- total sum
- cardinality count
- using distinct to aggregate only unique values is
often important with avg, sum, and count
93SQL Aggregates
- Example find the number of students
- select num_of_students count(s)
- from student
- why do we not need to use distinct in this
example? - Example find the number of employee records
- select count ()
- from emp
- if an employee appears more than once in the emp
relation, for example if he had switched jobs or
had two jobs, then this command would count that
employee once for each record
94SQL Aggregates
- Qualified Aggregates
- Example find the average age of employees in the
toy department - select avg(age)
- from emp
- where dnametoy
95SQL Group By clause
- Group aggregates groups of tuples are computed
using the group by clause - the attributes given in the clause are used to
form groups - typles with the same value on all attributes in
the clause are placed in one group - Example in each department, find the minimum age
of employees who make more than 50K - select dname, min(age)
- from emp
- where sal50K
- group by dname
96SQL Having clause
- Sometimes it is useful to state a condition that
applies to groups in group by rather than to
tuples. We do that in SQL with the having
clause. SQL applies predicates of having after
groups have been formed. - Example find the average salary for employees
under 30 for each department having more than 10
such employees - select dname, avg(sal)
- from emp
- where age
- group by dname
- having count()10
97SQL Multiple Group Bys
- Example using relation emp(ss, ename, dept,
cat, sal) - Count the employees and average monthly salary
for each employee category in each department - select dept, cat, count(), avg(sal)/12
- from emp
- group by dept, cat
98SQL Multiple Group Bys
- Select from emp
- group by cat
- Select from emp
- group by dept
99SQL Multiple Group By
- Select from emp
- group by dname, cat
- note that some dname/cat groups are empty.
100SQL Examples on Having
- Find the average salary of employees under 30 for
each department with more than 10 such employees - select dname, avg(sal)
- from emp
- where age
- group by dname (group by department)
- having 10 10)
101SQL Examples on Having
- Find the average salary of employees under 30 for
each department with more than 10 employees - select e.dname, avg(e.sal)
- from emp e
- where e.age
- group by e.dname (group by department)
- having 10
- (select count(ee.ename) (number of employees
in group) - from emp ee
- where ee.dnamee.dname) ( from the same dept
as e) - (why is this query different from the previous
one?)
102SQL Examples on Having
- Find categories of employees whose average salary
exceeds that of programmers - select cat, avg(sal)
- from emp
- group by cat
- having avg(sal) (select avg(sal)
- from emp
- where catprogrammer)
103SQL Examples on Having
- Find all departments with at least two clerks
- select dname
- from emp
- where jobclerk
- group by dname
- having count() 2
104SQL Examples
- Find the names of sailors with the highest rating
- select sname
- from sailors
- where rating (select max(rating)
- from sailors)
105SQL Examples
- For each boat, find the number of sailors of
rating 7 who have reserved this boat - select bid, bname, count(s.sid)
- from sailors s, boats b, reserve r
- where s.sidr.sid and r.bidb.bid and rating7
- group by b.bid
106SQL Examples
- For each red boat, find the number of sailors who
have reserved this boat - select bid, bname, count(s.sid)
- from sailors s, boats b, reserve r
- where s.sidr.sid and r.bidb.bid
- group by b.bid
- having colourred
107SQL Examples
- Difference between the last two queries?
- First one gave a qualification on the tuples
- (take all tuples of the multijoin
- discard tuples that do not fulfill ratings7
- then group them by boat id
- then find the cardinality of each group)
- Second one gave a qualification for the groups
- (take all tuples of the multijoin
- group them by boat id
- discard groups representing boats that are
non-red - find the cardinality of remaining groups)
108And Now, For SomethingCompletely Different...
- The recent SQL material largely covers chapter 4,
at least sections 4.1 through 4.6 and some of
4.9. - Earlier we examined Relational Algebra, covering
sections 3.1 through 3.3 - Now we leave chapter 4 and head back to examine
sections 3.6 and 3.7, covering Relational Calculi - based upon predicate calculus
- non-procedural query languages (descriptive
rather than prescriptive) - we will examine two relational calculi tuple
calculus and domain calculus
109Tuple Calculus
- Query t P(t) P is a predicate associated
with some relation R - t is a tuple variable ranging over the
relation R - tA is the value of attribute A in tuple t
- students in CMSC 424
- t t ? enroll ? tcourse CMSC424
- students in CMSC 424 conforming with the
CMSC-420 prerequisite - t t ? enroll ? ? s ? enroll ? tcourse
CMSC424 ? - scourse CMSC420 ? tss sss
110Tuple Calculus
- Quantifiers and free variables
- ?, ? quantify the variables following them,
binding them to some value. (in the previous
slide, s was bound by ?) - A tuple variable that is not quantified by ? or ?
is called a free variable. (in the previous
slide, t was a free variable) - Atoms
- R(t) where t is a tuple variable
- tx ? sy where t,s are tuple variables and
- ? ? ?, ?, ?, ?, ?, ?
111Tuple Calculus
- Formulas
- an Atom is a Formula
- If P and Q are Formulas, then so are (P), ?P,
P?Q, P?Q, and P?Q - If P(t) is a Formula, then so are ?t P(t) and ?t
P(t) -
- Equivalences
- ?(P ? Q) ? ?P ? ? Q
- ?(P ? Q) ? ?P ? ? Q
- ?t P(t) ? ? (?t (? P(t)))
- ? t P(t) ? ? (? t (? P(t)))
112Tuple Calculus
- Safety
- Math is too powerful we can easily phrase
expressions that describe infinite sets - t t ? enroll
- These expressions are called unsafe
- When we are dealing with finite sets, unsafe
expressions happen in expressions that involve
negation (?) - We can avoid this problem by using an entirely
positive (non-negated) scope as the first operand
of any conjunction where we use negation. The
first operand establishes the scope and the
second one filters the established scope. - t t ? enroll ? tcourse ? CMSC-420
113Domain Calculus
- Another form of relational calculus
- Uses domain variables that take values from an
attributes domain, rather than values
representing an entire tuple - Closely related to tuple calculus
- Domain Calculus serves as the theoretical basis
for the query language QBE, just as the
relational algebra we examined earlier forms the
basis for SQL - Expressions are of the form
- P( x1, x2, x3, ..., xn)
114Domain Calculus
- Atoms
- ? R
- x ? y where x,y are domain variables and
- ? ? ?, ?, ?, ?, ?, ?
- x ? c where c is a constant
- Formulas
- an atom is a formula
- If P and Q are formulas, then so are (P), ?P,
P?Q, P?Q, and P?Q - If P(x) is a formula and x is a domain variable,
then ?x P(x) and ?x P(x) are also formulas
115Domain Calculus
- Queries are of the form
- P( x1, x2, x3, ..., xn)
- Examples
- Enroll(ss, course,
semester) - Enroll(x, y, z) ? y CMSC-424
116Reductions of Relational Algebra and Calculi
- Relational Algebra (sections 3.2-3.5), Tuple
Calculus (section 3.6), and Domain Calculus
(section 3.7) can be reduced to each other they
have equivalent expressive power. For every
expression in one, we can compute an equivalent
expression in the others.
117Functional Dependencies
- Important concept in differentiating good
database designs from bad ones - FD is a generalization of the notion of keys
- An FD is a set of attributes whose values
uniquely determine the values of the remaining
attributes. - Emp(eno, ename, sal) key FDs eno ename
- Dept(dno, dname, floor) eno sal
- Works-in(eno, dno, hours) (eno,dno) hours
- dno dname
- dno floor
118Functional Dependencies
- If ? ? R and ? ? R, then ? ? holds in the
extension r(R) of R iff for any pair t1 and t2
tuples of r(R) such that t1(?)t2(?) , then it is
also true that t1(?) t2(?) - We can use FDs as
- constraints we wish to enforce (e.g., keys)
- for checking to see if the FDs are satisfied
within the database - R( A B C D)
- 1 1 1 1 A B satisfied? no
- 1 2 1 2 A C satisfied? yes
- 2 2 2 2 C A satisfied? no
- 2 3 2 3 AB D satisfied? yes
- 3 3 2 4
119Functional Dependencies
- Trivial dependencies ? ?
- ? ? if ? ? ?
- Closure
- we need to consider all FDs
- some are implied by others e.g., FDs are
transitive if AB and BC, then AC - Given F set of FDs, we want to find F (the
closure of all FDs logically implied by F)
120Armstrongs Axioms
- Reflexivity if ? ? ? then ? ?
- Augmentation if ? ? then ?? ? ??
- Transitivity if ? ? and ? ? then ? ?
- Armstrongs Axiom