Relational Model - PowerPoint PPT Presentation

1 / 34
About This Presentation
Title:

Relational Model

Description:

E.g. exp-grade 'A', name = 'Jill', STUDENT.sid=Takes.sid ... Jill. 1. name. sid. Cornell. UPenn. school. SCHOOL. What if the attribute of SCHOOL was called 'name' ... – PowerPoint PPT presentation

Number of Views:51
Avg rating:3.0/5.0
Slides: 35
Provided by: zack4
Category:
Tags: jill | model | relational

less

Transcript and Presenter's Notes

Title: Relational Model


1
Relational Model Algebra
  • Susan B. Davidson
  • University of Pennsylvania
  • CIS330 Database Management Systems
  • September 9, 2008

Some slide content courtesy of Zachary Ives
Raghu Ramakrishnan
2
Thinking Back to Last Time
  • There are a variety of ways of representing data,
    each with trade-offs
  • Free text
  • Classes and subclasses
  • Shapes/points in space
  • Objects with properties
  • In general, our emphasis will be on the last item

3
The Relational Data Model (1970)
  • Originally proposed by Codd
  • Separates physical implementation from logical
  • Models the data independently from how it will be
    used (accessed, printed, etc.)
  • Describes the data minimally and mathematically
  • A relation describes an association between data
    items tuples with attributes
  • Uses standard mathematical (logical) operations
    over the data relational algebra or relational
    calculus

4
Why Did It Take So Many Years to Implement
Relational Databases?
  • Codds original work 1969-70
  • Earliest relational database research 1976
  • Why the gap?
  • You could do the same thing in other ways
  • Nobody wants to write math formulas
  • Why would I turn my data into tables?
  • It wont perform well

5
Getting More Concrete Buildinga Database and
Application
  • Start with a conceptual model
  • On paper using certain techniques well discuss
    next week
  • We ignore low-level details focus on logical
    representation
  • Design implement schema
  • Design and codify (in SQL) the relations/tables
  • Do physical layout indexes, etc.
  • Import the data
  • Write applications using DBMS and other tools
  • Many of the hard problems are taken care of by
    other people (DBMS, API writers, library authors,
    web server, etc.)

6
Conceptual Design for CIS Student Course Survey
Whos taking what, and what grade do they
expect?
PROFESSOR
fid
name
This design is independent ofthe final form of
the report!
Teaches
Takes
STUDENT
COURSE
cid
name
semester
sid
name
exp-grade
7
Example Schema
STUDENT
COURSE
Takes
  • Our focus now relational schema set of tables
  • Can have other kinds of schemas XML, object,

PROFESSOR
Teaches
8
Some Terminology
  • Columns of a relation are called attributes or
    fields
  • The number of these columns is the arity of the
    relation
  • The rows of a relation are called tuples
  • Each attribute has values taken from a domain,
    e.g., subj has domain string
  • Theoretically a relation is a set of tuples no
    tuple can occur more than once
  • Real systems may allow duplicates for efficiency
    or other reasons well ignore this for now
  • Objects and XML may also have the same content
    with different identity

9
Describing Relations
  • A schema can be represented many ways
  • To the DBMS, use data definition language (DDL)
    like programming language type definitions
  • We will use the notation relation(attributedomain
    ,)

STUDENT(sidint, namestring) Takes(sidint,
exp-gradechar2, cidstring) COURSE(cidstring,
subjstring, semchar3) Teaches(fidint,
cidstring) PROFESSOR(fidint, namestring)
10
More on Attribute Domains
  • Relational DBMSs have very limited built-in
    domains either tables or scalar attributes
    int, string, byte sequence, date, etc.
  • But more generally
  • We can have nested relations
  • Object-oriented, object-relational systems allow
    complex, user-defined domains lists, classes,
    etc.
  • XML systems allow for XML trees (or lists of
    trees) that follow certain structural constraints
  • Database people, when they are discussing design,
    often assume domains are evident to the
    readerSTUDENT(sid, name)

11
Integrity Constraints
  • Domains and schemas are one form of constraint on
    a valid data instance
  • Other important constraints include
  • Key constraints
  • Subset of fields that uniquely identifies a
    tuple, and for which no subset of the key has
    this property
  • May have several candidate keys one is chosen as
    the primary key
  • A superkey is a subset of fields that includes a
    key
  • Inclusion dependencies (a.k.a. referential
    integrity constraints)
  • A field in one relation may refer to a tuple in
    another relation by including its key
  • The referenced tuple must exist in the other
    relation for the database instance to be valid

12
SQL Structured Query Language
  • The standard language for relational data
  • Invented by folks at IBM, esp. Don Chamberlin
  • Actually not a great language
  • Beat a more elegant competing standard, QUEL,
    from Berkeley
  • Separated into a Data Manipulation Language (DML)
    Data Definition Language (DDL)
  • DML based on relational algebra calculus, which
    we discuss this week

13
Table DefinitionSQL-92 DDL and Constraints
CREATE TABLE Takes (sid INTEGER, exp-grade
CHAR(2), cid STRING(8), PRIMARY KEY (sid,
cid), FOREIGN KEY (sid) REFERENCES
STUDENT, FOREIGN KEY (cid) REFERENCES COURSE
)
CREATE TABLE STUDENT (sid INTEGER, name
CHAR(20), PRIMARY KEY sid )
14
Example Data Instance
STUDENT
COURSE
Takes
PROFESSOR
Teaches
15
Codds Relational Algebra
  • A set of mathematical operators that compose,
    modify, and combine tuples within different
    relations
  • Relational algebra operations operate on
    relations and produce relations (closure)
  • f Relation ? Relation f Relation x Relation ?
    Relation

16
Codds Logical Operations The Relational Algebra
  • Six basic operations
  • Projection ?? (R)
  • Selection ?? (R)
  • Union R1 R2
  • Difference R1 R2
  • Product R1 R2
  • (Rename) ????? (R)
  • And some other useful ones
  • Join R1?? R2
  • Semijoin R1sj? R2
  • Intersection R1 n R2
  • Division R1 R2

17
Data Instance for Operator Examples
STUDENT
COURSE
Takes
PROFESSOR
Teaches
18
Projection, ??
  • Given a list of column names ? and a relation R,
    ?? (R) extracts the columns in ? from the
    relation. Example

??sid,exp-grade?Takes
Takes
Note duplicate elimination. In contrast, SQL
returns by default a multiset and duplicates must
be explicitly removed. Why?
19
Selection, ??
  • Selection ???R takes a relation R and extracts
    those rows from it that satisfy the condition ?.
    Example

??exp-gradeA Æ sid1Takes
Takes
Can the result have duplicates?
20
What can go in a condition?
  • Conditions are built up from boolean-valued
    operations on the field names. E.g. exp-gradegt
    A, name Jill, STUDENT.sidTakes.sid
  • Predicates constructed from these using logical
    and, or, and not
  • We don't lose any expressive power if we don't
    have complex predicates in the language, but they
    are convenient and useful in practice.

21
Product X
  • Join is a generic term for a variety of
    operations that connect two relations. The basic
    operation is the product, Rx S, which
    concatenates every tuple in R with every tuple in
    S. Example

STUDENT x SCHOOL
STUDENT
SCHOOL
What if the attribute of SCHOOL was called name?
22
Join, ?? A Combination of Productand Selection
  • Products are hardly ever used alone they are
    typically use in conjunction with a selection.
    Example
  • STUDENT.sidTakes.sid (STUDENT x Takes)
  • STUDENT ?STUDENT.sidTakes.sid Takes

23
Union ?
  • If two relations have the same structure
    (Database terminology are union-compatible.
    Programming language terminology have the same
    type) we can perform set operations.

STUDENT POSTDOC
STUDENT
POSTDOC
24
Difference
  • Another set operator. Example

STUDENT
POSTDOC
STUDENT POSTDOC
25
Rename, ????
  • The rename operator can be expressed several
    ways
  • The book has a very odd definition thats not
    algebraic
  • An alternate definition
  • ????(x) Takes the relation with attribute list
    (schema) ? Returns a relation with the
    attribute list ?
  • Rename is useful when you join relations with the
    same column names but with different meanings.

26
Natural Join, ?
  • The most common join to do is an equality join of
    two relations on commonly named fields, and to
    leave one copy of those fields in the resulting
    relation. Example
  • STUDENT ? Takes
  • ?sid1?sid(??sid1,name, exp-grade, cid(STUDENT
    ?STUDENT.sidTakes.sid Takes))

What if all the field names are the same in the
two relations? What if the field names are all
disjoint?
27
Mini-Quiz
  • This completes the basic operations of the
    relational algebra. We shall soon find out in
    what sense this is an adequate set of operations.
    Try writing queries for these
  • The sids of students named Bob
  • The names of students expecting an A
  • The names of students in Amir Roths 501 class
  • The sids and names of students not enrolled in
    any class

28
Deriving Intersection
  • Intersection as with set operations, derivable
    from difference

A n B
(AB) (A B) (B A)
A-B
B-A
Can you think of a simpler expression?
A B
29
Division
  • A somewhat messy operation that can be expressed
    in terms of the operations we have already
    defined
  • Used to express queries such as The fid's of
    faculty who have taught all subjects
  • Paraphrased The fids of professors for which
    there does not exist a subject that they havent
    taught

30
Division Using Our Existing Operators
  • All possible teaching assignments, Allpairs
  • NotTaught, all (fid,subj) pairs for which
    professor fid has not taught subj
  • Answer is all faculty not in NotTaught

?fid,subj (PROFESSOR ?subj(COURSE))
Allpairs - ?fid,subj(Teaches ? COURSE)
  • ?fid(PROFESSOR) - ?fid(NotTaught)
  • ?fid(PROFESSOR) - ?fid(

?fid,subj (PROFESSOR ?subj(COURSE)) -
?fid,subj(Teaches ? COURSE))
31
Division R1 ? R2
  • Requirement schema(R1) ¾ schema(R2)
  • Result schema schema(R1) schema(R2)
  • Professors who have taught all courses
  • What about Courses that have been taught by all
    faculty?

?fid,subj(Teaches ? COURSE) ? ?subj(COURSE)
32
The Big Picture SQL to Algebra toQuery Plan to
Web Page
Web Server / UI / etc
Query Plan anoperator tree
Execution Engine
Optimizer
Storage Subsystem
SELECT FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid Takes.sID AND
Takes.cID cid
33
Hint of Future Things OptimizationIs Based on
Algebraic Equivalences
  • Relational algebra has laws of commutativity,
    associativity, etc. that imply certain
    expressions are equivalent in semantics
  • They may be different in cost of evaluation!

?c Ç d(R) ?c(R) ?d(R)
?c (R1 R2) R1 ?c R2
?c Æ d (R) ?c (?d (R))
  • Query optimization finds the most efficient
    representation to evaluate (or one thats not bad)

34
Next Time An Equivalent, ButVery Different,
Formalism
  • Codd invented a relational calculus that he
    proved was equivalent in expressiveness
  • Based on a subset of first-order logic
    declarative, without an implicit order of
    evaluation
  • More convenient for describing certain things,
    and for certain kinds of manipulations
Write a Comment
User Comments (0)
About PowerShow.com