Title: CS 542
1CS 542 Database Management Systems Summer 2004
- Kajal Claypool
- kajal_at_cs.wpi.edu
2Administrative Stuff
- Class web page
- http//www.cs.wpi.edu/kajal/courses/cs542-S04/ind
ex.html - Collaboration Policy
- Homework assignments are individual efforts
- Copying from Web, friends, old assignments, any
source gt cheating! - Copying -gt taking anything verbatim, finding
the main idea and using it. - 1st cheating offense -gt F for assignment
- 2nd cheating offense -gt F for the course
3Introduction to Database Systems
- An Overview of DBMS
- Chapter 1
4Outline
- Purpose of Database Systems
- View of Data
- Data Models
- Data Definition Language
- Data Manipulation Language
- Transaction Management
- Storage Management
- Database Administrator
- Database users
- Overall System Architecture
5Why study Databases?
- Databases are everywhere
- Banking all transactions
- Airlines reservations, schedules
- Universities registration, grades
- Sales customers, products, purchases
- Manufacturing production, inventory, orders,
supply chain - Human resources employee records, salaries, tax
deductions - DBMS brings together many different CS areas
- OS, Algorithms, AI, logic, languages, multimedia
- DBAs make a lot of money!
6What is a DBMS?
- DBMS Database Management System
- Collection of interrelated data
- Set of sophisticated programs to access that data
- Database Collection of data
- Usually information relevant to an enterprise
- Models the real world
- Entities (students and courses)
- Relationship between entities (a student takes a
course)
7Why a DBMS?
- In the early days, database applications were
built on top of file systems - Drawbacks of using file systems to store data
- Data redundancy and inconsistency
- Format of one file may be different from format
of another file! - Difficulty in accessing data
- Need special programs to now generate a list of
all customers in a particular postal-code area - Data isolation
- Because data may be scattered over various files,
it becomes hard for the programmer to grab the
information from all locations
8Problems (contd.)
- Integrity problems
- Integrity constraints (e.g. account balance gt 0)
become part of program code - Hard to add new constraints or change existing
ones - Example Bank decides that savings balance must
always be greater than 50. - Atomicity
- Certain actions must be treated as one. This is
hard to do if data is scattered over many files. - Example if you withdraw money, the bank must
withdraw and then update the balance as one
operation - Concurrent Access
- Often for efficiency you may allow many users to
access the files at the same time. But you must
guarantee that my money does not go into your
account! This is hard with files. - Security
- Not every user should be able to see everything!
Hard to control with files.
9A DBMS
- Database systems offer solutions to all the above
problems - provide uniform structures for storage of
information - provide mechanisms for manipulating the
information - ensure the safety of the information stored
despite system crashes or attempts at
unauthorized access - avoid anomalous results if data is to be shared
among many users - Their Goal do all of this in a way that is both
convenient and efficient.
10Data Storage
11Information Storage
- Remember convenience and efficient are two
requirements for a DBMS system - To efficiently retrieve data , data is often
stored as complex data structures. However, these
complex data structures adversely effect the
requirement convenience of database system
users - Solution Data Abstraction
- Simplify the users interaction with the DBMS
12Data Abstraction
- Physical Level
- Lowest level
- Describes how data (a record) is actually
stored - Logical Level
- Next higher level
- Describes what data are stored in the database,
and what relationships exist between those data - Logical level user does not need to be aware of
the complexity of the physical level structures - View Level
- Highest level of abstraction
- Only part of the entire database is visible,
I.e., information that a user may need to see - Also provides a security mechanism
- Example a teller in a bank does not need to see
the salaries of all bank employees!
13View of Data
An architecture for a database system
14Data Abstraction (contd.)
- Example a record in Pascal
- type customer record name string street
string city integerend - Physical level
- Stored as contiguous blocks
- Logical level
- As shown above
- View level
- A subset
- type customer record name string
- end
15Instances and Schemas
- Concept is similar to types and variables in
programming languages - Schema the logical structure of the database
- e.g., the database consists of information about
a set of customers and accounts and the
relationship between them - Analogous to type information of a variable in a
program - Physical schema database design at the physical
level - Logical schema database design at the logical
level - Instance the actual content of the database at
a particular point in time - Analogous to the value of a variable
- Physical Data Independence the ability to
modify the physical schema without changing the
logical schema - Applications depend on the logical schema
- In general, the interfaces between the various
levels and components should be well defined so
that changes in some parts do not seriously
influence others.
16Data Models
- A collection of concepts for describing
- data
- data relationships
- data semantics
- data constraints
- Examples of Data Models
- Entity-Relationship model
- Relational model
- Object-oriented model
- Semi-structured data models (XML)
- Older models network model and hierarchical model
17Entity-Relationship Model
- E-R model of real world
- Entities (objects)
- E.g. customers, accounts, bank branch
- Relationships between entities
- E.g. Account A-101 is held by customer Johnson
- Relationship set depositor associates customers
with accounts - Widely used for database design
- Database design in E-R model usually converted to
design in the relational model (coming up next)
which is used for storage and processing
18Relational Model
Attributes
- Example of tabular data in the relational model
customer- street
customer- city
account- number
customer- name
Customer-id
Johnson Smith Johnson Jones Smith
192-83-7465 019-28-3746 192-83-7465 321-12-3123
019-28-3746
Alma North Alma Main North
Palo Alto Rye Palo Alto Harrison Rye
A-101 A-215 A-201 A-217 A-201
19A Sample Relational Database
20Data Manipulation
21Data Definition Language (DDL)
- Specification notation for defining the database
schema - E.g. create table account (
account-number char(10), balance
integer) - DDL compiler generates a set of tables stored in
a data dictionary - Data dictionary contains metadata (i.e., data
about data) - e.g. schema of a table is metadata
- Data storage and definition language
- language in which the storage structure and
access methods used by the database system are
specified - Usually an extension of the data definition
language
22Data Manipulation Language (DML)
- Language for accessing and manipulating the data
organized by the appropriate data model - DML also known as query language
- It can
- Retrieve information stored in the database
- Insert new information
- Delete existing information
- Modify existing information
- Two classes of languages
- Procedural
- user specifies what data is required and how to
get those data - Nonprocedural (or Declarative)
- user specifies what data is required without
specifying how to get those data - The portion of DML that is involved with
information retrieval is called a query language - SQL is the most widely used query language
- The DML component of SQL is nonprocedural
23SQL
- SQL widely used non-procedural language
- E.g. find the name of the customer with
customer-id 192-83-7465 select
customer.customer-name from
customer where customer.customer-id
192-83-7465 - E.g. find the balances of all accounts held by
the customer with customer-id 192-83-7465 select
account.balance from depositor,
account where depositor.customer-id
192-83-7465 and
depositor.account-number account.account-number - Application programs generally access databases
through one of - Language extensions to allow embedded SQL
- Application program interface (e.g. ODBC/JDBC)
which allow SQL queries to be sent to a database
24Safety
25Database Users
- Users are differentiated by the way they expect
to interact with the system - Application programmers
- interact with system through DML calls
- Sophisticated users
- form requests in a database query language
- Specialized users
- write specialized database applications that do
not fit into the traditional data processing
framework - Naïve users
- invoke one of the permanent application programs
that have been written previously - E.g. people accessing database over the web, bank
tellers, clerical staff
26Database Administrator
- Coordinates all the activities of the database
system the database administrator has a good
understanding of the enterprises information
resources and needs. - Database administrator's duties include
- Schema definition
- Storage structure and access method definition
- Schema and physical organization modification
- Granting user authority to access the database
- Specifying integrity constraints
- Acting as liaison with users
- Monitoring performance and responding to changes
in requirements
27Safe Sharing
28Concurrency Control
- Concurrent execution of user programs is
essential for good DBMS performance. - Because disk accesses are frequent, and
relatively slow, it is important to keep the cpu
humming by working on several user programs
concurrently - Interleaving actions of different user programs
can lead to inconsistency e.g., check is cleared
while account balance is being computed - DBMS ensures that such problems do not arise.
Users can pretend that they are using a
single-user system.
29Transaction An Execution of a DB Program
- Key concept is transaction
- An atomic sequence of database actions
(read/write) - Each transaction, executed completely, must leave
the database in a consistent state if the DB is
consistent when the transaction begins. - Users can specify some simple integrity
constraints on the data, and the DBMS will
enforce these constraints - Beyond this, the DBMS does not really understand
the semantics of your data (eg. It does not
understand how the interest is calculated on the
account balance) - Thus, ensuring that the transaction (run alone)
preserves consistency is ultimately the users
responsibility!
30Scheduling Concurrent Transactions
- DBMS ensures that the execution of T1, T2.Tn
is equivalent to some serial execution of T1,
T2Tn - Before reading/writing an object, a transaction
requests a lock on the object, and waits till the
DBMS gives it the lock. All locks are released at
the end of the transaction. - This is called Strict 2PL locking protocol
- Main Idea
- If an action of Ti (say writing X) affects Tj
(which perhaps reads X), one of them say Ti will
obtain the lock first and Tj is forced to wait
until Ti completes this effectively orders the
transactions - What if Tj already has a lock on Y and Ti
requests Y?
31Ensuring Atomicity
- DBMS ensures atomicity, all or nothing, even if
the system crashes in the middle of a Xact. - Idea
- Keep a log (history) of all actions carried out
by the DBMS while executing a set of Xacts - Before a change is made to the database, the
corresponding log entry is made and forced into a
safe location - After a crash, the effects of partially executed
Xacts are undone using the log.
32Overall System Structure
33Summary
- DBMS used to maintain and query large datasets
- Benefits include recovery from system crashes,
concurrent access, quick application development,
data integrity and security - Levels of abstraction give data independence
- A DBMS typically has a layered architecture
- DBAs are one of the highest paid computer jobs
and still in high demand! - DBMS RD is one of biggest research areas in
industry (Microsoft, IBM, Oracle) and in academia!
34Where to next?
- Storage and Manipulation of Data
- Storage at the Logical level
- Chapter 2
- Manipulation of Data
- Chapter 3