Title: C20.0046: Database Management Systems Lecture
1C20.0046 Database Management SystemsLecture 1
- Matthew P. Johnson
- Stern School of Business, NYU
- Spring, 2004
2Personnel
- Instructor Matthew P. Johnson
- KMEC 8-176, mjohnson_at_stern.nyu.edu
- Office hours KMC 8-176, Th 1215-215
- please visit!
- Tutor/TF/grader Matthew P. Johnson!
3Communications
- Web page http//www.columbia.edu/mpj9/dbms
- syllabus
- course policies
- may move in the future
- Blackboard web site
- Some materials will be available here
- Discussion board
- send general-interest messages here to benefit
all! - Go to http//sternclasses.nyu.edu
- Click on C20.0046
4Acknowledgements
- Thanks to Ramesh, Ullman, et al., Raghu and
Johannes, Dan Suciu, Arthur Keller, David Kuijt
for course materials - See classpage for other related, antecedent DBMS
courses
5What Is a Database?
- A very large, integrated collection of data.
- Models real-world enterprise.
- Entities (e.g., students, courses, instructors,
TAs) - Relationships (e.g., Joe is taking C20.0046)
- George is currently taking C20.0046
- Dick is currently teaching C20.0046
- Condi is currently TA-ing C20.0046 but took it
last semester - A Database Management System (DBMS) is a software
package designed to store and manage databases.
6Databases are everywhere
- Example Ordering a pizza
- Databases involved?
- Pizza Huts DB
- stores previous orders by customer
- stores previous credit cards used
- Credit card records
- huge databases of (attempted) purchases
- location, date, amount, parties
- phone companys records
- Local Usage Details (Pull his LUDs, Lenny.)
- Caller ID
- ensures reported address matches destination
7Your wallet is full of DB records
- Drivers license
- Credit cards
- NYUCard
- Medical insurance card
- Social security card
- Gym membership
- Money (serial numbers)
- Maybe even photos
8Databases are everywhere
- Q Websites backed by DBMSs?
- retail Amazon, etc.
- data-mining Page You Made
- search engines Google, etc.
- directories Internic, etc.
- searchable DBs IMDB, tvguide.com, etc.
- Q Non-web examples of DBMSs?
- criminal/terrorist TIA
- airline bookings
- NYPDs CompStat
- all serious crime stats by precinct
- Retailers Wal-Mart, etc.
- when to re-order, purchase patterns, data-mining
- Genomics!
9Example of a Traditional DB App
- Suppose we are building a system
- to store the information about
- checking accounts
- savings accounts
- account holders
- state of each of each persons accounts
10Can we do it without a DBMS?
- Sure we can! Start by storing the data in files
- checking.txt savings.txt
customers.txt - Now write C or Java programs to implement
specific tasks
11Doing it without a DBMS...
- Transfer 100 from Georges savings to checking
Write a C program to do the following
Read savings.txt Findupdate the record
George balance - 100 Write savings.txt Read
checking.txt Findupdate the record
George balance 100 Write checking.txt
12Problems without an DBMS...
- 1. System crashes
- Q What is the problem ?
- A George lost his 100
- Same problem even if reordered
- 2. Simultaneous access by many users
- George and Dick visit ATMs at same
- Lock checking.txt before each use what is the
problem?
Read savings.txt Findupdate the rec
George. Write savings.txt Read
checking.txt Findupdate the rec George Write
checking.txt
CRASH !
13Problems without an DBMS...
- 3. Large data sets (say 50GB)
- Why is this a problem?
- No indices
- Finding George in huge flatfile is expensive
- Modifications intractable without better data
structures - George ? Georgie is very expensive
- Deletions are very expensive
14Problems without an DBMS...
- 5. Security?
- File system may be insecure
- File system security may be coarse
- 6. Application programming interface (API)?
- suppose need other apps to access DB
- 7. How to interact with other DBMSs?
15General problems to solve
- In building our own system, many Qs arise
- how do we store the data? (file organization,
etc.) - how do we query the data? (write programs)
- make sure that updates dont mess things up?
- leave the DB consistent
- provide different views on the data?
- e.g., ATM users view v. bank tellers view
- how do we deal with crashes?
- Too hard! Go buy a DBMS!
- Q How does a DBMS solve these problems?
- A See third part of course
16Big issue Transaction processing
- Grouping of several queries (or other database
actions) into one transaction - ACID properties
- Atomicity
- all or nothing
- Consistency
- constraints on relationships
- Isolation
- concurrency control
- Simulated solipsim
- Durability
- Crash recovery
17Atomicity Durability
- Saw how George lost 100 with makeshift DBMS
- DBMS prevents this outcome
- xacts are all or nothing
- One idea Keep a log (history) of all actions in
set of xacts - Durability Use log to redo or undo certain ops
in crash recovery - Atomicity dont really commit changes until end
- Then, all at once
18Isolation
- Concurrent execution is essential for
performance. - Frequent, slow disk accesses
- ? dont waste CPU keep running
- Interleaving actions of different user programs
- ?can lead to inconsistency
- e.g., two programs simultaneously withdraw from
the same account - DBMS ensures such problems dont arise
- users can pretend they are using a single-user
system.
19Isolation
- Contrast with a file in two Notepads
- Strategy ignore multiple users
- whichever saves last wins
- first save is overwritten
- Contrast with a file in two Words
- Strategy blunt isolation
- One can edit
- To the other its read-only
20Consistency
- Each xant (on a consistent DB) must leave it in a
consistent state - can define integrity constraints
- checks the defined claims about the data remain
true
21Data Models
- Any DBMS uses a data model collection of
concepts for describing data - Schema description of partic set of data, using
some data model - Relational data model most widely used (by far)
data model - Oracle, DB2, SQLServer, other SQL DBMSs
- main concept relation table of rows columns
- a rels schema defines its fields
22Example university database
- Conceptual schema
- Students(ssn string, name string, login
string, age int, gpa real) - Courses(cid string, cname string, credits int)
- Enrolled(sidstring, cidstring, grade string)
- Physical schema
- Relations stored as unordered text files.
- Indices on first column of each rel
- External Schema (View)
- Course_info(ssn string, name string)
- My_courses(cname string, grade string)
23How the programmer sees the DBMS
- Start with DDL to create tables
- Continue with DML to populate tables
CREATE TABLE Students ( Name CHAR(30) SSN
CHAR(9) PRIMARY KEY NOT NULL, Category
CHAR(20) ) . . .
INSERT INTO Students VALUES(Howard, 123456789,
undergraduate) . . . .
24How the programmer sees the DBMS
- Tables
- Still implemented as files, but behind the scenes
can be quite complex
Takes
Students
Courses
data independence separate logical view from
physical implementation
25Querying Structured Query Language
- Find all the students who have taken C20.0046
- SELECT SSN
- FROM Takes
- WHERE CIDC20.0046
- Find all the students who C20.0046 last fall
- SELECT SSN
- FROM Takes
- WHERE CIDC20.0046 AND SemesterFall, 2003
- Find the students names
- SELECT Name
- FROM Students, Takes
- WHERE Students.SSNTakes.SSN AND
- CIDC20.0046 AND SemesterFall, 2003
- Query processor does this efficiently.
26Database Industry
- Relational databases are a great success of
theoretical ideas. - based on most theoretical type of math there
is set theory - DBMS companies are among the largest software
companies in the world. - Oracle, IBM (with DB2), Microsoft (SQL Server,
Microsoft Access), Sybase. - Also opensource MySQL, Postgres, etc.
- 20B industry.
- XML (semi-structured data) also important
- New lingua franca for exchanging data
27The Study of DBMS
- Several aspects
- Modeling and design of databases
- DBMS programming querying and update
- DBMS implementation
- This course covers all three
- though more time on first two
- Also will look at some more advanced areas
- XML, data-mining, LDAP?
28Databases are used by
- DB app programmers
- desktop app programmers
- web developers
- Database administrators (DBAs)
- design schemas
- security/authorization
- crash recovery
- tuning
- better paid than programmers!
- Everyone else (perhaps indirectly)
29Course outline
- Database design
- Entity/Relationship models
- Modeling constraints
- The relational model
- Relational algebra
- Transforming E/R models to relational schemas
- SQL
- Views and triggers
30Outline (Continued)
- Connecting to a database from a programming
language - Storage and indexing
- Transactions
- XML
- Advanced topics
- May change as course progresses
- partly in response to audience
31Textbook
- Database Systems The Complete Book
- Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer
D. Widom - 1st Edition (2001)
- Available
- NYU bookstore
- Amazon/BN (direct links on classpage)
- Amazon.co.uk (total is/was about 20 less)
- First two chapters in PDF on classpage
32SQL Readings
- Optional reference SQL in a Nutshell
- Online (free) SQL tutorials include
- A Gentle Introduction to SQL (http//sqlzoo.net/)
- SQL for Web Nerds (http//philip.greenspun.com/sql
/)
33Grading
- Prerequisites Programming experience
- presumably C/C/Java
- Work Grading
- Homework 30 O(4)
- Project 30 - see below.
- Midterm (closed book/notes) 15
- Final (closed book/notes) 20
- Class participation 5
- Stern Curve
- Class attendance is required
- Absences will affect your total grade
34The Project design end-to-end DB app
- data model
- Identify entities (and fields), relationships
- Identify resulting relations (tables)
- creation of DB in Oracle
- Insertion of real(alistic) data
- (web) app for accessing/modifying data
- Identification of interesting questions to ask
- Production of DBMS interface
- Work in pairs (start forming now)
- Choose topic on your own previous e.g.s online
- Start forming your group today!
35Collaboration
- Homework and exams done individually
- Project done with your team members only
- Non-cited use of others problem solutions, code,
etc. plagiarism - See Sterns stern academic honesty policy
- Contact me if youre at all unclear before a
particular case - Cite any materials used if youre at all unclear
after a particular case
36On-going Feedback
- Dont wait until the end-of-semester course evals
to complain or give feedback on how to improve
course. (Its too late for you then!) - Come see me early on during my office hours
- or send me email with your concerns
- Were in touch, so you be in touch.
37Summary
- DBMS used to maintain, query large datasets.
- Benefits include recovery from system crashes,
concurrent access, data integrity, security, and
quick application development. - Database skills are critical in financial
services, marketing and other business areas!
38So what is this course about, really ?
- A bit of everything !
- Languages SQL, XPath, XQuery
- Data modeling
- Some theory!
- Functional dependencies, normal forms
- e.g., how to find most efficient schema for data
- Algorithms and data structures (in the third
part) - e.g., indices make data much faster to find but
how? - Lots of implementation and hacking for the
project - Business DBMS examples/cases
- Most importantly how to meet real-world needs
39For next time
- Get the book
- Read chapters 1, 2.1-2.2
40For right now written survey
- name
- previous cs/is/math/logic courses
- previous programming experience
- career plans programmer, DBA, MBA, etc.
- why taking class
- any religious holidays during class