C20.0046: Database Management Systems Lecture - PowerPoint PPT Presentation

1 / 40
About This Presentation
Title:

C20.0046: Database Management Systems Lecture

Description:

phone company's records. Local Usage Details ('Pull his LUDs, Lenny.') Caller ID ... directories: Internic, etc. searchable DBs: IMDB, tvguide.com, etc. ... – PowerPoint PPT presentation

Number of Views:109
Avg rating:3.0/5.0
Slides: 41
Provided by: pagesSt
Category:

less

Transcript and Presenter's Notes

Title: C20.0046: Database Management Systems Lecture


1
C20.0046 Database Management SystemsLecture 1
  • Matthew P. Johnson
  • Stern School of Business, NYU
  • Spring, 2004

2
Personnel
  • Instructor Matthew P. Johnson
  • KMEC 8-176, mjohnson_at_stern.nyu.edu
  • Office hours KMC 8-176, Th 1215-215
  • please visit!
  • Tutor/TF/grader Matthew P. Johnson!

3
Communications
  • Web page http//www.columbia.edu/mpj9/dbms
  • syllabus
  • course policies
  • may move in the future
  • Blackboard web site
  • Some materials will be available here
  • Discussion board
  • send general-interest messages here to benefit
    all!
  • Go to http//sternclasses.nyu.edu
  • Click on C20.0046

4
Acknowledgements
  • Thanks to Ramesh, Ullman, et al., Raghu and
    Johannes, Dan Suciu, Arthur Keller, David Kuijt
    for course materials
  • See classpage for other related, antecedent DBMS
    courses

5
What Is a Database?
  • A very large, integrated collection of data.
  • Models real-world enterprise.
  • Entities (e.g., students, courses, instructors,
    TAs)
  • Relationships (e.g., Joe is taking C20.0046)
  • George is currently taking C20.0046
  • Dick is currently teaching C20.0046
  • Condi is currently TA-ing C20.0046 but took it
    last semester
  • A Database Management System (DBMS) is a software
    package designed to store and manage databases.

6
Databases are everywhere
  • Example Ordering a pizza
  • Databases involved?
  • Pizza Huts DB
  • stores previous orders by customer
  • stores previous credit cards used
  • Credit card records
  • huge databases of (attempted) purchases
  • location, date, amount, parties
  • phone companys records
  • Local Usage Details (Pull his LUDs, Lenny.)
  • Caller ID
  • ensures reported address matches destination

7
Your wallet is full of DB records
  • Drivers license
  • Credit cards
  • NYUCard
  • Medical insurance card
  • Social security card
  • Gym membership
  • Money (serial numbers)
  • Maybe even photos

8
Databases are everywhere
  • Q Websites backed by DBMSs?
  • retail Amazon, etc.
  • data-mining Page You Made
  • search engines Google, etc.
  • directories Internic, etc.
  • searchable DBs IMDB, tvguide.com, etc.
  • Q Non-web examples of DBMSs?
  • criminal/terrorist TIA
  • airline bookings
  • NYPDs CompStat
  • all serious crime stats by precinct
  • Retailers Wal-Mart, etc.
  • when to re-order, purchase patterns, data-mining
  • Genomics!

9
Example of a Traditional DB App
  • Suppose we are building a system
  • to store the information about
  • checking accounts
  • savings accounts
  • account holders
  • state of each of each persons accounts

10
Can we do it without a DBMS?
  • Sure we can! Start by storing the data in files
  • checking.txt savings.txt
    customers.txt
  • Now write C or Java programs to implement
    specific tasks

11
Doing it without a DBMS...
  • Transfer 100 from Georges savings to checking

Write a C program to do the following
Read savings.txt Findupdate the record
George balance - 100 Write savings.txt Read
checking.txt Findupdate the record
George balance 100 Write checking.txt
12
Problems without an DBMS...
  • 1. System crashes
  • Q What is the problem ?
  • A George lost his 100
  • Same problem even if reordered
  • 2. Simultaneous access by many users
  • George and Dick visit ATMs at same
  • Lock checking.txt before each use what is the
    problem?

Read savings.txt Findupdate the rec
George. Write savings.txt Read
checking.txt Findupdate the rec George Write
checking.txt
CRASH !
13
Problems without an DBMS...
  • 3. Large data sets (say 50GB)
  • Why is this a problem?
  • No indices
  • Finding George in huge flatfile is expensive
  • Modifications intractable without better data
    structures
  • George ? Georgie is very expensive
  • Deletions are very expensive

14
Problems without an DBMS...
  • 5. Security?
  • File system may be insecure
  • File system security may be coarse
  • 6. Application programming interface (API)?
  • suppose need other apps to access DB
  • 7. How to interact with other DBMSs?

15
General problems to solve
  • In building our own system, many Qs arise
  • how do we store the data? (file organization,
    etc.)
  • how do we query the data? (write programs)
  • make sure that updates dont mess things up?
  • leave the DB consistent
  • provide different views on the data?
  • e.g., ATM users view v. bank tellers view
  • how do we deal with crashes?
  • Too hard! Go buy a DBMS!
  • Q How does a DBMS solve these problems?
  • A See third part of course

16
Big issue Transaction processing
  • Grouping of several queries (or other database
    actions) into one transaction
  • ACID properties
  • Atomicity
  • all or nothing
  • Consistency
  • constraints on relationships
  • Isolation
  • concurrency control
  • Simulated solipsim
  • Durability
  • Crash recovery

17
Atomicity Durability
  • Saw how George lost 100 with makeshift DBMS
  • DBMS prevents this outcome
  • xacts are all or nothing
  • One idea Keep a log (history) of all actions in
    set of xacts
  • Durability Use log to redo or undo certain ops
    in crash recovery
  • Atomicity dont really commit changes until end
  • Then, all at once

18
Isolation
  • Concurrent execution is essential for
    performance.
  • Frequent, slow disk accesses
  • ? dont waste CPU keep running
  • Interleaving actions of different user programs
  • ?can lead to inconsistency
  • e.g., two programs simultaneously withdraw from
    the same account
  • DBMS ensures such problems dont arise
  • users can pretend they are using a single-user
    system.

19
Isolation
  • Contrast with a file in two Notepads
  • Strategy ignore multiple users
  • whichever saves last wins
  • first save is overwritten
  • Contrast with a file in two Words
  • Strategy blunt isolation
  • One can edit
  • To the other its read-only

20
Consistency
  • Each xant (on a consistent DB) must leave it in a
    consistent state
  • can define integrity constraints
  • checks the defined claims about the data remain
    true

21
Data Models
  • Any DBMS uses a data model collection of
    concepts for describing data
  • Schema description of partic set of data, using
    some data model
  • Relational data model most widely used (by far)
    data model
  • Oracle, DB2, SQLServer, other SQL DBMSs
  • main concept relation table of rows columns
  • a rels schema defines its fields

22
Example university database
  • Conceptual schema
  • Students(ssn string, name string, login
    string, age int, gpa real)
  • Courses(cid string, cname string, credits int)
  • Enrolled(sidstring, cidstring, grade string)
  • Physical schema
  • Relations stored as unordered text files.
  • Indices on first column of each rel
  • External Schema (View)
  • Course_info(ssn string, name string)
  • My_courses(cname string, grade string)

23
How the programmer sees the DBMS
  • Start with DDL to create tables
  • Continue with DML to populate tables

CREATE TABLE Students ( Name CHAR(30) SSN
CHAR(9) PRIMARY KEY NOT NULL, Category
CHAR(20) ) . . .
INSERT INTO Students VALUES(Howard, 123456789,
undergraduate) . . . .
24
How the programmer sees the DBMS
  • Tables
  • Still implemented as files, but behind the scenes
    can be quite complex

Takes
Students
Courses
data independence separate logical view from
physical implementation
25
Querying Structured Query Language
  • Find all the students who have taken C20.0046
  • SELECT SSN
  • FROM Takes
  • WHERE CIDC20.0046
  • Find all the students who C20.0046 last fall
  • SELECT SSN
  • FROM Takes
  • WHERE CIDC20.0046 AND SemesterFall, 2003
  • Find the students names
  • SELECT Name
  • FROM Students, Takes
  • WHERE Students.SSNTakes.SSN AND
  • CIDC20.0046 AND SemesterFall, 2003
  • Query processor does this efficiently.

26
Database Industry
  • Relational databases are a great success of
    theoretical ideas.
  • based on most theoretical type of math there
    is set theory
  • DBMS companies are among the largest software
    companies in the world.
  • Oracle, IBM (with DB2), Microsoft (SQL Server,
    Microsoft Access), Sybase.
  • Also opensource MySQL, Postgres, etc.
  • 20B industry.
  • XML (semi-structured data) also important
  • New lingua franca for exchanging data

27
The Study of DBMS
  • Several aspects
  • Modeling and design of databases
  • DBMS programming querying and update
  • DBMS implementation
  • This course covers all three
  • though more time on first two
  • Also will look at some more advanced areas
  • XML, data-mining, LDAP?

28
Databases are used by
  • DB app programmers
  • desktop app programmers
  • web developers
  • Database administrators (DBAs)
  • design schemas
  • security/authorization
  • crash recovery
  • tuning
  • better paid than programmers!
  • Everyone else (perhaps indirectly)

29
Course outline
  • Database design
  • Entity/Relationship models
  • Modeling constraints
  • The relational model
  • Relational algebra
  • Transforming E/R models to relational schemas
  • SQL
  • Views and triggers

30
Outline (Continued)
  • Connecting to a database from a programming
    language
  • Storage and indexing
  • Transactions
  • XML
  • Advanced topics
  • May change as course progresses
  • partly in response to audience

31
Textbook
  • Database Systems The Complete Book
  • Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer
    D. Widom
  • 1st Edition (2001)
  • Available
  • NYU bookstore
  • Amazon/BN (direct links on classpage)
  • Amazon.co.uk (total is/was about 20 less)
  • First two chapters in PDF on classpage

32
SQL Readings
  • Optional reference SQL in a Nutshell
  • Online (free) SQL tutorials include
  • A Gentle Introduction to SQL (http//sqlzoo.net/)
  • SQL for Web Nerds (http//philip.greenspun.com/sql
    /)

33
Grading
  • Prerequisites Programming experience
  • presumably C/C/Java
  • Work Grading
  • Homework 30 O(4)
  • Project 30 - see below.
  • Midterm (closed book/notes) 15
  • Final (closed book/notes) 20
  • Class participation 5
  • Stern Curve
  • Class attendance is required
  • Absences will affect your total grade

34
The Project design end-to-end DB app
  • data model
  • Identify entities (and fields), relationships
  • Identify resulting relations (tables)
  • creation of DB in Oracle
  • Insertion of real(alistic) data
  • (web) app for accessing/modifying data
  • Identification of interesting questions to ask
  • Production of DBMS interface
  • Work in pairs (start forming now)
  • Choose topic on your own previous e.g.s online
  • Start forming your group today!

35
Collaboration
  • Homework and exams done individually
  • Project done with your team members only
  • Non-cited use of others problem solutions, code,
    etc. plagiarism
  • See Sterns stern academic honesty policy
  • Contact me if youre at all unclear before a
    particular case
  • Cite any materials used if youre at all unclear
    after a particular case

36
On-going Feedback
  • Dont wait until the end-of-semester course evals
    to complain or give feedback on how to improve
    course. (Its too late for you then!)
  • Come see me early on during my office hours
  • or send me email with your concerns
  • Were in touch, so you be in touch.

37
Summary
  • DBMS used to maintain, query large datasets.
  • Benefits include recovery from system crashes,
    concurrent access, data integrity, security, and
    quick application development.
  • Database skills are critical in financial
    services, marketing and other business areas!

38
So what is this course about, really ?
  • A bit of everything !
  • Languages SQL, XPath, XQuery
  • Data modeling
  • Some theory!
  • Functional dependencies, normal forms
  • e.g., how to find most efficient schema for data
  • Algorithms and data structures (in the third
    part)
  • e.g., indices make data much faster to find but
    how?
  • Lots of implementation and hacking for the
    project
  • Business DBMS examples/cases
  • Most importantly how to meet real-world needs

39
For next time
  • Get the book
  • Read chapters 1, 2.1-2.2

40
For right now written survey
  • name
  • previous cs/is/math/logic courses
  • previous programming experience
  • career plans programmer, DBA, MBA, etc.
  • why taking class
  • any religious holidays during class
Write a Comment
User Comments (0)
About PowerShow.com