Title: Database Systems CS 411
1Database Systems CS 411
- Lecture 1
- (with some slides integrated from those of Jiawei
Han, Kevin Chang, Alon Halevy, and Dan Suciu.)
2Self-Introduction
- AnHai Doan
- database and information system group (DAIS)
- Research interests
- databases, data mining, web mining, artificial intelligence
- Hobbies
- mountain climbing, downhill skiing, sailing
- Education history
- Vietnam > Hungary > Wisconsin > Seattle > UIUC
3Random Comments from Students
- Take instruction seriously, gave lots of really excellent dating advice
- All in-class examples revolve around beer
- His accent is very annoying
- His accent is great. It's so hard to understand that I'm forced to concentrate in lectures
- I now love databases! When I own Oracle, I will pay you back.
4Course administrivia ...
5Course Goals & Content
- First course on database systems and data management at UIUC
- cover mostly relational databases
- how to design and create such databases
- how to use them (via SQL query language)
- how to implement them (only briefly)
- will touch on some advanced issues
- XML data models, semi-structured data
- data integration
- you may also try a simple research component
- more on this later
6Prerequisite
- Must have data structure and algorithm background
- CS 225 or 300 equivalent
- Good at C++ or Java
- project will require a lot of programming
- need C++ or Java to do a good job at talking with databases
- you or your project group picks the language
- Knowing only C will require more work
- more difficult to talk in C to databases
7Textbook
- Required: "Database Systems: The Complete Book" by Garcia-Molina, Ullman, and Widom, 2002
- Comments on the textbook.
- Do you have problems getting your textbook?
- Books on reserve here at the Grainger Library
- "Database Management Systems" by Ramakrishnan and Gehrke
- "Database System Concepts" by Silberschatz, Korth, and Sudarsan
8Course Format
- For all students
- two 75-min lectures / week
- 4-6 homeworks
- project
- a midterm and a final exam
- Graduate students do an extra project
- survey papers on a research topic, write a 10-15 page report
- I will talk with you in detail later in the course
9Lectures
- Lecture slides in ppt format will be posted shortly before or after the lecture
- are to complement the lectures
- Many issues discussed in the lectures will be covered in the exams and homeworks
- hence try to attend lectures regularly
- Will not cover ALL materials on the slides
- attending lectures will tell you which is covered
and which is not
10Homeworks
- Some paper-based, some may involve light programming
- Will be collected at the beginning of class on the due date, or at my secretary's place
- to be decided later
- No late homework will be accepted
11Project
- Select an application that needs a database
- Build a database application from start to finish
- Significant amount of programming
- Will be done in stages
- you will submit some work at the end of each stage
- Will show a demo at semester end
12Project Groups
- Project will be done in groups of 3-4 students
- a lot of work, difficult to design so that one person can do all
- learn how to work in a group: valuable skills
- groups are like broccoli, they are good for you
- Try to form groups as soon as possible
- can start by posting requests on the class newsgroup
- There will be a deadline later for forming groups
- If you have not formed groups by then
- we will help assign you to groups
13More on Grouping
- All group members receive the same grade
- If someone drops out, the rest pick up the work
14Exams
- Midterm & final
- will be announced shortly
- check dates and make sure no conflict!
- There will be some brief review before each exam
- If you have conflicts
- do let us know in advance, see course homepage
for more information
15Tentative Grading Breakdown
- Homework: 25%
- Project: 30%
- Midterm: 20%
- Final: 25%
- Will attempt to grade on an absolute scale as much as possible
- not on a curve
16Contacting the staff ...
17Staff Office Hours
- Instructor: AnHai Doan
- Room 2118 Siebel, anhai_at_cs.uiuc.edu
- Office hours: Tue & Thu, 1 hour each, after lecture
- TAs
- Govind Kabra, Tao Cheng (for on-campus)
- Yoonkyong Lee (for off-campus)
- see the syllabus for their office hours, phone, email, etc.
18Communications
- cs411 uiuc on google
- newsgroup class.cs411
- vitally important!
- make sure to check it daily for new announcements
- If you have a question/problem
- talk to people in your group first
- post your question on newsgroup
- email TA
- go to office hours to talk to TA or instructor
- Office hours are held on ALL WEEKDAYS
- so don't be shy
19Newsgroup
- class.cs411
- designed for you and your peers
- to communicate and help one another
- please do not post solutions to the newsgroup
- please be polite, there are ladies, no crude jokes
- TAs will monitor and try their best to help with your questions
- There can be many questions
- it is usually difficult to answer all of them or answer in a timely manner
- hence should come to office hours or email TA
20Now onto database studies ...
21A Motivating Example
- Suppose we are building a system to store the information about
- students
- courses
- professors
- who takes what, who teaches what
22Application Requirements
- store the data for a long period of time
- large amounts (100s of GB)
- protect against crashes
- protect against unauthorized use
- allow users to query/update
- who teaches CS 173
- enroll Mary in CS 411
23
- allow several (100s, 1000s) users to access the data simultaneously
- allow administrators to change the schema
- add information about TAs
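The query, update, and schema-change requirements above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names Teaches and Takes, their columns, and the sample professor are assumptions made up for the example, not part of the course material.

```python
import sqlite3

# In-memory database; all table/column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Teaches (professor TEXT, course TEXT);
    CREATE TABLE Takes   (student TEXT, course TEXT);
    INSERT INTO Teaches VALUES ('Prof. Smith', 'CS 173');
""")

# Query: who teaches CS 173?
teachers = [row[0] for row in conn.execute(
    "SELECT professor FROM Teaches WHERE course = ?", ("CS 173",))]

# Update: enroll Mary in CS 411.
conn.execute("INSERT INTO Takes VALUES (?, ?)", ("Mary", "CS 411"))
enrolled = conn.execute("SELECT student, course FROM Takes").fetchall()

# Schema change: add information about TAs as a new column.
conn.execute("ALTER TABLE Teaches ADD COLUMN ta TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(Teaches)")]

print(teachers, enrolled, columns)
```

Each requirement is one statement, and the schema change does not force the earlier query to be rewritten.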
24Trying Without a DBMS
- Why Direct Implementation Won't Work
- Storing data: file system is limited
- size less than 4GB (on 32-bit machines)
- when system crashes we may lose data
- password-based authorization insufficient
- Query/update
- need to write a new C/Java program for every new query
- need to worry about performance
25
- Concurrency: limited protection
- need to worry about interfering with other users
- need to offer different views to different users (e.g. registrar, students, professors)
- Schema change
- entails changing file formats
- need to rewrite virtually all applications
- Better let a database system handle it
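To make the "new program for every new query" point concrete, here is a small sketch of the file-based approach, written in Python rather than C/Java for brevity. The flat-file layout (professor,course per line) and the sample rows are assumptions; an in-memory string stands in for a file on disk.

```python
import csv
import io

# Stand-in for a flat file on disk. Hypothetical layout: professor,course
TEACHES_FILE = "Smith,CS 173\nJones,CS 411\nSmith,CS 225\n"

def who_teaches(course):
    """Hand-written scan answering exactly one kind of question."""
    reader = csv.reader(io.StringIO(TEACHES_FILE))
    return [prof for prof, c in reader if c == course]

def courses_taught_by(professor):
    """A different question needs another hand-written scan."""
    reader = csv.reader(io.StringIO(TEACHES_FILE))
    return [c for prof, c in reader if prof == professor]

print(who_teaches("CS 173"))
print(courses_taught_by("Smith"))
```

Both functions hard-code the file format, so changing the layout (a schema change) would mean editing every such function.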
26What Can a DBMS Do for Us?
- Data Definition Language - DDL
- Data Manipulation Language - DML
- query language
- Storage management
- Transaction Management
- concurrency control
- recovery
- Think buying a plane ticket! Can you do it
without a DBMS?
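The plane-ticket example can be sketched as a transaction using Python's sqlite3 module. The Seats and Tickets tables, the flight name, and the buy_ticket helper are all hypothetical names introduced for illustration. The sketch shows DDL, DML, and the "all or nothing" behavior of a transaction; it does not demonstrate concurrency control between several users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the tables (hypothetical schema).
conn.execute("CREATE TABLE Seats (flight TEXT, "
             "seats_left INTEGER CHECK (seats_left >= 0))")
conn.execute("CREATE TABLE Tickets (flight TEXT, passenger TEXT)")

# DML: one flight with a single seat left.
conn.execute("INSERT INTO Seats VALUES ('UA 100', 1)")
conn.commit()

def buy_ticket(conn, flight, passenger):
    """Issue a ticket and decrement the seat count as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO Tickets VALUES (?, ?)",
                         (flight, passenger))
            conn.execute("UPDATE Seats SET seats_left = seats_left - 1 "
                         "WHERE flight = ?", (flight,))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: no seats left, ticket insert undone.
        return False

first = buy_ticket(conn, "UA 100", "Mary")
second = buy_ticket(conn, "UA 100", "John")
tickets = conn.execute("SELECT passenger FROM Tickets").fetchall()
print(first, second, tickets)
```

The second purchase fails on the seat update, and the rollback also removes the ticket row that had already been inserted, so the two tables stay consistent.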
27What Can a DBMS Do for Us?
- Automate a lot of boring/mundane operations on data
- so that we don't have to program over and over
- so that we can write complex data manipulations in just a few lines, so that we can concentrate on app logic
- Make execution very fast
- so that it scales up to very large data sets
- Make concurrent access/modification possible
- so that many users can use the data at the same
time
28Building an Application with a DBMS
- Requirements modeling (conceptual, pictures)
- Decide what entities should be part of the application and how they should be linked.
- Schema design and implementation
- Decide on a set of tables, attributes.
- Define the tables in the database system.
- Populate database (insert tuples).
- Write application programs using the DBMS
- way easier now that the data management is taken
care of.
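The "define the tables" and "populate database" steps might look like the following sketch with Python's sqlite3 module. The column types, key choices, and sample rows are assumptions; only the table and attribute names (Students, Courses, Takes, ssn, cid, name, quarter) follow the running example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define the tables in the database system (types and keys are assumed).
conn.executescript("""
    CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Courses  (cid TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Takes    (ssn TEXT REFERENCES Students(ssn),
                           cid TEXT REFERENCES Courses(cid),
                           quarter TEXT);
""")

# Populate the database (insert tuples); the rows are made-up sample data.
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [("111", "Mary"), ("222", "John")])
conn.executemany("INSERT INTO Courses VALUES (?, ?)",
                 [("CS411", "Database Systems"),
                  ("CS173", "Discrete Structures")])
conn.executemany("INSERT INTO Takes VALUES (?, ?, ?)",
                 [("111", "CS411", "Fall"), ("222", "CS173", "Fall")])
conn.commit()

# The application program can now rely on the DBMS for data management.
counts = {table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
          for table in ("Students", "Courses", "Takes")}
print(counts)
```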
29 Conceptual Modeling
[E-R diagram: entities Student, Course, Professor; relationships Takes, Advises, Teaches; attribute labels name, category, name, cid, ssn, quarter, name, field, address]
30Schema Design and Implementation
- Tables
- Separates the logical view from the physical view
of the data.
[Figure: example tables Students, Takes, Courses]
31Querying a Database
- Find all courses that Mary takes
- S(tructured) Q(uery) L(anguage)
- Query processor figures out how to answer the
query efficiently.
select C.name
from Students S, Takes T, Courses C
where S.name = 'Mary' and S.ssn = T.ssn and T.cid = C.cid
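As a rough illustration, the query for Mary's courses can be run against a tiny sample database with Python's sqlite3 module. The column types and all rows are assumptions invented for the example; the query text is the one from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (ssn TEXT, name TEXT);
    CREATE TABLE Courses  (cid TEXT, name TEXT);
    CREATE TABLE Takes    (ssn TEXT, cid TEXT);

    -- Made-up sample data
    INSERT INTO Students VALUES ('111', 'Mary'), ('222', 'John');
    INSERT INTO Courses  VALUES ('CS411', 'Database Systems'),
                                ('CS225', 'Data Structures'),
                                ('CS173', 'Discrete Structures');
    INSERT INTO Takes    VALUES ('111', 'CS411'), ('111', 'CS225'),
                                ('222', 'CS173');
""")

# Find all courses that Mary takes.
rows = conn.execute("""
    select C.name
    from Students S, Takes T, Courses C
    where S.name = 'Mary' and S.ssn = T.ssn and T.cid = C.cid
""").fetchall()

# SQL does not guarantee row order without ORDER BY, so sort for display.
mary_courses = sorted(name for (name,) in rows)
print(mary_courses)
```

The query says what is wanted (Mary's course names) and leaves it to the query processor to decide how to combine the three tables.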
32Query Optimization
- Goal: turn the declarative SQL query into an imperative query execution plan
- Declarative SQL query:
select C.name
from Students S, Takes T, Courses C
where S.name = 'Mary' and S.ssn = T.ssn and T.cid = C.cid
- Plan: tree of Relational Algebra operators, choice of algorithms at each operator
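To give a feel for what "choice of algorithms" means, the sketch below hand-codes two possible execution plans for the same declarative query in plain Python. The sample relations and both function names are made up for illustration. A real optimizer considers many more plans and picks one using cost estimates, which this sketch does not attempt.

```python
# Relations as lists of tuples; all rows are made-up sample data.
students = [("111", "Mary"), ("222", "John")]                   # (ssn, name)
takes = [("111", "CS411"), ("111", "CS225"), ("222", "CS173")]  # (ssn, cid)
courses = [("CS411", "Database Systems"),
           ("CS225", "Data Structures"),
           ("CS173", "Discrete Structures")]                    # (cid, name)

def plan_nested_loops():
    """Plan A: nested-loop joins over all three tables, filter last."""
    result = []
    for s_ssn, s_name in students:
        for t_ssn, t_cid in takes:
            for c_cid, c_name in courses:
                if s_ssn == t_ssn and t_cid == c_cid and s_name == "Mary":
                    result.append(c_name)
    return result

def plan_select_then_hash_join():
    """Plan B: apply the selection first, then join via hash lookups."""
    mary_ssns = {ssn for ssn, name in students if name == "Mary"}
    course_by_cid = dict(courses)  # hash table keyed on cid
    return [course_by_cid[cid] for ssn, cid in takes
            if ssn in mary_ssns and cid in course_by_cid]

print(sorted(plan_nested_loops()))
print(sorted(plan_select_then_hash_join()))
```

Both plans return the same answer, but Plan A examines every combination of rows while Plan B touches each row only once, which is the kind of difference an optimizer weighs.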
33Traditional and Novel Data Management
- Traditional Data Management
- relational data for enterprise applications
- storage
- query processing/optimization
- transaction processing
- Novel Data Management
- Integration of data from multiple databases, warehousing.
- Data management for decision support, data mining.
- Exchange of data on the web: XML.
34Database Industry
- Relational databases are a great success of theoretical ideas.
- Big DBMS companies are among the largest software companies in the world.
- Oracle
- IBM (with DB2)
- Microsoft (SQL Server, Microsoft Access)
- Others
- $20B industry.
35The Study of DBMS
- Several aspects
- Modeling and design of databases
- Database programming: querying and update operations
- Database implementation
- DBMS study cuts across many fields of Computer Science: OS, languages, AI, Logic, multimedia, theory...
36For the next lecture
- read some parts of the textbook
- the reading requirements will be posted under lectures/schedule tomorrow