Title: COP5725 Database Management System Final Review
1COP5725 Database Management SystemFinal Review
- Reviewed by Ramakrishna.
- Parts of this are taken from Fernando Farfans
presentation
2AGENDA
- Exercises to do.
- Ch6 JDBC
- Ch7 3-Tier Architecture
- Ch8. Storage and Indexing
- Ch10. Tree-Structured Indexing
- Ch7. XML Documents
- Ch27. XQUERY Querying XML Data
3Exercises to do.
- Remember to practice even numbered exercises of
the book - Solutions are available online at
http//www.cs.wisc.edu/dbbook/openAccess/thirdEdi
tion/supporting_material.htm - Important Recommended exercises 8.3, 8.4, 8.5,
8.7, 8.10, 8.11
4JDBC
- What is JDBC ? Explain its purpose.
- JDBC is Java DataBase Connectivity used to enable
integration of SQL with a general purpose
programming language. - Explain JDBC Architecture
- 4 components
- Application,
- Driver Manager,
- Data Source Specific Drivers,
- Data Sources (stored in MySql,Oracle,DB2,MSSQL,Acc
ess and so on).
5JDBC
- Explain the individual steps required to submit a
query to a data source and to retrieve results in
JDBC ? - JDBC Driver Management (class.forName(..)),
- Establishing Connection (Connection Object),
- Executing SQL Statements
- Statement,
- PreparedStatement,
- CallableStatement.
- Retrieving results ( Examining ResultSets).
6Stored Procedures
- Explain the term stored procedure, and give
examples why stored procedures are useful. - Stored procedures are programs that run on the
database server and can be called with a single
SQL statement. - Runs inside the process space of the database
server. - Stored procedures can also be used to reduce
network communication. - They are locally-executed and results are
packaged in to one big result. - Different users can re-use the stored procedure.
73-Tier Application Architecture
- What is a 2-tier architecture ? Explain different
types of 2-tier architectures. - Its a client-server architecture.
Client
Application Logic
Network
DBMS
Client
Architecture 1 Thin Clients ( web browsers)
83-Tier Application Architecture
- Architecture 2 Thick Clients
Client
Application Logic
Network
DBMS
Client
Application Logic
- Disadvantages ?
- No central place to update
- Need to trust client
93-tier architecture
- What are the advantages of 3-tier architecture ?
- Heterogeneous Systems
- Thin Clients
- Integrated Data Access
- Scalable
Client (user interface)
Network
Network
DBMS (database)
Client (web browser)
Application Logic (in C or Java..)
10CH8. OVERVIEW OF STORAGE AND INDEXING
- DBMS stores vast quantities of data
- Data must persist across program executions
- Data is stored on external storage devices.
- The unit of information read from or written to
disk is a page. - The cost of page I/O dominates the cost of
typical database operations. - Input from disk to main memory
- Output from main memory to disk
11CH8. OVERVIEW OF STORAGE AND INDEXING
- Simplest file structure is an unordered file, or
heap file. - An index is a data structure to optimize certain
kinds of retrieval operations. - CLUSTERED - When a file is organized so that the
ordering of data records is the same as the
ordering of data entries in some index - UNCLUSTERED - It is un-clustered otherwise
- There is UTMOST one clustered index and several
un-clustered ones for a table.
12Example
- Employee (Eid integer, name String, age
integer..) - Assume data records sorted by name
- Assume an index on name
- Index on name ? CLUSTERED
- Assume an index on age
- Index on age ? UNCLUSTERED
13Example Using Indexes
- Employee Data Records Sorted by Name
- Page 1
- (1,Alex,30)
- (2,Amy,21)
- Page 2
- (3, Bob,31)
- (4, Brenda,21)
- Select from Employee where name like A
- Use index on name ? Retrieve Page 1.
- Select from Employee where age 21
- Use index on age ? Retrieve Page 1 Page 2.
14Clustered and UnClustered Indexes
- Retrieval using a Clustered index
- Data retrieval using minimum number of Data page
I/Os. - Retrieval using an Unclustered index
- In worst case, 1 page I/O for every qualifying
tuple.
15Another Example
- Consider Select E.dno from Employee E where
E.age gt 40 - Assume B Tree index on age.
- Is it better to use this index ? Or just do a
segment scan ? - Answer Depends on several factors
- Depends on Selectivity of the condition.
- Whether the index is clustered or unclustered.
16CH8. OVERVIEW OF STORAGE AND INDEXING
- 8.3 Consider a relation stored as a randomly
ordered file for which the only index is an
unclustered index on a field called sal. If you
want to retrieve all records with sal gt 20, is
using the index always the best alternative?
Explain. - No. In this case, the index is unclustered, each
qualifying data entry could contain an rid that
points to a distinct data page, leading to as
many data page I/Os as the number of data entries
that match the range query. In this situation,
using index is actually worse than file scan.
17CH8. OVERVIEW OF STORAGE AND INDEXING
- 8.5 Explain the difference between Hash indexes
and B-tree indexes. In particular, discuss how
equality and range searches work. - Hash Index Hashing function. Quickly maps a
search key value to a bucket. Inserts/Deletes are
simple. Linked list for collisions. Good at
equality searches. Terrible for range queries. - B-tree Index Hierarchical search data
structure. Maintenance is costly. Costly for
individual record lookup. Efficient for range
queries.
18CH8. OVERVIEW OF STORAGE AND INDEXING
- Consider the following relation
- Answer Update the salary of an employee using
the employee id.
19CH8. OVERVIEW OF STORAGE AND INDEXING
- Update the age or employee id for some
department id
Update the salaries of all employees in some
department
20Constraints
21Constraints
22Constraints
23Constraints
Assume tables are already created.
ALTER TABLE ENROLLED add constraint cons1
check((select count() from ENROLLED group by
cname)gt5) ALTER TABLE ENROLLED add
constraint cons2 check((select count() from
ENROLLED group by cname)lt30)
24Constraints
25Views
26Views
27Tree-Structured Indexing
- Explain ISAM and B Trees ? What are the
differences, advantages and disadvantages ? - ISAM Indexes Sequential access Method
- Effective for static files
- Unsuitable for dynamically changing files
- B Tree
- Dynamic index structure
- Adjusts well to changes
- Does well for both range and equality selections
28Tree-Structured Indexing
- Insertions and deletions in ISAM
- Happen only in leaf pages
- Insertions adding records to the overflow chain
- Search inefficient as the chain grows
- In B tree even after several insertions and
deletions, - The tree is kept balanced
- Height of the tree ? length of the path from root
to leaf
29Query Evaluation
- Consider a relation R(a,b,c,d,e) containing
5,000,000 records, where each data page of the
relation holds 10 records. R is organized as a
sorted file with secondary indexes. Assume that
R.a is a candidate key for R, with values lying
in the range 0 to 4,999,999, and that R is stored
in R.a order. For each of the following
relational algebra queries, state which of the
following three approaches is most likely to be
the cheapest - Access the sorted file for R directly.
- Use a (clustered) B tree index on attribute
R.a. - Use a linear hashed index on attribute R.a.
30Query Evaluation
31CH7. XML DOCUMENTS
- 7.1 When is an XML document well-formed? When is
an XML document valid? - An XML document is valid if it has an associated
DTD and the document follows the rules of the
DTD. An XML document is well-formed if it follows
three guidelines - It starts with an XML declaration.
- It contains a root element that contains all
other elements. - All elements are properly nested.
32CH27. XQUERY
- ltbookstoregt ltbook category"COOKING"gt lttitle
lang"en"gtEveryday Italianlt/titlegt ltauthorgtGiada
De Laurentiislt/authorgt ltyeargt2005lt/yeargt ltpri
cegt30.00lt/pricegt lt/bookgt ltbook
category"CHILDREN"gt lttitle lang"en"gtHarry
Potterlt/titlegt ltauthorgtJ K. Rowlinglt/authorgt lt
yeargt2005lt/yeargt ltpricegt29.99lt/pricegt lt/bookgt
ltbook category"WEB"gt lttitle lang"en"gtXQuery
Kick Startlt/titlegt ltauthorgtJames
McGovernlt/authorgt ltauthorgtPer
Bothnerlt/authorgt ltauthorgtKurt
Caglelt/authorgt ltauthorgtJames Linnlt/authorgt ltau
thorgtVaidyanathan Nagarajanlt/authorgt ltyeargt2003lt
/yeargt ltpricegt49.99lt/pricegt lt/bookgt ltbook
category"WEB"gt lttitle lang"en"gtLearning
XMLlt/titlegt ltauthorgtErik T. Raylt/authorgt ltyear
gt2003lt/yeargt ltpricegt39.95lt/pricegt lt/bookgtlt/boo
kstoregt
33CH27. XQUERY
- Querydoc("books.xml")/bookstore/book/title
- Resultlttitle lang"en"gtEveryday
Italianlt/titlegtlttitle lang"en"gtHarry
Potterlt/titlegtlttitle lang"en"gtXQuery Kick
Startlt/titlegtlttitle lang"en"gtLearning
XMLlt/titlegt
34CH27. XQUERY
- Querydoc("books.xml")/bookstore/bookpricelt30
- Resultltbook category"CHILDREN"gt lttitle
lang"en"gtHarry Potterlt/titlegt ltauthorgtJ K.
Rowlinglt/authorgt ltyeargt2005lt/yeargt ltpricegt29.99lt
/pricegtlt/bookgt
35CH27. XQUERY
- Queryfor x in doc("books.xml")/bookstore/book
where x/pricegt30 order by x/title return
x/title - Resultlttitle lang"en"gtLearning
XMLlt/titlegtlttitle lang"en"gtXQuery Kick
Startlt/titlegt
36CH27. XQUERY
- Queryltulgt for x in doc("books.xml")/bookstor
e/book/title order by x return ltligtxlt/ligt
lt/ulgt - Resultltulgt ltligtlttitle lang"en"gtEveryday
Italianlt/titlegtlt/ligt ltligtlttitle lang"en"gtHarry
Potterlt/titlegtlt/ligt ltligtlttitle
lang"en"gtLearning XMLlt/titlegtlt/ligt ltligtlttitle
lang"en"gtXQuery Kick Startlt/titlegtlt/ligtlt/ulgt