Title: Lecture 4: DBMS Architecture
1Lecture 4 DBMS Architecture
Sept. 6 2006 ChengXiang Zhai
Most slides are adapted from Kevin Changs
lecture slides
2DBMS Mission Statement
- Simply maintenance and computation of data
- But how to do it?
Data
Operations
Results
3DBMS Architecture
User/Web Forms/Applications/DBA
query
transaction
Query Parser
Transaction Manager
Query Rewriter
Logging Recovery
Query Optimizer
Lock Manager
Query Executor
Files Access Methods
Lock Tables
Buffers
Buffer Manager
Main Memory
Storage Manager
Storage
4A Design Dilemma
- To what extent should we reuse OS services?
- Reuse as much as we can
- Performance problem (inefficient)
- Lack of control (incorrect crash recovery)
- Replicating some OS functions (mini OS)
- Have its own buffer pool
- Directly manage record structures with files
5OS vs. DBMS
- Conjecture Perhaps pretty close
- Proof
- There exists someone who can write popular
textbooks in both OS and DBMS! - Operating Database System Concepts
- Jim Gray is from OS background!
6OS vs. DBMS Similarities??
- What do they manage?
- What do they provide?
7OS vs. DBMS Similarities
- Purpose of an OS
- managing hardware
- presenting interface abstraction to applications
- DBMS is in some sense an OS?
- DBMS manages data
- presenting interface abstraction to applications
- Both as API for application development!
8Applications built upon DBMS
- ERP Enterprise Resource Planning
- SAP, Baan, PeopleSoft, Oracle, IBM,...
- CRM Customer Relationship Management
- E.phiphany, Siebel, Vantive, Oracle, IBM, ...
- SCM Supply Chain Management
- Trilogy, i2, Oracle, IBM, ...
- A lot more in the Info Tech era
- e-business software
- scientific data
- multimedia
- data analysis and decision support
9OS vs. DBMS Related Concepts
- Process Management ? What DB concepts?
- process synchronization
- deadlock handling
- Storage management ? What DB concepts?
- virtual memory
- file system
- Protection and security ? What DB concepts?
- authentication
- access control
10OS vs. DBMS Differences??
11OS vs. DBMS Differences
- DBMS Top-down to encapsulate high-level
semantics! - Data
- data with particular logical structures
- Queries
- query language with well defined operations
- Transactions
- transactions with ACID properties
- OS Bottom-up to present low-level hardware
12DBMS on top of OS Relations vs. File system
- Data object abstraction
- file array of characters
- relation set of tuples
- Physical contiguity
- large DB files want clustering of blocks
- extent larger granularity allocation unit
- sol1 managing raw disks by DBMS
- sol2 simulate by managing free spaces in DBMS
- Multiple trees (access methods)
- file access directory hierarchy (user access
method) - block access inodes
- tuple access DBMS indexes
13DBMS on top of OS BM vs. VM
- Query-aware replacement needed for performance
- not always LRU
- Examples?
- how about sort-merge join??
- how about nested-loop join??
14DBMS on top of OS BM vs. VM
- System-controlled replacement needed for
correctness - not always LRU
- Examples?
15Not Really OS Problems Deferred Update Semantics
- Update emp.sal 0.8emp.sal if emp.sal gt mgr.sal
- empname sal manager
- Smith 10k Brown
- Jones 9k
- Brown 11k Jones
- what are the possible semantics?
- INGRES solution deferred updates
- buffer updates in intentions list for actual
updates (also serve as redo log) - an example of needing buffer knowledge in DBMS,
so perhaps not sensible to do BM totally in OS
16As the data model and application context change,
so does the DBMS architecture
17Post-Relational DB Projects
- Motivation
- RDBMS not powerful enough for non-administrative
data-intensive applications such as CAD/CAM,
GIS - Buzz terms object-oriented, extensible
- Sample projects
- Postgres U.C. Berkeley
- Starburst IBM Almaden highly extensible
- after System R (relational), R (distributed)
- ultimately finding its way into IBM DB2 UDB
- Exodus U. Wisconsin
- not a complete DB an OO-style storage manager
toolkit - followed by Shore at Wisconsin, Predator at
Cornell
18POSTGRES Post INGRES
- Stonebraker, U.C. Berkeley
- 1977-1985 INGRES
- among the first relational DB implementation
- ? Ingres Inc. --gt .. ? acquired by Computer
Associates - 1986-1994 POSTGRES
- among the first object-relational DB
implementation - ? Illustra ? acauqired by Informix
- PostgreSQL (the SQL version)
19RDBMS the Relational Root
- Data model (Codd, 1970s)
- a database is a set of relations
- relation of n attributes a set of n-tuples
- n-tuple (v1, , vn), where vi is in domain Si
20Relational Model Normal Forms
- Basic 1NF (First Normal Form)
- implicitly required in the relation model
- definition
- only simple domains of atomic elements (Codd)
- simple domains represent the base (built-in)
types - ? why?
- Stronger normal forms
- 4NF, Boyce-Codd Normal Form, 3NF, 2NF,
- ? why?
21Normalizing Relations Example
- Unnormalized relation of book objects
- Normalized relations by decomposition
- ?? Problems of the relational model?
Books title authors date great future smith,
jones 4/01/01 career jones 7/12/00
Books title day month year great future
4 1 01 career 7 12 00
Books title authors great future
smith great future jones career jones
22Relational Model Problems
- A relational DB is like a garage that forces you
to take your car apart and store the pieces in
little drawers. (some researcher) - Object notion lost by decomposition
- non-intuitive object is decomposed into several
relations - inefficient a lot of online assembling by joins
- Base types are too restrictive
- integers and strings are very primitive
- data types are typically application specific
- Relational algebra is the only allowed operation
- simple, declarative, but also restrictive
- application host language embedded SQL
- ?? How to remedy these problems?
23Quest for a Richer Model?
- Object-oriented data model
- Extensible ADTs
- Programming-language constructs
24ORDBMS vs. OODBMS
- Question How important is the relation?
- ORDBMS
- RDBMS OO features
- query-based
- OODBMS
- OO PL database features (persistent objects)
- programming-based
- Meeting in the middle
25Stonebrakers Matrix
- Prediction ORDBMS will dominate
- evidence big DB players are all on this side
Simple Data Complex Data Query RDBMS ORDBMS No
Query File System OODBMS
26Object Orientation Concepts
- Classes
- classes as types
- encapsulation interface implementation
- inheritance building class hierarchies
- Objects
- complex objects
- built from constructors, e.g., set-of, array,
nested objs - object identity (OID)
- system generated as unique object reference
- enables (efficient) object linking and navigation
27POSTGRES Data Model
- POSTGRES data model
- OO constructs
- classes as relations
- object (class instance) tuple
- object-id tuple-id
- method attribute or function of attributes
- inheritance (multiple parents)
- ADT constructs
- types
- functions
28POSTGRES Functions
- Arbitrary C functions
- e.g. overpaid(Employee)
- arbitrary semantics-- not optimized
- no fancy access methods-- typically sequential
scan - Binary operators
- hints to provide semantics
- extensible access methods
- extensible Btree or user-defined index
- PostQuel procedures
- parameterized queries as functions
- e.g. sal-lookup(name)
- retrieve Emp.salary where Emp.name name
29POSTGRES Storage System
- We were guided by a missionary zeal to do
something different - No-overwrite system
- Logging
- old values are not overwritten-- no value logging
necessary - log only needs to keep transaction state
(commit/abort/going) - ?? crash recovery-- how?
- Vacuum-cleaner daemon to archive historical data
- Advantages
- recovery is cheap
- time travel is easy
30Storage System Problems
- Problems
- flushing differential data (why?) by commit time
can be costly - unless stable main memory
- more costly than sequentially writing out logs
why ?? - reads have to stitch together current picture
- And, yes, there are lots details unexplored or
unexplained
31Questing for the Right Models
- Speaking about knowledge representation The
simple relational model is by far the only
successful KR paradigm. - When the relational model came along, the network
guys resisted and their companies went under. - When the OO model came along, the relational guys
absorb its best, and their companies prospered
again! - -- Jeffery Ullman
32What You Should Know
- What are some major limitations of services
provided by an OS in supporting a DBMS? - In response to such limitations, what does a DBMS
do? - As the data model and task environment change,
the architecture will also need to change
33Carry Away Messages
- One usually doesnt fit all!
- An OS is designed to serve all kinds of
applications, so its not optimal for supporting
a DBMS - Other examples a search engine is designed to
serve all kinds of people, so its not optimal
for a particular person (personalized search) - When a problem is recognized, there are often
opportunities for breakthroughs in multiple areas - DBMS could take over OS functions
- OS could provide more opportunities for
customization - From day 1, high efficiency has been the
primary challenge/concern in designing and
implementing a DBMS reliability may be the
second major concern - In contrast, accuracy of answers is at least
as important as efficiency for a Web search
engine - In the future, accuracy of answers will likely
become more important