Title: Theory of Databases
1Theory of Databases
- Module lecturer Dr Meurig Beynon, Room 3.15
- Module Content
- Generalities on databases
- Database models Theory and Practice
- Entity-relationship models
- Relational models algebraic and logical
- Commercial relational languages
2CS319 Theory of Databases
- Module Content (cont.)
- Technical Issues and Algorithms
- Optimisation
- Integrity constraints
- Database design
- Future Trends Theory and Practice
- Object-oriented models
- Extended relational systems
- The Relational Model and SQL
3Books on Classical Relational Databases
- Date, C.J. A Database Primer
- Date, C.J. An Introduction to Database Systems
(Vol. 1) Addison-Wesley 6th edition, 1995 - Korth, H.F. and Silberschatz Database System
Concepts McGraw-Hill 2nd edition 1991 - Ullman, J.D. Principles of Database Systems 2nd
edition Computer Science Press 1982
4Books on more general database issues
- Ullman, J.D. Principles of Database
Knowledge-base Systems Vols 1 2 Computer
Science Press 1988 - Brown, A.W. Object-oriented Databases
Applications in Software Engineering McGraw-Hill
International Series in Engineering, 1991 - Gray, P.D. Logic, Algebra Databases,
Ellis-Horwood Series, Computers and their
Applications 29 - Kent, William Data and Reality basic assumptions
in data processing reconsidered, North-Holland
1987
5Books on Relational Theory in its application
- Date, C.J. Relational Database Writings 1985-1989
Addison-Wesley and other collections in this
series - Hugh Darwen and C.J.Date Foundations for
Object/Relational Databases The Third Manifesto
1998 Harlow, Addison-Wesley Longman - Dr Hugh Darwen was an IBM consultant on
databases - he is our guest lecturer in weeks 18 and 19
- NOTE he will give 4 lectures in week 18 thus
- Monday February 21th at 2pm and 5pm in CS1.04
- Thursday February 24th at 2pm to 5pm in CS1.04
- He will also give 4 lectures in week 19 at
similar times
6CS319 Schedule for 2005
- Dr Hugh Darwen was an IBM consultant on
databases - he is our guest lecturer in weeks 18 and 19
- NOTE the lectures in week 18 will be on The
Relational Model and SQL thus - Monday February 21st at 2pm and 5pm in CS1.04
- Thursday February 24th at 2pm and 5pm in CS1.04
- the lectures in week 19 will be on
- Temporal Data and the Relational Model thus
- Monday February 28th at 2pm and 5pm in CS1.04
- Thursday March 3rd at 2pm and 5pm in CS1.04
7DB generalities What is a database?
- Chris Date
- Database computer-based record keeping system
- R.W. Engles "A Tutorial on DB Organisation"
(1974) - collection of stored operational data used by
the applications system of a particular
enterprise - enterprise hospital, university, bank, company
etc - operational data
- data on products, accounts, patients etc
- typically persistent cf conventional program IO
data
8DB generalities Why use a database?
- Case-study Banking (after Korth Silberschatz
Chap. 1 - ? How to meet needs using a traditional
file-processing system supported by a
conventional OS - Files permanent records of customers, accounts
- Applications programs (APs) enable user to
modify files - to credit or debit an account
- to add a new account
- to find the balance in an account
- to generate monthly statements
- APs written by systems programmers as required
- new requirements ? new files new programs
9Original context for data modelling 1
- 1970s style applications
- unsophisticated computer users
- batch mode interaction
- modest response times
- no visualisation or GUI
- modest expectations for
ease-of-use - programming perceived as technical
- simple computing infrastructure and
environment - no PC, web etc
- no live feeds of data
- textual interaction the norm
10Original context for data modelling 2
- 1970s style applications
- Business context
- simple business model, limited automation,
access etc - low volume of data
- not initially distributed
- Computing context
- - existing/emerging DB proposals unconvincing
- - computers not very powerful
- - human and computing resources very expensive
11Summary of issues for data management
- Problems of data management for file systems
- that DBs were originally intended to address
- Data redundancy and inconsistency
- Difficulty in accessing data
- Data isolation
- Concurrent access anomalies
- Security problems
- Integrity problems
12DB generalities s and s of DB use
- Conventional file systems have certain
characteristics - will review the key issues for data management
- indicates a positive impact of using a database
- indicates a potentially negative impact of
using a DB
13DB generalities Issues for data management
- Problems that DBs were originally intended to
address - Data redundancy and inconsistency
- Difficulty in accessing data
- Data isolation
- Concurrent access anomalies
- Security problems
- Integrity problems
14DB generalities Issues for data management
- Data redundancy and inconsistency
- each programmer potentially uses different format
file, develops at different stage in history of
enterprise - data duplicated
- in a DB rationalise and standardise data
- rationalise conceptually create an authentic
shared source for data - rationalise doesn't necessarily mean
centralise
15DB generalities Issues for data management
- Data redundancy and inconsistency
- each programmer potentially uses different format
file, develops at different stage in history of
enterprise - data duplicated
- compromises are needed,
- where users suit themselves, can get efficient
results - vs no perfect data organisation to suit all
users - duplication can give insurance against info loss
16DB generalities Issues for data management
- Difficulty in accessing data
- have to respond to unforeseen requests, hence new
functionality - in file-processing environment, have to write new
programs, and possibly devise new data structures - in a DB, simplify access and manipulation by
intelligent organisation of data cf. modelling
approach to requirements, as e.g. in use of UML
in OOSE
17DB generalities Issues for data management
- Data isolation
- data has to be retrieved from many sources when
APs written - in DB, aim to hide the source and form of
physical data by viewing the data at a higher
level of abstraction - automation decreases the amount of human
interaction with data - risk of corrupted data passing between
integrated files is greater
18DB generalities Issues for data management
- Concurrent access anomalies
- would like multiple access for efficiency and
faster response time e.g. simultaneous withdrawal - concurrency can't be managed without a form of
overall control
19DB generalities Issues for data management
- Security problems
- would like to restrict access to authorised users
for confidential info - security can't be managed without a form of
overall control - issue as to whether this control is most easily
exercised inside or outside computer system e.g.
non-trivial problem to determine what can be
inferred from responses to queries that aren't
explicit
20DB generalities Issues for data management
- Integrity problems
- data in file system must satisfy integrity
constraints constraints may arise dynamically
difficult to modify programs to cope with this
also hard to guarantee integrity if data is
stored in different files - automated management demands some form of
overall control - automation reduces scope for human intervention
/ interpretation
21DB generalities Issues for data management
- Conclusion from above discussion ...
- For many commercial applications (as in
enterprises above) good solution is offered by a
database management system (DBMS). A DBMS is an
unconventional OS operating over a structured
file system. The s above indicate some of the
positive benefits of the use of a DBMS.
22DB generalities Issues for data management
- For many commercial applications (as in
1970s-style enterprises above) good solution is
offered by a database management system (DBMS). - A DBMS is an unconventional OS operating over a
structured file system.
23Generalities of DBs the DBMS concept
- Motivating idea
- devise an abstract model of the entire corpus of
operational data that simplifies the data
processing activity, so that - simple queries can be handled without writing new
application programs - where applications programs must be written, the
task of accessing and manipulating operational
data consistently and efficiently is greatly
simplified
24Generalities of DBs themes of the module
- In this module, primarily concerned with a
particular data model ("the relational database")
that has proved very successful in meeting the
needs of many commercial enterprises involved in
data processing. - For other applications, such as interactive
design, different principles seem to be required.
For instance, human interaction and intervention
in data processing for design is of the essence.
In the latter part of the course, will consider
alternative approaches (e.g. object-oriented
databases, deductive databases).
25Generalities of DBs themes of the module
- There is a management aspect to the use of DBs
the issues above can to some extent be addressed
either by manual processes or through automation.
In some cases (e.g. security), it isn't clear
which is most effective. Hardest to automate in
areas where essentially new data and new modes of
organising data are being generated dynamically. - Relational databases are arguably associated with
applications of a particular kind, where the
business processes are especially stable and
ritualised.
26Generalities of DBs themes of the module
- Two views of impact of databases
- can view the DBMS
- as a program generator for the end-user
- cf. current research on end-user programming
- as a means to record persistent real-world state
- cf. current research on virtual reality
- Key issue Is it possible to align paradigms for
programming and general-purpose data modelling?
27DB generalities the ingredients of a database
- Data
- integrated
- shared
- possibly distributed
- Hardware
- primary storage secondary storage
- Software
- database management system DBMS
- protects users from hardware level detail
- serves the needs of all users
28DB generalities the ingredients of a database
- Users
- end-user
- non-specialist accessing data via a query
language - naïve user accessing data via a special-purpose
interface - performs data retrieval and update (extend /
modify) -
- applications programmer
- writes programs that use the DB by embedding
queries to the DB in a HLL - develops interfaces for the naïve user
29DB generalities the ingredients of a database
- Users
- Database Administrator (DBA)
- responsible for overall control
- decides what data is to be stored
- designs the conceptual scheme
- used to represent the operational data
- implements authorisation checks
- decides strategy for backup and recovery
- monitors performance
- oversees modification to suit user requirements
30DB generalities data abstraction in a database
- Data abstraction addresses issues of use,
design, management and implementation in a
database - The data model serves to describe in a formal
manner the way in which data is viewed at three
different levels of abstraction - physical level
- conceptual level
- view level
31DB generalities data abstraction in a database
- physical level
- how is the data actually represented in the
hardware? - bits, bytes
- conceptual level
- what meaningful relationships are expressed by
the physical data? - entities, and relationships between entities
- view level
- what particular relationships are required by
users? - more abstract because partial typically very
high-level knowledge constitutes the view
32DB generalities data abstraction in a database
- Illustrating data abstraction
- Data base stores the date of birth of a client
as a bit string. When we identify the senior
citizens, we find all clients aged over 65. - Representations at different levels of
abstraction - conceptual date of birth of a client
- physical the bit string that records this
information - view refers to age, which isn't stored in the
DB.
33DB generalities data abstraction in a database
DESIGN MANAGEMENT
USE
IMPLEMENTATION
34DB generalities data abstraction in a database
- Internal and external translation schemas serve
- to protect the conceptual model from change when
physical organisation changes or new views are
required - to protect the user from a need to change views
and the database programmer from a need to alter
the physical organisation if and when the
conceptual model is modified
35DB generalities data abstraction in a database
- The DBA conceives the database in terms of the
conceptual model. - Users and application programs access the
physical data via the conceptual model. - physical data independence protecting the
conceptual model from change when the physical
organisation changes - logical data independence protecting the user
from the need to change views when the conceptual
model changes
36DB generalities data abstraction in a database
- Analogous to the way in which the programmer is
protected from low-level issues by using ADTs - program ? Abstract Data Type ? ADT implementation
- physical data independence
- cf. change in an ADT implementation
- logical data independence
- cf. use of old program over enhanced ADT
37Recall - Generalities of DBs the DBMS concept
- Motivating idea
- devise an abstract model of the entire corpus of
operational data that simplifies the data
processing activity, so that - simple queries can be handled without writing new
application programs - where applications programs must be written, the
task of accessing and manipulating operational
data consistently and efficiently is greatly
simplified
38Characteristics of electronic data 1970 (1)
- Abstract model of the entire corpus of
operational data - Separation between persistent transient data
sharper - file vs executing program
- Isolation of persistent data more complete
- changes to persistent data initiated by human
action - persistent data accessed through text interfaces
- Electronic data storage management rare,
expensive - intelligent interpretation of electronic state
by human - no direct connection between environment and data
39Modern context for general data modelling
Programs
40Characteristics of electronic data 1970 (2)
- Abstract model of the entire corpus of
operational data - Demands of the abstract model in 1970 quite low
- small volumes of data, modest performance
- limited levels of volatility and automation
tolerated - Today is very different, BUT subject to viewing
human agency as a metaphor for any agency, the
key issues to be addressed by a classical
database are still vital - Any DB modelling paradigm must handle 70s problems
41DB generalities data models for a database
- Many different paradigms have been proposed for
developing abstract data models for databases - There are two principal kinds of abstract data
model - object-based models
- record-based models
- The earliest database systems were record-based
- - this reflects the file system culture that they
displaced
42DB generalities data models for a database
- Object-based models
- The main models in this category are
- entity-relationship models
- object-oriented data models
- Others include semantic and functional data
models. - E-R model widely used to model data abstractly
- OO model gaining acceptance in practice
effectively represents data operations on data.
43DB generalities data models for a database
- Record-based Logical Models
- Used at the conceptual and view levels. Specify
both - overall logical structure of the database
- higher-level description of the implementation.
- Record-based because uses records in fixed-format
of several types. This simplifies implementation
cf. trend towards richness and variety in
structures used to implement OODBs
44DB generalities data models for a database
- Varieties of record-based logical model
- hierarchical model
- records links organised as a family of trees
- network model
- records links organised as a family of graphs
- relational model
- uses tables to record relationships between data
45DB generalities data models for a database
- Physical Data Models
- There are also models of data at the lowest level
of abstraction, concerned with physical
organisation. These are not our primary concern
in this module. - Relevant issues for relational databases include
- are data tables stored using hashing?
- how are data tables indexed?
- how are entries in data tables encoded and
ordered? - what algorithms are used to retrieve and update?
46DB generalities classical database features
- Instances and Schemes
- State of a DB changes over time distinguish
structure of DB from current state as defined by
the data in it. - overall design of DB database scheme
- current content of DB instance of the DB
- Useful analogy with procedural variables
- database scheme type definition for variable
- instance of database value of the variable
47DB generalities classical database features
- Classical DBs support data abstraction at many
levels - The three levels of abstraction at which the
database can be viewed are modelled by
appropriate schemes. - physical scheme at the lowest level
- conceptual scheme at intermediate level
- several subschemes (possibly user-defined)
- at the highest level (also called views of the
DB)
48DB generalities classical database features
- Data Definition Language (DDL)
- database scheme is defined using a DDL
- compiling the DDL description creates a Data
Dictionary - the storage and access methods used by the DB
- are specified in a storage and definition
language - Implementation details for storage are usually
hidden from users
49DB generalities classical database features
- Data Manipulation Language (DML)
- data manipulation means accessing DB to
- retrieve, insert, delete, or modify data
- most common use of DML is for data retrieval
informally described as "querying the DB" - retrieval component of DML query language
- (and by abuse, sometimes use term query
language as synonym for DML)
50DB generalities classical database features
- Varieties of Data Manipulation Language
- There is a tension between
- efficiency at physical level
- intelligibility / ease of use at higher level
- Have both procedural and non-procedural DMLs
- procedural requires knowledge of data
implementation - non-procedural need only specify what data is
needed
51DB generalities classical database features
- Data Manipulation Languages for typical data
models - object-based, hierarchical, network models have
procedural DMLs - user can take explicit responsibility for
optimising queries, but needs knowledge of data
organisation - relational models use non-procedural DMLs
- can formulate queries without knowledge of data
organisation, but implementation has to be
optimised
52DB generalities classical database features
- Database Manager (as program module!)
- interfaces between low-level data in DB and
application programs user queries. - Large volumes of data (relative to available
technology) - gigabytes thousand megabytes 1 billion bytes
! - terabytes million megabytes 1 trillion bytes
- Requires auxiliary storage media, such as disk,
CD etc. - Optimisation is primarily concerned with
eliminating - data transfers between main and auxiliary memory.
53DB generalities classical database features
- Functions of the DB manager program module
- query processing
- interacting with the file manager modules doing
actual operations on physical data - integrity enforcement
- checking that data in the DB conforms to
specified constraints - security enforcement
- ensuring that authorisation is given for access
to data - backup and recovery
- coping with failure, and recovery to consistent
DB state - concurrency control
- ensuring that simultaneous transactions do not
interfere.
54DB generalities classical database features
- Role of the Database Administrator
- ... functions performed by the DBA .
- scheme definition using DDL
- specification of storage structure and access
method - modification of schemes and physical storage as
required - granting authorisation for data access
- specifying integrity constraints
55DB generalities classical database features
- Overall system structure
- Processing components
- file manager
- database manager
- query processor
- DML precompiler (to process DML embedded in APs)
- DDL compiler
- Data structures
- data files actual content of db
- data dictionary meta-data
- indices auxiliary files to assist fast access
56 to follow Will review features of a DB system
with reference to Ingres and Quel