Title: What is a Database
1What is a Database?
- Definition A database is a structure that can
store information about multiple types of
entities, the attributes of those entities, and
the relationships among the entities. (Pratt
Adamski, 2003) - Collection of valuable data that exists over a
period of time - Relationships exist between the data
- Structured
- Managed by a database management system (DBMS)
2DBMS
- DBMS Database Management System
- Program that enable the creation and maintenance
of databases - Many different functions including control of
access and design tools - May contain more than one databases
- Used to provide decision support and transaction
processing
3Advantages of a DBMS
- Reduction of redundancies and errors within the
data - Easier to extract information
- The system is more flexible
- Security is improved
- Integrity can be improved
- Data model needs to be produced
4Disadvantages of a DBMS
- Takes longer to access the data
- Need special knowledge
- Applications may be dependant to specific DBMS
versions - Initial cost may be large.
5Characteristics
- A good DBMS has the following
- Data integrity
- Security
- Concurrency control
- Transaction processing
- Recovery functions
6Database Lifecycle
7Database Life Cycle
- System Definition
- Identify the requirements that a system must meet
to satisfy the needs of end users, including
input, processing, storage, output, control, etc.
- Database Design
- Conceptual design
- Analyse overall data requirements needed for
business function - Develop preliminary conceptual data model (ER)
- Logical DB design
- Transform the conceptual data model into
relational model - Map conceptual design to database schema
- Identify data integrity and security requirements
- Populate repository
- Physical DB design
- Performance issues decide on physical
organisation of data, DBMS, external programs,
hardware, operating system, data communication
network, etc
8Database Life Cycle
- Implementation and Loading
- Code database processing programs
- Convert data from prior systems
- Install database and import data
- Complete database training materials and
documentation - Testing and Evaluation
- A set of activities conducted with the intent of
finding errors in software - Identify the correctness, completeness and
quality of developed programs - Maintenance and Evolution
- Analyse database and DB application to ensure
that user and system requirements are met - Audit security and disaster recovery procedures
- Fix errors in DB and DB applications
9Data Modelling
- Important step in the development of the DB
- Good model
- Completeness, accuracy
- No redundancy
- Data reusability
- Stability, flexibility, simplicity
- Communication effectiveness
10Entity-Relationship Model
- E-R model
- Visual representation of the structure
Relationship
11ER Terminology
- Entity
- Real work class of facts
- Named using singular words
- Definitions used to clarify
- Relationship
- A set of associations between one or more entity
type. (Connolly Begg, 2002) - Examples
- A Staff member works at a Branch
- A Branch has Staff members
works at
has
12ER Terminology
- Cardinality
- One-to-One
- One-to-Many
- Many-to-Many
- Participation
- Mandatory or optional
- A customer may not have an invoice
- An invoice has to be associated with a customer
- Represented by circles and bars
13Example Relationship
14Attribute
- State of an entity
- Correspond to columns in a table
- Book
- ISBN
- Author
- Name
- Publisher
15Primary Key
- Attribute(s) that are distinct for each entry
- Primary key is UNIQUE
- Examples Student Number, Book ISBN, PIN,
Drivers Licence Number, Course Code...
16Foreign Keys
- A foreign key attribute set is a primary key
attribute in one table that is repeated as a
foreign key in another table - A non-key attribute in one relation that also
appears as a primary key in another relation
17What is Normalization?
- Reorganisation and evaluation of table structures
- Produces a set of stable, well structured tables.
- Remove repeating/redundant data
18Why Normalize?
- Remove redundant data
- Reduce update anomalies
- Reduce delete anomalies
- Reduce insert anomalies
19Normal Forms
- UNF a table that contains one or more repeating
groups - 1NF A relation is in 1NF is all value are
atomic and no rows are repeated. - 2NF A relation is said to be in 2NF if it is in
1NF and every non-key attribute is fully
functionally dependent on the primary key - 3NF A relation is said to be in 3NF if it is in
2NF and every non-key is non-transitively
dependent on the primary key - All attributes in every table must be determined
by the key, the whole key, and nothing but the
key
20Un-normalized Data
- Grade-Report
- sNumber sName sAddress uNumber
uTitle iName iRoom sMark - 38214 Bright 123 Smith St IS350
Database CODD B104 A - 38214 Bright 123 Smith St IS465 Sys
Anal KEMP B213 C - 69173 Smith 35 Main Rd IS465 Sys
Anal KEMP B213 A - 69173 Smith 35 Main Rd PM300 Proj
Mgt LEWIS D317 B - 69173 Smith 35 Main Rd QM440 OpSys KEMP B213
A - What entities are present?
- What fields do they have?
- Can the table be decomposed further? State the
reason. - Decompose the table into 3NF
-
21SQL (Structured Query Language)
- Developed by IBM in late 70s for DB2
- SEQUEL
- Used to
- Build
- Alter
- Query
22Data Manipulation Language
- Used to retrieve, add, modify and delete
- Commands
- SELECT
- INSERT
- UPDATE
- DELETE
- ANSI vs Various DBMS implementations
23SELECT STATEMENT
- SELECT column(s)
- FROM table(s)
- WHERE IS, NOT, NULL, , ltgt, gt , lt, IN, NOT IN,
BEETWEEN, LIKE - GROUP BY column(s)
- HAVING COUNT
- ORDER BY column(s) ASC, DESC
24SQL Functions
- Five functions
- AVG
- MAX
- MIN
- SUM
- COUNT
- MS-Access functions
- DATE()
- LCASE/UCASE
- MID
- LTRIM/RTRIM
- Etc
25Database Development Activities
- Six main phases
- Enterprise modelling
- Conceptual data modelling
- Logical database design
- Physical database design and creation
- Database implementation
- Database maintenance
26Data and Database Administration
- Two factors are driving the changes in the data
administration and DB administration - the availability of more technologies and
platforms that must be managed concurrently - The increased pace of business changes
27Main Activities Performed by DBA
- Physical data modelling and meta-data management
- Space planning, creation and upkeeping
- Database object maintenance and security
- Application design and execution problem
identification - Backing up, recovering, loading, and moving data
- Pro-actively monitoring the database and taking
corrective actions
28Growth Change
- What happens when the database grows?
- Analyse space utilization and performance
- Additional space allocated
- Change in structure
- Check if already available (where?)
- Y use current data
- N add new data-types or relationships
- Change in usage
- Move to faster devices
- Change placement (clustering)
- Change contents
29Six Types of Information Systems
- Transaction Processing Systems (TPS)
- Management Information Systems (MIS)
- Decision Support Systems (DSS)
- Executive Information Systems (EIS)
- Expert Systems (ES)
- Geographical Information Systems (GIS)
30Organisations
- Organisation classification
- Machine Bureaucracy
- Divisionalised Bureaucracy
- Entrepreneurial structure
- Professional
- Ad hoc racy
31Distributed DBMS
- Distribute Databases
- DDB is a single logical database that is
physically divided among computers at several
sites on a network. - Distribute Database Management Systems
- DDBMS governs the storage and processing of
logically related data over interconnected
computer systems in which both data and
processing functions are distributed among
several sites.
32Distributed DBMS Characteristics
- A collection of logically related shared data
- Data is split into a number of fragments
- Fragments may be replicated
- Fragments/replicates are allocated to sites
- The sites are linked by a communication network
- The data at each site is under the control of a
DBMS - The DBMS at each site can handle local
applications - Each DBMS participates in at least one global
application
33Distributed Processing Vs Distributed Databases
- Distributed processing
- The ability to have several computers working
together in a network, where each processor runs
different activities - E.g. A centralized database that can be accessed
over a computer network - Distributed processing does not require
distributed database - Distributed database requires distributed
processing - Both require network to connect all components
34Distributed Database Design
- Data partitioning
- Horizontal partitioning
- Division of relation into subsets of tuples
- Each stored in different node
- Represents a SELECT statement with a WHERE clause
- Vertical partitioning
- Division of relation into attribute subsets
- Each located at different node
- Equivalent of a PROJECT statement
- Mixed
- Combination of horizontal and vertical
- Table divided horizontally, each having subset of
attributes
35Distributed Database Design
- Data replication
- Storage of data copies at multiple sites served
by a computer network - Replication conditions
- Fully replicated multiple copies of all
partitions at multiple sites - Partially replicated multiple copies of some
partitions at multiple sites - Factors for data replication decision
- Database size
- Usage frequency
- costs
36Distributed Database Design
- Location transparency
- One of the major characteristics of DDBMS, which
states that users do not need to be aware of the
location of data in a distributed databases - Replication transparency
- One of the major characteristics of DDBMS. The
steps taken by the DDBMS to update the various
copies of data should be done behind the scenes
users should be unaware of the steps - Fragmentation transparency
- Users should not be aware of the fragmentation.
If users are unaware of fragmentation, the DDBMS
has fragmentation transparency
37Transaction processing terms
- A transaction is an atomic unit of work. i.e.
changes to the database that must be done
together. - For example, in discharging a patient from a
hospital the system needs to update tables to
indicate his bed is free and needs to update
tables to store his bill. Doing one without the
other would be incomplete.
38Transaction processing terms
- A program that changes data in tables is called
transaction. - Single user vs Multi user. DBMS may allow more
than one user to update data stored in the DB.
However users may update the same data at the
same time i.e. Concurrently - Concurrency control is a term that describes
procedures used to ensure correct interleaving of
transactions
39Transaction processing
- Begin transaction
- Read, Write
- End transaction
- Commit transaction
- Rollback
- Redo
- Undo
40Concurrency Control
- A program that changes data in tables (as an
atomic unit of work) is called transaction - Concurrency control (CC) controls flow in a
multi-user database - CC is managing and maintaining data integrity
- CC is related to transactions
41Locks
- Two types of locks
- Read / Shared others allowed to read locked
data but cannot update. Placing a shared lock
prevents another user for placing exclusive lock
on that record set - Write / Exclusive prohibits other transaction
from accessing data (reading). A transaction
should put a write lock on record set when is
about to update that record set. Placing write
lock prevents users from placing any type of lock
on that record set - LOCK TABLE statement
- LOCK TABLE tableName IN SHARE/EXCLUSIVE MODE
42Deadlocks
- Also referred to as Deadly Embrace
- Locking introduces this problem
- Example
- Application A puts read lock on table X
- Application B puts read lock on table X
- A requests write lock to X (wait)
- B requests write lock to X (wait)
43Deadlock
- Results when two or more transactions have locked
a common resource, and each waits for the other
to unlock that resource. - Unless the DBMS intervenes (by maintaining a
matrix of usage), the transactions will wait
indefinitely.
44Decision Support Systems (DSS)
- Methodology designed to extract information from
data accumulated over time - Arrangement of computerized tools to assist
decision making - Used at all levels tailored to focus on
specific areas or problems - Provides ability to conduct ad hoc queries
45Data Warehouse
- Data Warehouse (DW) is a broad based, shared
database for management decision making that
contains data that has been accumulated over time - Data must be high quality, aggregated, often
denormalised and is not necessarily absolutely
current - The Data Warehouse is an integrated,
subject-oriented, time-variant, non-volatile
database that provides support for
decision-making.
46On-Line Analytical Processing (OLAP)
- Decision support methodology based on viewing
data in multiple dimensions - OLAP is well suited for querying and multi time
period trend analysis - OLAP concepts
- Drill down (yearly-gtmonthly-gtweekly)
- Slice (subset of the data e.g. products-gtshoes)
- Rotate (interchange the dimensions)
47Data Mining
- DSS tools reactive
- Data-mining proactive
- Automatically search for anomalies and possible
relationships
4803P3 Final Exam
- Theory test
- When
- Tuesday, 22 November 2005
- Reading commences 920am
- Writing Commences 930am
- Duration 2 hours
- Where Exam Hall
- Exam timetable
- http//www.ballarat.edu.au/aasp/student/admin/he/f
inal_exam_timetable_210905.pdf
4903P3 Final Exam
- Question 1- Normalization
- Functional dependency
- 1NF, 2NF and 3NF
- Decompose tables to normalized form
- Maintain relationship between tables
- Question 2 - ER design
- Entities
- Relationships (resolve many-to-many)
- Attributes
- Cardinalities
- keys
5003P3 Final Exam
- Question 3 - Short answered questions
- Important points
- Short answers
- Read question carefully
- Question 4 SQL
- Select
- Insert
- Delete
- Update
- Create
- Etc