Title: Introduction to Databases
1Introduction to Databases
Simplicity is the ultimate sophistication.Leona
rdo da Vinci
2Class Outline
- What is data and why is it important?
- What is a database and database schema?
- What is a database management system?
- What is a database application?
- What are the levels of database representation?
- What were the limitations of the systems that led
to the development of the current relational
database systems? - What are various types of database systems?
- What do databases have to do with HEALTH SCIENCE?
3Databases versus The Rest
4Principles of Information Resource Management
- Organizational resources flow into and out of the
organization - Two types of major organizational resources
Physical resources, Conceptual resources (data
information) - As scale of organization grows, it becomes
increasingly difficult to manage by observation
(i.e., reliance on conceptual resources) - Conceptual resources can be managed just like
physical resources or assets (e.g., employees,
, equipment, etc.) - Management of data information means getting it
before its needed, protecting it, assuring
quality, and getting rid of it when no longer
required - Management of data information can be achieved
only through organizational commitment
Adapted from McFadden, F.R. Hoffer, J.A.
(1994). Modern Database Management. Redwood
City, CABenjamin/Cummings Publishing (p. 6)
5Information is a major organizational resource
(2003) 0.5 with rare mutation(2004) 5 with
rare mutation
Information (organized data)
Dr. Jill conducted 100 CFTS (2004)experiments of
100 patients of which 500 had a rare mutation
6What is a Database?
- Organized collection of related information or
data stored on a computer disk for easy,
efficient use
data
information
7In the beginning(in the 1950s)
There were no databases. Just file (or data
processing) systems.
- File systems were typically organized by function
(use) - The first data management systems performed
clerical tasks (transactional processing) such as
order entry processing, payroll, work scheduling. - e.g., files for patients (file folder analogy)
each record for a single patient another file
for appointment/ billing information
Name Jane Doe Address 123 Easy
St. City London Phone 455-0897
Date Sept 14, 1955 Time 200 p.m. Patient Jane
Doe, 455-0897 OHIP 123456789
8Limitations of Data File Systems
CustomerprocessingApplication
Customerfile
OrderprocessingApplication
Orderfile
- Worked adequately if data collection needs were
relatively small. - Problems arose as data files, information needs,
and reporting requirements grow in complexity due
to - Extensive programming - use of third-generation
languages (e.g., COBOL, FORTRAN) in which the
programmer must specify what is be done as well
as how it is to be done
9Limitations of Data File Systems
- Poor mechanisms for sharing data across
organization - files are often incompatible with
one another (separate, isolated data) - Data redundancy - duplicate information in two
or more files - Program/ data dependence - if the file structure
changed, ALL programs using the file had to be
modified - time-consuming - Lack of flexibility - could not do ad hoc queries
or reports required separate programs for every
report or query - Poor security - difficult to program, therefore,
often omitted - Difficulty of representing data in the users
perspective
10Historical Roots of Database Systems
- Developed to overcome limitations of file
systems, developed initially on mainframe
computers in late 60s and early 70s - a typical
early DBMS cost 100,000 (many are still in use) - First general databases were created for General
Electric Company (GEC) - Integrated Data Store
(IDS), designed to run on GEC machines B.F.
Goodrich ported IDS to IBM 360 - became dominant
until 1980s - As PCs gained popularity (1980s), single-user,
personal databases developed at present, most
database technology is used in workgroups
11What is a Database Management System (DBMS)?
- A set of programs used to define,administer, and
process the database and its applications
conveniently and efficiently - Program (or collection of programs) that enables
users to create the database. The DBMS manages
the storage and retrieval of data, and provides
the user with certain functionalities to
guarantee that the data will be logically
organized and consistently applied.
DBMS
DatabaseApplication
Database
User(s)
(e.g., Oracle, dBase, Access, Paradox)
12Features of a DBMS
DBMS
developer
DBMS Engine
users
13What is a Database Application?
Database
- A computer program that performs a specific task
of practical value in a business situation - An interface that allows the user to enter and
manipulate data User can request abstract views
of data - Created by database designers and developers
using a DBMS program or a programming language
DBMS
Database application
14Major Components of a Database Application
5. Program - used to automate a database
15Evolution of Database Models
Hierarchical
still in use in many older (1970s)
legacysystems very few new databasesreferred
to navigational systems
Network
the vast majority currently use this, therefore,
our courses focus is here
Relational
Semantic
Very few new databases are being created using
Object-Oriented Programming (not many ODBMS for
businesses to implement this model)
Object-Relational
Object-Oriented
16The Relational Database Model
Genetics Lab
Gel Images
Experiments
Results
Patients
Gene
- represented by tables (like spreadsheets)
- tables are NOT linked with physical pointers
- unlike earlier systems, all three types of
relationships can be represented - accommodates the design of larger databases that
involve complex relationships and intricate
manipulations
17Evaluation of the Relational database model
But 1 problem still is
- Advantages
- mechanisms for minimizing data redundancy and
inconsistency - logical database design is separated from
physical aspects - relatively program-data independent
- management of data for access, manipulation, and
security - flexible mechanisms for generating reports and
queries - program development and maintenance costs are
reduced - data can be accessed in a multiplicity of ways
within and amongst organizations - Disadvantages
- ease of use - many untrained people create and
use databases without considering its design -
usually incorporate many errors
18Comparison of Database models
- File Systems
- data dependence
- structural dependence
- demands upon programmer
- Hierarchical, Network DBMS
- data independence
- structural dependence
- demands upon programmer
- Relational DBMS
- data independence
- structural independence
- demands upon computer
19Types of Database Systems
- Centralized (single site)
- microcomputer (desktop)
- legacy mainframe/ mini computer (1 CPU)
- client/server architecture (gt1 CPU)
- Distributed
- gt1 site, requires network
- not widely adapted yet due to many problems
20Three levels of Database Representation
Primary focus of the lectures of this course is
the conceptual level because the creation of a
database begins with its design the focus of the
laboratories is the internal and external level.
21The Database System Environment
- Hardware - physical devices
- computer, peripherals, network devices
- Software
- DBMS (manages the database)
- operating systems software (manages hardware
software) - application programs (user access and manipulate
database) - People
- system administrators (manage general operations)
- database designers (architects of database
structure) - database administrators (ensure the database is
functioning) - systems analysts programmers (design
implement database) - end users (use application programs)
- Procedures - rules of the company governing use
of data - Data
22Focus of this course
- Lectures
- Conceptual design of databases determining their
purpose, developing a model, identifying the
tables that are required, designing normalized
tables and identifying their relationship to one
another.
- Laboratories
- Implement a database at the internal and external
level create databases (tables) and database
applications (queries, forms, reports, programs)
using a typical microcomputer relational database
management system, MS Access 2003.
23Better Definition of a Database
- A collection of users data, organized logically
and managed by a unifying set of principles,
procedures, and functionalities, which help
guarantee the consistent application and
interpretation of that data - (a) organized collection of related information
or data stored on a computer disk for easy,
efficient use represented in tabular format
24Better Definition of a Database (cont'd)
- (b) A database is self-describing (metadata or
system catalogues or data dictionary) - A database contains a description of its own
structure (e.g., the names of all the tables, the
names and types of data in each column in all the
tables)
Kroenke, D.M., Database Processing Fundamentals,
Design Implementation, Prentice Hall, 1998
25Better Definition of a Database (cont'd)
- (c) Indexes are stored with the database
- Data accessed from a source table for sorting and
searching is time-consuming without a pointer
system, which improves performance and
accessibility of the database - The overhead cost of indexing is that each time
data is updated, all indexes must also be
updated, therefore, reserve index for cases in
which they are needed
(d) Application Metadata - stores structure and
format of application components not all DBMS
support this feature
26Table
Users view their data in two-dimensional tables.
27Field
The fields within records contain data. Data
within a field must be of the same data type.
Each field within a table must have a unique
name. Order of fields is unimportant.
28Record
- A record is a group of related fields of
information about a single instance of one object
or event in a database. - Tables consist of zero, one, or more records.
- Order of rows is unimportant.
29Database Schema
- Database schema defines databases structure,
tables, relationships, domains, and constraint
rules - Tables
- EXPERIMENT (ExprId, GeneID, ExprName, DoctorID)
- GENE (GeneID, GeneName, GeneDescription)
- DOCTOR (DoctorID, DocFName, DocLName)
- Relationships
- Each experiment involves one and only one gene
- Each doctor is in charge of one or more
experiments - Domains (set of values in a column)
- Physical description (e.g., set of integers 0 lt x
lt 99999) - Constraints (business rules)
- Gene Name cannot be left blank