Databases

About This Presentation

Transcript and Presenter's Notes

Title: Databases

1
Databases

Practical Informatics Course
APIII 2008

2
What is a database?

Technically, a database is any collection of
data, usually in some sort of structure, but this
definition includes essentially everything, and
therefore is not very useful
More practically, a database
is a non-volatile repository for data with some
inherent meaning
is designed, built, and populated for a specific
purpose
stores multiple instances of some type of
information
stores similar details about each instance of
the information
Databases can be as simple as a list of CDs or as
complex as a medical record

3
A Sample, Simple Database

Address Book
Last Name
First Name
Street Address
City
State
Zip Code
Phone Number

Last Name
First Name
Street
City
State
Zip
Phone
4
Database Terminology
Records
The things about which information is stored.
Each record is indepen-dent of each other record.
Last Name
First Name
Sinard
John
Street
310 Cedar Street
City
New Haven
State
Zip
CT
06510
Phone
(203) 785-3891
Fields
Values/Data
The informational elements which are stored for
each record.
The actual information which is stored for each
field.
5
Database Conceptualization

It is convenient to think of simple databases as
tables (more complex databases as groups of
tables), and set aside the details of the data
storage and manipulation
In practice, these details are generally
relegated to a Database Management System (more
later), so it turns out that this abstraction
process is not only theoretical but also the
usual practice

Columns (fields, attributes)
Rows (instances, records, tuples)
Element (data, value)
6
Single Table Databases

Advantages
Simple to conceptualize
Many options for implementation, including simply
a spreadsheet
Disadvantages
Only simple databases can be accommodated
The real world is more complex than a single
entity with attributes
Repeating Fields
Attributes which can occur multiple times for a
single instance of an entity
Redundant Fields
Groups of attributes which have the same values
over multiple instances of an entity
Difficult to expand or modify as needs change

7
Shortcomings of Single-Table Databases
Repeating Fields

Some attributes (fields) may occur multiple times
for a single instance of an entity
A person can have more than one phone number
How many repeats does one leave room for?
Must allocate space in each row for the maximum
number of repeats needed for any one row
Are all repetitions equivalent or does the
position of the repeat carry some meaning?
If one repeat is deleted, do the others shift
down or stay where they are?
Problem increases when there are multiple fields
which repeat as a group

8
Shortcomings of Single-Table Databases
Redundant Fields

Values of groups of fields can repeat over
multiple instances (rows) of an entity
e.g, Two people can live at the same address
Lots of database space can be used storing the
same data over and over again
Modification or incorrect entry of the data in
any one row can lead to inconsistencies in the
data across rows
Real changes to the data require the changes to
be made in multiple rows
e.g, A family moves to a new address

9
Complex Database Architecture

A number of different approaches have been
developed to address the one-to-many problem
Each approach has its own advantages and
disadvantages
The best choice depends on the particular
application
Complex Database Architectures
Hierarchical
Network
Relational
Object Oriented

10
Relational Databases

All data is stored in tables
Each table represents an entity and corresponds
to some real-life concept
eg. person, address, phone number
Each entity has attributes (columns) which
further define that entity
eg. Person first name, last name, date of birth,
gender
Address street, city, state, zip code
Within a table, each instance (row) is defined
uniquely by the values of its attributes the
order of the rows can be changed/swapped/sorted
without consequence
Data in different tables is related to each other
by common data, column elements which are shared
by two or more tables

11
Relational Database Terminology
Entity (Table, Relation)
Attributes (Columns, Fields)
Specimens
Accession
Date
Patient
Clinical History
Instances (Rows, Tuples)
1-1-97
Smith, John
Upper GI Bleed
Ovarian Tumor
1-1-97
Jones, Mary
1-1-97
Jones, Mary
Ovarian Tumor
1-1-97
Black, Frank
Gunshot Wound
Cardinality ( of Rows)
1-1-97
Simpson, Homer
Radiation Exposure
1-1-97
T
aylor, Tim
Hip Fracture
1-1-97
Skywalker
, Luke
s/p
T
raumatic
Amputation
1-1-97
Janeway
, Katherine
Viral Infection
Degree ( of Columns)
Element
12
Relational Database Terminology
Entities
Patients
Specimens
Parts
- Each Patient may have 0, 1 or many specimens -
Each Specimen must belong to one and only one
patient
- Each Specimen must have at least 1 part, but
may have many parts - Each Part must belong to
one and only one Specimen
Relationships
13
Relational Databases
Primary Key

The Primary Key for a table is an attribute
(column, field) or combination of attributes for
which every instance (row) of the table has a
unique value
When a combination of attributes is used, it is
referred to as a composite key
Most tables in a relational database will have a
primary key defined
In general, the value of the primary key should
not change once assigned, since it will often be
pointed to by a foreign key in another table
Therefore, data which might change (name, phone
number) makes a bad key
Selection of an appropriate primary key requires
an understanding of the data which is stored
cannot be done by simply looking at the data in
the database
Depending upon the entity, it is often necessary
to create an artificial field such as a
sequential identifier to serve as the primary
key
eg for a person table, one cannot guarantee that
the last name, first name, or even the
combination of both will always be unique for
each person

14
Relational Databases
Foreign Key

A Foreign Key in a table is an attribute
(column, field) which defines a relationship
between two entities (tables)
The related rows are identified by the value in
the foreign key column of one table matching the
value in another column (usually the primary key)
of another table
This allows one-to-one, one-to-many, many-to-one,
and even many-to-many relationships
In a one-to-many relationship, the table with the
many rows contains the foreign key and points
to the primary key in the table with the one
rows
The foreign key can be identified as such to
the database management system within the
definition of the table, or simply supplied
on-the-fly during a query

15
Relational Databases
Keys
Primary Key
Foreign Key
One-to-One
Many-to-One
Putting a foreign key in the person table allows
multiple people to be related to a single
address.
Primary Key
Foreign Key
Foreign Key
The Phone_Numbers table has two foreign keys,
allowing phone numbers to be associated with
either a person or an address, or both.
Last name would not work as a primary key because
two people can have the same last name.
Primary Key
Since the phone number is unique, it can serve as
the primary key for this table.
16
Entity-Relationship Diagrams

A tool for designing relational databases
Does not contain actual data but rather names of
the tables (entities) and columns (attributes) as
well as a symbolic representation of the
relationships between these
Many different formats exist

17
Entity-Relationship Diagram
Data View
Entity-Relationship View
18
Entity - Attribute - Value Data Model
Attributes
Entities
Values
19
Entity - Attribute - Value Data Model

EAV Model Advantages
When many of the possible attributes remain
unvalued, takes less space in database
Each instance of the entity can have different
attributes stored
Can accommodate attributes which one did not know
about at the time the database was created
EAV Model Disadvantages
If most of the instances of an entity have values
for a particular attribute, it is less efficient
to use the EAV model since one must repeatedly
store the entity instance identifiers and the
attribute name
Data retrieval is less efficient
The attributes for a given entity instance will
be scattered in the table
For aggregate data analysis, first have to find
the desired attributes

20
Direct Program Access to Databases
Application Program
Application Program
Database
21
Disadvantage of Direct Program Access to Databases

Any modification to the type of data stored
(e.g., number of fields) requires a complete
rebuild of the entire database
Each application which accesses the data needs to
know the details of the physical storage of the
data
Applications need to be substantially modified if
the structure of the data stored is altered
Applications need to manually build and
maintain any desired indices

22
Database Management System
Application Program
Application Program
Program Routine
Program Routine
Program Routine
Program Routine
Database Management System (DBMS)
Database
23
Database Management Systems
DBMS

A DBMS is a type of middle-ware which sits
between the applications and the database
All access to the database must go through the
DBMS
Only the DBMS needs to know the details of the
physical storage of the data
Application routines can treat the database in an
abstract fashion, funneling all requests for
data through the DBMS using a database language
If a common language is used by different DBMSs,
a completely different database system can be
substituted without requiring any modifications
to the application routines

24
Client-Server Architecture
Client Computer
Client Computer
Client Computer
Application Program
Application Program
Application Program
Program Routine
Program Routine
Program Routine
Program Routine
Program Routine
Program Routine
DBMS Front End
DBMS Front End
DBMS Front End
Network
DBMS Back End
Database
Database Server
25
Multi-user Access to Large Databases
Most large databases need to be accessible to
multiple users at the same time

The DBMS is split into two components
The Front-end or Client resides on each
users machine
Program routines submit requests to the DBMS
front-end
The DBMS front-end communicates over the network
with the DBMS back-end

Program
Program
Routine
Routine
Routine
Routine
Routine
Routine
DBMS Front-end
Server
DBMS Back-end

The Back-end sits on a separate database
server, communicates with the database, and
processes the requests of the front-end(s)
Multi-user access requires a more sophisticated
DBMS

26
Database Abstraction
Concerns of the DBMS
Concerns of the Database Designer and the
Application

What type of data is stored in the database?
How are the fields in the database related to
each other?
What data needs to be retrieved?
What data should be indexed, based upon how the
data is routinely requested?

How and where is the data stored on disk?
How is free space in the database maintained?
What changes need to be made to an index to
accommodate a change in the data?
What is the best way to find the data requested?

Physical Data Model
Conceptual Data Model
27
Functions of a Full Featured DBMS

Handle all the details of the physical storage of
the data
Data Types
Maintain indices and keys
Support data addition, updating and deleting
Support changes in conceptual database design /
definition
Client/Server distribution
Authorization / User Privileges / Security
Data Integrity
Referential Integrity
Data Concurrency
Data Recovery
Database Backup
Transaction Logs
Data Locking and Deadlocks
Support for a Query Language

28
Preserving Data Integrity in a Database
Referential Integrity

Making sure that the rows in a child table
related to a row in a parent table are not
orphaned by deletion of the row in the parent
table

?
29
Preserving Data Integrity in a Database
Referential Integrity

Declarative Referential Integrity
Define Foreign Keys as such, indicating what
table/primary key they reference
The DBMS will make sure that any row referenced
by a foreign key in another table is not deleted
attempts to do so generates an error
Advantage Developer does not need to do anything
other than define the relationship (and check for
errors)
Disadvantage Deleting a parent requires
specifically deleting all the child rows first
Procedural Referential Integrity
Create Procedures which are executed whenever
the contents of a table are changed (called
Triggers can be defined for inserts, updates,
and/or deletes)
Advantage Greater flexibility for developer to
handle references
Disadvantage Developer has to make sure triggers
properly address referential integrity issues
Example A delete trigger on the Address
table automatically deletes any rows in the
Person table which points to the row deleted
from the Address table

30
Database Language

Applications need a language with which to
communicate with databases
The database language must support
Data definition
Data retrieval
Data manipulation
Data integrity maintenance
Access control
Data sharing

31
SQL
(A Language for communicating with relational
databases)

SQL is not a complete programming language
Not as structured as C or Pascal
No IF, GOTO, DO, or FOR statements no variable
declaration
Not procedural statements specify what, but
not how
Statements resemble English sentences
Includes the use of optional noise words
Improves readability but does not alter
functionality
Not a stand-alone product
SQL is a standard which is incorporated, to
varying degrees, into commercially available
products
eg DB2, Access, Oracle, Sybase, Informix, dBASE,
SQL Server

32
Sample SQL Statement
Address add_id Street City State Zip a1 310 Cedar
St. New Haven CT 06510 a2 Disney
World Orlando FL 32300 a3 4 Privet Drive Little
Whinging Surrey a4 Overlook Hotel Sidewinder CO 8
0200
SELECT Person.Last, Person.First, Address.City,
Address.State FROM Person, Address WHERE
Person.add_id Address.add_id AND
Address.State "FL"
33
Features of SQL

Vendor independence (widespread vendor support)
Hardware/System independence
Relational basis
Dynamic data definition
Client/Server architecture
Direct interactive access without having to write
programs to get at the data
Program level access

34
Data Mining

Data Mining is the process of trying to find
new relationships in data not previously known or
perhaps even suspected
Requires lots of data from multiple sources
Numerous approaches, often based predominantly on
hit or miss queries
Uses neural networks, case-based reasoning,
statistical analyses
Two approaches for aggregating data from multiple
systems into a single database
Data Repository
Data Federation

35
Users
Users
Laboratory Medicine
Diagnostic Radiology
Diagnostic Radiology
Anatomic Pathology
Anatomic Pathology
Laboratory Medicine
36
Data Repository vs Data Federation

Data Repository
Data physically collected in one location
Structure can be optimized for data mining
Queries do not affect production/transactional
environments
Data Federation
Data is always up-to-date
Fewer data ownership / political issues
Any given data source can withdraw from the
federation at any time
Data inconsistencies can occur
More complex to set up and maintain

Write a Comment

User Comments (0)

About PowerShow.com

Databases PowerPoint PPT Presentation