Databases - PowerPoint PPT Presentation

1 / 36
About This Presentation
Title:

Databases

Description:

Databases – PowerPoint PPT presentation

Number of Views:197
Avg rating:3.0/5.0
Slides: 37
Provided by: johns112
Category:
Tags: databases | ee

less

Transcript and Presenter's Notes

Title: Databases


1
Databases
  • Practical Informatics Course
  • APIII 2008

2
What is a database?
  • Technically, a database is any collection of
    data, usually in some sort of structure, but this
    definition includes essentially everything, and
    therefore is not very useful
  • More practically, a database
  • is a non-volatile repository for data with some
    inherent meaning
  • is designed, built, and populated for a specific
    purpose
  • stores multiple instances of some type of
    information
  • stores similar details about each instance of
    the information
  • Databases can be as simple as a list of CDs or as
    complex as a medical record

3
A Sample, Simple Database
  • Address Book
  • Last Name
  • First Name
  • Street Address
  • City
  • State
  • Zip Code
  • Phone Number

Last Name
First Name
Street
City
State
Zip
Phone
4
Database Terminology
Records
The things about which information is stored.
Each record is indepen-dent of each other record.
Last Name
First Name
Sinard
John
Street
310 Cedar Street
City
New Haven
State
Zip
CT
06510
Phone
(203) 785-3891
Fields
Values/Data
The informational elements which are stored for
each record.
The actual information which is stored for each
field.
5
Database Conceptualization
  • It is convenient to think of simple databases as
    tables (more complex databases as groups of
    tables), and set aside the details of the data
    storage and manipulation
  • In practice, these details are generally
    relegated to a Database Management System (more
    later), so it turns out that this abstraction
    process is not only theoretical but also the
    usual practice

Columns (fields, attributes)
Rows (instances, records, tuples)
Element (data, value)
6
Single Table Databases
  • Advantages
  • Simple to conceptualize
  • Many options for implementation, including simply
    a spreadsheet
  • Disadvantages
  • Only simple databases can be accommodated
  • The real world is more complex than a single
    entity with attributes
  • Repeating Fields
  • Attributes which can occur multiple times for a
    single instance of an entity
  • Redundant Fields
  • Groups of attributes which have the same values
    over multiple instances of an entity
  • Difficult to expand or modify as needs change

7
Shortcomings of Single-Table Databases
Repeating Fields
  • Some attributes (fields) may occur multiple times
    for a single instance of an entity
  • A person can have more than one phone number
  • How many repeats does one leave room for?
  • Must allocate space in each row for the maximum
    number of repeats needed for any one row
  • Are all repetitions equivalent or does the
    position of the repeat carry some meaning?
  • If one repeat is deleted, do the others shift
    down or stay where they are?
  • Problem increases when there are multiple fields
    which repeat as a group

8
Shortcomings of Single-Table Databases
Redundant Fields
  • Values of groups of fields can repeat over
    multiple instances (rows) of an entity
  • e.g, Two people can live at the same address
  • Lots of database space can be used storing the
    same data over and over again
  • Modification or incorrect entry of the data in
    any one row can lead to inconsistencies in the
    data across rows
  • Real changes to the data require the changes to
    be made in multiple rows
  • e.g, A family moves to a new address

9
Complex Database Architecture
  • A number of different approaches have been
    developed to address the one-to-many problem
  • Each approach has its own advantages and
    disadvantages
  • The best choice depends on the particular
    application
  • Complex Database Architectures
  • Hierarchical
  • Network
  • Relational
  • Object Oriented

10
Relational Databases
  • All data is stored in tables
  • Each table represents an entity and corresponds
    to some real-life concept
  • eg. person, address, phone number
  • Each entity has attributes (columns) which
    further define that entity
  • eg. Person first name, last name, date of birth,
    gender
  • Address street, city, state, zip code
  • Within a table, each instance (row) is defined
    uniquely by the values of its attributes the
    order of the rows can be changed/swapped/sorted
    without consequence
  • Data in different tables is related to each other
    by common data, column elements which are shared
    by two or more tables

11
Relational Database Terminology
Entity (Table, Relation)
Attributes (Columns, Fields)
Specimens
Accession
Date
Patient
Clinical History
Instances (Rows, Tuples)
1-1-97
Smith, John
Upper GI Bleed
Ovarian Tumor
1-1-97
Jones, Mary
1-1-97
Jones, Mary
Ovarian Tumor
1-1-97
Black, Frank
Gunshot Wound
Cardinality ( of Rows)
1-1-97
Simpson, Homer
Radiation Exposure
1-1-97
T
aylor, Tim
Hip Fracture
1-1-97
Skywalker
, Luke
s/p
T
raumatic
Amputation
1-1-97
Janeway
, Katherine
Viral Infection
Degree ( of Columns)
Element
12
Relational Database Terminology
Entities
Patients
Specimens
Parts
- Each Patient may have 0, 1 or many specimens -
Each Specimen must belong to one and only one
patient
- Each Specimen must have at least 1 part, but
may have many parts - Each Part must belong to
one and only one Specimen
Relationships
13
Relational Databases
Primary Key
  • The Primary Key for a table is an attribute
    (column, field) or combination of attributes for
    which every instance (row) of the table has a
    unique value
  • When a combination of attributes is used, it is
    referred to as a composite key
  • Most tables in a relational database will have a
    primary key defined
  • In general, the value of the primary key should
    not change once assigned, since it will often be
    pointed to by a foreign key in another table
  • Therefore, data which might change (name, phone
    number) makes a bad key
  • Selection of an appropriate primary key requires
    an understanding of the data which is stored
    cannot be done by simply looking at the data in
    the database
  • Depending upon the entity, it is often necessary
    to create an artificial field such as a
    sequential identifier to serve as the primary
    key
  • eg for a person table, one cannot guarantee that
    the last name, first name, or even the
    combination of both will always be unique for
    each person

14
Relational Databases
Foreign Key
  • A Foreign Key in a table is an attribute
    (column, field) which defines a relationship
    between two entities (tables)
  • The related rows are identified by the value in
    the foreign key column of one table matching the
    value in another column (usually the primary key)
    of another table
  • This allows one-to-one, one-to-many, many-to-one,
    and even many-to-many relationships
  • In a one-to-many relationship, the table with the
    many rows contains the foreign key and points
    to the primary key in the table with the one
    rows
  • The foreign key can be identified as such to
    the database management system within the
    definition of the table, or simply supplied
    on-the-fly during a query

15
Relational Databases
Keys
Primary Key
Foreign Key
One-to-One
Many-to-One
Putting a foreign key in the person table allows
multiple people to be related to a single
address.
Primary Key
Foreign Key
Foreign Key
The Phone_Numbers table has two foreign keys,
allowing phone numbers to be associated with
either a person or an address, or both.
Last name would not work as a primary key because
two people can have the same last name.
Primary Key
Since the phone number is unique, it can serve as
the primary key for this table.
16
Entity-Relationship Diagrams
  • A tool for designing relational databases
  • Does not contain actual data but rather names of
    the tables (entities) and columns (attributes) as
    well as a symbolic representation of the
    relationships between these
  • Many different formats exist

17
Entity-Relationship Diagram
Data View
Entity-Relationship View
18
Entity - Attribute - Value Data Model
Attributes
Entities
Values
19
Entity - Attribute - Value Data Model
  • EAV Model Advantages
  • When many of the possible attributes remain
    unvalued, takes less space in database
  • Each instance of the entity can have different
    attributes stored
  • Can accommodate attributes which one did not know
    about at the time the database was created
  • EAV Model Disadvantages
  • If most of the instances of an entity have values
    for a particular attribute, it is less efficient
    to use the EAV model since one must repeatedly
    store the entity instance identifiers and the
    attribute name
  • Data retrieval is less efficient
  • The attributes for a given entity instance will
    be scattered in the table
  • For aggregate data analysis, first have to find
    the desired attributes

20
Direct Program Access to Databases
Application Program
Application Program
Database
21
Disadvantage of Direct Program Access to Databases
  • Any modification to the type of data stored
    (e.g., number of fields) requires a complete
    rebuild of the entire database
  • Each application which accesses the data needs to
    know the details of the physical storage of the
    data
  • Applications need to be substantially modified if
    the structure of the data stored is altered
  • Applications need to manually build and
    maintain any desired indices

22
Database Management System
Application Program
Application Program
Program Routine
Program Routine
Program Routine
Program Routine
Database Management System (DBMS)
Database
23
Database Management Systems
DBMS
  • A DBMS is a type of middle-ware which sits
    between the applications and the database
  • All access to the database must go through the
    DBMS
  • Only the DBMS needs to know the details of the
    physical storage of the data
  • Application routines can treat the database in an
    abstract fashion, funneling all requests for
    data through the DBMS using a database language
  • If a common language is used by different DBMSs,
    a completely different database system can be
    substituted without requiring any modifications
    to the application routines

24
Client-Server Architecture
Client Computer
Client Computer
Client Computer
Application Program
Application Program
Application Program
Program Routine
Program Routine
Program Routine
Program Routine
Program Routine
Program Routine
DBMS Front End
DBMS Front End
DBMS Front End
Network
DBMS Back End
Database
Database Server
25
Multi-user Access to Large Databases
Most large databases need to be accessible to
multiple users at the same time
  • The DBMS is split into two components
  • The Front-end or Client resides on each
    users machine
  • Program routines submit requests to the DBMS
    front-end
  • The DBMS front-end communicates over the network
    with the DBMS back-end

Program
Program
Routine
Routine
Routine
Routine
Routine
Routine
DBMS Front-end
Server
DBMS Back-end
  • The Back-end sits on a separate database
    server, communicates with the database, and
    processes the requests of the front-end(s)
  • Multi-user access requires a more sophisticated
    DBMS

26
Database Abstraction
Concerns of the DBMS
Concerns of the Database Designer and the
Application
  • What type of data is stored in the database?
  • How are the fields in the database related to
    each other?
  • What data needs to be retrieved?
  • What data should be indexed, based upon how the
    data is routinely requested?
  • How and where is the data stored on disk?
  • How is free space in the database maintained?
  • What changes need to be made to an index to
    accommodate a change in the data?
  • What is the best way to find the data requested?

Physical Data Model
Conceptual Data Model
27
Functions of a Full Featured DBMS
  • Handle all the details of the physical storage of
    the data
  • Data Types
  • Maintain indices and keys
  • Support data addition, updating and deleting
  • Support changes in conceptual database design /
    definition
  • Client/Server distribution
  • Authorization / User Privileges / Security
  • Data Integrity
  • Referential Integrity
  • Data Concurrency
  • Data Recovery
  • Database Backup
  • Transaction Logs
  • Data Locking and Deadlocks
  • Support for a Query Language

28
Preserving Data Integrity in a Database
Referential Integrity
  • Making sure that the rows in a child table
    related to a row in a parent table are not
    orphaned by deletion of the row in the parent
    table

?
29
Preserving Data Integrity in a Database
Referential Integrity
  • Declarative Referential Integrity
  • Define Foreign Keys as such, indicating what
    table/primary key they reference
  • The DBMS will make sure that any row referenced
    by a foreign key in another table is not deleted
    attempts to do so generates an error
  • Advantage Developer does not need to do anything
    other than define the relationship (and check for
    errors)
  • Disadvantage Deleting a parent requires
    specifically deleting all the child rows first
  • Procedural Referential Integrity
  • Create Procedures which are executed whenever
    the contents of a table are changed (called
    Triggers can be defined for inserts, updates,
    and/or deletes)
  • Advantage Greater flexibility for developer to
    handle references
  • Disadvantage Developer has to make sure triggers
    properly address referential integrity issues
  • Example A delete trigger on the Address
    table automatically deletes any rows in the
    Person table which points to the row deleted
    from the Address table

30
Database Language
  • Applications need a language with which to
    communicate with databases
  • The database language must support
  • Data definition
  • Data retrieval
  • Data manipulation
  • Data integrity maintenance
  • Access control
  • Data sharing

31
SQL
(A Language for communicating with relational
databases)
  • SQL is not a complete programming language
  • Not as structured as C or Pascal
  • No IF, GOTO, DO, or FOR statements no variable
    declaration
  • Not procedural statements specify what, but
    not how
  • Statements resemble English sentences
  • Includes the use of optional noise words
  • Improves readability but does not alter
    functionality
  • Not a stand-alone product
  • SQL is a standard which is incorporated, to
    varying degrees, into commercially available
    products
  • eg DB2, Access, Oracle, Sybase, Informix, dBASE,
    SQL Server

32
Sample SQL Statement
Address add_id Street City State Zip a1 310 Cedar
St. New Haven CT 06510 a2 Disney
World Orlando FL 32300 a3 4 Privet Drive Little
Whinging Surrey a4 Overlook Hotel Sidewinder CO 8
0200
SELECT Person.Last, Person.First, Address.City,
Address.State FROM Person, Address WHERE
Person.add_id Address.add_id AND
Address.State "FL"
33
Features of SQL
  • Vendor independence (widespread vendor support)
  • Hardware/System independence
  • Relational basis
  • Dynamic data definition
  • Client/Server architecture
  • Direct interactive access without having to write
    programs to get at the data
  • Program level access

34
Data Mining
  • Data Mining is the process of trying to find
    new relationships in data not previously known or
    perhaps even suspected
  • Requires lots of data from multiple sources
  • Numerous approaches, often based predominantly on
    hit or miss queries
  • Uses neural networks, case-based reasoning,
    statistical analyses
  • Two approaches for aggregating data from multiple
    systems into a single database
  • Data Repository
  • Data Federation

35
Users
Users
Laboratory Medicine
Diagnostic Radiology
Diagnostic Radiology
Anatomic Pathology
Anatomic Pathology
Laboratory Medicine
36
Data Repository vs Data Federation
  • Data Repository
  • Data physically collected in one location
  • Structure can be optimized for data mining
  • Queries do not affect production/transactional
    environments
  • Data Federation
  • Data is always up-to-date
  • Fewer data ownership / political issues
  • Any given data source can withdraw from the
    federation at any time
  • Data inconsistencies can occur
  • More complex to set up and maintain
Write a Comment
User Comments (0)
About PowerShow.com