Introduction to Database Modeling and Design - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to Database Modeling and Design

Description:

Transparencies for Chapter 1 of textbook Database Systems: A Practical Approach to Design, Implementation and Management – PowerPoint PPT presentation

Number of Views:369
Avg rating:3.0/5.0
Slides: 113
Provided by: circusofl
Category:

less

Transcript and Presenter's Notes

Title: Introduction to Database Modeling and Design


1
Chapter 1
  • Introduction to Database Modeling and Design

1
2
Introduction to Database Modeling and Design
  • Portions of the Materials at this website
    subject-
  • Database Modeling and Design -are drawn from the
  • Textbooks below
  • Database Modeling and Design, 4th Edition, 2006
  • Authors Toby Teorey, Sam Lightstone, and Tom
    Nadeau
  • Publisher Morgan Kaufmann
  • Database Systems, 2nd Edition
  • Authors Thomas Connolly and Carolyn Begg
  • Publisher Addison Wesley
  •  

3
Road Map for Learning By Subject
Introduction
Chapters 1
Chapter 2
The Entity_Relationship Model
Chapters 3
The Unified Modeling Language
Chapters 6

Chapter 4
Reqts Analysis and Conceptual DM
Normalization
Chapters 5
Transforming the Conceptual Data Model to SQL
Chapter 8
Chapters 7
Business Intelligence
Logical data Modeling Physical Data Modeling
Chapter 9
CASE Tools for Logical Database Design
4
Chapter 1 Topics
  • Data and Database Management
  • The Database Life Cycle
  • Conceptual Data Modeling

5
Data and Information
  • Were told we live in the information age.
  • People often talk about data and information as
    if there were the same. They are, in many
    regards, opposite.
  • A datum is just a factyour name is a fact, your
    phone number is a fact.
  • Information is data that is presented in a
    meaningful, understandable. beneficial format.
    Information is data that has been organized ,
    sequenced, correlated and summarized, such as a
    phone book.

6
Data and Information
  • A phone book is information. It not only contains
    names and phone numbers, but it correctly
    associates each persons phone number with their
    names. It presents this list of correlated names
    and phone numbers in alphabetical sequence, so
    that we find the phone number from the name. In
    addition, it divides the phone numbers into two
    types personal and business.
  • It is the function of the computer to convert
    data to information.

7
Definitions
  • Database The database is a place where you put
    your data data that you wish to convert to
    information at some future time.
  • Database Management System A DBMS is the
    software that converts the data in your database
    to information. It is the DBMS that provides you
    the capability for cross-referencing,
    correlating, sorting, summarizing, etc.

8
Information as A Competitive Weapon
  • Information technology and quality information
    are not
  • the goals, but merely to support organizations to
    reach
  • goals of
  • Superior products and services
  • Greater productivity
  • Eventually success

9
Data, Information, and Decision
  • Data
  • Information (Data Process)
  • Knowledge
  • Decision (Information
  • Knowledge)
  • Data/Information/Decision
  • Data Resource Management (DRM)
  • MIS (OLTP) OOAD
  • KM (Knowledge Mgt), KWS (Knowledge Work Systems)
  • DSS ESS, EIS (Executive Level)
  • Data Warehousing/Data Mart/Data Mining/OLAP
    (Executive, Collaborative and individual levels)

10
Data, Information, and Decision
  • Data Data
    processing
  • Processing
    System Analysis/Design
  • Information MIS,
    Database Systems
  • Object (DataProcessing) Object-Oriented
    SD/DA
  • Knowledge
    Artificial Intelligence
  • Information
    Expert system
  • Decision (executive level) DSS, EIS
  • Decision (all levels, sophisticated) Data
    warehousing

  • Data Mining

11
Databases Modeling
Databases Modeling
Type of Database
New Trend
Constructs
Characteristics
Relational Database
ERD EER
Row/ Column
Dimensional Modeling
Multi-Dimensional Database
OLAP DW
Cube
Distributed Component Object Model
Distributed Database
Client Object (DCOM)
XML

Object-Oriented Database
UML
Object
Class Diagram
Object Data Operations(Services)
Entity Data only
12
Topic 1
Data and Database Management
13
Relational Model Terminology
  • A relation is a table with columns and rows.
  • Only applies to logical structure of the
    database, not the physical structure.
  • Attribute is a named column of a relation.
  • Domain is a set of allowable values for one or
    more attributes.

14
Relational Model Terminology
  • Tuple is a row of a relation.
  • Degree is a number of attributes in a relation.
  • Cardinality is a number of tuples in a relation.
  • Relational Database is a collection of normalized
    relations. (?)

15
Alternative Terminology for Relational Model
16
Instances of Branch and Staff (part) Relations
17
Examples of Attribute Domains
18
Data and Database Management
  • The characteristics of file-based systems and the
    problems with the file-based approach.
  • The meaning of the term Database Management
    System (DBMS).
  • The typical functions of a DBMS and the major
    components of the DBMS environment.
  • Multi-User DBMS Architectures.
  • The personnel involved in the DBMS environment.
  • The history of the development of DBMSs.
  • The advantages and disadvantages of DBMSs

19
File-based Systems
  • A collection of application programs that perform
    services for the end users (e.g. reports).
  • Each program defines and manages its own data.

5
20
File-based Processing
6
21
Limitations of File-based Approach
  • Separation and isolation of data
  • Each program maintains its own set of data. Users
    of one program may be unaware of potentially
    useful data held by other programs.
  • Duplication of data
  • Same data is held by different programs. Wasted
    space and potentially different values and/or
    different formats for the same item.

7
22
Limitations of File-based Approach
  • Data dependence
  • File structure is defined in the program code.
  • Incompatible file formats
  • Programs are written in different languages, and
    so cannot easily access each others files.

8
23
Limitations of File-based Approach
  • Fixed Queries/Proliferation of application
    programs
  • Programs are written to satisfy particular
    functions. Any new requirement needs a new
    program.

9
24
Database Approach
  • Arose because
  • Definition of data was embedded in application
    programs, rather than being stored separately and
    independently.
  • No control over access and manipulation of data
    beyond that imposed by application programs.
  • Result - the database and Database Management
    System (DBMS).

10
25
Database
  • A shared collection of logically related data
    (and a description of this data), designed to
    meet the information needs of an organization.
  • System catalog (data dictionary or metadata)
    provides the description of the data to enable
    programdata independence.
  • Logically related data comprises entities,
    attributes, and relationships of an
    organization's information.

11
26
Database Management System (DBMS)
  • A software system that enables users to define,
    create, and maintain the database and which
    provides controlled access to this database.

12
27
Database Management System (DBMS)
13
28
Database Approach
  • Data definition language (DDL).
  • Permits specification of data types, structures
    and any data constraints. All specifications are
    stored in the database.
  • Data manipulation language (DML).
  • General enquiry facility (query language) of the
    data.

14
29
DDL Example
  • ALTER TABLE STAFF
  • ADD FOREIGN KEY (Branch_No)
  • REFERENCES Branch

30
DML Example
  • Select
  • From property_for_rent
  • Where type Flat and rent gt 400

31
Database Approach
  • Controlled access to database may include
  • A security system.
  • An integrity system.
  • A concurrency control system.
  • A recovery control system.
  • A user-accessible catalog.
  • A view mechanism.
  • Provides users with only the data they want or
    need to use.

15
32
Components of DBMS Environment
16
33
Components of DBMS Environment
  • Hardware
  • Can range from a PC to a network of computers.
  • Software
  • DBMS, operating system, network software (if
    necessary) and also the application programs.
  • Data
  • Used by the organization and a description of
    this data called the schema.

17
34
Objectives of Three-Level Architecture
  • All users should be able to access same data.
  • A user's view is immune to changes made in other
    views.
  • Users should not need to know physical database
    storage details.

35
Objectives of Three-Level Architecture Proposed
by ANSI-SPARC
  • DBA should be able to change database storage
    structures without affecting the users' views.
  • Internal structure of database should be
    unaffected by changes to physical aspects of
    storage.
  • DBA should be able to change conceptual structure
    of database without affecting all users.

ANSI-SPARC ( American National Standards
Institute Standards Planning and Requirements
Committee)
36
ANSI-SPARC Three-level Architecture
37
ANSI-SPARC Three-level Architecture
  • External Level
  • Users' view of the database. Describes that part
    of database that is relevant to a particular
    user.
  • Conceptual Level
  • Community view of the database. Describes what
    data is stored in database and relationships
    among the data.

38
ANSI-SPARC Three-level Architecture
  • Internal Level
  • Physical representation of the database on the
    computer. Describes how the data is stored in
    the database.

39
An example of the three levels
  • External level
  • 01 employee,
  • 02 employee_city pic
    x(20),
  • 02 employee_state
    pic x(2).

  • conceptual level
  • employee
  • employee_city
    varchar(20)
  • employee_state
    char(2)
  • Internal Level
  • employee
  • employee_city varchar(20) NULl
  • employee_state
    char(2) NOT NULL

40
Data Independence
  • Logical Data Independence
  • Refers to immunity of external schemas to changes
    in conceptual schema.
  • Conceptual schema changes e.g. addition/removal
    of entities.
  • Should not require changes to external schema or
    rewrites of application programs.

41
Data Independence
  • Physical Data Independence
  • Refers to immunity of conceptual schema to
    changes in the internal schema.
  • Internal schema changes e.g. using different file
    organizations, storage structures/devices.
  • Should not require change to conceptual or
    external schemas.

42
Data Independence and the ANSI-SPARC Three-level
Architecture
43
Functions of a DBMS
  • Data Storage, Retrieval and Update.
  • Must furnish users with the ability to store,
    retrieve, and update data in the database.
  • A User-Accessible Catalog.
  • Must furnish a catalog in which descriptions of
    data items are stored and which is accessible to
    users.

44
Functions of a DBMS
  • Transaction Support
  • Must furnish a mechanism to ensure that either
    all the updates corresponding to a given
    transaction are made or that none of them are
    made.
  • Concurrency Control Services
  • Must furnish a mechanism to ensure that database
    is updated correctly when multiple users are
    updating the database concurrently.

45
Functions of a DBMS
  • Recovery Services
  • Must furnish a mechanism for recovering the
    database in the event that the database is
    damaged in any way.
  • Authorization Services
  • Must furnish a mechanism to ensure that only
    authorized users can access the database.

46
Functions of a DBMS
  • Support for Data Communication
  • Must be capable of integrating with communication
    software.
  • Integrity Services
  • Must furnish a means to ensure that both the data
    in the database and changes to the data follow
    certain rules.

47
Functions of a DBMS
  • Services to Promote Data Independence
  • Must include facilities to support the
    independence of programs from the actual
    structure of the database.
  • Utility Services
  • Should provide a set of utility services.

48
Components of a DBMS
49
Components of a DBMS
  • Query processor
  • Database manager (DM)
  • File manager
  • DML preprocessor
  • DDL compiler
  • Catalog manager

50
Components of Database Manager (DM)
51
Components of Database Manager (DM)
  • Authorization control
  • Command processor
  • Integrity checker
  • Query optimizer
  • Transaction manager
  • Scheduler
  • Recovery manager
  • Buffer manager

52
Catalog Manager (System Catalog)
  • A repository of information (metadata) describing
    the data in the database.
  • Typically stores
  • Names of authorized users.
  • Names of data items in the database.
  • Constraints on each data item.
  • Data items accessible by a user and the type of
    access.

53
Catalog Manager (System Catalog)
  • It is used by modules such as
  • Authorization Control.
  • Integrity Checker.

54
Multi-User DBMS Architectures
  • Teleprocessing
  • File-server
  • Client-server

55
Teleprocessing
  • Traditional architecture.
  • Single mainframe with a number of terminals
    attached.
  • Trend is now towards downsizing.

56
Teleprocessing Topology
57
File-server
  • File-server is connected to several workstations
    across a network.
  • Database resides on file-server.
  • DBMS and applications run on each workstation.

58
File-server
  • Disadvantages include
  • Significant network traffic.
  • Copy of DBMS on each workstation.
  • Concurrency, recovery and integrity control more
    complex.

59
File-server Architecture
60
Client-server
  • Server holds the database and the DBMS.
  • Client manages the user interface and runs
    applications.

61
Client-server
  • Advantages include
  • Wider access to existing databases.
  • Increased performance.
  • Possible reduction in hardware costs.
  • Reduction in communication costs.
  • Increased consistency.

62
Roles in the Database Environment
  • Data Administrator (DA)
  • Database Administrator (DBA)
  • Database Designers (Logical and Physical)
  • Application Programmers
  • End Users (native and sophisticated)

19
63
History of Database Systems
  • First-generation
  • Hierarchical (IMS) and Network (Nomad)
  • Second generation
  • Relational
  • Third generation
  • Extended Relational
  • Object-Oriented

20
64
Advantages of DBMS
  • Control of data redundancy.
  • Data consistency
  • More information from the same amount of data.
  • Sharing of data.
  • Improved data integrity.
  • Improved security.
  • Enforcement of standards.
  • Economy of scale.

21
65
Advantages of DBMS
  • Balanced conflicting requirements
  • Improved data accessibility and responsiveness
  • Increased productivity
  • Improved maintenance through data independence
  • Increased concurrency
  • Improved backup and recovery services

22
66
Disadvantages of DBMS
  • Complexity
  • Size
  • Cost of DBMS
  • Additional hardware costs
  • Cost of conversion
  • Performance
  • Higher impact of a failure

23
67
Topic 2
The Database Life Cycle
68
System Development Life Cycle A brief overview
  • It is a systematic approach to solving business
  • problem. Its divided into seven phases
  • Identifying problems, opportunities, and
    objectives
  • Determining system requirements
  • Analyzing system needs
  • Designing the recommended systems
  • Developing and documenting software
  • Testing and maintaining the system
  • Implementing and evaluating the systems

69
System Development Life Cycle A brief overview
70
System Development Life Cycle A brief overview
71
The Database Life Cycle
  • The database life cycle incorporates the basic
    steps involved in
  • designing a global schema of the logical
    database, allocating data
  • across a computer network, and defining local
    DBMS-specific
  • schemas
  • Requirements Analysis (chapter 4)
  • Conceptual data modeling (chapter 4 )
  • Logical Design (chapters 4, 5 and 6 )
  • View integration (chapter 4 )
  • Transformation of the CDM to SQL tables
    (chapter 5)
  • Normalization of tables (chapter 6)
  • Physical Design (chapter 7 )
  • Database Implementation, Monitoring, and
    Modification

72
Definition of The Business Requirements
  • The definition of requirements is the users
    statement of how he or she wants to do business,
    and the information required to support his or
    her new methods of operations.

73
Definition of The Business Requirements
  • The requirements can be broadly divided into two
    areas
  • 1 Data requirementsthe data required for
    processing, and the natural data relationships.
  • 2 The software platform for the database
    implementation these are the limitations and
    demands imposed upon the computing solutions
    such as architectural plan, data storage
    specifications and information system performance
    expectations.

74
Data Requirements
Retail Salesperson View
Customer View
Customer
Salespersons
Customers

Order Unit
Sale Unit
Orders
Products
Orders
Airline Company
Supermarket Chain
Supermarket Chain
Customer
Time
Promotion
Time
Sale Unit
Flight
Frequent Flyer Flights
Store
Product
Fare Class
Airport
Status
75
Data Model
  • Collection of concepts for describing data,
    relationships between data and constraints on the
    data in an organization.
  • Data Model comprises
  • A structural part (a set of rules according to
  • which databases can be constructed)
  • A manipulative part (defining the types of
    operations that are allowed on the data)
  • Possibly a set of integrity rules

76
Data Model
  • Purpose
  • To represent the data in an understandable way.
  • Categories of data models include
  • Conceptual
  • Logical (Functional)
  • Physical

77
Levels of Modeling
Descriptive The dealer sold 200 cars last month.
Operational
Primarily Two Dimensional Database System
(OLTP)
Explanatory For every increase in 1 in the
interest, auto sales decrease by 5 .
Star Schema Cube
Traditional DW
(OLAP)
Predictive predictions about future buyer
behavior.
Cube sophisticated analytical tools
Data Mining
78
Levels of Modeling
Explanatory WHAT IF PROCESSING ANALYZE
WHAT HAS PREVIOUSLY OCCURRED TO BRING ABOUT
THE CURRENT STATE OF THE DATA
Predictive
Descriptive SIMPLE QUERIES REPORTS
DETERMINE IF ANY PATTERNS EXIST BY
REVIEWING DATA RELATIONSHIPS
Statistical Analysis/Expert System/ Artificial
Intelligence
Normalized Tables

Dimensional Tables
Classification Value Prediction
Roll-up Drill Down
Query
79
Descriptive Modeling
  • Conceptual Data Model (Analysis - Requirements
    Gathering Whats it?)
  • Logical Data Model (Design-How is it?)
  • Physical Data Model (Implementation)

80
Explanatory Modeling
  • Also called Dimensional Modelling (to be
    discussed in chapter 8)
  • Ways to derive the database component of a data
    warehouse
  • Every dimensional model (DM) is composed of one
    table with a composite primary key, called the
    fact table, and a set of smaller tables called
    dimension tables.

81
Predictive Modeling
  • Similar to the human learning experience
  • Uses observations to form a model of the
    important characteristics of some phenomenon.
  • Uses generalizations of real world and ability
    to fit new data into a general framework.
  • Can analyze a database to determine essential
    characteristics (model) about the data set.

82
Conceptual Data Modeling
Retail Salesperson View
Product
Customer
Orders
Product_No
Product_Name Product_Qty
Customer_no Customer_name Customer_address
N
N
N
N
1
Served-by
Sold-by
N
Salesperson_no Salesperson_name
Salesperson
83
Conceptual Data Modeling
Customer View
Order
Customer
Places
Order_no
Oredr_date Order_Qty
Customer_no Customer_name Customer_address
1
N
84
Use Hotel Case For Illustration-Customer View
Room
Guest
Room_no Type Price
Guest_no Guest_name Guest_address
Books
85
Use Hotel Case For Illustration-Hotel Front-Desk
Clerk View
Room
Guest
Room_no
Guest_no Guest_name Guest_address
Registered-by
Assigned-by
Clerk_No
Hotel Clerk
86
Use Hotel Case For Illustration-View Integration
Booking
Guest
Hotel_no Guest_no Date_from Date_to Room_no
Guest_no Guest_name Guest_address
Hotel
Room
Hotel_No Hotel_name City
Room_no Hotel_no Type Price
11
1M
Identifying Relationship
Dependent Entity (Attribute Entity)
87
View Integration From Concept of
Specialization/Generalization
  • Generalization
  • The process of minimizing the differences between
  • entities by identifying their common features.
  • Specialization
  • the reverse of generalization, is an
    inversion of the same concept.

88
Specialization/Generalization
Customers
Order
Aircraft
Generalization
Specialization
Commercial
Military
B52
B-1B
747
777
89
View Integration From Concept of Subtype and
Super-type
Faculty
Full Time Faculty
Part Time Faculty
Part (day) Time Faculty
Part (Night)Time Faculty
90
View Integration Objectives
  • How to merge data models based on specific user
    views into a global logical data model of the
    enterprise.
  • How to ensure that the resultant global model is
    a true and accurate representation of the part of
    the enterprise we are attempting to model.

91
Transformation of the CDM to SQL Tables
  • Objective of this step is to build a logical data
    model based on the conceptual data model of the
    users view of the enterprise and then to
    validate this model using the technique of
    normalization and against the required
    transactions.

92
Transformation of the CDM to SQL Tables
  • To refine the local conceptual data model to
    remove undesirable features and to map this
    model to a local logical data model. This
    involves
  • (1) Remove MN relationships.
  • (2) Remove complex relationships.
  • (3) Remove recursive relationships.
  • (4) Remove relationships with attributes.
  • (5) Remove multi-valued attributes.
  • (6) Re-examine 11 relationships.
  • (7) Remove redundant relationships.

93
Removing Advertises MN Relationship
94
Removing Leases Complex Relationship
95
Normalization
  • A technique for producing a set of relations with
    desirable properties, given the data requirements
    of an enterprise.
  • Developed by E.F. Codd (1972).
  • Often performed as a series of tests on a
    relation to determine whether it satisfies or
    violates the requirements of a given normal form.

96
Normalization
  • Four most commonly used normal forms are first
    (1NF), second (2NF), third (3NF) and Boyce-Codd
    (BCNF) normal forms.
  • Based on functional dependencies among the
    attributes of a relation.
  • A relation can be normalized to a specific form
    to prevent the possible occurrence of update
    anomalies.

97
Data Redundancy
  • Major aim of relational database design is to
    group attributes into relations to minimize data
    redundancy and reduce file storage space required
    by base relations.
  • Problems associated with data redundancy are
    illustrated by comparing the Staff and Branch
    relations with the Staff_Branch relation.

98
Data Redundancy
11
99
Data Redundancy
  • Staff_Branch relation has redundant data the
    details of a branch are repeated for every member
    of staff.
  • In contrast, the branch information appears only
    once for each branch in the Branch relation and
    only the branch number (Branch_No) is repeated in
    the Staff relation, to represent where each
    member of staff is located.

100
Update Anomalies
  • Relations that contain redundant information may
    potentially suffer from update anomalies.
  • Types of update anomalies include
  • Insertion
  • Deletion
  • Modification

101
Functional Dependency
  • Main concept associated with normalization.
  • Functional Dependency
  • Describes the relationship between attributes in
    a relation.
  • For example, if A and B are attributes of
    relation R, B is functionally dependent on A
    (denoted A Ù B), if each value of A in R is
    associated with exactly one value of B in R.

102
The Process of Normalization
  • Formal technique for analyzing a relation based
    on its primary key and the functional
    dependencies between the attributes of that
    relation.
  • Often executed as a series of steps. Each step
    corresponds to a specific normal form, which has
    known properties.

103
The Process of Normalization
  • As normalization proceeds, the relations become
    progressively more restricted (stronger) in
    format and also less vulnerable to update
    anomalies.

104
Topic 3
Conceptual Data Modeling
105
Conceptual Data Modeling
  • The process of developing a conceptual data model
    that is
  • a complete and accurate representation of an
    organization's data requirements.
  • independent of implementation details.

106
Conceptual Data Modeling What?
  • What is it?
  • 1.  It is a conceptual representation of data
    without concern for its logical (functional) or
    physical aspects.
  • 2. It is a set of high-level business data
    models which provides a framework for the data
    modeling activities at the next level.

107
Conceptual Data Modeling When?
  • When should it be done?
  • 1. In support of the data requirements of a
    process model under development at the
    corresponding level.
  • or
  • 2. Outside the system application lifecycle
    on a department, division, or company wide basis.

108
Conceptual Data Modeling Who?
  • Who should do it?
  • 1. The group responsible for assuring that
    data structure reflects business policies and
    rules.
  • 2. It should be a joint effort between the
    owners and custodians of the data, the users of
    the data, and the analysts.

109
Conceptual Data Modeling Why?
  • Documents the type of data (information) which
    must be represented in a system independent of
    specific application, organizations, or
    technology.
  • Maximizes data sharing minimizing redundancy.
  • Provides foundations for physical database
    design.
  • Describes the unique business enterprise
    specifically.

110
Conceptual Data Modeling Why?
  • Outside of application life cycle on a
    company-wide basis.
  • Data modeling expresses inherent associations
    which are the most part, independent of anyone
    one application.
  • Data entities change very little even through the
    way they are used can change for each
    application.
  • A complete maintained conceptual data model
    should shorten the requirements definition phases
    of system development life cycle.

111
Data Modeling Approach
  • Data partitioning
  • Use a top-down approach to define the data
    requirements of a system. The purpose is to
    divide and conquer (from subject to entity), and
    to evolve from the conceptual level to logical
    level until physical database is derived.
  • Standard deliverables
  • For each of the levels, there is a set of
    standard deliverables that must be produced. The
    documentation items must be well defined so that
    the data at each level is well understood.

112
Data Partitioning Via Modeling
(How)
(What, Why, Who, Where)
Subjects
Conceptual Level
Technical considerations
Entities
Relationships
Logical Level
Data Elements
Frequencies
Physical Level
Data Definition Language -DDL(create, Alter, drop
tables)
Data Manipulation Language (select, insert,
delete, update)
Write a Comment
User Comments (0)
About PowerShow.com