Title: DATABASES AND DATA WAREHOUSES
1CHAPTER 4
- DATABASES AND DATA WAREHOUSES
- A Gold Mine of Information
2Opening case Using database and data warehouse
instead of shopping carts
- Mervyn Co.
- Situations
- spend hours calculating
- Solutions
- Data warehouse---gtdata mining tool
- results
- 10 time gathering data , 90 acting
- look on line and see product info. by
units,dollars,single store, season,region, zone - less than 1 minute versus an hour
3Today, Organizations Need...
Introduction
4-2
- Information to compete effectively
- Information just to stay alive in the information
age - Information organized in such a way that you can
easily and quickly get to it - Information-processing tools that help you work
with information
4YOUR FOCUS IN THIS CHAPTER
Introduction
4-3
- The Difference Between Logical and Physical Views
of Information - Databases and Database Management Systems
- How You Can Develop Database Applications
- Data Warehouses and Data Mining Tools
5THREE THINGS ORGANIZATIONS DO WITH INFORMATION
Information Revisited
4-4
- 1.Process information in the form of transactions
- 2.Use information to make a decision
- 3.Manage information while its used
6PROCESSING INFORMATION IN THE FORM OF TRANSACTIONS
Information Revisited
4-5
- Such as payroll processing, order processing, and
handling your registration requests for classes. - This is called ONLINE TRANSACTION PROCESSING
(OLTP) - the gathering of input information,
processing that information, and updating
existing information to reflect the gathered and
processed information. - Operational databases and DBMS support OLTP.
7USING INFORMATION TO MAKE A DECISION
Information Revisited
4-6
- For answering such questions as How many
senior-level marketing majors have not taken
statistics? - This is called ONLINE ANALYTICAL PROCESSING
(OLAP) - the manipulation of information to
support decision making. - H-E-B Grocery Co.
- Data warehouses support OLAP.
8MANAGING INFORMATION WHILE ITS USED
Information Revisited
4-7
Bank of American 1986 15GB 5 times a day
2430 for each
2000 800GB 2000times daily 24
- Tasks of managing information
- Determining who can view or use information
- Specifying how to back up information
- Identifying what storage technologies to use
Most importantly, managing information includes
organizing it so that people can logically use it
without having to know anything about its
physical structure. The difference between
logical and physical is key.
9Information Revisited
4-8
- In managing information, physical deals with the
structure of information as it resides on various
storage media. - Logical deals with how knowledge workers view
their information needs, and includes such terms
as - CHARACTER - our smallest unit of information.
- FIELD - group of related characters.
- RECORD - group of related fields.
- FILE - group of related records.
- DATABASE - group of logically associated files.
- DATA WAREHOUSE - information from many databases.
10 DATABASE
Databases
4-9
a collection of information that you organize and
access according to the logical structure of that
information.
- A database is actually composed of two parts
- 1. the information itself
- the files that are logically associated
- 2. the logical structure of the information
- called the data dictionary
11A Database Is a Collection of Information
Databases
4-10
- Most databases contain two or more files with
related information. - The Inventory database (Figure 4.4, page 125)
contains two files - Part and Facility. - These two files are logically related because
parts are stored in facilities and because you
would use both of these files to manage your
inventory.
12A Database Contains a Logical Structure
Databases
4-11
- You organize and access a database by its logical
structure, not its physical position. - DATA DICTIONARY - contains the logical structure
of information in a database. - The data dictionary contains the logical
properties that describe information in a
database. - See Figure 4.5 (page 126) for the data dictionary
of the Percentage Markup field in the Inventory
database.
13A Database Has Logical Ties Among the Information
Databases
4-12
- A PRIMARY KEY is a field in a database file that
uniquely describes each record. - A FOREIGN KEY is a primary key of one file that
also appears in another file. So, foreign keys
specify how files are logically related. - For example, the Part and Facility files are
logically related. So, in Figure 4.4 you can see
that Facility Number (the primary key for the
Facility file) exists in the Part file (where
its a foreign key).
14A Database Contains Built-in Integrity Constraints
Databases
4-13
- An INTEGRITY CONSTRAINT is a rule that helps
assure the quality of the information in a
database. - A registration database at your school includes
integrity constraints concerning prerequisites
for certain classes. - Our Inventory database includes an integrity
constraint that says a part in the Part file
cannot be assigned to a facility that does not
exist in the Facility file. - Eg Ritz-Carlton guest preference database
15DATABASE MANAGEMENT SYSTEM (DBMS)
Database Management Systems
4-14
the software you use to specify the logical
organization for a database and access it.
- A DBMS contains 5 software components
- 1. DBMS engine
- 2. Data definition subsystem
- 3. Data manipulation subsystem
- 4. Application generation subsystem
- 5. Data administration subsystem
- see figure4.6
16 DBMS ENGINE--Logical to physical bridge most
important
DBMSs
4-15
accepts logical requests from the various other
DBMS subsystems, converts them to their physical
equivalent, and actually accesses the database
and data dictionary as they exist on a storage
device.
- Recall that
- PHYSICAL VIEW deals with how information is
physically arranged, stored, and accessed on some
type of secondary storage device. - LOGICAL VIEW focuses on how you need to arrange
and access information to meet your particular
business needs.
17DB and DBMS provide two advantages in separating
logical from physical view of info.
- DBMS handles all physical tasks---gtyou can
concentrate solely on your logical info. Needs - different knowledge worker logically view info.
in different ways
18DATA DEFINITION SUBSYSTEM-defining the logical
structure of a database
DBMSs
4-16
helps you create and maintain the data dictionary
and define the structure of the files in a
database.
- You use this subsystem to define the information
logical structure or properties when you first
create a database. - Once you created a database, you use this
subsystem to define new fields, delete fields, or
change field properties. - Figure 4.5 (page 126) contains this subsystem
screen for the Part file.
19Logical Structures(properties)
- Name
- type
- form
- default value
- validation rule
- Is an entry required?
- Can there be duplicates?
20DATA MANIPULATION SUBSYSTEM --Mining and changing
info. in a database
DBMSs
4-17
helps you add, change, and delete information in
a database and mine it for valuable information.
- This subsystem is most often the primary
interface between you as a user and the
information contained in a database. - Tools in this subsystem include views, report
generators, query-by-example tools, and
structured query language.
21DATA MANIPULATION TOOLS
DBMSs
4-18
- VIEW - allows you to see the content of a
database file, make whatever changes you want,
perform simple sorting, and query to find the
location of specific information. See Figure 4.7
page 137. - REPORT GENERATOR - helps you quickly define
formats of reports and what information you want
to see in a report. See Figures 4.8 and 4.9 page
138.
22DATA MANIPULATION TOOLS
DBMSs
4-19
- QUERY-BY-EXAMPLE (QBE) TOOL - helps you
graphically design the answer to a question.
Figure 4.10 (page 138) shows the QBE for
displaying the names and phone numbers of
facility managers in charge of parts that cost
more than 10. - STRUCTURED QUERY LANGUAGE (SQL) - a standardized
fourth-generation language found in most database
environments. SQL is the same as QBE, except
that you perform a query by creating a
statement(sentences-based) instead of pointing,
clicking, dragging(graphics-based).
23APPLICATION GENERATION SUBSYSTEM
DBMSs
4-20
contains facilities to help you develop
transaction-intensive applications. This
subsystem includes
- Tools for creating data entry screens (See Figure
4.12 page 139 for an example) - Programming languages specific to a particular
DBMS - Interfaces to commonly used programming languages
that are independent of any DBMS.
24DATA ADMINISTRATION SUBSYSTEM
DBMSs
4-21
helps you manage the overall database environment
by providing facilities for
- Backup and recovery
- Security management
- Query optimization
- Reorganization
- Concurrency control
- Change management
Team work Refining information privileges during
University Registration
Project on your own DBMS Support for OLTP,OLAP
and information management
25Which database model to adopt?
- hierarchical
- network
- relational --the most widely used
- object-oriented--the newest
26THE RELATIONAL DATABASE MODEL
Database Models
4-22
a database model that uses a series of
two-dimensional tables or files to store
information.
- This is the most popular model.
- Each table is called a RELATION.
- A relation contains information about a
particular ENTITY CLASS (a concept - people,
places, or things - about which you wish to store
information and that you can identify with a
unique key). Instance is an occurrence of an
entity class that can be uniquely described
27Database Models
4-23
- Figure 4.14 (page 144) shows a relational
database for a video rental store. - The entity classes are Customer, Video, Video
Rental, and Distributor. - Notice how these tables are related to each other
through the use of foreign keys. - In the Video Rental relation, youll find a
primary key that uses more than one one field to
create a unique description. This is called a
COMPOSITE PRIMARY KEY. - A primary key that uses only one field is called
an ATOMIC PRIMARY KEY. - Eg Palo Alto Childrens hospital
28THE OBJECT-ORIENTED (O-O) DATABASE MODEL
Database Models
4-24
a database model that brings together, stores,
and allows you to work with both information and
procedures that act on the information.
- An OBJECT-ORIENTED DATABASE MANAGEMENT SYSTEM
(O-O DBMS) is the DBMS software that allows you
to develop and work with an O-O database.
29Database Models
4-25
- This model takes advantage of the concept of an
OBJECT - a software module containing information
that describes an entity class along with a list
of procedures that can act on the information
describing the entity class. - Figure 4.15 (page 146) shows the same video
rental store using the O-O database model. - Notice that the objects (entity classes) - which
include Customer, Video Rental, Video, and
Distributor - contain both information and
procedures for working with that information. - Advantages(key features) See Appendix C for more
on objects - more closely models how an organizations works
- reuse
30DEVELOPING YOUR OWN DATABASE
Developing Databases
4-26
- Being able to develop your own database is a part
of knowledge worker computing. - Building a database for your personal needs
includes the following 4 steps - 1. Defining entity classes and primary keys
- 2. Defining relationships among entity classes
- 3. Defining information (fields) for each
relation - 4. Using a data definition language to create the
database
31- You own a small business and are interested in
tracking employees by the department in which
they work, job assignment, and the number of
hours assigned. - Each of your employees can be assigned to only
one department, but a department may have many
employees (a department, however, may not have
any employees assigned to it). Each employee can
be assigned to any number of jobs and a job can
have many employees assigned to it, but its not
necessary that any employees be assigned to a
certain job. - Follow along as we build the database to support
the report in Figure 4.16 on page 148.
321 - DEFINING ENTITY CLASSES AND PRIMARY
KEYS--the most important
Developing Databases
4-27
- From the report in Figure 4.16, you can identify
the entity classes as Employee, Department, and
Job. - Now, for each entity class, you must define a
primary key that provides a unique description.
These include - Employee entity class - Emp ID (e.g., 2345 for
Smith) - Department entity class - Dept (e.g., 15)
- Job entity class - Job (e.g., 14 for Acct)
332 - DEFINING RELATIONSHIPS AMONG ENTITY CLASSES
Developing Databases
4-28
- For this step, use an ENTITY-RELATIONSHIP (E-R)
DIAGRAM, a graphical method of representing
entity classes and their relationships. - See Figure 4.17 (page 148) for the initial E-R
diagram of our database and a listing of E-R
diagram symbols.
34Developing Databases
4-29
EMPLOYEE
DEPARTMENT
M1
- An Employee must be assigned to a Department.
- An Employee cannot be assigned to more than one
Department. - A Department may have many Employees assigned to
it. - A Department is not required to have any
Employees assigned to it.
35Developing Databases
4-30
- After building the initial E-R diagram, you must
follow the process of normalization. - NORMALIZATION is a process of assuring that a
relational database structure can be implemented
as a series of two-dimensional tables. - Normalization includes the following 3 steps
- 1.Eliminate repeating groups or MM relationships
- 2.Assure that each field in a relation depends
only on the primary key of that relation - 3.Remove all derived fields from the relations.
36Developing Databases
4-31
- The first rule of normalization states that no
MM relationships can exist. - There is an MM between Employee and Job.
- Find repeating ways in two ways.
- You eliminate this by creating an INTERSECTION
RELATION - a relation you create to eliminate a
repeating group. - An intersection relation will have a composite
primary key that consists of the primary key
fields from the two intersecting relations. - In Figure 4.18 (page 150), we created an
intersection relation called Employee-Job to
eliminate the MM relationship.
Teamwork Building an E-R Diagram for the video
rental store
373 - DEFINING INFORMATION (FIELDS) FOR EACH
RELATION
Developing Databases
4-32
- In this step, you follow rules 2 and 3 of
normalization. - Your goal here is two-fold
- 1.Make sure that the information in each relation
is indeed in the correct relation - 2.Make sure that the information cannot be
derived from other information.
38Developing Databases
4-33
- To determine if information is in the correct
relation, ask - does this piece of information depend only on the
primary key for this relation? - If the answer is yes, the information is in the
correct relation. - In the Employee relation (Figure 4.20 page 152),
we currently store Dept Sup. Does Dept Sup
depend only on Emp ID? - The answer is no - Dept Sup depends on Dept, so
it should be in the Department relation.
39Developing Databases
4-34
- Derived information - information that can be
mathematically determined from other information
- should not be stored in your database. - For example, Emp is a field in the Department
relation. - However, we can simply count the number of
occurrences of each Dept in the Employee relation
and determine the number of employees. - So, we remove Emp from the database.
404 - USING A DATA DEFINITION LANGUAGE TO CREATE
THE DATABASE
Developing Databases
4-35
- The final step is to actually create the
relations you identified in steps 1-3. - You do this with a data definition language.
- This step includes
- Developing a data dictionary
- Defining the various relations
- Defining primary keys and relationships
41(No Transcript)
42(No Transcript)
43DATA WAREHOUSE and DATA MINING TOOLS
- question such as
- how many size 8 shoes did we sale last month
in Southeast and Southwest region,compared with
the same months over the last 5 years? - For your organization to succeed (to
survive),users must have - 1) a way to easily develop the logical structure
of such questions - 2) the needed information presented to them
quickly without sacrificing the speed of
operational systems and databases
44 DATA WAREHOUSE
Data Warehouses
4-36
a logical collection of information - gathered
from many different operational databases - that
supports business analysis activities and
decision-making tasks. Data warehouses
- are a logical extension of databases
- support OLAP
- are among the newest and hottest buzz words and
concepts in the IT field. - Represent a different way of thinking about
organizing and managing information
45DATA WAREHOUSE FEATURES
Data Warehouses
4-37
- Data warehouses combine information from
different databases (See Figure 4.22 ) - Data warehouses are multi-dimensional
- As opposed to 2 dimensions in the relational
model - containing lays of columns and rows,multidimension
al representation of information is Often
called hyper cubes (See Figure 4.23) summary of
information, - a data dictionary maintain the logical structure
of information and two important
characteristics(origin and method) - Data warehouses support decision making
- While databases support OLTP, data warehouses
support OLAP
46 DATA MINING TOOLS
Data Warehouses
4-38
the software tools you use to query information
in a data warehouse.
- Data mining tools includes(see figure 4.24)
- QUERY-AND-REPORTING TOOLS - QBE tools, SQL, and
report generators. - INTELLIGENT AGENTS - various artificial
intelligence tools that form the basis for
information discovery in OLAP. - MULTIDIMENSIONAL ANALYSIS (MDA) TOOLS -
slice-and-dice techniques that allow you to view
multidimensional information from different
perspectives. - Eg North Memorial Medical Center
---ForestTrees
47IMPORTANT CONSIDERATIONS IN USING A DATA WAREHOUSE
Data Warehouses
4-39
- Do you need a data warehouse?
- Expense,no need(ROLAP),support-strapped..
- Do you already have a data warehouse?
- Who will the users be?
- Mastercard
- How up-to-date must the information be?
- What data mining tools do you need?
48MANAGING THE INFORMATION RESOURCE
Managing Information
4-40
- How will changes in technology affect organizing
and managing information? - Environment Canadas Ice Services
- What types of database models and databases are
most appropriate? - Functional requirements types of database
- Who should oversee the organizations
information? - CEOData administration database
administration
49OVERSEEING YOUR ORGANIZATIONs INFORMATION
Managing Information
4-41
- CHIEF INFORMATION OFFICER (CIO) is the IT
manager who directs all IT systems and personnel
while communicating directly with the highest
levels of the organization. - DATA ADMINISTRATION plans for, oversees the
development of, and monitors the information
resource. - DATABASE ADMINISTRATION is responsible for the
more technical and operational aspects of
managing information in databases.
50MANAGING THE INFORMATION RESOURCE
Managing Information
4-42
- Is information ownership a consideration?
- What are the ethics involved in organizing and
managing information? - How should databases and database applications be
developed and maintained?
51TO SUMMARIZE
4-43
- How we view information
- The physical view of information deals with how
information is physically arranged, stored, and
accessed on some type of secondary storage
device. - The logical view of information focuses on how
you need to arrange and access information to
meet your particular business needs. - A database is a collection of information that
you organize and access according to the logical
structure of that information. - The data dictionary contains the logical
structure of information in a database.
52TO SUMMARIZE
4-44
- A database management system is the software you
use to specify the logical organization for a
database and access it. - Popular database models include the relational
model and the object-oriented model. - The four steps of developing a personal database
application include - 1. Define entity classes and primary keys
- 2. Define relationships among entity classes
- 3. Define information (fields) for each relation
- 4. Use a data definition language to create the
database
53TO SUMMARIZE
4-45
- Data warehouses are a logical collection of
information - gathered from many different
operational databases - that supports business
analysis activities and decision-making tasks. - Data mining tools - the software tools you use to
query information in a data warehouse - include
query-and-reporting tools, intelligent agents,
and multidimensional analysis (MDA) tools.