DATABASES AND DATA WAREHOUSES - PowerPoint PPT Presentation

1 / 53
About This Presentation
Title:

DATABASES AND DATA WAREHOUSES

Description:

This is called ONLINE TRANSACTION PROCESSING (OLTP) - the gathering of input ... DATA DICTIONARY - contains the logical structure of information in a database. ... – PowerPoint PPT presentation

Number of Views:113
Avg rating:3.0/5.0
Slides: 54
Provided by: steph248
Category:

less

Transcript and Presenter's Notes

Title: DATABASES AND DATA WAREHOUSES


1
CHAPTER 4
  • DATABASES AND DATA WAREHOUSES
  • A Gold Mine of Information

2
Opening case Using database and data warehouse
instead of shopping carts
  • Mervyn Co.
  • Situations
  • spend hours calculating
  • Solutions
  • Data warehouse---gtdata mining tool
  • results
  • 10 time gathering data , 90 acting
  • look on line and see product info. by
    units,dollars,single store, season,region, zone
  • less than 1 minute versus an hour

3
Today, Organizations Need...
Introduction
4-2
  • Information to compete effectively
  • Information just to stay alive in the information
    age
  • Information organized in such a way that you can
    easily and quickly get to it
  • Information-processing tools that help you work
    with information

4
YOUR FOCUS IN THIS CHAPTER
Introduction
4-3
  • The Difference Between Logical and Physical Views
    of Information
  • Databases and Database Management Systems
  • How You Can Develop Database Applications
  • Data Warehouses and Data Mining Tools

5
THREE THINGS ORGANIZATIONS DO WITH INFORMATION
Information Revisited
4-4
  • 1.Process information in the form of transactions
  • 2.Use information to make a decision
  • 3.Manage information while its used

6
PROCESSING INFORMATION IN THE FORM OF TRANSACTIONS
Information Revisited
4-5
  • Such as payroll processing, order processing, and
    handling your registration requests for classes.
  • This is called ONLINE TRANSACTION PROCESSING
    (OLTP) - the gathering of input information,
    processing that information, and updating
    existing information to reflect the gathered and
    processed information.
  • Operational databases and DBMS support OLTP.

7
USING INFORMATION TO MAKE A DECISION
Information Revisited
4-6
  • For answering such questions as How many
    senior-level marketing majors have not taken
    statistics?
  • This is called ONLINE ANALYTICAL PROCESSING
    (OLAP) - the manipulation of information to
    support decision making.
  • H-E-B Grocery Co.
  • Data warehouses support OLAP.

8
MANAGING INFORMATION WHILE ITS USED
Information Revisited
4-7
Bank of American 1986 15GB 5 times a day
2430 for each
2000 800GB 2000times daily 24
  • Tasks of managing information
  • Determining who can view or use information
  • Specifying how to back up information
  • Identifying what storage technologies to use

Most importantly, managing information includes
organizing it so that people can logically use it
without having to know anything about its
physical structure. The difference between
logical and physical is key.
9
Information Revisited
4-8
  • In managing information, physical deals with the
    structure of information as it resides on various
    storage media.
  • Logical deals with how knowledge workers view
    their information needs, and includes such terms
    as
  • CHARACTER - our smallest unit of information.
  • FIELD - group of related characters.
  • RECORD - group of related fields.
  • FILE - group of related records.
  • DATABASE - group of logically associated files.
  • DATA WAREHOUSE - information from many databases.

10
DATABASE
Databases
4-9
a collection of information that you organize and
access according to the logical structure of that
information.
  • A database is actually composed of two parts
  • 1. the information itself
  • the files that are logically associated
  • 2. the logical structure of the information
  • called the data dictionary

11
A Database Is a Collection of Information
Databases
4-10
  • Most databases contain two or more files with
    related information.
  • The Inventory database (Figure 4.4, page 125)
    contains two files - Part and Facility.
  • These two files are logically related because
    parts are stored in facilities and because you
    would use both of these files to manage your
    inventory.

12
A Database Contains a Logical Structure
Databases
4-11
  • You organize and access a database by its logical
    structure, not its physical position.
  • DATA DICTIONARY - contains the logical structure
    of information in a database.
  • The data dictionary contains the logical
    properties that describe information in a
    database.
  • See Figure 4.5 (page 126) for the data dictionary
    of the Percentage Markup field in the Inventory
    database.

13
A Database Has Logical Ties Among the Information
Databases
4-12
  • A PRIMARY KEY is a field in a database file that
    uniquely describes each record.
  • A FOREIGN KEY is a primary key of one file that
    also appears in another file. So, foreign keys
    specify how files are logically related.
  • For example, the Part and Facility files are
    logically related. So, in Figure 4.4 you can see
    that Facility Number (the primary key for the
    Facility file) exists in the Part file (where
    its a foreign key).

14
A Database Contains Built-in Integrity Constraints
Databases
4-13
  • An INTEGRITY CONSTRAINT is a rule that helps
    assure the quality of the information in a
    database.
  • A registration database at your school includes
    integrity constraints concerning prerequisites
    for certain classes.
  • Our Inventory database includes an integrity
    constraint that says a part in the Part file
    cannot be assigned to a facility that does not
    exist in the Facility file.
  • Eg Ritz-Carlton guest preference database

15
DATABASE MANAGEMENT SYSTEM (DBMS)
Database Management Systems
4-14
the software you use to specify the logical
organization for a database and access it.
  • A DBMS contains 5 software components
  • 1. DBMS engine
  • 2. Data definition subsystem
  • 3. Data manipulation subsystem
  • 4. Application generation subsystem
  • 5. Data administration subsystem
  • see figure4.6

16
DBMS ENGINE--Logical to physical bridge most
important
DBMSs
4-15
accepts logical requests from the various other
DBMS subsystems, converts them to their physical
equivalent, and actually accesses the database
and data dictionary as they exist on a storage
device.
  • Recall that
  • PHYSICAL VIEW deals with how information is
    physically arranged, stored, and accessed on some
    type of secondary storage device.
  • LOGICAL VIEW focuses on how you need to arrange
    and access information to meet your particular
    business needs.

17
DB and DBMS provide two advantages in separating
logical from physical view of info.
  • DBMS handles all physical tasks---gtyou can
    concentrate solely on your logical info. Needs
  • different knowledge worker logically view info.
    in different ways

18
DATA DEFINITION SUBSYSTEM-defining the logical
structure of a database
DBMSs
4-16
helps you create and maintain the data dictionary
and define the structure of the files in a
database.
  • You use this subsystem to define the information
    logical structure or properties when you first
    create a database.
  • Once you created a database, you use this
    subsystem to define new fields, delete fields, or
    change field properties.
  • Figure 4.5 (page 126) contains this subsystem
    screen for the Part file.

19
Logical Structures(properties)
  • Name
  • type
  • form
  • default value
  • validation rule
  • Is an entry required?
  • Can there be duplicates?

20
DATA MANIPULATION SUBSYSTEM --Mining and changing
info. in a database
DBMSs
4-17
helps you add, change, and delete information in
a database and mine it for valuable information.
  • This subsystem is most often the primary
    interface between you as a user and the
    information contained in a database.
  • Tools in this subsystem include views, report
    generators, query-by-example tools, and
    structured query language.

21
DATA MANIPULATION TOOLS
DBMSs
4-18
  • VIEW - allows you to see the content of a
    database file, make whatever changes you want,
    perform simple sorting, and query to find the
    location of specific information. See Figure 4.7
    page 137.
  • REPORT GENERATOR - helps you quickly define
    formats of reports and what information you want
    to see in a report. See Figures 4.8 and 4.9 page
    138.

22
DATA MANIPULATION TOOLS
DBMSs
4-19
  • QUERY-BY-EXAMPLE (QBE) TOOL - helps you
    graphically design the answer to a question.
    Figure 4.10 (page 138) shows the QBE for
    displaying the names and phone numbers of
    facility managers in charge of parts that cost
    more than 10.
  • STRUCTURED QUERY LANGUAGE (SQL) - a standardized
    fourth-generation language found in most database
    environments. SQL is the same as QBE, except
    that you perform a query by creating a
    statement(sentences-based) instead of pointing,
    clicking, dragging(graphics-based).

23
APPLICATION GENERATION SUBSYSTEM
DBMSs
4-20
contains facilities to help you develop
transaction-intensive applications. This
subsystem includes
  • Tools for creating data entry screens (See Figure
    4.12 page 139 for an example)
  • Programming languages specific to a particular
    DBMS
  • Interfaces to commonly used programming languages
    that are independent of any DBMS.

24
DATA ADMINISTRATION SUBSYSTEM
DBMSs
4-21
helps you manage the overall database environment
by providing facilities for
  • Backup and recovery
  • Security management
  • Query optimization
  • Reorganization
  • Concurrency control
  • Change management

Team work Refining information privileges during
University Registration
Project on your own DBMS Support for OLTP,OLAP
and information management
25
Which database model to adopt?
  • hierarchical
  • network
  • relational --the most widely used
  • object-oriented--the newest

26
THE RELATIONAL DATABASE MODEL
Database Models
4-22
a database model that uses a series of
two-dimensional tables or files to store
information.
  • This is the most popular model.
  • Each table is called a RELATION.
  • A relation contains information about a
    particular ENTITY CLASS (a concept - people,
    places, or things - about which you wish to store
    information and that you can identify with a
    unique key). Instance is an occurrence of an
    entity class that can be uniquely described

27
Database Models
4-23
  • Figure 4.14 (page 144) shows a relational
    database for a video rental store.
  • The entity classes are Customer, Video, Video
    Rental, and Distributor.
  • Notice how these tables are related to each other
    through the use of foreign keys.
  • In the Video Rental relation, youll find a
    primary key that uses more than one one field to
    create a unique description. This is called a
    COMPOSITE PRIMARY KEY.
  • A primary key that uses only one field is called
    an ATOMIC PRIMARY KEY.
  • Eg Palo Alto Childrens hospital

28
THE OBJECT-ORIENTED (O-O) DATABASE MODEL
Database Models
4-24
a database model that brings together, stores,
and allows you to work with both information and
procedures that act on the information.
  • An OBJECT-ORIENTED DATABASE MANAGEMENT SYSTEM
    (O-O DBMS) is the DBMS software that allows you
    to develop and work with an O-O database.

29
Database Models
4-25
  • This model takes advantage of the concept of an
    OBJECT - a software module containing information
    that describes an entity class along with a list
    of procedures that can act on the information
    describing the entity class.
  • Figure 4.15 (page 146) shows the same video
    rental store using the O-O database model.
  • Notice that the objects (entity classes) - which
    include Customer, Video Rental, Video, and
    Distributor - contain both information and
    procedures for working with that information.
  • Advantages(key features) See Appendix C for more
    on objects
  • more closely models how an organizations works
  • reuse

30
DEVELOPING YOUR OWN DATABASE
Developing Databases
4-26
  • Being able to develop your own database is a part
    of knowledge worker computing.
  • Building a database for your personal needs
    includes the following 4 steps
  • 1. Defining entity classes and primary keys
  • 2. Defining relationships among entity classes
  • 3. Defining information (fields) for each
    relation
  • 4. Using a data definition language to create the
    database

31
  • You own a small business and are interested in
    tracking employees by the department in which
    they work, job assignment, and the number of
    hours assigned.
  • Each of your employees can be assigned to only
    one department, but a department may have many
    employees (a department, however, may not have
    any employees assigned to it). Each employee can
    be assigned to any number of jobs and a job can
    have many employees assigned to it, but its not
    necessary that any employees be assigned to a
    certain job.
  • Follow along as we build the database to support
    the report in Figure 4.16 on page 148.

32
1 - DEFINING ENTITY CLASSES AND PRIMARY
KEYS--the most important
Developing Databases
4-27
  • From the report in Figure 4.16, you can identify
    the entity classes as Employee, Department, and
    Job.
  • Now, for each entity class, you must define a
    primary key that provides a unique description.
    These include
  • Employee entity class - Emp ID (e.g., 2345 for
    Smith)
  • Department entity class - Dept (e.g., 15)
  • Job entity class - Job (e.g., 14 for Acct)

33
2 - DEFINING RELATIONSHIPS AMONG ENTITY CLASSES
Developing Databases
4-28
  • For this step, use an ENTITY-RELATIONSHIP (E-R)
    DIAGRAM, a graphical method of representing
    entity classes and their relationships.
  • See Figure 4.17 (page 148) for the initial E-R
    diagram of our database and a listing of E-R
    diagram symbols.

34
Developing Databases
4-29
EMPLOYEE
DEPARTMENT
M1
  • An Employee must be assigned to a Department.
  • An Employee cannot be assigned to more than one
    Department.
  • A Department may have many Employees assigned to
    it.
  • A Department is not required to have any
    Employees assigned to it.

35
Developing Databases
4-30
  • After building the initial E-R diagram, you must
    follow the process of normalization.
  • NORMALIZATION is a process of assuring that a
    relational database structure can be implemented
    as a series of two-dimensional tables.
  • Normalization includes the following 3 steps
  • 1.Eliminate repeating groups or MM relationships
  • 2.Assure that each field in a relation depends
    only on the primary key of that relation
  • 3.Remove all derived fields from the relations.

36
Developing Databases
4-31
  • The first rule of normalization states that no
    MM relationships can exist.
  • There is an MM between Employee and Job.
  • Find repeating ways in two ways.
  • You eliminate this by creating an INTERSECTION
    RELATION - a relation you create to eliminate a
    repeating group.
  • An intersection relation will have a composite
    primary key that consists of the primary key
    fields from the two intersecting relations.
  • In Figure 4.18 (page 150), we created an
    intersection relation called Employee-Job to
    eliminate the MM relationship.

Teamwork Building an E-R Diagram for the video
rental store
37
3 - DEFINING INFORMATION (FIELDS) FOR EACH
RELATION
Developing Databases
4-32
  • In this step, you follow rules 2 and 3 of
    normalization.
  • Your goal here is two-fold
  • 1.Make sure that the information in each relation
    is indeed in the correct relation
  • 2.Make sure that the information cannot be
    derived from other information.

38
Developing Databases
4-33
  • To determine if information is in the correct
    relation, ask
  • does this piece of information depend only on the
    primary key for this relation?
  • If the answer is yes, the information is in the
    correct relation.
  • In the Employee relation (Figure 4.20 page 152),
    we currently store Dept Sup. Does Dept Sup
    depend only on Emp ID?
  • The answer is no - Dept Sup depends on Dept, so
    it should be in the Department relation.

39
Developing Databases
4-34
  • Derived information - information that can be
    mathematically determined from other information
    - should not be stored in your database.
  • For example, Emp is a field in the Department
    relation.
  • However, we can simply count the number of
    occurrences of each Dept in the Employee relation
    and determine the number of employees.
  • So, we remove Emp from the database.

40
4 - USING A DATA DEFINITION LANGUAGE TO CREATE
THE DATABASE
Developing Databases
4-35
  • The final step is to actually create the
    relations you identified in steps 1-3.
  • You do this with a data definition language.
  • This step includes
  • Developing a data dictionary
  • Defining the various relations
  • Defining primary keys and relationships

41
(No Transcript)
42
(No Transcript)
43
DATA WAREHOUSE and DATA MINING TOOLS
  • question such as
  • how many size 8 shoes did we sale last month
    in Southeast and Southwest region,compared with
    the same months over the last 5 years?
  • For your organization to succeed (to
    survive),users must have
  • 1) a way to easily develop the logical structure
    of such questions
  • 2) the needed information presented to them
    quickly without sacrificing the speed of
    operational systems and databases

44
DATA WAREHOUSE
Data Warehouses
4-36
a logical collection of information - gathered
from many different operational databases - that
supports business analysis activities and
decision-making tasks. Data warehouses
  • are a logical extension of databases
  • support OLAP
  • are among the newest and hottest buzz words and
    concepts in the IT field.
  • Represent a different way of thinking about
    organizing and managing information

45
DATA WAREHOUSE FEATURES
Data Warehouses
4-37
  • Data warehouses combine information from
    different databases (See Figure 4.22 )
  • Data warehouses are multi-dimensional
  • As opposed to 2 dimensions in the relational
    model
  • containing lays of columns and rows,multidimension
    al representation of information is Often
    called hyper cubes (See Figure 4.23) summary of
    information,
  • a data dictionary maintain the logical structure
    of information and two important
    characteristics(origin and method)
  • Data warehouses support decision making
  • While databases support OLTP, data warehouses
    support OLAP

46
DATA MINING TOOLS
Data Warehouses
4-38
the software tools you use to query information
in a data warehouse.
  • Data mining tools includes(see figure 4.24)
  • QUERY-AND-REPORTING TOOLS - QBE tools, SQL, and
    report generators.
  • INTELLIGENT AGENTS - various artificial
    intelligence tools that form the basis for
    information discovery in OLAP.
  • MULTIDIMENSIONAL ANALYSIS (MDA) TOOLS -
    slice-and-dice techniques that allow you to view
    multidimensional information from different
    perspectives.
  • Eg North Memorial Medical Center
    ---ForestTrees

47
IMPORTANT CONSIDERATIONS IN USING A DATA WAREHOUSE
Data Warehouses
4-39
  • Do you need a data warehouse?
  • Expense,no need(ROLAP),support-strapped..
  • Do you already have a data warehouse?
  • Who will the users be?
  • Mastercard
  • How up-to-date must the information be?
  • What data mining tools do you need?

48
MANAGING THE INFORMATION RESOURCE
Managing Information
4-40
  • How will changes in technology affect organizing
    and managing information?
  • Environment Canadas Ice Services
  • What types of database models and databases are
    most appropriate?
  • Functional requirements types of database
  • Who should oversee the organizations
    information?
  • CEOData administration database
    administration

49
OVERSEEING YOUR ORGANIZATIONs INFORMATION
Managing Information
4-41
  • CHIEF INFORMATION OFFICER (CIO) is the IT
    manager who directs all IT systems and personnel
    while communicating directly with the highest
    levels of the organization.
  • DATA ADMINISTRATION plans for, oversees the
    development of, and monitors the information
    resource.
  • DATABASE ADMINISTRATION is responsible for the
    more technical and operational aspects of
    managing information in databases.

50
MANAGING THE INFORMATION RESOURCE
Managing Information
4-42
  • Is information ownership a consideration?
  • What are the ethics involved in organizing and
    managing information?
  • How should databases and database applications be
    developed and maintained?

51
TO SUMMARIZE
4-43
  • How we view information
  • The physical view of information deals with how
    information is physically arranged, stored, and
    accessed on some type of secondary storage
    device.
  • The logical view of information focuses on how
    you need to arrange and access information to
    meet your particular business needs.
  • A database is a collection of information that
    you organize and access according to the logical
    structure of that information.
  • The data dictionary contains the logical
    structure of information in a database.

52
TO SUMMARIZE
4-44
  • A database management system is the software you
    use to specify the logical organization for a
    database and access it.
  • Popular database models include the relational
    model and the object-oriented model.
  • The four steps of developing a personal database
    application include
  • 1. Define entity classes and primary keys
  • 2. Define relationships among entity classes
  • 3. Define information (fields) for each relation
  • 4. Use a data definition language to create the
    database

53
TO SUMMARIZE
4-45
  • Data warehouses are a logical collection of
    information - gathered from many different
    operational databases - that supports business
    analysis activities and decision-making tasks.
  • Data mining tools - the software tools you use to
    query information in a data warehouse - include
    query-and-reporting tools, intelligent agents,
    and multidimensional analysis (MDA) tools.
Write a Comment
User Comments (0)
About PowerShow.com