DATA WAREHOUSING AND DATA MINING - PowerPoint PPT Presentation

1 / 17
About This Presentation
Title:

DATA WAREHOUSING AND DATA MINING

Description:

These are star schema, snowflake schema, and starflake schema. ... is the daily processing database that contains data used daily, on a short term basis. ... – PowerPoint PPT presentation

Number of Views:111
Avg rating:3.0/5.0
Slides: 18
Provided by: csiS7
Category:
Tags: and | data | mining | warehousing | daily | star

less

Transcript and Presenter's Notes

Title: DATA WAREHOUSING AND DATA MINING


1
DATA WAREHOUSING AND DATA MINING
  • MARK VROOMAN AND AMINAT ONI
  • PROJECT REPORT
  • KENNESAW STATE UNIVERSITY
  • DEPARTMENT OF COMPUTER SCIENCE
  • CS8630- DATABASE ADMINISTRATION

2
Purpose of data warehouse
  • Beazerhomes is a builder that has properties all
    over the United States. They have been building
    homes and apartments since 1995 and they feel the
    need to have a data warehouse to help record and
    store what the selling and renting trends and
    patterns for their buildings are and have been.
    This will help top management make decisions such
    as where to build more properties, which
    properties to sell and which to rent out, what
    incentives make their buildings high sellers, and
    what prices to put on the buildings to maximize
    profit and at the same time, encourage buyers.
  • The data warehouse will have top management
    access that will include all departments, middle
    management access that will cover areas they
    manage, and staff access for their individual
    sections of work. It will also allow for simple
    queries on past and present data.

3
Benefits of Data warehouse
  • All the data from all the OLTP systems can be
    integrated and made consistent for easier query.
  • Data present in the data warehouse cannot be
    modified, but can only be updated and deleted.
    This ensures that the data gotten straight from
    the OLTP systems are the same ones used in
    business analysis.
  • Data stored in a data warehouse contains more
    detailed information on a more historical basis.
    This makes for a better analysis of home-buying
    and selling trends and patterns.

4
Data warehouse design
- Dimensional modeling is a technique used
in describing the database component of a data
warehouse. It organizes data in a form that
corresponds to the way users query data. A
dimensional model includes one more fact tables
and a series of smaller tables called dimension
tables. Different schemas can be used for
dimensional modeling. These are star schema,
snowflake schema, and starflake schema. In a star
schema, the fact table is surrounded by dimension
tables containing reference data that can be
denormalized. It can also be described as each
dimension table being linked to the primary key
fact table. - The data warehouse design
chosen by Beazerhomes uses the star schema for
its dimensional modeling. The fact tables Lease,
PropertySale, PropertyMaintenance,
PropertyViewing, and Advert contain the primary
keys for the dimension tables
5
Dimension and fact tables
6
OLTP System
- The OLTP (Online Transaction processing) system
is the daily processing database that contains
data used daily, on a short term basis. The data
from the OLTP is what is stored in the data
warehouse. The DreamHome contains tables from
the OLTP for BeazerHomes and shows information on
how each property is sold or rented.
The diagram below contains the fact and
dimension tables for the OLTP system,
DreamHome
7
(No Transcript)
8
Data warehouse
  • The data warehouse contains historical and
    long-term information from the OLTP system.
  • The data will be loaded via an operational
    datastore. The data will be extracted from OLTP
    database by the operational datastore after the
    end of business hours on Friday (assuming a
    standard business week). The data will then be
    cleansed and its integrity will be verified. The
    data will then be loaded into the data warehouse
    by the operational datastore via the warehouse
    manager. Prior to the loading of the data, an
    incremental backup of the data warehouse will be
    performed. This will ensure a good rollback point
    if issues occur with the data loading. By
    carrying these activities out during non-business
    hours, there will be increased availability of
    resources and will not slow down operations
    during normal hours. This also allows time for a
    recovery if necessary.

9
(No Transcript)
10
E-R Diagram
11
E-R Description
  • The above diagram is the entity-relationship
    diagram for the data warehouse, that shows how
    each dimension or fact table interacts with the
    other tables. These are described as follows
  • Promotion uses Advert displayed in the Newspaper,
    which recommends PropertyForRent and promotes
    PropertyForSale.
  • - Advert describes PropertyForSale.
  • - ClientRenter holds Lease and seeks
    PropertyViewing.
  • - Branch registers ClientRenter and offers
    PropertyForRent, sells PropertyForSale, has
    Staff, and contacts ClientBuyer.
  • - ClientBuyer requests PropertyViewing and agrees
    to PropertySale.
  • - PropertySale is for PropertyForSale.
  • - PropertyForRent is placed in Advert and takes
    PropertyViewing. It is also leased via Lease and
    requires PropertyMaintenance.
  • - Staff oversees PropertyForRent, manages
    PropertyForSale, and attends to
    PropertyMaintenance.

12
- Owner owns provides (rent to) PropertyForRent
or owns PropertyForSale  - PropertyViewing views
PropertyForSale.
13
SAMPLE QUERIES
SELECT COUNT (timeID) FROM PropertySale Where
timeID (SELECT (timeID) FROM Time WHERE year
BETWEEN 2001 AND 2003)   This query will
provide a count of the number of properties that
were sold between the beginning of 2001 and the
end of 2003. The BETWEEN command includes the
endpoints of the range.  
14
SAMPLE QUERIES, CONTD.
SELECT SUM (sellingPrice) AS TotalRevenue FROM
PropertySale Where timeID (SELECT (timeID)
FROM Time WHERE year BETWEEN 2001 AND
2003)   This query will give the total of
sales (TotalRevenue) for the dates from the
beginning of 2001 to the end of 2003.
15
Data Dictionary
The data dictionary is a catalog of data files
in the data warehouse. It also contains the
characteristics of each file and is only used for
the management of the files. An excerpt from
Beazerhomes data warehouse can be seen below.
Although a data dictionary does not contain data,
without one, a management system will not be
able to access data information from the
warehouse.
16
(No Transcript)
17
A data warehouse is a beneficial database storage
system for organizations that want to have a
competitive edge in their industry through using
past and present data to view positive and
negative trends and patterns in order to improve
their technology, marketing, financial, and
sales. To further make the benefits of a data
warehouse more advantageous, data mining tools
should be used to uncover unknown patterns and
trends that may have been lost in large
databases. Using data mining to reveal more
complex queries, reports, and patterns will give
an even more beneficial advantage to
organizations.
Write a Comment
User Comments (0)
About PowerShow.com