Title: DATA WAREHOUSING AND DATA MINING
1DATA WAREHOUSING AND DATA MINING
- MARK VROOMAN AND AMINAT ONI
- PROJECT REPORT
- KENNESAW STATE UNIVERSITY
- DEPARTMENT OF COMPUTER SCIENCE
- CS8630- DATABASE ADMINISTRATION
2Purpose of data warehouse
- Beazerhomes is a builder that has properties all
over the United States. They have been building
homes and apartments since 1995 and they feel the
need to have a data warehouse to help record and
store what the selling and renting trends and
patterns for their buildings are and have been.
This will help top management make decisions such
as where to build more properties, which
properties to sell and which to rent out, what
incentives make their buildings high sellers, and
what prices to put on the buildings to maximize
profit and at the same time, encourage buyers. - The data warehouse will have top management
access that will include all departments, middle
management access that will cover areas they
manage, and staff access for their individual
sections of work. It will also allow for simple
queries on past and present data.
3Benefits of Data warehouse
- All the data from all the OLTP systems can be
integrated and made consistent for easier query. - Data present in the data warehouse cannot be
modified, but can only be updated and deleted.
This ensures that the data gotten straight from
the OLTP systems are the same ones used in
business analysis. - Data stored in a data warehouse contains more
detailed information on a more historical basis.
This makes for a better analysis of home-buying
and selling trends and patterns.
4Data warehouse design
- Dimensional modeling is a technique used
in describing the database component of a data
warehouse. It organizes data in a form that
corresponds to the way users query data. A
dimensional model includes one more fact tables
and a series of smaller tables called dimension
tables. Different schemas can be used for
dimensional modeling. These are star schema,
snowflake schema, and starflake schema. In a star
schema, the fact table is surrounded by dimension
tables containing reference data that can be
denormalized. It can also be described as each
dimension table being linked to the primary key
fact table. - The data warehouse design
chosen by Beazerhomes uses the star schema for
its dimensional modeling. The fact tables Lease,
PropertySale, PropertyMaintenance,
PropertyViewing, and Advert contain the primary
keys for the dimension tables
5Dimension and fact tables
6OLTP System
- The OLTP (Online Transaction processing) system
is the daily processing database that contains
data used daily, on a short term basis. The data
from the OLTP is what is stored in the data
warehouse. The DreamHome contains tables from
the OLTP for BeazerHomes and shows information on
how each property is sold or rented.
The diagram below contains the fact and
dimension tables for the OLTP system,
DreamHome
7(No Transcript)
8Data warehouse
- The data warehouse contains historical and
long-term information from the OLTP system. - The data will be loaded via an operational
datastore. The data will be extracted from OLTP
database by the operational datastore after the
end of business hours on Friday (assuming a
standard business week). The data will then be
cleansed and its integrity will be verified. The
data will then be loaded into the data warehouse
by the operational datastore via the warehouse
manager. Prior to the loading of the data, an
incremental backup of the data warehouse will be
performed. This will ensure a good rollback point
if issues occur with the data loading. By
carrying these activities out during non-business
hours, there will be increased availability of
resources and will not slow down operations
during normal hours. This also allows time for a
recovery if necessary.
9(No Transcript)
10E-R Diagram
11E-R Description
- The above diagram is the entity-relationship
diagram for the data warehouse, that shows how
each dimension or fact table interacts with the
other tables. These are described as follows - Promotion uses Advert displayed in the Newspaper,
which recommends PropertyForRent and promotes
PropertyForSale. - - Advert describes PropertyForSale.
- - ClientRenter holds Lease and seeks
PropertyViewing. - - Branch registers ClientRenter and offers
PropertyForRent, sells PropertyForSale, has
Staff, and contacts ClientBuyer. - - ClientBuyer requests PropertyViewing and agrees
to PropertySale. - - PropertySale is for PropertyForSale.
- - PropertyForRent is placed in Advert and takes
PropertyViewing. It is also leased via Lease and
requires PropertyMaintenance. - - Staff oversees PropertyForRent, manages
PropertyForSale, and attends to
PropertyMaintenance.
12- Owner owns provides (rent to) PropertyForRent
or owns PropertyForSale  - PropertyViewing views
PropertyForSale.
13SAMPLE QUERIES
SELECT COUNT (timeID) FROM PropertySale Where
timeID (SELECT (timeID) FROM Time WHERE year
BETWEEN 2001 AND 2003) Â This query will
provide a count of the number of properties that
were sold between the beginning of 2001 and the
end of 2003. The BETWEEN command includes the
endpoints of the range. Â
14SAMPLE QUERIES, CONTD.
SELECT SUM (sellingPrice) AS TotalRevenue FROM
PropertySale Where timeID (SELECT (timeID)
FROM Time WHERE year BETWEEN 2001 AND
2003) Â This query will give the total of
sales (TotalRevenue) for the dates from the
beginning of 2001 to the end of 2003.
15Data Dictionary
The data dictionary is a catalog of data files
in the data warehouse. It also contains the
characteristics of each file and is only used for
the management of the files. An excerpt from
Beazerhomes data warehouse can be seen below.
Although a data dictionary does not contain data,
without one, a management system will not be
able to access data information from the
warehouse.
16(No Transcript)
17A data warehouse is a beneficial database storage
system for organizations that want to have a
competitive edge in their industry through using
past and present data to view positive and
negative trends and patterns in order to improve
their technology, marketing, financial, and
sales. To further make the benefits of a data
warehouse more advantageous, data mining tools
should be used to uncover unknown patterns and
trends that may have been lost in large
databases. Using data mining to reveal more
complex queries, reports, and patterns will give
an even more beneficial advantage to
organizations.