Title: DATA WAREHOUSING
1Chapter 5
Study Sections 5.2, 5.3, 5.5, Pages231-233
Snowflake schema
2 Business Intelligence
- Companies collect a large amount of data from
their business operations. - To keep track of that information, a business
uses disparate software applications , such as
Excel, Access, etc. - Using multiple software makes it difficult to
retrieve information in a timely manner and to
perform analysis of the data. - Business Intelligence (BI) represents the tools
and systems that play a key role in integrating
and analyzing all corporate data. -
- Generally illustrates intelligence in the areas
of customer profiling, market research, product
profitability (by product, region, year), etc.
3BI Architecture
- Consists of 3 system components
- Data warehouse
- Business analytics
- Performance management (BPM)
4Data warehouse
- A repository of cleaned and integrated historical
/stable data for the entire business - Extracted from independent databases (internal
external) - Transformed (ie. cleaned and reformatted)
- - A subset of a warehouse limited to a business
function is called a Data Mart (eg. Sales).
5DW vs. Transaction DBs
- Differences between standard Transactional
databases Data Warehouses - DWs are not designed for performing transaction
entries, but only for planning and analysis - DWs are not designed for retrieval of individual
records emphasis is on summarized data - DWs data pulled and integrated from disparate
databases, unlike Transaction dbs which are
individual applications - Transaction dbs are concerned with now DW
focuses on activity over a period - A transaction db is volatile (eg. an order may be
cancelled) In a DW, data is only added, never
deleted (as it maintains a history) - Transaction db is optimized for rapid retrieval
not DWs
6Business analytics
These are tools that help analyze the data
towards finding solutions
- Reporting and queries
- Multi-dimensional reports, eg. Pivot tables see
Exercise 8 - SQL Queries Exercise 9
- Cube analysis Chapter 6
- Data, text and Web mining and other sophisticated
mathematical and statistical tools for searching
relationships Chapter 7
7Business Performance Management (BPM)
BPM supports monitoring, measuring, and
comparing of sales, profit, cost, profitability,
and other performance indicators
- Dashboard reports
- Production reports
8Analytic Tools SQL, Cubes
Transaction Data Systems
Data Source DW
Data Source Views
9Extraction, Transformation, and Load (ETL)
Process
- A data warehousing process consists of
- Extraction (i.e., reading data from a database),
- Transformation (i.e., converting the extracted
data from its previous form into the form in
which it needs to be so that it can be placed
into a data warehouse), and - Load (i.e., storing the data into the data
warehouse)
10Data Integration and the Extraction,
Transformation, and Load (ETL) Process
11ETL
Newly integrated schema for the Data Warehouse
Data from multiple Sources
12DW Schema Structures Star
Note that data is un-normalized
13DW Schema Structures Snowflake
Note that data is normalized
14Designing Fact Tables Normalization
- Normalization is the process of gathering
attributes into tables to eliminate redundant
data (the redundancy here is EquipID ? EquipType)
15Normalization Exercise
- FIRST (Supplier, City, CityCode, Part, Qty)
- Split the table into 3 different tables
- (Supplier, City)
- (City, CityCode)
- (Supplier, Part,Qty)
Although normalized databases have less data
redundancies, they are less efficient in quickly
processing the data. Hence, many DWs use Star
schema.
16Data Marts provide views of the data in the
Data Warehouse (we will be working with this in
our SQL exercises)
17Summary