Title: Introduction to Data Warehousing
1Introduction to Data Warehousing
2From DBMS to Decision Support
- DBMSs widely used to maintain transactional data
- Attempts to use of these data for analysis,
exploration, identification of trends etc. has
led to Decision Support Systems. - Rapid Growth since mid 70s
- DBMSs vendors have answered this trend by adding
new features to existing products - Rarely enough
3DBs for Decision Support
- Trend towards Data Warehousing
- Data Warehousing consolidation of data from
several databases which are in turn maintained by
individual business units along with historical
and summary information
4Characteristics of TPSs
- Characteristic OLTP
- Typical operation Update
- Level of analytical requirements Low
- Screens Unchanging
- Amount of data per transaction Small
- Data level Detailed
- Age of data Current
- Orientation Records
5TPS vs Decision Support
Complex Analysis Historical informationto
analyze Data needs to be integrated Database
design Denormalized, star schema
OLTP Information to supportday-to-day
service Data stored at transactionlevel Database
design Normalized
6MIS and Decision Support
Ad hoc access
Productionplatforms
Operational reports
Decision makers
- MIS systems provided business data
- Reports were developed on request
- Reports provided little analysis capability
- no personal ad hoc access to data
7Analyzing Data from Operational Systems
- Data structures are complex
- Systems are designed for high performance and
throughput - Data is not meaningfully represented
- Data is dispersed
- TPS systems unsuitable for intensive queries
ERP
Productionplatforms
Operational reports
8Data Extract Processing
Extracts
Operational systems
Decision makers
- End user computing offloaded from the operational
environment - Users own data
9Management Issues
Extracts
Operational systems
Decision makers
- Extract explosion
- Duplicated effort
- Multiple technologies
- Obsolete reports
- No metadata
10Data Quality Issues
- No common time basis
- Different calculation algorithms
- Different levels of extraction
- Different levels of granularity
- Different data field names
- Different data field meanings
- Missing information
- No data correction rules
- No drill-down capability
11From Extract to Warehouse DSS
Data warehouse
Decision makers
Internal andexternal systems
- Controlled
- Reliable
- Quality information
- Single source of data
12Data Warehousing Architecture
External Data Sources
Visualisation
Extract Clean Transform Load Refresh
Metadata respository
Serves
OLAP
Operational Databases
Data Warehouse
Data Mining
13Business Motivators
- Provide superior services and products
- Know the business
- New products
- Invest in customers
- Retain customers
- Invest in technology
- Reinvent to face new challenges
14Centralised data warehouse
Federated data warehouse
15Tiered data warehouse
16Data Warehouses Vs Data Marts
Data Warehouse
Data Mart Department Single-subject Few lt 100
GB Months
Property Scope Subjects Data Source Size
(typical) Implementation time
Data Warehouse Enterprise Multiple Many 100 GB to
gt 1 TB Months to years
17End-user Access Tools
- High performance is achieved by pre-planning the
requirements for joins, summations, and periodic
reports by end-users. - There are five main groups of access tools
- Data reporting and query tools
- Application development tools
- Executive information system (EIS) tools
- Online analytical processing (OLAP) tools
- Data mining tools
18Data Usage - 1000 questions
Need to complement RDBMS technology with a
flexible, multidimensional view of data
19(No Transcript)
20The Functionality of OLAP
- Rotate and drill down
- Create and examine calculated data
- Determine comparative or relative differences.
- Perform exception and trend analysis.
- Perform advanced analytical functions
21The star structure
22Multidimensional Database Model
Store
Customer
Store
Time
Time
FINANCE
SALES
Product
- The data is found at the intersection of
dimensions.
23Data Mining
24Data mining functions
- Associations
- 85 percent of customers who buy a certain brand
of wine also buy a certain type of pasta - Sequential patterns
- 32 percent of female customers who order a red
jacket within six months buy a gray skirt - Classifying
- Frequent customers are those with incomes about
50,000 and having two or more children - Clustering
- Market segmentation
- Predicting
- predict the revenue value of a new customer based
on that personal demographic variables