Title: Business Intelligence
1Business IntelligenceData Warehousing
- Tom A. Fürstenberg
- Business Intelligence Consultant
- Cap Gemini Ernst Young
2Leerdoelen college
- Wat is BI DWH? (Conceptueel en Technisch)
- Toepassing van BI DWH
- De praktijk van een consultant iha en bij Cap
Gemini Ernst Young ihb
3Inhoud College
- Performance Management
- Business Intelligence (Performance Measurement)
- OLAP
- Extranets
- Architectuur
- Data Warehouse
- ETL
- Multidimensioneel Modelleren
- CGEY Aanpak
- Data Mining
4Performance Management
- Doelgericht meten en bijsturen van
bedrijfsdoelstellingen
5In control of a company
6Overview
7Besturings visie Bouwstenenvoor besturing van
organisaties
Organisatie
Wie?
Systemen
Methoden
Hoe?
Doelstellingen Prestatie indicatoren
Wat?
Strategie Missie
Waarden normen
Stake holders
Waarom ?
8Methoden
Diverse Financiële modellen
Balanced Scorecard
INK managementmodel
9Naar een operationeel Besturingsmodel
10Naar een operationeel Besturingsmodel
11Van Model naar Gedragsverandering
Multi-dimensionale Gegevensstructuur
Verantwoordelijkheden en Bevoegdheden
Operationeel Besturingsmodel
Informatievoorziening
Planning en Commitment
Beoordeling en Sturing
12Some Typical Mgt. Questions
PRODUCT
CUSTOMER
- How much have we sold?
- Which product gives the best profit?
- Which product has the largest sales
- volume this quarter?
- Which product best meets market
- needs?
- How much to produce of each product?
- Who is the most profitable customer?
- What is the satisfaction level?
- Which are the best segments?
- Which service to improve?
- How many customers have we lost last year?
- Who are our biggest accounts?
CHANNEL
MARKETING
- Which retailer yields most by
- volume and which by profit?
- What promotions will yield most profit?
- What effect will discounts have on
- the turnover?
- What are the area coverage levels?
- How many contacted people became
- a customer?
- Promotions results?
- What is the competition doing?
13Key Performance Indicators Top 10
Source Results FIND! The Best
benchmarkstudy conducted in 1997/1998 by Ernst
Young Consulting and VU. 103 industrial companies
participated in the study.
14En nu alleen nog even metenBusiness
Intelligence (performance measurement)
15The Answers
The information is there, but spread everywhere!
16De praktijk...
17Problemen
- (Over)belasting IT-afdeling (queries)
- Lange doorlooptijd rapport-fabricage
- Hoge kosten aan manuren
- Databronnen moeilijk integreerbaar
- Niet-gestandaardiseerde rapporten
- Geen eenduidige definities
- Foutgevoelig
- Manipuleerbaar
- Afhankelijkheid van schakels
- Discussies over verschillen in cijfers
- Beperkte analyse-mogelijkheden
- Verkeerde en te late interpretaties, conclusies,
beslissingen - ...
18Een druk op de knop...
19Van chaos...
20Why now? Hype? Developments
Market Pull
- Globalisation of markets
- Individualisation of customers
- Shorter life cycle of products
- Information overload
- Mergers
Technology Push
- Faster hardware
- Cheaper disk capacity
- Modern OLAP-tools
- Any access c/s, web, mobile
21OnLine Analytical Processing
- Gebaseerd op de syntax van management-informatie
vragen - ltmeetwaardegt per ltdim1gt per ltdim2gt per ...
- KPIs, CSIs en OIs zijn meetwaarden
- Produkt, Regio, Klant, Tijd, etc. zijn dimensies
(slice dice) - Dimensies kennen hierachiën (drill down)
22OLAP
23 Introduction to Cubes
24Demo
- eFashion Case
- BusinessObjects Demo
25BusinessObjects Semantic Layer
26Any Access
27Info- Analysis-need at 3rd parties
28e/m-Business Intelligence Extranets
29Extranet demos
30Business IntelligenceTheory
31BI Definition
- Business Intelligence is the process of
collection, cleansing, combining, consolidation,
analysis, interpretation and communication of all
internal and available external data, relevant
for the decision making process in the
organisation
32BI Concept
Decisions
Action
Feedback
Knowledge
Analysis
Business Value
Information
Integration
Data
Collection
33BI Systems
Reporting Query
DSS, MIS and EIS
OLAP
Data Mining
34The Five Functional Levels
standard reports
bunch of reports, cube
unique report or question
i.e. finding variables
i.e stat. analysis, testing a hypothesis
35The Five Functional Levels
Number of users
80 of all users
reporting
Static/ Dynamic
analysis
Complexity of the question
querying
exploring
mining
interactief
36Corporate Information Factory
Any Source
Any Access
Any Data
LAN/WAN
Q U E R Y M A N A G E M E N T
L O A D M A N A G E M E N T
Data Marts
Applications
External data
Data Warehouse
WWW
Operational Data Store
37Components of the CIF
- Data Warehouse
- Data Mart
- Operational Data Store
- ETL
38Data Warehouse
39Definition Bill Inmon
- Characteristics of a data warehouse
- Subject-oriented
- Integrated
- Time-variant
- Non-volatile
- Both summary and detailed data
40Data Warehouse
- Contains data that can be used to meet the
information of (part of) the organisation - Contains integrated data extracted from one or
more sources - Mostly contains large amounts of data
- Contains data that is clean and consistent
- May contain aggregated data
- Optimised for its use
41Data Warehouse
Data Base
Data Warehouse
Actual
Historical
Internal and External
Internal
Integrated
Isolated
Transactions
Analysis
Normalised
Dimensional
Dirty
Clean and Consistent
Detailed and Summary
Detailed
42Data Warehouse
- Advantages
- One point of contact
- Time savings
- No loss of historical data
- OLTPs not hampered by BI activities
- Better consistency and quality of data
- Improvement of Business Intelligence
43Data Warehouse
- Disadvantages
- Never quite up-to-date
- Requires a lot of storage space
- Requires a lot of communication, coordination and
cooperation - Large impact on the organisation
- A data warehouse is only the beginning
44Data Mart
- DW design does not optimise query performance
- Data is not stored in an optimal fashion for any
given department in the DW - Competition to get the resources required to get
inside the DW - Costs for DSS computing facilities are high
because of the large volume in DW
45Data Mart
- Characteristics
- Customised for a specific department
- Limited amount of history
- Summarised
- Very flexible
- Elegant presentation
- Processor dedicated to the department
46Data Mart
- Divided by
- Business
- Geography
- Security
- Political (budget)
- Structure (data mining)
47Data Mart
- Three different kinds of data marts
- Subset/summary
- MOLAP
- ROLAP
48Operational Data Store
- Characteristics
- Subject-oriented
- Integrated
- Current-valued
- Volatile
- Detailed data
49ETL Extraction
- Source selection
- Data model is starting point determine data
elements that are needed - For each data element, determine available data
sources - If more han 1 source available, select on
- Quality, reliability and integrity
- Scope of data
- Location and availability of data
- Location and availability of expertise
50ETL Transformation
- Processing
- Aggregate records
- Encoding structures
- Simple reformatting
- Mathematical conversion
- Resequencing of data
- Default values
- Key conversion
- Cleansing
51ETL Transformation
Key transformation
Key structure A
Key structure B
Key structure A
Key structure C
Key structure A
Key structure B
New key structure
Key structure C
52ETL Cleansing
- Data quality is critical for
- Marketing communications
- Targeted marketing
- Customer matching
- Retail- and commercial householding
- Combining information
- Tracking retail sales
53ETL Cleansing
- Common excuses for not cleaning
- The data in the operational systems seem to work
just fine - Data can be joined most of the time
- Cleansing will take place after population of the
data warehouse - Data entry will be improved
- The users will never agree to change their data
54Multi DimensionalData Modeling
55MD Modeling Contents
- E/R Modeling (Ex.)
- MD Modeling (Ex.)
- Star Schema
- Slowly Changing Dimensions (Ex.)
- Surrogate Keys
- Aggregation (Ex.)
- Measures Dimensions reviewed
- Other important MDM aspects
56Exercise E/R Modeling
- How could the sales transaction database of the
eFashion retailer look like?
57Management Questions
- Give me the annual revenue of all my product
lines divided over all the sales regions over the
last 3 years - Give me the top 10 of most profitable products
this year - Give me the top 10 of most sold products of last
year - Give me the top 10 of most profitable customers
- Compare the YTD revenue with the one in the same
period last year and the target
58Why not E/R Modeling?
- End users cannot understand, remember, navigate
an E/R model (not even with a GUI) - Software cannot usefully query an E/R model
- Use of E/R modeling doesnt meet the DW purpose
intuitive and high performance querying
59Exercise Model the Efashion DM
- Sales Revenue
- Time hierarchy (Year-Quarter-Month)
- Store hierarchy (Region, State, City, Store)
- Product hierarchy (Line, Category, SKU)
60eFashion Data Mart
61DW Modeling Components
62Using a Star Schema
63Components of a Star Schema
EmployeeKey
TimeKey
ProductKey
TimeKey
EmployeeKey
Dimensional Keys
ProductKey
CustomerKey
ShipperKey
CustomerKey
ShipperKey
64Exercise Slowly Changing Dimensions
- Suppose the product categories change from time
to time. - Model the Data Mart when the manager wants to see
historical reports against - 1. The present categories
- 2. The categories at the time of the sale
- 3. Both against the present categories and the
immediate previous categories - 4. The categories at any specified time
65SCD Exercise 1
66SCD Exercise 2
Most Recent Product Key Map
Product_key SKU_nr
67SCD Exercise 3
Time Month_nr Month_desc Quarter
Year
Facts Month_nr Store_nr Pr
oduct_key Sales_revenue ...
Product Product_key SKU_nr SKU_desc Cate
gory Category_old Line
Geography Store_nr Store_name City State Reg
ion
68SCD Exercise 4
Time Month_nr Month_desc Quarter
Year
Facts Month_nr Store_nr SK
U_nr Sales_revenue ...
Product SKU_nr SKU_desc Category Line Va
lid_from Valid_until
Geography Store_nr Store_name City State Reg
ion
69Slowly Changing Dimensions
- Type 1 Overwrite the dimension record
- Type 2 Create new dimension record
- Type 3 Create an old field in the dimension
record - Type 4 Add a valid_from and valid_until field in
the dimension record
Ad. Type 2 requires surrogate keys, but in
general, one should always use these because of
performance and flexibility Ad. Type 4 Kimball
only recognizes 3 types SCDs
70Always Use Surrogate Keys
- Allows DWH to assign new key versions for SCDs
(type 2) - Higher performance with numeric keys than with
long, alphanumeric keys
71Exercise Aggregation
- Suppose the manager queries frequently on product
line level and finds the performance too low. - Question How to model the data mart when we want
to add aggregated measures on product line level?
72Exercise Aggregation
Time Month_nr Month_desc Quarter
Year
Facts Month_nr Store_nr Pr
oduct_key Sales_revenue ...
Product Product_key SKU_nr SKU_desc Cate
gory Line
Geography Store_nr Store_name City State Reg
ion
73Exercise Measures
Add the following measures to the eFashion Data
Mart
- Stock Quantity
- Product Price
- Promotion Costs (product-specific,
store-independent)
74Exercise Measures
Time Month_nr Month_desc Quarter
Year
Facts Month_nr Store_nr Pr
oduct_key Sales_revenue Stock_qty
Product Product_key SKU_nr SKU_desc Pric
e Category Line (Valid_from Valid_until)
Geography Store_nr Store_name City State Reg
ion
75Measures Dimensions reviewed
The most useful measures are
Dimensions are
- The natural entry points of the facts
- I.e., used for constraints and report breaks
- Independent of each other, not hierarchically
related
76Other Important MDM-Aspects
- Cardinality
- Grain
- Referential Integrity
- Conformed Dimensions
- Drill Across
- Traps
77How to make the CIF?
Any Source
Any Access
Any Data
LAN/WAN
Q U E R Y M A N A G E M E N T
L O A D M A N A G E M E N T
Data Marts
Applications
External data
Data Warehouse
WWW
Operational Data Store
78CGEY BI-Approach Overview
Incremental Delivery
Strategy objectives
DW blueprint
Source data
Metamodel
Extraction, Transformation Load
Development
Definition Increments
Implementation
Awareness
Data Warehouse Architecture I
Data Warehouse Architecture II
Evolutionary Strategy
79(No Transcript)
80Data Mining
81Data Mining
- Definition
- The process of digging intelligently into large
volumes of data to discover and analyse
previously unknown relationships or to validate
hypotheses.
82Data Mining Versus OLAP
OLAP/Query Are there some customers from large
accounts with a high decrease in international
calls?
Data Mining Are there any common characteristics
among these customers?
83Applications
- Risk Analysis (grant credit, investment)
- Fraud Detection (telephone charge, bank
withdrawals) - Trouble Shooting and Diagnosis
- Process Controls (wafer fabrication)
- Promotion Analysis
- Bankruptcy Prediction (mortgage lending, business
partners) - Customer Churn (telco)
- CRM (next slides)
84Maximizing Customer Value
- Getting more prospects in
- Turning prospects into customers
- Selling more products to existing customers
- Getting less customers out
85Which ones in and which ones out?
Yield per individual customer
Yield per customer
Costs per customer
Highest
Lowest
Customer profitability
86Example One to One Marketing
- Treat different customers differently
- differentiate message
- differentiate product offer
- differentiate channel
- Need for usable information gt predict customer
behavior out of databases
87(No Transcript)
88(No Transcript)
89(No Transcript)
90(No Transcript)
91(No Transcript)
92(No Transcript)
93Example clickstream analysis
- What parts of our Web site get the most visitors?
- What parts of the Web site do we associate most
frequently with actual sales? - What parts of the Web site are superfluous or
visited infrequently? - Which pages on our Web site seem to be "session
killers," where the remote user stops the session
and leaves? - What is the new-visitor click profile on our
site? - What is the click profile of an existing
customer? A profitable
customer? A complaining customer
that all too frequently returns our
product? - What is the click profile of a customer about to
cancel our service, complain, or sue us? - How can we induce the customer to register with
our site so we learn
some useful information about that
customer? - How many visits do unregistered customers
typically make with us before they are willing to
register? Before they buy a product or service?
94Customized Customer Service
95Example Contact Strategy
Good
Data mining
Customer Data
Bad
96The customer choses the channel
Operational systems
Analysis
Integration
97Data Sources for Data Mining
Collecting Cleansing
DATA
- Transactions (loyalty cards)
- Behaviour of existing customers
- Logfiles cookies
- Market research
- Data suppliers
- Public data
98Example Affinity Grouping
- Market Basket what items are sold together?
- Market Basket what categories are sold with what
items? - Market Basket what is not sold with certain
items? - Event Correlations what other services are
brought in the first month after signing up for a
satellite TV subscription?
99Data Mining Techniques
- Decision Trees, Classification Trees, Rule
Induction - Neural Nets
- Visualisation
- Fuzzy Logic Nearest Neighbour Memory Based
Reasoning Case Based Reasoning - Proprietary Logic
- Classical Statistics
100Data Mining Techniques
- ? Statistical analysis
- ? Neural networks
- ? Genetic algorithms
- ? Decision trees
- ? Intuïtion
?
?
?
Predictive Power
?
?
Simplicity
101Critical Success Factors
- Data availability (large amounts of a wide
variety of data) - Data consistency
- Data quality
- Domain expertise
- Data used/needed is allowed by privacy laws
102Benefits
- Improved customer relationships
- More revenue from existing customers
- Market segmentation
- Differentiated products and services
- Differentiated sales channels
- More effective marketing programs
- Improved fraud detection
- Improved investments
-
103Decision Tree with BusinessMiner from
BusinessObjects
Demo
104Contact information
- Tom A. Fürstenberg
- Business Intelligence Consultant
- Cap Gemini Ernst Young
- Sector Energy, Products Transport
- Tel 31 6 21 878 915
- email Tom.Furstenberg_at_CapGemini.nl