Title: The Data Warehousing 2001
Agenda
- The evolution
- What are sites around the world doing with DW today?
- How are the biggest schools of thought coming together?
- Why is data warehousing more important than ever?
- Is architecture still a dirty word?
The progression
- 1st data warehouse in 1905 by DuPont Corp
- 1st data cube by sales, branch and date
- 1970s: Management Decision Systems developed a product called Express (later Oracle Express)
- 1983: Metaphor founded by Ralph Kimball and 2 partners as a standalone DSS
  - Lessons learned: manage information as a corporate resource
- 1980: E.F. Codd, the promise of relational databases (data every which way)
- 1993: Inmon popularises the term "data warehouse"
Original Data Warehouses
- Set up primarily to:
  - Offload DSS from the mainframe platform
  - Get around security issues
  - Let end users look at data in a safe environment
  - Provide a place to clean up data, by replicating it somewhere else
- IT was the primary beneficiary
Evolution through the 90s
- Reporting
- Summarisation
- EIS applications
- OLAP
- Data Mining
- Intelligent Agents
- Active Warehouses
Why Did It Take Off This Time?
- We finally have the ability to store vast quantities of data
- Intense competition in the business world requires the ability to adjust automatically
  - This means understanding trends and change as they happen
- Parallel processing technologies make querying vast quantities of data possible
- Availability of desktop or web-based tools to slice and dice
Critical Success Factors in DW Yesterday
- Sponsorship: gotta have it
- Requirements-driven development versus "build it and they will come"
- Data quality imperatives
- Solid database design
- End-user deployment as opposed to an information centre
- DW methodology versus the standard application development (AD) cycle
Critical Success Factors in DW Today
- So what's changed?
  - Application requirements, not data requirements, are driving the need
  - Successful DW environments are more concerned with getting better and better usage
  - Hard benefits and soft benefits to users (DW ROI)
- New pitfalls
  - Ignored deployment
  - Obsessive cleansing
  - Some DSS have finite lifespans
  - Failing to socialise new applications
  - Advanced technology groups
  - Talking to the wrong end users!
The Data Warehousing Institute (TDWI), San Diego 2000
- Conference purpose
  - Provide a vendor-independent forum for sharing information about data warehousing
  - No hype, no bias, no fluff
  - Vendor exhibitions on an equal footing
  - High-quality education for all levels
- Highlights for this year (and for me)
  - Working with click-stream data to measure and improve e-business initiatives
  - Some success stories! (and some tragedies)
  - Measuring and justifying DW projects is being done successfully using solid return on investment figures
  - Solutions to the metadata problem
Some best practices
- Web-based deployment
  - Power users still need client-server
- Internet portals
- Near real-time updates
- On-demand ad hoc queries
- No query limits
- Atomic data, used for predictive modeling
  - As opposed to predictable answers for predictable questions
- Formalised acquisition of data
  - Shared processes to reduce rework
What are other people doing?
- Web-wide analytics
  - Collecting and integrating click streams
  - Goal: improve the relevance and efficiency of site content and advertising targeting
  - Almost impossible to do if you don't build web applications with this in mind
- Some new data types to confuse us
  - PII!! (personally identifiable information)
  - Sessionization
  - Anonymous cookie profiles
  - Event pings
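Sessionization, listed above as one of the new data types, means grouping a visitor's raw click events into visits. The Python sketch below is a minimal illustration, not a standard algorithm: it assumes each event is a hypothetical (cookie_id, timestamp, url) tuple keyed by an anonymous cookie, and it starts a new session whenever the gap between two clicks from the same cookie exceeds an assumed 30-minute timeout.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity threshold: 30 minutes is a common convention, not a standard.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Group (cookie_id, timestamp, url) click events into sessions.

    Returns a dict mapping each cookie_id to a list of sessions; each session
    is a time-ordered list of (timestamp, url) pairs.
    """
    by_visitor = defaultdict(list)
    for cookie_id, ts, url in events:
        by_visitor[cookie_id].append((ts, url))

    sessions = {}
    for cookie_id, clicks in by_visitor.items():
        clicks.sort()  # order this visitor's clicks by timestamp
        visitor_sessions = [[clicks[0]]]
        for prev, curr in zip(clicks, clicks[1:]):
            if curr[0] - prev[0] > timeout:
                visitor_sessions.append([])  # gap exceeds timeout: new session
            visitor_sessions[-1].append(curr)
        sessions[cookie_id] = visitor_sessions
    return sessions


# Hypothetical anonymous-cookie click stream.
events = [
    ("c1", datetime(2001, 1, 15, 9, 0), "/home"),
    ("c1", datetime(2001, 1, 15, 9, 5), "/products"),
    ("c1", datetime(2001, 1, 15, 11, 0), "/home"),  # 115 minutes later
    ("c2", datetime(2001, 1, 15, 9, 2), "/search"),
]
result = sessionize(events)
print({cid: len(s) for cid, s in result.items()})  # {'c1': 2, 'c2': 1}
```

The timeout is the key design choice: too short splits one visit into several, too long merges separate visits, so it should be tuned to how the site is actually used.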
Layers of BI (axis: degree of understanding)
- Knowledge development: no hypothesis
- Modelling: with some hypothesis
- Multi-dimensional analysis
- Standard ad hoc queries
Customer Relationship Management (CRM)
- Is enabled by the data warehouse
- Needs information pulled together to turn data into knowledge
  - Detailed customer analysis, segmentation down to segments of one
  - Pattern recognition and profiling through data mining
- Enter the active data warehouse
What do we mean by CRM?
- Embedding customer management and relationship building into every aspect of the organisation
- Using information technologies to drive a share-of-wallet strategy in markets where personal account management is not appropriate or cost-effective
- NOT:
  - Some data warehousing
  - Some data mining
  - Campaign management systems
  - Some analytical tools with sales force automation thrown in
CRM: Why the big deal now?
- Customer needs
  - Loyalty is out the window
  - Competition is fiercer due to globalisation and increased commoditization in all industries (especially services)
  - Retention is cheaper than acquiring new customers (10X)
- The challenge
  - Enabling strategies are just that; the key to achieving the value is in the doing
  - Enormity of the task
  - Where to start
Key Message: Think Big, but Start Small
- Single campaign of high priority, tight scope
- Support it minimally at first
  - Data
  - Target by analysing information
  - Study the channel mix
  - Measure
- Absorb into the organisation
  - Develop systems and architecture, and transfer skills to other areas
- Long-term strategy
Critical Success Factors: CRM-Related
- Field of dreams: when do you get payback?
- Cultural changes that are required
- Sponsorship (not until it looks like a winner)
- Training, training and more training
- Knowledge and tools, without action and the right attitude from the people, do not translate into success
- What does this look like?
What a Data Warehouse Architecture IS
- A data warehouse architecture is a blueprint, an arrangement or a map
- A data warehouse architecture is a plan which is the technical translation of the business requirements
- Data warehouse architecture is the set of components required to deliver an organisation's information capability
- A data warehouse architecture is a set of principles that an organisation determines to follow to achieve an information capability
What does it allow you to do?
- Spend time understanding the priorities of the business
- Determine which scale your company fits into
- Choose tools (if you're lucky)
- Determine which components are required and which are luxuries
  - People, processes and technology
- Choose the right religion
What a Data Warehouse Architecture IS NOT
- It is NOT a detailed plan
- It is not something that you go out and build or buy
- It is not a set of imperatives that must be followed in all situations
- It is not a diagram or a document
  - Although these may be useful means of communicating, explaining and recording it
Why do you need an architecture anyway?
- On its own, an architecture does not deliver anybody anything!
  - But the same could be said of project plans, data models, road maps and methodologies!
- If you're going to build anything, you need a plan. If you don't plan, plan to fail
- Serves as a communication point between the architects and the customers
- Assists in breaking down work into manageable components so that you can determine overall costs
What will happen to it (unfortunately)
- People will get sick of hearing about it! (0.7 probability)
  - At some point, you will be struggling to find another word for it
- It will change (0.9 probability)
  - The situation that you didn't plan for emerges
- Someone will disobey architectural principles (0.9999 probability)
  - And this may be you (0.85 probability)
What happens without an agreed architecture?
- Failure to manage expectations: you'll have trouble selling the vision going forward
- Constant rework due to duplication of effort
- Difficulty in estimating costs
- Departments and business units will do their own thing, and you may never get the opportunity to catch up
Architecture as a religion...
- When principles become more important than delivering business value
- Zealotry / devotion to theory
- Architectural trade-offs are seen as compromising principles
- Standing on the sidelines of projects waiting for failure
  - "I told you so" (sinner, repent)!
- Elevator white-anting
Architecture as evolution
- Agreeing on a first-cut version, then selecting a piece to deliver the first slice of business value
- Modifying the first version over time as business imperatives alter
- Building or buying the components in stages when it makes sense
  - ETL tools, multi-dimensional DBs, metadata dictionary, ODS
Architectural deviation...
- When you need to deliver a business solution sooner
- When sponsorship appears to be at risk
- To prove that you can actually deliver something
  - Previous projects have failed
- Examples
  - Standalone data marts
  - Implemented prototypes
Some situations not to compromise
- Making up fun new dimensions and measures
  - Customer, Client, Debtor, Partner
  - All defined differently
  - How much more does it cost to do it right?
- Recording and Euros, average balances and opening balances as "Balances"
- Exception: analysis paralysis
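The point about not lumping different measures together as "Balances" can be made concrete. In this hypothetical Python sketch (account IDs, dates and amounts are invented, and a single currency is assumed), opening and average balances are kept as separately named measures, and the sketch shows why balances are semi-additive: summing across accounts on one day is meaningful, summing one account across days is not.

```python
from statistics import mean

# Hypothetical start-of-day balances for two accounts, all in one currency.
balances = {
    ("acct_1", "2001-01-01"): 100.0,
    ("acct_1", "2001-01-02"): 300.0,
    ("acct_1", "2001-01-03"): 200.0,
    ("acct_2", "2001-01-01"): 50.0,
    ("acct_2", "2001-01-02"): 50.0,
    ("acct_2", "2001-01-03"): 80.0,
}


def series(account):
    """Balances for one account, in date order."""
    return [v for (a, d), v in sorted(balances.items()) if a == account]


# Distinct, separately named measures rather than one column called "Balance".
opening_balance = series("acct_1")[0]            # 100.0
average_daily_balance = mean(series("acct_1"))   # 200.0

# Semi-additive: summing across accounts on a single day is meaningful...
total_on_day_1 = sum(v for (a, d), v in balances.items() if d == "2001-01-01")  # 150.0
# ...but summing one account across days does not produce a balance at all.
meaningless = sum(series("acct_1"))              # 600.0
```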
Communicating the Architecture
- A communication plan is mandatory
  - List of key stakeholders
  - Devise the preferred way of informing them of progress and requesting support
- Consistency
- Frequency of communication
  - Not too much, and not too little
- Ensure that each stage/phase can be related to its predecessor
How far do you go... and when do you stop!
- Symptoms
  - Excessive data modeling
  - Sponsors don't return your phone calls
  - New products or technology innovations are sending you back to the drawing board
- Diagnosis
  - Advanced analysis paralysis
The Cure for A.A.P.
- Remember why we're doing this?
  - To have a picture of where we are going
  - To be able to scope out the next project
  - To understand the costs in enough detail to get to the next stage
  - To be able to communicate internally and externally
- When you're there, STOP
- Get STARTED
The Great Debate
- Bill Inmon: "Father of DWing"
  - Origin: Prism, Pine Cone Systems, author
  - Corporate Information Factory
  - Dependent data marts
  - Normalised warehouse
  - billinmon.com
- Ralph Kimball: "Lifecycle Toolkit man"
  - Red Brick
  - Dimensional model
  - Data mart + data mart + data mart = DW
  - Dimensional everything
  - lifecycle-toolkit.com
Where they meet
- An architecture is required
- Conform dimensions and measures across the enterprise
- The star schema model is a useful way to present information to users
- Build a data warehouse iteratively
- Metadata is of crucial importance
- That they both have it right!
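For readers new to the star schema both camps endorse, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_date, dim_branch) and the data are hypothetical; the shape is the point: a central fact table of measures holding foreign keys to denormalised dimension tables, queried by joining and grouping on dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes users slice by.
CREATE TABLE dim_date   (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE dim_branch (branch_key INTEGER PRIMARY KEY, branch TEXT, region TEXT);

-- Fact table: one row per sale, foreign keys plus a numeric measure.
CREATE TABLE fact_sales (
    date_key   INTEGER REFERENCES dim_date(date_key),
    branch_key INTEGER REFERENCES dim_branch(branch_key),
    amount     REAL
);

INSERT INTO dim_date VALUES (1, '2001-01-15', '2001-01'), (2, '2001-02-03', '2001-02');
INSERT INTO dim_branch VALUES
    (10, 'Branch A', 'North'), (20, 'Branch B', 'North'), (30, 'Branch C', 'South');
INSERT INTO fact_sales VALUES (1, 10, 500.0), (1, 20, 250.0), (1, 30, 400.0), (2, 10, 300.0);
""")

# A typical star join: sales by month and region.
rows = conn.execute("""
SELECT d.month, b.region, SUM(f.amount) AS total_sales
FROM fact_sales f
JOIN dim_date d   ON f.date_key = d.date_key
JOIN dim_branch b ON f.branch_key = b.branch_key
GROUP BY d.month, b.region
ORDER BY d.month, b.region
""").fetchall()
print(rows)
# [('2001-01', 'North', 750.0), ('2001-01', 'South', 400.0), ('2001-02', 'North', 300.0)]
```

Every question of the form "measure by dimension attributes" has this same query shape, which is much of why the star schema presents well to users.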
Where they disagree
- Granularity required
- Which modeling technique to use and when
  - ER modeling
  - Star schema/dimensional modeling
- The role of the data mart
Architectural Components
- Ralph Kimball says:
  - Data staging area
  - Collection of DMs = DW
  - ODS (internal/external; really just an atomic DM if used for DSS)
  - Data mart: not quite the same
  - Exploration DW: N/A
    - Star schema used for everything
  - Archived data
  - Metadata
- Bill Inmon says:
  - Integration and transformation layer
  - Enterprise data warehouse
  - ODS
  - Data mart
  - Exploration DW
    - For the "outside-the-square" end user
  - Near-line storage
    - Archived, less frequently used data
  - Metadata
Granularity in the Warehouse
- Ralph Kimball says:
  - Declaring the grain of a data mart at the lowest level will make the design impervious to changes
  - The grain of the time dimension will usually be individual days (doesn't this conflict?)
  - One of the most important decisions is declaring the grain correctly
- Bill Inmon says:
  - The level of granularity in the data warehouse should be the lowest level required by any data mart
  - If that isn't very low, go lower!
  - Atomic data warehouse data should be archived so that any new views/summaries can be recreated
Who is right?
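Inmon's advice to keep atomic data can be illustrated with a hypothetical Python sketch (transaction timestamps and amounts are invented). From transaction-level facts, any coarser summary, daily or by hour of day, can be recreated on demand.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical atomic facts: one row per sales transaction.
transactions = [
    (datetime(2001, 3, 1, 9, 15), 20.0),
    (datetime(2001, 3, 1, 9, 40), 35.0),
    (datetime(2001, 3, 1, 17, 5), 10.0),
    (datetime(2001, 3, 2, 9, 30), 15.0),
]


def summarise(facts, key):
    """Roll atomic facts up to whatever grain `key` extracts from a timestamp."""
    totals = defaultdict(float)
    for ts, amount in facts:
        totals[key(ts)] += amount
    return dict(totals)


daily = summarise(transactions, key=lambda ts: ts.date())   # 2001-03-01 -> 65.0, 2001-03-02 -> 15.0
by_hour_of_day = summarise(transactions, key=lambda ts: ts.hour)  # {9: 70.0, 17: 10.0}
```

Both summaries come from the same atomic rows; a table that kept only the daily totals (65.0 and 15.0) could never reproduce the hour-of-day split, which is the risk behind "gut feeling says daily is often not enough".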
DW and the Role of E/R Modelling
- Ralph Kimball says:
  - ER models are too complicated for end users to understand
  - ER modeling/normalising is only suitable for OLTP or in the data staging area, since it eliminates redundancy
  - Results in too many tables to be easy to query
  - ER models are optimised for update activity, not high-performance querying
- Bill Inmon says:
  - The ER model is suitable for data warehouses because it is stable and supports consistency and flexibility
  - Normalised data is the ideal basis for the design of the data warehouse and the ODS
  - May not be suitable for the data mart, which deals heavily with regular query activity and time-variant analysis
Who is right?
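The trade-off between the two views can be sketched in a few lines of Python. The product hierarchy below is hypothetical. In normalised (3NF, or snowflaked) form each fact, such as a category name, is stored once, which suits update activity; flattening it into a single wide product dimension repeats the category on every product row but lets a query reach every attribute through one join from the fact table.

```python
# Hypothetical normalised (3NF / snowflaked) tables: each fact stored once.
category = {1: "Beverages"}
subcategory = {10: {"name": "Soft drinks", "category_id": 1}}
product = {
    100: {"name": "Cola 375ml", "subcategory_id": 10},
    101: {"name": "Lemonade 375ml", "subcategory_id": 10},
}


def flatten_product_dimension():
    """Denormalise the hierarchy into one wide row per product (a star-schema dimension)."""
    rows = []
    for product_id, p in product.items():
        sub = subcategory[p["subcategory_id"]]
        rows.append({
            "product_key": product_id,
            "product": p["name"],
            "subcategory": sub["name"],                 # repeated per product
            "category": category[sub["category_id"]],  # repeated per product
        })
    return rows


dim_product = flatten_product_dimension()
```

Renaming "Beverages" touches one row in the normalised form but every product row in the flattened dimension, which is the update-versus-query tension both camps describe.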
Dimensional Modelling and Star Join
- Ralph Kimball says:
  - Dimensional modelling is the only viable technique for designing databases in the data warehouse environment, because it provides a predictable framework
  - Even the lowest-level granular data should be in dimensional format
  - Every E/R model has an equivalent dimensional model representation
  - Any type of business data can be represented as a cube
- Bill Inmon says:
  - Dimensional modelling is a reasonably viable technique for designing data marts when the type of access is very predictable
  - Dimensional models are not suitable for updating at all
  - Differing business areas will likely want a different dimensional model to look at similar data
  - A series of dimensional models is not flexible enough to support an enterprise's entire data warehouse
Who is right?
Role of the Data Mart
- Ralph Kimball says:
  - Successive data marts built on a star schema model together form a data warehouse
  - The bad publicity about data marts comes from isolated stovepipe data marts that were implemented badly and did not conform dimensions and measures
  - Data marts can be atomic but should still be in dimensional format
- Bill Inmon says:
  - Data marts should be populated only by the data warehouse and external data
  - Can contain subsets, aggregated data or atomic data
  - Provide a departmental view of the world
  - May or may not reside on a different platform from the DW
  - Provide for repeatable, predictable types of information delivery
Who is right?
Where Inmon works best (my opinion)
- Large organisations with many different business units that need to share information
- Multiple MISs/DSSs in place, and feeling the pain from inconsistencies and many interfaces
- Traditional data modeling skills in-house and understood
Where Inmon fails us
- Little attention is paid to the value and rigour required for dimensional modeling
- Defined hierarchies are glossed over
- Does not stress the concept of conformed dimensions and measures
  - This is implied but not stressed
  - Assumed as part of the integration layer
Where Kimball works best
- Small organisations where a predictable measuring capability is required
- Where how you need to look at the data is a no-brainer
- Dimensions and measures are well known and not likely to change
- Where the lowest level of granularity does not create terabytes
- Traditional data modeling was not successful and is not practised anyway
Where Kimball fails us
- If you get the initial granularity wrong, you are in deep trouble
  - He warns you about this, but does not really provide a solution
  - Gut feeling says daily is often not enough
- If a new way of looking at things emerges, it could cost a rebuild
- Assumes users are too dumb to make sense of snowflaking
Best of Both Worlds?
- Why not?
- Pay strict attention to conforming dimensions and measures across the business
  - Also model hierarchies early in the piece
- Have a permanent staging area (third normal form) and name it an atomic data warehouse
- Feed dimensional data marts from this DW/staging area
- Build data marts for departments through the staging area
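Conformed dimensions are what make this "best of both worlds" approach hang together. This hypothetical Python sketch (customer keys, segments and measures are invented) shows two departmental marts built separately but keyed on the same customer dimension, so their results can be combined by a shared attribute, an operation often called drilling across.

```python
from collections import defaultdict

# Hypothetical conformed customer dimension, shared by every data mart.
dim_customer = {
    1: {"customer": "Acme", "segment": "Corporate"},
    2: {"customer": "Bob Smith", "segment": "Retail"},
    3: {"customer": "Jo Brown", "segment": "Retail"},
}

# Two departmental marts, built separately, both keyed on customer_key.
sales_mart = [(1, 1000.0), (2, 50.0), (3, 75.0)]  # (customer_key, sales_amount)
service_mart = [(1, 4), (2, 1), (2, 2)]           # (customer_key, service_calls)


def by_segment(facts):
    """Aggregate a mart's measure by the conformed 'segment' attribute."""
    totals = defaultdict(float)
    for customer_key, value in facts:
        totals[dim_customer[customer_key]["segment"]] += value
    return totals


# Drill-across: because both marts share one dimension, their results line up.
sales = by_segment(sales_mart)
calls = by_segment(service_mart)
report = {
    seg: (sales.get(seg, 0.0), calls.get(seg, 0.0))
    for seg in sorted(set(sales) | set(calls))
}
print(report)  # {'Corporate': (1000.0, 4.0), 'Retail': (125.0, 3.0)}
```

If the service mart had used its own "client" definition with different keys and segments, as in the Customer/Client/Debtor/Partner example earlier, the two result sets would not line up.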
Where does this leave us?
- DW has become an integral part of today's decision-making processes
- DW is here to stay as an enabler of CRM
- Will continue to evolve
  - Role will become more operational
- Will not work without a sound information architecture