Data Warehouses, Data Marts, OLAP, Data Mining - PowerPoint PPT Presentation

1 / 30
About This Presentation
Title:

Data Warehouses, Data Marts, OLAP, Data Mining

Description:

Some comes from querying the OnLine Transaction Processing ... Customise to needs of small groups. Increase response time. Reduce build time. Reduce set-up cost ... – PowerPoint PPT presentation

Number of Views:167
Avg rating:3.0/5.0
Slides: 31
Provided by: itserv5
Category:

less

Transcript and Presenter's Notes

Title: Data Warehouses, Data Marts, OLAP, Data Mining


1
Data Warehouses, Data Marts, OLAP, Data Mining
2
Business Intelligence
  • Some comes from querying the OnLine Transaction
    Processing system (operational data)
  • Often answer who?/what?.
  • OnLine Analytical Processing and Data Mining use
    data from many sources
  • Often answer why?/what if?.

3
Data Warehouse Definition
  • A Data warehouse is a subject orientated,
    integrated, non-volatile, time variant data store
    in support of management decisions
  • (Bill Inmon 1993)
  • What does this mean?!

4
Data Warehouse Definition continued
  • Subject Orientated
  • relates to the major subjects of the
    organisation, not the applications e.g. Customers
    not Sales
  • Integrated
  • data from many sources in different formats can
    be collected together in a uniform manner
  • Non-volatile
  • data is always added to not replaced. Once data
    is inserted it can not be changed
  • Time-variant
  • historical data needs to be associated with a
    relative point in time so trends can be analysed

5
Why Data Warehouses came to be
  • The archiving structure was developed to improve
    performance of operational systems to gain
    business advantage
  • Decision makers in the organisation need to
    access information from all areas/departments/divi
    sions
  • Need current and historical data in order to
    identify trends

6
Why Data Warehouses Developed
  • Data Warehouses first emerged in the 1970s
    however they have only really evolved in the last
    ten years due to two main factors
  • Technological developments
  • Business Needs

7
Why Data Warehouses Developed - Technological
  • Power of H/w and S/w on desktops has increased
  • Power of servers has increased
  • Development of web based applications
  • S/w and H/w prices have dropped significantly

8
Why Data Warehouses Developed Change in
Business Needs
  • Economic Changes
  • Analysing profitability, controlling downsizing
    exercises
  • International operations
  • consolidating data centrally from a global market
    place
  • More competent end-user
  • as end-user skills have developed, data can be
    used more effectively, by more people in the
    organisation

9
Data Warehouse - Strengths
  • Improved speed of operating systems
  • Data accessible to more people
  • Brings data together in one place for analysis so
    increasing potential return on investment
  • Decision makers better equipped, which can lead
    to a competitive advantage

10
Data Warehouses - Weaknesses
  • Long build process can take up to 3 years
  • Integration is difficult
  • May narrow decision makers scope of thinking
  • Maintenance
  • Security of sensitive data

11
Data Marts - Overview
  • A subset of a data warehouse that supports the
    requirements of a particular functional
    area/department.
  • Normally have no detailed operational data just
    the summaries
  • Less data and less complex structure than data
    warehouses so easier to build and to navigate.

12
Data Marts - Benefits
  • Provide access to most-frequently used data
  • Customise to needs of small groups
  • Increase response time
  • Reduce build time
  • Reduce set-up cost

13
Data Marts - Issues
  • Size and functionality overlap with those of
    smaller data warehouses data mining and OLAP
    tools used.
  • Performance likely to reduce as size and
    complexity increase affects user response and
    data load times.
  • Having multiple data marts needs central control
    of consistency, integrity, version control,
    security.

14
Analytical Methods
  • On-line Analytical Processing (OLAP)
  • The dynamic synthesis, analysis and consolidation
    of large volumes of multi-dimensional data
  • Data Mining
  • The process of extracting valid, previously
    unknown, comprehensible and actionable
    information from large databases and using it to
    make crucial decisions
  • (Connolly Begg, 2005)

15
Requirements of OLAP Applications
  • Multi-dimensional views of data
  • Support for complex calculations
  • Time intelligence

16
Multi-dimensional Data (2-D)
As multi-dimensional
As a table
17
Multi-dimensional Data (3-D)
As a table
As multi-dimensional
18
Multi-dimensional Data organisation and storage
  • There may be any number of dimensions - the more
    dimensions, the slower the response
  • Pre-aggregate (i.e. summarise/consolidate) as
    many subtotals in each dimension as possible.
  • Compress data for faster access
  • May facilitate this by storing dense and sparse
    data separately
  • Select physical storage organisation appropriate
    to the data

19
Multi-dimensional Data - Operations
  • Consolidation aggregation/roll-ups
  • Drill-down display the detailed data (opposite
    of consolidation)
  • Slicing and Dicing look at data from different
    viewpoints (pivoting).

20
OLAP Tools Categories(Berson Smith 1997)
  • Multi-dimensional OLAP
  • Relational OLAP
  • Hybrid OLAP
  • Desktop OLAP

21
Multi-dimensional OLAP (MOLAP)
  • Data structures use arrays and data aggregated
    according to predicted use (relatively static)
  • Efficient storage methods (inc. sparse data
    management) but only for limited volumes
  • Good performance on predicted analyses
  • Limited possibilities for non-predicted analyses
  • Needs a Multi Dimensional DBMS new skills needed

22
MOLAP
RDBMS or Legacy System
Query
Load data
Client
Analysis results
MOLAP Server
23
Relational OLAP (ROLAP)
  • Uses RDBMS with meta-data so multi-dimensional
    data structures can be created according to
    requirements
  • Some use enhanced SQL, some use highly normalised
    designs e.g. star schema
  • Poorer performance on complex queries
  • Middleware to convert relational tables to
    multi-dimensional structure

24
ROLAP
Client
query
SQL
RDBMS
Results
Analysis results
ROLAP Server
25
The Data Mining process
  • The data mining process involves (Saarenvirta,
    2001) -
  • Define the problem
  • Select data
  • Prepare the data
  • Mine the data
  • Deploy the model
  • Take business action

26
Data Mining -
  • A method of analysis of data to identify
    relationships such as
  • Associations - When one event can be correlated
    to another event.
  • Sequences - One event leading to another later
    event.
  • Classification The recognition of patterns and
    a resulting new organisation of data.
  • Clustering Finding and visualising groups of
    facts not previously known.
  • Forecasting Simply discovering patterns in the
    data that can lead to predictions about the
    future.

27
Data Mining Tools
  • Cluster Analysis
  • A data reduction technique that is useful to
    group cases with similar characteristics based on
    similar characteristics such as purchase
    behaviour or financial information.
  • Factor Analysis
  • This is also a data reduction technique. It
    builds a model that can be used to identify
    underlying factors, factors that might not be
    noted otherwise.

28
Data Mining Tools (cont)
  • Data Visualisation
  • This shows the data in a graphical form that can
    be interpreted by the user and make deviations in
    the data easier to identify. Data visualisation
    enables a user to intuitively understand the data
    and therefore works well with data mining. If it
    is used without data mining the visualisation can
    receive too much data.
  • Decision Trees
  • Decision trees define a set of rules, which
    affect a target variable. The characteristics of
    the data can then be defined into certain rules.
    Data can then be classified into different groups
    as it progresses down the tree structure

29
Data Mining Tools (cont)
  • Rule Induction
  • This is a technique to classify data by finding a
    set of If / Then rules to do this. The
    technique looks for interesting patterns within
    the data and is more powerful than decision
    trees. One problem with rule induction is that a
    large number of different rules may be generated,
    they may contradict each other and also may not
    cover all the situations in the data.
  • Neural Networks
  • Neural networks are good at finding patterns or
    trends. Neural networks derive meaning from
    complicated data that would not be picked up on
    by humans or other computer systems. The network
    is a structure that has the power to learn.

30
Data Mining Problems
  • Limited Information
  • Noise and missing values
  • Uncertainty
  • Size, updates and irrelevant fields
Write a Comment
User Comments (0)
About PowerShow.com