Data Mining and Data Warehousing - PowerPoint PPT Presentation

1 / 37
About This Presentation
Title:

Data Mining and Data Warehousing

Description:

Data Mining and Data Warehousing Data Warehousing The Data Warehouse A data warehouse differs from a conventional database in a number of ways: Volume of data: A data ... – PowerPoint PPT presentation

Number of Views:145
Avg rating:3.0/5.0
Slides: 38
Provided by: ifmAcTzs
Category:

less

Transcript and Presenter's Notes

Title: Data Mining and Data Warehousing


1
Data Mining and Data Warehousing
  • Data Warehousing

2
Evolution Of Data Warehouse
  • Since the 1970s, organizations have gained
    competitive advantage through automation of
    business processes to offer more efficient and
    cost-effective services to customers
  • This resulted in accumulation of growing amounts
    of data in operational databases
  • Organizations now focus on ways to use
    operational data to support decision-making, as a
    means of gaining competitive advantage
  • However, operational systems were never designed
    to support such business activities

3
The Data Warehouse
  • A data warehouse is a relational database that is
    designed for query and analysis rather than for
    transaction processing
  • It usually contains historical data derived from
    transaction data, but it can include data from
    other sources
  • It separates analysis workload from transaction
    workload and enables an organization to
    consolidate data from several sources to business
    users

4
The Data Warehouse
  • A data warehouse differs from a conventional
    database in a number of ways
  • Volume of data A data warehouse is likely to
    hold far more data from operational database.
  • Volume could be 400 gigabytes.
  • Diverse data sources The data stores in a data
    warehouse come from different data
    sources/application systems.

5
The Data Warehouse
  • Dimensional access A warehouse is designed to
    fulfill various ways in which users may wish to
    retrieve data.

6
Data Warehouse Definitions
  • A copy of transaction data, specifically
    structured for query and analysis Ralph Kimball
  • A data warehouse is a simple, complete and
    consistent store of data obtained from a variety
    of sources and made available to end users in a
    way they can understand and use it in a business
    context IBM

7
Data Warehouse Definitions
  • A data warehouse is a subject-oriented,
    integrated, time-variant, and non-volatile
    collection of data in support of managements
    decision-making process Bill Inmon
  • Bill Inmon is regarded as father of
  • Data Warehousing

8
Data Warehouse - Subject-Oriented
  • Organized around major subjects, such as
    customer, product, sales
  • Focusing on the modeling and analysis of data for
    decision makers, not on daily operations or
    transaction processing
  • Provide a simple and concise view around
    particular subject issues by excluding data that
    are not useful in the decision support process

9
Data Warehouse - Subject-Oriented (cont)
10
Data Warehouse - Integrated
  • Constructed by integrating multiple,
    heterogeneous data sources
  • Relational databases, flat files, on-line
    transaction records
  • Data cleaning and data integration techniques are
    applied
  • Ensure consistency in naming conventions,
    encoding structures, attribute measures, etc.
    among different data sources
  • E.g., Hotel price currency, tax, breakfast
    covered, etc.
  • When data is moved to the warehouse, it is
    converted

11
Data Warehouse Integrated (cont)
12
Data Warehouse - Time Variant
  • The time horizon for data warehouses is much
    longer than that of operational systems
  • Operational database current value data
  • Data warehouse data provide information from a
    historical perspective (e.g., past 5-10 years)
  • Every key structure in the data warehouse
  • Contains an element of time, explicitly or
    implicitly
  • But the key of operational data may or may not
    contain time element
  • Need to decide how frequently data warehouse is
    updated

13
Data Warehouse - Non-Volatile
  • A physically separate store of data transformed
    from the operational environment
  • Operational update of data does not occur in the
    data warehouse environment
  • Does not require transaction processing,
    recovery, and concurrency control mechanisms
  • Requires only two operations in data accessing
  • Initial loading of data and access of data

14
Data Warehouse - Non-Volatile (cont)
15
Data Warehouse Environment Capabilities
  • A data warehouse environment typically includes
  • Extraction
  • Transformation
  • Loading (ETL) solution
  • An online analytical processing (OLAP) engine
  • Other applications that manage the process of
    gathering data and delivering it to business users

16
What Is OLAP?
  • Online Analytical Processing (OLAP) is an
    industry-accepted reporting technology that
    provides high-performance analysis and easy
    reporting on large volumes of data
  • The goal of OLAP, also known as multidimensional
    data analysis, is to provide fast and flexible
    data summarization, analysis, and reporting
    capabilities with the ability to view trends over
    time

17
What Is OLAP?
  • OLAP is term coined by Tedd Codd to describe the
    architecture that supports complex analytical
    processing such as consolidation, drilling down,
    and pivoting
  • Consolidation comprises the aggregation of data
    such as module data being aggregated into courses
    data, and courses data being aggregated into
    department data
  • Drilling Down Disaggregating data such as
    breaking down school-data into particular courses

18
What Is OLAP?
  • Pivoting sometimes referred to as slice and
    dicing comprises the ability to analyse the same
    data into different viewpoints, frequently along
    time axis.

19
What Is OLAP?
  • OLAP applications, also called decision support
    systems (DSS), have the following features
  • Enable users to look at different relationships
    in data by looking beyond traditional
    two-dimensional row and column data analysis
  • Offer high-performance access to large amounts of
    presummarized data
  • Give users the power to retrieve answers to
    multi-dimensional business questions quickly and
    easily
  • Provide slice-and-dice views of multiple
    relationships in large quantities of
    presummarized data

20
Data Warehouse Vs. Operational DBMS
  • OLTP (on-line transaction processing)
  • Major task of traditional relational DBMS
  • Day-to-day operations purchasing, inventory,
    banking, manufacturing, payroll, registration,
    accounting, etc.
  • OLAP (on-line analytical processing)
  • Major task of data warehouse system
  • Data analysis and decision making

21
OLTP Vs. Data Warehouse
OLTP Data Warehouse
Users Clerk, IT professional Knowledge worker
Function Day to day operations Decision support
DB Design Application-oriented Subject-oriented
Data Current, up-to-date detailed, flat relational Isolated Historical, summarized, multidimensional, integrated, consolidated

Records Accessed Tens Millions
Users Thousands Hundreds
DB Size 100MB-GB 100GB-TB
22
Benefits of data warehousing
  • A data warehouse is seen to deliver three major
    benefits to organisations
  • It provides a single manageable structure for
    decision-support data.
  • A data warehouse enables organisational users to
    run complex queries on data that traverses a
    number of business areas.
  • A data warehouse enables a number of business
    intelligence applications such as on-line
    analytical processing and data mining.

23
Benefits of data warehousing
  • The overall objective for a data warehouse is to
    increase the productivity and effectiveness of
    decision-making in organisations. This, in turn,
    is expected to deliver competitive advantage to
    deliver competitive advantage to organisation.

24
Data Warehouse Architecture
25
Building a Data Warehouse
  • The main stages of getting data into the data
    warehouse are
  • Data Extraction
  • Data Cleaning
  • Data Transformation
  • Data Loading
  • Once the data is loaded it needs to be put into
    a suitable format
  • ER model
  • Star Schema

26
Data Extraction
  • Process of copying the data from the
    transactional databases in preparation for
    loading it into the data warehouse
  • The data is likely to come from several
    transactional databases
  • Some of the data entering into this process may
    come from outside of the company (data enrichment)

27
Data Extraction (cont)
  • Internal
  • Manufacturing, Accounting, HR, etc.
  • Legacy
  • External
  • Competitor Data
  • Economic Data
  • Demographic Data
  • Credit Data

28
Data Cleaning
  • Transactional data can have all kinds of errors
    in it
  • Data warehouses are very sensitive to data errors
  • Data errors must be cleaned or cleansed or
    scrubbed as the data is loaded into the data
    warehouse
  • Get data into a consistent state

29
Data Transformation
  • Data extracted from transactional databases must
    go through several kinds of data transformation
    on its way to a data warehouse
  • Data from different transactional databases being
    merged to form the data warehouse tables
  • Data will often be aggregated as it is being
    extracted from the transactional databases and
    prepared for the data warehouse
  • Units of measure used for attributes in different
    transactional databases must be reconciled as
    they are being merged into common data warehouse
    tables

30
Data Transformation
  • Coding schemes used for attributes in different
    transactional databases must be reconciled as
    they are being merged into common data warehouse
    tables
  • Sometimes values from different attributes in
    transactional databases are combined into a
    single attribute in the data warehouse (e.g.,
    employee name)

31
Data Loading
  • After all of the extracting, cleaning, and
    transforming, the data is ready to be loaded into
    the data warehouse
  • Data will be loaded into a loading or working
    area in the database
  • Some of the previous steps may have been done in
    the database
  • Data may have to go through a number of stages
    dividing up the data and merging with other data
  • When the above has been done the Star Schemas are
    populated with the new, time specific data

32
Data Loading (cont)
  • A schedule for regularly updating the data
    warehouse must be put in place
  • Frequency of updates is important
  • Time taken to get to this point is important

33
Data Warehouse Queries
  • Types of queries that a data warehouse is
    expected to answer ranges from the relatively
    simple to the highly complex and is dependent on
    the type of end-user access tools used
  • End-user access tools include
  • Reporting, query, and application development
    tools
  • Executive information systems (EIS)
  • OLAP tools
  • Data mining tools

34
Steps in Building DW
  • Users specify information needs
  • Analysts and users create a logical and physical
    design
  • Sources of data is scrubbed, extracted and
    transformed
  • Data is transferred and loaded into the warehouse
    periodically
  • Users are given the access to warehouse
  • The warehouse is maintained in terms of changing
    requirements

35
Typical Data Warehouse Queries
  • Examples include
  • What was total IFM revenue in 3rd quarter of
    2006?
  • What was total revenue for property sales for
    each type of property in Tanzania in 2006?
  • What are the three most popular areas in each
    city for the renting of property in 2003 and how
    does this compare with the figures for the
    previous two years?
  • What would be effect on property sales in the
    different regions of Europe if legal costs went
    up by 3.5 and Government taxes went down by 1.5
    for properties over 250,000?
  • What is monthly revenue for property sales at
    each branch office, compared with rolling
    12-monthly prior figures?

36
Benefits Of Data Warehousing
  • Gives the data you want, in a suitable format
  • Removes inconsistency of reporting
  • Gives one consistent picture of the data. i.e. It
    provides single manageable structure for decision
    support data.
  • Potential high returns on investment

37
Benefits Of Data Warehousing
  • Enable users to run complex queries on data that
    traverses a number of business areas.
  • Competitive advantage
  • Increased productivity of corporate
    decision-makers
Write a Comment
User Comments (0)
About PowerShow.com