DATA WAREHOUSING - PowerPoint PPT Presentation

About This Presentation
Title:

DATA WAREHOUSING

Description:

POBox 750122. Dallas, Texas 75275-0122. 214-768-3087 ... Outside of the facts, each diemnsion is shown separately in dimension tables ... Interesting Articles ' ... – PowerPoint PPT presentation

Number of Views:355
Avg rating:3.0/5.0
Slides: 44
Provided by: SeaS59
Learn more at: https://s2.smu.edu
Category:

less

Transcript and Presenter's Notes

Title: DATA WAREHOUSING


1
  • DATA WAREHOUSING
  • INFORMATION RETRIEVAL
  • Margaret H. Dunham
  • Department of Computer Science and Engineering
  • Southern Methodist University
  • POBox 750122
  • Dallas, Texas 75275-0122
  • 214-768-3087
  • mhd_at_engr.smu.edu
  • The contents of this presentation draw
    extensively from slides for
  • Data Mining, Introductory and Advanced Topics, by
    Margaret H. Dunham, Prentice Hall, 2003.

2
DWIR Outline
  • Introduction
  • Data Warehousing
  • Research
  • Summary

3
DWIR Outline
  • Introduction
  • Data Warehousing Overview
  • Information Retrieval
  • Data Warehousing
  • Research
  • Summary

4
Data Warehousing
  • Subject-oriented, integrated, time-variant,
    nonvolatile William Inmon
  • http//www.inmondatasystems.com/
  • Operational Data Data used in day to day needs
    of company.
  • Informational Data Supports other functions such
    as planning and forecasting.
  • Data mining tools often access data warehouses
    rather than operational data.

5
Data Warehouse Variations
  • Data Mart Subset of complete data warehouse
  • Virtual Warehouse Warehouse implemented as a
    view of operational data

6
Operational vs. Informational
7
Information Retrieval
  • Information Retrieval (IR) retrieving desired
    information from textual data.
  • Library Science
  • Digital Libraries
  • Web Search Engines
  • Traditionally keyword based
  • Sample query
  • Find all documents about data mining
  • IR being applied to other unformatted data

8
DB vs IR
  • Records (tuples) vs. documents
  • Well defined results vs. fuzzy results
  • DB grew out of files and traditional business
    systesm
  • IR grew out of library science and need to
    categorize/group/access books/articles

9
DB vs IR (contd)
  • Data retrieval
  • which docs contain a set of keywords?
  • Well defined semantics
  • a single erroneous object implies failure!
  • Information retrieval
  • information about a subject or topic
  • semantics is frequently loose
  • small errors are tolerated
  • IR system
  • interpret contents of information items
  • generate a ranking which reflects relevance
  • notion of relevance is most important

10
Information Retrieval (contd)
  • Similarity measure of how close a query is to a
    document.
  • Documents which are close enough are retrieved.
  • Metrics
  • Precision Relevant and Retrieved
  • Retrieved
  • Recall Relevant and Retrieved
  • Relevant

11
IR Query Result Measures and Classification
IR
Classification
12
DWIR Outline
  • Introduction
  • Data Warehousing
  • Dimensional Modeling
  • OLAP
  • Decision Support Systems
  • Research
  • Summary

13
Data Transformation for Data Warehouse
  • ETL Extract, Transform, Load
  • Unwanted data must be removed
  • Convert heterogeneous sources into one common
    schema
  • As the operational data is probably a snapshot of
    the data, multiple snapshots may need to be
    merged to create the historical view
  • Summarize data
  • New derived data
  • Handle missing and erroneous data

14
Data Warehouse Creation
Fig 1 1
15
Dimensional Modeling
  • View data in a hierarchical manner more as
    business executives might
  • Useful in decision support systems and mining
  • Dimension collection of logically related
    attributes axis for modeling data.
  • Facts data stored
  • Ex Dimensions products, locations, date
  • Facts quantity, unit price

16
Multidimensional Model Example
Fig 2 1
17
Cube view of Data
Fig 4 1
18
Aggregation Hierarchies
19
Multidimensional Schemas
  • Star Schema shows facts and dimensions
  • Center of the star has facts shown in fact tables
  • Outside of the facts, each diemnsion is shown
    separately in dimension tables
  • Access to fact table from dimension table via
    join
  • SELECT Quantity, Price
  • FROM Facts, Location
  • Where (Facts.LocationID Location.LocationID)
    and
  • (Location.City Dallas)
  • View as relations, problem volume of data and
    indexing

20
Star Schema
21
Flattened Star
22
Normalized Star
23
Snowflake Schema
24
OLAP
  • Online Analytic Processing (OLAP) provides more
    complex queries than OLTP.
  • OnLine Transaction Processing (OLTP) traditional
    database/transaction processing.
  • Dimensional data cube view
  • Support ad hoc querying
  • Require analysis of data
  • Can be thought of as an extension of some of the
    basic aggregation functions available in SQL
  • OLAP tools may be used in DSS systems
  • Mutlidimentional view is fundamental

25
OLAP Implementations
  • MOLAP (Multidimensional OLAP)
  • Multidimential Database (MDD)
  • Specialized DBMS and software system capable of
    supporting the multidimensional data directly
  • Data stored as an n-dimensional array (cube)
  • Indexes used to speed up processing
  • ROLAP (Relational OLAP)
  • Data stored in a relational database
  • ROLAP server (middleware) creates the
    multidimensional view for the user
  • Less Complex Less efficient
  • HOLAP (Hybrid OLAP)
  • Not updated frequently MDD
  • Updated frequently - RDB

26
OLAP Operations
Roll Up
Drill Down
Single Cell
Multiple Cells
Slice
Dice
27
OLAP Operations
  • Simple query single cell in the cube
  • Slice Look at a subcube to get more specific
    information
  • Dice Rotate cube to look at another dimension
  • Roll Up Dimension Reduction Aggregation
  • Drill Down
  • Visualization These operations allow the OLAP
    users to actually see results of an operation.

28
Relationship Between Topcs
29
Decision Support Systems
  • Tools and computer systems that assist management
    in decision making
  • What if types of questions
  • High level decisions
  • Data warehouse data which supports DSS

30
Data Warehouse Links
  • OLAP
  • http//www.olapreport.com/
  • General Data Warehousing
  • http//www.inmoncif.com/home/
  • http//www.datawarehouseconsulting.com/
  • http//www.datawarehousing.com/
  • http//www.dw-institute.com/
  • DW Products
  • http//www-306.ibm.com/software/data/informix/redb
    rick/
  • http//www.oracle.com/solutions/business_intellige
    nce/dw_home.html
  • http//www.sas.com/technologies/dw/index.html
  • http//msdn2.microsoft.com/en-us/library/aa545535.
    aspx
  • http//www.sybase.com/detail?id1027323
  • Interesting Articles
  • Teaching Effective Methodologies to Design a
    Data Warehouse, by Behrooz Seyed-Abbassi
  • http//isedj.org/isecon/2001/35c/ISECON.2001.Seyed
    -Abbassi.pdf
  • An Oracle DBAs Guide to the OLAP Option, by by
    Mark Rittman
  • http//www.dbazine.com/datawarehouse/dw-articles/r
    ittman1

31
DWIR Outline
  • Introduction
  • Data Warehousing
  • Research
  • Bibliomining
  • Hierarchical Multimedia IR
  • Ontology-based OLAP IR
  • Summary

32
Bibliomining 2,3
  • Data Warehousing Data Mining Libraries
  • Abstract, cleanse, summarize library data
  • Documents
  • Users (including demographics)
  • Circulation Records (including Web server
    records)
  • Privacy of utmost importance
  • http//www.bibliomining.com/nicholson/biblioproces
    s.htm 2
  • http//bibliomining.com/nicholson/nicholsonbiblioi
    ntro.html 3

33
Hierarchical Multimedia IR 4
  • DW Approach to Multimedia IR
  • Allows easier integration of multiple data types
  • Facilitates indexing
  • Facilitates searching
  • Allows data to be stored at many different
    granularities and dimensions
  • Data aggregation
  • data warehouses are not just large databases
    they are large, complex environments that
    integrate many technologies p729
  • Multimedia starflake schema
  • Denormalized star dimension table
  • Normalized snowflake tables

34
Starflake
Fig 2 4
35
Hierarchy of Data Cubes
Fig 4 4
36
Ontology-Based OLAP IR 5
  • Combine structured and document data obtained
    from Web
  • Global Ontology
  • Includes OLAP dimensions
  • Contains resource metadata
  • RDF based
  • IR based on
  • Both queries and resources represented as RDF
    metadata
  • http//www.w3.org/RDF/

37
Ontology OLAPIR Architecture
Fig 1 5
38
OLAP Dimensions in RDF
Fig 2 5
39
RDF Query
Fig 6 5
40
DWIR Outline
  • Introduction
  • Data Warehousing
  • Research
  • Summary

41
Summary
  • Information Retrieval is being extended to many
    different data types
  • Multimedia
  • Data warehouse
  • Data Warehousing is being extended beyond the
    basic business domain
  • Little research in combining DW and IR
  • Integrating Unstructured Text into the Structured
    Environment The Value Proposition, by Bill
    Inmon
  • http//www.inmondatasystems.com/whitepapers/integr
    atingunstructured.pdf

42
Bibliography
  • 1 Anne-Muriel Arigon, Anne Tchounikine, and
    Maryvonne Miquel, Handling Multiple Points of
    View in a Multimedia Data Warehouse, ACM
    Transactions on Multimedia Computing,
    Communications and Applications, Vol. 2, No. 3,
    August 2006, Pages 199218.
  • 2 S. Nicholson, The Bibliomining Process
    Data Warehousing and Data Mining for Library
    Decision-Making, Information Technology and
    Libraries, 22(4), 2003.
  • 3 S. Nicholson, The Basis for Biliomining
    Frameworks for Bringing Together Usage-Based Data
    Mining and Bibliometrics through Data Warehousing
    in Digital Library Services, Information
    Processing Management, 42(3), May 2006, pp
    785-804.
  • 4 Jane You, Tharam Dillon, James Liu, Edwige
    Pissaloux, On Hierarchical Multimedia
    Information Retrieval, You, J. Proceedings of
    the 2001 International Conference on Image
    Processing, 7-10 Oct 2001, pp 729 732.
  • 5 Torsten Priebe and Gunther Pernul,
    Ontology-based Integration of OLAP and
    Information Retrieval, Proceedings of the 14th
    International Workshop on Database and expert
    Systems Applications, 2003.

43
Thank You
Write a Comment
User Comments (0)
About PowerShow.com