Using AutoMed for Data Warehousing - PowerPoint PPT Presentation

1 / 29
About This Presentation
Title:

Using AutoMed for Data Warehousing

Description:

School of Computer Science & Information Systems. Birkbeck college, University of London ... A study of general data warehousing techniques ... – PowerPoint PPT presentation

Number of Views:43
Avg rating:3.0/5.0
Slides: 30
Provided by: Hao99
Category:

less

Transcript and Presenter's Notes

Title: Using AutoMed for Data Warehousing


1
Using AutoMed for Data Warehousing
  • Hao Fan
  • School of Computer Science Information Systems
  • Birkbeck college, University of London
  • 8th April 2003

2
The AutoMed
  • The Automed is a data transformation and
    integration system, could support both virtual
    and materialized integrated schemas expressed in
    multiple modelling languages.
  • HDM (Hypergraph-based Data Model)
  • IQL (Intermediate Query Language)
  • Transformation Pathways

3
Outline of presentation
  • A study of data warehousing
  • Generally using AutoMed for data warehousing
    activities
  • Data lineage tracing in AutoMed
  • Future work
  • Possible outline of PhD thesis

4
Data Warehouse Framework (1-3)
5
Data Warehouse Framework (2-3)
  • Data Sources
  • The data warehouse
  • Staging Area
  • Detailed Data
  • Summarized Data
  • Data Marts
  • Meta Data

6
Data Warehouse Framework (3-3)
  • End-User Application
  • OLAP severs
  • Query applications
  • Analytic application
  • Data mining tools
  • Query rewriting tools
  • And so on

7
Data Warehouse Modelling
  • ER modelling
  • Dimensional Data Modelling
  • Facts
  • Dimensions
  • Measures
  • Two basic dimensional models
  • Star Model
  • Snowflake Model

8
Data Warehouse Processes ELT(1-4)
  • Extraction in data warehouse
  • Full extraction
  • the data is extracted completely from the
  • data sources
  • Incremental extraction
  • only the data that has changed since a well
  • defined event back in history will be
  • extracted

9
Data Warehouse Processes ELT(2-4)
  • Data Cleaning in data warehouse
  • Schema-level problem
  • Naming conflicts
  • Structural conflicts etc.
  • Physical-level problem
  • Illegal values
  • Duplicated records
  • Word transpositions etc.

10
Data Warehouse Processes ELT(3-4)
  • Transforming in data warehouse
  • Transforming source data structure into data
  • warehouse data structure
  • Multistage data transformation
  • Separate operation transformation, need temporary
    tables,
  • more easily but require more space and time
  • Pipelined data transformation
  • Transformation-while-loading, no temporary
    tables,
  • Save space and time but more difficult to monitor

11
Data Warehouse Processes ELT(4-4)
  • Loading in data warehouse
  • Moving data from one data system to
  • another system.
  • Summarizing in data warehouse
  • Lightly summarizing
  • Highly summarizing

12
Data Warehouse Processes DWM
  • Data warehouse maintenance
  • Logic methods
  • Full recomputing
  • Incremental refreshing
  • Physical methods
  • Periodically
  • Immediate
  • Deferred
  • VM strategies PI , PR , II , IR , DI , DR

13
Data Warehouse Processes DLT
  • Data lineage tracing in DW
  • this is one of my research focuses
  • Other processes in DW
  • Query rewriting
  • Answering queries using views.
  • Data mining
  • Knowledge discovery in databases

14
Research Problems in Data Warehousing
  • Data warehouse design
  • Data warehouse implementation
  • Meta data management
  • Data warehouse update
  • Data lineage tracing
  • Data warehouse evolution and optimization

15
Using AutoMed for Data Warehousing
AutoMed Data Warehouse (AutoDW) a data warehouse
built and maintained using the AutoMed approach.
  • Two parts of an AutoDW
  • The AutoMed repository, saving data schemas,
    data transformation pathways, and managed by the
    AutoMed Repository API.
  • The data warehouse, the exact warehouse saving
    data used for DSS processing.

16
Designing AutoDW
Two phases for building an AutoDW
  • Logic phase
  • The process for creating data schema,
    transformation
  • Pathway, and other AutoMed meta data.
  • Physical phase
  • The process for creating the physical data
    warehouse.

17
Modelling AutoDW
Representing dimensional data model with HDM
18
View Maintenance and Data Lineage Tracing in
AutoDW
The aim for designing View Maintenance and Data
Lineage Tracing algorithms in AutoDW is to make
them applying in a heterogeneous data warehouse
environment.
19
Data Cleaning in AutoDW
Two scenarios for data cleaning in AutoDW
  • Modelling data cleaning
  • Creating the data schemas and transformation
    pathways to describe data cleaning process.
  • Practising data cleaning
  • Using existing data cleaning tools to practice
    data cleaning.

20
Query rewriting in AutoDW
The idea of both as view (BAV) is the means for
using AutoMed for queries rewriting It is
possible to extract a definitions of the global
schemas as view over the local schema, And It is
also possible to extract a definitions of the
local schemas as view over the global schema.
21
Other Interesting Topic of using the AutoMed
Approach
  • Data Mining
  • Database Access Control

22
Data Lineage Tracing in AutoMed (1-4)
  • Affect-Pool Origin-Pool
  • Simple IQL queries

23
Data Lineage Tracing in AutoMed (2-4)
  • DLT formulae for SIQL (1)

24
Data Lineage Tracing in AutoMed (3-4)
  • DLT formulae for SIQL (2)

25
Data Lineage Tracing in AutoMed (4-4)
  • DLT algorithms
  • affectPoolOfTuple(t,O)
  • Tracing lineage of a tuple
  • affectPoolofSet(T,O)
  • Tracing lineage of a set of tuple
  • traceAffectPool(B,O)
  • Tracing lineage of a bag of tuple

26
Contribution So Far
  • A study of general data warehousing techniques
  • A general study of using the AutoMed approach for
    data warehousing
  • A approach of using the AutoMed approach for data
    lineage tracing in a limited case

27
Further Aims
  • A approach of using the AutoMed approach for data
    lineage tracing in general case
  • A approach of using the Automed approach for data
    warehouse maintenance
  • An integrated architecture of using the Automed
    approach for data warehousing
  • A testbed for implementing my approach
  • More applications of my DW architecture
  • A comparison between my approach and others

28
Future Work the possible outline of my PhD
thesis
  • Introduction
  • Data warehousing (done)
  • Using AutoMed for data warehousing (May, 03)
  • IVM in AutoMed (June, 03)
  • General IVM in Automed (0709, 03)
  • DLT in AutoMed (done)
  • General DLT in Automed (1012, 03)
  • An integrated architecture and implementation
    (0104, 04)
  • Other application of my DW architecture (0507,
    04)
  • Conclusions and future work (0810, 04)

29
Acknowledge
  • Prof. Alexandra Poulovassilis
  • School of Computer Science Information Systems
  • Birkbeck college, University of London
Write a Comment
User Comments (0)
About PowerShow.com