Title: Using AutoMed for Data Warehousing
1Using AutoMed for Data Warehousing
- Hao Fan
- School of Computer Science Information Systems
- Birkbeck college, University of London
- 8th April 2003
2The AutoMed
- The Automed is a data transformation and
integration system, could support both virtual
and materialized integrated schemas expressed in
multiple modelling languages. - HDM (Hypergraph-based Data Model)
- IQL (Intermediate Query Language)
- Transformation Pathways
3Outline of presentation
- A study of data warehousing
- Generally using AutoMed for data warehousing
activities - Data lineage tracing in AutoMed
- Future work
- Possible outline of PhD thesis
4Data Warehouse Framework (1-3)
5Data Warehouse Framework (2-3)
- Data Sources
- The data warehouse
- Staging Area
- Detailed Data
- Summarized Data
- Data Marts
- Meta Data
6Data Warehouse Framework (3-3)
- End-User Application
- OLAP severs
- Query applications
- Analytic application
- Data mining tools
- Query rewriting tools
- And so on
7Data Warehouse Modelling
- ER modelling
- Dimensional Data Modelling
- Facts
- Dimensions
- Measures
- Two basic dimensional models
- Star Model
- Snowflake Model
-
8Data Warehouse Processes ELT(1-4)
- Extraction in data warehouse
- Full extraction
- the data is extracted completely from the
- data sources
- Incremental extraction
- only the data that has changed since a well
- defined event back in history will be
- extracted
9Data Warehouse Processes ELT(2-4)
- Data Cleaning in data warehouse
- Schema-level problem
- Naming conflicts
- Structural conflicts etc.
- Physical-level problem
- Illegal values
- Duplicated records
- Word transpositions etc.
10Data Warehouse Processes ELT(3-4)
- Transforming in data warehouse
- Transforming source data structure into data
- warehouse data structure
- Multistage data transformation
- Separate operation transformation, need temporary
tables, - more easily but require more space and time
- Pipelined data transformation
- Transformation-while-loading, no temporary
tables, - Save space and time but more difficult to monitor
11Data Warehouse Processes ELT(4-4)
- Loading in data warehouse
- Moving data from one data system to
- another system.
- Summarizing in data warehouse
- Lightly summarizing
- Highly summarizing
12Data Warehouse Processes DWM
- Data warehouse maintenance
- Logic methods
- Full recomputing
- Incremental refreshing
- Physical methods
- Periodically
- Immediate
- Deferred
- VM strategies PI , PR , II , IR , DI , DR
13Data Warehouse Processes DLT
- Data lineage tracing in DW
- this is one of my research focuses
- Other processes in DW
- Query rewriting
- Answering queries using views.
- Data mining
- Knowledge discovery in databases
14Research Problems in Data Warehousing
- Data warehouse design
- Data warehouse implementation
- Meta data management
- Data warehouse update
- Data lineage tracing
- Data warehouse evolution and optimization
15Using AutoMed for Data Warehousing
AutoMed Data Warehouse (AutoDW) a data warehouse
built and maintained using the AutoMed approach.
- Two parts of an AutoDW
- The AutoMed repository, saving data schemas,
data transformation pathways, and managed by the
AutoMed Repository API. - The data warehouse, the exact warehouse saving
data used for DSS processing.
16Designing AutoDW
Two phases for building an AutoDW
- Logic phase
- The process for creating data schema,
transformation - Pathway, and other AutoMed meta data.
- Physical phase
- The process for creating the physical data
warehouse.
17Modelling AutoDW
Representing dimensional data model with HDM
18View Maintenance and Data Lineage Tracing in
AutoDW
The aim for designing View Maintenance and Data
Lineage Tracing algorithms in AutoDW is to make
them applying in a heterogeneous data warehouse
environment.
19Data Cleaning in AutoDW
Two scenarios for data cleaning in AutoDW
- Modelling data cleaning
- Creating the data schemas and transformation
pathways to describe data cleaning process. - Practising data cleaning
- Using existing data cleaning tools to practice
data cleaning.
20Query rewriting in AutoDW
The idea of both as view (BAV) is the means for
using AutoMed for queries rewriting It is
possible to extract a definitions of the global
schemas as view over the local schema, And It is
also possible to extract a definitions of the
local schemas as view over the global schema.
21Other Interesting Topic of using the AutoMed
Approach
- Data Mining
- Database Access Control
22Data Lineage Tracing in AutoMed (1-4)
- Affect-Pool Origin-Pool
- Simple IQL queries
23Data Lineage Tracing in AutoMed (2-4)
- DLT formulae for SIQL (1)
24Data Lineage Tracing in AutoMed (3-4)
- DLT formulae for SIQL (2)
25Data Lineage Tracing in AutoMed (4-4)
- DLT algorithms
- affectPoolOfTuple(t,O)
- Tracing lineage of a tuple
- affectPoolofSet(T,O)
- Tracing lineage of a set of tuple
- traceAffectPool(B,O)
- Tracing lineage of a bag of tuple
26Contribution So Far
- A study of general data warehousing techniques
- A general study of using the AutoMed approach for
data warehousing - A approach of using the AutoMed approach for data
lineage tracing in a limited case
27Further Aims
- A approach of using the AutoMed approach for data
lineage tracing in general case - A approach of using the Automed approach for data
warehouse maintenance - An integrated architecture of using the Automed
approach for data warehousing - A testbed for implementing my approach
- More applications of my DW architecture
- A comparison between my approach and others
28Future Work the possible outline of my PhD
thesis
- Introduction
- Data warehousing (done)
- Using AutoMed for data warehousing (May, 03)
- IVM in AutoMed (June, 03)
- General IVM in Automed (0709, 03)
- DLT in AutoMed (done)
- General DLT in Automed (1012, 03)
- An integrated architecture and implementation
(0104, 04) - Other application of my DW architecture (0507,
04) - Conclusions and future work (0810, 04)
29Acknowledge
- Prof. Alexandra Poulovassilis
- School of Computer Science Information Systems
- Birkbeck college, University of London