Title: Multisource materialized views maintenance
1Multi-source materialized views maintenance
- Josep Silva, Jorge Belenguer, Matilde Celma
- Dpto. de Sistemas Informáticos y Computación
- Universidad Politécnica de Valencia. (España)
2Multi-source materialized views maintenance
- Outline
- Problem statement
- Goal.
- Propossal.
- Conclusions.
3Problem statement
E.T.L process
ROLAP
Reports tools
1
Operational Data.
metadata
summarized data
2
OLAP tools
summarized data
Operational Dat.
detailed data
DW
Data Staging
3
ROLAP
Operational Data.
Data Mining tools
Operational databases
Data Warehouse
Analysis tools
4Problem statement
Data Warehouse maintenance.
From a conceptual point of view a DW can be
considered as a set of materialized views (Wi)
which are defined in terms of the tables (Ri)
stored in one or more operational databases.
A materialized view is a view which content is
explicitly stored.
5Problem statement
Data Warehouse maintenance.
The datawarehouse maintenance problem can be
formulated as a particular case of the
materialized views maintenance problem.
Wm
Rn
W2
R2
W1
R1
For each update on table Ri, the new content of
each relevant view Wk must be computed and
stored.
OLTP
DW
6Problem statement.
Operational Data 1.
R1
update ?Ri
Wi
maintenance
Operational Data 2.
Wi ? F (R1, R2, ?Ri ..., Rn)
R2
- ?X,Y, R. (projection)
- ? C R (selection)
DW
R ? S
(join)
Wi ? F (R1, R2,..., Rn)
Operational Data n.
Rn
multi-source materialized views
7Goal.
- We proposse a solution for the multi-source
materialized views maintenance problem. - Characteristics of the solution
- Use of auxiliary materialized views defined over
single data sources - Use of a metric which helps the designer to
chosse the optimal configuration of these
auxiliary views.
8Propossal.
Auxiliary views
Original views
Operational Data 1.
R1
V1
Wi
definition
definition
Operational Data 2.
R2
Vn
DW
Operational Data n.
Rn
Wi ? F (V1, V2,..., Vn)
multi-source materialized views
9Propossal.
Auxiliary views
Original views
Operational Data 1.
R1
?V1
V1
?Ri
Wi
Operational Data 2.
R2
?Vn
Vn
DW
Operational Data n.
Rn
Wi ? F (V1, V2,..., Vn)
multi-source materialized views
10Propossal.
Operational Systems
Table A
DataWareHouse
Table B
1
W1 ?W ? Vgt1000 (A?B) W2 ?X ? Xlt400 (B?C) W3
?X (C) W4 ?W (A?D)
Table C
2
2
Table D
3
3
11Propossal.
Operational Systems
DataWareHouse
Table A
Va ?W,V (A)
Table B
1
Vb ?W,X (B)
W1 ?W ? Vgt1000 (Va?Vb)
W2 ?X ? Xlt400 (Vb?Vc)
Table C
W3 Vc
2
Vc ?X (C)
2
W4 ?W (Va?Vd)
Table D
Vd ?V (D)
3
3
Level 1 (Original materialized views)
Level 2 (Auxiliary materialized views)
12Propossal.
Operational Systems
DataWareHouse
Table R
Table S
1
W1 ?P,Q ? C (A) ..
Table A
?
2
2
W2 ?Q,R ? C?D (A)
Table T
3
3
Level 1 (Original materialized views)
Level 2 (Auxiliary materialized views)
13Propossal.
V2 ?Q,R ? C?D (A)
V3 ?P,Q,R ? C?D (A)
V1 ?P,Q ? C (A)
Auxiliary views
V5 ?Q,R ? C?D (A)
V4 ?P ? C (A)
W1 ?P,Q ? C (A) ..
W2 ?Q,R ? C?D (A)
DW
Original materialized views
14Propossal.
V2 ?Q,R ? C?D (A)
V3 ?P,Q,R ? C?D (A)
V1 ?P,Q ? C (A)
Auxiliary views
V5 ?Q,R ? C?D (A)
V4 ?P ? C (A)
W1 ?P,Q ? C (A) ..
W2 ?Q,R ? C?D (A)
DW
Original materialized views
15Propossal.
V2 ?Q,R ? C?D (A)
V3 ?P,Q,R ? C?D (A)
V1 ?P,Q ? C (A)
Auxiliary views
V5 ?Q,R ? C?D (A)
V4 ?P ? C (A)
W1 ?P,Q ? C (A) ..
W2 ?Q,R ? C?D (A)
DW
Original materialized views
16Propossal.
What is the optimal set of views Vi defined on
table T which supply the information needed for
views Wj ?
Operational Systems
DataWareHouse
1
W1
V1
W2
Table T
V2
2
2
Vn
Wn
3
3
Level 1 (Original materialized views)
Level 2 (Auxiliary materialized views)
17Propossal.
Probability that one update on T affects view V1
C temporal maintenance cost.
K average of the temporal maintenance cost of a
materialized view due to an update on T.
MT number of updates (insertions, deletions,
modifications) on table T (before perform
maintenance).
Card(Vi) cardinality of Vi Grad(Vi) fields in
Vi
18Propossal.
K average of the temporal maintenance cost of a
materialized view due to an update on T.
C temporal maintenance cost.
? number of times that one update on a
materialized view produces another update on any
other view .
Card(Vi) cardinality of Vi Grad(Vi) fields in
Vi
19Case study.
Table T
DW materialized views use information from table
T combining condictions A B, C, D, E, F and
fields W, X, Y, Z.
20Case study.
No disjoint conditions Disjoint
conditions Partially disjoint conditions
Same projection
No disjoint conditions Disjoint
conditions Partially disjoint conditions
Distinct projection
No disjoint conditions Disjoint
conditions Partially disjoint conditions
Partially distinct projection
21Case study.
DW materialized views
Condition A
No Condition
Condition A
Condition B
Condition A
Condition D
22Case study.
DW materialized views
Condition A
No Condition
CASE 1 Solution 1 V1 ? W,X ? A (T), V2 ?
W,X (T) Maintenance cost C1 Solution 2
V1 ? W,X ? A (T), V3 ? W,X ? ?A (T)
Maintenance cost C2 Solution 3 V2 ? W,X
(T) Maintenance cost C3 C3 lt C2 y
C3 lt C1 The best solution is to use only one
auxiliary view on table T view V2
Condition A
Condition B
Condition A
Condition D
23Case study.
DW materialized views
Condition A
No Condition
Condition A
Condition B
CASE 2 Solution 1 V1 ? W,X ? A (T), V2 ?
W,X ? B (T) Maintenance cost C1 Solution 2
V3 ? W,X ? A ? B (T) Maintenance
cost C2 C2 lt C1 The best solution is to use
only one auxiliary view on table T view V3
Condition A
Condition D
24Case study.
CASE 3 Solution 1 V1 ? W,X ? A (T), V2 ?
W,X ? D (T) Maintenance cost
C1 Solution 2 V1 ? W,X ? A (T), V3 ? W,X ?
D ? (?A) (T) Maintenance cost C2 Solution
3 V2 ? W,X ? D (T), V4 ? W,X ? A ? (?D) (T)
Maintenance cost C3 Solution 4 V3 ? W,X
? D ? (?A) (T), V4 ? W,X ? A ? (?D) (T)
V5 ? W,X ? D ? A (T)
Maintenance cost C4 Solution 5 V6 ? W,X ?
A ? D (T) Maintenance
cost C5 The best solution is to use only one
auxiliary view on table T view V6
Condition A
No Condition
Condition A
Condition B
Condition A
Condition D
25Case study.
DW materialized views
Condition F
Condition B
Condition E
Condition C
Condition F
Condition B
26Case study.
DW materialized views
Condition F
Condition B
Condition E
Condition C
Condition F
Condition B
In cases 4, 5, 6 the best solution depends on the
parameter ? and on the cardinality of the views.
27Case study.
Field W
Field Y
Condition E
Condition F
Condition B
Condition F
- In case 7 the best solution is to use a single
view - In case 8 the best solution depends on the
parameter ? and on the cardinality of views - The case 9 is more complex and their study can
be found in http//www.dsic.upv.es/jsilva/rese
arch.htmtechs
28Conclusions.
- Conclusions
- In this work we have propossed a solution for the
datawarehouse maintenance problem in the case of
multiple data sources. - Characteristics of the solution
- Use of auxiliary materialized views defined over
single data sources - Use of a metric which helps the designer to
chosse the optimal configuration of these
auxiliary views.