1
Designing and Building Warehouses in XML
Omar Boussaid, Riadh Ben Messaoud, Rémy Choquet, Stéphane Anthoard
Laboratory ERIC, University Lyon 2
Campus Porte des Alpes, 69676 Bron Cedex
Omar.Boussaid_at_univ-lyon2.fr - rbenmessaoud_at_ericuniv-lyon2.fr remy.choquestephanea_at_gmail.com
http://eric.univ-lyon2.fr/
ADBIS 2006 - Thessaloniki, Hellas
2
Motivation
Complex data: different formats, different media
Example: case study of a patient (general information on the patient such as age and sex, scanner images, interviews in the form of sound recordings, handwritten doctors' reports)
→ It is necessary to structure the data and to homogenize them
→ XML: a semi-structured organization of data. Its capacity for self-description and its tree structure give this formalism great flexibility and sufficient power to describe complex, heterogeneous and distributed data
3
Motivation
4
Outline
  • Motivation
  • Related work
  • Our approach: X-Warehousing
  • Formalization
  • Construction of an XML cube
  • Implementation
  • Case study
  • Conclusion and future directions

5
Related work
  • Baril and Bellahsène (2000): Dawax, View Manager
  • Pokorny (2001): XML star schema
  • Golfarelli et al. (2001): dimensional fact model, attribute trees
  • Hümmer et al. (2003): XCube
  • Rajugan et al. (2003): use of UML packages
  • Trujillo et al. (2004): object-oriented approach
  • Nassis et al. (2004): OO approach, xFacts repository and virtual dimensions
  • Rusu et al. (2005): XML warehouse
  • Park et al. (2005): XML-OLAP

6
Related work
  • Two different approaches
  • 1) Physical storage of XML documents in the data warehouse (DW).
  • XML documents feed the DW.
  • XML is regarded as an effective technology for supporting data.
  • Data are loosely structured, suited to interoperability and to the exchange of information.
  • 2) Use of the XML formalism to design the DW.
  • → Following traditional multidimensional models such as the star schema or the snowflake schema.

7
The X-Warehousing approach
General context of our approach
pierre.jouve_at_eric.univ-lyon2.fr
8
Formalization
  • Definition: XML star schema
  • Let $(F, D)$ be a star schema, where
  • $F$ is a set of facts having $m$ measures $\{F.M_q,\ 1 \le q \le m\}$, and
  • $D = \{D_s,\ 1 \le s \le r\}$ is a set of $r$ dimensions, where each $D_s$ contains a set of $n_s$ attributes $\{D_s.A_i,\ 1 \le i \le n_s\}$.
  • The XML star schema of $(F, D)$ is an XML schema such that:
  • $F$ defines the root element of the XML schema;
  • $\forall q \in \{1, \dots, m\}$, $F.M_q$ defines an XML attribute included in the root element;
  • $\forall s \in \{1, \dots, r\}$, $D_s$ defines XML sub-elements of the root element: as many sub-elements as the number of times the dimension is linked to the set of facts $F$;
  • $\forall s \in \{1, \dots, r\}$ and $\forall i \in \{1, \dots, n_s\}$, $D_s.A_i$ defines an XML attribute included in the XML element $D_s$.
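The four rules of this definition can be turned into a small generator. The following is only a sketch under stated assumptions: the function name `star_schema`, its parameters, and the choice of integer measures and string dimension attributes are illustrative and do not come from the authors' Java application.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def xs(tag: str) -> str:
    """Qualify a tag name with the XML Schema namespace."""
    return f"{{{XS}}}{tag}"


def star_schema(fact, measures, dimensions, links):
    """Build an XML star schema for (F, D). Hypothetical helper.

    fact       -- name of F, which becomes the root element (rule 1)
    measures   -- names F.M_q, which become attributes of the root (rule 2)
    dimensions -- dict: dimension name D_s -> list of attribute names D_s.A_i
    links      -- dimension names in link order; a name listed k times
                  yields k sub-elements of the root (rule 3)
    """
    schema = ET.Element(xs("schema"))
    root = ET.SubElement(schema, xs("element"), name=fact)
    ctype = ET.SubElement(root, xs("complexType"))
    seq = ET.SubElement(ctype, xs("sequence"))
    for dim in links:
        ET.SubElement(seq, xs("element"), name=dim, type=f"{dim}_Type")
    for m in measures:
        # Assumption: every measure is an integer.
        ET.SubElement(ctype, xs("attribute"), name=m, type="xs:integer")
    for dim, attrs in dimensions.items():
        dtype = ET.SubElement(schema, xs("complexType"), name=f"{dim}_Type")
        for a in attrs:
            # Rule 4; assumption: every dimension attribute is a string.
            ET.SubElement(dtype, xs("attribute"), name=a, type="xs:string")
    return schema


schema = star_schema(
    "F",
    ["F.M1", "F.M2"],
    {"D1": ["D1.A1"], "D2": ["D2.A1", "D2.A2"],
     "D3": ["D3.A1", "D3.A2"], "D4": ["D4.A1"]},
    ["D1", "D2", "D3", "D3", "D4"],  # D3 is linked twice to the facts
)
print(ET.tostring(schema, encoding="unicode"))
```

Listing `D3` twice in `links` is how the sketch expresses a dimension that is linked to the facts more than once.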

9
Description of a cube by an XML schema
<xs:element name="F">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="D1" type="D1_Type" />
      <xs:element name="D2" type="D2_Type" />
      <xs:element name="D3" type="D3_Type" />
      <xs:element name="D3" type="D3_Type" />
      <xs:element name="D4" type="D4_Type" />
    </xs:sequence>
    <xs:attribute name="F.M1" type="xs:integer" />
    <xs:attribute name="F.M2" type="xs:integer" />
  </xs:complexType>
</xs:element>
<xs:complexType name="D1_Type">
  <xs:attribute name="D1.A1" type="xs:string" />
</xs:complexType>
<xs:complexType name="D2_Type">
  <xs:attribute name="D2.A1" type="xs:string" />
  <xs:attribute name="D2.A2" type="xs:string" />
</xs:complexType>
<xs:complexType name="D3_Type">
  <xs:attribute name="D3.A1" type="xs:string" />
  <xs:attribute name="D3.A2" type="xs:string" />
</xs:complexType>
<xs:complexType name="D4_Type">
  <xs:attribute name="D4.A1" type="xs:string" />
</xs:complexType>
10
Formalization
Definition: XML hierarchical dimension
Let $H = \{D_1, \dots, D_t, \dots, D_l\}$ be a hierarchical dimension. The XML hierarchical dimension of $H$ is a part of an XML schema such that:
  • $D_1$ defines an XML element;
  • $\forall t \in \{2, \dots, l\}$, $D_t$ defines an XML sub-element of the XML element $D_{t-1}$;
  • $\forall t \in \{1, \dots, l\}$, each attribute of $D_t$ defines an XML attribute included in the XML element $D_t$.
Definition: XML snowflake schema
Let $(F, H)$ be a snowflake schema, where $F$ is a set of facts having $m$ measures $\{F.M_q,\ 1 \le q \le m\}$ and $H = \{H_s,\ 1 \le s \le r\}$ is a set of $r$ independent hierarchies. The XML snowflake schema of $(F, H)$ is an XML schema such that:
  • $F$ defines the root element of the XML schema;
  • $\forall q \in \{1, \dots, m\}$, $F.M_q$ defines an XML attribute included in the root element;
  • $\forall s \in \{1, \dots, r\}$, $H_s$ defines XML hierarchical dimensions as sub-elements of the root element: as many as the number of times the hierarchy is linked to the set of facts $F$.
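The nesting rule of the hierarchical dimension (each level is a sub-element of the previous one) can be illustrated at the instance level. This is a minimal sketch; `hierarchical_dimension` is a hypothetical helper, and the date values are only an illustration.

```python
import xml.etree.ElementTree as ET


def hierarchical_dimension(levels):
    """Nest the levels D_1 ... D_l: D_t becomes a sub-element of D_{t-1}.

    levels -- list of (element_name, attributes_dict), ordered from D_1 to D_l
    """
    if not levels:
        raise ValueError("a hierarchy needs at least one level")
    first_name, first_attrs = levels[0]
    top = ET.Element(first_name, first_attrs)
    current = top
    for name, attrs in levels[1:]:
        # Each new level is attached under the level created just before it.
        current = ET.SubElement(current, name, attrs)
    return top


date = hierarchical_dimension([
    ("Date_of_study", {"Date": "1998-06-04"}),
    ("Day", {"Day_name": "June 4, 1998"}),
    ("Month", {"Month_name": "June, 1998"}),
    ("Year", {"Year_name": "1998"}),
])
print(ET.tostring(date, encoding="unicode"))
```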
11
Example of an XML fact
<?xml version="1.0" encoding="UTF-8"?>
<Suspicious_region Region_length="287" Number_of_regions="6">
  <Patient Patient_age="60">
    <Age_class Age_class_name="Between 60 and 69 years old" />
  </Patient>
  <Lesion_type Lesion_type_name="calcification type round_and_regular distribution n/a">
    <Lesion_category Lesion_category_name="calcification type round_and_regular" />
  </Lesion_type>
  <Assessment Assessment_code="2" />
  <Subtlety Subtlety_code="4" />
  <Pathology Pathology_name="benign_without_callback" />
  <Date_of_study Date="1998-06-04">
    <Day Day_name="June 4, 1998">
      <Month Month_name="June, 1998">
        <Year Year_name="1998" />
      </Month>
    </Day>
  </Date_of_study>
  <Date_of_digitization Date="1998-07-20">
    <Day Day_name="July 20, 1998">
      <Month Month_name="July, 1998">
        <Year Year_name="1998" />
      </Month>
    </Day>
  </Date_of_digitization>
  <Digitizer Digitizer_name="lumisys laser" />
  <Scanner_image Scanner_file_name="B_3162_1.RIGHT_CC.LJPEG" />
</Suspicious_region>
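To show how such a fact separates into measures and dimensions, here is a short reading sketch on a trimmed copy of the fact above. The function `read_fact` is hypothetical, and treating every root attribute as an integer measure is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example fact: two measures, three dimensions.
FACT = """<Suspicious_region Region_length="287" Number_of_regions="6">
  <Patient Patient_age="60">
    <Age_class Age_class_name="Between 60 and 69 years old" />
  </Patient>
  <Pathology Pathology_name="benign_without_callback" />
  <Date_of_study Date="1998-06-04">
    <Day Day_name="June 4, 1998">
      <Month Month_name="June, 1998">
        <Year Year_name="1998" />
      </Month>
    </Day>
  </Date_of_study>
</Suspicious_region>"""


def read_fact(xml_text):
    """Return (measures, dimensions) for one XML fact."""
    root = ET.fromstring(xml_text)
    # Measures are the attributes of the root element (assumed integers).
    measures = {name: int(value) for name, value in root.attrib.items()}
    dimensions = {}
    for dim in root:
        # iter() visits the dimension element and every level nested in it.
        dimensions[dim.tag] = {lvl.tag: dict(lvl.attrib) for lvl in dim.iter()}
    return measures, dimensions


measures, dimensions = read_fact(FACT)
print(measures)
print(dimensions["Date_of_study"])
```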
12
Construction of the XML cubes
→ MCA: analysis needs
→ XML documents
→ Algorithms to merge attribute trees, based on
1. merging by pruning
2. merging by grafting
Concept of attribute tree (Golfarelli et al. 1998, Golfarelli and Rizzi 1999, Golfarelli et al. 2001)
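The slide names pruning and grafting without defining them. The sketch below assumes their usual meaning in the attribute-tree literature: pruning drops a whole subtree, while grafting removes one vertex and attaches its children to its parent. The `Node` class and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def prune(root, name):
    """Pruning: drop every vertex called `name` together with its subtree."""
    root.children = [c for c in root.children if c.name != name]
    for child in root.children:
        prune(child, name)


def graft(root, name):
    """Grafting: remove every vertex called `name`, keeping its children,
    which are attached to the removed vertex's parent."""
    new_children = []
    for child in root.children:
        graft(child, name)
        if child.name == name:
            new_children.extend(child.children)
        else:
            new_children.append(child)
    root.children = new_children


def names(root):
    """Pre-order list of vertex names, for inspection."""
    return [root.name] + [n for c in root.children for n in names(c)]


tree = Node("Suspicious_region", [
    Node("Date_of_study", [Node("Day", [Node("Month", [Node("Year")])])]),
    Node("Patient", [Node("Age_class")]),
])
graft(tree, "Day")
after_graft = names(tree)
prune(tree, "Patient")
after_prune = names(tree)
print(after_graft)
print(after_prune)
```

Neither function can remove the root itself, which matches the idea that the fact stays the root of the tree.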
13
Construction of the XML cubes
Merging the attribute trees
MCA (analysis needs) → attribute tree
XML documents → attribute tree
→ XML schema of an XML cube
  • Merge operations on the attribute trees
  • Concept of minimal content

14
Construction of the XML cubes
  • Merging of attribute trees

15
Construction of the XML cubes
  • Minimal content of an XML document

XML documents must contain sufficient information to meet the user's analysis needs. This is checked against the attribute tree.
The user declares each element of the MCA (measures, dimensions, hierarchies and their attributes) as mandatory or optional for the objectives of the analysis.
The minimal content of an XML document thus corresponds to the mandatory part of the attribute tree associated with the MCA.
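A minimal-content check can be sketched as a test that every mandatory element or attribute is present in a document. The path notation used here (`Elem/Sub@attr`, with `@attr` alone for the root) and the function `missing_mandatory` are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET


def missing_mandatory(xml_text, mandatory_paths):
    """Return the mandatory paths that the document does not satisfy.

    A path is 'Elem/Sub' for an element or 'Elem/Sub@attr' for an attribute.
    '@attr' alone addresses an attribute of the root element.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for path in mandatory_paths:
        elem_path, _, attr = path.partition("@")
        elem = root if elem_path in ("", ".") else root.find(elem_path)
        if elem is None or (attr and attr not in elem.attrib):
            missing.append(path)
    return missing


doc = ('<Suspicious_region Region_length="287">'
       '<Patient Patient_age="60"/>'
       '</Suspicious_region>')
mandatory = ["@Region_length", "Patient@Patient_age", "Pathology@Pathology_name"]
missing = missing_mandatory(doc, mandatory)
print(missing)
```

A document with an empty result has the minimal content. Here the document lacks the Pathology dimension and would be rejected.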
16
Implementation
17
Implementation
Function WriteTreeDeep(document, tree)
  root = GetRootElement(document)
  nodeList = GetNodes(tree, root)
  While Not(end(nodeList))
    Graphe.AddVertex(nodeList.name)
    Call Function ReadTreeDeep(nodeList.name, tree)
  End While
End Function
Function ReadTreeDeep(root, tree)
  nodeList = GetNodes(tree, root)
  While Not(end(nodeList))
    Graphe.AddVertex(nodeList.name)
    Call Function ReadTreeDeep(nodeList.name, tree)
  End While
End Function
- Functions WriteTreeDeep and ReadTreeDeep (recursive) to traverse the attribute tree
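The same traversal can be written as runnable code. This is a sketch rather than the authors' Java code: the names mirror the pseudocode, the graph is a plain dictionary, and recording parent-to-child edges in addition to vertices is an addition made here. Vertices are keyed by element name, so two elements with the same name share one vertex, which is a simplification.

```python
import xml.etree.ElementTree as ET


def read_tree_deep(element, graph, parent=None):
    """Add `element` as a vertex, link it to its parent, then recurse."""
    graph.setdefault(element.tag, [])
    if parent is not None and element.tag not in graph[parent]:
        graph[parent].append(element.tag)
    for child in element:
        read_tree_deep(child, graph, element.tag)


def write_tree_deep(xml_text):
    """Entry point: parse the document and return its attribute tree
    as an adjacency dict (vertex name -> list of child names)."""
    root = ET.fromstring(xml_text)
    graph = {}
    read_tree_deep(root, graph)
    return graph


graph = write_tree_deep("<F><D1><D2/></D1><D3/></F>")
print(graph)
```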
18
Implementation
Function MergeTree(tree1, tree2)
  tree3 = DuplicateTree(tree1)
  While Not(end(nodeList(tree3)))
    vertex1 = GetVertex(tree3)
    While Not(end(nodeList(tree2)))
      vertex2 = GetVertex(tree2)
      If vertex2 = vertex1 Then vertex1.arc = 0
    End While
  End While
  tree3 = WriteTree(tree3)
End Function
- Function MergeTree to merge two attribute trees
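The slide does not say what the `arc` flag means, so the following is only one possible reading of MergeTree: start from a copy of the first tree and keep the vertices that also occur in the second, pruning the others with their subtrees. The adjacency-dict representation, the helper `drop_subtree`, and the vertex `Radiologist` are hypothetical.

```python
import copy


def drop_subtree(tree, vertex):
    """Remove `vertex`, all its descendants, and any edge pointing to it."""
    for child in tree.pop(vertex, []):
        drop_subtree(tree, child)
    for children in tree.values():
        if vertex in children:
            children.remove(vertex)


def merge_trees(tree1, tree2):
    """Keep the part of tree1 whose vertices also occur in tree2.

    Trees are adjacency dicts: vertex name -> list of child names.
    Assumption: a vertex missing from tree2 is pruned with its subtree.
    """
    tree3 = copy.deepcopy(tree1)  # plays the role of DuplicateTree(tree1)
    for vertex in list(tree3):
        if vertex not in tree2:
            drop_subtree(tree3, vertex)
    return tree3


needs = {
    "Suspicious_region": ["Patient", "Pathology", "Radiologist"],
    "Patient": ["Age_class"], "Age_class": [],
    "Pathology": [], "Radiologist": [],
}
documents = {
    "Suspicious_region": ["Patient", "Pathology", "Digitizer"],
    "Patient": ["Age_class"], "Age_class": [],
    "Pathology": [], "Digitizer": [],
}
merged = merge_trees(needs, documents)
print(merged)
```

The input trees are left untouched because the merge works on a deep copy.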
19
Case study: Context
DDSM (Digital Database for Screening Mammography): a complex database (2,604 patient files, total volume of 230.9 GB)
  • A patient file is composed of:
  • 1 .ics file describing, in ASCII format, general information about the patient file.
  • 4 .LJPEG (lossless JPEG) image files of the digitized X-rays.
  • Each X-ray shows one view of the breast: Left_CC, Left_MLO, Right_CC, Right_MLO (CC: Cranio-Caudal; MLO: Medio-Lateral Oblique).
  • Each X-ray showing one or more abnormal areas has an associated .OVERLAY file, in ASCII format, describing an abnormality of the breast.
  • 1 .16_PGM image file gathering the 4 X-rays and giving a quick overview for viewing a patient file.

20
Case study: Context
21
Case study: XML corpus
XML documents (http://eric.univ-lyon2.fr/rbenmessaoud/ ?pagedonneessection3)
22
Case study: Conceptual model of the analysis needs
The case of suspicious regions
23
Case study: Attribute trees
- Attribute tree associated with the MCA of suspicious regions.
- Attribute tree of the input XML documents.
24
Case study: Logical model of the XML cube
XML schema of the Suspicious regions cube
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.w3schools.com">
  <xs:element name="Suspicious_region">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Patient" type="Patient_Type" />
        <xs:element name="Lesion_Type" type="Lesion_Type_Type" />
        <xs:element name="Subtlety" type="Subtlety_Type" />
        <xs:element name="Pathology" type="Pathology_Type" />
        <xs:element name="Date_of_study" type="Date_Type" />
        <xs:element name="Date_of_digitization" type="Date_Type" />
        <xs:element name="Digitizer" type="Digitizer_Type" />
        <xs:element name="Scanner_image" type="Scanner_Type" />
      </xs:sequence>
      <xs:attribute name="Region_length" type="xs:integer" />
      <xs:attribute name="Number_of_regions" type="xs:integer" />
    </xs:complexType>
  </xs:element>
  <xs:complexType name="Patient_Type">
    <xs:sequence>
      <xs:element name="Age_class">
        <xs:complexType>
          <xs:attribute name="Age_class_name" type="xs:string"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
25
Case study: Logical model of the XML cube
..
  <xs:complexType name="Lesion_Type_Type">
    <xs:sequence>
      <xs:element name="Lesion_category">
        <xs:complexType>
          <xs:attribute name="Lesion_category_name" type="xs:string"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="Lesion_type_name" type="xs:string"/>
  </xs:complexType>
  <xs:complexType name="Subtlety_Type">
    <xs:attribute name="Subtlety_code" type="xs:integer"/>
  </xs:complexType>
  <xs:complexType name="Pathology_Type">
    <xs:attribute name="Pathology_name" type="xs:string"/>
  </xs:complexType>
  <xs:complexType name="Digitizer_Type">
    <xs:attribute name="Digitizer_name" type="xs:string"/>
  </xs:complexType>
  <xs:complexType name="Scanner_Type">
    <xs:attribute name="Scanner_file_name" type="xs:string"/>
  </xs:complexType>
..
26
Case study: Logical model of the XML cube
XML schema of the Suspicious regions cube
  <xs:complexType name="Date_Type">
    <xs:sequence>
      <xs:element name="Day">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Month">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="Year">
                    <xs:complexType>
                      <xs:attribute name="Year_name" type="xs:integer"/>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
                <xs:attribute name="Month_name" type="xs:string"/>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
          <xs:attribute name="Day_name" type="xs:string"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="Date" type="xs:date"/>
  </xs:complexType>
</xs:schema>
27
Conclusion and future directions
Conclusion
→ A methodology based on the XML formalism to store complex data.
  • It expresses a level of abstraction suited to preparing data for analysis.
  • It feeds a multidimensional structure from XML documents.
  • A formalization of the star schema and the snowflake schema in XML
  • (using the attribute tree; Golfarelli et al., 2001a,b)
  • A Java application which produces a logical model and a physical model of a cube from heterogeneous XML documents
  • A case study on suspicious regions in mammograms showed the value of our approach.

28
Conclusion and future directions
Future directions
  • Querying the XML cube: an extension of the XQuery language is needed to make it possible to carry out the Group-by operation.
  • Non-numerical measures: resort to suitable operators.
  • For example, the OpAC operator (Ben Messaoud et al., 2004)
  • A performance study is required in the context of XML cubes
  • The problem of updating XML cubes when the data sources change
  • Physical model of the XML cube
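To make the Group-by direction concrete, the sketch below shows what such an operation would compute over a set of XML facts. It is written in Python and is not the XQuery extension the slide calls for. The function `group_by_sum` and its parameters are hypothetical, and the three facts are toy values, with `malignant` an assumed category name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_by_sum(facts, dim_path, dim_attr, measure):
    """Sum `measure` over the facts, grouped by the value of `dim_attr`
    found on the dimension level at `dim_path`."""
    totals = defaultdict(int)
    for xml_text in facts:
        root = ET.fromstring(xml_text)
        level = root.find(dim_path)
        if level is None or dim_attr not in level.attrib:
            continue  # the fact lacks the grouping level
        if measure not in root.attrib:
            continue  # the fact lacks the measure
        totals[level.get(dim_attr)] += int(root.get(measure))
    return dict(totals)


# Toy facts for illustration only.
facts = [
    '<Suspicious_region Number_of_regions="6">'
    '<Pathology Pathology_name="benign_without_callback"/></Suspicious_region>',
    '<Suspicious_region Number_of_regions="2">'
    '<Pathology Pathology_name="malignant"/></Suspicious_region>',
    '<Suspicious_region Number_of_regions="1">'
    '<Pathology Pathology_name="benign_without_callback"/></Suspicious_region>',
]
totals = group_by_sum(facts, "Pathology", "Pathology_name", "Number_of_regions")
print(totals)
```

Grouping on a coarser level of a hierarchy only changes the path, for example `Date_of_study/Day/Month` with `Month_name`.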