XML Data Quality Modeling

Monica Scannapieco, IFIP2.6 Meeting, Catania, Sicily.

1
XML Data Quality Modeling
  • Monica Scannapieco
  • Dipartimento di Informatica e Sistemistica
  • Università di Roma La Sapienza

2
Outline
  • Motivations
  • The D2Q data model
  • Querying the D2Q data model
  • DaQuinCIS architecture Mediator
  • Conclusions

3
Data Quality: Multi-dimensional concept
  • Accuracy
  • Jhn vs. John
  • Currency
  • Residence address outdated vs. up to date
  • Consistency
  • ZIP code and city consistent
  • Completeness

4
Data Quality and CISs
  • Cooperative Information Systems (CISs)
  • data sharing to accomplish cooperative tasks
  • high data replication
  • CISs need data quality: instance-level heterogeneities need to be
    reconciled
  • Data quality needs CISs: high data replication

5
Why DQ modeling?
  • Quality can be associated with data in order to
  • Certify the correctness (accuracy,
    consistency, currency) and completeness of data
  • benefits for cooperation
  • Support instance level reconciliation
  • last-update timestamp for currency guarantee
    drives the reconciliation phase

6
Why XML modeling?
  • The CIS context forces us to address
    interoperability issues
  • Flexibility of semi-structured models
  • possibility of associating quality values with
    data at different granularity levels

7
D2Q: Data and Data Quality Model
  • Graph-based data model, enhancing the semantics
    of the XML data model to represent quality data

8
Data Schema
  • Data class d(name_d, p1, ..., pn)
  • Name name_d
  • Set of properties pi = <name_i : Type_i>, where
  • name_i is the name of the property pi
  • Type_i can be
  • (i) a basic type
  • (ii) a data class or
  • (iii) a type set-of <X>, where <X> can be
    either a basic type or a data class
  • Data Schema
  • Node- and edge-labelled Directed Acyclic Graph of
    data classes
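
The definition above can be sketched in a few lines of Python. This is only an illustrative reading of the slide, not the authors' implementation: the names `DataClass`, `SetOf` and `is_acyclic`, and the example Enterprise/Owner properties, are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SetOf:
    """The type set-of <X>, where X names a basic type or a data class."""
    item: str


@dataclass
class DataClass:
    """A data class: a name plus properties mapping name_i to Type_i."""
    name: str
    properties: dict = field(default_factory=dict)


def is_acyclic(schema):
    """Return True if references between data classes form a DAG."""
    def targets(cls):
        for t in cls.properties.values():
            ref = t.item if isinstance(t, SetOf) else t
            if ref in schema:  # basic types are not nodes of the graph
                yield ref

    state = {}  # class name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return True
        if state.get(name) == "visiting":
            return False  # back edge, so the schema has a cycle
        state[name] = "visiting"
        ok = all(visit(t) for t in targets(schema[name]))
        state[name] = "done"
        return ok

    return all(visit(n) for n in schema)


# Hypothetical example schema (property names are assumptions).
owner = DataClass("Owner", {"Name": "string", "FiscalCode": "string",
                            "Address": "string"})
enterprise = DataClass("Enterprise", {"Name": "string", "Code": "string",
                                      "Owners": SetOf("Owner")})
schema = {c.name: c for c in (owner, enterprise)}
```

Calling `is_acyclic(schema)` checks the "Directed Acyclic Graph" condition; a schema in which two data classes reference each other would be rejected.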

9
Quality Schema
  • A quality class is associated with each data class d

[Figure: example quality schema. Labels include the quality classes
EnterpriseQualityClass, OwnerQualityClass and FiscalCodeQualityClass
(nodes Enterprise_Quality, Owner_Quality, FiscalCode_Quality) and the
dimension properties Enterprise_Accuracy, Enterprise_Completeness,
Enterprise_Consistency, Enterprise_Currency, Owner_Accuracy,
FiscalCode_Accuracy and FiscalCode_Currency, each typed by the
corresponding accuracy, completeness, consistency or currency type.]
10
D2Q Schema
  • Quality associations: one-to-one (biunivocal) functions between
    all nodes of a data schema and all non-leaf
    nodes of a quality schema
[Figure: example D2Q schema. Labels include the data nodes Enterprise,
Owner, Code and Name (with string leaves), quality edges to
Enterprise_Quality, Owner_Quality, Code_Quality and Name_Quality, and
the accuracy nodes Enterprise_accuracy, Code_accuracy and Name_accuracy.]
11
D2Q Schema Instances
  • Data class instances -> data objects
  • Quality class instances -> quality objects
  • Quality association values -> quality links

[Figure: example data object Owner1, with FiscalCode: SCNMNCXXX,
Address: Via Salaria 113 Roma, Name: Monica Scannapieco]
12
From D2Q to XML
  • D2Q schemas translated into XML Schemas
  • XML Schema Types Definition
  • Data and quality classes and their properties as
    XML elements
  • OID and QOID attributes for quality associations
  • Introduction of root elements
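
As a rough illustration of the translation, the sketch below builds a small XML instance with Python's standard `xml.etree.ElementTree`. The slides do not show the exact layout, so several things here are assumptions: the `data` and `quality` container elements, the helper names, and the convention that a data element carries a `qOID` attribute pointing at the quality element with the same `qOID`. Only the `OID`/`qOID` attribute names, the `root` element and the `Address_Quality` naming pattern come from the slides.

```python
import xml.etree.ElementTree as ET


def data_element(parent, tag, text, qoid):
    """Add a data property element that references its quality object."""
    el = ET.SubElement(parent, tag, {"qOID": qoid})
    el.text = text
    return el


def quality_element(parent, tag, qoid, **dimensions):
    """Add <tag_Quality qOID=...> with one child per quality dimension."""
    q = ET.SubElement(parent, f"{tag}_Quality", {"qOID": qoid})
    for dim, value in dimensions.items():
        ET.SubElement(q, f"{tag}_{dim}").text = value
    return q


root = ET.Element("root")                  # introduced root element
data = ET.SubElement(root, "data")         # assumed container for data
quality = ET.SubElement(root, "quality")   # assumed container for quality

owner = ET.SubElement(data, "Owner", {"OID": "OID1"})
data_element(owner, "Address", "Via Salaria 113 Roma", "qOID132")
quality_element(quality, "Address", "qOID132",
                Accuracy="high", Currency="medium")

xml_text = ET.tostring(root, encoding="unicode")
```

Keeping data and quality in separate subtrees linked by identifiers is one possible design; it lets a query return plain data without touching the quality part.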

13
Querying the D2Q Model with XQuery
  • Quality selectors: a set of user-defined XQuery
    functions
  • Each quality selector gives access to the values
    of a specific dimension, or to the overall quality, of
    a set of input nodes
  • accuracy(node) -> node

14
Example of DQ Accessing
  • Query
    for $i in input()//owner[Name eq "Monica Scannapieco"]
    return quality($i/Address, $i/FiscalCode)
  • Result
    <root>
      <Address_Quality qOID="qOID132">
        <Address_Accuracy>high</Address_Accuracy>
        <Address_Currency>medium</Address_Currency>
      </Address_Quality>
      <FiscalCode_Quality qOID="qOID131">
        <FiscalCode_Accuracy>high</FiscalCode_Accuracy>
      </FiscalCode_Quality>
    </root>
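
The behaviour of a quality selector can be mimicked outside XQuery. The Python sketch below is not the D2Q implementation; it assumes a document in which data elements carry a `qOID` attribute matching a quality element in a separate `quality` subtree, and the extra `doc` parameter and the document layout are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DOC = """
<root>
  <data>
    <Owner>
      <Name>Monica Scannapieco</Name>
      <Address qOID="qOID132">Via Salaria 113 Roma</Address>
      <FiscalCode qOID="qOID131">SCNMNCXXX</FiscalCode>
    </Owner>
  </data>
  <quality>
    <Address_Quality qOID="qOID132">
      <Address_Accuracy>high</Address_Accuracy>
      <Address_Currency>medium</Address_Currency>
    </Address_Quality>
    <FiscalCode_Quality qOID="qOID131">
      <FiscalCode_Accuracy>high</FiscalCode_Accuracy>
    </FiscalCode_Quality>
  </quality>
</root>
"""


def quality(doc, nodes):
    """Return a <root> element holding the quality object of each node."""
    by_qoid = {q.get("qOID"): q for q in doc.find("quality")}
    result = ET.Element("root")
    for node in nodes:
        q = by_qoid.get(node.get("qOID"))
        if q is not None:
            result.append(q)
    return result


def accuracy(doc, nodes):
    """Return only the accuracy elements of the given nodes."""
    return [child for q in quality(doc, nodes) for child in q
            if child.tag.endswith("_Accuracy")]


doc = ET.fromstring(DOC)
owners = [o for o in doc.iter("Owner")
          if o.findtext("Name") == "Monica Scannapieco"]
result = quality(doc, [owners[0].find("Address"),
                       owners[0].find("FiscalCode")])
```

Here `result` contains the `Address_Quality` and `FiscalCode_Quality` elements, mirroring the shape of the query result on the slide.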

15
DaQuinCIS Platform: a platform for exchanging and
improving data quality in CISs
  • Quality Factory
  • Quality Notification Service
  • Data Quality Broker
  • Record Matcher

16
Quality Improvement Strategy
  • The Broker selects the best-quality data
    answering a query and sends it to the requester
    (data quality-driven query answering) and to
    other providers (on-line improvement)
  • The Record Matcher periodically compares exported
    data in order to improve its quality
  • The Notification Service multicasts data quality
    changes (quality maintenance)
[Figure: the Broker operating over the organizations' cooperative data]
17
Data Quality Broker: a Quality-based Data
Integration System
  • Wrapper/Mediator architecture
  • Global and local views expressed as D2Q-compliant
    XML Schemas
  • Global-as-View (GAV) mapping

18
Query Processing Steps (1)
  • Given a query Q on a D2Q global schema
  • Q is unfolded according to a static mapping that
    retrieves all copies of the same data that are
    available in the CIS
  • The execution of local queries returns a set of
    results, on which a run-time matching is performed

19
Query Processing Steps (2)
  • The result to be returned is built as follows
  • (i) if no quality requirement is specified, a
    best quality default semantics is adopted. This
    means that the result is constructed by selecting
    the best quality values
  • (ii) if quality requirements are specified, the
    result is constructed by checking the
    satisfiability of the requirements on the whole
    result.
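
One possible reading of these two cases is sketched below in Python. The three-level quality scale, the per-field selection and the rule "return nothing if a requirement fails" are assumptions made for illustration; the slides do not specify them. The second copy's values are invented test data.

```python
# Assumed ordinal scale for quality values.
RANK = {"low": 0, "medium": 1, "high": 2}


def best_values(copies):
    """For each field, keep the (value, quality) pair of highest quality."""
    result = {}
    for copy in copies:
        for name, (value, q) in copy.items():
            if name not in result or RANK[q] > RANK[result[name][1]]:
                result[name] = (value, q)
    return result


def build_result(copies, requirements=None):
    """Case (i): no requirements, return the best-quality values.
    Case (ii): also check the requirements on the whole result."""
    result = best_values(copies)
    for name, minimum in (requirements or {}).items():
        if name not in result or RANK[result[name][1]] < RANK[minimum]:
            return None  # requirements not satisfiable on this result
    return result


# Two copies of the same record from two hypothetical sources.
copies = [
    {"Name": ("Monica Scannapieco", "high"),
     "Address": ("Via Salaria 113 Roma", "medium")},
    {"Name": ("Monica Scanapieco", "low"),
     "Address": ("Via Salaria 113, Roma", "high")},
]
merged = build_result(copies)
```

In this sketch `merged` takes the name from the first copy and the address from the second, since each is the higher-quality value for its field.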

20
Why such a semantics?
  • Best quality copies always available
  • Quality Improvement Feature
  • Results collected at query-time have an
    associated quality
  • The best-quality result is proposed to all data
    sources that provided a lower-quality result

21
How does it work?
  • Static mapping specified through path expressions
  • A path expression locates a concept in a
    schema
  • XML schemas are D2Q compliant

22
Mediator query processing steps
23
Unfolding
  • Path Expression Extraction: a global query is
    analyzed to extract path expressions
  • Path Expression Pre-processing: to obtain
    completely specified path expressions
  • Translation: each path expression on the global
    view is translated into (a set of) path
    expressions over the structure of the local views
  • Framing: keeps track of the transformation steps
  • Queries over local sources are sent to the
    Transport Engine module
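
The extraction, translation and framing steps above can be sketched as follows. This is a simplified illustration, not the mediator's code: the mapping table, the source names `org1` and `org2`, the local path names and the regular expression used to spot path expressions are all hypothetical, and the pre-processing step is skipped by assuming that paths are already completely specified.

```python
import re

# Hypothetical static mapping: global path -> [(source, local path), ...].
MAPPING = {
    "/Enterprise/Owner/Name": [("org1", "/Company/Holder/FullName"),
                               ("org2", "/Firm/Owner/Name")],
    "/Enterprise/Owner/Address": [("org1", "/Company/Holder/Addr")],
}

# Simplistic pattern for absolute, fully specified path expressions.
PATH = re.compile(r"(?:/\w+)+")


def extract_paths(query):
    """Path expression extraction: collect the distinct paths in a query."""
    return sorted(set(PATH.findall(query)))


def unfold(query):
    """Translate global paths into per-source local paths.

    Returns the local paths grouped by source, plus a frame recording
    each (global path, source, local path) step for later refolding.
    """
    local_queries, frame = {}, []
    for gpath in extract_paths(query):
        for source, lpath in MAPPING.get(gpath, []):
            local_queries.setdefault(source, []).append(lpath)
            frame.append((gpath, source, lpath))
    return local_queries, frame


local_queries, frame = unfold(
    "for $o in /Enterprise/Owner/Name return /Enterprise/Owner/Address")
```

The frame is what allows results coming back from each source to be re-translated into the vocabulary of the global schema.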

24
Refolding
  • Re-translation: received results are
    re-translated according to the global schema
    specification
  • Materialization: results are concatenated into a
    single, temporary file
  • Global Query Execution: the global query is
    changed into a query using only local files, and
    can then be executed
  • Record Matching: records are matched and, after a
    comparison of quality values made externally by
    the Comparator Module, the query results ordered
    by quality are sent on for Quality Filtering
  • Results that best fit the user query
    requirements are sent back to the user. Moreover,
    quality feedback is sent to the Transport
    Engine, which is in charge of propagating it in
    the system.
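
The last two steps (matching, ordering by quality, filtering and feedback) can be sketched as below. The assumptions are significant: records are matched by an exact shared key rather than by a real record-matching algorithm, quality is a single numeric score in [0, 1], and the record layout and the function name `refold` are invented for the example.

```python
from collections import defaultdict


def refold(results, min_quality=0.0):
    """Match copies, rank them by quality, filter, and collect feedback.

    results: dicts with 'source', 'key', 'value' and 'quality' entries.
    Returns (answers, feedback), where feedback pairs each source that
    supplied a lower-quality copy with the best copy found.
    """
    groups = defaultdict(list)
    for r in results:
        groups[r["key"]].append(r)  # exact-key matching (an assumption)

    answers, feedback = [], []
    for copies in groups.values():
        ranked = sorted(copies, key=lambda r: r["quality"], reverse=True)
        best = ranked[0]
        if best["quality"] >= min_quality:  # quality filtering
            answers.append(best)
        for worse in ranked[1:]:
            if worse["quality"] < best["quality"]:
                feedback.append((worse["source"], best))
    return answers, feedback


# Invented sample results from two hypothetical sources.
results = [
    {"source": "org1", "key": "SCNMNCXXX",
     "value": "Via Salaria 113 Roma", "quality": 0.9},
    {"source": "org2", "key": "SCNMNCXXX",
     "value": "Via Salaria Roma", "quality": 0.4},
]
answers, feedback = refold(results, min_quality=0.5)
```

The `feedback` list is the part that would be handed to the Transport Engine so that lower-quality sources can be offered the better copy.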

25
Implementation Modules
26
On-going Experiments
  • Currently comparing
  • Periodic record matching in a traditional
    setting
  • The quality improvement strategy underlying DaQuinCIS
  • Three Italian PA databases are being used

27
Conclusions
  • An XML Data Quality model
  • How to use it within a data integration (DI) architecture for
  • Quality accessing
  • Quality improving

28
DaQuinCIS Platform
  • Data Quality in Cooperative Information Systems
    (DaQuinCIS): a platform for exchanging and
    improving data quality in CISs

[Figure: DaQuinCIS platform architecture. Labels include the
organizations Org1, Org2, ..., OrgN with their Cooperative Gateways and
the back-end systems internal to each organization, the Communication
Infrastructure, the Rating Service, the Quality Notification Service
(QNS), the Quality Factory (QF) and the Data Quality Broker (DQB).]