Metadata and XML for Organizing and Accessing Multiple Statistical Data Sources


1
Metadata and XML for Organizing and Accessing
Multiple Statistical Data Sources
  • Yaxin Bi(1), Fionn Murtagh(2), and Sally
    McClean(1)
  • (1) University of Ulster, N. Ireland
  • (2) The Queen's University of Belfast, N. Ireland

2
Introduction
  • A brief background of this work
  • Major issues related to the integration of
    multiple statistical data sources
  • A proposed approach for querying multiple
    statistical data sources
  • The roles of statistical metadata and XML in
    structuring and organizing statistical data
    sources
  • Introduction to a content model DTD, called
    Statistical Metadata Description Language (SMDL)

3
Introduction (continued)
  • Conclusion
  • Implementation of a prototype

4
The issues in the integration of multiple
statistical data sources
  • Legacy system problem
  • Heterogeneity
  • Access to multiple data sources
  • Sharing a common view among multiple data sources

5
A methodology used in this work
  • Enhancing the conceptual representation of
    multiple data sources
  • Considering relations among data sources
  • Extracting metadata and building up a content model
    that serves as a global ontology for describing
    multiple data sources
  • Dividing access to multiple data sources into
    two steps:
  • Searching and navigating
  • Database query

6
Querying on multiple sources
[Diagram: Client, Query, Database1, Database2, Results]

7
Expressing a database query on multiple data
sources
  • SQL select-from-where construct as a basis
  • Clear information need
  • Unclear data sources
  • What do the databases contain?
  • Which variable names can be used for querying?
  • Where are the databases?
  • Aim: make the unclear parts clear
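
The select-from-where basis can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name labour_force and the columns region, year, and unemployed are hypothetical, and they stand for exactly the parts that are unclear until the user has discovered what the databases contain.

```python
import sqlite3

# Hypothetical table and column names. In the setting described above,
# these names are the "unclear" part that must be discovered first.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE labour_force (region TEXT, year INTEGER, unemployed INTEGER)"
)
conn.executemany(
    "INSERT INTO labour_force VALUES (?, ?, ?)",
    [("North", 1998, 120), ("South", 1998, 95), ("North", 1999, 110)],
)


def unemployed_by_region(connection, year):
    """Run a select-from-where query once the names are known."""
    cursor = connection.execute(
        "SELECT region, unemployed FROM labour_force "
        "WHERE year = ? ORDER BY region",
        (year,),
    )
    return cursor.fetchall()


rows_1998 = unemployed_by_region(conn, 1998)
```

The information need (unemployment by region for one year) is easy to state; the query can only be written after the table and variable names have been found.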

8
Conventional approach to querying multiple data
sources
  • An integrated interface (global view) for multiple
    data sources
  • limited, and
  • unable to represent context
  • Universal schema
  • requires mappings between schemas that are
    semantically equivalent but differ in form

9
Two-step querying of multiple sources
[Diagram: Client, Query, Results, Database1, Database2,
XML data (metadata); stages: Searching, Navigating, Exploring]

10
From unclear query to clear query
[Diagram: a vague query expression goes to the metadata (MD), which
holds entries of the form
<Domain / Provider / Dataset / Variable Name / Description>;
the resulting clear query expression goes to the databases (DB)]

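
One way to picture the step from a vague expression to a clear one is a keyword search over metadata entries shaped like the Domain / Provider / Dataset / Variable Name / Description pathway. The sketch below is an assumption about how such a lookup could work, not the ADDSIA implementation; the dataset and variable names (survey_a, unemp, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pathway:
    domain: str
    provider: str
    dataset: str
    variable: str
    description: str

    def as_path(self):
        """Render the levels needed to address a variable in a database."""
        return " / ".join([self.domain, self.provider, self.dataset, self.variable])


# Hypothetical metadata entries, for illustration only.
METADATA = [
    Pathway("Labour Force", "ONS", "survey_a", "unemp", "number of unemployed persons"),
    Pathway("Labour Force", "CSO", "survey_b", "ilo_status", "ILO employment status"),
    Pathway("Health", "ONS", "survey_c", "smoker", "smoking status of respondent"),
]


def search(records, vague_query):
    """Return entries in which any query term occurs in any field."""
    terms = vague_query.lower().split()
    hits = []
    for record in records:
        text = " ".join(
            [record.domain, record.provider, record.dataset,
             record.variable, record.description]
        ).lower()
        if any(term in text for term in terms):
            hits.append(record)
    return hits


matches = search(METADATA, "unemployed")
```

Each match carries the dataset and variable name, which is the information a database query needs.
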
11
Ranking and validating pathway
  • Ranking the retrieved results
  • Generating a pathway <Domain / Provider / Data
    sources / Variable Name / Description>
  • Validating the pathway for database query
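
The three steps above can be sketched as follows. The scoring rule (count of query terms found in the description) and the nested-dictionary catalogue are assumptions chosen for brevity; the slides do not say which ranking or validation method was used, and all record values are hypothetical.

```python
def rank(records, query):
    """Order records by how many query terms occur in the description."""
    terms = query.lower().split()

    def score(record):
        text = record["description"].lower()
        return sum(term in text for term in terms)

    scored = [(score(record), record) for record in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Records that match no term are dropped.
    return [record for points, record in scored if points > 0]


def pathway(record):
    """Generate the pathway Domain / Provider / Dataset / Variable."""
    return (record["domain"], record["provider"],
            record["dataset"], record["variable"])


def validate(path, catalogue):
    """A pathway is valid when every level exists in the catalogue."""
    node = catalogue
    for level in path:
        if level not in node:
            return False
        node = node[level]
    return True


# Hypothetical catalogue and retrieved records.
CATALOGUE = {"Labour Force": {"ONS": {"survey_a": {"unemp": {}, "region": {}}}}}
RECORDS = [
    {"domain": "Labour Force", "provider": "ONS", "dataset": "survey_a",
     "variable": "region", "description": "region of residence"},
    {"domain": "Labour Force", "provider": "ONS", "dataset": "survey_a",
     "variable": "unemp", "description": "number of unemployed persons by region"},
]

ranked = rank(RECORDS, "unemployed region")
best_path = pathway(ranked[0])
```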

12
An architecture for ADDSIA
13
A proposed framework for statistical metadata and
XML
  • Creating a content model DTD with the form
    <identifier>value</identifier>
  • Bringing data representation, structure, and
    relationships together
  • Describing the content of data sources and
    relations with external sources
  • Working across platforms
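
A small sketch of why the identifier/value form is convenient: any XML parser can turn such a record into name/value pairs. The example uses Python's standard xml.etree.ElementTree; the element names DATASET, NAME, and NOTE are chosen for illustration and are not confirmed as SMDL element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical record in <identifier>value</identifier> form.
RECORD = "<DATASET><NAME>survey_a</NAME><NOTE>hypothetical</NOTE></DATASET>"


def fields(xml_text):
    """Map each child element's identifier (tag) to its value (text)."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


record_fields = fields(RECORD)
```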

14
Scope of statistical metadata
  • Description of methods of statistical data
    processing and management
  • Description of statistical data sources
  • Definition of classifications of activities and
    concepts
  • Definition of statistical population and
    measurement units
  • Definition of variables
  • Definition of mapping

15
Statistical metadata: domain metadata
  • Domain metadata: what information is covered by
    the data sources
  • domain name
  • domain manager
  • domain institution
  • description
  • concept
  • data provider
  • source

16
Statistical metadata: data provider metadata
  • Data provider metadata: details about the data
    sources, with a hierarchical structure
  • provider name
  • description
  • provider manager
  • data source
  • dataset

17
Statistical metadata: data provider metadata
(continued)
  • Dataset
  • name
  • survey details
  • time period
  • time series
  • content
  • codelist
  • reference
  • variable
  • source
  • geographical area
  • note

18
Statistical Metadata Description Language (SMDL):
ADDSIA DTD
<!ELEMENT SMDL (DOMAIN, DATAPROVIDER)>
<!ELEMENT DOMAIN (DMNAME, DMDESCRIPTION, DMINSTITUTION,
                  DMMANAGER, DMPROVIDER, CONCEPT,
                  CLASSIFICATION, UNITDESCR, DMSOURCE)>
<!ELEMENT DMNAME (#PCDATA)>
<!ELEMENT DMDESCRIPTION (#PCDATA)>
<!ELEMENT DMINSTITUTION (#PCDATA)>
<!ELEMENT DMMANAGER (NAME, EMAIL, TELEPHONE?, FAX?, ADDRESS?)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT TELEPHONE (#PCDATA)>
<!ELEMENT FAX (#PCDATA)>
<!ELEMENT ADDRESS (#PCDATA)>
<!ELEMENT DMPROVIDER (INSTITUTION, COUNTRY, URL?)>
<!ELEMENT INSTITUTION (#PCDATA)>
<!ELEMENT COUNTRY (#PCDATA)>
<!ELEMENT URL (#PCDATA)>
<!ELEMENT CONCEPT (NAME, DESCRIPTION, DEFINITION?,
                   REFERENCE, STATUS)>

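
To show what a content model such as DMMANAGER (NAME, EMAIL, TELEPHONE?, FAX?, ADDRESS?) constrains, the sketch below checks the order and optionality of child elements by hand. Python's standard-library XML parser does not validate against a DTD, so the regular expression here is a hand-written stand-in for that one declaration, not a general DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the DMMANAGER content model:
# NAME and EMAIL are required, in that order; the rest are optional.
MANAGER_MODEL = re.compile(r"NAME EMAIL( TELEPHONE)?( FAX)?( ADDRESS)?")


def manager_is_valid(xml_text):
    """Check that the child element sequence matches the content model."""
    element = ET.fromstring(xml_text)
    child_tags = " ".join(child.tag for child in element)
    return MANAGER_MODEL.fullmatch(child_tags) is not None


# Illustrative instances; the values are placeholders.
GOOD = "<DMMANAGER><NAME>A.N.Other</NAME><EMAIL>x</EMAIL><FAX>1</FAX></DMMANAGER>"
BAD = "<DMMANAGER><EMAIL>x</EMAIL><NAME>A.N.Other</NAME></DMMANAGER>"
```

GOOD omits the optional TELEPHONE and ADDRESS elements and still conforms; BAD has the required elements in the wrong order and does not.
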
19
SMDL tree structure
20
An example: marking up a labour force domain
<?xml version="1.0"?>
<!DOCTYPE SMDL SYSTEM "smdl_v_1.dtd">
<!-- Define a domain -->
<SMDL>
<DOMAIN>
  <DM_NAME> Labour Force </DM_NAME>
  <DM_DESCRIPTION> surveys and data sets studying the availability
    of labour. See xxx </DM_DESCRIPTION>
  <DM_INSTITUTION> ONS </DM_INSTITUTION>
  <DM_MANAGER>
    <NAME> A.N.Other </NAME>
    <EMAIL> A.Other@ons.gov.uk </EMAIL>
    <TELEPHONE> ext 1234 </TELEPHONE>
  </DM_MANAGER>
  <DM_PROVIDER>
    <INSTITUTION> ONS </INSTITUTION>
    <COUNTRY> UK </COUNTRY>
  </DM_PROVIDER>
  <DM_PROVIDER>
    <INSTITUTION> CSO </INSTITUTION>
    <COUNTRY> Ireland </COUNTRY>
  </DM_PROVIDER>

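
A short sketch of how a client might read such a document, using Python's standard xml.etree.ElementTree. The slide shows only the opening part of the document, so the copy below keeps a few of its elements and adds closing DOMAIN and SMDL tags purely so that it parses; that closing is an assumption, not part of the original example.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the example, closed by hand so that it is well-formed.
DOCUMENT = """<SMDL>
  <DOMAIN>
    <DM_NAME> Labour Force </DM_NAME>
    <DM_INSTITUTION> ONS </DM_INSTITUTION>
    <DM_PROVIDER>
      <INSTITUTION> ONS </INSTITUTION>
      <COUNTRY> UK </COUNTRY>
    </DM_PROVIDER>
    <DM_PROVIDER>
      <INSTITUTION> CSO </INSTITUTION>
      <COUNTRY> Ireland </COUNTRY>
    </DM_PROVIDER>
  </DOMAIN>
</SMDL>"""


def providers(xml_text):
    """List (institution, country) for every DM_PROVIDER element."""
    root = ET.fromstring(xml_text)
    result = []
    for provider in root.iter("DM_PROVIDER"):
        institution = provider.findtext("INSTITUTION", default="").strip()
        country = provider.findtext("COUNTRY", default="").strip()
        result.append((institution, country))
    return result


domain_name = ET.fromstring(DOCUMENT).findtext("DOMAIN/DM_NAME").strip()
provider_list = providers(DOCUMENT)
```
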
21
Conclusion
  • Avoiding heterogeneity to some extent
  • The framework combines an XML format (field-based),
    a hierarchical structure, and link-following
    relationships
  • Providing an approach for scanning the content of
    multiple data sources through searching,
    navigating, and performing database queries
  • There is significant overhead from delimiters in
    XML data
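
The delimiter overhead can be made concrete with a rough measure: the share of characters in a document that are markup rather than text content. The function below is one possible way to estimate it, assuming a document without entity references, comments, or processing instructions (which would make the character counts diverge).

```python
import xml.etree.ElementTree as ET


def markup_overhead(xml_text):
    """Fraction of characters that are tags rather than text content."""
    root = ET.fromstring(xml_text)
    content_chars = sum(len(text) for text in root.itertext())
    return 1 - content_chars / len(xml_text)


# 21 characters in total, of which only "UK" (2 characters) is content.
SAMPLE = "<COUNTRY>UK</COUNTRY>"
overhead = markup_overhead(SAMPLE)
```

For short values such as a country code, almost all of the characters are delimiters; the share falls as field values get longer.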

22
An architecture for searching and navigating
statistical metadata
[Diagram: Browser (client side: Searching, Navigating), HTTP/RMI,
Domain server, XML data (metadata)]

23
Retrieving metadata using a search engine
24
Navigating hierarchical structure
25
Exploring dataset content for database query
26
Reading and editing XML data