1
Developing a Distributed Data Dictionary Service
  • Jim U'Ren
  • Jet Propulsion Laboratory
  • California Institute of Technology
  • Design Hub, KM Standards Working Group EDA Team
  • April 11, 2002

2
Problem
  • 1. Data dictionaries mean different things to
    different people:
      • Vocabularies - human-readable collections of
        terms and definitions pertaining to a domain
      • Data element dictionaries - machine-interpretable
        collections of data elements (from ISO/IEC 11179)
      • Schemas (information models) -
        machine-interpretable information models
        consisting of structured relationships
        between data elements
  • 2. Dictionaries do not communicate with each other

3
What is Needed
  • A mechanism that can be used to access, publish,
    update, relate and integrate data dictionaries
    (vocabularies, data elements, and data models)
  • Mechanism must be able to span domains and
    subdomains, e.g., engineering, science, and
    administrative
  • Mechanism must have both manual and automated
    interfaces
  • Mechanism should follow the distributed service
    model (e.g., DNS, the Internet Domain Name
    System, or the X.500 Directory)

4
A Solution
  • Develop a distributed data dictionary service
    using:
      • LDAP (Lightweight Directory Access Protocol),
        an Internet service protocol
      • ISO/IEC 11179 - a specification for standard
        data elements
      • DSML (Directory Services Markup Language), an
        XML DTD/Schema
      • Dublin Core meta-data
  • The service will store and relate vocabulary,
    data element, and data model information
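
To make the DSML item concrete, here is a minimal sketch of how one dictionary entry might be serialized as XML. The element names and namespace are modeled on DSML v1 as recalled and should be checked against the DSML specification. The function name entry_to_dsml and the sample attribute values are illustrative assumptions, not part of the prototype.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from DSML v1; verify against the specification.
DSML_NS = "http://www.dsml.org/DSML"


def entry_to_dsml(dn, object_classes, attributes):
    """Serialize one directory entry as a DSML-style XML string.

    attributes maps an attribute name to a list of string values.
    """
    ET.register_namespace("dsml", DSML_NS)
    root = ET.Element(f"{{{DSML_NS}}}dsml")
    entries = ET.SubElement(root, f"{{{DSML_NS}}}directory-entries")
    entry = ET.SubElement(entries, f"{{{DSML_NS}}}entry", {"dn": dn})

    # Object classes get their own container element.
    oc = ET.SubElement(entry, f"{{{DSML_NS}}}objectclass")
    for name in object_classes:
        ET.SubElement(oc, f"{{{DSML_NS}}}oc-value").text = name

    # Every other attribute becomes an attr element with one value per child.
    for attr_name, values in attributes.items():
        attr = ET.SubElement(entry, f"{{{DSML_NS}}}attr", {"name": attr_name})
        for value in values:
            ET.SubElement(attr, f"{{{DSML_NS}}}value").text = value

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(entry_to_dsml(
        "cn=behaviour,dc=vocabulary,dc=Part233,dc=10303,dc=ISO",
        ["top"],
        {"cn": ["behaviour"]},
    ))
```

An XML rendering like this is what would let a dictionary entry travel between services that do not speak LDAP directly.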

5
Advantages of LDAP
  • LDAP has many advantages, including
  • Universal Access - Internet directory standard,
    widely adopted and implemented by numerous
    vendors and open source software solutions
  • Simple - a relatively simple, high-level protocol
    with a straightforward API
  • Extensible - easily extended and adapted
  • Access Control and Security - connections can be
    authenticated and secured through layered
    Internet security mechanisms
  • Multi-Platform Development - C/C++, Perl, Java,
    JavaScript, Python, PHP and other APIs are
    available, making LDAP services accessible from
    virtually any language, platform, or development
    environment

6
What is LDAP?
  • An Internet standard from an IETF working
    group:
      • RFC 1777 Lightweight Directory Access Protocol
      • RFC 1778 String Representation of Standard
        Attribute Syntaxes
      • RFC 1779 String Representation of Distinguished
        Names
      • RFC 1959 LDAP URL Format
      • RFC LDAP API
  • A distributed, hierarchical database
  • Uses a multi-part naming convention to create
    unique records (distinguished names):
      • cn=behaviour, dc=vocabulary, dc=Part233,
        dc=10303, dc=ISO
      • cn=requirement_set, dc=data-element, dc=Part233,
        dc=10303, dc=ISO
      • cn=TBR-apha1, dc=schema, dc=Part233, dc=10303,
        dc=ISO
  • Includes ability to implement multiple levels of
    security

7
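Example of an LDAP tree
[Figure: directory tree rooted at ISO. Below ISO are
the standards 10303, 14496, 9000, ...; below 10303
are the parts 203, 209, 210, 233, 235, 237, ...;
below 233 are the Vocabulary, Schema, and Data
Elements branches.]
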
8
Advantages of ISO 11179
  • An established international standard
  • Widely supported - US Census Bureau, NIST,
    Defense Information Systems Agency, Environmental
    Security, DoE, DoJ, Bureau of Labor Statistics,
    DoT, EPA, etc.
  • Flexible use of elements within the schema
  • Easily implemented in an LDAP directory service -
    flexible, easily configured LDAP servers are well
    suited to the flexible 11179 schema
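
As a rough illustration of how an 11179-style data element could sit in an LDAP directory, the sketch below renders one element as an LDIF record. The dataElement object class, the dataElementDatatype and dataElementVersion attribute names, and the sample definition text are all hypothetical. They stand in for whatever schema the service actually defines and cover only a small subset of the descriptive attributes in ISO/IEC 11179.

```python
from dataclasses import dataclass


@dataclass
class DataElement:
    """A small, illustrative subset of 11179-style descriptive attributes."""
    name: str
    definition: str
    datatype: str
    version: str


def to_ldif(element, base_dn):
    """Render the element as an LDIF record placed under base_dn.

    Object class and attribute names are placeholders, not a published schema.
    """
    lines = [
        f"dn: cn={element.name},{base_dn}",
        "objectClass: top",
        "objectClass: dataElement",
        f"cn: {element.name}",
        f"description: {element.definition}",
        f"dataElementDatatype: {element.datatype}",
        f"dataElementVersion: {element.version}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    element = DataElement(
        name="requirement_set",
        definition="placeholder definition text",
        datatype="string",
        version="1",
    )
    print(to_ldif(element, "dc=data-element,dc=Part233,dc=10303,dc=ISO"))
```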

9
Data Dictionary Components for a given namespace

10
A Distributed Data Dictionary Service
  • Uses standards-based technology: LDAP protocol,
    ISO 11179 meta-data schema, DSML, Dublin Core
  • Prototype service viewable at
    http://step.jpl.nasa.gov/ldap
  • Supporting Automated Processes
  • Supporting Validation Scenarios
  • Supporting Data Modeling Activities
  • Supporting Terminology Lookups

11
A Proposed Data Element Naming Convention
  • A structured, multi-part naming system
      • similar to IP addressing and URLs
      • dot-delimited names
      • follows the convention used by the Dublin Core
        Meta-data Initiative
  • Short-name aliases could be supported in the
    planned distributed data dictionary service
      • e.g., author = DC.Creator, keyword = DC.Subject
  • Names would consist of domains, descriptors, and
    qualifiers
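
The naming convention can be sketched as a small parser. It assumes, for illustration only, that the first part of a name is the domain, the second is the descriptor, and any remaining parts are qualifiers. The slides do not spell out this split, and names with sub-domains may need a different rule. The alias table holds the two aliases mentioned above. The names parse_name and ElementName are hypothetical.

```python
from typing import NamedTuple, Tuple

# Short-name aliases taken from the examples on this slide.
ALIASES = {"author": "DC.Creator", "keyword": "DC.Subject"}


class ElementName(NamedTuple):
    domain: str
    descriptor: str
    qualifiers: Tuple[str, ...]


def parse_name(name):
    """Resolve a short-name alias if one exists, then split on dots."""
    full = ALIASES.get(name, name)
    parts = full.split(".")
    # A valid name needs at least a domain and a descriptor, with no empty parts.
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected DOMAIN.Descriptor[.Qualifier...], got {name!r}")
    return ElementName(parts[0], parts[1], tuple(parts[2:]))


if __name__ == "__main__":
    print(parse_name("DC.Date.Created"))
    print(parse_name("author"))
```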

12
Examples of the Data Element Naming Convention
within JPL Domains
  • Dublin Core Meta-data Initiative (a JPL-adopted
    standard)
      • DC.Date
      • DC.Date.Created
      • DC.Date.LastModified
  • JPL's Planetary Data System (PDS)
      • PDS.Target_Name
      • PDS.Sampling_Factor
  • JPL's Product Data Management System (PDMS)
      • PDMS.Version
      • PDMS.ReferenceDesignator
  • JPL New Business System (NBS)
      • NBS.HR.start_date
      • NBS.HR.employee_status

13
Terminology Lookup Scenarios
  • Resolving Ambiguous Terminology - an end user,
    needing to clarify the use and meaning of a word
    used in a specific context, performs a
    multi-domain vocabulary lookup across multiple DD
    services, looking for the published vocabulary of
    the referenced domain
  • Finding the Correct Acronym - an end user,
    confronted with a number of new acronyms used in
    a presentation, accesses a local DD service to
    look up the acronyms within probable domains,
    thereby eliminating the alternative meanings
    (e.g., searching for STEP standards work versus
    the JPL STEP project)
  • Enabling Improved Search Engine Performance - as
    a search engine scans through a document, it
    discovers a keyword list and finds a reserved
    word; the document includes a reference to a
    domain-specific vocabulary list in a DD service;
    the search engine uses this vocabulary to be
    certain it is indexing the keywords in the right
    context
  • Building Glossaries for Technical Papers - an
    engineer or scientist writing a technical paper
    needs to include a glossary of relevant terms in
    the paper; by performing a multi-service search,
    terms and definitions that relate to the topic of
    the paper are quickly found and inserted into the
    paper with the corresponding attributions
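
The acronym scenario can be sketched with in-memory dictionaries standing in for remote DD services. The service names, the domain labels, and the definition strings below are placeholders. A real client would query each service over LDAP rather than read a local dict.

```python
def lookup(term, services, domains=None):
    """Return (service, domain, definition) for every match of term.

    services maps a service name to {domain: {term: definition}}.
    When domains is given, only those domains are searched.
    """
    matches = []
    for service_name, vocabularies in services.items():
        for domain, vocabulary in vocabularies.items():
            if domains is not None and domain not in domains:
                continue
            if term in vocabulary:
                matches.append((service_name, domain, vocabulary[term]))
    return matches


# Two hypothetical DD services with placeholder definitions.
SERVICES = {
    "local": {"JPL": {"STEP": "the JPL STEP project"}},
    "standards": {"ISO 10303": {"STEP": "the STEP standards work"}},
}

if __name__ == "__main__":
    # Unrestricted search returns both meanings of the acronym.
    for match in lookup("STEP", SERVICES):
        print(match)
    # Restricting to a probable domain eliminates the alternative.
    print(lookup("STEP", SERVICES, domains={"ISO 10303"}))
```

Narrowing the search by domain is what turns an ambiguous acronym into a single answer.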

14
Validation Scenarios
  • Validating Units of Measure - a system integrator
    receives an MCAD geometry model (e.g., STEP AP203
    Part 21 file) of a component to be integrated
    into an assembly; automatically, a standard
    validation routine is performed against the
    schema located in a referenced data dictionary
    that checks for use of the units of measure
    called for in the contract and identified in the
    exchange file
  • Enabling Automated Repository Check-In - as a
    STEP model is checked into a PDM system, an
    automated validation routine checks the model
    using the schema (located in the DD service) that
    is identified in the Part 21 data file
  • Improving Quality of Data Handoffs - an MCAD
    geometry model is sent from design to thermal
    analysis, and validation is performed using the
    correct schema version as referenced in the
    model; validation is an automated process that
    occurs before any work is done with the model as
    it is transferred between domains
  • Validating for Adequacy and Range - the PDS
    (NASA's Planetary Data System) central node
    receives a dataset description in template format
    to be ingested into the dataset catalogue
    database; automatically, a standard validation
    routine is performed that checks for required
    keywords, keyword values, and value types in the
    dataset in template format against a
    corresponding structure stored in the PDS domain
    of the data dictionary service
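
The adequacy-and-range check in the last scenario might look roughly like the sketch below. The structure format (type, required flag, optional range) and the types and ranges given for the two PDS-style keywords are invented for illustration. The real structure would come from the PDS domain of the dictionary service.

```python
def validate(record, structure):
    """Check a record against {keyword: (type, required, allowed_range)}.

    Returns a list of problems; an empty list means the record passed.
    """
    problems = []
    for keyword, (value_type, required, allowed_range) in structure.items():
        if keyword not in record:
            if required:
                problems.append(f"missing required keyword {keyword}")
            continue
        value = record[keyword]
        if not isinstance(value, value_type):
            problems.append(
                f"{keyword}: expected {value_type.__name__}, "
                f"got {type(value).__name__}"
            )
            continue
        if allowed_range is not None:
            low, high = allowed_range
            if not low <= value <= high:
                problems.append(f"{keyword}: {value} outside range {low}..{high}")
    # Keywords that the dictionary does not define are reported as well.
    for keyword in record:
        if keyword not in structure:
            problems.append(f"unknown keyword {keyword}")
    return problems


# Hypothetical structure; the types and the range are assumptions.
PDS_STRUCTURE = {
    "TARGET_NAME": (str, True, None),
    "SAMPLING_FACTOR": (float, False, (0.0, 100.0)),
}

if __name__ == "__main__":
    print(validate({"TARGET_NAME": "MARS", "SAMPLING_FACTOR": 2.0}, PDS_STRUCTURE))
    print(validate({"SAMPLING_FACTOR": 250.0}, PDS_STRUCTURE))
```

Keeping the structure in the dictionary service, rather than in each tool, means every ingest point applies the same rules.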

15
Data Modeling Scenarios
  • Data Reuse in Modelling Activities - a data
    modeller, charged with developing an information
    model for a new application, uses data elements
    published in several DD services (much like a
    parts library), ensuring that the new information
    model will have compatible interfaces with data
    sets that share the same data elements or
    collections of elements
  • Creating a TDP (technical data package) - an
    application performs a schema check against
    objects about to be wrapped into a TDP (e.g.,
    STEP AP232 or PDM Schema TDP) to ensure their
    correct structure and meta-data content
  • Data Integration Enabled - an analyst, charged
    with integrating data from two or more data sets,
    accesses the correct version of each schema, as
    referenced in the data set, from the DD service
    space, allowing them to identify/map interfaces
    between the data sets (e.g., MCAD-ECAD-cost data)
  • Extending a Schema - to solve a "local" problem,
    a data modeller uses data elements from a
    published collection of data items to extend an
    existing official schema; the new schema is
    published in the DD service with traces/links
    back to the official schema
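
The integration and extension scenarios can be sketched with schemas represented as plain dictionaries. The representation, the schema names, and the functions shared_elements and extend_schema are hypothetical. The element names are borrowed from the naming examples earlier in the presentation.

```python
def shared_elements(schema_a, schema_b):
    """Return the data element names two schemas have in common, sorted."""
    return sorted(set(schema_a["elements"]) & set(schema_b["elements"]))


def extend_schema(official, name, extra_elements):
    """Create a local schema that adds elements and links back to its parent."""
    return {
        "name": name,
        "extends": official["name"],  # trace back to the official schema
        "elements": list(official["elements"]) + [
            e for e in extra_elements if e not in official["elements"]
        ],
    }


if __name__ == "__main__":
    official = {"name": "official-schema",
                "elements": ["DC.Creator", "PDMS.Version"]}
    local = extend_schema(official, "local-schema",
                          ["PDMS.ReferenceDesignator", "DC.Creator"])
    other = {"name": "other-schema",
             "elements": ["DC.Date.Created", "PDMS.Version"]}
    print(local)
    print(shared_elements(local, other))
```

The shared element names are the candidate interfaces an analyst would map first when integrating two data sets.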

16
What's next? (Completing the prototype)
  • Architecture development
      • UML Model (50%)
      • Naming Convention (50%)
      • Linking ontology (25%)
  • Server configuration
      • 2nd and 3rd DD test nodes (33%)
      • Wrapping existing DD DBs (10%)
  • Client configurations
      • LDAP URL (75%)
      • Java (33%)
      • Python (33%)
      • Perl (33%)
      • C/C++ (75%)
      • Unix Shell (25%)
      • PHP (25%)
      • Native clients (25%)