Title: Metadata Registries
Slide 1: Metadata Registries - Repositories: Lessons Learned
- Informal Presentation to the SAIC/DHS Metadata Center of Excellence
- Brand L. Niemann
- US EPA and Federal CIO Council's Architecture and Infrastructure and Best Practices Committees
- September 16, 2004
Slide 2: Overview
- 1. Service-Oriented Architecture
- 2. Pilots
- 3. Some Lessons Learned
- 4. A Possible DHS Strategy
- 5. Contact Information
Slide 3: 1. Service-Oriented Architecture
- IBM has created a model to depict Web services interactions, referred to as a service-oriented architecture, comprising relationships among three entities (see next slide):
  - A Web service provider,
  - A Web service requestor, and
  - A Web service broker.
- Note: IBM's service-oriented architecture is a generic model describing service collaboration, not specific to Web services.
- See http://www-106.ibm.com/developerworks/webservices/
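Slide 4: 1. Service-Oriented Architecture
[Figure: Service-oriented architecture representation (courtesy of IBM Corporation). The service provider publishes to the service broker, the service requestor finds services through the service broker, and the service requestor binds to the service provider.]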
Slide 5: 1. Service-Oriented Architecture
- A Service-Oriented Architecture (SOA) means that the architecture is described and organized to support Web Services' dynamic, automated description, publication, discovery, and use.
- The SOA organizes Web Services into three basic roles:
  - The service provider (publish)
  - The service requestor (find)
  - The service registry (bind)
- The SOA is also responsible for describing how Web Services can be combined into larger services.
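The publish, find, and bind operations can be illustrated with a minimal sketch. The `ServiceRegistry` class and `echo_service` function are hypothetical names invented for this example; a real registry would be a networked UDDI service rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ServiceRegistry:
    """Hypothetical in-memory stand-in for a service registry (broker)."""
    _entries: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def publish(self, name: str, endpoint: Callable[[str], str]) -> None:
        # Provider side: advertise a service under a name.
        self._entries[name] = endpoint

    def find(self, name: str) -> Optional[Callable[[str], str]]:
        # Requestor side: look up a service by name; None if unknown.
        return self._entries.get(name)


def echo_service(message: str) -> str:
    """Toy service implementation offered by the provider."""
    return f"echo: {message}"


registry = ServiceRegistry()
registry.publish("echo", echo_service)           # publish
service = registry.find("echo")                  # find
result = service("hello") if service else None   # bind and invoke
print(result)
```

The requestor never refers to `echo_service` directly; it learns about the service only through the registry, which is the decoupling the three roles are meant to provide.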
Slide 6: 1. Service-Oriented Architecture
- The SOA has four key functional components:
  - Service Implementation: Build from scratch, provide a wrapper, or create a new service interface for an existing Web Service.
  - Publication: Author the WSDL document, publish the WSDL on a Web server, and publish the existence of your WSDL in a Web Services registry using a standard specification (UDDI).
  - Discovery: Search the registry, get the URL, and download the WSDL file.
  - Invocation: Author a client (SOAP) using the WSDL, make the request (SOAP message), and get the response (SOAP message).
Slide 7: 1. Service-Oriented Architecture
- 1. Client queries registry to locate service.
- 2. Registry refers client to WSDL document.
- 3. Client accesses WSDL document.
- 4. WSDL provides data to interact with Web service.
- 5. Client sends SOAP-message request.
- 6. Web service returns SOAP-message response.
[Figure: the Client, UDDI Registry, WSDL Document, and Web Service, with arrows numbered 1-6 matching the steps above.]
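Steps 3 and 4 of this flow (the client reads the WSDL to learn how to reach the service) can be sketched as follows. `SAMPLE_WSDL` is a trimmed, hypothetical WSDL 1.1 fragment with an invented endpoint, and `service_endpoints` is an illustrative helper, not part of any toolkit.

```python
import xml.etree.ElementTree as ET

# WSDL 1.1 and its SOAP binding extension namespaces.
NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "soap": "http://schemas.xmlsoap.org/wsdl/soap/",
}

# Hypothetical, heavily trimmed WSDL: only the service/port section is shown.
SAMPLE_WSDL = """<?xml version="1.0"?>
<definitions name="Lookup" xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/">
  <service name="LookupService">
    <port name="LookupPort" binding="LookupBinding">
      <soap:address location="http://example.org/lookup"/>
    </port>
  </service>
</definitions>"""


def service_endpoints(wsdl_text: str) -> dict:
    """Map each service name in the WSDL to its SOAP endpoint URL."""
    root = ET.fromstring(wsdl_text)
    endpoints = {}
    for service in root.findall("wsdl:service", NS):
        for port in service.findall("wsdl:port", NS):
            address = port.find("soap:address", NS)
            if address is not None:
                endpoints[service.get("name")] = address.get("location")
    return endpoints


endpoints = service_endpoints(SAMPLE_WSDL)
print(endpoints)
```

With the endpoint in hand, the client can address its SOAP request (step 5) to that URL.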
Slide 8: 1. Service-Oriented Architecture
- Acronyms and their practical examples:
  - UDDI: Phone Book
  - WSDL: Contract
  - SOAP: Envelope
  - HTTP, SMTP, FTP: Mailperson
  - Programming (DOM, SAX): Speech
  - Schema (DTD, XSD): Vocabulary
  - XML: Alphabet
Slide 9: 1. Service-Oriented Architecture
- Stages of Web services Development and Deployment:
  - Creation: Design, development, documentation, testing, and distribution.
  - Publication: Web service hosting and maintenance.
  - Promotion: Directory services, value-added services, and accreditation.
Slide 10: 1. Service-Oriented Architecture
[Figure: Service requestors and service providers connected through a Web Services Network offering security, reliability, QoS, and billing.]
Web services networks act as intermediaries in Web services interactions.
Slide 11: 2. Pilots
- Asked to review EPA's Environmental Data Registry (ISO 11179) and National Environmental Exchange Network, to make recommendations for improvements, and to provide XML Web Services training (2001-2002).
- Received Special Award for Innovation with XML Web Services from the Federal Quad Council (Mark Forman, March 2002) and asked to lead the CIO Council's XML Web Services Working Group and to do more pilots in support of E-Government (August 2002-September 2003) (see list on slide 12).
- Received Emerging Technology/Standards Leadership Award at the SecureE-Biz.Net Summit from Mark Forman and David McClure (April 2003).
- One CIO Council pilot project becomes the First Annual Conference on Semantic Technology for E-Government at the White House Conference Center (September 8, 2003), which fosters the formation of the Semantic Interoperability Community of Practice (SICoP) under the CIO Council's Best Practices Committee (March 2004) (Co-Chairs, Rick Morris and Brand Niemann), which in turn becomes a public-private partnership that produces the Second Annual Conference on Semantic Technology for E-Government (September 8-9, 2004).
- The Best Paper Award at the Second Annual Conference on Semantic Technology for E-Government went to a four-person team led by SAIC/ACS (see slides 13-14), in which the repository was a semantic store.
Slide 12: 2. Pilots
Slide 13: 2. Pilots
- Operationalizing the Semantic Web: A Prototype Effort using XML and Semantic Web Technologies for Counter-Terrorism
  - M. Personick, B. Bebee, B. Thompson, SAIC/Advanced Systems Concepts
  - B. Parsia, The University of Maryland, College Park, Maryland Information and Network Dynamics Lab, Semantic Web Agents Project
  - C. Soechtig, Object Sciences Corporation
Conference presenters
Slide 14: 2. Pilots
- 2.1 Repurposing EPA's Environmental Data Registry (ISO 11179) (added structure and data element harmonization)
- 2.2 Distributed Content Network and Semantic Web Services (NextPage)
- 2.3 XML.Gov Working Group-NIST Pilot Registry (Yellow Dragon-Adobe)
- 2.4 Repurposing the DOD Registry with XML Collaborator (use in IC MWG?)
- 2.5 MetaMatrix-XML Collaborator (DHS integration scenario using MOF)
- 2.6 CollabNet (now used for CORE.Gov)
- 2.7 E-Forms for E-Gov (Census's Registry and 12 or so vendors)
- 2.8 Integrated Web Services/ebXML (to OASIS TC)
- 2.9 State and Local Homeland Security Best Practices Pilot (FileMaker Pro)
- 2.10 Native XML Database (Tamino with UDDI)
- 2.11 Data and Information Reference Model (DRM) (embedded semantic harmonization and real data tables)
- 2.12 Networked Communities of Practice (CoP) and Their Dynamic Knowledge Repositories (DKR) (ONTOLOG Forum and Collaborative Expedition Workshops with CIM3)
- 2.13 Semantic Information Management (Unicorn)
- 2.14 Federated Repository-Software Asset Reuse (LogicLibrary)
- 2.15 Best Practices for Networked Communities of Practice (CIM3 and NextPage)
- 2.16 Community of Practice Hosting Portals (Tomoye Simplify and Groove)
- 2.17 Ontology Production and Linking (several new open source and proprietary)
Note: Some specifics for each to be provided in the presentation.
Slide 15: 2.1 Repurposing EPA's Environmental Data Registry (ISO 11179) (added structure and data element harmonization)
(Note: NextPage's LivePublish puts all data in the same format (XML), while its NXT4 indexes many different formats in the same format (XML).)
EPA's EDR contains about 250 standardized data elements and about 10,000 non-standardized data elements, including many that are redundant.
http://www.sdi.gov
http://xml.gov/presentations/nist3/iso11179.htm
Slide 16: 2.2 Distributed Content Network and Semantic Web Services (NextPage)
[Figure: Enterprise Ontology and Web Services Registry. A grid of static resources (WWW, Semantic Web) and dynamic resources (Web Services, Semantic Web Services) against interoperable syntax and interoperable semantics.]
Source: Derived in part from two separate presentations at the Web Services One Conference 2002 by Dieter Fensel and Dragan Sretenovic.
Slide 17: 2.2 Distributed Content Network and Semantic Web Services (NextPage)
"Content gives us the semantics (taxonomy/ontology) and the interoperability," Adam Pease, SICoP Meeting at MITRE, May 19, 2004.
"Structure comes from the content itself," The Large Document Problem, Lucian Russell, Categorization of Government Information WG Meeting, 5/10/04.
Slide 18: 2.2 Distributed Content Network and Semantic Web Services (NextPage)
(Note: Recently acquired by FAST, the search engine company used by FirstGov)
http://www.sdi.gov
Slide 19: 2.3 XML.Gov Working Group-NIST Pilot Registry (Yellow Dragon-Adobe)
(BAH Business Case called for Federated, but only Centralized so far)
http://xmlregistry.nist.gov:8080/index.jsp
Slide 20: 2.4 Repurposing the DOD Registry with XML Collaborator
(use in IC MWG? - DOD not a sterling registry) (supported ISO 11179, ebXML, WSDL/UDDI, and now CAM)
http://www.blueoxide.com/Pages/xmlcollaborator.html
http://xml.gov/presentations/blueoxide2/collaborator.htm
http://xml.gov/presentations/fgm/dodregistry.htm
http://www.xml.saic.com/icml/ic_registry/introduction.asp
Slide 21: 2.5 MetaMatrix-XML Collaborator (DHS integration scenario using MOF)
[Figure: DHS integration scenario. Virtual views: Emergency Preparedness and Response, Border and Transportation Security, Science and Technology, Information Analysis. Physical sources: CIA, INS, Customs, FBI, Coast Guard, Secret Service, TSA, FEMA, National Guard.]
This approach is equally valid for intra-agency data integration.
Slide 22: 2.5 MetaMatrix-XML Collaborator (DHS integration scenario)
XML Schema Mapping
See Joint Government Data and Information Reference Model (IAC White Paper), which includes the MetaMatrix-XML Collaborator Pilot Project (see pages 26-27), at http://web-services.gov/030528_IAC_EA_SIG_Information_and_Data_Reference_Model_Body.pdf
Slide 23: 2.6 CollabNet (now used for CORE.Gov)
(more the pull model than the push model used by LogicLibrary)
https://www.core.gov/
Slide 24: 2.7 E-Forms for E-Gov (Census's Registry and 12 or so vendors)
See http://www.fenestra.com/eforms/deliverables/final_report.htm
Slide 25: 2.8 Integrated Web Services/ebXML (to OASIS TC)
- One interface (HTTP, SwA, ebMS)
- Electronic Forms
- Web Services / WSRP
- Collaboration Agreements
- Business Process Requirements, Objects, Data
- Domain specific Semantics and Relationships between Assets and Artifacts
- SQL queries and APIs
See Carl Mattocks: http://xml.gov/presentations/oasis4/eGovRegistry.htm
Slide 26: 2.9 State and Local Homeland Security Best Practices Pilot (FileMaker Pro)
(James Mackison, GSA, for DHS)
Standard metadata template in Excel imported to FileMaker Pro.
Slide 27: 2.10 Native XML Database (Tamino with UDDI)
How UDDI Works
3) UDDI assigns a programmatically unique identifier (UUID) to each tModel and business registration and stores them in an Internet registry.
For update see http://xml.gov/presentations/systinet/uddi.htm
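The UUID assignment step can be sketched as below. `TinyRegistry` is a hypothetical in-memory stand-in for a UDDI registry, and the use of random (version 4) UUIDs is an assumption of this example; an actual UDDI node may generate its keys differently.

```python
import uuid


class TinyRegistry:
    """Hypothetical registry that keys every record by a generated UUID."""

    def __init__(self) -> None:
        self._records = {}

    def register(self, kind: str, name: str) -> str:
        # The registry, not the publisher, assigns the unique key.
        key = str(uuid.uuid4())
        self._records[key] = {"kind": kind, "name": name}
        return key

    def get(self, key: str) -> dict:
        return self._records[key]


registry = TinyRegistry()
tmodel_key = registry.register("tModel", "LookupInterface")
business_key = registry.register("businessEntity", "Example Agency")
print(tmodel_key, registry.get(tmodel_key))
```

Because the key is assigned at registration time, two publishers can register entries with the same name without colliding.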
Slide 28: 2.11 Data and Information Reference Model (DRM) (embedded semantic harmonization and real data tables)
Harmonization/Standardization of Data Element and XML Tag Names, with table structure preserved for use in spreadsheets, etc.
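One simple form of name harmonization is to derive a single XML tag name from each data element name and group the variants that collapse to the same tag. The sketch below assumes an UpperCamelCase tag convention and made-up element names; it is not the DRM pilot's actual method, and it does not handle names that begin with a digit (which would not be valid XML tag names).

```python
import re
from collections import defaultdict


def to_tag_name(element_name: str) -> str:
    """Turn a free-form data element name into an UpperCamelCase tag name."""
    words = re.findall(r"[A-Za-z0-9]+", element_name)
    return "".join(word[:1].upper() + word[1:].lower() for word in words)


def group_redundant(names):
    """Group element names that harmonize to the same tag name."""
    groups = defaultdict(list)
    for name in names:
        groups[to_tag_name(name)].append(name)
    return dict(groups)


# Hypothetical element names, as they might appear across several tables.
names = ["Facility Name", "facility_name", "FACILITY-NAME", "Permit ID"]
groups = group_redundant(names)
print(groups)
```

Groups with more than one member are candidates for consolidation into a single standardized element.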
Slide 29: 2.12 Networked Communities of Practice (CoP) and Their Dynamic Knowledge Repositories (DKR) (ONTOLOG Forum and Collaborative Expedition Workshops with CIM3)
http://ontolog.cim3.net/
Slide 30: 2.13 Semantic Information Management (Unicorn)
- Formulate ad hoc queries
- Run queries across sources
- Analyze and visualize (using 3rd party tools)
- What does this mean? Where did it come from?
- Discover data sources
- Impact analysis
- Create schemas
- Manage repository
- Manage ontology model collaboratively
- Semantic mapping
Slide 31: 2.13 Semantic Information Management (Unicorn)
Semantic Mapping
- Map each data asset once only as a spoke to Ontology Model hub
- Formal semantic mappings capture meaning of data in formal machine-readable form
Flexibility of Mapping
- Map all assets (relational, XML, legacy, etc.) to same model
- Productive mapping in two stages: groups (e.g., tables) and fields (e.g., columns)
- Attach conditions to mapping
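The hub-and-spoke idea can be sketched in a few lines: each source is mapped once to shared concepts, and any source-to-source translation is derived by going through the hub. The source names, field names, and concept names below are invented for illustration and do not come from the Unicorn product.

```python
# Each data asset is mapped once (one spoke) to concepts in the hub model.
# All names here are hypothetical.
SPOKES = {
    "customs_db": {"VESSEL_NM": "Vessel.name", "ARR_DT": "Arrival.date"},
    "coast_guard_xml": {"shipName": "Vessel.name", "arrivalDate": "Arrival.date"},
}


def translate(record: dict, source: str, target: str) -> dict:
    """Translate a record between two sources by way of the hub concepts."""
    to_hub = SPOKES[source]
    # Invert the target spoke: hub concept -> target field name.
    from_hub = {concept: fld for fld, concept in SPOKES[target].items()}
    translated = {}
    for fld, value in record.items():
        concept = to_hub.get(fld)
        if concept in from_hub:  # skip fields with no counterpart
            translated[from_hub[concept]] = value
    return translated


row = {"VESSEL_NM": "Aurora", "ARR_DT": "2004-09-16"}
converted = translate(row, "customs_db", "coast_guard_xml")
print(converted)
```

With n data assets this needs n spoke mappings, whereas mapping every pair of assets directly would need on the order of n(n-1) mappings.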
Slide 32: 2.14 Federated Repository-Software Asset Reuse (LogicLibrary)
(more the push model than the pull model used by Core.Gov)
See Federal Times article at http://federaltimes.com/index.php?S290239
Enterprise Architecture: http://www.noblestar.com/we_do/arch/arch.jsp
Slide 33: 2.14 Federated Repository-Software Asset Reuse (LogicLibrary)
Logidex Demonstration Site: http://www.logidexassetcenter.com/assetcenter.jsp
Slide 34: 2.15 Best Practices for Networked Communities of Practice (CIM3 and NextPage)
http://web-services.gov
Slide 35: 2.15 Best Practices for Networked Communities of Practice (NextPage)
(CIM3 Wiki shown previously in section 2.12)
http://web-services.gov
Slide 36: 2.16 Community of Practice Hosting Portals (Tomoye Simplify and Groove)
Private Groove Workspace for Conference Planning, Paper Reviews, etc.
http://12.158.152.7/ev_en.php
Contact Guy Rogers, Chief Editor, for password at grogers_at_triplei.com
Slide 37: 2.17 Ontology Production and Linking (several new open source and proprietary)
OntoLink - Linking Ontologies and Services
- Mapping between OWL ontologies and XML Schemas
- Allow procedural transformations
- Generate XSLT transformations
- Create mapping services for ontologies
Source: Sirin and Hendler, Semantic Web and Web Services, University of Maryland, Semantic Web and Agent Technology at the Maryland Information and Network Dynamics Laboratory, September 14, 2004.
Slide 38: 3. Some Lessons Learned
- Metadata has evolved from:
  - Descriptions of databases that are not readily accessible (e.g., clearinghouse).
  - An interface to multiple databases that are accessible (e.g., warehouse).
  - A layer of information that makes multiple database integration possible (e.g., MOF).
  - A markup language for relating and linking multiple databases in a networked environment (e.g., RDF and OWL).
Slide 39: 3. Some Lessons Learned
- Registries - Repositories have evolved from:
  - Separate from the actual databases (e.g., ISO 11179) to integrated with DBMS (e.g., Tamino/UDDI).
  - Specific for certain artifacts and functions (e.g., XML Schemas) to comprehensive systems (e.g., LogicLibrary).
  - Centralized (e.g., XML.Gov WG/NIST) to federated (e.g., Semantic Web Services).
  - Special category of software (e.g., data element management) to mainstream software (e.g., document management systems/distributed content networks).
  - Simple tools (e.g., spreadsheets for data elements) to community of practice hosting portals (e.g., Tomoye Simplify, Groove, etc.).
Slide 40: 3. Some Lessons Learned
- Goals have evolved from:
  - Prevent (manage) data element naming conflicts (e.g., ISO 11179).
  - Support XML artifact development and versioning and Web Services (e.g., XML Collaborator).
  - Support semantic harmonization across different domains and Semantic Web Services (e.g., TopBraid, OntoLink, etc.).
  - Support community of practice (CoP) development and networking with other CoPs (e.g., CIM3 Wiki for Ontolog).
Slide 41: 4. A Possible DHS Strategy
- Multiple Levels of Metadata
  - Level 1-Coarse
    - Some basic descriptors for all 700-some databases.
    - E.g., name, type, accessibility, etc.
  - Level 2-Medium
    - Some basic metadata for, say, the 100 best databases.
    - E.g., data dictionary, XML Schema, etc.
  - Level 3-Fine-grained
    - Detailed metadata and/or markup for, say, the 10 or so databases to be integrated.
    - E.g., MOF, RDF, etc.
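The tiering can be made concrete with a small sketch. It assumes, purely for illustration, that the databases have already been ranked by priority, that the cutoffs are exactly 10 and 100 out of 700, and that each level adds to the descriptors of the level below it; the field names are invented stand-ins for the slide's examples.

```python
from collections import Counter

# Descriptors added at each level (hypothetical field names).
LEVEL_FIELDS = {
    1: ["name", "type", "accessibility"],
    2: ["data_dictionary", "xml_schema"],
    3: ["mof_model", "rdf_markup"],
}


def metadata_level(rank: int, fine_cutoff: int = 10, medium_cutoff: int = 100) -> int:
    """Pick a metadata level from a database's priority rank (1 = highest)."""
    if rank <= fine_cutoff:
        return 3
    if rank <= medium_cutoff:
        return 2
    return 1


def required_fields(level: int) -> list:
    """Cumulative descriptors: each level includes those of the levels below."""
    fields = []
    for lvl in range(1, level + 1):
        fields.extend(LEVEL_FIELDS[lvl])
    return fields


counts = Counter(metadata_level(rank) for rank in range(1, 701))
print(dict(counts))
print(required_fields(2))
```

Under these assumptions only 10 databases need the most expensive, fine-grained treatment, while the remaining 690 are covered by progressively cheaper descriptions.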
Slide 42: 5. Contact Information
- U.S. Environmental Protection Agency, Office of Environmental Information (Office of the Chief Information Officer-CIO)
  - Enterprise Architecture Team.
  - Computer Scientist and Semantic XML Web Services Specialist.
  - 202-566-1657, niemann.brand_at_epa.gov
- Interagency Working Group on Sustainable Development Indicators
  - http://www.sdi.gov
- CIO Council's Architecture and Infrastructure Committee and Emerging Technology Subcommittee
  - http://web-services.gov
  - http://componenttechnology.org
- CIO Council's Best Practices Committee (Knowledge Management Working Group) and Semantic (Web Services) Interoperability Community of Practice
  - http://km.gov and http://web-services.gov