Title: Enhancing the Semantics Management Capabilities of Metadata Registries
1Enhancing the Semantics Management Capabilities of Metadata Registries
Wuhan Symposium on Ontology and MetaModeling, Wuhan University, P. R. China, March 16-17, 2006
Kevin D. Keck, Lawrence Berkeley National Laboratory
http://sdm.lbl.gov/kdkeck/ kdkeck_at_lbl.gov
2Just In Time Slides
- These slides are substantially modified from the version printed in the program
- (just following Bruce's example)
- tinyurl for XMDR presentations page
- http://tinyurl.com/z7aun
3Topics
- Integrating Ontology Management, Vocabulary Management, and Data Management
- Implementation for the Semantic Web
- Deployment and Best Practices
4Integrating Ontology Management, Vocabulary Management, and Data Management
5Historically Distinct Disciplines
- Data Management
  - ISO SQL
  - OMG UML, CWM
  - W3C XML
- Vocabulary Management
  - ISO TC37
  - OMG TQS
  - W3C SKOS
- Ontology Management
  - ISO Common Logic
  - OMG ODM
  - W3C OWL/RDF
6Aligning Standards 11179, 19763 (MOF, ODM)
Diagram labels: ISO/IEC 19763; ISO/IEC 11179-2; XMDR Registry; Terminology; Basic Classes; Basic Relationship; Ontologies; Analysis and Extraction; Registering
7Infrastructure is Expensive
- Developing, deploying, and maintaining distinct software for each kind of repository is time-consuming and expensive
  - both on the server side, and on the client side
- Common architectural challenges motivate reuse of technologies
  - scalability, reliability, auditability, etc.
8Over Time, Content Is Even More Expensive
- Common registries facilitate reuse of content
- Coordination (maintenance) is easier within an integrated registry system
- Mature registry services can maximize content availability (through multiple APIs)
- Mature infrastructure will protect content investments
9Bridging different realms of metadata standards
- 11179 Edition 3: Metadata Registry
- 19763: Framework for Metamodel Interoperability
10 11179 Designation and Definition
11Data Models vs. Conceptual Models
- MOF Core: Ordered, Hierarchical Containment
- Concept System: Flat Semantic Network
12Twist between Data Elements and Concepts
- Data Elements do not correspond to a single Concept, but rather a pair of Concepts.
- This is an intrinsic characteristic of information: it is always an assertion of some predicate, for some subject.
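One way to picture this pairing is the rough Python sketch below, in which a data element joins a subject concept with a predicate concept. The class and field names (`Concept`, `DataElement`, `subject`, `predicate`) are illustrative assumptions, not names taken from 11179 or XMDR, and splitting "Mailing Address Country Name" (a data element shown later in these slides) into these two concepts is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A node in a concept system (hypothetical minimal form)."""
    name: str


@dataclass(frozen=True)
class DataElement:
    """A data element ties together two concepts, not one."""
    subject: Concept    # what the assertion is about
    predicate: Concept  # what is asserted of the subject

    def describe(self) -> str:
        return f"{self.predicate.name} of {self.subject.name}"


# Assumed decomposition, for illustration only.
mailing_address = Concept("Mailing Address")
country_name = Concept("Country Name")
element = DataElement(subject=mailing_address, predicate=country_name)
print(element.describe())
```

Keeping the two concepts as separate fields means either one can be reused by other data elements without duplicating it.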
13Concepts and Relationships
14Concept System Draft One
15Concept System Draft Two
16Concept System
17Ontology
18Implementation for the Semantic Web
19XMDR files serve dual purpose: XML and OWL-compatible RDF
- Conform to XMDR XML schema
  - More human-readable
  - Easier to manipulate with XML tools, such as XSLT
- XML serialization of RDF
  - Base tag includes rdf:about attribute
  - Literals encoded as element content
  - URIs encoded as attribute values
  - Striped syntax (resource, property, resource); use abbreviated form for anonymous nodes
- Conform with 11179 OWL ontology
20XMDR Prototype Example: dual-purpose RDF/XML file DEALL.1.5394.1.xml
<DataElement rdf:about=""
    xml:base="http://xmdr.lbl.gov/xmdr/data/DEALL.1.5394.1.xml">
  <container rdf:resource="http://oaspub.epa.gov/edr"/>
  <identifier rdf:parseType="Resource">
    <string rdf:datatype="xsd:string">5394</string>
  </identifier>
  <version rdf:datatype="xsd:string">1</version>
  <administrationRecord rdf:parseType="Resource">
    <registrationStatus rdf:datatype="xsd:string">Standard</registrationStatus>
    <administrativeStatus rdf:datatype="xsd:string">Final</administrativeStatus>
    <creationDate rdf:datatype="xsd:date">1999-09-09</creationDate>
  </administrationRecord>
  <designation rdf:parseType="Resource">
    <context rdf:resource="CXT-Legacy.xml"/>
    <sign xml:lang="en">Country Name</sign>
  </designation>
  <designation rdf:parseType="Resource">
    <context rdf:resource="CXT-Long Abbreviation.xml"/>
    <context rdf:resource="CXT-Medium Abbreviation.xml"/>

  <designation rdf:parseType="Resource">
    <context rdf:resource="CXT-Registry.xml"/>
    <context rdf:resource="CXT-Standard.xml"/>
    <sign xml:lang="en">Mailing Address Country Name</sign>
  </designation>
  <definition rdf:parseType="Resource">
    <context rdf:resource="CXT-Legacy.xml"/>
    <context rdf:resource="CXT-Long Abbreviation.xml"/>
    <context rdf:resource="CXT-Medium Abbreviation.xml"/>
    <context rdf:resource="CXT-Registry.xml"/>
    <context rdf:resource="CXT-Short Abbreviation.xml"/>
    <context rdf:resource="CXT-Standard.xml"/>
    <text xml:lang="en">The name of the country where the addressee is located.</text>
  </definition>
  <type rdf:resource="RCDIS.1.12116.1.xml"/>
  <domain rdf:resource="VDALL.1.15147.1.xml"/>
  <meaning rdf:resource="DCDIS.1.12800.1.xml"/>
  <example rdf:datatype="xsd:string">United States</example>
</DataElement>
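Because the file is ordinary XML as well as RDF, generic XML tooling can read it. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the example. It rests on a few assumptions: the `xmlns:rdf` declaration is added here (the slide shows no namespace declarations), the XMDR elements are treated as having no default namespace, and the helper name `designations` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Trimmed copy of the slide's example. The xmlns:rdf declaration is an addition
# so that the fragment parses on its own.
SAMPLE = """\
<DataElement xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             rdf:about=""
             xml:base="http://xmdr.lbl.gov/xmdr/data/DEALL.1.5394.1.xml">
  <designation rdf:parseType="Resource">
    <context rdf:resource="CXT-Legacy.xml"/>
    <sign xml:lang="en">Country Name</sign>
  </designation>
  <designation rdf:parseType="Resource">
    <context rdf:resource="CXT-Registry.xml"/>
    <context rdf:resource="CXT-Standard.xml"/>
    <sign xml:lang="en">Mailing Address Country Name</sign>
  </designation>
</DataElement>
"""


def designations(xml_text):
    """Return (sign, language, [context URIs]) for each designation element."""
    root = ET.fromstring(xml_text)
    result = []
    for node in root.findall("designation"):
        sign = node.find("sign")
        # URIs live in attribute values; literals live in element content.
        contexts = [c.get(f"{{{RDF_NS}}}resource") for c in node.findall("context")]
        result.append((sign.text, sign.get(f"{{{XML_NS}}}lang"), contexts))
    return result


for sign, lang, contexts in designations(SAMPLE):
    print(sign, lang, contexts)
```

ElementTree leaves the relative `rdf:resource` values as written; an RDF parser would resolve them against `xml:base`, which is where the two readings of the same file differ.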
21OWL, RDF, XML Schema used to specify XMDR as UML is used for 11179 metamodel
- Types, Cardinalities
- Trang
- Triples: binary labeled relationships
- What things go in own files?
- Which property direction stored?
- Sequential ordering of properties
22XMDR Prototype Architecture: Initial Implemented Modules
Diagram modules: External Interface; RegistryStore; Registry; WritableRegistryStore; RetrievalIndex; LogicBasedIndex; FullTextIndex; MappingEngine (defer)
Diagram technologies: Java; Subversion; Jena, Xerces; Jena, OWI KS Racer, Kowari; Lucene; Protege
Diagram legend: Composition (tight ownership); Generalization; Aggregation (loose ownership)
23Inference
Diagram: is-a hierarchy with root Disease; intermediate classes Infectious Disease and Chronic Disease; leaf classes Heart disease, Polio, Smallpox, Diabetes
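A minimal sketch of the kind of inference the diagram suggests: following is-a edges upward yields facts that are never stated directly, such as Polio is-a Disease. The assignment of each leaf to a parent is an assumption (the extracted slide text does not preserve the edges), and the function names are hypothetical.

```python
# Direct is-a edges. Which parent each leaf attaches to is assumed from the
# diagram's layout and ordinary medical usage.
IS_A = {
    "Infectious Disease": "Disease",
    "Chronic Disease": "Disease",
    "Polio": "Infectious Disease",
    "Smallpox": "Infectious Disease",
    "Heart disease": "Chronic Disease",
    "Diabetes": "Chronic Disease",
}


def ancestors(term):
    """Follow is-a edges upward; the nearest ancestor comes first."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def is_a(term, candidate):
    """True if `candidate` is a direct or inferred ancestor of `term`."""
    return candidate in ancestors(term)


print(ancestors("Polio"))
print(is_a("Diabetes", "Disease"))
print(is_a("Polio", "Chronic Disease"))
```

Only six edges are stored, yet `is_a("Diabetes", "Disease")` holds because is-a is treated as transitive.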
24RDF Graph Query Facilities Complement Text Query Capabilities
- SQL-like queries
  - e.g., names of ontologies in a registry
- Span items that are only indirectly connected
  - e.g., data elements associated with a conceptual domain
- Expand queries to subsumed classes in hierarchy
  - e.g., ConceptualDomain includes EnumeratedConc..
- Transitivity
  - e.g., all subclasses subsumed by a higher order class
  - e.g., all superclasses (ancestors) of a particular class
- Least common ancestor
  - e.g., closest subsuming concept for 2 concepts
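The hierarchy queries listed above can be sketched in plain Python over a small class graph. This is an illustration of the query types only, not the XMDR implementation: the graph reuses the disease hierarchy from the Inference slide with assumed parent assignments, and the function names are hypothetical.

```python
from collections import deque

# child -> list of direct parents (a class may have several).
PARENTS = {
    "Infectious Disease": ["Disease"],
    "Chronic Disease": ["Disease"],
    "Polio": ["Infectious Disease"],
    "Smallpox": ["Infectious Disease"],
    "Heart disease": ["Chronic Disease"],
    "Diabetes": ["Chronic Disease"],
}


def ancestor_depths(term):
    """Breadth-first walk upward; maps term and each ancestor to its distance."""
    depths = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in PARENTS.get(current, []):
            if parent not in depths:
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths


def subsumed(term):
    """All classes that have `term` as a direct or indirect ancestor."""
    return sorted(c for c in PARENTS if c != term and term in ancestor_depths(c))


def least_common_ancestor(a, b):
    """Closest class subsuming both a and b, or None if there is none."""
    da, db = ancestor_depths(a), ancestor_depths(b)
    common = set(da) & set(db)
    if not common:
        return None
    return min(common, key=lambda c: (da[c] + db[c], c))


print(subsumed("Infectious Disease"))
print(least_common_ancestor("Polio", "Smallpox"))
print(least_common_ancestor("Polio", "Diabetes"))
```

A query for "Infectious Disease" can be expanded with `subsumed(...)` before matching, so that items classified only as Polio or Smallpox are also returned.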
25Deployment and Best Practices
26Content Is King
- Proved true in Web 1.0
- Will remain true in Web 2.0, Semantic Web, Semantic Grids, etc.
- Ship early and often
27What is a Domain Model? Data Points vs. Physical Entities
- In an 11179 registry, what should the Object Classes be?
- In the EDEN water quality system, the Ontology describes data points, of which STATION and PARAMETER are attributes
- A similar approach is seen in HL7, for capturing test results and other observations
- An alternative approach would be to primarily describe the physical entities (water or patients, resp.) directly, as real-world continuants, and treat data points as information-level, rather than domain-level, entities.
- Should one or the other be specified as a best practice?
- Should provision be made for both to be captured explicitly?
28Two Approaches for Water Data
- Tests come in many varieties
- Tests occur at a time
- Data Elements capture the result of a test
- More faithfully captures nuances of measurement tolerance
- Ontology is rather simpler, both in structure and to develop

- Water has many attributes
- Attributes have different values at different times
- Data Elements capture the value of an attribute
- Might provide much more insight into the domain of interest
- Might provide a better basis for data sharing and integration
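The contrast between the two approaches can be made concrete as two record shapes. This is a rough sketch under stated assumptions: every class, field, and method name is hypothetical, and the station name, river name, and pH reading are invented values used only to exercise the code.

```python
from dataclasses import dataclass, field
from datetime import datetime


# Approach 1: the data point (test result) is the central entity.
@dataclass
class TestResult:
    station: str        # where the sample was taken
    parameter: str      # what was measured
    taken_at: datetime  # tests occur at a time
    value: float
    tolerance: float    # measurement tolerance is kept with the result


# Approach 2: the physical entity is central; its attributes vary over time.
@dataclass
class WaterBody:
    name: str
    history: dict = field(default_factory=dict)  # attribute -> [(time, value)]

    def record(self, attribute, when, value):
        self.history.setdefault(attribute, []).append((when, value))

    def value_at(self, attribute, when):
        """Most recent recorded value at or before `when`, else None."""
        earlier = [(t, v) for t, v in self.history.get(attribute, []) if t <= when]
        return max(earlier)[1] if earlier else None


# Invented values, for illustration only.
t = datetime(2006, 3, 16, 9, 0)
result = TestResult("STATION-1", "pH", t, 7.2, 0.1)
river = WaterBody("Example River")
river.record("pH", t, 7.2)
print(river.value_at("pH", datetime(2006, 3, 17)))
```

The first shape keeps measurement detail such as tolerance with each result; the second answers questions about the water itself, such as its attribute value at a given time.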