Title: Welcome and Introduction
1 Welcome
Bruce Bargmeyer
Lawrence Berkeley National Laboratory, University of California
Tel: +1 510-495-2905, bebargmeyer_at_lbl.gov
2 Welcome
- Welcome
- Conference Logistics
- Internet connections: SSID and password in your packets.
- Meals
- Breakfast with room
- Lunch: 90 minutes
- Reception: tonight at 19:30 in the Emerald Room
- Agenda
- Prompt start and stop of presentations
- Some late breaking changes
- Thanks to Program Committee
- Professor Hajime Horiuchi, General Chairman
- Professor Doo-Kwon Baik, Vice Chair
- Program Committee Members
- ISO/IEC JTC 1/SC 32/WG 2
- ISO TC 37
- ISO TC 184
- Thanks to Speakers
- Thanks to host and organizers
3 Thanks
Host: IPSJ / ITSCJ (Information Processing Society of Japan / Information Technology Standards Commission of Japan)
Supporter: OGIS-RI Co. Ltd. (Osaka Gas Information Systems Research Institute)
Sponsors:
Infoterm (International Information Centre for Terminology)
TermNet (The International Network for Terminology)
Information System Society of Japan
UML based Modeling Technologies Promotion
4 Introduction to the Open Forum on Metadata Registries 2006
Bruce Bargmeyer
Lawrence Berkeley National Laboratory, University of California
Tel: +1 510-495-2905, bebargmeyer_at_lbl.gov
5 WG 2 Metadata
TC 37 Terminology
6 We have come to join
Terminology & Metadata
7 ???
- Questions from friends, relatives, associates: TC 184, TC 154, ebXML Asia (Reg/Rep), ODM, Dublin Core, practitioners
- Now
- Who are WG 2/TC 37?
- How did WG 2/TC 37 earn a living?
- What skills and tools do they have?
- Future
- Can they work together?
- Can they earn a living in a changing world?
- What skills and tools do they need?
- What are the most promising directions?
- We will discuss these and hear about new R&D, standards development, implementations, and new ideas from the speakers at OFMR2006.
8 Who is WG 2? Metadata
- Area of Work: To develop and maintain standards that facilitate specification and management of metadata. Use of these standards will enhance the understanding and sharing of data, information and processes to support, for example, interoperability, electronic commerce and component-based development. The scope shall include:
- a framework for specifying and managing metadata
- specification and management of data elements, structures and their associated semantics
- specification and management of value domains, such as classification and code schemes
- specification and management of data about processes and behaviour
- facilities to manage metadata, for example data dictionaries, repositories, information resource dictionary systems, registries and glossaries
- facilities to exchange metadata, including its semantics, over the Internet, intranets and other media.
9 Who is TC 37? Terminology and other language and content resources
- Area of Work: Standardization of principles, methods and applications relating to terminology and other language and content resources in the contexts of multilingual communication and cultural diversity.
10 What has WG 2 Focused On in the Past?
- Specification of the meaning of data (what data is meant to represent).
- Documentation of the provenance of data.
- Standardization and harmonization of data
- Stewardship of data
11 What has TC 37 Focused On in the Past?
- Provide a systematic description of the concepts in the field of terminology
- Clarify the use of the terms in this field
- Address not only standardizers and terminologists, but anyone involved in terminology work, as well as the users of terminologies.
- Exchange of terminology
12 Some Inspirational ISO TC 37 Standards
- ISO 704 Terminology work -- Principles and methods
- ISO 860 Terminology work -- Harmonization of concepts and terms
- ISO 1087 Terminology work -- Vocabulary -- Part 1: Theory and application
- ISO 1087 Terminology work -- Vocabulary -- Part 2: Computer applications
- ISO 12200 Computer applications in terminology -- Machine-readable terminology interchange format (MARTIF) -- Negotiated interchange
- ISO 12620 Computer applications in terminology -- Data categories
- ISO 16642 Computer applications in terminology -- Terminological markup framework
13 How Did WG 2 People Earn a Living in the Past?
- Panhandle
- Public assistance (government leadership)
- Data Standards
- Data management
- Data administration
- Build and operate MDR
- Serve as MDR Registration Authority
14 How Did TC 37 People Earn a Living in the Past?
- Crumbs off library tables
- Catalog library contents
- Developing terminologies for application areas
- Computational linguistics
15 What Tools & Skills Did WG 2 Use to Earn a Living?
- 11179 E1
- Write data descriptions down in text
- Used typewriters, word processors, or editors
- Started using DBMSs
- 11179 E2
- Record data descriptions in databases
- SQL based query facilities
- Designed the Metadata Registry schema with modeling tools (but wrote it in text also)
- Used 11179 E1 and E2 Data Standards techniques
16 What Tools & Skills Did TC 37 Use to Earn a Living?
- Information science
- Write thesauri and taxonomies down in text
- Used word processors, editors, or typewriters
- Started using DBMSs
- Software database: Data Category Registry
17 The Future?
- Data integration and harmonization is still a large challenge, but not exciting to large organizations.
- People like to make up new data and new words unfettered by the past. (It is fine for dictionaries and registries to record what they have done.)
- Metadata, thesauri and taxonomies are so last year.
- Now knowledge, semantics and ontologies are hot. They get increasing organizational mind share and funding.
- Knowledge bases
- Ontologies
- Triples (subject, verb, object)
- Inferencing
- Semantic Web
- There is recognition of the need for integration and harmonization.
- But, "If you can't dance it, you can't teach it." (from the movie Ballroom Dancing)
18 New Tools
- Inference engines
- Reasoners
- Agents
- Triple stores
- Search engines
- Have these rendered metadata registries and Data Category Registries obsolete?
- Have these rendered TC 37 and WG 2 skills, techniques and technologies obsolete?
19 Semantics: Where have we been?
Where are we planning to go?
System manuals
Semantic grids
Data dictionaries
Semantics services (SSOA)
11179 E1
Data ontology lifecycle management
Data Standards
11179 E2
Complex semantics management
Data Management/ Data Administration
ISO/IEC 11179 E3 19763 P 1-4 24707
Data engineering
Terminologies
Metadata Registries (MDR)
Semantic Web Ontologies
XML related standards
20 WG 2 Metadata
TC 37 Terminology
21 Ogden's Semiotic Triangle
Thought or Reference
Symbolises
Refers to
Symbol
Referent
Stands for
C. K. Ogden and I. A. Richards. The Meaning of Meaning.
22 Concept in Semiotic Triangle
Thought or Reference (Concept)
Symbolises
Refers to
Symbol
Referent
Stands for
Rose, ClipArt
C. K. Ogden and I. A. Richards. The Meaning of Meaning.
23 WG 2 Concept
- From 11179 E2 (2003)
- Concept: unit of knowledge created by a unique combination of characteristics [ISO 1087-1:2000, 3.2.1]
- Designation: representation of a concept by a sign which denotes it [ISO 1087-1:2000, 3.4.1]
- Definition: representation of a concept by a descriptive statement which serves to differentiate it from related concepts [ISO 1087-1:2000, 3.3.1]
- Concept Relationship: a semantic link among two or more Concepts
- Concept relationship type description: a description of the type of relationship among two or more Concepts
24 WG 2 Concept
Figure 8: Data Element Concept metamodel region, ISO/IEC 11179 E2 (2003)
25 WG 2 Classification Scheme Item
Figure 7: Classification metamodel region, ISO/IEC 11179 E2 (2003)
26 TC 37 Concept
27 Concept: Essence and Differentia
TC 37 Definition
Ogden Symbol TC 37 Designation (Sign)
28 Concept: Essence and Differentia
Definition Essence
Differentia
Ogden Symbol TC 37 Sign?
29 Rose
- 1. any of the wild or cultivated, usually prickly-stemmed, pinnate-leaved, showy-flowered shrubs of the genus Rosa. Cf. rose family.
- 2. any of various related or similar plants.
- 3. the flower of any such shrub, of a red, pink, white, or yellow color.
- --Random House Webster's Unabridged Dictionary (2003)
30 Is each a Rose? As Defined by Essence and Differentia
31 Concept Described by Relationships to Other Concepts
Love Romance Marriage
CONCEPT
Refers To
Symbolizes
Rose, ClipArt
Stands For
Referent
32 SNOMED Terms Defined by Relationships
- Is this:
1. The thing that is defined as a procedure that involves an excision of a structure of lobe of lung? (Axiom)
2. A statement saying "All procedures that involve an excision of the structure of lobe of lung are pulmonary lobectomy"? (Falsifiable proposition)
33 Rose: Same Concept?
Romance Love Marriage XXX Baby Family
Romance Love Marriage XXX Baby Family
34 Rose: Same Concept?
Romance Love Marriage XXX Baby Family
XXX Romance Love Marriage Baby Family
35 Rose: Same Concept?
XXX
Romance Love Marriage XXX Baby Family
36 The Communication Process
[Figure: two semiotic triangles side by side, one per communicating party. Each links CONCEPT, Symbol, and Referent (Rose, ClipArt) with the edges Symbolises, Refers To, and Stands For.]
37 Communication: Concept vs. Symbol
[Figure: two semiotic triangles (CONCEPT, Symbol, Referent: Rose, ClipArt; edges Symbolises, Refers To, Stands For), each with the symbol "Rose", and the caption "I see a ClipArt image of a rose".]
38 RDF: Symbol and Reference
[Figure: two semiotic triangles (CONCEPT, Symbol, Referent: Rose, ClipArt; edges Symbolises, Refers To, Stands For) with the caption "I see a ClipArt image of a rose".]
39 RDF: Both Symbols and Reference (Definition)
Edge
Node
Node
Subject
Predicate
Object
URI ..
Rose
URI ..
URI ..
40 Registry may be used to ground the Semantics of an RDF Statement.
The address state code is "AB". This can be expressed as a directed graph, e.g., an RDF statement.
41 Grounding RDF nodes and relations: URIs Reference a Metadata Registry
dbA:e0139
ai:MailingAddress
dbA:ma344
ai:StateUSPSCode
"AB"^^ai:StateCode
@prefix dbA: http://www.epa.gov/databaseA
@prefix ai: http://www.epa/gov/edr/sw/AdministeredItem
42 URI Resolution in a Metadata Registry
Node and relationship meaning is established through a URI pointing to an ISO/IEC 11179 Metadata Registry
Mailing Address
http://www.epa/gov/edr/sw/AdministeredItemMailingAddress
- The exact address where a mail piece is intended to be delivered, including urban-style address, rural route, and PO Box
State USPS Code
http://www.epa/gov/edr/sw/AdministeredItemStateUSPSCode
- The U.S. Postal Service (USPS) abbreviation that represents a state or state equivalent for the U.S. or Canada
Mailing Address State Name
http://www.epa/gov/edr/sw/AdministeredItemStateName
- The name of the state where mail is delivered
Needed: persistent URIs pointing to each item in an 11179 Metadata Registry (not currently part of the standard).
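The grounding idea on the two slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `example.org` namespace URIs, the `REGISTRY` dictionary standing in for an ISO/IEC 11179 registry, and the function names are all hypothetical; only the term names and definitions come from the slides.

```python
# Hypothetical namespace URIs (placeholders, not the URIs on the slide).
PREFIXES = {
    "dbA": "http://example.org/databaseA#",
    "ai": "http://example.org/registry/AdministeredItem#",
}

# Stand-in for a metadata registry: persistent URI -> agreed definition.
REGISTRY = {
    PREFIXES["ai"] + "MailingAddress": (
        "The exact address where a mail piece is intended to be delivered, "
        "including urban-style address, rural route, and PO Box"
    ),
    PREFIXES["ai"] + "StateUSPSCode": (
        "The U.S. Postal Service (USPS) abbreviation that represents a "
        "state or state equivalent for the U.S. or Canada"
    ),
}

# "The address state code is AB" as two subject-predicate-object statements.
STATEMENTS = [
    ("dbA:e0139", "ai:MailingAddress", "dbA:ma344"),
    ("dbA:ma344", "ai:StateUSPSCode", "AB"),
]


def expand(qname):
    """Expand a prefixed name such as 'ai:MailingAddress' into a full URI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


def ground_predicates(statements):
    """Map each predicate to its registry definition (None if unregistered)."""
    return {p: REGISTRY.get(expand(p)) for _, p, _ in statements}


if __name__ == "__main__":
    for predicate, definition in ground_predicates(STATEMENTS).items():
        print(predicate, "->", definition)
```

A predicate that resolves to `None` is one whose meaning has not been registered, which is the situation persistent registry URIs are meant to avoid.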
43 Major Issues in Semantics Management addressed by ISO/IEC 19763, 24707 and 11179
- Independent development and autonomous evolution
- Multiple ways to specify the same thing within a language (formalism, notation) and between languages
- Precise specification so that software (agents, applications, systems) can process without human intervention
- Harmonization and vetting within a community of interest
- Life cycle management (data, concept systems, ...)
- Processing based on semantic reasoning, rather than procedure
44 Strong Commonality of Purpose: 19763, 24707, 11179
- Semantics management: creating, managing, harmonizing, using, exchanging
- Data,
- Concepts & relationships (concept systems),
- Sentences/axioms,
- Created by diverse organizations,
- For diverse purposes
- Management approach: coordinate and cultivate, rather than top-down command and control
45 ISO/IEC 19763: Framework for Metamodel Interoperability
- Objective
- Promote interoperability based on ontologies.
- Obstacles to ontology-based interoperation
- Issue 1
- Each ontology is developed independently and evolves autonomously.
- Issue 2
- Ontologies are described in several languages, sometimes with different names for the same thing in a Universe of Discourse (UoD) or with the same name for different things in a UoD.
- FMI is intended to solve these problems by providing a registration framework for ontologies.
46 Difficulty caused by independent development and autonomous evolution
This ontology has a definition of "green card" and does not have a definition of "Christmas card".
This ontology does not have a definition of "green card" but has a definition of "Christmas card".
- To avoid this difficulty, FMI Ontology Registration provides two types of ontologies: Reference Ontology and Local Ontology.
47 Reference Ontology
- FMI Ontology Registration provides the registration framework where a local ontology is defined based on reference ontologies
48 Goal of Common Logic
- Two agents, A and B, each have a first-order formalization of some knowledge
- A and B wish to communicate their knowledge to each other so as to draw some conclusions.
- Any inferences which B draws from A's input should also be derivable by A using basic logical principles, and vice versa
- The goal of Common Logic is to provide a logic based framework which can support this kind of use and communication without requiring complex negotiations between the agents.
49 Motivation for ISO/IEC 11179 Metadata Registry Extensions
- Support traditional data management and data administration in a more powerful way.
- Go beyond traditional Data Standards and Data Administration. We want to support computer processing based on semantics--concepts and relationships.
50 Evolution of metadata technology
- From unstructured natural language metadata (written as text) to structured metadata
- Explicit modeling and characterization of relationships
- Graph based metamodels to aid comprehension and searching
- Formal ontologies
- AND from human consumption to machine processing for
- Software agents
- Computing inferences
- Semantic applications (e.g., transitive search, subsumption testing, etc.)
- Semantic services, e.g., mapping between equivalent value domains, units conversion
- With new key technologies
- Graph databases (e.g., RDF) facilitate visualization & machine processing
- Description logic (e.g., OWL DL) for more precise semantics & machine reasoning
- Software reasoners (e.g., inference engines)
51 ISO/IEC 11179 Metadata Registry Extensions
- Register (and manage) any semantic artifacts that are useful for managing data.
- E.g., this includes registering concepts in any way related to data, e.g., permissible values and data element definitions.
- It extends to registration of the full concept systems related to an organization's information held in structured, semi-structured or unstructured (text) form.
- E.g., may want to register keywords, thesauri, taxonomies, ontologies, axiomatized ontologies.
- Provide new services for semantic computing: Semantics Service Oriented Architecture, Semantic Grids, semantics based workflows, Semantic Web.
52 ISO/IEC 11179 Metadata Registry Extensions
- In addition to natural language, we want to capture semantics with more formal techniques
- First Order Logic, Description Logic, Common Logic, OWL
- However, maintain backward compatibility for implementers of 11179 E2
53 Motivation: Urgent demands for Data Integration and Harmonization
- Facilitate consolidation & reorganization of government, private companies, and other organizations
- Ongoing acquisitions and mergers of organizations
- Corporations, e.g., telecom, energy, banking
- Government, e.g., many agencies put under Dept. of Homeland Security
- In the National Institutes of Health, the National Cancer Institute was created to focus on cancer
- Enable cooperation between countries and groups
- World Trade Organization
- North American Free Trade Agreement
- European environment, Basel Convention
- UN Food and Agriculture: global food supply
- Enable sharing of data required quickly for emergencies
- Bird flu, terrorism
54 Who could use extended metadata registries for what purposes?
- Analysts, researchers: anyone trying to create, harmonize, and manage data, concept systems, knowledge bases, rule bases, ontologies, RDF statements
- Engineering and Harmonization
- Vetting (gaining approval), establish trust, and enable stewardship
- Creators of new semantic computing systems & applications
- Ground OWL ontologies and RDF statements (subjects, predicates, objects) in agreed upon definitions maintained in a metadata registry
- Use managed semantics within a community of interest
- Integrate existing semantics in new ways
- Improve semantics re-use
- Computers that are processing semantic computing applications
- Agents to access, map, and reason over data and concepts
- Applications that interact with both concepts in concept systems and data in databases.
- Grid computing: grid software can use the MDR XML representations for exchanging & comparing objects (also, possibly RDF or OWL representations). Service metadata in an MDR can be used on the grid to support semantic service discovery, service consolidation and dynamic creation of services & workflows.
55 Concept Management
- In general, we want to register any concept based graph structure comprised of nodes, relationships, and possibly axioms
- possibly including millions of concepts, millions of terms, and millions of relationships (maybe billions).
- We want to link the concepts (e.g., research organization w, person x, disease y, location z) to data and text.
56 Example Concept Systems
- NBII Biocomplexity Thesaurus
- National Cancer Institute Metathesaurus
- NCI Data Elements (National Cancer Institute Data Standards Registry)
- UMLS (non-proprietary portions)
- GEMET (General Multilingual Environmental Thesaurus)
- EDR Data Elements (Environmental Data Registry)
- USGS Geographic Names Information System (GNIS)
- HL7 Terminology, Data Elements
- Mouse Anatomy
- GO (Gene Ontology)
- EPA Web Registry Controlled Vocabulary
- BioPAX Ontology
- NASA SWEET Ontologies
57 Concept Systems and traditional metadata can be represented & queried as graphs
Nodes represent concepts or types of metadata
A
Lines (arcs) represent relationships
2
1
b
a
c
d
58 Finding Hidden Information in Registry Metadata (Including Concept Systems)
Waterfowl
Waterfowl
Goose
Duck
Goose
Duck
59 Include Concept System Semantics in Metadata Registries
Represent concepts and relationships as nodes and edges in formal graph structures, e.g., is-a hierarchies.
Waterfowl
Duck
Goose
60 What new search capabilities do graph models & inference support?
- SQL-like structured queries (e.g., RDQL)
- e.g., SELECT ?x WHERE (?x rdf:type xmdr:ValueDomain)
- Can span items that are only indirectly connected
- e.g., data elements associated with a permissible value
- Expand queries to subsumed classes in a hierarchy
- e.g., all cities within state and states within countries (partonomy)
- e.g., all species subsumed under birds (taxonomy)
- Search for higher level concepts or metadata
- e.g., all superclasses (ancestors) of a particular class
- e.g., least common ancestor (subsuming concept) for cat and snake
- Explore sibling items
- e.g., other airport codes comparable to SFO
61 Inference
Disease
is-a
is-a
Infectious Disease
Chronic Disease
is-a
is-a
is-a
is-a
Heart disease
Polio
Smallpox
Diabetes
Signifies inferred is-a relationship
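The inferred is-a relationships in the diagram above amount to a transitive closure over the asserted links. A minimal sketch follows, assuming the asserted edges are as listed in `IS_A`; the assignment of the four leaf diseases to their parents is read off the diagram labels and is an assumption, as are the function names.

```python
# Asserted is-a links (child -> list of parents), assumed from the diagram.
IS_A = {
    "Infectious Disease": ["Disease"],
    "Chronic Disease": ["Disease"],
    "Polio": ["Infectious Disease"],
    "Smallpox": ["Infectious Disease"],
    "Heart disease": ["Chronic Disease"],
    "Diabetes": ["Chronic Disease"],
}


def ancestors(concept, is_a=IS_A):
    """Return every concept reachable by following is-a links upward."""
    found = set()
    stack = list(is_a.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a.get(parent, []))
    return found


def inferred_is_a(is_a=IS_A):
    """Return (child, ancestor) pairs that are implied but not asserted."""
    return {
        (child, anc)
        for child, parents in is_a.items()
        for anc in ancestors(child, is_a)
        if anc not in parents
    }


if __name__ == "__main__":
    for child, anc in sorted(inferred_is_a()):
        print(child, "is-a", anc, "(inferred)")
```

A reasoner or triple store would normally compute this; the sketch only shows what "inferred" means for a simple hierarchy.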
62 Taxonomies & partonomies can be used to support inference queries
E.g., if a database contains information on
events by city, we could query that database for
events that happened in a particular county or
state, even though the event data does not
contain explicit state or county codes.
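The city-to-state example above can be sketched as a walk up a part-of chain. The event names and the `PART_OF` table below are made-up illustration data, not taken from any registry, and the function names are hypothetical.

```python
# Partonomy: place -> the larger place it is part of (illustrative data).
PART_OF = {
    "Berkeley": "Alameda County",
    "Oakland": "Alameda County",
    "Alameda County": "California",
    "Kobe": "Hyogo Prefecture",
}

# Event records store only a city, with no county or state code.
EVENTS = [
    ("Conference", "Berkeley"),
    ("Workshop", "Oakland"),
    ("Forum", "Kobe"),
]


def located_in(place, region, part_of=PART_OF):
    """True if `place` equals `region` or lies inside it via part-of links."""
    while place is not None:
        if place == region:
            return True
        place = part_of.get(place)
    return False


def events_in(region, events=EVENTS):
    """Return names of events whose city falls within `region`."""
    return [name for name, city in events if located_in(city, region)]


if __name__ == "__main__":
    print(events_in("California"))
```

The database is never changed; the county and state are supplied at query time by the partonomy.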
63 Relationship metadata can be used to infer non-explicit data
Analgesic Agent
- For example
- (1) patient data on drugs currently being taken contains brand names (e.g., Tylenol, Anacin-3, Datril)
- (2) thesaurus connects different drug types and names with one another (via is-a, part-of, etc. relationships)
- (3) so patient data can be linked and searched by inferred terms like "acetaminophen" and "analgesic" as well as trade names explicitly stored as text strings in the database
Non-Narcotic Analgesic
Analgesic and Antipyretic
Acetaminophen
Nonsteroidal Antiinflammatory Drug
Datril
Anacin-3
Tylenol
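The three steps on this slide can be sketched as follows. The hierarchy in `IS_A` is a simplified reading of the diagram labels (treating brand names as is-a children of the generic drug is an assumption), and the patient records are invented for illustration.

```python
# Simplified thesaurus links (child -> parent), assumed from the diagram.
IS_A = {
    "Tylenol": "Acetaminophen",
    "Anacin-3": "Acetaminophen",
    "Datril": "Acetaminophen",
    "Acetaminophen": "Analgesic and Antipyretic",
    "Analgesic and Antipyretic": "Non-Narcotic Analgesic",
    "Non-Narcotic Analgesic": "Analgesic Agent",
}

# Hypothetical patient data: only brand names are stored as text.
PATIENT_DRUGS = {
    "patient-1": ["Tylenol"],
    "patient-2": ["Datril"],
    "patient-3": ["Prevpac"],
}


def terms_for(drug, is_a=IS_A):
    """Return the stored name plus every broader term inferred via is-a."""
    terms = [drug]
    while terms[-1] in is_a:
        terms.append(is_a[terms[-1]])
    return terms


def patients_matching(term, records=PATIENT_DRUGS):
    """Find patients whose drugs match `term` directly or by inference."""
    wanted = term.lower()
    return sorted(
        patient
        for patient, drugs in records.items()
        if any(wanted == t.lower() for d in drugs for t in terms_for(d))
    )


if __name__ == "__main__":
    print(patients_matching("acetaminophen"))
```

Searching for "Tylenol" still works as a plain string match; the thesaurus only widens what a broader term can find.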
64 Least Common Ancestor Query
What is the least common ancestor concept in NCI Thesaurus for Acetaminophen and Morphine Sulfate? (answer: Analgesic Agent)
Analgesic and Antipyretic
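A least common ancestor query of this kind can be sketched for a single-parent hierarchy as below. The `PARENT` table is a simplified assumption, not the actual NCI Thesaurus structure; in particular, "Narcotic Analgesic" is a hypothetical intermediate node added so that Morphine Sulfate has a path to the root.

```python
# Simplified single-parent hierarchy (child -> parent); illustrative only.
PARENT = {
    "Acetaminophen": "Analgesic and Antipyretic",
    "Analgesic and Antipyretic": "Non-Narcotic Analgesic",
    "Non-Narcotic Analgesic": "Analgesic Agent",
    "Morphine Sulfate": "Narcotic Analgesic",  # hypothetical intermediate node
    "Narcotic Analgesic": "Analgesic Agent",
}


def path_to_root(concept, parent=PARENT):
    """Return the concept followed by its ancestors, nearest first."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def least_common_ancestor(a, b, parent=PARENT):
    """Return the nearest concept on both root paths, or None if disjoint.

    A concept counts as its own ancestor here, so the LCA of a concept
    and one of its ancestors is that ancestor.
    """
    seen = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in seen:
            return node
    return None


if __name__ == "__main__":
    print(least_common_ancestor("Acetaminophen", "Morphine Sulfate"))
```

Real concept systems often allow multiple parents, in which case there may be several least common ancestors and a graph search is needed instead of two simple paths.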
65 Example sibling queries: concepts that share ancestor
- Environmental
- "siblings" of Wetland (in SWEET ontology)
- Health
- Siblings of ERK1 finds all 700 other kinase enzymes
- Siblings of Novastatin finds all other statins
- 11179 Metadata
- Sibling values in an enumerated value domain
66 More complex sibling queries: concepts with multiple ancestors
- Health
- Find all the siblings of Breast Neoplasm
- Environmental
- Find all chemicals that are a
- carcinogen (cause cancer) and
- toxin (are poisonous) and
- teratogen (cause birth defects)
site neoplasms
breast disorders
Breast neoplasm
Non-Neoplastic Breast Disorder
Eye neoplasm
Respiratory System neoplasm
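Sibling queries over concepts with more than one parent can be sketched with a mapping from each concept to its set of parents. The edges in `PARENTS` are an assumed reading of the diagram labels above, and both function names are hypothetical.

```python
# Concept -> set of direct parents (assumed from the diagram labels).
PARENTS = {
    "Breast neoplasm": {"site neoplasms", "breast disorders"},
    "Eye neoplasm": {"site neoplasms"},
    "Respiratory System neoplasm": {"site neoplasms"},
    "Non-Neoplastic Breast Disorder": {"breast disorders"},
}


def siblings(concept, parents=PARENTS):
    """Return the concept's siblings, grouped by the parent they share."""
    return {
        p: {c for c, ps in parents.items() if p in ps and c != concept}
        for p in parents.get(concept, set())
    }


def with_all_parents(required, parents=PARENTS):
    """Return concepts that fall under every parent in `required`."""
    required = set(required)
    return {c for c, ps in parents.items() if required <= ps}


if __name__ == "__main__":
    for parent, sibs in siblings("Breast neoplasm").items():
        print(parent, "->", sorted(sibs))
```

The second function is the shape of the environmental query: chemicals classified under carcinogen and toxin and teratogen are those whose parent set contains all three.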
67 Metadata relationships can also be used to infer connected information
Database
- For example
- An agency has hundreds of different databases, with metadata for each in an 11179 Registry.
- Manager asks which databases can be searched to find specific information for China?
- Search code values for China (and synonyms like CN) and show all databases that are connected only indirectly via Enumerated Value Domain, Value Domain and Permissible Value
Data Element
Value Domain
Enumerated Value Domain
Non-enumerated Value Domain
Permissible Value
Permissible Value Meaning
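The China example can be sketched as a walk along the chain database, data element, value domain, permissible value, value meaning. The registry contents below are invented for illustration, and the flat dictionaries are a stand-in for the 11179 metamodel rather than its actual structure.

```python
# Hypothetical registry metadata, keyed along the chain in the diagram.
DATABASES = {
    "TradeDB": ["ExporterCountry"],
    "FacilityDB": ["FacilityState"],
    "TravelDB": ["DestinationCountryCode"],
}
DATA_ELEMENT_DOMAIN = {
    "ExporterCountry": "CountryNames",
    "FacilityState": "USStateCodes",
    "DestinationCountryCode": "ISOCountryCodes",
}
# Enumerated value domain -> {permissible value: value meaning}
VALUE_DOMAINS = {
    "CountryNames": {"China": "China", "Japan": "Japan"},
    "ISOCountryCodes": {"CN": "China", "JP": "Japan"},
    "USStateCodes": {"CA": "California"},
}


def databases_for_meaning(meaning):
    """Find databases whose value domains can encode the given meaning.

    Returns (database, data element, matching permissible values) tuples.
    """
    hits = []
    for database, elements in DATABASES.items():
        for element in elements:
            domain = VALUE_DOMAINS[DATA_ELEMENT_DOMAIN[element]]
            values = [v for v, m in domain.items() if m == meaning]
            if values:
                hits.append((database, element, values))
    return hits


if __name__ == "__main__":
    for hit in databases_for_meaning("China"):
        print(hit)
```

Matching on the value meaning rather than the stored string is what lets "China" and "CN" lead to the same set of databases.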
68 Different ontologies support semantics management at different levels
11179 classes, properties, and relationships
11179 Metamodel Level
Concepts and Terms
11179 Registry Level
Database B
Application Software Level
Database A
69 Nodes and relations support inference on 11179 metamodel
70 Advanced Use Scenario: Allergy Alert. Linking concept searches, metadata searches, and database queries (outline)
- Event: Doctor prescribes medicine. Will patient have allergic reaction?
- Event triggers concept system search to determine if the prescription is a drug and if so, what type of drug. The first search is for an isA relation, followed by a search for a partonomy relation
- Then system must perform a metadata search to find data elements in information systems relating to patient allergy
- Result of metadata search enables a database lookup in a patient record
- Database lookup produces a drug reaction code
- System must look up the code in a concept system to find type of reaction and category of drug
- Relate drug reaction to category of prescribed drug
- Produce Allergy Alert for Dr. & Patient
71 Scenario: Allergy Alert
- Event
- Prescription: 500 mg Prevpac bid
72 Scenario: Allergy Alert
- Is this a prescription for a drug?
- Yes: concept system lookup says prescription category is for drugs and devices. That is, Prevpac isA Drug
- If so, what category(ies) of drug?
- Lookup in concept system (partonomy) shows that Prevpac contains
- Lansoprazole - proton pump inhibitor
- Amoxicillin - beta-lactam antibiotic
- Clarithromycin - macrolide antibiotic
73 Scenario: Allergy Alert
- Does the patient have an allergy to any of the drugs?
- Need a metadata lookup to find relevant data elements in patient record databases
- Need to join the contents of the database(s)
- Diagnosis: Allergy to ___________
- Observation: Apparent reaction to ______
74 Scenario: Allergy Alert
- Search Database: Patient Record
- Result: Dx ICD-9-CM code 995.2, Unspecified adverse effect of drug, medicinal and biologic substances
- Search Concept System
- Result: Adverse reaction, SNOMED
- 294461000 (antibacterial drug allergy)
- 246075003 (causative agent)
- 392284008 (nafcillin)
75 Scenario: Allergy Alert
All of these contain a form of penicillin
76 Scenario: Allergy Alert
Nafcillin isA Penicillin
Amoxicillin isA Penicillin
77 Alert
- Warning!!! Patient has had a prior adverse reaction to Nafcillin, which is similar to the component Amoxicillin in the current prescription.
- Note: The RAND Corporation states that billions of dollars per year can be saved in healthcare expenditures and better results can be achieved with improved medical systems of this type.
- --RAND Review, Fall 2005
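The allergy alert scenario on the preceding slides can be sketched end to end with three small lookups: a partonomy (what the prescription contains), an is-a table (drug class), and a patient record. The tables reuse the drug names from the slides, but the patient identifier, the table layout, and the function name are hypothetical, and the metadata search step is collapsed into a direct dictionary lookup.

```python
# Partonomy: combination product -> component drugs (from the scenario).
HAS_PART = {"Prevpac": ["Lansoprazole", "Amoxicillin", "Clarithromycin"]}

# is-a: drug -> drug class (from the scenario slides).
IS_A = {
    "Amoxicillin": "Penicillin",
    "Nafcillin": "Penicillin",
    "Lansoprazole": "Proton pump inhibitor",
    "Clarithromycin": "Macrolide antibiotic",
}

# Hypothetical patient record: drugs with a prior adverse reaction.
PATIENT_REACTIONS = {"patient-1": ["Nafcillin"]}


def allergy_alerts(patient, prescription):
    """Return (component, prior allergen, shared class) for risky components."""
    alerts = []
    components = HAS_PART.get(prescription, [prescription])
    for component in components:
        drug_class = IS_A.get(component)
        for allergen in PATIENT_REACTIONS.get(patient, []):
            same_drug = component == allergen
            same_class = drug_class is not None and drug_class == IS_A.get(allergen)
            if same_drug or same_class:
                alerts.append((component, allergen, drug_class))
    return alerts


if __name__ == "__main__":
    for component, allergen, drug_class in allergy_alerts("patient-1", "Prevpac"):
        print(f"Warning: prior reaction to {allergen}; "
              f"{component} is also a {drug_class}.")
```

This is a toy model of the reasoning chain only; it is not clinical decision support and makes no claim about real cross-reactivity rules.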
78 Summary: MDR & FMI Concept System Store
Concept systems: Keywords, Controlled Vocabularies, Thesauri, Taxonomies, Ontologies, Axiomatized Ontologies (essentially graphs: node-relation-node, axioms)
79 Summary: MDR & FMI to Manage Concept Systems
Concept system: Registration, Harmonization, Standardization, Acceptance (vetting), Mapping (correspondences)
80 Summary: MDR & FMI for Life Cycle Management
Life cycle management: Data and Concept systems (ontologies)
81 Summary: MDR for Grounding Semantics
Metadata Registries
Semantic Web: RDF Triples: Subject (node URI), Verb (relation URI), Object (node URI)
Ontologies
82 11179 E3 Proposal
83 TC 37 & WG 2: Let's continue to align our standards
- Some topics to discuss
- What is understood under "concept" in terminology, ontology, metadata approach?
- Should WG 2 use the term "concept system" where we are using it?
- How do we include CL axioms and sentences?
- What is the role of attributes, characteristics, qualifiers, identifiers etc. in data modelling?
- What should be the use of "meta": metadata, metainformation, metaontology, metasystem, etc.?
- What is the role of classification (incl. different types of classification)?
- Can we agree on harmonized naming principles?
- What is a typology, nomenclature, categorization?
84 Presentations to Follow
- I look forward to hearing the tutorials and presentations covering standards, R&D, and practical applications in these areas.