Formal and Informal Approaches to Indexing - PowerPoint PPT Presentation

1
Formal and Informal Approaches to Indexing
Assimilating Data Knowledge
  • Presentation for NIH/BCIG
  • Sept. 2002
  • Gary Berg-Cross
  • Knowledge Strategies/SLAG

2
The Problem
  • "We are drowning in information but starved for
    knowledge. This level of information is clearly
    impossible to be handled by present means.
    Uncontrolled and unorganized information is no
    longer a resource in an information society,
    instead it becomes the enemy."
  • -- John Naisbitt, author of the 1982
    bestseller Megatrends
  • Common Approaches
  • Systematically formalize and standardize
    OR
  • do what is practical even if not well
    founded.

3
Outline
  • Thematic Overview
  • Access to nuggets
  • IR, summarization, and Portals
  • Structured Data, Warehousing, and Metadata
  • XML and Knowledge
  • Concluding Thoughts

4
We are flooded, and more is on the way
Which file has my article on Wolfram?
Who's using Topic Maps for healthcare data?
After Tim Finin (UMBC), Intelligent Information
Systems on the Web and in the Aether
5
Evolving User Tasks and Technology
  • Pull technology
  • User requests information in an interactive
    manner
  • 3 access tasks
  • Retrieval (classical IR systems)
  • Browsing (hypertext)
  • Browsing and retrieval (modern digital libraries
    and web systems)
  • Push technology
  • automatic and permanent pushing of information to
    user
  • software agents
  • example news service
  • filtering (retrieval task) relevant information
    for later inspection by user

6
Simple Model of Hospital Information System (HIS)
& Data Movement
"Vertical silo" or "stovepipe" phenomenon: at
present, most clinical software systems are
closed, with little or no interoperability
between them.
MD is needed to share this information and know
what it is.
Sharing and exchange of clinical data are
currently impeded by the lack of standards for
electronic health records and the lack of
harmonization between different clinical
computing systems. Clinical data are locked in
a variety of different incompatible databases.
7
In the Context of such Complexity
  • How can we find anything?
  • How do we gather information that is distributed
    over various computer systems and represented
    using different formats?
  • If we find something, how do we know that it is
    complete?
  • How can large amounts of information be
    analyzed?
  • How do we integrate diverse types of info?
  • Can I create summaries?
  • What information/derived information can we
    believe and trust?

8
Evolving Old Devices to Aid Digital Access
  • Indexes (weak links, no semantics)
  • Glossaries
  • Keywords e.g.
  • metadata, metadata architecture, gene map,
    dynamic query, standard
  • Thesauri
  • Catalogs
  • Cross-references
  • Can these be integrated/assimilated?
  • To traditional uses we add desire for
  • Easy maintenance of links.
  • End of cross-reference nightmare.
  • So we can focus on semantics of linking, while
    machines will do the addressing job.

9
General Themes Throughout
  • We need bridging work in the middle
  • There is something like a continuum from data
    to knowledge, with lots of action in the middle
  • Sharing metadata
  • Enterprise architectures
  • Data and semantic models experience as a baseline
    for XML.
  • (Besides continued deepening of existing
    approaches) Integration and reuse to test reality
    as we go
  • We simplify organizational efforts in 2 competing
    ways: work approximately (scruffy) or formally
    (neat)
  • There are scruffy approaches, models, etc.
  • Granular complexity and heterogeneity drive us to
    scruffy pragmatics
  • Scruffy models may be formalized into neat models
    and vice versa
  • Text coded as HTML or XML
  • Ontologies are realized by scruffier data models,
    glossaries and vocabularies

10
Data to Knowledge Continuum
Text search (key words)
Data Queries
Knowledge Management
Un-Structured
Middle
Structured
Scruffy view: few general principles; K is a kludge;
inelegant shortcuts are necessary
Neat view: theories should be neat and elegant;
parsimonious; understand exactly how theories
behave
Metadata management, data mining
Ongoing research to construct formal agent
ontologies, like KB development, is difficult
XML queries, text mining
11
Recent View of Index Management
  • Content is a set of document units
  • Files, pages, images, sections (now XML elements)
  • Each document unit should be reusable outside
    its creation context
  • Classical tools to achieve reusability
  • Unique identification and network-addressability
    of document units
  • External indexing using controlled vocabulary,
    domain-expert indexers, and/or DB-based indexing
    tools
  • Internal annotation by standard metadata and
    metadata-aware technology

12
This Static Integrated Index Management
  • Makes strong assumptions
  • Stability of document content and address
  • Universally understood vocabulary for Index
    Subjects
  • Integration of Metadata and External Indexation

Documents
13
The Real Document World
  • Is dynamic - Documents are moving targets
  • Index Subjects make sense only in local
    vocabulary/context
  • Internal metadata and external indexing are
    managed separately and are likely to be
    inconsistent

Falling Behind
?
14
Information Access Contrasted to Data Access
XML may allow this. Covered later.
Can I integrate these?
15
Scruffy vs. Neat Approaches: Computational Models
  • Neats believe that engineered programming/logic
    is king
  • To a neat, scruffy methods appear sloppy,
    succeeding by luck, with no insights about how
    intelligence works
  • Neat generally involves provable algorithms and
    starts top-down, modeling higher-level behavior.
  • IT "neats" try to build systems that
    process/reason in a formal way
  • Scruffies favor looser, more ad hoc methods
    driven by empirical knowledge.
  • To a scruffy, neat methods appear to be hung
    up on formalism irrelevant to the hard-to-capture
    "common sense"
  • Scruffies (Brooks etc.) favor a bottom-up
    approach to produce complex behavior from the
    interactions of simple rules.
  • "Scruffies" may mix approaches to see what
    happens

But we may mix a scruffy method with a formal
representation and vice versa
16
Formal Representation Example (Bipartite SM)
Person: Gary
  - (measure) -> height: @68 @units: inches
  - (measure) -> weight: @180
  - (part) -> hair -> (has_color) -> color: brown
  - (part) -> body -> (part) -> stomach ->
    (has_thickness) -> thickness: thick
  - (part) -> eye - (shape) -> circular,
    (has_size) -> size: small,
    (has_color) -> color: brown
Person:
  Instance-Of: Class, Primitive, Relation, Set, Thing
  Subclass-Of: Entity, Individual,
    Individual-Thing, Thing.
17
Logic is Easy, IM is Hard
  • The knowledge needed to solve a commonsense
    reasoning problem is typically much more
    extensive and general than the knowledge needed
    to solve difficult problems (McCarthy).
  • The knowledge needed to solve well-formulated
    problems in fields such as physics or mathematics
    is bounded.
  • In contrast, there are no a priori limitations to
    the facts that are needed to solve commonsense
    problems (e.g., managing documents)
  • the given knowledge may be incomplete
  • one may have to use approximate concepts and
    approximate theories
  • one will generally have to use nonmonotonic
    reasoning to reach conclusions, and one will need
    some ability to reflect upon one's own reasoning
    processes.

18
Explicit vs. Implicit Knowledge
  • Commonsense knowledge is often implicit, whereas
    the knowledge needed to solve well-formulated
    difficult problems is often explicit.
  • E.g., the knowledge needed to solve integrals is
    explicitly found in a standard calculus book.
  • However, the knowledge needed to arrange a
    meeting or talk exists in vague, implicit form.
  • Tacit knowledge must first be made explicit, a
    time-consuming task requiring a serious knowledge
    engineering (KE) effort.

19
Advances may add Scruffy or Formal Info in formal
or informal ways
[Figure: knowledge extraction; progress/quality of
result plotted against time]
For example, the formal testing process is
fundamentally slow and cannot be conducted
exhaustively. Consequently, some argue that the
usual case for model testing has to be
approximate, non-exhaustive testing, i.e., some
subset of the possible tests is chosen and
executed.
20
Do Granular Differences Limit the Scope of
Formal Approaches?
  • Movement from primitive to compound/conglomerate
    concepts

Complexity challenges complete, formal
understanding
Moderately complex: <cc1cc>
Conglomerate 1: <c1cn>
Conglomerate 2: <c2c>
Glue?
Compound 1: <b1,bn>
Compound 2: <b2,bx>
Compound 3: <b13by.>
Working Base: b1,b2,b3
Ultimate Base: P1,P2,P3
21
My work environment moving to enterprise
approach
[Figure: enterprise work environment. Labels: Text
Base; XML Tools; Text Processing System;
Analytical Processing System; Corporate Data
Warehouse; Global Metadata; Data Entry; Printing,
including repositories; Electronic Documents;
Source Systems; Raw Tables; Multidimensional Data;
Statistics; MD Repository; generalized software
and reusable software components; Metadata
Processing System. Each component is connected to
data providers and users.]
22
Abstracting Silos and Integration
[Figure: three silos, each with its own silo
integration.
App/Data level: Research, Data/DW App, XML App,
Text App.
MD level: Indexes, RDF/MD, Models/MD, each with
its own tools.
Inter-silo integration? Ontologies, lexicons,
standard vocabularies; general scruffy knowledge.]
23
Focus of Current Work on Integration
DOD Ent Arch EI/DS CHCS .
MD Repository Registries HDD .
Portals eBPS
Arch Models ..
XML Schemas
Docs/ Web Pages
App/ Data Level
HTML MD Indexes
RDF/MD
Models/ MD
Modeling Tools
MD Level
Portal Tools
XML Tools
Common Warehouse Model, MetaIntegration Tools
Standardize Knowledge (Scruffy Limit)
Standards medical dictionaries, HL7 models
vocabularies
24
Focus and Goals of Work
  • Build a scalable, integrated, standardized
    information system infrastructure to provide
  • Library and Forum Portal with document management
  • Explore standards and tools to annotate
    semi-structured information with concepts
    obtained from DOD concept-oriented, controlled
    terminology.
  • Help the user to index information
  • Tools to help integrate data, model metadata
    represented/produced by MHS logical and system
    Enterprise Architectures which will serve as a
    prototype for later work including
  • DOD HA System integration and development
  • DOD VA enterprise sharing (models before data)
  • Ushik repository built on 11179
  • Design a metadata repository with database system
    infrastructure to store content-dependent
    metadata, concept-oriented indexing metadata
    (data annotations), and links from the metadata to
    the primary data sets in Systems and Projects

Information and knowledge sharing through common
representation languages, ontologies and
protocols combines neat and scruffy techniques
25
Some Information Access Requirements: Documents,
Portals, Content, etc.
  • Text retrieval, search and processing
  • Challenges and contests
  • TREC (text retrieval), MUC (NLP), Tipster
    (summaries)
  • Summarization to simplify browsing and create
    better content indexes
  • Integration of text and data
  • Can I have a report please???

26
Information Access AKA information retrieval
(IR)
  • Goal is to
  • Find a needle, help users query documents to
    satisfy information needs
  • make things easy to find
  • Existence is assumed in some formal fashion
  • Copy theory of knowledge query reuse (what
    about implicit content?)
  • Context is challenging

Categorization
27
Typical Indexing Arrangement
Document Sources
Stands alone
Full text representation: most complete
representation, high computational cost
Set of index terms or keywords (extracted directly
from text or specified by human subjects, as in
information science): most concise
representation, poor quality of retrieval
Text Index
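
The index-term representation above can be sketched as a small inverted index. This is only an illustration, assuming a toy three-document collection and simple boolean AND matching; the document ids, texts, and function names are hypothetical and not taken from the presentation.

```python
from collections import defaultdict
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical three-document collection, for illustration only.
docs = {
    "d1": "Metadata standards for clinical data",
    "d2": "Clinical vocabulary and gene map",
    "d3": "Dynamic query over metadata",
}
index = build_index(docs)
```

Such an index is concise and cheap to query, but it is "just a list": it records which terms occur where and carries no semantics about what the terms mean.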
28
The Retrieval Process After Indexing
Index is just a list, no integration or
summary. What if we formalize content a
bit? What if we allow fuzzy Concepts?
Coordinated
  • Summarizer
  • Creates a topic-based (thematic) summary of
    document content.
  • Its outputs might include 
  • a sentence-extracted summary, 
  • keywords and phrases for each of the topics in a
    document.
  • Do we have general principles or allow scruffy
    heuristics?

29
We Now Have Expanded Content Search Techniques
  • AltaVista automatically surfs and indexes the
    web.
  • Yahoo catalogues and organizes useful web sites.
  • Fulcrum provides a way to query both full-text
    and RDB sources from a single user interface.
  • Excite also tracks queries and classifies
    customers.
  • Firefly builds customer profiles.
  • Alexa collects webpages and their usage.
  • Google ranks the reference importance of web
    pages.
  • Junglee integrates diverse sources into a VDB
    (weak summarization).
  • Verity, Infoseek, Inktomi (HotBot) . . .
    (after Wiederhold)

Yahoo organizes and indexes material, but not by
general principles; it seems scruffy at the
macro level. Is any of the metadata in these
efforts reusable beyond keywords?
30
Automatic Text Summarization: Indexed to Manage
Explosion and Simplify Messy Population of Portals
  • Historically relied on the extraction of key
    sentences from the summarized document
  • Many tradeoffs
  • May convey explanations missing in the original.
  • But extracted sentences may contain extraneous
    information, which stretches the length of the
    summary and increases the chance of introducing
    incoherence.
  • Because sentences are extracted without context,
    at best they can be incoherent and at worst, they
    can convey misleading information.
  • The summary extract also lacks balance and text
    structure (Paice, 1990). We may allow human
    editing.
  • In the last 10 years the focus has been to
    develop summary generation techniques that can do
    better than naive extraction. Theoretical
    foundations, including cognitive models are
    making this NEAT.
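
The "naive extraction" baseline mentioned above can be sketched as follows. This is a minimal, scruffy illustration that assumes word-frequency scoring and a tiny hypothetical stopword list; it is not the method of any particular system named in these slides.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "are", "it"}

def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize(text, n=1):
    """Return the n highest-scoring sentences, kept in original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        # A sentence scores the summed document frequency of its words.
        return sum(freq[w] for w in content_words(sentence))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]

sample = ("Metadata describes data. "
          "Metadata repositories store metadata about data sources. "
          "The weather was nice.")
```

Because sentences are lifted out without their context, a summary produced this way shows the coherence and balance problems listed above; longer sentences also tend to score higher simply because they contain more words.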

31
Making Web Pub NEATER (Part 1): Dublin Core and
Medical Core Standard
  • Metadata model of indices for use with Web
    content; resulted from a meeting in Dublin (Ohio!)
  • Provides a generic model for Core components
  • Title, Author, Keywords, Description, Publisher,
    Resource Type, Format, Resource Identifier,
    Language, etc.
  • Dublin Core for Medicine: Medical Core Metadata
    (MCM) added some resource types
  • meeting, pathology images, radiology images,
    patient educational material, review, practice
    guidelines, etc.
  • Implementation of MeSH information in the Dublin
    Core Metadata
  • Malet G, Munoz F, Appleyard R, Hersh W. J Am Med
    Inform Assoc 1999 Mar-Apr;6(2):163-72
  • NOTE Dr. Malet published the first "list of
    medical sites" on the Internet --- survives today
    as Medical Matrix (www.medmatrix.org)

32
This is a Public Document Portal. DOD has its own
Documents by subject with indexes, etc. It
includes structured data and text!!!
33
Example of Categorization Push for Portals
  • A categorization engine is used for sorting
    documents into the folders based on a taxonomy.
  • People try general principles but usually wind up
    with a hybrid, since this document/ knowledge
    engineering is hard.
  • The categorization engine may do this based on
    metadata in the documents, based on business
    rules, based on the content of the document,
    based on search criteria or filters, or some
    other scheme.

34
Based on Concepts Rather than Words
  • Access is increasingly concept-based, in an
    informal way, to handle context. It's important.
  • The words "prices", "prescription", and "patent" are
    highly likely to co-occur with the medical sense
    of "drug"
  • "Abuse", "paraphernalia", and "illicit" are likely
    to co-occur with the illegal drug sense of "drug"
  • Church and Liberman, 1991
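
The co-occurrence idea above can be sketched as a tiny sense chooser for "drug". This is a deliberately scruffy illustration: the cue words come from the slide, while the sense labels, the function name, and the simple overlap count are assumptions made for the example.

```python
import re

# Cue words from the slide; the sense labels are illustrative names.
SENSE_CUES = {
    "medical": {"prices", "prescription", "patent"},
    "illegal": {"abuse", "paraphernalia", "illicit"},
}

def guess_sense(context):
    """Pick the sense whose cue words overlap most with the context.

    Returns None when no cue word is present. Ties go to the sense
    listed first in SENSE_CUES.
    """
    tokens = set(re.findall(r"[a-z]+", context.lower()))
    scores = {sense: len(cues & tokens) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `guess_sense("Prescription drug prices rose again")` yields the medical sense, while a context containing none of the cue words yields no decision at all, which is where a hand-built cue list reaches its limit.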

35
The Universe of Portals Info using a Scruffy
Information Directory?
[Figure: portal information directory. Functions:
categorize, summarize, ID, understanding for
collaboration. Content: glossary of professional
terms, spreadsheets, project catalogs, management
metadata, data collection, databases, DWs,
publications, data stores/marts, data bases,
unstructured XML documents, structured data
reports. Organized by corporate, region, service,
and markets.]
36
Towards an Intelligent Enterprise
Integrated Enterprise Data Models
clinical resource
Patient info
mktg resource
Multiple Communities
BI Data
Data Warehouse, ODS, Data Mart
Other Apps, External Content Sources
BI Apps
BI Meta data
37
Let's Move to Structured Data
Grounding Of Instances
Formally Defined MD
Standards
  • Useful to Consider Three Levels
  • Integration examples

38
Metadata Consistency Issues
  • A Hierarchy of Collection Contexts for
    information (earlier drug example)
  • What we mean depends on these contexts.
  • Architecture collections
  • Data Models Functional collections
  • Messages Data Elements
  • Clinical vocabulary domains

A problem is the disparate nature of the
metadata collection and reuse. Because there is
no coordinated MD repository, the same metadata
may be defined repeatedly by various
groups/departments. Hence as a field,
healthcare metadata are inconsistent.
39
Federal Enterprise Architecture Framework (FEAF)
40
Architecture Examples
High Level Operational Graphic (OV-1)
Views
Domain and Naming are not typically managed
between the conceptual products, during the
product development cycle and especially
instances of the Architecture.
41
Need for Standard Terms to Support Business Model
Mapping
An activity in Model 1
Different definitions
Activities in Model 2
Too high a granularity, too informal.
42
Levels Scope Info Integration Conceptual
Spaces
Each part maintains a body of information, not
easily coordinated or interoperable with other
collections.
Evolved over time without the benefit of
strict data standardization policies and
enforcement. Need to exchange info and use
it: needs semantic interoperability and
coordination (MD, XML tags, etc.)
43
Why Coordination and Interoperability Is Hard
  • Groups found it easier to do each part themselves
  • at least the first time; now they are
    legacies
  • Lack of Enterprise Architecture approach
  • meant gaps in Architecture data models etc.
  • No coding scheme is comprehensive
  • Drugs, lab tests, signs and symptoms
  • Lack of a common business area data model
  • some standards and products are competing for
    this
  • Structure is not coordinated with
    terms/vocabulary
  • Duplication of MD and XML tags
  • Proprietary interests

44
Data Warehouse Architecture: an opportunity to
standardize integration
Load all the data periodically into a
warehouse. Separate operational from decision
support DBMS.
OLAP / Decision support/ Data cubes/ data mining
User queries
Relational database (warehouse)
Data extract, transform, load (ETL)
Data cleaning/ scrubbing
MD capture
Data source
Data source
Data source
45
Kinds of Metadata
  • There are several kinds of meta-data that people
    commonly talk about.
  • One is structural meta-data
  • schemas, interface definitions, and other
    data-structure-like things, which describe how
    information is put together.
  • E.g. database schema in a database system,
  • E.g. a Web site map that describes how the Web
    pages are connected to one another
  • Process meta-data
  • Descriptive Definitional
  • As seen in information retrieval.
  • It's things like keyword descriptions and other
    content-oriented descriptions of information.

There are different tools for each major kind,
although there is some integration
46
Neat Attempt: OMG CWM Metadata Standard
  • Metadata is used for building, maintaining,
    managing, and using DB collections such as data
    marts and warehouses.
  • Most data management, analysis, and MD-driven
    tools have their own infrastructure and use
    different MD representations
  • Metamodels are needed to exchange MD
  • Object Management Group (OMG) developed a
    standard, the Common Warehouse Metamodel (CWM),
    to help manage MD
  • It provides a framework for representing metadata
    about data sources, data targets, transformations
    and analysis, and the processes and operations
    that create and manage warehouse data and provide
    lineage information about its use.
  • Note, there are new algebras to manipulate MD (P.
    Bernstein of Microsoft)

47
DOD HA Data Warehouse Repository
MHS Metadata Repository as Metadata Hub
48
An overall MD Repository process
Had someone already written this Schema?
Outlined in the XML/EDI Group's Repository white
paper in 1999
49
Common Warehouse Metamodel (Neat! An attempt to
solve the consistency problems)
Standard definitions on metadata for all these
subjects (UML formalism).
50
Example of Areas Covered
51
Contact Info
52
Keys Index Model
Index: Instances of the Index class represent the
ordering of the instances of some other Class;
the Index is said to span the Class. Indexes
normally have an ordered set of attributes of
the Class instance they span that make up the
key of the index; this set of relationships is
represented by the IndexedFeature class, which
indicates how the attributes are used by the
Index instance. The Index class is intended
primarily as a starting point for tools that
require the notion of an index.
53
About Extensible Markup Language (XML)
  • tag-based, data format, simple SGML subset, for
    structured document/web interchange
  • XML shares some things in common with the display
    format-oriented HTML. Both formats save their
    information in plain text files.
  • XML is focused on document structure rather than
    on document formatting.
  • XML defines a set of tags used for representing
    text as various pieces of information - an
    address, a phone number, a price, etc.
  • XML creates an environment where text may be
    communicated as information.
  • But language requires syntax, vocabulary, and
    semantics.
  • Tag myopia only formally defines syntax (other
    part is HARD)

54
XML DTDs/Schemas take a step in the Neat direction
Schemas help by relating common terms between
documents by tag labels
<CV>
private
after Frank van Harmelen and Jim Hendler
55
XML semi-structure
  • XML allows for structure. With XML, the embedding
    of one element in another declares the structure
    of the data.
  • Simply having the Address element as a
    sub-element of Patient "tells" the receiving
    application that this address belongs to this
    person.
  • <Patient> <Address zip="20854"> 12 N. Grove Road
    Potomac MD </Address> <Address> 41 S. Soldier
    Road Arlington VA </Address> </Patient>
  • flat files don't easily allow for structure.
  • A piece of information like Patient or Address
    marked by the presence of tags is called an
    element.
  • Elements are further enriched by attaching
    name-value pairs (for example, zip="20854" in the
    example above) called attributes.
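
How a receiving application reads this nesting can be sketched with Python's standard `xml.etree.ElementTree` module. The snippet assumes the Patient example above written as well-formed XML, with `zip` as an attribute of the first Address; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The Patient example from the slide, written as well-formed XML.
xml_text = """
<Patient>
  <Address zip="20854">12 N. Grove Road Potomac MD</Address>
  <Address>41 S. Soldier Road Arlington VA</Address>
</Patient>
"""

root = ET.fromstring(xml_text)

# Each Address is a sub-element of Patient, so it "belongs" to that
# patient; the zip attribute is optional and may be absent (None).
addresses = [(a.text.strip(), a.get("zip")) for a in root.findall("Address")]

for street, zip_code in addresses:
    print(root.tag, "->", street, "| zip:", zip_code)
```

The parser recovers the structure (which address belongs to which patient) purely from the embedding of elements; it knows nothing about what an address is, which is the semantic gap these slides keep returning to.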

56
Pure XML -- Schema Language: Document Type
Definitions (DTDs)
  • <!ELEMENT element-type content-model>
  • Defines content model of an element type
  • Element-type is the name of the element (or tag)
  • Content-model is a regular expression defining
    structure of sub-elements
  • Data if a leaf
  • <!ATTLIST element-type attribute-name
    attribute-type>
  • Defines for elements named element-type
    associated attributes and their types
  • Element-type is the name of the element (or tag)

57
XML DTDs exist for things like
  • <Prescription>
  • <Medication.Name>Amoxicillin</Medication.Name>
  • <Form>250 mg. Capsule</Form>
  • <Dispense>30</Dispense>
  • <Dosage Amount="1">1 cap(s)</Dosage>
  • <Instructions>3 times daily until
    gone</Instructions>
  • <Refill Number="0">no refills</Refill>
  • <Substitute>can substitute generic
    equivalent</Substitute>
  • </Prescription>

Drug form?
Early tags were not carefully named. Creates a
legacy problem. Coding helps make it a
Processable form, but there are no semantics.
58
Can map Relational Data to XML
R
<R> <tuple> <A> a1 </A> <B> b1 </B> <C> c1
</C> </tuple> <tuple> <A> a2 </A> <B> b2
</B> <C> c2 </C> </tuple> </R>
(XML Tree)
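
The relation-to-XML mapping shown above can be sketched in a few lines. This assumes a relation held in memory as a list of column names plus a list of row tuples; the function name `relation_to_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

def relation_to_xml(name, columns, rows):
    """Build <name><tuple><col>value</col>...</tuple>...</name>."""
    root = ET.Element(name)
    for row in rows:
        tup = ET.SubElement(root, "tuple")
        for col, value in zip(columns, row):
            # One child element per column, holding the cell value as text.
            ET.SubElement(tup, col).text = str(value)
    return root

# Relation R(A, B, C) with the two tuples from the slide.
r = relation_to_xml("R", ["A", "B", "C"],
                    [("a1", "b1", "c1"), ("a2", "b2", "c2")])
xml_string = ET.tostring(r, encoding="unicode")
print(xml_string)
```

The mapping is mechanical: the table name becomes the root, each row a `tuple` element, each column a child tag. Column names carry over as tag names with no added meaning.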
59
Tags are Names Requires Work
  • Assuming you have developed a robust markup
    language for data exchange, you still need to
    perform the following tasks
  • Metadata Mapping. Like DWs you must understand
    the metadata between systems that will
    communicate using your new markup language. You
    must map the physical storage of the application
    to the elements and attributes in the markup
    language.
  • Data Mapping. You must map the data content as
    well, e.g. if the source and target systems
    expect a different set of valid values for an
    element, you must provide the rules for the
    translation. You may also have to combine data
    mapping and metadata mapping -- for example, when
    a set of source data values maps to one place in
    the target under one set of conditions, and to a
    different place in the target under a different
    set of conditions. Data Mapping can be especially
    problematic when the source contains many default
    values -- but the default value is not valid in
    the target.
  • Null Mapping. If one system allows many values to
    be null, but the other system cannot handle
    nulls, some allowance must be made for this.
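
The three mapping tasks above can be sketched together. Everything named here is hypothetical: the source field names, the target element names, the code tables, and the null default are assumptions chosen only to illustrate the pattern.

```python
# Metadata mapping: hypothetical source field -> target element name.
FIELD_MAP = {"pt_sex": "Gender", "pt_zip": "PostalCode"}

# Data mapping: hypothetical translation of valid values per target element.
VALUE_MAP = {"Gender": {"M": "male", "F": "female"}}

# Null mapping: what to send when the target cannot accept a null.
NULL_DEFAULTS = {"PostalCode": "UNKNOWN"}

def translate(record):
    """Translate one source record into the target vocabulary."""
    out = {}
    for src_field, target in FIELD_MAP.items():
        value = record.get(src_field)
        if value is None:
            # Null mapping: fail loudly unless a default was agreed on.
            if target not in NULL_DEFAULTS:
                raise ValueError(f"{target} may not be null")
            value = NULL_DEFAULTS[target]
        else:
            codes = VALUE_MAP.get(target)
            if codes is not None:
                # Data mapping: reject values with no translation rule.
                if value not in codes:
                    raise ValueError(f"no mapping for {target}={value!r}")
                value = codes[value]
        out[target] = value
    return out
```

Keeping the three tables separate mirrors the slide's distinction: renaming fields, translating values, and handling nulls are different agreements between the two systems, and each can be wrong independently.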

60
Making XML Neater Structural Schemas
  • Problems with DTDs
  • no data types or specialization/extension of
    types
  • no "higher level" modeling (classes,
    relationships, constraints, etc.)
  • Integration schemas
  • primitive data types
  • integers, dates, and the like, based on our
    experience with SQL, Java primitives, byte
    sequences
  • cardinality constraints
  • Inheritance.
  • Making kind-of relations explicit would make both
    understanding and maintenance easier
  • markup languages are now commonly pressed into
    service as "data modeling languages" and
    "conceptual modeling languages" although (to some
    of us) the particular features of (SGML/XML)
    markup languages render them unsuitable to the
    task.

61
WWW (S or N?)
  • Starts with minimal formalism and proceeds to
    add scruffy complexity

Qualitative Change
Growth/ Effectiveness concept
Web pages
S
MML/RDF/SOAP.. Semantic Foundation
N
HTML/HTTP
time
Late 90s
62
XML the Semantic Web Thrust
  • Web builds on a simple but neat start. Now messy.
  • "Semantic Web", coined by Tim Berners-Lee, entails
    adding "concept/meaning" information to Web
    content
  • Globally link structured collections of
    information in a general auto-processable way
  • agent systems cooperate to facilitate resource
    discovery, intelligent browsing, e-commerce,
    etc.
  • Use a simple ontological approach: structuring
    relies on the eXtensible Markup Language (XML),
    the Resource Description Framework (RDF), and RDF
    Schema.
  • The RDF model is like an ERA model, but is open
    to interpretation: relationships are not rigid
    definitions, and object attributes are not fixed
    in class definitions.
  • Instead they are linked by a Uniform Resource
    Identifier (URI). Thus anyone can make a
    relationship to a topic, and anyone can provide
    such a view of meaning via a URI.

63
Semantic Web (SW) would do what?
  • Concept-based search
  • not keyword-based search
  • Semantic navigation, not link-based
    navigation
  • Personalization
  • not one size fits all
  • Query answering
  • not document retrieval
  • Services
  • not CGI calls, but service-description languages,
    negotiation, service composition, etc.

After Tim Finin
64
Semantic Summarization Tagging
  • Metadata would greatly enhance the ability to
    link Web content semantically --- or by meaning,
    rather than just by keyword or Web master
    arrangement
  • Automated semantic tagging
  • Semantic information has traditionally been added
    manually --- VERY costly and not practical for
    the way content is created today
  • Example -- MeSH "coding" by the NLM
  • Unclear whether it can be done consistently among
    various human indexers
  • Scruffy tagging and summarization tools may be
    beneficial -- even if not perfect
  • Return to this in Topic Map discussion

65
Three Layered SW Architecture
Logic Layer: formal semantics and agent reasoning
support (DAML-OIL)
RDF Schema Layer (Brickley & Guha, 2000): defines
simple ontological vocabulary (Class/Sub-class)
to help ensure model/MD consistency
Data/Resource Instance Layer: uses a subject,
property, object model and statement syntax for
metadata (RDF)
66
Resource Description Framework (RDF/RDF-Schema)
  • Metadata model
  • The designer can describe objects, add properties
    to define and describe them, and also make
    statements about the objects (statements about
    relationships between resources).
  • The specification comes in two sections
  • Basic instance model/syntax (viewed as directed,
    labeled graphs)
  • RDF Schemas

67
Resource Description Framework (RDF)
  • Metadata is useful for information retrieval
    (esp. if no other schema info or semantics is
    available)
  • Idea: representation-independent encoding of MD
    as triples (Resource, PropertyType,
    Value)
  • (NIH, Protocolcreator, Cancer Protocol), (Cancer
    Protocol, DescriptionTitle, breast cancer), ...

DCName
Cancer Protocol
www.NIH..
DescriptionTitle
Breast Cancer
Maps into Logic
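
The triple idea above can be sketched without any RDF library: each statement is a plain (Resource, PropertyType, Value) tuple, and a query is a pattern with wildcards. The two triples are taken from the slide as written; the `match` helper and its parameter names are assumptions for illustration.

```python
# The two example triples from the slide, as plain tuples.
triples = [
    ("NIH", "Protocolcreator", "Cancer Protocol"),
    ("Cancer Protocol", "DescriptionTitle", "breast cancer"),
]

def match(store, resource=None, prop=None, value=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (resource is None or t[0] == resource)
        and (prop is None or t[1] == prop)
        and (value is None or t[2] == value)
    ]

# Everything stated about the resource "Cancer Protocol":
about_protocol = match(triples, resource="Cancer Protocol")

# Every statement whose value is "Cancer Protocol":
pointing_to_protocol = match(triples, value="Cancer Protocol")
```

Because every statement has the same three-part shape regardless of where it came from, metadata from different sources can be pooled in one store and queried uniformly, which is the sense in which the encoding is representation independent.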
68
Resource Description Framework: Metadata Role
  • RDF is essentially an extended layer on top of
    XML and uses a simple data model expressed in
    XML syntax as the basis for a language for
    representing properties of Web
    resources/collections.
  • Resources include images, documents and the
    relationships held between them.
  • RDF provides interoperability between
    applications that exchange information.
  • When XML data is in RDF format, applications can
    understand the data without knowing who sent it.
  • XML points to a resource to scope and uniquely
    identify a set of properties known as the schema.

69
Example of an RDF Model

Sanctioned-by
Www.HHS/HC-gp
RDF subject
RDF Subject
RDF Predicate
RDF Object
70
The Challenge of a Semantic Web: Semantic web
languages today
  • Limited semantics
  • Besides RDF there is
  • DAML: DARPA Agent Markup Language,
    http://www.daml.org/ (OIL)
  • with another under development by the W3C
  • OWL: Web Ontology Language,
    http://www.w3.org/2001/sw/
  • Reasoning limited to inheritance

71
Topic Maps (TM) Approach
  • One of my favorite examples of a light
    formalization/reification attempt, still
    underway
  • TMs are a collection of topics (semantically
    meaningful) and their relationships, with a
    standardized notation for interchangeably
    representing information about the structure of
    information resources used to define topics
  • Topic Maps link these topics with external
    reference objects, such as resources behind
    URLs
  • XTM - XML-based interchange format for topic maps
  • look like a semantic net

Essentially we have a weak semantic model of
indexed document topics
72
Recapitulation: Why Coordination and
Interoperability Is Hard
  • As before, groups find it easier to do each part
    themselves
  • at least the first time; now they are
    legacies
  • Lack of integrating (Enterprise Architecture)
    approaches now means gaps between RDF etc.
  • No vocabulary scheme is yet comprehensive
  • Semantics of drugs, lab tests, signs and symptoms
  • Lack of common metadata models
  • some standards and products are competing for
    this
  • Structure is not coordinated with
    terms/vocabulary
  • Duplication of MD and now RDF language
  • Proprietary interests

73
Many Efforts Need Ontologies for Assimilation
  • The essence of the interoperability problem is
    semantics (e.g. as in a shared conceptual model
    of a particular application domain --- idea:
    knowledge base), not syntax
  • Ontologies provide a vocabulary for representing
    knowledge about a domain and for describing
    specific situations in a domain (a tool for
    defining and describing domain-specific
    vocabularies) --- idea: language for
    communication
  • For data/knowledge translation and transformation
    (providing a solution to the translation problem
    between different terminologies), and for fusion
    and refinement of existing knowledge --- idea:
    interoperation
  • As reusable building blocks for systems that
    solve particular problems in the application
    domain --- idea: model reuse
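
The translation idea can be sketched as follows: each source system maps its local terms to shared ontology concepts, and a term is translated by going through the concept rather than through a pairwise term-to-term table. Every term and concept identifier below is hypothetical.

```python
# Each source vocabulary maps its local terms to shared ontology concepts.
# All terms and concept identifiers here are hypothetical.
LAB_SYSTEM = {"Hgb": "concept:hemoglobin", "Gluc": "concept:glucose"}
CLINIC_SYSTEM = {
    "haemoglobin": "concept:hemoglobin",
    "blood sugar": "concept:glucose",
}


def translate(term, source, target):
    """Translate a term between vocabularies by way of the shared concept."""
    concept = source.get(term)
    if concept is None:
        return None  # term is not anchored in the ontology
    for target_term, target_concept in target.items():
        if target_concept == concept:
            return target_term
    return None  # target vocabulary has no term for this concept


print(translate("Hgb", LAB_SYSTEM, CLINIC_SYSTEM))
print(translate("blood sugar", CLINIC_SYSTEM, LAB_SYSTEM))
```

With N vocabularies this needs N mappings to the ontology instead of a separate table for every pair of systems.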

74
Some Appropriate Middle Modeling Steps
  • Recognize the interplay between scruffy (S) and
    neat (N) approaches and their evolution
  • Push to capture essential semantic
    relationships, e.g. express, constrain, and
    validate the relations
  • Push to free effort from artificial syntax
    requirements locked into transfer, interchange,
    and import/export (make it concept based)
  • Support principles of semantic transparency as
    a/the preeminent concern for scruffy needs
  • Make models accessible to and tuned for use by
    the principal domain experts and 'end users' as
    stakeholders
  • Work needs to become sufficiently formal to
    support testing for conceptual integrity
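
One way to read "express, constrain, validate the relations" is as a check of each statement against a declared signature for its relation. The Python sketch below assumes a very simple scheme in which every relation has exactly one allowed subject type and one allowed object type; the relations, types, and instance names are hypothetical.

```python
# Hypothetical relation signatures:
# relation -> (allowed subject type, allowed object type)
SIGNATURES = {
    "purchases": ("Customer", "Product"),
    "employs": ("Organization", "Person"),
}

# Hypothetical typing of individual instances.
TYPES = {"acme": "Customer", "widget": "Product", "alice": "Person"}


def validate(statements, signatures=SIGNATURES, types=TYPES):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for subj, rel, obj in statements:
        if rel not in signatures:
            problems.append(f"unknown relation: {rel}")
            continue
        want_subj, want_obj = signatures[rel]
        if types.get(subj) != want_subj:
            problems.append(f"{subj} is not a {want_subj} (subject of {rel})")
        if types.get(obj) != want_obj:
            problems.append(f"{obj} is not a {want_obj} (object of {rel})")
    return problems


print(validate([("acme", "purchases", "widget")]))
print(validate([("acme", "purchases", "alice")]))
```

Checks of this kind are a small step toward the testing for conceptual integrity mentioned above, while keeping the error messages readable for domain experts.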

75
Mapping Account Glossary Content
Conceptual Graph Model (mappable to RDF (Corby et
al., 2000))
Account: A customer, usually an institution or
another organization, that purchases a company's
products or services.
Account: Customer - (prototype) -> institution,
organization
State: Customer -> (Agent) -> purchase ->
product/service (poss) -> institution
Uses a class hierarchy such as: Entity, Legal
Entity, Organization, Customer Organization.
Human action: Purchase
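
As a hedged sketch of how such a conceptual graph might map to RDF-style statements, the Python below stores each edge of the Account graph and emits one triple per edge. The example.org namespace is hypothetical, the slide names no relation between purchase and product/service (the label "object" is an assumption), and the (poss) link is left out because its endpoints are unclear in the transcript. This is not the mapping described by Corby et al.

```python
from collections import namedtuple

Edge = namedtuple("Edge", "source relation target")

# Edges read off the slide's linear notation. The relation name "object"
# is an assumed label; the slide does not name that link.
ACCOUNT_GRAPH = [
    Edge("Customer", "prototype", "Institution"),
    Edge("Customer", "prototype", "Organization"),
    Edge("Customer", "agent", "Purchase"),
    Edge("Purchase", "object", "Product/Service"),
]

BASE = "http://example.org/glossary#"  # hypothetical namespace


def to_uri(name):
    """Build a URI for a concept or relation name (slashes are spelled out)."""
    return BASE + name.replace("/", "Or")


def to_triples(graph):
    """Turn each conceptual-graph edge into one RDF-style triple."""
    return [(to_uri(e.source), to_uri(e.relation), to_uri(e.target)) for e in graph]


for triple in to_triples(ACCOUNT_GRAPH):
    print(triple)
```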
76
On the Other Hand...
  • Formal Knowledge Engineering hasn't been a clear
    victory
  • KE may need to be a permanent task
  • Continuous Knowledge Engineering is an
    alternative approach to KE that embraces the
    philosophy that knowledge systems are open-ended,
    dynamic artifacts that develop through a learning
    process in reaction to their environment.
  • Implicit knowledge must first be made explicit,
    which is a time-consuming task requiring a
    serious knowledge engineering effort.

77
Opportunities
  • The vast amounts of information with little or no
    structure published over the World Wide Web
    raise a host of new, challenging problems for
    data-mining research. Examples include web
    resource discovery and topic distillation, web
    structure/linkage mining, intelligent web
    searching and crawling, and personalization of
    web content.
  • Knowledge Discovery in Biological Data Management
    Systems and Bioinformatics: High-performance data
    mining tools will play a crucial role in the
    analysis of the ever-growing databases of genetic
    sequences accumulated over the course of large
    bioinformatics efforts (e.g., the Human Genome
    Project).

78
Discussion
  • Scruffy vs. Neat Reasoning
  • Knowledge Soup: The Chaos and Complexity of the
    Human Mind
  • John F. Sowa,
    http://residentassociates.org/com/Soup2.htm
  • Neat vs Scruffy: A review of Computational Models
    for Spatial Expressions. Amitabha Mukerjee,
    Center for Robotics,
    http://www.cs.albany.edu/amit/review.html
  • J. McCarthy, From Here to Human-Level
    Intelligence, Proceedings of the Fifth
    International Conference on Principles of
    Knowledge Representation and Reasoning (KR'96),
    Cambridge, MA, November 1996, Morgan Kaufmann,
    San Mateo, CA (1996), pp. 640-646.
  • Info Access References
  • C. Paice, "Constructing literature abstracts by
    computer: Techniques and prospects," Information
    Processing and Management, vol. 26, pp. 171-186,
    1990.
  • SUMMARIST: Automated Text Summarization,
    http://www.isi.edu/cyl/summarist/summarist.html
  • Church K. and M. Liberman (1991) "A Status Report
    on the ACL/DCI". In Proc. of the 7th Annual
    Conference of the UW Centre for the New OED and
    Text Research Using Corpora, pp. 84-91.

79
Data Models and Metadata References
  • Phil Bernstein, "Representing and Reasoning About
    Mappings between Domain Models," 18th National
    Conference on Artificial Intelligence (AAAI
    2002), Edmonton, Canada,
    http://www.cs.washington.edu/homes/jayant/Pubs/SemanticsAAAI02.pdf
  • MetaIntegration tools,
    http://www.metaintegration.net/
  • Implementation of MeSH information in the Dublin
    Core Metadata. Malet G, Munoz F, Appleyard R,
    Hersh W. J Am Med Inform Assoc 1999
    Mar-Apr;6(2):163-72
  • Ontologies and Knowledge Models
  • Towards continuous knowledge engineering, by Klaas
    Schilstra, thesis, Delft University of Technology,
    http://www.kbs.twi.tudelft.nl/Publications/PhD/2002-Schilstra-PhD.html
  • Towards Situated Knowledge Acquisition, Tim
    Menzies,
    http://www.phil.canterbury.ac.nz/tom_bestor/etexts/Menzies2020Towards20Situated20Knowledge20Acquisition.htm
  • Formal Ontology, Conceptual Modelling, and
    Knowledge Engineering,
    http://www.ladseb.pd.cnr.it/infor/ontology/Papers/OntologyPapers.html
  • DAML Homepage, http://www.daml.org/

80
XML Registries
  • W3C: http://www.wc3.org, http://www.wc3.com
  • US Federal CIO Council:
    http://xml.coverpages.org/CIO-Council-XML-DevelopersGuidenceVersion1.pdf
  • ASC X12 Reference Model:
    http://www.x12.org/x12org/comments/X12Reference_Model_For_XML_Design.pdf
  • Metadata registries (USHIK etc.):
    http://www.bls.gov/ore/pdf/st000010.pdf

81
RDF Metadata Registries
  • The Open Metadata Registry Prototypes, Dublin
    Core Metadata Initiative:
    http://wip.dublincore.org:8080/registry/Registry
  • SCHEMAS Project Registry:
    http://www.schemas-forum.org/registry/
  • DESIRE Registry: http://www.ukoln.ac.uk
  • SWAG WebNS Registry: http://webns.net/
  • Xmlns.com Registry: http://xmlns.com/
  • ULIS Open Metadata Registry:
    http://avalon.ulis.ac.jp/registry/

82
Semantic Web and Topic Maps
  • W3C Semantic Web Site: http://www.w3.org/2001/sw/
  • SemanticWeb.org: http://www.semanticweb.org/
  • The Emerging Semantic Web: Selected papers from
    the first Semantic Web Working Symposium. Edited
    by Isabel Cruz, Stefan Decker, Jérôme Euzenat,
    and Deborah McGuinness. Volume 75 in the
    Frontiers in Artificial Intelligence and
    Applications series, IOS Press, Amsterdam (NL),
    2002, 300 pp., hardcover, ISBN 1 58603 255 0
  • Markup Languages Comparison and Examples:
    http://trellis.semanticweb.org/expect/web/semanticweb/comparison.html
  • Work in progress on Topic Maps:
    http://www.topicmaps.net/