Title: Topic Maps: the GPS of the Web
1Topic Maps: the GPS of the Web
2Outline
- Overview
- XTM
- Case Study 1
- Case Study 2
- Case Study 3
- Topic Map Software
- Topic Map Visualization
3Overview
- Knowledge is becoming the key asset of organizations, but we find ourselves drowning in an excess of information. Given such a situation, how do we
  - Find that needle of relevant information in the haystack of infosmog
  - Capture and manage precious corporate memory
  - Build a bridge between knowledge and information
- Topic maps can help with all three
  - Revolutionize the ways in which we search for and navigate information
  - Can model and represent knowledge in an interchangeable form
  - Provide a unifying framework for representing knowledge and linking it with the information resources in which it is embodied
4Overview (Cont.)
- Topic Maps (TM) is a technology for semantically characterizing and categorizing documents, and sections of documents, on the Web with respect to their content; in other words, what topics or subject areas those documents actually address
- A TM acts as a set of linked topics that index a document collection
- One can have multiple TMs indexing the same Web document collection
- A TM can be viewed as an information overlay on documents or arbitrary information resources
- A TM acts as a taxonomy: a way of describing, classifying, and indexing an information space consisting of Web and non-Web objects
5Overview (Cont.)
- Topic maps have a lot in common with the semantic networks used to represent knowledge in the field of artificial intelligence (topics and associations).
- But they also add a new axis to the model of semantic networks -- that of occurrences -- which provides a bridge to the domain of information management.
- Knowledge management versus information management
  - Knowing a thing versus simply having information about that thing
6Overview (Cont.)
- The entities involved in an organization (people, roles, products, etc.) can be represented as topics (T)
- The complex and shifting relationships between those entities can be represented as associations (A)
- The documentation and other information resources that relate to them can be represented as occurrences (O).
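The T/A/O split above can be sketched as a small in-memory model. This is only an illustrative sketch: the class names, the example organization data, and the `example.org` URLs are assumptions made up for the example, not part of any topic map standard or product.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A topic (T): stands for one entity, e.g. a person, role, or product."""
    id: str
    name: str
    occurrences: list = field(default_factory=list)  # (O): URIs of related resources


@dataclass
class Association:
    """An association (A): a typed relationship among topics."""
    type: str
    members: dict  # role name -> topic id


# Hypothetical organization data, invented for illustration only.
alice = Topic("alice", "Alice", ["http://example.org/staff/alice.html"])
widget = Topic("widget", "Widget", ["http://example.org/docs/widget-manual.pdf"])
manages = Association("manages", {"manager": "alice", "product": "widget"})

topics = {t.id: t for t in (alice, widget)}


def resources_related_to(topic_id, associations):
    """Collect the occurrences of every topic associated with topic_id."""
    found = []
    for assoc in associations:
        if topic_id in assoc.members.values():
            for other in assoc.members.values():
                if other != topic_id:
                    found.extend(topics[other].occurrences)
    return found


print(resources_related_to("alice", [manages]))
```

Starting from the topic for a person, the association leads to the product topic, and the product's occurrences lead to the documents themselves.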
7An Opera Index
8Features of Index
- Typographical conventions are used to distinguish between different types of topic (names of operas are shown in italic)
- Typographical conventions are used to distinguish between different types of occurrence (references to synopses are shown in bold)
- The use of "see" references handles synonyms by allowing multiple points of entry (by different names) to the same topic
- "See also" references point to associated topics
- Subentries provide an alternative mechanism for pointing out associations between different topics (e.g. between a composer and his works)
9Features of Index (Cont.)
- A book may contain multiple indexes
- Homonyms can be distinguished through the use of explanatory labels following the names, e.g. Tosca (opera) and Tosca (character)
- The locators (page numbers) may contain modifiers that help distinguish between different types of occurrence, for example 54n for a footnote on page 54
- The nature of an occurrence (i.e., the way in which the information is pertinent to its subject) might also be shown using a subentry mechanism (clause, defined in, defined in glossary, used in production)
10Topic Map Standards
- TM began in the pre-XML and pre-WWW era: SGML
- TM today: ISO 13250 is specified in terms of two different interchange syntaxes
  - One based on an SGML DTD (that uses ISO 10744 HyTime)
  - An XML TM syntax: XTM
    - 19 XTM elements
11Components of the TM Standard
- SAM: defines the formal data model of TM and its semantics in natural language
- Reference Model: a more abstract model of TM than SAM, intended to enable TM to semantically interoperate with other knowledge representation formalisms and Semantic Web ontology languages
- TMQL: an SQL-like language for querying topic map information
- TMCL: adds a database-schema-like capability to TM, enabling constraints on the meaning to be defined for a TM
12Components of the TM Standard (Cont.)
- The products of the OASIS technical committees (TCs) are intended to be layered onto the products of the ISO 13250 standard
- The Published Subjects TC will define and manage published subjects, and establish usage requirements for these
- The XML vocabulary TC will define the vocabulary to enable TM to interact with existing and emerging XML standards and technologies
  - The vocabulary will be defined as published subjects
- The GC TC will define geographical (country, region) and language-based published subjects to ensure interoperability across geographical and linguistic boundaries
13TM Concepts: Topic
- Any distinct subject of interest about which assertions can be made (nearly everything in a TM can become a topic)
- A topic is a representation of the subject
  - A topic reifies a subject
  - According to XTM, a topic acts as a resource that is a proxy (information representation) for the subject
- Each topic needs a base name that reflects the intent
  - <baseName> -- <baseNameString>
- What is a tomato???
  - <occurrence> -- <resourceRef>
- The base name string works fine for humans, but machines may need some help: we need to say more precisely what the subject of the topic is
tomato1.xml
tomato2.xml
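The example files named on this slide are not reproduced here. As a rough sketch of what a topic with a base name and one occurrence might look like, the snippet below builds a small XTM 1.0 document with Python's standard `xml.etree.ElementTree`. The topic id `tomato` and the `example.org` URL are assumptions for illustration; the element names and the two namespaces are those of XTM 1.0.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM)        # serialize XTM as the default namespace
ET.register_namespace("xlink", XLINK)


def q(tag):
    """Qualify an XTM element name with its namespace."""
    return f"{{{XTM}}}{tag}"


topic_map = ET.Element(q("topicMap"))
topic = ET.SubElement(topic_map, q("topic"), {"id": "tomato"})

# <baseName> -- <baseNameString>: the human-readable name of the topic
base_name = ET.SubElement(topic, q("baseName"))
ET.SubElement(base_name, q("baseNameString")).text = "tomato"

# <occurrence> -- <resourceRef>: a resource relevant to the topic
# (hypothetical URL, invented for this sketch)
occurrence = ET.SubElement(topic, q("occurrence"))
ET.SubElement(occurrence, q("resourceRef"),
              {f"{{{XLINK}}}href": "http://example.org/tomato.html"})

xtm_text = ET.tostring(topic_map, encoding="unicode")
print(xtm_text)
```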
14Topics
15TM Concepts: Occurrence
- A topic may be linked to one or more information resources that are deemed to be relevant to the topic in some way. Such resources are called occurrences of the topic
  - Addressable (URI): <resourceRef>
  - Not addressable, with a data value specified inline: <resourceData>
16TM Concepts: Subject
- Give the topic an identity that both machines and humans can understand
- Use Published Subject Indicators (PSIs)
  - <subjectIdentity> -- <subjectIndicatorRef>
- The subject of a topic is referred to by having the topic point to a resource that expresses the subject
- The subject of the topic is represented by an occurrence of a resource, and it is the nature of that resource that determines the addressability of the subject
  - resourceRef: the resource constitutes the subject, which is addressable
  - subjectIndicatorRef: the resource indicates the subject, which is not directly addressable
tomato3.xml
17TM Concepts: Subject (Cont.)
- A subject indicator is just a way of indicating subjects
- Topics are really the information representation of subjects
- A subject is indicated by defining a resource
- If two given topics use the same resource, then their subjects (identified or indicated by those resources) are identical
- XTM allows for a published subject indicator (PSI)
- A published subject is simply a subject that has a general definition and usage and is identified by a specific published reference
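The rule that two topics pointing at the same subject-indicating resource have identical subjects can be sketched as a simple grouping. The topic ids and PSI-style URIs below are hypothetical, chosen only to illustrate the rule.

```python
from collections import defaultdict

# Hypothetical topics: topic id -> subject indicator URI (invented values).
subject_indicators = {
    "tomato": "http://example.org/psi/food#tomato",
    "pomodoro": "http://example.org/psi/food#tomato",
    "basil": "http://example.org/psi/food#basil",
}


def same_subject_groups(indicators):
    """Group topic ids that share a subject indicator, i.e. the same subject."""
    groups = defaultdict(list)
    for topic_id, uri in indicators.items():
        groups[uri].append(topic_id)
    # Only groups with more than one topic represent a shared subject.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


print(same_subject_groups(subject_indicators))
```

Here `tomato` and `pomodoro` are reported as one group because both indicate their subject with the same resource.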
18TM Concepts: Scope
- A <baseName> might be required to make sense to at least some humans
- Use <scope> to choose an appropriate <baseName>
  - Default <scope>
  - Depends on the XTM application to choose
- <topicRef> -- points to a <topic> element that in turn has a subject
  - Makes the topic easier to read and write
  - Makes the topic map easier to maintain
- Occurrences can also be of different types, specified by the topicRef markup
tomato4.xml
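Since the slide leaves name selection to the XTM application, the following is only one possible policy, sketched under that assumption: prefer a base name whose scope is fully covered by the current context, and fall back to a name in the default (empty) scope. The names and scoping topics are invented for the example.

```python
# Each base name carries the set of scoping topics in which it is valid.
# An empty set stands for the default (unconstrained) scope.
base_names = [
    ("tomato", frozenset()),
    ("Tomate", frozenset({"german"})),
    ("pomodoro", frozenset({"italian"})),
]


def pick_base_name(names, context):
    """Pick a name valid in the given context (an assumed application policy)."""
    scoped = [name for name, scope in names if scope and scope <= context]
    if scoped:
        return scoped[0]
    unscoped = [name for name, scope in names if not scope]
    return unscoped[0] if unscoped else None


print(pick_base_name(base_names, {"italian"}))
print(pick_base_name(base_names, {"french"}))
```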
19Scopes
20TM Concepts: Association
- An association is the relationship between (one or more) topics
  - <association>, <member>, <roleSpec>
- An association is similar to the database notion of a relation and to the ontology notion of a predicate
- An association role specifies how a particular topic acts as a member of an association, its manner of playing in that association
- tomato5.xml, tomato6.xml
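As an illustration of members and roles, the sketch below parses a small XTM 1.0 association and reports which topic plays which role. The association and the topic ids (`dish`, `ingredient`, `caprese`, `tomato`) are made up for this example and are not taken from the slide's example files.

```python
import xml.etree.ElementTree as ET

NS = {"xtm": "http://www.topicmaps.org/xtm/1.0/"}
HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical association: the dish "caprese" uses the ingredient "tomato".
XTM_DOC = """
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <association id="caprese-uses-tomato">
    <member>
      <roleSpec><topicRef xlink:href="#dish"/></roleSpec>
      <topicRef xlink:href="#caprese"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#ingredient"/></roleSpec>
      <topicRef xlink:href="#tomato"/>
    </member>
  </association>
</topicMap>
"""


def roles_in(association):
    """Map each role (from <roleSpec>) to the topic that plays it."""
    roles = {}
    for member in association.findall("xtm:member", NS):
        role = member.find("xtm:roleSpec/xtm:topicRef", NS).get(HREF)
        player = member.find("xtm:topicRef", NS).get(HREF)  # direct child only
        roles[role] = player
    return roles


root = ET.fromstring(XTM_DOC)
assoc = root.find("xtm:association", NS)
print(roles_in(assoc))
```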
21Associations
22TM Concepts: InstanceOf
tomato7.xml
- Topics can be categorized according to their kind
- Any given topic is an instance of zero or more topic types
- Topic types are themselves defined as topics by the standard
- Usage
  - We can ask the topic map for all the dishes that have tomatoes as ingredients
  - We can ask the topic map for all the desserts
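The two usage questions above can be sketched as queries over a tiny in-memory topic map. The dishes, types, and ingredient associations are hypothetical data invented for the sketch; a real application would read them from XTM or use a query language.

```python
# Hypothetical data: topic id -> set of topic types it is an instance of.
instance_of = {
    "caprese": {"dish"},
    "gazpacho": {"dish"},
    "tiramisu": {"dish", "dessert"},
    "tomato": {"ingredient"},
}

# Hypothetical "has ingredient" associations: (dish, ingredient).
has_ingredient = [
    ("caprese", "tomato"),
    ("gazpacho", "tomato"),
    ("tiramisu", "mascarpone"),
]


def instances(topic_type):
    """All topics that are instances of the given topic type."""
    return sorted(t for t, types in instance_of.items() if topic_type in types)


def dishes_with(ingredient):
    """All dishes associated with the given ingredient."""
    return sorted(
        dish for dish, ing in has_ingredient
        if ing == ingredient and "dish" in instance_of.get(dish, set())
    )


print(dishes_with("tomato"))
print(instances("dessert"))
```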
23TM Concepts: variant (Cont.)
- Give a topic a variant name for use in a certain situation
  - <variant>, <parameters>, <variantName>, <resourceData>
- <resourceData> is a shortcut for <resourceRef>
  - It would be foolish to have to create a file and a URI for every tiny piece of text in the whole topic map, so <resourceData> allows text to be entered directly into the topic map document
- tomato8.xml
24TM Concepts: mergeMap
- The <mergeMap> element causes two or more topic maps to be merged
  - <menu> -- <recipe> -- <price>
- Merge strategy
  - All topics with the same name in the same scope are merged (a name-based merge)
  - All topics with the same subject identity are merged (a subject-based merge)
- mergeMap enables the interchange of knowledge
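The two merge rules can be sketched as follows. This is a simplified illustration, not the full merging procedure of the standard: topics are plain dictionaries, scope is a single string, and the menu, recipe, and price data are invented.

```python
def should_merge(a, b):
    """Subject-based merge: shared subject identity.
    Name-based merge: the same (name, scope) pair."""
    return bool(a["subjects"] & b["subjects"]) or bool(a["names"] & b["names"])


def merge_topics(topics):
    """Fold a list of topics into a list in which no two topics should merge."""
    merged = []
    for topic in topics:
        current = {"names": set(topic["names"]), "subjects": set(topic["subjects"])}
        rest = []
        for other in merged:
            if should_merge(current, other):
                current["names"] |= other["names"]
                current["subjects"] |= other["subjects"]
            else:
                rest.append(other)
        merged = rest + [current]
    return merged


# Hypothetical topics from three topic maps (menu, recipe, price).
PSI = "http://example.org/psi/food#tomato"
menu_tomato = {"names": {("tomato", "")}, "subjects": {PSI}}
recipe_tomato = {"names": {("pomodoro", "italian")}, "subjects": {PSI}}
price_tomato = {"names": {("tomato", "")}, "subjects": set()}
menu_basil = {"names": {("basil", "")}, "subjects": set()}

result = merge_topics([menu_tomato, recipe_tomato, price_tomato, menu_basil])
print(len(result))
```

The three tomato topics collapse into one: two share a subject identity, and the third shares a name in the same scope. The basil topic stays separate.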
25Summary of TM
- Topic maps consist mainly of topics and associations
- A topic map is an overlay on information resources (occurrences)
- A topic is a stand-in, proxy, or surrogate for a subject (PSI)
- Topics have characteristics (names, occurrences, and roles played in associations)
- The author controls the meaning of a topic map through topic characteristics and choices of subject
- Scopes in topic maps define the validity of associations and allow fine-tuning of merge operations
26Summary of TM (Cont.)
- <baseName>
- <baseNameString>
- <occurrence>
- <resourceRef>
- <scope>
- <subjectIdentity>
- <subjectIndicatorRef>
- <topic>
- <topicRef>
- <association>
- <instanceOf>
- <member>
- <roleSpec>
- <mergeMap>
- <parameters>
- <resourceData>
- <variant>
- <variantName>
- <topicMap>
27TM Resources
- Entry points
- http://www.topicmaps.org/
- http://www.topicmaps.net/
- http://www.oasis-open.org/cover/topicMaps.html
- Sites for TM vendors and service providers
- http://www.infoloom.com/
- http://www.ontopia.net/
- http://www.semantext.com/
- http://www.empolis.com/
- http://www.cogx.com/
- http://globalwisdom.org/
28Case Study 1: Topic Maps in the Life Sciences
- Chapter 8 of XML Topic Maps: Creating and Using Topic Maps for the Web
29Overview
- Create a design for a new Web site that would allow learners all over the world to participate in the collection and representation of knowledge about the life sciences
- Develop a series of topic maps that will allow us to represent and navigate a large knowledge space
- If we are going to build a Web site where lots of different information can be captured using topic maps, it must begin with a knowledge structure that allows us to classify all related things
30Linnaean Classification of Humans
31The Five Kingdoms
- Kingdom → Phylum → Subphylum → Class → Subclass → Infraclass → Order → Suborder → Superfamily → Family → Genus → Species
32Some of the Phyla for the Animalia Kingdom
33The Chordata Phylum
34Creating Topic Maps for A Web Site
- User navigation
  - Start with the big picture
  - Drill down to more detail by selecting topic maps that are referenced as occurrences of some topic in the visible topic map
- Drill-down scheme
  - A more detailed topic map is referenced as an occurrence of a particular topic in a less detailed topic map
- Developing the XTM Document
  - Bottom-up
  - Animalia TM → FiveKingdoms TM
35The Top-Level Topic Map
36FiveKingdoms TM Pointing to the Animalia TM
37Steps
- Create a shell for the Animalia topic map
- Create a shell for the FiveKingdoms topic map
- Create the TopicMap topic
  - Since one of its occurrences will be an instance of a topic map, the bottom-up design approach suggests that we first define the TopicMap topic with a PSI
- Create the AnimaliaTopicMap topic
38Steps (Cont.)
- Create the Animalia topic
  - Construct the topic that will serve as a container for one or more occurrences of type TopicMap and for other associated information
  - With this topic, we are now able to construct an occurrence that links the topic Animalia with the topic map Animalia
- Define an occurrence
  - Select the Animalia topic and create a new occurrence
  - Set the new occurrence as an instance of AnimaliaTopicMap
  - Select the Topic Map Occurrence → Animalia topic map
39Creating the new Animalia Topic Map
40Creating the new FiveKingdoms Topic Map
41Creating the new TopicMap Topic
42Creating the new AnimaliaTopicMap Topic
43Creating the new Animalia Topic
44Setting the InstanceOf Parameter in the New
Occurrence Editor Window
45Selecting a Topic Map
46Where are We Now?
- We created two topic maps
  - Animalia
  - FiveKingdoms
- Within the FiveKingdoms topic map, we created three topics
  - TopicMap
  - AnimaliaTopicMap
  - Animalia
- We created an occurrence, which offers the topic map Animalia as its resource reference
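The structure built so far, and the drill-down it enables, can be sketched with plain dictionaries. This is a hedged illustration of the design, not the editor's actual data format: the dictionary layout and the `drill_down` helper are assumptions, while the topic and topic map names come from the steps above.

```python
# Two topic maps; FiveKingdoms holds the three topics created in the steps.
TOPIC_MAPS = {
    "FiveKingdoms": {
        "TopicMap": {"instance_of": None, "occurrences": []},
        "AnimaliaTopicMap": {"instance_of": "TopicMap", "occurrences": []},
        "Animalia": {
            "instance_of": None,
            # The occurrence offers the Animalia topic map as its resource.
            "occurrences": [
                {"instance_of": "AnimaliaTopicMap", "resource": "Animalia"}
            ],
        },
    },
    "Animalia": {},  # still an empty shell at this point
}


def drill_down(map_name, topic_id):
    """Return the more detailed topic maps reachable from a topic's occurrences."""
    topics = TOPIC_MAPS[map_name]
    targets = []
    for occ in topics[topic_id]["occurrences"]:
        occ_type = topics.get(occ["instance_of"], {})
        # Follow only occurrences whose type is an instance of TopicMap.
        if occ_type.get("instance_of") == "TopicMap" and occ["resource"] in TOPIC_MAPS:
            targets.append(occ["resource"])
    return targets


print(drill_down("FiveKingdoms", "Animalia"))
```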
47What's Next?
- Creating and maintaining Enterprise Web Sites with Topic Maps and XSLT
  - Chapter 9 of XML Topic Maps: Creating and Using Topic Maps for the Web
- The Cogitative Topic Map Web sites (CTW)
  - Topic map source code (markup) that controls Web site content and site maps
  - XSLT stylesheets that control Web page layout and look-and-feel style
  - The whole Web universe of resources referenced by XTM topic <occurrence> resource locators
48Case Study 2: Linking Clinical Data Using XML Topic Maps
- R. Schweiger, S. Hoelzer, D. Rudolf, J. Rieger, J. Dudeck
- Artificial Intelligence in Medicine 28 (2003) 105-115
49Introduction
- Develop a search engine that allows indexing, searching and linking different kinds of clinical data
- Text matching methods fail to represent implicit relationships between data, e.g. HIV ↔ AIDS
- Topic maps provide a data model that allows representing arbitrary relationships between resources.
- Such relationships form the basis for a context-sensitive search and accurate search results
- Relationships between the data are often hard-wired in the application logic. XML, on the other hand, allows representing such relationships in a more flexible way.
50Representing Relational Knowledge using TM
- Text matching relates search terms to resources and is limited in its ability to identify relationships between the terms
  - Search results are often inaccurate
- Need a data model that can represent arbitrary relationships between terms and other resources → topic map
  - AIDS phobia: AIDS ↔ phobia
  - AIDS phobia ↔ F45.2 (somatoform disorder)
  - HIV ↔ AIDS
51TM Relating "HIV" to "AIDS", and "AIDS" to
"image.gif"
TM enables a machine to reason that the term
HIV is indirectly related to the resource
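The indirect relation can be sketched as a graph walk: follow associations from the search term and collect the occurrences of every topic reached. The data structures below are a minimal assumption for illustration; only the names "HIV", "AIDS", and "image.gif" come from the slide.

```python
from collections import deque

# Associations between topics (treated as bidirectional here).
associations = {"HIV": ["AIDS"], "AIDS": ["HIV"]}
# Occurrences: topic -> resources.
occurrences = {"AIDS": ["image.gif"]}


def related_resources(term):
    """Breadth-first walk over associations, gathering occurrences on the way."""
    seen = {term}
    queue = deque([term])
    found = []
    while queue:
        topic = queue.popleft()
        found.extend(occurrences.get(topic, []))
        for neighbour in associations.get(topic, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return found


print(related_resources("HIV"))
```

"HIV" has no occurrence of its own, but the walk reaches "AIDS" through the association and finds the image there.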
52"Human-immunodeficiency-virus" represents the
same concept as "HIV"
<association id="F45.2">
  <member>
    <topicRef xlink:href="AIDS-phobia" />
  </member>
</association>
53A few words about TM
- The most significant characteristic of a <topic> is its identifier, which specifies the topic as a fragment of the topic map and allows the topic to be addressed
- Topics have some implied meaning, also referred to as the subject, that might be described somewhere else. The act of relating a topic to a subject is called reification
- In the long run, standards bodies will create so-called published subject indicators (PSIs), i.e. standard topics that are reused by topic map designers all over the world.
- Topic maps can also map between different topics representing the same subject: <subjectIdentity>
54Context Sensitive Searching
- TM provides a flexible data model for representing arbitrary relationships between resources
- Need an inference method that uses the given relationships
- Concept: a meaningful relationship of terms
- Context-sensitive searching → concept matching
  - Association phase: finds a set of concepts that relate the search terms meaningfully with each other
  - Occurrence phase: relates the resulting concepts to resources such as documents and images
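The two phases can be sketched as two small functions. This is a deliberately simplified reading of the idea, not the authors' algorithm: a concept is modelled as a set of terms it relates, and a concept matches when it covers all search terms. The concept names and resource paths are invented.

```python
# Hypothetical concepts: each relates a set of terms and has occurrences.
concepts = {
    "AIDS-phobia": {"terms": {"AIDS", "phobia"}, "resources": ["icd/F45.2.html"]},
    "HIV-infection": {"terms": {"HIV", "AIDS"}, "resources": ["image.gif"]},
}


def association_phase(search_terms):
    """Find the concepts that relate all search terms with each other."""
    wanted = set(search_terms)
    return sorted(cid for cid, c in concepts.items() if wanted <= c["terms"])


def occurrence_phase(concept_ids):
    """Relate the matching concepts to their resources."""
    return [res for cid in concept_ids for res in concepts[cid]["resources"]]


def concept_search(search_terms):
    return occurrence_phase(association_phase(search_terms))


print(concept_search(["AIDS", "phobia"]))
```

A plain text match on "AIDS" and "phobia" would also return documents that merely mention both words; here only the concept that relates them contributes results.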
55Text Matching VS. Concept Matching
56Context Sensitive Searching (Cont.)
- Concept matching allows a machine to understand precise queries and to produce accurate results.
- Search context: a given set of terms
- Concept matching aims to find a context in a resource that relates the search terms meaningfully with each other, i.e. a resource context that matches the search context.
57Semantic Linking
- TM can represent class-instance relationships
- As a result, we can categorize data and relationships between data
- Classified relationships, i.e. semantic links, can be used to define customized search pathways
58Semantic Network Linking ICD and DRG
- Enable a physician to enter diagnoses and codes and to find related information.
- Encode the semantic network using TM
- Search target: "DRG"
  - icdTitle → (synonym) → icdTitle → (has) → icdCode → (2drg) → drgCode → (drg) → drgUri.
- ICD: International Classification of Diseases
- DRG: Diagnosis Related Groups
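The search pathway above can be sketched as a walk over typed links, taking one link type per step. The link types (`synonym`, `has`, `2drg`, `drg`) are those named on the slide; the titles, codes, and URL in the data are placeholders invented for the example and carry no clinical meaning.

```python
# Typed links: (source, link type, target). All values are placeholders.
links = [
    ("heart attack", "synonym", "myocardial infarction"),
    ("myocardial infarction", "has", "ICD-X"),
    ("ICD-X", "2drg", "DRG-X"),
    ("DRG-X", "drg", "http://example.org/drg/DRG-X"),
]

# icdTitle -> (synonym) -> icdTitle -> (has) -> icdCode -> (2drg) -> drgCode -> (drg) -> drgUri
PATHWAY = ["synonym", "has", "2drg", "drg"]


def follow(start, pathway):
    """Follow one link type per step and return whatever is reached at the end."""
    current = {start}
    for link_type in pathway:
        current = {
            target for source, ltype, target in links
            if ltype == link_type and source in current
        }
    return sorted(current)


print(follow("heart attack", PATHWAY))
```

Defining the pathway as data means a different search target only needs a different list of link types, not new traversal code.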
59LuMriX Search for Diagnoses and Codes
60LuMriX Search for Drug Information
61Case Study 3: Navigation and Interaction in Medical Knowledge Spaces Using Topic Maps
- J. Beier and T. Tesche
- International Congress Series 1230 (2001) pp. 384-388
62Introduction
- The medical occupation requires widespread and up-to-date access to various information sources
  - Textbooks, journals, guidelines, medical indexes (PubMed/Medline), selected internet sites, newsgroups, colleagues and medical experts
- Efficient use of these information resources in clinical routine is hampered by the heterogeneity and spatial distribution of the data, implying the necessity to utilize different retrieval techniques (libraries, telephone, internet).
- Using these conventional techniques, comprehensive research requires a considerable amount of time, which, under most circumstances, is not available
63Introduction (Cont.)
- Limitations of medical search engines
  - Internet search engines do not make use of a medical thesaurus but perform simple Boolean text pattern matching.
  - Internet search engines collect each web page, regardless of whether its content is medical or non-medical.
  - The search string remains in the language in which it was entered. A translation to other languages of interest is usually not performed.
  - In most retrieval tools, the context of the search (scope, aims) is not taken into account and has to be expressed by the user by specifying additional search terms.
  - The vector space model that most search engines use calculates a hit ranking order based on word frequencies at document level. Unfortunately, an individual weighting of the specified search terms themselves is not possible.
64Introduction (Cont.)
- Proposed method
  - A knowledge-guided user front-end and automatic generation of search engine queries
  - The medical knowledge of the MeSH (Medical Subject Headings) classification was transferred into a Topic Map
  - The knowledge contained in each topic is used to control the search for documents or other information related to the query
  - Enables interactive navigation through topics of the medical (or another) domain.
  - A graphical user interface allows fast, associative browsing in networks of themes.
65Materials and Methods
- Topic Maps
  - Topics (themes)
  - Associations
  - Occurrences: the TM connects nodes to related documents, images or other data
  - For the developed information retrieval system (IRS), special TM associations were chosen: is-subclass-of, is-superclass-of, has-synonyms, has-preferred-term, is-related-to, is-definition, is-scope-note.
- MeSH classification (http://www.nlm.nih.gov/mesh)
  - MeSH is the most used controlled vocabulary for document indexing
  - http://www.nlm.nih.gov/databases/freemedl.html
66Topic Map
Topic Maps combine a knowledge representation
with information resources
67Proposed System
- Two-level architecture
  - Knowledge: organized by TM topics, associations and occurrences
  - Documents: described by their metadata and full text.
- User interface
  - Navigation area for topic maps
  - Description area of the selected topic
  - Cards for document categories: Guidelines, Lexica, EBM, Journals
  - Hit list
68Steps: TM
- After a search string is entered, the IRS determines a list of one or more topics that match the query in their title, synonyms and annotations in German/English
- According to the desired scope, the user chooses a topic, and navigation within the topic map starts
- From that starting point, the user interactively and graphically navigates through the network of themes, determining the topics that best fit his or her needs.
- For the currently selected topic, additional information is displayed (MeSH code, definition and annotations, synonyms, translation).
- Using these topic commentaries, the system automatically generates a search query on the fly, expanding the search string with topic name, synonyms, translations, and definition.
- Pre-defined weighting factors at search string level control the impact of these different word groups on the ranking of the hits.
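The query expansion step can be sketched as follows. The slides do not give the actual weighting factors or query format, so the weights, the topic record, and the `(term, weight)` output format are all assumptions made for illustration.

```python
# Hypothetical topic commentary for the currently selected topic.
TOPIC = {
    "name": "Myocardial Infarction",
    "synonyms": ["heart attack"],
    "translations": ["Herzinfarkt"],
    "definition": "necrosis of the myocardium",
}

# Assumed pre-defined weighting factors, one per word group.
WEIGHTS = {"name": 1.0, "synonyms": 0.8, "translations": 0.6, "definition": 0.3}


def expand_query(search_string, topic, weights):
    """Return (term, weight) pairs: the user's string plus weighted topic commentary."""
    terms = [(search_string, 1.0)]
    terms.append((topic["name"], weights["name"]))
    terms += [(s, weights["synonyms"]) for s in topic["synonyms"]]
    terms += [(t, weights["translations"]) for t in topic["translations"]]
    terms.append((topic["definition"], weights["definition"]))
    return terms


for term, weight in expand_query("infarct", TOPIC, WEIGHTS):
    print(f"{weight:.1f}  {term}")
```

Weighting at the level of word groups is what lets a translation or definition widen the search without outranking hits on the topic name itself.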
69Steps: Documents
- The document search space was divided into several categories (e.g. AHCPR guidelines, journals, selected internet sites).
- The categories were defined, extended and customized by the system's administrator.
- The resulting hits are displayed in a manner known from internet search engines and are grouped according to their category
70User Interface
71Topic Map Software
- Chapter 10 of XML Topic Maps: Creating and Using Topic Maps for the Web
72Commercial Sources of TM Software
- Empolis: http://k42.empolis.co.uk
- InfoLoom: http://www.infoloom.com
- Mondeca: http://www.mondeca.com
- Ontopia: http://www.ontopia.net
73Open Source TM Projects
- SemanText
  - Construct and browse topic maps, write inference rules, and perform inference on topic maps
  - Python based
- TM4J (http://tm4j.org/)
  - XTM programming
  - A set of Java APIs for parsing, manipulating, and writing XTM
74Open Source TM Projects (Cont.)
- Nexist (http://nexist.sourceforge.net)
  - Java based XTM application
  - Persistent store: HypersonicSQL (HSQL) (http://sourceforge.net/projects/hsqldb)
- GooseWorks (http://www.goose-works.org)
  - Apache-licensed implementation of the graph-based data model for topic maps
  - C/Python API
- TM for Javascript (TM4jscript)
75Topic Map Visualization
- Chapter 11 of XML Topic Maps: Creating and Using Topic Maps for the Web
76Requirement for TM Visualization
77Overview
- TMs provide a bridge between the domains of knowledge representation and information management
- TMs may be very large → need an intuitive visual user interface to reduce users' cognitive load
- Different uses for Topic Maps
  - If the user has a specific question → query language (does not require visualization)
  - Consider the relationships among objects → more precisely
  - If the user wants to simply explore a Web site, a TM can provide an overview so the user can decide where to start the exploration (requires visualization)
78Requirement for TM Visualization (Cont.)
- Two kinds of requirements for retrieving information
  - Representation: help users identify interesting sources
  - Navigation: help users access information rapidly
- Both representation and navigation are essential in a good visualization
- "the visual information-seeking mantra is overview first, zoom and filter, then details on-demand"
79Representation Requirement
- Represent the whole topic map to help users understand it globally
- The overview should reflect the main properties of the structure
- Users should be able to focus on any part of the topic map and see all the dimensions they need
- Requires the use of different levels of detail (generality/specificity)
- The position of topics on the visual display should reflect their semantic proximity
- Show all characteristics (topics, associations, scope)
- The representation should be updated in real time to enable user interaction
80Navigation Requirement
- Navigation needs to be intuitive
- Free navigation should be kept for small structures or expert users
- Beginners prefer predefined navigation paths
- Expert users should be allowed to explore the structure freely
81Visualization Techniques
82Current TM Visualizations
- Most of them display lists or indexes from which users can select a topic and see related information
  - Convenient when users' needs are clearly identified
  - Navigation is usually the same as on Web sites: users click on a link to open a new topic or association
- Examples
  - Ontopia Navigator (Omnigator)
  - Empolis K42 application: hyperbolic tree
  - Mondeca's Topic Navigator: graph representation
  - UNIVIT: 3D interactive TM visualization
83Omnigator
84Empolis K42 StarTreeView
853D Interactive Topic Map Visualization with UNIVIT
86General Visualization Techniques: Graphs and Trees
- A TM can be seen as a network of topics → network, graph
- Graphs and trees are suitable for representing the global structure of topic maps
- Hyperbolic geometry allows the display of a very large number of nodes in a graph (efficient node positioning)
- Topics linked together by an association can be represented close to each other
- Topics of the same type or pointing to the same occurrences can be clustered
- Can represent the whole TM → may become cluttered rapidly as the number of topics and associations increases
87Example of a graph in 3D hyperbolic space
88Graphs and Trees (Cont.)
- Different shapes and colors can be used to symbolize various dimensions of nodes and arcs
- The number of different shapes, colors, icons, and textures is limited
- Not suited for a TM containing millions of topics and associations
GraphVisualizer 3D
89General Visualization Techniques: Maps
- ET-Maps: Internet home page categorization and searches
  - The relative importance of each page is shown by the size of the corresponding zone
  - May be used to represent topics and associations
- ThemeScape
  - Topographical maps with mountains and valleys
  - Documents with similar content are placed closer together
  - Peaks appear where there is a concentration of documents about a similar topic (height of peaks)
  - Valleys contain fewer documents and more unique content
  - Topic labels reflect the major two or three related topics
  - Different levels of detail
90ET-Map
91ThemeScape
92General Visualization Techniques: Virtual Worlds
- City metaphor
  - A topic map: a city
  - Topics: buildings (characteristics: name, color, height)
  - Associations: streets, bridges
  - Topics and associations related to the same scope can belong to the same neighborhood
- Multiple dimensions of a topic map can be represented with this technique
- Navigate freely or by guided tour: walk/fly
93Example of a Virtual City
94Virtual City and A 2D Map
Occurrences and associated topics are displayed
in the bottom windows