Title: SLA Annual Meeting
1SLA Annual Meeting June 8th, 2004 JR Gardner,
Ph.D., Sun Microsystems Inc.
2Agenda
- Semantic Web Primer: a big pair of shoes to fill...
- A bug in my bonnet
- Search for the solution
- Sharing the solution
- Securing the solution
- Q & A
3Semantic Web
- Ontologies (to describe the data with varying degrees of formality)
- Classifiers/reasoners (to infer new relationships from existing assertions)
- Annotation: the process whereby the content of a document is annotated with a concept (or relationship) from an ontology. Also known as 'tagging'. (The annotation doesn't have to be in the original document.)
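To make the classifier/reasoner bullet concrete, here is a minimal sketch of inferring new "is a" relationships from existing assertions by transitive closure. It is not tied to any real reasoner; the triples and the `infer_is_a` helper are invented for illustration.

```python
def infer_is_a(assertions):
    """Return every (subject, object) 'is a' pair implied by transitivity."""
    inferred = set(assertions)
    changed = True
    while changed:
        changed = False
        for a, b in list(inferred):
            for c, d in list(inferred):
                # If a is a b, and b is a d, then a is a d.
                if b == c and (a, d) not in inferred:
                    inferred.add((a, d))
                    changed = True
    return inferred

# Hypothetical assertions, as an ontology might state them.
asserted = {
    ("VW Beetle", "car"),
    ("car", "vehicle"),
    ("Coleoptera", "insect"),
    ("insect", "animal"),
}

# Only the relationships that were not asserted directly.
new_facts = infer_is_a(asserted) - asserted
print(sorted(new_facts))
```

A production reasoner works over a formal ontology language rather than bare pairs, but the principle of deriving unstated relationships is the same.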
4Semantic Web, cont'd.
The goal of the Semantic Web is to develop
enabling standards and technologies designed to
help machines understand more information on the
Web so that they can support richer discovery,
data integration, navigation, and automation of
tasks. With the Semantic Web we not only receive more
exact results when searching for information, but
also know when we can integrate information from
different sources, know what information to
compare, and can provide all kinds of automated
services in different domains, from future home
and digital libraries to electronic business and
health services (Berners-Lee 2001).
5The Vision of how it works
6An information sciences take
Policies Professionalism
Z39.50 Dublin Core, OPACS
Software Engineering
7What bugs me
- Promises . . . but ...
- No deployable solutions
- Context-specific solutions
- Black box solutions
- Ontology Holy Grails
- Semantic schemantics
- Too many flavors and brands of everything
8Epistemology for Etymology with Entomology-the
Business of Beetle Bugs fear and loathing
[Diagram: the names "Beetle" and "Bug" linked to meanings, names, and concepts; labeled "bad ambiguity"]
Meanings ("might mean", "also called"): surveillance, error/fault, creepy-crawler, spy, feature
Names (alias/AKA): VW, Coleoptera, SunRay
Concepts ("is a"): security, insect, car, product
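The ambiguity in this slide can be sketched as a lookup in which one name maps to several concepts and the surrounding context picks among them. Both tables below are hypothetical, loosely following the words on the slide, and the cue-overlap scoring is only an illustration.

```python
# Hypothetical name -> candidate concepts table, loosely following the slide.
CONCEPTS_BY_NAME = {
    "bug": {"insect", "security", "product"},
    "beetle": {"insect", "car"},
}

# Hypothetical context cues for each concept.
CUES = {
    "insect": {"wings", "larva", "coleoptera"},
    "car": {"vw", "engine", "dealer"},
    "security": {"spy", "surveillance", "wiretap"},
    "product": {"feature", "fault", "release"},
}

def disambiguate(name, context_words):
    """Rank candidate concepts for `name` by cue overlap with the context."""
    context = {w.lower() for w in context_words}
    candidates = CONCEPTS_BY_NAME.get(name.lower(), set())
    scored = [(len(CUES[c] & context), c) for c in candidates]
    # Keep only concepts with at least one supporting cue, best first.
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [concept for score, concept in ranked if score > 0]

print(disambiguate("Beetle", ["VW", "dealer", "recall"]))
print(disambiguate("Bug", ["surveillance", "spy"]))
print(disambiguate("Bug", ["weather"]))
```

With no supporting context the function returns an empty list, which is the "bad ambiguity" case: the name alone does not say which concept is meant.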
9Ontology is not enough (or can be too much)
AIAG Automotive Ind. Action Grp.
STAR Std. For Tech. For Auto Retail
swoRDFish Sun RDF product/svc. Ontology
[Diagram: meanings, names, and concepts]
Meanings: surveillance, error/fault, creepy-crawler, spy, feature
Names (alias/AKA): Coleoptera, VW, SunRay
Concepts: security, insect, car, product
10Ummmmm......
Semantic Web
Babel?
11What bugs the system-
- Promises . . . but ...
- No deployable solutions (cost, support)
- Context-specific solutions (my sandbox or the world?)
- Black box solutions (can I change it, grow with it, enhance, integrate, expand?)
- Ontology Holy Grails (what of plethora?)
- Semantic schemantics (format freakout)
- Too many flavors and brands of everything
12Ontology solutions can limit, not solve ...
AIAG Automotive Ind. Action Grp.
STAR Std. For Tech. For Auto Retail
swoRDFish Sun RDF product/svc. Ontology
[Diagram: meanings, names, and concepts]
Meanings: surveillance, error/fault, creepy-crawler, spy, feature
Names (alias/AKA): Coleoptera, VW, SunRay
Concepts: product, security, insect, car
13What if we start with a search engine?
- It should possess ability to ingest ontologies
- Take advantage of ontological rigor yet employ soft reasoning/brain-type associations
- Needs contextual search capabilities with passage retrieval so I know what I'm getting
- Integrated ID management: context is everything
- Ease of integration with knowledge tools
- Ease of deployment (no clients required)
- Mature and well-supported platform
14Well, search is fine, but libraries have always
had this ... what of MARC? Z39.50?
- Information Sciences, especially the special libraries, are on the brink of a MARC/RLIN-like revolution
- Z39.50 is no longer confined to ASN.1: XML and Web services are brought to bear
- The best of MARC (in MarcXml), Dublin Core, and ISO 23950
- Simplified, smaller scale, smaller footprint: less hardware, less software
- New possibilities as global information hub
- OPACs on steroids
15SRW/SRU new Info-sci Discovery-speak
- SRW Search/Retrieve Web Service
- A note on Web Services
- Lighter, nimbler, more open integration
- SRU Search/Retrieve Using URLs
- Even lighter, palm-sized subscribable
- Z39.50 concepts intact
- Result Sets
- Abstract Access points
- Abstract Record schemas
- Explain (simplified!!)
- Diagnostics
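As an illustration of how light SRU is, the sketch below builds a searchRetrieve request as a plain URL. The parameter names follow SRU 1.1; the base URL, the `build_sru_url` helper, and the `marcxml` schema name are assumptions made for the example, so check a given server's Explain record for what it actually supports.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_sru_url(base_url, cql_query, start_record=1,
                  maximum_records=10, record_schema="marcxml"):
    """Build an SRU searchRetrieve URL from a CQL query (illustrative helper)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,                  # the query is expressed in CQL
        "startRecord": start_record,         # 1-based position in the result set
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,       # assumed schema short name
    }
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint; substitute a real SRU server.
url = build_sru_url("http://sru.example.org/catalog",
                    'dc.title = "semantic web"')
print(url)
```

The whole request is a URL that can be bookmarked or subscribed to, with no SOAP envelope involved.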
16Look familiar?
01173cam 2200289 a 4500
1668735
19981110163141.5
tag"008"980227s1998 njua b 001 0
eng " ind2" " (DLC)
98004504 tag"906" ind1" " ind2" " code"a"7 code"b"cbu code"c"orignew code"d"1 code"e"ocip code"f"19 code"g"y-gencatlg
... and so on, remainder snipped. This is SRW;
SRU is even slimmer, removing the SOAP (Web Service)
wrapper and presenting a single record.
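For readers who have not handled MARCXML, here is a minimal sketch of reading such a record with the Python standard library. The sample is reassembled from the fragments on the slide: assigning the first two values to control fields 001 and 005 is an assumption, whitespace in the leader may have been collapsed in extraction, and the fields that were garbled are left out.

```python
import xml.etree.ElementTree as ET

# MARCXML "slim" namespace published by the Library of Congress.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Partial record rebuilt from the slide fragments (see assumptions above).
SAMPLE = """\
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>01173cam 2200289 a 4500</leader>
  <controlfield tag="001">1668735</controlfield>
  <controlfield tag="005">19981110163141.5</controlfield>
  <datafield tag="906" ind1=" " ind2=" ">
    <subfield code="a">7</subfield>
    <subfield code="b">cbu</subfield>
    <subfield code="c">orignew</subfield>
    <subfield code="d">1</subfield>
    <subfield code="e">ocip</subfield>
    <subfield code="f">19</subfield>
    <subfield code="g">y-gencatlg</subfield>
  </datafield>
</record>
"""

def parse_record(xml_text):
    """Return (control fields by tag, data fields by tag -> subfield dict)."""
    root = ET.fromstring(xml_text)
    control = {cf.get("tag"): cf.text
               for cf in root.findall("marc:controlfield", MARC_NS)}
    data = {}
    for df in root.findall("marc:datafield", MARC_NS):
        # Note: a repeated tag or subfield code would overwrite the earlier one.
        data[df.get("tag")] = {sf.get("code"): sf.text
                               for sf in df.findall("marc:subfield", MARC_NS)}
    return control, data

control, data = parse_record(SAMPLE)
print(control["001"], data["906"]["c"])
```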
17The next phase of the revolution......
NOW... SRW/U, WS, Inference Search with the Semantic Web
Then... Z39.50, MARC & EDI Changed Info and Economic Commerce
18Back to search, and about NOVA
NOVA is a language-independent, semantic search
engine that can be trained on any ontology to
yield custom classification of any data form.
Based on 9 years of research in SunLabs, it
employs a patented, unique passage retrieval and
semantic analysis algorithm for superior
inference with automatic stemming and Natural
Language Processing procedures to infer matches
on context and proximity. Cf. the April 2004 ACM
Queue cover article on NOVA, "Searching versus
Finding: Why Systems Need Knowledge to Find What
You Really Want".
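The general idea of passage retrieval with stemming can be sketched in a few lines. This toy is not NOVA's patented algorithm: the crude `stem` function, the fixed-size window, and the coverage score are all simplifying assumptions chosen to show matching on context and proximity.

```python
import re

def stem(word):
    """Very crude suffix stripping; a stand-in for real stemming."""
    word = word.lower()
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def best_passage(text, query, window=8):
    """Return (passage, score) for the window covering the most query stems."""
    tokens = re.findall(r"\w+", text)
    stems = [stem(t) for t in tokens]
    wanted = {stem(q) for q in re.findall(r"\w+", query)}
    best_score, best_start = 0, 0
    for start in range(max(1, len(tokens) - window + 1)):
        # Score = number of distinct query stems found close together.
        score = len(wanted & set(stems[start:start + window]))
        if score > best_score:
            best_score, best_start = score, start
    return " ".join(tokens[best_start:best_start + window]), best_score

doc = ("Beetles are insects with hardened wing cases. "
       "The dealer is searching for a used VW Beetle engine. "
       "Some engines need new parts.")
passage, score = best_passage(doc, "search beetle engines")
print(score, "|", passage)
```

Returning the passage rather than the whole document is what lets a user see why a hit matched before opening it.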
19NOVA's Semantic Inferencing/Clustering
NOVA enables phrasing and predicate matching for
language structures. These variants can, in
turn, be browsed as a component of a given search
result set, with clustering.
20NOVA's Semantic Inferencing
Results can then be browsed in context, with each
semantic node revealing relative number of hits.
Upon selection of the occurrences, a "jump to
passage" feature is dynamically inserted in the
target, enabling quick review and assessment of a
given result set.
21RDF and NOVA Integration
RDF Parse
S1 Portal Search Robot
Robot uses in crawl
1. Updates taxonomy dynamically
2. Robot metadata auto-classification Rules
Sun Portal Server NOVA
NOVA Rules stored as RDFS
22Taxonomy Introspection Mapping
XSLT conversion to Search Database Records (RDs),
include UIDs
23NCI Oncology
25Integrated, Searchable Weighted Annotation
26Browse Annotation weighting and Subscription,
Access-controlled
After indexing, searches use the GO/NCI links
27
1. Aggregative loading of taxonomies ( creation, RDF for mgmt)
2. Iterative inferencing builds core associative
knowledge base, export to RDF compilable
3. Apply across applications, taxonomy systems
for a VIB ontology, with access management and
interest subscription
28Identity-Enabled Secure Knowledge Network
Information in various representations
CAD Applications
DSAtlas Data Info. Access
Ontologies, Dictionaries
OPACs
Identity Attributes Functions
29Applying SemWeb Tools
30Annotation
- Definition: ability to attach a comment with weighting to a specific item and to share it with other users
- Annotation is done through search result sets; other search tools' data, federated with NOVA, can be annotated and subscribed to, weighted, and access-managed
- Tools are supported out-of-the-box with Java Enterprise System and NOVA search
- APIs in C and Java enable easy federation of diverse data sources, including web-serviced JDBC calls
- Annotation, combined with browsing, can allow subscription to different and merged taxonomies
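The definition above suggests a simple record: a comment, a weight, the item it is attached to, and who may see it. The sketch below is a hypothetical data model for illustration only; the field names, the 0.0 to 1.0 weight scale, and `visible_annotations` are assumptions, not the NOVA or Java Enterprise System API.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    item_id: str      # the search result or document being annotated
    author: str
    comment: str
    weight: float     # assumed 0.0-1.0 importance score
    readers: set = field(default_factory=set)   # users allowed to see it

def visible_annotations(annotations, user, min_weight=0.0):
    """Annotations `user` may read, heaviest first."""
    allowed = [a for a in annotations
               if (user == a.author or user in a.readers)
               and a.weight >= min_weight]
    return sorted(allowed, key=lambda a: a.weight, reverse=True)

# Invented sample data.
notes = [
    Annotation("doc-17", "alice", "Key reference for the GO mapping", 0.9, {"bob"}),
    Annotation("doc-17", "alice", "Draft, not for sharing yet", 0.4),
    Annotation("doc-42", "carol", "Overlaps with NCI taxonomy", 0.7, {"bob", "alice"}),
]

for a in visible_annotations(notes, "bob", min_weight=0.5):
    print(a.item_id, a.weight, a.comment)
```

Filtering by reader before sorting by weight mirrors the slide's ordering of concerns: access management decides what is visible, weighting decides what comes first.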
32Dynamic Meta-markup in NOVA using KAIN or COHSE
Based on COHSE, KAIN uses a proxy and an ontology
to auto-markup pages with non-intrusive link-lets
(little orange icons inline with text), flagging
terms in a given ontology (DAML+OIL, OWL, RDF,
etc.). Rollover highlights the matched term, and
the original site layout is untouched.
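The markup step can be pictured with a small sketch that flags ontology terms in a plain-text fragment and appends a link-let after each match. This is not KAIN or COHSE code: the `mark_up` function, the `[*]` stand-in for the orange icon, and the term table are invented, and a real proxy would parse the HTML rather than treat the page as plain text.

```python
import html
import re

def mark_up(text, ontology_terms, icon="[*]"):
    """Append a link-let after each ontology term found in plain `text`.

    `ontology_terms` maps a lowercase term to the concept it denotes.
    """
    if not ontology_terms:
        return text
    # Longest terms first so "semantic web" wins over "web".
    terms = sorted(ontology_terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )

    def add_linklet(match):
        term = match.group(1)
        concept = ontology_terms[term.lower()]
        # The original word is kept as-is; only the marker is added.
        return f'{term}<a class="linklet" title="{html.escape(concept)}">{icon}</a>'

    return pattern.sub(add_linklet, text)

# Hypothetical term table standing in for a loaded ontology.
terms = {
    "beetle": "Coleoptera (insect)",
    "semantic web": "W3C Semantic Web",
    "web": "World Wide Web",
}
marked = mark_up("The Semantic Web has a Beetle problem.", terms)
print(marked)
```

Because the marker is appended rather than wrapped around the text, the page's own wording and layout stay as they were.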
33What is KAIN?
KAIN is Sun's tool for Knowledge Assisted
Intranet Navigation, based on the COHSE (Conceptual
Open Hypermedia) project from the University of
Manchester. The tool provides users with an
enhanced web navigation and browsing experience.
Web pages are automatically marked up, virtually
only, to link users to all material relevant to
the context of what was queried. KAIN has not
been productized yet.
34Integrated Metadata with any Query or URI
accept
edit
35Additional info on tools
- NOVA search, together with your distributed and managed content, leverages taxonomies and browse for annotation, group/individual description, aggregation (e.g., NOVA search and management of annotations themselves)
- Working with RDF via IFC-Host Java Libraries enables native RDF integration with NOVA, which federates services and other data/search sources
- Extend beyond the immediate collaborative consortia via Z39.50-based SRW/SRU (LOC ifx now available)
- Harvest taxonomies and infer HourGlass view of new collaboration areas
- StarOffice and OpenOffice render all content in XML, presentations in SVG, export XHTML and inline XSLT conversions with WebDAV support
36Re-Cap
- Immediate Results on Semantic Web Scale
- NOVA's out-of-the-box ability to ingest ontologies
- Taxonomy introspection and mapping ability to establish relationships amongst taxonomies
- Immediate access available to authorized personnel (internal & external)
- NOVA provides semantic inferencing
- Ease of integration with knowledge tools
- All components are interoperable, addressable by multiple APIs, with WSDL, RSS, XML/XSLT, all of which in turn are identity/auth managed
- Ability to manage who sees what, and to target things to those who most want to see them
37Contact for Resources
John.Robert.Gardner_at_sun.com
38ebXML
- ebXML Registry/Repository (sourceforge.net) from Sun manages services, visual taxonomy browsing
- Used internally at Sun for managing corporate Web Services Repository
- Enables services (e.g., database reports, etc.) to be managed according to a variety of taxonomies
- Provides graphical browsable display of taxonomies
- Part of forthcoming Sun product line
39Taxonomy/WS/KM Management with ebXML R/R