Title: Principles and pragmatics of a Semantic Culture Web
1 Principles and pragmatics of a Semantic Culture Web
Tearing down walls and building bridges
2 Overview
- Virtual collections and Semantic Web
- Semantic collection-search demonstrator
- For cultural heritage objects
- Metadata vocabulary representation and enrichment
- Principles for knowledge engineering on the Web
3 Acknowledgements
- Part of the large Dutch knowledge-economy project MultimediaN
- Partners: VU, CWI, UvA, DEN, ICN
- People
- Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jacco van Ossenbruggen, Guus Schreiber, Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga
- Artchive.com, Rijksmuseum Amsterdam, Dutch ethnology museums (Amsterdam, Leiden), National Library (Bibliopolis)
4 Hypothesis
- Semantic Web technology is particularly useful in knowledge-rich domains
- Or, formulated differently:
- If we cannot show added value in knowledge-rich domains, then it may have no value at all
5 The Web: resources and links
[Diagram labels: URL; Web link; URL]
6 The Semantic Web: typed resources and links
[Diagram labels: Painting "Woman with hat" (SFMOMA); Dublin Core: creator; ULAN: Henri Matisse; Web link; URL; URL]
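The difference between an untyped Web link and a typed Semantic Web link can be sketched in a few lines. This is only an illustration: the two resource URLs and the `Painting` class URI are hypothetical placeholders, while the Dublin Core `creator` and `rdf:type` URIs are the standard ones.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical URLs standing in for the SFMOMA page and the ULAN entry.
PAINTING = "http://example.org/sfmoma/woman-with-hat"
ARTIST = "http://example.org/ulan/henri-matisse"

# Plain Web: a link only says "these two URLs are connected".
web_link = (PAINTING, ARTIST)

# Semantic Web: the resource gets a type and the link gets a meaning.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
PAINTING_CLASS = "http://example.org/schema/Painting"  # hypothetical class URI

triples = [
    Triple(PAINTING, RDF_TYPE, PAINTING_CLASS),
    Triple(PAINTING, DC_CREATOR, ARTIST),
]


def objects(subject: str, predicate: str) -> list[str]:
    """Return all objects linked from `subject` by the typed link `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


creators = objects(PAINTING, DC_CREATOR)
```

With the typed link in place, "who created this painting?" becomes a lookup on `DC_CREATOR`, a question the plain `web_link` pair cannot answer.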
10 Principle 1: semantic annotation
- Description of web objects with concepts from a
shared vocabulary
11 Principle 2: semantic search
Query: Paris
- Search for objects which are linked via concepts (semantic link)
- Use the type of semantic link to provide meaningful presentation of the search results
[Diagram: Montmartre PartOf Paris]
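A minimal sketch of this principle, assuming a toy set of annotations: a query for Paris also returns an object annotated with Montmartre, and the results are grouped by the type of link that produced them. All identifiers (`painting:1`, `place:Paris`, and so on) are made up for the example.

```python
from collections import defaultdict

# Hypothetical annotations: object -> concept from a shared vocabulary.
annotations = {
    "painting:1": "place:Montmartre",
    "painting:2": "place:Paris",
    "painting:3": "place:Amsterdam",
}

# Typed links between concepts: (subject, link type, object).
concept_links = [("place:Montmartre", "partOf", "place:Paris")]


def semantic_search(concept: str) -> dict[str, list[str]]:
    """Find objects for `concept`, grouped by the kind of link that matched."""
    results = defaultdict(list)
    for obj, annotated in sorted(annotations.items()):
        if annotated == concept:
            results["direct"].append(obj)
        for source, link, target in concept_links:
            if source == annotated and target == concept:
                results[link].append(obj)
    return dict(results)


hits = semantic_search("place:Paris")
```

Keeping the link type as the grouping key is what lets an interface present "objects about a part of Paris" separately from "objects about Paris".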
12 The myth of a unified vocabulary
- In large virtual collections there are always multiple vocabularies
- In multiple languages
- Every vocabulary has its own perspective
- You can't just merge them
- But you can use vocabularies jointly by defining a limited set of links
- Vocabulary alignment
- It is surprising what you can do with just a few links
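To illustrate using vocabularies jointly without merging them, here is a small sketch in which two collections keep their own vocabularies and a single alignment link is enough to search across both. The vocabulary names, concept identifiers and the alignment itself are assumptions made for the example.

```python
# Two hypothetical collections, each annotated with its own vocabulary.
collection_a = {"a:obj1": "vocabA:EdoPeriod"}
collection_b = {"b:obj7": "vocabB:Tokugawa", "b:obj8": "vocabB:Meiji"}

# A deliberately small set of alignment links between the vocabularies.
alignment = {("vocabA:EdoPeriod", "vocabB:Tokugawa")}


def equivalents(concept: str) -> set[str]:
    """Return the concept plus everything it is aligned with."""
    found = {concept}
    for left, right in alignment:
        if left == concept:
            found.add(right)
        if right == concept:
            found.add(left)
    return found


def search(concept: str) -> list[str]:
    """Search both collections, following alignment links in either direction."""
    wanted = equivalents(concept)
    merged = {**collection_a, **collection_b}
    return sorted(obj for obj, annotated in merged.items() if annotated in wanted)


across = search("vocabA:EdoPeriod")
```

Neither vocabulary is changed; only the `alignment` set is added, which is why a few links can already connect whole collections.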
13 Principle 3: vocabulary alignment
Tokugawa
14 A link between two thesauri
15 Levels of interoperability
- Syntactic interoperability
- using data formats that you can share
- XML family is the preferred option
- Semantic interoperability
- How to share meaning / concepts
- Technology for finding and representing semantic
links
17 Distributed vs. centralized collection data
- Minimal requirement: collection object has an image URI
- Preference for external metadata, accessed through a protocol such as OAI
- In practice, external metadata access is still cumbersome
18 http://e-culture.multimedian.nl/demo/search
19 Search strategies
- Basic search: keyword-oriented
- Advanced search
- Tweaking default search parameters
- Time-related queries
- Faceted search
- Relation search
- How are two URIs related?
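Relation search ("how are two URIs related?") can be read as finding a path between two nodes in the RDF graph. The sketch below uses a breadth-first search over a toy graph; the identifiers and the `ex:style` property are hypothetical, and nothing here is claimed to be the demonstrator's actual algorithm.

```python
from collections import deque

# Toy graph of (subject, property, object) statements; all names are made up.
edges = [
    ("ex:painting1", "dc:creator", "ex:Matisse"),
    ("ex:Matisse", "ex:style", "ex:Fauvism"),
    ("ex:Derain", "ex:style", "ex:Fauvism"),
]


def neighbours(node):
    """Yield (property, node) pairs; '^' marks a link followed backwards."""
    for subject, prop, obj in edges:
        if subject == node:
            yield prop, obj
        elif obj == node:
            yield "^" + prop, subject


def find_relation(start, goal):
    """Return the shortest chain of typed links from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for prop, nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(prop, nxt)]))
    return None


relation = find_relation("ex:painting1", "ex:Derain")
```

The returned path reads as an explanation: the painting's creator shares a style with the other artist.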
21 Keyword search with semantic clustering
- B-tree of literals plus Porter stem and metaphone index
- Find resources with matching labels
- Default resources are Works
- Find related resources by one-way graph traversal
- owl:inverseOf is used
- Threshold used for constraining search
- Cluster results (group instances)
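The steps above can be sketched roughly as follows. Several simplifications are assumptions of this example, not the demonstrator's design: a plain dictionary replaces the B-tree, `crude_stem` is a very rough stand-in for a Porter stemmer, the metaphone index and owl:inverseOf handling are left out, and the threshold is interpreted as a maximum number of traversal steps. All resource names are made up.

```python
from collections import defaultdict


def crude_stem(word: str) -> str:
    """Rough stand-in for a Porter stemmer: lowercase and strip a few suffixes."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical labels and graph.
labels = {
    "ex:concept/Paris": "Paris",
    "ex:work/1": "View of Paris",
    "ex:concept/Montmartre": "Montmartre",
}
edges = [
    ("ex:work/2", "dc:subject", "ex:concept/Montmartre"),
    ("ex:concept/Montmartre", "ex:partOf", "ex:concept/Paris"),
    ("ex:work/3", "dc:coverage", "ex:concept/Paris"),
]
works = {"ex:work/1", "ex:work/2", "ex:work/3"}

# Index from stemmed label token to the resources carrying that label.
index = defaultdict(set)
for resource, label in labels.items():
    for token in label.split():
        index[crude_stem(token)].add(resource)


def search(keyword: str, max_steps: int = 2) -> dict[tuple, list[str]]:
    """Match labels, walk edges backwards towards Works, cluster by property path."""
    clusters = defaultdict(set)
    frontier = [(res, ()) for res in index.get(crude_stem(keyword), ())]
    for _ in range(max_steps + 1):
        next_frontier = []
        for resource, path in frontier:
            if resource in works:
                clusters[path].add(resource)
                continue
            for subject, prop, obj in edges:
                if obj == resource:  # one-way traversal: object -> subject only
                    next_frontier.append((subject, (prop,) + path))
        frontier = next_frontier
    return {path: sorted(found) for path, found in clusters.items()}


clusters = search("paris")
```

Each cluster key is the chain of properties that connects the Works to the matched label, so results that were found in the same way end up grouped together.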
25 Search: WordNet patterns that increase recall without sacrificing precision
26 Term disambiguation is a key issue in semantic search
- Post-query
- Sort search results based on different meanings of the search term
- Mimics Google-type search
- Pre-query
- Ask user to disambiguate by displaying a list of possible meanings
- Interface is more complex, but more search functionality can be offered
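The two strategies differ only in when the meaning is chosen. A small sketch under assumed data: the senses of "Paris" and the annotated works below are invented for the example.

```python
from collections import defaultdict

# Hypothetical senses of an ambiguous search term.
senses = {"Paris": ["ex:place/Paris", "ex:myth/Paris"]}

# Hypothetical annotations: (work, concept).
annotations = [
    ("ex:work/1", "ex:place/Paris"),
    ("ex:work/2", "ex:myth/Paris"),
    ("ex:work/3", "ex:place/Paris"),
]


def post_query(term: str) -> dict[str, list[str]]:
    """Search all meanings at once, then sort the results by meaning."""
    grouped = defaultdict(list)
    for work, concept in annotations:
        if concept in senses.get(term, []):
            grouped[concept].append(work)
    return dict(grouped)


def pre_query(term: str) -> list[str]:
    """Return the meanings the interface would ask the user to choose from."""
    return senses.get(term, [])


def search_sense(concept: str) -> list[str]:
    """Search for exactly one, already disambiguated, meaning."""
    return [work for work, annotated in annotations if annotated == concept]


grouped = post_query("Paris")
choices = pre_query("Paris")
```

Post-query needs one user action but returns mixed meanings to sort through; pre-query needs an extra step, after which `search_sense` works on an unambiguous concept.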
27 Faceted search
- Use Dublin Core scheme to formulate complex queries
- Navigate through relevant metadata
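As a sketch of the idea, each Dublin Core element can act as a facet: the interface shows value counts per facet, and every selection narrows the result set. The records below are hypothetical, and the keys are simply Dublin Core element names used as dictionary keys.

```python
from collections import Counter

# Hypothetical records described with Dublin Core elements.
records = [
    {"identifier": "ex:work/1", "creator": "Artist A", "type": "painting", "date": "1905"},
    {"identifier": "ex:work/2", "creator": "Artist A", "type": "drawing", "date": "1906"},
    {"identifier": "ex:work/3", "creator": "Artist B", "type": "painting", "date": "1905"},
]


def facet_counts(items, facet):
    """Count how many items carry each value of the given facet."""
    return Counter(item[facet] for item in items if facet in item)


def select(items, **constraints):
    """Keep only the items that match every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in constraints.items())
    ]


paintings_1905 = select(records, type="painting", date="1905")
remaining_creators = facet_counts(paintings_1905, "creator")
```

Recomputing `facet_counts` on the narrowed set is what lets a user navigate: only values that still lead to results are offered.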
28 Faceted search
29 What do you need to do to make your collection part of a Semantic Culture Web?
30 From metadata to semantic metadata
1. Make vocabulary interoperable
2. Align metadata schema
3. Enrich metadata
4. Align vocabulary
31 Activity 1: syntactic vocabulary interoperability
- Making vocabularies available in the Web standard RDF
- Many organizations already do this
- W3C provides the SKOS template to make this almost straightforward
- Effort required: at most a few days
33 Multi-lingual labels for concepts
34 Semantic relation: broader and narrower
- No subclass semantics assumed!
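A minimal sketch of what a thesaurus concept looks like once expressed with SKOS, written out as Turtle from Python. The `ex:` namespace, the concept names and the labels are assumptions for the example; `skos:Concept`, `skos:prefLabel` and `skos:broader` are standard SKOS terms. Literal escaping is not handled.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/thesaurus/> .\n"  # hypothetical namespace
)


def concept_to_turtle(name, labels, broader=None):
    """Serialize one concept with language-tagged labels and an optional broader term."""
    lines = [f"ex:{name} a skos:Concept ;"]
    for lang, text in sorted(labels.items()):
        lines.append(f'    skos:prefLabel "{text}"@{lang} ;')
    if broader:
        # skos:broader is a thesaurus relation; it does not imply rdfs:subClassOf.
        lines.append(f"    skos:broader ex:{broader} ;")
    lines[-1] = lines[-1][:-1] + "."  # close the statement
    return "\n".join(lines)


turtle = concept_to_turtle(
    "paintings",
    {"en": "paintings", "nl": "schilderijen"},
    broader="visualWorks",
)
document = PREFIXES + "\n" + turtle
```

Because the hierarchy is stated with `skos:broader` rather than a class relation, a reasoner will not conclude that every instance of the narrower concept is an instance of the broader one.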
36 Activity 2: aligning the metadata schema
- Specify your collection metadata scheme as a specialization of Dublin Core
- With RDF/OWL this is easy/trivial!
- Cf. DC Application Profiles
37 Aligning VRA with Dublin Core
- VRA is a specialization of Dublin Core for visual resources
- VRA properties material.medium and material.support are specializations of Dublin Core property format
- vra:material.medium rdfs:subPropertyOf dc:format .
- vra:material.support rdfs:subPropertyOf dc:format .
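The practical effect of these two statements is that a query phrased in Dublin Core terms also finds values stored under the more specific VRA properties. A small sketch, assuming a toy record (the work identifier and its values are made up):

```python
# The alignment from the slide, as a child -> parent property map.
sub_property_of = {
    "vra:material.medium": "dc:format",
    "vra:material.support": "dc:format",
}

# Hypothetical collection metadata using the VRA properties.
facts = [
    ("ex:work/1", "vra:material.medium", "oil paint"),
    ("ex:work/1", "vra:material.support", "canvas"),
    ("ex:work/1", "dc:title", "Untitled"),
]


def super_properties(prop):
    """Yield prop and every property it specializes via rdfs:subPropertyOf."""
    seen = set()
    while prop is not None and prop not in seen:
        seen.add(prop)
        yield prop
        prop = sub_property_of.get(prop)


def values(subject, prop):
    """Return values stated with prop or with any of its sub-properties."""
    return [
        obj
        for subj, stated, obj in facts
        if subj == subject and prop in super_properties(stated)
    ]


formats = values("ex:work/1", "dc:format")
```

The collection keeps its detailed VRA metadata, while generic Dublin Core tools still see both values as `dc:format`.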
38 Activity 3: enriching the metadata
- Extracting additional concepts from an annotation
- Matching the string "Paris" to a vocabulary term
- Information-extraction techniques exist (and continue to be developed)
- Effort required: can be up to a few weeks
- The more concepts, the better, but no need to be perfect!
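The simplest form of enrichment is matching strings in a free-text annotation against vocabulary labels. The sketch below does only that, with whole-word matching; the vocabulary entries and concept identifiers are hypothetical, and real information-extraction techniques go well beyond this.

```python
import re

# Hypothetical vocabulary: lowercased label -> concept identifier.
vocabulary = {
    "paris": "ex:place/Paris",
    "montmartre": "ex:place/Montmartre",
    "oil on canvas": "ex:technique/OilOnCanvas",
}


def extract_concepts(text: str) -> list[str]:
    """Return the concepts whose label occurs as a whole word or phrase in text."""
    lowered = text.lower()
    found = []
    for label, concept in vocabulary.items():
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            found.append(concept)
    return sorted(found)


concepts = extract_concepts("Street scene in Montmartre, Paris. Oil on canvas.")
```

The word-boundary check matters: without it, "paris" would also match inside "comparison". A match like this still cannot tell which Paris is meant, which is why the result does not have to be perfect to be useful.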
39 Example: textual annotation
40 Resulting semantic annotation (rendered as HTML with RDFa)
41 RDFa: embedding RDF in (X)HTML
42 Activity 4: aligning the vocabulary
- Find semantic links between vocabulary concepts
- Derain (ULAN) related-to Fauve (AAT)
- Automatic techniques exist, but performance varies
- Often a combination of automatic and manual alignment
- Effort strongly dependent on vocabularies
- But "a little semantics goes a long way" (Hendler)
43 Learning alignments
- Learning relations between art styles in AAT and artists in ULAN through NLP of art-historical texts
- Who are Impressionist painters?
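As a very rough illustration of the idea, candidate artist-style links can be proposed by counting how often an artist name and a style term occur in the same sentence. This co-occurrence count is a stand-in chosen for the example, not the NLP method used in the project; the name lists and sentences are toy data written for this sketch.

```python
from collections import Counter
from itertools import product

artists = ["Monet", "Renoir", "Derain"]  # stand-ins for ULAN entries
styles = ["Impressionist", "Fauve"]      # stand-ins for AAT style terms

# Toy sentences written for this example, not taken from a corpus.
sentences = [
    "Monet was a leading Impressionist painter.",
    "Renoir exhibited with the Impressionist group.",
    "Derain is counted among the Fauve painters.",
    "Monet and Renoir painted side by side.",
]


def cooccurrences(texts):
    """Count sentences in which an artist name and a style term appear together."""
    counts = Counter()
    for sentence in texts:
        for artist, style in product(artists, styles):
            if artist in sentence and style in sentence:
                counts[(artist, style)] += 1
    return counts


def candidates(style, min_count=1):
    """Propose artists for a style; the output is a suggestion, not a verified link."""
    counts = cooccurrences(sentences)
    return sorted(a for (a, s), n in counts.items() if s == style and n >= min_count)


impressionists = candidates("Impressionist")
```

Proposals like these would normally go to a manual check before being added as alignment links.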
44 Extracting additional knowledge from scope notes
45 Principles for knowledge engineering on the Web
46 Principle 1: Be modest!
- Ontology engineers should refrain from developing their own idiosyncratic ontologies
- Instead, they should make the available rich vocabularies, thesauri and databases available in web format
- Initially, only add the originally intended semantics
47 Principle 2: Think large!
Doug Lenat:
"Once you have a truly massive amount of
information integrated as knowledge, then the
human-software system will be superhuman, in the
same sense that mankind with writing is
superhuman compared to mankind before writing."
48 Principle 3: Develop and use patterns!
- Don't try to be (too) creative
- Ontology engineering should not be an art but a discipline
- Patterns play a key role in methodology for ontology engineering
- See for example patterns developed by the W3C Semantic Web Best Practices group
- http://www.w3.org/2001/sw/BestPractices/
- SKOS can also be considered a pattern
49 Principle 4: Don't recreate, but enrich and align
- Techniques
- Learning ontology relations/mappings
- Semantic analysis, e.g. OntoClean
- Processing of scope notes in thesauri
50 Principle 5: Beware of ontological over-commitment!
51 Principle 6: Specifying a data model in OWL does not make it an ontology!
- Papers about your own idiosyncratic university ontology should be rejected at SW conferences
- The quality of an ontology does not depend on the number of OWL constructs used
52 Principle 7: Required level of formal semantics depends on the domain!
- In our semantic search we use three OWL constructs
- owl:sameAs, owl:TransitiveProperty, owl:SymmetricProperty
- But cultural heritage is very different from medicine and bioinformatics
- Don't over-generalize on requirements for e.g. OWL
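To show how little machinery these three constructs need, here is a naive forward-chaining sketch that applies them until nothing new is derived. It is an illustration only, not the project's reasoner: which properties are declared transitive or symmetric (`ex:partOf`, `ex:relatedTo`) and all resource names are assumptions of the example.

```python
SAME_AS = "owl:sameAs"


def infer(triples, transitive=(), symmetric=()):
    """Apply sameAs, transitivity and symmetry rules until a fixpoint is reached."""
    # owl:sameAs is itself symmetric and transitive.
    transitive = set(transitive) | {SAME_AS}
    symmetric = set(symmetric) | {SAME_AS}
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
            if p == SAME_AS:
                # Whatever is said about s also holds for o.
                for s2, p2, o2 in facts:
                    if p2 == SAME_AS:
                        continue
                    if s2 == s:
                        new.add((o, p2, o2))
                    if o2 == s:
                        new.add((s2, p2, o))
        if new <= facts:
            return facts
        facts |= new


closure = infer(
    [
        ("ex:Montmartre", "ex:partOf", "ex:Paris"),
        ("ex:Paris", "ex:partOf", "ex:France"),
        ("ex:Matisse", "ex:relatedTo", "ex:Derain"),
        ("ulan:Matisse", SAME_AS, "ex:Matisse"),
    ],
    transitive={"ex:partOf"},
    symmetric={"ex:relatedTo"},
)
```

The loop terminates because only a finite number of triples can be built from the given resources and properties.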
53 Perspectives
- Basic Semantic Web technology is ready for deployment
- Research themes
- Scalability, vocabulary alignment, metadata extraction
- Web 2.0 facilities fit well
- Involving community experts in annotation
- Personalization
- Social barriers have to be overcome!