Title: Controlled Vocabularies in TELPlus
1Controlled Vocabularies in TELPlus
- Antoine ISAAC
- Vrije Universiteit Amsterdam
- EDLProject Workshop
- 22-23 November 2007
2Agenda
- TELPlus Context
- Improving subject access
- 3 sub-tasks
- Services for TEL
3TELPlus Context
- Started October 2007
- Running 27 months
- Content WPs
- OCRing previously digitised material
- Improving the usability of TEL through OAI-PMH compliance
- Improving Access
- Integrating services with TEL portal
- User personalisation services
- Extending TEL to Bulgaria and Romania
4WP3 Improving Access
- Task 1 Indexing for usability
- Review/test state-of-the-art semantic search engines
- On content of documents
- Task 2 Improving subject access
- Task 3 FRBR aggregation, search and browsing
- Create/exploit FRBR metadata repositories
- Task 4 Focus on users
- Focus groups on prototypes
5WP 3 Task 2 Improving Subject Access
- Improving subject access via semantic alignment between subjects
- Search through collections
- Using metadata
- In a controlled setting
- Paving the way for enhanced usages
- Advanced processing mentioned in TELplus needs conceptual structures and links between these structures
- E.g. clustering
6WP 3 Task 2 Improving Subject Access
- Improving subject access via semantic alignment between subjects
- Reference: MACS project
- Manually-built semantic equivalences between Rameau, SWD and LCSH headings
7MACS Querying Collections
8MACS Query Reformulation Options
9WP 3 Task 2 Improving Subject Access
- Improving subject access via semantic alignment between subjects
- Reference: MACS project
- Manual equivalences between Rameau, SWD and LCSH headings
- Here: an experiment on deploying automatic alignment techniques
- Determining possible strategies
- Assessing feasibility and usefulness
- MACS context
10WP3.2 Sub-tasks
- 3.2.1. Converting the subjects to standard representation language
- Semantic web format (SKOS)
- 3.2.2. Aligning the vocabularies
- Semantic correspondences between subjects
- 3.2.3. Deploying the alignment knowledge obtained into TEL framework
- E.g. using links to reformulate queries from one subject list to the other
11Converting subjects to standard representation language
- Goal: solving syntactic heterogeneity between vocabularies
- Enabling the use of standard tools
- E.g. for query (re)formulation
- Paving the way for dealing with semantic heterogeneity
- Definitions of concepts expressed according to a common model
12Converting subjects to standard representation language
- Approach: Semantic Web and SKOS
- Semantic Web
- Knowledge objects as web resources (URIs)
- Description by linking resources (RDF)
- Description using shared formal vocabularies (ontologies)
- SKOS
- A standard Semantic Web model (ontology)
- For knowledge organization systems (thesauri, subject heading lists)
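13SKOS Example
- http://www.iconclass.nl/ rdf:type skos:ConceptScheme
- http://www.iconclass.nl/s_11F rdf:type skos:Concept
- http://www.iconclass.nl/s_11F skos:inScheme http://www.iconclass.nl/
- http://www.iconclass.nl/s_11F skos:prefLabel "the Virgin Mary"@en
- http://www.iconclass.nl/s_11F skos:prefLabel "la Vierge Marie"@fr
- http://www.iconclass.nl/s_11F skos:broader http://www.iconclass.nl/s_11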
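The SKOS example on this slide can be written down as plain subject-predicate-object triples. The following is a minimal sketch using only the Python standard library; the helper names (`uri`, `literal`, `to_ntriples`) are illustrative, and the `http://` form of the Iconclass URIs is an assumption, since the colon is missing in the slide text.

```python
# Minimal sketch: the Iconclass concept from the slide as RDF triples,
# serialised in an N-Triples-like form. Standard library only.

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# URIs as read from the slide (the "http://" form is assumed).
SCHEME = "http://www.iconclass.nl/"
CONCEPT = "http://www.iconclass.nl/s_11F"
PARENT = "http://www.iconclass.nl/s_11"


def uri(value):
    """Wrap a URI in angle brackets."""
    return f"<{value}>"


def literal(text, lang):
    """Build a language-tagged literal."""
    return f'"{text}"@{lang}'


triples = [
    (uri(SCHEME), uri(RDF + "type"), uri(SKOS + "ConceptScheme")),
    (uri(CONCEPT), uri(RDF + "type"), uri(SKOS + "Concept")),
    (uri(CONCEPT), uri(SKOS + "inScheme"), uri(SCHEME)),
    (uri(CONCEPT), uri(SKOS + "prefLabel"), literal("the Virgin Mary", "en")),
    (uri(CONCEPT), uri(SKOS + "prefLabel"), literal("la Vierge Marie", "fr")),
    (uri(CONCEPT), uri(SKOS + "broader"), uri(PARENT)),
]


def to_ntriples(rows):
    """Serialise triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```

A real conversion would more likely rely on an RDF toolkit; the point here is only that one concept becomes a handful of statements linking URIs and labels.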
14Converting subjects to standard representation language
- Process
- Getting processable versions from owners
- E.g. XML
- Analyzing the models
- Converting to SKOS
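The conversion step could look roughly like the sketch below. It assumes a hypothetical XML export: the element names (`heading`, `label`, `broader`), the sample record and the `http://example.org/shl1/` namespace are all invented for illustration and do not describe the actual formats of the vocabulary owners.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export of a subject heading list (invented structure).
SAMPLE_XML = """
<headings>
  <heading id="h1">
    <label lang="en">Dutch literature</label>
    <broader ref="h0"/>
  </heading>
  <heading id="h0">
    <label lang="en">Literature</label>
  </heading>
</headings>
"""

BASE = "http://example.org/shl1/"  # placeholder namespace, not a real one


def convert(xml_text, base=BASE):
    """Turn each <heading> into SKOS-style triples (prefixed names as strings)."""
    root = ET.fromstring(xml_text)
    triples = []
    for heading in root.iter("heading"):
        subject = base + heading.get("id")
        triples.append((subject, "rdf:type", "skos:Concept"))
        for label in heading.findall("label"):
            # Labels keep their language tag as a (text, lang) pair.
            value = (label.text.strip(), label.get("lang"))
            triples.append((subject, "skos:prefLabel", value))
        for parent in heading.findall("broader"):
            triples.append((subject, "skos:broader", base + parent.get("ref")))
    return triples


for triple in convert(SAMPLE_XML):
    print(triple)
```

The "analyzing the models" step decides which source fields map to which SKOS property; in this sketch that decision is hard-coded in `convert`.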
15WP3.2 Sub-tasks
- 3.2.1. Converting the subjects to standard representation language
- Semantic web format (SKOS)
- 3.2.2. Aligning the vocabularies
- Semantic correspondences between subjects
- 3.2.3. Deploying the alignment knowledge obtained into TEL framework
- E.g. using links to reformulate queries from one subject list to the other
16Vocabulary Alignment
- Specifying required alignment format (links)
- Type of mapping links: equivalence, broader
- Cardinality: one-to-one, one-to-many
- Taking application context (TEL) into account
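One way to make the alignment format concrete is a small record type for a link plus a check on cardinality. This is only a sketch: the class name `MappingLink`, the function `source_cardinality` and the concept identifiers are hypothetical, and the relation set is limited to the two types named on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass

# Link types named on the slide; a real format may need more.
RELATIONS = {"equivalence", "broader"}


@dataclass(frozen=True)
class MappingLink:
    source: str    # concept in the first vocabulary
    target: str    # concept in the second vocabulary
    relation: str  # one of RELATIONS

    def __post_init__(self):
        if self.relation not in RELATIONS:
            raise ValueError(f"unsupported relation: {self.relation}")


def source_cardinality(links):
    """Return 'one-to-many' if any source concept has several targets,
    otherwise 'one-to-one' (seen from the source side only)."""
    targets = defaultdict(set)
    for link in links:
        targets[link.source].add(link.target)
    if any(len(found) > 1 for found in targets.values()):
        return "one-to-many"
    return "one-to-one"


links = [
    MappingLink("shl1:a", "shl2:x", "equivalence"),
    MappingLink("shl1:a", "shl2:y", "broader"),
]
print(source_cardinality(links))
```

Whether one-to-many links are acceptable depends on the application, for example on whether a reformulated query may contain several target subjects.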
17Vocabulary Alignment
- Specifying required alignment format (links)
- Selecting (and running) alignment techniques/tools
- Inspired by semantic web approaches
18Vocabulary Alignment Techniques
- Similar to ontology alignment problem
- Existing approaches for (semi-)automatic ontology alignment
- Using techniques from linguistics, computer science, statistics
- Problem: performance does not allow 100% automatic alignment
- Problem: multilingual case
- Some techniques cannot be used
19Potential Technique Using Background Knowledge
- Using a shared conceptual reference to find links
[Figure: SHL 1 and SHL 2 linked through a shared reference; labels shown: Publication, Calendar]
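The idea of using a shared conceptual reference can be sketched as follows. Everything in the data is hypothetical: the reference hierarchy (here "Calendar" is assumed to be narrower than "Publication"), the anchors from each subject heading list into the reference, and the concept identifiers. The slide does not say how anchors are obtained; this sketch simply takes them as given.

```python
# Hypothetical shared reference: child -> parent ("broader") links.
REFERENCE_BROADER = {"Calendar": "Publication"}

# Hypothetical anchors from each subject heading list into the reference.
SHL1_ANCHORS = {"shl1:calendars": "Calendar"}
SHL2_ANCHORS = {"shl2:publications": "Publication", "shl2:calendar": "Calendar"}


def ancestors(concept, broader):
    """Follow broader links upwards; stops at the top or on a cycle."""
    seen = []
    while concept in broader and broader[concept] not in seen:
        concept = broader[concept]
        seen.append(concept)
    return seen


def find_links(anchors1, anchors2, broader):
    """Derive links between two vocabularies from their reference anchors."""
    links = []
    for c1, ref1 in anchors1.items():
        for c2, ref2 in anchors2.items():
            if ref1 == ref2:
                links.append((c1, "equivalence", c2))
            elif ref2 in ancestors(ref1, broader):
                # c2 is anchored to something broader than c1's anchor.
                links.append((c1, "broader", c2))
    return links


for link in find_links(SHL1_ANCHORS, SHL2_ANCHORS, REFERENCE_BROADER):
    print(link)
```

The quality of such links depends entirely on the anchoring step and on how well the reference covers both vocabularies.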
20Potential Technique Statistical Alignment
- Object information (book indexing)
[Figure: subjects "Dutch Literature" and "Dutch" from SHL 1 and SHL 2, connected through dually-indexed books]
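A statistical alignment over dually-indexed books can be sketched by comparing the sets of books attached to each subject. The slide does not name a measure; the Jaccard coefficient is used here as one common choice. The book identifiers, the assignment of "Dutch Literature" to SHL 1 and "Dutch" to SHL 2, and the threshold are all assumptions made for the example.

```python
def jaccard(books_a, books_b):
    """Overlap of two sets of books: |A & B| / |A | B|."""
    union = books_a | books_b
    return len(books_a & books_b) / len(union) if union else 0.0


def align(index1, index2, threshold=0.5):
    """Return (subject1, subject2, score) for pairs above the threshold,
    best score first. Each index maps a subject to the books it indexes."""
    candidates = []
    for subject1, books1 in index1.items():
        for subject2, books2 in index2.items():
            score = jaccard(books1, books2)
            if score >= threshold:
                candidates.append((subject1, subject2, score))
    return sorted(candidates, key=lambda item: item[2], reverse=True)


# Invented sample of books carrying subjects from both lists.
SHL1_INDEX = {"Dutch Literature": {"b1", "b2", "b3"}}
SHL2_INDEX = {"Dutch": {"b1", "b2", "b3", "b4"}, "History": {"b5"}}

print(align(SHL1_INDEX, SHL2_INDEX))
```

With this sample the two subjects share three of four books, so the pair scores 0.75; how to read such a score (equivalence, broader, or merely related) is left to the evaluation step.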
21Vocabulary Alignment
- Specifying required alignment format (links)
- Selection (and running) of tool/method
- Evaluation (and cleaning)
- Considering application
22Evaluation of Alignments
- MACS has produced mappings!
- Possible gold standard
- But has MACS produced all mappings?
- Which proportion of the SHLs is covered?
- Taking into account all indexing strings?
- Are MACS mappings the only interesting ones?
- Serendipity mappings
- Concepts that are not equivalent but could bring useful results when added to queries
- Compensating for indexing variability
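If MACS mappings are taken as a gold standard, an automatic alignment can be scored with precision and recall, and the coverage question can be phrased as the share of a vocabulary that appears in any gold mapping. The sketch below uses invented concept identifiers; the function names are illustrative.

```python
def evaluate(candidate, gold):
    """Precision and recall of candidate (source, target) pairs against gold."""
    candidate, gold = set(candidate), set(gold)
    correct = candidate & gold
    precision = len(correct) / len(candidate) if candidate else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


def coverage(gold, vocabulary):
    """Proportion of vocabulary concepts that occur as a source in gold."""
    vocabulary = set(vocabulary)
    mapped = {source for source, _ in gold}
    return len(mapped & vocabulary) / len(vocabulary) if vocabulary else 0.0


# Invented example data.
GOLD = {("shl1:a", "shl2:x"), ("shl1:b", "shl2:y")}
CANDIDATE = {("shl1:a", "shl2:x"), ("shl1:c", "shl2:z")}

print(evaluate(CANDIDATE, GOLD))
print(coverage(GOLD, ["shl1:a", "shl1:b", "shl1:c", "shl1:d"]))
```

Because the gold standard may be incomplete, a candidate pair that is missing from it is not necessarily wrong; it may be exactly the kind of serendipity mapping mentioned above, so these figures are best read together with a manual check of the "errors".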
23Evaluation of Alignments
- Several scenarios for using and evaluating alignments
- Concept-based search
- Re-indexing
- Integration of one SHL into the other
- SHL Merging
- Free-text search
- Navigation
24Evaluation of Alignments
- Several scenarios for using and evaluating alignments
- Concept-based search
- Retrieving books indexed by SHL1 using SHL2 concepts
- Re-indexing
- Integration of one SHL into the other
- SHL Merging
- Free-text search
- Matching user search terms to either SHL1 or SHL2 concepts
- Navigation
- Browsing several collections using one SHL structure
25Evaluation of Alignments
- Several settings for a single scenario
- Fully automatic reformulation vs assisted reformulation (candidates)
- Different evaluation measures
- Good mappings vs acceptable ones
- Number of candidates for reformulation
- Semantic closeness to original query
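The two settings can be contrasted in a short sketch: in automatic mode only confident mappings are used silently, while in assisted mode the user is shown a ranked list of candidates. The mapping table, scores, thresholds and function name are hypothetical.

```python
# Hypothetical alignment: source concept -> [(target concept, score), ...]
MAPPINGS = {
    "shl1:dutch-literature": [("shl2:literature", 0.4), ("shl2:dutch", 0.75)],
}


def reformulate(concept, mappings, mode="assisted", max_candidates=3, min_score=0.7):
    """Translate a query concept into concepts of the other subject list.

    automatic: keep only targets scoring at least min_score.
    assisted:  return up to max_candidates targets for the user to choose from.
    """
    ranked = sorted(mappings.get(concept, []), key=lambda pair: pair[1], reverse=True)
    if mode == "automatic":
        return [target for target, score in ranked if score >= min_score]
    if mode == "assisted":
        return [target for target, _ in ranked[:max_candidates]]
    raise ValueError(f"unknown mode: {mode}")


print(reformulate("shl1:dutch-literature", MAPPINGS, mode="automatic"))
print(reformulate("shl1:dutch-literature", MAPPINGS, mode="assisted"))
```

The evaluation measures listed above map onto the parameters: `min_score` separates good mappings from merely acceptable ones, and `max_candidates` bounds the number of candidates offered.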
26Vocabulary Alignment
- Specifying required alignment format (links)
- Selection (and running) of tool/method
- Evaluation (and cleaning)
- Assessment of the approach
- Efforts required, quality, extendibility
27WP3.2 Sub-tasks
- 3.2.1. Converting the subjects to standard representation language
- Semantic web format (SKOS)
- 3.2.2. Aligning the vocabularies
- Semantic correspondences between subjects
- 3.2.3. Deploying the alignment knowledge obtained into TEL framework
- E.g. using links to reformulate queries from one subject list to the other
28Deploying the alignment knowledge obtained into TEL framework
- Observing integration of MACS data into TEL
- Conceptual input for alignment requirements
- Integration of the obtained alignment in TEL
- Assessment of the alignment integration
- Technical aspects, usage aspects
29Reminder
- Alignment is a difficult problem
- Application-specific alignment pretty much unexplored in Semantic Web research
- More a feasibility study than a complete solution to the problem
- Practical goal: investigate how automatic techniques could help MACS-like initiatives
- Manual mapping is labour-intensive
30Agenda
- TELPlus Context
- Improving subject access
- 3 sub-tasks
- Services for TEL
31WP4 Integrating services with the European Library portal
- Theo van Veen (KB)
- Tasks
- Identifying services that are going to give the user the greatest return
- Creating new services
- Integrating services within TEL
32WP4 Some Services Mentioned
- Preliminary inventory: no official commitment!
- Services based on controlled vocabularies
- Thesaurus and name authority service
- Providing terms linked to query terms
- Semantic enrichment service
- Users can annotate search results with terms
- Distance between terms and related terms
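The slide does not define "distance between terms". One possible reading is the number of links separating two terms in a thesaurus, which a breadth-first search can compute. The graph below is invented, and treating all link types as equal and undirected is an assumption.

```python
from collections import deque

# Hypothetical thesaurus links, treated as undirected and unweighted.
LINKS = {
    "painting": {"art", "portrait"},
    "art": {"painting", "sculpture"},
    "portrait": {"painting"},
    "sculpture": {"art"},
}


def distance(graph, start, goal):
    """Number of links on a shortest path from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, steps = queue.popleft()
        for neighbour in graph.get(term, ()):
            if neighbour == goal:
                return steps + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + 1))
    return None  # no path between the two terms


print(distance(LINKS, "portrait", "sculpture"))
```

A service built on this could rank related terms by distance when suggesting terms linked to a query term.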
33WP4 Some Services Mentioned
- Preliminary inventory: no official commitment!
- Services based on controlled vocabularies
- Thesaurus and name authority service
- Semantic enrichment service
- Distance between terms and related terms
- Adding more value from controlled vocabularies and alignments between them
34Thanks!