Title: MESMUSES methodology
1MESMUSES methodology
- Lessons learned and open issues
- Alain Michard
- Florence, June 2003
2MESMUSES broad vision
- Just like several other projects
- SW is all about semantic interoperability
- Sharing machine-readable terminologies and classification schemes
- Science and culture are collective and international
- Semantic Web methodology should be highly relevant for managing and sharing scientific and cultural information
3Some key ST issues in the Project
- Model: is RDFS / OWL-Lite adequate?
- Schema authoring: method and tools needed!
- Metadata: where does it come from?
- Automatic indexing: experiments with a categorizer
4The basic SW model
Type: printed text, monograph
Author(s): Zola, Émile (1840-1902)
Title(s): L'assommoir [printed text] / by Emile Zola
Edition: 50th ed.
Publication: Paris: G. Charpentier, 1878
Physical description: 111-569 p.
Record no.: FRBNF35963044
Real-world entities
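To make the model concrete, the catalogue record above can be read as a set of subject-property-value statements about two real-world entities, a book and its author. The sketch below is only an illustration: the `ex:` prefix, the identifiers and every property name are hypothetical and do not come from the MESMUSES schema.

```python
# Hypothetical identifiers; none of these names come from the slides.
BOOK = "ex:doc/FRBNF35963044"
AUTHOR = "ex:person/zola-emile"

# Each statement is a (subject, property, value) triple.
triples = [
    (BOOK, "rdf:type", "ex:Monograph"),
    (BOOK, "ex:title", "L'assommoir"),
    (BOOK, "ex:author", AUTHOR),          # links two entities, not a string
    (BOOK, "ex:publisher", "G. Charpentier"),
    (BOOK, "ex:publicationPlace", "Paris"),
    (BOOK, "ex:publicationYear", 1878),
    (AUTHOR, "rdf:type", "ex:Person"),
    (AUTHOR, "ex:name", "Zola, Émile"),
    (AUTHOR, "ex:birthYear", 1840),
    (AUTHOR, "ex:deathYear", 1902),
]


def describe(subject, graph):
    """Return {property: [values]} for every statement about one subject."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, []).append(o)
    return description


book = describe(BOOK, triples)
author = describe(AUTHOR, triples)
```

The point of the split is that the author is described once, as an entity of its own, and any number of document surrogates can point to it.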
5Model and Schema Language
- Typed attributes are needed
- XML-Schema types
- Derived types (e.g. Celsius temperature, Gregorian date, etc.)
- Enumerated types, thesauri
- Time-stamping
- Cardinality constraints
- Explicit transitivity of properties (e.g.
geographic inclusion)
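Two of the requirements above, explicit transitivity and cardinality constraints, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the place names, the `ex:title` property and both function names are hypothetical.

```python
def transitive_closure(pairs):
    """Return every (a, c) implied by chaining (a, b) and (b, c) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def check_max_cardinality(triples, prop, max_count):
    """Return the subjects that carry more than max_count values for prop."""
    counts = {}
    for s, p, _ in triples:
        if p == prop:
            counts[s] = counts.get(s, 0) + 1
    return sorted(s for s, n in counts.items() if n > max_count)


# Geographic inclusion declared only between direct neighbours.
located_in = {("Florence", "Tuscany"), ("Tuscany", "Italy"), ("Italy", "Europe")}
closure = transitive_closure(located_in)

# A surrogate with two titles violates a "at most one title" constraint.
sample = [
    ("doc1", "ex:title", "A"),
    ("doc1", "ex:title", "B"),
    ("doc2", "ex:title", "C"),
]
violations = check_max_cardinality(sample, "ex:title", 1)
```

If the schema language cannot declare a property as transitive, each application has to reimplement this kind of closure itself, which is why the slide asks for it to be explicit.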
6Schema authoring issues (1)
- Find the right level of abstraction
- Is Glucid a class or an instance?
- Or is it sometimes a class and sometimes an instance?
- Avoid the KR (knowledge representation) attitude and practices!
- It's all about indexing resources with shared terminologies, not about representing human knowledge!
7Schema authoring issues (2)
8Schema authoring issues (3)
9Schema authoring issues (4)
- Authoring tools are badly needed
- Graphical representation of the schema
- Zooming on sub-graphs (hierarchies)
- Versioning
- Consider using a UML authoring environment?
- Established methodology and tutorials are needed
10Creating Surrogates
- Data extraction and fusion from structured sources
- R-DB, XML-DB, LDAP
- Updating
- When?
- Should not create duplicates!
- Detect cross-references
- Authority lists
- Thesauri
- Lexical distance
- ???
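One way to combine an authority list with a lexical distance when detecting duplicates is sketched below. It is an assumption-laden illustration: the similarity measure is the ratio from Python's standard `difflib.SequenceMatcher` (a similarity score rather than a true edit distance), and the threshold of 0.85, the sample names and the function names are all hypothetical.

```python
import difflib
import unicodedata


def normalize(name):
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def match_authority(name, authority, threshold=0.85):
    """Return the closest authority entry, or None if nothing is close enough."""
    target = normalize(name)
    best, best_score = None, 0.0
    for entry in authority:
        score = difflib.SequenceMatcher(None, target, normalize(entry)).ratio()
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else None


# Hypothetical authority list of personal names.
authority = ["Zola, Émile", "Hugo, Victor", "Balzac, Honoré de"]

same_person = match_authority("ZOLA,  Emile", authority)
unknown = match_authority("Flaubert, Gustave", authority)
```

A name that matches an existing entry is attached to that entity instead of creating a new surrogate; a name that matches nothing is a candidate for a new authority entry, ideally after manual review.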
11Automatic Categorization
- Automatic indexing
- By extracting metadata from resources
- By automatic categorization
- Define hierarchies of concepts inside the schema
- Seeding with representative documents
- Machine learning to create categorizers
- Pros: enriched search functionality
- Cons: hierarchies of categories are static
- Adding a category may change the categorizers of the others
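The seeding approach can be sketched with a very small categorizer. This is not the categorizer used in the project, which the slides do not describe; it is a toy term-frequency centroid classifier with cosine similarity, and the categories, seed texts and function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Split on whitespace and keep purely alphabetic, lowercased words."""
    return [w for w in text.lower().split() if w.isalpha()]


def train(seeds):
    """Build one term-frequency centroid per category from its seed documents."""
    centroids = {}
    for category, docs in seeds.items():
        counts = Counter()
        for doc in docs:
            counts.update(tokenize(doc))
        centroids[category] = counts
    return centroids


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(text, centroids):
    """Assign the category whose centroid is most similar to the text."""
    vec = Counter(tokenize(text))
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


seeds = {
    "chemistry": ["sugar molecule carbon reaction", "acid molecule reaction bond"],
    "painting": ["canvas oil portrait museum", "fresco pigment portrait gallery"],
}
centroids = train(seeds)
before = categorize("pigment on canvas", centroids)

# Adding a category later can move documents that were already indexed.
seeds["conservation"] = ["pigment canvas varnish molecule"]
after = categorize("pigment on canvas", train(seeds))
```

In this toy version the existing centroids are untouched when a category is added, yet the assignment of an already indexed document still changes, so the collection has to be re-categorized. Categorizers trained against negative examples are affected more directly, since the new category's seeds become negative examples for the others.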
12Bottom-line
- RDFS schema authoring may be more difficult than E-R modelling
- Debates on syntactic features are irrelevant
- Should be grounded on real-world implementations and testbeds
- A new query language (e.g. RQL) is not high priority
- We have not addressed the logical rules layer
- Semantic Web vs. Community Webs