Title: Annotation for the Semantic Web
1. Annotation for the Semantic Web
- Yihong Ding
- A PhD Research Area Background Study
2. Introduction
- Current web is designed for humans
- Semantic web (next-generation web) is designed for both humans and machines
- Semantic annotation
- Disclose semantic meanings of web content
- Convert current HTML web pages to machine-understandable semantic web pages
3. Outline
- Historical Review
- Current Status
- Related Research Fields
- Future Challenges
4. Semantic Annotation in Ancient Ages
- No evidence of when humans started to annotate text
- History of semantic annotation; history of ontologies
5. The First Dream of Modern Semantic Annotation
- July 1945, Vannevar Bush, "As We May Think", The Atlantic Monthly
- Bush's dream device
- Humans could acquire information from the community (World Wide Web)
- Humans could contribute their own ideas to the community (Web Annotation)
6. Web Annotation before 1999
- Heck et al., 1999
- Developing better user interfaces
- Improving storage structures
- Increasing annotation sharability
- Example systems
- ComMentor, AnnotatorTM, Third Voice, CritLink, CoNote, and Futplex
7. Semantic Labeling before 1999
- Dublin Core Metadata Standard (http://dublincore.org/)
- A set of 15 elements encapsulates data
- Title, Subject, Description, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
- Superimposed Information (Delcambre et al., 2001)
- Figure: a Superimposed Layer holds marks that point into a Base Layer of information sources (Information Source 1, Information Source 2, ..., Information Source n)
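As a rough illustration of how the 15 Dublin Core elements label a resource, the sketch below checks a metadata record against the element names listed on this slide. The helper `unknown_elements` and the sample record are assumptions made up for this example; they are not part of the Dublin Core standard or of any tool cited here.

```python
# The 15 Dublin Core element names, as listed on the slide (lowercased).
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def unknown_elements(record):
    """Return the sorted keys of `record` that are not Dublin Core elements.

    Hypothetical helper for illustration only.
    """
    return sorted(k for k in record if k.lower() not in DUBLIN_CORE_ELEMENTS)


# Illustrative record; every element is optional, so a partial record is fine.
record = {
    "title": "Annotation for the Semantic Web",
    "creator": "Yihong Ding",
    "type": "Text",
    "language": "en",
}

print(unknown_elements(record))
print(unknown_elements({"title": "x", "author": "y"}))
```

A record that uses only the 15 element names yields an empty list; a key such as `author`, which is not one of the elements, is reported.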
8. Status of Current Web Semantic Annotation Studies
- Interactive annotation
- Automatic annotation
9. Interactive Annotation Systems
- Let humans interact through machine interfaces to annotate documents
- Problems
- Inconsistency
- Error-proneness
- Lack of scalability
- Values
- Easy to implement
- Suitable for small-scale tasks and experiments
- Helpful to build corpora for evaluations
10. Interactive Annotation Systems
- Annotea (Kahan et al., 2001)
- W3C project
- An open RDF infrastructure for shared web annotations
- SHOE (Simple HTML Ontology Extensions) (Heflin et al., 2000)
- University of Maryland, College Park
- Manual annotator using SHOE ontologies
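To make the idea of shared annotations kept outside the annotated page more concrete, here is a minimal sketch of an in-memory annotation store that flattens each annotation into subject-predicate-object triples. The class names and the predicate names (`annotates`, `context`, `body`, `creator`) are placeholders chosen for this example. They are not claimed to match the actual Annotea schema or protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    target: str   # URL of the annotated page
    context: str  # locator inside the page (a plain string in this sketch)
    body: str     # the annotation text
    creator: str  # who wrote the annotation

    def triples(self, node_id):
        """Flatten the annotation into (subject, predicate, object) triples."""
        return [
            (node_id, "annotates", self.target),
            (node_id, "context", self.context),
            (node_id, "body", self.body),
            (node_id, "creator", self.creator),
        ]


class AnnotationStore:
    """In-memory stand-in for a shared annotation server (hypothetical)."""

    def __init__(self):
        self._triples = []
        self._count = 0

    def add(self, annotation):
        """Store an annotation and return the identifier assigned to it."""
        self._count += 1
        node_id = f"_:a{self._count}"
        self._triples.extend(annotation.triples(node_id))
        return node_id

    def annotations_for(self, url):
        """Return the identifiers of all annotations that target `url`."""
        return [s for s, p, o in self._triples if p == "annotates" and o == url]


store = AnnotationStore()
first = store.add(Annotation("http://example.org/a", "paragraph 2", "Needs a source", "alice"))
store.add(Annotation("http://example.org/b", "title", "Typo", "bob"))
print(store.annotations_for("http://example.org/a"))
```

Because the annotations live in the store rather than in the page, several users can annotate a page they cannot edit.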
11. Automatic Annotation Systems
- Common feature: use of ontologies
- Typical approaches
- Annotation with automatic ontology generation (1 system)
- Annotation with automatic information extraction (6 systems)
12. Annotation with Ontology Generation
- SCORE (Semantic Content Organization and Retrieval Engine) (Sheth et al., 2002)
- Voquette (now acquired by Semagix Co.), University of Georgia
13. Annotation with Automatic IE
- Ont-O-Mat (Handschuh et al., 2002)
- University of Karlsruhe, Germany
- MnM (Vargas-Vera et al., 2002)
- The Open University, United Kingdom
- Common features
- DAML+OIL ontologies
- Supervised adaptive learning with Lazy-NLP (Amilcare)
- Annotation stored inside web pages
- Differences
- MnM allows multiple ontologies at one time
- MnM also stores annotations in a knowledge base
- Ont-O-Mat uses OntoBroker both as an annotation server and as a reasoning engine
14. Annotation with Automatic IE
- KIM Platform (Kiryakov et al., 2004)
- Ontotext Lab, Sirma Group, a Canadian-Bulgarian joint venture
- SemTag (Dill et al., 2003)
- IBM Almaden Research Center
- Similar features
- Each uses one specially designed upper-level ontology: KIM ontology vs. TAP ontology
- Specific features
- KIM uses an NLP tool (GATE) to extract information
- KIM stores annotations in a separate file
- SemTag uses inductive learning to extract information
- SemTag annotated 264 million web pages and generated approximately 434 million semantic tags
15. Annotation with Automatic IE
- Stony Brook Annotator (Mukherjee et al., 2003)
- Stony Brook University
- Structural analysis of DOM tree for HTML pages
- Drawbacks
- Taxonomic relationships only
- No generic labeling algorithm disclosed
- RoadRunner Labeller (Arlotta et al., 2003)
- Università di Roma Tre and Università della Basilicata
- Automatically assigns label names based on image recognition
- Drawbacks
- Semantic meaning of labels unknown
- Difficulty in associating labels with ontologies
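The structural analysis of a DOM tree mentioned for HTML-aware tools can be pictured with a small sketch: group text nodes by the tag path that leads to them, and treat paths that repeat as candidate record fields. This is only an assumption-laden illustration built on Python's standard `html.parser`; it is not the algorithm of the Stony Brook Annotator or of RoadRunner, and the names `PathCollector` and `repeated_paths` are invented here.

```python
from collections import defaultdict
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Group text nodes by the tag path that leads to them."""

    # Tags that never have a closing tag, so they must not be pushed.
    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.by_path = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.by_path["/".join(self.stack)].append(text)


def repeated_paths(html, min_count=2):
    """Return tag paths that hold at least `min_count` text nodes."""
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return {p: t for p, t in parser.by_path.items() if len(t) >= min_count}


page = ("<html><body><h1>Cars</h1>"
        "<ul><li>Honda Civic</li><li>Ford Focus</li></ul>"
        "</body></html>")
print(repeated_paths(page))
```

The sketch finds that the two list items share a path, but it says nothing about what they mean. That gap is the labeling problem listed under the drawbacks above.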
16. Related Research Fields
- Semantic Web
- Information extraction
- Ontology-related topics
- Conceptual modeling
- Logic languages
- Web services
17. Semantic Web
- "Weaving the Web" (Berners-Lee, 1999), birth of the Semantic Web
- "The Semantic Web" (Berners-Lee et al., 2001)
18. Information Extraction (Laender et al., 2002)
- Human-guided approaches
- Wrapper languages, modeling-based tools
- No annotation examples
- Too much human involvement
- Non-ontology-based approaches
- HTML-aware tools: Stony Brook tool (Mukherjee et al., 2003), RoadRunner Labeller (Arlotta et al., 2003)
- NLP-based tools: Ont-O-Mat (Handschuh et al., 2002), MnM (Vargas-Vera et al., 2002), KIM platform (Kiryakov et al., 2004)
- ILP-based tools: SemTag (Dill et al., 2003)
- Require extra alignment between extraction categories in wrappers and concepts in ontologies
- Ontology-based approaches
- Ontology-based tools: my proposal
- Do not require alignment; resilient to web page layouts
- Slow in execution time
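One way to see why an ontology-based approach needs no extra alignment and tolerates layout changes is the sketch below: each concept of a toy ontology carries its own recognizer, the recognizers run over the page text with markup stripped, and the extracted values are labeled directly with ontology concepts. The ontology `CAR_AD_ONTOLOGY`, its regular expressions, and the function names are hypothetical and serve only this illustration; the proposal on the slide is not described in enough detail to reproduce here.

```python
import re

# Hypothetical mini-ontology: each concept carries its own recognizer.
CAR_AD_ONTOLOGY = {
    "Year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "Price": re.compile(r"\$\d{1,3}(?:,\d{3})*"),
    "Make": re.compile(r"\b(?:Honda|Ford|Toyota)\b"),
}


def strip_tags(html):
    """Crude tag removal so that recognizers see text, not layout."""
    return re.sub(r"<[^>]+>", " ", html)


def annotate(html, ontology):
    """Return sorted (concept, value) pairs found in the page text."""
    text = strip_tags(html)
    return sorted(
        (concept, match.group())
        for concept, pattern in ontology.items()
        for match in pattern.finditer(text)
    )


# The same facts in two different layouts.
table_layout = "<tr><td>1999</td><td>Honda</td><td>$4,500</td></tr>"
prose_layout = "<p>Honda, 1999 model, asking $4,500</p>"

print(annotate(table_layout, CAR_AD_ONTOLOGY))
print(annotate(prose_layout, CAR_AD_ONTOLOGY))
```

Both layouts produce the same annotations because the recognizers never look at the markup. Running every recognizer over every page is also the kind of cost that the "slow in execution time" bullet points to.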
19. Ontology-Related Topics
- Ontology languages (W3C, OWL)
- Knowledge representation and reasoning
- Ontology generation (Ding et al., 2002a)
- Annotation domain specification
- Ontology enrichment (Parekh et al., 2004)
- Expanding the annotation domain specification
- Ontology population (Alani et al., 2003)
- Annotation result output
- Ontology mapping and merging (Ding et al., 2002b)
- Large-scale annotation requires large-scale ontologies
- Small-scale ontologies are less expensive to build
- Ontology mapping creates the links among small-scale ontologies
- Ontology merging fuses small-scale ontologies into a large-scale ontology
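The difference between mapping (creating links) and merging (fusing into one ontology) can be sketched with two toy ontologies, each represented as a dictionary from a concept name to the set of its parent concepts. Matching by normalized label is an assumption made for brevity; the mapping techniques surveyed in the cited work are far richer, and every name below is invented for this example.

```python
def normalize(label):
    """Lowercase a label and treat '_' and '-' as spaces."""
    return label.lower().replace("_", " ").replace("-", " ").strip()


def map_ontologies(a, b):
    """Mapping: link concepts of `a` and `b` whose normalized labels match."""
    index = {normalize(c): c for c in b}
    return {c: index[normalize(c)] for c in a if normalize(c) in index}


def merge_ontologies(a, b, mapping):
    """Merging: fuse `b` into `a`, renaming mapped concepts of `b`."""
    rename = {b_name: a_name for a_name, b_name in mapping.items()}
    merged = {c: set(parents) for c, parents in a.items()}
    for concept, parents in b.items():
        target = rename.get(concept, concept)
        merged.setdefault(target, set()).update(rename.get(p, p) for p in parents)
    return merged


# Two small ontologies: concept -> set of parent (is-a) concepts.
vehicles = {"Vehicle": set(), "Car": {"Vehicle"}}
dealers = {"car": set(), "Used-Car": {"car"}, "Dealer": set()}

mapping = map_ontologies(vehicles, dealers)
merged = merge_ontologies(vehicles, dealers, mapping)
print(mapping)
print(merged)
```

The mapping leaves both ontologies intact and only records that `Car` and `car` correspond. The merge uses that link to produce a single larger ontology in which `Used-Car` becomes a child of `Car`.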
20. Conceptual Modeling
- Annotation requires knowledge modeling
- Ontology is a type of conceptual modeling
- ER Model (Chen, 1976)
- The most influential conceptual model
- Influenced the OSM model, the basis of data-extraction ontologies
21. Logic Languages
- Logic foundation provides reasoning and inference power for modeling languages
- Examples
- First-order logic (Smullyan, 1995)
- Description logics (Brachman et al., 1984)
22. Web Services
- Web services are increasingly the typical application in semantic web scenarios
- Two ways of aligning web services with semantic annotation
- Web service annotation (Brodie, 2003)
- Semantic annotation web service
23. Summary and Future Challenges
- Annotation for the semantic web
- Enable machine-understandable web
- Support semantic searching
- Support global web services
- Still an unsolved problem
- Main technical challenges
- Direct ontology-driven annotation mechanism
- Concept disambiguation
- Automatic domain ontology generation
- Scalability