Title: OntoELAN: An Ontology-Based Linguistic Multimedia Annotator
1OntoELAN: An Ontology-Based Linguistic Multimedia Annotator
Speaker: Artem Chebotko (artem_at_cs.wayne.edu), Department of Computer Science, Wayne State University
2Coauthors
From left: Ms. Yu Deng, graduated with M.S. in Computer Science in 2004; Prof. Shiyong Lu, Computer Science, my advisor; Prof. Farshad Fotouhi, Computer Science, Chair of the department; Prof. Anthony Aristar, Dept. of English, Linguistics Program. All at Wayne State University.
Hennie Brugman, Alexander Klassmann, Han Sloetjes, Albert Russel, Peter Wittenburg, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands.
Acknowledgements: Laura Buszard-Welcher and Andrea Berez, Dept. of English, Linguistics Program, WSU.
3The Outline of The Talk
- Background and Motivation
- The Limitations of Existing Tools
- Our Approach and Advantages
- An Overview of OntoELAN
- Demo
4Background and Motivation
- Linguistics
- Many languages are in serious danger of being lost
- In fact, half of the world's approximately 6,500 languages may disappear in the next 100 years
- Language data is critical to research in linguistics, anthropology, history, sociology, political science, etc.
- Language data is also important for the community of that language.
5Background and Motivation
- Multimedia
- Much language data is collected as audio and video recordings
- Indexing and retrieval are difficult because multimedia data are not structured and their semantics are implicit in their contents.
- Annotation of multimedia data provides an opportunity for making the semantics explicit
6Background and Motivation
- Ontology-based annotation
- An ontology is an explicit specification of a shared conceptualization. It formalizes the knowledge of various concepts and their relationships in a particular domain
- Annotation with ontological terms, whose meaning is known and understood by the domain community
7Requirements for a Linguistic Multimedia Annotator
- Support for the annotation of descriptive metadata such as title, authors, date, time, etc.
- Support for a time axis and temporal segmentation of clips into slots
- Support for multiple-tier annotation, with each tier providing one avenue for annotation
- Support for ontology-based annotation to avoid incompatible formats and vocabularies
8The Limitations of Existing Tools
- Either don't support ontologies
- IBM MPEG-7 Annotation Tool, ELAN
- or provide limited support for multimedia
- Protégé, ImageSpace, IBM MPEG-7 Annotation Tool
9Our Approach and Advantages
- We developed an ontology-based annotation tool, OntoELAN, for linguistic multimedia data that satisfies all the above requirements
- The ontological approach eliminates multiple incompatible annotation formats
- if the whole community can agree upon one domain ontology
- Annotations are formally defined and machine-interpretable
- Deduction of additional, implicit information
- Search is more precise and easier
10An Overview of OntoELAN
- Developed on top of the ELAN annotator
- Max Planck Institute for Psycholinguistics team
- Features inherited from ELAN
- display of speech and/or video signals, together with their annotations
- time linking of annotations to media streams
- linking of annotations to other annotations
- unlimited number of annotation tiers as defined by a user
- different character sets
- basic search facilities.
11An Overview of OntoELAN
- Ontology support
- Wayne State University team
- New features
- language profile creation
- ontology-based annotation
- storing annotations in the XML format based on
the General Multimedia Ontology and domain
ontologies.
14Linguistic Domain Ontology
- One example is the General Ontology for Linguistic Description (GOLD)
- Developed at University of Arizona
- Expressions
- OrthographicExpression, Utterance, SignedExpression, Word, WordPart
- Grammar
- Tense, Number, Agreement, PartOfSpeech
- PartOfSpeech: Noun, Verb, Participle, Preverb
- Data structures
- A lexical entry, a phoneme table and a syntactic tree
- Metaconcepts
- Language itself
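The PartOfSpeech grouping above can be read as a small class hierarchy, which is what lets a search for a general concept also find annotations tagged with its more specific terms. The following is a minimal sketch of that idea, not OntoELAN's or GOLD's actual implementation: the `SUBCLASS_OF` table, the helper names, and the sample annotations are all hypothetical, and the parent links are only assumed from the way the terms are grouped on the slide.

```python
# Toy subclass table (assumption): parent links inferred from the slide's
# grouping of terms, not loaded from a real ontology file.
SUBCLASS_OF = {
    "Noun": "PartOfSpeech",
    "Verb": "PartOfSpeech",
    "Participle": "PartOfSpeech",
    "Preverb": "PartOfSpeech",
}


def ancestors(term):
    """Return every superclass of `term`, nearest first."""
    chain = []
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        chain.append(term)
    return chain


def search(annotations, concept):
    """Find annotations tagged with `concept` or with any of its subclasses."""
    return [
        a for a in annotations
        if a["term"] == concept or concept in ancestors(a["term"])
    ]


# Hypothetical annotations: each pairs a piece of text with an ontological term.
annotations = [
    {"text": "word-1", "term": "Noun"},
    {"text": "word-2", "term": "Verb"},
    {"text": "word-3", "term": "Tense"},
]

hits = search(annotations, "PartOfSpeech")
print([a["text"] for a in hits])  # ['word-1', 'word-2']
```

No annotation is tagged "PartOfSpeech" directly; the two hits are deduced from the subclass links, which is the kind of implicit information a plain string match on annotation values would miss.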
15General Multimedia Ontology
- Simple semantic framework for multimedia annotation
- Developed at Wayne State University especially for OntoELAN
- AnnotationDocument
- Tier
- TimeSlot
- Annotation
- AlignableAnnotation
- ReferringAnnotation
- AnnotationValue
- StringAnnotation
- OntologyAnnotation
- etc.
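One way to picture how these classes fit together is as a small data model. The sketch below borrows the class names from the list above, but the fields, their types, and the `time_span` helper are assumptions made for illustration; the real definitions live in the multimedia ontology itself.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TimeSlot:
    time_ms: int  # position on the media time axis (milliseconds is an assumption)


@dataclass
class StringAnnotation:
    text: str  # free-text annotation value


@dataclass
class OntologyAnnotation:
    term: str  # annotation value drawn from an ontology


AnnotationValue = Union[StringAnnotation, OntologyAnnotation]


@dataclass
class AlignableAnnotation:
    start: TimeSlot
    end: TimeSlot
    value: AnnotationValue


@dataclass
class ReferringAnnotation:
    # Points at another annotation instead of at the time axis.
    refers_to: Union[AlignableAnnotation, "ReferringAnnotation"]
    value: AnnotationValue


@dataclass
class Tier:
    name: str
    annotations: list = field(default_factory=list)


@dataclass
class AnnotationDocument:
    media_file: str
    tiers: List[Tier] = field(default_factory=list)


def time_span(annotation):
    """Follow references until an alignable annotation supplies the time span."""
    while isinstance(annotation, ReferringAnnotation):
        annotation = annotation.refers_to
    return annotation.start.time_ms, annotation.end.time_ms


# Hypothetical document: one time-aligned utterance and one part-of-speech tag.
utterance = AlignableAnnotation(TimeSlot(0), TimeSlot(1500), StringAnnotation("hello"))
pos_tag = ReferringAnnotation(utterance, OntologyAnnotation("Noun"))
doc = AnnotationDocument(
    "recording.wav",
    [Tier("utterance", [utterance]), Tier("pos", [pos_tag])],
)
print(time_span(pos_tag))  # (0, 1500)
```

In this sketch a referring annotation carries no time slots of its own and takes its position on the time axis from the annotation it refers to.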
17Language Profile
- is a subset of ontological terms, possibly renamed, that are used in the annotation of a particular multimedia resource
- ontological terms
- user-defined terms
- a mapping between ontological terms and user-defined terms
- a reference to an ontology
18Language Profile
- Advantages
- Only a subset of ontological terms is useful for a particular resource annotation
- Renaming ontological terms, e.g. use another language, give an abbreviation or a synonym
- Combining the meaning of two or more ontological terms in one user-defined term.
- Disadvantage
- More work
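The parts of a language profile (a reference to an ontology, plus a mapping from user-defined terms to one or more ontological terms) can be sketched as a small class. This is an illustration under assumptions, not OntoELAN's actual profile format: the `LanguageProfile` class, its methods, and the abbreviations "N" and "N.NUM" are hypothetical, and the ontological terms are simply taken from the GOLD examples listed earlier in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LanguageProfile:
    ontology_ref: str  # reference to the ontology the terms come from
    # user-defined term -> the ontological term(s) it stands for
    mapping: Dict[str, List[str]] = field(default_factory=dict)

    def add_term(self, user_term, *ontological_terms):
        """Register a user-defined term for one or more ontological terms."""
        if not ontological_terms:
            raise ValueError("a user-defined term needs at least one ontological term")
        self.mapping[user_term] = list(ontological_terms)

    def resolve(self, user_term):
        """Translate a user-defined term back to its ontological terms."""
        return self.mapping[user_term]

    def ontological_terms(self):
        """The subset of the ontology that this profile actually uses."""
        return sorted({t for terms in self.mapping.values() for t in terms})


profile = LanguageProfile("http://www.emeld.org/gold")
profile.add_term("N", "Noun")                # renaming: an abbreviation
profile.add_term("N.NUM", "Noun", "Number")  # combining two terms in one

print(profile.resolve("N.NUM"))      # ['Noun', 'Number']
print(profile.ontological_terms())   # ['Noun', 'Number']
```

Annotators work with the short user-defined terms, while the mapping keeps each annotation traceable to the shared ontology.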
20Annotation Tiers and Linguistic Types
- Annotation tiers
- contain annotation values
- can be either alignable or referring
- are associated with their linguistic types
- Linguistic types
- None
- Time Subdivision
- Symbolic Subdivision
- Symbolic Association
- Ontological tier
21Linguistic Multimedia Annotation with OntoELAN
- Language profile creation
- Creation of tiers
- Creation of annotations
23Demos
- Language profile creation
- profile01.swf profile01.AVI
- profile02.swf profile02.AVI
- Creation of tiers and creation of annotations
- annotate01.swf annotate01.AVI
- annotate02.swf annotate02.AVI
24Conclusions and Future Work
- OntoELAN is the first attempt at annotating linguistic multimedia data with a linguistic ontology
- Future Work
- provide more channels for sharing data on the Web, such as the multimedia descriptions, the language words, etc.
- improve the current searching system
- integrate text document annotation
25References
- Artem Chebotko, Yu Deng, Shiyong Lu and Farshad Fotouhi. An Ontology-based Multimedia Annotator for the Semantic Web of Language Engineering. International Journal on Semantic Web and Information Systems, January, 2005.
- Artem Chebotko et al. OntoELAN: An Ontology-based Linguistic Multimedia Annotator. Proc. of the IEEE Sixth International Symposium on Multimedia Software Engineering (IEEE-MSE'2004), Miami, FL, USA, December, 2004.
26References
- OntoELAN
- http://www.cs.wayne.edu/yudeng/projects.htm
- LangDL: A Digital Library For Language Engineering And Research
- http://database.cs.wayne.edu/proj/langdl/index.html
- ELAN
- http://www.mpi.nl/tools/elan.html
- E-MELD
- http://www.emeld.org
- GOLD
- http://www.emeld.org/gold
- General Multimedia Ontology
- http://database.cs.wayne.edu/proj/OntoELAN/multimedia.owl
27Questions?
- Contact information
- Artem Chebotko
- artem_at_cs.wayne.edu
- 313-577-6711