OntoELAN: An Ontology-Based Linguistic Multimedia Annotator

1
OntoELAN: An Ontology-Based Linguistic
Multimedia Annotator
Speaker: Artem Chebotko (artem_at_cs.wayne.edu)
Department of Computer Science, Wayne State
University
2
Coauthors
  • From left: Ms. Yu Deng, graduated with an M.S. in
    Computer Science in 2004; Prof. Shiyong Lu,
    Computer Science, my advisor; Prof. Farshad
    Fotouhi, Computer Science, Chair of the
    department; Prof. Anthony Aristar, Dept. of
    English, Linguistics Program. All at Wayne
    State University.
  • Hennie Brugman, Alexander Klassmann, Han
    Sloetjes, Albert Russel, Peter Wittenburg, Max
    Planck Institute for Psycholinguistics,
    Nijmegen, Netherlands.
  • Acknowledgements: Laura Buszard-Welcher and
    Andrea Berez, Dept. of English, Linguistics
    Program, WSU.
3
The Outline of The Talk
  • Background and Motivation
  • The Limitations of Existing Tools
  • Our Approach and Advantages
  • An Overview of OntoELAN
  • Demo

4
Background and Motivation
  • Linguistics
      • Many languages are in serious danger of being
        lost
      • In fact, half of the world's approximately
        6,500 languages may disappear in the next 100
        years
      • Language data is critical to research in
        linguistics, anthropology, history, sociology,
        and political science
      • Language data is also important for the
        community that speaks the language

5
Background and Motivation
  • Multimedia
      • Much language data is collected as audio and
        video recordings
      • Multimedia data is difficult to index and
        retrieve because it is unstructured and its
        semantics are implicit in its content
      • Annotation of multimedia data provides an
        opportunity to make the semantics explicit

6
Background and Motivation
  • Ontology-based annotation
      • An ontology is an explicit specification of a
        shared conceptualization. It formalizes the
        knowledge of various concepts and their
        relationships in a particular domain
      • Annotation uses ontological terms, whose
        meaning is known and understood by the domain
        community

7
Requirements for a Linguistic Multimedia
Annotator
  • Support for the annotation of descriptive
    metadata such as title, authors, date, time, etc.
  • Support for a time axis and temporal segmentation
    of clips into slots
  • Support for multiple-tier annotation, with each
    tier providing one avenue for annotation
  • Support for ontology-based annotation to avoid
    incompatible formats and vocabularies

8
The Limitations of Existing Tools
  • Existing tools either don't support ontologies
      • IBM MPEG-7 Annotation Tool, ELAN
  • or provide limited support for multimedia
      • Protégé, ImageSpace, IBM MPEG-7 Annotation Tool

9
Our Approach and Advantages
  • We developed an ontology-based annotation tool,
    OntoELAN, for linguistic multimedia data that
    satisfies all the above requirements
  • The ontological approach eliminates multiple
    incompatible annotation formats
      • if the whole community can agree upon one
        domain ontology
  • Annotations are formally defined and machine
    interpretable
      • Additional, implicit information can be
        deduced
      • Search is more precise and easier

10
An Overview of OntoELAN
  • Developed on top of the ELAN annotator
      • Max Planck Institute for Psycholinguistics team
  • Features inherited from ELAN
      • display of speech and/or video signals,
        together with their annotations
      • time linking of annotations to media streams
      • linking of annotations to other annotations
      • unlimited number of annotation tiers as
        defined by a user
      • different character sets
      • basic search facilities

11
An Overview of OntoELAN
  • Ontology support
      • Wayne State University team
  • New features
      • language profile creation
      • ontology-based annotation
      • storing annotations in an XML format based on
        the General Multimedia Ontology and domain
        ontologies
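
The last feature, storing annotations as XML, can be illustrated with a minimal sketch. The slides do not show OntoELAN's actual file format, so every element and attribute name below (OntologyAnnotation, TimeSlot, AnnotationValue, tier, start, end, term) is a hypothetical placeholder borrowed from the concept names of the General Multimedia Ontology, and the term identifier "gold:Noun" is likewise an assumption.

```python
import xml.etree.ElementTree as ET


def annotation_to_xml(tier_id, start_ms, end_ms, term):
    """Build one annotation element.

    All tag and attribute names are hypothetical; they are not taken
    from OntoELAN's real storage format.
    """
    ann = ET.Element("OntologyAnnotation", {"tier": tier_id})
    slot = ET.SubElement(ann, "TimeSlot")
    slot.set("start", str(start_ms))
    slot.set("end", str(end_ms))
    value = ET.SubElement(ann, "AnnotationValue")
    value.set("term", term)  # reference to an ontological term
    return ann


element = annotation_to_xml("pos", 1200, 1850, "gold:Noun")
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

The point of the sketch is only that the annotation value is a reference to an ontology term rather than free text, which is what would make the stored annotation machine interpretable.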

12
An Overview of OntoELAN
13
An Overview of OntoELAN
14
Linguistic Domain Ontology
  • One example is the General Ontology for
    Linguistic Description (GOLD)
      • Developed at the University of Arizona
  • Expressions
      • OrthographicExpression, Utterance,
        SignedExpression, Word, WordPart
  • Grammar
      • Tense, Number, Agreement, PartOfSpeech
      • PartOfSpeech: Noun, Verb, Participle, Preverb
  • Data structures
      • A lexical entry, a phoneme table, and a
        syntactic tree
  • Metaconcepts
      • Language itself
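
A small sketch of why a concept hierarchy helps search: if Noun and Verb are recorded as kinds of PartOfSpeech, a query for PartOfSpeech can also return annotations tagged with the more specific terms. The parent map below is a hand-written fragment based only on the PartOfSpeech line of this slide, not the real GOLD ontology, and the sample annotations are invented for illustration.

```python
# Hypothetical fragment: child concept -> parent concept.
PARENT = {
    "Noun": "PartOfSpeech",
    "Verb": "PartOfSpeech",
    "Participle": "PartOfSpeech",
    "Preverb": "PartOfSpeech",
}


def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or descends from it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)  # None once we run out of parents
    return False


# Invented sample annotations: (annotated word, ontological term).
annotations = [("dog", "Noun"), ("run", "Verb"), ("past", "Tense")]

# A search for PartOfSpeech also matches its more specific subtypes.
matches = [word for word, term in annotations if is_a(term, "PartOfSpeech")]
print(matches)  # ['dog', 'run']
```

A real implementation would read the hierarchy from the ontology file and would likely use a reasoner rather than a hand-written parent map.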

15
General Multimedia Ontology
  • Simple semantic framework for multimedia
    annotation
  • Developed at Wayne State University specifically
    for OntoELAN
  • AnnotationDocument
  • Tier
  • TimeSlot
  • Annotation
  • AlignableAnnotation
  • ReferringAnnotation
  • AnnotationValue
  • StringAnnotation
  • OntologyAnnotation
  • etc.
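
The concept names above can be read as a small data model. The sketch below is one possible reading, not the ontology's actual definition: the slide lists only the names, so the fields and relationships are assumptions (an alignable annotation is bounded by two time slots; a referring annotation points at another annotation; an annotation value is either a string or an ontology term). The helper time_span and the file name clip.mpg are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TimeSlot:
    time_ms: int  # position on the media time axis


@dataclass
class StringAnnotation:
    text: str  # free-text annotation value


@dataclass
class OntologyAnnotation:
    term: str  # annotation value that names an ontological term


AnnotationValue = Union[StringAnnotation, OntologyAnnotation]


@dataclass
class AlignableAnnotation:
    value: AnnotationValue
    start: TimeSlot
    end: TimeSlot


@dataclass
class ReferringAnnotation:
    value: AnnotationValue
    parent: Union[AlignableAnnotation, "ReferringAnnotation"]


@dataclass
class Tier:
    name: str
    annotations: list = field(default_factory=list)


@dataclass
class AnnotationDocument:
    media_file: str
    tiers: List[Tier] = field(default_factory=list)


def time_span(annotation):
    """Follow parent links until an alignable annotation supplies the times."""
    while isinstance(annotation, ReferringAnnotation):
        annotation = annotation.parent
    return (annotation.start.time_ms, annotation.end.time_ms)


word = AlignableAnnotation(StringAnnotation("dog"), TimeSlot(1200), TimeSlot(1850))
pos = ReferringAnnotation(OntologyAnnotation("Noun"), parent=word)
doc = AnnotationDocument("clip.mpg", [Tier("words", [word]), Tier("pos", [pos])])
print(time_span(pos))  # (1200, 1850)
```

Under these assumptions a referring annotation carries no times of its own and inherits them from the annotation it refers to.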

16
General Multimedia Ontology
17
Language Profile
  • is a subset of ontological terms, possibly
    renamed, that are used in the annotation of a
    particular multimedia resource
  • ontological terms
  • user-defined terms
  • a mapping between ontological terms and
    user-defined terms
  • a reference to an ontology

18
Language Profile
  • Advantages
      • Only a subset of ontological terms is useful
        for annotating a particular resource
      • Ontological terms can be renamed, e.g. to use
        another language, an abbreviation, or a synonym
      • The meanings of two or more ontological terms
        can be combined in one user-defined term
  • Disadvantage
      • More work

19
Language Profile
20
Annotation Tiers and Linguistic Types
  • Annotation tiers
      • contain annotation values
      • can be either alignable or referring
      • are associated with their linguistic types
  • Linguistic types
      • None
      • Time Subdivision
      • Symbolic Subdivision
      • Symbolic Association
      • Ontological tier
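
The relationship between linguistic types and the alignable/referring distinction can be sketched as a lookup. The slide does not state which type yields which kind of tier, so the classification below is an assumption: None and Time Subdivision are treated as time-alignable, and the symbolic types as referring. Treating the ontological type as referring is also an assumption, not a documented fact.

```python
from enum import Enum


class LinguisticType(Enum):
    NONE = "None"
    TIME_SUBDIVISION = "Time Subdivision"
    SYMBOLIC_SUBDIVISION = "Symbolic Subdivision"
    SYMBOLIC_ASSOCIATION = "Symbolic Association"
    ONTOLOGICAL = "Ontological"


# Assumed: types whose annotations are linked directly to the time axis.
ALIGNABLE_TYPES = {LinguisticType.NONE, LinguisticType.TIME_SUBDIVISION}


def tier_kind(linguistic_type):
    """Classify a tier as 'alignable' or 'referring' from its type."""
    return "alignable" if linguistic_type in ALIGNABLE_TYPES else "referring"


for lt in LinguisticType:
    print(f"{lt.value}: {tier_kind(lt)}")
```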

21
Linguistic Multimedia Annotation with OntoELAN
  • Language profile creation
  • Creation of tiers
  • Creation of annotations

22
Linguistic Multimedia Annotation with OntoELAN
23
Demos
  • Language profile creation
      • profile01.swf, profile01.AVI
      • profile02.swf, profile02.AVI
  • Creation of tiers and creation of annotations
      • annotate01.swf, annotate01.AVI
      • annotate02.swf, annotate02.AVI

24
Conclusions and Future Work
  • OntoELAN is the first attempt at annotating
    linguistic multimedia data with a linguistic
    ontology
  • Future Work
      • provide more channels for sharing data on the
        Web, such as the multimedia descriptions and
        the language words
      • improve the current search system
      • integrate text document annotation

25
References
  • Artem Chebotko, Yu Deng, Shiyong Lu and Farshad
    Fotouhi. An Ontology-based Multimedia Annotator
    for the Semantic Web of Language Engineering.
    International Journal on Semantic Web and
    Information Systems, January, 2005.
  • Artem Chebotko et al. OntoELAN: An Ontology-based
    Linguistic Multimedia Annotator. Proc. of the
    IEEE Sixth International Symposium on Multimedia
    Software Engineering (IEEE-MSE'2004), Miami, FL,
    USA, December, 2004.

26
References
  • OntoELAN
      • http://www.cs.wayne.edu/yudeng/projects.htm
  • LangDL: A Digital Library For Language
    Engineering And Research
      • http://database.cs.wayne.edu/proj/langdl/index.html
  • ELAN
      • http://www.mpi.nl/tools/elan.html
  • E-MELD
      • http://www.emeld.org
  • GOLD
      • http://www.emeld.org/gold
  • General Multimedia Ontology
      • http://database.cs.wayne.edu/proj/OntoELAN/multimedia.owl

27
Questions?
  • Contact information
      • Artem Chebotko
      • artem_at_cs.wayne.edu
      • 313-577-6711