A Semantic Web for Linguistics

1
A Semantic Web for Linguistics
  • Scott Farrar
  • farrar@uni-bremen.de
  • D. Terence Langendoen
  • langendt@u.arizona.edu

2
Acknowledgments
  • We particularly thank Gary Simons, who can't be
    here because he's doing ISO work on language
    codes as we speak.
  • Baden Hughes and Dafydd Gibbon, for putting the
    fear of god into us.
  • Laura Buszard-Welcher for FIELD testing our
    nascent ontology.

3
Purposes of this talk
  • To introduce the problem of data interoperability
    in the context of the EMELD project.
  • To show how the semantics of markup may be
    derived using a metaschema and an ontology,
    resulting in a Semantic Web for linguistics.

4
The EMELD Project
  • Electronic Metastructures for Endangered Language
    Data
  • Five-year grant from NSF
  • Eastern Michigan, Wayne State, Arizona, LDC
    (Penn), Endangered Languages Fund, SIL
  • A major objective:
  • The "formulation and promulgation of best
    practice in linguistic markup of texts and
    lexicon"

5
Problem Statement
  • Three points of community consensus:
  • XML markup provides the best format for the
    interchange and archiving of language data.
  • No single system of XML markup can be imposed on
    all language resources.
  • Linguists need to be able to perform queries
    across data sets.
  • The problem:
  • How do we interoperate when resources use
    different markup schemas?

6
Smart searches need smart data
  • XML can be used to represent linguistic analyses
    to any desired degree of refinement.
  • The TEI feature system recommendations demonstrate
    this. They are now being proposed as an ISO standard.
  • Analyses in other formats can be migrated to XML
    for both archiving and smart web searching.

7
Smart markup isn't enough
  • The meaning and use of structural markup vary from
    site to site.
  • Same term used with different meanings.
  • Different terms used with the same meaning.
  • Markup element and attribute names and values,
    and structural content may be in different
    natural languages.
  • Sites are encoded at different levels of
    granularity.

8
Say what you mean!
  • Markup is syntax: its meaning can only be
    inferred for individual sites, or groups of sites
    that use a common markup scheme (e.g. TEI).
  • The element <deleted> means "deleted" in the
    Wittgenstein archive, but that can't be guaranteed
    across archives.
  • The attribute Number="plural" means > 1 in a
    hypothetical markup of SAE, but > 2 in such
    a markup of Kiowa.
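
The point of this slide can be made concrete with a small sketch. The lookup table below is hypothetical: the archive labels and meaning strings are illustrative stand-ins, not part of any real markup scheme. It only shows that identical markup can carry different meanings from one resource to another.

```python
# Hypothetical per-resource interpretation tables (illustrative only).
# Keys are (attribute, value) pairs as they appear in the markup.
INTERPRETATIONS = {
    "SAE": {("Number", "plural"): "more than one"},
    "Kiowa": {("Number", "plural"): "more than two"},
}


def interpret(resource, attribute, value):
    """Look up what an attribute value means in one resource's markup."""
    return INTERPRETATIONS[resource][(attribute, value)]


def same_meaning(resource_a, resource_b, attribute, value):
    """True only if both resources assign the same meaning to the markup."""
    return interpret(resource_a, attribute, value) == interpret(
        resource_b, attribute, value
    )


# The markup is identical, but the meanings differ.
plural_agrees = same_meaning("SAE", "Kiowa", "Number", "plural")
```

A query for "plural" that compares only the markup strings would treat the two resources as equivalent, which is why the meaning has to be stated outside the markup itself.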

9
Passing the baton

10
Paths to interoperability
  • Possibility 1: simply impose standards on
    individual data sources.
  • Possibility 2: allow freedom in the construction
    of data, but map to a shared semantic resource
    (the Semantic Web idea).

11
Mapping to a shared resource
  • Develop markup with the semantic resource in mind
    (develop the SemWeb directly).
  • Allow for full freedom in the markup. Map to
    semantic resource later (migrating legacy data).
  • metaschema

12
Defining the semantics of markup
  • markup schema: a formal definition (as with XML
    DTD or XML Schema) of the permitted vocabulary and
    syntax of markup for a class of source documents.
  • semantic schema: a formal definition (as with RDF
    Schema or OWL) of the concepts in a particular
    domain.
  • metaschema: a formal definition of how the
    elements and attributes of a markup schema are
    interpreted in terms of the concepts of a semantic
    schema.

13
A metaschema language
  • <!ELEMENT metaschema (interpret | ignore) >
  • <!ELEMENT interpret (resource | literal |
    property) > <!ATTLIST interpret markup
    CDATA #REQUIRED>
  • <!ELEMENT resource (literal | property | embed)>
  • <!ATTLIST resource concept CDATA #REQUIRED>
  • <!ELEMENT literal (text-content) > <!ATTLIST
    literal concept CDATA #REQUIRED>
  • <!ELEMENT property (resource | resourceRef |
    embed)> <!ATTLIST property concept CDATA
    #REQUIRED>
  • ...

14
Metaschema schematic
[Diagram: a Metaschema and several SourceDocuments feed the DocumentInterpreter, which produces SemanticInterpretations]

15
Implementation
  • The document interpreter has been implemented in
    XSLT as a two-stage process:
  • Stage 1. Input: a metaschema document.
    Stylesheet: the metaschema compiler (XSLT).
    Output: an interpreter for that metaschema (XSLT).
  • Stage 2. Input: a source document. Stylesheet:
    the interpreter for the metaschema (XSLT).
    Output: the semantic interpretation (RDF/XML).
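
The two-stage idea can be sketched outside XSLT. The Python below is not the EMELD implementation. It is a minimal analogue under simplifying assumptions: the markup attribute is treated as a plain element name rather than an XPath expression, only resource directives are handled, and the output is a list of (concept, id) pairs rather than RDF/XML. The function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def compile_metaschema(metaschema_xml):
    """Stage 1: turn a metaschema document into an interpreter function."""
    root = ET.fromstring(metaschema_xml)
    # Map each source element name to the concept named by its directive.
    rules = {}
    for directive in root.findall("interpret"):
        target = directive.find("resource")
        if target is not None:
            rules[directive.get("markup")] = target.get("concept")

    def interpreter(source_xml):
        """Stage 2: apply the compiled rules to one source document."""
        source = ET.fromstring(source_xml)
        return [
            (rules[element.tag], element.get("id"))
            for element in source.iter()
            if element.tag in rules
        ]

    return interpreter


# Hypothetical sample inputs, for illustration only.
METASCHEMA = """
<metaschema>
  <interpret markup="morph"><resource concept="gold:morpheme"/></interpret>
</metaschema>
"""
SOURCE = '<text><morph id="aba"/><morph id="abo"/><note/></text>'

interpret = compile_metaschema(METASCHEMA)  # stage 1, done once per metaschema
result = interpret(SOURCE)                  # stage 2, done once per document
```

Splitting the work this way means the metaschema is read once, and the resulting interpreter can then be reused for every source document that follows the same markup schema.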

16
For example
  • Source document:
  • <morph id="aba"> <!-- Content --> </morph>
  • Metaschema directive:
  • <interpret markup="morph"> <resource
    concept="gold:morpheme"/> </interpret>
  • Interpretation of document:
  • <gold:morpheme rdf:about="element(aba)">
    <!-- Interpretation of content -->
    </gold:morpheme>
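
As a rough illustration of what this directive does, the sketch below builds the interpreted element with Python's standard library. The RDF namespace URI is the standard one, but the GOLD namespace URI is an assumption made for illustration, and the function name is hypothetical. Content interpretation is left out.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GOLD_NS = "http://emeld.org/gold#"  # assumed namespace URI, illustration only


def interpret_resource(source_xml, markup, concept_local_name):
    """Map a source element named `markup` to a resource of the given concept.

    Returns None when the source element does not match the directive.
    """
    source = ET.fromstring(source_xml)
    if source.tag != markup:
        return None
    # ElementTree writes namespaced names as {namespace-uri}local-name.
    resource = ET.Element(f"{{{GOLD_NS}}}{concept_local_name}")
    resource.set(f"{{{RDF_NS}}}about", f"element({source.get('id')})")
    return resource


node = interpret_resource('<morph id="aba"/>', "morph", "morpheme")
```

The source element's id is carried into rdf:about, so the interpretation stays traceable to the element it was derived from.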

17
Example 2
  • Source document:
  • <orth type="variant">abba</orth>
  • Metaschema directive:
  • <interpret markup="orth[@type='variant']">
    <literal concept="gold:OrthoWord"/>
    </interpret>
  • Interpretation of document:
  • <gold:OrthoWord>abba</gold:OrthoWord>
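
The selection step in this example can be tried with the limited XPath support in Python's standard library. This is a sketch and not the XSLT interpreter. The wrapping entry element and the non-variant form are invented for illustration, and the result is a list of (concept, text) pairs rather than RDF/XML.

```python
import xml.etree.ElementTree as ET


def interpret_literals(source_xml, markup_path, concept):
    """Return (concept, text) pairs for every element matched by markup_path."""
    root = ET.fromstring(source_xml)
    return [(concept, element.text) for element in root.findall(markup_path)]


# Hypothetical source: only the second <orth> carries type="variant".
SOURCE = """
<entry>
  <orth>aba</orth>
  <orth type="variant">abba</orth>
</entry>
"""

pairs = interpret_literals(SOURCE, "orth[@type='variant']", "gold:OrthoWord")
```

Because the directive selects on the attribute value as well as the element name, the plain orth element is left for some other directive to interpret.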

18
The full power
  • The full XPath expression language is available
    to specify @markup.
  • <text-content> allows literal values to be
    composed (with optional before and after labels)
    from multiple markup sources.
  • <embed> allows explicit control of embedding:
  • partition of source child elements into separate
    semantic substructures
  • movement of source elements

19
Ontological support
  • For a definition, refer to other presentations
  • There can be many ontologies for a given domain.
  • The same domain can be modeled by more than one
    ontology.
  • Some domains may require the use of more than one
    ontology.

20
Some examples of upper ontologies
  • Practical ontologies
  • CYC (Cycorp)
  • SUMO (Teknowledge)
  • GUM (Bateman et al.)
  • Almost practical ontologies
  • DOLCE (Trento, Rome),
  • BFO (Leipzig--IFOMIS)

21
GOLD
  • General Ontology for Linguistic Description
  • Based on SUMO
  • Currently in OWL-DL
  • http://emeld.org/gold
  • Not a theory of language, but a metalanguage to
    talk about theory-specific constructs.

22
Major Conceptual Categories
  • started with morphosyntax (morphemes, words,
    meanings)
  • linguistic expressions
  • linguistic units
  • features/values
  • linguistic relations
  • NL semantics

23
Partial SUMO Taxonomy
[Diagram: Entity at the top, with Physical and Abstract each linked to it by is-a]

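
The is-a links on this slide can be read as a child-to-parent mapping. The sketch below encodes only the two links shown here. The function names are hypothetical, and a real ontology in OWL would be queried with an OWL or RDF toolkit rather than a dictionary.

```python
# is-a links as read off the slide: child concept -> parent concept.
IS_A = {"Physical": "Entity", "Abstract": "Entity"}


def ancestors(concept):
    """Return the chain of concepts reached by following is-a links upward."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def subsumes(general, specific):
    """True if `specific` is the same concept as `general` or lies below it."""
    return general == specific or general in ancestors(specific)
```

A search for a general concept can use this kind of subsumption check to also return data annotated with any concept below it in the taxonomy.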
24
Linguistic Expressions
  • Orthographic units of language

[Diagram: taxonomy with nodes Physical, Object, SelfConnectedObject, ContentBearingObject, WrittenExpression]

25
Linguistic Units
  • Abstract aspect of language

[Diagram: taxonomy with nodes Abstract, LinguisticUnit, MorphosyntacticUnit, PhrasalUnit, LexicalUnit, SublexicalUnit]

26
Semantic Units
  • Cognitive containers for meanings of lexemes
    and bound forms (like synsets in WordNet)

[Diagram: taxonomy with nodes SelfConnectedObject, Artifact, HuntingImplement, and the statement (function Arrow Hunting)]

27
Linguistic Features
  • Attributes which inhere to linguistic units (in
    the spirit of HPSG)

[Diagram: taxonomy with nodes Abstract, Attribute, InternalAttribute, LinguisticFeature, MorphosyntacticFeature]

28
GOLD Relations
  • Formal means of relating linguistic entities to
    one another

[Diagram: taxonomy with nodes BinaryPredicate, LinguisticRelation, SyntacticRelation, RhetoricalRelation, SemanticRelation]

29
GOLD Relations: Basic
  • Relations between two entities of the same type

[Diagram: part (Eingang, gang); constituent (/in the middle/, /the middle/); antonym (Happy, Sad)]

30
GOLD Relations: Semiotic
  • Special kinds of linguistic relation that cut
    across linguistic types

[Diagram: dog, /dog/ and Dog, linked by the relations realizes and designates]

31
Summary
  • Achieving semantic interoperability among
    disparate data sets is a non-trivial problem, as
    shown by the EMELD project.
  • Interoperability may be achieved using a
    metaschema and an ontology, resulting in a
    Semantic Web for linguistics.
  • Any kind of semantic interoperability requires a
    shared semantic resource: GOLD.

32
Contact Info
  • D. Terence Langendoen
  • langendt@u.arizona.edu
  • Scott Farrar
  • farrar@uni-bremen.de