
Title: Post PAROLE/SIMPLE lexical resources and initiatives in Sweden

Post PAROLE/SIMPLE lexical resources and
initiatives in Sweden
  • Maria Toporowska Gronostaj

New Horizons for Linguistic Resources in a Global
Context, 7-8 July 2009, Barcelona
The main aims of my talk
  • present work in progress on building a free
    full-scale lexical resource, the Swedish
    FrameNet (SFN), conducted by the
    Swedish Language Bank
  • give an overview of lexical resources
    contributing to the development of SFN
  • describe a core of SFN, the SALDO lexicon
  • reflect on merging lexical data from different
    resources to acquire information on frames

The overall objectives of the SFN
  • create a robust lexical resource aimed at LT
    applications, with an exhaustive morphological,
    syntactic and semantic description of lexical
    units, incl. information on frames and world
    knowledge relevant for word/text understanding
  • produce it cost-effectively by merging data from
    free lexical resources and re-using free software
  • ensure its content interoperability
  • create an interactive text-lexicon block with
    morphological and semantic annotations on the fly

Content interoperability: a challenge for SFN
  • The contributing lexicons are heterogeneous in
    several respects. They
  • have partly different types of content
  • were developed for different purposes
  • were or are developed by different groups:
    language experts, or a collective effort of both
    language engineers and users of web-lexicons

Free lexical resources behind SFN (1)
  • SALDO: Swedish monolingual lexicon with semantic
    and morphological layers
  • 76,750 entries, 74,000 distinct semantic units
  • The Swedish Associative Thesaurus by
    L. Lönngren (1992), reincarnated by L. Borin
  • enhanced with a complete morphological
    description by L. Borin and M. Forsberg
  • People's Synonym Dictionary (web-lexicon)
  • 80,000 Swedish synonym pairs
  • synonymy graded from 5 to 0 by lexicon users
  • collective effort of web-lexicon users
  • language engineering: Viggo Kann

Free lexical resources behind SFN (2)
  • The People's Dictionary (Swedish/English)
  • collective effort of web-lexicon users
  • equivalents are graded by lexicon users
  • language engineering: Viggo Kann
  • SemNet
  • 52,800 hyperonymy/hyponymy relations
    automatically retrieved from the definitions of
    nouns and verbs in GLDB
  • Parole/Simple lexicons
  • 29,000 syntactic units (valency) and 8,500
    semantic units encoded with mandatory information

SALDO: an unusual semantic network
  • Lexemes, arranged in a hierarchical network
    according to the principle of centrality,
    capturing semantic closeness between two lexemes
  • Semantic relations are postulated for both open
    and closed classes and can go beyond a word class
  • There are 51 primitive semantically unrelated
    concepts being the top nodes of the hierarchies
    capturing the centrality. These nodes are
    connected to an artificial top node PRIM to form
    a tree
  • There are no synsets in the sense of WordNet.
    Neither glosses of the lexemes nor semantic
    relations such as hyponymy, hyperonymy or qualia
    relations are explicitly specified.

Semantic centrality in SALDO
  • Each lexical unit is given an obligatory main
    descriptor, mother, which can be complemented by
    an optional determinative descriptor, father
  • bröd (bread): mat + mjöl (food + flour)
  • brud (bride): gifta sig + hon (get married + she)
  • bröllop (wedding): gifta sig (get married)
  • gifta sig (get married): par (pair)
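
A minimal sketch of how such entries could be represented, assuming a plain Python record type. The class name Entry, its field names and the helper describe are illustrative assumptions, not SALDO's actual data format. The four examples above are encoded with their Swedish descriptors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One lexical unit with its descriptors (hypothetical layout)."""
    lexeme: str
    mother: str                    # obligatory main descriptor
    father: Optional[str] = None   # optional determinative descriptor


# The four examples from the slide.
ENTRIES = [
    Entry("bröd", mother="mat", father="mjöl"),
    Entry("brud", mother="gifta sig", father="hon"),
    Entry("bröllop", mother="gifta sig"),
    Entry("gifta sig", mother="par"),
]


def describe(entry: Entry) -> str:
    """Render an entry as 'lexeme: mother' or 'lexeme: mother + father'."""
    if entry.father is None:
        return f"{entry.lexeme}: {entry.mother}"
    return f"{entry.lexeme}: {entry.mother} + {entry.father}"
```

Making the father field optional mirrors the description above: every entry must name a mother, while a father is only present when needed.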

Semantic relations in SALDO
  • The mother descriptor is usually
  • semantically close to the key word,
  • semantically and/or morphologically less complex
    than the key word
  • more frequent
  • stylistically more unmarked
  • acquired earlier in the first and second language
  • Father descriptors are used mainly to
    differentiate lexemes having the same mother.
  • They are assigned to ca. 50% of words

Associative sets, assets
  • Keywords can function as mother or father
    descriptors for other lexemes and thus form the
    basis of any number of derived relations,
    referred to as assets
  • brud (bride): get married + she, with the assets
  • kronbrud (crown bride): bride + chastity
  • brudbukett (bride bouquet): bouquet + bride
  • brudklänning (bride dress): dress + bride
  • brudkrona (bride crown): crown + bride
  • brudgum (bridegroom): get married + he
    (no assets)
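
The asset idea can be sketched as an inverted index, assuming that the assets of a keyword are read as the lexemes that use it as a mother or father descriptor. The function name build_assets and the tuple layout are hypothetical; the data are the English glosses of the examples above.

```python
from collections import defaultdict

# (lexeme, mother, father) triples, English glosses from the slide.
TRIPLES = [
    ("bride", "get married", "she"),
    ("crown bride", "bride", "chastity"),
    ("bride bouquet", "bouquet", "bride"),
    ("bride dress", "dress", "bride"),
    ("bride crown", "crown", "bride"),
    ("bridegroom", "get married", "he"),
]


def build_assets(triples):
    """Map each descriptor to the lexemes that use it as mother or father."""
    assets = defaultdict(list)
    for lexeme, mother, father in triples:
        assets[mother].append(lexeme)
        if father is not None:
            assets[father].append(lexeme)
    return dict(assets)


ASSETS = build_assets(TRIPLES)
```

In this toy data, ASSETS.get("bride", []) lists the four compounds, while ASSETS.get("bridegroom", []) is empty, matching the contrast drawn on the slide.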

Assets sharing mother relations build natural
semantic groupings
  • sol (sun): lysa + himmel (shine + sky)
  • comet, moon, star: (shine + sky)
  • blinka (blink): lysa + snabbt (shine + quickly)
  • ljus (candle): lysa + brinna (shine + burn)
  • Lexemes sharing both mother and father relations
    are more closely related to each other than
    those having different father descriptors

SALDO: world knowledge (1)
  • SALDO is an intrinsic network capturing the world
    knowledge underlying lexical-semantic relations
  • The network relations are based on the notion of
    centrality, measured by the depth of an entry,
    i.e. its distance down from the PRIM root node
  • The deeper an entry lies in the tree, the less
    central it is
  • PRIM > one > unit > two > pair > get married >
    bride
  • The average depth of entries in SALDO is 5.7
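
Depth as a centrality measure can be sketched as follows, assuming that the list from PRIM down to bride on this slide is a chain of mother links, which the transcript does not state explicitly. The dictionary MOTHER and the function depth are hypothetical names.

```python
# Child -> mother links, assuming the slide lists one path from the root.
MOTHER = {
    "one": "PRIM",
    "unit": "one",
    "two": "unit",
    "pair": "two",
    "get married": "pair",
    "bride": "get married",
}


def depth(lexeme, mother_of=MOTHER, root="PRIM"):
    """Count mother links between a lexeme and the root node."""
    steps = 0
    seen = set()
    while lexeme != root:
        if lexeme in seen:
            raise ValueError(f"cycle detected at {lexeme!r}")
        seen.add(lexeme)
        lexeme = mother_of[lexeme]  # KeyError if the lexeme is unknown
        steps += 1
    return steps
```

Under that assumption, bride sits six links below PRIM and is therefore less central than pair, which sits four links below it.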

SALDO: world knowledge (2)
  • SALDO supports the recognition of entailments by
    pointing out the mother of a key word, which
    promotes word/text understanding
  • It provides explicit information on the
    distribution of the associative sets among
    lexemes (e.g. bride)
  • It includes named entities as entries
  • Bulgakov: författare + rysk (writer + Russian)

Approaches towards frame acquisition in SFN
  • Merging relevant lexical data from available free
    lexical resources
  • Cross-language transfer of lexical units with
    information on the frames and frame elements from
    FN to SFN
  • Automatic acquisition of frames from corpora
    using a software tool, the FrameNet Labeler
    system for Swedish text

Merging lexical data with SALDO involves
  • interlinking the morphological units from the
    component lexicons (based on lemma form, part
    of speech and inflectional patterns, whenever
  • augmenting SALDO's lexical units with the
    semantic content from SemNet, SIMPLE, the People's
    Synonym Dictionary and English equivalents from
    the People's Dictionary (Swedish/English)
  • adding syntactic information from the PAROLE
    lexicon to SALDO
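
The merging steps above can be sketched as a keyed join, assuming each resource is indexed by (lemma, part of speech). Inflectional patterns are left out of the key for brevity. All dictionary layouts, field names and the function merge_resources are hypothetical toy constructs, not the formats of the actual lexicons.

```python
# Toy core lexicon keyed by (lemma, part of speech); layout is assumed.
SALDO = {
    ("gifta sig", "verb"): {"mother": "par", "father": None},
    ("brud", "noun"): {"mother": "gifta sig", "father": "hon"},
}

# Toy contributing layers with the same key; values are illustrative.
PAROLE = {
    ("gifta sig", "verb"): {"valency": "Sub V(refl) PrepObj(med)"},
}
PEOPLES_DICTIONARY = {
    ("brud", "noun"): {"english": ["bride"]},
    ("bröllop", "noun"): {"english": ["wedding"]},  # not in the toy core
}


def merge_resources(core, layers):
    """Attach each layer's data to core entries that share the same key."""
    merged = {key: dict(info) for key, info in core.items()}
    for layer_name, layer in layers.items():
        for key, info in layer.items():
            if key in merged:  # only augment units the core already has
                merged[key][layer_name] = info
    return merged


MERGED = merge_resources(
    SALDO, {"parole": PAROLE, "peoples_dictionary": PEOPLES_DICTIONARY}
)
```

Keeping each layer under its own name preserves the provenance of every piece of information, which helps when the contributing lexicons disagree. Entries that exist only in a contributing lexicon are skipped here, a design choice made for the sketch rather than one stated on the slides.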

Frame acquisition supported by PAROLE/SIMPLE
  • V: gifta sig (to marry/get married)
  • Sub. (Anim.) V (refl.) PrepObj (Anim.) med
  • Sub. (Plural) (Anim.) V (refl.)
  • Semantic type: V Cooperative activity
  • Selection restrictions: Human V Human, Human V
  • Human V (Cooperative activity) Human > Partner(s)
  • In FN the Partner role is a core FE in the
    Collaboration, Forming Relationship, Personal
  • Due to the semantic and syntactic data in the P/S
    lexicon, the frame Forming Relationship is
    selected for the verb marry

Automatic acquisition of frames and FEs
  • a software tool, FrameNet Labeler for Swedish
  • developed by R. Johansson and P. Nugues
  • trained on a semantically annotated corpus
    produced by cross-language transfer
  • 75% accuracy in classification of FEs

Populating the frames in SFN with lexical units
  • re-using the lexical data retrieved from corpora
    by the FrameNet labeler
  • cross-language transfer of lexical units from FN
    to SFN
  • semantic mining and refining lexical data in the
    SIMPLE lexicon
  • enhancing the repository of lexical units with
    synonyms, hyponyms and siblings

Conclusions (1)
  • Lexicons can be re-purposed and re-used for the
    task of SFN creation
  • Content integration and interoperability seem to
    be feasible to achieve
  • SFN can be augmented with
  • synsets to compensate for the lack of glosses
    (data from the People's Synonym Dictionary)
  • hyperonymy/hyponymy relations from SemNet
  • world knowledge from the SALDO lexicon
  • Creation of a text-lexicon block with SALDO
    annotations on the fly is in progress

Conclusions (2)
  • Desirable further extensions of SFN
  • valency information
  • explicit semantic typing of lexical units
  • multi-word expressions
  • broader coverage of different domains
  • creation of text-lexicon block with semantic role
  • SFN will be a Swedish contribution to
    BLARK/CLARIN, available under the Creative Commons
    Attribution-ShareAlike licence and LGPL 3.0

  • Borin, L., Forsberg, M. 2009. All in the Family: A
    comparison of SALDO and WordNet. Proceedings of
    the 17th Nordic Conference of Computational
    Linguistics NODALIDA 2009. Odense.
  • Johansson, R., Nugues, P. 2007. Construction of a
    FrameNet labeler for Swedish text. NODALIDA 2007.
  • Kann, V., Rosell, M. 2005. Free Construction of
    a Swedish Dictionary of Synonyms. NODALIDA 2005.
  • Lönngren, L. 1989. Svensk associationslexikon.
    Del I-IV. Institutionen för lingvistik, Uppsala
    universitet. Rapport UCDL-R-89-1.
  • Lönngren, L. 1998. A Swedish associative
    thesaurus. In Euralex 98 proceedings, Vol. 2. pp
  • SALDO http//