Bulgarian WordNet - PowerPoint PPT Presentation


1
Bulgarian WordNet
  • Svetla Koeva
  • Institute for Bulgarian Language
  • Bulgarian Academy of Sciences

2
Bulgarian WordNet
  • The Bulgarian WordNet (BulNet) has been under
    development for two years within the framework of
    the BalkaNet project.
  • The BalkaNet project (Multilingual Semantic
    Network for the Balkan Languages) aims to
    develop a multilingual resource representing
    semantic relationships in five Balkan languages
    (Bulgarian, Greek, Serbian, Romanian and
    Turkish).
  • Each set of synonymous words (synset) in a given
    language is linked to the closest synset in the
    Princeton WordNet 2.0 via its ID number.
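Because a BulNet synset shares its ID with the matching Princeton WordNet 2.0 synset, the ID works as an interlingual key. The sketch below assumes a BalkaNet/VisDic-style XML record; the tag names (`SYNSET`, `ID`, `LITERAL`, `SENSE`, `ILR`, `TYPE`), the `ENG20-<offset>-<pos>` ID shape and the offsets themselves are illustrative assumptions, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical synset record; tag names and the ID values are assumptions.
SAMPLE = """
<SYNSET>
  <ID>ENG20-00000001-n</ID>
  <POS>n</POS>
  <SYNONYM>
    <LITERAL>жена<SENSE>1</SENSE></LITERAL>
  </SYNONYM>
  <ILR>ENG20-00000002-n<TYPE>hypernym</TYPE></ILR>
  <DEF>adult female person</DEF>
</SYNSET>
"""


def parse_synset(xml_text: str) -> dict:
    """Extract the ID, POS, literals (with sense numbers) and relations of one synset."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.findtext("ID"),
        "pos": root.findtext("POS"),
        # LITERAL text is the word itself; the SENSE child holds the sense number
        "literals": [(lit.text.strip(), lit.findtext("SENSE")) for lit in root.iter("LITERAL")],
        # ILR text is the target ID; the TYPE child names the relation
        "relations": [(ilr.findtext("TYPE"), ilr.text.strip()) for ilr in root.iter("ILR")],
    }


syn = parse_synset(SAMPLE)
# Index Bulgarian synsets by the shared ID, so a Princeton ID looks up the Bulgarian synset.
bulnet_by_id = {syn["id"]: syn}
print(bulnet_by_id["ENG20-00000001-n"]["literals"])
```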

3
BulNet DCMB team
  • The Bulgarian partners are the Bulgarian
    Academy of Sciences and Plovdiv University.
  • The Bulgarian WordNet is being developed by the
    Department of Computer Modeling of Bulgarian
    Language (DCMB) at the Institute for Bulgarian
    Language, Bulgarian Academy of Sciences:
    http://ibl.bas.bg/departments_en6.htm
  • The DCMB BulNet team consists of a small group of
    researchers: linguists, computational linguists,
    logicians and mathematicians.

4
BulNet current state
  • The Bulgarian WordNet models nouns, verbs and
    adjectives and already contains 17 291 synsets
    (word senses) as of 20.01.2003, with 31 164
    literals in total (about 1.8 literals per synset).
  • Distribution of synsets by part of speech:
    • Nouns: 12 223 synsets
    • Verbs: 3 408 synsets
    • Adjectives: 1 656 synsets
    • Adverbs: 4 synsets
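The reported figures are internally consistent: the per-part-of-speech counts add up to the stated total, and the literal count gives the 1.8 ratio. A quick check using only the numbers on this slide:

```python
# Synset counts per part of speech as reported on the slide (as of 20.01.2003).
synsets_by_pos = {"nouns": 12223, "verbs": 3408, "adjectives": 1656, "adverbs": 4}
literals = 31164

total = sum(synsets_by_pos.values())
ratio = literals / total  # average number of literals per synset

print(total)            # 17291
print(round(ratio, 1))  # 1.8
```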

5
BulNet current state
6
Completeness
  • Presence of all members of the Base Concepts
    selected so far within the BalkaNet project:
    • Base Concepts 1 (BC1): 1218 members
    • BC2: 3471 members
    • BC3: 4855 members
  • No "dangling relations" (relations pointing to
    synsets that do not exist)
  • No gaps
  • An appropriate definition (gloss) for each
    synset
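Three of these completeness criteria can be checked mechanically. Here is a minimal sketch on a toy wordnet, where each synset ID maps to its outgoing relations and its gloss. The data layout and IDs are hypothetical. "Gaps" is left out because the slides do not define it precisely.

```python
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str]  # (relation type, target synset ID)


def completeness_report(
    wordnet: Dict[str, Tuple[List[Relation], str]],
    base_concepts: Set[str],
) -> Dict[str, list]:
    """Check Base Concept coverage, dangling relations and missing definitions."""
    ids = set(wordnet)
    return {
        # Base Concepts that have no synset yet
        "missing_base_concepts": sorted(base_concepts - ids),
        # relations whose target synset does not exist ("dangling relations")
        "dangling_relations": sorted(
            (src, rel, tgt)
            for src, (rels, _) in wordnet.items()
            for rel, tgt in rels
            if tgt not in ids
        ),
        # synsets with an empty definition
        "missing_definitions": sorted(
            sid for sid, (_, gloss) in wordnet.items() if not gloss.strip()
        ),
    }


# Hypothetical toy data.
toy = {
    "s1": ([("hypernym", "s2")], "adult female person"),
    "s2": ([("hypernym", "s9")], "a human being"),  # s9 does not exist
    "s3": ([], ""),                                  # no definition
}
report = completeness_report(toy, {"s1", "s2", "s4"})
print(report)
```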

7
Consistency
  • There are no duplicated literals in a given synset.
  • There are no identical or almost identical
    glosses of different synsets.
  • There are no literals that coincide with their
    glosses.
  • There are no duplicated relations between two
    synsets.
  • Every difference from EWN in the relations is
    language-specific and linguistically grounded.
  • There are no hypernym cycles or any other
    relation loops in BulNet.
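Several of these consistency conditions are simple set checks. The sketch below covers duplicated literals, literals that coincide with the gloss, duplicated relations and identical glosses. It assumes, as its own simplification, that "almost identical" means identical after lowercasing, stripping punctuation and collapsing whitespace. The toy data is hypothetical.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop punctuation and collapse whitespace (assumed notion of 'almost identical')."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def consistency_report(synsets):
    """synsets: ID -> {"literals": [...], "gloss": str, "relations": [(type, target), ...]}"""
    issues = []
    by_gloss = defaultdict(list)
    for sid, s in synsets.items():
        lits = [lit.lower() for lit in s["literals"]]
        gloss = normalize(s["gloss"])
        if len(lits) != len(set(lits)):
            issues.append(("duplicated literal", sid))
        if gloss in lits:
            issues.append(("literal equals gloss", sid))
        if len(s["relations"]) != len(set(s["relations"])):
            issues.append(("duplicated relation", sid))
        by_gloss[gloss].append(sid)
    # different synsets whose normalized glosses coincide
    for ids in by_gloss.values():
        if len(ids) > 1:
            issues.append(("identical glosses", tuple(ids)))
    return issues


toy = {
    "s1": {"literals": ["man", "man"], "gloss": "an adult male person", "relations": [("hypernym", "s3")]},
    "s2": {"literals": ["woman"], "gloss": "an adult female person",
           "relations": [("hypernym", "s3"), ("hypernym", "s3")]},
    "s3": {"literals": ["person"], "gloss": "Person", "relations": []},
    "s4": {"literals": ["human"], "gloss": "a human being", "relations": []},
    "s5": {"literals": ["individual"], "gloss": "A human being.", "relations": []},
}
for issue in consistency_report(toy):
    print(issue)
```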

8
Main achievements
  • Theoretical linguistic work:
    • Validation tests
    • Dependencies between relations
    • Combination of Bulgarian language resources
    • Descriptive logic
  • Design and development of tools:
    • WordNet Explorer
    • WordNet Validator

9
Validation tests
  • Our approach to the validation of WordNets
    includes three separate levels:
    • Checking the syntax of the XML files
    • Checking the completeness of WordNets
    • Checking the consistency of the semantic
      relations and glosses
  • The levels differ in:
    • Complexity and significance
    • Possibilities for automatic data correction
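The first level, XML well-formedness, can be tested with any standard XML parser before the semantic checks run. Here is a minimal sketch using Python's `xml.etree.ElementTree`. The function name is hypothetical, and this is not the WordNet Validator's own code.

```python
import xml.etree.ElementTree as ET


def check_xml_syntax(xml_text):
    """Return None if the text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


ok = check_xml_syntax("<SYNSET><ID>ENG20-00000001-n</ID></SYNSET>")
bad = check_xml_syntax("<SYNSET><ID>ENG20-00000001-n</SYNSET>")  # <ID> is never closed
print(ok)
print(bad)
```

Running this level first makes sense because the completeness and consistency checks need a parsed file to work on.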

10
Validation tests
  • The lowest level, which is also the easiest to
    process and correct, is XML file syntax.
  • In the following cases, both automatic checking
    and automatic data correction are possible:
    • Empty optional (facultative) tags
    • Duplicated literals in a synset
    • Sense numbers
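Two of these automatic corrections can be sketched as in-place edits of a parsed synset. The set of optional tags below is a hypothetical placeholder, since the real list would come from the BalkaNet DTD. The tag names are assumptions, as in a BalkaNet-style record. Sense-number correction is not covered because the slides do not say which rule is applied.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL set of optional tags; the actual DTD decides which tags are optional.
OPTIONAL_TAGS = {"USAGE", "SNOTE"}


def auto_correct(synset):
    """Remove empty optional tags and duplicated literals from one SYNSET element, in place."""
    for parent in list(synset.iter()):
        for child in list(parent):
            is_empty = not (child.text or "").strip() and len(child) == 0
            if child.tag in OPTIONAL_TAGS and is_empty:
                parent.remove(child)
    for synonym in synset.iter("SYNONYM"):
        seen = set()
        for lit in list(synonym.findall("LITERAL")):
            word = (lit.text or "").strip()
            if word in seen:
                synonym.remove(lit)  # keep the first occurrence only
            else:
                seen.add(word)
    return synset


xml = ("<SYNSET><ID>ENG20-00000001-n</ID><SYNONYM>"
       "<LITERAL>жена<SENSE>1</SENSE></LITERAL><LITERAL>жена<SENSE>1</SENSE></LITERAL>"
       "</SYNONYM><USAGE></USAGE><DEF>adult female person</DEF></SYNSET>")
fixed = auto_correct(ET.fromstring(xml))
print(ET.tostring(fixed, encoding="unicode"))
```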

11
Validation tests
  • In other cases, automatic correction is
    possible, but replacements must be confirmed
    manually:
    • Conformance to the accepted ID standard
    • Missing values of obligatory tags
    • Correspondence of BCS tags
    • At least one literal in a synset
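The difference from the previous slide is that the tool only proposes a replacement and a person accepts or rejects it. Here is a minimal sketch of that propose-then-confirm pattern. The ID pattern and the list of obligatory tags are assumptions standing in for the accepted standard. The BCS check is omitted.

```python
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: the accepted ID standard is modelled as ENG20-<8 digits>-<lowercase POS letter>.
ID_PATTERN = re.compile(r"^ENG20-\d{8}-[a-z]$")
# ASSUMPTION: hypothetical list of obligatory tags.
OBLIGATORY_TAGS = ("ID", "POS")


def propose_fixes(synset):
    """Return (tag, problem, proposed value or None) triples; nothing is changed here."""
    proposals = []
    sid = (synset.findtext("ID") or "").strip()
    if sid and not ID_PATTERN.match(sid):
        # Suggest a normalised form (upper-case prefix, lower-case POS letter) if it fits.
        candidate = sid[:-1].upper() + sid[-1:].lower()
        proposals.append(("ID", f"non-standard ID {sid!r}",
                          candidate if ID_PATTERN.match(candidate) else None))
    for tag in OBLIGATORY_TAGS:
        if not (synset.findtext(tag) or "").strip():
            proposals.append((tag, f"missing value of <{tag}>", None))
    if synset.find(".//LITERAL") is None:
        proposals.append(("LITERAL", "synset has no literal", None))
    return proposals


def apply_confirmed(synset, proposals, confirm):
    """Apply only proposals that carry a suggested value and that `confirm` accepts."""
    for tag, problem, value in proposals:
        if value is not None and confirm(problem, value):
            synset.find(tag).text = value


syn = ET.fromstring("<SYNSET><ID>eng20-00000001-n</ID><POS></POS><SYNONYM/></SYNSET>")
props = propose_fixes(syn)
for p in props:
    print(p)
# The lambda stands in for a human reviewer who accepts every suggestion.
apply_confirmed(syn, props, confirm=lambda problem, value: True)
print(syn.findtext("ID"))
```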

12
Validation tests
  • In some cases only validation is possible:
    • No duplicated <ID> numbers
    • No duplicated relations between two synsets
    • No gaps
    • No dangling relations
    • No loops
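Duplicated IDs and relation loops can be reported but not repaired automatically, because fixing them requires a linguistic decision. Here is a minimal sketch of both checks: a counter for IDs and a depth-first search that returns one cycle in a relation graph. The graph layout (ID to list of target IDs) is an assumption.

```python
from collections import Counter


def duplicated_ids(ids):
    """IDs that occur more than once."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)


def find_cycle(graph):
    """Return one cycle in a relation graph (ID -> list of target IDs) as a list of IDs, or None."""
    visiting, done = set(), set()
    path = []

    def visit(node):
        visiting.add(node)
        path.append(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting:  # back edge: the path from nxt to node closes a loop
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        visiting.discard(node)
        path.pop()
        done.add(node)
        return None

    for node in graph:
        if node not in done:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(duplicated_ids(["s1", "s2", "s1"]))
print(find_cycle({"dog": ["canine"], "canine": ["carnivore"], "carnivore": ["dog"]}))
print(find_cycle({"man": ["person"], "woman": ["person"], "person": []}))
```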

13
Relations dependencies
  • Description of the dependencies between the
    relations:
    • Hyponyms of two antonyms (nouns) should also be
      antonyms (woman / man; female actor / actor).
    • Antonyms (nouns) should have equivalent
      holo_parts (woman: arm, head; man: arm, head).
    • A hyponym (for concrete nouns) should have the
      same mero_parts as its hypernym (man: head,
      arm, ...; woman: head, arm, ...).
    • Collective nouns that are holo/mero_members
      should share the same hypernym, not necessarily
      the immediate one (both football team and
      football league are organizations).
    • Nouns that are holo/mero_portions should share
      the same hypernym, not necessarily the immediate
      one (coffee: substance; caffeine: substance).
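Dependencies like these can be turned into automatic warnings. The sketch covers two of them: antonymous nouns should have equivalent parts, and a hyponym should have its hypernym's parts. For simplicity, `parts` abstracts over the holo_part/mero_part pointers and maps a synset to the set of its parts. That direction convention, the reading of "same parts" as "at least the hypernym's parts", and the toy data are all assumptions.

```python
def check_dependencies(parts, antonyms, hypernym_of):
    """parts: synset -> set of its parts; antonyms: list of antonym pairs;
    hypernym_of: synset -> its hypernym. Returns rule violations."""
    problems = []
    # Rule: antonymous nouns should have equivalent parts.
    for a, b in antonyms:
        diff = parts.get(a, set()) ^ parts.get(b, set())
        if diff:
            problems.append(("antonyms with different parts", a, b, sorted(diff)))
    # Rule: a concrete hyponym should have the parts of its hypernym.
    for hypo, hyper in hypernym_of.items():
        missing = parts.get(hyper, set()) - parts.get(hypo, set())
        if missing:
            problems.append(("hyponym lacks parts of its hypernym", hypo, hyper, sorted(missing)))
    return problems


# Hypothetical toy data with one deliberate omission (woman lacks "arm").
parts = {"person": {"head", "arm"}, "man": {"head", "arm"}, "woman": {"head"}}
antonyms = [("man", "woman")]
hypernym_of = {"man": "person", "woman": "person"}
for problem in check_dependencies(parts, antonyms, hypernym_of):
    print(problem)
```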

14
Combining language resources
  • Three large Bulgarian resources:
    • BulNet
    • Bulgarian Syntax Dictionary, encoding the
      arguments of the verbs and their semantic
      features
    • Bulgarian Grammatical Dictionary, encoding over
      83 000 lemmas and their corresponding word forms
  • Mutual supplementation:
    • Expansion of the resources
    • Validation of the resources
    • Uniform grammatical characteristics
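One way the resources can supplement each other: look up each BulNet literal in the grammatical dictionary. A missing lemma suggests expanding the dictionary, and a part-of-speech mismatch points to an entry that one of the two resources should revisit. Here is a minimal sketch with hypothetical toy data (the toy dictionary deliberately lacks the noun reading of "чета"). Multiword literals would need separate handling.

```python
def cross_check(bulnet_literals, grammar_dict):
    """bulnet_literals: list of (literal, pos) from BulNet;
    grammar_dict: lemma -> set of POS tags in the grammatical dictionary."""
    missing = []        # candidates for expanding the grammatical dictionary
    pos_conflicts = []  # candidates for validating either resource
    for literal, pos in bulnet_literals:
        if literal not in grammar_dict:
            missing.append(literal)
        elif pos not in grammar_dict[literal]:
            pos_conflicts.append((literal, pos, sorted(grammar_dict[literal])))
    return missing, pos_conflicts


# Hypothetical toy data.
bulnet = [("жена", "n"), ("мъж", "n"), ("чета", "n"), ("футболен отбор", "n")]
grammar = {"жена": {"n"}, "мъж": {"n"}, "чета": {"v"}}
missing, conflicts = cross_check(bulnet, grammar)
print(missing)
print(conflicts)
```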

15
WordNet logic
  • The DCMB team developed a uniform, efficient and
    powerful utility system for querying and
    exploring WordNet: WordNet logic.
    • Tailored to the needs of WordNet developers
    • Powerful enough to express complex statements
      and queries
    • Fully decidable
  • The formal background consists of WordNet
    Structure, WN Language, WN Semantics, WN Logic and
    WN Logic theorems.
  • Tinko Tinchev, Stoyan Mihov, Svetla Koeva, Angel
    Genov. Logic for WordNet. Annual Journal of Sofia
    University, 2003.
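The slides do not give the syntax of WordNet logic, so the sketch below is not that formalism. It is a generic, hypothetical query evaluator that shows how formulas over a wordnet's relations can be checked on a finite structure. Formulas are nested tuples: `("atom", name)`, `("not", f)`, `("and", f, g)`, `("some", rel, f)` (some rel-successor satisfies f) and `("all", rel, f)`. The model layout and data are assumptions.

```python
def holds(model, node, formula):
    """Does the formula hold at this synset? Each call removes one operator, so it terminates."""
    op = formula[0]
    if op == "atom":
        return formula[1] in model["atoms"].get(node, set())
    if op == "not":
        return not holds(model, node, formula[1])
    if op == "and":
        return holds(model, node, formula[1]) and holds(model, node, formula[2])
    if op in ("some", "all"):
        _, rel, sub = formula
        targets = model["rels"].get(rel, {}).get(node, ())
        results = (holds(model, t, sub) for t in targets)
        return any(results) if op == "some" else all(results)
    raise ValueError(f"unknown operator {op!r}")


def query(model, formula):
    """All synsets at which the formula holds."""
    return sorted(n for n in model["nodes"] if holds(model, n, formula))


# Hypothetical model: a small hypernym hierarchy with Base Concept markers.
model = {
    "nodes": ["man", "woman", "person", "organism"],
    "rels": {"hypernym": {"man": {"person"}, "woman": {"person"}, "person": {"organism"}}},
    "atoms": {"person": {"BC1"}, "organism": {"BC1"}},
}
# Synsets that are not BC1 members but have a BC1 member as an immediate hypernym.
print(query(model, ("and", ("not", ("atom", "BC1")), ("some", "hypernym", ("atom", "BC1")))))
```

On a finite wordnet, evaluating such a formula always terminates. This is one simple reason why querying of this kind can be decidable.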

16
WordNet Validator
  • The WordNet Validator (WNV) is a Web-based system
    for the validation (and correction) of WordNet
    completeness and consistency.
  • The main functions of the WordNet Validator are
    automatic correction of XML syntax, validation
    of WordNet completeness and consistency, search
    for a given synset and visualization of semantic
    trees.
  • The WordNet Validator can be used in practical
    work on constructing monolingual WordNets of
    Balkan languages, as well as for evaluating the
    completeness and consistency of different
    WordNets.
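Two of these functions, synset search and semantic-tree visualization, are easy to illustrate. Here is a minimal text-only sketch that renders the hyponym tree below a root and finds synsets by literal. It is not the WNV's interface; the data layout and names are hypothetical. The tree walk assumes the hierarchy has already passed the loop check, or it would not terminate.

```python
from collections import defaultdict


def hyponym_tree(hypernym_of, root):
    """Render the hyponym tree below root as indented lines (assumes no hypernym cycles)."""
    children = defaultdict(list)
    for child, parent in hypernym_of.items():
        children[parent].append(child)
    lines = []

    def walk(node, depth):
        lines.append("  " * depth + node)
        for child in sorted(children[node]):
            walk(child, depth + 1)

    walk(root, 0)
    return lines


def search(synsets, literal):
    """IDs of the synsets that contain the given literal."""
    return sorted(sid for sid, lits in synsets.items() if literal in lits)


# Hypothetical toy hierarchy.
hypernym_of = {"man": "person", "woman": "person", "person": "organism", "actor": "man"}
print("\n".join(hyponym_tree(hypernym_of, "organism")))
print(search({"s1": ["man", "adult male"], "s2": ["woman"]}, "man"))
```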

17

18

19
  • Future directions