Title: BUILDING BULGARIAN NooJ RESOURCES


1
BUILDING BULGARIAN NooJ RESOURCES
  • SVETLA KOEVA
  • SVETLOZARA LESEVA
  • BORISLAV RIZOV

2
  • The project: Automatic information extraction
    based on semantic relations (RILA, a bilateral
    co-operation programme)
  • Objectives:
  • Reliable (exhaustive and precise) multilingual
    lexical resources for a variety of purposes, such
    as machine translation, information extraction
    and information retrieval.

3
  • Prerequisites for carrying out such a task:
  • Large-coverage linguistic resources such as
    comprehensive multilingual and monolingual
    dictionaries (designed according to certain
    criteria and stored in a format that ensures
    accessibility and manageability).
  • Ancillary (esp. disambiguation and recognition)
    resources.
  • An appropriate system for the storage and
    management of multilingual linguistic data, as
    well as the implementation of task-related
    procedures.

4
  • Methodology:
  • Systematization and unification of the existing
    INTEX resources, as well as their conversion to
    the established NooJ format.
  • Expansion and enhancement of the resources, aiming
    at ever higher precision and recall.
  • Creation of various new resources using the
    experience, resources and tools developed along
    the first two lines.
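
The precision and recall targets mentioned above can be made concrete with a small sketch. It assumes that the output of a resource (for example, the word forms a dictionary recognizes) and a hand-checked reference set are both available as plain collections; the function name and the sample data are illustrative and are not taken from the project.

```python
def precision_recall(recognized, gold):
    """Return (precision, recall) of a recognized set against a reference set."""
    recognized, gold = set(recognized), set(gold)
    true_positives = len(recognized & gold)
    # Precision: share of recognized items that are correct.
    precision = true_positives / len(recognized) if recognized else 0.0
    # Recall: share of reference items that were recognized.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative data only: four items recognized, three expected.
    p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
    print(f"precision={p:.2f} recall={r:.2f}")
```

Exhaustive resources raise recall, while precise ones raise precision, which is why both figures are tracked together.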

5
  • Conversion of the lexical resources in DELA
    format to the .nod format
  • Conversion of the BGD (Bulgarian Grammar
    Dictionary) [1] automata underlying the DELAF
    dictionaries to the .flx automata description.
  • Creation of automata for the existing
    dictionaries of compounds, since they have been
    stored in DELACF format.

[1] Koeva, S. Grammar Dictionary of Bulgarian.
Description of the concept of organization of the
linguistic data. Bulgarian Language 6, pp. 49-58

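
The dictionary conversion step can be illustrated with a minimal sketch. It assumes a simplified DELAF-style line of the form `form,lemma.CATEGORY+feature:codes` and a NooJ-style textual line of the form `form,lemma,CATEGORY+feature+...`; both layouts, the one-letter inflection codes and the French sample entry are assumptions made for illustration, not the project's actual converter or the Bulgarian data. Escaped commas and other special cases are not handled.

```python
def parse_delaf_line(line):
    """Split a simplified DELAF-style line into its parts (assumed layout)."""
    form, rest = line.strip().split(",", 1)
    lemma_and_info, *inflections = rest.split(":")
    lemma, _, info = lemma_and_info.partition(".")
    category, *features = info.split("+")
    return {
        "form": form,
        "lemma": lemma or form,  # an empty lemma means "same as the form"
        "category": category,
        "features": features,
        "inflections": inflections,
    }


def to_nooj_lines(line):
    """Emit one NooJ-style line per inflection code (assumed target layout)."""
    entry = parse_delaf_line(line)
    lines = []
    for code in entry["inflections"] or [""]:
        # Assumption: each character of a code is one feature, e.g. "mp" -> +m+p.
        parts = [entry["category"]] + entry["features"] + list(code)
        lines.append(f"{entry['form']},{entry['lemma']},{'+'.join(parts)}")
    return lines


if __name__ == "__main__":
    print(to_nooj_lines("chevaux,cheval.N+z1:mp"))
```

In practice a textual dictionary of this kind would then be compiled by NooJ itself into the binary .nod file; that step is not sketched here.
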
6
  • Conversion of the INTEX graphs into the NooJ
    format:
  • Preprocessing graphs.
  • Compound conjunction graphs.
  • Abbreviation and elision graphs (with possible
    treatment in a dictionary), etc.
  • Recognition graphs developed in the course of
    tasks involving the automatic treatment of
    syntactic phenomena.

7
  • Expanding the compound word dictionaries with
    new entries in a systematic way (covering large
    and diverse areas of the lexicon's inventory of
    compounds).
  • Establishing the resources to be used:
  • The available specialised on-line dictionaries.
  • The lexical-semantic database, the Bulgarian
    WordNet.
  • Developing automata for the inflection types in
    the established format.

8
  • Specifics:
  • Restricted paradigms for certain types of
    compounds (esp. domain-specific terms): pluralia
    tantum, singularia tantum, count forms, plural
    endings.
  • Invariable forms or forms that are not
    established in the Bulgarian language, esp. ones
    introduced into the language as transcriptions of
    mainly English terms (hedge, swap, bear market,
    bull market, etc.)

9
  • Compound extraction from the above-mentioned
    resources (which complement one another):
  • Extraction of thematic compound dictionaries of
    terms, named entities and other compound lexemes
    (using semantic relations encoded in the
    database and employing inheritance for the task).
  • Employing NooJ as an environment for compound
    extraction: processing the obtained material
    with the already designed dictionaries and
    encoding the appropriate candidates among the
    unrecognized tokens.
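
The last step, finding candidates among the unrecognized tokens, can be sketched outside NooJ as follows. This is only an illustration of the idea and assumes that the known word forms are available as a plain set; the function name and sample text are hypothetical, and NooJ's own handling of unknown tokens is not reproduced here.

```python
import re
from collections import Counter


def unrecognized_tokens(text, known_forms):
    """Count the tokens of `text` that are missing from `known_forms`."""
    # \w+ matches Unicode letters, so Cyrillic text is tokenized as well.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(token for token in tokens if token not in known_forms)


if __name__ == "__main__":
    known = {"the", "and", "again"}
    candidates = unrecognized_tokens("The swap and the hedge; the swap again", known)
    # Frequent unknown tokens are reviewed first as candidate dictionary entries.
    for token, count in candidates.most_common():
        print(token, count)
```

Ranking by frequency is one simple way to prioritize which candidates to review and encode first.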

11
  • Enhancement of dictionary generation:
  • Exploring large databases and spotting the
    inflection types of different head words using
    the existing automata.
  • Using chiefly the Bulgarian WordNet, where head
    words of compounds are marked unambiguously.
  • Using simple syntactic grammars (identifying NPs)
    to spot head words in the available
    domain-specific dictionaries of concepts and
    terms (more comprehensive with regard to the
    coverage of types of inflection).

13
  • Recognition enhancement:
  • Development of morphological grammars covering
    certain classes of words not currently present in
    any dictionary, provided the source words are in
    the dictionary:
  • Personal feminine nouns: приятел (friend) -
    приятелка (girlfriend)
  • Diminutive nouns: детенце (a small child),
    кученце (a small dog), etc.
  • Verbal nouns, etc.
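
The idea of recognizing a derived word only when its source word is in the dictionary can be sketched as follows. This is a toy illustration in Python, not a NooJ morphological grammar: the two suffix rules, their labels and the three-word lexicon are assumptions chosen for the example, and real derivation also involves stem alternations that are ignored here.

```python
# Hypothetical suffix rules: (suffix of the unknown word, label of the derived class).
SUFFIX_RULES = [
    ("ка", "personal feminine noun"),
    ("нце", "diminutive noun"),
]


def analyse_unknown(word, lexicon):
    """Return (source word, label) pairs for every rule whose base is in the lexicon."""
    analyses = []
    for suffix, label in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            base = word[: -len(suffix)]
            # Accept the analysis only if the source word is a known entry.
            if base in lexicon:
                analyses.append((base, label))
    return analyses


if __name__ == "__main__":
    lexicon = {"приятел", "дете", "куче"}
    for word in ("приятелка", "детенце", "кученце", "маса"):
        print(word, analyse_unknown(word, lexicon))
```

Requiring the base to be a dictionary entry keeps the rules from over-generating on words that merely happen to end in the same letters.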

15
  • Present-day and future directions:
  • Information retrieval, machine translation, etc.
  • Facilitating linguistic tasks by supplying the
    prerequisites: large resources as input data
    for the exploration of linguistic phenomena and
    the validation of linguistic hypotheses on
    language material.
  • Education (facilitating the acquisition of
    knowledge and skills in NLP)