Title: BUILDING BULGARIAN NooJ RESOURCES
- SVETLA KOEVA
- SVETLOZARA LESEVA
- BORISLAV RIZOV
- The project "Automatic information extraction based on semantic relations" (RILA, a bilateral co-operation programme)
- Objectives
- Reliable (exhaustive and precise) multilingual lexical resources for a variety of purposes, such as machine translation, information extraction and information retrieval.
- Prerequisites for carrying out such a task
- Large-coverage linguistic resources, such as comprehensive multilingual and monolingual dictionaries (designed according to specific criteria and stored in a format that ensures accessibility and manageability).
- Ancillary resources (esp. for disambiguation and recognition).
- An appropriate system for the storage and management of multilingual linguistic data, as well as for the implementation of task-related procedures.
- Methodology
- Systematization and unification of the existing INTEX resources, and their conversion into the NooJ format.
- Expansion and enhancement of the resources, aiming at ever higher precision and recall.
- Creation of various new resources using the experience, resources and tools developed along the first two lines.
- Conversion of the lexical resources in DELA format to the .nod format.
- Conversion of the BGD (Bulgarian Grammar Dictionary)¹ automata underlying the DELAF dictionaries into .flx inflectional descriptions.
- Creation of inflectional automata for the existing dictionaries of compounds, since these have been stored in DELACF format (i.e., as lists of inflected forms).
¹ Koeva, S. Grammar Dictionary of Bulgarian. Description of the concept of organization of the linguistic data. Bulgarian Language 6, pp. 49-58.
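To illustrate the DELA-to-NooJ conversion step, the following Python sketch rewrites a DELAS-style lemma line into a NooJ-style dictionary line of the form `lemma,CATEGORY+FLX=PARADIGM+Features`, the kind of source entry NooJ then compiles into a .nod file. It is a minimal sketch under stated assumptions, not the project's converter: the input codes (a part-of-speech prefix followed by a paradigm number, such as `N1`), the function name `delas_to_nooj` and the feature `Hum` are hypothetical.

```python
import re

# Hypothetical DELAS-style input: "lemma,CODE" optionally followed by "+Feature"
# markers, where CODE is a part-of-speech prefix plus a paradigm number (e.g. N1).
DELAS_LINE = re.compile(
    r"^(?P<lemma>[^,]+),(?P<pos>[A-Z]+)(?P<num>\d*)(?P<feats>(?:\+[\w-]+)*)$"
)


def delas_to_nooj(line: str) -> str:
    """Rewrite one DELAS-style line as a NooJ-style 'lemma,CAT+FLX=...' line."""
    m = DELAS_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a DELAS-style entry: {line!r}")
    parts = [m.group("pos")]
    if m.group("num"):
        # The numbered code is taken to name the inflection paradigm (assumed scheme).
        parts.append(f"FLX={m.group('pos')}{m.group('num')}")
    return f"{m.group('lemma')},{'+'.join(parts)}{m.group('feats')}"


if __name__ == "__main__":
    for entry in ["приятел,N1+Hum", "фондова борса,N5", "и,CONJ"]:
        print(delas_to_nooj(entry))
```

Entries without a paradigm number (such as the conjunction и) are passed through with their category only, since they do not inflect.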
- Conversion of the INTEX graphs into the NooJ format
- Preprocessing graphs
- Compound conjunction graphs.
- Abbreviation and elision graphs (possibly also handled in a dictionary), etc.
- Recognition graphs developed for tasks involving the automatic treatment of syntactic phenomena.
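The logic of a compound-conjunction preprocessing graph can be approximated by a greedy longest-match pass over the token stream. The sketch below assumes a small, illustrative list of Bulgarian compound conjunctions; `COMPOUND_CONJUNCTIONS` and `merge_compound_conjunctions` are hypothetical names. In NooJ, each alternative would be a path in the graph rather than an entry in a Python set.

```python
# Illustrative (not exhaustive) Bulgarian compound conjunctions, as token tuples.
COMPOUND_CONJUNCTIONS = {
    ("въпреки", "че"),  # although
    ("така", "че"),     # so that
    ("за", "да"),       # in order to
    ("след", "като"),   # after
}


def merge_compound_conjunctions(tokens, units=COMPOUND_CONJUNCTIONS):
    """Greedily merge the longest matching multi-token conjunction at each position."""
    max_len = max(len(u) for u in units)
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in units:
                merged.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:  # no compound conjunction starts here: keep the single token
            merged.append(tokens[i])
            i += 1
    return merged


if __name__ == "__main__":
    print(merge_compound_conjunctions(["Той", "дойде", ",", "за", "да", "помогне"]))
```

Matching is case-insensitive, while the merged unit keeps the original spelling of its tokens.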
- Expanding the compound word dictionaries with new entries in a systematic way (covering large and diverse areas of the lexicon's inventory of compounds).
- Establishing the resources to be used
- The available specialised online dictionaries
- The lexical-semantic database Bulgarian WordNet.
- Developing automata for the inflection types in the established format.
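An inflection type can be viewed as a list of operations applied to the lemma: delete some final characters, append an ending, and attach features. The Python sketch below is loosely modelled on the backspace-and-append style of NooJ .flx descriptions, but it is not NooJ syntax. It assumes two hypothetical paradigms, `FEM_A` for feminine nouns in -а (приятелка) and `NEUT_E` for neuter nouns in -е (кученце); the tag names `s`, `p` and `def` are illustrative, not the BGD codes.

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    delete: int  # number of final characters removed from the lemma
    suffix: str  # ending appended afterwards
    tags: str    # inflectional features of the resulting form


# Hypothetical paradigm for feminine nouns in -а (e.g. приятелка).
FEM_A = [Rule(0, "", "s"), Rule(0, "та", "s+def"), Rule(1, "и", "p"), Rule(1, "ите", "p+def")]

# Hypothetical paradigm for neuter nouns in -е (e.g. кученце).
NEUT_E = [Rule(0, "", "s"), Rule(0, "то", "s+def"), Rule(1, "а", "p"), Rule(1, "ата", "p+def")]


def inflect(lemma: str, paradigm: List[Rule]) -> List[Tuple[str, str]]:
    """Apply each delete-then-append rule of the paradigm to the lemma."""
    forms = []
    for rule in paradigm:
        stem = lemma[: len(lemma) - rule.delete]
        forms.append((stem + rule.suffix, rule.tags))
    return forms


if __name__ == "__main__":
    for form, tags in inflect("приятелка", FEM_A) + inflect("кученце", NEUT_E):
        print(form, tags)
```

Keeping the paradigm as data rather than code means one automaton can serve every lemma of the same inflection type, which is the point of naming paradigms in the dictionary entries.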
- Specifics
- Restricted paradigms for certain types of compounds (esp. domain-specific terms): pluralia tantum, singularia tantum, count forms, plural endings.
- Invariable forms, or forms not yet established in Bulgarian, esp. ones introduced into the language as transcriptions of (mainly English) terms (hedge, swap, bear market, bull market, etc.).
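Restricted paradigms can be sketched as a filter on a full paradigm. In the hypothetical Python example below, an entry marked `sg_tantum` keeps only its singular forms and an entry marked `invariable` yields the lemma alone; the restriction labels, the paradigm `FEM_CONS` and the domain term ликвидност (liquidity) are assumptions chosen for illustration. Pluralia tantum would more naturally get a dedicated paradigm with the plural form as lemma, so they are not modelled here.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    delete: int  # final characters removed from the lemma
    suffix: str  # ending appended afterwards
    tags: str    # features; the first one is assumed to be number (s or p)


# Hypothetical paradigm for feminine nouns ending in a consonant.
FEM_CONS = [Rule(0, "", "s"), Rule(0, "та", "s+def"), Rule(0, "и", "p"), Rule(0, "ите", "p+def")]


def inflect(lemma: str, paradigm: List[Rule],
            restriction: Optional[str] = None) -> List[Tuple[str, str]]:
    """Generate forms, honouring an optional (hypothetical) restriction label."""
    if restriction == "invariable":
        return [(lemma, "inv")]  # a single form, no inflection at all
    forms = []
    for rule in paradigm:
        number = rule.tags.split("+")[0]
        if restriction == "sg_tantum" and number != "s":
            continue  # singularia tantum: drop the plural forms
        stem = lemma[: len(lemma) - rule.delete]
        forms.append((stem + rule.suffix, rule.tags))
    return forms


if __name__ == "__main__":
    print(inflect("ликвидност", FEM_CONS))               # full paradigm
    print(inflect("ликвидност", FEM_CONS, "sg_tantum"))  # singular forms only
```

Storing the restriction on the entry rather than duplicating paradigms keeps the number of inflection types small.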
- Compound extraction from the above-mentioned resources (used so that they complement and enhance each other).
- Extraction of thematic compound dictionaries of terms, named entities and other compound lexemes (using the semantic relations encoded in the database and exploiting inheritance for the task).
- Employing NooJ as an environment for compound extraction, processing the obtained material with the already designed dictionaries, and encoding the appropriate candidates among the unrecognized tokens.
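The last step, collecting candidates among the tokens that the existing dictionaries do not recognize, can be sketched as a frequency count of unknown tokens. The slide describes doing this within NooJ; this Python example only shows the underlying idea and assumes the dictionaries have been flattened into a set of lower-cased inflected forms (`known_forms`). The function name and the `min_freq` threshold are hypothetical.

```python
import re
from collections import Counter
from typing import List, Set, Tuple


def unknown_token_candidates(text: str, known_forms: Set[str],
                             min_freq: int = 2) -> List[Tuple[str, int]]:
    """Rank tokens absent from the dictionaries by frequency, most frequent first."""
    tokens = re.findall(r"\w+", text.lower())
    # Skip known forms and pure numbers, which are not dictionary candidates.
    counts = Counter(t for t in tokens if t not in known_forms and not t.isdigit())
    return [(tok, n) for tok, n in counts.most_common() if n >= min_freq]
```

Sorting by frequency puts the most productive gaps first, so a lexicographer can encode them before the long tail of one-off tokens.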
- Enhancement of dictionary generation
- Exploring large databases and identifying the different inflection types of head words using the existing automata.
- Using chiefly Bulgarian WordNet, where the head words of compounds are marked unambiguously.
- Using simple syntactic grammars (identifying NPs) to spot head words in the available domain-specific dictionaries of concepts and terms (which are more comprehensive with regard to the coverage of inflection types).
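The simplest NP grammar for spotting head words can be approximated by the pattern `A* N` (any number of adjectives followed by a noun), whose final noun is the head. The Python sketch below assumes part-of-speech tags `A` and `N` are already available (for instance from dictionary lookup) and handles only this one pattern; term lists also contain other compound structures that would need further rules. The function name `np_heads` is hypothetical.

```python
from typing import List, Tuple


def np_heads(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (phrase, head) for each A* N span found scanning left to right.

    Assumes tags 'A' (adjective) and 'N' (noun); the head is the final noun.
    """
    results = []
    i = 0
    while i < len(tagged):
        j = i
        while j < len(tagged) and tagged[j][1] == "A":
            j += 1  # consume the adjectives
        if j < len(tagged) and tagged[j][1] == "N":
            phrase = " ".join(word for word, _ in tagged[i:j + 1])
            results.append((phrase, tagged[j][0]))
            i = j + 1
        else:
            i = max(j, i + 1)  # no noun here: move past what was scanned
    return results


if __name__ == "__main__":
    print(np_heads([("фондова", "A"), ("борса", "N")]))
```

For a term such as фондова борса (stock exchange), the head борса fixes gender and number agreement, so its inflection type indicates the paradigm of the whole compound. In Bulgarian the definite article attaches to the first element (фондовата борса), which is one reason such compounds need automata of their own.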
- Recognition enhancement
- Development of morphological grammars covering certain classes of words not currently present in any dictionary, provided their source words are in the dictionary
- Personal feminine nouns: приятел (friend) → приятелка (girlfriend)
- Diminutive nouns: детенце (a small child), кученце (a small dog), etc.
- Verbal nouns, etc.
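A morphological grammar of this kind can be sketched as suffix-stripping rules checked against the dictionary: a token is accepted as derived only if removing the suffix yields a known source word of the required category. In the Python sketch below, the two rules (-ка for personal feminine nouns, -нце for diminutives), the category labels and the toy lexicon are assumptions; the sketch analyses base forms only and ignores inflected forms such as приятелки.

```python
from typing import Dict, List, Tuple

# Hypothetical derivation rules: (suffix of derived word, source category, derived category).
RULES = [
    ("ка", "N+Hum+m", "N+Hum+f"),  # personal feminine: приятел -> приятелка
    ("нце", "N+n", "N+n+Dim"),      # diminutive: куче -> кученце
]


def analyse_derived(token: str, lexicon: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (token, source word, derived category) for every rule that applies."""
    analyses = []
    for suffix, source_cat, derived_cat in RULES:
        if token.endswith(suffix) and len(token) > len(suffix):
            source = token[: -len(suffix)]
            # Accept only if the source word is in the dictionary with the right category.
            if lexicon.get(source) == source_cat:
                analyses.append((token, source, derived_cat))
    return analyses


if __name__ == "__main__":
    toy_lexicon = {"приятел": "N+Hum+m", "куче": "N+n", "дете": "N+n"}
    for word in ["приятелка", "кученце", "детенце", "слънце"]:
        print(word, analyse_derived(word, toy_lexicon))
```

The dictionary check is what keeps the rules from overgenerating: слънце (sun) ends in -нце, but stripping the suffix leaves no known source word, so it is not analysed as a diminutive.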
- Present-day and future directions
- Information retrieval, machine translation, etc.
- Facilitating linguistic tasks by supplying the prerequisites: large resources as input data for exploring linguistic phenomena and validating linguistic hypotheses on language material.
- Education (facilitating the acquisition of knowledge and skills in NLP).