A renewed Portuguese module for INTEX 4.3x




1
A renewed Portuguese module for INTEX 4.3x
  • Cristina Mota
  • LabEL (CAUTL/IST) and Linguateca
  • Av. Rovisco Pais 1
  • 1049-001 Lisboa, Portugal
  • cristina_at_label.ist.utl.pt

6th Intex Workshop, Sofia, Bulgaria - May 28-30
2
Overview
  • The aim of this presentation is to show how the
    new features of Intex 4.3x helped in the
    representation and treatment of
    Portuguese-specific problems.
  • The analysis of modified verbal and clitic forms.
    Example:
  • Nós comprámos um livro (We bought a book)
  • Nós comprámo-lo (We bought it)
  • *Nós comprámos-o

3
Analysis of diminutive, augmentative and
superlative forms
Nouns and adjectives in Portuguese vary in gender
and number. Besides receiving the gender and
number morphemes, they also accept diminutive,
augmentative and superlative (adjectives only)
suffixes.
4
Analysis of diminutive, augmentative and
superlative forms
Representation by Inflectional Graphs
Prior to the new morphological parser, the only way of
recognizing nouns and adjectives that accept degree
variation was to introduce a code in the DELAS
entries that allowed the generation of the
corresponding diminutive, augmentative and
superlative forms in the DELAF dictionary.
coelho (rabbit) → coelhinho, coelhinha,
coelhinhos, coelhinhas
DELAS entry: coelho,N001_dh001_dt001
DELAF entries:
coelhinho,coelho.N:ms
coelhinha,coelho.N:fs
coelhinhos,coelho.N:mp
coelhinhas,coelho.N:fp
5
Analysis of diminutive, augmentative and
superlative forms
The Problematic Cases
In Portuguese, words may have one of four accents:
acute (á, é, í, ó, ú), grave (à), circumflex (â, ê, ô)
and tilde (ã, õ).
There are a few words with two accents: an acute
accent and a tilde.
Even though all these diminutive words are formed
by adding the suffix -zinha, under this first
approach the base forms would need different
inflectional codes, which increases the number of
inflectional graphs. In order to keep the same
code, these forms are generated with accents and
then an AWK script removes them, yielding the
final DELAF.
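The accent-removal step can be illustrated with a minimal Python sketch (the original uses an AWK script that is not shown here). It assumes that only acute and circumflex accents are removed from the stem and that the tilde is kept; the suffix list and the function name `strip_stem_accents` are hypothetical.

```python
import unicodedata

# Hypothetical suffix list; the source names only -zinha.
DIMINUTIVE_SUFFIXES = ("zinhos", "zinhas", "zinho", "zinha")

# Assumption: only the combining acute (U+0301) and circumflex (U+0302)
# are removed; the tilde (U+0303) and the cedilla are kept.
REMOVABLE_MARKS = {"\u0301", "\u0302"}


def strip_stem_accents(form: str) -> str:
    """Drop acute and circumflex accents from the stem of a -zinho/-zinha form."""
    for suffix in DIMINUTIVE_SUFFIXES:
        if form.endswith(suffix):
            stem = form[: -len(suffix)]
            # Decompose so that each accent becomes a separate combining mark.
            decomposed = unicodedata.normalize("NFD", stem)
            kept = "".join(ch for ch in decomposed if ch not in REMOVABLE_MARKS)
            return unicodedata.normalize("NFC", kept) + suffix
    # Forms without one of the listed suffixes are returned unchanged.
    return form


print(strip_stem_accents("cafézinho"))
print(strip_stem_accents("órfãzinha"))
```

With this approach a single inflectional code can generate the accented intermediate form, and the clean-up is done once over the generated dictionary.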
6
Analysis of diminutive, augmentative and
superlative forms
Representation by Derivational Graphs
The new morphological parser of INTEX 4.3x makes it
possible to represent the accent deletion process.
7
Analysis of diminutive, augmentative and
superlative forms
Misleading Results
Laginha,Laga,laga.N:fs inha,.SUF+Dim:fs
  Laginha is a proper name.
trocinhos,trocos,troco.N:mp inhos,.SUF+Dim:mp
  troquinhos, not trocinhos, is the diminutive of trocos.
8
Analysis of diminutive, augmentative and
superlative forms
Advantages and Disadvantages
Inflectional Graphs
  • Advantages: simple representation; fast lookup; accurate.
  • Disadvantages: the final DELAF is created with an AWK script; a new graph has to be created for each different combination of possible diminutive, augmentative and superlative (N001_dh001_dt001_ss001, N001_dh006_dt006, etc.).
Derivational Graphs
  • Advantages: smaller DELAF; higher recall; fewer graphs.
  • Disadvantages: complex representation; slow lookup.
?
Solution A: Remove diminutives, augmentatives and
superlatives from the DELAF. Since they can be
homographs of other words, the derivational
graphs will be very restrictive and used with
normal priority.
Solution B: Keep diminutives, augmentatives and
superlatives in the DELAF. The derivational
graphs will be more flexible and designed so
that they can help enlarge the DELAS easily.
They will be used with low priority.
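The idea of low priority in Solution B can be sketched as follows: the dictionary is consulted first, and the derivational analysis is tried only for words it does not contain. This is a simplified illustration, not the INTEX mechanism itself; the suffix table, the function names and the toy data are all hypothetical.

```python
# Hypothetical suffix table: diminutive ending -> ending of the base form.
DIMINUTIVE_ENDINGS = {"inhos": "os", "inhas": "as", "inho": "o", "inha": "a"}


def derive(word, base_forms):
    """Undo a diminutive suffix and keep the result only if the base form is known."""
    lower = word.lower()
    analyses = []
    for suffix, ending in DIMINUTIVE_ENDINGS.items():
        if lower.endswith(suffix):
            base = lower[: -len(suffix)] + ending
            if base in base_forms:
                analyses.append(f"{base} + -{suffix}")
    return analyses


def analyze(word, delaf, base_forms):
    """Low priority: consult the dictionary first, derive only for unknown words."""
    if word in delaf:
        return [delaf[word]]
    return derive(word, base_forms)


# Toy data for illustration only.
base_forms = {"coelho", "laga", "trocos"}
delaf = {"Laginha": "proper name"}

print(analyze("coelhinho", delaf, base_forms))
print(analyze("Laginha", delaf, base_forms))
print(analyze("Laginha", {}, base_forms))
```

When Laginha is listed in the dictionary, the derivational analysis is never attempted; with an empty dictionary the same word receives the misleading analysis laga + -inha, which is why a flexible graph is safer at low priority.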
9
Verb-Clitic Analysis
  • When the clitic pronouns o (3ms), a (3fs), os
    (3mp), as (3fp) follow the verbal form, bound
    to it by a hyphen, they may undergo formal
    modifications, depending on the ending of the
    verbal form. Thus, if the ending is:
  • a vowel or an oral diphthong, the clitic forms do
    not undergo any modification: o, a, os, as
  • a nasal diphthong, the clitic forms change to
    no, na, nos, nas
  • -r, -s or -z, the clitic forms change to lo, la,
    los, las. In this context, the verbal forms are
    also modified, losing the final consonant. The
    vowel preceding the -r receives an
    accent (acute or circumflex, depending on the
    thematic vowel of the verb).

No modification
  • Ele comprou um livro ontem (He bought a book yesterday)
  • Ele comprou-o ontem (He bought it yesterday)
Clitic modification
  • Eles compraram um livro ontem (They bought a book yesterday)
  • Eles compraram-no ontem (They bought it yesterday)
Verbal form and clitic modification
  • Nós comprámos um livro ontem (We bought a book yesterday)
  • Nós comprámo(s)-lo ontem (We bought it yesterday)
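The three cases above can be sketched in Python. This is a simplified illustration under stated assumptions: nasal endings are approximated by the spellings -m, -ão and -õe, the accent is added only to forms ending in -ar and -er, and irregular or monosyllabic forms (for example faz, which gives fá-lo) are not covered. The function name `attach_clitic` is hypothetical.

```python
BASE_CLITICS = {"o", "a", "os", "as"}

# Assumption: nasal diphthongs are approximated by these written endings.
NASAL_ENDINGS = ("m", "ão", "õe")


def attach_clitic(verb: str, clitic: str) -> str:
    """Attach a third-person accusative clitic after a verbal form."""
    if clitic not in BASE_CLITICS:
        raise ValueError(f"unsupported clitic: {clitic}")
    if verb.endswith(NASAL_ENDINGS):
        # Nasal ending: the clitic becomes no, na, nos, nas.
        return f"{verb}-n{clitic}"
    if verb[-1] in "rsz":
        # -r, -s, -z: drop the final consonant, the clitic becomes lo, la, los, las.
        stem = verb[:-1]
        if verb.endswith("ar"):
            stem = stem[:-1] + "á"
        elif verb.endswith("er"):
            stem = stem[:-1] + "ê"
        return f"{stem}-l{clitic}"
    # Vowel or oral diphthong: no modification.
    return f"{verb}-{clitic}"


for verb in ("comprou", "compraram", "comprámos", "comprar"):
    print(attach_clitic(verb, "o"))
```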
10
Verb-Clitic Analysis
In the presence of the reflexive and dative pronouns
nos (1p) and vos (2p), the first and second person
plural verbal forms ending in -s are modified.
The clitics undergo no modification.
Verbal form modification
  • Nós vestimo(s)-nos (We dressed ourselves)
The modified verbal and clitic forms are
described in the inflectional graphs and
are consequently generated together with
the non-modified forms when the DELAF is created.
However, the Intex 4.2x version of the DELAF did not
contain clitic information that made it possible
to (i) distinguish the two forms and (ii)
guarantee the correct combination of the verbal
form with the clitic.
11
Verb-Clitic Analysis
In the new module, clitic information was added
to the verbal forms and to the accusative,
dative and reflexive clitic forms. The possible
clitic codes are:
  • i: the verbal form may occur without clitics
  • c: the clitic is not modified and does not modify
    the verbal form
  • o: clitic forms o, a, os, as
  • l: clitic forms lo, la, los, las
  • n: clitic forms no, na, nos, nas
  • q: the clitic may modify the verbal form (nos and vos)
This information is enclosed between square
brackets in the inflectional features field:
compro,comprar.V:P1s[icqo]
compras,comprar.V:P2s[icq]
compra,comprar.V:P2s[l]:P4s[icqo]:P3s[icqo]:Y2s[icqo]
compramos,comprar.V:P1p[ic] (form occurs with clitic c or without clitic, i)
compramo,comprar.V:P1p[ql] (form occurs only with clitics q and l)
os,eu.PRO:4mp[o]:3mp[o]
los,eu.PRO:4mp[l]:3mp[l]
nos,eu.PRO:1p[q]:4mp[n]:3mp[n]
te,eu.PRO:2s[c]
12
Verb-Clitic Analysis
The introduction of the clitic codes makes it possible
to disambiguate verb-clitic combinations.
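One way to picture this disambiguation is a compatibility check between the clitic codes of the verbal form and those of the clitic form. The sketch below is an illustration, not the disambiguation grammar of the module: it assumes that each inflectional feature carries its clitic codes in square brackets (for example V:P1p[ql]), it ignores person and number agreement, and the names `clitic_codes` and `can_combine` are hypothetical.

```python
import re

# Assumed layout: a feature followed by its clitic codes in square brackets,
# e.g. "compramo,comprar.V:P1p[ql]".
FEATURE = re.compile(r"(\w+)\[([a-z]+)\]")


def clitic_codes(entry):
    """Collect every clitic code attached to any feature of a dictionary entry."""
    return {code for _, codes in FEATURE.findall(entry) for code in codes}


def can_combine(verb_entry, clitic_entry):
    """A verb form and a clitic form combine if they share a code other than i."""
    shared = clitic_codes(verb_entry) & clitic_codes(clitic_entry)
    return bool(shared - {"i"})


compramos = "compramos,comprar.V:P1p[ic]"
compramo = "compramo,comprar.V:P1p[ql]"
los = "los,eu.PRO:4mp[l]:3mp[l]"
os = "os,eu.PRO:4mp[o]:3mp[o]"

print(can_combine(compramo, los))
print(can_combine(compramos, los))
print(can_combine(compramos, os))
```

Under these assumptions compramo-los is accepted, while compramos-los and compramos-os are rejected because the forms share no clitic code.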
13
Verb-Clitic Analysis
The clitic codes can also be used in syntactic
transformations to obtain the correct forms of
the verbs and clitics.
14
The Portuguese 4.3x module
http://label.ist.utl.pt/public-resources.html
  • DELAS / DELAF
      • Enhanced with clitic information
  • Inflectional Graphs
      • Nouns, Adjectives
      • Verbs
      • Pronouns
      • Determiners, Conjunctions, Prepositions, Adverbs
  • DELAC / DELACF
      • Nouns
      • Adverbs
      • Prepositions
      • Conjunctions
  • Derivational Graphs
      • Diminutive
      • Augmentative
      • Superlative
      • Other productive processes
  • Lexical graphs
      • Roman numerals
      • Cardinal numerals
      • Ordinal numerals
  • Acronym dictionaries (and corresponding
    description dictionary)
  • Local Grammars
      • Auxiliary Verb Tagging
      • Temporal Expressions
      • Numeric Expressions
  • Disambiguation Grammars
      • NP containing Adjectives
      • Verb-Clitic sequences

15
Productive Derivational Creation
The first steps towards a description of
productive derivational processes are also being
taken. The main goal is to analyze unknown words
and help in the enhancement of the DELAF.
16
Productive Derivational Creation
Remarks
Even though the graph seems very
productive, it should be stressed that it is not
meant as a substitute for including, for
instance, nouns resulting from nominalizations
in the DELAS. If that were the case, the graph
should be more restrictive:
<verboir.VNominalizationW> ção,.SUFfs
and the verb entries should account for the
possibility of the nominalization:
construir,VNominalization
In any case, it is important to relate the two
entries (the verb and the noun) by adding the
corresponding information to the entries:
construir,V_N2ção
construção,N_V3ir
The introduction of this type
of information will be one of our major concerns.