Author: UNIGE. Last modified by: vseretan. Created: 4/21/2006 1:38:23 PM. Presentation format: Custom. Company: Université de Genève.

FipsRomanian: Towards a Romanian Version of the
Fips Syntactic Parser
Violeta Seretan, Eric Wehrli, Luka Nerima, Gabriela Soare
LATL, Language Technology Laboratory
{violeta.seretan, eric.wehrli, luka.nerima,
gabriela.soare}@unige.ch
Extending Fips to Romanian: two main tasks
  • Lexicon construction
  • list of headwords (DEX, 1998)
  • morphological generation: given a base word
    form, generates all its forms according to the
    appropriate inflection paradigm
  • manual and semi-automatic insertion
  • manual insertion for verbs (specific
    information: subcategorization, selectional
    features, thematic function, ...)
  • Current status
  • simple entries
  • 60K lexemes / 380K words
  • (10K proper nouns)
  • complex entries: multi-word expressions
    (compounds and collocations)
  • de jur împrejurul 'around'
  • problema, a se pune 'problem, to arise'
  • Grammar implementation
  • Specifications (Soare, 2005)
  • Customisation of the FipsRomanian grammar for
    standard operations (syntactic transformations:
    relativization, interrogation, passivization,
    ...)
  • Similarities and differences. Examples:
  • clitic system
  • wh-fronting
  • Attachment rules: constraints on the main parser
    operation, Merge, which combines two adjacent
    structures into a larger structure
  • Current status: about 100 rules specified, nearly
    half implemented and tested

Romanian language
  • Vocabulary
  • Latin origin (fundamental vocabulary)
  • Slavic origin
  • Neologisms: French, Italian, ...
  • Loanwords: Turkish, Greek, Hungarian, Albanian,
    ...
  • Morphology
  • Case system inherited from Latin
  • nominative-accusative, genitive-dative, vocative
  • Three grammatical genders
  • masculine, feminine, neuter
  • Rich declension of determiners, nouns,
    adjectives, and verbs
  • e.g., about 35 forms for a verb
  • The definite article is enclitic, i.e., suffixed
    to nouns and adjectives
  • casă/house, casa/house-the
  • mare/big, marea/big-the
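
The paradigm-based morphological generation and the enclitic definite article described above can be illustrated with a minimal sketch. The paradigm names, form labels, and suffix rules below are hypothetical toy values chosen to reproduce the two examples on the poster; they are not the FipsRomanian paradigm inventory.

```python
# Toy inflection paradigms (hypothetical, for illustration only).
# Each form label maps to (ending stripped from the base form, ending added).
PARADIGMS = {
    "noun_f_a": {                 # feminine nouns such as casă 'house'
        "sg.indef": ("", ""),
        "sg.def": ("ă", "a"),     # enclitic definite article
        "pl.indef": ("ă", "e"),
        "pl.def": ("ă", "ele"),
    },
    "adj_e_fem": {                # feminine singular of adjectives such as mare 'big'
        "sg.indef": ("", ""),
        "sg.def": ("", "a"),
    },
}


def generate(base, paradigm):
    """Return every form of `base` licensed by the named paradigm."""
    forms = {}
    for label, (strip, add) in PARADIGMS[paradigm].items():
        if not base.endswith(strip):
            raise ValueError(f"{base!r} does not fit paradigm {paradigm!r}")
        stem = base[: len(base) - len(strip)]
        forms[label] = stem + add
    return forms


print(generate("casă", "noun_f_a"))
print(generate("mare", "adj_e_fem"))
```

With these toy rules, casă yields casa, case, and casele, and mare yields marea. A real lexicon would need many more paradigms and the genitive-dative and vocative forms.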

Figure: Europe - Romance languages
  • Orthography
  • phonemic Latin alphabet (since 1859)
  • Diacritics: ă /ə/, â /ɨ/, î /ɨ/; with cedilla: ş /ʃ/, ţ /ts/
  • Syntax
  • VSO language, relatively free word order
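
The diacritics listed under Orthography matter for lexicon lookup, because Unicode encodes both cedilla letters (ş, ţ) and comma-below letters (ș, ț). The sketch below maps both to one form before lookup. Choosing comma-below as the target is an assumption made for illustration; the poster does not say which encoding the FipsRomanian lexicon uses.

```python
import unicodedata

# Map the cedilla code points to their comma-below counterparts.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # ş -> ș
    "\u015e": "\u0218",  # Ş -> Ș
    "\u0163": "\u021b",  # ţ -> ț
    "\u0162": "\u021a",  # Ţ -> Ț
})


def normalize_romanian(text):
    """Compose combining marks (NFC), then unify cedilla and comma-below letters."""
    return unicodedata.normalize("NFC", text).translate(CEDILLA_TO_COMMA)
```

Running NFC first means that a base letter followed by a combining cedilla or combining comma is composed into a single code point before the mapping is applied.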

FipsRomanian: Sample results
Fips: a multilingual parsing architecture
(Wehrli, 2007)
  • Output
  • Rich sentence representation
  • constituent structure
  • predicate-argument table
  • co-indexation chains
  • intra-sentential pronoun resolution
  • Underlying theory
  • Generative Grammar (Chomsky, 1995)
  • Similarities with:
  • Simpler Syntax (Culicover and Jackendoff, 2005)
  • Lexical Functional Grammar (Bresnan, 2001)

Figure: Sample parse tree produced by Fips
  • Implementation
  • Left-to-right, bottom-up tabular parsing
    algorithm, relying on detailed lexical
    information
  • Language-independent core; language-specific
    implementation
  • Component Pascal, OOP paradigm, BlackBox IDE
  • Supported languages: French, English, German,
    Spanish, Italian, Greek; others in progress
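
The left-to-right, bottom-up tabular strategy and the Merge operation constrained by attachment rules can be sketched generically as follows. This is not the Fips implementation (which is written in Component Pascal); the lexicon, the category labels, and the two attachment rules are hypothetical toy values.

```python
from collections import namedtuple

# A chart item: a structure of category `cat` spanning tokens start..end.
Item = namedtuple("Item", "start end cat")

# Hypothetical toy lexicon and attachment rules (not taken from Fips).
LEXICON = {"casa": ["DP"], "este": ["V"], "mare": ["AP"]}
ATTACH = {("V", "AP"): "VP", ("DP", "VP"): "TP"}


def parse(tokens):
    """Fill the chart left to right, merging adjacent structures bottom-up."""
    chart = set()
    for i, tok in enumerate(tokens):
        agenda = [Item(i, i + 1, cat) for cat in LEXICON.get(tok, [])]
        while agenda:
            right = agenda.pop()
            if right in chart:
                continue
            chart.add(right)
            # Merge: combine `right` with every stored structure that ends
            # exactly where `right` starts, if an attachment rule allows it.
            for left in [it for it in chart if it.end == right.start]:
                cat = ATTACH.get((left.cat, right.cat))
                if cat is not None:
                    agenda.append(Item(left.start, right.end, cat))
    return chart


def full_parses(tokens):
    """Categories of the structures that span the whole input."""
    return sorted(it.cat for it in parse(tokens)
                  if it.start == 0 and it.end == len(tokens))


print(full_parses(["casa", "este", "mare"]))
```

When no structure spans the whole input, the chart still holds the partial structures, which corresponds to the notion of partial parses used in the parsing experiment.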

Preliminary results
Screen captures
  • Parsing experiment
  • data: journalistic texts, 1.05M words
  • average sentence length: 26.9 tokens
  • 16.2% full parses (FipsFrench, FipsEnglish:
    about 80%)
  • average partial parse length: 5.3 tokens
  • unknown words: 6.5% (of which 39.2% proper
    nouns)
  • satisfactory lexical coverage
  • grammatical coverage needs to be improved (work
    in progress!)
  • Task-based evaluation
  • Collocation extraction from parsed data
    (Seretan, 2008)
  • Collocations are "half idioms" (idioms of
    encoding, but not of decoding)
  • Used by parser and in-house rule-based machine
    translation system
  • Precision for top 2000 results: 30.3%
  • (Precision for French data: 65.9%, top 500
    results)
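
The task-based evaluation above can be sketched as ranking candidate word pairs taken from parsed data and measuring precision over the top n results. The poster does not state which association measure was used; the log-likelihood ratio below is a common choice assumed here for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter


def llr(a, b, c, d):
    """Log-likelihood ratio of a 2x2 contingency table.

    a: pair (w1, w2); b: w1 without w2; c: w2 without w1; d: neither.
    """
    n = a + b + c + d
    cells = ((a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d))
    total = 0.0
    for obs, row, col in cells:
        if obs > 0:  # empty cells contribute nothing
            total += obs * math.log(obs * n / (row * col))
    return 2.0 * total


def rank_candidates(pairs):
    """Rank distinct (w1, w2) pairs by decreasing log-likelihood ratio."""
    pair_freq = Counter(pairs)
    left = Counter(w1 for w1, _ in pairs)
    right = Counter(w2 for _, w2 in pairs)
    n = len(pairs)
    scored = []
    for (w1, w2), a in pair_freq.items():
        b = left[w1] - a
        c = right[w2] - a
        d = n - a - b - c
        scored.append((llr(a, b, c, d), (w1, w2)))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pair for _, pair in scored]


def precision_at_n(ranked, judged_true, n):
    """Share of the top-n ranked pairs that annotators judged to be collocations."""
    top = ranked[:n]
    if not top:
        return 0.0
    return sum(1 for pair in top if pair in judged_true) / len(top)
```

In this setting `pairs` would hold word pairs found in a syntactic relation in the parser output, and `judged_true` the pairs accepted by manual annotation.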

Lexicon interface
Fips interface
Sample collocations extracted
References
  • Bresnan, J. 2001. Lexical-Functional Syntax.
    Blackwell, Oxford.
  • Calacean, M. and J. Nivre. 2009. A data-driven
    dependency parser for Romanian. In Proceedings of
    the 7th International Workshop on Treebanks and
    Linguistic Theories (TLT 7), pages 65-76,
    Groningen, Holland.
  • Chomsky, N. 1995. The Minimalist Program. MIT
    Press, Cambridge, Mass.
  • DEX. 1998. Dicționarul explicativ al limbii
    române. Academia Română, Bucharest.
  • Seretan, V. 2008. Collocation extraction based on
    syntactic parsing. Ph.D. thesis, University of
    Geneva.
  • Soare, G. 2005. Romanian syntax. Technical
    report, University of Geneva.
  • Wehrli, E. 2007. Fips, a deep linguistic
    multilingual parser. In ACL 2007 Workshop on Deep
    Linguistic Processing, pages 120-127, Prague,
    Czech Republic.

Related work and useful resources
  • Data-driven dependency parser for Romanian based
    on MaltParser; learns dependencies from
    manual annotations (Calacean and Nivre, 2009).
    Problem: reduced treebank size and grammatical
    coverage (simple structures, no subordination,
    average sentence length only 9 words).
  • Sketch Engine for Romanian: shallow parsing (POS
    patterns), http://www.sketchengine.co.uk/
  • Dependency treebank construction, work in
    progress at the University of Iasi, Romania
  • Text processing web services, RACAI (Research
    Institute for Artificial Intelligence, Romanian
    Academy), Bucharest, Romania.
    http://www.racai.ro/webservices/TextProcessing.aspx
  • A repository of tools for Romanian: ConsILR
    (Consortium for the Romanian Language Resources
    and Tools), research groups from Iasi, Bucharest and
    Chisinau, http://consilr.info.uaic.ro/

Faculté des Lettres, Département de Linguistique