Feature Detection for Minority Languages

1
Feature Detection for Minority Languages
  • Katharina Probst,
  • Ralf Brown, Jaime Carbonell, Alon Lavie, Lori
    Levin, Erik Peterson
  • Language Technologies Institute
  • Carnegie Mellon University
  • Pittsburgh, PA, U.S.A.

2
Problem Statement
  • Scarce resources for minority languages:
    bilingual text, monolingual text, target
    language grammar
  • Because resources are scarce, statistical and
    example-based methods will likely not perform
    well
  • Our approach: a system that elicits necessary
    information about the target language from a
    bilingual informant who is not trained in
    linguistics

3
Related Work
  • Project Boas (Sheremetyeva, Nirenburg 2000)
  • Diplomat project (Frederking, Rudnicky, Hogan,
    Lenzo 2001)
  • Twisted-pair grammar (Jones, Havrilla 1998)
  • Translation Templates from Examples (Güvenir,
    Cicekli 1998)
  • Inversion transduction grammar (Wu 1997)

4
Approach
  • Feature detection
  • Prediction of transfer rules
  • Seeded version space learning to refine transfer
    rules (and their coverage)
  • Run-time module

5
Architecture Diagram
(Diagram labels: SL Input, Run-Time Module, Learning Module, SL Parser, EBMT Engine, Elicitation Process, SVS Learning Process, Transfer Rules, Transfer Engine, TL Generator, User, Unifier Module, TL Output)

6
Feature Detection
  • Corpus of well-chosen sentences, aimed at
    covering major linguistic phenomena
  • Organized in minimal pairs
  • Pruning of corpus is based on Implicational
    Universals
  • Minimize training time
  • Minimize informant frustration
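
A minimal sketch of how pruning by implicational universals could cut down the number of questions put to the informant. This is not the system's actual code. The function and variable names are hypothetical, the informant is simulated by a callback, and the prerequisite table is an assumption: "a language with a dual also has a plural" is a well-known universal, while the paucal entry is included only for illustration.

```python
# Assumed prerequisite table: feature -> feature that must also be present.
# "dual implies plural" is a well-known implicational universal; the
# paucal entry is an assumption made only for this illustration.
PREREQUISITES = {"dual": "plural", "paucal": "plural"}


def run_tests(features, ask_informant):
    """Test features in order, skipping any whose prerequisite was absent."""
    findings = {}   # feature -> True/False
    skipped = []    # features settled without asking the informant
    for feature in features:
        prerequisite = PREREQUISITES.get(feature)
        if prerequisite is not None and findings.get(prerequisite) is False:
            # The universal rules this feature out, so no elicitation is needed.
            findings[feature] = False
            skipped.append(feature)
        else:
            findings[feature] = ask_informant(feature)
    return findings, skipped


# Hypothetical informant for a language without number marking.
asked = []


def informant(feature):
    asked.append(feature)
    return False


findings, skipped = run_tests(["plural", "dual", "paucal"], informant)
print("asked:", asked)
print("skipped:", skipped)
```

In this toy run only the plural test reaches the informant. The dual and paucal tests are pruned, which is the sense in which pruning saves training time and informant effort.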

7
Example Elicitation Corpus
  • You (John) are falling. (2nd person masc.
    subject, present tense)
  • Tú (Juan) estás cayendo
  • Eimi (Kuan) tranmekeymi
  • You (Mary) are falling. (2nd person fem.
    subject, present tense)
  • Tú (María) estás cayendo
  • Eimi tranmekeymi (Maria)
  • You (Mary) fell. (2nd person fem. subject, past
    tense)
  • Tú (María) caíste
  • Eymi tranimi (Maria)
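
The minimal pairs above can be compared mechanically. The sketch below is an illustration under simplifying assumptions, not the project's detection procedure: it treats a feature as marked when the two Mapudungun translations of a minimal pair differ after the parenthesised names are removed. The function names are hypothetical, and the sentences are copied from the slide as transcribed. A real system would also have to deal with spelling variation such as Eimi/Eymi.

```python
import re


def normalise(sentence):
    """Drop parenthesised names such as "(Maria)", lowercase, and tokenise."""
    without_names = re.sub(r"\([^)]*\)", " ", sentence)
    return without_names.lower().split()


def feature_is_marked(translation_a, translation_b):
    """A feature counts as marked if the pair's translations differ."""
    return normalise(translation_a) != normalise(translation_b)


# Each minimal pair differs in exactly one feature on the English side.
minimal_pairs = {
    "subject gender": ("Eimi(Kuan) tranmekeymi", "Eimi tranmekeymi (Maria)"),
    "tense": ("Eimi tranmekeymi (Maria)", "Eymi tranimi (Maria)"),
}

results = {
    feature: feature_is_marked(a, b)
    for feature, (a, b) in minimal_pairs.items()
}

for feature, marked in results.items():
    print(feature, "->", "marked" if marked else "no difference found")
```

On these three sentences the gender pair yields identical translations while the tense pair does not, so the sketch reports tense as marked and finds no evidence of gender marking on the verb.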

9
Elicitation of Data: Spanish-Mapudungun Example

10
Organization of Tests
(Diagram labels: Diagnostic Tests; Plural, Dual, Paucal; Subj-V Agr, Obj-V Agr)

11
Conclusions
  • Automated learning of transfer rules
  • Largely language-independent framework for
    building a rule-based MT system
  • Based on the study of a diverse set of languages
  • Allows applying MT to low-density languages, as
    it works with little (albeit well-chosen) data
  • New paradigm of MT research: collaboration with
    indigenous communities

12
References
  • Frederking, Robert; A. Rudnicky; C. Hogan; K.
    Lenzo. (2001). Interactive Speech Translation in
    the DIPLOMAT Project. In Machine Translation
    Journal, Special issue on Spoken Language
    Translation.
  • Güvenir, H. Altay; I. Cicekli. (1998).
    Learning Translation Templates from Examples.
    Information Systems, 23(6):353-363.
  • Jones, Douglas; R. Havrilla. (1998). Twisted Pair
    Grammar: Support for Rapid Development of Machine
    Translation for Low Density Languages. In
    Proceedings of AMTA.
  • Sheremetyeva, Svetlana; S. Nirenburg.
    (2000). Towards a Universal Tool for NLP Resource
    Acquisition. In Proceedings of the Second
    International Conference on Language Resources
    and Evaluation (Athens, Greece, May 31 - June 3,
    2000).
  • Wu, D. (1997). Stochastic inversion transduction
    grammars and bilingual parsing of parallel
    corpora. Computational Linguistics, 23(3):
    377-403.