Title: National Centre for Language Technologies


1
National Centre for Language Technologies
  • conducts research into the processing of human
    language by computers
  • includes speech, translation, treebanks, CALL,
    software localisation and globalisation
  • interdisciplinary and has substantial economic
    implications and potential
  • basic research, develops applications

2
National Centre for Language Technologies
  • Enterprise Ireland Grants (2 Basic Research
    Grants)
  • Research clusters
  • Research collaborations

3
National Centre for Language Technologies
  • Enterprise Ireland Basic Research Grant
  • Deriving Linguistic Resources from Treebanks
    (BRG)
  • Dates: Oct 2001 - Sep 2004
  • People: Prof. J. van Genabith, Dr A. Way, and
    2 PhD students (A. Cahill, M. McCarthy)
  • Money: Euro 130
  • Overview: Develop novel automatic annotation
    methods for generating new linguistic resources
    from treebanks.

4
National Centre for Language Technologies
  • Enterprise Ireland Basic Research Grant
  • Deriving Linguistic Resources from Treebanks
    (BRG)
  • Background
  • Many Natural Language Processing (NLP)
    applications require high quality training
    corpora.
  • These corpora have to provide tree structures
    with meaning representations (e.g. who did what
    to whom).
  • Such corpora are difficult and time-consuming to
    construct and hard to find.
  • Our research involves the use of a simple, yet
    innovative approach to this problem.
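
To make "tree structures with meaning representations (who did what to whom)" concrete, here is a minimal Python sketch. It is not the project's annotation method: the tree encoding, the `annotate` and `head_word` helpers, and the two positional rules (an NP directly under S is the subject, an NP directly under VP is the object) are assumptions chosen only for illustration.

```python
# Toy phrase-structure tree: (label, children...), with (tag, word) leaves.
tree = ("S",
        ("NP", ("NNP", "Mary")),
        ("VP", ("VBD", "saw"),
               ("NP", ("DT", "the"), ("NN", "dog"))))

def head_word(node):
    """Return the last word under a node (a crude stand-in for head finding)."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]
    return head_word(children[-1])

def annotate(s_node):
    """Map a simple S -> NP VP tree to a flat who-did-what-to-whom record."""
    record = {}
    _, *children = s_node
    for child in children:
        if child[0] == "NP":
            record["SUBJ"] = head_word(child)        # NP directly under S
        elif child[0] == "VP":
            for sub in child[1:]:
                if sub[0].startswith("VB"):
                    record["PRED"] = sub[1]          # the verb itself
                elif sub[0] == "NP":
                    record["OBJ"] = head_word(sub)   # NP directly under VP
    return record

result = annotate(tree)
print(result)
```

Real treebank trees need many more rules (coordination, traces, passives), which is why building such corpora by hand is slow.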

5
National Centre for Language Technologies
  • Enterprise Ireland Basic Research Grant
  • Deriving Linguistic Resources from Treebanks (BRG)

[Diagram: Wall Street Journal Corpus (50,000 sentences, 1M words);
Automatic Annotation Tool; Automatically Annotated Treebank
(42,000 sentences annotated); Parsing; Semantics; Machine Translation]

6
National Centre for Language Technologies
  • Enterprise Ireland Basic Research Grant
  • Deriving Linguistic Resources from Treebanks
    (BRG)
  • Publications
  • A. Cahill, M. McCarthy, J. van Genabith and A.
    Way. Parsing with a PCFG and Automatic
    F-Structure Annotation. LFG 2002, The Seventh
    International Conference on Lexical-Functional
    Grammar, Athens, Greece, July 3-5, 2002.
  • A. Cahill, M. McCarthy, J. van Genabith and A.
    Way. Automatic Annotation of the Penn-Treebank
    with LFG F-Structure Information. LREC 2002,
    Third International Conference on Language
    Resources and Evaluation, Las Palmas, Canary
    Islands, Spain, 27th May - 2 June, 2002.
  • A. Cahill and J. van Genabith. TTS - A Treebank
    Tool Suite. LREC 2002, Third International
    Conference on Language Resources and Evaluation,
    Las Palmas, Canary Islands, Spain, 27th May - 2
    June, 2002.

7
National Centre for Language Technologies
  • Enterprise Ireland Basic Research Grant
  • Integrating techniques from Computational
    Linguistics (CL) into CALL
  • Dates: Oct 2002 - Sep 2005
  • People: M. Ward, Dr A. Way, and 2 PhD students
  • Money: Euro 140
  • Overview: To integrate techniques from
    Computational Linguistics into CALL (which
    currently under-utilises these techniques).

8
National Centre for Language Technologies
  • Enterprise Ireland Basic Research Grant
  • Integrating techniques from CL into CALL
  • CALL is a multidisciplinary domain: linguistics,
    pedagogy, computing
  • The project will involve:
  • the development of a multi-dimensional,
    cross-classification model of CALL and CL
  • the development of a low-level CL/CALL
    environment generation software system (1 PhD)
  • the development of a high-level CL/artificial
    co-learner system for CALL (1 PhD)

9
National Centre for Language Technologies
  • Enterprise Ireland Basic Research Grant
  • Integrating techniques from CL into CALL

[Diagram: a multi-dimensional, cross-classification matrix.
CALL: Beginner; Advanced Learner; Generate CALL materials.
CL: High-level CL techniques; Low-level CL techniques (part of speech
tagging, morphology)]

10
National Centre for Language Technologies
  • Research clusters
  • Finite State Technology (FST)
  • Speech
  • Machine Translation (MT)
  • Corpora
  • Computer Assisted Language Learning (CALL)
  • Semantics
  • Machine Learning
  • Virtual Reality

11
National Centre for Language Technologies
  • Finite State Technology (FST)
  • 2-level morphology for Irish (Elaine)
  • uses Xerox Finite State Technology to implement a
    2-level morphology of Irish for the inflected
    parts of speech, i.e. verbs, nouns and adjectives.
  • E.g. analysis (raibh, bí), conjugation (bí ->
    all tenses, all forms)
  • FST chunking for English (Patricia)
  • Developing a chunking grammar for unrestricted
    English text using the Xerox Incremental Parser
    (XIP).
  • Advantages
  • fast
  • works with real languages
  • unrestricted
  • input into EI CL/CALL project
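
As a rough illustration of finite-state chunking, the sketch below groups part-of-speech tags into noun-phrase chunks with a single regular expression. It does not use XIP or any Xerox tool. The `np_chunks` function, the Penn-style tags and the one-pattern grammar are assumptions made for this example only.

```python
import re

def np_chunks(tagged):
    """Group (word, tag) pairs into noun-phrase chunks with a regular pattern."""
    # Encode the tag sequence as one string, e.g. "DT JJ NN VBD ".
    tags = "".join(tag + " " for _, tag in tagged)
    # Optional determiner, any number of adjectives, one or more nouns.
    # The leading \b stops a match from starting in the middle of a tag.
    pattern = re.compile(r"\b(?:DT )?(?:JJ )*(?:NNS? )+")
    chunks = []
    for m in pattern.finditer(tags):
        start = tags[:m.start()].count(" ")   # index of the first matched token
        length = m.group().count(" ")         # number of tokens in the match
        chunks.append([word for word, _ in tagged[start:start + length]])
    return chunks

sentence = [("the", "DT"), ("quick", "JJ"), ("fox", "NN"),
            ("chased", "VBD"), ("small", "JJ"), ("rabbits", "NNS")]
chunks = np_chunks(sentence)
print(chunks)
```

Because the pattern is regular, it can be compiled to a finite-state machine and applied in a single left-to-right pass, which is where the speed advantage listed above comes from.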

12
National Centre for Language Technologies
  • Speech
  • speech generation (Ronan)
  • speaker characterisation (John, Michelle)
  • modelling voice source (John)
  • Irish speech synthesis group (Ronan, John,
    Michelle, Monica)
  • multi-modal interfaces (Donal)

13
National Centre for Language Technologies
  • Applications of Speech Technology and Multi-modal
    interfaces

[Diagram: Internal Format; Document; Spoken text. Very structured
information. Uses: visually impaired people, PDAs, text messages,
mobile phones]

14
National Centre for Language Technologies
  • Machine Translation (MT)
  • constrained language (Sharon)
  • example based machine translation (Andy, Nano)
  • combination of statistical and rule-based
    translation - LFG-DOT (Andy, Mary)
  • Forthcoming conference
  • May 2003: Controlled Translation (EAMT, CLAW)

15
National Centre for Language Technologies
  • Corpora
  • automatic treebank annotation (Mairead, Andy,
    Josef)
  • stochastic parsing with treebank grammar (Aoife,
    Andy, Josef)
  • translation corpora (Dorothy, Gabi, Marion)
  • aligned bilingual corpora (Nano, Andy)
  • e.g. Belgium
  • test our translation systems with internationally
    recognised reference corpora
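
The "stochastic parsing with treebank grammar" item above rests on reading a probabilistic grammar off annotated trees. The sketch below shows the standard relative-frequency estimate, P(A -> β) = count(A -> β) / count(A), on a two-tree toy treebank. The tree encoding and the names `rules` and `estimate_pcfg` are assumptions for illustration, not the group's code.

```python
from collections import Counter

def rules(node):
    """Yield (lhs, rhs) productions from a (label, children...) tree."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, (children[0],))               # lexical rule: tag -> word
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from rules(child)

def estimate_pcfg(trees):
    """Relative-frequency estimate of rule probabilities."""
    counts = Counter(r for t in trees for r in rules(t))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

toy_treebank = [
    ("S", ("NP", ("NNP", "Mary")), ("VP", ("VBD", "slept"))),
    ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "slept"))),
]
pcfg = estimate_pcfg(toy_treebank)
for (lhs, rhs), p in sorted(pcfg.items()):
    print(lhs, "->", " ".join(rhs), p)
```

The probabilities for each left-hand side sum to 1, so a parser can score competing analyses by multiplying the probabilities of the rules they use.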

16
National Centre for Language Technologies
  • CALL
  • Learner Autonomy (Francoise)
  • Tandem email (Christine)
  • CALL and CL (Monica, Andy, Josef)
  • CALL for Minority and Endangered Languages
    (Monica, Andy)

17
National Centre for Language Technologies
  • Other areas
  • Semantics
  • Machine Learning
  • Virtual Reality

18
National Centre for Language Technologies
  • Active Research Events
  • Dublin Computational Linguistics Seminar Series
  • held weekly with TCD and UCD (initiated by DCU 5
    years ago)
  • Human Language Technology reading group
  • informal weekly seminar series (in its 3rd year)
  • Recent Awards
  • 3 researchers won Albert College Fellowships