Title: National Centre for Language Technologies
1. National Centre for Language Technologies
- conducts research into the processing of human language by computers
- includes speech, translation, treebanks, CALL, software localisation and globalisation
- interdisciplinary and has substantial economic implications and potential
- basic research, develops applications
2. National Centre for Language Technologies
- Enterprise Ireland Grants (2 Basic Research Grants)
- Research clusters
- Research collaborations
3. National Centre for Language Technologies
- Enterprise Ireland Basic Research Grant
- Deriving Linguistic Resources from Treebanks (BRG)
- Dates: Oct 2001 - Sep 2004
- People: Prof. J. van Genabith, Dr A. Way,
- 2 PhD students: A. Cahill, M. McCarthy
- Money: Euro 130
- Overview: Develop novel automatic annotation methods for generating new linguistic resources from treebanks.
4. National Centre for Language Technologies
- Enterprise Ireland Basic Research Grant
- Deriving Linguistic Resources from Treebanks (BRG)
- Background
- Many Natural Language Processing (NLP) applications require high-quality training corpora.
- These corpora have to provide tree structures with meaning representations (e.g. who did what to whom).
- Such corpora are difficult and time-consuming to construct and hard to find.
- Our research takes a simple yet innovative approach to this problem: annotating existing treebanks automatically.
5. National Centre for Language Technologies
- Enterprise Ireland Basic Research Grant
- Deriving Linguistic Resources from Treebanks (BRG)
[Diagram. Labels: Wall Street Journal Corpus (50,000 sentences, 1M words); Automatic Annotation Tool; Automatically Annotated Treebank (42,000 sentences annotated); Parsing; Semantics; Machine Translation]
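As a rough illustration of what automatically annotating treebank trees might look like, the sketch below attaches LFG-style functional annotations to a toy phrase-structure tree and reads off a "who did what to whom" structure. The `Node` class, the `RULES` table and the annotation strings are assumptions made for this example only; they are not the project's annotation tool or its rule set.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str                                  # category, e.g. "S", "NP", "VBD"
    children: list = field(default_factory=list)
    word: Optional[str] = None                  # set only on word-level nodes
    annotation: str = ""                        # filled in by annotate()


# Hypothetical rule table: (mother label, daughter label) -> annotation.
# "up = down" marks a head daughter; "up.X = down" makes the daughter
# the value of attribute X in the mother's structure.
RULES = {
    ("S", "NP"): "up.SUBJ = down",
    ("S", "VP"): "up = down",
    ("VP", "VBD"): "up = down",
    ("VP", "NP"): "up.OBJ = down",
    ("NP", "NNP"): "up = down",
}


def annotate(node):
    """Label every daughter using the rule table (no match -> left empty)."""
    for child in node.children:
        child.annotation = RULES.get((node.label, child.label), "")
        annotate(child)


def f_structure(node):
    """Build a nested dict from the annotations on an annotated tree."""
    if node.word is not None:
        return {"PRED": node.word}
    fs = {}
    for child in node.children:
        sub = f_structure(child)
        if child.annotation == "up = down":
            fs.update(sub)                      # head: share the mother's structure
        elif child.annotation.startswith("up."):
            attr = child.annotation[len("up."):].split(" ")[0]
            fs[attr] = sub                      # e.g. SUBJ or OBJ
        # daughters without an annotation are ignored in this sketch
    return fs


# Toy tree for "John saw Mary"
tree = Node("S", [
    Node("NP", [Node("NNP", word="John")]),
    Node("VP", [Node("VBD", word="saw"),
                Node("NP", [Node("NNP", word="Mary")])]),
])
annotate(tree)
fs = f_structure(tree)
print(fs)
```

For the toy tree, the resulting structure records "saw" as the predicate, "John" as its SUBJ and "Mary" as its OBJ. A realistic rule set would need far more categories and would have to handle coordination, traces and ambiguous configurations.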
6. National Centre for Language Technologies
- Enterprise Ireland Basic Research Grant
- Deriving Linguistic Resources from Treebanks (BRG)
- Publications
- A. Cahill, M. McCarthy, J. van Genabith and A. Way. Parsing with a PCFG and Automatic F-Structure Annotation. LFG 2002, The Seventh International Conference on Lexical-Functional Grammar, Athens, Greece, July 3-5, 2002.
- A. Cahill, M. McCarthy, J. van Genabith and A. Way. Automatic Annotation of the Penn-Treebank with LFG F-Structure Information. LREC 2002, Third International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, Spain, 27th May - 2 June, 2002.
- A. Cahill and J. van Genabith. TTS - A Treebank Tool Suite. LREC 2002, Third International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, Spain, 27th May - 2 June, 2002.
7. National Centre for Language Technologies
- Enterprise Ireland Basic Research Grant
- Integrating techniques from Computational Linguistics (CL) into CALL
- Dates: Oct 2002 - Sep 2005
- People: M. Ward, Dr A. Way,
- 2 PhD students
- Money: Euro 140
- Overview
- To integrate techniques from Computational Linguistics into CALL (which currently under-utilises these techniques).
8. National Centre for Language Technologies
- Enterprise Ireland Basic Research Grant
- Integrating techniques from CL into CALL
- CALL is a multidisciplinary domain
- linguistics, pedagogy, computing
- The project will involve
- the development of a multi-dimensional, cross-classification model of CALL and CL
- the development of a low-level CL/CALL environment generation software system (1 PhD)
- the development of a high-level CL/artificial co-learner system for CALL (1 PhD)
9. National Centre for Language Technologies
- Enterprise Ireland Basic Research Grant
- Integrating techniques from CL into CALL
[Diagram: A multi-dimensional, cross-classification matrix. CALL dimension: Beginner; Advanced Learner; Generate CALL materials. CL dimension: High-level CL techniques; Low-level CL techniques (part-of-speech tagging, morphology)]
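One way a low-level CL technique such as part-of-speech tagging could feed the generation of CALL materials is a gap-fill (cloze) exercise that blanks out every word of a chosen category. The sketch below is only an illustration under stated assumptions: `TOY_LEXICON` is a hypothetical hand-written lookup standing in for a real tagger, and `make_cloze` is an invented helper, not part of the project's software.

```python
# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {
    "the": "DET",
    "cat": "NOUN",
    "sat": "VERB",
    "on": "PREP",
    "mat": "NOUN",
}


def tag(tokens):
    """Pair each token with a tag; unknown words are tagged 'UNK'."""
    return [(tok, TOY_LEXICON.get(tok.lower(), "UNK")) for tok in tokens]


def make_cloze(sentence, target_pos):
    """Replace every word tagged target_pos with a gap.

    Returns the exercise text and the list of removed words (the answers).
    """
    gapped, answers = [], []
    for tok, pos in tag(sentence.split()):
        if pos == target_pos:
            answers.append(tok)
            gapped.append("____")
        else:
            gapped.append(tok)
    return " ".join(gapped), answers


exercise, answers = make_cloze("The cat sat on the mat", "NOUN")
print(exercise)
print(answers)
```

Changing `target_pos` (for example to "VERB" or "PREP") gives a different exercise from the same text, which is the kind of variation a cross-classification of CL techniques against learner levels could exploit.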
10. National Centre for Language Technologies
- Research clusters
- Finite State Technology (FST)
- Speech
- Machine Translation (MT)
- Corpora
- Computer Assisted Language Learning (CALL)
- Semantics
- Machine Learning
- Virtual Reality
11. National Centre for Language Technologies
- Finite State Technology (FST)
- 2-level morphology for Irish (Elaine)
- uses Xerox Finite State Technology to implement a 2-level morphology of Irish for the inflected parts of speech, i.e. verbs, nouns and adjectives.
- E.g. analysis (raibh, bí), conjugation (bí -> all tenses, all forms)
- FST chunking for English (Patricia)
- Developing a chunking grammar for unrestricted English text using the Xerox Incremental Parser (XIP).
- Advantages
- fast
- works with real languages
- unrestricted
- input into EI CL/CALL project
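The idea behind finite-state chunking can be sketched with an ordinary regular expression over a sequence of part-of-speech tags. This is a minimal illustration using Python's `re` module as a stand-in; it does not use XIP or its grammar formalism, and the tag names, the `NP_PATTERN` rule and the `chunk_noun_phrases` helper are assumptions made for the example.

```python
import re

# Noun-phrase pattern over space-terminated tags: an optional determiner,
# any number of adjectives, then one or more nouns. The lookbehind makes
# sure a match starts at a tag boundary, never in the middle of a tag.
NP_PATTERN = re.compile(r"(?<![^ ])(?:DT )?(?:JJ )*(?:NN )+")


def chunk_noun_phrases(tagged):
    """Return the word sequences whose tags match NP_PATTERN.

    `tagged` is a list of (word, tag) pairs, assumed to be tagged already.
    """
    tag_string = "".join(tag + " " for _, tag in tagged)
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        start = tag_string[:match.start()].count(" ")   # index of first word
        length = match.group().count(" ")                # number of words
        chunks.append([word for word, _ in tagged[start:start + length]])
    return chunks


tagged = [
    ("the", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumped", "VBD"),
    ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]
chunks = chunk_noun_phrases(tagged)
print(chunks)
```

Because the pattern is a regular expression, it can be matched in a single left-to-right pass over the tags, which is where the speed advantage of finite-state methods comes from.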
12. National Centre for Language Technologies
- Speech
- speech generation (Ronan)
- speaker characterisation (John, Michelle)
- modelling voice source (John)
- Irish speech synthesis group (Ronan, John, Michelle, Monica)
- multi-modal interfaces (Donal)
13. National Centre for Language Technologies
- Applications of Speech Technology and Multi-modal interfaces
[Diagram. Labels: Document; Internal Format; Spoken text; Very structured information]
- Uses: visually impaired people, PDAs, text msgs, mobile phones
14. National Centre for Language Technologies
- Machine Translation (MT)
- constrained language (Sharon)
- example-based machine translation (Andy, Nano)
- combination of statistical and rule-based translation
- LFG-DOT (Andy, Mary)
- Forthcoming conference
- May 2003: Controlled Translation (EAMT, CLAW)
15. National Centre for Language Technologies
- Corpora
- automatic treebank annotation (Mairead, Andy, Josef)
- stochastic parsing with treebank grammar (Aoife, Andy, Josef)
- translation corpora (Dorothy, Gabi, Marion)
- aligned bilingual corpora (Nano, Andy)
- e.g. Belgium
- test our translation systems with internationally recognised reference corpora
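Stochastic parsing with a treebank grammar starts from rules and probabilities read off annotated trees. The sketch below shows one standard way to do that step, relative-frequency estimation of a PCFG, on a two-tree toy treebank. The nested-tuple tree encoding and the function names are assumptions for the example, and the code covers only grammar extraction, not parsing.

```python
from collections import Counter


def rules(tree):
    """Yield (lhs, rhs) for every node of a nested-tuple tree.

    A tree is (label, child, ...); a word-level node is (tag, "word").
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, (children[0],))           # lexical rule, e.g. NN -> dog
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from rules(child)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter(rule for tree in treebank for rule in rules(tree))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# Toy treebank of two trees
treebank = [
    ("S", ("NP", ("NNP", "John")), ("VP", ("VBD", "slept"))),
    ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "slept"))),
]
pcfg = estimate_pcfg(treebank)
for (lhs, rhs), p in sorted(pcfg.items()):
    print(lhs, "->", " ".join(rhs), p)
```

In this toy treebank NP is expanded in two different ways, once each, so each NP rule receives probability 0.5, while every other rule is the only expansion of its left-hand side and receives 1.0.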
16. National Centre for Language Technologies
- CALL
- Learner Autonomy (Francoise)
- Tandem email (Christine)
- CALL and CL (Monica, Andy, Josef)
- CALL for Minority and Endangered Languages (Monica, Andy)
17. National Centre for Language Technologies
- Other areas
- Semantics
- Machine Learning
- Virtual Reality
18. National Centre for Language Technologies
- Active Research Events
- Dublin Computational Linguistics Seminar Series
- held weekly with TCD and UCD (initiated by DCU 5 years ago)
- Human Language Technology reading group
- informal weekly seminar series (in its 3rd year)
- Recent Awards
- 3 researchers won Albert College Fellowships