Title: NICE: Native language Interpretation and Communication Environment
1NICE: Native language Interpretation and Communication Environment
- Jaime Carbonell, Lori Levin, Alon Lavie, Language
Technologies Institute - Carnegie Mellon University
- jgc, lsl, alavie_at_cs.cmu.edu
2Machine Translation of Indigenous Languages
- Policy makers have access to information about indigenous people.
- Epidemics, crop failures, etc.
- Indigenous people can participate in
- Health care
- Education
- Government
- Internet
- without giving up their languages.
3History of NICE
- Arose from a series of joint workshops of NSF and OAS.
- Workshop recommendations
- Create multinational projects using information technology to
- provide immediate benefits to governments and citizens
- develop critical infrastructure for communication and collaborative research
- train researchers and engineers
- advance science and technology
4Architecture Diagram
[Figure: architecture diagram. Component labels: SL Input, Run-Time Module, Learning Module, SL Parser, EBMT Engine, Elicitation Process, Learning Process, Transfer Rules, Transfer Engine, TL Generator, User, Unifier Module, TL Output]
5EBMT Example
English: I would like to meet her.
Mapudungun: Ayükefun trawüael fey engu.
English: The tallest man is my father.
Mapudungun: Chi doy fütra chi wentru fey ta inche ñi chaw.
English: I would like to meet the tallest man.
Mapudungun (new): Ayükefun trawüael Chi doy fütra chi wentru
Mapudungun (correct): Ayüken ñi trawüael chi doy fütra wentruengu.
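The recombination step behind the "new" output can be sketched in a few lines of Python. This is only an illustration, not the NICE EBMT engine: the fragment alignments in EXAMPLE_BASE are assumptions inferred from the two example pairs on this slide, and the greedy longest-match lookup is a simplification chosen for brevity.

```python
# Hypothetical fragment alignments, inferred from the slide's two
# example sentence pairs (not taken from the actual NICE example base).
EXAMPLE_BASE = {
    "i would like to meet": "Ayükefun trawüael",
    "her": "fey engu",
    "the tallest man": "Chi doy fütra chi wentru",
    "is my father": "fey ta inche ñi chaw",
}


def translate(sentence, example_base=EXAMPLE_BASE):
    """Translate by greedy longest-match lookup of known fragments."""
    words = sentence.lower().rstrip(".").split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest span starting at position i first.
        for j in range(len(words), i, -1):
            chunk = " ".join(words[i:j])
            if chunk in example_base:
                output.append(example_base[chunk])
                i = j
                break
        else:
            # No stored fragment covers this word; mark it as untranslated.
            output.append(f"<{words[i]}>")
            i += 1
    return " ".join(output)


print(translate("I would like to meet the tallest man"))
# prints: Ayükefun trawüael Chi doy fütra chi wentru
```

Concatenating stored fragments reproduces the "new" Mapudungun string, not the "correct" one: the lookup has no way to adjust verb morphology or word order at the fragment boundary.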
6NICE Partners
7Agreement Between LTI and Institute of Indigenous Studies (IEI), Universidad de La Frontera, Chile
- Contributions of IEI
- Native language knowledge and linguistic expertise in Mapudungun
- Experience in bicultural, bilingual education
- Data collection: recording, transcribing, translating
- Orthographic normalization of Mapudungun
8Agreement Between LTI and Institute of Indigenous Studies (IEI), Universidad de La Frontera, Chile
- Contributions of LTI
- Develop MT technology for indigenous languages
- Training for data collection and transcription
- Partial support for data collection effort pending funding from Chilean Ministry of Education
- International coordination, technical and project management
9LTI/IEI Agreement
- Continue collaboration on data collection and machine translation technology.
- Pursue focused areas of mutual interest, such as bilingual education.
- Seek additional funding sources in Chile and the US.
10The IEI Team
- Coordinator (leader of a bilingual and multicultural education project)
- Eliseo Canulef
- Distinguished native speaker
- Rosendo Huisca
- Linguists (one native speaker, one near-native)
- Juan Hector Painequeo
- Hugo Carrasco
- Typists/Transcribers
- Recording assistants
- Translators
- Native speaker linguistic informants
11MINEDUC/IEI Agreement Highlights
- Based on the LTI/IEI agreement, the Chilean
Ministry of Education agreed to fund the data
collection and processing team for the year 2001.
This agreement will be renewed each year, as
needed.
12MINEDUC/IEI Agreement Objectives
- To evaluate the NICE/Mapudungun proposal for orthography and spelling
- To collect an oral corpus that represents the four Mapudungun dialects spoken in Chile. The main domain is primary health, traditional and Western.
13MINEDUC/IEI Agreement Deliverables
- An oral corpus of 800 hours recorded, proportional to the demography of each currently spoken dialect
- 120 hours transcribed and translated from Mapudungun to Spanish
- A refined proposal for writing Mapudungun
14NICE/Mapudungun Database
- Writing conventions (Grafemario)
- Glossary Mapudungun/Spanish
- Bilingual newspaper, 4 issues
- Ultimas Familias memoirs
- Memorias de Pascual Coña
- Publishable product with new Spanish translation
- 35 hours transcribed speech
- 80 hours recorded speech
15NICE/Mapudungun Other Products
- Standardization of orthography: Linguists at UFRO have evaluated the competing orthographies for Mapudungun and written a report detailing their recommendations for a standardized orthography for NICE.
- Training for spoken language collection: In January 2001, native speakers of Mapudungun were trained in the recording and transcription of spoken data.
16Underfunded Activities
- Data collection
- Colombia (unfunded)
- Chile (partially funded)
- Travel
- More contact between CMU and Chile (UFRO) and Colombia.
- Training
- Train Mapuche linguists in language technologies at CMU.
- Extend training to Colombia
- Refine MT system for Mapudungun and Siona
- Current funding covers research on the MT engine
and data collection, but not detailed linguistic
analysis