Title: Multilingual Information Processing
1 Multilingual Information Processing at NMSU CRL
Problems, Technologies, Applications, Tools, Resources
Sergei Nirenburg, Director, CRL
sergei@crl.nmsu.edu, http://crl.nmsu.edu
September 18, 1998, Purdue University
2 http://crl.nmsu.edu
4 Machine Translation
- Rule-Based: Interlingua (Mikrokosmos), Transfer
- Glossary-Based
- Example-Based
- Corpus-Based
- Statistical
- Human-Aided Machine Translation (HAMT)
- Machine-Aided Human Translation (MAHT)
Languages: Japanese, Persian, Russian, Serbo-Croatian, Spanish, Turkish
5 Tools
- control architectures
- document managers
- text corpus tools
- resource developer tools
- interactive knowledge elicitation systems
- end-user GUIs
- Unicode support
6 Engines
- text tokenizers and segmenters
- morphological analyzers
- syntactic analyzers
- semantic analyzers
- pragmatics/discourse analyzers
- MT transfer modules
- IR modules
- IE modules
- text summarizers
- text generators
7 Project Savona Objective
- Develop an environment in which a team of human and software agents produces hypertext reports about emerging political and military crises, based on both external sources and a system-internal archival fact database.
8 Software Agents
- The current configuration includes the following (engines used as software agents are listed in parentheses; those listed in red are available to Onyx from CRL):
- Information retrieval (BRS, URSA)
- Information extraction (Cervantes)
- Text summarization (HyperGen)
- Text translation (Oleada, Corelli, Mikrokosmos)
- Planning control architecture (Hunter-Gatherer)
9 Human Agents
- Project leader
- Analyst
- Translator
- Profiles of agents are available from the system's fact DB. Team selection is assisted by the planning software agent.
10 This and the following screens demonstrate a sequence of interactions and operations in the process of developing a crisis report. Lt. Col. Franklin is the project leader.
Savona
13 The Savona team consists of analysts who are experts in particular areas relevant to the type and location of the crisis. They are led by a project leader (who assembles the expert team in the first place) and are supported by software agents and a variety of online resources, including the Shared Working Memory. The human agents jointly produce elements of the output report. The joint human-software agent team is supported by a set of standard workflow scripts, which let every team member know exactly what is expected of him or her and let the project leader manage the overall report production process.
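A workflow script of the kind described above can be pictured as an ordered list of steps, each assigned to a role, which a team member can query to see whether it is their turn. This is a minimal sketch under assumptions: the `Step` class, the step names, and the strictly sequential ordering are hypothetical and are not the actual Savona scripts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    role: str          # team role responsible for this step
    done: bool = False


# Hypothetical script; the real Savona workflow scripts are not shown in the slides.
SCRIPT = [
    Step("assemble team", "project leader"),
    Step("fill demonstration template", "analyst"),
    Step("translate foreign-language sources", "translator"),
    Step("approve report", "project leader"),
]


def next_task(script: list[Step], role: str) -> Optional[str]:
    """Return the role's next step if it is the first unfinished step, else None.

    Assumes steps run strictly in order, so a role waits until all earlier steps are done.
    """
    for step in script:
        if not step.done:
            return step.name if step.role == role else None
    return None  # every step is finished


print(next_task(SCRIPT, "project leader"))  # assemble team
print(next_task(SCRIPT, "analyst"))         # None: waiting on the project leader
SCRIPT[0].done = True
print(next_task(SCRIPT, "analyst"))         # fill demonstration template
```

A real planning component would also allow parallel steps; the strict ordering here only keeps the example small.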
14 Analysts (a subset of the human agents) work with the help of a set of prepackaged data templates built for each particular application domain. The project leader can edit these templates in the early stages of a project. An analyst interacts with other human agents (for example, translators) as well as with software agents.
17 This is a standard information template about anti-government demonstrations. A project member fills in an instance of it with the help of software agents and other team members.
18 The search engine found a number of URLs relevant to the query specified in the (partially) filled template.
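One simple way a partially filled template could yield a search query is to collect the filled slots and skip the empty ones. This is a minimal sketch: the slot names are hypothetical, and the phrase-quoting convention is an assumption about the search engine, not a documented Savona behavior.

```python
from typing import Optional


def template_to_query(template: dict[str, Optional[str]]) -> str:
    """Turn the filled slots of a partially filled template into a keyword query."""
    terms = []
    for value in template.values():
        if value:  # unfilled slots (None or "") contribute nothing
            # quote multi-word values so the search engine treats them as phrases
            terms.append(f'"{value}"' if " " in value else value)
    return " ".join(terms)


# Hypothetical slot names for an anti-government-demonstration template
demo = {
    "event_type": "anti-government demonstration",
    "location": "Hong Kong",
    "organizer": None,
    "date": None,
}
print(template_to_query(demo))  # "anti-government demonstration" "Hong Kong"
```

As the analyst fills more slots, the same function produces a narrower query.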
19 One of these URLs is visited.
20 Another Savona template
21 A sample output from Savona: the short version
22 The URL mentioned in the short version of the report
24 The front end for the long version of the output from Savona
25 An advanced browser and editor for the ontological knowledge base is one of the major contributions of Savona.
26 A partial top-level view of the taxonomy of events used in the Savona proof-of-concept demonstration, a report about an anti-government demonstration in Hong Kong
27 The types of complex events used in preparing the report for the Savona proof of concept
28 Detail of the complex event Anti-government demonstration
29 The Savona knowledge base includes profiles of human agents to help the project leader select the most appropriate project team. This process is supported by an automatic planner.
30 The most appropriate team is selected with the help of the team member requirements profile.
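The slides do not say how the planner matches agent profiles against the team member requirements profile. One simple technique that could fill this role, swapped in here for illustration, is greedy set cover: repeatedly add the person who covers the most still-unmet requirements. The profile names and skill labels below are hypothetical.

```python
def select_team(required: set[str], profiles: dict[str, set[str]]) -> list[str]:
    """Greedy set cover: repeatedly add the person who covers the most unmet requirements."""
    uncovered = set(required)
    team: list[str] = []
    while uncovered and profiles:
        # max() returns the first maximal item, so ties go to the alphabetically first name
        best = max(sorted(profiles), key=lambda name: len(profiles[name] & uncovered))
        gain = profiles[best] & uncovered
        if not gain:
            break  # nobody covers the remaining requirements
        team.append(best)
        uncovered -= gain
    return team


# Hypothetical agent profiles and requirements
profiles = {
    "analyst_a": {"china-politics", "hong-kong"},
    "analyst_b": {"military"},
    "translator_c": {"chinese-translation", "hong-kong"},
}
required = {"hong-kong", "china-politics", "chinese-translation"}
print(select_team(required, profiles))  # ['analyst_a', 'translator_c']
```

Greedy set cover is fast but not guaranteed to find the smallest team; a planner could add constraints such as availability on top of it.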
31 Ontologies Definition
- We view ontologies as collections of language-neutral concepts describing habitual and potential states of affairs in a world.
- In applications, ontologies are used together with an episodic memory of actual states of affairs; text meaning representation is a kind of episodic memory.
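A language-neutral concept of the kind defined above can be sketched as a frame with named properties and an is-a link, where a concept inherits any property it does not define itself. This is a minimal sketch: the `Concept` class, single inheritance, and the property values are illustrative assumptions loosely based on the cooking example in this deck, not the Mikrokosmos formalism.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    name: str
    is_a: Optional["Concept"] = None          # single parent, a simplification
    properties: dict[str, str] = field(default_factory=dict)

    def get(self, prop: str) -> Optional[str]:
        """Return a property value, inheriting it along the is-a chain if needed."""
        node = self
        while node is not None:
            if prop in node.properties:
                return node.properties[prop]   # the most specific value wins
            node = node.is_a
        return None


# Illustrative fragment (names and values assumed)
event = Concept("event")
prepare_food = Concept("prepare-food", is_a=event, properties={"agent": "human", "theme": "food"})
bake = Concept("bake", is_a=prepare_food,
               properties={"instrument": "oven", "location": "bakery", "theme": "baked-food"})

print(bake.get("theme"))  # baked-food (overrides the inherited "food")
print(bake.get("agent"))  # human (inherited from prepare-food)
```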
32 Ontologies Definition
- In NLP, ontologies and text meaning representations are connected through the analyzer and generator engines, which use a lexicon as the central mediating knowledge source.
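The mediating role of the lexicon can be shown in both directions: the analyzer maps a word to a language-neutral concept, and the generator maps that concept back to words in another language. This is a minimal sketch: the dictionary layout and the `analyze`/`generate` functions are hypothetical, the words come from the prepare-food fragment in this deck, and a real lexicon entry carries far more (syntax, constraints, sense distinctions).

```python
from typing import Optional

# Hypothetical lexicon: (language, word) -> concept
LEXICON: dict[tuple[str, str], str] = {
    ("en", "bake"): "bake",
    ("ru", "печь"): "bake",
    ("es", "cocer al horno"): "bake",
    ("en", "fry"): "fry",
    ("ru", "жарить"): "fry",
    ("es", "freír"): "fry",
}


def analyze(lang: str, word: str) -> Optional[str]:
    """Analyzer direction: source-language word -> language-neutral concept."""
    return LEXICON.get((lang, word))


def generate(lang: str, concept: str) -> list[str]:
    """Generator direction: concept -> target-language words that express it."""
    return [w for (lg, w), c in LEXICON.items() if lg == lang and c == concept]


concept = analyze("ru", "жарить")
print(concept, generate("es", concept))  # fry ['freír']
```

Because both directions go through the concept, adding a language only requires new lexicon entries, not new pairwise mappings.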
33 Ontology Major Problems
- Metalanguage and formalism
- Grain size of description, in terms of
  - the number of properties describing concepts
  - the size of their value sets
- Coverage
- Ability to capture meaning across languages
34 Ontology Main Concern
- In NLP applications, the main concern of an ontologist is content, not the formal properties of the ontology.
- Motivation: NL texts are full of logical and factual contradictions. It is more realistic to learn to work with an inconsistent ontology than to try to keep large and fast-growing ontologies consistent.
35 Ontology fragment: prepare-food, with English / Russian / Spanish lexemes
- prepare-food: cook, prepare a meal, fix a meal / варить, готовить / guisar, cocer, cocinar
- subclasses (is-a prepare-food):
  - marinate: marinate / мариновать / marinar
  - bake: bake / печь / cocer al horno
  - fry: fry / жарить / freír
- properties of bake:
  - location: bakery (bakery / пекарня / panadería)
  - agent: baker (baker / пекарь / panadero)
  - instrument: baking-pan (form, pan / лист / cazuela, cacerola); oven (oven / духовка / horno)
  - theme: baked-food, with subclasses
    - cake: cake / торт / pastel, tarta, dulce
    - pie: pie / пирог / pastel, tarta; pie that contains meat: empanada (Spanish)
    - bread: bread / хлеб / pan
36 URSA, CORELLI, MINDS, OLEADA
37 The Process
- English keywords
- Cross-language WWW retrieval (URSA)
- Summarization (MINDS)
- Translation (CORELLI)
- English summaries
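The process above is a pipeline in which each engine's output feeds the next. This is a minimal sketch of that composition: the three stage functions are toy stand-ins (hypothetical), not the real URSA, MINDS, or CORELLI interfaces, and the one-document "web" is invented for illustration.

```python
from functools import reduce
from typing import Any, Callable

Stage = Callable[[Any], Any]


def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right: each stage's output is the next stage's input."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


# Toy stand-ins (hypothetical), not the real engine interfaces.
def retrieve(keywords: list[str]) -> list[str]:
    """URSA's role: English keywords -> foreign-language documents."""
    web = {"es-001": "Manifestación en Hong Kong. La policía observa."}
    # only matches names spelled the same in both languages; real CLIR translates the query
    return [text for text in web.values() if any(k in text for k in keywords)]


def summarize(docs: list[str]) -> list[str]:
    """MINDS's role: here, simply keep each document's first sentence."""
    return [doc.split(". ")[0] for doc in docs]


def translate(summaries: list[str]) -> list[str]:
    """CORELLI's role: here, a word-for-word glossary lookup into English."""
    glossary = {"manifestación": "demonstration", "en": "in"}
    return [" ".join(glossary.get(w.lower(), w) for w in s.split()) for s in summaries]


report = pipeline(retrieve, summarize, translate)
print(report(["Hong Kong"]))  # ['demonstration in Hong Kong']
```

Summarizing before translating means only the short summaries pass through the (usually slower) translation stage.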
38 URSA
Languages: Afrikaans, Dutch, Finnish, Czech, Hungarian, Portuguese, Bahasa Indonesia, Russian, Japanese
- Cross-Language Text Retrieval
- English queries against foreign-language document collections
- Demonstration system: MUNDIAL
- Rapid (2-hour) addition of WWW dictionaries
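English queries against foreign-language collections are commonly handled by dictionary-based query translation: replace each query word with all its dictionary translations, then match documents in the target language. This is a minimal sketch of that generic technique, not URSA's actual method; the English-Dutch dictionary, the documents, and the term-count scoring are hypothetical.

```python
# Hypothetical English -> Dutch dictionary; URSA's real dictionaries and ranking are not shown.
EN_NL: dict[str, list[str]] = {
    "government": ["regering", "overheid"],
    "election": ["verkiezing", "verkiezingen"],
}


def translate_query(query: str, dictionary: dict[str, list[str]]) -> set[str]:
    """Replace each English word by all of its dictionary translations."""
    terms: set[str] = set()
    for word in query.lower().split():
        # untranslatable words (often names) are passed through unchanged
        terms.update(dictionary.get(word, [word]))
    return terms


def rank(docs: dict[str, str], terms: set[str]) -> list[tuple[str, int]]:
    """Score documents by the number of distinct query terms they contain."""
    scored = [(doc_id, len(terms & set(text.lower().split()))) for doc_id, text in docs.items()]
    # highest score first; ties broken by document id; zero-score documents dropped
    return sorted((s for s in scored if s[1] > 0), key=lambda s: (-s[1], s[0]))


DOCS = {
    "nl1": "de regering kondigt verkiezingen aan",
    "nl2": "het weer is mooi",
    "nl3": "nieuwe regering gevormd",
}
print(rank(DOCS, translate_query("government election", EN_NL)))
# [('nl1', 2), ('nl3', 1)]
```

Keeping every translation avoids guessing the intended sense, at the cost of some noise; adding a language amounts to plugging in another dictionary.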
40 Document
41 Summary
43 Oleada
An Integrated Multilingual Software System for Language Analysts, Instructors, and Learners
- Goal
  - Identify and design useful computer-based aids using multilingual text processing technologies
- Human Factors Methodology
  - Understand the user through user-protocol task analysis
  - Involve the user in system design through participatory prototyping
  - Test the system using formative evaluations
44 Corpus Analysis
Powerful search and contextual displays
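A classic contextual display for corpus search is the keyword-in-context (KWIC) concordance, which prints each hit with a few words on either side and lines the hits up in a column. This is a minimal sketch of that standard technique; it is not taken from Oleada, and the 30-character left column and punctuation handling are arbitrary assumptions.

```python
def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Keyword in context: one line per hit, with keywords aligned in a column."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # compare case-insensitively, ignoring trailing punctuation
        if word.lower().strip(".,;:!?") == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # right-align the left context so the keywords line up
            lines.append(f"{left:>30} [{word}] {right}")
    return lines


for line in kwic("Students marched in Hong Kong. Police watched the students closely.", "students", width=2):
    print(line)
```

Left contexts longer than 30 characters break the alignment; a real display would size the column from the data.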
45 Aligned Text
Easy to create and find translation examples
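With a sentence-aligned corpus, finding translation examples reduces to filtering sentence pairs by a term on each side. This is a minimal sketch assuming a list of (English, Spanish) pairs; the pairs and function names are hypothetical, and the whole-word regular-expression match is one possible choice.

```python
import re
from typing import Optional

# Hypothetical sentence-aligned English-Spanish pairs
ALIGNED = [
    ("The oven is hot.", "El horno está caliente."),
    ("She bakes bread every day.", "Ella hace pan todos los días."),
    ("The bakery opens at seven.", "La panadería abre a las siete."),
]


def has_word(sentence: str, term: str) -> bool:
    """Whole-word, case-insensitive match (so 'pan' does not match 'panadería')."""
    return re.search(rf"\b{re.escape(term)}\b", sentence, re.IGNORECASE) is not None


def find_examples(pairs, source_term: str, target_term: Optional[str] = None):
    """Return aligned pairs whose source side contains source_term and,
    if given, whose target side contains target_term."""
    return [
        (src, tgt)
        for src, tgt in pairs
        if has_word(src, source_term) and (target_term is None or has_word(tgt, target_term))
    ]


print(find_examples(ALIGNED, "bread", "pan"))
```

Giving a target term as well lets a learner check how one particular translation is used in context.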
46 On-line Dictionaries
Morphological and fuzzy search, with entries formatted naturally
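Morphological and fuzzy dictionary search can be combined as a fallback chain: exact lookup, then lookup after stripping an inflectional suffix, then approximate string matching for misspellings. This is a minimal sketch: the dictionary and the tiny suffix list are hypothetical stand-ins for a real morphological analyzer, and the fuzzy step uses Python's standard `difflib.get_close_matches`, not Oleada's method.

```python
import difflib

# Hypothetical English -> Spanish headwords
DICTIONARY = {"bake": "cocer al horno", "baker": "panadero", "bakery": "panadería",
              "oven": "horno", "bread": "pan"}

SUFFIXES = ("ing", "ed", "es", "s")  # toy English inflections; a real analyzer is far richer


def lookup(word: str, n: int = 3) -> list[str]:
    """Return matching headwords: exact, then morphological, then fuzzy."""
    word = word.lower()
    if word in DICTIONARY:
        return [word]
    # morphological step: strip a suffix, also trying a restored final 'e' (baking -> bake)
    for suf in SUFFIXES:
        if word.endswith(suf):
            stem = word[: -len(suf)]
            for cand in (stem, stem + "e"):
                if cand in DICTIONARY:
                    return [cand]
    # fuzzy step: closest headwords by string similarity
    return difflib.get_close_matches(word, DICTIONARY.keys(), n=n, cutoff=0.6)


for query in ("ovens", "baking", "bred"):
    print(query, "->", [(h, DICTIONARY[h]) for h in lookup(query)])
```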
47 Chinese Segmentation
Natural language processing results are useful
for learners
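Chinese text is written without spaces, so words must be segmented before lookup or annotation. A classic baseline is forward maximum matching: at each position, take the longest word found in a lexicon, otherwise a single character. This is a minimal sketch of that standard technique; it is not claimed to be Oleada's segmenter, and the four-word lexicon is a hypothetical toy.

```python
def fmm_segment(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest lexicon word."""
    tokens = []
    i = 0
    while i < len(text):
        # try the longest candidate first; a single character is always accepted
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


LEXICON = {"香港", "政府", "反政府", "示威"}  # hypothetical toy word list
print(fmm_segment("香港反政府示威", LEXICON))  # ['香港', '反政府', '示威']
```

Greedy matching is fast but can mis-segment ambiguous strings; statistical segmenters were developed to handle such cases.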
48 Annotations
Text processing results made accessible