Multilingual Information Processing

Transcript and Presenter's Notes

Title: Multilingual Information Processing


1
Multilingual Information Processing at NMSU CRL
Problems, Technologies, Applications, Tools, Resources
Sergei Nirenburg, Director, CRL
sergei_at_crl.nsmu.edu
http://crl.nmsu.edu
September 18, 1998
Purdue University
2
http://crl.nmsu.edu
3
(No Transcript)
4
[Diagram: translation approaches]
Machine Translation
  • Rule-Based: Interlingua (Mikrokosmos), Transfer
  • Corpus-Based: Example-Based, Statistical
  • Glossary-Based
Machine-Aided Translation
  • HAMT
  • MAHT
Languages: Japanese, Persian, Russian, Serbo-Croatian, Spanish, Turkish
5
Tools
  • control architectures
  • document managers
  • text corpus tools
  • resource developer tools
  • interactive knowledge elicitation systems
  • end-user GUIs
  • Unicode support

6
Engines
  • Text tokenizers and segmentors
  • morphological analyzers
  • syntactic analyzers
  • semantic analyzers
  • pragmatics/discourse analyzers
  • MT transfer modules
  • IR modules
  • IE modules
  • text summarizers
  • text generators

7
Project Savona: Objective
  • Develop an environment in which a team of human
    and software agents produces hypertext reports
    about emerging political and military crises,
    based on both external sources and a
    system-internal archival fact database

8
Software Agents
  • The current configuration includes the following
    (engines used as software agents are listed in
    parentheses; those listed in red are available
    to Onyx from CRL):
  • Information retrieval (BRS, URSA)
  • Information extraction (Cervantes)
  • Text summarization (HyperGen)
  • Text translation (Oleada, Corelli, Mikrokosmos)
  • Planning control architecture (Hunter-Gatherer)

9
Human Agents
  • Project leader
  • Analyst
  • Translator
  • Profiles of agents are available from the
    system's fact DB. Team selection is assisted by
    the planning software agent.

10
This and the following screens demonstrate a
sequence of interactions and operations in the
process of developing a crisis report. Lt. Col.
Franklin is the project leader.
Savona
11
Savona
12
Savona
13
Savona
The Savona team consists of analysts who are
experts in particular areas contributing to the
overall crisis type and place. They are led by a
project leader (who assembles the expert team in
the first place) and are supported by software
agents and a variety of online resources,
including the Shared Working Memory. The human
agents jointly produce elements of the output
report. The operation of the joint human-software
agent team is supported by a set of standard
workflow scripts, which let every team member
know exactly what is expected of him or her and
let the project leader manage the overall report
production process.
14
Analysts (a subset of human agents) operate with
the help of a set of prepackaged data templates
built for each particular application domain.
These can be edited by the project leader at
early stages of a project. An analyst interacts
with other human agents (for example,
translators) as well as software agents.
Savona
15
Savona
16
Savona
17
This is a standard information template about
anti-government demonstrations, an instance of
which will be filled by a project member with the
help of software agents and other team members.
Savona
18
The search engine found a number of URLs relevant
to the query specified in the (partially) filled
template.
Savona
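As a rough illustration of how a partially filled template might drive a search, the sketch below turns the filled slots into query terms. The slot names and the `build_query` helper are hypothetical; the slides do not show Savona's actual template format or how its queries are built.

```python
# Hypothetical template: slot names are illustrative, not taken from Savona.
template = {
    "event_type": "anti-government demonstration",
    "location": "Hong Kong",
    "date": None,          # not yet filled
    "participants": None,  # not yet filled
}


def build_query(template):
    """Join the values of the filled slots into one query string."""
    filled = [value for value in template.values() if value]
    return " ".join(filled)


print(build_query(template))
```

Empty slots are simply skipped, so the query grows more specific as the analyst fills in more of the template.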
19
One of these URLs is visited.
Savona
20
Another Savona template
Savona
21
A sample output from Savona, the short version
Savona
22
The URL mentioned in the short version of the
report.
Savona
23
Savona
24
The front end for the long version of the output
from Savona
Savona
25
An advanced browser and editor for the
ontological knowledge base is one of the major
contributions of Savona
Savona
26
A top-level partial view of the taxonomy of
events used in the proof-of-concept demonstration
of Savona: a report about an anti-government
demonstration in Hong Kong
Savona
27
The types of complex events used in the
preparation of the report in the Savona proof of
concept
Savona
28
Detail of the complex event Anti-government
demonstration
Savona
29
The Savona knowledge base includes profiles of
human agents to facilitate the selection of the
most appropriate project team by the project
leader. This process is supported by an automatic
planner.
Savona
30
The selection of the most appropriate team is
carried out with the help of the team member
requirements profile
Savona
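One simple way to match agent profiles against a team member requirements profile is to count shared skills. The sketch below assumes profiles are plain sets of skill labels; the agent names, the labels and the `rank_agents` helper are all invented for illustration, and the slides do not describe how Savona's planner actually scores candidates.

```python
# Hypothetical profiles; the skill labels are made up for illustration.
agents = {
    "analyst_a": {"politics", "east-asia"},
    "analyst_b": {"military", "economics"},
    "translator_c": {"chinese", "east-asia"},
}


def rank_agents(agents, required):
    """Order agents by how many required skills they cover (most first)."""
    scored = [(len(skills & required), name) for name, skills in agents.items()]
    # Sort by descending overlap, then by name to break ties predictably.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Agents with no matching skill are left out.
    return [name for score, name in scored if score > 0]


required = {"east-asia", "chinese"}
print(rank_agents(agents, required))
```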
31
Ontologies: Definition
  • We view ontologies as collections of
    language-neutral concepts describing habitual and
    potential states of affairs in a world.
  • In applications, ontologies are used together
    with episodic memory of actual states of affairs;
    text meaning representation is a kind of
    episodic memory.

32
Ontologies: Definition
  • In NLP, ontologies and text meaning
    representations are connected through the
    analyzer and generator engines which use a
    lexicon as the central mediating knowledge source

33
Ontology: Major Problems
  • Metalanguage and formalism
  • Grain size of description, in terms of the
    number of properties describing concepts and
    the size of their value sets
  • Coverage
  • Ability to capture meaning across languages

34
Ontology: Main Concern
  • In NLP applications, the main concern of an
    ontologist is content, not the formal properties
    of the ontology.
  • Motivation: NL texts are full of logical and
    factual contradictions. It is more realistic to
    learn to work with an inconsistent ontology than
    to try to keep large and fast-growing
    ontologies consistent.

35
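[Diagram: fragment of the ontology around the concept prepare-food. Each concept is shown with the English, Russian and Spanish words attached to it.]
prepare-food (cook, prepare a meal, fix a meal; варить, готовить; guisar, cocer, cocinar)
  • subclasses: marinate (marinate; мариновать; marinar), bake (bake; печь; cocer al horno), fry (fry; жарить; freir)
bake
  • is-a: prepare-food
  • location: bakery (bakery; пекарня; panaderia)
  • agent: baker (baker; пекарь; panadero)
  • instrument: baking-pan (form, pan; лист; cazuela, cacerola), oven (oven; духовка; horno)
  • theme: baked-food
baked-food
  • subclasses: cake (cake; торт; pastel, tarta, dulce), pie (pie; пирог; pastel, tarta), bread (bread; хлеб; pan)
pie
  • contains: meat (empanada)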
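To make the idea of language-neutral concepts linked to words through a lexicon more concrete, here is a minimal sketch that encodes a few of the English and Spanish entries from the diagram. The `Concept` class, the dictionary layout and the `translate` and `ancestors` helpers are assumptions made for illustration; they are not the Mikrokosmos representation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A language-neutral concept with a parent link and named properties."""
    name: str
    is_a: Optional[str] = None
    properties: dict = field(default_factory=dict)


ontology = {
    "prepare-food": Concept("prepare-food"),
    "bake": Concept(
        "bake",
        is_a="prepare-food",
        properties={"agent": "baker", "instrument": "oven",
                    "location": "bakery", "theme": "baked-food"},
    ),
    "fry": Concept("fry", is_a="prepare-food"),
}

# Lexicon: (language, word) -> concept name.
lexicon = {
    ("en", "bake"): "bake",
    ("es", "cocer al horno"): "bake",
    ("en", "fry"): "fry",
    ("es", "freir"): "fry",
}


def translate(word, source, target):
    """Map a word to its concept, then list target-language words for it."""
    concept = lexicon.get((source, word))
    if concept is None:
        return []
    return sorted(w for (lang, w), c in lexicon.items()
                  if lang == target and c == concept)


def ancestors(name):
    """Follow is-a links from a concept up to the root."""
    chain = []
    parent = ontology[name].is_a
    while parent is not None:
        chain.append(parent)
        parent = ontology[parent].is_a
    return chain


print(translate("bake", "en", "es"))
print(ancestors("bake"))
```

Because words in every language point at the same concept, adding a language means adding lexicon entries only; the concepts and their properties stay unchanged.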
36
URSA
CORELLI
MINDS
OLEADA
37
The Process
English Keywords
  • Cross-language WWW Retrieval (URSA)
  • Summarization (MINDS)
  • Translation (CORELLI)
English Summaries
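The flow on this slide can be read as a chain of stages in which each stage consumes the previous stage's output. The sketch below shows only that control flow: the three functions are stubs standing in for URSA, MINDS and CORELLI, and their names, signatures and behaviour are assumptions.

```python
from functools import reduce


def retrieve(keywords):
    # Stub: pretend each keyword matched one foreign-language document.
    return [f"<document about {k}>" for k in keywords]


def summarize(documents):
    # Stub: keep at most the first 30 characters of each document.
    return [d[:30] for d in documents]


def translate(summaries):
    # Stub: tag each summary as translated into English.
    return [f"EN: {s}" for s in summaries]


def run_pipeline(keywords, stages=(retrieve, summarize, translate)):
    """Feed the output of each stage into the next one."""
    return reduce(lambda data, stage: stage(data), stages, keywords)


print(run_pipeline(["elections"]))
```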
38
URSA
Afrikaans, Dutch, Finnish, Czech, Hungarian, Portuguese,
Bahasa Indonesia, Russian, Japanese
  • Cross-Language Text Retrieval
  • English queries against foreign language document
    collections
  • Demonstration system: MUNDIAL
  • Rapid (2 hours) addition of WWW dictionaries
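A common way to run English queries against a foreign-language collection is to translate the query terms with a bilingual dictionary and then match them against the documents. The toy example below does this for Dutch. The dictionary entries, the documents and the helper names are illustrative assumptions, and the slides do not state that URSA works exactly this way.

```python
# Toy English -> Dutch dictionary; entries are illustrative only.
dictionary = {
    "bread": ["brood"],
    "oven": ["oven"],
    "baker": ["bakker"],
}

documents = {
    "doc1": "de bakker haalt het brood uit de oven",
    "doc2": "de taart staat op tafel",
}


def translate_query(query):
    """Replace each English term by its dictionary translations."""
    terms = []
    for word in query.lower().split():
        # Words missing from the dictionary are dropped.
        terms.extend(dictionary.get(word, []))
    return terms


def search(query):
    """Rank documents by how many translated terms they contain."""
    terms = translate_query(query)
    scores = {}
    for doc_id, text in documents.items():
        tokens = set(text.split())
        hits = sum(1 for term in terms if term in tokens)
        if hits:
            scores[doc_id] = hits
    return sorted(scores, key=lambda d: (-scores[d], d))


print(search("bread oven"))
```

In a setup like this, supporting a new language mostly means supplying a new dictionary, which fits the slide's point about adding dictionaries quickly.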

39
(No Transcript)
40
Document
41
Summary
42
(No Transcript)
43
Oleada
An Integrated Multilingual Software System for
Language Analysts, Instructors, and Learners
  • Goal: identify and design useful computer-based
    aids using multilingual text processing
    technologies
  • Human Factors Methodology: understand the user
    through user-protocol task analysis; involve the
    user in system design through participatory
    prototyping; test the system using formative
    evaluations

44
Corpus Analysis
Powerful search and contextual displays
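A contextual display of search hits is often a keyword-in-context (KWIC) listing. The sketch below is a minimal version over a pre-tokenized text; the `kwic` function and its `width` parameter are assumptions for illustration and do not describe Oleada's own corpus tool.

```python
def kwic(tokens, keyword, width=2):
    """Return (left context, hit, right context) for each match of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


tokens = "the baker puts the bread in the oven".split()
for left, hit, right in kwic(tokens, "the", width=1):
    print(f"{left:>6} [{hit}] {right}")
```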
45
Aligned Text
Easy to create and find translation examples
46
On-line Dictionaries
Morphological and fuzzy search; entries formatted
naturally
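Fuzzy dictionary search can be approximated with the string-similarity matching in Python's standard library. The headword list and the `fuzzy_lookup` helper below are made up for illustration; the slides do not say which matching method Oleada uses.

```python
import difflib

# Illustrative headword list; a real dictionary would be far larger.
headwords = ["translate", "translation", "translator", "transfer", "template"]


def fuzzy_lookup(query, headwords, limit=3):
    """Return up to `limit` headwords that are similar to the query."""
    return difflib.get_close_matches(query, headwords, n=limit, cutoff=0.6)


# A misspelled query still finds nearby entries.
print(fuzzy_lookup("translaton", headwords))
```

Raising `cutoff` makes the lookup stricter; lowering it returns more distant candidates.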
47
Chinese Segmentation
Natural language processing results are useful
for learners
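Chinese is written without spaces between words, so a segmenter has to decide where words begin and end. The slides do not say how Oleada does this. The sketch below uses forward maximum matching, a simple dictionary-based baseline chosen here for illustration, with a tiny made-up vocabulary.

```python
def segment(text, vocabulary, max_len=4):
    """Greedy forward maximum matching over a word list."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Tiny illustrative vocabulary.
vocabulary = {"北京", "大学", "北京大学", "学生"}
print(segment("北京大学学生", vocabulary))
```

Greedy matching prefers the longest dictionary word at each position, which is fast but can pick the wrong split when a shorter word is the right reading.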
48
Annotations
Text processing results made accessible