Vision-Language Integration in AI

1
Vision-Language Integration in AI: a reality check
Katerina Pastra and Yorick Wilks
Department of Computer Science, Natural Language Processing Group,
University of Sheffield, U.K.
2
Setting the context
Artificial Intelligence: from technical integration of modalities
to multimodal meaning integration
From Multimedia to Intellimedia and Intelligent Interfaces
Purpose: intelligent, natural, coherent communication
  • We focus on vision and language integration
  • Visual modalities: images (visual perception and/or
    visualisation representations, physically realised as
    e.g. 2D/3D graphics, photos)
  • Linguistic modalities: text and/or speech

3
The problem
  • Multimodal integration: an old AI aspiration (cf. Kirsch 1964)
  • A wide variety of V-L integration prototypes in AI
  • What is computational V-L integration? (definition)
  • How is it achieved computationally?
    (state of the art, practices, tendencies, needs)
  • How far can we go?
    (implementation suggestions, the VLEMA prototype)

4
In search of a definition
Defining computational V-L integration: could a review of
related applied AI research hold the answer?
  • Related work: Srihari 1994, a review of V-L integration
    prototypes. Its limitations:
  • limited number of prototypes reviewed
  • suggestions and implementations are mixed
  • no clear focus on how integration is achieved
  • system classification according to input type
  • includes cases of quasi-integration

5
The notion of quasi-integration
Quasi-integration: fusion of results obtained by
modality-dependent processes (intersection or combination of
results, or even the results of one process constraining the
search space for another)
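
The three fusion patterns named on this slide can be illustrated with a minimal Python sketch. The label sets, scores, and function names are hypothetical, invented purely for illustration; they are not taken from any of the reviewed prototypes.

```python
# Hypothetical outputs of two independent, modality-dependent processes.
# All labels and scores are made up for illustration.
vision_labels = {"sofa": 0.7, "bench": 0.6, "heater": 0.4}   # from image analysis
language_labels = {"sofa": 0.9, "heater": 0.8, "lamp": 0.3}  # from text analysis


def intersect(vision, language):
    """Intersection: keep only the labels that both processes proposed."""
    return set(vision) & set(language)


def combine(vision, language):
    """Combination: merge both result sets, keeping the higher score per label."""
    merged = dict(vision)
    for label, score in language.items():
        merged[label] = max(score, merged.get(label, 0.0))
    return merged


def constrain(vision, allowed):
    """Constraint: one process (here language) restricts the other's search space."""
    return {label: score for label, score in vision.items() if label in allowed}


print(sorted(intersect(vision_labels, language_labels)))
print(combine(vision_labels, language_labels))
print(constrain(vision_labels, allowed=set(language_labels)))
```

In all three functions the two analyses run separately and only their results are fused, which is the sense of quasi-integration given above.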
6
Defining integration through classification
  • Main criterion for considering a prototype for review:
    V-L integration must be essential for the task the
    prototype is built for.

Specifics of the review
  • It is diachronic: from SHRDLU (Winograd 72) to
    conversational robots of the new millennium
    (e.g. Shapiro and Ismail 2003, Roy et al. 2003)
  • It crosses over into diverse AI areas and applications:
    more than 60 prototypes reviewed, from IR to Robotics
  • System classification criterion: the integration
    purpose served

7
Classification of V-L integration prototypes
8
Examples
9
Beyond differences
  • different visual and linguistic modalities involved
  • different tasks performed
  • different integration purposes served
but
  • similar integration resources are used
    (though represented and instantiated differently)

Integration resources: associations between visual and
corresponding linguistic information, e.g. words/concepts and
visual features or image models
Form: lists, integrated KB, scene/event models in KR
Integration mechanisms: KR instantiation, translation rules,
media selection, coordination
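
As a rough illustration of an integration resource in its simplest form (a list of associations between words/concepts and visual features), here is a minimal Python sketch. The feature names, the `Association` class, and both lookup functions are assumptions made for this example and do not come from any reviewed system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One resource entry: a word/concept tied to symbolic visual features."""
    concept: str
    visual_features: frozenset  # hypothetical feature names


# A hand-written association list; every entry is invented for illustration.
ASSOCIATIONS = [
    Association("sofa", frozenset({"box-shaped", "floor-level", "has-seats"})),
    Association("heater", frozenset({"box-shaped", "wall-mounted"})),
]


def name_object(observed_features):
    """Vision to language: return the concept whose features overlap most."""
    best = max(ASSOCIATIONS, key=lambda a: len(a.visual_features & observed_features))
    if not best.visual_features & observed_features:
        return None  # nothing in the resource matches the observation
    return best.concept


def features_of(concept):
    """Language to vision: return the visual features linked to a word."""
    for assoc in ASSOCIATIONS:
        if assoc.concept == concept:
            return assoc.visual_features
    return None


print(name_object({"wall-mounted", "box-shaped"}))
print(sorted(features_of("sofa")))
```

The same associations serve both directions of lookup. Note that this list is written by hand, so it only illustrates the data structure, not how such a resource could be acquired automatically.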
10
A descriptive definition
Descriptive definition:
a) Intensional definition (what the term is, e.g. its genus
   et differentia)
b) Extensional definition (what the term applies to)

a) Computational Vision-Language Integration is a process of
   associating visual and corresponding linguistic pieces of
   information (indirect back-up from Cognitive Science: cf.
   the notion of learned associations in Minsky's "Society of
   Mind", 1986, and Jackendoff's theory of associating
   concepts and 3D models, 1987)
b) Computational Vision-Language Integration may take the form
   of one of 4 integration processes, according to the
   integration purpose to be served
11
The AI quest for V-L Integration
Argument: in relying on human-created data, state-of-the-art
V-L integration systems avoid core integration challenges and
therefore fail to perform real integration
  • Simulated or manually abstracted visual input is used,
    to avoid difficulties in image analysis
  • Applications are restricted to blocksworlds/miniworlds,
    which raises scaling issues
  • Manually constructed integration resources are used,
    to avoid difficulties in associating V-L

Difficulties in integration: the correspondence problem etc.
But the difficulties lie exactly where developers intervene...
12
How far can we go?
Challenging current practices in V-L integration system
development requires formulating an ambitious system
specification. A prototype should:
  • work with real visual scenes
  • analyse its visual data automatically
  • associate images and language automatically

Is it feasible to develop such a prototype?
13
An optimistic answer
VLEMA: A Vision-Language intEgration MechAnism
  • Input: automatically re-constructed static scenes in
    3D (VRML format) from RESOLV (robot-surveyor)
  • Integration task: Medium Translation from images
    (3D sitting rooms) to text (what and where, in English)
  • Domain: estates surveillance
  • Horizontal prototype
  • Implemented in shell programming and Prolog

14
The Input
15
System Architecture
Diagram components: OntoVis KB, Object Segmentation, Object
Naming, Data Transformations
16
The Output
Wed Jul 7 13:22:22 GMTDT 2004
VLEMA V1.0, Katerina Pastra @ University of Sheffield
Description of the automatically constructed VRML file
development-scene.wrl:
This is a general view of a room. We can see the front wall,
the left-side wall, the floor, A heater on the lower part of
the front-wall and a sofa with 3 seats. The heater is shorter
in length than the sofa. It is on the right of the sofa.
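
To make the last two sentences of this output concrete, the following Python sketch shows one way comparative and spatial statements could be produced from named objects. It is not VLEMA's implementation (the slides state that VLEMA was written in shell and Prolog); the `NamedObject` class, the coordinate convention, and all numeric values are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class NamedObject:
    """A named object with a hypothetical position and extent along one wall."""
    name: str
    x: float       # horizontal centre; assumed to increase to the viewer's right
    length: float  # extent along the same axis, in arbitrary units


def compare_length(a, b):
    """Produce a sentence comparing the lengths of two objects."""
    if a.length < b.length:
        return f"The {a.name} is shorter in length than the {b.name}."
    if a.length > b.length:
        return f"The {a.name} is longer in length than the {b.name}."
    return f"The {a.name} is as long as the {b.name}."


def relative_position(a, b):
    """Produce a sentence locating object a to the left or right of object b."""
    side = "right" if a.x > b.x else "left"
    return f"It is on the {side} of the {b.name}."


# Invented measurements, chosen only so the sketch mirrors the sample output.
heater = NamedObject("heater", x=3.0, length=0.8)
sofa = NamedObject("sofa", x=1.0, length=2.1)
print(compare_length(heater, sofa), relative_position(heater, sofa))
```

The sketch starts from objects that are already segmented and named; in the architecture listed on slide 15 those steps (Object Segmentation, Object Naming) precede any text generation.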
17
Conclusion
Could occasional reality checks re-direct (part of) AI
research?
  • Descriptive definition of V-L integration in AI
    (for a theoretical, explanatory one, see K. Pastra (2004),
    Viewing Vision-Language Integration as a Double-Grounding
    Case, Proceedings of the AAAI Fall Symposium Series,
    Washington DC)
  • Review and critique of the state of the art in AI
  • The VLEMA prototype: a baseline for future research
    that will challenge current practices