CONFUCIUS: an Intelligent MultiMedia storytelling interpretation

1
CONFUCIUS: an Intelligent MultiMedia
storytelling interpretation and presentation system
  • Minhua Eunice Ma
  • Supervisor: Prof. Paul Mc Kevitt
  • School of Computing and Intelligent Systems
  • Faculty of Informatics
  • University of Ulster, Magee

2
Objectives of CONFUCIUS
  • To interpret natural language story and movie
    (drama) script input and to extract conceptual
    semantics from the natural language
  • To generate 3D animation and virtual worlds
    automatically from natural language
  • To integrate 3D animation with speech and
    non-speech audio, to form an intelligent
    multimedia storytelling system for presenting
    multimodal stories

3
CONFUCIUS context diagram
4
Previous systems
  • Schank's CD Theory (1972)
      • Primitives and scripts
      • SAM and PAM
  • Automatic Text-to-Graphics Systems
      • WordsEye (Coyne and Sproat, 2001)
      • Micons and CD-based language animation
        (Narayanan et al. 1995)
      • Spoken Image (Ó Nualláin and Smith, 1994) and its
        successor SONAS (Kelleher et al. 2000)

5
  • MultiModal interactive storytelling
      • AesopWorld
      • KidsRoom
      • Larsen and Petersen's Interactive Storytelling
      • Oz
      • Computer games
  • Virtual humans and embodied agents
      • BEAT (Cassell et al., 2000)
      • Jack (University of Pennsylvania)
      • Improv (Perlin and Goldberg, 1996)
      • SimHuman
      • Gandalf
      • PPP persona

6
Architecture of CONFUCIUS
  • Natural language stories
  • Script writer
  • Script parser
  • Natural Language Processing
  • Text To Speech
  • Sound effects
  • Prefabricated objects (knowledge base)
      • Language knowledge (lexicon, grammar, etc.)
      • Visual knowledge (3D graphic library)
  • 3D authoring tools, existing 3D models, character models
  • Semantic representations
  • Mapping
  • Animation generation
  • Synchronizing and fusion
  • 3D world with audio in VRML
7
Semantic representations
8
MultiModal semantic representation
Multimodal semantics
High-level multimodal semantic representation (XML/frame-based)
Media-independent representation
Visual media-dependent representation
Intermediate level
Audio media-dependent representation
Non-speech audio modality
Language modality
Visual modality
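As a rough illustration of what an XML/frame-based, media-independent representation could look like, the sketch below serializes a simple event frame to XML. The element and attribute names (`event`, `slot`, `predicate`, `name`) are assumptions made for this example; the slides do not specify the actual schema used by CONFUCIUS.

```python
import xml.etree.ElementTree as ET


def frame_to_xml(frame):
    """Serialize a flat event frame (dict of slot -> filler) to an XML string.

    The tag names used here are hypothetical, not the CONFUCIUS schema.
    """
    event = ET.Element("event", {"predicate": frame["predicate"]})
    for slot, filler in frame.items():
        if slot == "predicate":
            continue  # already stored as an attribute of <event>
        ET.SubElement(event, "slot", {"name": slot}).text = str(filler)
    return ET.tostring(event, encoding="unicode")


# A frame for the sentence "A ball is bouncing".
frame = {"predicate": "bounce", "agent": "ball"}
xml_text = frame_to_xml(frame)
print(xml_text)
```

A media-dependent module (visual, speech, or non-speech audio) could then read the same frame and produce its own lower-level representation.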
9
Mental imagery and meaning processing
Meanings, communicable ideas, thoughts,
manifestable messages, proverbs, examples,
parables, etc.
Simulation presentation via language or other
modalities
Mental world
Mental world
Communication
Simulation Image recognition
Simulation Language understanding
Cognition
Re-cognition
Physical world
Virtual world
10
Knowledge base of CONFUCIUS
  • Language knowledge
      • Semantic knowledge - lexicons (e.g. WordNet)
      • Syntactic knowledge - grammars
      • Statistical models of language
      • Associations between words
  • Visual knowledge
      • Object model (nouns)
      • Functional information
      • Internal coordinate axes (for spatial reasoning)
      • Associations between objects
      • Event model (event verbs, describes the motion of
        objects)
  • World knowledge
  • Spatial and qualitative reasoning knowledge
11
Graphic library
objects/props
characters
geometry and joint hierarchy files
Simple geometry files
instantiation
motions
animation library (key frames)
12
Data Flow Diagram
Primitives library
Natural language processor
Animation generator
Visual semantics
VRML without sound nodes
SceneActor descriptions
Media coordination
Synthesized animation
TTS
dialogues
Script parser
script
Non-speech audio
Sound effect driver
script
Script writer
story
Music library
13
Animation generator
LCS representation
verb semantic analysis
use lexical relations (WordNet) to replace
synonyms, scripts application, etc.
match basic motions in library?
Y
N
motion decomposition
animation controller
motion instantiation
environment placement
VRML format of the virtual story world examples
demo
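The decision step in the flowchart above (replace synonyms, try to match a basic motion in the library, otherwise decompose) can be sketched as follows. This is a minimal sketch under stated assumptions: `MOTION_LIBRARY`, `SYNONYMS`, and `DECOMPOSITIONS` are small hypothetical tables standing in for the animation library, the WordNet lexical relations, and the visual definitions; none of their contents come from the slides except the bounce/moveTo pairing.

```python
# Hypothetical stand-ins for the real resources.
MOTION_LIBRARY = {"walk", "jump", "moveTo"}          # basic motions with key frames
SYNONYMS = {"stroll": "walk", "leap": "jump"}        # stand-in for WordNet relations
DECOMPOSITIONS = {"bounce": ["moveTo", "moveTo"]}    # stand-in for visual definitions


def plan_motion(verb):
    """Return the list of basic motions needed to animate a verb."""
    verb = SYNONYMS.get(verb, verb)        # use lexical relations to replace synonyms
    if verb in MOTION_LIBRARY:             # match basic motions in library? Y
        return [verb]
    parts = DECOMPOSITIONS.get(verb)       # N: motion decomposition
    if parts is None:
        raise KeyError(f"no visual definition for verb: {verb}")
    motions = []
    for part in parts:                     # each part must itself resolve to basic motions
        motions.extend(plan_motion(part))
    return motions


print(plan_motion("stroll"))
print(plan_motion("bounce"))
```

The resulting list of basic motions would then be handed to motion instantiation and environment placement.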
14
Categories of events
  • Atomic entities
      • Change physical location such as position and
        orientation, e.g. bounce, turn
      • Change intrinsic attributes such as shape, size,
        color, and texture, e.g. bend, and even
        visibility, e.g. disappear, fade (in/out)
  • Non-atomic entities
      • Non-character events
          • Two or more individual objects fuse together,
            e.g. melt (in)
          • One object divides into two or more individual
            parts, e.g. break (into pieces)
          • Change sub-components (their position, size,
            color), e.g. blossom
          • Environment events (weather verbs), e.g. snow,
            rain
      • Character events
          • Action verbs
              • Intransitive verbs
              • Transitive verbs
          • Non-action verbs (stative, emotion, possession,
            mental activities, cognition and perception)
          • Idioms and metaphor verbs

15
Categories of action verbs
  • Intransitive verbs
      • Biped kinematics, e.g. walk, swim, and other
        motion models like fly
      • Facial expressions, e.g. laugh, anger
      • Lip movement, e.g. speak, say
  • Transitive verbs
      • single object, e.g. throw, push, kick
      • multiple objects
          • direct and indirect objects, e.g. give, pass,
            show
          • indirect object the instrument, e.g. cut,
            hammer

16
Visual definition and word sense
polysemy
verb
word sense
visual definition entry
mapping
synonymy
Example: close (a door)
  • a normal door (rotation on y axis)
  • a sliding door (moving on x axis)
  • a rolling shutter door (a combination of rotation
    on x axis and moving on y axis)

word sense -- minimal complete unit of meaning in
the language modality
visual definition entry -- minimal complete unit of
meaning in the visual modality
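The door example shows one word sense mapping to several visual definition entries, selected by the type of object involved. A minimal sketch of such a lookup is given below; the table layout and the `(operation, axis)` encoding are assumptions for illustration, while the three door types and their axes are taken from the slide.

```python
# (verb sense, object type) -> ordered list of (operation, axis) steps.
# The encoding is hypothetical; only the door types and axes come from the slide.
VISUAL_DEFINITIONS = {
    ("close", "normal door"): [("rotate", "y")],
    ("close", "sliding door"): [("translate", "x")],
    ("close", "rolling shutter door"): [("rotate", "x"), ("translate", "y")],
}


def visual_definition(verb_sense, object_type):
    """Pick the visual definition entry for a word sense and an object type."""
    try:
        return VISUAL_DEFINITIONS[(verb_sense, object_type)]
    except KeyError:
        raise KeyError(f"no visual definition for {verb_sense!r} on {object_type!r}")


for door in ("normal door", "sliding door", "rolling shutter door"):
    print(door, visual_definition("close", door))
```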
17
Troponyms and verbs derived from adjectives/nouns
  • troponym
      • elaborates the manner of a base verb (Fellbaum
        1998)
      • examples: trot - walk (fast), gulp - eat
        (quickly)
      • base verb + adverb
      • present the base verb and modify the manner
        (speed, the agent's state, duration of the
        activity, iteration, etc.)
  • Verbs derived from adjectives or nouns
      • change objects' properties (size, color, shape)
        or the world state
      • verbs with affixes such as -en, -ify, or -ize,
        e.g. lengthen
      • using predicates scale(), squash() or changing
        the corresponding property fields of the object
        in VRML
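Both ideas on this slide can be sketched briefly: a troponym is looked up as a base verb plus a manner modifier, and a verb such as lengthen is realised by a scale() predicate that changes a size field. The table contents beyond the two slide examples, the function signatures, and the factor of 2 are assumptions for illustration; the slide names scale() but does not give its signature.

```python
# troponym -> (base verb, manner); the two entries are the slide's examples.
TROPONYMS = {"trot": ("walk", "fast"), "gulp": ("eat", "quickly")}


def decompose(verb):
    """Split a verb into (base verb, manner); manner is None for base verbs."""
    return TROPONYMS.get(verb, (verb, None))


def scale(size, factors):
    """Hypothetical scale() predicate: multiply an (x, y, z) size per axis."""
    return tuple(s * f for s, f in zip(size, factors))


print(decompose("trot"))
print(decompose("walk"))

# "lengthen": stretch along x only; the factor 2.0 is an arbitrary choice.
lengthened = scale((1.0, 1.0, 1.0), (2.0, 1.0, 1.0))
print(lengthened)
```

The resulting triple corresponds to the three values that would be written into the `scale` field of the object's VRML Transform node.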

18
Representing active and passive voice
  • active and passive voice
  • converse verb pairs such as give/take,
    buy/sell, lend/borrow
  • same activity from different point of view
  • use of VRML Viewpoint node
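One way to read the last two bullets: each verb of a converse pair maps to the same underlying event, and the choice of verb only selects whose position the camera is placed at. The sketch below follows that reading. The role names, the choice of which verb counts as canonical, and the coordinates are assumptions; only the verb pairs and the use of a VRML Viewpoint node come from the slide.

```python
# verb -> (canonical event, participant whose point of view is used).
CONVERSE_VERBS = {
    "give": ("give", "agent"), "take": ("give", "recipient"),
    "sell": ("sell", "agent"), "buy": ("sell", "recipient"),
    "lend": ("lend", "agent"), "borrow": ("lend", "recipient"),
}


def viewpoint_node(verb, positions):
    """Return (canonical event, VRML Viewpoint node text) for a converse verb.

    `positions` maps a role name to an (x, y, z) camera position.
    """
    event, role = CONVERSE_VERBS[verb]
    x, y, z = positions[role]
    node = f'Viewpoint {{ position {x:g} {y:g} {z:g} description "{role}" }}'
    return event, node


positions = {"agent": (0, 1.6, 0), "recipient": (2, 1.6, 0)}
print(viewpoint_node("sell", positions))
print(viewpoint_node("buy", positions))
```

Both calls yield the same event name, so the animation itself is generated once and only the Viewpoint differs.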

19
Implementation: semantics → VRML
Example: A ball is bouncing

(a) Visual definition of bounce

    bounce(ball) :- [moveTo(ball, [0,0,0]), moveTo(ball, [0,20,0])]L.

(b) VRML code of a static ball

    DEF ball Transform {
      translation 0 0 0
      children [
        Shape {
          appearance Appearance { material Material {} }
          geometry Sphere { radius 5 }
        }
      ]
    }

(c) Output: VRML code of a bouncing ball

    DEF ball Transform {
      translation 0 0 0
      children [
        DEF ball-TIMER TimeSensor { loop TRUE cycleInterval 0.5 },
        DEF ball-POS-INTERP PositionInterpolator {
          key [0, 0.5, 1]
          keyValue [0 0 0, 0 20 0, 0 0 0]
        },
        Shape {
          appearance Appearance { material Material {} }
          geometry Sphere { radius 5 }
        }
      ]
    }
    ROUTE ball-TIMER.fraction_changed TO ball-POS-INTERP.set_fraction
    ROUTE ball-POS-INTERP.value_changed TO ball.set_translation
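The step from a visual definition to animated VRML can be sketched as a small text generator: given an object name and a looped list of key positions, it emits a TimeSensor, a PositionInterpolator with evenly spaced keys, and the two ROUTE statements. This is an illustrative sketch, not the CONFUCIUS implementation (which the slides say is written in Java); the function name and parameters are assumptions, and the node layout follows the bouncing-ball example on this slide.

```python
def animate_vrml(name, positions, cycle_interval=0.5, radius=5):
    """Build VRML97 text that loops a sphere called `name` through `positions`."""
    if len(positions) < 2:
        raise ValueError("need at least two key positions")
    last = len(positions) - 1
    # Evenly spaced key fractions from 0 to 1, one per key position.
    keys = ", ".join(f"{i / last:g}" for i in range(len(positions)))
    values = ", ".join(" ".join(f"{c:g}" for c in p) for p in positions)
    start = " ".join(f"{c:g}" for c in positions[0])
    lines = [
        f"DEF {name} Transform {{",
        f"  translation {start}",
        "  children [",
        f"    DEF {name}-TIMER TimeSensor {{ loop TRUE cycleInterval {cycle_interval:g} }},",
        f"    DEF {name}-POS-INTERP PositionInterpolator {{",
        f"      key [{keys}]",
        f"      keyValue [{values}]",
        "    },",
        "    Shape {",
        "      appearance Appearance { material Material {} }",
        f"      geometry Sphere {{ radius {radius:g} }}",
        "    }",
        "  ]",
        "}",
        f"ROUTE {name}-TIMER.fraction_changed TO {name}-POS-INTERP.set_fraction",
        f"ROUTE {name}-POS-INTERP.value_changed TO {name}.set_translation",
    ]
    return "\n".join(lines)


# bounce(ball): move to the origin, up to y = 20, and back to close the loop.
vrml_text = animate_vrml("ball", [(0, 0, 0), (0, 20, 0), (0, 0, 0)])
print(vrml_text)
```

Repeating the first position at the end keeps the looped motion continuous, which matches the three key values used in the slide's output.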
20
Categories of adjectives
  • Visually observable
      • Objects' attributes/states: dark/light,
        large/small, big/little, white/black (color
        adj.), long/short, new/old, high/low, full/empty,
        open/closed
      • Observable human attributes
          • Feelings: happy/sad, angry, excited, surprised,
            terrified
          • Others: old/young, beautiful/ugly, strong/weak,
            poor/rich, fat/thin
      • Relational adj.: nasal (nose), mural (wall),
        dental (teeth)
  • Visually unobservable
      • Perceivable by other modalities: wet/dry,
        warm/cold, coarse/smooth, hard/soft, heavy/light
      • Unobservable human attributes (virtue):
        good/evil, kind, mean, ambitious
      • Abstract attributes
          • Others: easy/difficult, real, important,
            particular, right/wrong, early/late
      • Reference-modifying adj.: possible/impossible,
        former, past/present, last, other, different/same
21
Software Analysis
  • Java programming language
      • parsing intermediate representation
      • changing VRML code to create/modify animation
      • integrating modules
  • Natural language processing tools
      • GATE (pre-processing)
      • PC-PARSE (morphological and syntactic analysis)
      • WordNet (lexicon, semantic inference)
  • 3D graphic modelling
      • existing 3D models on the Internet
      • 3D Studio Max (props and stage)
      • VRML (Virtual Reality Modelling Language) 97,
        H-anim 2001 spec.
  • The Actors: using embodied agents
      • Microsoft Agent (the narrator and minor actors)
      • Character Studio, Internet Character Animator
        (protagonists)

22
Natural Language Processing
PC-PARSER
FEATURES
Semantic inference
WordNet 1.6
23
Contribution and prospective applications
  • multimodal semantic representation of natural
    language
  • automatic animation generation
  • multimodal fusion and coordination
  • Children's education
  • Multimedia presentation
  • Movie/drama production
  • Script writing
  • Computer games
  • Virtual Reality

24
Conclusion
  • The objectives of CONFUCIUS meet the challenging
    problems in language visualisation
  • formalizes meaning of action verbs and states
  • mapping language primitives with visual
    primitives
  • a reusable common sense knowledge base for
    other systems
  • sophisticated spatial and temporal reasoning
  • representing stories by temporal multimedia
    requires significant coordination