WordsEye: From Text To Pictures

Provided by: www1CsC

Learn more at: http://www1.cs.columbia.edu

Transcript and Presenter's Notes

1
WordsEye: From Text To Pictures
The very humongous silver sphere is fifty feet
above the ground. The silver castle is in the
sphere. The castle is 80 feet wide. The ground is
black. The sky is partly cloudy.
2
Why is it hard to create 3D graphics?
3
The tools are complex
4
Too much detail
5
Involves training, artistic skill, and expense
6
Pictures from Language

No GUI bottlenecks - Just describe it!
Low entry barrier - no special skill or training
required
Give up detailed direct manipulation for speed
and economy of expression
Language expresses constraints
Bypass rigid, pre-defined paths of expression
(dialogs, menus, etc) as defined by GUI
Objects vs. polygons: draw upon objects in
pre-made 3D and 2D libraries
Enable novel applications in education, gaming,
online communication, . . .
Using language is fun and stimulates imagination
Semantics
3D scenes provide an intuitive representation of
meaning by making explicit the contextual
elements implicit in our mental models.

7
WordsEye Initial Version (with Richard Sproat)

Developed at AT&T Labs
Graphics: Mirai 3D animation system on Windows NT
Church Tagger, Collins Parser on Linux
WordNet (http://wordnet.princeton.edu/)
Viewpoint 3D model library
NLP (Linux) and depiction/graphics (Windows NT)
communicate via sockets
WordsEye code in Common Lisp
Siggraph paper (August 2001)

8
New Version (with Richard Sproat)

Rewrote software from scratch
Linux and CMUCL
Custom Parser/Tagger
OpenGL for 3D preview display
Radiance Renderer
ImageMagick, GIMP for 2D post-effects
Different subset of functionality
No verbs/poses yet
Web interface (www.wordseye.com)
Webserver and multiple backend text-to-scene
servers
Gallery/Forum/E-Cards/PictureBooks/2D effects

9
A tiny grey manatee is in the aquarium. It is
facing right. The manatee is six inches below the
top of the aquarium. The ground is tile. There is
a large brick wall behind the aquarium.
10
A silver head of time is on the grassy ground.
The blossom is next to the head. The blossom is
in the ground. The green light is three feet
above the blossom. The yellow light is 3 feet
above the head. The large wasp is behind the
blossom. The wasp is facing the head.
11
The humongous white shiny bear is on the American
mountain range. The mountain range is 100 feet
tall. The ground is water. The sky is partly
cloudy. The airplane is 90 feet in front of the
nose of the bear. The airplane is facing right.
12
A microphone is in front of a clown. The
microphone is three feet above the ground. The
microphone is facing the clown. A brick wall is
behind the clown. The light is on the ground and
in front of the clown.
14
Mary uses the crossbow. She rides the horse by
the store. The store is under the large willow.
The small allosaurus is in front of the horse.
The dinosaur faces Mary. The gigantic teacup is
in front of the store. The gigantic mushroom is
in the teacup. The castle is to the right of the
store.
15
Web Interface - preview mode
16
Web Interface - rendered (raytraced)
17
WordsEye Overview

Linguistic Analysis
Parsing
Create dependency-tree representation
Anaphora resolution
Interpretation
Add implicit objects, relations
Resolve semantics and references
Depiction
Database of 3D objects, poses, textures
Depiction rules generate graphical constraints
Apply constraints to create scene

18
Linguistic Analysis

Tag part-of-speech
Parse
Generate semantic representation
WordNet-like dictionary for nouns
Anaphora resolution

19
Example: John said that the cat is on the table.
20
Parse tree for: John said that the cat was on the
table.
21
Nouns: Hierarchical Dictionary
22
WordNet problems

Inheritance conflates functional and lexical
relations
Terrace is a plateau
Spoon is a container
Crossing Guard is a traffic cop
Bellybutton is a point
Lack of multiple inheritance between synsets
Princess is an aristocrat, but not a female
"ceramic-ware" is grouped under "utensil" and has
"earthenware", etc under it. But there are no
dishes, plates, under it because those are
categorized elsewhere under "tableware"
Lacks relations other than ISA. Thesaurus vs
dictionary.
Snowball made-of snow
Italian resident-of Italy
Cluttered with obscure words and word senses
Spoon as a type of golf club
Create our own dictionary to address these
problems
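
The requirements listed on this slide (multiple inheritance, relations other than ISA) can be sketched as a small data structure. This is only an illustration, not the WordsEye dictionary itself: the `Lexicon` class, its method names, and the "aristocrat is a person" / "female is a person" links are assumptions made for the example.

```python
from collections import defaultdict


class Lexicon:
    """Toy noun dictionary: several ISA parents per concept, plus named relations."""

    def __init__(self):
        self.parents = defaultdict(set)    # concept -> set of ISA parents
        self.relations = defaultdict(set)  # (concept, relation) -> set of targets

    def add_isa(self, concept, *parents):
        self.parents[concept].update(parents)

    def add_relation(self, concept, relation, target):
        self.relations[(concept, relation)].add(target)

    def ancestors(self, concept):
        """All concepts reachable by following ISA links upward."""
        seen, stack = set(), list(self.parents.get(concept, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents.get(node, ()))
        return seen

    def is_a(self, concept, category):
        return category == concept or category in self.ancestors(concept)


lex = Lexicon()
lex.add_isa("princess", "aristocrat", "female")   # two parents at once
lex.add_isa("aristocrat", "person")               # assumed link
lex.add_isa("female", "person")                   # assumed link
lex.add_relation("snowball", "made-of", "snow")   # non-ISA relation
lex.add_relation("Italian", "resident-of", "Italy")
```

With this layout `lex.is_a("princess", "female")` holds, while "snowball" is linked to "snow" only through `made-of` and never through ISA.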

23
Semantic Representation for John said that the
blue cat was on the table.

1. Object: mr-happy (John)
2. Object: cat-vp39798 (cat)
3. Object: table-vp6204 (table)
4. Action: say
   subject: <element 1>
   direct-object: <elements 2,3,5,6>
   tense: PAST
5. Attribute: blue
   object: <element 2>
6. Spatial-Relation: on
   figure: <element 2>
   ground: <element 3>
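
One way to picture this representation is as a flat list of numbered elements that refer to each other by number. The sketch below encodes the six elements from the slide; the `Element` class, its field names, and the `describe` helper are hypothetical and are not taken from the WordsEye code (which is in Common Lisp).

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    id: int
    kind: str    # "object", "action", "attribute" or "spatial-relation"
    name: str
    roles: dict = field(default_factory=dict)  # role name -> element id(s) or value


elements = [
    Element(1, "object", "mr-happy"),
    Element(2, "object", "cat-vp39798"),
    Element(3, "object", "table-vp6204"),
    Element(4, "action", "say",
            {"subject": 1, "direct-object": [2, 3, 5, 6], "tense": "PAST"}),
    Element(5, "attribute", "blue", {"object": 2}),
    Element(6, "spatial-relation", "on", {"figure": 2, "ground": 3}),
]
by_id = {e.id: e for e in elements}


def describe(relation):
    """Render a spatial relation by following its figure/ground references."""
    figure = by_id[relation.roles["figure"]]
    ground = by_id[relation.roles["ground"]]
    return f"{figure.name} {relation.name} {ground.name}"
```

Because elements point at each other by id, the action `say` can take the whole embedded clause (elements 2, 3, 5, 6) as its direct object without copying it.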

24
Anaphora resolution: The duck is in the sea. It
is upside down. The sea is shiny and transparent.
The ground is invisible. The apple is 3 inches
below the duck. It is in front of the duck. The
yellow illuminator is 3 feet above the apple. The
cyan illuminator is 6 inches to the left of it.
The magenta illuminator is 6 inches to the right
of it. It is partly cloudy.
25
Indexical Reference: Three dogs are on the table.
The first dog is blue. The first dog is 5 feet
tall. The second dog is red. The third dog is
purple.
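
A minimal sketch of how ordinal phrases such as "the second dog" might be mapped onto members of a previously introduced group. The slides do not describe the actual mechanism; the `ORDINALS` table, the `resolve_ordinal` function, and the instance ids are assumptions for illustration.

```python
ORDINALS = {"first": 0, "second": 1, "third": 2}


def resolve_ordinal(phrase, groups):
    """Map e.g. 'the second dog' to one member of an earlier group.

    groups: noun -> list of instance ids, in order of introduction.
    Returns None when the phrase has no known ordinal or the index is out of range.
    """
    words = phrase.lower().split()
    for i, word in enumerate(words[:-1]):
        if word in ORDINALS:
            members = groups.get(words[i + 1], [])
            index = ORDINALS[word]
            return members[index] if index < len(members) else None
    return None


# Assumed to have been built from "Three dogs are on the table."
groups = {"dog": ["dog-1", "dog-2", "dog-3"]}
```

Here `resolve_ordinal("The second dog", groups)` selects `"dog-2"`, so the attribute "red" attaches to that instance only.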
26
Interpretation

Interpret semantic representation
Object selection
Resolve semantic relations/properties based on
object types
Answer Who? What? When? Where? How?
Disambiguate/normalize relations and actions
Identify and resolve references to implicit
objects

27
Object Selection: When object is missing or
doesn't exist . . .
28
Object attribute interpretation (modify versus
selection)
29
Semantic Interpretation of "Of"
30
Implicit objects and references

Mary rode by the store. Her motorcycle was red.
Verb resolution: Identify implicit vehicle
Functional properties of objects
Reference
Motorcycle matches the vehicle
Her matches with Mary
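
The two steps on this slide (the verb introduces an implicit vehicle; a later noun phrase is matched to it by functional property and owner) can be sketched as follows. The lookup tables, class, and function names are hypothetical, and the pronoun "Her" is assumed to have been resolved to Mary already.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tables; the real lexicon and object properties are not shown in the slides.
VERB_IMPLIES = {"ride": "vehicle"}   # verb -> functional property of the implicit object
FUNCTIONAL = {"motorcycle": {"vehicle"}, "store": set()}


@dataclass
class ImplicitObject:
    required_property: str
    owner: str
    resolved_noun: Optional[str] = None


def interpret_verb(verb, subject):
    """Create a placeholder object when the verb implies one (e.g. ride -> vehicle)."""
    prop = VERB_IMPLIES.get(verb)
    return ImplicitObject(prop, subject) if prop else None


def resolve_reference(noun, possessor, pending):
    """Bind a later noun to a pending implicit object if owner and function match."""
    for obj in pending:
        matches = (
            obj.resolved_noun is None
            and obj.owner == possessor
            and obj.required_property in FUNCTIONAL.get(noun, set())
        )
        if matches:
            obj.resolved_noun = noun
            return obj
    return None


pending = [interpret_verb("ride", "Mary")]                  # "Mary rode by the store."
match = resolve_reference("motorcycle", "Mary", pending)    # "Her motorcycle was red."
```

After the second sentence the placeholder vehicle is bound to the motorcycle, so the depiction can show Mary riding a red motorcycle past the store.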

31
Implicit Reference: Mary rode by the store. Her
motorcycle was red.
32
Depiction

3D object and image database
Graphical constraints
Spatial relations
Attributes
Posing
Shape/Topology changes
Depiction process

33
3D Object Database

2,000 3D polygonal objects
Augmented with
Spatial tags (top surface, base, cup, push
handle, wall, stem, enclosure)
Skeletons
Default size, orientation
Functional properties (vehicle, weapon . . .)
Placement/attribute conventions

34
2000 3D Objects
35
10,000 images and textures
36
3D Objects and Images tagged with semantic info

Spatial tags for 3D object regions
Object type (e.g. WordNet synset)
Is-a
represents
Object size
Object orientation (front, preferred supporting
surface -- wall/top)
Compound object constituents
Other object properties (style, parts, etc.)

37
Spatial Tags
38
Spatial Tags
39
Spatial Tags
40
Spatial Tags
41
Stem in Cup: The daisy is in the test tube.
42
Enclosure and top surface: The bird is in the
bird cage. The bird cage is on the chair.
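
The two examples above suggest that the depiction of "in" and "on" depends on which spatial tags the objects carry. A rough sketch of that lookup follows; the tag assignments, function names, and returned placement labels are invented for illustration and are not the system's actual rules.

```python
# Hypothetical tag data: which spatial tags each object carries.
OBJECT_TAGS = {
    "test tube": {"cup"},
    "bird cage": {"enclosure"},
    "chair": {"top surface"},
    "daisy": {"stem"},
    "bird": set(),
}


def placement_for_in(figure, ground):
    """Pick how to depict '<figure> is in <ground>' from the tags on both objects."""
    figure_tags = OBJECT_TAGS.get(figure, set())
    ground_tags = OBJECT_TAGS.get(ground, set())
    if "cup" in ground_tags and "stem" in figure_tags:
        return "stem-in-cup"        # only the stem goes into the cup region
    if "cup" in ground_tags:
        return "inside-cup"
    if "enclosure" in ground_tags:
        return "inside-enclosure"
    return None                     # no suitable region; caller must fall back


def placement_for_on(figure, ground):
    """Use the ground object's top surface for 'on', if it has one."""
    return "on-top-surface" if "top surface" in OBJECT_TAGS.get(ground, set()) else None
```

The same preposition therefore yields different geometry: the daisy's stem sits in the test tube, while the bird is placed wholly inside the cage.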
43
Spatial Relations

Relative positions
On, under, in, below, off, onto, over, above . . .
Distance
Sub-region positioning
Left, middle, corner, right, center, top, front,
back
Orientation
facing (object, left, right, front, back, east,
west . . .)
Time-of-day relations

44
Vertical vs. horizontal "on", distances,
directions: The couch is against the wood wall.
The window is on the wall. The window is next to
the couch. The door is 2 feet to the right of the
window. The man is next to the couch. The animal
wall is to the right of the wood wall. The animal
wall is in front of the wood wall. The animal
wall is facing left. The walls are on the huge
floor. The zebra skin coffee table is two feet in
front of the couch. The lamp is on the table. The
floor is shiny.
45
Attributes

Size
height, width, depth
Aspect ratio (flat, wide, thin . . .)
Surface attributes
Texture database
Color, Texture, Opacity, reflectivity
Applied to objects or textures themselves
Brightness (for lights)

46
Attributes: The orange battleship is on the brick
cow. The battleship is 3 feet long.
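
A size attribute such as "3 feet long" has to override the object's default size. The sketch below assumes the object is scaled uniformly so its proportions are kept; the slides do not say whether WordsEye does this, and the default battleship dimensions are made up.

```python
def scale_to_length(default_dims, target_length_ft):
    """Uniformly scale (length, width, height), in feet, so length matches the target."""
    length, width, height = default_dims
    factor = target_length_ft / length
    return (length * factor, width * factor, height * factor)


# Hypothetical default size for a battleship model, in feet.
battleship = (600.0, 100.0, 150.0)
scaled = scale_to_length(battleship, 3.0)
```

Under these assumptions the battleship shrinks to about 3 x 0.5 x 0.75 feet, small enough to sit on the cow.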
47
Time of day and cloudiness
48
Time of day and lighting
49
Poses (original version only -- not yet
implemented in web version)

Represent actions
Database of 500 human poses
Grips
Usage (specialized/generic)
Standalone
Merge poses (upper/lower body, hands)
Gives wide variety by mix-n-match
Dynamic posing/IK

50
Poses
51
Poses
52
Combined poses: Mary rides the bicycle. She plays
the trumpet.
53
The Broadway Boogie Woogie vase is on the Richard
Sproat coffee table. The table is in front of the
brick wall. The van Gogh picture is on the wall.
The Matisse sofa is next to the table. Mary is
sitting on the sofa. She is playing the violin.
She is wearing a straw hat.
54
Mary pushes the lawn mower. The lawnmower is 5
feet tall. The cat is 5 feet behind Mary. The cat
is 10 feet tall.
Dynamically defined poses using Inverse
Kinematics (IK)
55
Shape Changes (not implemented in web version)

Deformations
Facial expressions
Happy, angry, sad, confused . . . mixtures
Combined with poses
Topological changes
Slicing

56
Facial Expressions
57
The rose is in the vase. The vase is on the half
dog.
58
Depiction Process

Given a semantic representation
Generate graphical constraints
Handle implicit and conflicting constraints.
Generate 3D scene from constraints
Add environment, lights, camera
Render scene

59
Example: Generate constraints for kick

Case 1: No path or recipient; direct object is
large
Pose: Actor in kick pose
Position: Actor directly behind direct object
Orientation: Actor facing direct object
Case 2: No path or recipient; direct object is
small
Pose: Actor in kick pose
Position: Direct object above foot
Case 3: Path and recipient
Pose and relations . . . (some tentative)
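
The first two cases can be written down as a small rule function. This is a sketch of the case analysis on the slide, not the system's depiction rule: the tuple format, the `is_large` flag, and the `actor.foot` naming are assumptions, and Case 3 is left unimplemented because the slide marks it as tentative.

```python
def kick_constraints(actor, direct_object, is_large, path=None, recipient=None):
    """Return a list of constraint tuples for 'actor kicks direct_object'."""
    if path is None and recipient is None:
        if is_large:
            # Case 1: actor stands behind the large object, facing it.
            return [
                ("pose", actor, "kick"),
                ("position", actor, "directly-behind", direct_object),
                ("orientation", actor, "facing", direct_object),
            ]
        # Case 2: small object is placed above the actor's foot.
        return [
            ("pose", actor, "kick"),
            ("position", direct_object, "above", f"{actor}.foot"),
        ]
    # Case 3: the slide gives no concrete rules, so none are guessed here.
    raise NotImplementedError("path/recipient case is not specified in the slides")


truck = kick_constraints("John", "pickup truck", is_large=True)
ball = kick_constraints("John", "football", is_large=False)
```

How "large" and "small" are decided (for example from the object's default size) is not stated in the slides, so the caller supplies it here.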

60
Some varieties of kick
Case 1: John kicked the pickup truck
Case 3: John kicked the ball to the cat on the
skateboard
Case 2: John kicked the football
61
Implicit Constraint. The vase is on the
nightstand. The lamp is next to the vase.
62
Figurative and Metaphorical Depiction

Textualization
Conventional Icons and emblems
Literalization
Characterization
Personification
Functionalization

63
Textualization: The cat is facing the wall.
64
Conventional Icons: The blue daisy is not in the
army boot.
65
Literalization: Life is a bowl of cherries.
66
Characterization: The policeman ran by the
parking meter.
67
Functionalization: The hippo flies over the church.
68
Future/Ongoing Work

Build/use scenario-based lexical resource
Word knowledge (dictionary)
Frame knowledge
For verbs and event nouns
Finer-grained representation of prepositions and
spatial relations
Contextual knowledge
Default verb arguments
Default constituents and spatial relations in
settings/environments
Decompose actions into poses and spatial
relations
Learn contextual knowledge from corpora
Graphics/output support
Add dynamic posing of characters to depict
actions
Handle more complex, natural text
Handle object parts
Add more 2D/3D content (including user uploadable
3D objects)
Physics, animation, sound, and speech

69
FrameNet: Digital lexical resource
http://framenet.icsi.berkeley.edu/

947 hierarchically defined frames
10,000 lexical entries (Verbs, nouns, adjectives)
Relations between frames (perspective-on,
subframe, using, . . .)
Annotated sentences for each lexical unit

70
Lexical Units in Revenge Frame
71
Frame elements for avenge.v
72
Annotations for avenge.v
73
Relations between frames
74
Frame element mappings between frames

Core vs Peripheral
Inheritance
Renaming (e.g. agent -> helper)

75
Valence patterns for verb sell (commerce_sell
frame) and two related frames

<LU-2986 "sell.v" Commerce_sell> patterns:
  (33 ((Seller Ext) (Goods Obj)))
  (11 ((Goods Ext)))
  (7 ((Seller Ext) (Goods Obj) (Buyer Dep(to))))
  (4 ((Seller Ext)))
  (2 ((Goods Ext) (Buyer Dep(to))))
<frame Commerce_buy> patterns:
  (91 ((Buyer Ext) (Goods Obj)))
  (27 ((Buyer Ext) (Goods Obj) (Seller Dep(from))))
  (11 ((Buyer Ext)))
  (2 ((Buyer Ext) (Goods Obj) (Seller Dep(at))))
  (2 ((Buyer Ext) (Seller Dep(from))))
  (2 ((Goods Obj)))
<frame Expensiveness> patterns:
  (17 ((Goods Ext) (Money Dep(NP))))
  (8 ((Goods Ext)))
  (4 ((Goods Ext) (Money Dep(between))))
  (4 ((Goods Ext) (Money Dep(from))))
  (2 ((Goods Ext) (Money Dep(under))))
  (1 ((Goods Ext) (Money Dep(just))))
  (1 ((Goods Ext) (Money Dep(NP)) (Seller Dep(from))))
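
The counts in front of each pattern can be read as annotation frequencies, which suggests a simple way to guess a frame element from a grammatical function. The sketch below uses the sell.v counts listed on this slide; the tuple encoding and both helper functions are assumptions, and nothing here claims that WordsEye uses the counts this way.

```python
# Counts copied from the sell.v (Commerce_sell) patterns on this slide.
SELL_PATTERNS = [
    (33, (("Seller", "Ext"), ("Goods", "Obj"))),
    (11, (("Goods", "Ext"),)),
    (7, (("Seller", "Ext"), ("Goods", "Obj"), ("Buyer", "Dep(to)"))),
    (4, (("Seller", "Ext"),)),
    (2, (("Goods", "Ext"), ("Buyer", "Dep(to)"))),
]


def pattern_probabilities(patterns):
    """Turn raw annotation counts into relative frequencies."""
    total = sum(count for count, _ in patterns)
    return [(count / total, pattern) for count, pattern in patterns]


def role_for(grammatical_function, patterns):
    """Most frequently annotated frame element for a grammatical function."""
    tally = {}
    for count, pattern in patterns:
        for role, function in pattern:
            if function == grammatical_function:
                tally[role] = tally.get(role, 0) + count
    return max(tally, key=tally.get) if tally else None
```

For example, the external argument (Ext) of sell.v is annotated as Seller in 44 of the 57 listed instances and as Goods in 13, so `role_for("Ext", SELL_PATTERNS)` picks Seller.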

76
Parsing and generating semantic relations using
FrameNet

NLP> (interpret-sentence "the boys on the beach said that the fish swam to island")

Parse:
(S (NP (NP (DT "the") (NN2 (NNS "boys")))
       (PREPP (PREPP (IN "on") (NP (DT "the") (NN2 (NN "beach"))))))
   (VP (VP1 (VERB (VBD "said")))
       (COMP "that")
       (S (NP (DT "the") (NN2 (NN "fish")))
          (VP (VP1 (VERB (VBD "swam")))
              (PREPP (PREPP (TO "to") (NP (NN2 (NN "island")))))))))

Word Dependency:
((<noun "boy" (Plural) ID18> (DEP <prep "on" ID19>))
 (<prep "on" ID19> (DEP <noun "beach" ID21>))
 (<verb "said" ID22>
  (SUBJECT <noun "boy" (Plural) ID18>)
  (DIRECT-OBJECT <verb "swam" ID26>))
 (<verb "swam" ID26>
  (SUBJECT <noun "fish" ID25>)
  (DEP <prep "to" ID27>))
 (<prep "to" ID27> (DEP <noun "island" ID28>)))

Frame Dependency:
((<relation CN-SPATIAL-RELATION-ON ID19>
  (FIGURE <noun "boy" (Plural) ID18>)
  (GROUND <noun "beach" ID21>))
 (<action "say.v" ID22>
  (<frame-element "Text" ID29> <action "swim.v" ID26>)
  (<frame-element "Author" ID30> <noun "boy" (Plural) ID18>))
 (<action "swim.v" ID26>
  (<frame-element "Self_mover" ID31> <noun "fish" ID25>)
  (<frame-element ("Goal") ID32> <prep "to" ID27>))
 (<prep "to" ID27> (DEP <noun "island" ID28>)))

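The step from word dependencies to frame dependencies in the output above amounts to relabelling grammatical slots with frame elements. The sketch below hard-codes a mapping that reproduces this one example; the `SLOT_TO_FE` table, the `"DEP:to"` slot label, and `to_frame` are hypothetical, and a real system would presumably derive the mapping from FrameNet valence patterns rather than a fixed table.

```python
# Hypothetical mapping from (lexical unit, grammatical slot) to frame element,
# chosen only to reproduce the example output above.
SLOT_TO_FE = {
    ("say.v", "SUBJECT"): "Author",
    ("say.v", "DIRECT-OBJECT"): "Text",
    ("swim.v", "SUBJECT"): "Self_mover",
    ("swim.v", "DEP:to"): "Goal",
}


def to_frame(lexical_unit, dependencies):
    """Relabel (slot, filler) pairs from the word dependencies as frame elements."""
    frame = {}
    for slot, filler in dependencies:
        frame_element = SLOT_TO_FE.get((lexical_unit, slot))
        if frame_element is not None:
            frame[frame_element] = filler
    return frame


say = to_frame("say.v", [("SUBJECT", 'noun "boy"'), ("DIRECT-OBJECT", 'action "swim.v"')])
swim = to_frame("swim.v", [("SUBJECT", 'noun "fish"'), ("DEP:to", 'prep "to"')])
```

As in the original output, the Goal of swim.v is filled by the preposition node "to", which in turn depends on "island".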
77
Acquiring contextual knowledge

Where does eating breakfast take place?
"Inferring the environment in a text-to-scene
conversion system." Richard Sproat, K-CAP 2001
Default locations and spatial relations (by Gino
Miceli)
Project Gutenberg corpus of online English prose
(http://www.gutenberg.org/)
Use seed-object pairs to extract other pairs with
equivalent spatial relations (e.g. cups are
(typically) on tables, while books are on desks).
Leverage verb/preposition semantics as well as
simple syntactic structure to identify spatial
templates based on verb/preposition, particle
plus intervening modifiers.
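
The seed-pair idea can be sketched as a two-step bootstrap: find the words that link a known pair in the corpus, then reuse those words as a template to harvest new pairs. This is a toy regular-expression version under strong assumptions (a definite article before both nouns, one or two linking words); the function names and the sample corpus are invented, and the actual work described here uses verb/preposition semantics and syntactic structure.

```python
import re
from collections import Counter


def learn_templates(corpus, seeds):
    """Collect the short word sequences that link each seed (figure, ground) pair."""
    templates = Counter()
    for figure, ground in seeds:
        pattern = re.compile(
            rf"\b{re.escape(figure)} (\w+(?: \w+)?) the {re.escape(ground)}\b",
            re.IGNORECASE,
        )
        for match in pattern.finditer(corpus):
            templates[match.group(1).lower()] += 1
    return templates


def apply_templates(corpus, templates):
    """Use the learned linking words to extract further (figure, ground) pairs."""
    pairs = Counter()
    for template in templates:
        pattern = re.compile(
            rf"\bthe (\w+) {re.escape(template)} the (\w+)\b", re.IGNORECASE
        )
        for match in pattern.finditer(corpus):
            pairs[(match.group(1).lower(), match.group(2).lower())] += 1
    return pairs


corpus = ("The cup was on the table. The book was on the desk. "
          "The lamp stood by the door.")
templates = learn_templates(corpus, [("cup", "table")])
pairs = apply_templates(corpus, templates)
```

From the single seed (cup, table) the template "was on" is learned, which then also yields (book, desk); the lamp sentence uses a different linking phrase and is not picked up.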

78
Pragmatic Ambiguity: The lamp is next to the vase
on the nightstand . . .
79
Syntactic Ambiguity: Prepositional phrase
attachment
John looks at the cat on the skateboard.
80
Potential Applications

Online communications: Electronic postcards,
visual chat/IM, social networks
Gaming, virtual environments
Storytelling/comic books/art
Education (ESL, reading, disabled learning,
graphic arts)
Graphics authoring/prototyping tool
Visual summarization and/or translation of text
Embedded in toys

81
Storytelling: The stagecoach is in front of the
old west hotel. Mary is next to the stagecoach.
She plays the guitar. Edward exercises in front
of the stagecoach. The large sunflower is to the
left of the stagecoach.
82
Scenes within scenes . . .
83
Greeting Cards
84
1st grade homework: The duck sat on a hen; the
hen sat on a pig...
85
Conclusion

New approach to scene generation
Low overhead (skill, training . . .)
Immediacy
Usable with minimal hardware: text or speech
input device and display screen.
Work is ongoing
Available as experimental web service

86
Related Work

Adorni, Di Manzo, Giunchiglia, 1984
Put: Clay and Wilhelms, 1996
PAR: Badler et al., 2000
CarSim: Dupuy et al., 2000
SHRDLU: Winograd, 1972

87
Bloopers: John said the cat is on the table.
88
Bloopers: Mary says the cat is blue.
89
Bloopers: John wears the axe. He plays the violin.
90
Bloopers: Happy John holds the red shark.
91
Bloopers: Jack carried the television.
92
Web Interface - Entry Page (www.wordseye.com)

Registration
Login
Learn more
Example pictures

93
Web Interface - Public Gallery
94
Web Interface - Add Comments to Picture
95
Web Interface - Link Pictures into Stories and Games
96
The tall granite mountain range is 300 feet
wide. The enormous umbrella is on the mountain
range. The gray elephant is under the
umbrella. The chicken cube is 6 feet to the right
of the gray elephant. The cube is 5 feet tall.
The cube is on the mountain range. A clown is on
the elephant. The large sewing machine is on the
cube. A die is on the clown. It is 3 feet tall.
