Title: WordsEye: From Text To Pictures
1WordsEye From Text To Pictures
The very humongous silver sphere is fifty feet
above the ground. The silver castle is in the
sphere. The castle is 80 feet wide. The ground is
black. The sky is partly cloudy.
2Why is it hard to create 3D graphics?
3The tools are complex
4Too much detail
5Involves training, artistic skill, and expense
6Pictures from Language
- No GUI bottlenecks - Just describe it!
- Low entry barrier - no special skill or training required
- Give up detailed direct manipulation for speed and economy of expression
- Language expresses constraints
- Bypass rigid, pre-defined paths of expression (dialogs, menus, etc.) as defined by GUI
- Objects vs Polygons: draw upon objects in pre-made 3D and 2D libraries
- Enable novel applications in education, gaming, online communication, . . .
- Using language is fun and stimulates imagination
- Semantics
- 3D scenes provide an intuitive representation of
meaning by making explicit the contextual
elements implicit in our mental models.
7WordsEye Initial Version (with Richard Sproat)
- Developed at AT&T Labs
- Graphics: Mirai 3D animation system on Windows NT
- Church Tagger, Collins Parser on Linux
- WordNet (http://wordnet.princeton.edu/)
- Viewpoint 3D model library
- NLP (Linux) and depiction/graphics (Windows NT) communicate via sockets
- WordsEye code in Common Lisp
- Siggraph paper (August 2001)
8New Version (with Richard Sproat)
- Rewrote software from scratch
- Linux and CMUCL
- Custom Parser/Tagger
- OpenGL for 3D preview display
- Radiance Renderer
- ImageMagick, GIMP for 2D post-effects
- Different subset of functionality
- No verbs/poses yet
- Web interface (www.wordseye.com)
- Webserver and multiple backend text-to-scene servers
- Gallery/Forum/E-Cards/PictureBooks/2D effects
9A tiny grey manatee is in the aquarium. It is
facing right. The manatee is six inches below the
top of the aquarium. The ground is tile. There is
a large brick wall behind the aquarium.
10A silver head of time is on the grassy ground.
The blossom is next to the head. the blossom is
in the ground. the green light is three feet
above the blossom. the yellow light is 3 feet
above the head. The large wasp is behind the
blossom. the wasp is facing the head.
11The humongous white shiny bear is on the American
mountain range. The mountain range is 100 feet
tall. The ground is water. The sky is partly
cloudy. The airplane is 90 feet in front of the
nose of the bear. The airplane is facing right.
12A microphone is in front of a clown. The
microphone is three feet above the ground. The
microphone is facing the clown. A brick wall is
behind the clown. The light is on the ground and
in front of the clown.
14Mary uses the crossbow. She rides the horse by
the store. The store is under the large willow.
The small allosaurus is in front of the horse.
The dinosaur faces Mary. The gigantic teacup is
in front of the store. The gigantic mushroom is
in the teacup. The castle is to the right of the
store.
15Web Interface preview mode
16Web Interface rendered (raytraced)
17WordsEye Overview
- Linguistic Analysis
- Parsing
- Create dependency-tree representation
- Anaphora resolution
- Interpretation
- Add implicit objects, relations
- Resolve semantics and references
- Depiction
- Database of 3D objects, poses, textures
- Depiction rules generate graphical constraints
- Apply constraints to create scene
18Linguistic Analysis
- Tag part-of-speech
- Parse
- Generate semantic representation
- WordNet-like dictionary for nouns
- Anaphora resolution
19Example John said that the cat is on the table.
20Parse tree for John said that the cat was on the
table.
21Nouns Hierarchical Dictionary
22WordNet problems
- Inheritance conflates functional and lexical relations
- Terrace is a plateau
- Spoon is a container
- Crossing Guard is a traffic cop
- Bellybutton is a point
- Lack of multiple inheritance between synsets
- Princess is an aristocrat, but not a female
- "ceramic-ware" is grouped under "utensil" and has "earthenware", etc. under it. But there are no dishes or plates under it, because those are categorized elsewhere under "tableware"
- Lacks relations other than ISA. Thesaurus vs dictionary.
- Snowball made-of snow
- Italian resident-of Italy
- Cluttered with obscure words and word senses
- Spoon as a type of golf club
- Create our own dictionary to address these
problems
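To make the WordNet problems above concrete, here is a minimal Python sketch of a noun dictionary that allows several IS-A parents per word and stores relations other than IS-A. The class `Lexicon`, its method names, and the sample entries are illustrative assumptions; they are not the dictionary format WordsEye uses.

```python
from collections import defaultdict


class Lexicon:
    """Toy noun dictionary: multiple IS-A parents plus other typed relations."""

    def __init__(self):
        self.parents = defaultdict(set)     # word -> set of IS-A parents
        self.relations = defaultdict(set)   # (word, relation) -> set of targets

    def add_isa(self, word, *parents):
        self.parents[word].update(parents)

    def add_relation(self, word, relation, target):
        self.relations[(word, relation)].add(target)

    def ancestors(self, word):
        """All concepts reachable by following IS-A links upward."""
        seen, stack = set(), list(self.parents.get(word, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents.get(node, ()))
        return seen

    def isa(self, word, concept):
        return concept == word or concept in self.ancestors(word)


lex = Lexicon()
lex.add_isa("princess", "aristocrat", "female")    # two parents at once
lex.add_isa("aristocrat", "person")
lex.add_isa("female", "person")
lex.add_relation("snowball", "made-of", "snow")     # non-ISA relations
lex.add_relation("Italian", "resident-of", "Italy")
```

With this structure, `lex.isa("princess", "female")` holds even though "aristocrat" itself is not a kind of "female".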
23Semantic Representation for John said that the
blue cat was on the table.
- 1. Object: mr-happy (John)
- 2. Object: cat-vp39798 (cat)
- 3. Object: table-vp6204 (table)
- 4. Action: say
- subject: <element 1>
- direct-object: <elements 2,3,5,6>
- tense: PAST
- 5. Attribute: blue
- object: <element 2>
- 6. Spatial-Relation: on
- figure: <element 2>
- ground: <element 3>
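The six elements listed above can be held in a small data structure in which elements refer to each other by index. The following Python sketch is one possible encoding; the `Element` class, the `roles` dictionary, and the `describe` helper are assumptions made for illustration, not WordsEye's internal (Common Lisp) representation.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    index: int
    kind: str     # "Object", "Action", "Attribute" or "Spatial-Relation"
    name: str
    # Role name -> referenced element index (or list of indices, or a plain value)
    roles: dict = field(default_factory=dict)


elements = [
    Element(1, "Object", "mr-happy"),
    Element(2, "Object", "cat-vp39798"),
    Element(3, "Object", "table-vp6204"),
    Element(4, "Action", "say",
            {"subject": 1, "direct-object": [2, 3, 5, 6], "tense": "PAST"}),
    Element(5, "Attribute", "blue", {"object": 2}),
    Element(6, "Spatial-Relation", "on", {"figure": 2, "ground": 3}),
]
by_index = {e.index: e for e in elements}


def describe(element):
    """Render an attribute or spatial relation using the referenced element names."""
    if element.kind == "Spatial-Relation":
        figure = by_index[element.roles["figure"]].name
        ground = by_index[element.roles["ground"]].name
        return f"{figure} {element.name} {ground}"
    if element.kind == "Attribute":
        return f"{element.name} {by_index[element.roles['object']].name}"
    return element.name
```

Keeping references as indices means the attribute and the spatial relation both point at the same cat object, so later stages can collect everything said about one object in one place.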
24Anaphora resolution The duck is in the sea. It
is upside down. The sea is shiny and transparent.
The ground is invisible. The apple is 3 inches
below the duck. It is in front of the duck. The
yellow illuminator is 3 feet above the apple. The
cyan illuminator is 6 inches to the left of it.
The magenta illuminator is 6 inches to the right
of it. It is partly cloudy.
25Indexical Reference Three dogs are on the table.
The first dog is blue. The first dog is 5 feet
tall. The second dog is red. The third dog is
purple.
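A minimal Python sketch of how such indexical references might be resolved: a counted noun phrase creates several instances, and an ordinal picks one of them by position. The tables `NUMBER_WORDS` and `ORDINALS` and both function names are hypothetical and only cover this example.

```python
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}


def instantiate(count_word, noun):
    """Create one record per counted object, e.g. ('three', 'dog') -> three dogs."""
    return [{"type": noun, "attributes": {}} for _ in range(NUMBER_WORDS[count_word])]


def resolve(ordinal, noun, instances):
    """Return the instance picked out by 'the <ordinal> <noun>'."""
    matching = [obj for obj in instances if obj["type"] == noun]
    return matching[ORDINALS[ordinal]]


dogs = instantiate("three", "dog")                        # "Three dogs are on the table."
resolve("first", "dog", dogs)["attributes"]["color"] = "blue"
resolve("first", "dog", dogs)["attributes"]["height_ft"] = 5
resolve("second", "dog", dogs)["attributes"]["color"] = "red"
resolve("third", "dog", dogs)["attributes"]["color"] = "purple"
```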
26Interpretation
- Interpret semantic representation
- Object selection
- Resolve semantic relations/properties based on object types
- Answer Who? What? When? Where? How?
- Disambiguate/normalize relations and actions
- Identify and resolve references to implicit
objects
27Object Selection When object is missing or
doesn't exist . . .
28Object attribute interpretation (modify versus
selection)
29Semantic Interpretation of "Of"
30Implicit objects & references
- Mary rode by the store. Her motorcycle was red.
- Verb resolution: Identify implicit vehicle
- Functional properties of objects
- Reference
- Motorcycle matches the vehicle
- Her matches with Mary
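The vehicle example can be sketched in Python as follows: the verb introduces a placeholder that requires a functional property, and a later noun fills the placeholder if it has that property. The tables `OBJECTS` and `VERB_REQUIRES` and both functions are hypothetical stand-ins; the actual resolution procedure is not described in these slides.

```python
# Hypothetical functional properties of a few library objects
OBJECTS = {
    "motorcycle": {"vehicle"},
    "horse": {"vehicle", "animal"},
    "store": {"building"},
    "crossbow": {"weapon"},
}
# Hypothetical: functional property a verb requires of its implicit object
VERB_REQUIRES = {"ride": "vehicle"}


def implicit_object(verb):
    """Create a placeholder for an object the verb implies but the text omits."""
    return {"required": VERB_REQUIRES[verb], "noun": None}


def resolve_reference(noun, placeholders):
    """Bind a later noun to the first open placeholder whose requirement it meets."""
    for slot in placeholders:
        if slot["noun"] is None and slot["required"] in OBJECTS.get(noun, set()):
            slot["noun"] = noun
            return slot
    return None


slots = [implicit_object("ride")]         # "Mary rode by the store."
resolve_reference("store", slots)         # a store is not a vehicle: no match
resolve_reference("motorcycle", slots)    # "Her motorcycle was red."
```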
31Implicit Reference Mary rode by the store. Her
motorcycle was red.
32Depiction
- 3D object and image database
- Graphical constraints
- Spatial relations
- Attributes
- Posing
- Shape/Topology changes
- Depiction process
333D Object Database
- 2,000 3D polygonal objects
- Augmented with
- Spatial tags (top surface, base, cup, push handle, wall, stem, enclosure)
- Skeletons
- Default size, orientation
- Functional properties (vehicle, weapon . . .)
- Placement/attribute conventions
342000 3D Objects
3510,000 images and textures
363D Objects and Images tagged with semantic info
- Spatial tags for 3D object regions
- Object type (e.g. WordNet synset)
- Is-a
- represents
- Object size
- Object orientation (front, preferred supporting surface -- wall/top)
- Compound object constituents
- Other object properties (style, parts, etc.)
37Spatial Tags
38Spatial Tags
39Spatial Tags
40Spatial Tags
41Stem in Cup The daisy is in the test tube.
42Enclosure and top surface The bird is in the
bird cage. The bird cage is on the chair.
43Spatial Relations
- Relative positions
- On, under, in, below, off, onto, over, above . . .
- Distance
- Sub-region positioning
- Left, middle, corner, right, center, top, front, back
- Orientation
- facing (object, left, right, front, back, east, west . . .)
- Time-of-day relations
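As a rough illustration of how a relative position and a distance could become a graphical constraint, the Python sketch below places one axis-aligned bounding box on or above another. The `Box` class, the y-up convention, and the centering rule are assumptions; a real system would use tagged regions such as the top surface rather than the whole bounding box.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box: minimum corner plus size, with y pointing up."""
    x: float
    y: float
    z: float
    width: float
    height: float
    depth: float

    @property
    def top(self):
        return self.y + self.height

    @property
    def center_x(self):
        return self.x + self.width / 2

    @property
    def center_z(self):
        return self.z + self.depth / 2


def place_on(figure, ground):
    """'figure is on ground': rest the figure's base on the ground's top, centered."""
    figure.y = ground.top
    figure.x = ground.center_x - figure.width / 2
    figure.z = ground.center_z - figure.depth / 2


def place_above(figure, ground, distance):
    """'figure is <distance> above ground': as 'on', then lift by the distance."""
    place_on(figure, ground)
    figure.y += distance


table = Box(0, 0, 0, width=4, height=3, depth=2)
lamp = Box(0, 0, 0, width=1, height=2, depth=1)
place_on(lamp, table)    # "The lamp is on the table."
```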
44Vertical vs Horizontal on, distances,
directions The couch is against the wood wall.
The window is on the wall. The window is next to
the couch. the door is 2 feet to the right of the
window. the man is next to the couch. The animal
wall is to the right of the wood wall. The animal
wall is in front of the wood wall. The animal
wall is facing left. The walls are on the huge
floor. The zebra skin coffee table is two feet in
front of the couch. The lamp is on the table. The
floor is shiny.
45Attributes
- Size
- height, width, depth
- Aspect ratio (flat, wide, thin . . .)
- Surface attributes
- Texture database
- Color, Texture, Opacity, reflectivity
- Applied to objects or textures themselves
- Brightness (for lights)
46Attributes The orange battleship is on the brick
cow. The battleship is 3 feet long.
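A size attribute such as "3 feet long" can be applied by scaling the object's default dimensions uniformly so that proportions are preserved. The Python sketch below assumes that "long" refers to the depth axis and uses made-up default dimensions for the battleship model; neither assumption comes from the slides.

```python
def scale_to(dimensions, axis, target):
    """Uniformly scale a dict of dimensions so that `axis` equals `target`."""
    factor = target / dimensions[axis]
    return {name: value * factor for name, value in dimensions.items()}


# Hypothetical default size of a battleship model, in feet
battleship = {"width": 100.0, "height": 150.0, "depth": 600.0}
scaled = scale_to(battleship, "depth", 3.0)    # "The battleship is 3 feet long."
```

Because one factor is applied to every axis, the scaled model keeps the aspect ratio of the original.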
47Time of day & cloudiness
48Time of day & lighting
49Poses (original version only -- not yet
implemented in web version)
- Represent actions
- Database of 500 human poses
- Grips
- Usage (specialized/generic)
- Standalone
- Merge poses (upper/lower body, hands)
- Gives wide variety by mix'n'match
- Dynamic posing/IK
50Poses
51Poses
52Combined poses Mary rides the bicycle. She plays
the trumpet.
53The Broadway Boogie Woogie vase is on the Richard
Sproat coffee table. The table is in front of the
brick wall. The van Gogh picture is on the wall.
The Matisse sofa is next to the table. Mary is
sitting on the sofa. She is playing the violin.
She is wearing a straw hat.
54Mary pushes the lawn mower. The lawnmower is 5
feet tall. The cat is 5 feet behind Mary. The cat
is 10 feet tall.
Dynamically defined poses using Inverse
Kinematics (IK)
55Shape Changes (not implemented in web version)
- Deformations
- Facial expressions
- Happy, angry, sad, confused . . . mixtures
- Combined with poses
- Topological changes
- Slicing
56Facial Expressions
57The rose is in the vase. The vase is on the half
dog.
58Depiction Process
- Given a semantic representation
- Generate graphical constraints
- Handle implicit and conflicting constraints.
- Generate 3d scene from constraints
- Add environment, lights, camera
- Render scene
59Example Generate constraints for kick
- Case 1: No path or recipient; direct object is large
- Pose: Actor in kick pose
- Position: Actor directly behind direct object
- Orientation: Actor facing direct object
- Case 2: No path or recipient; direct object is small
- Pose: Actor in kick pose
- Position: Direct object above foot
- Case 3: Path and recipient
- Pose, relations . . . (some tentative)
60Some varieties of kick
Case 1: John kicked the pickup truck
Case 2: John kicked the football
Case 3: John kicked the ball to the cat on the skateboard
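The three kick cases can be read as a small rule that turns properties of the sentence into graphical constraints. The Python sketch below is one way to write that rule down; the function name, the tuple encoding of constraints, and the 3-foot threshold separating "large" from "small" are assumptions. Case 3 returns no constraints because the slides leave them unspecified.

```python
def kick_constraints(object_size_ft, has_path=False, has_recipient=False, large_ft=3.0):
    """Return (case number, constraints) for the verb 'kick'.

    large_ft is an assumed cutoff between small and large direct objects.
    Any path or recipient is treated as Case 3 here, which is a simplification.
    """
    if has_path or has_recipient:
        # Case 3: constraints are left open in the source ("some tentative").
        return 3, []
    if object_size_ft >= large_ft:
        return 1, [
            ("pose", "actor", "kick"),
            ("position", "actor", "directly-behind", "direct-object"),
            ("orientation", "actor", "facing", "direct-object"),
        ]
    return 2, [
        ("pose", "actor", "kick"),
        ("position", "direct-object", "above", "actor-foot"),
    ]


truck_case, truck_constraints = kick_constraints(15.0)      # pickup truck
ball_case, ball_constraints = kick_constraints(1.0)         # football
pass_case, _ = kick_constraints(1.0, has_path=True, has_recipient=True)
```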
61Implicit Constraint. The vase is on the
nightstand. The lamp is next to the vase.
62Figurative & Metaphorical Depiction
- Textualization
- Conventional Icons and emblems
- Literalization
- Characterization
- Personification
- Functionalization
63Textualization The cat is facing the wall.
64Conventional Icons The blue daisy is not in the
army boot.
65Literalization Life is a bowl of cherries.
66Characterization The policeman ran by the
parking meter
67Functionalization The hippo flies over the church
68Future/Ongoing Work
- Build/use scenario-based lexical resource
- Word knowledge (dictionary)
- Frame knowledge
- For verbs and event nouns
- Finer-grained representation of prepositions and spatial relations
- Contextual knowledge
- Default verb arguments
- Default constituents and spatial relations in settings/environments
- Decompose actions into poses and spatial relations
- Learn contextual knowledge from corpora
- Graphics/output support
- Add dynamic posing of characters to depict actions
- Handle more complex, natural text
- Handle object parts
- Add more 2D/3D content (including user-uploadable 3D objects)
- Physics, animation, sound, and speech
69FrameNet: Digital lexical resource
http://framenet.icsi.berkeley.edu/
- 947 hierarchically defined frames
- 10,000 lexical entries (verbs, nouns, adjectives)
- Relations between frames (perspective-on, subframe, using, . . .)
- Annotated sentences for each lexical unit
70Lexical Units in Revenge Frame
71Frame elements for avenge.v
Frame Element Core Type
Degree Core
Depictive Peripheral
Injured_party Extra_thematic
Injury Core
Instrument Core
Manner Peripheral
Offender Peripheral
Place Core
Punishment Peripheral
Purpose Core
Result Extra_thematic
Time Peripheral
72Annotations for avenge.v
73Relations between frames
74Frame element mappings between frames
- Core vs Peripheral
- Inheritance
- Renaming (e.g. agent -> helper)
75Valence patterns for verb sell (commerce_sell
frame) and two related frames
- <LU-2986 "sell.v" Commerce_sell> patterns:
  (33 ((Seller Ext) (Goods Obj)))
  (11 ((Goods Ext)))
  (7 ((Seller Ext) (Goods Obj) (Buyer Dep(to))))
  (4 ((Seller Ext)))
  (2 ((Goods Ext) (Buyer Dep(to))))
- <frame Commerce_buy> patterns:
  (91 ((Buyer Ext) (Goods Obj)))
  (27 ((Buyer Ext) (Goods Obj) (Seller Dep(from))))
  (11 ((Buyer Ext)))
  (2 ((Buyer Ext) (Goods Obj) (Seller Dep(at))))
  (2 ((Buyer Ext) (Seller Dep(from))))
  (2 ((Goods Obj)))
- <frame Expensiveness> patterns:
  (17 ((Goods Ext) (Money Dep(NP))))
  (8 ((Goods Ext)))
  (4 ((Goods Ext) (Money Dep(between))))
  (4 ((Goods Ext) (Money Dep(from))))
  (2 ((Goods Ext) (Money Dep(under))))
  (1 ((Goods Ext) (Money Dep(just))))
  (1 ((Goods Ext) (Money Dep(NP)) (Seller Dep(from))))
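One plausible use of these counts is to estimate how each frame element is most often realized grammatically. The Python sketch below does this for the sell.v patterns, with the counts copied from the slide; the tuple encoding and both function names are assumptions, not part of FrameNet or WordsEye.

```python
# (count, pattern) pairs for sell.v in Commerce_sell, copied from the slide
SELL_PATTERNS = [
    (33, (("Seller", "Ext"), ("Goods", "Obj"))),
    (11, (("Goods", "Ext"),)),
    (7, (("Seller", "Ext"), ("Goods", "Obj"), ("Buyer", "Dep(to)"))),
    (4, (("Seller", "Ext"),)),
    (2, (("Goods", "Ext"), ("Buyer", "Dep(to)"))),
]


def pattern_probabilities(patterns):
    """Relative frequency of each valence pattern."""
    total = sum(count for count, _ in patterns)
    return [(count / total, pattern) for count, pattern in patterns]


def default_realization(patterns, frame_element):
    """Most frequent grammatical function for a frame element, weighted by count."""
    totals = {}
    for count, pattern in patterns:
        for element, function in pattern:
            if element == frame_element:
                totals[function] = totals.get(function, 0) + count
    return max(totals, key=totals.get) if totals else None


probabilities = pattern_probabilities(SELL_PATTERNS)
goods_default = default_realization(SELL_PATTERNS, "Goods")
```

For sell.v, Goods appears as the object in 40 annotated sentences and as the external argument in 13, so the object reading is the default.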
76Parsing and generating semantic relations using
FrameNet
- NLP> (interpret-sentence "the boys on the beach said that the fish swam to island")
Parse:
(S (NP (NP (DT "the") (NN2 (NNS "boys")))
       (PREPP (PREPP (IN "on") (NP (DT "the") (NN2 (NN "beach"))))))
   (VP (VP1 (VERB (VBD "said")))
       (COMP "that")
       (S (NP (DT "the") (NN2 (NN "fish")))
          (VP (VP1 (VERB (VBD "swam")))
              (PREPP (PREPP (TO "to") (NP (NN2 (NN "island")))))))))
Word Dependency:
((<noun "boy" (Plural) ID18> (DEP <prep "on" ID19>))
 (<prep "on" ID19> (DEP <noun "beach" ID21>))
 (<verb "said" ID22> (SUBJECT <noun "boy" (Plural) ID18>)
                     (DIRECT-OBJECT <verb "swam" ID26>))
 (<verb "swam" ID26> (SUBJECT <noun "fish" ID25>) (DEP <prep "to" ID27>))
 (<prep "to" ID27> (DEP <noun "island" ID28>)))
Frame Dependency:
((<relation CN-SPATIAL-RELATION-ON ID19> (FIGURE <noun "boy" (Plural) ID18>)
                                         (GROUND <noun "beach" ID21>))
 (<action "say.v" ID22> (<frame-element "Text" ID29> <action "swim.v" ID26>)
                        (<frame-element "Author" ID30> <noun "boy" (Plural) ID18>))
 (<action "swim.v" ID26> (<frame-element "Self_mover" ID31> <noun "fish" ID25>)
                         (<frame-element ("Goal") ID32> <prep "to" ID27>))
 (<prep "to" ID27> (DEP <noun "island" ID28>)))
77Acquiring contextual knowledge
- Where does eating breakfast take place?
- Inferring the environment in a text-to-scene conversion system. K-CAP 2001, Richard Sproat
- Default locations and spatial relations (by Gino Miceli)
- Project Gutenberg corpus of online English prose (http://www.gutenberg.org/)
- Use seed-object pairs to extract other pairs with equivalent spatial relations (e.g. cups are (typically) on tables, while books are on desks).
- Leverage verb/preposition semantics as well as simple syntactic structure to identify spatial templates based on verb/preposition, particle plus intervening modifiers.
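The seed-pair idea can be illustrated with a very small bootstrapping loop in Python: find the words that connect a known pair, treat them as a template, and apply the template to harvest new pairs. This is a simplified sketch using plain regular expressions on a four-sentence toy corpus; the function names are hypothetical, and the approach described above also uses verb/preposition semantics and syntactic structure, which this sketch omits.

```python
import re


def learn_templates(sentences, seed):
    """Collect the word sequences that connect the seed figure to the seed ground."""
    figure, ground = seed
    pattern = re.compile(rf"\b{re.escape(figure)}s? (.+?) {re.escape(ground)}s?\b")
    templates = set()
    for sentence in sentences:
        match = pattern.search(sentence.lower())
        if match:
            templates.add(match.group(1))
    return templates


def apply_templates(sentences, templates):
    """Find other (figure, ground) pairs connected by a learned template."""
    pairs = set()
    for template in templates:
        pattern = re.compile(rf"\b(\w+) {re.escape(template)} (\w+)\b")
        for sentence in sentences:
            pairs.update(pattern.findall(sentence.lower()))
    return pairs


sentences = [
    "The cup was on the table.",
    "The book was on the desk.",
    "The plate was on the table.",
    "He walked to the desk.",
]
templates = learn_templates(sentences, ("cup", "table"))
pairs = apply_templates(sentences, templates)
```

Starting from the single seed pair (cup, table), the template "was on the" also yields (book, desk) and (plate, table), while the unrelated fourth sentence is ignored.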
78Pragmatic Ambiguity The lamp is next to the vase
on the nightstand . . .
79Syntactic Ambiguity Prepositional phrase
attachment
John looks at the cat on the skateboard.
80Potential Applications
- Online communications: Electronic postcards, visual chat/IM, social networks
- Gaming, virtual environments
- Storytelling/comic books/art
- Education (ESL, reading, disabled learning, graphics arts)
- Graphics authoring/prototyping tool
- Visual summarization and/or translation of text
- Embedded in toys
81Storytelling The stagecoach is in front of the
old west hotel. Mary is next to the stagecoach.
She plays the guitar. Edward exercises in front
of the stagecoach. The large sunflower is to the
left of the stagecoach.
82Scenes within scenes . . .
83Greeting Cards
841st grade homework The duck sat on a hen the
hen sat on a pig...
85Conclusion
- New approach to scene generation
- Low overhead (skill, training . . .)
- Immediacy
- Usable with minimal hardware: text or speech input device and display screen.
- Work is ongoing
- Available as experimental web service
86Related Work
- Adorni, Di Manzo, Giunchiglia, 1984
- Put: Clay and Wilhelms, 1996
- PAR: Badler et al., 2000
- CarSim: Dupuy et al., 2000
- SHRDLU: Winograd, 1972
87Bloopers John said the cat is on the table
88Bloopers Mary says the cat is blue.
89Bloopers John wears the axe. He plays the violin.
90Bloopers Happy John holds the red shark
91Bloopers Jack carried the television
92Web Interface - Entry Page (www.wordseye.com)
- Registration
- Login
- Learn more
- Example pictures
93Web Interface - Public Gallery
94Web Interface - Add Comments to Picture
95Web Interface - Link Pictures into Stories & Games
96The tall granite mountain range is 300 feet
wide. The enormous umbrella is on the mountain
range. The gray elephant is under the
umbrella. The chicken cube is 6 feet to the right
of the gray elephant. The cube is 5 feet tall.
The cube is on the mountain range. A clown is on
the elephant. The large sewing machine is on the
cube. A die is on the clown. It is 3 feet tall.