Title: A Probabilistic Approach to Semantic Representation
A Probabilistic Approach to Semantic Representation
- Tom Griffiths
- Mark Steyvers
- Josh Tenenbaum
- How do we store the meanings of words?
  - A question of representation
  - Requires efficient abstraction
- Why do we store this information?
  - The function of semantic memory
  - Predictive structure
Latent Semantic Analysis (Landauer & Dumais, 1997)
- Word-document co-occurrence matrix
- Words as points in a high-dimensional space
- Dimensionality reduction via the SVD: X ≈ U D V^T
Mechanistic Claim
- Some component of word meaning can be extracted from co-occurrence statistics
- But:
  - Why should this be true?
  - Is the SVD the best way to treat these data?
  - What assumptions are we making about meaning?
Mechanism and Function
- Mechanism: some component of word meaning can be extracted from co-occurrence statistics
- Function: semantic memory is structured to aid retrieval via context-specific prediction
Functional Claim
- Semantic memory is structured to aid retrieval via context-specific prediction
- Motivates sensitivity to co-occurrence statistics
- Identifies how co-occurrence data should be used
- Allows the role of meaning to be specified exactly, and finds a meaningful decomposition of language
A Probabilistic Approach
- The function of semantic memory
- The psychological problem of meaning
  - One approach to meaning
- Solving the statistical problem of meaning
  - Maximum likelihood estimation
  - Bayesian statistics
- Comparisons with Latent Semantic Analysis
  - Quantitative
  - Qualitative
The Function of Semantic Memory
- To predict what concepts are likely to be needed in a context, and thereby ease their retrieval
- Similar to rational accounts of categorization and memory (Anderson, 1990)
- The same principle appears in semantic networks (Collins & Quillian, 1969; Collins & Loftus, 1975)
The Psychological Problem of Meaning
- Simply memorizing the whole word-document co-occurrence matrix does not help
- Generalization requires abstraction, and this abstraction identifies the nature of meaning
- Specifying a generative model for documents allows inference and generalization
One Approach to Meaning
- Each document is a mixture of topics
- Each word is chosen from a single topic
  - w drawn from P(w | z), with parameters φ
  - z drawn from P(z), with parameters θ
One Approach to Meaning

word          P(w | z = 1) = φ(1)   P(w | z = 2) = φ(2)
HEART         0.2                   0.0
LOVE          0.2                   0.0
SOUL          0.2                   0.0
TEARS         0.2                   0.0
JOY           0.2                   0.0
SCIENTIFIC    0.0                   0.2
KNOWLEDGE     0.0                   0.2
WORK          0.0                   0.2
RESEARCH      0.0                   0.2
MATHEMATICS   0.0                   0.2
One Approach to Meaning
Choose mixture weights θ = (P(z = 1), P(z = 2)) for each document, then generate a bag of words:

(0, 1):        MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK
(0.25, 0.75):  SCIENTIFIC KNOWLEDGE MATHEMATICS SCIENTIFIC HEART LOVE TEARS KNOWLEDGE HEART
(0.5, 0.5):    MATHEMATICS HEART RESEARCH LOVE MATHEMATICS WORK TEARS SOUL KNOWLEDGE HEART
(0.75, 0.25):  WORK JOY SOUL TEARS MATHEMATICS TEARS LOVE LOVE LOVE SOUL
(1, 0):        TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY
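To make this generative process concrete, here is a minimal sketch in Python using the two toy topics above; the function name, random seed, and document length of 10 are illustrative choices, not from the talk.

```python
# Minimal sketch of the generative process: choose mixture weights theta
# per document, then for each word draw a topic z ~ theta and a word w ~ phi[z].
import numpy as np

rng = np.random.default_rng(0)

vocab = ["HEART", "LOVE", "SOUL", "TEARS", "JOY",
         "SCIENTIFIC", "KNOWLEDGE", "WORK", "RESEARCH", "MATHEMATICS"]

# phi[j] = P(w | z = j+1): rows are topics, columns are words
phi = np.array([
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],  # topic 1
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.2],  # topic 2
])

def generate_document(theta, n_words=10):
    """Generate a bag of words: draw a topic for each word, then the word."""
    z = rng.choice(len(phi), size=n_words, p=theta)
    return [vocab[rng.choice(len(vocab), p=phi[j])] for j in z]

for theta in [(0, 1), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1, 0)]:
    print(theta, " ".join(generate_document(np.array(theta))))
```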
One Approach to Meaning
[Graphical model: θ → z → w]
- A generative model for co-occurrence data
- Introduced by Blei, Ng, and Jordan (2002)
- Clarifies pLSI (Hofmann, 1999)
Matrix Interpretation
C ≈ Φ Θ
- C: normalized word-document co-occurrence matrix (words × documents)
- Φ: mixture components (words × topics)
- Θ: mixture weights (topics × documents)
- A form of non-negative matrix factorization
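A few lines of numpy can verify this factorization for the toy model: multiplying the non-negative factors Φ and Θ yields a word-document matrix C whose columns are distributions over words. The specific matrices are just the toy values from the earlier slides.

```python
# Sketch of the matrix interpretation: C = Phi @ Theta, all entries non-negative.
import numpy as np

# Phi: words x topics, columns are P(w | z)
Phi = np.array([
    [0.2, 0.0], [0.2, 0.0], [0.2, 0.0], [0.2, 0.0], [0.2, 0.0],
    [0.0, 0.2], [0.0, 0.2], [0.0, 0.2], [0.0, 0.2], [0.0, 0.2],
])

# Theta: topics x documents, columns are the mixture weights P(z | d)
Theta = np.array([
    [0.0, 0.25, 0.5, 0.75, 1.0],
    [1.0, 0.75, 0.5, 0.25, 0.0],
])

C = Phi @ Theta          # words x documents; column d gives P(w | d)
print(C.sum(axis=0))     # each column sums to 1: a distribution over words
```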
Matrix Interpretation
- Topic model: C ≈ Φ Θ, with Φ (words × topics) and Θ (topics × documents)
- LSA: C ≈ U D V^T, with U (words × vectors) and V^T (vectors × documents)
The Function of Semantic Memory
- Prediction of needed concepts aids retrieval
- Generalization is aided by a generative model
- One generative model: mixtures of topics
- Gives a non-negative, non-orthogonal factorization of the word-document co-occurrence matrix
A Probabilistic Approach
- The function of semantic memory
- The psychological problem of meaning
  - One approach to meaning
- Solving the statistical problem of meaning
  - Maximum likelihood estimation
  - Bayesian statistics
- Comparisons with Latent Semantic Analysis
  - Quantitative
  - Qualitative
The Statistical Problem of Meaning
- Generating data from parameters is easy
- Learning parameters from data is hard
- Two approaches to this problem:
  - Maximum likelihood estimation
  - Bayesian statistics
Inverting the Generative Model
- Maximum likelihood estimation: WT + DT parameters
- Variational EM (Blei, Ng, & Jordan, 2002): WT + T parameters
- Bayesian inference: 0 parameters (φ and θ integrated out)
Bayesian Inference

P(z | w) = P(w | z) P(z) / Σ_z' P(w | z') P(z')

- The sum in the denominator is over T^n terms (T topics, n words)
- The full posterior is only tractable up to a constant
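To see why that denominator is intractable, here is a brute-force sketch that enumerates all T^n topic assignments for a tiny document; the toy φ, the uniform θ, and the four-word document are illustrative assumptions.

```python
# Why exact inference blows up: the normalizing constant sums over all T**n
# topic assignments z for an n-word document. Feasible only for tiny n.
import itertools
import numpy as np

phi = np.array([[0.2]*5 + [0.0]*5,      # P(w | z = 1)
                [0.0]*5 + [0.2]*5])     # P(w | z = 2)
theta = np.array([0.5, 0.5])            # P(z), uniform here
doc = [0, 1, 7, 9]                      # word indices of a 4-word document

T, n = len(theta), len(doc)
posterior = {}
for z in itertools.product(range(T), repeat=n):   # all T**n assignments
    p = np.prod([theta[zj] * phi[zj, w] for zj, w in zip(z, doc)])
    posterior[z] = p
Z = sum(posterior.values())                       # the intractable denominator
best = max(posterior, key=posterior.get)
print(f"{T**n} terms; P(z|w) of best assignment = {posterior[best]/Z:.3f}")
```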
Markov Chain Monte Carlo
- Sample from a Markov chain that converges to the target distribution
- Allows sampling from an unnormalized posterior distribution
- Can compute approximate statistics from intractable distributions
(MacKay, 2002)
Gibbs Sampling
- For variables x_1, x_2, …, x_n
- Draw x_i^(t) from P(x_i | x_-i)
- x_-i = (x_1^(t), x_2^(t), …, x_{i-1}^(t), x_{i+1}^(t-1), …, x_n^(t-1))
Gibbs Sampling
[Figure: illustration of Gibbs sampling (MacKay, 2002)]
Gibbs Sampling
- Need the full conditional distributions for the variables
- Since we sample only z, we need P(z_i = j | z_-i, w):

P(z_i = j | z_-i, w) ∝ (n^(w_i)_{-i,j} + β) / (n^(·)_{-i,j} + Wβ) × (n^(d_i)_{-i,j} + α) / (n^(d_i)_{-i,·} + Tα)

where n^(w)_{-i,j} is the number of times word w is assigned to topic j, and n^(d)_{-i,j} is the number of times topic j is used in document d, in both cases excluding the current assignment z_i
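A compact sketch of the collapsed Gibbs sampler this conditional defines; the hyperparameter values, iteration count, and function signature are illustrative choices, not from the talk. Note that the document-side denominator (n^(d)_{-i,·} + Tα) is the same for every j, so it can be dropped from the unnormalized conditional.

```python
# Collapsed Gibbs sampling for the topic model, using the conditional above.
# docs: list of documents, each a list of word indices in 0..W-1.
import numpy as np

def gibbs(docs, W, T=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    nwt = np.zeros((W, T))             # n^(w)_j: word w assigned to topic j
    ndt = np.zeros((len(docs), T))     # n^(d)_j: topic j used in document d
    nt = np.zeros(T)                   # total assignments per topic
    z = [[int(rng.integers(T)) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):     # tally the random initialization
        for i, w in enumerate(doc):
            j = z[d][i]
            nwt[w, j] += 1; ndt[d, j] += 1; nt[j] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                j = z[d][i]            # remove the current assignment
                nwt[w, j] -= 1; ndt[d, j] -= 1; nt[j] -= 1
                # P(z_i = j | z_-i, w) up to a constant; the document
                # denominator is constant in j and drops out of normalization
                p = (nwt[w] + beta) / (nt + W * beta) * (ndt[d] + alpha)
                j = int(rng.choice(T, p=p / p.sum()))
                z[d][i] = j            # record the new assignment
                nwt[w, j] += 1; ndt[d, j] += 1; nt[j] += 1
    return nwt, ndt   # normalize (+beta, +alpha) for estimates of phi, theta

# Two tiny "documents" over a 4-word vocabulary:
nwt, ndt = gibbs([[0, 1, 0, 1], [2, 3, 2, 3]], W=4)
```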
Gibbs Sampling
[Figure: topic assignments for sample documents over iterations 1, 2, …, 1000 of the sampler]
A Visual Example: Bars
- Sample each pixel from a mixture of topics
- pixel = word, image = document
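Here is a sketch of how the bars data might be generated under the stated correspondence (pixel = word, image = document); the 5×5 image size, the Dirichlet mixture weights, and 50 pixels per image are assumptions for illustration.

```python
# Bars example: each "topic" is a uniform distribution over the pixels of one
# horizontal or vertical bar; each image mixes bars; each pixel comes from one bar.
import numpy as np

rng = np.random.default_rng(0)
SIZE = 5   # 5x5 images (an assumed size)

bars = []
for k in range(SIZE):
    h = np.zeros((SIZE, SIZE)); h[k, :] = 1.0   # horizontal bar k
    v = np.zeros((SIZE, SIZE)); v[:, k] = 1.0   # vertical bar k
    bars += [h.ravel() / h.sum(), v.ravel() / v.sum()]
phi = np.array(bars)    # topics x pixels, analogous to topics x words

def generate_image(n_pixels=50):
    """image = document: mix bar topics, then sample each pixel from one bar."""
    theta = rng.dirichlet(np.ones(len(phi)))    # mixture weights for the image
    counts = np.zeros(SIZE * SIZE)
    for _ in range(n_pixels):
        topic = rng.choice(len(phi), p=theta)   # pixel's topic ~ theta
        counts[rng.choice(SIZE * SIZE, p=phi[topic])] += 1
    return counts.reshape(SIZE, SIZE)

images = [generate_image() for _ in range(1000)]   # cf. "From 1000 Images"
```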
A Visual Example: Bars
[Figure: sample images generated from the bars model]
From 1000 Images
[Figure: the bar topics recovered from 1000 sample images]
Interpretable Decomposition
- The SVD gives a basis for the data, but not an interpretable one
- The true basis is not orthogonal, so rotation does no good
Application to Corpus Data
- TASA corpus: text from first grade through college
- Vocabulary of 26,414 words
- Set of 36,999 documents
- Approximately 6 million words in the corpus
A Selection of Topics
- THIRD FIRST SECOND THREE FOURTH FOUR GRADE TWO FIFTH SEVENTH SIXTH EIGHTH HALF SEVEN SIX SINGLE NINTH END TENTH ANOTHER
- BRAIN NERVE SENSE SENSES ARE NERVOUS NERVES BODY SMELL TASTE TOUCH MESSAGES IMPULSES CORD ORGANS SPINAL FIBERS SENSORY PAIN IS
- CURRENT ELECTRICITY ELECTRIC CIRCUIT IS ELECTRICAL VOLTAGE FLOW BATTERY WIRE WIRES SWITCH CONNECTED ELECTRONS RESISTANCE POWER CONDUCTORS CIRCUITS TUBE NEGATIVE
- NATURE WORLD HUMAN PHILOSOPHY MORAL KNOWLEDGE THOUGHT REASON SENSE OUR TRUTH NATURAL EXISTENCE BEING LIFE MIND ARISTOTLE BELIEVED EXPERIENCE REALITY
- ART PAINT ARTIST PAINTING PAINTED ARTISTS MUSEUM WORK PAINTINGS STYLE PICTURES WORKS OWN SCULPTURE PAINTER ARTS BEAUTIFUL DESIGNS PORTRAIT PAINTERS
- STUDENTS TEACHER STUDENT TEACHERS TEACHING CLASS CLASSROOM SCHOOL LEARNING PUPILS CONTENT INSTRUCTION TAUGHT GROUP GRADE SHOULD GRADES CLASSES PUPIL GIVEN
- SPACE EARTH MOON PLANET ROCKET MARS ORBIT ASTRONAUTS FIRST SPACECRAFT JUPITER SATELLITE SATELLITES ATMOSPHERE SPACESHIP SURFACE SCIENTISTS ASTRONAUT SATURN MILES
- THEORY SCIENTISTS EXPERIMENT OBSERVATIONS SCIENTIFIC EXPERIMENTS HYPOTHESIS EXPLAIN SCIENTIST OBSERVED EXPLANATION BASED OBSERVATION IDEA EVIDENCE THEORIES BELIEVED DISCOVERED OBSERVE FACTS
A Selection of Topics
- JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE
- SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
- BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
- FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
- STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
- MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
- DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN
- WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER
A Probabilistic Approach
- The function of semantic memory
- The psychological problem of meaning
  - One approach to meaning
- Solving the statistical problem of meaning
  - Maximum likelihood estimation
  - Bayesian statistics
- Comparisons with Latent Semantic Analysis
  - Quantitative
  - Qualitative
Probabilistic Queries
- Probabilities such as P(w_2 | w_1) can be computed in different ways:
  - Fixed topic assumption
  - Multiple samples
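One way these two ideas could look in code: fix a single topic for the observed word (the fixed topic assumption), infer P(z | w_1), and average the resulting P(w_2 | w_1) over multiple samples of φ. The function name and the uniform topic prior are illustrative assumptions; this is a sketch, not the talk's exact computation.

```python
# P(w2 | w1) under the fixed topic assumption, averaged over samples.
# phi_samples: list of (topics x words) arrays, e.g. one per Gibbs sample.
import numpy as np

def p_w2_given_w1(w1, w2, phi_samples, p_topic=None):
    probs = []
    for phi in phi_samples:
        T = phi.shape[0]
        pz = np.full(T, 1.0 / T) if p_topic is None else p_topic
        post = phi[:, w1] * pz                   # P(z | w1) ∝ P(w1 | z) P(z)
        post /= post.sum()
        probs.append(float(phi[:, w2] @ post))   # Σ_z P(w2 | z) P(z | w1)
    return float(np.mean(probs))                 # average over samples
```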
Quantitative Comparisons
- Two types of task:
  - General semantic tasks (dictionary, thesaurus)
  - Prediction of memory data
- All tests use LSA with 400 vectors, and the probabilistic model with 100 samples, each using 500 topics
Fill in the Blank
- 12,856 sentences extracted from WordNet
- Overall performance:
  - LSA gives a median rank of 3393
  - The probabilistic model gives a median rank of 3344
- Examples:
  - his cold deprived him of his sense of ___
  - silence broken by dogs barking ___
  - a ___ hybrid accent
Fill in the Blank
[Figure: fill-in-the-blank results by model]
Synonyms
- 280 sets of five synonyms from WordNet, ordered by number of senses
- Two tasks:
  - Predict the first synonym
  - Predict the last synonym
- Increasing number of synonyms
Examples (number of senses in parentheses):
- BREAK (78), EXPOSE (9), DISCOVER (8), DECLARE (7), REVEAL (3)
- CUT (72), REDUCE (19), CONTRACT (12), SHORTEN (5), ABRIDGE (1)
- RUN (53), GO (34), WORK (25), FUNCTION (9), OPERATE (7)
First Synonym
[Figure: results]
Last Synonym
[Figure: results]
Synonyms and Word Frequency
[Figures: synonym performance as a function of word frequency, for the probabilistic model and for LSA]
Word Frequency and Filling Blanks
[Figure: fill-in-the-blank performance as a function of word frequency, for LSA and the probabilistic model]
Performance on Semantic Tasks
- Performance is comparable, and neither model is great
- The difference in the effects of word frequency is due to the treatment of the co-occurrence data
- The probabilistic approach is useful in addressing psychological data, where frequency is important
Intrusions in Free Recall
Study list: CHAIR FOOD DESK TOP LEG EAT CLOTH DISH WOOD DINNER MARBLE TENNIS
- Intrusion rates from Deese (1959)
- Used average word vectors in LSA, and P(word | list) in the probabilistic model
- This favors LSA, since the probabilistic combination can be multimodal
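Here is a sketch of how P(word | list) might be computed under a single shared topic for the study list: P(z | list) ∝ P(z) Π_i P(w_i | z), then P(w | list) = Σ_z P(w | z) P(z | list). This is one plausible reading of the slide, not necessarily the exact computation used in the talk; the function name and uniform prior are assumptions.

```python
# Score an unstudied word against a study list under a single-topic assumption.
import numpy as np

def p_word_given_list(w, word_list, phi, p_topic=None):
    """phi: (topics x words) array; word_list: list of word indices."""
    T = phi.shape[0]
    pz = np.full(T, 1.0 / T) if p_topic is None else p_topic
    # log P(z | list) up to a constant, summing log-likelihoods over the list
    log_post = np.log(pz) + np.log(phi[:, word_list] + 1e-12).sum(axis=1)
    post = np.exp(log_post - log_post.max())   # stabilize before normalizing
    post /= post.sum()
    return float(phi[:, w] @ post)   # a mixture over topics: can be multimodal
```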
Intrusions in Free Recall
[Figures: observed intrusion rates compared against word frequency and against the models]
Word Frequency is Not Enough
- An explanation needs to address two questions:
  - Why do these words intrude?
  - Why do other words not intrude?
- Median word-frequency rank: 1698.5
- Median rank in the model: 21
Word Association
- Word association norms from Nelson et al. (1998)
Cue: PLANETS (associates 1-8, in order)
- people: EARTH, STARS, SPACE, SUN, MARS, UNIVERSE, SATURN, GALAXY
- model: STARS, STAR, SUN, EARTH, SPACE, SKY, PLANET, UNIVERSE
Word Association
[Figure: word association results]
Performance on Memory Tasks
- The probabilistic model outperforms LSA on simple memory tasks, and both are far better at predicting memory data
- The improvement is due to the role of word frequency
- Not a complete account, but it can form a part of more complex memory models
Qualitative Comparisons
- Naturally deals with complications for LSA:
  - Polysemy
  - Asymmetry
- Respects the natural statistics of language
- Easily extends to other models of meaning
Beyond the Bag of Words
[Graphical model: θ → z_1, z_2, z_3 → w_1, w_2, w_3, one topic per word]
Beyond the Bag of Words
[Graphical model: the topic model combined with a chain of classes s_1 → s_2 → s_3 over the word sequence, yielding the syntactic categories shown below]
Semantic categories
- PLANTS PLANT LEAVES SEEDS SOIL ROOTS FLOWERS WATER FOOD GREEN SEED STEMS FLOWER STEM LEAF ANIMALS ROOT POLLEN GROWING GROW
- GOLD IRON SILVER COPPER METAL METALS STEEL CLAY LEAD ADAM ORE ALUMINUM MINERAL MINE STONE MINERALS POT MINING MINERS TIN
- BEHAVIOR SELF INDIVIDUAL PERSONALITY RESPONSE SOCIAL EMOTIONAL LEARNING FEELINGS PSYCHOLOGISTS INDIVIDUALS PSYCHOLOGICAL EXPERIENCES ENVIRONMENT HUMAN RESPONSES BEHAVIORS ATTITUDES PSYCHOLOGY PERSON
- CELLS CELL ORGANISMS ALGAE BACTERIA MICROSCOPE MEMBRANE ORGANISM FOOD LIVING FUNGI MOLD MATERIALS NUCLEUS CELLED STRUCTURES MATERIAL STRUCTURE GREEN MOLDS
- DOCTOR PATIENT HEALTH HOSPITAL MEDICAL CARE PATIENTS NURSE DOCTORS MEDICINE NURSING TREATMENT NURSES PHYSICIAN HOSPITALS DR SICK ASSISTANT EMERGENCY PRACTICE
- BOOK BOOKS READING INFORMATION LIBRARY REPORT PAGE TITLE SUBJECT PAGES GUIDE WORDS MATERIAL ARTICLE ARTICLES WORD FACTS AUTHOR REFERENCE NOTE
- MAP NORTH EARTH SOUTH POLE MAPS EQUATOR WEST LINES EAST AUSTRALIA GLOBE POLES HEMISPHERE LATITUDE PLACES LAND WORLD COMPASS CONTINENTS
- FOOD FOODS BODY NUTRIENTS DIET FAT SUGAR ENERGY MILK EATING FRUITS VEGETABLES WEIGHT FATS NEEDS CARBOHYDRATES VITAMINS CALORIES PROTEIN MINERALS
Syntactic categories
- BE MAKE GET HAVE GO TAKE DO FIND USE SEE HELP KEEP GIVE LOOK COME WORK MOVE LIVE EAT BECOME
- ONE SOME MANY TWO EACH ALL MOST ANY THREE THIS EVERY SEVERAL FOUR FIVE BOTH TEN SIX MUCH TWENTY EIGHT
- HE YOU THEY I SHE WE IT PEOPLE EVERYONE OTHERS SCIENTISTS SOMEONE WHO NOBODY ONE SOMETHING ANYONE EVERYBODY SOME THEN
- MORE SUCH LESS MUCH KNOWN JUST BETTER RATHER GREATER HIGHER LARGER LONGER FASTER EXACTLY SMALLER SOMETHING BIGGER FEWER LOWER ALMOST
- ON AT INTO FROM WITH THROUGH OVER AROUND AGAINST ACROSS UPON TOWARD UNDER ALONG NEAR BEHIND OFF ABOVE DOWN BEFORE
- THE HIS THEIR YOUR HER ITS MY OUR THIS THESE A AN THAT NEW THOSE EACH MR ANY MRS ALL
- GOOD SMALL NEW IMPORTANT GREAT LITTLE LARGE BIG LONG HIGH DIFFERENT SPECIAL OLD STRONG YOUNG COMMON WHITE SINGLE CERTAIN
- SAID ASKED THOUGHT TOLD SAYS MEANS CALLED CRIED SHOWS ANSWERED TELLS REPLIED SHOUTED EXPLAINED LAUGHED MEANT WROTE SHOWED BELIEVED WHISPERED
Sentence generation
Generated sentences:
- RESEARCH
- THE CHIEF WICKED SELECTION OF RESEARCH IN THE BIG MONTHS
- EXPLANATIONS
- IN THE PHYSICISTS EXPERIMENTS
- HE MUST QUIT THE USE OF THE CONCLUSIONS
- ASTRONOMY PEERED UPON YOUR SCIENTISTS DOOR
- ANATOMY ESTABLISHED WITH PRINCIPLES EXPECTED IN BIOLOGY
- ONCE BUT KNOWLEDGE MAY GROW
- HE DECIDED THE MODERATE SCIENCE LANGUAGE
- RESEARCHERS GIVE THE SPEECH
- THE SOUND FEEL NO LISTENERS
- WHICH WAS TO BE MEANING
- HER VOCABULARIES STOPPED WORDS
- HE EXPRESSLY WANTED THAT BETTER VOWEL
Sentence generation
Generated sentences:
- LAW
- BUT THE CRIME HAD BEEN SEVERELY POLITE OR CONFUSED
- CUSTODY ON ENFORCEMENT RIGHTS IS PLENTIFUL CLOTHING
- WEALTHY COTTON PORTFOLIO WAS OUT OF ALL SMALL SUITS
- HE IS CONNECTING SNEAKERS
- THUS CLOTHING ARE THOSE OF CORDUROY
- THE FIRST AMOUNTS OF FASHION IN THE SKIRT
- GET TIGHT TO GET THE EXTENT OF THE BELTS
- ANY WARDROBE CHOOSES TWO SHOES THE ARTS
- SHE INFURIATED THE MUSIC
- ACTORS WILL MANAGE FLOATING FOR JOY
- THEY ARE A SCENE AWAY WITH MY THINKER
- IT MEANS A CONCLUSION
Conclusion
- Taking a probabilistic approach can clarify some of the central issues in semantic representation:
  - Motivates sensitivity to co-occurrence statistics
  - Identifies how co-occurrence data should be used
  - Allows the role of meaning to be specified exactly, and finds a meaningful decomposition of language
Probabilities and Inner Products
- Single word
- List of words
[Equations relating the model's probabilities to LSA's inner products]
Model Selection
- How many topics does a language contain?
- A major issue for parametric models
- Less of an issue for non-parametric models:
  - Dirichlet process mixtures
- Expect more topics than are tractable
- The choice of the number of topics is a choice of scale
Gibbs Sampling and EM
- How many topics does a language contain?
- EM finds a fixed set of topics, a single estimate
- Sampling allows for multiple sets of topics, and multimodal posterior distributions
Natural Statistics
- Treating co-occurrence data as frequencies preserves the natural statistics of language:
  - Word frequency
  - Zipf's law of meaning
Natural Statistics
[Figures: word-frequency and Zipf's-law statistics of the corpus compared with the model]
Word Association
Cue: CROWN
- people: KING, JEWEL, QUEEN, HEAD, HAT, TOP, ROYAL, THRONE
- model: KING, TEETH, HAIR, TOOTH, ENGLAND, MOUTH, QUEEN, PRINCE
Word Association
Cue: SANTA
- people: CHRISTMAS, TOYS, LIE
- model: MEXICO, SPANISH, CALIFORNIA