Title: Latent Semantic Analysis
1. Latent Semantic Analysis
- Is it a solution to Plato's problem?
- And 10 other questions answered.
2. 10 questions
- How did this paper change our lives?
- What is Plato's problem?
- Oh no! Not more philosophy?
- How can Plato's problem be solved?
- What kind of solution do we need?
- What is latent semantic analysis?
- How is an LSA model constructed?
- How is the LSA model used?
- What's a cosine between vectors?
- What are some cool empirical findings?
- Is LSA psychologically plausible?
3. How did this paper change our lives?
- Because I saw a talk by Landauer on this work, I became interested in latent semantic analysis (LSA).
- Because I was interested in LSA, I became interested in Curt Burgess's HAL model.
- Because I was interested in HAL, I decided to come to Edmonton, where Lori Buchanan was working on it.
- Because I came to Edmonton, here I am teaching Psych 357.
- If Landauer hadn't written this paper, we probably wouldn't have the mutual pleasure of knowing each other as we do.
4. What is Plato's problem?
- Meno (in the Platonic dialog named after him) asks: how can one ever investigate what one does not know?
- He saw two problems:
- i.) How can you propose what you do not know as the object of your search?
- ii.) How will you recognize what you do not know as the thing you did not know if you do (by chance) find it?
- More generally, the problem is that there is a gap between what we experience and what we know, with the latter seeming to be larger than the former is able to support.
5. Oh no! Not more philosophy?
- Not at all (indeed, the opposite).
- Plato's problem is exactly the poverty-of-the-stimulus/failure-of-induction problem.
- It is thus central to syntactic knowledge as well as to many other dimensions of linguistic knowledge (wherever we make fine-grained untaught distinctions, e.g. prosody, phonology, and semantics).
6. How can Plato's problem be solved?
- i.) Plato's solution was recollection of knowledge gained in a previous life, famously demonstrated in the Meno by showing that a slave boy 'knows' the Pythagorean Theorem.
- ii.) Some favour the idea of innate knowledge, the modern equivalent of recollection of a previous life.
- The basic common principle is one we already know and love in Psych 357: we need some source of strong additional constraints on the problem (information) to narrow down the size of the search space.
7. What kind of solution do we need?
- That is: what properties are desirable in a scientifically acceptable explanation of how constraints on a search space operate?
- i.) They must be sufficient.
- ii.) They must be well-defined.
- iii.) They must be psychologically plausible.
8. What is latent semantic analysis?
- LSA is an algorithmically well-defined way of measuring lexical co-occurrence in some set of texts.
- The assumption is that co-occurrence says something about semantics: words about the same things are likely to occur in the same contexts.
- If we have many words and contexts, small differences in co-occurrence probabilities can be compiled together to give information about semantics.
- Think of 20 Questions: no single question might be sufficient to identify an unknown object, but 20 questions usually are sufficient.
9. How is an LSA model constructed?
- i.) Build a matrix with rows representing words and columns representing contexts (a document or word string).
- ii.) Enter in each cell (a word × document intersection) a count of how many times that word occurred in that document.
- iii.) Transform the matrix.
10. i.) Build a matrix with rows representing words and columns representing contexts (a document or word string):

             Sonnets   Learn C   A day at the zoo
  dog
  zebra
  computer
11. ii.) Enter in each cell (a word × document intersection) a count of how many times that word occurred in that document:

             Sonnets   Learn C   A day at the zoo
  dog              6         1                  7
  zebra            0         2                 46
  computer         0       123                  0
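- As a rough illustration (not from the paper), here is a minimal Python sketch of steps i.) and ii.): building a word × document count matrix from a few invented snippets of text, reusing the slide's three example contexts as document names.

```python
import numpy as np

# Toy corpus: three "contexts" (documents). The texts are invented
# purely for illustration.
docs = {
    "Sonnets":          "shall I compare thee to a summer day the dog barks",
    "Learn C":          "the computer compiles the program and the computer runs it",
    "A day at the zoo": "the zebra and the dog watched the lion at the zoo",
}

# i.) Rows represent words, columns represent contexts.
vocab = sorted({w for text in docs.values() for w in text.lower().split()})
doc_names = list(docs)

# ii.) Each cell holds how many times that word occurred in that document.
counts = np.zeros((len(vocab), len(doc_names)), dtype=float)
for j, name in enumerate(doc_names):
    for w in docs[name].lower().split():
        counts[vocab.index(w), j] += 1

print(doc_names)
for w, row in zip(vocab, counts):
    print(f"{w:10s}", row)
```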
14. iii.) Transform the matrix
- a.) Control for word frequency.
- The log transform compresses the effects of frequency.
- b.) Control for the number of contexts each word appeared in.
- Words that occur in few contexts are more informative about those contexts (they reduce uncertainty about their context more) than words that appear in many different contexts.
- E.g. knowing the word 'computer' was common places more constraint on what the document is about than knowing the word 'the' was common.
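- A Python sketch of steps iii.a and iii.b, using the common 'log-entropy' style of weighting; the exact transform used in the paper may differ, and the function name here is just illustrative.

```python
import numpy as np

def transform(counts, eps=1e-12):
    """Log-entropy style weighting (a sketch; not necessarily the paper's
    exact scheme).

    a.) log(1 + count) compresses the effect of raw word frequency.
    b.) Weighting by 1 minus the word's normalized entropy across contexts
        down-weights words like 'the' that occur everywhere and up-weights
        words concentrated in a few contexts.
    """
    counts = np.asarray(counts, dtype=float)
    log_counts = np.log(1.0 + counts)                     # a.) frequency compression

    row_totals = counts.sum(axis=1, keepdims=True) + eps
    p = counts / row_totals                               # P(context | word)
    n_docs = counts.shape[1]
    norm = np.log(n_docs) if n_docs > 1 else 1.0
    entropy = -(p * np.log(p + eps)).sum(axis=1) / norm   # 0 = concentrated, 1 = spread out
    weight = 1.0 - entropy                                # informative words get high weight

    return log_counts * weight[:, np.newaxis]
```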
15. iii.) Transform the matrix
- c.) Singular value decomposition (SVD).
- This reduces dimensionality by 'projecting' the tens of thousands of context dimensions onto a smaller number (roughly 300).
- A mathematical projection is roughly the same as a real projection: think of shining a light through a three-dimensional pattern and tracing the shadow it casts to get a two-dimensional projection.
- The 'discarded' dimensions are those that are least informative: they have low variance or are redundant (e.g. a word like 'the' occurred in every context, or a word like 'antidisestablishmentarianism' occurred in hardly any contexts).
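- A Python sketch of step iii.c: a truncated singular value decomposition with numpy, keeping only the k strongest dimensions (around 300 in practice, far fewer for a toy matrix). The function and variable names are illustrative, not the paper's code.

```python
import numpy as np

def reduce_dimensions(weighted, k=300):
    """Project the word x context matrix onto its k strongest dimensions.

    U has one row per word, S the singular value (importance) of each
    dimension, Vt one row per dimension over contexts. Keeping only the
    top-k dimensions discards the least informative ones.
    """
    U, S, Vt = np.linalg.svd(weighted, full_matrices=False)
    k = min(k, len(S))
    word_vectors = U[:, :k] * S[:k]       # reduced word representations
    doc_vectors = Vt[:k, :].T * S[:k]     # reduced context representations
    return word_vectors, doc_vectors
```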
16. How is the LSA model used?
- To get a measure of how related one word is to another, measure the distance between the rows (word vectors) for the two words.
- This gives you a measure of how different the contexts of the two words were, that is, how often the two words occurred different numbers of times in the same contexts.
- You can also take the distance between two document vectors to get a measure of how related they are.
- You can measure distance by taking the cosine between two vectors.
17. Huh? What's a cosine between vectors?
- They probably forgot to mention in your Grade 9 trigonometry class (as they did in mine) that cosine is extensible to dimensions above two.
- Typical teaching: always the special case, never the general.
- The dot product of two vectors is the sum of the products of corresponding entries in the two vectors, i.e. (x1·x2) + (y1·y2) + (z1·z2) for two vectors (x1, y1, z1) and (x2, y2, z2) of length 3.
- The dot product of two vectors is the cosine of the angle between those two vectors, multiplied by the lengths of those vectors.
- Therefore, cosine is the dot product divided by the product of the two vector lengths.
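- A minimal Python sketch of exactly that recipe: cosine = dot product divided by the product of the two vector lengths, which works for vectors of any dimensionality. The word_vectors and vocab names refer back to the earlier sketches and are assumptions, not the paper's code.

```python
import numpy as np

def cosine(x, y):
    """Cosine of the angle between two vectors (any number of dimensions):
    the dot product divided by the product of the two vector lengths."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# e.g. similarity between two word rows from the reduced matrix:
# cosine(word_vectors[vocab.index("zebra")], word_vectors[vocab.index("dog")])
```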
18. What are some cool empirical findings?
- i.) LSA models can pass the TOEFL.
- ii.) LSA can learn the meanings of words it has never encountered.
- iii.) LSA can explain some priming effects.
- iv.) LSA replicates human number judgments.
- v.) LSA can mark essays.
- vi.) LSA-like measures predict lexical decision (LD) RTs.
19. i.) LSA models can pass the TOEFL
- On a 4-alternative multiple-choice TOEFL, the model got 51.5% correct (corrected for guessing).
- The chance score is 25%.
- Real foreigners hoping to attend American universities averaged 52.7%.
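- One way such an item can be answered with the model (a sketch of the idea, not the paper's actual scoring code): pick the alternative whose vector has the highest cosine with the stem word. The `vectors` dictionary and the example item are illustrative assumptions.

```python
def answer_toefl_item(stem, choices, vectors):
    """Pick the alternative most similar to the stem word in LSA space.
    `vectors` maps words to reduced LSA vectors; `cosine` is the function
    sketched earlier. Illustrative only."""
    return max(choices, key=lambda c: cosine(vectors[stem], vectors[c]))

# e.g. (made-up item):
# answer_toefl_item("levied", ["imposed", "believed", "requested", "correlated"], vectors)
```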
20. ii.) LSA can learn the meanings of words it had never encountered
- So can children!
- By substituting words with nonsense words and controlling access, they showed that the model could learn the meanings of words it had never encountered.
- This replicated (and explained) an odd result which had been found in human children, and estimated that most word knowledge was inductive rather than direct.
- The result is not odd when you consider that the meaning of a word is distributed across all vectors with which it shares contexts.
- You can learn a lot about lions, even if you have never heard of them before, by knowing they are something like tigers.
21. iii.) LSA can explain some priming effects
- The model can explain some priming work using homographs, i.e. testing for 'mole' (the animal) versus 'mole' (the beauty mark).
- If context is marked by word form (either phonological or orthographic), then these words will indeed get overlapping contexts even though they are semantically different.
22. iv.) LSA replicates human number judgments
- Previous work has shown that judgments about number size are best represented on the assumption that numbers are represented as the log of their values.
- That is, people scale down large numbers.
- LSA arrived at the same representation using the numbers' contextual occurrences.
23. v.) LSA can mark essays
- LSA judgments of the quality of sentences correlate at r = 0.81 with expert ratings.
- LSA can judge how good an essay (on a well-defined set topic) is by computing the average distance between the essay to be marked and a set of model essays.
- The correlations are equal to between-human correlations.
- "If you wrote a good essay and scrambled the words you would get a good grade," Landauer said. "But try to get the good words without writing a good essay!"
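- A sketch of the averaging idea in Python (the paper's exact essay representation may differ): represent each essay as the sum of its words' LSA vectors, then score a new essay by its mean cosine to a set of model essays. All names here are illustrative assumptions.

```python
import numpy as np

def essay_vector(text, vectors):
    """Represent an essay as the sum of its words' LSA vectors (a common
    choice, assumed here; requires at least one word known to the model)."""
    words = [w for w in text.lower().split() if w in vectors]
    return np.sum([vectors[w] for w in words], axis=0)

def essay_score(essay, model_essays, vectors):
    """Average similarity between the essay to be marked and a set of model
    essays: the closer it sits to them, the higher the mark."""
    v = essay_vector(essay, vectors)
    return float(np.mean([cosine(v, essay_vector(m, vectors)) for m in model_essays]))
```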
24. vi.) LSA-like measures predict LD RTs
- An LSA-like measure for single words can predict human RTs in lexical decision.
- We used 10 words on each side of the target word as a document and got distances between all words.
- Words close to their nearest neighbours are recognized more quickly than words far away from them, after controlling for other known variables.
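- A Python sketch of the '10 words each side' idea: count co-occurrences inside a sliding window of ±10 tokens around each target word. From these counts one could build vectors and then compute each word's distance to its nearest neighbours; the function name and details are assumptions, not the original code.

```python
from collections import defaultdict

def window_cooccurrence(tokens, window=10):
    """Count co-occurrences within +/- `window` words of each target word,
    treating that window as the word's 'document'."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][tokens[j]] += 1
    return counts
```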
25. Is LSA psychologically plausible?
- Well, the above evidence suggests it might be, and it is nicely consistent with much of our talk about mapping between schemas.
- Neuro-philosopher Paul Churchland has written:
- "Explanatory understanding consists of the activation of a specific prototype vector in a well-trained network. It consists in the apprehension of the problematic case as an instance of a general type, a type for which the creature has a detailed and well-informed representation. Such a representation allows the creature to anticipate aspects of the case so far unperceived, and to deploy practical techniques appropriate to the case at hand."
- Paul Churchland, A Neurocomputational Perspective: The Nature of Mind and the Structure of Science