Transcript and Presenter's Notes

Title: Latent Semantic Analysis:


1
Latent Semantic Analysis
  • Is it a solution to Plato's problem?
  • And 10 other questions answered.

2
10 questions
  • How did this paper change our lives?
  • What is Plato's problem?
  • Oh no! Not more philosophy?
  • How can Plato's problem be solved?
  • What kind of solution do we need?
  • What is latent semantic analysis?
  • How is an LSA model constructed?
  • How is the LSA model used?
  • What's a cosine between vectors?
  • What are some cool empirical findings?
  • Is LSA psychologically plausible?

3
How did this paper change our lives?
  • Because I saw a talk by Landauer on this work, I
    became interested in latent semantic analysis
    (LSA).
  • Because I was interested in LSA, I became
    interested in Curt Burgess's HAL model.
  • Because I was interested in HAL, I decided to
    come to Edmonton, where Lori Buchanan was
    working on it.
  • Because I came to Edmonton, here I am teaching
    Psych 357.
  • If Landauer hadn't written this paper, we
    probably wouldn't have the mutual pleasure of
    knowing each other as we do.

4
What is Plato's problem?
  • Meno (in the Platonic dialogue named after him)
    asks: How can one ever investigate what one does
    not know?
  • He saw two problems:
  • i.) How can you propose what you do not know as
    the object of your search?
  • ii.) How will you recognize what you do not know
    as the thing you did not know if you do (by
    chance) find it?
  • More generally, the problem is that there is a
    gap between what we experience and what we know,
    with the latter seeming to be larger than the
    former is able to support.

5
Oh no! Not more philosophy?
  • Not at all (indeed, the opposite)
  • Plato's problem is exactly the poverty of the
    stimulus/failure of induction problem
  • It is thus central to syntactic knowledge as well
    as to many other dimensions of linguistic
    knowledge (wherever we make fine-grained, untaught
    distinctions, e.g. in prosody, phonology, and
    semantics).

6
How can Plato's problem be solved?
  • i.) Plato's solution was recollection of
    knowledge gained in a previous life, famously
    demonstrated in the Meno by showing that a slave
    boy 'knows' the Pythagorean Theorem.
  • ii.) Some favour the idea of innate knowledge,
    the modern equivalent of recollection of a
    previous life.
  • The basic common principle is one we already know
    and love in Psych 357: we need some source of
    strong additional constraints on the problem
    (i.e. information) to narrow down the size of the
    search space.

7
What kind of solution do we need?
  • That is: What properties are desirable in a
    scientifically acceptable explanation of how
    constraints on a search space operate?
  • i.) They must be sufficient.
  • ii.) They must be well-defined.
  • iii.) They must be psychologically plausible.

8
What is latent semantic analysis?
  • LSA is an algorithmically well-defined way of
    measuring lexical co-occurrence in some set of
    texts.
  • The assumption is that co-occurrence says
    something about semantics: words about the same
    things are likely to occur in the same contexts.
  • If we have many words and contexts, small
    differences in co-occurrence probabilities can be
    compiled together to give information about
    semantics.
  • Think of 20 Questions: no single question might
    be sufficient to identify an unknown object, but
    20 questions usually are sufficient.

9
How is an LSA model constructed?
  • i.) Build a matrix with rows representing words
    and columns representing contexts (a document or
    word string).
  • ii.) Enter in each cell (a word X document
    intersection) a count of how many times that word
    occurred in that document.
  • iii.) Transform the matrix.

10
  • i.) Build a matrix with rows representing words
    and columns representing contexts (a document or
    word string).

             Sonnets   Learn C   A day at the zoo
  dog
  zebra
  computer

11
  • ii.) Enter in each cell (a word X document
    intersection) a count of how many times that word
    occurred in that document (a minimal code sketch
    of steps i and ii follows the table).

             Sonnets   Learn C   A day at the zoo
  dog              6         1                  7
  zebra            0         2                 46
  computer         0       123                  0
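
A minimal sketch of steps i and ii, assuming a toy corpus; the document texts,
vocabulary, and resulting counts below are illustrative only, not the actual
data behind the table.

  # Build a word-by-document count matrix: rows are words, columns are contexts.
  from collections import Counter

  import numpy as np

  documents = {
      "Sonnets": "shall i compare thee to a summer's day my dog and i walked",
      "Learn C": "a computer program is compiled and run on the computer",
      "A day at the zoo": "the zebra and the dog watched the lions all day",
  }

  vocab = ["dog", "zebra", "computer"]                 # rows of the matrix
  doc_names = list(documents)                          # columns of the matrix

  counts = np.zeros((len(vocab), len(doc_names)), dtype=int)
  for j, name in enumerate(doc_names):
      word_counts = Counter(documents[name].lower().split())
      for i, word in enumerate(vocab):
          counts[i, j] = word_counts[word]             # cell = frequency of word i in document j

  print(counts)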

12
(No Transcript)
13
(No Transcript)
14
  • iii.) Transform the matrix
  • a.) Control for word frequency
  • The log transform compresses the effects of
    frequency.
  • b.) Control for the number of contexts each word
    appeared in
  • Words that occur in few contexts are more
    informative about those contexts (they reduce
    uncertainty about their context more) than words
    that appear in many different contexts.
  • E.g. knowing the word 'computer' was common
    places more constraints on what the document is
    about than knowing the word 'the' was common
    (see the weighting sketch below).
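
One common way to implement both transforms is a log-entropy weighting: take
the log of the counts, then down-weight words whose occurrences are spread
evenly across many contexts. Treating this as Landauer's exact transform is an
assumption; implementations differ in detail.

  # Sketch of steps iii(a) and iii(b): log-compress frequency, then weight each
  # word by how concentrated (informative) its distribution over contexts is.
  import numpy as np

  def log_entropy_transform(counts):
      counts = np.asarray(counts, dtype=float)
      n_docs = counts.shape[1]

      local = np.log(counts + 1.0)                        # a) compress frequency effects

      p = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1e-12)
      with np.errstate(divide="ignore", invalid="ignore"):
          plogp = np.where(p > 0, p * np.log(p), 0.0)
      entropy = -plogp.sum(axis=1)                        # high entropy: spread over many contexts
      global_weight = 1.0 - entropy / np.log(n_docs)      # b) words in few contexts keep weight near 1

      return local * global_weight[:, None]

  X = log_entropy_transform([[6, 1, 7], [0, 2, 46], [0, 123, 0]])
  print(X)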

15
  • iii.) Transform the matrix
  • c.) Singular value decomposition
  • This reduces dimensionality by 'projecting' the
    tens of thousands of context dimensions onto a
    smaller number (roughly 300).
  • A mathematical projection is roughly the same as a
    real projection: think of shining a light through
    a three-dimensional pattern and tracing the
    shadow it casts to get a two-dimensional
    projection.
  • The 'discarded' dimensions are those that are
    least informative: they have low variance or are
    redundant (e.g. a word like 'the' occurred in
    every context, or a word like
    'antidisestablishmentarianism' occurred in hardly
    any contexts). A sketch of this step follows.
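
A minimal sketch of step iii(c), truncating the SVD of the (transformed)
word-by-document matrix. The matrix values and k = 2 are toy choices for
illustration; real LSA models keep roughly 300 dimensions.

  # Keep only the k largest singular values/vectors to get latent word and
  # document vectors in the same reduced space.
  import numpy as np

  X = np.array([[1.9, 0.7, 2.1],
                [0.0, 1.1, 3.9],
                [0.0, 4.8, 0.0]])       # toy transformed word-by-document matrix

  U, s, Vt = np.linalg.svd(X, full_matrices=False)
  k = 2
  word_vectors = U[:, :k] * s[:k]       # one row per word, k latent dimensions
  doc_vectors = Vt[:k, :].T * s[:k]     # one row per document, same k dimensions

  print(word_vectors)
  print(doc_vectors)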

16
How is the LSA model used?
  • To get a measure of how related a word is to
    another word, measure the distance between the
    rows (word vectors) for the two words.
  • This gives you a measure of how different the
    contexts of the two words were, that is, how
    often a word occurred a different number of times
    in each context.
  • You can also take the distance between two
    document vectors to get a measure of how related
    they are.
  • You can measure distance by taking the cosine
    between the two vectors.

17
Huh? What's a cosine between vectors?
  • They probably forgot to mention in your Grade 9
    trigonometry class (as they did in mine) that the
    cosine is extensible to dimensions above 2.
  • Typical teaching: always the special case, never
    the general.
  • The dot product of two vectors is the sum of the
    products of corresponding entries in the two
    vectors, i.e. (x1*x2) + (y1*y2) + (z1*z2), for
    two vectors of length 3.
  • The dot product of two vectors is the cosine of
    the angle between those two vectors, multiplied
    by the lengths of those vectors.
  • Therefore, the cosine is the dot product divided
    by the product of the two vector lengths (see the
    sketch below).
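
A short sketch of the cosine measure just described: the dot product of two
vectors divided by the product of their lengths, which works in any number of
dimensions. The example vectors are made up.

  import numpy as np

  def cosine(u, v):
      # dot product divided by the product of the two vector lengths
      u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
      return u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))

  print(cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))   # ~1.0: same direction, very similar contexts
  print(cosine([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))   # 0.0: orthogonal, unrelated contexts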

18
What are some cool empirical findings?
  • i.) LSA models can pass the TOEFL
  • ii.) LSA can learn the meanings of words it has
    never encountered
  • iii.) LSA can explain some priming effects
  • iv.) LSA replicates human number judgments
  • v.) LSA can mark essays
  • vi.) LSA-like measures predict LD RTs

19
i.) LSA models can pass the TOEFL
  • On a 4-possibility multiple choice TOEFL, the
    model got 51.5% correct (corrected for guessing).
  • The chance score is 25%.
  • Real foreign applicants hoping to attend American
    universities averaged 52.7%.

20
ii.) LSA can learn the meanings of words it had
never encountered
  • So can children!
  • By substituting words with nonsense words and
    controlling access, they showed that the model
    could learn the meanings of words it had never
    encountered.
  • This replicated (and explained) an odd result
    which had been found in human children, one which
    estimated that most word knowledge is inductive
    rather than direct.
  • The result is not odd when you consider that the
    meaning of a word is distributed across all
    vectors with which it shares contexts.
  • You can learn a lot about lions, even if you have
    never heard of them before, by knowing they are
    something like tigers.

21
iii.) LSA can explain some priming effects
  • The model can explain some priming work using
    homographs, i.e. testing for 'mole' (the animal)
    versus 'mole' (the beauty mark).
  • If context is marked by word form (either
    phonological or orthographic), then these words
    will indeed get overlapping contexts even though
    they are semantically different.

22
iv.) LSA replicates human number judgments
  • Previous work has shown that judgments about
    number size are best represented on the
    assumption that numbers are represented as the
    log of their values.
  • That is, people scale down large numbers.
  • LSA arrived at the same representation using the
    numbers' contextual occurrences.

23
v.) LSA can mark essays
  • LSA judgments of the quality of sentences
    correlate at r = 0.81 with expert ratings.
  • LSA can judge how good an essay (on a
    well-defined set topic) is by computing the
    average distance between the essay to be marked
    and a set of model essays.
  • The correlations are equal to between-human
    correlations.
  • "If you wrote a good essay and scrambled the
    words you would get a good grade," Landauer said.
    "But try to get the good words without writing a
    good essay!"

24
vi.) LSA-like measures predict LD RTs
  • An LSA-like measure for single words can predict
    human RTs in lexical decision.
  • We used 10 words on each side of the target word
    as a document and got distances between all words
    (a sketch of this windowing follows).
  • Words close to their nearest neighbours are
    recognized more quickly than words far away from
    them, after controlling for other known variables.
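
A minimal sketch of the windowing just described, assuming plain whitespace
tokenization: every span of 10 words on each side of a target word is treated
as that target's "document". The window size comes from the slide; everything
else (tokenization, example text) is an illustrative assumption.

  def window_contexts(tokens, half_window=10):
      # For each target word, collect the words within half_window positions
      # on either side; each such window plays the role of a document.
      contexts = []
      for i, target in enumerate(tokens):
          left = tokens[max(0, i - half_window):i]
          right = tokens[i + 1:i + 1 + half_window]
          contexts.append((target, left + right))
      return contexts

  tokens = "the zebra at the zoo watched the dog chase a ball".split()
  for target, context in window_contexts(tokens, half_window=3):
      print(target, context)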

25
Is LSA psychologically plausible?
  • Well, the above evidence suggests it might be,
    and it is nicely consistent with much of our talk
    about mapping between schemas.
  • Neuro-philosopher Paul Churchland has written:
  • "Explanatory understanding consists of the
    activation of a specific prototype vector in a
    well-trained network. It consists in the
    apprehension of the problematic case as an
    instance of a general type, a type for which the
    creature has a detailed and well-informed
    representation. Such a representation allows the
    creature to anticipate aspects of the case so far
    unperceived, and to deploy practical techniques
    appropriate to the case at hand."
  • Paul Churchland
  • A Neurocomputational Perspective: The Nature of
    Mind and the Structure of Science