Title: Nonparametric Bayes and human cognition
Nonparametric Bayes and human cognition
- Tom Griffiths
- Department of Psychology
- Program in Cognitive Science
- University of California, Berkeley
Analyzing psychological data
- Dirichlet process mixture models for capturing individual differences
- (Navarro, Griffiths, Steyvers, & Lee, 2006)
- Infinite latent feature models
- for features influencing similarity
- (Navarro & Griffiths, 2007, 2008)
- for features influencing decisions
Flexible mental representations
Categorization
- How do people represent categories?
Prototypes
[Figure: several category instances ("cat") summarized by a single prototype]
(Posner & Keele, 1968; Reed, 1972)
Exemplars
Store every instance (exemplar) in memory
[Figure: several category instances ("cat") each stored in memory]
(Medin & Schaffer, 1978; Nosofsky, 1986)
Something in between
[Figure: category instances ("cat") grouped into a small number of clusters]
(Love et al., 2004; Vanpaemel et al., 2005)
A computational problem
- Categorization is a classic inductive problem
- data: stimulus x
- hypotheses: category c
- We can apply Bayes' rule (written out below)
- and choose c such that P(c|x) is maximized
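The slide's equation did not survive the transcript; the standard form of Bayes' rule for this problem, in the notation above, is:

```latex
P(c \mid x) = \frac{p(x \mid c)\, P(c)}{\sum_{c'} p(x \mid c')\, P(c')}
```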
Density estimation
- We need to estimate some probability distributions
- what is P(c)?
- what is p(x|c)?
- Two approaches
- parametric
- nonparametric
- These approaches correspond to prototype and exemplar models, respectively
- (Ashby & Alfonso-Reese, 1995)
Parametric density estimation
- Assume that p(x|c) has a simple form, characterized by parameters θ (indicating the prototype); see the sketch below
[Figure: a parametric probability density plotted against x]
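A minimal sketch of the parametric approach, assuming a Gaussian form for p(x|c); the function names and the diagonal-variance choice are illustrative, not from the talk:

```python
import numpy as np

def fit_prototype(X):
    """Fit a diagonal Gaussian to one category's examples.

    X: (n, d) array of stimuli from the category.
    Returns the mean (the "prototype") and a diagonal variance.
    """
    mu = X.mean(axis=0)           # the prototype is the category mean
    var = X.var(axis=0) + 1e-6    # small constant avoids zero variance
    return mu, var

def log_density(x, mu, var):
    """Log p(x | c) under the fitted diagonal Gaussian."""
    return -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var))
```

Classification then amounts to comparing log p(x|c) + log P(c) across candidate categories.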
Nonparametric density estimation
Approximate a probability distribution as a sum of many kernels (one per data point); see the sketch below
[Figure: kernel density estimate with n = 10, showing the estimated function, the individual kernels, and the true function; probability density plotted against x]
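A minimal sketch of the corresponding nonparametric approach: a Gaussian kernel density estimate with one kernel per stored exemplar. The bandwidth h and all names are illustrative assumptions:

```python
import numpy as np

def kde(x, exemplars, h=0.5):
    """Estimate p(x | c) as the average of one Gaussian kernel per exemplar."""
    z = (x - exemplars) / h
    kernels = np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi))
    return kernels.mean()

exemplars = np.array([0.9, 1.1, 1.4, 2.0])   # toy one-dimensional category
print(kde(1.2, exemplars))
```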
Something in between
Use a mixture distribution, with more than one component but fewer than one component per data point (a sketch follows)
[Figure: a mixture distribution over x and its mixture components]
(Rosseel, 2002)
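A minimal sketch of the in-between approach, assuming a Gaussian mixture with a fixed small number of components fit by EM; k, the iteration count, and the initialization are illustrative choices, not Rosseel's (2002) model:

```python
import numpy as np

def fit_mixture(x, k=2, iters=50, rng=np.random.default_rng(0)):
    """Fit a 1-D Gaussian mixture with k components by EM."""
    n = len(x)
    mu = rng.choice(x, k)                # initialize means at random points
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var
```

With k = 1 this reduces to the prototype model; with k = n it approaches the exemplar model.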
Anderson's rational model (Anderson, 1990, 1991)
- Treat category labels like any other feature
- Define a joint distribution p(x, c) on features using a mixture model, breaking objects into clusters
- Allow the number of clusters to vary: a Dirichlet process mixture model (Neal, 1998; Sanborn et al., 2006); the prior over clusterings is sketched below
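The prior over clusterings implied by a Dirichlet process mixture model can be sketched as the Chinese restaurant process; alpha and the seed are illustrative assumptions:

```python
import numpy as np

def crp_sample(n, alpha=1.0, rng=np.random.default_rng(0)):
    """Sample a partition of n objects: object i joins an existing cluster
    with probability proportional to its size, or a new one w.p. prop. alpha."""
    assignments = [0]                       # first object starts cluster 0
    for i in range(1, n):
        counts = np.bincount(assignments)
        probs = np.append(counts, alpha) / (i + alpha)
        assignments.append(rng.choice(len(probs), p=probs))
    return assignments

print(crp_sample(10))   # the number of clusters is not fixed in advance
```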
A unifying rational model
- Density estimation is a unifying framework
- a way of viewing models of categorization
- We can go beyond this to define a unifying model
- one model, of which all others are special cases
- Learners can adopt different representations by adaptively selecting between these cases
- Basic tool: two interacting levels of clusters
- results from the hierarchical Dirichlet process
- (Teh, Jordan, Beal, & Blei, 2004)
The hierarchical Dirichlet process
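The figure for this slide is missing from the transcript. As a stand-in, here is a minimal sketch of the Chinese restaurant franchise view of the HDP, in which groups (categories) share clusters ("dishes") drawn from a common top-level process; alpha, gamma, and all names are illustrative assumptions:

```python
import numpy as np

def crf_sample(group_sizes, alpha=1.0, gamma=1.0, rng=np.random.default_rng(0)):
    """Sample cluster labels for several groups under the HDP prior."""
    dish_counts = []                        # tables serving each global dish
    assignments = []                        # per group: dish of each object
    for n in group_sizes:
        tables = []                         # per table: [count, dish index]
        group = []
        for _ in range(n):
            # seat at an existing table in proportion to its size, or a new one
            probs = np.array([c for c, _ in tables] + [alpha], dtype=float)
            choice = rng.choice(len(probs), p=probs / probs.sum())
            if choice == len(tables):       # new table: draw its dish globally
                dp = np.array(dish_counts + [gamma], dtype=float)
                dish = rng.choice(len(dp), p=dp / dp.sum())
                if dish == len(dish_counts):
                    dish_counts.append(0)   # a brand-new shared dish
                dish_counts[dish] += 1
                tables.append([1, dish])
            else:
                tables[choice][0] += 1
                dish = tables[choice][1]
            group.append(dish)
        assignments.append(group)
    return assignments                      # dishes (clusters) recur across groups
```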
A unifying rational model
[Figure: model schematic relating the category, cluster, and exemplar levels]
The HDP and Smith & Minda (1998)
- The HDP will automatically infer a representation using exemplars, prototypes, or something in between (with the concentration parameter α being learned from the data)
- Test on Smith & Minda (1998, Experiment 2)
Category A: 111111, 011111, 101111, 110111, 111011, 111110, 000100
Category B: 000000, 100000, 010000, 001000, 000010, 000001, 111101
The HDP and Smith & Minda (1998)
[Figure: probability of responding "Category A" for each stimulus under the prototype model, the exemplar model, and the HDP]
The promise of the HDP
- In the HDP, clusters are shared between categories
- a property of hierarchical Bayesian models
- Learning one category has a direct effect on the prior on probability densities for the next category
Learning the features of objects
- Most models of human cognition assume objects are represented in terms of abstract features
- What are the features of this object?
- What determines what features we identify?
- (Austerweil & Griffiths, submitted)
Binary matrix factorization
[Figure: a binary data matrix expressed as the combination of a binary feature ownership matrix Z and a matrix of feature images; a toy example follows]
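A toy illustration of the idea, assuming a simple Boolean (OR) combination rule; the actual model may use a different likelihood (e.g., a noisy-OR), so treat this only as a sketch of the factorization:

```python
import numpy as np

# Z: objects x features (which features each object owns)
Z = np.array([[1, 0],            # object 1 has feature 1 only
              [0, 1],            # object 2 has feature 2 only
              [1, 1]])           # object 3 has both features
# A: features x pixels (the image each feature contributes)
A = np.array([[1, 1, 0, 0],      # feature 1 "paints" the first two pixels
              [0, 0, 1, 1]])     # feature 2 "paints" the last two pixels

X = (Z @ A > 0).astype(int)      # Boolean matrix product: OR of owned features
print(X)
# [[1 1 0 0]
#  [0 0 1 1]
#  [1 1 1 1]]
```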
The nonparametric approach
- Assume that the total number of features is unbounded, but only a finite number will be expressed in any finite dataset
- Use the Indian buffet process as a prior on Z (Griffiths & Ghahramani, 2006); a sampling sketch follows
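A minimal sketch of sampling the binary feature matrix Z from the Indian buffet process prior; alpha and the seed are illustrative assumptions:

```python
import numpy as np

def ibp_sample(n, alpha=2.0, rng=np.random.default_rng(0)):
    """Sample Z (objects x features): each object takes existing features in
    proportion to their popularity, then adds Poisson(alpha / i) new ones."""
    Z = np.zeros((n, 0), dtype=int)
    for i in range(1, n + 1):
        # take each existing feature with probability (owners so far) / i
        old = (rng.random(Z.shape[1]) < Z.sum(axis=0) / i).astype(int)
        k_new = rng.poisson(alpha / i)                 # brand-new features
        Z = np.hstack([Z, np.zeros((n, k_new), dtype=int)])
        Z[i - 1, :len(old)] = old
        Z[i - 1, len(old):] = 1
    return Z

print(ibp_sample(5))   # rows: objects, columns: latent features
```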
(Austerweil & Griffiths, submitted)
An experiment
[Figure: experimental design; training sets with correlated or factorial feature combinations, and test items that were seen, unseen, or shuffled]
(Austerweil & Griffiths, submitted)
Results
(Austerweil & Griffiths, submitted)
Conclusions
- Approaching cognitive problems as computational problems allows cognitive science and machine learning to be mutually informative
Credits
- Categorization
- Adam Sanborn
- Kevin Canini
- Dan Navarro
- Learning features
- Joe Austerweil
- MCMC with people
- Adam Sanborn
Computational Cognitive Science Lab
http://cocosci.berkeley.edu/