Transcript and Presenter's Notes

Title: Generative Models


1
Generative Models
  • Tamara L Berg
  • Stony Brook University

2
Generative vs Discriminative
  • Discriminative version - build a classifier to
    discriminate between monkeys and non-monkeys.

P(monkey | image)
3
Generative vs Discriminative
  • Generative version - build a model that generates
    images containing monkeys

P(image | monkey)
P(image | not monkey)
4
Generative vs Discriminative
  • Can use Bayes rule to compute p(monkey | image) if
    we know p(image | monkey)

5
Generative vs Discriminative
  • Can use Bayes rule to compute p(monkey | image) if
    we know p(image | monkey)

Discriminative
Generative
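To make the two directions concrete, here is a minimal numeric sketch of Bayes rule; the prior, likelihood, and evidence values below are invented purely for illustration.

# Bayes rule: p(monkey | image) = p(image | monkey) * p(monkey) / p(image)
# All numbers are made up only to illustrate the computation.
p_monkey = 0.1                      # prior p(monkey)
p_image_given_monkey = 0.004        # generative model p(image | monkey)
p_image_given_not_monkey = 0.001    # generative model p(image | not monkey)

# evidence p(image), obtained by marginalizing over the class
p_image = (p_image_given_monkey * p_monkey
           + p_image_given_not_monkey * (1 - p_monkey))

p_monkey_given_image = p_image_given_monkey * p_monkey / p_image
print(p_monkey_given_image)  # the discriminative quantity, about 0.308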
6
Talk Outline
  • 1. Quick introduction to graphical models
  • 2. Bag of words models
  • - What are they?
  • - Examples: Naïve Bayes, pLSA, LDA

7
Talk Outline
  • 1. Quick introduction to graphical models
  • 2. Bag of words models
  • - What are they?
  • - Examples: Naïve Bayes, pLSA, LDA

8
Slide from Dan Klein
9
Random Variables
Random variables
10
Random Variables
Random variables
A random variable is some aspect of the world
about which we (may) have uncertainty. Random
variables can be binary (e.g. true/false,
spam/ham), take on a discrete set of values
(e.g. Spring, Summer, Fall, Winter), or be
continuous (e.g. a value in [0, 1]).
11
Joint Probability Distribution
Random variables
The joint distribution P(X1 = x1, ..., Xn = xn),
also written P(x1, ..., xn), gives a real value
for all possible assignments.
12
Queries
Given a joint distribution, we can reason about
unobserved variables given observations
(evidence)
Stuff you care about
Stuff you already know
13
Representation
14
Representation
Graphical Models!
15
Representation
Graphical models represent joint probability
distributions more economically, using a set of
local relationships among variables.
16
Graphical Models
  • Graphical models offer several useful properties
  • 1. They provide a simple way to visualize the
    structure of a probabilistic model and can be
    used to design and motivate new models.
  • 2. Insights into the properties of the model,
    including conditional independence properties,
    can be obtained by inspection of the graph.
  • 3. Complex computations, required to perform
    inference and learning in sophisticated models,
    can be expressed in terms of graphical
    manipulations, in which underlying mathematical
    expressions are carried along implicitly.

from Chris Bishop
17
Main kinds of models
  • Undirected (also called Markov Random Fields) -
    links express constraints between variables.
  • Directed (also called Bayesian Networks) - have
    a notion of causality -- one can regard an arc
    from A to B as indicating that A "causes" B.

18
Main kinds of models
  • Undirected (also called Markov Random Fields) -
    links express constraints between variables.
  • Directed (also called Bayesian Networks) - have
    a notion of causality -- one can regard an arc
    from A to B as indicating that A "causes" B.

19
Directed Graphical Models
  • Directed Graph, G = (X, E)

X - Nodes
E - Edges
  • Each node is associated with a random variable

20
Directed Graphical Models
  • Directed Graph, G = (X, E)

X - Nodes
E - Edges
  • Each node is associated with a random variable
  • Definition of joint probability in a graphical
    model: P(X1, ..., XN) = ∏_i P(Xi | parents(Xi)),
    where parents(Xi) are the parents of node Xi in
    the graph
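As a small illustration of this factorization, here is a sketch with a hypothetical three-node chain A -> B -> C; the graph and all probability tables are made up only to show the mechanics.

# Tiny directed graphical model over binary variables A -> B -> C.
# Joint factorizes as P(A, B, C) = P(A) * P(B | A) * P(C | B).
# All numbers are invented for illustration.
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}   # P_B_given_A[a][b]
P_C_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}   # P_C_given_B[b][c]

def joint(a, b, c):
    """P(A=a, B=b, C=c) as the product of local conditional factors."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

print(joint(1, 1, 0))  # 0.4 * 0.7 * 0.5 = 0.14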

21
Example
22
Example
Joint Probability
23
Example
24
Conditional Independence
  • Independence
  • Conditional Independence

Or,
25
Conditional Independence
26
Conditional Independence
By the chain rule (using the usual arithmetic
ordering): P(X1, ..., XN) =
P(X1) P(X2 | X1) P(X3 | X1, X2) ... P(XN | X1, ..., XN-1)
27
Example
Joint Probability
28
Conditional Independence
By Chain Rule (using the usual arithmetic
ordering)
Joint distribution from the example graph
29
Conditional Independence
By Chain Rule (using the usual arithmetic
ordering)
Joint distribution from the example graph
Missing variables in the local conditional
probability functions correspond to missing edges
in the underlying graph. Removing an edge into
node i eliminates an argument from the
conditional probability factor
30
Observations
  • Graphs can have observed (shaded) and unobserved
    nodes. If nodes are always unobserved they are
    called hidden or latent variables
  • Probabilistic inference in graphical models is
    the problem of computing a conditional
    probability distribution over the values of some
    of the nodes (the hidden or unobserved
    nodes), given the values of other nodes (the
    evidence or observed nodes).

31
Inference: computing conditional probabilities
Conditional probabilities: P(xA | xB) = P(xA, xB) / P(xB)
Marginalization: P(xB) = Σ_xA P(xA, xB)
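A minimal sketch of these two operations on a toy joint distribution over three binary variables (all numbers invented for illustration): condition on observed evidence and marginalize out the unobserved variable.

from itertools import product

# Toy joint distribution; inference = conditioning on evidence
# and marginalizing out everything else.
joint = {}
for a, b, c in product([0, 1], repeat=3):
    joint[(a, b, c)] = (0.6 if a == 0 else 0.4) * \
                       (0.9 if b == a else 0.1) * \
                       (0.7 if c == b else 0.3)

# Query: P(A | C = 1), with B unobserved (marginalized out).
evidence_c = 1
unnormalized = {a: sum(joint[(a, b, evidence_c)] for b in [0, 1]) for a in [0, 1]}
z = sum(unnormalized.values())                    # P(C = 1)
posterior = {a: p / z for a, p in unnormalized.items()}
print(posterior)  # conditional distribution over A given the evidence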
32
Inference Algorithms
  • Exact algorithms
    - Elimination algorithm
    - Sum-product algorithm
    - Junction tree algorithm
  • Sampling algorithms
    - Importance sampling
    - Markov chain Monte Carlo
  • Variational algorithms
    - Mean field methods
    - Sum-product algorithm and variations
    - Semidefinite relaxations

33
Talk Outline
  • 1. Quick introduction to graphical models
  • 2. Bag of words models
  • - What are they?
  • - Examples: Naïve Bayes, pLSA, LDA

34
Exchangeability
  • De Finetti theorem of exchangeability (bag of
    words theorem): the joint probability
    distribution underlying the data is invariant to
    permutation.

35
Plates
  • Plates - "macro" that allows subgraphs to be
    replicated (graphical representation of the De
    Finetti theorem).

36
Bag of words for text
  • Represent documents as bags of words

37
Example
  • Doc1: the quick brown fox jumped
  • Doc2: brown quick jumped fox the
  • Would a bag of words model represent these two
    documents differently?
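A quick way to check the question above is to compare the two bags directly; this sketch uses Python's Counter as the bag-of-words representation.

from collections import Counter

doc1 = "the quick brown fox jumped".split()
doc2 = "brown quick jumped fox the".split()

bag1, bag2 = Counter(doc1), Counter(doc2)
print(bag1 == bag2)  # True: word order is discarded, so the bags are identical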

38
Bag of words for images
  • Represent images as a bag of words

39
Talk Outline
  • 1. Quick introduction to graphical models
  • 2. Bag of words models
  • - What are they?
  • - Examples: Naïve Bayes, pLSA, LDA

40
A Simple Example: Naïve Bayes
C - Class, F - Features
We only specify (parameters):
P(C) - prior over class labels
P(F | C) - how each feature depends on the class
41
A Simple Example: Naïve Bayes
C - Class, F - Features (n of them)
We only specify (parameters):
P(C) - prior over class labels
P(F | C) - how each feature depends on the class
42
Slide from Dan Klein
43
Slide from Dan Klein
44
Slide from Dan Klein
45
Percentage of documents in training set labeled
as spam/ham
Slide from Dan Klein
46
In the documents labeled as spam, occurrence
percentage of each word (e.g. number of times
"the" occurred / total number of words).
Slide from Dan Klein
47
In the documents labeled as ham, occurrence
percentage of each word (e.g. number of times
"the" occurred / total number of words).
Slide from Dan Klein
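A rough sketch of this estimation step, using tiny invented documents rather than the corpus on the slide; real systems would also smooth these estimates to handle unseen words.

from collections import Counter

# Toy labeled training documents (invented for illustration).
spam_docs = [["win", "money", "now"], ["free", "money", "offer"]]
ham_docs  = [["meeting", "at", "noon"], ["lunch", "at", "the", "cafe"]]

def word_probs(docs):
    """P(word | class): occurrences of each word / total words in that class."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

p_word_given_spam = word_probs(spam_docs)
p_word_given_ham  = word_probs(ham_docs)
print(p_word_given_spam["money"])  # 2 / 6 = 0.333...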
48
Classification
The class that maximizes P(C) ∏_i P(Fi | C)
49
Classification
  • In practice
  • Multiplying lots of small probabilities can
    result in floating point underflow

50
Classification
  • In practice
  • Multiplying lots of small probabilities can
    result in floating point underflow
  • Since log(xy) = log(x) + log(y), we can sum log
    probabilities instead of multiplying
    probabilities.

51
Classification
  • In practice
  • Multiplying lots of small probabilities can
    result in floating point underflow
  • Since log(xy) = log(x) + log(y), we can sum log
    probabilities instead of multiplying
    probabilities.
  • Since log is a monotonic function, the class with
    the highest score does not change.

52
Classification
  • In practice
  • Multiplying lots of small probabilities can
    result in floating point underflow
  • Since log(xy) = log(x) + log(y), we can sum log
    probabilities instead of multiplying
    probabilities.
  • Since log is a monotonic function, the class with
    the highest score does not change.
  • So, what we usually compute in practice is
    C* = argmax_C [ log P(C) + Σ_i log P(Fi | C) ]
    (a small sketch follows below)
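A minimal sketch of that computation on the same kind of toy spam/ham data; add-one smoothing is included here so that unseen words do not produce log(0), and the data and class names are invented.

import math
from collections import Counter

def train(docs_by_class):
    """Estimate log P(C) and smoothed log P(word | C) from labeled documents."""
    total_docs = sum(len(d) for d in docs_by_class.values())
    vocab = {w for docs in docs_by_class.values() for doc in docs for w in doc}
    model = {}
    for c, docs in docs_by_class.items():
        counts = Counter(w for doc in docs for w in doc)
        denom = sum(counts.values()) + len(vocab)          # add-one smoothing
        model[c] = (math.log(len(docs) / total_docs),
                    {w: math.log((counts[w] + 1) / denom) for w in vocab})
    return model

def classify(model, doc):
    """argmax over classes of log P(C) + sum_i log P(F_i | C)."""
    def score(c):
        log_prior, log_likelihood = model[c]
        # out-of-vocabulary words are simply skipped (contribute 0)
        return log_prior + sum(log_likelihood.get(w, 0.0) for w in doc)
    return max(model, key=score)

model = train({"spam": [["win", "money", "now"]], "ham": [["meeting", "at", "noon"]]})
print(classify(model, ["free", "money"]))  # "spam"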

53
Naïve Bayes for modeling text/metadata topics
54
Harvesting Image Databases from the Web -
Schroff, F., Criminisi, A. and Zisserman, A.
  • Download images from the web via a search query
    (e.g. penguin).
  • Re-rank images using a naïve Bayes model trained
    on text surrounding the images and meta-data
    features (image alt tag, image title tag, image
    filename).
  • Top ranked images used to train an SVM classifier
    to further improve ranking.

55
Results
56
Results
57
Naive Bayes is Not So Naive
  • Naïve Bayes: First and Second place in the
    KDD-CUP 97 competition, among 16 (then) state of
    the art algorithms
  • Goal: Financial services industry direct mail
    response prediction model - predict if the
    recipient of mail will actually respond to the
    advertisement. 750,000 records.
  • Robust to Irrelevant Features - irrelevant
    features cancel each other without affecting
    results
  • Very good in domains with many equally important
    features
  • A good dependable baseline for text
    classification (but not the best)!
  • Optimal if the Independence Assumptions hold -
    if the assumed independence is correct, then it
    is the Bayes Optimal Classifier for the problem
  • Very Fast - learning with one pass over the
    data; testing linear in the number of attributes
    and document collection size
  • Low storage requirements

Slide from Mitch Marcus
58
Naïve Bayes on images
59
Visual Categorization with Bags of Keypoints -
Gabriella Csurka, Christopher R. Dance, Lixin Fan,
Jutta Willamowski, Cédric Bray
60
Method
  • Steps
  • Detect and describe image patches
  • Assign patch descriptors to a set of
    predetermined clusters (a vocabulary) with a
    vector quantization algorithm
  • Construct a bag of keypoints, which counts the
    number of patches assigned to each cluster
  • Apply a multi-class classifier (naïve Bayes),
    treating the bag of keypoints as the feature
    vector, and thus determine which category or
    categories to assign to the image (a rough
    sketch of this pipeline follows below).
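A rough numpy sketch of steps 2 and 3 of this pipeline; the descriptors and vocabulary below are random placeholders standing in for detected patch descriptors and k-means cluster centers.

import numpy as np

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 128))   # placeholder patch descriptors for one image
vocabulary  = rng.normal(size=(50, 128))    # placeholder visual-word cluster centers

# Vector quantization: assign each descriptor to its nearest cluster center.
dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
assignments = dists.argmin(axis=1)

# Bag of keypoints: count how many patches fell into each visual word.
bag_of_keypoints = np.bincount(assignments, minlength=len(vocabulary))
print(bag_of_keypoints.shape)  # (50,) feature vector fed to the classifier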

61
Naïve Bayes
C - Class, F - Features
We only specify (parameters):
P(C) - prior over class labels
P(F | C) - how each feature depends on the class
62
Naive Bayes Parameters
  • Problem: categorize images as one of 7 object
    classes using a Naïve Bayes classifier
  • Classes: object categories (face, car, bicycle,
    etc.)
  • Features: images represented as a histogram
    whose bins are the cluster centers, i.e. the
    visual word vocabulary. Features are vocabulary
    counts.
  • P(C) treated as uniform.
  • P(F | C) learned from training data - images
    labeled with category.

63
Results
64
Salient Object Localization - Berg & Berg
Class-independent model to predict the saliency of
a given fg/bg division: a Naïve Bayes model
65
Perceptual contrast cues: texture, focus,
saturation, hue, value
66
Spatial Cues
Object size and location
67
Naïve Bayes
C - Class, F - Features
We only specify (parameters):
P(C) - prior over class labels
P(F | C) - how each feature depends on the class
68
Naïve Bayes features
Classes: salient / not salient
Features: cues computed on foreground regions,
cues computed on background regions, and the
chi-square distance (contrast) between foreground
and background cues.
69
Naïve Bayes features
Parameters: the prior over classes, P(C), is
treated as uniform. P(F | C) is computed from
labeled training data (no overlap with test
categories).
70
Classification
  • For test images (of any category):

Compute the likelihood of a salient object over
all possible rectangular windows in the image.
Select the best region for each image.
Example image
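A rough sketch of that search; score is a hypothetical helper assumed to return the Naïve Bayes (log-)likelihood of saliency for the foreground/background split induced by a candidate window, and image is assumed to be a numpy-style array.

def best_salient_window(image, score, step=16, min_size=64):
    """Scan rectangular windows and keep the one the model scores most salient.

    score(image, window) is a hypothetical function returning the model's
    saliency log-likelihood for the fg/bg split induced by window.
    """
    height, width = image.shape[:2]
    best, best_score = None, float("-inf")
    for top in range(0, height - min_size, step):
        for left in range(0, width - min_size, step):
            for bottom in range(top + min_size, height, step):
                for right in range(left + min_size, width, step):
                    s = score(image, (top, left, bottom, right))
                    if s > best_score:
                        best, best_score = (top, left, bottom, right), s
    return best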
71
Talk Outline
  • 1. Quick introduction to graphical models
  • 2. Bag of words models
  • - What are they?
  • - Examples: Naïve Bayes, pLSA, LDA

72
pLSA
73
pLSA
74
Joint Probability
P(w, d) = P(d) Σ_z P(w | z) P(z | d)
Marginalizing over topics determines the
conditional probability:
P(w | d) = Σ_z P(w | z) P(z | d)
75
Fitting the model
Need to:
  - Determine the topic vectors common to all
    documents.
  - Determine the mixture components specific to
    each document.
Goal: a model that gives high probability to the
words that appear in the corpus. Maximum
likelihood estimation of the parameters is
obtained by maximizing the objective function:
L = Σ_d Σ_w n(w, d) log P(w, d)
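A compact sketch of EM for this objective under the asymmetric parameterization used above; the document-word count matrix is a random placeholder, and plsa_em is a hypothetical helper name, not code from the paper.

import numpy as np

def plsa_em(counts, n_topics, n_iters=50, seed=0):
    """Fit P(w|z) and P(z|d) by EM on a document-word count matrix (D x V)."""
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    p_w_given_z = rng.random((n_topics, V)); p_w_given_z /= p_w_given_z.sum(1, keepdims=True)
    p_z_given_d = rng.random((D, n_topics)); p_z_given_d /= p_z_given_d.sum(1, keepdims=True)

    for _ in range(n_iters):
        # E-step: responsibilities P(z | d, w) for every (d, w) pair.
        resp = p_z_given_d[:, :, None] * p_w_given_z[None, :, :]   # D x K x V
        resp /= resp.sum(axis=1, keepdims=True) + 1e-12
        weighted = counts[:, None, :] * resp                       # n(d, w) P(z | d, w)
        # M-step: re-estimate both sets of multinomial parameters.
        p_w_given_z = weighted.sum(axis=0)
        p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)
        p_z_given_d = weighted.sum(axis=2)
        p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)
    return p_w_given_z, p_z_given_d

counts = np.random.default_rng(1).integers(0, 5, size=(20, 100))   # placeholder corpus
topics, mixtures = plsa_em(counts, n_topics=4)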
76
pLSA on images
77
Discovering objects and their location in images -
Josef Sivic, Bryan C. Russell, Alexei A. Efros,
Andrew Zisserman, William T. Freeman
Documents = images. Words = visual words (vector
quantized SIFT descriptors). Topics = object
categories. Images are modeled as a mixture of
topics (objects).
78
Goals
  • They investigate three areas
  • (i) topic discovery, where categories are
    discovered by pLSA clustering on all available
    images.
  • (ii) classification of unseen images, where
    topics corresponding to object categories are
    learnt on one set of images, and then used to
    determine the object categories present in
    another set.
  • (iii) object detection, where you want to
    determine the location and approximate
    segmentation of object(s) in each image.

79
(i) Topic Discovery
Most likely words for 4 learnt topics (face,
motorbike, airplane, car)
80
(ii) Image Classification
Confusion table for unseen test images against
pLSA trained on images containing four object
categories, but no background images.
81
(ii) Image Classification
Confusion table for unseen test images against
pLSA trained on images containing four object
categories, and background images. Performance is
not quite as good.
82
(iii) Topic Segmentation
83
(iii) Topic Segmentation
84
(iii) Topic Segmentation
85
Talk Outline
  • 1. Quick introduction to graphical models
  • 2. Bag of words models
  • - What are they?
  • - Examples: Naïve Bayes, pLSA, LDA

86
LDA - David M. Blei, Andrew Y. Ng, Michael Jordan
87
LDA
Per-document topic proportions
Per-word topic assignment
Observed word
88
LDA
pLSA
Per-document topic proportions
Per-word topic assignment
Observed word
89
LDA
Per-document topic proportions
Per-word topic assignment
Observed word
Dirichlet parameter
90
LDA
topics
Dirichlet parameter
Per-document topic proportions
Per-word topic assignment
Observed word
91
Generating Documents
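A small sketch of the generative process depicted here, using the standard LDA quantities (Dirichlet parameter, per-document topic proportions, per-word topic assignments, observed words); the vocabulary and parameter values are placeholders.

import numpy as np

rng = np.random.default_rng(0)
vocab = ["frog", "pond", "green", "car", "road", "engine"]   # toy vocabulary
alpha = np.array([0.5, 0.5])                                 # Dirichlet parameter
beta = np.array([[0.4, 0.3, 0.3, 0.0, 0.0, 0.0],             # topic 0: word distribution
                 [0.0, 0.0, 0.0, 0.4, 0.3, 0.3]])            # topic 1: word distribution

def generate_document(n_words=8):
    theta = rng.dirichlet(alpha)                 # per-document topic proportions
    words = []
    for _ in range(n_words):
        z = rng.choice(len(alpha), p=theta)      # per-word topic assignment
        w = rng.choice(len(vocab), p=beta[z])    # observed word drawn from topic z
        words.append(vocab[w])
    return words

print(generate_document())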
92
Joint Distribution
p(θ, z, w | α, β) = p(θ | α) ∏_n p(z_n | θ) p(w_n | z_n, β)
93
(No Transcript)
94
LDA on text
Topic discovery from a text corpus. Highly ranked
words for 4 topics.
95
LDA in Animals on the Web - Tamara L. Berg, David
Forsyth
96
Animals on the Web Outline
Harvest pictures of animals from the web using
Google text search. Select visual exemplars
using text-based information (LDA). Use vision
and text cues to extend to similar images.
97
Text Model
Run Latent Dirichlet Allocation (LDA) on the words
in collected web pages to discover 10 latent
topics for each category. Each topic defines a
distribution over words. Select the 50 most
likely words for each topic.
Example: Frog Topics
1.) frog frogs water tree toad leopard green
southern music king irish eggs folk princess
river ball range eyes game species legs golden
bullfrog session head spring book deep spotted de
am free mouse information round poison yellow
upon collection nature paper pond re lived center
talk buy arrow common prince
2.) frog information january links common red
transparent music king water hop tree pictures
pond green people available book call press toad
funny pottery toads section eggs bullet photo
nature march movies commercial november re clear
eyed survey link news boston list frogs bull
sites butterfly court legs type dot blue
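Selecting the most likely words per topic is just a sort over each topic's word distribution; a tiny sketch, assuming a fitted topic-word probability matrix and an aligned vocabulary list (both hypothetical here).

import numpy as np

def top_words(topic_word, vocab, k=50):
    """Return the k most probable words for each topic.

    topic_word: array of shape (n_topics, vocab_size), rows sum to 1.
    vocab: list of words aligned with the columns of topic_word.
    """
    return [[vocab[i] for i in np.argsort(row)[::-1][:k]] for row in topic_word]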
98
Select Exemplars
Rank images according to whether they have these
likely words near the image in the associated
page. Select up to 30 images per topic as
exemplars.
2.) frog information january links common red
transparent music king water hop tree pictures
pond green people available book call press ...
1.) frog frogs water tree toad leopard green
southern music king irish eggs folk princess
river ball range eyes game species legs golden
bullfrog session head ...
99
Extensions to LDA for pictures
100
A Bayesian Hierarchical Model for Learning
Natural Scene Categories - Fei-Fei Li, Pietro Perona
An unsupervised approach to learn and recognize
natural scene categories. A scene is represented
by a collection of local regions. Each region is
represented as part of a theme (e.g. rock, grass,
etc.) learned from data.
101
Generating Scenes
  • 1.) Choose a category label (e.g. mountain
    scene).
  • 2.) Given the mountain class, draw a probability
    vector that will determine what intermediate
    theme(s) (grass, rock, etc.) to select while
    generating each patch of the scene.
  • 3.) For creating each patch in the image, first
    determine a particular theme out of the mixture
    of possible themes, and then draw a codeword
    given this theme. For example, if a rock theme
    is selected, this will in turn privilege some
    codewords that occur more frequently in rocks
    (e.g. slanted lines).
  • 4.) Repeat the process of drawing both the theme
    and codeword many times, eventually forming an
    entire bag of patches that would construct a
    scene of mountains.

102
Results: modeling themes
Left - distribution of the 40 intermediate
themes. Right - distribution of codewords as
well as the appearance of 10 codewords selected
from the top 20 most likely codewords for this
category model.
103
Results: modeling themes
Left - distribution of the 40 intermediate
themes. Right - distribution of codewords as
well as the appearance of 10 codewords selected
from the top 20 most likely codewords for this
category model.
104
Results: Scene Classification
correct
incorrect
105
Results: Scene Classification
correct
incorrect
106
LDA for words and pictures
107
Matching Words and Pictures - Kobus Barnard, Pinar
Duygulu, David Forsyth, Nando de Freitas, David
Blei and Michael Jordan
  • Present a multi-modal extension to mixture of
    latent Dirichlet allocation (MoM-LDA).
  • Apply the model to predicting words associated
    with whole images (auto-annotation) and
    corresponding to particular image regions (region
    naming).

108
MoM-LDA
109
Results
110
Results
111
Results
112
(No Transcript)
113
(No Transcript)
114
(No Transcript)
115
Generative vs Discriminative
  • Generative version - build a model that generates
    images containing monkeys and images not
    containing monkeys

P(image | monkey)
P(image | not monkey)