1
Introduction to Statistical Modeling
  • Rong Jin

2
Why Statistical Modeling?
  • Vector space model for information retrieval
  • Both documents and queries are vectors in the term space
  • Relevance is measured by the similarity between the document vectors and the query vector
  • Many problems with the vector space model
  • Ad-hoc term weighting schemes
  • Ad-hoc basis vectors
  • Ad-hoc similarity measurement
  • We need something that is much more principled!

3
A Simple Example (I)
  • Suppose you have three coins: C1, C2, C3
  • Alex picked one of the coins and flipped it six times.
  • You didn't see which coin he picked, but you observed the results of the flips:
  • t, h, t, h, t, t
  • Question: how do we guess which coin Alex chose?

4
A Simple Example (II)
  • You experimented with each of the three coins, say 6 flips each:
  • C1: h, h, h, t, h, h
  • C2: t, t, h, t, t, t
  • C3: t, h, t, t, t, h
  • Given: t, h, t, h, t, t
  • Now, which one do you think Alex chose?

5
A Simple Example (III)
  • q: t, h, t, h, t, t → bias bq = 1/3
  • C1: h, h, h, t, h, h → bias b1 = 5/6
  • C2: t, t, h, t, t, t → bias b2 = 1/6
  • C3: t, h, t, t, t, h → bias b3 = 1/3
  • So, which coin do you think Alex selected?
  • A more principled approach:
  • Compute the likelihood p(q|Ci) for each coin

6
A Simple Example (IV)
  • p(q|C1) = p(t, h, t, h, t, t | C1)
  • = p(t|C1) p(h|C1) p(t|C1) p(h|C1) p(t|C1) p(t|C1)
  • = 1/6 × 5/6 × 1/6 × 5/6 × 1/6 × 1/6 ≈ 5.3×10⁻⁴
  • Compute p(q|C2) and p(q|C3)
  • Which coin has the largest likelihood?

7
A Simple Example (IV)
  • p(q|C1) = p(t, h, t, h, t, t | C1)
  • = p(t|C1) p(h|C1) p(t|C1) p(h|C1) p(t|C1) p(t|C1)
  • = 1/6 × 5/6 × 1/6 × 5/6 × 1/6 × 1/6 ≈ 5.3×10⁻⁴
  • Compute p(q|C2) and p(q|C3)
  • p(q|C2) ≈ 0.013, p(q|C3) ≈ 0.02
  • Which coin has the largest likelihood?
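
A minimal sketch of this computation (not part of the original slides; the sequence strings are illustrative, with C1 taken at bias 5/6 as above): estimate each coin's bias by counting heads in its training flips, then score the query sequence under each coin.

    # Estimate each coin's bias (probability of heads) from its training flips,
    # then compute the likelihood of the query sequence under each coin.
    flips = {"C1": "hhhthh", "C2": "tthttt", "C3": "thttth"}
    q = "ththtt"

    def bias(seq):
        return seq.count("h") / len(seq)

    def likelihood(seq, b):
        # independent flips: p(h) = b, p(t) = 1 - b
        p = 1.0
        for outcome in seq:
            p *= b if outcome == "h" else 1.0 - b
        return p

    for coin, seq in flips.items():
        print(coin, round(likelihood(q, bias(seq)), 5))
    # C1: 0.00054, C2: 0.0134, C3: 0.02195 -> C3 has the largest likelihood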

8
An Information Retrieval View
  • Query (q): t, h, t, h, t, t
  • Doc1 (C1): h, h, h, t, h, h
  • Doc2 (C2): t, t, h, t, t, t
  • Doc3 (C3): t, h, t, t, t, h
  • Which document is ranked first if we use the
    vector space model?

10
An Information Retrieval View
  • Query (q): t, h, t, h, t, t
  • Doc1 (C1): h, h, h, t, h, h → sim(D1) = 1/3 × 5/6 + 2/3 × 1/6 ≈ 0.39
  • Doc2 (C2): t, t, h, t, t, t → sim(D2) = 1/3 × 1/6 + 2/3 × 5/6 ≈ 0.61
  • Doc3 (C3): t, h, t, t, t, h → sim(D3) = 1/3 × 1/3 + 2/3 × 2/3 ≈ 0.56
  • Which document is ranked first if we use the
    vector space model?
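
A small sketch of this vector-space ranking (not from the slides; the sequence strings are illustrative): represent the query and each document by their head/tail frequencies and rank by the dot product.

    # Represent each sequence by its (head frequency, tail frequency) vector
    # and rank documents by the dot product with the query vector.
    def freq_vector(seq):
        return (seq.count("h") / len(seq), seq.count("t") / len(seq))

    docs = {"Doc1": "hhhthh", "Doc2": "tthttt", "Doc3": "thttth"}
    qh, qt = freq_vector("ththtt")          # (1/3, 2/3)

    for name, seq in docs.items():
        dh, dt = freq_vector(seq)
        print(name, round(qh * dh + qt * dt, 2))
    # Doc1: 0.39, Doc2: 0.61, Doc3: 0.56 -> Doc2 ranks first,
    # even though the likelihood view favors C3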

11
A Simple Example Summary
(Diagram: the coin-guessing pipeline, with both estimation steps still marked "?")
12
A Simple Example Summary
Estimating likelihood p(q|bias)
Estimating bias for each coin
13
A Probabilistic Framework for Information Retrieval
Estimating likelihood p(q|θ)
Estimating some statistics θ for each document
14
A Probabilistic Framework for Information Retrieval
  • Three fundamental questions:
  • What statistics θ should be chosen to describe the characteristics of documents?
  • How do we estimate these statistics θ?
  • How do we compute the likelihood of generating queries given the statistics θ?

15
Unigram Language Model
  • Probabilities for single words: p(w)
  • θ = {p(w)} for every word w in the vocabulary V
  • Estimating a unigram language model:
  • Simple counting
  • Given a document d, count the term frequency c(w,d) for each word w; then p(w) = c(w,d)/|d|
  • How do we estimate the likelihood p(q|θ)?
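
A minimal sketch of this counting estimate and the resulting query likelihood (not from the slides; the example document and queries are made up):

    # Maximum-likelihood unigram model: p(w) = c(w,d) / |d|,
    # then score a query as the product of its word probabilities.
    from collections import Counter

    def unigram_lm(doc_words):
        counts = Counter(doc_words)
        return {w: c / len(doc_words) for w, c in counts.items()}

    def query_likelihood(query_words, lm):
        p = 1.0
        for w in query_words:
            p *= lm.get(w, 0.0)   # unseen words get probability 0 (the sparse-data problem)
        return p

    doc = "kerry met bush in the debate".split()
    lm = unigram_lm(doc)
    print(query_likelihood("bush kerry".split(), lm))       # 1/6 * 1/6, nonzero
    print(query_likelihood("president kerry".split(), lm))  # 0.0: "president" is not in d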

16
Estimate p(q|θ)
  • q = w1, w2, …, wk
  • Similar to the example of flipping coins
  • E.g., q = bush, kerry
  • θd: p(bush) = 0.001, p(kerry) = 0.02
  • p(q|θd) = 0.001 × 0.02 = 2×10⁻⁵
  • What if the document didn't mention the word bush, but instead used the phrase "president of the United States"?

18
Illustration of Language Models for Information Retrieval
Estimating likelihood p(q|θ) = p(h)²p(t)⁴
θ2: p(h) = 1/2, p(t) = 1/2
θ1: p(h) = 1/3, p(t) = 2/3
Estimating language models by counting
19
A Simple Example Summary
Estimating likelihood p(q|θ) = p(h)²p(t)⁴
θ2: p(h) = 1/2, p(t) = 1/2
θ1: p(h) = 1/3, p(t) = 2/3
Estimating language models by counting
20
A Simple Example Summary
Estimating likelihood p(q|θ) = p(h)²p(t)⁴
θ2: p(h) = 1/6, p(t) = 5/6
θ3: p(h) = 1/3, p(t) = 2/3
Estimating language models by counting
Problems?
21
Problems With Unigram LM
  • Unigram probabilities
  • Insufficient for representing true documents
  • Simple counting for estimating unigram
    probabilities
  • It does not account for variance in documents
  • If you ask the same person to write the same
    story twice, it will be different
  • Most words will have zero probabilities
  • Sparse data problem

22
Sparse Data Problems
  • Shrinkage
  • Maximum a posteriori (MAP) estimation
  • Bayesian approach

23
Shrinkage: Jelinek-Mercer Smoothing
  • Linearly interpolate between the document language model and the collection language model

0 < λ < 1 is a smoothing parameter
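
The interpolation formula itself did not survive the transcript; the standard Jelinek-Mercer form this slide describes is p_λ(w|d) = λ·p(w|d) + (1−λ)·p(w|C), where p(w|d) is the counting estimate from the document and p(w|C) is the collection language model. A minimal sketch (function and variable names are illustrative, building on the unigram models above):

    # Jelinek-Mercer smoothing: mix the document model with the collection model
    # so that query words unseen in d still get a nonzero probability.
    def jm_probability(w, doc_lm, coll_lm, lam=0.5):
        # lam is the smoothing parameter, 0 < lam < 1 (the value here is arbitrary)
        return lam * doc_lm.get(w, 0.0) + (1.0 - lam) * coll_lm.get(w, 0.0)

    def smoothed_query_likelihood(query_words, doc_lm, coll_lm, lam=0.5):
        p = 1.0
        for w in query_words:
            p *= jm_probability(w, doc_lm, coll_lm, lam)
        return p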
24
Smoothing and TF-IDF Weighting
Are they totally unrelated?
25
Smoothing and TF-IDF Weighting
One part of the smoothed query likelihood is similar to TF-IDF weighting
The other part is irrelevant to the documents
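
The equation behind these labels is missing from the transcript. A sketch of the standard decomposition under Jelinek-Mercer smoothing (not copied from the slide):

    log p(q|d) = Σ_{w in q} log [ λ p(w|d) + (1−λ) p(w|C) ]
               = Σ_{w in q∩d} log [ 1 + λ p(w|d) / ((1−λ) p(w|C)) ] + Σ_{w in q} log [ (1−λ) p(w|C) ]

The first sum grows with the frequency of w in d (TF-like) and with the rarity of w in the collection (IDF-like), so it behaves like TF-IDF weighting; the second sum does not depend on the document and therefore does not affect the ranking.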