1
Introduction to Statistical Modeling
  • Rong Jin

2
Why Statistical Modeling?
  • Vector space model for information retrieval
  • Both documents and queries are vectors in the term space
  • Relevance is measured by the similarity between the document vectors and the query vector
  • Many problems with the vector space model
  • Ad-hoc term weighting schemes
  • Ad-hoc basis vectors
  • Ad-hoc similarity measurement
  • We need something that is much more principled!

3
A Simple Example (I)
  • Suppose you have three coins: C1, C2, C3
  • Alex picked one of the coins and flipped it six times.
  • You didn't see which coin he picked, but you observed the results of the flips:
  • t, h, t, h, t, t
  • Question: how do we guess which coin Alex chose?

4
A Simple Example (II)
  • You experimented with each of the three coins, say 6 flips each:
  • C1: h, h, h, t, h, h
  • C2: t, t, h, t, t, t
  • C3: t, h, t, t, t, h
  • Given: t, h, t, h, t, t
  • Now, which one do you think Alex chose?

5
A Simple Example (III)
  • q: t, h, t, h, t, t → bias bq = 1/3
  • C1: h, h, h, t, h, h → bias b1 = 5/6
  • C2: t, t, h, t, t, t → bias b2 = 1/6
  • C3: t, h, t, t, t, h → bias b3 = 1/3
  • So, which coin do you think Alex selected?
  • A more principled approach:
  • Compute the likelihood p(q|Ci) for each coin

6
A Simple Example (IV)
  • p(q|C1) = p(t, h, t, h, t, t | C1)
  • = p(t|C1) p(h|C1) p(t|C1) p(h|C1) p(t|C1) p(t|C1)
  • = 1/6 × 5/6 × 1/6 × 5/6 × 1/6 × 1/6 ≈ 5.3×10⁻⁴
  • Compute p(q|C2) and p(q|C3)
  • Which coin has the largest likelihood?

7
A Simple Example (IV)
  • p(q|C1) = p(t, h, t, h, t, t | C1)
  • = p(t|C1) p(h|C1) p(t|C1) p(h|C1) p(t|C1) p(t|C1)
  • = 1/6 × 5/6 × 1/6 × 5/6 × 1/6 × 1/6 ≈ 5.3×10⁻⁴
  • Compute p(q|C2) and p(q|C3)
  • p(q|C2) ≈ 0.013, p(q|C3) ≈ 0.02
  • Which coin has the largest likelihood?
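
A minimal sketch of this computation (not part of the original slides; the sequence strings are illustrative, with C1 taken at bias 5/6 as above): estimate each coin's bias by counting heads in its training flips, then score the query sequence under each coin.

    # Estimate each coin's bias (probability of heads) from its training flips,
    # then compute the likelihood of the query sequence under each coin.
    flips = {"C1": "hhhthh", "C2": "tthttt", "C3": "thttth"}
    q = "ththtt"

    def bias(seq):
        return seq.count("h") / len(seq)

    def likelihood(seq, b):
        # independent flips: p(h) = b, p(t) = 1 - b
        p = 1.0
        for outcome in seq:
            p *= b if outcome == "h" else 1.0 - b
        return p

    for coin, seq in flips.items():
        print(coin, round(likelihood(q, bias(seq)), 5))
    # C1: 0.00054, C2: 0.0134, C3: 0.02195 -> C3 has the largest likelihood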

8
An Information Retrieval View
  • Query (q): t, h, t, h, t, t
  • Doc1 (C1): h, h, h, t, h, h
  • Doc2 (C2): t, t, h, t, t, t
  • Doc3 (C3): t, h, t, t, t, h
  • Which document is ranked first if we use the
    vector space model?

10
An Information Retrieval View
  • Query (q): t, h, t, h, t, t
  • Doc1 (C1): h, h, h, t, h, h → sim(D1) = 1/3 × 5/6 + 2/3 × 1/6 ≈ 0.39
  • Doc2 (C2): t, t, h, t, t, t → sim(D2) = 1/3 × 1/6 + 2/3 × 5/6 ≈ 0.61
  • Doc3 (C3): t, h, t, t, t, h → sim(D3) = 1/3 × 1/3 + 2/3 × 2/3 ≈ 0.56
  • Which document is ranked first if we use the
    vector space model?
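
A small sketch of this vector-space ranking (not from the slides; the sequence strings are illustrative): represent the query and each document by their head/tail frequencies and rank by the dot product.

    # Represent each sequence by its (head frequency, tail frequency) vector
    # and rank documents by the dot product with the query vector.
    def freq_vector(seq):
        return (seq.count("h") / len(seq), seq.count("t") / len(seq))

    docs = {"Doc1": "hhhthh", "Doc2": "tthttt", "Doc3": "thttth"}
    qh, qt = freq_vector("ththtt")          # (1/3, 2/3)

    for name, seq in docs.items():
        dh, dt = freq_vector(seq)
        print(name, round(qh * dh + qt * dt, 2))
    # Doc1: 0.39, Doc2: 0.61, Doc3: 0.56 -> Doc2 ranks first,
    # even though the likelihood view favors C3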

11
A Simple Example Summary
(Diagram: the coin-guessing pipeline, with both estimation steps still marked "?")
12
A Simple Example Summary
Estimating likelihood p(q|bias)
Estimating bias for each coin
13
A Probabilistic Framework for Information Retrieval
Estimating likelihood p(q|θ)
Estimating some statistics θ for each document
14
A Probabilistic Framework for Information Retrieval
  • Three fundamental questions:
  • What statistics θ should be chosen to describe the characteristics of documents?
  • How do we estimate these statistics θ?
  • How do we compute the likelihood of generating queries given the statistics θ?

15
Unigram Language Model
  • Probabilities for single words: p(w)
  • θ = {p(w)} for every word w in the vocabulary V
  • Estimating a unigram language model:
  • Simple counting
  • Given a document d, count the term frequency c(w,d) for each word w; then p(w) = c(w,d)/|d|
  • How do we estimate the likelihood p(q|θ)?
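
A minimal sketch of this counting estimate and the resulting query likelihood (not from the slides; the example document and queries are made up):

    # Maximum-likelihood unigram model: p(w) = c(w,d) / |d|,
    # then score a query as the product of its word probabilities.
    from collections import Counter

    def unigram_lm(doc_words):
        counts = Counter(doc_words)
        return {w: c / len(doc_words) for w, c in counts.items()}

    def query_likelihood(query_words, lm):
        p = 1.0
        for w in query_words:
            p *= lm.get(w, 0.0)   # unseen words get probability 0 (the sparse-data problem)
        return p

    doc = "kerry met bush in the debate".split()
    lm = unigram_lm(doc)
    print(query_likelihood("bush kerry".split(), lm))       # 1/6 * 1/6, nonzero
    print(query_likelihood("president kerry".split(), lm))  # 0.0: "president" is not in d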

16
Estimate p(q|θ)
  • q = w1, w2, …, wk
  • Similar to the example of flipping coins
  • E.g., q = bush, kerry
  • θd: p(bush) = 0.001, p(kerry) = 0.02
  • p(q|θd) = 0.001 × 0.02 = 2×10⁻⁵
  • What if the document didn't mention the word bush, but instead used the phrase "president of the United States"?

18
Illustration of Language Models for Information Retrieval
Estimating likelihood p(q|θ) = p(h)²p(t)⁴
θ2: p(h) = 1/2, p(t) = 1/2
θ1: p(h) = 1/3, p(t) = 2/3
Estimating language models by counting
19
A Simple Example Summary
Estimating likelihood p(q|θ) = p(h)²p(t)⁴
θ2: p(h) = 1/2, p(t) = 1/2
θ1: p(h) = 1/3, p(t) = 2/3
Estimating language models by counting
20
A Simple Example Summary
Estimating likelihood p(q|θ) = p(h)²p(t)⁴
θ2: p(h) = 1/6, p(t) = 5/6
θ3: p(h) = 1/3, p(t) = 2/3
Estimating language models by counting
Problems?
21
Problems With Unigram LM
  • Unigram probabilities
  • Insufficient for representing true documents
  • Simple counting for estimating unigram
    probabilities
  • It does not account for variance in documents
  • If you ask the same person to write the same
    story twice, it will be different
  • Most words will have zero probabilities
  • Sparse data problem

22
Sparse Data Problems
  • Shrinkage
  • Maximum a posteriori (MAP) estimation
  • Bayesian approach

23
Shrinkage: Jelinek-Mercer Smoothing
  • Linearly interpolate between the document language model and the collection language model

0 < λ < 1 is a smoothing parameter
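
The interpolation formula itself did not survive the transcript; the standard Jelinek-Mercer form this slide describes is p_λ(w|d) = λ·p(w|d) + (1−λ)·p(w|C), where p(w|d) is the counting estimate from the document and p(w|C) is the collection language model. A minimal sketch (function and variable names are illustrative, building on the unigram models above):

    # Jelinek-Mercer smoothing: mix the document model with the collection model
    # so that query words unseen in d still get a nonzero probability.
    def jm_probability(w, doc_lm, coll_lm, lam=0.5):
        # lam is the smoothing parameter, 0 < lam < 1 (the value here is arbitrary)
        return lam * doc_lm.get(w, 0.0) + (1.0 - lam) * coll_lm.get(w, 0.0)

    def smoothed_query_likelihood(query_words, doc_lm, coll_lm, lam=0.5):
        p = 1.0
        for w in query_words:
            p *= jm_probability(w, doc_lm, coll_lm, lam)
        return p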
24
Smoothing and TF-IDF Weighting
Are they totally unrelated?
25
Smoothing and TF-IDF Weighting
One part of the smoothed query likelihood is similar to TF-IDF weighting
The other part is irrelevant to the documents
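
The equation behind these labels is missing from the transcript. A sketch of the standard decomposition under Jelinek-Mercer smoothing (not copied from the slide):

    log p(q|d) = Σ_{w in q} log [ λ p(w|d) + (1−λ) p(w|C) ]
               = Σ_{w in q∩d} log [ 1 + λ p(w|d) / ((1−λ) p(w|C)) ] + Σ_{w in q} log [ (1−λ) p(w|C) ]

The first sum grows with the frequency of w in d (TF-like) and with the rarity of w in the collection (IDF-like), so it behaves like TF-IDF weighting; the second sum does not depend on the document and therefore does not affect the ranking.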