1
Discrete Math CS 280
  • Prof. Bart Selman
  • selman@cs.cornell.edu
  • Module:
  • Probability --- Part b)
  • Bayes' Rule
  • Random Variables

2
Bayes' Theorem
  • How do we assess the probability that a particular event will occur
    on the basis of partial evidence?
  • Examples:
  • What is the likelihood that people who test positive for a particular
    disease (e.g., HIV) actually have the disease?
  • What is the probability that an e-mail message is spam?
  • Key idea: one should factor in additional information regarding the
    occurrence of events.

3
  • Assume that with respect to events F and E (E for Evidence):
  • We know P(F), the probability that event F occurs
    (e.g., the probability that an email message is spam;
    this is given by what fraction of email is spam).
  • We also know that event E has occurred
    (e.g., the email message contains the words "sale" and "bargain").
  • Therefore the conditional probability that F occurs given that E
    occurs, P(F|E), is a more realistic estimate that F occurs than P(F).
  • How do we compute P(F|E)?
  • E.g., based on P(F), P(E|F), and P(E|F^C).

4
[Diagram: Bayesian inference. An original belief (prior probability)
P(F) about a hypothesis/theory F is combined with evidence E to produce
a modified belief P(F|E).]
5
[Figure: two boxes of red and green balls, Box A (left) and Box B
(right).]
Experiment: Pick one box at random (p = 0.5) and then a ball at random
from that box. Assume you got a red ball. What's the probability that
it came from the left box?
Define E = you choose a red ball (therefore E^C = you choose a green
ball) and F = you choose the left box (therefore F^C = you choose the
right box).
We want to know P(F|E).
6
E = red ball; F = left box.
P(F|E) = ?
  • What we know:
  • P(E|F) = 7/9
  • P(E|F^C) = 3/7
  • Given that the boxes are selected at random, P(F) = P(F^C) = 1/2.
  • P(F|E) = P(E ∩ F)/P(E), so we need to compute P(E ∩ F) and P(E).
We know P(E|F) = P(E ∩ F)/P(F).
So, P(E ∩ F) = P(E|F) · P(F) = (7/9) · (1/2) = 7/18.
What about P(E)? Note that P(E) = P(E ∩ F) + P(E ∩ F^C). Why?
Note also that P(E ∩ F^C) = P(F^C) · P(E|F^C) = (1/2) · (3/7) = 3/14.
So, P(E) = P(E ∩ F) + P(E ∩ F^C) = 7/18 + 3/14 = 38/63.
And therefore P(F|E) = P(E ∩ F)/P(E) = (7/18) / (38/63) = 49/76 ≈ 0.645.
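To check the arithmetic, here is a minimal Python sketch (my addition,
not from the slides) that redoes this computation with exact fractions:

    from fractions import Fraction

    # Box example: P(E|F) = 7/9, P(E|F^C) = 3/7, P(F) = P(F^C) = 1/2.
    p_E_given_F    = Fraction(7, 9)   # red ball from the left box
    p_E_given_notF = Fraction(3, 7)   # red ball from the right box
    p_F            = Fraction(1, 2)   # each box equally likely

    p_E_and_F    = p_E_given_F * p_F              # 7/18
    p_E_and_notF = p_E_given_notF * (1 - p_F)     # 3/14
    p_E          = p_E_and_F + p_E_and_notF       # 38/63

    p_F_given_E = p_E_and_F / p_E                 # 49/76
    print(p_F_given_E, float(p_F_given_E))        # 49/76 ≈ 0.645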
7
Original Belief: there is a 0.5 probability that you will pick the left
box (P(F)).
Concrete (new) Evidence: a red ball is picked (E).
Bayesian Inference:
Modified Belief: probability increased to ≈ 0.65 (P(F|E)).
8
Theorem (Bayes' Theorem): Suppose that E and F are events from a sample
space S such that P(E) ≠ 0 and P(F) ≠ 0. Then

    P(F|E) = P(E|F) P(F) / ( P(E|F) P(F) + P(E|F^C) P(F^C) )

Proof:
9
  • Example:
  • Suppose that 1 person in 100,000 has a particular rare disease. There
    is an accurate test for the disease: it is correct 99% of the time
    when given to someone with the disease, and it is correct 99.5% of
    the time when given to someone without the disease.
  • Find:
  • a) the probability that someone who tests positive has the disease;
  • b) the probability that someone who tests negative does not have the
    disease.

10
  • Solution:
  • a)

Always start by defining the events!
F = the person has the disease; E = the person tests positive.
P(F|E) = probability of having the disease, given a positive test.
P(F) = 1/100,000 = 0.00001; P(F^C) = 0.99999
P(E|F) = 0.99; P(E^C|F) = 0.01
P(E|F^C) = 0.005
Note: these are the probabilities most easily measured!
P(F|E) = P(E|F) P(F) / ( P(E|F) P(F) + P(E|F^C) P(F^C) )
       = (0.99)(0.00001) / ( (0.99)(0.00001) + (0.005)(0.99999) )
       ≈ 0.002
Only 0.2% of people who test positive actually have the disease!!!
11
  • b) F = the person has the disease;
  • E = the person tests positive.
  • P(F^C|E^C) = probability of not having the disease, given a negative
    test.
  • P(F) = 1/100,000 = 0.00001; P(F^C) = 0.99999
  • P(E|F) = 0.99; P(E^C|F) = 0.01
  • P(E|F^C) = 0.005; P(E^C|F^C) = 0.995
  • P(F^C|E^C) = P(E^C|F^C) P(F^C) / ( P(E^C|F^C) P(F^C) + P(E^C|F) P(F) )
    = (0.995)(0.99999) / ( (0.995)(0.99999) + (0.01)(0.00001) )
    ≈ 0.9999999

That's pretty good!
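The same pattern in code --- a small Python sketch (not from the
slides; the variable names are mine) computing both answers:

    # Rare-disease test: Bayes' theorem for P(disease | positive)
    # and P(no disease | negative).
    p_F = 1e-5              # P(F): prior probability of the disease
    p_E_given_F = 0.99      # P(positive | disease)
    p_E_given_notF = 0.005  # P(positive | no disease)

    p_notF = 1 - p_F
    # a) P(F | E)
    p_F_given_E = (p_E_given_F * p_F) / (
        p_E_given_F * p_F + p_E_given_notF * p_notF)
    print(f"P(disease | positive) = {p_F_given_E:.5f}")   # ≈ 0.00198

    # b) P(F^C | E^C)
    p_notE_given_notF = 1 - p_E_given_notF   # 0.995
    p_notE_given_F = 1 - p_E_given_F         # 0.01
    p_notF_given_notE = (p_notE_given_notF * p_notF) / (
        p_notE_given_notF * p_notF + p_notE_given_F * p_F)
    print(f"P(no disease | negative) = {p_notF_given_notE:.7f}")  # ≈ 0.9999999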
12
Marbles
TOYS 'R' US sells two kinds of bags of marbles: (1) bags of all black
marbles, and (2) bags of mixed marbles in which 20% of the marbles are
black. The bags are opaque and wrapped in plastic, and I have no idea
which bag is more common. I buy a bag and figure there is a 50/50
chance that the bag I purchased contains all black marbles. A guess! I
pull a marble out of the bag and see that it is black. How should this
new evidence affect the 50/50 assessment I assigned to the probability
of my having purchased an all-black bag of marbles? (As in the previous
example.)
F = bag of all black marbles; F^C = bag with 20% black marbles;
E = a black marble is drawn.
13
Marbles
Posterior Belief: the probability that my bag of marbles is all black
is 0.833 = P(F|E).
Prior Belief: there is a 1/2 chance that I have an all-black bag of
marbles --- a guess (P(F)).
0.5 chance of an all-black (100%) marble bag.
0.5 chance of a 20%-black marble bag.
14
Marbles
Warning: correct but slightly informal! Instead of changing the prior,
we could consider a new experiment and evidence: drawing two marbles.
I put the marble back, shake the bag, and draw another marble. It is
black! What happens now that my new prior probability is 0.83?
Prior Belief: 0.83.
New Belief: 0.96.
0.17 chance of a 20%-black marble bag.
0.83 chance of an all-black (100%) marble bag.
Remember, I don't know which type of marble bag is most popular:
Wal-Mart may have 100 bags of mixed marbles on the shelf for every bag
of all black marbles. Bayes' Theorem doesn't tell me the probability of
my marble bag being all black; it only tells me how I should revise my
initial best guess based on the newly obtained information.
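A short Python sketch (my illustration; the update function below is a
hypothetical helper, not something from the slides) of this sequential
updating:

    # F = all-black bag; F^C = mixed bag with 20% black marbles.
    # Each draw (with replacement) of a black marble updates P(F).
    def update(prior, likelihood_F=1.0, likelihood_notF=0.2):
        """Posterior P(F | black marble drawn) from the current prior."""
        num = likelihood_F * prior
        return num / (num + likelihood_notF * (1 - prior))

    belief = 0.5                      # initial 50/50 guess
    for draw in (1, 2):
        belief = update(belief)
        print(f"after black draw {draw}: P(all-black bag) = {belief:.3f}")
    # after draw 1: ≈ 0.833; after draw 2: ≈ 0.962, the slides' 0.83 and 0.96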
15
Original Belief: I shrug my shoulders and guess that there is a 0.5
chance that my bag contains all black marbles.
Bayesian Inference:
Concrete Evidence: 1st black marble.
Modified Belief: probability increased to 0.83.
Bayesian Inference:
Concrete Evidence: 2nd black marble.
Modified Belief: probability increased to 0.96.
16
Generalized Bayes' Theorem
  • Suppose that E is an event from a sample space S and F1, F2, ..., Fn
    are mutually exclusive events whose union is S.
  • Assume that P(E) ≠ 0 and P(Fi) ≠ 0 for i = 1, 2, ..., n. Then

    P(Fj|E) = P(E|Fj) P(Fj) / ( P(E|F1) P(F1) + ... + P(E|Fn) P(Fn) )

The denominator is exactly P(E).
Compare with the two-event form of Bayes' theorem, where the sum has
just the two terms for F and F^C.
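As an illustration, a small Python helper (my sketch, not the slides'
notation) that applies the generalized theorem to any finite set of
hypotheses:

    # Generalized Bayes: posterior over mutually exclusive, exhaustive
    # hypotheses F1, ..., Fn given evidence E.
    def bayes(priors, likelihoods):
        """priors[i] = P(Fi); likelihoods[i] = P(E|Fi). Returns P(Fi|E)."""
        joint = [p * l for p, l in zip(priors, likelihoods)]
        p_E = sum(joint)                   # total probability of E
        return [j / p_E for j in joint]

    # Sanity check on the box example: F1 = left box, F2 = right box.
    print(bayes([0.5, 0.5], [7/9, 3/7]))   # first entry ≈ 0.645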
17
Bayesian Spam Filters
18
Applying Bayes' Theorem: SPAM or HAM?
  • Let our sample space or universe be the set of emails. (So, we're
    sampling from the space of possible emails.)
  • Let S be the event "a message is spam"; hence S^C is the event "a
    message is not spam".
  • Let E be the event "a message contains a word w".

Since we have no idea of the likelihood of SPAM, we assume
P(S) = P(S^C) = 1/2.
Can we do better?
19
Estimations
p(w) = (number of spam messages containing word w) /
       (total number of spam messages), our estimate of P(E|S).
q(w) = (number of non-spam messages containing word w) /
       (total number of non-spam messages), our estimate of P(E|S^C).
Note: these are estimates based on frequencies in samples.
20
Estimation Continued
So,

    P(S|E) = P(E|S) P(S) / ( P(E|S) P(S) + P(E|S^C) P(S^C) )

becomes

    r(w) = p(w) / ( p(w) + q(w) ),

our estimate of the probability that a message containing w is spam.
Note: P(S) = P(S^C) = 1/2 divides out.
So, a quite straightforward formula for our first Bayesian spam filter!
21
Spam based on single words?
  • Probabilities based on single words: Bad Idea!
  • False positives AND false negatives aplenty.
  • Instead, calculate based on n words, assuming that the events Ei are
    independent given S (and given S^C), and that P(S) = P(S^C).

Derivation: see Sect. 6.3.
22
Final Approximation

    r(w1, ..., wn) = [ p(w1) ··· p(wn) ] /
                     [ p(w1) ··· p(wn) + q(w1) ··· q(wn) ]

Compare to the single-word case: r(w) = p(w) / ( p(w) + q(w) ).
23
How do we use this?
  • The user must train the filter on messages in his/her inbox to
    estimate the probabilities.
  • The program or user must define a threshold probability r.
  • If r(w1, ..., wn) > r, the message is considered spam.
  • Gmail: trained on all users! (Note the "report spam" button.)
24
Example
  • Suppose the filter has the following data:
  • Threshold probability: 0.9
  • "Nigeria" occurs in 250 of 2000 spam messages.
  • "Nigeria" occurs in only 5 of 1000 non-spam messages.
  • Let's estimate the probability using the process we just defined.
25
Example Cont.
  • Step 1: Find the probability that a spam message contains the word
    "Nigeria":
  • p(Nigeria) = 250 / 2000 = 0.125
  • Step 2: Find the probability that a non-spam message contains the
    word "Nigeria":
  • q(Nigeria) = 5 / 1000 = 0.005
26
Example Cont.
  • Since we are assuming that an incoming message is equally likely to
    be spam or not spam, we can estimate the probability with this
    equation:

    r(Nigeria) = p(Nigeria) / ( p(Nigeria) + q(Nigeria) )
27
Example Cont.

    r(Nigeria) = 0.125 / ( 0.125 + 0.005 ) = 0.125 / 0.130 ≈ 0.962

  • Since r(Nigeria) is greater than the threshold of 0.9, we reject
    this message as spam.
28
Multiple Words
  • 2000 spam messages; 1000 real messages.
  • "Nigeria" appears in 400 spam messages.
  • "Nigeria" appears in 60 real messages.
  • "bank" appears in 200 spam and 25 real messages.
  • Threshold probability: 0.9
  • Let's calculate the probability that a message with "Nigeria" and
    "bank" is spam.
29
Example Cont.
  • Step 1: Find the probability that a spam message contains the word
    "Nigeria":
  • p(Nigeria) = 400 / 2000 = 0.2
  • Step 2: Find the probability that a non-spam message contains the
    word "Nigeria":
  • q(Nigeria) = 60 / 1000 = 0.06
  • Step 3: Find the probability that a spam message contains the word
    "bank":
  • p(bank) = 200 / 2000 = 0.1
  • Step 4: Find the probability that a non-spam message contains the
    word "bank":
  • q(bank) = 25 / 1000 = 0.025
30
Example Cont.
  • Using our approximation, we have:

    r(Nigeria, bank) = p(Nigeria) p(bank) /
                       ( p(Nigeria) p(bank) + q(Nigeria) q(bank) )
31
Example Cont.
  • Plugging in the numbers:

    r(Nigeria, bank) = (0.2)(0.1) / ( (0.2)(0.1) + (0.06)(0.025) )
                     = 0.02 / 0.0215 ≈ 0.930

  • Since 0.930 exceeds the threshold probability of 0.9, this message
    will be rejected as spam.
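The whole filter fits in a few lines. A Python sketch (my code; the
counts are the ones from this example) implementing r for one or more
words:

    # r(w1..wn) = Π p(wi) / (Π p(wi) + Π q(wi)), assuming independence
    # and P(S) = P(S^C) = 1/2.
    from math import prod

    # word -> (count in spam, count in ham), from the example slides
    counts = {"Nigeria": (400, 60), "bank": (200, 25)}
    n_spam, n_ham = 2000, 1000

    def r(words):
        p = prod(counts[w][0] / n_spam for w in words)  # Π p(wi)
        q = prod(counts[w][1] / n_ham for w in words)   # Π q(wi)
        return p / (p + q)

    print(round(r(["Nigeria"]), 3))          # single word: ≈ 0.769
    print(round(r(["Nigeria", "bank"]), 3))  # both words: ≈ 0.930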

Concludes Bayes Reasoning
32
Probability Paradox I
33
Magic Dice: Or How to Win Every Time!
  • a) You select any one of the four dice (A, B, C, or D).
  • b) I'll select another.
  • Both dice are thrown; the highest number wins the throw.
  • Do a series of 10 throws. The person with the most highest throws
    wins the series. (I.e., the die more likely to get a higher number
    wins.)
  • Claim: In a game of "Best of Ten Throws", I will almost certainly
    win --- no matter which die you pick!!
  • Why is this strange?
  • Say you pick die A. Let's assume die B is better. So, I pick B.
  • But then, in the next game, the next person picks B. Let's assume C
    is better. I'll select C.
  • The next person will pick C. I'll pick D.
  • The next person will pick D. Hmm...
  • I'll pick A and will win!!
  • A < B < C < D < A !! Failure of transitivity!

But could such a set of dice exist?
Surprisingly, yes!
34
Magic Dice
[Figure: the four dice A, B, C, D arranged in a cycle; each arrow points
from a die to the one it beats with probability 2/3.]
Prob(B wins over A) = 4/6 = 2/3 (i.e., Prob(A wins over B) = 1/3).
Prob(C wins over B) = 2/3, since 3/6 + (3/6)(2/6) = 4/6.
Prob(D wins over C) = 2/3, since 2/6 + (4/6)(1/2) = 4/6.
Prob(A wins over D) = 4/6 = 2/3.
A < B < C < D < A !!
However, transitivity does hold for the expected values of the dice
throws: E[B] < E[A] and E[C] < E[D], i.e., 16/6 < 18/6 and 18/6 < 20/6.
Non-transitive dice:
http://www.sciencenews.org/20020420/mathtrek.asp
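The original figure showed the face values, which are lost in this
transcript. The Python sketch below (mine) uses Efron's nontransitive
dice, relabeled so that the win probabilities and expected values match
the numbers above; if the deck used different faces, only the dice
table changes:

    from itertools import product
    from fractions import Fraction

    # Efron's dice, relabeled: E[A] = 18/6, E[B] = 16/6,
    # E[C] = 18/6, E[D] = 20/6, matching the slide.
    dice = {
        "A": [3, 3, 3, 3, 3, 3],
        "B": [4, 4, 4, 4, 0, 0],
        "C": [5, 5, 5, 1, 1, 1],
        "D": [6, 6, 2, 2, 2, 2],
    }

    def p_win(x, y):
        """Exact probability that die x rolls higher than die y."""
        wins = sum(a > b for a, b in product(dice[x], dice[y]))
        return Fraction(wins, 36)

    for x, y in [("B", "A"), ("C", "B"), ("D", "C"), ("A", "D")]:
        print(f"P({x} beats {y}) = {p_win(x, y)}")   # each is 2/3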
35
Random Variables and Distributions
36
Random Variables
For a given sample space S, a random variable (r.v.) is any real-valued
function on S; i.e., a random variable is a function that assigns a
real number to each possible outcome.
[Diagram: the random variable maps each outcome in the sample space to
a number.]
Suppose our experiment is a roll of 2 dice, so S is a set of pairs.
Example random variables:
X = sum of the two dice: X((2,3)) = 5.
Y = difference between the two dice: Y((2,3)) = 1.
Z = max of the two dice: Z((2,3)) = 3.
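In code, these random variables are literally just functions on S. A
tiny Python sketch (mine; Y is taken as the absolute difference, which
matches Y((2,3)) = 1):

    from itertools import product

    S = list(product(range(1, 7), repeat=2))  # sample space: pairs of dice

    X = lambda t: t[0] + t[1]          # sum of the two dice
    Y = lambda t: abs(t[0] - t[1])     # difference between the two dice
    Z = lambda t: max(t)               # max of the two dice

    print(X((2, 3)), Y((2, 3)), Z((2, 3)))   # 5 1 3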
37
Random variable
  • Suppose a coin is flipped three times. Let X(t) be the random
    variable that equals the number of heads that appear when t is the
    outcome.
  • X(HHH) = 3
  • X(HHT) = X(HTH) = X(THH) = 2
  • X(TTH) = X(THT) = X(HTT) = 1
  • X(TTT) = 0

Note: we generally drop the argument! We'll just say "the random
variable X" and write, e.g., P(X = 2) for the probability that the
random variable X(t) takes on the value 2, or P(X = x) for the
probability that the random variable X(t) takes on the value x.
38
Distribution of a Random Variable
  • Definition:
  • The distribution of a random variable X on a sample space S is the
    set of pairs (r, p(X = r)) for all r ∈ X(S), where p(X = r) is the
    probability that X takes the value r.
  • A distribution is usually described by specifying p(X = r) for each
    r ∈ X(S).

A probability distribution on an r.v. X is just an allocation of the
total probability mass, 1, over the possible values of X.
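A brute-force Python sketch (my addition) of a distribution: the pairs
(r, p(X = r)) for X = the sum of two fair dice:

    from collections import Counter
    from fractions import Fraction
    from itertools import product

    S = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
    counts = Counter(a + b for a, b in S)      # how often each sum occurs
    for r in sorted(counts):
        print(f"P(X = {r}) = {Fraction(counts[r], len(S))}")
    # P(X = 2) = 1/36, P(X = 3) = 1/18, ..., P(X = 7) = 1/6, ...,
    # P(X = 12) = 1/36; the probabilities sum to 1.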
39
Random Variables
Example: Do you ever play the game Racko? Suppose you are playing a
game with cards labeled 1 to 20, and you draw 3 cards. We bet that the
maximum card has value 17 or greater. What's the probability we win the
bet?
Let the r.v. X denote the maximum card value. The possible values for X
are 3, 4, 5, ..., 20.

    i          3   4   5   6   7   8   9   ...  20
    Pr(X = i)  ?   ?   ?   ?   ?   ?   ?   ...  ?

Filling in this box would be a pain. We look for a general formula.
40
Random Variables
X is the value of the highest card among the 3 selected; the 20 cards
are labeled 1 through 20. We want Pr(X = i) for i = 3, ..., 20.
Denominator first: how many ways are there to select the 3 cards?
C(20, 3).
How many choices result in a max card whose value is i? We must pick
the card labeled i plus 2 cards from the i - 1 smaller values:
C(i-1, 2).
Pr(X = i) = C(i-1, 2) / C(20, 3)
We win the bet if the max card X is 17 or greater. What's the
probability we win?
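A quick Python check (mine) of the formula and the bet:

    from math import comb

    # Pr(X = i) = C(i-1, 2) / C(20, 3); we win if X >= 17.
    p_win = sum(comb(i - 1, 2) for i in range(17, 21)) / comb(20, 3)
    print(p_win)   # 580/1140 ≈ 0.509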
41
The Birthday Paradox
42
Birthdays
A: 23
How many people have to be in a room to ensure that the probability
that at least two of them have the same birthday is greater than 1/2?
Let p_n be the probability that no two people share a birthday among n
people in a room.
Then 1 - p_n is the probability that 2 or more share a birthday.
We want the smallest n so that 1 - p_n > 1/2.
Hmm. Why does such an n exist? Upper bound?
For L options, the answer is on the order of sqrt(L). Informally, why?
43
Birthdays
Assumptions: the birthdays of the people are independent, each birthday
is equally likely, and there are 366 days per year.
Let p_n be the probability that no one shares a birthday among n people
in a room.
What is p_n? (Brute force is fine.)
Assume the people come in a certain order. The probability that the
second person has a birthday different from the first is 365/366; the
probability that the third person has a birthday different from the two
previous ones is 364/366; ...; for the jth person we have
(366 - (j-1))/366.
44
So,

    p_n = (365/366)(364/366) ··· ((366 - (n-1))/366)

After several tries: when n = 22, 1 - p_n ≈ 0.475; when n = 23,
1 - p_n ≈ 0.506.
Relevant to hashing. Why?
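A brute-force Python sketch (mine) reproducing these numbers:

    def p_no_shared(n, days=366):
        """Probability that n people all have distinct birthdays."""
        p = 1.0
        for j in range(1, n):
            p *= (days - j) / days
        return p

    for n in (22, 23):
        print(n, round(1 - p_no_shared(n), 3))   # 22 -> 0.475, 23 -> 0.506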
45
From the Birthday Problem to Hashing Functions
  • Probability of a Collision in Hashing Functions:
  • A hashing function h(k) is a mapping of the keys (or records, e.g.,
    SSNs, around 300 x 10^6 in the US) to a much smaller storage space.
    A good hashing function yields few collisions. What is the
    probability that no two keys are mapped to the same location by a
    hashing function?
  • Assume m is the number of available storage locations, so the
    probability of mapping a key to a given location is 1/m.
  • Assuming the keys are k1, k2, ..., kn, the probability of mapping the
    jth record to a free location after the first (j-1) records is
    (m - (j-1))/m.

m = 10,000 gives n ≈ 117. Not that many!
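The same computation with m locations instead of 366 days --- a Python
sketch (mine, assuming a uniformly distributed hash):

    def p_collision(n, m=10_000):
        """Probability of at least one collision among n hashed keys."""
        p = 1.0
        for j in range(1, n):
            p *= (m - j) / m    # (j+1)th key avoids the j occupied slots
        return 1 - p

    for n in (100, 117, 125):
        print(n, round(p_collision(n), 3))
    # roughly 0.39 at n = 100, 0.49 at n = 117, 0.54 at n = 125: the
    # collision probability passes 1/2 right around the slide's n ≈ 117.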