1
Discrete Math CS 280
  • Prof. Bart Selman
  • selman@cs.cornell.edu
  • Module:
  • Probability --- Part b)
  • Bayes' Rule
  • Random Variables

2
Bayes' Theorem
  • How do we assess the probability that a particular event will occur
    on the basis of partial evidence?
  • Examples:
  • What is the likelihood that people who test positive for a particular
    disease (e.g., HIV) actually have the disease?
  • What is the probability that an e-mail message is spam?
  • Key idea: one should factor in additional information regarding the
    occurrence of events.

3
  • Assume that with respect to events F and E (E for Evidence):
  • We know P(F), the probability that event F occurs
    (e.g., the probability that an email message is spam;
    this is given by what fraction of email is spam).
  • We also know that event E has occurred
    (e.g., the email message contains the words "sale" and "bargain").
  • Therefore the conditional probability that F occurs given that E
    occurs, P(F|E), is a more realistic estimate that F occurs than P(F).
  • How do we compute P(F|E)?
  • E.g., based on P(F), P(E|F), and P(E|F^C).

4
[Diagram: Bayesian inference. An original belief (prior probability)
P(F) about a hypothesis/theory F is combined with evidence E to produce
a modified belief P(F|E).]
5
[Figure: two boxes of red and green balls, Box A (left) and Box B
(right).]
Experiment: Pick one box at random (p = 0.5) and then a ball at random
from that box. Assume you got a red ball. What's the probability that
it came from the left box?
Define E = you choose a red ball (therefore E^C = you choose a green
ball) and F = you choose the left box (therefore F^C = you choose the
right box).
We want to know P(F|E).
6
E = red ball; F = left box.
P(F|E) = ?
  • What we know:
  • P(E|F) = 7/9
  • P(E|F^C) = 3/7
  • Given that the boxes are selected at random, P(F) = P(F^C) = 1/2.
  • P(F|E) = P(E ∩ F)/P(E), so we need to compute P(E ∩ F) and P(E).
We know P(E|F) = P(E ∩ F)/P(F).
So, P(E ∩ F) = P(E|F) · P(F) = (7/9) · (1/2) = 7/18.
What about P(E)? Note that P(E) = P(E ∩ F) + P(E ∩ F^C). Why?
Note also that P(E ∩ F^C) = P(F^C) · P(E|F^C) = (1/2) · (3/7) = 3/14.
So, P(E) = P(E ∩ F) + P(E ∩ F^C) = 7/18 + 3/14 = 38/63.
And therefore P(F|E) = P(E ∩ F)/P(E) = (7/18) / (38/63) = 49/76 ≈ 0.645.
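To check the arithmetic, here is a minimal Python sketch (my addition,
not from the slides) that redoes this computation with exact fractions:

    from fractions import Fraction

    # Box example: P(E|F) = 7/9, P(E|F^C) = 3/7, P(F) = P(F^C) = 1/2.
    p_E_given_F    = Fraction(7, 9)   # red ball from the left box
    p_E_given_notF = Fraction(3, 7)   # red ball from the right box
    p_F            = Fraction(1, 2)   # each box equally likely

    p_E_and_F    = p_E_given_F * p_F              # 7/18
    p_E_and_notF = p_E_given_notF * (1 - p_F)     # 3/14
    p_E          = p_E_and_F + p_E_and_notF       # 38/63

    p_F_given_E = p_E_and_F / p_E                 # 49/76
    print(p_F_given_E, float(p_F_given_E))        # 49/76 ≈ 0.645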
7
Original Belief: there is a 0.5 probability that you will pick the left
box (P(F)).
Concrete (new) Evidence: a red ball is picked (E).
Bayesian Inference:
Modified Belief: probability increased to ≈ 0.65 (P(F|E)).
8
Theorem (Bayes' Theorem): Suppose that E and F are events from a sample
space S such that P(E) ≠ 0 and P(F) ≠ 0. Then

    P(F|E) = P(E|F) P(F) / ( P(E|F) P(F) + P(E|F^C) P(F^C) )

Proof:
9
  • Example:
  • Suppose that 1 person in 100,000 has a particular rare disease. There
    is an accurate test for the disease: it is correct 99% of the time
    when given to someone with the disease, and it is correct 99.5% of
    the time when given to someone without the disease.
  • Find:
  • a) the probability that someone who tests positive has the disease;
  • b) the probability that someone who tests negative does not have the
    disease.

10
  • Solution:
  • a)

Always start by defining the events!
F = the person has the disease; E = the person tests positive.
P(F|E) = probability of having the disease, given a positive test.
P(F) = 1/100,000 = 0.00001; P(F^C) = 0.99999
P(E|F) = 0.99; P(E^C|F) = 0.01
P(E|F^C) = 0.005
Note: these are the probabilities most easily measured!
P(F|E) = P(E|F) P(F) / ( P(E|F) P(F) + P(E|F^C) P(F^C) )
       = (0.99)(0.00001) / ( (0.99)(0.00001) + (0.005)(0.99999) )
       ≈ 0.002
Only 0.2% of people who test positive actually have the disease!!!
11
  • b) F = the person has the disease;
  • E = the person tests positive.
  • P(F^C|E^C) = probability of not having the disease, given a negative
    test.
  • P(F) = 1/100,000 = 0.00001; P(F^C) = 0.99999
  • P(E|F) = 0.99; P(E^C|F) = 0.01
  • P(E|F^C) = 0.005; P(E^C|F^C) = 0.995
  • P(F^C|E^C) = P(E^C|F^C) P(F^C) / ( P(E^C|F^C) P(F^C) + P(E^C|F) P(F) )
    = (0.995)(0.99999) / ( (0.995)(0.99999) + (0.01)(0.00001) )
    ≈ 0.9999999

That's pretty good!
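The same pattern in code --- a small Python sketch (not from the
slides; the variable names are mine) computing both answers:

    # Rare-disease test: Bayes' theorem for P(disease | positive)
    # and P(no disease | negative).
    p_F = 1e-5              # P(F): prior probability of the disease
    p_E_given_F = 0.99      # P(positive | disease)
    p_E_given_notF = 0.005  # P(positive | no disease)

    p_notF = 1 - p_F
    # a) P(F | E)
    p_F_given_E = (p_E_given_F * p_F) / (
        p_E_given_F * p_F + p_E_given_notF * p_notF)
    print(f"P(disease | positive) = {p_F_given_E:.5f}")   # ≈ 0.00198

    # b) P(F^C | E^C)
    p_notE_given_notF = 1 - p_E_given_notF   # 0.995
    p_notE_given_F = 1 - p_E_given_F         # 0.01
    p_notF_given_notE = (p_notE_given_notF * p_notF) / (
        p_notE_given_notF * p_notF + p_notE_given_F * p_F)
    print(f"P(no disease | negative) = {p_notF_given_notE:.7f}")  # ≈ 0.9999999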
12
Marbles
TOYS 'R' US sells two kinds of bags of marbles: (1) bags of all black
marbles, and (2) bags of mixed marbles in which 20% of the marbles are
black. The bags are opaque and wrapped in plastic, and I have no idea
which bag is more common. I buy a bag and figure there is a 50/50
chance that the bag I purchased contains all black marbles. A guess! I
pull a marble out of the bag and see that it is black. How should this
new evidence affect the 50/50 assessment I assigned to the probability
of my having purchased an all-black bag of marbles? (As in the previous
example.)
F = bag of all black marbles; F^C = bag with 20% black marbles;
E = a black marble is drawn.
13
Marbles
Posterior Belief: the probability that my bag of marbles is all black
is 0.833 = P(F|E).
Prior Belief: there is a 1/2 chance that I have an all-black bag of
marbles --- a guess (P(F)).
0.5 chance of an all-black (100%) marble bag.
0.5 chance of a 20%-black marble bag.
14
Marbles
Warning: correct but slightly informal! Instead of changing the prior,
we could consider a new experiment and evidence: drawing two marbles.
I put the marble back, shake the bag, and draw another marble. It is
black! What happens now that my new prior probability is 0.83?
Prior Belief: 0.83.
New Belief: 0.96.
0.17 chance of a 20%-black marble bag.
0.83 chance of an all-black (100%) marble bag.
Remember, I don't know which type of marble bag is most popular:
Wal-Mart may have 100 bags of mixed marbles on the shelf for every bag
of all black marbles. Bayes' Theorem doesn't tell me the probability of
my marble bag being all black; it only tells me how I should revise my
initial best guess based on the newly obtained information.
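A short Python sketch (my illustration; the update function below is a
hypothetical helper, not something from the slides) of this sequential
updating:

    # F = all-black bag; F^C = mixed bag with 20% black marbles.
    # Each draw (with replacement) of a black marble updates P(F).
    def update(prior, likelihood_F=1.0, likelihood_notF=0.2):
        """Posterior P(F | black marble drawn) from the current prior."""
        num = likelihood_F * prior
        return num / (num + likelihood_notF * (1 - prior))

    belief = 0.5                      # initial 50/50 guess
    for draw in (1, 2):
        belief = update(belief)
        print(f"after black draw {draw}: P(all-black bag) = {belief:.3f}")
    # after draw 1: ≈ 0.833; after draw 2: ≈ 0.962, the slides' 0.83 and 0.96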
15
Original Belief: I shrug my shoulders and guess that there is a 0.5
chance that my bag contains all black marbles.
Bayesian Inference:
Concrete Evidence: 1st black marble.
Modified Belief: probability increased to 0.83.
Bayesian Inference:
Concrete Evidence: 2nd black marble.
Modified Belief: probability increased to 0.96.
16
Generalized Bayes' Theorem
  • Suppose that E is an event from a sample space S and F1, F2, ..., Fn
    are mutually exclusive events whose union is S.
  • Assume that P(E) ≠ 0 and P(Fi) ≠ 0 for i = 1, 2, ..., n. Then

    P(Fj|E) = P(E|Fj) P(Fj) / ( P(E|F1) P(F1) + ... + P(E|Fn) P(Fn) )

The denominator is exactly P(E).
Compare with the two-event form of Bayes' theorem, where the sum has
just the two terms for F and F^C.
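As an illustration, a small Python helper (my sketch, not the slides'
notation) that applies the generalized theorem to any finite set of
hypotheses:

    # Generalized Bayes: posterior over mutually exclusive, exhaustive
    # hypotheses F1, ..., Fn given evidence E.
    def bayes(priors, likelihoods):
        """priors[i] = P(Fi); likelihoods[i] = P(E|Fi). Returns P(Fi|E)."""
        joint = [p * l for p, l in zip(priors, likelihoods)]
        p_E = sum(joint)                   # total probability of E
        return [j / p_E for j in joint]

    # Sanity check on the box example: F1 = left box, F2 = right box.
    print(bayes([0.5, 0.5], [7/9, 3/7]))   # first entry ≈ 0.645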
17
Bayesian Spam Filters
18
Applying Bayes' Theorem: SPAM or HAM?
  • Let our sample space or universe be the set of emails. (So, we're
    sampling from the space of possible emails.)
  • Let S be the event "a message is spam"; hence S^C is the event "a
    message is not spam".
  • Let E be the event "a message contains a word w".

Since we have no idea of the likelihood of SPAM, we assume
P(S) = P(S^C) = 1/2.
Can we do better?
19
Estimations
p(w) = (number of spam messages containing word w) /
       (total number of spam messages), our estimate of P(E|S).
q(w) = (number of non-spam messages containing word w) /
       (total number of non-spam messages), our estimate of P(E|S^C).
Note: these are estimates based on frequencies in samples.
20
Estimation Continued
So,

    P(S|E) = P(E|S) P(S) / ( P(E|S) P(S) + P(E|S^C) P(S^C) )

becomes

    r(w) = p(w) / ( p(w) + q(w) ),

our estimate of the probability that a message containing w is spam.
Note: P(S) = P(S^C) = 1/2 divides out.
So, a quite straightforward formula for our first Bayesian spam filter!
21
Spam based on single words?
  • Probabilities based on single words: Bad Idea!
  • False positives AND false negatives aplenty.
  • Instead, calculate based on n words, assuming that the events Ei are
    independent given S (and given S^C), and that P(S) = P(S^C).

Derivation: see Sect. 6.3.
22
Final Approximation

    r(w1, ..., wn) = [ p(w1) ··· p(wn) ] /
                     [ p(w1) ··· p(wn) + q(w1) ··· q(wn) ]

Compare to the single-word case: r(w) = p(w) / ( p(w) + q(w) ).
23
How do we use this?
  • The user must train the filter on messages in his/her inbox to
    estimate the probabilities.
  • The program or user must define a threshold probability r.
  • If r(w1, ..., wn) > r, the message is considered spam.
  • Gmail: trained on all users! (Note the "report spam" button.)
24
Example
  • Suppose the filter has the following data:
  • Threshold probability: 0.9
  • "Nigeria" occurs in 250 of 2000 spam messages.
  • "Nigeria" occurs in only 5 of 1000 non-spam messages.
  • Let's estimate the probability using the process we just defined.
25
Example Cont.
  • Step 1: Find the probability that a spam message contains the word
    "Nigeria":
  • p(Nigeria) = 250 / 2000 = 0.125
  • Step 2: Find the probability that a non-spam message contains the
    word "Nigeria":
  • q(Nigeria) = 5 / 1000 = 0.005
26
Example Cont.
  • Since we are assuming that an incoming message is equally likely to
    be spam or not spam, we can estimate the probability with this
    equation:

    r(Nigeria) = p(Nigeria) / ( p(Nigeria) + q(Nigeria) )
27
Example Cont.

    r(Nigeria) = 0.125 / ( 0.125 + 0.005 ) = 0.125 / 0.130 ≈ 0.962

  • Since r(Nigeria) is greater than the threshold of 0.9, we reject
    this message as spam.
28
Multiple Words
  • 2000 spam messages; 1000 real messages.
  • "Nigeria" appears in 400 spam messages.
  • "Nigeria" appears in 60 real messages.
  • "bank" appears in 200 spam and 25 real messages.
  • Threshold probability: 0.9
  • Let's calculate the probability that a message with "Nigeria" and
    "bank" is spam.
29
Example Cont.
  • Step 1: Find the probability that a spam message contains the word
    "Nigeria":
  • p(Nigeria) = 400 / 2000 = 0.2
  • Step 2: Find the probability that a non-spam message contains the
    word "Nigeria":
  • q(Nigeria) = 60 / 1000 = 0.06
  • Step 3: Find the probability that a spam message contains the word
    "bank":
  • p(bank) = 200 / 2000 = 0.1
  • Step 4: Find the probability that a non-spam message contains the
    word "bank":
  • q(bank) = 25 / 1000 = 0.025
30
Example Cont.
  • Using our approximation, we have:

    r(Nigeria, bank) = p(Nigeria) p(bank) /
                       ( p(Nigeria) p(bank) + q(Nigeria) q(bank) )
31
Example Cont.
  • Plugging in the numbers:

    r(Nigeria, bank) = (0.2)(0.1) / ( (0.2)(0.1) + (0.06)(0.025) )
                     = 0.02 / 0.0215 ≈ 0.930

  • Since 0.930 exceeds the threshold probability of 0.9, this message
    will be rejected as spam.
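The whole filter fits in a few lines. A Python sketch (my code; the
counts are the ones from this example) implementing r for one or more
words:

    # r(w1..wn) = Π p(wi) / (Π p(wi) + Π q(wi)), assuming independence
    # and P(S) = P(S^C) = 1/2.
    from math import prod

    # word -> (count in spam, count in ham), from the example slides
    counts = {"Nigeria": (400, 60), "bank": (200, 25)}
    n_spam, n_ham = 2000, 1000

    def r(words):
        p = prod(counts[w][0] / n_spam for w in words)  # Π p(wi)
        q = prod(counts[w][1] / n_ham for w in words)   # Π q(wi)
        return p / (p + q)

    print(round(r(["Nigeria"]), 3))          # single word: ≈ 0.769
    print(round(r(["Nigeria", "bank"]), 3))  # both words: ≈ 0.930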

Concludes Bayes Reasoning
32
Probability Paradox I
33
Magic Dice: Or How to Win Every Time!
  • a) You select any one of the four dice (A, B, C, or D).
  • b) I'll select another.
  • Both dice are thrown; the highest number wins the throw.
  • Do a series of 10 throws. The person with the most highest throws
    wins the series. (I.e., the die more likely to get a higher number
    wins.)
  • Claim: In a game of "Best of Ten Throws", I will almost certainly
    win --- no matter which die you pick!!
  • Why is this strange?
  • Say you pick die A. Let's assume die B is better. So, I pick B.
  • But then, in the next game, the next person picks B. Let's assume C
    is better. I'll select C.
  • The next person will pick C. I'll pick D.
  • The next person will pick D. Hmm...
  • I'll pick A and will win!!
  • A < B < C < D < A !! Failure of transitivity!

But could such a set of dice exist?
Surprisingly, yes!
34
Magic Dice
[Figure: the four dice A, B, C, D arranged in a cycle; each arrow points
from a die to the one it beats with probability 2/3.]
Prob(B wins over A) = 4/6 = 2/3 (i.e., Prob(A wins over B) = 1/3).
Prob(C wins over B) = 2/3, since 3/6 + (3/6)(2/6) = 4/6.
Prob(D wins over C) = 2/3, since 2/6 + (4/6)(1/2) = 4/6.
Prob(A wins over D) = 4/6 = 2/3.
A < B < C < D < A !!
However, transitivity does hold for the expected values of the dice
throws: E[B] < E[A] and E[C] < E[D], i.e., 16/6 < 18/6 and 18/6 < 20/6.
Non-transitive dice:
http://www.sciencenews.org/20020420/mathtrek.asp
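The original figure showed the face values, which are lost in this
transcript. The Python sketch below (mine) uses Efron's nontransitive
dice, relabeled so that the win probabilities and expected values match
the numbers above; if the deck used different faces, only the dice
table changes:

    from itertools import product
    from fractions import Fraction

    # Efron's dice, relabeled: E[A] = 18/6, E[B] = 16/6,
    # E[C] = 18/6, E[D] = 20/6, matching the slide.
    dice = {
        "A": [3, 3, 3, 3, 3, 3],
        "B": [4, 4, 4, 4, 0, 0],
        "C": [5, 5, 5, 1, 1, 1],
        "D": [6, 6, 2, 2, 2, 2],
    }

    def p_win(x, y):
        """Exact probability that die x rolls higher than die y."""
        wins = sum(a > b for a, b in product(dice[x], dice[y]))
        return Fraction(wins, 36)

    for x, y in [("B", "A"), ("C", "B"), ("D", "C"), ("A", "D")]:
        print(f"P({x} beats {y}) = {p_win(x, y)}")   # each is 2/3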
35
Random Variables and Distributions
36
Random Variables
For a given sample space S, a random variable (r.v.) is any real-valued
function on S; i.e., a random variable is a function that assigns a
real number to each possible outcome.
[Diagram: the random variable maps each outcome in the sample space to
a number.]
Suppose our experiment is a roll of 2 dice, so S is a set of pairs.
Example random variables:
X = sum of the two dice: X((2,3)) = 5.
Y = difference between the two dice: Y((2,3)) = 1.
Z = max of the two dice: Z((2,3)) = 3.
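In code, these random variables are literally just functions on S. A
tiny Python sketch (mine; Y is taken as the absolute difference, which
matches Y((2,3)) = 1):

    from itertools import product

    S = list(product(range(1, 7), repeat=2))  # sample space: pairs of dice

    X = lambda t: t[0] + t[1]          # sum of the two dice
    Y = lambda t: abs(t[0] - t[1])     # difference between the two dice
    Z = lambda t: max(t)               # max of the two dice

    print(X((2, 3)), Y((2, 3)), Z((2, 3)))   # 5 1 3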
37
Random variable
  • Suppose a coin is flipped three times. Let X(t) be the random
    variable that equals the number of heads that appear when t is the
    outcome.
  • X(HHH) = 3
  • X(HHT) = X(HTH) = X(THH) = 2
  • X(TTH) = X(THT) = X(HTT) = 1
  • X(TTT) = 0

Note: we generally drop the argument! We'll just say "the random
variable X" and write, e.g., P(X = 2) for the probability that the
random variable X(t) takes on the value 2, or P(X = x) for the
probability that the random variable X(t) takes on the value x.
38
Distribution of a Random Variable
  • Definition:
  • The distribution of a random variable X on a sample space S is the
    set of pairs (r, p(X = r)) for all r ∈ X(S), where p(X = r) is the
    probability that X takes the value r.
  • A distribution is usually described by specifying p(X = r) for each
    r ∈ X(S).

A probability distribution on an r.v. X is just an allocation of the
total probability mass, 1, over the possible values of X.
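A brute-force Python sketch (my addition) of a distribution: the pairs
(r, p(X = r)) for X = the sum of two fair dice:

    from collections import Counter
    from fractions import Fraction
    from itertools import product

    S = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
    counts = Counter(a + b for a, b in S)      # how often each sum occurs
    for r in sorted(counts):
        print(f"P(X = {r}) = {Fraction(counts[r], len(S))}")
    # P(X = 2) = 1/36, P(X = 3) = 1/18, ..., P(X = 7) = 1/6, ...,
    # P(X = 12) = 1/36; the probabilities sum to 1.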
39
Random Variables
Example: Do you ever play the game Racko? Suppose you are playing a
game with cards labeled 1 to 20, and you draw 3 cards. We bet that the
maximum card has value 17 or greater. What's the probability we win the
bet?
Let the r.v. X denote the maximum card value. The possible values for X
are 3, 4, 5, ..., 20.

    i          3   4   5   6   7   8   9   ...  20
    Pr(X = i)  ?   ?   ?   ?   ?   ?   ?   ...  ?

Filling in this box would be a pain. We look for a general formula.
40
Random Variables
X is the value of the highest card among the 3 selected; the 20 cards
are labeled 1 through 20. We want Pr(X = i) for i = 3, ..., 20.
Denominator first: how many ways are there to select the 3 cards?
C(20, 3).
How many choices result in a max card whose value is i? We must pick
the card labeled i plus 2 cards from the i - 1 smaller values:
C(i-1, 2).
Pr(X = i) = C(i-1, 2) / C(20, 3)
We win the bet if the max card X is 17 or greater. What's the
probability we win?
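A quick Python check (mine) of the formula and the bet:

    from math import comb

    # Pr(X = i) = C(i-1, 2) / C(20, 3); we win if X >= 17.
    p_win = sum(comb(i - 1, 2) for i in range(17, 21)) / comb(20, 3)
    print(p_win)   # 580/1140 ≈ 0.509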
41
The Birthday Paradox
42
Birthdays
A: 23
How many people have to be in a room to ensure that the probability
that at least two of them have the same birthday is greater than 1/2?
Let p_n be the probability that no two people share a birthday among n
people in a room.
Then 1 - p_n is the probability that 2 or more share a birthday.
We want the smallest n so that 1 - p_n > 1/2.
Hmm. Why does such an n exist? Upper bound?
For L options, the answer is on the order of sqrt(L). Informally, why?
43
Birthdays
Assumptions: the birthdays of the people are independent, each birthday
is equally likely, and there are 366 days per year.
Let p_n be the probability that no one shares a birthday among n people
in a room.
What is p_n? (Brute force is fine.)
Assume the people come in a certain order. The probability that the
second person has a birthday different from the first is 365/366; the
probability that the third person has a birthday different from the two
previous ones is 364/366; ...; for the jth person we have
(366 - (j-1))/366.
44
So,

    p_n = (365/366)(364/366) ··· ((366 - (n-1))/366)

After several tries: when n = 22, 1 - p_n ≈ 0.475; when n = 23,
1 - p_n ≈ 0.506.
Relevant to hashing. Why?
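A brute-force Python sketch (mine) reproducing these numbers:

    def p_no_shared(n, days=366):
        """Probability that n people all have distinct birthdays."""
        p = 1.0
        for j in range(1, n):
            p *= (days - j) / days
        return p

    for n in (22, 23):
        print(n, round(1 - p_no_shared(n), 3))   # 22 -> 0.475, 23 -> 0.506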
45
From the Birthday Problem to Hashing Functions
  • Probability of a Collision in Hashing Functions:
  • A hashing function h(k) is a mapping of the keys (or records, e.g.,
    SSNs, around 300 x 10^6 in the US) to a much smaller storage space.
    A good hashing function yields few collisions. What is the
    probability that no two keys are mapped to the same location by a
    hashing function?
  • Assume m is the number of available storage locations, so the
    probability of mapping a key to a given location is 1/m.
  • Assuming the keys are k1, k2, ..., kn, the probability of mapping the
    jth record to a free location after the first (j-1) records is
    (m - (j-1))/m.

m = 10,000 gives n ≈ 117. Not that many!
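The same computation with m locations instead of 366 days --- a Python
sketch (mine, assuming a uniformly distributed hash):

    def p_collision(n, m=10_000):
        """Probability of at least one collision among n hashed keys."""
        p = 1.0
        for j in range(1, n):
            p *= (m - j) / m    # (j+1)th key avoids the j occupied slots
        return 1 - p

    for n in (100, 117, 125):
        print(n, round(p_collision(n), 3))
    # roughly 0.39 at n = 100, 0.49 at n = 117, 0.54 at n = 125: the
    # collision probability passes 1/2 right around the slide's n ≈ 117.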