Title: Probability
1. Lecture 4
- Probability, and what it has to do with data analysis
2. Please Read
- Doug Martinson's Chapter 2, "Probability Theory"
- Available on Courseworks
3. Abstraction
- Random variable, x
- It has no set value until you realize it
- Its properties are described by a distribution, p(x)
4. Probability density distribution
- When you realize x, the probability that the value you get is between x and x+dx is p(x) dx
5. The probability, P, that the value you get is between x1 and x2 is
  P = \int_{x1}^{x2} p(x) dx
- Note that it is written with a capital P
- and is a number between 0 (never) and 1 (always)
6. The probability P that x is between x1 and x2 is proportional to this area.
[Figure: p(x) versus x, with the area under the curve between x1 and x2 shaded.]
7. The probability that x is between -\infty and +\infty is unity, so the total area is 1.
[Figure: p(x) versus x, with the entire area under the curve shaded.]
- The probability that the value you get is something is unity:
  \int_{-\infty}^{\infty} p(x) dx = 1
  (or whatever the allowable range of x is)
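As an added illustration (not part of the original slides), here is a minimal Python sketch that checks the normalization and evaluates P for an interval by numerical integration. The exponential density p(x) = exp(-x) on [0, \infty) is purely an illustrative choice.

```python
import numpy as np

# Assumed, illustrative density: p(x) = exp(-x) on x >= 0
x = np.linspace(0.0, 50.0, 200001)      # dense grid; 50 is "effectively infinity" here
p = np.exp(-x)

# Total area should be 1 (normalization)
total = np.trapz(p, x)

# P that x lies between x1 and x2 is the area under p(x) on that interval
x1, x2 = 1.0, 2.0
mask = (x >= x1) & (x <= x2)
P = np.trapz(p[mask], x[mask])

print(total)   # ~1.0
print(P)       # ~exp(-1) - exp(-2) ~ 0.2325
```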
8. Why all this is relevant
- Any measurement that contains noise is treated as a random variable, x
- The distribution p(x) embodies both the true value of the quantity being measured and the measurement noise
- All quantities derived from a random variable are themselves random variables, so
- The algebra of random variables allows you to understand how measurement noise affects inferences made from the data
9. Basic Description of Distributions
10. Mode: the x at which the distribution has its peak; the most likely value of x.
[Figure: p(x) versus x, with the peak labeled at x = xmode.]
11. But modes can be deceptive
100 realizations of x:

  x      N
  0-1    3
  1-2    18
  2-3    11
  3-4    8
  4-5    11
  5-6    14
  6-7    8
  7-8    7
  8-9    11
  9-10   9

Sure, the 1-2 range has the most counts, but most of the measurements are bigger than 2!
[Figure: p(x) versus x from 0 to 10, with the peak at xmode in the 1-2 bin.]
12. Median: 50% chance that x is smaller than xmedian, 50% chance that x is bigger than xmedian.
There is no special reason the median needs to coincide with the peak.
[Figure: p(x) versus x, split at xmedian into two regions of 50% probability each.]
13. Expected value, or mean: the x you would get if you took the mean of lots of realizations of x.
Let's examine a discrete distribution, for simplicity ...
[Figure: bar chart of p(x) for the discrete values x = 1, 2, 3.]
14. Hypothetical table of 140 realizations of x: 20 ones, 80 twos, 40 threes.

mean = (20 \times 1 + 80 \times 2 + 40 \times 3) / 140
     = (20/140) \times 1 + (80/140) \times 2 + (40/140) \times 3
     = p(1) \times 1 + p(2) \times 2 + p(3) \times 3
     = \sum_i p(x_i) x_i
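A quick numerical check of the slide's arithmetic (an added sketch, not from the original deck):

```python
import numpy as np

# The 140 hypothetical realizations from the slide: 20 ones, 80 twos, 40 threes
counts = {1: 20, 2: 80, 3: 40}
N = sum(counts.values())                      # 140

# Mean computed directly from the realizations
mean_direct = sum(x * n for x, n in counts.items()) / N

# Mean computed as sum_i p(x_i) * x_i
p = {x: n / N for x, n in counts.items()}     # p(1), p(2), p(3)
mean_from_p = sum(p[x] * x for x in p)

print(mean_direct, mean_from_p)               # both ~2.142857
```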
15. By analogy, for a smooth distribution:
- Expected value of x:
  E(x) = \int_{-\infty}^{\infty} x p(x) dx
16. By the way:
- You can compute the expected (mean) value of any function of x this way:
- E(x) = \int_{-\infty}^{\infty} x p(x) dx
- E(x^2) = \int_{-\infty}^{\infty} x^2 p(x) dx
- E(\sqrt{x}) = \int_{-\infty}^{\infty} \sqrt{x} p(x) dx
- etc.
17. Beware!
- E(x^2) \neq [E(x)]^2
- E(x) \neq [E(\sqrt{x})]^2
- and so forth
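A small numerical illustration of the warning above (an added sketch, not from the slides), using a uniform distribution on [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 1_000_000)   # realizations of a uniform random variable on [0, 1]

E_x = x.mean()                # ~0.5
E_x2 = (x**2).mean()          # ~1/3
E_sqrt = np.sqrt(x).mean()    # ~2/3

print(E_x2, E_x**2)           # ~0.333 vs ~0.25  -> E(x^2) != E(x)^2
print(E_x, E_sqrt**2)         # ~0.5   vs ~0.444 -> E(x)  != E(sqrt(x))^2
```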
18. Width of a distribution. Here's a perfectly sensible way to define the width of a distribution:
[Figure: p(x) versus x, with a central interval of width W50 containing 50% of the probability and 25% in each tail.]
It's not used much, though.
19. Width of a distribution. Here's another way:
[Figure: p(x) versus x, overlaid with the parabola [x-E(x)]^2 centered at E(x).]
Multiply the two and integrate.
20. The idea is that if the distribution is narrow, then most of the probability lines up with the low spot of the parabola. But if it is wide, then some of the probability lines up with the high parts of the parabola.
[Figure: the parabola [x-E(x)]^2 overlaid on p(x), and the product [x-E(x)]^2 p(x), whose total area is computed.]
Variance: \sigma^2 = \int_{-\infty}^{\infty} [x-E(x)]^2 p(x) dx
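As an added sketch (not in the original slides), the variance integral can be evaluated numerically; the uniform density on [0, 1] used here is just an illustrative choice, for which the exact answer is 1/12.

```python
import numpy as np

# Illustrative density: uniform on [0, 1], so p(x) = 1 there
x = np.linspace(0.0, 1.0, 100001)
p = np.ones_like(x)

Ex = np.trapz(x * p, x)                 # E(x) = integral of x p(x) dx          -> 0.5
var = np.trapz((x - Ex)**2 * p, x)      # sigma^2 = integral of (x-E(x))^2 p(x) -> 1/12

print(Ex, var)                          # ~0.5, ~0.0833
```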
21. \sqrt{variance} = \sigma: a measure of width.
[Figure: p(x) versus x, with \sigma marked as a width about E(x).]
We don't immediately know its relationship to area, though.
22. The Gaussian or normal distribution:
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\bar{x})^2}{2\sigma^2} \right)
where \bar{x} is the expected value and \sigma^2 is the variance.
Memorize me!
23. Examples of Normal Distributions
[Figure: two panels of p(x) versus x, one with \bar{x} = 1, \sigma = 1 and one with \bar{x} = 3, \sigma = 0.5.]
24. Properties of the normal distribution
- Expectation = Median = Mode = \bar{x}
- 95% of the probability lies within 2\sigma of the expected value
[Figure: p(x) versus x, with the central 95% region shaded.]
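A quick numerical check of these properties (an added sketch; the values \bar{x} = 3, \sigma = 0.5 are taken from the second example on slide 23):

```python
import math
import numpy as np

xbar, sigma = 3.0, 0.5

# Gaussian density from slide 22
x = np.linspace(xbar - 8*sigma, xbar + 8*sigma, 200001)
p = np.exp(-(x - xbar)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

mask = np.abs(x - xbar) <= 2 * sigma
print(np.trapz(p[mask], x[mask]))       # ~0.954: about 95% of the probability within 2 sigma
print(np.trapz(x * p, x))               # ~3.0: the expectation equals xbar
```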
25. Functions of a random variable
Any function of a random variable is itself a random variable.
26. If x has distribution p(x), then y(x) has distribution
p(y) = p(x(y)) |dx/dy|
27. This follows from the rule for transforming integrals:
1 = \int_{x1}^{x2} p(x) dx = \int_{y1}^{y2} p(x(y)) |dx/dy| dy
with the limits chosen so that y1 = y(x1), etc.
28. Example
- Let x have a uniform (white) distribution on [0, 1]
[Figure: p(x) = 1 for x between 0 and 1.]
Uniform probability that x is anywhere between 0 and 1.
29.
- Let y = x^2
- then x = y^{1/2}
- y(x=0) = 0, y(x=1) = 1
- dx/dy = (1/2) y^{-1/2}
- p(x(y)) = 1
- So p(y) = (1/2) y^{-1/2} on the interval [0, 1]
30. Numerical test: histogram of 1000 random numbers
[Figure, top: histogram of x, generated with Excel's rand() function, which claims to be based on a uniform distribution. Plausible that it's uniform.]
[Figure, bottom: histogram of x^2, generated by squaring the x's from above. Plausible that it's proportional to 1/\sqrt{y}.]
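The same test can be reproduced in Python instead of Excel (an added sketch; with only 1000 samples the agreement is rough, especially in the smallest bins, which is the point of "plausible"):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 1000)          # 1000 uniform random numbers, like rand()
y = x**2                                  # the transformed variable

# Histogram of y, normalized so the bar heights estimate p(y)
counts, edges = np.histogram(y, bins=10, range=(0.0, 1.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Predicted density from slide 29: p(y) = (1/2) y^(-1/2)
predicted = 0.5 / np.sqrt(centers)

for c, obs, pred in zip(centers, counts, predicted):
    print(f"y~{c:.2f}  observed {obs:.2f}  predicted {pred:.2f}")
```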
31. Multivariate distributions
32. Example
- Liberty Island is inhabited by both pigeons and seagulls
- 40% of the birds are pigeons and 60% of the birds are gulls
- 50% of pigeons are white and 50% are tan
- 100% of gulls are white
33. Two variables
- Species s takes two values: pigeon (p) and gull (g)
- Color c takes two values: white (w) and tan (t)
Of 100 birds: 20 are white pigeons, 20 are tan pigeons, 60 are white gulls, 0 are tan gulls.
34. What is the probability that a (random) bird has species s and color c?

        c = w    c = t
  s = p   20%      20%
  s = g   60%       0%

Note: the sum of all boxes is 100%.
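For later reference, the same table can be held in a small array (an added sketch, not part of the slides):

```python
import numpy as np

# Joint probability table for Liberty Island birds, as fractions:
# rows are species s = (pigeon, gull), columns are color c = (white, tan)
P_sc = np.array([[0.20, 0.20],
                 [0.60, 0.00]])

print(P_sc.sum())        # 1.0: all boxes sum to 100%
print(P_sc[1, 0])        # 0.6: the probability of a white gull
```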
35. This is called the Joint Probability and is written P(s,c).
36. Two continuous variables, say x1 and x2, have a joint probability distribution, written p(x1, x2), with
\int \int p(x1, x2) dx1 dx2 = 1
37. The probability that x1 is between x1 and x1+dx1 and x2 is between x2 and x2+dx2 is p(x1, x2) dx1 dx2, so
\int \int p(x1, x2) dx1 dx2 = 1
38. You would contour a joint probability distribution, and it would look something like:
[Figure: contour plot of p(x1, x2) in the (x1, x2) plane.]
39. What is the probability that a bird has color c?
Of 100 birds: 20 are white pigeons, 20 are tan pigeons, 60 are white gulls, 0 are tan gulls.
Start with P(s,c):

        c = w    c = t
  s = p   20%      20%
  s = g   60%       0%

and sum the columns to get P(c):

        c = w    c = t
  P(c)    80%      20%
40. What is the probability that a bird has species s?
Of 100 birds: 20 are white pigeons, 20 are tan pigeons, 60 are white gulls, 0 are tan gulls.
Start with P(s,c) and sum the rows to get P(s):

        c = w    c = t    P(s)
  s = p   20%      20%     40%
  s = g   60%       0%     60%
41. These operations make sense with distributions, too.
[Figure: contour plot of p(x1,x2) together with the marginal curves p(x1) and p(x2).]
p(x1) = \int p(x1,x2) dx2   (distribution of x1, irrespective of x2)
p(x2) = \int p(x1,x2) dx1   (distribution of x2, irrespective of x1)
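In the continuous case the "sum over a row or column" becomes a numerical integral along one axis; a minimal sketch (added here, using an illustrative pair of independent standard normals):

```python
import numpy as np

# Illustrative joint density: independent standard normals in x1 and x2
x1 = np.linspace(-5, 5, 401)
x2 = np.linspace(-5, 5, 401)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")
p = np.exp(-0.5 * (X1**2 + X2**2)) / (2 * np.pi)

# Marginal p(x1) = integral of p(x1,x2) dx2: integrate along the x2 axis
p_x1 = np.trapz(p, x2, axis=1)

print(np.trapz(p_x1, x1))                  # ~1.0: the marginal is normalized
print(p_x1[np.argmin(np.abs(x1))])         # ~0.3989 = 1/sqrt(2*pi) at x1 = 0
```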
42. Given that a bird is species s, what is the probability that it has color c?
Of 100 birds: 20 are white pigeons, 20 are tan pigeons, 60 are white gulls, 0 are tan gulls.

        c = w    c = t
  s = p   50%      50%
  s = g  100%       0%

Note: all rows sum to 100%.
43. This is called the Conditional Probability of c given s, and is written P(c|s). Similarly:
44. Given that a bird is color c, what is the probability that it has species s?
Of 100 birds: 20 are white pigeons, 20 are tan pigeons, 60 are white gulls, 0 are tan gulls. So 25% of white birds are pigeons.

        c = w    c = t
  s = p   25%     100%
  s = g   75%       0%

Note: all columns sum to 100%.
45. This is called the Conditional Probability of s given c, and is written P(s|c).
46. Beware! P(c|s) \neq P(s|c)

  P(c|s):
          c = w    c = t
    s = p   50%      50%
    s = g  100%       0%

  P(s|c):
          c = w    c = t
    s = p   25%     100%
    s = g   75%       0%
47. Note: 25% of 80% is 20%. That is, P(s=p|c=w) P(c=w) = P(s=p, c=w).
48. And: 50% of 40% is 20%. That is, P(c=w|s=p) P(s=p) = P(s=p, c=w).
49. And if
P(s,c) = P(s|c) P(c) = P(c|s) P(s)
then
P(s|c) = P(c|s) P(s) / P(c)   and   P(c|s) = P(s|c) P(c) / P(s)
which is called Bayes' Theorem.
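A quick numerical check of Bayes' Theorem using the bird table (an added sketch, not part of the slides):

```python
import numpy as np

# Joint P(s,c): rows = species (pigeon, gull), columns = color (white, tan)
P_sc = np.array([[0.20, 0.20],
                 [0.60, 0.00]])

P_s = P_sc.sum(axis=1)                    # [0.4, 0.6]
P_c = P_sc.sum(axis=0)                    # [0.8, 0.2]

P_c_given_s = P_sc / P_s[:, None]         # rows sum to 1
P_s_given_c = P_sc / P_c[None, :]         # columns sum to 1

# Bayes: P(s|c) = P(c|s) P(s) / P(c)
bayes = P_c_given_s * P_s[:, None] / P_c[None, :]
print(np.allclose(bayes, P_s_given_c))    # True
print(P_s_given_c[0, 0])                  # 0.25: 25% of white birds are pigeons
```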
50. Why Bayes' Theorem is important
Consider the problem of fitting a straight line to data, d, where the intercept and slope are given by the vector m. If we guess m and use it to predict d, we are doing something like P(d|m). But if we observe d and use it to estimate m, then we are doing something like P(m|d). Bayes' Theorem provides a framework for relating what we do to get P(d|m) to what we do to get P(m|d).
51. Expectation, Variance, and Covariance of a multivariate distribution
52. The expected values of x1 and x2 are calculated in a fashion analogous to the one-variable case:
E(x1) = \int\int x1 p(x1,x2) dx1 dx2
E(x2) = \int\int x2 p(x1,x2) dx1 dx2
Note:
E(x1) = \int\int x1 p(x1,x2) dx1 dx2 = \int x1 [ \int p(x1,x2) dx2 ] dx1 = \int x1 p(x1) dx1
so the formula really is just the expectation of a one-variable distribution.
53. The variances of x1 and x2 are calculated in a fashion analogous to the one-variable case, too:
\sigma_{x1}^2 = \int\int (x1 - \bar{x}_1)^2 p(x1,x2) dx1 dx2, with \bar{x}_1 = E(x1), and similarly for \sigma_{x2}^2.
Note, once again:
\sigma_{x1}^2 = \int\int (x1 - \bar{x}_1)^2 p(x1,x2) dx1 dx2 = \int (x1 - \bar{x}_1)^2 [ \int p(x1,x2) dx2 ] dx1 = \int (x1 - \bar{x}_1)^2 p(x1) dx1
so the formula really is just the variance of a one-variable distribution.
54. Note that in this distribution, if x1 is bigger than \bar{x}_1 then x2 tends to be bigger than \bar{x}_2, and if x1 is smaller than \bar{x}_1 then x2 tends to be smaller than \bar{x}_2.
This is a positive correlation.
[Figure: contours of p(x1,x2) elongated along a positive slope, with the expected value (\bar{x}_1, \bar{x}_2) marked.]
55. Conversely, in this distribution, if x1 is bigger than \bar{x}_1 then x2 tends to be smaller than \bar{x}_2, and if x1 is smaller than \bar{x}_1 then x2 tends to be bigger than \bar{x}_2.
This is a negative correlation.
[Figure: contours of p(x1,x2) elongated along a negative slope, with the expected value (\bar{x}_1, \bar{x}_2) marked.]
56. This correlation can be quantified by multiplying the distribution by a four-quadrant function and then integrating.
[Figure: the (x1, x2) plane divided into four quadrants about (\bar{x}_1, \bar{x}_2), with alternating + and - signs.]
The function (x1 - \bar{x}_1)(x2 - \bar{x}_2) works fine:
cov(x1,x2) = \int\int (x1 - \bar{x}_1)(x2 - \bar{x}_2) p(x1,x2) dx1 dx2
This is called the covariance.
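An illustrative numerical check (an added sketch): sample a positively correlated pair and estimate the covariance straight from the definition, with expectations replaced by sample means.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative positively correlated pair: x2 = x1 + independent noise
x1 = rng.normal(0.0, 1.0, 1_000_000)
x2 = x1 + rng.normal(0.0, 0.5, 1_000_000)

# Covariance from the definition, using sample means in place of expectations
cov = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))

print(cov)                         # ~1.0: positive correlation
print(np.cov(x1, x2)[0, 1])        # numpy's built-in estimate, for comparison
```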
57. Note that the vector \bar{x} with elements
\bar{x}_i = E(x_i) = \int\int x_i p(x1,x2) dx1 dx2
is the expectation of x, and the matrix C_x with elements
[C_x]_{ij} = \int\int (x_i - \bar{x}_i)(x_j - \bar{x}_j) p(x1,x2) dx1 dx2
has diagonal elements equal to the variance of x_i, [C_x]_{ii} = \sigma_{x_i}^2, and off-diagonal elements equal to the covariance of x_i and x_j, [C_x]_{ij} = cov(x_i, x_j).
58. Center of a multivariate distribution: \bar{x}. Width and correlatedness of a multivariate distribution: C_x. Together they summarize a lot, but not everything, about a multivariate distribution.
59. Functions of a set of random variables, x
A set of N random variables collected in a vector, x.
60. Given y(x), do you remember how to transform the integral?
\int \cdots \int p(x) d^N x = \int \cdots \int \; ? \; d^N y
61. Given y(x), then
\int \cdots \int p(x) d^N x = \int \cdots \int p(x(y)) |dx/dy| d^N y
where |dx/dy| is the Jacobian determinant, that is, the determinant of the matrix J whose elements are J_{ij} = dx_i/dy_j.
62. But here's something that's EASIER.
- Suppose y(x) is a linear function, y = Mx
- Then we can easily calculate the expectation of y:
  \bar{y}_i = E(y_i) = \int \cdots \int y_i p(x_1 \ldots x_N) dx_1 \ldots dx_N
            = \int \cdots \int \sum_j M_{ij} x_j \, p(x_1 \ldots x_N) dx_1 \ldots dx_N
            = \sum_j M_{ij} \int \cdots \int x_j \, p(x_1 \ldots x_N) dx_1 \ldots dx_N
            = \sum_j M_{ij} E(x_j) = \sum_j M_{ij} \bar{x}_j
So \bar{y} = M\bar{x}
63. And we can easily calculate the covariance:
  [C_y]_{ij} = \int \cdots \int (y_i - \bar{y}_i)(y_j - \bar{y}_j) p(x_1 \ldots x_N) dx_1 \ldots dx_N
             = \int \cdots \int \sum_p M_{ip}(x_p - \bar{x}_p) \sum_q M_{jq}(x_q - \bar{x}_q) p(x_1 \ldots x_N) dx_1 \ldots dx_N
             = \sum_p M_{ip} \sum_q M_{jq} \int \cdots \int (x_p - \bar{x}_p)(x_q - \bar{x}_q) p(x_1 \ldots x_N) dx_1 \ldots dx_N
             = \sum_p M_{ip} \sum_q M_{jq} [C_x]_{pq}
So C_y = M C_x M^T
Memorize!
64. Note that these rules work regardless of the distribution of x. If y is linearly related to x, y = Mx, then
\bar{y} = M\bar{x}   (rule for means)
C_y = M C_x M^T   (rule for propagating error)
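A closing sketch (added, not from the slides) that checks both rules by Monte Carlo; the mean, covariance, and matrix M below are arbitrary illustrative choices, and a deliberately non-Gaussian x is used to show that the rules do not depend on the distribution.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative (assumed) mean, covariance, and linear map
xbar = np.array([1.0, 2.0])
Cx = np.array([[1.0, 0.3],
               [0.3, 0.5]])
M = np.array([[2.0, -1.0],
              [0.5,  1.0]])

# Rules from slide 64
ybar_rule = M @ xbar
Cy_rule = M @ Cx @ M.T

# Monte Carlo check with a NON-Gaussian x:
# uniform u with zero mean and unit covariance, colored by the Cholesky factor of Cx
N = 500_000
u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(N, 2))   # variance 1 per component
L = np.linalg.cholesky(Cx)
x = xbar + u @ L.T                                           # mean xbar, covariance Cx
y = x @ M.T                                                  # y = M x for each sample

print(ybar_rule, y.mean(axis=0))                   # rule vs sample mean
print(Cy_rule, np.cov(y, rowvar=False), sep="\n")  # rule vs sample covariance
```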