Bayesian: Single Parameter

1
Bayesian: Single Parameter

Prof. Nur Iriawan, PhD.
Department of Statistics, FMIPA ITS, Surabaya
21 February 2006

2
Frequentist vs Bayesian (Casella and Berger, 1987)

Frequentist group
The group that relies on classical methods: MLE, method of moments, UMVUE, MSE, etc.
An analytical approach is always taken as the solution
Bayesian group
The group that relies on Bayesian methods
Numerical and computationally intensive approaches
Inference is based mainly on what is most probable

3
Bayes' Theorem (Thomas Bayes, 1702-1761)
4
The Bayesian Model (Box and Tiao, 1973), (Zellner, 1971), (Gelman, Stern, Carlin, and Rubin, 1995)

Based on the proportional form
posterior ∝ likelihood × prior
which is written as
$f(\theta \mid x) \propto f(x \mid \theta)\, f(\theta)$
That is, the data, expressed as a likelihood, are used to update the prior information into posterior information that is ready to be used for inference.

5
Bayesian: Parameters Are Also Treated as Variables

In Bayesian analysis, all parameters in the model are treated as variables
Thinking in terms of the full conditional distribution is used to study the characteristics of each parameter
The notation for the data likelihood is distinguished from the notation for the full conditional distribution.

6
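Motivation for Bayesian Methods

Bayes' theorem (Thomas Bayes)
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
In another form, if $x_1, \ldots, x_n$ are independent random variables with parameter $\theta$, then
$f(\theta \mid x_1, \ldots, x_n) \propto \prod_{i=1}^{n} f(x_i \mid \theta)\, f(\theta)$

P(B) is a constant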
7
Example: the Icy Road Case

Ice: Is there an icy road?
Values: Yes, No
Initial probabilities: (.7, .3)
Watson: Does Watson have a car crash?
Values: Yes, No
Probabilities: (.8, .2) if Ice = Yes, (.1, .9) if Ice = No.

8
Icy Road: Conditional Probabilities
            Watson = Yes   Watson = No
Ice = Yes   .8             .2
Ice = No    .1             .9
For example, p(Watson = yes | Ice = yes) = .8 and p(Watson = no | Ice = yes) = .2.
9
Icy Road: Likelihoods
Note the 8/1 ratio
            Watson = Yes   Watson = No
Ice = Yes   .8             .2
Ice = No    .1             .9
If Watson = yes, the likelihoods are p(Watson = yes | Ice = yes) = .8 and p(Watson = yes | Ice = no) = .1.
10
Icy Road: Bayes' Theorem, If Watson = yes -- Before Normalizing
Prior × Likelihood ∝ Posterior
Ice = Yes: .7 × .8 = .56
Ice = No: .3 × .1 = .03
Sum = .59. Need to divide through by this normalizing constant to get probabilities.
11
Icy Road: Bayes' Theorem, If Watson = yes
Prior × Likelihood ∝ Posterior
Ice = Yes: .56 / .59 ≈ .949
Ice = No: .03 / .59 ≈ .051
Posterior probabilities -- each term in the product divided through by the normalizing constant .59.
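
The Icy Road update can be reproduced in a few lines. The sketch below uses only the prior and likelihood values quoted on the slides; the variable names are illustrative.

```python
# Prior and likelihood values are taken from the Icy Road slides.
prior = {"yes": 0.7, "no": 0.3}               # p(Ice)
likelihood_crash = {"yes": 0.8, "no": 0.1}    # p(Watson = yes | Ice)

# Unnormalized posterior: prior times likelihood for each state of Ice.
unnormalized = {ice: prior[ice] * likelihood_crash[ice] for ice in prior}

# Normalizing constant, p(Watson = yes).
evidence = sum(unnormalized.values())

# Posterior p(Ice | Watson = yes).
posterior = {ice: value / evidence for ice, value in unnormalized.items()}

print(round(evidence, 2))
print({ice: round(p, 3) for ice, p in posterior.items()})
```

The printed normalizing constant matches the .59 on the slide, and the two posterior probabilities sum to one.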
12
Example: the Normal Case

The natural representation of a Normal(µ, σ²) or N(µ, σ²) distribution

[Figure: alternative representations of the Normal distribution]
Which representation is the representative one?
13

What is the difference between the following two presentations?
14
Plots of the variables x, µ, and σ in the Normal full conditional
[Figure: panels with axes x, µ, and σ]
15
Interval vs Highest Posterior Density (HPD) (Box and Tiao, 1973), (Gelman et al., 1995), (Iriawan, 2001)

In the frequentist approach, the confidence interval is constructed as follows
In the Bayesian approach, the confidence interval is approximated by the HPD.

16
Representation of Equal Density (Iriawan, 2001)
17
Compromise in Control Charts
18
HPD in Individual Control Charts
Control chart, (1-α) × 100%   Lower control limit   Upper control limit
95.0                          71.3953               109.481
97.5                          64.4857               110.915
99.0                          55.3356               112.775
19
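Example: the Bernoulli Case

As in the Normal case above, $x \sim \mathrm{Ber}(x \mid p)$ is written as
$f(x \mid p) = p^{x}(1-p)^{1-x}, \quad x \in \{0, 1\}$
where, in the frequentist view, p is treated as a constant
What if p varies because observations are taken in different situations and places? Under the Bayesian principle, p is treated as a variable so that the model can accommodate such conditions.

Suppose p varies according to a Beta(α, β) distribution:
$f(p) = \frac{1}{B(\alpha, \beta)}\, p^{\alpha-1}(1-p)^{\beta-1}, \quad 0 < p < 1$
with
$B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$
What happens then?

Suppose one Bernoulli observation has been made; the posterior distribution is then
$f(p \mid x) = \frac{f(x \mid p)\, f(p)}{\int_0^1 f(x \mid p)\, f(p)\,dp} = \frac{p^{\alpha+x-1}(1-p)^{\beta-x}}{\int_0^1 p^{\alpha+x-1}(1-p)^{\beta-x}\,dp}$

By the definition of the Beta function, the denominator is
$\int_0^1 p^{(\alpha+x)-1}(1-p)^{(\beta-x+1)-1}\,dp = B(\alpha+x,\ \beta-x+1)$

So the posterior distribution of p after this one observation is
$f(p \mid x) = \frac{p^{\alpha+x-1}(1-p)^{\beta-x}}{B(\alpha+x,\ \beta-x+1)}$
that is, a Beta(α + x, β − x + 1) distribution.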
24
The Bayes Estimator

The Bayesian estimate of p can be obtained by minimizing a loss function. Several loss functions can be used, but here we use the quadratic loss function, which is consistent with the mean squared error (MSE)
In general, the Bayes estimate of θ is (Carlin and Louis, 1996; Elfessi and Reineke, 2001)
$\hat{\theta} = E(\theta \mid x) = \int \theta\, f(\theta \mid x)\, d\theta$

Taking the expectation under the posterior distribution gives
$\hat{p} = \int_0^1 p\, f(p \mid x)\, dp = \frac{1}{B(\alpha+x,\ \beta-x+1)} \int_0^1 p^{\alpha+x}(1-p)^{\beta-x}\, dp$

As before, the integral is solved by defining the new variables $\alpha^{*} = \alpha + x + 1$ and $\beta^{*} = \beta - x + 1$. The integral then gives
$\hat{p} = \frac{B(\alpha^{*},\ \beta^{*})}{B(\alpha+x,\ \beta-x+1)}$

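Using the simplification
$\Gamma(z+1) = z\,\Gamma(z)$
we obtain
$\hat{p} = \frac{\alpha + x}{\alpha + \beta + 1}$
or, equivalently,
$\hat{p} = \frac{\alpha+\beta}{\alpha+\beta+1} \cdot \frac{\alpha}{\alpha+\beta} + \frac{1}{\alpha+\beta+1}\, x$

Recall this result in the later discussion of compromising Bayesian with classical approaches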
28

Extending this result to n Bernoulli trials with y successes gives the posterior
$f(p \mid y) = \frac{p^{\alpha+y-1}(1-p)^{\beta+n-y-1}}{B(\alpha+y,\ \beta+n-y)}$
where y is the number of successes among the individual Bernoulli observations x. The resulting estimate is
$\hat{p} = \frac{\alpha + y}{\alpha + \beta + n}$

Recall this result in the later discussion of compromising Bayesian with classical approaches
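
A minimal sketch of the Beta posterior update and the resulting posterior-mean estimator, assuming a Beta(α, β) prior and binomial data. The function names and the numeric values of α and β are hypothetical, chosen only for illustration.

```python
def beta_posterior(alpha, beta, successes, trials):
    """Return the Beta posterior parameters after observing binomial data."""
    return alpha + successes, beta + trials - successes


def bayes_estimate(alpha, beta, successes, trials):
    """Posterior mean of p, the Bayes estimator under quadratic loss."""
    a_post, b_post = beta_posterior(alpha, beta, successes, trials)
    return a_post / (a_post + b_post)


# Hypothetical prior parameters, for illustration only.
alpha, beta = 2.0, 2.0

# One Bernoulli observation x = 1: (alpha + x) / (alpha + beta + 1).
one_trial = bayes_estimate(alpha, beta, successes=1, trials=1)

# n = 10 trials with y = 7 successes: (alpha + y) / (alpha + beta + n).
ten_trials = bayes_estimate(alpha, beta, successes=7, trials=10)

print(one_trial, ten_trials)
```

As n grows, the estimate moves from the prior mean α / (α + β) toward the classical proportion y / n.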
29
Priors and the Bayesian Method (Gelman et al., 1995)
Because the parameter θ is treated as a variable, in Bayesian analysis θ takes values in a domain Θ with density $f_\Theta(\theta)$. This density is called the prior distribution of θ. With prior information combined with the current data or information, X, to form the posterior of θ, computing the posterior becomes simpler: it only requires the conditional density of θ given X = x. Criticism of the Bayesian approach usually focuses on the legitimacy and desirability of treating θ as a random variable, and on whether the prior distribution has been defined or chosen appropriately.
30
The Ideal Shapes of the Prior, Likelihood, and Posterior
[Figure: prior, likelihood, and posterior curves]
31
What if the prior is chosen as follows?
A prior chosen like this is a misleading prior, so the shape of the posterior will be unclear.
[Figure: likelihood, posterior, and prior curves]
32
A prior with the same density over the whole domain
[Figure: likelihood, posterior, and prior curves]
33
Interpretations of the Prior Distribution

As a frequency distribution
As a normative and objective representation of what it is rational to believe about a parameter
As a subjective representation of a person's own judgment about a parameter

34
The Prior as a Representation of a Frequency Distribution

Sometimes the value of a parameter is generated from the mode of the pattern of earlier data, whether that pattern is symmetric or not
In an inspection within an industrial process, defect data from the previous batch are usually used to estimate the prior information for the next batch
The prior usually has a physical meaning that matches the frequency of occurrence in the data

35
The Normative/Objective Interpretation of a Prior

The main problem in making a prior interpretable is how to choose a prior distribution for an unknown parameter that is still consistent with the physical problem at hand.
If θ only takes values in a particular range, it is reasonable to use a prior with uniform density (equally likely / uniformly distributed).
The interpretation is that every value is given the same chance of being selected to support the likelihood in forming the posterior.
A prior can have a very odd meaning if it is chosen badly

36
Priors for Continuous Parameters

Invariance arguments.
For example, for a Normal mean µ, this means that every interval (a, a+h) must have the same prior probability for any given h and a. In other words, all points have the same chance of being selected, which leads to a uniform prior (an improper prior)
For the parameter σ, every interval (a, ka) has the same prior probability, which means that the prior is proportional to 1/σ. Again, this yields an improper prior.

37
Types of Priors

Conjugate prior vs non-conjugate prior (Box and Tiao, 1973), (Gelman et al., 1995), (Tanner, 1996), (Zellner, 1971)
The prior is tied to the form of the likelihood model of the data
Proper prior vs improper prior (Jeffreys prior)
The prior is characterized by how weight or density is assigned at each point, uniformly distributed or not
Informative prior vs non-informative prior
The prior is characterized by whether the pattern or frequency distribution of the data is already known
Pseudo-prior (Carlin and Chib, 1995)
The prior is given values matched to the results of a frequentist analysis (e.g. OLS regression)

38
Continuous Parameters

A uniform prior is usually used (at least if the parameter space is of finite extent)
But if θ is uniform, then a non-linear function of θ, g(θ), will not be uniform
Example: if p(θ) = 1, θ > 0. Re-parameterize as
$\gamma = g(\theta)$
then
$p_\gamma(\gamma) = p_\theta(\theta)\,\left|\frac{d\theta}{d\gamma}\right|$
where
$\theta = g^{-1}(\gamma)$
so that $p_\gamma(\gamma)$ is in general not constant:
ignorance about θ does not imply ignorance about γ. The notion of prior ignorance may be untenable.
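
A small simulation makes the non-invariance visible. This is only a sketch under stated assumptions: it uses a proper Uniform(0, 1) prior rather than the improper p(θ) = 1 on θ > 0, and the transformation g(θ) = θ² is a hypothetical choice for illustration.

```python
import random

random.seed(0)
n = 100_000

# theta ~ Uniform(0, 1): every sub-interval of equal length is equally likely.
theta = [random.random() for _ in range(n)]

# Hypothetical non-linear re-parameterization: gamma = theta squared.
gamma = [t ** 2 for t in theta]

# Share of draws below 0.25 on each scale.
frac_theta = sum(t < 0.25 for t in theta) / n
frac_gamma = sum(g < 0.25 for g in gamma) / n

print(frac_theta, frac_gamma)
```

A uniform variable would put about a quarter of its mass below 0.25, which holds for θ. For γ the share is about one half, because γ < 0.25 is the same event as θ < 0.5, so γ is not uniform.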

Turning this process around slightly, Bayesian
analysis assumes that we can make some kind of
probability statement about parameters before we
start. The sample is then used to update our
prior distribution.

First, suppose that the prior can be represented as a probability density function p(θ), where θ is the parameter of interest.
Based on the sample X (the likelihood function), we can update the prior distribution using Bayes' rule
$p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{\int p(X \mid \theta)\, p(\theta)\, d\theta}$

41
Some Conjugate Priors
42
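The Jeffreys Prior (single parameter)

The Jeffreys prior is given by
$p(\theta) \propto \sqrt{I(\theta)}$
where
$I(\theta) = -E\left[\frac{\partial^{2} \log f(X \mid \theta)}{\partial \theta^{2}}\right]$
is the expected Fisher information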
This is invariant to transformation in the sense
that all parametrizations lead to the same prior
Can also argue that it is uniform for a
parametrization where the likelihood is
completely determined (see Box and Tiao, 1973,
Section 1.3)

43
Jeffreys Prior Example: Binomial
$I(p) = \frac{n}{p(1-p)}, \qquad p(p) \propto \sqrt{I(p)} \propto p^{-1/2}(1-p)^{-1/2}$
This is a beta distribution with parameters ½ and ½
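
The binomial Fisher information can be checked numerically. The sketch below computes the expected squared score by direct summation over the binomial probabilities and compares it with the closed form n / (p(1 − p)); the value n = 10 is a hypothetical choice.

```python
from math import comb, sqrt


def binomial_fisher_information(n, p):
    """Expected squared score E[(d/dp log f(X | p))^2] for X ~ Binomial(n, p)."""
    total = 0.0
    for x in range(n + 1):
        pmf = comb(n, x) * p ** x * (1 - p) ** (n - x)
        score = x / p - (n - x) / (1 - p)   # derivative of the log-likelihood
        total += pmf * score ** 2
    return total


n = 10  # hypothetical number of trials
results = {}
for p in (0.2, 0.5, 0.8):
    info = binomial_fisher_information(n, p)
    closed_form = n / (p * (1 - p))
    # Jeffreys kernel: sqrt(I(p)) up to a constant, i.e. p^(-1/2) (1 - p)^(-1/2).
    kernel = sqrt(info / n)
    results[p] = (info, closed_form, kernel)
    print(p, round(info, 6), round(closed_form, 6), round(kernel, 6))
```

The kernel column is the unnormalized Beta(½, ½) density evaluated at p.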
44
Other Examples of Jeffreys Priors
45
Improper Priors ⇒ Posterior Trouble (sometimes)

Suppose Y1, ..., Yn are independently normally distributed with constant variance σ² and with a mean that depends on the parameters ρ, γ, and β
Suppose it is known that ρ is in [0,1], ρ is uniform on [0,1], and γ, β, and σ have improper priors
Then for any observations y, the marginal posterior density of ρ is proportional to a function involving h, where h is bounded and has no zeroes in [0,1]. This posterior is an improper distribution on [0,1]!

46
Improper prior usually ⇒ proper posterior
47
Another example: improper ⇒ proper
48
Subjective Degrees of Belief

Probability represents a subjective degree of
belief held by a particular person at a
particular time
Various techniques exist for eliciting subjective priors. For example, Good's device of imaginary results.
e.g. binomial experiment, beta prior with a = b. Imagine the experiment yields 1 tail and n-1 heads. How large should n be in order that we would just give odds of 2 to 1 in favor of a head occurring next? (e.g. n = 4 implies a = b = 1)
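
Good's device can be worked through with exact arithmetic. Under a Beta(a, a) prior, after n − 1 heads and 1 tail the predictive probability of a head is (a + n − 1) / (2a + n); setting it equal to the stated odds and solving for a gives the implied prior. The function names below are hypothetical.

```python
from fractions import Fraction


def predictive_head_probability(a, n):
    """P(next flip is a head) under a Beta(a, a) prior after n - 1 heads and 1 tail."""
    return Fraction(a + n - 1, 2 * a + n)


def implied_prior_parameter(n, odds_for_head=Fraction(2, 1)):
    """Solve (a + n - 1) / (2a + n) = q for a, where q = odds / (1 + odds)."""
    q = odds_for_head / (1 + odds_for_head)
    return (n * (q - 1) + 1) / (1 - 2 * q)


a = implied_prior_parameter(4)          # the slide's case: n = 4, odds 2 to 1
check = predictive_head_probability(a, 4)
print(a, check)
```

For odds of 2 to 1 the solution simplifies to a = n − 3, which gives a = b = 1 when n = 4, in agreement with the slide.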

49
Problems with Subjectivity

What if the prior and the likelihood disagree
substantially?
The subjective prior cannot be wrong but may be
based on a misconception
The model may be substantially wrong
Often use hierarchical models in practice

50
Hierarchical Model

Example: the Binomial case

[Diagram: hierarchical model with nodes Gamma(c, d), Gamma(g, h), Gamma(e, f), Beta(a, b), Poisson(λ), Binomial(n, p)]
51
General Comments

Determination of subjective priors is difficult
Difficult to assess the usefulness of a
subjective posterior
Don't be misled by the term "subjective": all data analyses involve appreciable personal elements

52
Once again: An example with a continuous variable: A beta-binomial example

The setup: We are flipping a biased coin, where the probability of heads p could be anywhere between 0 and 1. We are interested in p. We will have two sources of information:
Prior beliefs, which we will express as a beta distribution, and
Data, which will come in the form of counts of heads in 10 independent flips.

53
An example with a continuous variable: A beta-binomial example--the Prior Distribution

The prior distribution
Let's suppose we think it is more likely that the coin is close to fair, so p is probably nearer to .5 than it is to either 0 or 1. We don't have any reason to think it is biased toward either heads or tails, so we'll want a prior distribution that is symmetric around .5. We're not real sure about what p might be--say about as sure as only 6 observations. This corresponds to 3 pseudo-counts of H and 3 of T, which, if we want to use a beta distribution to express this belief, corresponds to beta(4,4)

54
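An example with a continuous variable: A beta-binomial example--the Prior Distribution

Beta. Defined on [0,1]. Conjugate prior for the probability parameter in Bernoulli and binomial models.
p ~ dbeta(4,4)
$f(p) \propto p^{a-1}(1-p)^{b-1}$, with a = b = 4
Mean(p) = a / (a + b) = .5
Variance(p) = ab / ((a + b)^2 (a + b + 1)) ≈ .028
Mode(p) = (a - 1) / (a + b - 2) = .5

In the density, p is the variable (the success probability), 1 - p is the failure probability, a - 1 is the pseudo-count of successes, b - 1 is the pseudo-count of failures, and (a, b) are the shape parameters, which carry the prior sample information.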
55
An example with a continuous variable: A beta-binomial example--the Likelihood

The likelihood
Next we will flip the coin ten times. Assuming the same true (but unknown to us) value of p is in effect for each of ten independent trials, we can use the binomial distribution to model the probability of getting any number of heads, i.e.,
$p(r \mid p) = \binom{10}{r}\, p^{r}(1-p)^{10-r}$

Here r, the variable, is the count of observed successes, 10 - r is the count of observed failures, p is the success probability parameter, and 1 - p is the failure probability.
56
An example with a continuous variable: A beta-binomial example--the Likelihood

The likelihood
We flip the coin ten times, and observe 7 heads, i.e., r = 7. The likelihood is obtained now using the same form as in the preceding slide, except now r is fixed at 7 and we are interested in the relative value of this function at different possible values of p

57
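An example with a continuous variable: Obtaining the posterior by Bayes' Theorem
posterior ∝ likelihood × prior

General form
$p(y \mid x) \propto p(x \mid y)\, p(y)$
In our example, 7 plays the role of x, and p plays the role of y. Before normalizing
$p(p \mid r = 7) \propto p^{7}(1-p)^{3} \times p^{3}(1-p)^{3} = p^{10}(1-p)^{6}$
After normalizing
$p(p \mid r = 7) = \frac{p^{10}(1-p)^{6}}{B(11, 7)}$

Now, how can we get an idea of what this means we believe about p after combining our prior belief and our observations?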
58
An example with a continuous variable: In pictures
Prior × Likelihood ∝ Posterior
59
An example with a continuous variable: Using the fact that we have conjugate distributions
Now
$p(p \mid r = 7) \propto p^{10}(1-p)^{6}$
This is just the kernel of a beta(11,7) distribution. This is rather special. The data were observed in accordance with a probability function which would have that same mathematical form as a likelihood once data are observed. We chose a prior distribution (in this case, a beta distribution) which would combine with the likelihood just so as to produce another distribution in the same parametric family (another beta distribution), just with updated parameters. We can work out its summary statistics:

Mean(p) = 11/18 ≈ .611 (prior was .5)
Variance(p) ≈ .0125 (prior was .028)
Mode(p) = 10/16 = .625 (prior was .5)
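
The conjugate update and its summary statistics can be sketched as follows. The prior beta(4,4) and the data (7 heads in 10 flips) come from the slides; the helper name `beta_summary` is hypothetical.

```python
def beta_summary(a, b):
    """Mean, variance, and mode of a Beta(a, b) distribution (mode needs a, b > 1)."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    mode = (a - 1) / (a + b - 2)
    return mean, variance, mode


prior_a, prior_b = 4, 4          # beta(4,4) prior from the slides
heads, flips = 7, 10             # observed data from the slides

# Conjugate update: add successes to a and failures to b.
post_a, post_b = prior_a + heads, prior_b + flips - heads

prior_stats = beta_summary(prior_a, prior_b)
post_stats = beta_summary(post_a, post_b)

print("prior    ", [round(v, 4) for v in prior_stats])
print("posterior", [round(v, 4) for v in post_stats])
```

The posterior parameters come out as (11, 7), matching the beta(11,7) kernel identified on the slide.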

60
An example with a continuous variable: Using BUGS
Now
What BUGS does in this simple problem with one variable is to sample lots of values from the posterior distribution for p; that is, its distribution as determined first with information from the prior, but further conditional on the observed data. Here are the summary statistics from 50000 draws

Mean(p) (prior was .5)
Variance(p) .0125 (prior was .028)
Mode(p) (prior was .5)
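
The sampling idea can be imitated without BUGS. The sketch below is not BUGS code: it assumes the beta(11,7) posterior is already known from conjugacy and simply draws from it directly with Python's standard library, then summarizes the draws. The seed is arbitrary.

```python
import random
from statistics import fmean, pvariance

random.seed(1)

# Draw 50000 values from the beta(11,7) posterior for p.
draws = [random.betavariate(11, 7) for _ in range(50_000)]

sample_mean = fmean(draws)
sample_variance = pvariance(draws)

print(round(sample_mean, 3), round(sample_variance, 4))
```

With this many draws, the sample mean and variance should sit close to the analytic values 11/18 and about .0125.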
61
An example with a continuous variable Using BUGS

BUGS setup for this problem

62
Looking ahead to sampling-based approaches with
many variables

BUGS: Bayesian inference Using Gibbs Sampling
Basic idea: Model a multi-parameter problem in terms of assemblies of distributions and functions for all data and all parameters (taking advantage of conditional dependence whenever possible).
E.g., p(Data | x, y) p(x | z) p(y) p(z). (*)
Observe Data: the posterior p(x, y, z | Data) is proportional to (*). Hard to evaluate the normalizing constant, but ...

63
Looking ahead to sampling-based approaches with
many variables

Can draw values from full conditional distributions
Start with a possible value for each variable in cycle 0.
In cycle t+1,
Draw x_{t+1} from p(x | Y = y_t, Z = z_t, Data)
Draw y_{t+1} from p(y | X = x_{t+1}, Z = z_t, Data)
Draw z_{t+1} from p(z | X = x_{t+1}, Y = y_{t+1}, Data)
Under suitable conditions, this series of draws will come to approximate draws from the actual true joint posterior for all the parameters.
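
The cycle of draws from full conditionals can be illustrated on a toy target. The sketch below assumes a standard bivariate normal with correlation ρ, chosen only because its full conditionals are known in closed form (x | y is Normal(ρy, 1 − ρ²)); it is not the model from the slides, and the function name and settings are hypothetical.

```python
import random
from statistics import fmean


def gibbs_bivariate_normal(rho, n_draws, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0                       # cycle 0: arbitrary starting values
    sd = (1 - rho ** 2) ** 0.5            # standard deviation of each full conditional
    draws = []
    for cycle in range(burn_in + n_draws):
        x = rng.gauss(rho * y, sd)        # draw x_{t+1} from p(x | Y = y_t)
        y = rng.gauss(rho * x, sd)        # draw y_{t+1} from p(y | X = x_{t+1})
        if cycle >= burn_in:              # discard the early cycles
            draws.append((x, y))
    return draws


draws = gibbs_bivariate_normal(rho=0.8, n_draws=20_000)
xs = [d[0] for d in draws]
ys = [d[1] for d in draws]

# Sample correlation of the retained draws.
mx, my = fmean(xs), fmean(ys)
cov = fmean([(a - mx) * (b - my) for a, b in zip(xs, ys)])
var_x = fmean([(a - mx) ** 2 for a in xs])
var_y = fmean([(b - my) ** 2 for b in ys])
sample_corr = cov / (var_x * var_y) ** 0.5

print(round(sample_corr, 3))
```

The sample correlation of the retained draws should be close to the target ρ, which indicates that the chain is approximating the joint distribution even though only conditional draws were made.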

64
Inference in a chain
Recursive representation
p(u,v,x,y,z) = p(z | y,x,v,u) p(y | x,v,u) p(x | v,u) p(v | u) p(u)
= p(z | y) p(y | x) p(x | v) p(v | u) p(u).
[Diagram: chain U → V → X → Y → Z with links p(v | u), p(x | v), p(y | x), p(z | y)]
65
Inference in a chain
Suppose we learn the value of X
Start here, by revising belief about X
[Diagram: chain U → V → X → Y → Z]
66
Inference in a chain
Propagate information down the chain using
conditional probabilities
From updated belief about X, use conditional
probability to revise belief about Y
[Diagram: chain U → V → X → Y → Z]
67
Inference in a chain
Propagate information down the chain using
conditional probabilities
From updated belief about Y, use conditional
probability to revise belief about Z
[Diagram: chain U → V → X → Y → Z]
68
Inference in a chain
Propagate information up the chain using Bayes
Theorem
From updated belief about X, use Bayes Theorem to
revise belief about V
[Diagram: chain U → V → X → Y → Z]
69
Inference in a chain
Propagate information up the chain using Bayes
Theorem
From updated belief about V, use Bayes Theorem to
revise belief about U
[Diagram: chain U → V → X → Y → Z]
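
The propagation steps can be traced on a small binary chain. All probability tables below are hypothetical numbers for illustration; only the structure U → V → X → Y → Z and the order of the updates follow the slides.

```python
# Hypothetical binary chain U -> V -> X -> Y -> Z; table[parent][child].
p_u = [0.6, 0.4]
p_v_given_u = [[0.9, 0.1], [0.3, 0.7]]
p_x_given_v = [[0.8, 0.2], [0.25, 0.75]]
p_y_given_x = [[0.7, 0.3], [0.1, 0.9]]
p_z_given_y = [[0.95, 0.05], [0.4, 0.6]]


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


x_obs = 1  # suppose we learn X = 1

# Down the chain: conditional probability.
belief_y = p_y_given_x[x_obs]
belief_z = [sum(belief_y[y] * p_z_given_y[y][z] for y in range(2)) for z in range(2)]

# Up the chain: Bayes' theorem, first for V, then for U.
prior_v = [sum(p_u[u] * p_v_given_u[u][v] for u in range(2)) for v in range(2)]
belief_v = normalize([prior_v[v] * p_x_given_v[v][x_obs] for v in range(2)])

# p(u | v) by Bayes' theorem, averaged over the updated belief about V.
belief_u = [
    sum(belief_v[v] * p_u[u] * p_v_given_u[u][v] / prior_v[v] for v in range(2))
    for u in range(2)
]

print(belief_y, belief_z, belief_v, belief_u)
```

Each updated belief uses only the neighboring variable's updated belief and one local table, which is what makes propagation in a chain cheap.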
70
Inference in singly-connected nets
Singly connected: There is never more than one path from one variable to another variable. Chains and trees are singly connected. Can use repeated applications of Bayes' theorem and conditional probability to propagate evidence. (Pearl, early 1980s)
[Diagram: singly connected net over U, V, X, Y, Z]
71
Posterior Summaries

Mean, median, mode, percentile, etc.
Central 95% interval versus highest posterior density region (normal mixture example)
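
The difference between a central interval and an HPD interval can be seen from posterior draws. The sketch below uses the beta(11,7) posterior from the coin example rather than a normal mixture, and it takes the shortest interval containing 95% of the draws as the HPD interval, which is appropriate only for a unimodal posterior. The function name is hypothetical.

```python
import random


def central_and_hpd(samples, mass=0.95):
    """Return (central interval, shortest interval), each covering `mass` of the draws."""
    ordered = sorted(samples)
    n = len(ordered)
    k = int(mass * n)                       # number of draws inside each interval
    start = (n - k) // 2                    # equal tails on both sides
    central = (ordered[start], ordered[start + k - 1])
    # Slide a window of k draws and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    hpd = (ordered[best], ordered[best + k - 1])
    return central, hpd


random.seed(3)
samples = [random.betavariate(11, 7) for _ in range(20_000)]
central, hpd = central_and_hpd(samples)

print([round(v, 3) for v in central], [round(v, 3) for v in hpd])
```

Because the beta(11,7) density is skewed, the shortest interval is shifted relative to the central one and is never wider than it.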

72
Bayesian Confidence Intervals

Apart from providing an alternative procedure for
estimation, the Bayesian approach provides a
direct procedure for the formulation of parameter
confidence intervals.
Returning to the simple case of a single coin toss, the probability density function of the estimator becomes
$f(p \mid x) = \frac{p^{\alpha+x-1}(1-p)^{\beta-x}}{B(\alpha+x,\ \beta-x+1)}$

As previously discussed, taking α = β = 1.4968 and observing a head (x = 1), the Bayesian estimator of p is .6252.

However, using the posterior distribution function, we can also compute the probability that the value of p is less than .5 given a head
$P(p < .5 \mid x = 1) = \int_0^{.5} f(p \mid x = 1)\, dp$
Please verify this result!
Hence, we have a very formal statement of confidence intervals as P(0.3 < p < 0.7).
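
One way to verify these quantities is numerical integration of the posterior kernel. The sketch below assumes the Beta(α + 1, β) posterior that follows from a Beta(α, β) prior and a single head, with α = β = 1.4968 as on the slide; the midpoint rule and the helper names are illustrative choices.

```python
def beta_kernel(p, a, b):
    """Unnormalized Beta(a, b) density."""
    return p ** (a - 1) * (1 - p) ** (b - 1)


def integrate(f, lower, upper, steps=20_000):
    """Composite midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(f(lower + (i + 0.5) * width) for i in range(steps))


alpha = beta = 1.4968                          # prior parameters from the slide
x = 1                                          # a single toss that came up heads
a_post, b_post = alpha + x, beta + 1 - x       # posterior Beta(2.4968, 1.4968)

estimate = a_post / (a_post + b_post)          # posterior mean


def kernel(p):
    return beta_kernel(p, a_post, b_post)


norm = integrate(kernel, 0.0, 1.0)
prob_below_half = integrate(kernel, 0.0, 0.5) / norm
prob_03_07 = integrate(kernel, 0.3, 0.7) / norm

print(round(estimate, 4), prob_below_half, prob_03_07)
```

The first printed value reproduces the estimator .6252 quoted on the slide; the other two are the posterior probabilities P(p < .5) and P(0.3 < p < 0.7).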

75
Prediction

Posterior predictive density of a future observation
binomial example, n = 20, x = 12, a = 1, b = 1

[Figure: density plots, horizontal axis y]
76
Prediction for Univariate Normal
77
Prediction for Univariate Normal

Posterior Predictive Distribution is Normal

78
Prediction for a Poisson
79
On the Compromise of Bayesian to Classical Estimation (presented at the South-East Asia Stat Math Muslim Society Conference)
Nur Iriawan
Statistics Department of Institut Teknologi Sepuluh Nopember
Jl. Arief Rahman Hakim, Sukolilo, Surabaya 60111, Indonesia
iriawann_at_sby.centrin.net.id
80
Example on Exponential

Suppose x is exponentially distributed
The MLE of the parameter is
81
Using the Bayesian approach, the prior of the parameter is
The likelihood would be
Then the posterior of the parameter given the data X is
82
The Bayes estimator for the parameter can be derived using
83
(No Transcript)
84
Numerical Calculation
One thousand observations are generated from an Exponential distribution; the classical MLE (using MINITAB) then gives the following result
85
Using WinBUGS, the Bayes estimator is
86
Recall the Binomial Result
The Bayes estimator obtained was
$\hat{p} = \frac{\alpha + y}{\alpha + \beta + n}$

The classical approach gives
$\hat{p} = \frac{y}{n}$
What if α = β = 0? The Bayes estimator becomes equal to the classical one. With these values the beta prior becomes an improper noninformative prior, $f(p) \propto p^{-1}(1-p)^{-1}$ (the Jeffreys prior itself is Beta(½, ½)).
87
Summary
The Bayesian estimator reported here is the posterior mean, generated from an improper prior distribution. It has been shown that when there is no prior information about the model parameter and a constant or Jeffreys prior is used, the resulting estimator gives a compromise between the Bayesian and classical estimators.
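
The compromise can be illustrated for the exponential case by simulation. The sketch below rests on assumptions not stated in the transcript: the exponential is parameterized by its rate, the prior is the improper prior proportional to 1/rate, and the true rate 2.0 is a hypothetical value. Under these assumptions the posterior is Gamma(n, Σx), whose mean n / Σx equals the MLE.

```python
import random
from statistics import fmean

random.seed(2006)

true_rate = 2.0                      # hypothetical value used to generate data
n = 1000
data = [random.expovariate(true_rate) for _ in range(n)]
total = sum(data)

# Classical estimate: MLE of the rate.
mle_rate = n / total

# Assumed improper prior p(rate) proportional to 1/rate gives a
# Gamma(shape=n, rate=total) posterior, with posterior mean n / total.
bayes_rate = n / total

# Sampling check in the spirit of WinBUGS: draw from the posterior directly.
# random.gammavariate takes (shape, scale), so scale = 1 / total.
posterior_draws = [random.gammavariate(n, 1.0 / total) for _ in range(20_000)]
mc_posterior_mean = fmean(posterior_draws)

print(round(mle_rate, 4), round(bayes_rate, 4), round(mc_posterior_mean, 4))
```

With a different prior the two estimators would differ, and the gap would shrink as n grows.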
88
Numerical Integration: Monte Carlo Method (Law and Kelton, 2000)

Suppose we want to compute the integral
$I = \int_a^b g(x)\, dx$
If g(x) is complicated, I is hard to evaluate analytically. A numerical approach gives the value of I quite simply, as follows.

Define a new random variable
$Y = (b-a)\, g(X)$
with X uniformly distributed on the interval (a, b), i.e. U(a, b).
Compute the expectation of Y:
$E(Y) = (b-a)\, E[g(X)] = (b-a) \int_a^b g(x)\, \frac{1}{b-a}\, dx = I$

It is known that
$E(Y) \approx \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$
so the integral I can be approximated numerically by
$\hat{I} = \frac{b-a}{n} \sum_{i=1}^{n} g(x_i)$
That is, generate data $x_1, \ldots, x_n$ from the Uniform distribution, substitute them into g(x), sum the values, and take their average, multiplied by (b − a), as the estimate of the integral.
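
A minimal sketch of this Monte Carlo estimator follows. The integrand sin(x) on (0, π), whose exact integral is 2, is a hypothetical test case chosen so the result can be checked; the function name and seed are also illustrative.

```python
import math
import random


def monte_carlo_integral(g, a, b, n, seed=0):
    """Estimate the integral of g over (a, b) as (b - a) times the mean of g(X), X ~ U(a, b)."""
    rng = random.Random(seed)
    total = sum(g(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n


# Hypothetical test case: the integral of sin(x) from 0 to pi is exactly 2.
estimate = monte_carlo_integral(math.sin, 0.0, math.pi, 100_000)
print(estimate)
```

The error of the estimate shrinks roughly in proportion to one over the square root of n, so quadrupling the number of draws about halves the error.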