Workshop in R

Transcript and Presenter's Notes
1
Workshop in R: GLMs 3
  • Diane Srivastava
  • University of British Columbia
  • srivast@zoology.ubc.ca

2
Housekeeping
  • ls() lists the variables in the global
    environment
  • rm(list = ls()) removes EVERY variable
  • q() quits R, with a prompt to save the workspace
    or not

3
[Plot: hard ~ dens]
4
[Plot: hard^0.45 ~ dens]
5
[Plot: log(hard) ~ dens]
6
Janka exercise
  • Conclusion:
  • The best y transformation to optimize the model
    fit (highest log-likelihood)...
  • ...is not the best y transformation for normal
    residuals.

7
This workshop
  • Linear, general linear, and generalized linear
    models
  • Understand how GLMs work: Excel simulation
  • Definitions, e.g. deviance, link functions
  • Poisson GLMs: R exercise
  • Binomial distribution and logistic regression
  • Fit GLMs in R! Exercise

8
In the beginning there were
  • Linear models: a normally-distributed y fit to a
    continuous x

    y     x
    1.2   0
    1.3   0
    1.1   1
    0.9   1
9
Then there were
  • General Linear Models: a normally-distributed y
    fit to a continuous OR categorical x

10
Generalized linear models
No more need for tedious transformations!
All variances are unequal, but some are more
unequal than others.
The distribution is the solution!
Because most things in life aren't normal!
11
What linear models do
[Plot: y plotted against x]
  • Transform y
  • Fit line to transformed y
  • Back-transform to linear y

12
What GLMs do
[Plots: fitted line against x on the log scale, and
back-transformed onto y against x]
  • Start with an arbitrary fitted line
  • Back-transform line into linear space
  • Calculate residuals
  • Improve fitted line to maximize likelihood

Many iterations
13
Maximum likelihood
  • Means that an iterative process is used to find
    the model equation that has the highest
    probability (likelihood) of explaining the y
    values given the x values.
  • The equation for the likelihood depends on the
    error distribution chosen.
  • Least squares, by contrast, minimizes variation
    from the model.
  • If the data are normally distributed, maximum
    likelihood gives the same answer as least squares
    (see the sketch below).
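A quick check of that last point, using simulated data (the variable names here are illustrative, not from the workshop files):

    # Simulate normally distributed data
    set.seed(1)
    x <- runif(50)
    y <- 2 + 3 * x + rnorm(50)

    # Least squares fit
    coef(lm(y ~ x))

    # Maximum likelihood fit, normal errors, identity link
    coef(glm(y ~ x, family = gaussian(link = identity)))

    # The two sets of coefficients match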

14
GLM simulation exercise
  • Simulates fitting a model with normal errors and
    a log link to data.
  • Your task:
  • understand how the spreadsheet works
  • find, through an iterative process, the best
    slope (a rough R version of the idea is sketched
    below)
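A minimal R sketch of the same iterative idea, using simulated data and made-up names (not the workshop's spreadsheet):

    # Simulated data: log(expected y) = 1 + 0.5 * x,
    # with normal errors on y
    set.seed(1)
    x <- runif(100, 0, 5)
    y <- exp(1 + 0.5 * x) + rnorm(100, sd = 2)

    # Normal log-likelihood for a candidate slope
    # (intercept fixed at 1 to keep the sketch short)
    loglik <- function(slope) {
      fitted <- exp(1 + slope * x)  # back-transform to linear space
      resid <- y - fitted           # residuals in linear space
      sum(dnorm(resid, sd = sd(resid), log = TRUE))
    }

    # Try many candidate slopes; keep the most likely one
    slopes <- seq(0, 1, by = 0.01)
    slopes[which.max(sapply(slopes, loglik))]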

15
Generalized linear models
  • In least squares, we fit
  • y = mx + b + error
  • In a GLM, the model is fit more indirectly:
  • y = g(mx + b + error)
  • where g is a function, the inverse of which is
    called the link function
  • link fn(expected y) = mx + b + error
    (illustrated below)
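In R, each family object carries the link function and its inverse, which makes the relationship concrete (a small illustration, not part of the original exercise):

    fam <- poisson(link = log)
    fam$linkfun(10)        # link: log(10) = 2.303
    fam$linkinv(2.302585)  # inverse link: exp() back to 10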

16
LMs vs GLMs
  • GLMs:
  • Use maximum likelihood
  • Specify one of several distributions
  • Based on deviance
  • Fit the model to untransformed y by means of a
    link function
  • LMs:
  • Use least squares
  • Assume normality
  • Based on sums of squares
  • Fit the model to transformed y

17
All that really matters
  • By using a log link function, we do not need to
    calculate log(0).
  • Be careful! A log-link model predicts log y, not
    y (see the example below)!
  • The error distribution need not be normal:
    Poisson, binomial, gamma, Gaussian (normal)
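One way to see the "log y, not y" point in R (the data and object names here are made up for illustration):

    # A Poisson GLM with a log link on simulated counts
    set.seed(1)
    dat <- data.frame(x = 1:20)
    dat$y <- rpois(20, lambda = exp(0.1 * dat$x))
    fit <- glm(y ~ x, family = poisson(link = log), data = dat)

    predict(fit, type = "link")      # on the log scale (log y)
    predict(fit, type = "response")  # back-transformed to y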

18
Exercise
  • 1. Open up the file Rlecture.csv
  • diane <- read.table(file.choose(), sep = ",",
    header = TRUE)
  • 2. Look at the dataframe. Make treat a factor
  • diane$treat <- factor(diane$treat)
  • 3. Fit this model
  • my.first.glm <- glm(growth ~ size + treat,
    family = poisson(link = log), data = diane)
  • summary(my.first.glm)
  • 4. Model diagnostics
  • par(mfrow = c(2, 2)); plot(my.first.glm)

19
Overdispersion
[Residual plots illustrating underdispersed,
overdispersed, and random patterns]
20
Overdispersion
  • Is your residual deviance approximately equal to
    your residual df? (A quick check is sketched
    below.)
  • If residual deviance >> residual df, the model is
    overdispersed.
  • If residual deviance << residual df, the model is
    underdispersed.
  • Solution:
  • second.glm <- glm(growth ~ size + treat,
    family = quasipoisson(link = log), data = diane)
  • summary(second.glm)
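A quick way to compute that ratio, assuming the my.first.glm object from the earlier exercise is still in the workspace:

    # Ratio near 1: Poisson assumption looks reasonable.
    # Much greater than 1: overdispersed; much less: underdispersed.
    deviance(my.first.glm) / df.residual(my.first.glm)

    # The same two numbers are reported as the residual
    # deviance and its df in summary(my.first.glm)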

21
Options
  family     default link   other links
  binomial   logit          probit, cloglog
  gaussian   identity       --
  Gamma      inverse        identity, log
  poisson    log            identity, sqrt
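Alternative links are requested inside the family function. For example, a binomial model with a probit rather than the default logit link (simulated data, for illustration only):

    # Simulated binary data
    set.seed(1)
    x <- rnorm(100)
    y <- rbinom(100, size = 1, prob = plogis(0.5 + 1.2 * x))

    glm(y ~ x, family = binomial)                  # default logit link
    glm(y ~ x, family = binomial(link = probit))   # probit link instead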

22
Rlecture.csv
23
Binomial errors
  • Variance gets constrained near the limits; the
    binomial accounts for this.
  • Type 1: the classic example, a series of trials
    resulting in success (value = 1) or failure
    (value = 0).
  • Type 2: also continuous but bounded (e.g. %
    mortality, bounded between 0 and 100).

24
Logistic regression
  • Least squares: arcsine transformations
  • GLMs: use a logit (or probit) link with binomial
    errors

[Plot: logistic curve of y against x]
25
Logit
  • p = proportion of successes
  • If p = e^(a + bx) / (1 + e^(a + bx)), calculate
  • log_e(p / (1 - p))

26
Logits continued
  • Output from a logistic regression with a logit
    link: predicted log_e(p / (1 - p)) = a + bx
  • To obtain expected values of p, input a and b
    into the original equation (see below):
  • p = e^(a + bx) / (1 + e^(a + bx))
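A small R illustration of that back-transformation, using a made-up logistic fit (the data, object names, and x value are illustrative):

    # Simulated binary data and a logit-link fit
    set.seed(1)
    x <- rnorm(100)
    y <- rbinom(100, size = 1, prob = plogis(-1 + 2 * x))
    fit <- glm(y ~ x, family = binomial(link = logit))

    a <- coef(fit)[1]  # intercept
    b <- coef(fit)[2]  # slope, both on the logit scale

    # Expected p at x = 0.5, back-transformed by hand
    exp(a + b * 0.5) / (1 + exp(a + b * 0.5))

    # The same value via predict() on the response scale
    predict(fit, newdata = data.frame(x = 0.5), type = "response")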

27
Binomial GLMs
  • Type 1 binomial:
  • Simply set family = binomial(link = logit)
  • Type 2 binomial (sketched below):
  • First create a vector of the number not
    parasitized.
  • Then cbind into a matrix (number parasitized,
    number not parasitized).
  • Then run your binomial glm (link = logit) with
    the matrix as your y.
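A minimal sketch of the Type 2 recipe, assuming the diane dataframe contains columns named parasitized (hosts parasitized) and total (hosts examined); those column names are assumptions, not from the original file:

    # Number not parasitized
    not.parasitized <- diane$total - diane$parasitized

    # Two-column response matrix: (successes, failures)
    y <- cbind(diane$parasitized, not.parasitized)

    # Binomial GLM with the matrix as the response
    type2.glm <- glm(y ~ size + treat,
                     family = binomial(link = logit), data = diane)
    summary(type2.glm)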

28
Homework
  • 1. Fit the binomial glm survival ~ size + treat
  • 2. Fit the binomial glm parasitism ~ size + treat
  • 3. Predict what size has 50% parasitism in
    treatment 0