Workshop in R

Transcript and Presenter's Notes
1
Workshop in R: GLMs 3
  • Diane Srivastava
  • University of British Columbia
  • srivast@zoology.ubc.ca

2
Housekeeping
  • ls() lists the variables in the global
    environment
  • rm(list = ls()) removes EVERY variable
  • q() quits R, with a prompt to save the workspace
    or not

3
[Plot: hard ~ dens]
4
[Plot: hard^0.45 ~ dens]
5
[Plot: log(hard) ~ dens]
6
Janka exercise
  • Conclusion:
  • The best y transformation to optimize the model
    fit (highest log-likelihood)...
  • ...is not the best y transformation for normal
    residuals.

7
This workshop
  • Linear, general linear, and generalized linear
    models
  • Understand how GLMs work: Excel simulation
  • Definitions, e.g. deviance, link functions
  • Poisson GLMs: R exercise
  • Binomial distribution and logistic regression
  • Fit GLMs in R! Exercise

8
In the beginning there were
  • Linear models: a normally-distributed y fit to a
    continuous x

    y     x
    1.2   0
    1.3   0
    1.1   1
    0.9   1
9
Then there were
  • General Linear Models: a normally-distributed y
    fit to a continuous OR categorical x

10
Generalized linear models
No more need for tedious transformations!
All variances are unequal, but some are more
unequal than others.
The distribution is the solution!
Because most things in life aren't normal!
11
What linear models do
[Plot: y plotted against x]
  • Transform y
  • Fit line to transformed y
  • Back-transform to linear y

12
What GLMs do
[Plots: fitted line against x on the log scale, and
back-transformed onto y against x]
  • Start with an arbitrary fitted line
  • Back-transform line into linear space
  • Calculate residuals
  • Improve fitted line to maximize likelihood

Many iterations
13
Maximum likelihood
  • Means that an iterative process is used to find
    the model equation that has the highest
    probability (likelihood) of explaining the y
    values given the x values.
  • The equation for the likelihood depends on the
    error distribution chosen.
  • Least squares, by contrast, minimizes variation
    from the model.
  • If the data are normally distributed, maximum
    likelihood gives the same answer as least squares
    (see the sketch below).
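A quick check of that last point, using simulated data (the variable names here are illustrative, not from the workshop files):

    # Simulate normally distributed data
    set.seed(1)
    x <- runif(50)
    y <- 2 + 3 * x + rnorm(50)

    # Least squares fit
    coef(lm(y ~ x))

    # Maximum likelihood fit, normal errors, identity link
    coef(glm(y ~ x, family = gaussian(link = identity)))

    # The two sets of coefficients match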

14
GLM simulation exercise
  • Simulates fitting a model with normal errors and
    a log link to data.
  • Your task:
  • understand how the spreadsheet works
  • find, through an iterative process, the best
    slope (a rough R version of the idea is sketched
    below)
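A minimal R sketch of the same iterative idea, using simulated data and made-up names (not the workshop's spreadsheet):

    # Simulated data: log(expected y) = 1 + 0.5 * x,
    # with normal errors on y
    set.seed(1)
    x <- runif(100, 0, 5)
    y <- exp(1 + 0.5 * x) + rnorm(100, sd = 2)

    # Normal log-likelihood for a candidate slope
    # (intercept fixed at 1 to keep the sketch short)
    loglik <- function(slope) {
      fitted <- exp(1 + slope * x)  # back-transform to linear space
      resid <- y - fitted           # residuals in linear space
      sum(dnorm(resid, sd = sd(resid), log = TRUE))
    }

    # Try many candidate slopes; keep the most likely one
    slopes <- seq(0, 1, by = 0.01)
    slopes[which.max(sapply(slopes, loglik))]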

15
Generalized linear models
  • In least squares, we fit
  • y = mx + b + error
  • In a GLM, the model is fit more indirectly:
  • y = g(mx + b + error)
  • where g is a function, the inverse of which is
    called the link function
  • link fn(expected y) = mx + b + error
    (illustrated below)
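In R, each family object carries the link function and its inverse, which makes the relationship concrete (a small illustration, not part of the original exercise):

    fam <- poisson(link = log)
    fam$linkfun(10)        # link: log(10) = 2.303
    fam$linkinv(2.302585)  # inverse link: exp() back to 10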

16
LMs vs GLMs
  • GLMs:
  • Use maximum likelihood
  • Specify one of several distributions
  • Based on deviance
  • Fit the model to untransformed y by means of a
    link function
  • LMs:
  • Use least squares
  • Assume normality
  • Based on sums of squares
  • Fit the model to transformed y

17
All that really matters
  • By using a log link function, we do not need to
    calculate log(0).
  • Be careful! A log-link model predicts log y, not
    y (see the example below)!
  • The error distribution need not be normal:
    Poisson, binomial, gamma, Gaussian (normal)
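One way to see the "log y, not y" point in R (the data and object names here are made up for illustration):

    # A Poisson GLM with a log link on simulated counts
    set.seed(1)
    dat <- data.frame(x = 1:20)
    dat$y <- rpois(20, lambda = exp(0.1 * dat$x))
    fit <- glm(y ~ x, family = poisson(link = log), data = dat)

    predict(fit, type = "link")      # on the log scale (log y)
    predict(fit, type = "response")  # back-transformed to y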

18
Exercise
  • 1. Open up the file Rlecture.csv
  • diane <- read.table(file.choose(), sep = ",",
    header = TRUE)
  • 2. Look at the dataframe. Make treat a factor
  • diane$treat <- factor(diane$treat)
  • 3. Fit this model
  • my.first.glm <- glm(growth ~ size + treat,
    family = poisson(link = log), data = diane)
  • summary(my.first.glm)
  • 4. Model diagnostics
  • par(mfrow = c(2, 2)); plot(my.first.glm)

19
Overdispersion
[Residual plots illustrating underdispersed,
overdispersed, and random patterns]
20
Overdispersion
  • Is your residual deviance approximately equal to
    your residual df? (A quick check is sketched
    below.)
  • If residual deviance >> residual df, the model is
    overdispersed.
  • If residual deviance << residual df, the model is
    underdispersed.
  • Solution:
  • second.glm <- glm(growth ~ size + treat,
    family = quasipoisson(link = log), data = diane)
  • summary(second.glm)
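A quick way to compute that ratio, assuming the my.first.glm object from the earlier exercise is still in the workspace:

    # Ratio near 1: Poisson assumption looks reasonable.
    # Much greater than 1: overdispersed; much less: underdispersed.
    deviance(my.first.glm) / df.residual(my.first.glm)

    # The same two numbers are reported as the residual
    # deviance and its df in summary(my.first.glm)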

21
Options
  family     default link   other links
  binomial   logit          probit, cloglog
  gaussian   identity       --
  Gamma      inverse        identity, log
  poisson    log            identity, sqrt
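Alternative links are requested inside the family function. For example, a binomial model with a probit rather than the default logit link (simulated data, for illustration only):

    # Simulated binary data
    set.seed(1)
    x <- rnorm(100)
    y <- rbinom(100, size = 1, prob = plogis(0.5 + 1.2 * x))

    glm(y ~ x, family = binomial)                  # default logit link
    glm(y ~ x, family = binomial(link = probit))   # probit link instead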

22
Rlecture.csv
23
Binomial errors
  • Variance gets constrained near the limits; the
    binomial accounts for this.
  • Type 1: the classic example, a series of trials
    resulting in success (value = 1) or failure
    (value = 0).
  • Type 2: also continuous but bounded (e.g. %
    mortality, bounded between 0 and 100).

24
Logistic regression
  • Least squares: arcsine transformations
  • GLMs: use a logit (or probit) link with binomial
    errors

[Plot: logistic curve of y against x]
25
Logit
  • p = proportion of successes
  • If p = e^(a + bx) / (1 + e^(a + bx)), calculate
  • log_e(p / (1 - p))

26
Logits continued
  • Output from a logistic regression with a logit
    link: predicted log_e(p / (1 - p)) = a + bx
  • To obtain expected values of p, input a and b
    into the original equation (see below):
  • p = e^(a + bx) / (1 + e^(a + bx))
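A small R illustration of that back-transformation, using a made-up logistic fit (the data, object names, and x value are illustrative):

    # Simulated binary data and a logit-link fit
    set.seed(1)
    x <- rnorm(100)
    y <- rbinom(100, size = 1, prob = plogis(-1 + 2 * x))
    fit <- glm(y ~ x, family = binomial(link = logit))

    a <- coef(fit)[1]  # intercept
    b <- coef(fit)[2]  # slope, both on the logit scale

    # Expected p at x = 0.5, back-transformed by hand
    exp(a + b * 0.5) / (1 + exp(a + b * 0.5))

    # The same value via predict() on the response scale
    predict(fit, newdata = data.frame(x = 0.5), type = "response")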

27
Binomial GLMs
  • Type 1 binomial:
  • Simply set family = binomial(link = logit)
  • Type 2 binomial (sketched below):
  • First create a vector of the number not
    parasitized.
  • Then cbind into a matrix (number parasitized,
    number not parasitized).
  • Then run your binomial glm (link = logit) with
    the matrix as your y.
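A minimal sketch of the Type 2 recipe, assuming the diane dataframe contains columns named parasitized (hosts parasitized) and total (hosts examined); those column names are assumptions, not from the original file:

    # Number not parasitized
    not.parasitized <- diane$total - diane$parasitized

    # Two-column response matrix: (successes, failures)
    y <- cbind(diane$parasitized, not.parasitized)

    # Binomial GLM with the matrix as the response
    type2.glm <- glm(y ~ size + treat,
                     family = binomial(link = logit), data = diane)
    summary(type2.glm)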

28
Homework
  • 1. Fit the binomial glm survival ~ size + treat
  • 2. Fit the binomial glm parasitism ~ size + treat
  • 3. Predict what size has 50% parasitism in
    treatment 0