Statistical Modelling - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Statistical Modelling
  • Harry R. Erwin, PhD
  • School of Computing and Technology
  • University of Sunderland

2
Resources
  • Crawley, MJ (2005) Statistics: An Introduction
    Using R. Wiley.
  • Gonick, L., and Smith, W. (1993) The Cartoon
    Guide to Statistics. HarperResource (for fun).

3
Statistical Modelling
  • Suppose you have a response variable, y, that
    varies when independent factors or measurements,
    x1, x2, ..., xN, vary. (One covariate can be
    written x.)
  • What you want is a model that predicts the value
    of y as a function of the xi.
  • Statistical models are written
  • g( h(y) ~ f(xi) )
  • Where g() describes the model; lm() or aov() is
    usual.
  • Where h() describes how to transform the response
    variable.
  • Where f(xi) describes what covariates you want in
    (and out) of the model. To add a covariate, you
    use +; to remove it, -.
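
As a minimal sketch of the g( h(y) ~ f(xi) ) pattern: here g() is lm(), h() is log(), and f() is x1 + x2. The data are simulated, and the names dat, x1, x2 and all the numbers are assumptions made up for illustration.

```r
# Simulated data; every name and number here is an illustrative assumption.
set.seed(1)
dat <- data.frame(x1 = runif(30, 1, 10), x2 = runif(30, 1, 10))
dat$y <- exp(0.5 + 0.2 * dat$x1 + rnorm(30, sd = 0.1))  # y ignores x2

# g() is lm(), h() is log(), f() is x1 + x2
fit <- lm(log(y) ~ x1 + x2, data = dat)
print(coef(fit))

# The same model with x2 left out of f()
fit_small <- lm(log(y) ~ x1, data = dat)
print(formula(fit_small))
```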

4
Fitting Models to Data is what R is designed to do
  • There are five kinds of models:
  • The saturated model: one parameter per data
    point. You're drawing the response line through
    all the data points. This tells you nothing.
  • The maximal model: containing all factors,
    interactions, and covariates of interest that can
    be fit given the available data. (You need at
    least three data points for fitting each
    covariate in your model.)
  • The current model, usually smaller than the
    maximal model.
  • The minimum adequate model: smaller than the
    maximal model, but not significantly worse at
    describing the data.
  • The null model: one parameter, the overall mean
    response, ymean. (y ~ 1)
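
The two extremes can be sketched on a small simulated data set (dat, x and y are invented names). The null model y ~ 1 estimates only the mean. Treating x as a factor with one level per data point gives a saturated model that reproduces the data exactly.

```r
# Simulated data; names and numbers are illustrative assumptions.
set.seed(2)
dat <- data.frame(x = 1:8)
dat$y <- 3 + 2 * dat$x + rnorm(8)

# Null model: one parameter, the overall mean of y
null_fit <- lm(y ~ 1, data = dat)
print(all.equal(unname(coef(null_fit)), mean(dat$y)))

# Saturated model: one parameter per data point (x treated as a factor)
sat_fit <- lm(y ~ factor(x), data = dat)
print(length(coef(sat_fit)))     # as many parameters as data points
print(max(abs(resid(sat_fit))))  # residuals vanish up to rounding error
```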

5
Definitions
  • Covariate: an explanatory variable that is
    possibly predictive of the outcome under study.
  • Factor: a covariate that takes a finite number of
    values, in no specific order. A boolean value is
    a factor.
  • Continuous explanatory variable: a numerical
    covariate. It may be restricted to integer
    values, to represent an ordering.
  • Interaction: a covariate that involves more than
    one explanatory variable.
  • Power: a covariate that involves a polynomial of
    degree 2 or greater in one or more continuous
    explanatory variables.
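
One way to see these definitions in R is to inspect the design matrix that a formula produces. The sketch below uses invented names (dat, treat, x, y) and simulated values. It builds a formula with a factor, a continuous variable, an interaction and a power term.

```r
# Simulated data; names and values are illustrative assumptions.
set.seed(3)
dat <- data.frame(
  treat = factor(rep(c("a", "b"), each = 10)),  # factor
  x     = runif(20)                             # continuous explanatory variable
)
dat$y <- rnorm(20)

# Main effects, an interaction (treat:x) and a power term (I(x^2))
mm <- model.matrix(y ~ treat + x + treat:x + I(x^2), data = dat)
print(colnames(mm))
```

Each column of mm is one parameter the model would estimate.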

6
General Process of Fitting
  1. Fit the maximal model. Note down the residual
    deviance. Possibly check for overdispersion
    (advanced topic).
  2. Begin model simplification. Use update() with -
    to remove the least significant terms first. Start
    with the highest-order interactions and powers.
  3. If the resulting increase in deviance is not
    significant, use the reduced model and continue.
  4. If the increase in deviance is significant, go
    back to the unreduced model and look further.
  5. Repeat steps 3 and 4 until only significant terms
    remain.
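
One pass through the steps above might look like the following sketch. The data are simulated. The names dat, A, x, y and the 0.05 cut-off are assumptions for illustration, not part of the slides.

```r
# Simulated data; names, numbers and the 0.05 threshold are assumptions.
set.seed(4)
dat <- data.frame(A = factor(rep(c("lo", "hi"), each = 20)),
                  x = runif(40, 0, 10))
dat$y <- 1 + 2 * dat$x + ifelse(dat$A == "hi", 3, 0) + rnorm(40)

# Step 1: fit the maximal model
maximal <- lm(y ~ A * x, data = dat)

# Step 2: remove the highest-order term first
reduced <- update(maximal, . ~ . - A:x)

# Steps 3 and 4: test whether the loss of fit is significant
cmp <- anova(reduced, maximal)
print(cmp)
p_value <- cmp[["Pr(>F)"]][2]

# Keep the reduced model only if the deleted term was not significant
current <- if (p_value > 0.05) reduced else maximal
print(formula(current))
```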

7
Parsimony Suggests
  • Fewer parameters
  • Fewer explanatory variables
  • A linear model
  • A model without a hump (a power > 1)
  • A model without interactions
  • A model with easily measured variables
  • A model that reflects how the process operates.

8
Actions you can take
  • Remove non-significant interactions, higher order
    terms, explanatory variables
  • Group together factor levels (advanced topic)
  • In ANCOVA (models that mix continuous variables
    and factors), set slopes to zero if possible
  • Rescale (advanced) if necessary to give:
      • constancy of variance
      • approximately normal errors
      • additivity
  • Don't go to extremes, though. If you torture the
    data, it will confess.

9
Model Formulae
  • response ~ explanatory variables
  • The right hand side describes the variables,
    their interactions, and their non-linearity; it
    isn't arithmetic!
  • To include something in the model: + something
  • To remove something from the model: - something
  • Interactions are written : (e.g. A:B)
  • y ~ A + B + A:B is the same as y ~ A*B
  • Nesting is written / (A/B is A + A:B, or A + B %in% A)
  • y ~ A/B
  • Conditioning is written |
  • y ~ x | z
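
These equivalences can be checked without fitting anything, because terms() reports how R expands a formula. In the sketch below, labels_of is a hypothetical helper name introduced only for this example.

```r
# labels_of() is a hypothetical helper: it lists the terms a formula expands to
labels_of <- function(f) attr(terms(f), "term.labels")

print(labels_of(y ~ A * B))        # crossing
print(labels_of(y ~ A + B + A:B))  # the same terms written out
print(labels_of(y ~ A / B))        # nesting
print(labels_of(y ~ A * B - A:B))  # interaction removed with '-'
```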

10
Expanded Forms
  • A*B*C is all the interactions up to A:B:C
  • A/B/C is A + B %in% A + C %in% B %in% A
  • (A+B+C)^3 is A*B*C
  • (A+B+C)^2 is A*B*C - A:B:C
  • poly(x,n) is a polynomial regression of degree n
  • I(formula) means the formula is evaluated as
    written (as arithmetic) in R.
  • 1 labels the intercept
  • Error(A/B/C) can be specified
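
The three-factor expansions, and the reason I() is needed for powers, can be checked the same way. labels_of is again a hypothetical helper name, redefined here so the sketch runs on its own.

```r
# labels_of() is a hypothetical helper: it lists the terms a formula expands to
labels_of <- function(f) attr(terms(f), "term.labels")

print(labels_of(y ~ A * B * C))          # main effects up to A:B:C
print(labels_of(y ~ (A + B + C)^2))      # everything up to two-way terms
print(labels_of(y ~ A * B * C - A:B:C))  # the same set of terms
print(labels_of(y ~ A / B / C))          # nested terms

# Without I(), ^ is a formula operator, so x^2 collapses to x
print(labels_of(y ~ x + x^2))
print(labels_of(y ~ x + I(x^2)))
```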

11
Use of update
  • model <- lm(y ~ A*B)
  • model2 <- update(model, . ~ . - A:B) to get rid of
    the interaction.
  • model2 is now lm(y ~ A + B)

12
Transforms (Advanced)
  • Sometimes you need to directly transform the left
    hand side to get constant variance. This looks
    like
  • lm(log(y) ~ I(1/x) + sqrt(z))
  • If the left hand side has constant variance
    already, you can use a link, defined by
    family=something in the model call, and a
    generalised linear model (glm)
  • Families available:
      • gaussian (Normal, bell-shaped curve, default)
      • poisson (count data)
      • binomial (proportions or binary data)
      • Gamma (special)
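
A minimal sketch of the two routes, on simulated count data. The names dat, x and count are invented, and the + 1 inside log() is an assumption added only to avoid log(0). The first call transforms the response directly. The second leaves it alone and names an error family in glm().

```r
# Simulated count data; names and numbers are illustrative assumptions.
set.seed(5)
dat <- data.frame(x = runif(50, 0, 2))
dat$count <- rpois(50, lambda = exp(0.5 + 1.0 * dat$x))

# Route 1: transform the left hand side (+ 1 avoids log(0); an assumption)
lm_fit <- lm(log(count + 1) ~ x, data = dat)

# Route 2: keep the response as it is and specify the family
glm_fit <- glm(count ~ x, family = poisson, data = dat)
print(coef(glm_fit))
print(family(glm_fit)$link)  # the default link for the poisson family
```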

13
Factor/Parameter Transforms
  • Used to make the shape of variables closer to
    normal, and with constant variance.
  • More advanced topic, will be mentioned in the
    summary lecture. If it turns out you need to do
    this, I'll provide consulting support (X3227).

14
Types of Models
  • lm(y ~ x): linear model (x is a continuous variable)
  • aov(y ~ x): analysis of variance (x is a factor)
  • aov(y ~ x + z): analysis of covariance if x and z
    include a factor and a continuous explanatory
    variable.
  • glm(y ~ x): generalised linear model (like lm but
    advanced)
      • Non-constant variance
      • Non-normal errors
      • Options include family = poisson, binomial,
        Gamma, gaussian (Normal)
  • gam(y ~ x): generalised additive model (complex,
    advanced)
  • also lme(y ~ x), nls(y ~ x), nlme(y ~ x),
    loess(y ~ x), tree(y ~ x)
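
The base-R members of this list can be tried side by side on one simulated data set, as sketched below. The names dat, group, x and y are assumptions for illustration. gam(), lme(), nlme() and tree() live in add-on packages and are left out.

```r
# Simulated data; names and numbers are illustrative assumptions.
set.seed(6)
dat <- data.frame(
  group = factor(rep(c("ctl", "trt"), each = 15)),
  x     = runif(30, 0, 5)
)
dat$y <- 2 + 1.5 * dat$x + ifelse(dat$group == "trt", 1, 0) + rnorm(30)

reg    <- lm(y ~ x, data = dat)                      # regression
anv    <- aov(y ~ group, data = dat)                 # analysis of variance
ancova <- aov(y ~ group + x, data = dat)             # analysis of covariance
glm_n  <- glm(y ~ x, family = gaussian, data = dat)  # glm with Normal errors

print(summary(ancova))

# With the gaussian family, glm() gives the same coefficients as lm()
print(all.equal(coef(reg), coef(glm_n)))
```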

15
Modelling Demonstration
  • A demonstration will be given in the analysis of
    covariance slides.

16
Model Checking (demoed later)
  • Plot the residuals against:
      • fitted values
      • explanatory variables
      • sequence of data collection
      • standard normal deviates
  • Demo of mcheck

17
mcheck
  • mcheck <- function (obj, ... ) {
  •   rs <- obj$resid
  •   fv <- obj$fitted
  •   par(mfrow=c(1,2))
  •   plot(fv,rs,xlab="Fitted values",ylab="Residuals")
  •   abline(h=0, lty=2)
  •   qqnorm(rs,xlab="Normal scores",ylab="Ordered residuals")
  •   qqline(rs,lty=2)
  •   par(mfrow=c(1,1))
  •   invisible(NULL)
  • }
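
A usage sketch for mcheck, assuming a simple linear model fitted to simulated data (dat, x and y are invented names). The function is repeated so the example runs on its own. It draws residuals against fitted values on the left and a normal Q-Q plot on the right.

```r
# mcheck as given on the slide, repeated so this example is self-contained
mcheck <- function(obj, ...) {
  rs <- obj$resid    # partial matching picks up obj$residuals
  fv <- obj$fitted   # partial matching picks up obj$fitted.values
  par(mfrow = c(1, 2))
  plot(fv, rs, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
  qqnorm(rs, xlab = "Normal scores", ylab = "Ordered residuals")
  qqline(rs, lty = 2)
  par(mfrow = c(1, 1))
  invisible(NULL)
}

# Simulated data; names and numbers are illustrative assumptions.
set.seed(7)
dat <- data.frame(x = runif(40, 0, 10))
dat$y <- 1 + 0.8 * dat$x + rnorm(40)

fit <- lm(y ~ x, data = dat)
result <- mcheck(fit)  # draws the two diagnostic plots
```

A fan shape in the left-hand plot would suggest non-constant variance. Points bending away from the line in the right-hand plot would suggest non-normal errors.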

18
Observations
  • Order of factor deletion will matter
  • Delete the high order interactions (A:B), and
    high-order terms (I(x^2)) first.
  • Then delete the remaining terms in increasing
    order of importance (least significant first).
  • The test you use is anova(), because you're
    comparing two models.
  • Be pragmatic.