Revision of last lecture - PowerPoint PPT Presentation
Slides: 59 · Provided by: jxmol

Transcript and Presenter's Notes
1
Revision of last lecture
2
Relative Risk
  • For cohort studies the relative risk is used.
  • The relative risk is the risk or rate of disease
    in the exposed group relative to the risk or rate
    of disease in the unexposed group.
  • The incidence of disease can be found using a
    cohort study.
  • A RR of 1 means that the risk of the event is
    equal in the groups being compared.
  • A RR >1 suggests the event (disease) is more
    common in the exposed group than the non-exposed.
  • A RR <1 suggests the event is less common in the
    exposed group than the non-exposed.

3
Odds ratio
  • For case-control studies the odds ratio is most
    often used to show relationships.
  • The odds ratio is the ratio of the odds of
    exposure in the diseased group compared to the
    odds of exposure in the non-diseased group.
  • An odds ratio of 1 means that the odds of the
    event are equal in the groups being compared.
  • An odds ratio >1 suggests the event (exposure) is
    more common in the diseased group than the
    non-diseased.
  • An odds ratio <1 suggests the event is less
    common in the diseased group than the
    non-diseased.
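As a quick numerical check, the RR and OR definitions above can be computed from a 2x2 table. The counts below are hypothetical, purely for illustration:

```python
# Relative risk and odds ratio from a hypothetical 2x2 table.
a, b = 30, 70   # exposed: diseased, not diseased
c, d = 10, 90   # unexposed: diseased, not diseased

risk_exposed = a / (a + b)          # risk of disease given exposure
risk_unexposed = c / (c + d)        # risk of disease given no exposure
rr = risk_exposed / risk_unexposed  # relative risk

odds_exposed = a / b                # odds of disease given exposure
odds_unexposed = c / d              # odds of disease given no exposure
odds_ratio = odds_exposed / odds_unexposed

print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")  # RR = 3.00, OR = 3.86
```

With disease this common (30% vs 10%) the OR (3.86) overstates the RR (3.00); as the slides note, the two agree closely only when the event is rare.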

4
General comments
  • The odds, risks and rates are the probabilities
    or chances of an individual having a particular
    event such as death or an illness.
  • The odds ratio and the relative risk are very
    similar when a disease or exposure is rare.

5
Transformations and logistic regression
  • Gordon Prescott

6
Two separate topics
  • Two entirely different methods for entirely
    different situations
  • Transformations
  • if continuous health outcome data are not
    Normally distributed
  • if relationship between two continuous / discrete
    variables is not linear, e.g. curve
  • Logistic regression
  • if health outcome is binary

7
Why transform?
  • Normalise the distribution of the data
  • Eliminate heterogeneity of variance
    (non-constant variance)
  • To linearise a relationship between two
    continuous/ discrete variables

8
Types of transformations
  • Logarithmic (can use the natural log, also known
    as loge or ln, or log to the base 10, log10).
  • Square root (√x)
  • Power (x², x³)
  • Inverse or reciprocal (1/x)
  • Logit
  • that is loge(p/(1−p)), which is the log of the
    odds
  • There are some other transformations but these
    are the most common

9
Advantage and disadvantage of transformations
  • Advantage
  • May allow the use of parametric methods
  • More informative output
  • Higher power
  • Disadvantage
  • Careful interpretation of the units of the
    transformed variable is required

10
Transformations
  • If the data is positively skewed then we can try
    either a logarithmic, reciprocal or square root
    transformation, depending on the extent of the
    problem, in order to remedy the situation
  • Stretches small values, squeezes large values
  • If the data is negatively skewed then we can try
    a power transformation (x² or x³), depending on
    the extent of the problem
  • Squeezes small values, stretches large values
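The effect of a log transformation on positive skew can be seen with a small hypothetical sample: before transforming, a few large values drag the mean well above the median; after logging, the two nearly coincide:

```python
import math
import statistics

# Hypothetical positively skewed sample: a few large values pull the mean up.
x = [1, 2, 2, 3, 4, 5, 8, 20, 60]
mean, median = statistics.mean(x), statistics.median(x)

# Log transformation stretches small values and squeezes large ones.
logx = [math.log(v) for v in x]
log_mean, log_median = statistics.mean(logx), statistics.median(logx)

print(mean - median)          # large gap on the raw scale
print(log_mean - log_median)  # much smaller gap on the log scale
```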

11
Logarithmic transformation
  • Definition
  • U = log10 X  ⇔  X = 10^U   (e.g. 100 = 10^2)
  • U = loge X = ln X  ⇔  X = e^U = exp(U)
    (e.g. 100 = e^4.61)
12
Logarithmic transformation - Normality
Histogram of waiting time
13
Log (waiting time)
14
Unequal variability
The standard deviation in group 1 is much larger
than the SD in group 2, so a t-test assuming equal
variances would be inappropriate.
15
Following log transformation
  Group    Mean     SD
  Waiting time (days)
  1        108.2    63.45
  2        57.4     37.65
  Log transformed waiting time (log days)
  1        4.46     0.74
  2        3.86     0.61

16
Transformation and Interpretation
  • It is important to note that if there are several
    problems with the data we would only need to use
    one transformation to attempt to remedy the
    situation
  • The mathematical properties of the log
    transformation allow sensible interpretation of
    the parameter estimate and its 95% CI
  • The other transformations (x², inverse, etc.) do
    not have such an interpretation

17
Interpretation of logarithmic transformation
  • Waiting time example
  • After log transforming the waiting time data, we
    can compute the mean of the log waiting time
  • We do not want to report the mean waiting time in
    log days
  • Therefore we can back transform the data into the
    original units
  • Following a logarithmic transformation, we
    antilog (or exponentiate) the mean of the logged
    data
  • This mean is known as the geometric mean

18
Geometric mean
  • Waiting time (days)
  • Mean = 81.0 days; median = 53.9 days
  • Log transformed waiting time (log days)
  • Mean = 4.17 log days
  • Geometric mean = exp(4.17) = 64.7 days
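The geometric-mean recipe is straightforward to sketch. The waiting times below are hypothetical, but the slide's own figure, exp(4.17) ≈ 64.7 days, follows the same calculation:

```python
import math

# Hypothetical positively skewed waiting times (days).
times = [12, 30, 45, 60, 90, 150, 300]

# Geometric mean: back-transform (exponentiate) the mean of the logs.
mean_log = sum(math.log(t) for t in times) / len(times)
geometric_mean = math.exp(mean_log)

# For skewed data the geometric mean sits below the arithmetic mean.
arithmetic_mean = sum(times) / len(times)
print(geometric_mean, arithmetic_mean)

# The slide's value: exp(4.17) gives about 64.7 days.
print(round(math.exp(4.17), 1))  # 64.7
```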

19
Comparison of two groups
  • Two independent groups of patients are to be
    compared with respect to waiting time
  • Waiting time is positively skewed
  • A log transformation is applied to the data
  • An independent groups t-test is then applied to
    the log transformed data

20
Raw data and log transformed data
21
Output from SPSS
  • The mean log waiting time is 4.17 in group 1
    compared with 3.99 in group 2
  • The geometric mean waiting time is exp(4.17) =
    64.7 days in group 1 compared to 54.1 days in
    group 2

22
Independent t-test
  • H0: mean log waiting time in group 1 = mean log
    waiting time in group 2
  • Is there a statistically significant difference
    in waiting time?

24
Independent t-test
  • H0: mean log waiting time in group 1 = mean log
    waiting time in group 2
  • Is there a statistically significant difference
    in waiting time?
  • Mean difference in log waiting time = 0.1768 log
    days, with 95% confidence interval for the mean
    difference of (0.08 to 0.27) log days

25
Interpretation of t-test
  • However, it would be more informative to report
    in natural units, i.e. days.
  • We therefore need to exponentiate (inverse log)
    the results obtained.
  • First we need to consider some mathematics.
  • Transformation has altered every value (x) into
    log x
  • Estimate of difference from independent t-test
    relates to
  • log x1 − log x2 = log(x1 / x2)
  • x1 and x2 are the means in groups 1 and 2
    respectively
  • The inverse log (exponential) of the mean
    difference on the log scale from the independent
    t-test is therefore x1 / x2
  • This denotes the ratio of waiting time in group 1
    relative to the waiting time in group 2

26
  • Mean difference = 0.1768 log days
  • Ratio of means = exp(0.1768) ≈ 1.19
  • 95% confidence interval for the mean difference:
    (0.08 to 0.27) log days
  • 95% confidence interval for the ratio of means:
    (1.08 to 1.31)
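Back-transforming the t-test results is a single exponentiation of the estimate and of each confidence limit, using the slide's values:

```python
import math

# Results from the independent t-test on the log scale (slide values).
diff_log = 0.1768          # mean difference in log waiting time (log days)
ci_log = (0.08, 0.27)      # 95% CI for the mean difference (log days)

# Exponentiate to get the ratio of (geometric) means and its 95% CI.
ratio = math.exp(diff_log)
ci_ratio = tuple(math.exp(v) for v in ci_log)

print(round(ratio, 2), [round(v, 2) for v in ci_ratio])
```

Because the CI on the log scale excludes 0, the CI for the ratio excludes 1: waiting times in group 1 are estimated to be about 19% longer than in group 2.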

27
Transformations for non-linearity
  • It is often found that the relationships between
    the dependent and explanatory (or independent)
    variables are non-linear
  • There are two approaches to modelling the
    relationship
  • Transform the variable(s) to produce a linear
    relationship
  • Curve fitting techniques
  • Polynomial regression

28
Transformations for non-linearity
  • When you have continuous variables any of the
    transformations described previously can be
    applied (e.g. logarithmic, square, inverse, etc.)
  • When the variable is binary, a logit or probit
    transformation can be applied to the data

29
Log transformation for non-linearity
30
Non-linear relationships
  • Curve fitting techniques
  • Polynomial regression
  • This approach incorporates the fact that the
    researcher is aware of the nature of the
    relationship between the two variables
  • The researcher uses this knowledge to decide on
    the most appropriate relationship

31
Curve fitting techniques
  • If the researcher did not know the exact nature
    of the relationship between the variables but
    they knew that the relationship was non-linear
    then curve-fitting techniques could be adopted
  • Many statistical packages have these facilities
    and there are packages whose sole function is to
    perform curve-fitting procedures

32
Polynomial regression
  • This type of regression is used if we have some
    idea about the nature of the relationship between
    the two variables
  • It would not be sensible to try and fit a
    straight line to data that obviously was
    non-linear
  • A polynomial regression model is of the form
  • y = α + β1x + β2x² + β3x³ + ...

33
Example
  • The rate of photosynthesis of an Antarctic
    species of grass was determined for a series of
    environmental temperatures
  • The aim of the researchers was to fit a line of
    best fit to this data so that they could use it
    to predict the temperature at which
    photosynthesis was maximum

34
Net photosynthesis rate versus temperature
[Figure: scatterplot of net photosynthesis rate
(0 to 100) against temperature in Celsius (−5 to 35)]
35
Example quadratic regression
36
Example quadratic regression
The coefficient for temperature squared
(-0.249) is statistically significant indicating
that a quadratic relationship appears to fit the
data
37
A note of caution
  • The regression equation we find for this data is
  • y = 46.37 + 6.77x − 0.249x²
  • However, when fitting this model in SPSS we will
    find that x and x² are highly correlated
  • This can lead to collinearity problems
  • e.g. strange p-values
  • What we need to do is centre the x-variable
    first, by subtracting the mean value from the
    individual x values, to reduce the high
    correlation between x and x²
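A small sketch (with hypothetical temperature values) shows how centring removes most of the correlation between x and x²:

```python
# Centring x before fitting a quadratic reduces the correlation between
# x and x^2. Temperature values below are hypothetical, for illustration.
def corr(u, v):
    """Pearson correlation, computed from scratch."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u)
    sv = sum((b - mv) ** 2 for b in v)
    return cov / (su * sv) ** 0.5

x = [-5, 0, 5, 10, 15, 20, 25, 30, 35]
x2 = [v * v for v in x]

xbar = sum(x) / len(x)
xc = [v - xbar for v in x]    # centred x
xc2 = [v * v for v in xc]     # square of centred x

r_raw = corr(x, x2)           # high: x and x^2 nearly collinear
r_centred = corr(xc, xc2)     # essentially zero for this symmetric design
print(r_raw, r_centred)
```

The quadratic fit itself is unchanged by centring; only the correlation between the two regressors (and hence the collinearity problem) is reduced.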

38
Collinearity and centering
39
A typical non-linear relationship
  • For example, population growth can be
    represented by the equation
  • N(t) = N(0) e^(rt)
  • The population at time t, N(t), is related to
    the size of the population at time 0, N(0),
    multiplied by the exponential of the growth
    rate, r, times the time period involved, t.
  • The relationship is obviously non-linear; we can
    tackle this problem by taking a logarithmic
    transformation.

40
Tackling non-linear relationships
  • This can be linearised by taking the natural logs
    of both sides of the equation
  • Taking the natural log of our population growth
    equation this gives
  • loge N(t) = loge N(0) + r × t
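This linearisation can be checked with simulated data: fitting a straight line to loge N(t) against t recovers r as the slope and N(0) from the intercept. The values N(0) = 100 and r = 0.3 below are hypothetical:

```python
import math

# Simulate exponential growth N(t) = N(0) * exp(r*t).
N0, r = 100.0, 0.3     # hypothetical starting size and growth rate
t = [0, 1, 2, 3, 4, 5]
N = [N0 * math.exp(r * ti) for ti in t]

# Linearise: ln N(t) = ln N(0) + r*t, then fit a straight line by least squares.
y = [math.log(n) for n in N]
tm, ym = sum(t) / len(t), sum(y) / len(y)
slope = (sum((ti - tm) * (yi - ym) for ti, yi in zip(t, y))
         / sum((ti - tm) ** 2 for ti in t))
intercept = ym - slope * tm

print(slope, math.exp(intercept))  # recovers r and N(0)
```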

41
Logit transformation
  • To examine the relationship between presence (or
    absence) of disease with a number of risk factors
  • To compare two treatment groups with respect to
    re-treatment (yes/no) after correcting for
    characteristics of the patients (that may be
    unequally distributed across the two groups)
  • To generate a prediction model for recurrence of
    hernia based on characteristics of patients,
    treatment, etc.

42
Binary health outcomes
  • Logit transformation
  • Logistic regression
  • regression for a binary outcome

43
Probability of an event
  • pi is the probability of a particular outcome
    (e.g. success)
  • 0 ≤ pi ≤ 1
  • We require a function that will transform pi
    onto the range (−∞ to +∞)

44
Logit transformation
  • −∞ < logit(p) < +∞
  • That is, the logit transformation results in a
    variable that can take any value in the range
    (i.e. it is continuous)
  • logit(p) = loge[p/(1−p)]
  • Recall, odds = p/(1−p)
  • Therefore this is the log of the odds
  • It is known as the log odds
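A minimal sketch of the logit function illustrates the mapping from (0, 1) onto the whole real line:

```python
import math

# The logit maps a probability p in (0, 1) onto the whole real line.
def logit(p):
    return math.log(p / (1 - p))

print(logit(0.5))                  # 0.0 : even odds give a log odds of zero
print(logit(0.9), logit(0.1))      # symmetric about p = 0.5 (equal magnitude)
print(logit(0.001), logit(0.999))  # extreme p maps far from zero
```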

45
Log odds form of the logistic model
  • A very useful property of the logistic model is
    that by applying a transformation to both sides
    of the equation, it becomes linear
  • pi = e^(a + b1x1) / (1 + e^(a + b1x1))
  • logit(p) = log odds = a + b1x1
  • To interpret the coefficients we need to
    transform them back to original units (i.e
    antilog / exponential)

46
Logistic Regression
  • This type of regression is used if the dependent
    variable is a binary (dichotomous) variable e.g.
    presence or absence of disease
  • Under these circumstances classic multiple
    linear regression is unsuitable
  • This method can be used to compare the
    characteristics of subjects with or without a
    particular disease
  • Very commonly used in epidemiological studies

47
Example of logistic regression
  • Information was available on 111 consecutive
    patients admitted to ICU. The researcher wished
    to investigate the relationship between the
    patients vital status (lived/died) and the
    patients characteristics on entry to ICU
  • Strategy for analysis
  • Crosstabulations
  • Calculation of odds of an event occurring
  • Logistic regression

48
Descriptive statistics
49
Crosstabulation
50
Interpretation of crosstabulation
  • Borderline statistically significant association
    between type of admission and whether a patient
    leaves the ICU alive
  • A greater proportion of emergency admission
    patients died (40%) compared to elective
    admissions (12%)
  • You can also express the association between type
    of admission and vital status using an odds ratio

51
Calculation of odds of dying
  • The odds of an emergency admission patient dying
    whilst in ICU
  • = 38/56 = 0.679
  • The odds of an elective admission patient dying
    whilst in ICU
  • = 2/15 = 0.133
  • Therefore, the odds ratio for type of admission
  • = 0.679 / 0.133 = 5.11
  • The odds of a patient dying whilst in ICU is 5
    times greater if the patient was admitted as an
    emergency rather than an elective patient
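The hand calculation above can be reproduced directly. Note that the unrounded odds ratio is 5.09; the slide's 5.11 comes from dividing the already-rounded odds, 0.679/0.133:

```python
# Odds of dying in ICU by admission type (counts from the slide).
odds_emergency = 38 / 56   # died / survived among emergency admissions
odds_elective = 2 / 15     # died / survived among elective admissions

# Odds ratio for emergency vs elective admission.
odds_ratio = odds_emergency / odds_elective

print(round(odds_emergency, 3))  # 0.679
print(round(odds_elective, 3))   # 0.133
print(round(odds_ratio, 2))      # 5.09 (slide's 5.11 uses the rounded odds)
```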

52
Logistic regression
  • Type of surgery (TYPE): 0 = elective,
    1 = emergency
  • The coefficient for TYPE is 1.625. This
    indicates that the difference in log odds
    between an elective and an emergency admission
    is 1.625
  • The odds for an emergency admission increase by
    a factor of exp(1.625)
  • That is, the odds ratio (emergency/elective) is
    exp(1.625) = 5.079.
  • This is indicated in the final column of the
    table above.
  • Recall, this is what was computed by hand for the
    2x2 table

53
How good is the model at predicting outcome?
  • The classification table indicates how good the
    logistic regression model is at predicting
    outcome
  • The model is better at predicting those who
    died, and overall 65% of cases are correctly
    predicted using type of admission alone

54
Multiple logistic regression
  • Logistic regression can be extended to consider a
    number of explanatory variables simultaneously
  • Explanatory variables can be continuous, ordinal
    or categorical
  • SPSS will define dummy variables for you
  • The relative contribution of each explanatory
    variable can be assessed
  • Increasing the number of explanatory variables
    may improve the prediction capability of the
    model
  • In our ICU example, age of patient is added to
    the model
  • The coefficient for type of admission has now
    been adjusted for the effect of age
  • This results in an adjusted odds ratio

55
Multiple logistic model
  • Including age in the model has resulted in an
    adjusted OR for type of admission of 6.5. Thus,
    after adjusting for age, the odds of dying for
    patients admitted as emergency cases are 6.5
    times those of patients admitted electively
  • After adjusting for age, emergency admissions
    have approximately 6.5 times the odds of dying
    in ICU compared with elective admissions

56
Why transform?
  • Normalise the distribution of the data
  • Eliminate heterogeneity of variance
    (non-constant variance)
  • To linearise a relationship between two variables
  • Log (for positively skewed data) is the most
    useful and the easiest to interpret

57
What is logistic regression?
  • Very similar to ordinary regression, but the
    dependent variable is binary (instead of
    continuous)

58
Uses of logistic regression
  • Use regression equation to predict probability of
    an outcome given values of explanatory variables
  • Determine which explanatory variables influence
    an outcome
  • Adjust analyses for confounding variables (e.g.
    age, gender, etc.)