Regression - PowerPoint PPT Presentation

Provided by: uio
1
Regression
  • Petter Mostad
  • 2005.10.10

2
Some problems you might want to look at
  • Given the annual number of cancers of a certain
    type, over a few decades, make a prediction for
    the future, with uncertainty.
  • There seems to be a connection between efficiency
    and size for Norwegian hospitals. Given data from
    many hospitals, determine if there is a
    connection, and what it is.
  • Investigate the connection between efficiency and
    a number of possible explanatory variables.

3
Connection between variables
We would like to study the connection between x and
y!
4
Connection between variables
Fit a line!
5
What can you do with a fitted line?
  • Interpolation
  • Extrapolation (sometimes dangerous!)
  • Interpret the parameters of the line

6
How to define the line that fits best?
Choose the line that minimizes the sum of the squares
of the errors: the least squares method!
  • Note: many other ways to fit the line can be
    imagined

7
How to compute the line fit with the least
squares method?
  • Let (x1, y1), (x2, y2),...,(xn, yn) denote the
    points in the plane.
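  • Find a and b so that $y = a + bx$ fits the points by
    minimizing $S = \sum (y_i - a - b x_i)^2$
  • Solution: $b = \dfrac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$, $a = \bar{y} - b\bar{x}$
  • where $\bar{x} = \frac{1}{n}\sum x_i$, $\bar{y} = \frac{1}{n}\sum y_i$,
    and all sums are done for i = 1,...,n.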
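
A minimal Python sketch of the computation on this slide, assuming the usual closed-form least squares solution b = (Σ x_i y_i − n x̄ ȳ) / (Σ x_i² − n x̄²) and a = ȳ − b x̄. The function name `fit_line` and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Made-up points lying exactly on y = 1 + 2x, so the fit should recover
# an intercept close to 1 and a slope close to 2.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print("a =", a, "b =", b)
```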

8
How do you get this answer?
  • Differentiate S with respect to a and b, and set
    the result to 0
  • We get $\sum (y_i - a - b x_i) = 0$ and
    $\sum x_i (y_i - a - b x_i) = 0$
  • These are two equations in two unknowns, and
    solving them gives the answer.
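
The steps can be written out as follows. This is a sketch of the standard derivation, assuming S denotes the sum of squared errors Σ (y_i − a − b x_i)², with x̄ and ȳ the averages of the x- and y-values.

```latex
\begin{align*}
S(a,b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
  \quad\Longrightarrow\quad a = \bar{y} - b\bar{x} \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
  \quad\Longrightarrow\quad \sum x_i y_i - a\, n\bar{x} - b \sum x_i^2 = 0 \\
\text{Substituting } a: \quad
  & \sum x_i y_i - n\bar{x}\bar{y} = b \Big( \sum x_i^2 - n\bar{x}^2 \Big)
  \quad\Longrightarrow\quad
  b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}
\end{align*}
```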

9
Example
  • Some grasshoppers make sound by rubbing their
    wings against each other. There is a connection
    between the temperature and the frequency of the
    movements, unique for each species. Here are some
    data for Nemobius fasciatus fasciatus

If you measure 18 movements per second, what is
the estimated temperature?
Data from Pierce, G. W. The Songs of Insects.
Cambridge, Mass.: Harvard University Press, 1949,
pp. 12-21.
10
Example (cont.)
  • Computation

Answer Estimated temperature
11
y against x ≠ x against y
  • Linear regression of y against x does not give
    the same result as the opposite.

Regression of y against x
Regression of x against y
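
A small Python sketch of this point, using made-up data (not the data behind the slide's plots). It fits y against x and x against y, then rewrites the second line as a slope in the same (x, y) plane so the two can be compared. The helper `slope_intercept` is a hypothetical name.

```python
def slope_intercept(us, vs):
    """Least squares line v = a + b*u; returns (a, b)."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    b = (sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
         / sum((u - u_bar) ** 2 for u in us))
    return v_bar - b * u_bar, b

# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

a_yx, b_yx = slope_intercept(xs, ys)   # y = a_yx + b_yx * x
a_xy, b_xy = slope_intercept(ys, xs)   # x = a_xy + b_xy * y

# The x-against-y line, drawn in the same plot, has slope dy/dx = 1 / b_xy.
slope_from_xy = 1.0 / b_xy
print("slope of y against x:", b_yx)
print("slope of x against y, expressed as dy/dx:", slope_from_xy)
```

The two slopes agree only when the points lie exactly on a line; in general the product b_yx * b_xy equals the squared sample correlation.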
12
Centered variables
  • Assume we subtract the average from both x- and
    y-values
  • We get $\bar{x} = 0$ and $\bar{y} = 0$
  • We get $a = 0$ and $b = \sum x_i y_i / \sum x_i^2$
  • From definitions of correlation and standard
    deviation we get $b = r\, s_y / s_x$
  • (even in uncentered case)
  • Note also: The residuals sum to 0.
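
A numerical check of these facts in Python, on made-up data. It assumes the standard relation b = r · s_y / s_x between the least squares slope b, the sample correlation r and the sample standard deviations s_x and s_y, and it checks that the residuals sum to 0.

```python
import math

# Made-up illustrative data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Sums of squares and cross products of the centered variables.
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = sxy / sxx                     # least squares slope
a = y_bar - b * x_bar             # least squares intercept
r = sxy / math.sqrt(sxx * syy)    # sample correlation
s_x = math.sqrt(sxx / (n - 1))    # sample standard deviations
s_y = math.sqrt(syy / (n - 1))

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print("b =", b, " r * s_y / s_x =", r * s_y / s_x)
print("sum of residuals =", sum(residuals))
```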

13
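Analyzing the variance
  • Define
  • SSE: Error sum of squares, $SSE = \sum (y_i - a - b x_i)^2$
  • SSR: Regression sum of squares, $SSR = \sum (a + b x_i - \bar{y})^2$
  • SST: Total sum of squares, $SST = \sum (y_i - \bar{y})^2$
  • We can show that
  • $SST = SSR + SSE$
  • Define $R^2 = SSR / SST = 1 - SSE / SST$
  • $R^2$ is the coefficient of determination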
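
A short Python sketch of this decomposition on made-up data, assuming the usual definitions: SSE sums the squared residuals, SSR sums the squared deviations of the fitted values from the average of y, and SST sums the squared deviations of y from its average. The function name `sums_of_squares` is hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (SSE, SSR, SST) for the least squares line through the data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sst = sum((y - y_bar) ** 2 for y in ys)
    return sse, ssr, sst

# Made-up illustrative data.
sse, ssr, sst = sums_of_squares([1.0, 2.0, 3.0, 4.0, 5.0],
                                [2.0, 1.0, 4.0, 3.0, 5.0])
r_squared = ssr / sst
print("SSE =", sse, "SSR =", ssr, "SST =", sst)
print("R^2 =", r_squared)
```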

14
But how to answer questions like
  • Given that a positive slope (b) has been
    estimated: Does it give a reproducible indication
    that there is a positive trend, or is it a result
    of random variation?
  • What is a confidence interval for the estimated
    slope?
  • What is the prediction, with uncertainty, at a
    new x value?

15
The standard simple regression model
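  • We have to do as before, and define a model:
    $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
  • where $\varepsilon_1, \ldots, \varepsilon_n$ are independent, normally
    distributed, with mean zero and equal variance $\sigma^2$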
  • We can then use data to estimate the model
    parameters, and to make statements about their
    uncertainty

16
Confidence intervals for simple regression
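  • In a simple regression model,
  • a estimates the intercept $\beta_0$
  • b estimates the slope $\beta_1$
  • $s^2 = SSE / (n-2)$ estimates the error variance $\sigma^2$
  • Also, $(b - \beta_1) / s_b \sim t_{n-2}$
  • where $s_b^2 = s^2 / \sum (x_i - \bar{x})^2$ estimates the
    variance of b
  • So a $100(1-\alpha)\%$ confidence interval for $\beta_1$ is given by
    $b \pm t_{n-2,\alpha/2}\, s_b$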
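
A Python sketch of such an interval, assuming the standard formulas s² = SSE/(n − 2) and s_b² = s² / Σ (x_i − x̄)², with the interval b ± t · s_b. The data are made up, the function name `slope_interval` is hypothetical, and the critical value is passed in by hand (3.182 is the tabulated two-sided 95% value for 3 degrees of freedom) rather than computed.

```python
import math

def slope_interval(xs, ys, t_crit):
    """Return (b, s_b, (low, high)) for the slope of y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)            # estimate of the error variance
    s_b = math.sqrt(s2 / sxx)     # estimated standard error of b
    return b, s_b, (b - t_crit * s_b, b + t_crit * s_b)

# Made-up data with n = 5, so n - 2 = 3 degrees of freedom.
T_CRIT = 3.182  # assumption: taken from a t table (3 df, 95% two-sided)
b, s_b, (low, high) = slope_interval([1.0, 2.0, 3.0, 4.0, 5.0],
                                     [2.0, 1.0, 4.0, 3.0, 5.0], T_CRIT)
print("b =", b, "s_b =", s_b)
print("95% interval for the slope:", (low, high))
```

With so few points the interval is wide and includes 0, which is the situation the next slide's test addresses.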

17
Hypothesis testing for simple regression
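  • Choose hypotheses $H_0: \beta_1 = 0$ and $H_1: \beta_1 \neq 0$ for the slope $\beta_1$
  • Test statistic $t = b / s_b$
  • Reject H0 at significance level $\alpha$ if $t < -t_{n-2,\alpha/2}$ or $t > t_{n-2,\alpha/2}$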
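
A Python sketch of a two-sided test of zero slope, assuming the test statistic t = b / s_b with s_b the estimated standard error of the slope. The data and the function name `slope_t_test` are made up, and the critical value is supplied from a t table rather than computed.

```python
import math

def slope_t_test(xs, ys, t_crit):
    """Return (t, reject) for H0: slope = 0 against a two-sided alternative."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s_b = math.sqrt(sse / (n - 2) / sxx)
    t = b / s_b
    return t, abs(t) > t_crit

# Made-up data; 3.182 is the tabulated 5% two-sided critical value for 3 df.
t, reject = slope_t_test([1.0, 2.0, 3.0, 4.0, 5.0],
                         [2.0, 1.0, 4.0, 3.0, 5.0], t_crit=3.182)
print("t =", t, "reject H0:", reject)
```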

18
Prediction from a simple regression model
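  • A regression model can be used to predict the
    response at a new value $x_{n+1}$
  • The uncertainty in this prediction comes from two
    sources:
  • The uncertainty in the regression line
  • The uncertainty of any response, given the
    regression line
  • A confidence interval for the prediction:
    $a + b x_{n+1} \pm t_{n-2,\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$, where $s^2 = SSE / (n-2)$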
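
A Python sketch of a prediction with its interval, assuming the standard formula in which the standard error of a new response is s · sqrt(1 + 1/n + (x_new − x̄)² / Σ (x_i − x̄)²). The "1" under the root is the uncertainty of a single response; the other two terms are the uncertainty in the line. The data and the name `predict_with_interval` are made up, and the critical value comes from a t table.

```python
import math

def predict_with_interval(xs, ys, x_new, t_crit):
    """Return (prediction, (low, high)) for a new response at x_new."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    pred = a + b * x_new
    se = s * math.sqrt(1.0 + 1.0 / n + (x_new - x_bar) ** 2 / sxx)
    return pred, (pred - t_crit * se, pred + t_crit * se)

# Made-up data; x_new = 6 lies just outside the observed x range.
pred, (low, high) = predict_with_interval([1.0, 2.0, 3.0, 4.0, 5.0],
                                          [2.0, 1.0, 4.0, 3.0, 5.0],
                                          x_new=6.0, t_crit=3.182)
print("prediction:", pred, "interval:", (low, high))
```

The interval widens as x_new moves away from the average of the observed x-values, which is one way to see why extrapolation is risky.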

19
Testing for correlation
  • It is also possible to test whether a sample
    correlation r is large enough to indicate a
    nonzero population correlation
  • Test statistic $t = \dfrac{r \sqrt{n-2}}{\sqrt{1 - r^2}}$, compared with the $t_{n-2}$ distribution
  • Note: The test only works for normal
    distributions and linear correlations. Always
    also investigate the scatter plot!
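
A Python sketch of the statistic, assuming the standard form t = r · sqrt(n − 2) / sqrt(1 − r²), which is compared with a t distribution with n − 2 degrees of freedom. The data and the function name `correlation_t` are made up.

```python
import math

def correlation_t(xs, ys):
    """Return (r, t) where t tests H0: population correlation = 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return r, t

# Made-up illustrative data.
r, t = correlation_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0])
print("r =", r, "t =", t)
```

In simple regression this t value coincides with the statistic b / s_b for testing a zero slope, so the two tests lead to the same conclusion.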

20
Influence of extreme observations
  • NOTE: The result of a regression analysis is very
    much influenced by points with extreme values, in
    either the x or the y direction.
  • Always investigate visually, and determine if
    outliers are actually erroneous observations
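
A small Python illustration with made-up numbers: five points lying exactly on the line y = x, then the same points plus one extreme observation. The helper `slope` is a hypothetical name.

```python
def slope(xs, ys):
    """Least squares slope of y against x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))

# Five made-up points on y = x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
slope_clean = slope(xs, ys)

# Add a single point that is extreme in the x direction and far off the line.
slope_with_outlier = slope(xs + [10.0], ys + [0.0])

print("slope without the extreme point:", slope_clean)
print("slope with the extreme point:   ", slope_with_outlier)
```

Here one added observation is enough to turn a clearly positive slope into a negative one.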

21
Example: Transformed variables
  • The relationship between variables may not be
    linear
  • Example: The natural model may be $y = a e^{bx}$
  • We want to find a and b so that the curve
    approximates the points as well as possible

22
Example (cont.)
  • When $y = a e^{bx}$ then $\log(y) = \log(a) + bx$
  • Use standard formulas on the pairs
    (x1,log(y1)), (x2, log(y2)), ..., (xn, log(yn))
  • We get estimates for log(a) and b, and thus a and
    b
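
A Python sketch of this transformation, assuming the model is the exponential curve y = a·e^(bx), so that log(y) = log(a) + b·x. The data are generated from a made-up curve with a = 2 and b = 0.5, and the function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

def fit_exponential(xs, ys):
    """Fit y = a * exp(b*x) by regressing log(y) on x; requires y > 0."""
    log_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(log_a), b

# Made-up data lying exactly on y = 2 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print("a =", a, "b =", b)
```

Note that this minimizes the squared errors on the log scale, which is not the same as minimizing them on the original scale.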

23
Another example of transformed variables
  • Another natural model may be $y = a x^b$
  • We get that $\log(y) = \log(a) + b \log(x)$
  • Use standard formulas on the pairs
  • (log(x1), log(y1)),
  • (log(x2), log(y2)), ...,(log(xn),log(yn))

Note: In this model, the curve goes through (0,0) when b > 0
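
A Python sketch of the log-log version, assuming the model is the power curve y = a·x^b, so that log(y) = log(a) + b·log(x). The data come from a made-up curve with a = 3 and b = 2, and the function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

def fit_power(xs, ys):
    """Fit y = a * x**b by regressing log(y) on log(x); requires x, y > 0."""
    log_a, b = fit_line([math.log(x) for x in xs],
                        [math.log(y) for y in ys])
    return math.exp(log_a), b

# Made-up data lying exactly on y = 3 * x**2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * x ** 2 for x in xs]
a, b = fit_power(xs, ys)
print("a =", a, "b =", b)
```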
24
More than one independent variable: Multiple
regression
  • Assume we have data of the type
  • (x11, x12, x13, y1), (x21, x22, x23, y2), ...
  • We want to explain y from the x-values by
    fitting the following model: $y = a + b x_1 + c x_2 + d x_3$
  • Just like before, one can produce formulas for
    a,b,c,d minimizing the sum of the squares of the
    errors.
  • x1,x2,x3 can be transformations of different
    variables, or transformations of the same variable
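
A Python sketch of such a fit, assuming the model y = a + b·x1 + c·x2 + d·x3 and solving the normal equations (XᵀX)β = Xᵀy with plain Gaussian elimination. This is one standard way to get the least squares coefficients, not necessarily the formulas the slides have in mind; the data and function names are made up.

```python
def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):                   # back substitution
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def fit_multiple(rows, ys):
    """Return [a, b, c, ...] minimizing the sum of squared errors."""
    X = [[1.0] + list(r) for r in rows]            # column of 1s for a
    p = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(p)]
    return solve(XtX, Xty)

# Made-up data generated exactly from y = 1 + 2*x1 - x2 + 0.5*x3.
rows = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (2, 1, 3)]
ys = [1 + 2 * x1 - x2 + 0.5 * x3 for x1, x2, x3 in rows]
a, b, c, d = fit_multiple(rows, ys)
print("a, b, c, d =", a, b, c, d)
```

If one explanatory variable is an exact linear combination of the others, XᵀX is singular and this solve step breaks down with a division by zero.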

25
Multiple regression model
  • The errors are independent random (normal)
    variables with expectation zero and variance $\sigma^2$
  • The explanatory variables x1i, x2i, ..., xni cannot
    be linearly related

26
Use of multiple regression
  • Versions of multiple regression are the most used
    models in econometrics, and in health economics
  • It is a powerful tool to detect and verify
    connections between variables

27
Doing a regression analysis
  • Plot the data first, to investigate whether there
    is a natural relationship
  • Linear or transformed model?
  • Are there outliers which will unduly affect the
    result?
  • Fit a model. Different models with the same number of
    parameters may be compared with $R^2$
  • Make tests / confidence intervals for parameters

28
Interpretation
  • The parameters may have important interpretations
  • The model may be used for prediction at new
    values (caution: extrapolation can sometimes be
    dangerous!)
  • Remember that subjective choices have been made,
    and interpret cautiously