NonLinear Regression - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: NonLinear Regression

1
Non-Linear Regression
  • Introduction
  • Previously we have fitted, by least squares, the
    general linear model, which is of the form
  • Y = β0 + β1X1 + β2X2 + ... + βpXp + ε

2
  • The non-linear regression model will generally be
    of the form
  • Y = f(X1, X2, ..., Xp; θ1, θ2, ..., θq) + ε
  • where the function (expression) f is known except
    for the q unknown parameters θ1, θ2, ..., θq.

3
Least Squares in the Nonlinear Case
4
  • Suppose that we have collected data on Y,
  • (y1, y2, ..., yn)
  • corresponding to n sets of values of the
    independent variables X1, X2, ..., Xp
  • (x11, x21, ..., xp1),
  • (x12, x22, ..., xp2),
  • ..., and
  • (x1n, x2n, ..., xpn).

5
  • For a set of possible values θ1, θ2, ..., θq of
    the parameters, a measure of how well these
    values fit the model described in the equation
    above is the residual sum of squares function
  • S(θ1, θ2, ..., θq) = Σi (yi - ŷi)²
  • where
  • ŷi = f(x1i, x2i, ..., xpi; θ1, θ2, ..., θq)
  • is the predicted value of the response
    variable yi from the values of the p independent
    variables x1i, x2i, ..., xpi, using the model in
    the equation and the values of the parameters θ1,
    θ2, ..., θq.

6
  • The least squares estimates of θ1, θ2, ..., θq
    are the values
  • which minimize S(θ1, θ2, ..., θq).
  • It can be shown that if the error terms are
    independent and normally distributed with mean 0 and
    common variance σ², then the least squares
    estimates are also the maximum likelihood
    estimates of θ1, θ2, ..., θq.
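
As an illustration of the residual sum of squares S, here is a minimal Python
sketch; the exponential model, the parameter values, and the data below are
hypothetical, not taken from these slides.

import numpy as np

def model(x, theta1, theta2):
    # Hypothetical nonlinear model: exponential decay at rate theta2.
    return theta1 * np.exp(-theta2 * x)

def residual_sum_of_squares(theta, x, y):
    # S(theta) = sum over i of (y_i - yhat_i)**2
    y_hat = model(x, theta[0], theta[1])
    return np.sum((y - y_hat) ** 2)

# Illustrative data only (not the data from the example later in these slides).
x = np.array([7, 14, 21, 28, 42, 56, 70, 84], dtype=float)
y = np.array([9.1, 7.4, 6.0, 5.1, 3.4, 2.3, 1.6, 1.1])

print(residual_sum_of_squares([10.0, 0.03], x, y))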

7
Iterative Techniques for Estimating the
Parameters of a Nonlinear Model
  • 1) Steepest descent,
  • 2) Linearization, and
  • 3) Marquardt's procedure.

8
  • In each case an iterative procedure is used to
    find the least squares estimates.
  • That is, initial estimates
  • for these values are determined.
  • The procedure then finds successively better
    estimates
  • that hopefully converge to the least squares
    estimates.

9
Steepest Descent
  • The steepest descent method focuses on
    determining the values of θ1, θ2, ..., θq that
    minimize the sum of squares function, S(θ1, θ2,
    ..., θq).
  • The basic idea is to determine, from an initial
    point
  • and the tangent plane to S(θ1, θ2, ..., θq) at
    this point, the vector along which the function
    S(θ1, θ2, ..., θq) will be decreasing at the
    fastest rate.
  • The method of steepest descent then moves from
    this initial point along the direction of
    steepest descent until the value of S(θ1, θ2, ...,
    θq) stops decreasing.

10
  • It uses this point
  • as the next approximation to the value that
    minimizes S(θ1, θ2, ..., θq).
  • The procedure then continues until the successive
    approximations arrive at a point where the sum of
    squares function, S(θ1, θ2, ..., θq), is
    minimized.
  • At that point, the tangent plane to S(θ1, θ2, ...,
    θq) will be horizontal and there will be no
    direction of steepest descent.
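
A bare-bones Python sketch of the steepest descent idea applied to S, using a
numerical gradient and a fixed step size; both choices are simplifying
assumptions, and practical implementations would control the step along the
descent direction more carefully.

import numpy as np

def steepest_descent(S, theta0, step=1e-3, tol=1e-10, max_iter=10000):
    # Move repeatedly along the negative (numerical) gradient of S until
    # the decrease in S becomes negligible.
    theta = np.asarray(theta0, dtype=float)
    h = 1e-6
    for _ in range(max_iter):
        grad = np.zeros_like(theta)
        for j in range(theta.size):
            e = np.zeros_like(theta)
            e[j] = h
            grad[j] = (S(theta + e) - S(theta - e)) / (2 * h)
        new_theta = theta - step * grad   # direction of steepest descent
        if abs(S(theta) - S(new_theta)) < tol:
            break
        theta = new_theta
    return theta

# Example use with the residual sum of squares sketched earlier:
# theta_hat = steepest_descent(lambda t: residual_sum_of_squares(t, x, y), [8.0, 0.05])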

11
  • The process may not converge.
  • Convergence may be slow.
  • The converged solution may not be the global
    minimum but a local minimum.
  • The performance of this procedure sometimes
    depends on the choice of the initial starting
    values.

12
  • Slow convergence is particularly likely when the
    S(θ1, θ2, ..., θq) contours are attenuated and
    banana-shaped (as they often are in practice),
    and it happens when the path of steepest descent
    zigzags slowly down a narrow valley, each
    iteration bringing only a slight reduction in
    S(θ1, θ2, ..., θq).

13
  • The steepest descent method is, on the whole,
    slightly less favored than the linearization
    method (described later) but will work
    satisfactorily for many nonlinear problems,
    especially if modifications are made to the basic
    technique.

14
Steepest Descent
(Figure: the steepest descent path from an initial guess)
15
Linearization
  • The linearization (or Taylor series) method uses
    the results of linear least squares in a
    succession of stages.
  • Suppose the postulated model is of the form
  • Y = f(X1, X2, ..., Xp; θ1, θ2, ..., θq) + ε
  • Let
  • be initial values for the parameters θ1, θ2, ...,
    θq.
  • These initial values may be intelligent guesses
    or preliminary estimates based on whatever
    information is available.

16
  • These initial values will, hopefully, be improved
    upon in the successive iterations to be described
    below.
  • The linearization method approximates f(X1, X2,
    ..., Xp; θ1, θ2, ..., θq) with a linear function
    of θ1, θ2, ..., θq, using a Taylor series
    expansion of f(X1, X2, ..., Xp; θ1, θ2, ..., θq)
    about the initial point and curtailing the
    expansion at the first derivatives.
  • The method then uses the results of linear least
    squares to find values that provide the least
    squares fit of this linear function to the
    data.

17
  • The procedure is then repeated until the
    successive approximations converge, hopefully,
    to the least squares estimates.
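
A minimal Python sketch of one version of the linearization (Gauss-Newton)
iteration, using a numerical Jacobian; the function names, step counts, and
tolerances are illustrative assumptions rather than a definitive
implementation.

import numpy as np

def gauss_newton(f, theta0, x, y, n_iter=20):
    # At each step, expand f in a first-order Taylor series about the
    # current estimate and solve a linear least squares problem for the
    # correction to the parameters.
    theta = np.asarray(theta0, dtype=float)
    h = 1e-6
    for _ in range(n_iter):
        r = y - f(x, *theta)                       # current residuals
        J = np.zeros((x.size, theta.size))         # numerical Jacobian of f
        for j in range(theta.size):
            tp = theta.copy()
            tp[j] += h
            J[:, j] = (f(x, *tp) - f(x, *theta)) / h
        delta, *_ = np.linalg.lstsq(J, r, rcond=None)   # linear least squares step
        theta = theta + delta
    return theta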

18
Linearization
(Figure: contours of RSS for the linear approximation, showing the initial guess and the 2nd guess)
19
(Figure: contours of RSS for the linear approximation, showing the initial, 2nd, and 3rd guesses)
20
(Figure: contours of RSS for the linear approximation, showing the initial, 2nd, 3rd, and 4th guesses)
21
  • The linearization procedure has the following
    possible drawbacks:

1. It may converge very slowly; that is, a very
large number of iterations may be required before
the solution stabilizes, even though the sum of
squares S(θ1, θ2, ..., θq) may decrease
consistently as the iteration number j increases.
This sort of behavior is not common but can occur.
2. It may oscillate widely, continually reversing
direction, and often increasing, as well as
decreasing, the sum of squares. Nevertheless, the
solution may stabilize eventually.
3. It may not converge at all, and may even
diverge, so that the sum of squares increases
iteration after iteration without bound.
22
  • Levenberg-Marquardt procedure.
  • Uses a combination of both steepest descent and
    linearization.

23
Example
  • In this example a chemical has been applied to
    the soil in an agricultural field.
  • The concentration of the chemical (Y) is then
    measured 7, 14, 21, 28, 42, 56, 70, 84 days after
    application.
  • Six measurements of Y are made at each time.

24
The data
25
Graph
26
the Model
(Equation defining the model in terms of the parameters a and b)
27
To perform non-linear regression select Analysis
-> Regression -> Nonlinear
28
The following dialog box appears
Select the dependent variable. Specify the
parameters and their starting values. Specify the
model.
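
Outside of SPSS, the same kind of fit can be sketched in Python with
scipy.optimize.curve_fit, which by default uses a Levenberg-Marquardt
algorithm for unconstrained problems. The exponential-decay model, the
starting values, and the simulated concentrations below are assumptions for
illustration only, not the model or data from these slides.

import numpy as np
from scipy.optimize import curve_fit

def model(t, a, b):
    # Hypothetical exponential-decay model for concentration vs. days.
    return a * np.exp(-b * t)

# Measurement days from the example; six simulated concentrations per day.
t = np.repeat([7, 14, 21, 28, 42, 56, 70, 84], 6).astype(float)
y = model(t, 10.0, 0.03) + np.random.default_rng(0).normal(0.0, 0.3, t.size)

# p0 gives the starting values for the parameters a and b.
theta_hat, cov = curve_fit(model, t, y, p0=[8.0, 0.05])
print(theta_hat)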
29
The Examination of Residuals
30
Introduction
  • Much can be learned by observing residuals.
  • This is true not only for linear regression
    models, but also for nonlinear regression models
    and analysis of variance models.
  • In fact, this is true for any situation where a
    model is fitted and measures of unexplained
    variation (in the form of a set of residuals) are
    available for examination.

31
  • Quite often models that are proposed initially
    for a set of data are incorrect to some extent.
  • An important part of the modeling process is
    diagnosing the flaws in these models.
  • Much of this can be done by carefully examining
    the residuals

32
  • The residuals are defined as the n differences
  • ei = yi - ŷi,  i = 1, 2, ..., n.

33
  • We can see from this definition that the
    residuals, ei, are the differences between what
    is actually observed and what is predicted by the
    model.
  • That is, the amount which the model has not been
    able to explain.

34
  • Many of the statistical procedures used in linear
    and nonlinear regression analysis are based on
    certain assumptions about the random departures
    from the proposed model.
  • Namely, the random departures are assumed
  • i) to have zero mean,
  • ii) to have a constant variance, σ²,
  • iii) to be independent, and
  • iv) to follow a normal distribution.

35
  • Thus if the fitted model is correct,
  • the residuals should exhibit tendencies that tend
    to confirm the above assumptions, or at least,
    should not exhibit a denial of the assumptions.

When examining the residuals one should ask
"Do the residuals make it appear that our
assumptions are wrong?"
36
  • After examination of the residuals we shall be
    able to conclude either

(1) the assumptions appear to be violated (in a
way that can be specified), or
(2) the assumptions do not appear to be violated.
37
  • Note that (2), in the same spirit as hypothesis
    testing, does not mean that we are concluding
    that the assumptions are correct;
  • it means merely that on the basis of the data we
    have seen, we have no reason to say that they are
    incorrect.
  • The methods for examining the residuals are
    sometimes graphical and sometimes statistical.

38
  • The principal ways of plotting the residuals ei
    are

1. Overall.
2. In time sequence, if the order is known.
3. Against the fitted values.
4. Against the independent variables xij, for
each value of j.
In addition to these basic plots, the residuals
should also be plotted:
5. In any way that is sensible for the particular
problem under consideration.
39
Overall Plot
  • The residuals can be plotted in an overall plot
    in several ways.

40
1. The scatter plot.
41
2. The histogram.
42
3. The box-whisker plot.
43
4. The kernel density plot
44
5. A normal plot or a half-normal plot on
standard probability paper.
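
A brief Python sketch of a few of these overall plots, using matplotlib and
scipy; the residuals here are simulated placeholders standing in for the
residuals of a fitted model.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

residuals = np.random.default_rng(1).normal(0, 1, 48)   # placeholder residuals

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
axes[0].hist(residuals, bins=10)                         # 2. the histogram
axes[0].set_title("Histogram")
axes[1].boxplot(residuals, vert=False)                   # 3. the box-whisker plot
axes[1].set_title("Box-whisker plot")
stats.probplot(residuals, dist="norm", plot=axes[2])     # 5. a normal plot
axes[2].set_title("Normal probability plot")
plt.tight_layout()
plt.show()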
45
  • If our model is correct these residuals should
    (approximately) resemble observations from a
    normal distribution with zero mean.
  • Does our overall plot contradict this idea?
  • Does the plot appear abnormal for a
    sample of n observations from a normal
    distribution?
  • How can we tell?
  • With a little practice one can develop an
    excellent "feel" of how abnormal a plot should
    look before it can be said to appear to
    contradict the normality assumption.

46
  • The standard statistical tests for testing
    normality are

1. The Kolmogorov-Smirnov test.
2. The Chi-square goodness-of-fit test.
47
  • The Kolmogorov-Smirnov test
  • The Kolmogorov-Smirnov test uses the empirical
    cumulative distribution function as a tool for
    testing the goodness of fit of a distribution.
  • The empirical distribution function is defined
    below for n random observations

Fn(x) = the proportion of observations in the
sample that are less than or equal to x.
48
  • Let F0(x) denote the hypothesized cumulative
    distribution function of the population (a normal
    population if we were testing normality).

If F0(x) truly represented the distribution of
observations in the population, then Fn(x) would be
close to F0(x) for all values of x.
49
  • The Kolmogorov-Smirnov test statistic is

Dn = max over x of |Fn(x) - F0(x)|,
the maximum distance between Fn(x) and F0(x).
  • If F0(x) does not provide a good fit to the
    distribution of the observations, Dn will be
    large.
  • Critical values for Dn are given in many texts.
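
A minimal sketch of applying the Kolmogorov-Smirnov test to residuals with
scipy.stats.kstest; the residuals are simulated placeholders, and note that
estimating the mean and standard deviation from the data, as done here,
strictly calls for adjusted critical values.

import numpy as np
from scipy import stats

residuals = np.random.default_rng(2).normal(0, 1, 48)    # placeholder residuals

# D_n: maximum distance between the empirical CDF of the residuals and the
# hypothesized normal CDF (mean and sd estimated from the residuals).
d_n, p_value = stats.kstest(residuals, "norm",
                            args=(residuals.mean(), residuals.std(ddof=1)))
print(d_n, p_value)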

50
  • The Chi-square goodness of fit test
  • The Chi-square test uses the histogram as a tool
    for testing the goodness of fit of a
    distribution.
  • Let fi denote the observed frequency in each of
    the class intervals of the histogram.
  • Let Ei denote the expected number of observations
    in each class interval, assuming the hypothesized
    distribution.

51
  • The hypothesized distribution is rejected if the
    statistic
  • χ² = Σ (fi - Ei)² / Ei
  • is large (greater than the critical value from
    the chi-square distribution with m - 1 degrees of
    freedom, where
  • m = the number of class intervals used for
    constructing the histogram).
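
A minimal Python sketch of the chi-square goodness-of-fit computation; the
class intervals and the simulated residuals are illustrative assumptions.

import numpy as np
from scipy import stats

residuals = np.random.default_rng(3).normal(0, 1, 60)    # placeholder residuals

# Interior cut points of the class intervals; the two end classes are
# open-ended, giving m = 5 class intervals in total.
cuts = np.array([-1.5, -0.5, 0.5, 1.5])

# Observed frequencies f_i in each class interval of the histogram.
f_obs = np.bincount(np.digitize(residuals, cuts), minlength=cuts.size + 1)

# Expected counts E_i under the hypothesized standard normal distribution.
probs = np.diff(np.concatenate(([0.0], stats.norm.cdf(cuts), [1.0])))
f_exp = probs * residuals.size

chi2 = np.sum((f_obs - f_exp) ** 2 / f_exp)
# Compare with the chi-square critical value on m - 1 degrees of freedom.
print(chi2, stats.chi2.ppf(0.95, df=f_obs.size - 1))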

52
  • Note.

In the above tests it is assumed that the
residuals are independent with a common variance
of σ².
This is not completely accurate, for this reason:
although the theoretical random errors εi are all
assumed to be independent with the same variance
σ², the residuals are not independent and they
also do not have the same variance.
53
  • They will however be approximately independent
    with common variance if the sample size is large
    relative to the number of parameters in the model.

It is important to keep this in mind when judging
residuals when the number of observations is
close to the number of parameters in the model.
54
  • Time Sequence Plot

The residuals should exhibit a pattern of
independence.
If the data were collected over time, there is a
strong possibility that the random departures
from the model are autocorrelated.
55
  • Namely the random departures for observations
    that were taken at neighbouring points in time
    are autocorrelated.

This autocorrelation can sometimes be seen in a
time sequence plot.
The following three graphs show a sequence of
residuals that are respectively i) positively
autocorrelated , ii) independent and iii)
negatively autocorrelated.
56
i) Positively auto-correlated residuals
57
ii) Independent residuals
58
iii) Negatively auto-correlated residuals
59
  • There are several statistics and statistical
    tests that can also pick out autocorrelation
    amongst the residuals. The most common are

i) The Durbin-Watson statistic
ii) The autocorrelation function
iii) The runs test
60
  • The Durbin-Watson statistic

The Durbin-Watson statistic, which is used
frequently to detect serial correlation, is
defined by the following formula:
D = Σ (ei - ei-1)² / Σ ei²,
where the numerator is summed over i = 2, ..., n
and the denominator over i = 1, ..., n.
If the residuals are serially correlated, the
differences ei - ei-1 will be stochastically
small. Hence a small value of the Durbin-Watson
statistic will indicate positive autocorrelation.
Large values of the Durbin-Watson statistic, on
the other hand, will indicate negative
autocorrelation. Critical values for this
statistic can be found in many statistical
textbooks.
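
A short Python sketch of the Durbin-Watson statistic computed directly from
its definition; the residuals are simulated placeholders.

import numpy as np

def durbin_watson(e):
    # D = sum of squared successive differences of the residuals
    #     divided by the residual sum of squares.
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

residuals = np.random.default_rng(4).normal(0, 1, 48)    # placeholder residuals
print(durbin_watson(residuals))    # values near 2 suggest no serial correlation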
61
  • The autocorrelation function

The autocorrelation function at lag k is defined
by
rk = Σ ei ei-k / Σ ei²,
where the numerator is summed over i = k+1, ..., n
and the denominator over i = 1, ..., n.
62
  • This statistic measures the correlation between
    residuals that occur a distance k apart in time.

One would expect that residuals that are close
in time are more correlated than residuals that
are separated by a greater distance in time.
If the residuals are independent, then rk should
be close to zero for all values of k. A plot of rk
versus k can be very revealing with respect to
the independence of the residuals.
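
A short Python sketch of the lag-k autocorrelation rk computed from the
definition above; the residuals are simulated placeholders.

import numpy as np

def autocorrelation(e, max_lag=10):
    # r_k = sum_{i=k+1..n} e_i * e_{i-k}  /  sum_{i=1..n} e_i**2
    e = np.asarray(e, dtype=float)
    denom = np.sum(e ** 2)
    return np.array([np.sum(e[k:] * e[:-k]) / denom
                     for k in range(1, max_lag + 1)])

residuals = np.random.default_rng(5).normal(0, 1, 48)    # placeholder residuals
print(autocorrelation(residuals, max_lag=5))   # near zero at every lag if independent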
63
  • Some typical patterns of the autocorrelation
    function are given below

Autocorrelation pattern for independent
residuals
64
  • Various Autocorrelation patterns for serially
    correlated residuals

65
(Figure: further autocorrelation patterns for serially correlated residuals)
66
  • The runs test

This test uses the fact that the residuals will
oscillate about zero at a normal rate if the
random departures are independent.
If the residuals oscillate slowly about zero,
this is an indication that there is a positive
autocorrelation amongst the residuals.
If the residuals oscillate at a frequent rate
about zero, this is an indication that there is a
negative autocorrelation amongst the residuals.
67
  • In the runs test, one observes the time
    sequence of the signs of the residuals

and counts the number of runs (i.e. the number of
periods during which the residuals keep the same
sign).
The number of runs should be low if the residuals
are positively correlated and high if they are
negatively correlated.
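
A short Python sketch that counts the number of runs in the signs of the
residuals; the residuals are simulated placeholders.

import numpy as np

def count_runs(e):
    # A run is a maximal stretch of consecutive residuals with the same sign.
    signs = np.sign(np.asarray(e, dtype=float))
    signs = signs[signs != 0]              # ignore residuals that are exactly zero
    return 1 + int(np.sum(signs[1:] != signs[:-1]))

residuals = np.random.default_rng(6).normal(0, 1, 48)    # placeholder residuals
print(count_runs(residuals))   # few runs: positive autocorrelation; many runs: negative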
68
  • Plots Against the Fitted Values and the Predictor
    Variables Xij

If we "step back" from this diagram, and the
residuals behave in a manner consistent with the
assumptions of the model, we obtain the impression
of a horizontal "band" of residuals, which can be
represented by the diagram below.
69
  • Individual observations lying considerably
    outside of this band indicate that the
    observation may be an outlier.

An outlier is an observation that is not
following the normal pattern of the other
observations.
Such an observation can have a considerable
effect on the estimation of the parameters of a
model.
Sometimes the outlier has occurred because of a
typographical error. If this is the case and it
is detected, then a correction can be made.
If the outlier occurs for other (and more
natural) reasons it may be appropriate to
construct a model that incorporates the
occurrence of outliers.
70
  • If our "step back" view of the residuals
    resembles any of those shown below, we should
    conclude that the assumptions about the model are
    incorrect. Each pattern may indicate that a
    different assumption may have to be made to
    explain the abnormal residual pattern.

(Figure: two abnormal residual patterns, a) and b))
71
  • Pattern a) indicates that the variance of the
    random departures is not constant (homogeneous)
    but increases as the value along the horizontal
    axis increases (time, or one of the independent
    variables).

This indicates that a weighted least squares
analysis should be used.
The second pattern, b), indicates that the mean
value of the residuals is not zero.
This is usually because the model (linear or
non-linear) has not been correctly specified:
linear or quadratic terms that should have been
included in the model have been omitted.