Title: Least Squares Regression
1. Least Squares Regression
- Engineering Experimental Design
- Valerie L. Young
2. In today's lecture . . .
- What is regression?
- What does least squares mean?
- Multiple linear regression (MLR) with Excel
- Nonlinear regression (NLR) with Matlab
- Linearization of nonlinear (NL) equations
3. Regression
A set of statistical tools that can . . .
- define a mathematical relationship (model) between factors and a response.
  - NOT proof of any physical relationship (though ideally terms in the model have physical significance)
- quantify the significance of each factor's correlation with the response.
- estimate values for the constants in a model.
- indicate how well a particular model fits the data.
5. Models
- Every model consists of two parts: a predictable relationship and a random uncertainty.
- The predictable relationship may be modeled as
  - Linear
    - Simple: one factor and one response
    - Multiple linear: multiple factors and one response
  - Nonlinear
- The random uncertainty is usually modeled as a normal distribution.
  - Always in this course
  - More on this later in the course
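The two parts of a model can be made concrete with a small simulation. This is only a sketch in Python (not the Excel or Matlab used in the lecture); the function name `simulate_response`, the straight-line form, and all numerical values are assumptions chosen for illustration.

```python
import random

def simulate_response(x, slope, intercept, sigma, rng):
    """Return the predictable part of the model plus normal random noise."""
    predictable = slope * x + intercept   # predictable relationship
    noise = rng.gauss(0.0, sigma)         # random uncertainty, normally distributed
    return predictable + noise

rng = random.Random(0)                    # fixed seed so the run is repeatable
xs = [0.01 * i for i in range(1000)]
ys = [simulate_response(x, 2.0, 1.0, 0.5, rng) for x in xs]

# Subtracting the predictable part leaves only the random part,
# whose average should be close to zero.
noise = [y - (2.0 * x + 1.0) for x, y in zip(xs, ys)]
mean_noise = sum(noise) / len(noise)
print(f"mean of the random part: {mean_noise:.3f}")
```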
6. Examples of Models
- PAI = α·xAl + β,
  - where α and β are constants, xAl is the mass fraction of aluminum, and PAI is the phosphate adsorption index.
- PAI = α·xAl + β·xFe + γ,
  - where α, β and γ are constants, xAl is the mass fraction of aluminum, xFe is the mass fraction of iron, and PAI is the phosphate adsorption index.
- PAI = α·(xAl)^β·(xFe)^γ,
  - where α, β and γ are constants, xAl is the mass fraction of aluminum, xFe is the mass fraction of iron, and PAI is the phosphate adsorption index.
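As a sketch, the three example models can be written as Python functions. The parameter names `alpha`, `beta`, and `gamma` are placeholder names for the three constants, and the function names are hypothetical; only the predictable part of each model is shown.

```python
def simple_linear(x_al, alpha, beta):
    """One factor (aluminum), one response (PAI)."""
    return alpha * x_al + beta

def multiple_linear(x_al, x_fe, alpha, beta, gamma):
    """Two factors (aluminum and iron), one response (PAI)."""
    return alpha * x_al + beta * x_fe + gamma

def nonlinear(x_al, x_fe, alpha, beta, gamma):
    """Power-law form: the constants beta and gamma appear as exponents."""
    return alpha * x_al ** beta * x_fe ** gamma

# Arbitrary inputs, just to show the calls.
print(simple_linear(2.0, 3.0, 1.0))
print(multiple_linear(2.0, 4.0, 3.0, 0.5, 1.0))
print(nonlinear(2.0, 3.0, 5.0, 2.0, 1.0))
```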
7. What Kind of Model is This?
- PAI = α·xAl + β,
  - where α and β are constants, xAl is the mass fraction of aluminum, and PAI is the phosphate adsorption index.
- PAI = α·xAl + β·xFe + γ,
  - where α, β and γ are constants, xAl is the mass fraction of aluminum, xFe is the mass fraction of iron, and PAI is the phosphate adsorption index.
- PAI = α·(xAl)^β·(xFe)^γ,
  - where α, β and γ are constants, xAl is the mass fraction of aluminum, xFe is the mass fraction of iron, and PAI is the phosphate adsorption index.
8. Where is the Random Uncertainty?
- PAI = α·xAl + β,
- PAI = α·xAl + β·xFe + γ,
- PAI = α·(xAl)^β·(xFe)^γ,
- Often, we just write down the predictable relationship part of the model. The random uncertainty part is understood to be there.
9. What Will Regression Do?
- PAI = α·xAl + β + ε,
- PAI = α·xAl + β·xFe + γ + ε,
- PAI = α·(xAl)^β·(xFe)^γ + ε,
- Given a set of values for (xAl, xFe, PAI), regression will
  - Calculate values for α, β, γ (constants, adjustable parameters)
  - Determine how much of the variability in PAI is accounted for by the predictable relationship part of the model and how much is not (the error).
  - Estimate uncertainties for α, β, γ (assumes the error is random and normally distributed)
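One common way to express how much of the variability is accounted for by the model is the coefficient of determination, R² = 1 - SSE/SST. The sketch below assumes a one-factor straight-line model and uses made-up data (not the lecture's PAI data); `fit_line` is a hypothetical helper using the standard closed-form formulas.

```python
def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for one factor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Hypothetical data, not from the lecture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
y_mean = sum(ys) / len(ys)

# Total variability of the response about its mean.
ss_total = sum((y - y_mean) ** 2 for y in ys)
# Variability left over after the model (the error).
ss_error = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
r_squared = 1.0 - ss_error / ss_total
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.4f}")
```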
10. What Does Least-Squares Mean?
- Least-squares regression finds the set of values for α, β, and γ that minimizes the sum of squared errors between the values of PAI calculated using the model and the values of PAI actually measured.
- In other words
  - Pick values for α, β, and γ
  - For each data point (xAl, xFe, PAI), calculate (PAImeasured - α·xAl - β·xFe - γ). These differences are the errors or residuals.
  - Square the errors and add them all up.
  - Adjust α, β, and γ to minimize the sum of the squared errors.
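The quantity being minimized can be sketched directly. The function below computes the sum of squared errors for the two-factor linear model and compares two candidate parameter sets. The data are hypothetical and noise-free (built from assumed constants 0.1, 0.3, and 2), so the second candidate reproduces them exactly; a real regression routine automates the "adjust" step rather than comparing guesses by hand.

```python
def sum_squared_errors(params, data):
    """Sum of squared residuals for PAI = alpha*xAl + beta*xFe + gamma."""
    alpha, beta, gamma = params
    total = 0.0
    for x_al, x_fe, pai_measured in data:
        residual = pai_measured - (alpha * x_al + beta * x_fe + gamma)
        total += residual ** 2          # square each error and add it up
    return total

# Hypothetical, noise-free data built from alpha=0.1, beta=0.3, gamma=2.
data = [(a, f, 0.1 * a + 0.3 * f + 2.0)
        for a, f in [(10, 5), (20, 3), (30, 8), (40, 2), (50, 9)]]

first_guess = (0.2, 0.1, 0.0)     # arbitrary starting values
better_guess = (0.1, 0.3, 2.0)    # the values used to build the data

print(sum_squared_errors(first_guess, data))
print(sum_squared_errors(better_guess, data))
```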
11. Is All Regression Least-Squares?
- There are other types of regression.
- Other types of regression minimize different functions of the error.
- Least-squares regression is the type most commonly used.
- In this course, we will ALWAYS use least-squares regression.
12. Warning About Least-Squares Regression
- Because the SQUARE of the error is used,
  - one really weird point can pull the line far away from most of the data.
  - the line might fit large values of the response much better than it fits small values.
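The first warning is easy to reproduce. In this sketch (hypothetical data, one-factor straight-line fit), six points lie exactly on y = 2x; changing only the last response from 12 to 40 moves the fitted slope from 2 to 6.

```python
def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for one factor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys_clean = [2.0 * x for x in xs]        # every point exactly on y = 2x
ys_weird = ys_clean[:-1] + [40.0]       # last point should have been 12

clean_slope, clean_intercept = fit_line(xs, ys_clean)
weird_slope, weird_intercept = fit_line(xs, ys_weird)
print(f"clean: slope={clean_slope:.2f}, intercept={clean_intercept:.2f}")
print(f"weird: slope={weird_slope:.2f}, intercept={weird_intercept:.2f}")
```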
13. Regression with Excel
- Excel can do simple linear and multiple linear regression.
- We did simple linear regression in the tutorial the first week.
- I will demonstrate multiple linear regression next.
- Excel cannot do non-linear regression.
14. Adsorption of Phosphate on Soil
Proposed Model: PAI = α·(xAl) + β·(xFe) + γ
The proposed model is a linear equation, so we will do multiple linear regression.
15. MLR in Excel
- Download the Excel file "MLR example.xls" from the ChE 408 homepage.
- Tools > Data Analysis > Regression
  - Select the PAI data (C6:C18) as the y-range
  - Select the two columns of extractable metal data together (A6:B18) as the x-range
  - Tick the boxes for residuals and plots
  - OK
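For readers without Excel, the same kind of fit can be sketched in plain Python by solving the normal equations (XᵀX)·c = Xᵀy. This is an illustration under assumptions, not what Excel does internally: the data are hypothetical and noise-free (not the contents of the ChE 408 file), `solve` and `fit_mlr` are made-up helper names, and no uncertainties are computed.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                     # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_mlr(x_al, x_fe, pai):
    """Least-squares fit of PAI = alpha*xAl + beta*xFe + gamma."""
    rows = [[a, f, 1.0] for a, f in zip(x_al, x_fe)]  # 1.0 column = intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, pai)) for i in range(3)]
    return solve(xtx, xty)

# Hypothetical data built from alpha=0.1, beta=0.3, gamma=2 with no noise.
x_al = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
x_fe = [5.0, 3.0, 8.0, 2.0, 9.0, 4.0]
pai = [0.1 * a + 0.3 * f + 2.0 for a, f in zip(x_al, x_fe)]

alpha, beta, gamma = fit_mlr(x_al, x_fe, pai)
print(f"alpha={alpha:.3f} beta={beta:.3f} gamma={gamma:.3f}")
```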
16. Values of Coefficients
- α = (0.11 ± 0.07) g soil / mg Al
- β = (0.35 ± 0.16) g soil / mg Fe
- γ = (-7 ± 8)
- The intercept, γ, is not significantly different from zero (at the 5% significance level)
  - Should we make it equal to zero?
  - What is the physical significance of γ = 0?
- Regression tells you math. YOU must think about the physical system.
17. If you decide an adjustable parameter should be zero . . .
- You must redo the regression without that term in the model.
  - To set α = 0, don't select xAl as an independent variable (x-input in Excel-speak)
  - To set β = 0, don't select xFe as an independent variable.
  - To set γ = 0, tick "constant is zero" in the regression dialog box.
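A short sketch shows why the regression has to be redone rather than simply deleting the term: removing a parameter changes the best-fit values of the ones that remain. The example assumes a one-factor model and hypothetical data with a deliberately large intercept so that the effect is easy to see; `fit_line` and `fit_through_origin` are made-up helper names.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope*x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def fit_through_origin(xs, ys):
    """Least-squares fit of y = slope*x (intercept forced to zero)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical data, not from the lecture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.1, 6.9, 9.2, 10.8, 13.1]

slope_full, intercept_full = fit_line(xs, ys)
slope_origin = fit_through_origin(xs, ys)
print(f"with intercept:    slope={slope_full:.3f}, intercept={intercept_full:.3f}")
print(f"intercept removed: slope={slope_origin:.3f}")
```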
18. If you decide an adjustable parameter should be zero . . .
- DANGER DANGER DANGER DANGER
- If you tick "constant is zero" in the regression dialog box, then the way Excel calculates the error in the regression is WRONG.
19. Nonlinear Regression Example
Cells = (B1)·e^(-(B2)·m)
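20. Matlab Code to Fit Exponential Model
Cells = (B1)·e^(-(B2)·m)
% Data: mass m and measured number of cells
m = [6.8 8.2 2.5 4.6 6.7 3.1 0.8 7.4 5.2 7.4]';
Cells = [14 5 88 32 12 66 197 6 17 7]';
coeff0 = [1,-1];   % initial guesses for B(1) and B(2)
% As coded, B(2) is the whole exponent coefficient, so it is fitted as a negative number
expmodel = inline('B(1)*exp(B(2)*m)','B','m');
[coeff,res,J] = nlinfit(m,Cells,expmodel,coeff0);
exp_percent_res = res./Cells*100;   % residuals as a percentage of the measured values
cl = nlparci(coeff,res,J)   % confidence intervals for the coefficients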
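For readers without Matlab, the same least-squares problem can be sketched in plain Python with a damped Gauss-Newton iteration. This is an illustration, not a reimplementation of `nlinfit`: the algorithm, the function names, and the starting guess (200, 0.4) are assumptions, the sign convention follows the slide (B2 positive, exponent -B2·m), and no confidence intervals are computed.

```python
import math

# Data copied from the Matlab example on this slide.
m = [6.8, 8.2, 2.5, 4.6, 6.7, 3.1, 0.8, 7.4, 5.2, 7.4]
cells = [14, 5, 88, 32, 12, 66, 197, 6, 17, 7]

def sse(b1, b2):
    """Sum of squared residuals for Cells = b1 * exp(-b2 * m)."""
    return sum((c - b1 * math.exp(-b2 * x)) ** 2 for x, c in zip(m, cells))

def gauss_newton(b1, b2, max_iter=100):
    for _ in range(max_iter):
        # Accumulate the 2x2 normal equations (J^T J) d = J^T r.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, c in zip(m, cells):
            e = math.exp(-b2 * x)
            r = c - b1 * e        # residual at the current parameters
            j1 = e                # d(model)/d(b1)
            j2 = -b1 * x * e      # d(model)/d(b2)
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        det = a11 * a22 - a12 * a12
        d1 = (g1 * a22 - g2 * a12) / det
        d2 = (a11 * g2 - a12 * g1) / det
        # Halve the step until the sum of squared errors does not increase.
        step = 1.0
        current = sse(b1, b2)
        while sse(b1 + step * d1, b2 + step * d2) > current and step > 1e-10:
            step /= 2.0
        b1 += step * d1
        b2 += step * d2
        if abs(step * d1) < 1e-9 and abs(step * d2) < 1e-12:
            break
    return b1, b2

b1_fit, b2_fit = gauss_newton(200.0, 0.4)   # assumed starting guess
print(f"B1 = {b1_fit:.1f}, B2 = {b2_fit:.3f}, SSE = {sse(b1_fit, b2_fit):.1f}")
```

Gauss-Newton needs a starting guess reasonably close to the answer; a poor guess can send the exponent far enough off that `math.exp` overflows.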
21. NLR Results
- Cells = (292 ± 13 cells)·e^(-(0.49 ± 0.03 /g)·m)
- Please, try this at home.
22. Linearizing the Nonlinear Model
- In the old days, when computers were expensive or non-existent, nonlinear regression was almost impossible.
- You can often convert a nonlinear equation to a linear one, then use linear regression.
- WARNING: The values you get for the adjustable parameters WILL be different, and putting uncertainties on them may not be straightforward.
23. Linearizing the Cell Model
- Cells = (B1)·e^(-(B2)·m)
- ln(Cells) = ln(B1) - B2·(m)
- Now plot ln(Cells) vs. m
- Use linear regression to find
  - ln(B1) = intercept
  - -B2 = slope
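The linearized fit can be sketched in a few lines of Python using the cell data from the Matlab example. The closed-form straight-line formulas and the variable names are assumptions for illustration; no uncertainties are computed. Because the fit minimizes squared errors in ln(Cells) rather than in Cells, the resulting B1 and B2 need not match the nonlinear regression values.

```python
import math

# Data copied from the Matlab example earlier in the lecture.
m = [6.8, 8.2, 2.5, 4.6, 6.7, 3.1, 0.8, 7.4, 5.2, 7.4]
cells = [14, 5, 88, 32, 12, 66, 197, 6, 17, 7]

# Transform the response: ln(Cells) = ln(B1) - B2*m is a straight line in m.
log_cells = [math.log(c) for c in cells]

n = len(m)
m_mean = sum(m) / n
y_mean = sum(log_cells) / n
sxx = sum((x - m_mean) ** 2 for x in m)
sxy = sum((x - m_mean) * (y - y_mean) for x, y in zip(m, log_cells))

slope = sxy / sxx
intercept = y_mean - slope * m_mean

b1 = math.exp(intercept)   # ln(B1) = intercept
b2 = -slope                # -B2 = slope
print(f"B1 = {b1:.1f}, B2 = {b2:.3f}")
```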