Title: Least Squares Regression
Slide 1: Least Squares Regression
- Engineering Experimental Design
- Valerie L. Young
Slide 2: In today's lecture . . .
- What is regression?
- What does "least squares" mean?
- Multiple linear regression (MLR) with Excel
- Nonlinear regression (NLR) with Matlab
- Linearization of nonlinear (NL) equations
Slide 3: Regression
A set of statistical tools that can . . .
- define a mathematical relationship (model) between factors and a response.
  - NOT proof of any physical relationship (though ideally terms in the model have physical significance)
- quantify the significance of each factor's correlation with the response.
- estimate values for the constants in a model.
- indicate how well a particular model fits the data.
Slide 6: Models
- Every model consists of two parts
  - The predictable relationship between the factor(s) and the response.
  - The random uncertainty.
- The predictable relationship may be modeled as
  - Linear
    - Simple: one factor and one response
    - Multiple linear: multiple factors and one response
  - Nonlinear
- The random uncertainty is usually modeled as a normal distribution.
  - Always in this course
  - More on this later in the course
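Slide 7: Examples of Models
- $\mathrm{PAI} = \alpha\,x_{\mathrm{Al}} + \beta$,
  - where $\alpha$ and $\beta$ are constants, $x_{\mathrm{Al}}$ is the mass fraction of aluminum, and PAI is the phosphate adsorption index.
- $\mathrm{PAI} = \alpha\,x_{\mathrm{Al}} + \beta\,x_{\mathrm{Fe}} + \gamma$,
  - where $\alpha$, $\beta$ and $\gamma$ are constants, $x_{\mathrm{Al}}$ is the mass fraction of aluminum, $x_{\mathrm{Fe}}$ is the mass fraction of iron, and PAI is the phosphate adsorption index.
- $\mathrm{PAI} = \alpha\,(x_{\mathrm{Al}})^{\beta}\,(x_{\mathrm{Fe}})^{\gamma}$,
  - where $\alpha$, $\beta$ and $\gamma$ are constants, $x_{\mathrm{Al}}$ is the mass fraction of aluminum, $x_{\mathrm{Fe}}$ is the mass fraction of iron, and PAI is the phosphate adsorption index.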
Slide 8: Examples of Models
- The constants in each model have values that don't change, regardless of the values of PAI and $x_{\mathrm{Al}}$.
- The constants are also called adjustable parameters.
Slides 9-10: Examples of Models
- $\mathrm{PAI} = \alpha\,x_{\mathrm{Al}} + \beta$
  - Response? PAI
  - Factor(s)? $x_{\mathrm{Al}}$
Slides 11-12: What Kind of Model is This? (Simple Linear, Multiple Linear, Nonlinear)
- $\mathrm{PAI} = \alpha\,x_{\mathrm{Al}} + \beta$
  - Simple Linear
Slides 13-14: Examples of Models
- $\mathrm{PAI} = \alpha\,x_{\mathrm{Al}} + \beta\,x_{\mathrm{Fe}} + \gamma$
  - Response? PAI
  - Factor(s)? $x_{\mathrm{Al}}$ and $x_{\mathrm{Fe}}$
Slides 15-16: What Kind of Model is This? (Simple Linear, Multiple Linear, Nonlinear)
- $\mathrm{PAI} = \alpha\,x_{\mathrm{Al}} + \beta\,x_{\mathrm{Fe}} + \gamma$
  - Multiple Linear
Slides 17-18: What Kind of Model is This?
- $\mathrm{PAI} = \alpha\,(x_{\mathrm{Al}})^{\beta}\,(x_{\mathrm{Fe}})^{\gamma}$
  - Nonlinear
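One way to see why the first two example models count as linear and the third does not: "linear" describes how the adjustable parameters enter the equation, not the shape of a plot. The block below is only an illustration; the symbol names $\alpha$, $\beta$, $\gamma$ for the constants are assumed here. In a linear model the derivative with respect to any constant contains no constants; in the nonlinear one it still does.

```latex
\begin{align*}
\text{linear:}\quad
  & \mathrm{PAI} = \alpha\,x_{\mathrm{Al}} + \beta\,x_{\mathrm{Fe}} + \gamma
  && \frac{\partial\,\mathrm{PAI}}{\partial \alpha} = x_{\mathrm{Al}} \\
\text{nonlinear:}\quad
  & \mathrm{PAI} = \alpha\,(x_{\mathrm{Al}})^{\beta}\,(x_{\mathrm{Fe}})^{\gamma}
  && \frac{\partial\,\mathrm{PAI}}{\partial \beta}
     = \alpha\,(x_{\mathrm{Al}})^{\beta}\,(x_{\mathrm{Fe}})^{\gamma}\,\ln x_{\mathrm{Al}}
\end{align*}
```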
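Slide 19: Where is the Random Uncertainty?
- $\mathrm{PAI} = \alpha\,x_{\mathrm{Al}} + \beta + \varepsilon$,
- $\mathrm{PAI} = \alpha\,x_{\mathrm{Al}} + \beta\,x_{\mathrm{Fe}} + \gamma + \varepsilon$,
- $\mathrm{PAI} = \alpha\,(x_{\mathrm{Al}})^{\beta}\,(x_{\mathrm{Fe}})^{\gamma} + \varepsilon$,
- Often, we just write down the "predictable relationship" part of the model. The "random uncertainty" part is understood to be there.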
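The two parts of a model can be pictured by simulating data: compute the predictable part, then add a random draw from a normal distribution. The following is a minimal Python sketch (the lecture itself works in Excel and Matlab). The function name, the constants `alpha` and `beta`, the standard deviation `sigma`, and the factor values are all made up for illustration and are not taken from the soil data.

```python
import random

def simulate_pai(x_al_values, alpha, beta, sigma, seed=0):
    """Return simulated PAI values: alpha*x + beta plus normal noise."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    simulated = []
    for x in x_al_values:
        predictable = alpha * x + beta        # predictable relationship
        uncertainty = rng.gauss(0.0, sigma)   # random uncertainty, normal with mean 0
        simulated.append(predictable + uncertainty)
    return simulated

x_al = [10.0, 20.0, 30.0, 40.0]  # hypothetical factor values
print(simulate_pai(x_al, alpha=0.5, beta=2.0, sigma=1.0))  # scattered about the line
print(simulate_pai(x_al, alpha=0.5, beta=2.0, sigma=0.0))  # no noise: exactly on the line
```

With `sigma=0.0` the output is just the predictable part; any larger `sigma` scatters the points around it.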
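Slide 20: What Will Regression Do?
- $\mathrm{PAI} = \alpha\,x_{\mathrm{Al}} + \beta + \varepsilon$,
- $\mathrm{PAI} = \alpha\,x_{\mathrm{Al}} + \beta\,x_{\mathrm{Fe}} + \gamma + \varepsilon$,
- $\mathrm{PAI} = \alpha\,(x_{\mathrm{Al}})^{\beta}\,(x_{\mathrm{Fe}})^{\gamma} + \varepsilon$,
- Given a set of values for ($x_{\mathrm{Al}}$, $x_{\mathrm{Fe}}$, PAI), regression will
  - Calculate values for $\alpha$, $\beta$, $\gamma$ (constants, adjustable parameters)
  - Determine how much of the variability in PAI is accounted for by the "predictable relationship" part of the model and how much is not (the "error").
  - Estimate uncertainties for $\alpha$, $\beta$, $\gamma$ (assumes the error is random and normally distributed)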
Slide 21: What Does "Least-Squares" Mean?
- Least-squares regression finds the set of values for $\alpha$, $\beta$, and $\gamma$ that minimizes the sum of squared errors between the values of PAI calculated using the model and the values of PAI actually measured.
- In other words
  - Pick values for $\alpha$, $\beta$, and $\gamma$ (the adjustable parameters)
  - For each data point ($x_{\mathrm{Al}}$, $x_{\mathrm{Fe}}$, PAI), calculate $(\mathrm{PAI}_{\mathrm{measured}} - \alpha\,x_{\mathrm{Al}} - \beta\,x_{\mathrm{Fe}} - \gamma)$. These differences are the "errors" or "residuals".
  - Square the errors and add them all up.
  - Adjust $\alpha$, $\beta$, and $\gamma$ to minimize the sum of the squared errors.
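The recipe above can be sketched in a few lines of Python (the lecture's own tools are Excel and Matlab). The data points are invented for illustration. For brevity the sketch uses a one-factor model, PAI = alpha * xAl + beta, for which the minimizing constants have a well-known closed form; `alpha` and `beta` are this sketch's names for the slope and intercept.

```python
def sum_squared_errors(x, y, alpha, beta):
    """Sum of squared residuals for the model y = alpha*x + beta."""
    return sum((yi - (alpha * xi + beta)) ** 2 for xi, yi in zip(x, y))

def least_squares_line(x, y):
    """Closed-form alpha (slope) and beta (intercept) that minimize the SSE."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    alpha = sxy / sxx
    beta = y_mean - alpha * x_mean
    return alpha, beta

# Hypothetical measurements (not the lecture's soil data).
x_al = [1.0, 2.0, 3.0, 4.0, 5.0]
pai = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha, beta = least_squares_line(x_al, pai)
best = sum_squared_errors(x_al, pai, alpha, beta)
print(alpha, beta, best)

# Nudging either constant away from the fitted value increases the SSE.
print(sum_squared_errors(x_al, pai, alpha + 0.1, beta))
print(sum_squared_errors(x_al, pai, alpha, beta - 0.5))
```

The last two prints both give a larger number than `best`, which is what "least squares" means: no other pair of constants gives a smaller sum.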
Slide 22: Is All Regression Least-Squares?
- There are other types of regression.
  - Other types of regression minimize different functions of the error.
- Least-squares regression is the type most commonly used.
- In this course, we will ALWAYS use least-squares regression.
Slide 23: Warning About Least-Squares Regression
- Because the SQUARE of the error is used,
  - one really weird point can pull the line far away from most of the data.
  - the line might fit large values of the response much better than it fits small values.
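The first warning is easy to reproduce. The Python sketch below uses made-up numbers: six points that lie exactly on y = 2x, and a second data set in which only the last point has been changed to a "really weird" value. The function name `fit_line` is this sketch's own.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = slope*x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_clean = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # lies exactly on y = 2x
y_weird = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]  # only the last point differs

print(fit_line(x, y_clean))  # slope 2, intercept 0
print(fit_line(x, y_weird))  # slope about 4.57, intercept about -6
```

One bad point out of six more than doubles the slope, so the fitted line no longer passes near the five good points.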
Slide 24: Regression with Excel
- Excel can do simple linear and multiple linear regression.
  - We did simple linear regression in the tutorial the first week.
  - I will demonstrate multiple linear regression next.
- Excel's built-in Regression tool cannot do nonlinear regression.
Slide 25: Adsorption of Phosphate on Soil
Proposed Model: $\mathrm{PAI} = \alpha\,(x_{\mathrm{Al}}) + \beta\,(x_{\mathrm{Fe}}) + \gamma$
The proposed model is a linear equation, so we will do multiple linear regression.
Slide 26: MLR in Excel
- Download the Excel file "MLR example.xls" from the ChE 408 homepage.
- Tools > Data Analysis > Regression
  - Select the PAI data (c6:c18) as the y-range
  - Select the two columns of extractable metal data together (a6:b18) as the x-range
  - Tick the boxes for residuals and plots
  - OK
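For readers without the spreadsheet, the calculation behind a two-factor linear fit can be sketched in Python using the normal equations. This is an illustration of the least-squares arithmetic only, not a description of how Excel computes it. The data are hypothetical: they are generated from PAI = 0.1*xAl + 0.3*xFe + 2 with no noise, so the fit should simply hand those three constants back. All function names are this sketch's own.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x

def multiple_linear_regression(factors, response):
    """Least-squares coefficients, one per factor, with the intercept last."""
    rows = [list(f) + [1.0] for f in factors]  # trailing column of ones gives the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, response)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical (xAl, xFe) pairs; PAI is built from known constants, no noise.
x_al_fe = [(10.0, 5.0), (20.0, 3.0), (30.0, 8.0), (40.0, 6.0), (50.0, 12.0)]
pai = [0.1 * al + 0.3 * fe + 2.0 for al, fe in x_al_fe]

print(multiple_linear_regression(x_al_fe, pai))  # close to [0.1, 0.3, 2.0]
```

Leaving out the column of ones in `rows` corresponds to forcing the intercept to zero.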
Slides 27-28: Values of Coefficients
- $\alpha = (0.11 \pm 0.07)$ g soil / mg Al
- $\beta = (0.35 \pm 0.16)$ g soil / mg Fe
- $\gamma = (-7 \pm 8)$
- The intercept, $\gamma$, is not significantly different from zero (at the 5% significance level)
  - Should we make it equal to zero?
  - What is the physical significance of $\gamma = 0$?
  - $\gamma = 0$ means (physically) that if there is no Al or Fe in the soil, there is no adsorbed phosphate
- Regression tells you math. YOU must think about the physical system.
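The "not significantly different from zero" statement can be checked by asking whether each coefficient's interval spans zero. The Python sketch below uses the three values from the slide and assumes that each uncertainty is the half-width of a 95% confidence interval, which is consistent with the 5% significance level quoted but is not stated explicitly. The labels `alpha`, `beta`, `gamma` are this sketch's names for the Al coefficient, the Fe coefficient, and the intercept.

```python
def interval_contains_zero(estimate, half_width):
    """True if the interval estimate +/- half_width spans zero."""
    return estimate - half_width <= 0.0 <= estimate + half_width

# (estimate, uncertainty) pairs as reported on the slide.
coefficients = {
    "alpha (Al)": (0.11, 0.07),
    "beta (Fe)": (0.35, 0.16),
    "gamma (intercept)": (-7.0, 8.0),
}

for name, (estimate, half_width) in coefficients.items():
    if interval_contains_zero(estimate, half_width):
        verdict = "not significantly different from zero"
    else:
        verdict = "significantly different from zero"
    print(f"{name}: {estimate} +/- {half_width} -> {verdict}")
```

Only the intercept's interval, from -15 to 1, includes zero. Whether to drop it is still a question about the physical system, not about the arithmetic.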
Slide 29: If you decide an adjustable parameter should be zero . . .
- You must redo the regression without that term in the model.
  - To set $\alpha = 0$, don't select $x_{\mathrm{Al}}$ as an independent variable ("x-input" in Excel-speak)
  - To set $\beta = 0$, don't select $x_{\mathrm{Fe}}$ as an independent variable.
  - To set $\gamma = 0$, tick "constant is zero" in the regression dialog box.
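The regression has to be redone because dropping a term changes the best values of the terms that remain; you cannot just cross the term out of the fitted equation. A small Python sketch with one factor and made-up data that lie exactly on y = 2x + 1 shows this. The function names are this sketch's own.

```python
def fit_with_intercept(x, y):
    """Least-squares slope and intercept for y = slope*x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def fit_through_origin(x, y):
    """Least-squares slope for y = slope*x, with the intercept fixed at zero."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]  # hypothetical data, exactly y = 2x + 1

print(fit_with_intercept(x, y))  # slope 2, intercept 1
print(fit_through_origin(x, y))  # slope about 2.33, not 2
```

Keeping the slope of 2 and deleting the intercept would not give the least-squares line through the origin; the refit slope is 70/30.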
Slide 30: If you decide an adjustable parameter should be zero . . .
- DANGER DANGER DANGER DANGER.
- If you tick "constant is zero" in the regression dialog box, the way Excel calculates the error in the resulting regression is WRONG.
Slide 31: Nonlinear Regression Example
$\mathrm{Cells} = B_1\,e^{-B_2 m}$
Slide 32: Matlab Code to Fit Exponential Model
$\mathrm{Cells} = B_1\,e^{-B_2 m}$
m = [6.8 8.2 2.5 4.6 6.7 3.1 0.8 7.4 5.2 7.4]';          % values for factor
Cells = [14 5 88 32 12 66 197 6 17 7]';                  % values for response
coeff0 = [1, -1];                                        % initial guesses for adjustable parameters
expmodel = @(B, m) B(1)*exp(B(2)*m);                     % model; B(2) comes out negative, so B2 = -B(2)
[coeff, res, J] = nlinfit(m, Cells, expmodel, coeff0);   % built-in function for nonlinear fit
exp_percent_res = res./Cells*100;                        % residuals as a percentage of the response
cl = nlparci(coeff, res, 'jacobian', J);                 % built-in function for confidence interval for nonlinear adj. parameters
- coeff: adjustable parameters
- res: residuals
- J: other stats (the Jacobian, passed on to nlparci)
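To show what a nonlinear least-squares fit does with these numbers, here is a minimal Python sketch of a Gauss-Newton iteration with step halving. It is a simple stand-in for illustration, not a reimplementation of Matlab's nlinfit, and it does not compute confidence intervals. The data are the ten (m, Cells) pairs from the slide; the starting guesses 200 and 0.5 are assumptions chosen to be near the answer, and the function names are this sketch's own.

```python
import math

# Data from the slide: m is the factor, cells is the response.
m = [6.8, 8.2, 2.5, 4.6, 6.7, 3.1, 0.8, 7.4, 5.2, 7.4]
cells = [14, 5, 88, 32, 12, 66, 197, 6, 17, 7]

def sse(b1, b2):
    """Sum of squared errors for the model Cells = b1 * exp(-b2 * m)."""
    return sum((c - b1 * math.exp(-b2 * x)) ** 2 for x, c in zip(m, cells))

def gauss_newton(b1, b2, max_iter=100, tol=1e-8):
    """Refine (b1, b2) by Gauss-Newton steps, halving any step that raises the SSE."""
    for _ in range(max_iter):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, c in zip(m, cells):
            e = math.exp(-b2 * x)
            r = c - b1 * e      # residual: measured minus predicted
            j1 = e              # d(prediction)/d(b1)
            j2 = -b1 * x * e    # d(prediction)/d(b2)
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        det = a11 * a22 - a12 * a12
        d1 = (a22 * g1 - a12 * g2) / det  # solve the 2x2 normal equations
        d2 = (a11 * g2 - a12 * g1) / det
        step = 1.0
        current = sse(b1, b2)
        while step > 1e-6 and sse(b1 + step * d1, b2 + step * d2) > current:
            step /= 2.0                   # back off if the full step makes things worse
        b1 += step * d1
        b2 += step * d2
        if abs(step * d1) < tol and abs(step * d2) < tol:
            break
    return b1, b2

b1, b2 = gauss_newton(200.0, 0.5)  # starting guesses are assumptions
print(b1, b2, sse(b1, b2))
```

If the sketch converges as intended, the printed constants should land close to the values the lecture reports for this fit.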
Slide 33: NLR Results
- $\mathrm{Cells} = (292 \pm 13\ \text{cells})\,e^{-(0.49 \pm 0.03\,/\mathrm{g})\,m}$
- Please, try this at home.
Slide 34: Linearizing a Nonlinear Model
- In the old days, when computers were expensive or non-existent, nonlinear regression was almost impossible
- You can often convert a nonlinear equation to a linear one, then use linear regression
- WARNING: The values you get for the adjustable parameters WILL be different, and putting uncertainties on them may not be straightforward.
Slide 35: Linearizing the Cell Model
- $\mathrm{Cells} = B_1\,e^{-B_2 m}$
- $\ln(\mathrm{Cells}) = \ln(B_1) - B_2\,m$
- Now plot ln(Cells) vs. m
- Use linear regression to find
  - $\ln(B_1)$ = intercept
  - $-B_2$ = slope
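The linearized fit can be tried on the same ten data points used for the Matlab example. The Python sketch below takes the natural log of the response, fits a straight line by ordinary least squares, and converts the intercept and slope back to B1 and B2. The function and variable names are this sketch's own.

```python
import math

# Data from the Matlab example: m is the factor, cells is the response.
m = [6.8, 8.2, 2.5, 4.6, 6.7, 3.1, 0.8, 7.4, 5.2, 7.4]
cells = [14, 5, 88, 32, 12, 66, 197, 6, 17, 7]

def fit_line(x, y):
    """Least-squares slope and intercept for y = slope*x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

ln_cells = [math.log(c) for c in cells]  # linearize: ln(Cells) = ln(B1) - B2*m
slope, intercept = fit_line(m, ln_cells)

b1_linearized = math.exp(intercept)  # intercept is ln(B1)
b2_linearized = -slope               # slope is -B2
print(b1_linearized, b2_linearized)
```

The constants come out near, but not identical to, the nonlinear-regression result. The fit in log space minimizes squared errors in ln(Cells) rather than in Cells, so small cell counts carry relatively more weight.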