Title: Transforming Relationships
1 Transforming Relationships
- AP Statistics 
- Practice of Statistics 
- Section 4.1
2 What You'll Learn
- Recognize when the relationship between two 
 variables is either an exponential relationship
 or a power relationship
- Perform the appropriate transformation to 
 linearize the data, find the LSRL on the
 transformed points, then untransform to find a
 model for the original data
3 Not Everything Is Linear!
- We've looked at several sets of data in which the 
 relationships are linear in nature
- What about those relationships that exhibit a 
 different, nonlinear pattern?
- Consider for a moment gypsy moths. 
- An outbreak of gypsy moths in Massachusetts from 
 1978 to 1981 resulted in many acres of defoliated
 land. The acreages are listed in the following
 table.
4 Gypsy Moths
- The data and graph depict the number of acres 
 defoliated by gypsy moths in Massachusetts
 between 1978 and 1981.
 Years 1978 1979 1980 1981
Acres of Defoliated land 63042 226260 907075 2826095 
5
- So, this doesn't look too bad! Let's try a 
 linear regression on the data, remembering to
 check both the correlation coefficient and the
 residual plot.
Simple linear regression results:
Dependent Variable: Acres
Independent Variable: Year
Acres = -1.7746007E9 + 896997.4 (Year)
Sample size: 4
R (correlation coefficient) = 0.9136
R-sq = 0.8347045
Estimate of error standard deviation: 631139.44
Well, a visual of the line doesn't look too bad, 
and that's a great correlation coefficient. 
(Remember, though, that r is sometimes 
deceptive, so be sure to check the residuals!) 
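For anyone who wants to reproduce the output above without a stats package, here is a minimal sketch in plain Python. The helper name `least_squares` is our own invention, not something from the software used in the slides.

```python
# Least-squares line for the raw gypsy moth data (acres vs. year).
from math import sqrt

years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy / sqrt(sxx * syy)

slope, intercept, r = least_squares(years, acres)
print(f"Acres = {intercept:.7g} + {slope:.7g} (Year)")
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The slope, intercept, and r should agree with the regression output up to rounding.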
6 The Residuals
- A check of the residuals indicates that a linear 
 model is not appropriate! (Notice the curved
 pattern in the plot, which can be seen even with
 only 4 data points!)
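The residual plot itself is not reproduced here, but the pattern is easy to see in numbers. This sketch plugs the rounded coefficients from the regression output into the line, so the residuals are approximate.

```python
# Residuals (observed - predicted) for the linear model on the raw data.
years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]

# Coefficients as printed in the regression output (rounded there).
intercept, slope = -1.7746007e9, 896997.4

residuals = [y - (intercept + slope * x) for x, y in zip(years, acres)]
for year, e in zip(years, residuals):
    print(year, f"{e:+,.0f}")

# A curved pattern shows up as residuals that are positive at both ends
# and negative in the middle (or the reverse).
signs = ["+" if e > 0 else "-" for e in residuals]
print("sign pattern:", " ".join(signs))
```

Positive, negative, negative, positive is the U shape the slide is pointing at.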
7 So, what type of relationship is this?
- Remember from linear regression that when the 
 relationship is linear, the response variable
 increases (or decreases) by a constant amount
 for each one-unit increase in the explanatory
 variable.
 Years Since 1977 1 2 3 4
Acres of defoliated land 63042 226260 907075 2826095
Difference in Acres 163218 680815 1919020
- Notice that the difference in the number of 
 acres from one year to the next is not constant
- With this in mind and the problem with the 
 residual plot, let's consider another type of
 relationship.
8 Exponential Relationships
- In an exponential relationship, the response 
 variable increases by a fixed percentage of the
 previous total. In other words, we should be able
 to multiply the previous value by some constant
 to get the next one.
- So, let's check out this possibility by dividing 
 each year's acreage by the previous year's.
 Years Since 1977 1 2 3 4
Acres of defoliated land 63042 226260 907075 2826095
Ratio (Next/Prev) 3.5890 4.0090 3.1156
- Notice that although the ratio is not exactly the 
 same each time (we wouldn't expect it to be exact
 with real data), there does appear to be a
 pretty consistent ratio value.
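Both checks (constant difference vs. constant ratio) take only a couple of lines. A small sketch using the table values:

```python
# Compare year-to-year differences with year-to-year ratios.
acres = [63042, 226260, 907075, 2826095]

# Pair each value with the one that follows it.
differences = [nxt - prev for prev, nxt in zip(acres, acres[1:])]
ratios = [nxt / prev for prev, nxt in zip(acres, acres[1:])]

print("differences:", differences)
print("ratios:     ", [round(r, 4) for r in ratios])
```

The differences grow by an order of magnitude, while the ratios stay in the range of roughly 3 to 4, which is what points us toward an exponential model.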
9 So How Do We Create the Model?
- If the relationship is an exponential one, we can 
 use a mathematical transformation to linearize
 the data, find the LSRL of the transformed data,
 then untransform to find the model that will
 fit the original data.
- OK, so let's take all of that step by step
10 Finding the Model
- Step 1: Use a mathematical transformation to 
 linearize (create a new data set whose
 relationship is linear)
- If the original data is exponential, find the 
 logarithm (either common log or natural log) of
 each of the response values.
- When working with years, it is also helpful to 
 code the year data so our calculators can
 handle the values (most computer programs are
 capable of creating models using the full year).
 To do this, we will take each year and subtract
 1977 (this way all of our values are greater
 than 0).
 Years 1978 1979 1980 1981
Acres of Defoliated land 63042 226260 907075 2826095
Years Since 1977 1 2 3 4
Log10 (acres) 4.7996 5.3546 5.9576 6.4512 
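Step 1 as a sketch in Python, assuming the same coding of the years (subtract 1977). The natural-log column is included only to show that it is a constant multiple of the common-log column, so either one straightens the data.

```python
# Step 1: code the years and take logarithms of the response.
from math import log, log10

years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]

years_since_1977 = [year - 1977 for year in years]
log10_acres = [log10(a) for a in acres]
ln_acres = [log(a) for a in acres]  # natural log = log10 value times ln(10)

for t, common, natural in zip(years_since_1977, log10_acres, ln_acres):
    print(t, f"{common:.4f}", f"{natural:.4f}")
```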
11 Finding the Model
- Now, let's check a scatterplot of the transformed 
 data
Notice the change in the pattern from our 
original data to the transformed data. The 
logarithm transformation really straightened our 
data. (Using the natural logarithm would have 
had the same effect; our values would have just 
been different.) 
12 Finding the Model
- Step 2: Find the LSRL for the transformed data 
 (remember to check the r and the residuals!)
Simple linear regression results:
Dependent Variable: log10(Acres)
Independent Variable: Year-1977
log10(Acres) = 4.2513404 + 0.5557706 (Year-1977)
Sample size: 4
R (correlation coefficient) = 0.9993
R-sq = 0.9985874
Estimate of error standard deviation: 0.033050213
This model looks promising, but remember to check 
the residuals. 
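Step 2 can be sketched the same way: fit a least-squares line to (years since 1977, log10 acres). The helper `least_squares` is a hypothetical name of ours; any regression routine would do.

```python
# Step 2: LSRL on the transformed data, plus its residuals.
from math import log10, sqrt

t = [1, 2, 3, 4]  # years since 1977
log_acres = [log10(a) for a in [63042, 226260, 907075, 2826095]]

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy / sqrt(sxx * syy)

slope, intercept, r = least_squares(t, log_acres)
print(f"log10(Acres) = {intercept:.7f} + {slope:.7f} (Year-1977)")
print(f"r = {r:.4f}")

# Residuals on the log scale; these are what the residual plot displays.
residuals = [y - (intercept + slope * x) for x, y in zip(t, log_acres)]
print([round(e, 4) for e in residuals])
```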
13 Finding the Model
A check of the residuals confirms that an 
exponential model is appropriate. 
14 Untransforming to find the model for our original data
- Remember that our goal was to find a model that 
 we could use for prediction of the number of
 defoliated acres of land for a given year.
- The linear model we have would predict the common 
 logarithm of acres. In order for our model to be
 useful, we need to reverse the transformation to
 create the model that fits the original data.
- Although for many transformations it is easier 
 to evaluate the linear model first and then
 untransform the result, we can use the
 properties of logarithms with both exponential
 and power models (we'll look at those next) to
 find the model for our original data.
15 Properties of Logarithms
- Before we try to untransform, let's review the 
 properties of logarithms you learned in Algebra
 (yes, you really did learn these!)
- $\log_b(xy) = \log_b x + \log_b y$ (Addition rule) 
- $\log_b(x^m) = m\log_b x$ (Power rule) 
- $\log_b(b^n) = n$ (Same base) 
- $\log_b(x/y) = \log_b x - \log_b y$ (Subtraction rule) 
- Since any subtraction can be rewritten as an 
 addition, we will not use this last rule much!
16 Untransforming exponential expressions
- An exponential function takes the form 
- $y = ab^x$, where a and b are constants 
- (This is the form we want to end up with) 
- So, let's get started
$\log_{10}(\text{Acres}) = 4.2513404 + 0.5557706\,(\text{Year}-1977)$
 Linear regression of the transformed data
$10^{\log_{10}(\text{Acres})} = 10^{4.2513404 + 0.5557706\,(\text{Year}-1977)}$
 Raise both sides using power of 10 (same base)
$\text{Acres} = 10^{4.2513404}\left(10^{0.5557706\,(\text{Year}-1977)}\right)$
 Same base law and multiplication law for exponents
$\text{Acres} = 17837.7634\,(3.5956)^{(\text{Year}-1977)}$
 Simplify the constants
This is now in the form $y = ab^x$, where 
a = 17837.7634 and b = 3.5956. Notice that b is 
approximately the average of the ratios 
(next/prev) we calculated when we began looking 
for a model. 
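The "simplify the constants" step is just two powers of 10. A quick sketch, using the coefficients printed in the regression output and the three ratios from the earlier table:

```python
# Untransform: turn the log-scale intercept and slope into a and b.
log_intercept, log_slope = 4.2513404, 0.5557706  # from the regression output

a = 10 ** log_intercept  # multiplier in y = a * b**x
b = 10 ** log_slope      # yearly growth factor

print(f"Acres = {a:.4f} * ({b:.4f})^(Year-1977)")

# Compare b with the average of the next/prev ratios found earlier.
ratios = [3.5890, 4.0090, 3.1156]
mean_ratio = sum(ratios) / len(ratios)
print(f"b = {b:.4f}, mean ratio = {mean_ratio:.4f}")
```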
17 So, does it fit our original data?
- Since our original goal was to find a model that 
 would allow us to predict the number of acres of
 defoliated land if we knew the year, we need to
 check to see if our model actually fits the data.
The model looks pretty good, but as with any 
model we need to use caution when predicting 
outside our original data range. 
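Without the plot, one way to judge the fit is to compare predictions with the observed acreages. This is a rough sketch; `predict_acres` is a name made up for the example, and the 1982 value is included only to show how quickly an exponential model runs away outside the data range.

```python
# Check the untransformed model against the original data.
observed = {1978: 63042, 1979: 226260, 1980: 907075, 1981: 2826095}
a, b = 17837.7634, 3.5956  # constants from the untransformed model

def predict_acres(year):
    """Predicted acres of defoliated land for a given calendar year."""
    return a * b ** (year - 1977)

relative_errors = []
for year, acres in observed.items():
    predicted = predict_acres(year)
    relative_errors.append((predicted - acres) / acres)
    print(year, acres, round(predicted), f"{relative_errors[-1]:+.1%}")

# Extrapolation: treat this number with caution, it is outside the data.
print(1982, round(predict_acres(1982)))
```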
18 Power Models
- Another important model found by transforming 
 the data is the power model.
- Power models have the form 
- $y = ax^b$, where a and b are constants 
- We can find an appropriate power model by taking 
 the logarithms of both the response and
 explanatory variables, finding the linear
 regression for the transformed data, then using
 the laws of logarithms and exponents to
 untransform
- Let's look at an example
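Before the example, it may help to see why the exponential model needs a log of the response only, while the power model needs logs of both variables. Applying the addition and power rules to each form (with a and b as in the slides, and any base of logarithm):

```latex
\begin{align*}
\text{Exponential: } y = a\,b^{x} &\;\Longrightarrow\; \log y = \log a + (\log b)\,x \\
\text{Power: } y = a\,x^{b} &\;\Longrightarrow\; \log y = \log a + b\,\log x
\end{align*}
```

In the first line, log y is linear in x. In the second, log y is linear in log x, and the slope of that line is the power b itself.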
19 Fishing Tournament
- In a fishing tournament that you are in charge 
 of, you need to find a way to record the weight
 of each fish caught without destroying or killing
 the fish.
- Since it is easier to measure the length of a 
 fish than its weight, we must find a way
 to convert the length to weight.
- The local marine research lab has been gracious 
 enough to provide you with the data for the
 average length and weight at different ages for
 Atlantic Ocean rockfish, which model most fish
 species growing under normal feeding conditions.
20 The Data
Age (yr) Length (cm) Weight (g)
1 5.2 2
2 8.5 8
3 11.5 21
4 14.3 38
5 16.8 69
6 19.2 117
7 21.3 148
8 23.3 190
9 25.0 264
10 26.7 293
11 28.2 318
12 29.6 371
13 30.8 455
14 32.0 504
15 33.0 518
16 34.0 537
17 34.9 651
18 36.4 719
19 37.1 726
20 37.7 810
- Since length is one-dimensional and weight 
 depends on volume, which is three-dimensional, we
 should be able to find a reasonable model using a
 power model (the residuals for a regression on
 the original data confirm that the variables are
 NOT linearly related, but we already knew that!)
- As before, we need to first transform our data, 
 but this time we have to perform transformations
 on both length and weight
21 Transforming the Data
Age (yr) Length (cm) Log10 (length) Weight (g) Log10 (weight)
1 5.2 .7160 2 .3010
2 8.5 .9294 8 .9031
3 11.5 1.0607 21 1.3222
4 14.3 1.1553 38 1.5798
5 16.8 1.2253 69 1.8388
6 19.2 1.2833 117 2.0682
7 21.3 1.3284 148 2.1703
8 23.3 1.3674 190 2.2788
9 25.0 1.3979 264 2.4216
10 26.7 1.4265 293 2.4669
11 28.2 1.4502 318 2.5024
12 29.6 1.4713 371 2.5694
13 30.8 1.4886 455 2.6580
14 32.0 1.5052 504 2.7024
15 33.0 1.5185 518 2.7143
16 34.0 1.5315 537 2.7300
17 34.9 1.5428 651 2.8136
18 36.4 1.5611 719 2.8567
19 37.1 1.5694 726 2.8609
20 37.7 1.5763 810 2.9085
This scatterplot indicates that a linear 
regression on the logarithms of both variables is 
certainly one to consider. 
22 Linear Regression on the transformed data
- Simple linear regression results
Dependent Variable: log10(Weight(g))
Independent Variable: log10(Length(cm))
log10(Weight(g)) = -1.8993973 + 3.049418 log10(Length(cm))
Sample size: 20
R (correlation coefficient) = 0.9993
R-sq = 0.9985228
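The same regression can be sketched in plain Python by taking the common log of both columns from the data table and fitting a least-squares line. The helper `least_squares` is again our own hypothetical name.

```python
# Log-log regression for the rockfish data: log10(weight) on log10(length).
from math import log10, sqrt

lengths_cm = [5.2, 8.5, 11.5, 14.3, 16.8, 19.2, 21.3, 23.3, 25.0, 26.7,
              28.2, 29.6, 30.8, 32.0, 33.0, 34.0, 34.9, 36.4, 37.1, 37.7]
weights_g = [2, 8, 21, 38, 69, 117, 148, 190, 264, 293,
             318, 371, 455, 504, 518, 537, 651, 719, 726, 810]

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy / sqrt(sxx * syy)

log_lengths = [log10(x) for x in lengths_cm]
log_weights = [log10(y) for y in weights_g]

slope, intercept, r = least_squares(log_lengths, log_weights)
print(f"log10(Weight) = {intercept:.7f} + {slope:.6f} log10(Length)")
print(f"r = {r:.4f}")
```

A slope close to 3 fits the intuition that weight scales with the cube of length.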
A check of the correlation coefficient is 
certainly promising (r = 0.9993), the scatterplot 
of the transformed data indicates the line fits 
very well, and most importantly, look at those 
residuals! Yes, statisticians get very excited 
when they see residuals that look that good! 
23 Untransforming a power model
- $\log_{10}(\text{Weight}) = -1.8993973 + 3.049418\,\log_{10}(\text{Length})$
 Linear equation of the transformed data
- $10^{\log_{10}(\text{Weight})} = 10^{-1.8993973 + 3.049418\,\log_{10}(\text{Length})}$
 Raise both sides using a base of 10
- $\text{Weight} = 10^{-1.8993973}\left(10^{3.049418\,\log_{10}(\text{Length})}\right)$
 Same base and multiplication law for exponents
- $\text{Weight} = 10^{-1.8993973}\left(10^{\log_{10}\left(\text{Length}^{3.049418}\right)}\right)$
 Power rule for logarithms
- $\text{Weight} = 10^{-1.8993973}\,(\text{Length})^{3.049418}$
 Same base
- $\text{Weight} = 0.01261\,(\text{Length})^{3.049418}$
 Simplify constants
 (Weight in g, Length in cm)
Last check: plot the new model on the original 
data. Looks like we've got a model that will be 
very useful for estimating the weight of a fish 
if we know its length! 
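In the tournament you would use the model as a length-to-weight converter. A minimal sketch, assuming the regression coefficients above; `predict_weight_g` is a made-up name for illustration.

```python
# Untransformed power model: weight (g) from length (cm).
log_intercept, power = -1.8993973, 3.049418  # from the log-log regression

a = 10 ** log_intercept  # multiplier in Weight = a * Length**power

def predict_weight_g(length_cm):
    """Estimated weight in grams for a fish of the given length in cm."""
    return a * length_cm ** power

print(f"Weight = {a:.5f} * Length^{power}")

# Spot-check against a few rows of the data table: (length, observed weight).
for length_cm, observed_g in [(5.2, 2), (25.0, 264), (37.7, 810)]:
    print(length_cm, observed_g, round(predict_weight_g(length_cm), 1))
```

The estimates are averages for a given length, so individual fish will vary around them.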
24 Are There Other Possibilities?
- There are many other ways to transform 
 data in order to find a model.
- If neither an exponential nor a power model is 
 appropriate, you may try:
- Squaring the response or explanatory variable 
- Taking the square root of either variable 
- Taking the reciprocal of either variable 
- The possibilities are endless, but for now we 
 will concentrate mostly on exponential and
 power models.
25 Transforming on the TI
- There are a couple of different ways to find both 
 an exponential and a power regression model on
 your TI calculator
- Using lists to transform 
- Using the built-in regression models
26 Using lists to transform
- We'll use the Gypsy Moth data first.
Enter the data in lists 1 & 2: L1 = years since 
1977, L2 = acres of defoliated land. 
Take the common log of the values in list 2 and 
put the new values in list 3: L3 = log(L2). 
Now do a linear regression on lists 1 & 3. 
You can check residuals just like we did before 
to verify this regression. 
Now untransform as we did before to get the 
exponential model. 
Note: for a power model, create another list for 
the logarithm of the explanatory variable and do 
the linear regression on the two lists of 
logarithms. 
27 Using the Regression Models
- The TI family of calculators has both an 
 exponential and a power model built into the STAT
 CALC menu.
- Create a list for the explanatory variable and 
 one for the response variable
- From the home screen 
- STAT 
- CALC 
- 0:ExpReg 
- (A:PwrReg) 
- L1, L2 
- The model does not need untransforming 
- The residuals created are the residuals from the 
 linear regression on the transformed data
 (yes, your calculator actually transforms the
 data, does a linear regression, then untransforms)
28 How to decide which model
- Creating mathematical models for real data 
 involves a lot of trial and error.
- One strategy 
- Try a linear model first (check residuals) 
- Then try an exponential model (check residuals) 
- Then try a power model (check residuals) 
- If all residuals show a pattern, you can continue 
 to try different transformations or choose the
 one with the best correlation
- Remember: no model is perfect, but some models 
 are useful. We wish to find a useful model.
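As a rough illustration of the last-resort step ("choose the one with the best correlation"), this sketch computes r for the three candidate models on the gypsy moth data. It is not a substitute for looking at the residual plots, and the power-model value depends on how the years are coded (years since 1977 here). The helper `correlation` is our own.

```python
# Compare r for linear, exponential, and power fits on the gypsy moth data.
from math import log10, sqrt

t = [1, 2, 3, 4]  # years since 1977
acres = [63042, 226260, 907075, 2826095]

def correlation(xs, ys):
    """Pearson correlation coefficient r between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

log_t = [log10(x) for x in t]
log_acres = [log10(y) for y in acres]

candidates = {
    "linear": correlation(t, acres),              # y vs. x
    "exponential": correlation(t, log_acres),     # log y vs. x
    "power": correlation(log_t, log_acres),       # log y vs. log x
}
for name, r in candidates.items():
    print(f"{name:12s} r = {r:.4f}")

best = max(candidates, key=lambda name: abs(candidates[name]))
print("strongest linear pattern:", best)
```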