Title: Statistical tests and data fitting
Tests
- T-test: compare groups of data by comparing their mean values.
- Correlation and regression: compare data sets and find best fits.
- ANOVA (analysis of variance): evaluate differences between data sets.
T-test in Excel
- Can use Tools > Data Analysis.
- Tails
  - Two-tailed: is one mean either greater or lesser than the other?
  - One-tailed: is one mean greater (one direction only)?
- Type
  - Do the data sets have the same variance?
  - Type 1 (paired): same samples, measured at different times
  - Type 2: two samples, same variance
  - Type 3: two samples, different variances
Results
P value: the probability associated with a Student's t-test. It is the probability of seeing a difference in means at least as large as the observed one if both data sets came from the same population. In this case the data are identical, so the P value is 1.0 (100%).
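A minimal sketch of the t statistic behind the three Excel test types, using only the Python standard library. The function name `t_statistic` is hypothetical, and Type 3 is assumed to be Welch's unequal-variance test. It returns the statistic and its degrees of freedom but not the P value, which comes from the t distribution (the step Excel's TTEST performs).

```python
import math
from statistics import mean, variance


def t_statistic(a, b, test_type):
    """Return (t, degrees_of_freedom) for t-test types 1, 2 and 3.

    Hypothetical helper mirroring the Excel type numbers:
      1 = paired (the same samples measured twice)
      2 = two samples, equal variance (pooled)
      3 = two samples, unequal variance (assumed to be Welch's test)
    """
    if test_type == 1:
        if len(a) != len(b):
            raise ValueError("a paired test needs samples of equal length")
        diffs = [x - y for x, y in zip(a, b)]
        n = len(diffs)
        se = math.sqrt(variance(diffs) / n)  # standard error of the mean difference
        return mean(diffs) / se, n - 1

    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 in the denominator)
    diff = mean(a) - mean(b)

    if test_type == 2:
        # Pool the two variances, assuming both populations share one variance.
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1 / na + 1 / nb))
        return diff / se, na + nb - 2

    if test_type == 3:
        # Keep the variances separate; Welch-Satterthwaite degrees of freedom.
        qa, qb = va / na, vb / nb
        se = math.sqrt(qa + qb)
        df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
        return diff / se, df

    raise ValueError("test_type must be 1, 2 or 3")


same = [2.1, 2.5, 2.3, 2.8]
print(t_statistic(same, same, 2))  # identical data: t = 0.0
```

Identical data give t = 0, which corresponds to the largest possible P value of 1.0. A paired test on identical data is undefined here, because the differences have zero spread.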
T-tests
Which type of t-test would you use?
- You sample the same beach before and after a hurricane.
- You compare sand grains in a suspect's car with sand grains from a beach.
- You compare sand grains taken from the same beach.
What are they?
- Correlation: tells how strongly two variables are related.
  - X and Y are measured independently.
- Line fitting: derives a best-fitting model relating two variables.
  - Least squares (linear regression: a straight line)
  - Curved lines (polynomial or spline fit)
  - Typically used for a known X and a measured Y (e.g. a function of time).
Correlation
Correlation coefficient
Varies between -1 and 1: -1 is perfectly anti-correlated, 0 is no correlation, and 1 is perfectly correlated.
Correlation
Use the CORREL function in Excel.
[Figure: example scatter plots with correlation = 0.98, 1, -1 and 0.01]
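CORREL returns the Pearson correlation coefficient. A minimal Python sketch of the same calculation (the function name `pearson_r` is hypothetical), checked on the two extreme cases of perfect correlation and perfect anti-correlation:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    # Co-variation of x and y about their means, and the spread of each.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0
```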
Confidence interval for correlation
- It is possible to define a variable w, computed from the sample correlation, that has an approximately normal distribution with a known mean and variance.
Use this mean and variance to set up the normal distribution
- Now we can check confidence intervals.
- It is often useful to check whether the null hypothesis (r_xy = 0) lies inside the confidence interval.
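The slides do not show the definition of w. A standard variable with exactly these properties is Fisher's z-transformation, w = ½ ln((1 + r)/(1 - r)), which is approximately normal with mean ½ ln((1 + ρ)/(1 - ρ)) and variance 1/(n - 3). The sketch below assumes that is the variable meant; the function name `correlation_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a correlation, assuming Fisher's z."""
    if n <= 3:
        raise ValueError("need more than 3 data pairs")
    w = math.atanh(r)                  # same as 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)          # standard deviation of w
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    lo, hi = w - z * se, w + z * se
    # Transform the interval back from the w scale to the r scale.
    return math.tanh(lo), math.tanh(hi)


lo, hi = correlation_ci(0.5, 30)
print(lo, hi)                          # roughly 0.17 and 0.73
print("r_xy = 0 inside interval:", lo <= 0 <= hi)
```

If 0 lies outside the interval, the null hypothesis r_xy = 0 is rejected at that confidence level.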
Least-squares line fitting (linear regression)
- For perfect linear correlation, it is straightforward to define an equation y = Ax + B that passes through every data point.
- In general, we need to determine the coefficient A and the constant B so that they define a straight line that fits the data as well as possible.
- We are estimating the best values of A and B.
- We assume that the x values are known exactly and that the y values are uncertain.
Least-squares fit
- It is common to use a least-squares fit.
- The error between the best-fitting line and each data point is (y - ŷ), where y is the data value and ŷ is the best-fit value (a vertical distance).
- We seek to minimize the sum of all the squared errors.
- Why squared? Squared errors are always positive and give simple, closed-form equations for A and B.
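Minimizing the sum of squared vertical errors for y = Ax + B gives the familiar closed form A = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and B = ȳ - A·x̄. A minimal sketch with hypothetical function names:

```python
def least_squares_line(x, y):
    """Return (A, B) for y = A*x + B minimizing the sum of squared vertical errors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx        # slope A
    b = my - a * mx      # constant B: the line passes through (mean x, mean y)
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of (y - y_hat)**2, the quantity least squares minimizes."""
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))


print(least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```

Nudging A or B away from the returned values always increases `sum_squared_errors`, which is a quick way to check the fit.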
Some details
[Figure: data with the regression line, annotated with the error between each data point and the best fit, and the y-intercept (in this case, close to zero)]
More details
- We can think of the best-fit line as a sort of mean value.
- The scatter about the line is measured by the estimated standard error.
  - This is analogous to the standard deviation.
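Like a standard deviation, the estimated standard error is a root-mean-square of deviations, here the vertical errors about the line: s = sqrt(Σ(y - ŷ)² / (n - 2)). The divisor is n - 2 because two parameters (A and B) were estimated from the data. A minimal sketch (hypothetical function name):

```python
import math


def standard_error_of_estimate(x, y):
    """Estimated standard error of a least-squares straight-line fit."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points (2 are used up by A and B)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx      # slope A
    b = my - a * mx    # constant B
    # Sum of squared vertical errors (y - y_hat) about the best-fit line.
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    # Divide by n - 2 rather than n - 1: two parameters were estimated.
    return math.sqrt(sse / (n - 2))


print(standard_error_of_estimate([0, 1, 2, 3], [1, 3, 5, 7]))  # 0.0 (points lie on a line)
print(standard_error_of_estimate([0, 1, 2, 3], [0, 2, 1, 3]))
```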
Confidence intervals
- The 95% confidence interval for y (i.e., we are 95% sure that y lies between the values a and b) is defined by
  - (a, b) = (ŷ - k, ŷ + k), where k is
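The slide's own expression for k did not survive. As an assumption, the sketch below uses the standard textbook half-width for a straight-line fit, k = t · s · sqrt(1 + 1/n + (x0 - x̄)² / Σ(x - x̄)²), for a single new y measured at x0. Here s is the estimated standard error and t is the two-tailed critical value with n - 2 degrees of freedom, for example from Excel's T.INV.2T(0.05, n - 2). Dropping the leading 1 gives the narrower interval for the mean of y at x0. The critical value is passed in so no statistics library is needed, and the function name is hypothetical.

```python
import math


def interval_for_y(x, y, x0, t_crit, new_observation=True):
    """Interval (y_hat - k, y_hat + k) for y at x0 from a least-squares line.

    t_crit: two-tailed critical t value for n - 2 degrees of freedom,
    supplied by the caller (e.g. from a t table).
    new_observation=True  -> interval for a single new y measured at x0
    new_observation=False -> interval for the mean of y at x0
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # estimated standard error
    y_hat = a * x0 + b
    extra = 1.0 if new_observation else 0.0
    # Assumed textbook form of k; not recovered from the original slide.
    k = t_crit * s * math.sqrt(extra + 1 / n + (x0 - mx) ** 2 / sxx)
    return y_hat - k, y_hat + k


# Four points -> 2 degrees of freedom; the 95% two-tailed t is about 4.303.
print(interval_for_y([0, 1, 2, 3], [0, 2, 1, 3], x0=1.5, t_crit=4.303))
```

The interval widens as x0 moves away from the mean of x, because the (x0 - x̄)² term grows.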
Some problems
- Outliers tend to pull the line away from the rest of the data.
  - This results in a poor fit.
- Each point's influence grows with the square of its vertical distance from the trend line.
  - One large offset counts more than several small ones.
[Figure: example fit affected by outliers]
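A small illustration of the outlier problem with made-up data: five points on y = x plus one hypothetical outlier. The single large vertical offset more than doubles the least-squares slope. `fit_line` is a hypothetical helper using the closed-form least-squares formulas.

```python
def fit_line(x, y):
    """Least-squares slope A and constant B for y = A*x + B."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


x = [0, 1, 2, 3, 4, 5]
clean = [0, 1, 2, 3, 4, 5]          # lies exactly on y = x
with_outlier = [0, 1, 2, 3, 4, 15]  # hypothetical outlier at x = 5

print(fit_line(x, clean))           # (1.0, 0.0)
print(fit_line(x, with_outlier))    # slope rises to about 2.43
```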
Why square?
- Could use the 3rd power.
- Or just the absolute value.
- These also provide a straight line.
- The mathematics is more complicated and less elegant.
- May be useful for some data.
- The absolute value handles outliers better.
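A minimal sketch of the absolute-value alternative, least absolute deviations (LAD). It relies on the known result that, for a straight line, some best LAD fit passes through at least two of the data points. Trying every pair of points with distinct x values therefore finds an optimum. This brute-force search is O(n³) and only suitable for small data sets. The function name `lad_line` is hypothetical.

```python
from itertools import combinations


def lad_line(x, y):
    """Least-absolute-deviations line y = A*x + B, by brute force over point pairs."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical pair does not define y = A*x + B
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        # Sum of absolute vertical errors for this candidate line.
        cost = sum(abs(yk - (a * xk + b)) for xk, yk in zip(x, y))
        if best is None or cost < best[0]:
            best = (cost, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Five points on y = x plus one outlier at x = 5.
print(lad_line([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 15]))  # (1.0, 0.0)
```

On these data LAD ignores the outlier and returns y = x, whereas a least-squares fit to the same points has a slope of about 2.43.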
Least-squares fit and Excel
- There are (at least) three ways to make a least-squares fit to data in Excel.
- Use LINEST(y, x, const, stats) and then plot.
  - Allows calculation of statistics.
  - Powerful but complicated.
- Use Regression in the Analysis ToolPak add-in.
- Make a data plot (without a line), then click on a data point and add a trendline. This is much easier, but it is not clear how the fit is calculated.
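With stats set to TRUE and a single x variable, the first two rows of the array LINEST returns hold the slope and intercept, and below them their standard errors. A sketch of those four numbers in Python. The function name is hypothetical, and the layout only mirrors LINEST's; this is not Excel's implementation.

```python
import math


def linest_rows_1_2(x, y):
    """[[slope, intercept], [se_slope, se_intercept]] for a straight-line fit.

    Hypothetical helper; the layout is meant to match the first two rows of
    LINEST(y, x, TRUE, TRUE) for a single x variable.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_resid = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    se_y = math.sqrt(ss_resid / (n - 2))     # standard error of the y estimate
    se_slope = se_y / math.sqrt(sxx)
    se_intercept = se_y * math.sqrt(1 / n + mx ** 2 / sxx)
    return [[slope, intercept], [se_slope, se_intercept]]


print(linest_rows_1_2([0, 1, 2, 3], [0, 2, 1, 3]))
```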
Excel output for regression
[Figure: annotated Excel regression output]
- 70% of the variance is explained.
- If you use this line, you could be off by this much. It is the square root of the residual mean square (MS).
- Probability of how significant the fit is.
- The y-intercept is -0.58833 and the X coefficient is 0.99256, so the equation is Y = 0.99256X - 0.58833.
- Upper and lower bounds on each coefficient.
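The "variance explained" figure is R², the share of the total sum of squares of y (about its mean) that the line accounts for. The "could be off by this much" value is the square root of the residual mean square. A sketch of how these regression-table entries relate. The function name is hypothetical, and the example data are illustrative, not the data behind the output above.

```python
import math


def regression_summary(x, y):
    """R squared, standard error and F for a least-squares straight line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    ss_total = sum((yi - my) ** 2 for yi in y)                        # spread of y about its mean
    ss_resid = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))  # left unexplained
    ss_reg = ss_total - ss_resid                                      # explained by the line
    ms_reg = ss_reg / 1              # 1 degree of freedom for one x variable
    ms_resid = ss_resid / (n - 2)    # n - 2 degrees of freedom
    return {
        "r_squared": ss_reg / ss_total,
        "standard_error": math.sqrt(ms_resid),
        "F": ms_reg / ms_resid,
    }


print(regression_summary([0, 1, 2, 3], [0, 2, 1, 3]))
```

For a straight-line fit, R² equals the square of the correlation coefficient between x and y.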
Fitting a curved line
- Suppose the data are exponential, or otherwise expected to be curved.
- Use a polynomial fit (click the box under Add Trendline).
- Spline fit
- Nonlinear least squares
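A polynomial fit is still a least-squares fit. It minimizes the same sum of squared vertical errors, but over the coefficients of 1, x, x², and so on. A minimal sketch that solves the normal equations by Gaussian elimination. The function name is hypothetical. This is fine for low degrees, but the normal equations become ill-conditioned as the degree grows, so a library routine is preferable for serious work.

```python
def poly_least_squares(x, y, degree):
    """Coefficients [c0, c1, ...] of y = c0 + c1*x + c2*x**2 + ... (least squares)."""
    m = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][j] = x_i ** j.
    ata = [[sum(xi ** (j + k) for xi in x) for k in range(m)] for j in range(m)]
    aty = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(m)]
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]  # augmented matrix
    for col in range(m):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("need more distinct x values than the polynomial degree")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from the rows below the pivot.
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution, from the highest-degree coefficient down.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = aug[r][m] - sum(aug[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = s / aug[r][r]
    return coeffs


# Data generated from y = 1 + 2x + 3x^2; the fit should recover [1, 2, 3].
print(poly_least_squares([0, 1, 2, 3, 4], [1, 6, 17, 34, 57], 2))
```

With degree 1 the function reduces to the straight-line least-squares fit.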
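ANOVA and F-test
- Analysis of variance (ANOVA): do the means of two or more data sets differ significantly? It answers this by comparing the variance between the data sets with the variance within them.
- F-test: do the variances of two data sets differ significantly?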
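A minimal sketch of both statistics (hypothetical function names). It computes the F values and their degrees of freedom only. Turning them into probabilities needs the F distribution, for example Excel's F.DIST.RT.

```python
from statistics import mean, variance


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA on a list of groups."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand = mean(values)
    # Variation of the group means about the grand mean (between groups).
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the values about their own group mean (within groups).
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def variance_ratio_f(a, b):
    """F-test statistic for two data sets: the ratio of their sample variances."""
    return variance(a) / variance(b), len(a) - 1, len(b) - 1


print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
print(variance_ratio_f([1, 2, 3], [2, 4, 6]))    # (0.25, 2, 2)
```

With exactly two groups, the one-way ANOVA F equals the square of the Type 2 (equal-variance) t statistic. For the example groups, t = -3 / sqrt(2/3), so t² = 13.5.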