Title: Statistics: Data Analysis and Presentation
1. Statistics: Data Analysis and Presentation
2. Overview
- Tables and Graphs
- Populations and Samples
- Mean, Median, and Standard Deviation
- Standard Error and 95% Confidence Interval (CI)
- Error Bars
- Comparing Means of Two Data Sets
- Linear Regression (LR)
3. Warning
- Statistics is a huge field; I've simplified considerably here. For example:
- Mean, Median, and Standard Deviation
  - There are alternative formulas
- Standard Error and the 95% Confidence Interval
  - There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means, rather than single mean)
- Error Bars
  - Don't go beyond the interpretations I give here!
- Comparing Means of Two Data Sets
  - We only cover the t test for two means when the variances are unknown but equal; there are other tests
- Linear Regression
  - We only look at simple LR and only calculate the intercept, slope, and R². There is much more to LR!
4. Tables
Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters
- Consistent format, title, units, big fonts
- Differentiate headings, number columns
5. Figures
- Consistent format, title, units
- Good axis titles, big fonts
Figure 1: Turbidity of Pond Water, Treated and Untreated
6. Populations and Samples
- Population
  - All of the possible outcomes of an experiment or observation
    - US population
    - Particular type of steel beam
- Sample
  - A finite number of outcomes measured or observations made
    - 1000 US citizens
    - 5 beams
- We use samples to estimate population properties
  - Mean, variability (e.g., standard deviation), distribution
  - Height of 1000 US citizens used to estimate mean height of US population
7. Mean and Median
- Turbidity of Treated Water (NTU): 1, 3, 3, 6, 8, 10
- Mean = sum of values divided by number of samples = (1 + 3 + 3 + 6 + 8 + 10)/6 = 5.2 NTU
- Median = the middle number
  - Rank:   1, 2, 3, 4, 5, 6
  - Number: 1, 3, 3, 6, 8, 10
  - For an even number of sample points, average the middle two: (3 + 6)/2 = 4.5 NTU
- Excel: Mean = AVERAGE, Median = MEDIAN
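The same two numbers can be checked outside Excel. The sketch below uses Python's standard `statistics` module on the six turbidity values from this slide; the variable names are only illustrative.

```python
from statistics import mean, median

# Turbidity of treated water (NTU): the six values from the slide
turbidity = [1, 3, 3, 6, 8, 10]

avg = mean(turbidity)    # sum of values divided by number of samples
mid = median(turbidity)  # even count, so the two middle values (3 and 6) are averaged

print(f"mean   = {avg:.1f} NTU")   # mean   = 5.2 NTU
print(f"median = {mid:.1f} NTU")   # median = 4.5 NTU
```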
8. Variance
- Measure of variability
  - Sum of the squares of the deviations about the mean, divided by the degrees of freedom (n - 1)
  - $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$
  - n = number of data points
- Excel: variance = VAR
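As a rough check of the definition, the sketch below computes the variance of the turbidity values from the Mean and Median slide by hand (sum of squared deviations over n - 1) and compares it with `statistics.variance`, which also divides by n - 1. The variable names are illustrative.

```python
from statistics import variance

turbidity = [1, 3, 3, 6, 8, 10]   # NTU, values from the Mean and Median slide

n = len(turbidity)
x_bar = sum(turbidity) / n

# Sum of the squares of the deviations about the mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in turbidity)

# Divide by the degrees of freedom, n - 1 (sample variance)
var_manual = sum_sq_dev / (n - 1)

# The standard library uses the same n - 1 definition
var_lib = variance(turbidity)

print(f"variance (by hand) = {var_manual:.2f} NTU^2")
print(f"variance (library) = {var_lib:.2f} NTU^2")
```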
9. Standard Deviation, s
- Square root of the variance
- For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean
- Area under curve is probability of getting value within specified range
- Excel: standard deviation = STDEV
[Figure: normal distribution curve; x-axis: Standard Deviations from Mean]
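The 1.96 figure can be verified numerically. This sketch, assuming Python 3.8 or later for `statistics.NormalDist`, computes the standard deviation of the turbidity values used earlier and the area under a standard normal curve between -1.96 and +1.96.

```python
from math import sqrt
from statistics import NormalDist, stdev, variance

turbidity = [1, 3, 3, 6, 8, 10]   # NTU, values from the Mean and Median slide

# Standard deviation is the square root of the (n - 1) variance
s = stdev(turbidity)
print(f"s = {s:.2f} NTU")   # s = 3.43 NTU

# Area under the standard normal curve within 1.96 standard deviations of the mean
std_normal = NormalDist(mu=0.0, sigma=1.0)
coverage = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
print(f"fraction within 1.96 standard deviations = {coverage:.3f}")   # 0.950
```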
10. Standard Error of Mean
- Standard deviation of the mean
  - Of a sample of size n
  - Taken from a population with standard deviation s
  - $SE = s / \sqrt{n}$
- Estimate of mean depends on sample selected
- As n increases, variance of mean estimate goes down, i.e., estimate of population mean improves
- As n increases, mean estimate distribution approaches normal, regardless of population distribution
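A small simulation can make the effect of n concrete. The sketch below is an illustration, not part of the original slides: it draws repeated samples from a deliberately non-normal population (an exponential distribution with standard deviation 1, chosen here as an assumption) and compares the observed spread of the sample means with s/sqrt(n). The function name `spread_of_sample_means` is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable


def spread_of_sample_means(n, trials=2000):
    """Standard deviation of `trials` sample means, each from a sample of size n."""
    means = []
    for _ in range(trials):
        # Exponential population with rate 1: mean 1, standard deviation 1
        sample = [random.expovariate(1.0) for _ in range(n)]
        means.append(mean(sample))
    return stdev(means)


for n in (4, 25, 100):
    observed = spread_of_sample_means(n)
    predicted = 1.0 / sqrt(n)   # population standard deviation / sqrt(n)
    print(f"n = {n:3d}: observed {observed:.3f}, predicted {predicted:.3f}")
```

The observed spread should shrink roughly as 1/sqrt(n): quadrupling the sample size halves the standard error.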
11. 95% Confidence Interval (CI) for Mean
- Interval within which we are 95% confident the true mean lies
  - $\bar{x} \pm t_{95,n-1} \, s / \sqrt{n}$
- $t_{95,n-1}$ is the t-statistic for the 95% CI for sample size n
  - If n ≥ 30, let $t_{95,n-1}$ = 1.96 (Normal Distribution)
  - Otherwise, use Excel formula TINV(0.05, n-1)
- n = number of data points
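The recipe above can be sketched in Python without Excel. The standard library has no inverse t function, so this sketch hard-codes the two-tailed 95% critical values for 1 to 29 degrees of freedom from a standard t table (rounded to three decimals) and falls back to 1.96 for n ≥ 30, as the slide suggests. The names `T95` and `ci95` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-tailed 95% critical t values for 1..29 degrees of freedom,
# copied from a standard t table (rounded to three decimals).
T95 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
       2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
       2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045]


def ci95(data):
    """Return (lower, upper) bounds of the 95% CI for the mean; needs n >= 2."""
    n = len(data)
    # Degrees of freedom = n - 1, stored at list index n - 2
    t = 1.96 if n >= 30 else T95[n - 2]
    half_width = t * stdev(data) / sqrt(n)   # t times the standard error
    centre = mean(data)
    return centre - half_width, centre + half_width


turbidity = [1, 3, 3, 6, 8, 10]   # NTU, values from the Mean and Median slide
lower, upper = ci95(turbidity)
print(f"95% CI for mean turbidity: {lower:.1f} to {upper:.1f} NTU")
```

With only six points the interval is wide (roughly 1.6 to 8.8 NTU), which reflects how little a small sample pins down the mean.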
12. Error Bars
- Show data variability on plot of mean values
- Types of error bars include
  - Standard Deviation, Standard Error, 95% CI
  - Maximum and minimum value
13. Using Error Bars to Compare Data
- Standard Deviation
  - Demonstrates data variability, but no comparison possible
- Standard Error
  - If bars overlap, any difference in means is not statistically significant
  - If bars do not overlap, indicates nothing!
- 95% Confidence Interval
  - If bars overlap, indicates nothing!
  - If bars do not overlap, difference is statistically significant
- We'll use 95% CI
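The 95% CI rule above can be written as a small overlap check. In this sketch the summary statistics (means of 2 and 3 NTU, standard deviations of 0.5 and 0.6 NTU, n = 20 each) are illustrative inputs, the critical value 2.093 is the standard t-table entry for 19 degrees of freedom, and the function names are hypothetical.

```python
from math import sqrt


def ci95_bar(mean, s, n, t95):
    """Lower and upper ends of a 95% CI error bar from summary statistics."""
    half_width = t95 * s / sqrt(n)
    return mean - half_width, mean + half_width


def bars_overlap(a, b):
    """True if intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


T95_DF19 = 2.093   # two-tailed 95% t value for n - 1 = 19 degrees of freedom

bar_1 = ci95_bar(2.0, 0.5, 20, T95_DF19)   # illustrative data set 1
bar_2 = ci95_bar(3.0, 0.6, 20, T95_DF19)   # illustrative data set 2

if bars_overlap(bar_1, bar_2):
    verdict = "95% CI bars overlap: indicates nothing"
else:
    verdict = "95% CI bars do not overlap: difference is statistically significant"
print(verdict)
```

Note that the overlapping case deliberately reports "indicates nothing" rather than "no difference", matching the interpretation given on the slide.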
14. Example 1
Create a bar chart of Name vs. Mean. Right-click on the data. Select Format Data Series.
15. Example 2
16. What can we do?
- Plot mean water quality data for various filters with error bars
- Plot mean water quality over time with error bars
17. Comparing Filter Performance
- Use the t test to determine if the means of two populations are different
  - Based on two data sets
  - E.g., turbidity produced by two different filters
18. Comparing Two Data Sets Using the t Test
- Example: You pump 20 gallons of water through filters 1 and 2. After every gallon, you measure the turbidity.
  - Filter 1: Mean = 2 NTU, s = 0.5 NTU, n = 20
  - Filter 2: Mean = 3 NTU, s = 0.6 NTU, n = 20
- You ask the question: Do the filters make water with a different mean turbidity?
19. Do the Filters Make Different Water?
- Use TTEST (Excel)
  - Result is the fractional probability of being wrong if you answer yes
  - We want the probability to be small: 0.01 to 0.10 (1% to 10%). Use 0.01
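For readers without Excel, the sketch below works the filter example from summary statistics. It assumes the equal-variance (pooled) two-sample t test with a two-tailed probability, which corresponds to Excel's TTEST with tails = 2 and type = 2. Because the Python standard library has no t distribution, the probability is obtained by integrating the t density numerically with Simpson's rule; all function names are hypothetical, and `math.gamma` limits this sketch to moderate degrees of freedom.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=2000):
    """P(|T| > |t_stat|), using Simpson's rule on the density from 0 to |t_stat|."""
    upper = abs(t_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3       # area between -|t_stat| and +|t_stat|
    return max(0.0, 1.0 - central_area)


def pooled_t_test(m1, s1, n1, m2, s2, n2):
    """Two-sample t test assuming unknown but equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t_stat = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_stat, df, two_tailed_p(t_stat, df)


# Filter 1: mean 2 NTU, s 0.5 NTU, n 20; Filter 2: mean 3 NTU, s 0.6 NTU, n 20
t_stat, df, p = pooled_t_test(2.0, 0.5, 20, 3.0, 0.6, 20)
print(f"t = {t_stat:.2f}, df = {df}, p = {p:.2g}")
print("Means differ at the 0.01 level" if p < 0.01 else "No difference shown at the 0.01 level")
```

For these numbers the probability comes out far below 0.01, so under the slide's criterion the answer to "do the filters make different water?" would be yes.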
20. t Test Questions
- Do two filters make different water?
  - Take multiple measurements of a particular water quality parameter for 2 filters
- Do two filters treat different amounts of water between cleanings?
  - Measure amount of water filtered between cleanings for two filters
- Does the amount of water a filter treats between cleanings differ after a certain amount of water is treated?
  - For a single filter, measure the amount of water treated between cleanings before and after a certain total amount of water is treated
21. Linear Regression
- Fit the best straight line to a data set
- Right-click on a data point and use the trendline option. Use the options tab to get the equation and R².
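22. R²: Coefficient of Multiple Determination
- $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$
  - $\hat{y}_i$ = predicted y values, from regression equation
  - $y_i$ = observed y values
  - $\bar{y}$ = mean of observed y values
- R² = fraction of variance explained by regression (variance = standard deviation squared)
- R² = 1 if data lie along a straight line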
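What the trendline option reports can be sketched by hand. The code below uses the ordinary least-squares formulas for simple linear regression to get the intercept, slope, and R²; the function name and the small data set are made up for illustration and are not from the slides.

```python
from statistics import mean


def simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r2)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R^2 = 1 - (sum of squared residuals) / (total sum of squares about the mean)
    predicted = [intercept + slope * x for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return intercept, slope, r2


# Made-up example data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope, r2 = simple_linear_regression(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} x, R^2 = {r2:.2f}")   # y = 2.20 + 0.60 x, R^2 = 0.60
```

Data lying exactly on a straight line leave no residuals, so the same function returns R² = 1 in that case.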