Title: Statistics: Data Presentation
Statistics: Data Presentation and Analysis
Overview
- Tables and Graphs
- Populations and Samples
- Mean, Median, Variance
- Error Bars
- Standard Deviation, Standard Error and 95% Confidence Interval (CI)
- Comparing Means of Two Populations
- Linear Regression (LR)
Warning
- Statistics is a huge field; I've simplified considerably here. For example:
  - Mean, Median, and Standard Deviation
    - There are alternative formulas
  - 95% Confidence Interval
    - There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means, rather than a single mean)
  - Error Bars
    - Don't go beyond the interpretations I give here!
  - Comparing Means of Two Data Sets
    - We just cover the t test for two means when the variances are unknown but equal; there are other tests
  - Linear Regression
    - We only look at simple LR and only calculate the intercept, slope and R². There is much more to LR!
Tables
Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters
- Consistent Format, Title, Units, Big Fonts
- Differentiate Headings, Number Columns
Figures
- Consistent Format, Title, Units
- Good Axis Titles, Big Fonts
Figure 1: Turbidity of Pond Water, Treated and Untreated
Populations and Samples
- Population
  - All possible outcomes of an experiment or observation
    - US population
    - Particular type of steel beam
- Sample
  - Finite number of outcomes measured or observations made
    - 1000 US citizens
    - 5 beams
- Use samples to estimate population properties
  - Mean, Variance
  - E.g., height of 1000 US citizens used to estimate the mean height of the US population
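To make the population/sample distinction concrete, here is a small simulation sketch in Python. The "population" is an assumption: 100,000 simulated heights drawn from a Normal distribution with mean 170 cm and standard deviation 10 cm, purely illustrative and not real census data. A sample of 1000 values is drawn from it, and the sample mean is used as the estimate of the population mean.

```python
import random
import statistics

# Hypothetical population: 100,000 simulated heights (cm), illustrative only.
random.seed(1)
population = [random.gauss(170.0, 10.0) for _ in range(100_000)]

# Sample: a finite number of observations (1000 people), drawn without replacement.
sample = random.sample(population, 1000)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)  # our estimate of the population mean

print(f"population mean = {population_mean:.2f} cm")
print(f"sample mean     = {sample_mean:.2f} cm")
```

The two means are close but not identical; the gap is what the standard error and confidence interval later in these notes try to quantify.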
Central Tendency
- Mean, $\bar{x}$ = sum of values divided by sample size
  - Data (NTU): 1, 3, 3, 6, 8, 10
  - $\bar{x}$ = (1 + 3 + 3 + 6 + 8 + 10)/6 = 5.2 NTU
- Median, m = middle number
    Rank:   1   2   3   4   5   6
    Value:  1   3   3   6   8   10
  - For an even number of sample points, average the middle two: (3 + 6)/2 = 4.5 NTU
- Excel: Mean = AVERAGE, Median = MEDIAN
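The same two calculations can be checked in a few lines of Python, as a sketch using only the standard library. `statistics.median` follows the rule above and averages the middle two values when the number of points is even.

```python
import statistics

# Turbidity readings (NTU) from the example above.
turbidity = [1, 3, 3, 6, 8, 10]

mean = sum(turbidity) / len(turbidity)   # same idea as Excel AVERAGE
median = statistics.median(turbidity)    # same idea as Excel MEDIAN (averages middle two for even n)

print(f"mean = {mean:.1f} NTU, median = {median} NTU")  # → mean = 5.2 NTU, median = 4.5 NTU
```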
Variability
- Variance, $s^2$
  - Sum of the squares of the deviations about the mean, divided by the degrees of freedom:
    $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
  - Where $x_i$ = a data point and n = number of data points
- Example (cont.)
  - $s^2 = \frac{(1-5.2)^2 + (3-5.2)^2 + (3-5.2)^2 + (6-5.2)^2 + (8-5.2)^2 + (10-5.2)^2}{6-1} = 11.8\ \text{NTU}^2$
- Excel: Variance = VAR
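A minimal Python sketch of the variance formula, written out term by term so it mirrors $s^2 = \sum(x_i - \bar{x})^2/(n-1)$. It also cross-checks against `statistics.variance`, which, like Excel VAR, uses the n - 1 (sample) denominator.

```python
import statistics

turbidity = [1, 3, 3, 6, 8, 10]  # NTU, from the example above
n = len(turbidity)
xbar = sum(turbidity) / n

# Sum of squared deviations about the mean, divided by degrees of freedom (n - 1).
s2 = sum((x - xbar) ** 2 for x in turbidity) / (n - 1)

# statistics.variance uses the same n - 1 formula.
assert abs(s2 - statistics.variance(turbidity)) < 1e-9
print(f"s^2 = {s2:.1f} NTU^2")  # → s^2 = 11.8 NTU^2
```

Note that the slide's 11.8 uses the rounded mean 5.2; using the exact mean 31/6 gives 11.77, which also rounds to 11.8.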
Error Bars
- Show data variability on a plot of mean values
- Types of error bars include:
  - Max/min, Standard Deviation, Standard Error, 95% CI
Standard Deviation, s
- Square root of variance: $s = \sqrt{s^2}$
- If a phenomenon follows a Normal Distribution (bell curve), 95% of the population lies within 1.96 standard deviations of the mean
- Error bar is s above and below the mean
- Excel: standard deviation = STDEV
[Figure: Normal distribution (bell curve); x-axis: Standard Deviations from Mean]
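A short Python sketch of both points on this slide: s is the square root of the sample variance, and for a Normal distribution about 95% of values fall within ±1.96 standard deviations. `statistics.NormalDist` (Python 3.8+) provides the Normal CDF used for the second check.

```python
import math
import statistics

turbidity = [1, 3, 3, 6, 8, 10]  # NTU
s = math.sqrt(statistics.variance(turbidity))          # square root of the sample variance
assert abs(s - statistics.stdev(turbidity)) < 1e-12    # stdev, like Excel STDEV, uses n - 1

# Fraction of a standard Normal distribution within +/- 1.96 standard deviations of the mean.
z = statistics.NormalDist()  # mean 0, standard deviation 1
within = z.cdf(1.96) - z.cdf(-1.96)

print(f"s = {s:.2f} NTU")                          # → s = 3.43 NTU
print(f"fraction within 1.96 s = {within:.3f}")    # → fraction within 1.96 s = 0.950
```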
Standard Error of Mean
- Also called St-Err or $s_{\bar{x}}$
- For a sample of size n taken from a population with standard deviation estimated as s:
  $s_{\bar{x}} = s/\sqrt{n}$
- As n increases, $s_{\bar{x}}$ decreases, i.e., the estimate of the population mean improves
- Error bar is St-Err above and below the mean
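A minimal sketch of $s_{\bar{x}} = s/\sqrt{n}$ in Python, applied to the turbidity data. The helper name `standard_error` is hypothetical, chosen for illustration.

```python
import math
import statistics

def standard_error(data):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(data) / math.sqrt(len(data))

turbidity = [1, 3, 3, 6, 8, 10]  # NTU
se = standard_error(turbidity)
print(f"St-Err = {se:.2f} NTU")  # → St-Err = 1.40 NTU
```

Because n sits under a square root, quadrupling the sample size (with s roughly unchanged) only halves the standard error.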
95% Confidence Interval (CI) for Mean
- A 95% Confidence Interval is expected to contain the population mean 95% of the time (i.e., of the 95% CIs from 100 samples, about 95 will contain the population mean)
- $t_{95,n-1}$ is a statistic for the 95% CI from a sample of size n
  - $t_{95,n-1}$ = TINV(0.05, n-1)
  - For large n (roughly n ≥ 30), $t_{95,n-1}$ ≈ 1.96 (Normal Distribution)
- Error bar is $t_{95,n-1} \cdot s_{\bar{x}}$ above and below the mean
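The Python sketch below computes a 95% CI for the turbidity example. The standard library has no t distribution, so, as an assumption of this sketch, the two-tailed 95% critical values (what Excel TINV(0.05, df) returns) are hard-coded from a standard t table for df = 1 to 30, with 1.96 used beyond that. The helper names `t95` and `ci95` are hypothetical.

```python
import math
import statistics

# Two-tailed 95% critical values t_{95,df} for df = 1..30, from a standard t table.
T95 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
       2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
       2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042]

def t95(df):
    """t_{95,df}; falls back to the Normal value 1.96 beyond the table (slightly narrow)."""
    return T95[df - 1] if df <= len(T95) else 1.96

def ci95(data):
    """Return (mean, half_width) of the 95% CI on the mean; needs at least 2 points."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # standard error of the mean
    return mean, t95(n - 1) * se                 # error bar = t_{95,n-1} * St-Err

m, h = ci95([1, 3, 3, 6, 8, 10])
print(f"95% CI: {m:.1f} +/- {h:.1f} NTU")  # → 95% CI: 5.2 +/- 3.6 NTU
```

With scipy installed, `scipy.stats.t.ppf(0.975, df)` gives the same critical values without a table.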
Using Error Bars to Compare Data
- Standard Deviation
  - Demonstrates data variability, but no comparison possible
- Standard Error
  - If bars overlap, any difference in means is not statistically significant
  - If bars do not overlap, indicates nothing!
- 95% Confidence Interval
  - If bars overlap, indicates nothing!
  - If bars do not overlap, difference is statistically significant
- We'll use 95% CI in this class
- Any time you have 3 or more data points, determine mean, standard deviation, standard error, and $t_{95,n-1}$, then plot the mean with error bars showing the 95% confidence interval
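The overlap rules above can be encoded directly, as a sketch. The function names are hypothetical, and error bars are assumed to be symmetric (mean ± half-width), as in the preceding slides. Two bars overlap exactly when the distance between the means is no larger than the sum of the half-widths.

```python
def bars_overlap(mean_a, half_a, mean_b, half_b):
    """True if the intervals mean_a +/- half_a and mean_b +/- half_b overlap."""
    return abs(mean_a - mean_b) <= half_a + half_b

def interpret(bar_type, overlap):
    """Apply the error-bar rules from the slide; bar_type is 'sd', 'se' or 'ci95'."""
    if bar_type == "sd":
        return "no comparison possible"
    if bar_type == "se":
        return "not significant" if overlap else "indicates nothing"
    if bar_type == "ci95":
        return "indicates nothing" if overlap else "significant"
    raise ValueError(f"unknown bar type: {bar_type}")

# Hypothetical example: 5.2 +/- 3.6 NTU vs 12.0 +/- 2.0 NTU (95% CI half-widths).
overlap = bars_overlap(5.2, 3.6, 12.0, 2.0)
print(interpret("ci95", overlap))  # → significant
```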
Adding Error Bars to an Excel Graph
- Create graph
  - Column, scatter, etc.
- Select data series
- In the Layout tab, Analysis group, select Error Bars
- Select More Error Bar Options
- Select Custom, then Specify Value, and select the cells containing the values
Example 1: 95% CI
What can we do?
- Lift a weight multiple times using different solar panel combinations (or hydroturbines, or gear boxes) and plot the mean with 95% confidence interval error bars.
  - If error bars overlap between two different test conditions, this indicates nothing!
  - If error bars do not overlap, the difference is statistically significant
T Test
- A more sophisticated way to compare means
- Use the t test to determine if the means of two populations are different
  - E.g., lift times with different solar panel combinations, turbines, etc.
Comparing Two Data Sets using the t Test
- Example: You lift a weight with two panels in series and two in parallel.
  - Series: mean = 2 min, s = 0.5 min, n = 20
  - Parallel: mean = 3 min, s = 0.6 min, n = 20
- You ask the question: Do the different panel combinations result in different lift times?
  - Different in a statistically significant way
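Are the Lift Times Different?
- Use TTEST (Excel): =TTEST(array1, array2, tails, type), with tails = 2 and type = 2 (two-sample, equal variance)
- TTEST returns the fractional probability of being wrong if you claim the two populations are different
- We'll say they are significantly different if this probability is < 0.05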
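Excel's TTEST needs the raw lift times, but the example only gives summary statistics. As a sketch under that constraint, the Python below computes the pooled (equal-variance) two-sample t statistic directly and compares it with the two-tailed 5% critical value for df = n1 + n2 - 2 = 38, taken from a t table (about 2.024). For a two-tailed test, |t| exceeding that critical value is equivalent to the probability being below 0.05. The helper name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic assuming unknown but equal variances; returns (t, df)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance
    se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))       # standard error of the difference
    return (mean2 - mean1) / se_diff, df

# Series vs parallel lift times (min) from the example.
t, df = pooled_t(2.0, 0.5, 20, 3.0, 0.6, 20)
T_CRIT_38 = 2.024  # two-tailed 5% critical value for df = 38, from a t table

print(f"t = {t:.2f}, df = {df}")                                    # → t = 5.73, df = 38
print("significant" if abs(t) > T_CRIT_38 else "not significant")   # → significant
```

If scipy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=True)` returns the corresponding p-value from the same summary statistics.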
Marbles
Linear Regression
- Fit the best straight line to a data set
- In Excel: right-click on a data point and select Add Trendline. Select the options to show the equation and R² on the chart.
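The "best straight line" here is the least-squares line, which is what Excel's linear trendline fits. A minimal Python sketch of the slope and intercept formulas follows; the data values are hypothetical, chosen only to illustrate, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar   # the line passes through (xbar, ybar)
    return slope, intercept

# Hypothetical data, for illustration only.
slope, intercept = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {slope:.2f}x + {intercept:.2f}")  # → y = 0.60x + 2.20
```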
R² - Coefficient of Multiple Determination
- $R^2 = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$
  - $\hat{y}_i$ = predicted y values, from the regression equation
  - $y_i$ = observed y values
  - $\bar{y}$ = mean of y
- R² = fraction of variance explained by the regression
- R² = 1 if the data lie along a straight line
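A Python sketch of the R² formula above: fit the least-squares line, compute the predicted values $\hat{y}_i$, then divide the explained sum of squares by the total sum of squares. Both data sets are hypothetical; the second lies exactly on a straight line, so R² = 1.

```python
def r_squared(xs, ys):
    """R^2 = sum((yhat - ybar)^2) / sum((y - ybar)^2) for the least-squares line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    intercept = ybar - slope * xbar
    yhat = [slope * x + intercept for x in xs]          # predicted y values
    explained = sum((yh - ybar) ** 2 for yh in yhat)    # variation explained by the line
    total = sum((y - ybar) ** 2 for y in ys)            # total variation in observed y
    return explained / total

print(f"{r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]):.2f}")  # → 0.60
print(f"{r_squared([1, 2, 3, 4], [3, 5, 7, 9]):.2f}")        # → 1.00
```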