Title: STATISTICAL ANALYSIS.
- Your introduction to statistics should not be
like drinking water from a fire hose!!
2. What do you mean by data??
3. Statistics 101!!
- Descriptive statistics
- Measures of location: mean vs. median, and why
- Measures of scale: range, interquartile range, standard deviation (and variance)
- Measures of position: percentiles, deciles, quartiles, median
- Note: for categorical variables, we use proportions as the descriptive statistics.
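A minimal Python sketch of the summaries listed above, using only the standard library `statistics` module (Python 3.8+). The sample values and the helper names `describe` and `proportions` are hypothetical. Note that `statistics.quantiles` uses its default "exclusive" interpolation, so other software may report slightly different quartiles for the same data.

```python
import statistics
from collections import Counter

def describe(data):
    """Location, scale, and position summaries for a numeric sample."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles ('exclusive' method by default)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "stdev": statistics.stdev(data),        # sample standard deviation (n - 1 divisor)
        "variance": statistics.variance(data),  # square of the sample standard deviation
        "quartiles": (q1, q2, q3),
        "deciles": statistics.quantiles(data, n=10),
    }

def proportions(categories):
    """Proportion of observations falling in each category."""
    counts = Counter(categories)
    n = len(categories)
    return {category: k / n for category, k in counts.items()}

sample = [2, 4, 4, 5, 7, 9, 30]  # hypothetical data with one large outlier
summary = describe(sample)
print(summary["mean"], summary["median"])  # the outlier pulls the mean well above the median
print(proportions(["dog", "cat", "dog", "none", "dog"]))
```

The single value 30 drags the mean to about 8.7 while the median stays at 5, which is the "mean vs. median, and why" point in one line.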
4. Why does lack of normality cause problems?
- When we calculate the p-value for an inference test, we find the probability, assuming the null hypothesis is true, of getting a sample result at least as extreme as the one observed purely through sampling variability. Basically, we are trying to see whether a recorded value could have occurred by chance and chance alone. When we compute a p-value from a z- or t-test, we are assuming that the means of all samples of the given size are normally distributed around the population mean. That assumption is what allows us to use the test statistic, which is the number of standard errors the sample mean lies away from the hypothesized population mean. Therefore, without normality (and with samples too small for the central limit theorem to help), the p-value from such a test cannot be trusted.
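To make the test statistic and its reliance on normality concrete, here is a sketch that turns "number of standard errors from the hypothesized mean" into an upper-tail p-value using the standard normal distribution. For simplicity it is a z-test rather than a t-test, and the helper name and the input numbers are hypothetical.

```python
import math
from statistics import NormalDist

def one_sided_p_value(sample_mean, null_mean, sample_sd, n):
    """Upper-tail p-value, assuming the sample mean is normally distributed."""
    standard_error = sample_sd / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error  # distance in standard errors
    p = 1 - NormalDist().cdf(z)  # P(Z >= z) if the null mean were the true mean
    return z, p

z, p = one_sided_p_value(10.5, 10, 2, 64)  # hypothetical numbers
print(z, round(p, 4))  # 2.0 0.0228
```

The `NormalDist().cdf` call is exactly where the normality assumption enters: if the sampling distribution of the mean is not close to normal, this probability is not the right one.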
5. There are non-parametric tests which are similar to the parametric tests. The following table shows how some of the tests match up.
6. What is different about Non-Parametric Statistics?
- Non-parametric methods often work with ranks (ordinal data) instead of the raw values. The ranks are obtained by ordering the raw data and giving each observation a rank from smallest to largest. These ranks are then used to create test statistics.
- In non-parametric statistics, one deals with the median rather than the mean. Since a mean can be easily influenced by outliers or skewness, and we are not assuming normality, the mean is no longer the most sensible summary. The median is another measure of location, which makes more sense in a non-parametric test. Here the median is treated as the center of the distribution.
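A small sketch of the rank transformation, assuming tied values receive the average of the positions they occupy (a common convention). `average_ranks` is a hypothetical helper name. The last lines show why ranks and the median resist outliers: making the largest value more extreme changes the mean but not the ranks or the median.

```python
import statistics

def average_ranks(values):
    """Rank values 1..n from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based; ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

a = [1, 2, 3, 100]
b = [1, 2, 3, 1000]  # same data with a far more extreme outlier
print(average_ranks(a) == average_ranks(b))        # True
print(statistics.mean(a), statistics.mean(b))      # 26.5 251.5
print(statistics.median(a), statistics.median(b))  # 2.5 2.5
```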
7. Drawing a histogram: the good, the bad and the downright ugly!!
Many modern introductory texts confuse frequency graphs, relative frequency graphs, and histograms.
[Figures: a "Bad" histogram and a "Good" histogram]
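The slide's figures are not recoverable here, but one way to keep the three kinds of graph apart is to tabulate them. A frequency graph plots counts, a relative frequency graph plots counts divided by n, and a density-scaled histogram plots relative frequency divided by bin width, so that bar areas (not heights) add up to 1. The data, bin edges, and the `histogram_table` helper are hypothetical.

```python
def histogram_table(data, edges):
    """Count, relative frequency, and density for each bin [lo, hi).

    The last bin also includes its right edge, so the maximum is counted.
    """
    n = len(data)
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        is_last = hi == edges[-1]
        count = sum(1 for x in data if lo <= x < hi or (is_last and x == hi))
        relative = count / n            # height of a relative frequency graph
        density = relative / (hi - lo)  # height of a density-scaled histogram
        rows.append((lo, hi, count, relative, density))
    return rows

data = [1, 2, 2, 3, 5, 8, 9]  # hypothetical sample
edges = [0, 2, 4, 10]         # hypothetical, deliberately unequal bin widths
for lo, hi, count, relative, density in histogram_table(data, edges):
    print(f"[{lo}, {hi}): count={count}, relative={relative:.3f}, density={density:.3f}")
```

The bins [2, 4) and [4, 10] both hold 3 observations, so a frequency graph draws them equally tall. The density-scaled histogram makes the wide bin one third as tall, so that its area reflects its share of the data.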
8. What's the difference between a bar chart and a histogram??
9. Critical Values
- For a given number of degrees of freedom, the t-distribution tells us how large the t-statistic must be in order to reject the null hypothesis.
- We call that number the critical value of the t-statistic; it is typically looked up in a table of the t-distribution.
- If the t-statistic calculated from the data is greater than this critical value, then we reject the null hypothesis.
- This is because, if the null hypothesis were true, a t-statistic greater than the critical value would occur only with a small probability (the significance level, e.g. .05). So our probability of falsely rejecting the null hypothesis is small.
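Python's standard library has no t-distribution, so this sketch computes a one-tailed critical value directly: it integrates the t density with Simpson's rule and searches for the cut-off by bisection. The numerical choices (2000 Simpson steps, 60 bisection rounds) and all helper names are assumptions for illustration; in practice a statistics library such as SciPy (`scipy.stats.t.ppf`) does this for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using symmetry about 0 and Simpson's rule on [0, |x|] (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, df):
    """One-tailed critical value c with P(T > c) = alpha, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid  # tail beyond mid is still too large: the cut-off lies further right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 120), 3))  # approximately 1.658, matching the table's 1.66
```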
10. Example
- Suppose our null hypothesis is that the mean of X is less than or equal to 0.
- The sample mean is 3.
- The sample standard deviation is 2.
- There are 121 observations.
- Step 1. We need to establish our critical value.
- We wish to reject the null hypothesis at the .05 significance level (95% confidence). For 121 observations and a one-tailed test, the critical value is 1.66, which we look up in the t-table (a significance level of .05 with 121 - 1 = 120 degrees of freedom).
- Step 2. Compute the t-statistic: $t = (3 - 0) / (2 / \sqrt{121}) = 3 / 0.182 \approx 16.5$.
- Step 3. Compare the t-statistic with the critical value. If the t-statistic is greater than the critical value, then you can reject the null hypothesis.
- In this case, 16.5 is greater than 1.66, so we can reject the null hypothesis that the mean of X is less than or equal to zero.
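The same three steps in a few lines of Python, reusing the slide's numbers and taking the critical value 1.66 from the t-table rather than computing it. `one_sample_t` is a hypothetical helper name.

```python
import math

def one_sample_t(sample_mean, null_mean, sample_sd, n):
    """t-statistic: how many standard errors the sample mean lies above the null mean."""
    return (sample_mean - null_mean) / (sample_sd / math.sqrt(n))

t = one_sample_t(3, 0, 2, 121)  # Step 2: (3 - 0) / (2 / 11)
critical = 1.66                 # Step 1: one-tailed, .05 level, 120 degrees of freedom
print(round(t, 2), t > critical)  # Step 3: 16.5 True, so reject the null hypothesis
```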
11. Example
- The table to the right is a sample cross-tab.
- Your research hypothesis is that dog ownership and gender are related.
- How do you test this hypothesis?
12. Hypothesis Tests about tables
- Step 1. Define null and research hypotheses.
- The null hypothesis will usually be that there is no relationship between the rows and the columns.
- Step 2. Determine your tolerance for falsely rejecting the null hypothesis of no relationship.
- Step 3. Empirically analyse the data to determine if there is a relationship.
13. Example
- To test for independence:
- 1) Identify the number of respondents in each internal cell of the table.
- 2) Calculate the number of respondents who would be in each cell if the variables were independent (corresponds to the second number under each total).
- e.g. cell(1,1) = .5 × .15 × 1000 = 75
- cell(1,2) = .5 × .85 × 1000 = 425
- 3) Compute the chi-squared test statistic (next slide).
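A sketch of step 2 for any table of counts: the expected count in a cell is (row total × column total) / grand total, which is the same as the row proportion × column proportion × total used above. The observed counts are the ones from the worked example on slide 15; `expected_counts` is a hypothetical helper name.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

observed = [[100, 400], [50, 450]]
print(expected_counts(observed))  # [[75.0, 425.0], [75.0, 425.0]]
```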
14. The Chi-Square Test Statistic
- To test for independence:
- 3) Compute the chi-squared test statistic.
- The chi-squared test statistic is simply
$$\chi^2 = \sum_{\text{rows}} \sum_{\text{columns}} \frac{(\text{Observed}_{\text{row,column}} - \text{Expected}_{\text{row,column}})^2}{\text{Expected}_{\text{row,column}}}$$
- Under the null hypothesis, the chi-squared statistic approximately follows a chi-squared distribution with (rows - 1) × (columns - 1) degrees of freedom.
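The formula and the degrees of freedom translate directly into code. This sketch (hypothetical helper names) takes the observed and expected tables as nested lists of the same shape; the numbers are the 2 × 2 example from slides 13 and 15.

```python
def chi_square_statistic(observed, expected):
    """Sum over all cells of (observed - expected)^2 / expected."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

def degrees_of_freedom(observed):
    """(rows - 1) * (columns - 1) for an r x c table."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

observed = [[100, 400], [50, 450]]
expected = [[75, 425], [75, 425]]
print(round(chi_square_statistic(observed, expected), 1), degrees_of_freedom(observed))  # 19.6 1
```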
15. Example
- If we look at our table of the $\chi^2$ distribution with 1 degree of freedom, the critical value for our test statistic at the .05 significance level is 3.84.
- $\chi^2 = \frac{(100-75)^2}{75} + \frac{(400-425)^2}{425} + \frac{(50-75)^2}{75} + \frac{(450-425)^2}{425} \approx 19.6$
- In this case, we reject the null hypothesis that the two variables are statistically independent, because our test statistic is greater than our critical value.
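A sketch of the decision step for this slide's numbers. For 1 degree of freedom only, a chi-squared variable is the square of a standard normal variable, so the p-value can be computed with the standard library's `math.erfc`; with more degrees of freedom you would need a statistics library instead.

```python
import math

chi2 = (100 - 75) ** 2 / 75 + (400 - 425) ** 2 / 425 + (50 - 75) ** 2 / 75 + (450 - 425) ** 2 / 425
critical = 3.84  # chi-squared, 1 degree of freedom, .05 significance level

# With 1 degree of freedom, chi2 = Z^2 for a standard normal Z,
# so P(chi2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 1), chi2 > critical, p_value < 0.05)  # 19.6 True True
```

The same identity explains the table value: 3.84 is about 1.96 squared, the familiar two-sided 5% cut-off for a standard normal.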