Title: Using SPSS to calculate summary statistics
1 Using SPSS to calculate summary statistics
2 SPSS - summary statistics
- In the data view
- Enter the scores from problem 4, page 66
- Analyze > Descriptive Statistics > Frequencies
- Select VAR00001 for analysis
- Click Statistics
- Choose the measures of central tendency you want
- Continue
- OK
- Use SPSS to check your work on problem 6 (except for SS and mean deviation).
- Do you get the same answers?
3 Chapter 3B
More on summary statistics
4 Other ways to look at the mean and the variance
- Weighted average (mean)
- Efficient formula for the variance
5 The weighted mean
- Suppose that one had repeating scores in a sample to be averaged
- Example: a student's grades in a 4.0 grading system: 2, 3, 3, 3, 3, 3, 4, 4
- One could find the mean directly, but this would be somewhat redundant
- Instead, group like scores together and weight each score by its frequency
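The grouping idea can be checked with a short Python sketch. It uses the grades from the example; the variable names are illustrative only.

```python
from collections import Counter

grades = [2, 3, 3, 3, 3, 3, 4, 4]

# Direct mean: add every score, then divide by the number of scores.
direct_mean = sum(grades) / len(grades)

# Weighted mean: group like scores and weight each by its frequency.
freq = Counter(grades)  # score 2 once, score 3 five times, score 4 twice
weighted_mean = sum(score * f for score, f in freq.items()) / sum(freq.values())

print(direct_mean, weighted_mean)  # 3.125 3.125
```

Both routes give the same answer; the weighted form just does less arithmetic when scores repeat.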
6 The weighted mean
- An even better motivation for using the weighted mean is that one might want to combine means from different studies
- We can't just average the means
- The means from larger samples must have a greater weight
- How much weight?
- Answer: the sample size
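- Suppose we have 3 studies with means $\bar{X}_1, \bar{X}_2, \bar{X}_3$
- And sample sizes $N_1, N_2, N_3$
- The resulting combined mean is then $\bar{X}_w = \dfrac{N_1\bar{X}_1 + N_2\bar{X}_2 + N_3\bar{X}_3}{N_1 + N_2 + N_3}$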
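A minimal sketch of combining study means, weighting each by its sample size. The three means and sample sizes below are hypothetical values chosen for illustration; they are not the numbers from the original slide.

```python
def combined_mean(means, sizes):
    """Weight each study mean by its sample size."""
    total_n = sum(sizes)
    return sum(m * n for m, n in zip(means, sizes)) / total_n

# Hypothetical values, for illustration only.
study_means = [10.0, 12.0, 20.0]
study_sizes = [10, 30, 60]

naive = sum(study_means) / len(study_means)         # ignores sample size
weighted = combined_mean(study_means, study_sizes)  # larger studies count more

print(naive, weighted)  # 14.0 16.6
```

The weighted result is pulled toward the mean of the largest study, which the plain average of the means ignores.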
7 Computationally efficient formula for the variance
- Not absolutely necessary
- But, saves time in hand calculations
- Is used by the author
- Most of the work is in computing SS.
- We will focus there.
8 Deriving the shortcut SS
- Sum of squares is the numerator of the variance.
- We will often have reason to handle it
separately.
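9 Deriving the shortcut SS
- Start with $SS = \sum (X - \mu)^2$
- Expand the square: $SS = \sum (X^2 - 2\mu X + \mu^2)$
- Distribute the sum: $SS = \sum X^2 - 2\mu \sum X + N\mu^2$
- Substitute $\mu = \sum X / N$: $SS = \sum X^2 - 2(\sum X)^2/N + (\sum X)^2/N$
- End with $SS = \sum X^2 - (\sum X)^2 / N$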
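The shortcut replaces the definitional sum of squared deviations, $\sum (X - \mu)^2$, with $\sum X^2 - (\sum X)^2 / N$. The sketch below checks that the two forms agree on the grades from the weighted-mean example; the function names are illustrative.

```python
def ss_definitional(scores):
    """Sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def ss_shortcut(scores):
    """Shortcut form: sum of X squared minus (sum of X) squared over N."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n

scores = [2, 3, 3, 3, 3, 3, 4, 4]
print(ss_definitional(scores), ss_shortcut(scores))  # 2.875 2.875
```

The shortcut needs only two running totals, $\sum X$ and $\sum X^2$, which is why it saves time in hand calculation.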
10 Using the shortcut SS to find the variance and standard deviation
- Recall that the variance is SS/N for the population
- And the unbiased variance is SS/(N-1) for the sample
- And the standard deviation is just the square root of the variance
- When calculating for the sample, we will almost always be using the unbiased expression
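As a sketch, the steps from SS to variance and standard deviation look like this in Python. The data are the grades used earlier, and the cross-check relies on the standard library's `statistics.pvariance` (divides by N) and `statistics.variance` (divides by N-1).

```python
import math
import statistics

scores = [2, 3, 3, 3, 3, 3, 4, 4]
n = len(scores)

# Shortcut SS: sum of X squared minus (sum of X) squared over N.
ss = sum(x * x for x in scores) - sum(scores) ** 2 / n

pop_var = ss / n          # population variance, SS/N
samp_var = ss / (n - 1)   # unbiased sample variance, SS/(N-1)
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

# Cross-check against the standard library.
print(math.isclose(pop_var, statistics.pvariance(scores)))  # True
print(math.isclose(samp_var, statistics.variance(scores)))  # True
```

The unbiased estimate is always slightly larger than SS/N, and the gap shrinks as N grows.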
11 Properties of the mean: Develop your intuition about the mean
- If a constant C is added to every score in a sample, the new mean is C + the original mean
- If every score is multiplied by a constant C, then the new mean will be C × the original mean
- This can be deduced from the properties of summation
- The sum of the deviations from the mean will always be zero
- The sum of the squared deviations from the mean (SS) will be less than the sum of the squared deviations around any other point in the distribution
- This is called the least-squares property
- It is important when fitting a straight line to a cloud of points
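These properties can be tried out numerically. The sketch below uses the earlier grades, an arbitrary constant C = 10, and an arbitrary comparison point of 3; all three choices are for illustration only.

```python
from statistics import mean

scores = [2, 3, 3, 3, 3, 3, 4, 4]
m = mean(scores)  # 3.125
C = 10            # arbitrary constant for illustration

def ss_about(point, data):
    """Sum of squared deviations of the data around an arbitrary point."""
    return sum((x - point) ** 2 for x in data)

print(mean([x + C for x in scores]))  # 13.125, i.e. C + original mean
print(mean([x * C for x in scores]))  # 31.25, i.e. C * original mean
print(sum(x - m for x in scores))     # 0.0, deviations cancel out
print(ss_about(m, scores), ss_about(3, scores))  # 2.875 3
```

Any point other than the mean gives a larger sum of squares, which is the least-squares property in miniature.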
12 Properties of the standard deviation: Develop your intuition about standard deviation
- If a constant C is added to every score in a sample, the new standard deviation is the same as the original standard deviation
- If every score is multiplied by a constant C, then the new standard deviation will be |C| × the original standard deviation
- This can be deduced from the properties of summation
- The standard deviation computed around the mean will be smaller than the same quantity computed around any other point in the distribution
- This follows from the related property of the mean
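A short numerical check of these properties, again with the earlier grades and an arbitrary constant C = 10. The helper `rms_about` is a hypothetical name for the root-mean-square deviation around a chosen point.

```python
import math
from statistics import mean, pstdev

scores = [2, 3, 3, 3, 3, 3, 4, 4]
sd = pstdev(scores)  # population standard deviation
C = 10               # arbitrary constant for illustration

def rms_about(point, data):
    """Root-mean-square deviation of the data around an arbitrary point."""
    return math.sqrt(sum((x - point) ** 2 for x in data) / len(data))

# Adding a constant leaves the standard deviation unchanged.
print(math.isclose(pstdev([x + C for x in scores]), sd))       # True
# Multiplying by a constant scales it by the constant's absolute value.
print(math.isclose(pstdev([x * C for x in scores]), C * sd))   # True
print(math.isclose(pstdev([x * -C for x in scores]), C * sd))  # True
# Deviations measured around the mean are the smallest possible.
print(rms_about(mean(scores), scores) < rms_about(3, scores))  # True
```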
13 Exercises
14 Chapter 4
- Standardized scores and the normal distribution
15 Evaluating a single score within a distribution
- The numerical distance between a score and the mean may not be very meaningful
- However, if that distance can be expressed in units of standard deviation, then we have a much better understanding of the relationship between the score and the rest of the data
16 Z scores
- Such a scaled distance is called the z score
- We can replace our X scores with the z scores
- Computed like so: $z = \dfrac{X - \mu}{\sigma}$
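A z score is the distance from the mean divided by the standard deviation, $z = (X - \mu)/\sigma$. The sketch below applies that to two hypothetical tests; the scores, means, and standard deviations are invented for illustration.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

# Hypothetical tests: the raw distances from the mean are 10 and 5 points,
# but the standard deviations differ.
print(z_score(70, 60, 5))   # 2.0
print(z_score(80, 75, 10))  # 0.5
```

The first score sits two standard deviations above its mean, while the second, despite being the higher raw score, is only half a standard deviation above its own mean.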
17 Properties of the z scores
- Mean of the z scores is 0.
- From the properties of the mean
- Subtracting a constant from each score shifts the mean by the constant
- Multiplying each score by a constant multiplies the mean by that constant.
18 This gives us a simplified formula for the standard deviation
19 The standard deviation of a population of z scores is always 1!
- From properties of standard deviation
- Adding a constant to each score does not change the standard deviation
- Multiplying each score by a constant multiplies the standard deviation by that constant.
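Both z-score properties (mean of 0, standard deviation of 1) can be confirmed by standardizing a small population. The sketch treats the earlier grades as the whole population; tolerances are used because floating-point arithmetic may leave tiny rounding residue.

```python
import math
from statistics import mean, pstdev

scores = [2, 3, 3, 3, 3, 3, 4, 4]
mu = mean(scores)       # population mean
sigma = pstdev(scores)  # population standard deviation

# Subtract the mean (shifts the mean to 0), then divide by sigma
# (scales the standard deviation to 1).
z = [(x - mu) / sigma for x in scores]

print(math.isclose(mean(z), 0.0, abs_tol=1e-9))  # True
print(math.isclose(pstdev(z), 1.0))              # True
```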
20 Normal distributions, standard normal distributions and z scores revisited
- What means are zero?
- Only the mean of a standardized distribution is guaranteed to be zero
- Normal distributions can have non-zero means
- What is the purpose of a z score?
- To permit the score to be inserted into a standard normal distribution for comparison
- Why do we want to insert our score into a standard normal distribution?
- If we don't use the standard distribution, we need lots of normal distribution tables (an infinite number, actually)
- What would happen if we had a distribution of z scores?
- It would be the standard normal distribution, provided the original scores were normally distributed
21 Limitations of z scores
- By translating the set of scores by the mean
- And scaling by the standard deviation
- We can compare two different distributions
- But not if they are skewed differently
22 The normal distribution
- Comparing scores in distributions from the same family overcomes this problem
- The normal distribution is such a family of distributions
- Why is it a family of distributions?
23 The standard normal distribution
- Generic parameters
- Mean set to zero
- Standard deviation set to 1
24 Probability that a score falls between any two values
- Recall that a probability distribution represents the relative probability of various scores
- And that the total area under a probability distribution is 1
- Hence the probability that a score falls in any interval is the area under the corresponding part of the curve
- The table of the standard normal distribution contains the cumulative area under the curve, from the mean outward
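In place of a printed table, the areas can be computed with Python's `statistics.NormalDist` (available from Python 3.8). This is a sketch; the helper names `prob_between` and `area_mean_to` are illustrative and not from the slides.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def prob_between(a, b):
    """Area under the standard normal curve between z = a and z = b."""
    return Z.cdf(b) - Z.cdf(a)

def area_mean_to(z):
    """Area from the mean outward to z, as a 'mean to z' table lists it."""
    return Z.cdf(abs(z)) - 0.5

print(round(prob_between(-1, 1), 4))  # 0.6827
print(round(area_mean_to(1.96), 4))   # 0.475
```

An interval that straddles the mean is the sum of two "mean outward" areas, which is how the table is used by hand.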
25 Distribution of sample means
- What if we sample means rather than individuals?
- Take a sample
- Find the mean
- Use that as a score
- Form a distribution of such scores
- This is called a distribution of sample means
26 Properties of the distribution of sample means
- If the underlying distribution is normal, the sampling distribution will be normal
- The mean of the sampling distribution will tend to be the same as the mean of the population (as the number of sample means approaches infinity)
- Groups vary less than individuals
- Therefore the standard deviation of the sampling distribution is less than the standard deviation of the population
- The standard deviation of the sampling distribution is called the standard error
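The repeated-sampling procedure can be simulated. The sketch below assumes a hypothetical normal population with mean 100 and standard deviation 15, samples of size 25, and 2000 repetitions; none of these numbers come from the slides.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 100, 15  # hypothetical population parameters
SAMPLE_SIZE = 25
NUM_SAMPLES = 2000

# Take a sample, find its mean, use that mean as a score; repeat.
sample_means = [
    mean([random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

print(mean(sample_means))    # close to the population mean of 100
print(pstdev(sample_means))  # well below the population sd of 15
```

With these settings the spread of the sample means should come out near 3, far smaller than the spread of individual scores.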
27 Why do we care about the distribution of sample means?
- Because when we take the mean of a sample, we want to know how good an estimate it is of the population mean
- If the means vary a lot from sample to sample, then the estimate is not very good
- If the means vary little, then the estimate is good
- We may never actually do repeated sampling; we just wanted to come up with this equation(!), $\sigma_{\bar{X}} = \sigma / \sqrt{N}$, which tells us how the sample size improves our estimation of the mean
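Assuming the equation referred to is the usual standard error formula, $\sigma_{\bar{X}} = \sigma / \sqrt{N}$, a small sketch shows how the estimate tightens as the sample grows. The population standard deviation of 15 is a hypothetical value.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: population sd over the square root of n."""
    return sd / math.sqrt(n)

POP_SD = 15  # hypothetical population standard deviation

for n in (1, 4, 25, 100):
    print(n, standard_error(POP_SD, n))
# 1 15.0
# 4 7.5
# 25 3.0
# 100 1.5
```

Halving the standard error requires four times as many scores, because the sample size enters through its square root.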
28 Exercises
29 Exercises