Title: Descriptive Statistics
1. Descriptive Statistics
- Measures of Central Tendency
2. Why? What? And How
- Remember, data reduction is key
- Are the scores generally high or generally low?
- Where does the center of the distribution tend to be located?
- Three measures of central tendency
- Mode
- Median
- Mean
- Which one you report depends on the scale of measurement and the shape of the distribution
3. Mode
- The most frequently occurring score
- Look at the simple frequency of each score
- A distribution can be unimodal (one mode) or bimodal (two modes)
- Report the mode when using a nominal scale: it is the most frequently occurring category
- If you have a rectangular distribution (every score occurs equally often), do not report the mode
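The rules above can be sketched in Python with the standard library's `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency in order of first appearance. The helper `describe_mode` and the sample scores are hypothetical. Treating "every distinct score ties for most frequent" as a rectangular distribution is an assumption about how to apply the slide's advice.

```python
from statistics import multimode

def describe_mode(scores):
    """Return the mode(s) of the scores, or None for a rectangular distribution.

    Hypothetical helper: "rectangular" is taken to mean that every distinct
    score occurs equally often and there is more than one distinct score.
    """
    modes = multimode(scores)  # all values tied for the highest frequency
    if len(modes) > 1 and len(modes) == len(set(scores)):
        return None            # no score stands out, so do not report a mode
    return modes

print(describe_mode([2, 3, 3, 4, 5]))          # [3] (unimodal)
print(describe_mode([2, 2, 3, 4, 4, 5]))       # [2, 4] (bimodal)
print(describe_mode(["red", "blue", "blue"]))  # ['blue'] (nominal categories)
print(describe_mode([1, 2, 3, 4]))             # None (rectangular)
```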
4. Median
- The score at the 50th percentile (Mdn)
- In a normal distribution, the Mdn is the same as the mode (and the mean)
- To find it, arrange the scores from lowest to highest. With an odd number of scores, the Mdn is the score in the middle; with an even number of scores, it is the average of the two scores in the middle
- Used with ordinal scales and when the distribution is skewed
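A minimal sketch of the odd/even rule for the median. The helper name `median_by_hand` and the scores are illustrative. The result is checked against the standard library's `statistics.median`, which follows the same convention.

```python
from statistics import median

def median_by_hand(scores):
    """Median by the slide's rule: sort, then take the middle score(s)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd N: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: average the two middle scores

print(median_by_hand([7, 1, 5]))     # 5
print(median_by_hand([7, 1, 5, 9]))  # 6.0
print(median_by_hand([7, 1, 5, 9]) == median([7, 1, 5, 9]))  # True
```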
5. Mean
- The score at the exact mathematical center of the distribution (the average)
- $M = \frac{\sum X}{N}$
- Used with interval and ratio scales, and when the distribution is symmetrical and unimodal
- Not accurate when the distribution is skewed, because it is pulled toward the tail
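A short sketch of $M = \frac{\sum X}{N}$, together with a hypothetical, positively skewed set of scores. It shows the mean being pulled toward the tail while the median stays with the bulk of the scores. The data and the helper name `mean_by_formula` are made up for illustration.

```python
from statistics import median

def mean_by_formula(scores):
    """M = (sum of X) / N."""
    return sum(scores) / len(scores)

# Hypothetical, positively skewed scores: one high score sits out in the tail
skewed = [2, 3, 3, 4, 18]
print(mean_by_formula(skewed))  # 6.0 (pulled toward the tail)
print(median(skewed))           # 3
```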
6. Deviations around the Mean
- A deviation is the score minus the mean, $X - M$
- Include the plus or minus sign
- The sum of the deviations around the mean always equals zero: $\sum (X - M) = 0$
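A quick check that signed deviations cancel out. The scores (mean 6) are illustrative. With non-integer data, floating-point rounding can leave a tiny nonzero remainder instead of exactly 0.

```python
def deviations(scores):
    """Return each score minus the mean, keeping the sign."""
    m = sum(scores) / len(scores)
    return [x - m for x in scores]

devs = deviations([1, 5, 7, 9, 8])  # M = 6
print(devs)       # [-5.0, -1.0, 1.0, 3.0, 2.0]
print(sum(devs))  # 0.0
```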
7. Uses of the Mean
- Describes scores
- A score's deviation from the mean gives the error of our estimate of that score, and the total error equals zero
- Predicts scores
- Describes a score's location
- Describes the population mean ($\mu$), which is a parameter
- We typically estimate $\mu$ from the sample mean
8. Summarizing Results
- Used in all research methods, including observational, survey, correlational, and experimental
- Compute the mean of the dependent variable (DV) for each condition or level of the independent variable (IV)
- The mean DV score changes as a function of changes in the IV
- Graph the results using line or bar graphs
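A minimal sketch of computing the mean DV score for each level of the IV. The `(level, score)` data and the helper name `condition_means` are hypothetical.

```python
from collections import defaultdict

def condition_means(pairs):
    """Mean DV score for each level of the IV, from (level, score) pairs."""
    groups = defaultdict(list)
    for level, score in pairs:
        groups[level].append(score)
    return {level: sum(s) / len(s) for level, s in groups.items()}

# Hypothetical data: (level of the IV, score on the DV)
data = [("low", 3), ("low", 5), ("medium", 6), ("medium", 8), ("high", 9), ("high", 11)]
print(condition_means(data))  # {'low': 4.0, 'medium': 7.0, 'high': 10.0}
```

The resulting means are the values you would plot on a line or bar graph, one point or bar per level of the IV.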
9. Measures of Variability
- The extent to which the scores differ from each other, or how spread out the scores are
- Tells us how accurately the measure of central tendency describes the distribution
- Tells us about the shape of the distribution
10. Why do we care about variability?
Where would you rather vacation: Gulfside Bungalows, where the mean temperature is 70 degrees, or Kalahari Condos, where the mean temperature is also 70 degrees?
Gulfside temperature range: day 72, night 68. Kalahari temperature range: day 110, night 30.
Variability also shows up in the range of temperatures recorded at each of these places over the years that temperature has been documented.
11. Range
- You can report the lowest and highest values
- Or report the difference between the highest and lowest values
- The semi-interquartile range, used with the median, is one half the distance between the scores at the 25th and 75th percentiles
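A sketch of the range and the semi-interquartile range using the standard library's `statistics.quantiles`. That function uses its `'exclusive'` interpolation method by default. Percentile conventions differ between textbooks and software, so a hand calculation may give slightly different quartiles. The scores and the helper name `range_summary` are illustrative.

```python
from statistics import quantiles

def range_summary(scores):
    """Return (lowest, highest, range, semi-interquartile range)."""
    lo, hi = min(scores), max(scores)
    q1, _, q3 = quantiles(scores, n=4)  # 25th, 50th, 75th percentiles ('exclusive' method)
    return lo, hi, hi - lo, (q3 - q1) / 2

print(range_summary([2, 4, 4, 5, 6, 7, 8, 9, 10, 12]))  # (2, 12, 10, 2.625)
```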
12. Variance and Standard Deviation
- Definitional and computational formulas (remember the order of operations)
- Again, most psychological research uses interval and ratio scales of measurement and assumes a normal distribution
- The goal is to assess the average or typical amount by which the scores differ from the mean
- The sample variance and standard deviation are biased estimates of the population variance and standard deviation
13. Sample Variance
- Uses each score's deviation from the mean
- Remember, the sum of the deviations always equals zero, so you have to square each of the deviations
- $S_X^2$ is the sum of squared deviations divided by the number of scores: $S_X^2 = \frac{\sum (X - M)^2}{N}$ (pp. 107 and 108)
- Provides information about the relative variability
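A minimal sketch of the definitional formula $S_X^2 = \frac{\sum (X - M)^2}{N}$ on illustrative scores. The result is compared with the standard library's `statistics.pvariance`, which also divides by $N$. The helper name `sample_variance` is hypothetical.

```python
from statistics import pvariance

def sample_variance(scores):
    """S^2_X: sum of squared deviations from M, divided by N (not N - 1)."""
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)  # square first so the deviations cannot cancel out
    return ss / n

scores = [1, 5, 7, 9, 8]
print(sample_variance(scores))  # 8.0
print(pvariance(scores))        # same value: statistics.pvariance also divides by N
```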
14. Some Limits
- It isn't the average deviation
- Its interpretation doesn't make sense because:
- The number is too large
- It is a squared value
15. So, Standard Deviation
- Take the square root of the variance: $S_X = \sqrt{S_X^2}$
- pp. 109 and 110
- Symbol: $S_X$
- Uses the same units of measurement as the raw scores
- Describes how much the scores deviate below and above the mean
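Continuing with the same illustrative scores, the standard deviation is the square root of the variance. Taking the root brings the value back into the raw-score units. The standard library's `statistics.pstdev` gives the same result because it also divides by $N$.

```python
import math
from statistics import pstdev

scores = [1, 5, 7, 9, 8]  # illustrative scores, M = 6
n = len(scores)
m = sum(scores) / n
variance = sum((x - m) ** 2 for x in scores) / n  # S^2_X = 8.0
sd = math.sqrt(variance)                          # S_X, back in the raw-score units
print(round(sd, 2))                               # 2.83
print(math.isclose(sd, pstdev(scores)))           # True
```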
16. The standard deviation
What is a standard deviation (in plain English)?
The mean of the deviations from the mean (sort of).
What do the symbols mean?
$\sigma$ (lowercase sigma) is the population standard deviation.
$S$ is the sample standard deviation.
$\hat{\sigma}$ (sigma-hat) is the sample estimate of $\sigma$.
17. The deviation (definitional) formula for the population standard deviation
- The larger the standard deviation, the more variability there is in the scores
- The standard deviation is somewhat less sensitive to extreme outliers than the range (as N increases)
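For reference, the standard definitional formula named in the slide title is given below. Here $\mu$ is the population mean and $N$ is the number of scores in the population.

```latex
\sigma_X = \sqrt{\frac{\sum (X - \mu)^2}{N}}
```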
18. The deviation (definitional) formula for the sample standard deviation
What's the difference between this formula and the population standard deviation?
In the first case, all the Xs represent the entire population. In the second case, the Xs represent a sample.
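The sample version has the same form. The sample mean $M$ takes the place of $\mu$, and $N$ is the number of scores in the sample.

```latex
S_X = \sqrt{\frac{\sum (X - M)^2}{N}}
```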
19. Standard Deviation Example
Mean: $M = 26.8$; sum of squared deviations: $\sum (X - M)^2 = 106.8$
20. Calculating S using the raw-score formula
To calculate $\sum X^2$, square all the scores first and then sum them. To calculate $(\sum X)^2$, sum all the scores first and then square the total.
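The raw-score formula rests on the identity below. It lets you compute the sum of squared deviations from $\sum X^2$ and $(\sum X)^2$ without first computing each deviation. The formula for $S_X$ follows from it.

```latex
\sum (X - M)^2 = \sum X^2 - \frac{(\sum X)^2}{N}
\qquad\Rightarrow\qquad
S_X = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}
```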
21. The raw-score formula example
$\sum X = 134$
$\sum X^2 = 3698$
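A sketch that applies the raw-score formula to these sums. The slide does not state $N$. This code assumes $N = 5$, inferred because $134 / 5 = 26.8$, the mean shown in the Standard Deviation Example. The helper name `ss_from_sums` is hypothetical.

```python
import math

def ss_from_sums(sum_x, sum_x2, n):
    """Sum of squared deviations via the raw-score formula: sum(X^2) - (sum X)^2 / N."""
    return sum_x2 - sum_x ** 2 / n

# Assumption: N = 5, inferred from 134 / 5 = 26.8 (the mean in the example)
n = 5
ss = ss_from_sums(134, 3698, n)
s_x = math.sqrt(ss / n)  # S_X divides by N
print(round(ss, 1))      # 106.8
print(round(s_x, 2))     # 4.62
```

Under that assumption, the sums reproduce the 106.8 sum of squared deviations shown in the Standard Deviation Example.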
22. Estimating the population standard deviation from a sample
S, the sample standard deviation, is usually a little smaller than the population standard deviation. Why?
The sample mean minimizes the sum of squared deviations (SS). Therefore, if the sample mean differs at all from the population mean, the SS of the sample scores around the sample mean will be smaller than their SS around the population mean, so it underestimates the population SS.
Therefore, statisticians alter the formula for the sample standard deviation by subtracting 1 from N.
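A small, exact demonstration of the underestimate, using a tiny hypothetical population {1, 2, 3}. It lists every possible sample of size 2, assuming the scores are drawn with replacement. On average, dividing SS by $N$ gives half of the true population variance here, while dividing by $N - 1$ recovers it exactly. `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # a tiny hypothetical population

def variance(xs, ddof=0):
    """SS divided by (N - ddof): ddof=0 gives S^2, ddof=1 gives the N - 1 estimate."""
    m = Fraction(sum(xs), len(xs))
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (len(xs) - ddof)

sigma2 = variance(population)                  # the population variance
samples = list(product(population, repeat=2))  # all 9 ordered samples of size 2 (with replacement)
avg_S2 = sum(variance(s) for s in samples) / len(samples)          # divide by N
avg_s2 = sum(variance(s, ddof=1) for s in samples) / len(samples)  # divide by N - 1
print(sigma2, avg_S2, avg_s2)  # 2/3 1/3 2/3
```

The $N - 1$ correction makes the variance estimate unbiased. Its square root, the estimated standard deviation, is far less biased than $S$ but not exactly unbiased.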
23. Population Variance and Standard Deviation
- When we have data from the entire population, we use $\mu$ to compute $\sigma_X^2$ and $\sigma_X$ using the same formulas (p. 115)
- We usually need to estimate them
- The variance and standard deviation of the sample are biased estimates of the population values
- The sample scores are limited in how freely they can vary
- Not all of the deviations in the sample are free to be random
24. Estimates of Population Variability
- pp. 117, 118, and 119
- Symbols: $s_X^2$ and $s_X$ (or s-hat) denote the estimates
- Correction factor: $N - 1$
- Not all of the deviations in the sample are free to be random
- Degrees of freedom: $df = N - 1$
- With $M = 6$ and $N = 5$, if four of the scores are 1, 5, 7, and 9, the only possibility for the fifth score is 8
- Gives a more accurate estimate of population variability
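The degrees-of-freedom example can be checked directly. Once $M$ and $N$ are fixed, the scores must total $N \times M$, so only $N - 1$ of them are free. The helper name `forced_last_score` is hypothetical.

```python
def forced_last_score(mean, n, known_scores):
    """Once M and N are fixed, only N - 1 scores are free; the last one is determined."""
    if len(known_scores) != n - 1:
        raise ValueError("supply exactly N - 1 scores")
    return mean * n - sum(known_scores)  # the total must equal N * M

print(forced_last_score(6, 5, [1, 5, 7, 9]))  # 8
```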
25. Formulas for s-hat (the estimate)
Definitional formula
Raw-score formula
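The two formulas have the same form as the earlier ones for $S_X$, with $N - 1$ replacing $N$ in the final denominator. The worked line applies the raw-score version to $\sum X = 134$ and $\sum X^2 = 3698$. It assumes $N = 5$, inferred from the mean of 26.8 in the Standard Deviation Example.

```latex
s_X = \sqrt{\frac{\sum (X - M)^2}{N - 1}}
\qquad
s_X = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}}

s_X = \sqrt{\frac{3698 - \frac{134^2}{5}}{5 - 1}} = \sqrt{\frac{106.8}{4}} = \sqrt{26.7} \approx 5.17
```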
26. The Estimate of the Variance
Remember what the variance is: the standard deviation squared, or the number you took the square root of to get the standard deviation.
The variance is not a very useful descriptive statistic, but it is a very important value that you will use in other techniques (e.g., the analysis of variance, or ANOVA).
27. Sum up
- Assuming a normal distribution:
- The sample mean is a good estimate of the population mean
- The estimate of the population variance and standard deviation tells us how spread out the scores are
- About 68% of the scores fall within one $s_X$ below and one $s_X$ above the mean
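The 68% figure comes from the normal curve. The standard library's `statistics.NormalDist` (Python 3.8+) can confirm it by taking the area between −1 and +1 standard deviations of a standard normal distribution.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
within_one_sd = z.cdf(1) - z.cdf(-1)
print(round(within_one_sd, 4))  # 0.6827
```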
28. Application to Normal Distribution
- Knowing the standard deviation, you can describe your sample more accurately
- Look at the inflection points of the distribution (on a normal curve, they lie one standard deviation below and above the mean)
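31. Transformations
- Adding a constant to (or subtracting it from) every score just shifts the distribution, without changing the variability (variance)
- Multiplying or dividing every score by a constant changes the variability in proportion to that constant: multiplying by $k$ multiplies the standard deviation by $|k|$ and the variance by $k^2$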
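A quick check of both rules on illustrative scores. Adding 10 leaves the variance unchanged. Multiplying by 3 multiplies the variance by $3^2 = 9$, so the standard deviation grows by a factor of 3. The helper name `pop_variance` is hypothetical.

```python
def pop_variance(scores):
    """Variance with N in the denominator (S^2_X)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

scores = [1, 5, 7, 9, 8]            # illustrative scores
shifted = [x + 10 for x in scores]  # add a constant
scaled = [x * 3 for x in scores]    # multiply by a constant

print(pop_variance(scores), pop_variance(shifted), pop_variance(scaled))  # 8.0 8.0 72.0
```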
32. Variance is Error in Predictions
- The larger the variability, the larger the differences between the mean and the scores, so the larger the error when we use the mean to predict the scores
- Error, or error variance, is the average squared difference between the predicted (mean) score and the actual raw scores
- The same holds for the estimate of the population variance
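A sketch of the variance as prediction error, on illustrative scores. Predicting every score to be the mean gives an average squared error equal to $S_X^2$, and predicting any other single value gives a larger error. The helper name `mean_squared_error` is hypothetical and is not a library function.

```python
def mean_squared_error(scores, prediction):
    """Average squared error when every score is predicted to be `prediction`."""
    return sum((x - prediction) ** 2 for x in scores) / len(scores)

scores = [1, 5, 7, 9, 8]  # illustrative scores, M = 6
print(mean_squared_error(scores, 6))  # 8.0, the variance S^2_X
print(mean_squared_error(scores, 7))  # 9.0, a prediction other than M does worse
```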
33. Summarizing Research Using Variability
- Remember, the standard deviation is the measure of variability most often reported
- The more consistent the scores within each condition (i.e., the smaller the variance), the stronger the relationship
34. Proportion of Variance Accounted For
- An objective approach: compute the proportion of variance accounted for
- First compute the overall mean and standard deviation, without taking into consideration the relationship with the levels of the IV
- This overall variance is the largest error we would have, because it ignores the IV
- When we look at the relationship, we compute the variance for each condition and average them
35. Computation
- Subtract the average error within the conditions from the error of the total sample
- Divide that difference by the error of the total sample
- The result is the proportion of error accounted for by the levels of the IV
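A minimal sketch of this computation on hypothetical scores for three levels of the IV. The total error is the variance of all scores with the IV ignored. The within-condition error is the simple average of the condition variances, which assumes equal group sizes. The result is $\frac{\text{total} - \text{within}}{\text{total}}$.

```python
def variance(scores):
    """Variance with N in the denominator."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

# Hypothetical DV scores for three levels of the IV (equal group sizes)
conditions = {"low": [3, 5], "medium": [6, 8], "high": [9, 11]}

total_error = variance([x for group in conditions.values() for x in group])     # 7.0, IV ignored
within_error = sum(variance(g) for g in conditions.values()) / len(conditions)  # 1.0, average within conditions
proportion = (total_error - within_error) / total_error
print(round(proportion, 3))  # 0.857
```

Because the scores within each condition are very consistent here, knowing the IV level removes most of the prediction error.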
36. Thus...
- It is the proportional improvement in predictions gained by using a relationship
- The stronger and more consistent the relationship, the greater the proportion of variance we can account for