Title: Describing your Data: Central Tendency and Spread
1Lecture 72032006
- Describing your Data: Central Tendency and Spread
2Describing your results
- Recall, different data types limit what you can
do with them
- Nominal/categorical data simply allows frequency
counts with no implication of more or less
- Ordinal: transitive (bigger than, less than)
without saying how much
- Interval and Ratio allow arithmetic operations
(Ratio scales have an absolute zero)
3Describing the Data for a population or samples
from a population
- Frequency
- applicable to nominal, ordinal, interval and ratio data
- The most representative score
- Mode (any data type), Median (not
nominal), Mean (not nominal or
ordinal)
- Spread
- proportion, range, inter-quartile range,
variance, standard deviation (not nominal or
ordinal)
- Individual scores by location in the distribution
of scores (Z, Sten, T)
4A bar graph showing the distribution of
personality types in a sample of college
students.
6Memory scores for a sample of 16 participants.
The scores represent the number of sentences
recalled from each category.
7Hypothetical data showing the number of humorous
sentences and the number of nonhumorous sentences
recalled by participants in a memory experiment.
8An example of a typical frequency distribution
histogram. The same set of data is presented in a
frequency distribution table and in a histogram.
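A minimal sketch of how one set of scores yields both a frequency table and a histogram. The scores are hypothetical (they are not the data on the slide), and `frequency_table` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical quiz scores (an assumption, not the slide's data).
scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]

def frequency_table(values):
    """Return (score, frequency) pairs from the highest score to the lowest."""
    counts = Counter(values)
    # Scores that never occur are kept with a frequency of 0.
    return [(x, counts.get(x, 0)) for x in range(max(values), min(values) - 1, -1)]

table = frequency_table(scores)
for x, f in table:
    # The row of asterisks is a sideways text histogram of the same counts.
    print(f"{x:>2} {f:>2} {'*' * f}")
```

The table and the bars are built from the same counts, which is why they carry the same information.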
9An example of a frequency distribution histogram
for grouped data. The same set of data is
presented in a grouped frequency distribution
table and in a histogram.
Sometimes there is too much data for a simple
histogram so we have to create groups.
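A sketch of grouping scores into class intervals, assuming whole-number scores and intervals that start at multiples of the interval width. The data and the helper name `grouped_frequencies` are hypothetical.

```python
from collections import Counter

# Hypothetical exam scores (an assumption, not the slide's data).
scores = [82, 75, 88, 93, 53, 84, 87, 58, 72, 94, 69, 84, 61, 91, 64,
          87, 84, 70, 76, 89, 75, 80, 73, 78, 60]

def grouped_frequencies(values, width):
    """Count scores per class interval; each interval starts at a multiple of width."""
    counts = Counter((v // width) * width for v in values)
    lowest = (min(values) // width) * width
    highest = (max(values) // width) * width
    return [((lo, lo + width - 1), counts.get(lo, 0))
            for lo in range(highest, lowest - 1, -width)]

grouped = grouped_frequencies(scores, width=10)
for (lo, hi), f in grouped:
    print(f"{lo}-{hi}: {f}")
```

A wider interval gives fewer, taller bars; a narrower one keeps more detail but may leave many sparse rows.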
10A frequency distribution histogram and a stem and
leaf display showing the distribution of scores.
The stem and leaf display is placed on its side
to demonstrate that the display gives the same
information provided in the histogram.
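A sketch of a stem and leaf display, assuming non-negative two-digit scores so that the tens digit is the stem and the ones digit is the leaf. The scores and the helper name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

# Hypothetical two-digit scores (an assumption, not the slide's data).
scores = [83, 82, 63, 62, 93, 78, 71, 68, 33, 76, 52, 97,
          85, 42, 46, 32, 57, 59, 56, 73, 74, 74, 81, 76]

def stem_and_leaf(values):
    """Map each tens digit (stem) to its sorted ones digits (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

display = stem_and_leaf(scores)
for stem in sorted(display):
    # The length of each row of leaves matches the height of a histogram bar.
    print(stem, "|", "".join(str(leaf) for leaf in display[stem]))
```

Unlike a histogram, each row still shows the individual scores that make up the bar.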
11From Histogram to Frequency Polygon
(Figure panels: Histogram, Polygon)
12The population distribution of IQ scores: an
example of a normal distribution (a frequency
polygon with an infinite number of intervals).
When the sample size is large and the number of
categories becomes very large, the data tend to look more
like this.
13Measures of Central Tendency
With a symmetrical, single-peaked distribution the Mean, Median and
Mode are all the same.
Mode: most frequent value
Median: middle value
Mean: arithmetic average (the sum of the scores divided by the number of scores)
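A short sketch using Python's standard `statistics` module to compute the three measures. Both data sets are hypothetical; the second replaces one score with an extreme value to show how a skewed distribution pulls the mean away from the median and mode.

```python
import statistics

# Hypothetical data: a symmetrical set and a skewed set with one extreme score.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
skewed = [1, 2, 2, 3, 3, 3, 4, 4, 50]

def central_tendency(values):
    """Return the mean, median and mode of a list of scores."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }

print(central_tendency(symmetric))
print(central_tendency(skewed))
```

In the skewed set only the mean moves, which is one reason the median is often preferred for skewed data.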
14Examples of different shapes for distributions.
(Figure labels: Mean, Median)
15Measures of Central Tendency
16Why is picking the correct Typical Score
important?
- Description of typicality
- Representativeness when comparing samples
- Can be used to say by how much samples differ
(interval and ratio data) and this is important
in experimental work
- But this is not always that easy
17In Experiment A, the variability within samples
is small and it is easy to see the 5-point mean
difference between the two samples.
BUT in Experiment B the variability is much
larger and the differences between the samples are less
easy to see.
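A sketch of the same idea with made-up numbers: both hypothetical experiments have a 5-point mean difference, but the scores in the second are far more spread out, so the two groups overlap. The data, the helper name `summarise`, and the use of the population standard deviation (`statistics.pstdev`) are all assumptions for illustration.

```python
import statistics

# Hypothetical scores; each experiment has a 5-point difference between group means.
experiment_a = {"control": [9, 10, 10, 10, 11], "treatment": [14, 15, 15, 15, 16]}
experiment_b = {"control": [2, 6, 10, 14, 18], "treatment": [7, 11, 15, 19, 23]}

def summarise(experiment):
    """Return the mean difference, each group's SD, and whether the score ranges overlap."""
    control, treatment = experiment["control"], experiment["treatment"]
    return {
        "mean_difference": statistics.mean(treatment) - statistics.mean(control),
        "sd_control": statistics.pstdev(control),
        "sd_treatment": statistics.pstdev(treatment),
        "ranges_overlap": max(control) >= min(treatment),
    }

summary_a = summarise(experiment_a)
summary_b = summarise(experiment_b)
print(summary_a)
print(summary_b)
```

The means alone cannot distinguish the two experiments; the spread is what makes the difference easy or hard to see.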
18Note: Mean difference. Overlap in distributions. Is
the difference due to the experiment or sampling
error? Same spread of data.
What can be seen now? Same mean. Distributions
spread differently.
(Figure labels: Control Grp, Exptl Grp)
The two main ways in which we describe
distributions are
1. Describing central tendency
2. Describing how scores are distributed around the
central point
19Spread
- Range: highest score minus lowest score
- Inter-quartile range (25th to 75th percentile)
- Deviation from the mean
- Average Squared Deviation from the Mean
(Variance)
- Square root of the Average Squared Deviation from
the mean (Standard Deviation) gets us back to
describing spread in the units of the original
scores
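A sketch of these measures using the standard `statistics` module. The scores are hypothetical, the variance and standard deviation are the population versions (`pvariance`, `pstdev`), and the quartiles use the module's default "exclusive" method; other textbooks and packages use slightly different quartile conventions.

```python
import statistics

# Hypothetical scores (an assumption for illustration).
scores = [2, 4, 4, 5, 7, 8, 9, 9]

def describe_spread(values):
    """Return the range, inter-quartile range, variance and standard deviation."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": statistics.pvariance(values),  # average squared deviation
        "sd": statistics.pstdev(values),           # square root of the variance
    }

spread = describe_spread(scores)
print(spread)
```

The standard deviation is in the same units as the scores, whereas the variance is in squared units.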
20Standardisation and Transformation
- Sometimes we want to locate scores in a
distribution to see how likely it is that such a score would
be obtained.
- Also we often want to know how scores compare in
different distributions using different scales
(e.g., attitudes toward violence and number of
violent acts committed).
- Standardisation lets us do this.
- Typically we transform scores to Zs although
other forms of standardisation exist (Sten, T and
even IQ)
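A sketch of standardisation with hypothetical data for the two scales mentioned above. The helper names `z_score` and `to_scale` are illustrative. The target means and standard deviations are the commonly used conventions (T: 50 and 10, Sten: 5.5 and 2, IQ: 100 and 15), stated here as assumptions since the slides do not give them.

```python
import statistics

# Hypothetical scores on two different scales (assumptions for illustration).
attitude = [2, 4, 4, 4, 5, 5, 7, 9]  # attitude-toward-violence ratings
acts = [0, 0, 1, 1, 2, 2, 3, 7]      # number of violent acts committed

def z_score(x, values):
    """Distance of x from the mean of values, in standard deviation units."""
    return (x - statistics.mean(values)) / statistics.pstdev(values)

def to_scale(z, new_mean, new_sd):
    """Re-express a z-score on a scale with the given mean and SD."""
    return new_mean + new_sd * z

z_attitude = z_score(7, attitude)  # a rating of 7
z_acts = z_score(3, acts)          # 3 violent acts

print(z_attitude, z_acts)
print("T:", to_scale(z_attitude, 50, 10))
print("Sten:", to_scale(z_attitude, 5.5, 2))
print("IQ-style:", to_scale(z_attitude, 100, 15))
```

Once both scores are in z units they can be compared directly, even though the raw scales differ.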
21Frequency distribution for a population of N = 16
scores. The first quartile is Q1 = 4.5. The third
quartile is Q3 = 8.0. The interquartile range is
3.5 points. Note that the third quartile divides
the two boxes at X = 8 exactly in half, so that a
total of 4 boxes are above Q3 and 12 boxes are
below it.
Number of scores is 16.
The 25th percentile is between scores 4 and 5.
The 75th percentile is at the point where the top 4
scores are above.
Because it deals with location and not
value, extreme high or low scores will not affect
the inter-quartile range.
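A sketch showing that an extreme score changes the range but not the inter-quartile range. The 16 scores are hypothetical (not the slide's data), and the quartiles come from `statistics.quantiles` with its default method, which may place the quartiles slightly differently from the box-counting approach on the slide.

```python
import statistics

# Hypothetical set of 16 scores, and a copy whose top score is replaced by an outlier.
scores = [2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 10]
with_outlier = scores[:-1] + [100]

def score_range(values):
    """Highest score minus lowest score."""
    return max(values) - min(values)

def interquartile_range(values):
    """Distance between the 25th and 75th percentiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

print(score_range(scores), score_range(with_outlier))
print(interquartile_range(scores), interquartile_range(with_outlier))
```

Only the position of the scores matters for the quartiles, so changing how extreme the top score is leaves them where they were.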
22Notice that the distribution is centered around
36 and that most of the scores are within a
distance of 4 points from the mean, although some
scores are farther away.
The sum of the deviations about the mean is 0 so
the average deviation is 0 unless we ignore the
sign.
Another way of ignoring the sign is to square the
values and take the average squared deviation.
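The two ways of ignoring the sign can be sketched by hand. The scores are hypothetical, chosen to be centred on 36 like the distribution described above.

```python
# Hypothetical scores centred on 36 (an assumption, not the slide's data).
scores = [32, 34, 36, 36, 38, 40]

n = len(scores)
mean = sum(scores) / n
deviations = [x - mean for x in scores]

# Positive and negative deviations cancel, so their sum is zero.
sum_of_deviations = sum(deviations)

# Option 1: drop the sign (average absolute deviation).
mean_absolute_deviation = sum(abs(d) for d in deviations) / n

# Option 2: square each deviation (average squared deviation).
mean_squared_deviation = sum(d ** 2 for d in deviations) / n

print(sum_of_deviations, mean_absolute_deviation, mean_squared_deviation)
```

Squaring gives more weight to scores far from the mean than taking absolute values does.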
23Variance
- The sum of squared deviations divided by the
number of deviations from the mean gives us the
variance!
24Standard Deviation
SD = Square root of the variance
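Written as formulas, assuming the notation that X_i are the N scores of a population with mean μ (these symbols are not taken from the slides):

```latex
\[
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N},
\qquad
\sigma = \sqrt{\sigma^2}
\]
```

When the variance of a population is estimated from a sample, the sum of squared deviations is commonly divided by n - 1 rather than n.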
25One standard deviation above or below the mean
cuts off an area under the normal curve that is
the same no matter the source of the data.
The scores in the distribution can be described
in standard deviation units (Z scores). Note that as
we move to the extremes the probability of
drawing those scores at random tends towards 0.
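A sketch using `statistics.NormalDist` from the standard library. The two distributions are hypothetical examples with different means and standard deviations; the helper name `proportion_within` is illustrative.

```python
from statistics import NormalDist

def proportion_within(dist, k):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

# Two hypothetical normal distributions on very different scales.
iq = NormalDist(mu=100, sigma=15)
heights_cm = NormalDist(mu=170, sigma=8)

print(proportion_within(iq, 1), proportion_within(heights_cm, 1))

# Probability of drawing a score more than z standard deviations above the mean.
for z in (1, 2, 3, 4):
    print(z, 1 - NormalDist().cdf(z))
```

Both distributions give about 68% within one standard deviation of the mean, and the upper-tail probability shrinks rapidly as z grows.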
26Area under the standard normal curve.
28Measures of Variation
29Raw scores, sample mean, and standard deviation
for English and psychology exams.
English: z = .62; Psychology: z = .88
30A portion of the Standard Normal Curve
Table (continued on next slide)
31The proportion of the scores in either half of
the distribution is .5 (50%)
32Standard normal curve with z-scores of 1.0 and
1.5 indicated.
As the tables give the proportion of scores
between the mean and a Z score, we can easily
calculate the proportion of scores that fall in
any part of the distribution.
33Proportion of scores between a z-score of 1.0
and 1.5.
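A sketch of the table lookup done in code, using `statistics.NormalDist` in place of a printed table. The helper name `mean_to_z` is illustrative; it returns the mean-to-z proportion that such tables list.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def mean_to_z(z):
    """Proportion of scores between the mean and z, as listed in a mean-to-z table."""
    return abs(standard_normal.cdf(z) - 0.5)

# Proportion between z = 1.0 and z = 1.5: subtract the smaller area from the larger.
between = mean_to_z(1.5) - mean_to_z(1.0)

print(round(mean_to_z(1.0), 4), round(mean_to_z(1.5), 4), round(between, 4))
```

The result is roughly .4332 minus .3413, or about .0919, so about 9% of scores fall between these two z-scores.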
34The genius part
- As we can locate scores from a sample in a normal
distribution, we can locate differences in scores
between samples in a distribution in the same
way.
- This is one of the pillars upon which inferential
statistics is built.