Title: Edpsy 511
1Edpsy 511
- Exploratory Data Analysis
- Homework 1 Due 9/20
2Landmarks in the data
- Quartiles
- We're often interested in the 25th, 50th and 75th percentiles.
- Example scores: 39, 38, 38, 36, 36, 31, 29, 29, 28, 19
- Steps
- First, order the scores from least to greatest.
- Second, add 1 to the sample size.
- Why?
- Third, multiply (sample size + 1) by the percentile to find the location.
- Q1 = (10 + 1) × .25 = 2.75
- Q2 = (10 + 1) × .50 = 5.5
- Q3 = (10 + 1) × .75 = 8.25
- If the value obtained is a fraction, take the average of the two adjacent X values.
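The quartile steps above can be sketched in Python. This is a minimal illustration of the (n + 1) location rule as the slide describes it, applied to the ten example scores; the function name `quartile_location_method` is made up for this sketch, and it assumes the location falls inside the data (no handling for very small samples).

```python
import math

def quartile_location_method(scores, percentile):
    """Quartile via the (n + 1) * percentile location rule (hypothetical helper)."""
    ordered = sorted(scores)             # step 1: order least to greatest
    n = len(ordered)
    location = (n + 1) * percentile      # steps 2 and 3: 1-based location
    lower = math.floor(location)
    if location == lower:
        # Whole-number location: take that score directly.
        return ordered[lower - 1]
    # Fractional location: average the two adjacent scores.
    return (ordered[lower - 1] + ordered[lower]) / 2

scores = [39, 38, 38, 36, 36, 31, 29, 29, 28, 19]
q1 = quartile_location_method(scores, 0.25)
q2 = quartile_location_method(scores, 0.50)
q3 = quartile_location_method(scores, 0.75)
print(q1, q2, q3)  # 28.5 33.5 38.0
```

Statistical software often interpolates between the adjacent scores instead of averaging them, so its quartiles can differ slightly from these.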
3Box-and-Whiskers Plots (a.k.a., Boxplots)
4Shapes of Distributions
- Normal distribution
- Positive Skew
- Or right skewed
- Negative Skew
- Or left skewed
5How is this variable distributed?
6How is this variable distributed?
7How is this variable distributed?
8Descriptive Statistics
9Statistics vs. Parameters
- A parameter is a characteristic of a population.
- It is a numerical or graphic way to summarize data obtained from the population.
- A statistic is a characteristic of a sample.
- It is a numerical or graphic way to summarize data obtained from a sample.
10Types of Numerical Data
- There are two fundamental types of numerical data:
- Categorical data: obtained by determining the frequency of occurrences in each of several categories
- Quantitative data: obtained by determining placement on a scale that indicates amount or degree
11Techniques for Summarizing Quantitative Data
- Frequency Distributions
- Histograms
- Stem and Leaf Plots
- Distribution curves
- Averages
- Variability
12Summary Measures
- Central Tendency: Arithmetic Mean, Median, Mode
- Quartile
- Variation: Range, Variance, Standard Deviation
13Measures of Central Tendency
- Central Tendency: Average (Mean), Median, Mode
14Mean (Arithmetic Mean)
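- Mean (arithmetic mean) of data values
- Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $n$ is the sample size
- Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$, where $N$ is the population size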
15Mean
- The most common measure of central tendency
- Affected by extreme values (outliers)
[Figure: two number lines, one spanning 0 to 10 with Mean = 5 and one extended to 14 with Mean = 6]
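A small Python sketch can show how one extreme value pulls the mean. The two data sets below are assumptions chosen only so that the means come out to 5 and 6; the values actually plotted on the slide are not recoverable from the transcript.

```python
from statistics import mean

# Illustrative values only (not taken from the slide).
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]   # same data, but the largest score is extreme

print(mean(without_outlier))  # 5
print(mean(with_outlier))     # 6
```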
16Mean of Grouped Frequency
17Weighted Mean
- A form of mean obtained from groups of data in
which the different sizes of the groups are
accounted for or weighted.
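As a rough sketch of the idea, the weighted mean multiplies each group mean by its group size, sums the products, and divides by the total number of cases. The group means and sizes below are hypothetical, as is the helper name `weighted_mean`.

```python
def weighted_mean(group_means, group_sizes):
    """Combine group means, weighting each by its group size (hypothetical helper)."""
    total_cases = sum(group_sizes)
    weighted_sum = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_sum / total_cases

# Hypothetical example: two class sections of unequal size.
group_means = [80.0, 90.0]
group_sizes = [10, 30]

wm = weighted_mean(group_means, group_sizes)
simple = sum(group_means) / len(group_means)
print(wm)      # 87.5
print(simple)  # 85.0 (ignores the fact that the second group is three times larger)
```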
19Median
- Robust measure of central tendency
- Not affected by extreme values
- In an ordered array, the median is the middle number.
- If n or N is odd, the median is the middle number.
- If n or N is even, the median is the average of the two middle numbers.
[Figure: two number lines, one spanning 0 to 10 and one extended to 14, each with Median = 5]
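The odd and even rules, and the median's resistance to extreme values, can be checked with Python's standard `statistics.median`. The data values are illustrative assumptions, picked so that the median is 5 with and without an extreme score.

```python
from statistics import median

# Illustrative values only (not taken from the slide).
odd_n = [1, 3, 5, 7, 9]
odd_n_with_outlier = [1, 3, 5, 7, 14]
even_n = [1, 3, 5, 7]

print(median(odd_n))               # 5   (middle number)
print(median(odd_n_with_outlier))  # 5   (unchanged by the extreme value)
print(median(even_n))              # 4.0 (average of the two middle numbers, 3 and 5)
```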
20Mode
- A measure of central tendency
- Value that occurs most often
- Not affected by extreme values
- Used for either numerical or categorical data
- There may be no mode
- There may be several modes
[Figure: two number lines, one spanning 0 to 6 labeled No Mode and one spanning 0 to 14 labeled Mode = 9]
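A minimal sketch of finding the mode, covering the three cases listed above: no mode, one mode, and several modes. The helper `modes` and the sample data are assumptions for illustration; it treats data in which every distinct value occurs exactly once as having no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when no value repeats (hypothetical helper)."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []  # every value occurs once: no mode
    return [value for value, count in counts.items() if count == top]

print(modes([0, 1, 2, 3, 4, 5, 6]))        # []
print(modes([1, 2, 9, 9, 9, 3]))           # [9]
print(modes(["a", "b", "a", "b", "c"]))    # ['a', 'b'] (categorical data, two modes)
```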
21The Normal Curve
22Different Distributions Compared
23Variability
- Variability refers to the extent to which the scores on a quantitative variable in a distribution are spread out.
- The range represents the difference between the highest and lowest scores in a distribution.
- A five number summary reports the lowest score, the first quartile, the median, the third quartile, and the highest score.
- Five number summaries are often portrayed graphically by the use of box plots.
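The range and five number summary can be sketched with Python's standard library, using the ten example scores from the quartile slide. Note the assumption: `statistics.quantiles` interpolates between adjacent scores, so its first quartile (28.75) differs slightly from a rule that averages the two adjacent scores. The helper name `five_number_summary` is made up for this sketch.

```python
from statistics import quantiles

def five_number_summary(scores):
    """Lowest score, Q1, median, Q3, highest score (hypothetical helper)."""
    ordered = sorted(scores)
    q1, med, q3 = quantiles(ordered, n=4)  # default method uses (n + 1) positions
    return ordered[0], q1, med, q3, ordered[-1]

scores = [39, 38, 38, 36, 36, 31, 29, 29, 28, 19]
summary = five_number_summary(scores)
score_range = summary[-1] - summary[0]    # highest minus lowest

print(summary)      # (19, 28.75, 33.5, 38.0, 39)
print(score_range)  # 20
```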
24Variance
- The variance, $s^2$, represents the amount of variability of the data relative to their mean.
- The variance is roughly the average of the squared deviations of the observations about their mean; for a sample, the sum of squared deviations is divided by $n - 1$:
  $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
- The variance $s^2$ is the sample variance, and is used to estimate the actual population variance, $\sigma^2$.
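To make the sample versus population distinction concrete, the sketch below compares `statistics.variance` (divides by n − 1) with `statistics.pvariance` (divides by N) from Python's standard library. The eight data values are hypothetical and chosen only to keep the arithmetic clean.

```python
from statistics import pvariance, variance

# Hypothetical scores; the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = variance(data)        # 32 / (8 - 1), about 4.571
population_var = pvariance(data)   # 32 / 8 = 4

print(round(sample_var, 3), population_var)
```

The sample value is slightly larger because dividing by n − 1 compensates for measuring deviations from the sample mean rather than the unknown population mean.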
25Standard Deviation
- Considered the most useful index of variability.
- It is a single number that represents the spread of a distribution.
- If a distribution is normal, then the mean plus or minus 3 SD will encompass about 99.7% of all scores in the distribution.
26Calculation of the Variance and Standard
Deviation of a Distribution
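Raw Score (X)   Mean   X - Mean   (X - Mean)^2
85              54      31        961
80              54      26        676
70              54      16        256
60              54       6         36
55              54       1          1
50              54      -4         16
45              54      -9         81
40              54     -14        196
30              54     -24        576
25              54     -29        841
Sum of squared deviations = 3640
Variance (SD^2) = 3640 / (10 - 1) = 404.44
Standard deviation (SD) = sqrt(404.44) = 20.11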
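The calculation on this slide can be reproduced step by step in Python. This sketch uses the ten raw scores from the table and divides by n − 1, which is the divisor that yields the slide's variance of 404.44; variable names are illustrative.

```python
import math

raw_scores = [85, 80, 70, 60, 55, 50, 45, 40, 30, 25]

n = len(raw_scores)
mean_score = sum(raw_scores) / n                       # 54.0
deviations = [x - mean_score for x in raw_scores]      # X minus the mean
squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)               # 3640.0

variance = sum_of_squares / (n - 1)                    # sample variance
sd = math.sqrt(variance)                               # standard deviation

print(round(variance, 2), round(sd, 2))  # 404.44 20.11
```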
27Comparing Standard Deviations
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.9258
Data C: Mean = 15.5, S = 4.57
[Figure: each data set shown on a scale from 11 to 21]
28Facts about the Normal Distribution
- 50% of all the observations fall on each side of the mean.
- 68% of scores fall within 1 SD of the mean in a normal distribution.
- 27% of the observations fall between 1 and 2 SD from the mean (both sides combined), so about 95% fall within 2 SD.
- 99.7% of all scores fall within 3 SD of the mean.
- This is often referred to as the 68-95-99.7 rule.
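These percentages can be checked numerically with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). The helper name `share_within` is an assumption for this sketch; it returns the proportion of a standard normal distribution lying within k SD of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def share_within(k):
    """Proportion of scores within k SD of the mean (hypothetical helper)."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1 = share_within(1)           # roughly 0.683
within_2 = share_within(2)           # roughly 0.954
within_3 = share_within(3)           # roughly 0.997
between_1_and_2 = within_2 - within_1  # roughly 0.272, both sides combined

for label, value in [("1 SD", within_1), ("2 SD", within_2), ("3 SD", within_3)]:
    print(f"within {label}: {value:.1%}")
```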
29Fifty Percent of All Scores in a Normal Curve
Fall on Each Side of the Mean
30Probabilities Under the Normal Curve
31Standard Scores
- Standard scores use a common scale to indicate how an individual compares to other individuals in a group.
- The simplest form of a standard score is a z score.
- A z score expresses how far a raw score is from the mean in standard deviation units.
- Standard scores provide a better basis for comparing performance on different measures than do raw scores.
- A probability is a percent stated in decimal form and refers to the likelihood of an event occurring.
- T scores are z scores expressed in a different form (T = z score × 10 + 50).
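A short sketch of the two conversions: a z score is the raw score's distance from the mean in SD units, and a T score rescales z to a mean of 50 and an SD of 10. The raw score, mean, and SD below are hypothetical, as are the function names.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

def t_score(z):
    """Rescale a z score to a scale with mean 50 and SD 10."""
    return z * 10 + 50

# Hypothetical test: mean 50, SD 10, and a student who scored 70.
z = z_score(70, 50, 10)
t = t_score(z)
print(z, t)  # 2.0 70.0
```

A score exactly at the mean has z = 0 and T = 50, which is why T scores avoid the negative values that z scores produce for below-average performance.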
32Probability Areas Between the Mean and Different
Z Scores
33Examples of Standard Scores