Title: Descriptive statistics
1Descriptive statistics for one variable
2What to describe?
- What is the location or center of the data? (measures of location)
- How do the data vary? (measures of variability)
3Types of statistics
- Descriptive Statistics
  - Gives numerical and graphic procedures to summarize a collection of data in a clear and understandable way
- Inferential Statistics
  - Provides procedures to draw inferences about a population from a sample
4Reasons for using statistics
- aid in summarization
- aid in getting at what's going on
- aid in extracting information from the data
- aid in communication
6Frequency distribution
- The frequency with which observations are assigned to each category or point on a measurement scale.
  - Most basic form of descriptive statistic
  - May be expressed as a percentage of the total sample found in each category
Source: Reasoning with Statistics, by Frederick Williams and Peter Monge, fifth edition, Harcourt College Publishers.
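As a minimal sketch of the idea on this slide, the Python snippet below tabulates a frequency distribution and expresses each count as a percentage of the total sample. The `responses` list and the function name are made up for illustration and do not come from the slides.

```python
from collections import Counter

# Hypothetical nominal data (illustration only, not from the slides).
responses = ["yes", "no", "yes", "undecided", "yes", "no", "yes", "no"]


def frequency_distribution(observations):
    """Return {category: (count, percent of total sample)}."""
    counts = Counter(observations)
    total = len(observations)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


table = frequency_distribution(responses)
for category, (count, percent) in sorted(table.items()):
    print(f"{category:<10} {count:>3} {percent:6.1f}%")
```

Because the categories here are nominal, the sort order in the printout is only alphabetical and carries no meaning.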
7Frequency distribution
- The distribution is read differently depending upon the measurement level
  - Nominal scales are read as discrete measurements at each level (no ordering)
  - Ordinal measures show tendencies, but distances between categories should not be compared (ordering exists, but not distance)
  - Interval (distance exists, but no ratios) and ratio scales (ratios exist) allow for comparison among categories
8
Sex       N    Mean  Median  TrMean  StDev  SE Mean
female  126   91.23   90.00   90.83  11.32     1.01
male    100   96.79  110.00  105.62  17.39     1.74

Sex     Minimum  Maximum     Q1      Q3
female    65.00   120.00  85.00   98.25
male      75.00   162.00  95.00  118.75
11Source: Protecting Children from Harmful Television: TV Ratings and the V-chip. Amy I. Nathanson, PhD, Lecturer, University of California at Santa Barbara; Joanne Cantor, PhD, Professor, Communication Arts, University of Wisconsin-Madison
12Source: http://www.elonka.com/kryptos/ (web page on cryptography)
13Ancestry of US residents
14Source: UCLA International Institute
16Source: Cornell University website
17Source: www.cit.cornell.edu/computer/students/bandwidth/charts.html
18Source: www.cit.cornell.edu/computer/students/bandwidth/charts.html
19Source: Verisign
20Search engine use
21The percentage of online searches done by US home
and work web surfers in July 2006
22NY Times
23Source: Verisign
24Old Faithful Geyser
25
- Duration in minutes of 272 eruptions of the Old Faithful geyser, with the waiting time in minutes until the next eruption.

    > library(datasets)   # provides the built-in faithful data frame
    > faithful[1:10,]     # first ten observations
       eruptions waiting
    1      3.600      79
    2      1.800      54
    3      3.333      74
    4      2.283      62
    5      4.533      85
    6      2.883      55
    7      4.700      88
    8      3.600      85
    9      1.950      51
    10     4.350      85

    > summary(faithful)   # per-column summary of all 272 rows
       eruptions        waiting
     Min.   :1.600   Min.   :43.0
     1st Qu.:2.163   1st Qu.:58.0
     Median :4.000   Median :76.0
     Mean   :3.488   Mean   :70.9
     3rd Qu.:4.454   3rd Qu.:82.0
     Max.   :5.100   Max.   :96.0
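For readers working outside R, a rough Python analogue of `summary()` is sketched below. It uses only the ten eruption durations listed on this slide, so its numbers differ from the output for all 272 rows. The function name is hypothetical, and the `inclusive` quantile method is chosen on the assumption that it corresponds to R's default quartile rule.

```python
import statistics

# The ten eruption durations (minutes) listed on the slide.
eruptions = [3.600, 1.800, 3.333, 2.283, 4.533,
             2.883, 4.700, 3.600, 1.950, 4.350]


def six_number_summary(values):
    """Min, quartiles, median, mean and max, in the spirit of R's summary()."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


summary = six_number_summary(eruptions)
for name, value in summary.items():
    print(f"{name:<7}{value:.3f}")
```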
29Normal distribution
- Many characteristics are distributed through the population in a normal manner
  - Normal curves have well-defined statistical properties
  - Parametric statistics are based on the assumption that the variables are distributed normally
    - Most commonly used statistics
- This is the famous bell curve, where many cases fall near the middle of the distribution and few fall very high or very low
  - Example: I.Q.
30Statistical properties of the normal distribution
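One well-known property of the normal curve is the share of cases lying within one, two, and three standard deviations of the mean. The sketch below computes those shares with Python's `statistics.NormalDist`. The I.Q. scaling (mean 100, standard deviation 15) is a common convention assumed here for illustration; it is not a figure taken from the slides.

```python
from statistics import NormalDist

# Assumed I.Q. scaling for illustration: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)


def share_within(dist, k):
    """Proportion of a normal distribution within k SDs of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


shares = {k: share_within(iq, k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```

The shares come out at roughly 68%, 95%, and 99.7%, whatever mean and standard deviation are chosen.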
32I.Q. distribution
34Measures of central tendency
- Mode (Mo): the most frequent score in a distribution
  - good for nominal data
- Median (Md): the midpoint or midscore in a distribution
  - (50% of cases above / 50% of cases below)
  - insensitive to extreme cases
  - interval or ratio data
Source: Reasoning with Statistics, by Frederick Williams and Peter Monge, fifth edition, Harcourt College Publishers.
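A short illustration of both measures, using Python's standard `statistics` module. All data below are hypothetical and chosen only to show the mode working on nominal categories and the median barely moving when an extreme case is added.

```python
import statistics

# Hypothetical data (illustration only, not from the slides).
colors = ["red", "blue", "red", "green"]      # nominal categories
scores = [1, 2, 2, 3, 4, 7, 9]                # numeric scores
scores_with_extreme = scores + [400]          # same scores plus one extreme case

modal_color = statistics.mode(colors)         # most frequent category
modal_score = statistics.mode(scores)         # most frequent score
median = statistics.median(scores)            # middle of the sorted scores
median_extreme = statistics.median(scores_with_extreme)

print(modal_color, modal_score, median, median_extreme)
```

With an even number of scores, `statistics.median` averages the two middle values, which is why the second median is 3.5 rather than one of the observed scores.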
35Measures of central tendency
- Mean
  - The average score: total score divided by the number of scores
  - has a number of useful statistical properties
  - however, can be sensitive to extreme scores
  - many statistics based on mean
- Sensitive to outliers
  - Extreme cases that just happened to end up in your sample by chance
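To make the sensitivity to outliers concrete, the sketch below computes the mean as total score divided by number of scores, before and after a single extreme case is added. The scores are hypothetical and not taken from the slides.

```python
import statistics

# Hypothetical scores (illustration only).
scores = [2, 3, 3, 4, 5, 5, 5, 6, 7]
with_outlier = scores + [95]


def mean(values):
    """Total score divided by the number of scores."""
    return sum(values) / len(values)


print(mean(scores), statistics.median(scores))
print(mean(with_outlier), statistics.median(with_outlier))
```

One added case moves the mean from about 4.4 to 13.5, while the median of the same data stays at 5.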
36Index of central tendency
Source: http://www.uwsp.edu/psych/stat/5/skewnone.gif
37Source: Scianta.com
38Source: www.wilderdom.com/.../L2-1UnderstandingIQ.html
39Source: CSAP's Data Pathways
40Measures of dispersion
- Look at how widely scattered over the scale the scores are
- Groups with identical means can be more or less diverse
- To find out how the group is distributed, we need to know how far from or close to the mean individual members are
- Like the mean, only meaningful for interval or ratio-level measures
41Measures of dispersion
- Range
  - Distance between the highest and lowest scores in a distribution
  - sensitive to extreme scores
  - compensate by calculating interquartile range (distance between the 25th and 75th percentile points), which represents the range of scores for the middle half of a distribution
- Usually used in combination with other measures of dispersion.
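The contrast between the two measures can be sketched as follows. The scores are hypothetical, and the quartiles use the `inclusive` method of Python's `statistics.quantiles`, which is only one of several quartile conventions; other software may give slightly different values.

```python
import statistics

# Hypothetical scores (illustration only); the last value is an extreme score.
scores = [52, 55, 58, 60, 61, 63, 66, 70, 98]


def value_range(values):
    """Distance between the highest and lowest scores."""
    return max(values) - min(values)


def interquartile_range(values):
    """Distance between the 25th and 75th percentile points."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


print(value_range(scores), interquartile_range(scores))
print(value_range(scores[:-1]), interquartile_range(scores[:-1]))
```

Dropping the single extreme score cuts the range from 46 to 18, while the interquartile range only shifts from 8.0 to 6.5.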
42Range
Source: www.animatedsoftware.com/statglos/sgrange.htm
43Source: http://pse.cs.vt.edu/SoSci/converted/Dispersion_I/box_n_hist.gif
44Average Deviation (Mean Deviation)
- Merits
  1. Easy to calculate and understand.
  2. It can be calculated from any average.
  3. It is less affected by extreme observations.
- Demerits
  1. It is mathematically incomplete because it ignores the negative signs of the deviations.
  2. As it can be calculated from any average, it does not have certainty (i.e., it is not a well-defined measure).
  3. Its use is very limited in statistical work.
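A minimal sketch of the average deviation, computed once about the mean and once about the median to show why the result depends on which average is used. The scores and the function name are hypothetical.

```python
import statistics

# Hypothetical scores (illustration only).
scores = [2, 4, 4, 5, 7, 8, 12]


def average_deviation(values, center):
    """Mean of the absolute deviations of the values from `center`."""
    return sum(abs(x - center) for x in values) / len(values)


about_mean = average_deviation(scores, statistics.fmean(scores))
about_median = average_deviation(scores, statistics.median(scores))
print(about_mean, about_median)
```

Taking absolute values is the step that discards the signs of the deviations. The two results (18/7 and 17/7) differ, so a reported average deviation should say which average it was taken from.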
45Measures of dispersion
- Variance (S²)
  - Average of squared distances of individual points from the mean
  - High variance means that most scores are far away from the mean. Low variance indicates that most scores cluster tightly about the mean.
46Standard Deviation (SD)
- A summary statistic of how much scores vary from the mean
- Square root of the variance
  - expressed in the original units of measurement
- Used in a number of inferential statistics
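The two definitions can be sketched directly in Python. The version below divides by the number of scores, following the "average of squared distances" wording, and the scores are hypothetical.

```python
import math
import statistics

# Hypothetical scores (illustration only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]


def variance(values):
    """Average of squared distances from the mean (dividing by n)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def standard_deviation(values):
    """Square root of the variance, in the original units of measurement."""
    return math.sqrt(variance(values))


print(variance(scores), standard_deviation(scores))
# The standard library's population versions should agree.
print(statistics.pvariance(scores), statistics.pstdev(scores))
```

For these scores the variance is 4 (in squared units) and the standard deviation is 2 (in the original units).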
47Variance vs. Standard Deviation
(Table: formulas for the variance and the standard deviation, each for a population and for a sample)
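The four quantities this slide compares are conventionally written as below. The notation (N and μ for a population, n and x̄ for a sample) is the usual textbook convention and is assumed here, since the slide's own formulas did not survive. Note that the sample versions divide by n − 1 rather than n.

```latex
% Population (all N members are observed)
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad
\sigma = \sqrt{\sigma^2}

% Sample (n observations used to estimate the population values)
S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
S = \sqrt{S^2}
```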
48Skewness of distributions
- Measures look at how lopsided distributions are: how far from the ideal of the normal curve they are
- When the median and the mean are different, the distribution is skewed. The greater the difference, the greater the skew.
49
- Distributions that trail away to the left are negatively skewed and those that trail away to the right are positively skewed
- If the skewness is extreme, the researcher should either transform the data to make them better resemble a normal curve or else use a different set of statistics (nonparametric statistics) to carry out the analysis
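As a rough check on the direction of skew, the sketch below compares the mean with the median and also computes a moment-based skewness coefficient. That coefficient is one common definition, chosen here as an assumption; the slides do not say which measure they have in mind. The scores are hypothetical.

```python
import statistics

# Hypothetical scores (illustration only) with a long tail to the right.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 13]


def skewness(values):
    """Moment-based skewness: mean cubed z-score (population form)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - m) / sd) ** 3 for x in values) / len(values)


mean = statistics.fmean(scores)
median = statistics.median(scores)
print(mean, median, skewness(scores))
```

Here the mean (4.0) lies above the median (3.0) and the coefficient is positive, matching a tail that trails away to the right. Mirroring the data flips the sign.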
50Different Shapes of Distributions
Source: http://faculty.vassar.edu/lowry/f0204.gif
51Skewness of distributions
Source: http://www.polity.org.za/html/govdocs/reports/aids/images/image022.gif
52Distribution of posting frequency on Usenet
53Kurtosis
- Measures of kurtosis look at how sharply the
distribution rises to a peak and then drops away
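One common moment-based measure is excess kurtosis, sketched below. Both the choice of this particular definition and the two data sets are assumptions made for illustration; a normal curve has an excess kurtosis of 0 under this definition.

```python
import statistics


def excess_kurtosis(values):
    """Mean fourth-power z-score minus 3 (0 for a normal curve)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - m) / sd) ** 4 for x in values) / len(values) - 3


# Hypothetical data (illustration only).
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # evenly spread, no peak
peaked = [1, 5, 5, 5, 5, 5, 5, 5, 9]    # sharp central peak, two far-out cases

print(excess_kurtosis(flat), excess_kurtosis(peaked))
```

The evenly spread scores give a negative value and the sharply peaked scores a positive one.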