Descriptive statistics - PowerPoint PPT Presentation

Slides: 58
Provided by: JimH112

Transcript and Presenter's Notes

Title: Descriptive statistics


1
Descriptive statistics
for one variable

2
What to describe?
  • What is the location or center of the data?
    (measures of location)
  • How do the data vary? (measures of
    variability).

3
Types of statistics
  • Descriptive statistics: numerical and graphic
    procedures to summarize a collection of data in a
    clear and understandable way
  • Inferential statistics: procedures to draw
    inferences about a population from a sample

4
Reasons for using statistics
  • aid in summarization
  • aid in getting at what's going on
  • aid in extracting information from the data
  • aid in communication

5
(No Transcript)
6
Frequency distribution
  • The frequency with which observations are
    assigned to each category or point on a
    measurement scale.
  • Most basic form of descriptive statistic
  • May be expressed as a percentage of the total
    sample found in each category

Source: Reasoning with Statistics, by Frederick
Williams and Peter Monge, fifth edition, Harcourt
College Publishers.
7
Frequency distribution
  • The distribution is read differently depending
    upon the measurement level
  • Nominal scales are read as discrete measurements
    at each level (no ordering)
  • Ordinal measures show tendencies, but distances
    between categories should not be compared
    (ordering exists, but not distance)
  • Interval scales (distance exists, but no ratios)
    and ratio scales (ratios exist) allow for
    comparison among categories
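
A frequency distribution, with each category also expressed as a percentage of the total sample, can be sketched as follows. This is a minimal illustration only: the function name `frequency_distribution` and the `responses` data are invented for the example and do not come from the slides.

```python
from collections import Counter


def frequency_distribution(observations):
    """Return {category: (count, percent of total)} for a list of observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


# Hypothetical nominal data, invented for illustration.
responses = ["web", "tv", "web", "radio", "web", "tv", "print", "web"]
table = frequency_distribution(responses)

# Nominal categories have no inherent order; sorting is only for display.
for category, (count, percent) in sorted(table.items()):
    print(f"{category:<6} {count:>3} {percent:6.1f}%")
```

For nominal data like this, only the counts and percentages carry meaning; the order of the rows does not.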

8
  Sex       N    Mean  Median  TrMean  StDev  SE Mean
  female  126   91.23   90.00   90.83  11.32     1.01
  male    100   96.79  110.00  105.62  17.39     1.74

  Sex     Minimum  Maximum      Q1      Q3
  female    65.00   120.00   85.00   98.25
  male      75.00   162.00   95.00  118.75

9
(No Transcript)
10
(No Transcript)
11
Source: Protecting Children from Harmful
Television: TV Ratings and the V-chip. Amy I.
Nathanson, PhD, Lecturer, University of California
at Santa Barbara; Joanne Cantor, PhD, Professor,
Communication Arts, University of
Wisconsin-Madison
12
Source: http://www.elonka.com/kryptos/ (web page
on cryptography)
13
Ancestry of US residents
14
Source: UCLA International Institute
15
(No Transcript)
16
Source: Cornell University website
17
Source: www.cit.cornell.edu/computer/students/bandwidth/charts.html
18
Source: www.cit.cornell.edu/computer/students/bandwidth/charts.html
19
Source: Verisign
20
Search engine use
21
The percentage of online searches done by US home
and work web surfers in July 2006
22
NY Times
23
Source: Verisign
24
Old Faithful Geyser
25
  • Eruption durations and waiting times to the next
    eruption, both in minutes, for 272 eruptions of
    the Old Faithful geyser (R's built-in faithful
    data set).

    > library(datasets)
    > faithful[1:10,]      # first ten rows
       eruptions waiting
    1      3.600      79
    2      1.800      54
    3      3.333      74
    4      2.283      62
    5      4.533      85
    6      2.883      55
    7      4.700      88
    8      3.600      85
    9      1.950      51
    10     4.350      85

    > summary(faithful)    # min, quartiles, median, mean, max per column
       eruptions        waiting
     Min.   :1.600   Min.   :43.0
     1st Qu.:2.163   1st Qu.:58.0
     Median :4.000   Median :76.0
     Mean   :3.488   Mean   :70.9
     3rd Qu.:4.454   3rd Qu.:82.0
     Max.   :5.100   Max.   :96.0
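
The same kind of summary can be sketched in Python for readers without R. This example uses only the ten eruption durations printed above, so its numbers differ from the full 272-row summary. The helper name `six_number_summary` is invented here, and the choice of `method="inclusive"` is an assumption made so that the quartiles interpolate the way R's default quantile rule is understood to.

```python
import statistics

# First ten eruption durations (minutes), copied from the listing above.
eruptions = [3.600, 1.800, 3.333, 2.283, 4.533,
             2.883, 4.700, 3.600, 1.950, 4.350]


def six_number_summary(values):
    """Min, Q1, median, mean, Q3 and max, in the spirit of R's summary()."""
    # "inclusive" treats the data as containing the extremes and interpolates
    # linearly between order statistics (assumed to match R's default).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "q3": q3,
        "max": max(values),
    }


summary = six_number_summary(eruptions)
for name, value in summary.items():
    print(f"{name:<7}{value:.3f}")
```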
26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
Normal distribution
  • Many characteristics are distributed through the
    population in an approximately normal manner
  • Normal curves have well-defined statistical
    properties
  • Parametric statistics, the most commonly used
    statistics, are based on the assumption that the
    variables are distributed normally
  • This is the famous bell curve, where many cases
    fall near the middle of the distribution and few
    fall very high or very low
  • Example: I.Q.
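
One of the well-defined properties of a normal curve is the share of cases that falls within a given number of standard deviations of the mean. The sketch below computes those shares. The I.Q. scaling of mean 100 and standard deviation 15 is the conventional one and is an assumption here, since the slides do not state it; `share_within` is a name invented for the example.

```python
from statistics import NormalDist

# Assumed conventional I.Q. scaling: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)


def share_within(dist, k):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)


for k in (1, 2, 3):
    low, high = iq.mean - k * iq.stdev, iq.mean + k * iq.stdev
    print(f"within {k} SD ({low:.0f} to {high:.0f}): {share_within(iq, k):.1%}")
```

The shares are the same for any normal distribution, whatever its mean and standard deviation; only the score limits change.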

30
Statistical properties of the normal distribution
31
(No Transcript)
32
I.Q. distribution
33
(No Transcript)
34
Measures of central tendency
  • Mode (Mo): the most frequent score in a
    distribution
  • good for nominal data
  • Median (Md): the midpoint or midscore in a
    distribution
  • (50% of cases above / 50% of cases below)
  • insensitive to extreme cases
  • used with interval or ratio data (also meaningful
    for ordinal data)

Source: Reasoning with Statistics, by Frederick
Williams and Peter Monge, fifth edition, Harcourt
College Publishers.
35
Measures of central tendency
  • Mean
  • The average score: total score divided by the
    number of scores
  • has a number of useful statistical properties
  • many statistics are based on the mean
  • however, it is sensitive to outliers: extreme
    cases that just happened to end up in your
    sample by chance
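
The different sensitivity of the mode, median and mean to an extreme score can be seen in a small sketch. The scores are invented for illustration, and the single outlier of 60 is an assumption chosen to make the effect obvious.

```python
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 7]
with_outlier = scores + [60]  # one extreme case added


def central_tendency(values):
    """Return (mode, median, mean) for a list of scores."""
    return (statistics.mode(values),
            statistics.median(values),
            statistics.mean(values))


before = central_tendency(scores)
after = central_tendency(with_outlier)

print("without outlier: mode=%s median=%s mean=%.2f" % before)
print("with outlier:    mode=%s median=%s mean=%.2f" % after)
```

The mode and median stay at 5 in both cases, while the mean moves from about 4.44 to 10.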

36
Index of central tendency
Source: http://www.uwsp.edu/psych/stat/5/skewnone.gif
37
Source: Scianta.com
38
Source: www.wilderdom.com/.../L2-1UnderstandingIQ.html
39
Source: CSAP's Data Pathways
40
Measures of dispersion
  • Look at how widely scattered over the scale the
    scores are
  • Groups with identical means can be more or less
    diverse
  • To find out how the group is distributed, we need
    to know how far or close individual members are
    from the mean
  • Like the mean, only meaningful for interval or
    ratio-level measures

41
Measures of dispersion
  • Range: distance between the highest and lowest
    scores in a distribution
  • Sensitive to extreme scores
  • Compensate by calculating the interquartile range
    (distance between the 25th and 75th percentile
    points), which represents the range of scores for
    the middle half of a distribution
  • Usually used in combination with other measures
    of dispersion
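
A short sketch of why the interquartile range compensates for the range's sensitivity to extreme scores. The data are invented, and the quartile rule (`method="inclusive"`) is one of several conventions and is an assumption here; other rules give slightly different quartiles.

```python
import statistics


def value_range(values):
    """Distance between the highest and lowest scores."""
    return max(values) - min(values)


def interquartile_range(values):
    """Distance between the 25th and 75th percentile points."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical scores, invented for illustration.
scores = [10, 12, 13, 15, 16, 18, 20, 21, 23]
with_outlier = scores + [95]

for label, data in (("original", scores), ("with outlier", with_outlier)):
    print(f"{label:<13} range={value_range(data):>3}  IQR={interquartile_range(data):.2f}")
```

Adding the single score of 95 stretches the range from 13 to 85, while the interquartile range only moves from 7 to 7.25.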

42
Range
Source: www.animatedsoftware.com/statglos/sgrange.htm
43
Source: http://pse.cs.vt.edu/SoSci/converted/Dispersion_I/box_n_hist.gif
44
  • Average Deviation (Mean Deviation)
  • Merits
    1. Easy to calculate and understand.
    2. It can be calculated from any average.
    3. It is less affected by extreme observations.
  • Demerits
    1. It is not well suited to further algebraic
       treatment because it ignores the signs of the
       deviations.
    2. As it can be calculated from any average, it
       does not have certainty (i.e., it is not a
       well-defined measure).
    3. Its use is very limited in statistical work.
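
The points above can be made concrete with a small sketch. The function name `average_deviation` and the scores are invented for illustration. The sketch shows why the signs must be dropped (signed deviations about the mean cancel to zero) and why the result depends on which average is chosen as the center.

```python
import statistics


def average_deviation(values, center=statistics.mean):
    """Mean of the absolute deviations from a chosen average (mean by default)."""
    c = center(values)
    return sum(abs(x - c) for x in values) / len(values)


# Hypothetical scores, invented for illustration.
scores = [1, 2, 3, 4, 10]

# Signed deviations about the mean always cancel out.
signed_sum = sum(x - statistics.mean(scores) for x in scores)

about_mean = average_deviation(scores)                       # center = mean (4)
about_median = average_deviation(scores, statistics.median)  # center = median (3)

print("sum of signed deviations:", signed_sum)
print("average deviation about the mean:  ", about_mean)
print("average deviation about the median:", about_median)
```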

45
Measures of dispersion
  • Variance (S²)
  • Average of squared distances of individual points
    from the mean
  • High variance means that scores tend to lie far
    from the mean. Low variance indicates that most
    scores cluster tightly about the mean.

46
Standard Deviation (SD)
  • A summary statistic of how much scores vary from
    the mean
  • Square root of the Variance
  • expressed in the original units of measurement
  • Used in a number of inferential statistics
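
A minimal sketch of variance and standard deviation using Python's standard library. The scores are invented for illustration. The `p`-prefixed functions treat the data as a whole population (divide by N), while the others treat it as a sample (divide by n - 1).

```python
import statistics

# Hypothetical scores, invented for illustration; their mean is 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(scores)   # average squared distance from the mean
pop_sd = statistics.pstdev(scores)       # square root of the population variance
samp_var = statistics.variance(scores)   # divides by n - 1 instead of n
samp_sd = statistics.stdev(scores)       # square root of the sample variance

print(f"population: variance={pop_var:.3f}  SD={pop_sd:.3f}")
print(f"sample:     variance={samp_var:.3f}  SD={samp_sd:.3f}")
```

The standard deviation is in the original units of the scores, while the variance is in squared units.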

47
Variance vs. Standard Deviation
Standard Deviation
Variance
Population
Sample
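
The formulas on this slide were images and did not survive in the transcript. The standard textbook definitions are given below as an assumption about what the slide showed, with N the population size, n the sample size, μ the population mean and x̄ the sample mean.

```latex
% Population
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad
\sigma = \sqrt{\sigma^2}

% Sample
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
s = \sqrt{s^2}
```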
48
Skewness of distributions
  • Measures look at how lopsided distributions
    are: how far from the ideal of the normal curve
    they are
  • When the median and the mean are different, the
    distribution is skewed. The greater the
    difference, the greater the skew.

49
  • Distributions that trail away to the left are
    negatively skewed and those that trail away to
    the right are positively skewed
  • If the skewness is extreme, the researcher should
    either transform the data to make them better
    resemble a normal curve or else use a different
    set of statistics (nonparametric statistics) to
    carry out the analysis
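
The slides do not name a specific skewness formula. One common choice, used here as an assumption, is the moment coefficient of skewness (third central moment divided by the 1.5 power of the second). The sketch below checks its sign against the direction of the tail and compares the mean with the median; the data are invented for illustration.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical data, invented for illustration.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]   # trails away to the right
left_tailed = [-x for x in right_tailed]         # mirror image, trails left
symmetric = [1, 2, 3, 4, 5]

for label, data in (("right-tailed", right_tailed),
                    ("left-tailed", left_tailed),
                    ("symmetric", symmetric)):
    print(f"{label:<13} mean={statistics.mean(data):6.2f} "
          f"median={statistics.median(data):6.2f} skew={skewness(data):+.2f}")
```

In the right-tailed data the mean is pulled above the median and the coefficient is positive; the mirror image reverses both.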

50
Different Shapes of Distributions
Source: http://faculty.vassar.edu/lowry/f0204.gif
51
Skewness of distributions
Source: http://www.polity.org.za/html/govdocs/reports/aids/images/image022.gif
52
Distribution of posting frequency on Usenet
53
Kurtosis
  • Measures of kurtosis look at how sharply the
    distribution rises to a peak and then drops away
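
The slides do not say which measure of kurtosis is meant. A common one, assumed here, is moment-based excess kurtosis: the fourth central moment divided by the squared variance, minus 3 so that a normal curve scores zero. The data are invented for illustration.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (zero for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


# Hypothetical data, invented for illustration.
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]   # sharp peak with a few far-out cases
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]     # evenly spread, no peak

print(f"peaked: {excess_kurtosis(peaked):+.2f}")
print(f"flat:   {excess_kurtosis(flat):+.2f}")
```

Positive values indicate a sharper peak with heavier tails than a normal curve, and negative values indicate a flatter shape.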

54
(No Transcript)
55
(No Transcript)
56
(No Transcript)
57
(No Transcript)