Describing Data - PowerPoint PPT Presentation

About This Presentation
Title:

Describing Data


Slides: 23
Provided by: AlokSri8

Transcript and Presenter's Notes

Title: Describing Data


1
Describing Data
  • Descriptive Statistics
  • Central Tendency and Variation

2
Lecture Objectives
You should be able to:
  • Compute and interpret appropriate measures of
    centrality and variation.
  • Recognize distributions of data.
  • Apply properties of normally distributed data
    based on the mean and variance.
  • Compute and interpret covariance and correlation.

3
Summary Measures
  • 1. Measures of Central Location: Mean, Median, Mode
  • 2. Measures of Variation: Range, Percentile, Variance,
    Standard Deviation
  • 3. Measures of Association: Covariance, Correlation

4
Measures of Central Location: The Arithmetic Mean
  • It is the arithmetic average of the data values.
  • The most common measure of central tendency.
  • Affected by extreme values (outliers).
Sample Mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
[Figure: two number-line plots. First (axis 0 to 10): Mean = 5. Second (axis 0 to 14): Mean = 6.]
5
Median
  • An important measure of central tendency.
  • In an ordered array, the median is the middle value.
  • If n is odd, the median is the middle number.
  • If n is even, the median is the average of the 2 middle numbers.
  • Not affected by extreme values.
[Figure: two number-line plots. First (axis 0 to 10): Median = 5. Second (axis 0 to 14): Median = 5.]
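The contrast between the mean and the median can be sketched in a few lines of Python. The two data sets below are hypothetical (the plotted points did not survive in this transcript); they were chosen only so that the results agree with the Mean 5 / Mean 6 and Median 5 / Median 5 labels on the slides.

```python
from statistics import mean, median

# Hypothetical data, picked to reproduce the labels on the slides.
base = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]  # same data with the largest value made extreme

base_mean, base_median = mean(base), median(base)
outlier_mean, outlier_median = mean(with_outlier), median(with_outlier)

# The mean moves from 5 to 6; the median stays at 5.
print("mean:", base_mean, "->", outlier_mean)
print("median:", base_median, "->", outlier_median)
```

A single extreme value pulls the mean toward it, while the median only depends on the middle of the ordered data.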
6
Mode
  • A measure of central tendency.
  • The value that occurs most often.
  • Not affected by extreme values.
  • There may not be a mode.
  • There may be several modes.
  • Used for either numerical or categorical data.
[Figure: two number-line plots. First (axis 0 to 14): Mode = 9. Second (axis 0 to 6): No Mode.]
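As a rough illustration of these properties, the sketch below uses `statistics.multimode` from the Python standard library. The helper `modes` and all data are hypothetical; treating "every distinct value ties" as "no mode" is an assumption made here to match the slide's wording.

```python
from statistics import multimode

def modes(values):
    """Return the list of modes, or [] when all distinct values tie (no mode)."""
    found = multimode(values)
    distinct = set(values)
    if len(distinct) > 1 and len(found) == len(distinct):
        return []  # every value occurs equally often: no mode
    return found

no_mode = modes([0, 1, 2, 3, 4, 5, 6])            # all values occur once
one_mode = modes([0, 1, 2, 9, 9, 9, 11, 14])      # 9 occurs most often
two_modes = modes([1, 1, 2, 2, 3])                # two values tie for most frequent
categorical = modes(["red", "blue", "red"])       # works for categories too

print(no_mode, one_mode, two_modes, categorical)
```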
7
Measures of Variability
  • Range: the simplest measure
  • Percentile: used with the Median
  • Variance/Standard Deviation: used with the Mean

8
Range
  • Difference between the largest and smallest observations.
  • Range ignores how the data are distributed.
  • Range = 12 - 7 = 5

9
Percentile
2008 Olympic Medal Tally for top 55 nations. What
is the percentile score for a country with 9
medals? What is the 50th percentile?
Obs  Medals | Obs  Medals | Obs  Medals | Obs  Medals | Obs  Medals
  1     110 |  12      24 |  23      10 |  34       6 |  45       3
  2     100 |  13      19 |  24       9 |  35       6 |  46       3
  3      72 |  14      18 |  25       8 |  36       6 |  47       2
  4      47 |  15      18 |  26       8 |  37       5 |  48       2
  5      46 |  16      16 |  27       7 |  38       5 |  49       2
  6      41 |  17      15 |  28       7 |  39       5 |  50       2
  7      40 |  18      14 |  29       7 |  40       4 |  51       2
  8      31 |  19      13 |  30       6 |  41       4 |  52       1
  9      28 |  20      11 |  31       6 |  42       4 |  53       1
 10      27 |  21      10 |  32       6 |  43       4 |  54       1
 11      25 |  22      10 |  33       6 |  44       3 |  55       1
10
Percentile - solutions
  • Order all data (ascending or descending).
  • Country with 9 medals ranks 24th out of 55. There
    are 31 nations (56.36%) below it and 23 nations
    (41.82%) above it. Hence it can be considered a
    57th or 58th percentile score.
  • The medal tally that corresponds to a 50th
    percentile is the one in the middle of the group,
    or the 28th country, with 7 medals. Hence the
    50th percentile (Median) is 7.
  • Now compute the first and third quartile values.
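
The steps above can be checked with a short Python sketch over the medal table. The quartile call is an assumption: the slides do not say which quartile convention to use, and textbooks differ, so this uses the default ("exclusive") method of `statistics.quantiles`, which may not match the convention intended in the lecture.

```python
from statistics import median, quantiles

# Medal tallies for observations 1..55, copied from the table.
medals = [
    110, 100, 72, 47, 46, 41, 40, 31, 28, 27, 25,
    24, 19, 18, 18, 16, 15, 14, 13, 11, 10, 10,
    10, 9, 8, 8, 7, 7, 7, 6, 6, 6, 6,
    6, 6, 6, 5, 5, 5, 4, 4, 4, 4, 3,
    3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1,
]

target = 9
below = sum(1 for m in medals if m < target)   # nations with fewer medals
above = sum(1 for m in medals if m > target)   # nations with more medals
rank = above + 1                               # rank counted from the top
pct_below = 100 * below / len(medals)
pct_above = 100 * above / len(medals)

fiftieth = median(medals)                      # the 28th of 55 ordered values

# Assumption: default "exclusive" method; other conventions can give other values.
q1, q2, q3 = quantiles(medals, n=4)

print(rank, below, above, round(pct_below, 2), round(pct_above, 2))
print(fiftieth, q1, q3)
```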

11
Box Plot
  • The box plot shows 5 points, as follows

12
Outliers
Interquartile Range (IQR) = Q3 - Q1 = 60 - 40 = 20
1 Step = 1.5 × IQR = 1.5 × 20 = 30
Q1 - 30 = 40 - 30 = 10
Q3 + 30 = 60 + 30 = 90
Any point outside the limits (10, 90) is considered an
outlier.
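The same rule can be written as a small Python sketch. The function names `outlier_limits` and `is_outlier` are hypothetical; the quartile values 40 and 60 are the ones used on the slide.

```python
def outlier_limits(q1, q3, k=1.5):
    """Return (lower, upper) limits placed one step (k * IQR) beyond the quartiles."""
    iqr = q3 - q1
    step = k * iqr
    return q1 - step, q3 + step

def is_outlier(x, lower, upper):
    """A point is an outlier if it falls outside the limits."""
    return x < lower or x > upper

lower, upper = outlier_limits(q1=40, q3=60)   # IQR = 20, step = 30
print(lower, upper)
print(is_outlier(95, lower, upper), is_outlier(50, lower, upper))
```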
13
Variance
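For the Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
For the Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$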
Variance is in squared units, and can be
difficult to interpret. For instance, if data are
in dollars, variance is in squared dollars.
14
Standard Deviation
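For the Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
For the Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$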
Standard deviation is the square root of the
variance.
15
Computing Standard Deviation
Computing Sample Variance and Standard Deviation
Mean of X = 6

X    Deviation from Mean    Squared
3    -3                     9
4    -2                     4
6     0                     0
8     2                     4
9     3                     9

Sum of Squares (SS)          26
Variance = SS / (n - 1)      6.50
Stdev = Sqrt(Variance)       2.55
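The table can be reproduced step by step in Python. This is a minimal sketch using the five X values from the slide; the final comparison against `statistics.variance` is only a cross-check that the n - 1 divisor is the one the standard library uses for a sample.

```python
from math import sqrt
from statistics import variance

x = [3, 4, 6, 8, 9]
n = len(x)

mean_x = sum(x) / n                         # 6.0
deviations = [xi - mean_x for xi in x]      # -3, -2, 0, 2, 3
squared = [d ** 2 for d in deviations]      # 9, 4, 0, 4, 9
ss = sum(squared)                           # sum of squares
sample_var = ss / (n - 1)                   # divide by n - 1 for a sample
sample_sd = sqrt(sample_var)

print(ss, sample_var, round(sample_sd, 2))
print(abs(variance(x) - sample_var) < 1e-12)  # agrees with the library
```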
16
The Normal Distribution
  • A property of normally distributed data is as
    follows

Distance from Mean       Percent of observations included in that range
1 standard deviation     Approximately 68%
2 standard deviations    Approximately 95%
3 standard deviations    Approximately 99.73%
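These coverage figures can be reproduced from the standard normal distribution. The sketch below relies on the identity P(|Z| < k) = erf(k / sqrt(2)); the function name `coverage` is hypothetical.

```python
from math import erf, sqrt

def coverage(k):
    """Fraction of a normal population lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

within = {k: coverage(k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} standard deviation(s): {100 * p:.2f}%")
```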
17
Comparing Standard Deviations
[Figure: three data sets plotted on a common axis from 11 to 21, each with Mean = 15.5, and with s = 3.338, s = 0.9258, and s = 4.57 respectively.]
18
Outliers
  • Typically, a number beyond a certain number of
    standard deviations is considered an outlier.
  • In many cases, a number beyond 3 standard
    deviations (about a 0.27% chance of occurring for
    normally distributed data) is considered an outlier.
  • If identifying an outlier is more critical, one
    can make the rule more stringent, and consider 2
    standard deviations as the limit.

19
Coefficient of Variation
  • Standard deviation relative to the mean.
  • Helps compare deviations for samples with different means.
$CV = \frac{s}{\bar{X}} \times 100\%$
20
Computing CV
  • Stock A: Average Price last year = 50
  • Standard Deviation = 5
  • Stock B: Average Price last year = 100
  • Standard Deviation = 5

Coefficient of Variation:
Stock A: CV = 10%
Stock B: CV = 5%
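A minimal Python sketch of the same calculation, assuming CV is the standard deviation divided by the mean, expressed as a percentage. The function name is hypothetical; the inputs are the stock figures from the slide.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B
print(cv_a, cv_b)
```

Both stocks have the same standard deviation, but relative to its price Stock A varies twice as much as Stock B.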
21
Standardizing Data
Obs      Age     Income       Z-Age   Z-Income
1        25      25000        -1.05   -1.13
2        28      52000        -0.86   -0.63
3        35      63000        -0.41   -0.43
4        36      74000        -0.34   -0.22
5        39      69000        -0.15   -0.31
6        45      80000         0.23   -0.11
7        48      125000        0.42    0.72
8        75      200000        2.15    2.11

Mean     41.38   86000.00
Std Dev  15.63   53973.54
Which of the two numbers for person 8 is farther
from the mean? The age of 75 or the income of
200,000?
Z scores tell us the distance from the mean,
measured in standard deviations: $Z = \frac{X - \bar{X}}{s}$.
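The Z columns can be recomputed with a short Python sketch. It assumes the sample standard deviation (n - 1 divisor), which reproduces the Std Dev row of the table; the helper `z_scores` is a hypothetical name.

```python
from statistics import mean, stdev

ages = [25, 28, 35, 36, 39, 45, 48, 75]
incomes = [25000, 52000, 63000, 74000, 69000, 80000, 125000, 200000]

def z_scores(values):
    """Distance of each value from the mean, in sample standard deviations."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 divisor)
    return [(v - m) / s for v in values]

z_age = z_scores(ages)
z_income = z_scores(incomes)

# Person 8: compare the two standardized distances.
print(round(z_age[-1], 2), round(z_income[-1], 2))
```

For person 8, the age of 75 is slightly farther from its mean than the income of 200,000 is from its mean, once both are measured in standard deviations.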
22
Measures of Association
  • Covariance and Correlation

Covariance measures the average product of the
deviations of two variables from their means.
Correlation is the standardized form of covariance
(covariance divided by the product of the two
standard deviations).
Correlation is always between -1 and 1.
         X     Y
Mean     2     9
Stdev    1     3.6

X    Dev X    Product    Dev Y    Y
1    -1       3          -3       6
2     0       0          -1       8
3     1       4           4       13

Sum of products    7
Covariance         3.5
Correlation        0.97
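
The worked example can be reproduced in Python. This sketch assumes the sample convention (sum of products divided by n - 1), which is what the table's 7 / 2 = 3.5 implies; the function names are hypothetical.

```python
from statistics import stdev

x = [1, 2, 3]
y = [6, 8, 13]

def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)]
    return sum(products) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))

cov_xy = sample_covariance(x, y)
corr_xy = correlation(x, y)
print(cov_xy, round(corr_xy, 2))
```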