Data observation and Descriptive Statistics - PowerPoint PPT Presentation

About This Presentation
Title:

Data observation and Descriptive Statistics

Description:

Kurtosis. leptokurtic. mesokurtic. platykurtic. Skewness, Number of Modes, and Kurtosis in Distribution of Housing Prices. z - Scores ... – PowerPoint PPT presentation

Slides: 51
Provided by: Spy3

Transcript and Presenter's Notes

1
Data observation and Descriptive Statistics
2
Organizing Data
  • Frequency distribution
  • Table that contains all the scores along with the
    frequency (or number of times) the score occurs.
  • Relative frequency: proportion of the total
    observations included in each score.

3
Frequency distribution
Amount f(frequency) rf(relative frequency)
0.00 2 0.125
0.13 1 0.0625
0.93 1 0.0625
1.00 1 0.0625
10.00 1 0.0625
32.00 1 0.0625
45.53 1 0.0625
56.00 1 0.0625
60.00 1 0.0625
63.25 1 0.0625
74.93 1 0.0625
80.00 1 0.0625
85.28 1 0.0625
115.35 1 0.0625
120.00 1 0.0625

n = 16 1.00
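The table above can be rebuilt from the raw scores. The following is a minimal sketch, assuming the 16 amounts are exactly the values listed in the table; the function name `frequency_distribution` is hypothetical and not from the slides.

```python
from collections import Counter

# The 16 amounts listed in the frequency table (0.00 occurs twice).
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]


def frequency_distribution(scores):
    """Return rows of (score, f, rf), sorted by score.

    f  = number of times the score occurs
    rf = f divided by the total number of observations
    """
    n = len(scores)
    counts = Counter(scores)
    return [(score, f, f / n) for score, f in sorted(counts.items())]


table = frequency_distribution(amounts)
for score, f, rf in table:
    print(f"{score:7.2f} {f:2d} {rf:.4f}")
```

The relative frequencies always sum to 1, which is a quick check that no score was dropped.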
4
Organizing data
  • Class interval frequency distribution
  • Scores are grouped into intervals and presented
    along with frequency of scores in each interval.
  • Appears more organized, but does not show the
    exact scores within the interval.
  • To calculate the range or width of the interval:
  • (Highest score - lowest score) / number of intervals
  • Ex: (120 - 0) / 5 = 24
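The width calculation and the grouping step can be sketched as follows. This assumes equal-width intervals that start at the lowest score, which is a simplification of the integer-labelled intervals on the next slide; the function names are hypothetical.

```python
# The 16 amounts from the frequency distribution slide.
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]


def interval_width(scores, k):
    """(highest score - lowest score) / number of intervals."""
    return (max(scores) - min(scores)) / k


def class_interval_counts(scores, k):
    """Count how many scores fall into each of k equal-width intervals."""
    low = min(scores)
    width = interval_width(scores, k)
    if width == 0:
        raise ValueError("all scores are identical; no intervals to form")
    counts = [0] * k
    for x in scores:
        # The highest score would land in bin k, so clamp it into the last bin.
        index = min(int((x - low) / width), k - 1)
        counts[index] += 1
    return counts


width = interval_width(amounts, 5)
counts = class_interval_counts(amounts, 5)
print(width, counts)
```

For these data the counts agree with the class interval table, even though the interval boundaries are defined slightly differently.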

5
Class interval frequency distribution
Class interval f (frequency) rf ( relative frequency)
0-24 6 .375
25-48 2 .125
49-73 3 .1875
74-98 3 .1875
99-124 2 .125

n = 16 1.00
6
Graphs
  • Bar graphs
  • Data that are collected on a nominal scale.
  • Qualitative variables or categorical variables.
  • Each bar represents a separate (discrete)
    category, so the bars do not touch.
  • The bars on the x-axis can be placed in any
    order.

7
Bar Graph
8
Graphs
  • Histograms
  • To illustrate quantitative variables
  • Scores represent changes in quantity.
  • Bars touch each other and represent a variable
    with increasing values.
  • The values of the variable being measured have a
    specific order and cannot be changed.

9
Histogram
10
Frequency polygon
  • Line graph for quantitative variables
  • Represents continuous data (time, age, weight)

11
Frequency Polygon
  • AGE
  • 22.06
  • 24.05
  • 25.04
  • 25.04
  • 25.07
  • 25.07
  • 26.03
  • 26.11
  • 27.03
  • 27.11
  • 29.03
  • 29.05
  • 29.05
  • 34
  • 37.1
  • 53

12
Descriptive Statistics
  • Numerical measures that describe
  • Central tendency of distribution
  • Width of distribution
  • Shape of distribution

13
Central tendency
  • Describe the middleness of a data set
  • Mean
  • Median
  • Mode

14
Mean
  • Arithmetic average
  • Used for interval and ratio data
  • Formula for population mean (µ, pronounced
    "mu"):
  • $\mu = \frac{\sum X}{N}$
  • Formulas for sample mean

15
Mean
Amount f(frequency) rf(relative frequency)
0.00 2 0.125
0.13 1 0.0625
0.93 1 0.0625
1.00 1 0.0625
10.00 1 0.0625
32.00 1 0.0625
45.53 1 0.0625
56.00 1 0.0625
60.00 1 0.0625
63.25 1 0.0625
74.93 1 0.0625
80.00 1 0.0625
85.28 1 0.0625
115.35 1 0.0625
120.00 1 0.0625

Mean = 46.53 n = 16 1.00
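As a check on the value in the table, here is a minimal sketch of the mean calculation, assuming the 16 amounts are exactly the values listed. The unrounded result is 46.525, which the slide reports as 46.53.

```python
# The 16 amounts from the table (0.00 occurs twice).
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]


def mean(scores):
    """Arithmetic average: sum of the scores divided by their count."""
    return sum(scores) / len(scores)


m = mean(amounts)
print(f"sum = {sum(amounts):.2f}, n = {len(amounts)}, mean = {m:.3f}")
```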
16
Mean
  • Not a good indicator of central tendency if
    distribution has extreme scores (high or low).
  • High scores pull the mean higher
  • Low scores pull the mean lower

17
Median
  • Middle score of a distribution once the scores
    are arranged in increasing or decreasing order.
  • Used when the mean might not be a good indicator
    of central tendency.
  • Used with ratio, interval and ordinal data.

18
Median
0.00 0.00
0.13
0.93
1.00
10.00
32.00
45.53
56.00
60.00
63.25
74.93
80.00
85.28
115.35
120.00
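With 16 scores there is no single middle score, so the median is the average of the 8th and 9th scores. The sketch below uses Python's standard `statistics` module; the extreme value 1200.00 is a made-up score, added only to illustrate how an outlier moves the mean but not the median.

```python
import statistics

# The 16 amounts listed above, already in increasing order.
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]

# Even number of scores: average of the two middle scores (45.53 and 56.00).
median = statistics.median(amounts)

# Hypothetical data: replace the highest score with an extreme value.
with_outlier = amounts[:-1] + [1200.00]

print("median:", median, "->", statistics.median(with_outlier))
print("mean:  ", statistics.mean(amounts), "->", statistics.mean(with_outlier))
```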
19
Mode
  • The score that occurs in the distribution with
    the greatest frequency.
  • 0 modes: no mode
  • 1 mode: unimodal
  • 2 modes: bimodal distribution
  • 3 modes: trimodal distribution

20
Mode
Amount f(frequency) rf(relative frequency)
0.00 2 0.125
0.13 1 0.0625
0.93 1 0.0625
1.00 1 0.0625
10.00 1 0.0625
32.00 1 0.0625
45.53 1 0.0625
56.00 1 0.0625
60.00 1 0.0625
63.25 1 0.0625
74.93 1 0.0625
80.00 1 0.0625
85.28 1 0.0625
115.35 1 0.0625
120.00 1 0.0625

Mean = 46.53 n = 16 1.00
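Finding the mode amounts to picking the score (or scores) with the highest frequency. The sketch below is one possible implementation; treating a data set in which every score occurs once as having "no mode" is an assumption that follows the convention on the previous slide, and `find_modes` is a hypothetical name.

```python
from collections import Counter

# The 16 amounts from the table (0.00 occurs twice).
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]


def find_modes(scores):
    """Return the most frequent score(s), or [] when there is no mode."""
    if not scores:
        return []
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        # Assumption: if no score repeats, report no mode.
        return []
    return sorted(score for score, f in counts.items() if f == top)


modes = find_modes(amounts)
print(modes, "-> number of modes:", len(modes))
```

For the amounts in the table the only repeated score is 0.00, so the distribution is unimodal.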
21
Measures of Variability
  • Range
  • From the lowest to the highest score
  • Variance
  • Average squared deviation from the mean
  • Standard deviation
  • Variation from the sample mean
  • Square root of the variance

22
Measures of Variability
  • Indicate the degree to which the scores are
    clustered or spread out in a distribution.
  • Ex: Two distributions of teacher-to-student
    ratios.
  • Which college has more variation?

College A College B
4 16
12 19
41 22
Sum = 57 Sum = 57
Mean = 19 Mean = 19
23
Range
  • The difference between the highest and lowest
    scores.
  • Provides limited information about variation.
  • Influenced by high and low scores.
  • Does not inform about variations of scores not at
    the extremes.
  • Examples
  • Range = X(highest) - X(lowest)
  • College A: range = 41 - 4 = 37
  • College B: range = 22 - 16 = 6
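The two college examples can be reproduced with a short sketch. The scores are the ones from the teacher-to-student ratio table; `score_range` is a hypothetical helper name.

```python
# Teacher-to-student ratios from the two-college example.
college_a = [4, 12, 41]
college_b = [16, 19, 22]


def score_range(scores):
    """Difference between the highest and the lowest score."""
    return max(scores) - min(scores)


range_a = score_range(college_a)
range_b = score_range(college_b)
print("College A:", range_a, "College B:", range_b)
```

Both colleges have the same mean (19), yet the ranges differ, which is why a measure of variability is needed alongside the mean.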

24
Variance
  • Limitations of range require a more precise way
    to measure variability.
  • Deviation: the degree to which the scores in a
    distribution vary from the mean.
  • Typical measure of variability: standard
    deviation (SD)
  • Variance
  • The first step in calculating standard deviation

25
Variance
  • X = number of therapy sessions each student
    attended.
  • M = 4.2

Deviation
Sum of deviations = 0
26
Variance
  • In order to eliminate negative signs, we square
    the deviations.
  • Sum the squared deviations: sum of squares, or SS
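The deviation and sum-of-squares steps can be sketched as follows, assuming the raw scores are X = 1, 2, 4, 4, 10, as listed on the later Standard Deviation slide (their mean is 4.2).

```python
# Number of therapy sessions per student (raw scores from the SD slide).
sessions = [1, 2, 4, 4, 10]

m = sum(sessions) / len(sessions)          # mean
deviations = [x - m for x in sessions]     # X - M for each score
squared = [d ** 2 for d in deviations]     # squaring removes negative signs
ss = sum(squared)                          # sum of squares (SS)

print("mean:", m)
print("sum of deviations:", round(sum(deviations), 10))
print("SS:", round(ss, 2))
```

The raw deviations always cancel out to zero (up to floating-point rounding), which is why they are squared before being summed.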

27
Variance
  • Take the average of the SS
  • Ex: SS = 48.80
  • $SD^2 = \frac{\sum (X - M)^2}{N}$
  • That is, the average of the squared deviations
    from the mean
  • $SD^2 = 9.76$

28
Standard Deviation
  • Standard deviation
  • Typical amount that the scores vary or deviate
    from the sample mean
  • $SD = \sqrt{\frac{\sum (X - M)^2}{N}}$
  • That is, the square root of the variance
  • Since we take the square root, this value is back
    in the original units of the scores, so it is
    more representative of the distribution of the
    scores.

29
Standard Deviation
  • X = 1, 2, 4, 4, 10
  • M = 4.2
  • SD = 3.12 (standard deviation)
  • SD² = 9.76 (variance)
  • Always ask yourself: do these values (mean and SD)
    make sense based on the raw scores?
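The variance and standard deviation above can be checked with Python's standard `statistics` module. This is a sketch that uses the divide-by-N (population) versions, `pvariance` and `pstdev`, because those match the formulas on the preceding slides.

```python
import statistics

# Raw scores from the slide.
sessions = [1, 2, 4, 4, 10]

variance = statistics.pvariance(sessions)  # SS / N
sd = statistics.pstdev(sessions)           # square root of the variance

print(f"variance = {variance:.2f}, SD = {sd:.2f}")
```

As a sanity check, an SD of about 3 around a mean of 4.2 is plausible for scores that run from 1 to 10.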

30
Population Standard Deviation
  • The average amount that the scores in a
    distribution vary from the mean.
  • Population standard deviation
  • (σ, pronounced "sigma")

32
Sample Standard Deviation
  • Sample is a subset of the population.
  • Use sample SD to estimate population SD.
  • Because samples are smaller than populations,
    there may be less variability in a sample.
  • To correct for this, we divide the sum of squares
    by N - 1 instead of N.
  • This increases the standard deviation of the sample.
  • Provides a better estimate of population standard
    deviation.
  • $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$

Unbiased Sample estimator standard deviation
Population standard deviation
33
Sample Standard Deviation
X | X - mean | (X - mean) squared
0.00 -46.53 2,165.04
0.00 -46.53 2,165.04
0.13 -46.40 2,152.96
0.93 -45.60 2,079.36
1.00 -45.53 2,072.98
10.00 -36.53 1,334.44
32.00 -14.53 211.12
45.53 -1.00 1.00
56.00 9.47 89.68
60.00 13.47 181.44
63.25 16.72 279.56
74.93 28.40 806.56
80.00 33.47 1,120.24
85.28 38.75 1,501.56
115.35 68.82 4,736.19
120.00 73.47 5,397.84

Mean = 46.53 N = 16 SS = 26,295.02
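The table's sum of squares and the resulting sample standard deviation can be reproduced with the sketch below. It assumes the 16 amounts listed in the table and uses `statistics.stdev`, which divides by N - 1, next to `statistics.pstdev`, which divides by N.

```python
import statistics

# The 16 amounts from the table.
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]

n = len(amounts)
m = sum(amounts) / n
ss = sum((x - m) ** 2 for x in amounts)   # sum of squared deviations

sample_sd = statistics.stdev(amounts)     # sqrt(SS / (N - 1))
divide_by_n_sd = statistics.pstdev(amounts)  # sqrt(SS / N)

print(f"SS = {ss:.2f}")
print(f"sample SD (N - 1) = {sample_sd:.2f}")
print(f"SD (N)            = {divide_by_n_sd:.2f}")
```

Dividing by N - 1 gives the larger value, about 41.87, which is the SD quoted for Class 1 on the later z-score slide.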
34
Types of Distributions
  • Refers to the shape of the distribution.
  • 3 types
  • Normal distribution
  • Positively skewed distribution
  • Negatively skewed distribution

35
Normal Distribution
  • Normal distribution: a specific frequency
    distribution
  • Bell shaped
  • Symmetrical
  • Unimodal
  • Many variables found in nature are approximately
    normally distributed (when samples are large).

36
Normal Distribution
Mean, median and mode are equal and located in the
center.
37
Normal Distribution
38
Skewed distributions
  • When our data are not symmetrical
  • Positively skewed distribution
  • Negatively skewed distribution
  • Memory hint: the skew is where the tail is. The
    tail looks like a skewer, and it points in the
    direction of the skew (positive or negative).
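The direction of skew can also be read from a number. The sketch below uses the moment coefficient of skewness (the average cubed deviation divided by the cubed standard deviation); this statistic is not named in the slides and is offered here only as one common way to quantify skew. The data are the therapy-session scores from the variance slides, whose long tail (the score of 10) lies on the high side.

```python
import statistics


def skewness(scores):
    """Moment coefficient of skewness: mean cubed deviation / SD cubed.

    Positive: tail points toward high scores.
    Negative: tail points toward low scores.
    Zero:     symmetrical.
    """
    m = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    n = len(scores)
    return sum((x - m) ** 3 for x in scores) / n / sd ** 3


sessions = [1, 2, 4, 4, 10]          # tail on the right
mirrored = [-x for x in sessions]    # same shape, flipped: tail on the left

skew = skewness(sessions)
print(f"sessions: {skew:.3f}")
print(f"mirrored: {skewness(mirrored):.3f}")
```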

39
Skewed Distributions
40
Kurtosis
  • Kurtosis - how flat or peaked a distribution is.
  • Tall and skinny versus short and wide
  • Mesokurtic: normal
  • Leptokurtic: tall and thin
  • Platykurtic: short and fat (squatty like a
    platypus!)

41
Kurtosis
leptokurtic
platykurtic
mesokurtic
42
Skewness, Number of Modes, and Kurtosis in
Distribution of Housing Prices
43
z - Scores
  • In which country (US vs. England) is Homer
    Simpson considered overweight?
  • How can we make this comparison?
  • Need to convert weight in pounds and kilograms
    to a standardized scale.
  • Z- scores allow for scores from different
    distributions to be compared under standardized
    conditions.
  • The need for standardization
  • Putting two different variables on the same scale
  • z-score: transforming raw scores into
    standardized scores
  • $z = \frac{X - \mu}{\sigma}$
  • Tell us the number of standard deviations a score
    is from the mean.

44
z- Scores
  • Class 1: M = 46.53, SD = 41.87, X = 54.76
  • Class 2: M = 53.67, SD = 18.23, X = 89.07
  • In which class did I have more money in
    comparison to the distribution of the other
    students?
  • Sample z-score: $z = \frac{X - M}{s}$
  • When we convert raw scores from different
    distributions to z-scores, these scores become
    part of the same z distribution and we
    can compare scores from different distributions.
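The two-class comparison can be worked through with a short sketch. The means, SDs and raw scores are the values given on this slide; `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Values from the slide.
z_class_1 = z_score(54.76, mean=46.53, sd=41.87)
z_class_2 = z_score(89.07, mean=53.67, sd=18.23)

print(f"Class 1: z = {z_class_1:.2f}")
print(f"Class 2: z = {z_class_2:.2f}")
```

The Class 2 score sits almost two standard deviations above its class mean, while the Class 1 score is only about a fifth of a standard deviation above its mean, so the Class 2 amount is the larger one relative to the other students.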

45
z Distribution
  • Characteristics (regardless of the original
    distributions)
  • z score at the mean equals 0
  • Standard deviation equals 1
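These two properties can be verified numerically. The sketch below standardizes the therapy-session scores from the variance slides; it assumes the divide-by-N standard deviation is used both to compute the z-scores and to measure their spread (using the N - 1 version consistently in both places gives the same result).

```python
import statistics


def standardize(scores):
    """Convert every raw score to a z-score."""
    m = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(x - m) / sd for x in scores]


sessions = [1, 2, 4, 4, 10]
z_scores = standardize(sessions)

print([round(z, 2) for z in z_scores])
print("mean of z:", round(statistics.fmean(z_scores), 10))
print("SD of z:  ", round(statistics.pstdev(z_scores), 10))
```

Standardizing changes the scale but not the shape: a skewed distribution of raw scores stays skewed after conversion to z-scores.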

46
z distribution of exam scores
M = 70, s = 10
47
Standard normal distribution
  • If a z-distribution is normal, then we refer to
    it as a standard normal distribution.
  • Provides information about the proportion of
    scores that are higher or lower than any other
    score in the distribution.

48
Standard Normal Curve Table
  • Standard normal curve table (Appendix A)
  • Statisticians provided the proportion of scores
    that fall between any two z-scores.
  • What is the percentile rank of a z score of 1?
  • Percentile rank: proportion of scores at or
    below a given raw score.
  • Ex: SAT score = 1350, M = 1120, s = 340
  • 75th percentile
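Instead of looking the value up in a printed table, the proportion below a z-score can be computed with `statistics.NormalDist` from Python's standard library. This is a sketch that assumes the scores are normally distributed; `percentile_rank` is a hypothetical helper name.

```python
from statistics import NormalDist


def percentile_rank(x, mean, sd):
    """Percentage of a normal distribution at or below the raw score x."""
    z = (x - mean) / sd
    return 100 * NormalDist().cdf(z)   # standard normal: mean 0, SD 1


# z-score of 1 (a score one SD above the mean).
rank_z1 = percentile_rank(1, mean=0, sd=1)

# SAT example from the slide.
rank_sat = percentile_rank(1350, mean=1120, sd=340)

print(f"z = 1:      {rank_z1:.1f}")
print(f"SAT = 1350: {rank_sat:.1f}")
```

The SAT score works out to a z-score of about 0.68, which corresponds to roughly the 75th percentile, in line with the slide.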

49
Percentile Rank
  • The percentage of scores that your score is
    higher than.
  • 89th percentile rank for height
  • You are taller than 89% of the students in the
    class. (you are tall!)
  • Homer Simpson: 4th percentile rank for
    intelligence.
  • He is smarter than 4% of the population (or
    96% of the population is smarter than Homer).
  • GRE score 88th percentile rank
  • Reading scores of grammar school 18th percentile
    rank

50
Review
  • Data organization
  • Frequency distribution, bar graph, histogram and
    frequency polygon.
  • Descriptive statistics
  • Central tendency: middleness of a distribution
  • Mean, median and mode
  • Measures of variation: the spread of a
    distribution
  • Range, standard deviation
  • Distributions can be normal or skewed (positively
    or negatively).
  • Z- scores
  • Method of transforming raw scores into standard
    scores for comparisons.
  • Standard normal distribution: mean z-score = 0
    and standard deviation = 1
  • Normal curve table shows the proportion of
    scores that fall below a given z-score.