Numerical Methods of Descriptive Statistics

Slides: 29
Provided by: rayhag

Transcript and Presenter's Notes

1
Numerical Methods of Descriptive Statistics
  • MSIT3000
  • Lecture 2

2
Objectives
  • Learn to calculate and interpret measures of
  • Central tendency: sample mean, median, mode
  • Variability: standard deviation, variance,
    inter-quartile range
  • Learn how to construct and apply box-plots with
    all the components.
  • Text reference: 2[4-9,11]
  • HLS reference: Ch. 2.

3
Key Terms
  • Central Tendency
  • Mean
  • Median
  • Mode
  • Variability
  • Range, Inter-Quartile Range (IQR)
  • Variance or Standard Deviation
  • Skewness
  • Kurtosis

4
Sample vs Population
  • Similar measures are calculated for both, but
    they differ...
  • ...conceptually
  • ...computationally
  • ...in notation.
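
As a minimal sketch of the computational difference, Python's standard `statistics` module has separate functions for the two cases: `pvariance` divides by N (population) and `variance` divides by n - 1 (sample). The data below are hypothetical and not from the lecture.

```python
import statistics

# Hypothetical data, used only to illustrate the difference.
data = [2, 4, 4, 5, 5]

# Treat the values as the entire population: divide by N.
pop_var = statistics.pvariance(data)

# Treat the values as a sample from a larger population: divide by n - 1.
samp_var = statistics.variance(data)

print("population variance:", pop_var)
print("sample variance:    ", samp_var)
```

The sample version is always the larger of the two, because the same sum of squared deviations is divided by a smaller number.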

5
Overview
6
Measures of Central Tendency
  • Mode
  • Median
  • Mean
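
A short sketch of the three measures, using Python's standard `statistics` module on hypothetical data (not from the lecture). The single large value is there to show that the mean reacts to it while the median and mode do not.

```python
import statistics

# Hypothetical observations with one large value.
data = [1, 2, 2, 3, 7]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the ranked data
mode = statistics.mode(data)      # most frequent value

print("mean:", mean, "median:", median, "mode:", mode)
```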

7
Method of Averages
  • Sample mean vs population mean
  • The sample variance
  • The sample standard deviation

8
The sample mean
9
The Sample Variance and Sample Standard Deviation
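
The formulas on the sample mean and sample variance slides did not survive in this transcript. Assuming the lecture used the standard definitions, with observations x_1, ..., x_n, they would read:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\qquad
s = \sqrt{s^2}
```

Note that s^2 is in squared units of the data, while s is in the same units as the data.
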
10
Interpreting the Sample Standard Deviation
  • Question: why don't we interpret the sample
    variance directly?
  • Hint: What are the units of sample variance?
  • Two rules aid the interpretation of the sample
    standard deviation:
  • Chebyshev's rule
  • The Empirical rule

11
Chebyshev's Rule
  • Mathematical Theorem
  • At least 1 - 1/k^2 of the measurements will fall
    within k standard deviations of the mean (k > 1).
  • This provides a lower bound for the proportion of
    observations within that interval.
  • Assumptions: essentially none.
  • Limitations: weak result.
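
A minimal sketch comparing the Chebyshev lower bound with the fraction actually observed. The function names and the data are hypothetical, chosen for illustration; the data set is deliberately skewed to show that the bound needs no assumption about shape.

```python
import statistics

def chebyshev_bound(k):
    """Lower bound on the fraction of data within k standard deviations (k > 1)."""
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed fraction of data within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    inside = [x for x in data if abs(x - mean) <= k * s]
    return len(inside) / len(data)

# Hypothetical, strongly skewed data.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(data, k))
```

The observed fraction is typically well above the bound, which is what "weak result" means in practice.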

12
The Empirical Rule
  • Not a mathematical theorem, rather a rule of thumb.
  • Assumptions: the relative frequency distribution
    must be mound-shaped and symmetrical.
  • The rule states that approximately 68% of the
    data will be clustered within 1 standard deviation
    of the mean, 95% will be within 2s, and 99.7% of
    the data will be within 3s of x-bar.
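
The percentages in the rule match what a normal distribution gives. As a sketch, assuming the normal curve is the model for "mound-shaped and symmetrical", they can be reproduced with `statistics.NormalDist` from the Python standard library (the helper name `share_within` is hypothetical):

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```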

13
Preview of the z-score
  • The z-score measures how many standard deviations
    an observation is from the mean.
  • For example, if McDonald's has a mean profit
    margin of 12% of revenue, how many standard
    deviations from 12% is a store with a profit
    margin of 18%?
  • Answer: if s = 3, z = (18 - 12)/3 = 2
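
The same calculation as a small Python sketch (the function name `z_score` is chosen here for illustration):

```python
def z_score(x, mean, s):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / s

# Store with an 18% margin, mean margin 12%, standard deviation 3.
print(z_score(18, 12, 3))  # 2.0
```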

14
How do you develop a feel for the standard
deviation?
  • Do problems.
  • Read the text to develop a conceptual overview of
    statistics.
  • Try to frame problems in terms of standard
    deviations.
  • Try to come up with problems that can be
    addressed using the standard deviation but not
    otherwise.

15
Rank-ordering
16
Method: Rank-ordering
  • Step 1: rank the observations
  • Step 2: count and identify appropriate cut-offs
  • Step 3: use the numbers from step 2 to identify
    possible and likely outliers

17
Calculated quantities - Percentiles
  • The Pth percentile is a value of x such that
    P% of the data falls below x.
  • To determine the Pth percentile (QS 2.4):
  • 1. Rank order the data.
  • 2. Let l = n(P/100) be the location of the Pth
    percentile in the ordered data, where n is the
    number of observations.

18
Calculated quantities - Percentiles
  • 3. If l is not an integer, round l up to the
    next integer; the percentile is the data value in
    that position. If l is an integer, the percentile
    is the average of the data values in positions
    l and l + 1.
  • For example, the 50th percentile is called
    the median. If there are an odd number of
    observations (N), the median is the middle number
    of the ranked observations. If N is even, the
    median is the average of the two numbers in the
    middle.
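
The three steps can be sketched in Python as follows. This assumes the location is l = n(P/100), as used in the worked quartile example later in the lecture; the function name and the data are hypothetical. Note that library routines often use other interpolation conventions and can give slightly different values.

```python
def percentile(data, p):
    """Pth percentile by the rank-ordering rule, for 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)                 # step 1: rank order the data
    n = len(ordered)
    whole, remainder = divmod(n * p, 100)  # step 2: l = n * p / 100
    if remainder:
        # l is not an integer: round up to position whole + 1
        # (0-based index `whole`).
        return ordered[whole]
    # l is an integer: average the values in positions l and l + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2

# Hypothetical data with an even number of observations.
data = [8, 3, 5, 1, 9, 7]
print(percentile(data, 50))  # median: average of the two middle values
print(percentile(data, 25))
print(percentile(data, 75))
```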

19
Calculated quantities - Quartiles
  • In order to describe the data, we split it up
    into portions of roughly the same size.
  • The median (Q2 or 50th percentile) splits the
    data into two sets.
  • Q1 (or QL) is defined to be the 25th percentile
    and splits the first half of the data into two
    again.
  • Q3 (or QU) is defined to be the 75th percentile
    and splits the second half of the data into two.

20
Example: Assume X is the number of courses a
student is taking.
21
Step 1: Rank order the data
22
Calculate the quartiles
  • The median
  • l(M) = 7(50/100) = 3.5, so l = 4
  • We rounded up since 3.5 is not an integer.
  • Since l = 4, the median = 5
  • The lower and upper quartiles
  • l(QL) = 7(25/100) = 1.75, so l = 2
  • QL = 4
  • l(QU) = 7(75/100) = 5.25, so l = 6
  • QU = 5
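
The location arithmetic for n = 7 can be checked with a few lines of Python. Only the positions are computed here, since the data values from the example slide are not in the transcript; the helper name `location` is hypothetical.

```python
import math

def location(n, p):
    """Return the raw location n * p / 100 and the position after rounding up.

    If the raw location is already an integer, the percentile rule averages
    positions l and l + 1 instead; that case does not occur for n = 7.
    """
    raw = n * p / 100
    return raw, math.ceil(raw)

for name, p in (("QL", 25), ("median", 50), ("QU", 75)):
    raw, position = location(7, p)
    print(f"{name}: l = {raw}, rounded up to position {position}")
```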

23
Measure of variability
  • Range
  • Sensitive to extreme values
  • Interquartile range (IQR)
  • IQR = QU - QL
  • Easy to compute
  • Insensitive to extreme values (they lie outside
    the quartiles)
  • How does that compare to the sample standard
    deviation?
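
One way to explore the question is to change a single value into an extreme one and watch each measure. The sketch below uses hypothetical data and `statistics.quantiles` (Python 3.8+), whose default convention for quartiles is not the same as the rank-ordering rule in this lecture; for these data the conclusion does not depend on that choice.

```python
import statistics

def spread(data):
    """Return (range, IQR, sample standard deviation) for the data."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # three quartile cut points
    return max(data) - min(data), q3 - q1, statistics.stdev(data)

# Hypothetical data: identical except for the largest observation.
original = [1, 2, 3, 4, 5, 6, 7]
with_extreme = [1, 2, 3, 4, 5, 6, 70]

print("original:    ", spread(original))
print("with extreme:", spread(with_extreme))
```

The range and the standard deviation both jump when the extreme value is introduced, while the IQR is unchanged.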

24
Outliers
  • Outliers are data points that do not fit the data
    set because:
  • There was something wrong with the way the data
    was collected or
  • There was a unique change that made the outlier
    too different to compare to other data points or
  • Any other reason that makes the data points
    simply irrelevant to the rest of the data or
    problem at hand.
  • Often the most vital information is in the
    outliers!
  • How do we identify outliers without knowing
    anything more than what is included in the data?

25
Detection of outliers
  • Method of averages: z-scores
  • Three or more standard deviations away from the
    mean indicates an observation may be an outlier.
  • Method of rank-ordering
  • The upper fence: UF = QU + 1.5 IQR
  • The lower fence: LF = QL - 1.5 IQR
  • Outer fences: add (or subtract) 3 IQRs instead
    of 1.5.
  • Observations outside the fences are potential
    outliers. Those outside the outer fences are
    almost certainly outliers, but we focus on the
    inner fences.
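
A minimal sketch of the rank-ordering method, assuming the quartiles are found with the percentile rule from earlier in the lecture (location n(P/100), rounded up when not an integer). The function names and the data are hypothetical.

```python
def percentile(data, p):
    """Pth percentile by the rank-ordering rule, for 0 < p < 100."""
    ordered = sorted(data)
    whole, remainder = divmod(len(ordered) * p, 100)
    if remainder:                  # location not an integer: round up
        return ordered[whole]      # position whole + 1, 0-based index whole
    return (ordered[whole - 1] + ordered[whole]) / 2

def fences(data):
    """Return the inner and outer (lower, upper) fences."""
    ql, qu = percentile(data, 25), percentile(data, 75)
    iqr = qu - ql
    return {
        "inner": (ql - 1.5 * iqr, qu + 1.5 * iqr),
        "outer": (ql - 3 * iqr, qu + 3 * iqr),
    }

def outside(data, lower, upper):
    """Observations below the lower fence or above the upper fence."""
    return [x for x in data if x < lower or x > upper]

# Hypothetical data, not the lecture's example.
data = [2, 4, 4, 5, 5, 5, 12]
f = fences(data)
print("inner fences:", f["inner"], "potential outliers:", outside(data, *f["inner"]))
print("outer fences:", f["outer"], "beyond outer fences:", outside(data, *f["outer"]))
```

Here 2 and 12 fall outside the inner fences, and only 12 falls outside the outer fences.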

26
Graphical Summary: The Box-Plot
  • Several ways to construct one.
  • It includes all the pieces:
  • The median
  • The quartiles
  • The upper and lower fences
  • You can identify potential outliers as the
    observations outside the fences.

27
Problem in class
Find any outliers and draw a box-plot.
28
Conclusion
  • Objectives addressed:
  • Learn measures for
  • Central tendency
  • Variability
  • and how to interpret those measures.
  • Be able to calculate the sample mean, sample
    variance, and sample standard deviation, and
    construct a box-plot with all the components.