Methods for Describing Sets of Data - PowerPoint PPT Presentation

About This Presentation
Title:

Methods for Describing Sets of Data

Description:

Pareto Diagram. Graphical Methods for Describing Quantitative Data. The Data ... Pareto diagram. Graphical methods for Quantitative Data. Dot plot. Stem-and ... – PowerPoint PPT presentation

Slides: 34
Provided by: anna229

Transcript and Presenter's Notes


1
Chapter 2
  • Methods for Describing Sets of Data

2
Objectives
  • Describe Data using Graphs
  • Describe Data using Charts

3
Describing Qualitative Data
  • Qualitative data are nonnumeric in nature
  • Best described by using Classes
  • 2 descriptive measures
  • class frequency: number of data points in a
    class
  • class relative frequency = class frequency /
    total number of data points in data set
  • class percentage = class relative frequency x 100
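
The measures above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `summarize_classes` and the blood-type data are hypothetical.

```python
from collections import Counter

def summarize_classes(observations):
    """Return {class: (frequency, relative frequency, percentage)}."""
    counts = Counter(observations)
    total = len(observations)
    return {
        label: (freq, freq / total, 100 * freq / total)
        for label, freq in counts.items()
    }

# Hypothetical qualitative data: blood types of 10 donors.
blood_types = ["A", "O", "O", "B", "A", "O", "AB", "O", "A", "O"]
summary = summarize_classes(blood_types)
for label, (freq, rel, pct) in sorted(summary.items()):
    print(f"{label:>2}  {freq:2d}  {rel:.2f}  {pct:.0f}%")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any summary table.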

4
Describing Qualitative Data Displaying
Descriptive Measures
  • Summary Table

Class Frequency
Class percentage = class relative frequency x 100

5
Describing Qualitative Data Qualitative Data
Displays
  • Bar Graph

6
Describing Qualitative Data Qualitative Data
Displays
  • Pie chart

7
Describing Qualitative Data Qualitative Data
Displays
  • Pareto Diagram

8
Graphical Methods for Describing Quantitative Data
  • The Data

9
Graphical Methods for Describing Quantitative Data
  • For describing, summarizing, and detecting
    patterns in such data, we can use three graphical
    methods
  • dot plots
  • stem-and-leaf displays
  • histograms
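
As one illustration of these displays, a stem-and-leaf display can be built as plain text. The sketch below assumes non-negative two-digit integers and uses hypothetical exam scores; the function name `stem_and_leaf` is not from the slides.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group integers by tens digit (stem) and collect units digits (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

# Hypothetical exam scores, used only for illustration.
scores = [62, 75, 78, 81, 83, 83, 90, 67, 74, 88]
display = stem_and_leaf(scores)
for stem in sorted(display):
    leaves = "".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Unlike a histogram, this display keeps every individual observation recoverable from the plot.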

10
Graphical Methods for Describing Quantitative Data
  • Dot Plot

11
Graphical Methods for Describing Quantitative Data
  • Stem-and-Leaf Display

12
Graphical Methods for Describing Quantitative Data
  • Histogram

13
Graphical Methods for Describing Quantitative Data
  • More on Histograms

Number of Observations in Data Set    Number of Classes
Less than 25                          5-6
25-50                                 7-14
More than 50                          15-20
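
The rule of thumb in the table above can be turned into a small helper. This is a sketch under assumptions: the function names are hypothetical, and equal-width classes spanning the minimum to the maximum are one common choice, not necessarily the one used in the slides.

```python
def suggested_class_range(n_observations):
    """Return the (low, high) number of classes suggested by the table."""
    if n_observations < 25:
        return (5, 6)
    if n_observations <= 50:
        return (7, 14)
    return (15, 20)

def class_frequencies(values, n_classes):
    """Count values in n_classes equal-width classes from min to max.

    Assumes the values are not all identical (class width would be zero).
    """
    low, high = min(values), max(values)
    width = (high - low) / n_classes
    counts = [0] * n_classes
    for v in values:
        # The maximum value is placed in the last class.
        index = min(int((v - low) / width), n_classes - 1)
        counts[index] += 1
    return counts

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]
low_k, high_k = suggested_class_range(len(data))
print(class_frequencies(data, low_k))
```
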
14
Summation Notation
  • Used to simplify summation instructions
  • Each observation in a data set is identified by a
    subscript
  • x1, x2, x3, x4, x5, ..., xn
  • Notation used to sum the above numbers together
    is $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$

15
Summation Notation
  • Data set of 1, 2, 3, 4
  • Are these the same? and
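
The two expressions being compared are not shown in this transcript. As an assumption, the sketch below contrasts a pair that is often compared with this notation: the sum of the squares and the square of the sum, evaluated on the data set 1, 2, 3, 4.

```python
data = [1, 2, 3, 4]

sum_x = sum(data)                            # 1 + 2 + 3 + 4
sum_of_squares = sum(x ** 2 for x in data)   # 1 + 4 + 9 + 16
square_of_sum = sum(data) ** 2               # (1 + 2 + 3 + 4) squared

print(sum_x, sum_of_squares, square_of_sum)  # prints: 10 30 100
```

The two quantities differ, so the position of the exponent relative to the summation sign matters.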

16
Numerical Measures of Central Tendency
  • Central Tendency: tendency of data to center
    about certain numerical values
  • 3 commonly used measures of Central Tendency
  • Mean
  • Median
  • Mode

17
Numerical Measures of Central Tendency
  • The Mean
  • Arithmetic average of the elements of the data
    set
  • Sample mean denoted by $\bar{x}$
  • Population mean denoted by $\mu$
  • Calculated as $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ and $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

18
Numerical Measures of Central Tendency
  • The Median
  • Middle number when observations are arranged in
    order
  • Median denoted by m
  • Identified as the $\frac{n+1}{2}$th observation if n is odd, and
    the mean of the $\frac{n}{2}$th and $(\frac{n}{2}+1)$th observations if n is even
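
The position rule for the median can be written out directly. A minimal sketch; the function name `median_by_position` is hypothetical.

```python
def median_by_position(values):
    """Median: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the (n + 1) / 2-th observation, i.e. 0-based index (n - 1) // 2.
        return ordered[(n - 1) // 2]
    # Even n: mean of the (n / 2)-th and (n / 2 + 1)-th observations.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_by_position([5, 1, 3]))      # odd n
print(median_by_position([1, 2, 3, 4]))   # even n
```

The data must be sorted first; the rule refers to positions in the ordered list, not in the order of collection.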

19
Numerical Measures of Central Tendency
  • The Mode
  • The most frequently occurring value in the data
    set
  • Data set can be multi-modal (have more than one
    mode)
  • Data displayed in a histogram will have a modal
    class: the class with the largest frequency

20
Numerical Measures of Central Tendency
  • The Data set 1 3 5 6 8 8 9 11 12
  • Mean $\bar{x} = \frac{1+3+5+6+8+8+9+11+12}{9} = \frac{63}{9} = 7$
  • Median is the $\frac{9+1}{2}$ or 5th observation, 8
  • Mode is 8
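
The three values for this data set can be checked with Python's standard `statistics` module. This is only a verification sketch; `multimode` requires Python 3.8 or later.

```python
import statistics

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]

mean = statistics.mean(data)        # 63 / 9
median = statistics.median(data)    # 5th of the 9 ordered observations
modes = statistics.multimode(data)  # list of all most-frequent values

print(mean, median, modes)
```

`multimode` returns a list, so a multi-modal data set would yield more than one entry.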

21
Numerical Measures of Variability
  • Variability: the spread of the data across
    possible values
  • 3 commonly used measures of Variability
  • Range
  • Variance
  • Standard Deviation

22
Numerical Measures of Variability
  • The Range
  • Largest measurement minus the smallest
    measurement
  • Loses sensitivity when data sets are large
  • These 2 distributions have the same range.
  • How much does the range tell you about the data
    variability?
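
To make the question concrete, the sketch below builds two hypothetical samples (not the distributions pictured on the slide) that share the same range but differ in spread, using the standard deviation as the comparison measure.

```python
import statistics

def data_range(values):
    """Largest measurement minus the smallest measurement."""
    return max(values) - min(values)

# Two hypothetical samples with identical extremes but different spread.
clustered = [0, 5, 5, 5, 5, 5, 10]
spread_out = [0, 1, 3, 5, 7, 9, 10]

print(data_range(clustered), data_range(spread_out))
print(statistics.stdev(clustered), statistics.stdev(spread_out))
```

The range depends only on the two extreme values, so it cannot distinguish between these two samples.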

23
Numerical Measures of Variability
  • The Sample Variance ($s^2$)
  • The sum of the squared deviations from the mean
    divided by (n-1). Expressed as units squared
  • Why square the deviations? The sum of the
    deviations from the mean is zero
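
Both points can be checked numerically. A minimal sketch using the nine-value data set from the central tendency example; the function name `sample_variance` is hypothetical.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    return sum(d ** 2 for d in deviations) / (n - 1)

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]
mean = sum(data) / len(data)

deviation_sum = sum(x - mean for x in data)   # positive and negative deviations cancel
variance = sample_variance(data)              # 104 / 8

print(deviation_sum, variance)
```

Squaring removes the signs, so the deviations no longer cancel and the result reflects the spread.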

24
Numerical Measures of Variability
  • The Sample Standard Deviation (s)
  • The positive square root of the sample variance
  • Expressed in the original units of measurement

25
Numerical Measures of Variability
  • Samples and Populations - Notation

                      Sample    Population
Variance              $s^2$     $\sigma^2$
Standard Deviation    $s$       $\sigma$
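
The sample and population versions differ only in the divisor: (n - 1) for a sample, N for a population. Python's standard `statistics` module provides both, as this short sketch shows on the nine-value data set used earlier.

```python
import math
import statistics

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]

s2 = statistics.variance(data)       # sample variance, divides by n - 1
sigma2 = statistics.pvariance(data)  # population variance, divides by N
s = statistics.stdev(data)           # square root of the sample variance
sigma = statistics.pstdev(data)      # square root of the population variance

print(s2, sigma2, s, sigma)
```

Treating the same values as a full population gives a slightly smaller variance than treating them as a sample.
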
26
Numerical Measures of Relative Standing
  • Descriptive measures of relationship of a
    measurement to the rest of the data
  • Common measures
  • percentile ranking
  • z-score

27
Numerical Measures of Relative Standing
  • Percentile rankings make use of the pth
    percentile
  • The median is an example of a percentile.
  • Median is the 50th percentile: 50% of
    observations lie above it, and 50% lie below it
  • For any p, the pth percentile has p% of the
    measures lying below it, and (100-p)% above it
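
A simple way to see this is to compute the percentage of observations below a given value. The sketch below uses a hypothetical helper, `percent_below`. With small samples and tied values, textbooks and software differ in the exact percentile definition, so this is one convention among several.

```python
def percent_below(values, x):
    """Percentage of observations strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]
print(round(percent_below(data, 9), 1))   # share of the data below 9
```
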

28
Numerical Measures of Relative Standing
  • z-score: the distance between a measurement x
    and the mean, expressed in standard units
  • Use of standard units allows comparison across
    data sets
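
For a sample, the z-score is computed as z = (x - mean) / s, where s is the sample standard deviation. A minimal sketch; the function name `z_score` is hypothetical.

```python
import statistics

def z_score(x, values):
    """Distance of x from the sample mean, in sample standard deviations."""
    return (x - statistics.mean(values)) / statistics.stdev(values)

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]
print(round(z_score(12, data), 2))   # 12 lies above the mean, so z is positive
print(z_score(7, data))              # the mean itself has z = 0
```

A negative z-score means the measurement lies below the mean.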

29
Numerical Measures of Relative Standing
  • More on z-scores
  • Z-scores follow the empirical rule for mounded
    distributions
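
The empirical rule says that in mound-shaped, symmetric data roughly 68%, 95%, and 99.7% of measurements have z-scores within 1, 2, and 3 of zero. The sketch below checks this on simulated data. The normal distribution, its parameters, and the seed are assumptions chosen for illustration only.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
sample = [rng.gauss(50, 10) for _ in range(10_000)]   # simulated mound-shaped data

mean = statistics.mean(sample)
sd = statistics.stdev(sample)
z_scores = [(x - mean) / sd for x in sample]

def share_within(k):
    """Fraction of z-scores with absolute value at most k."""
    return sum(1 for z in z_scores if abs(z) <= k) / len(z_scores)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```
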

30
Methods for Detecting Outliers
  • Outlier: an observation that is unusually large
    or small relative to the data values being
    described
  • Causes
  • Invalid measurement
  • Misclassified measurement
  • A rare (chance) event
  • 2 detection methods
  • Box Plots
  • z-scores

31
Methods for Detecting Outliers
  • Box Plots
  • based on quartiles, values that divide the
    dataset into 4 groups
  • Lower Quartile QL: 25th percentile
  • Middle Quartile: median
  • Upper Quartile QU: 75th percentile
  • Interquartile Range (IQR) = QU - QL
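
The quartiles and IQR can be computed with `statistics.quantiles` (Python 3.8 or later). Quartile conventions vary between textbooks and software; this sketch uses the module's default method, which may not match the method used in the slides.

```python
import statistics

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]

# n=4 asks for the three cut points that split the data into four groups.
q_lower, q_mid, q_upper = statistics.quantiles(data, n=4)
iqr = q_upper - q_lower

print(q_lower, q_mid, q_upper, iqr)
```

The middle cut point coincides with the median of the data set.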

32
Methods for Detecting Outliers
  • Box Plots
  • Not on plot: inner and outer fences, which
    determine potential outliers

33
Methods for Detecting Outliers
  • Rules of thumb
  • Box Plots
  • measurements between inner and outer fences are
    suspect
  • measurements beyond outer fences are highly
    suspect
  • Z-scores
  • Scores beyond ±3 in mounded distributions (±2 in
    highly skewed distributions) are considered
    outliers
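
Both rules of thumb can be sketched in code. The fence positions are an assumption here, since the slides do not state them: inner fences at 1.5 x IQR and outer fences at 3 x IQR beyond the quartiles is a common convention. The function names and the two extreme values added to the data are hypothetical.

```python
import statistics

def classify_by_fences(values):
    """Label each value 'ok', 'suspect', or 'highly suspect' using box-plot fences."""
    q_lower, _, q_upper = statistics.quantiles(values, n=4)
    iqr = q_upper - q_lower
    # Assumed conventional multipliers: 1.5 * IQR (inner), 3 * IQR (outer).
    inner = (q_lower - 1.5 * iqr, q_upper + 1.5 * iqr)
    outer = (q_lower - 3 * iqr, q_upper + 3 * iqr)
    labels = {}
    for v in values:
        if v < outer[0] or v > outer[1]:
            labels[v] = "highly suspect"
        elif v < inner[0] or v > inner[1]:
            labels[v] = "suspect"
        else:
            labels[v] = "ok"
    return labels

def z_score_outliers(values, threshold=3):
    """Values whose z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs((v - mean) / sd) > threshold]

# Hypothetical sample: the earlier nine values plus two extreme ones.
data = [1, 3, 5, 6, 8, 8, 9, 11, 12, 25, 40]
labels = classify_by_fences(data)
print(labels[25], "/", labels[40])
print(z_score_outliers(data), z_score_outliers(data, threshold=2))
```

In a sample this small, a single extreme value inflates the standard deviation, so the z-score rule with a threshold of 3 can miss a point that the fences flag.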