Chapter 3 Descriptive Statistics - PowerPoint PPT Presentation

Slides: 53
Provided by: karl252

1
Chapter 3 - Descriptive Statistics
  • Given precise enough measurement, even supposedly
    constant process conditions produce differing
    responses.
  • For this reason we are not as interested in
    individual data values as we are in the pattern
    or distribution of the data as a whole.
  • We'll use graphs to display the distribution and
    statistics to describe the distribution.

2
3.1 Graphing Quantitative Data
  • Dot diagram
  • Each observation is a dot placed at a position
    corresponding to its numerical value.
  • Stem-and-leaf plot
  • Made by using the last few digits of each data
    point to indicate where it falls
  • Both displays require that each data point covers
    the same amount of space.

3
Example 3.1
  • The government requires manufacturers to monitor
    the amount of radiation emitted through the
    closed door of a microwave. The following are
    radiation amounts emitted by 24 microwaves
    measured by one manufacturer.

4
Example 3.1
  • It's easiest to first order your data.

5
Dot Diagram
[Figure: dot diagram; horizontal axis ticks at .00, .10, .20, .30]
6
Stem-and-Leaf Plot
  • Use the first digit after the decimal place as
    the stem and the second as the leaf

7
Example 3.2
  • Other examples of splitting data into stem and
    leaf

8
Stem-and-leaf Plot Vocabulary
  • Back-to-back
  • Example: men's vs. women's heights
  • Recorded in inches
  • [Figure: back-to-back stem-and-leaf plot with columns
    headed Men and Women]
  • Split Stems
  • There are multiple rows for each stem.
  • Bins are divided between digits.
  • Women's heights:
      63 | 9
      64 | 1
      64 | 7 8
      65 | 0 1 1
      65 | 5

9
Graphing Quantitative Data
  • Frequency tables
  • Break data into intervals of equal length and
    then tally the data points in each interval
  • The number of intervals we use varies
  • Every data point is included in exactly one
    interval.
  • Histograms
  • Break data into intervals of equal length and
    then create a connected bar chart
  • Begin vertical axis at zero
  • Draw bars with equal width
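
The tallying step behind a frequency table (and the bar heights of a histogram) can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table`, the choice of half-open intervals [lo, hi), and the starting point and width are assumptions, and the data are the ten battery lifetimes from the later quantile example rather than the Example 3.1 values.

```python
from collections import Counter


def frequency_table(data, start, width, n_intervals):
    """Tally data into n_intervals equal-width intervals [lo, hi).

    Returns rows of (lo, hi, count, relative_frequency).
    Every data point must land in exactly one interval.
    """
    counts = Counter()
    for x in data:
        k = int((x - start) // width)  # index of the interval holding x
        if not 0 <= k < n_intervals:
            raise ValueError(f"{x} falls outside the chosen intervals")
        counts[k] += 1
    n = len(data)
    return [
        (start + k * width, start + (k + 1) * width, counts[k], counts[k] / n)
        for k in range(n_intervals)
    ]


# Battery lifetimes (hrs) from the quantile example later in the chapter.
battery_hours = [100, 120, 80, 90, 95, 115, 120, 110, 105, 95]
table = frequency_table(battery_hours, start=80, width=10, n_intervals=5)

for lo, hi, count, rel in table:
    # A crude text histogram: one '#' per observation, bars of equal width.
    print(f"[{lo:>3}, {hi:>3})  {'#' * count:<4} {count}  {rel:.2f}")
```

Changing `width` or `start` changes the picture, which is why the number of intervals deserves some experimentation.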

10
Frequency Table
  • Data from example 3.1

11
Histogram
  • Data from example 3.1

[Figure: histogram; vertical axis ticks at 2, 4, 6; horizontal axis ticks at .03, .09, .15, .21, .27]
12
Histogram (different intervals)
[Figure: histogram; vertical axis ticks at 2, 4, 6, 8; horizontal axis ticks at .05, .15, .25]
13
BAD Histogram
[Figure: histogram; vertical axis ticks at 2, 4, 6, 8; horizontal axis ticks at .05, .15, .25]
14
Distributional Shapes
Bell-shaped
Right-skewed
Left-skewed
Bimodal (Multimodal)
Uniform (Unimodal)
Truncated
15
Bivariate Quantitative Data
  • Scatterplot
  • Plot each data point as a dot positioned according
    to the values of its response variable and
    supervised variable
  • Look for patterns
  • Run Chart
  • Plot data points in order determined by the time
    of observation
  • Look for patterns

16
Example 3.3
  • Scatterplot for ACT score and high school GPA for
    12 students

The scatterplot of GPA against ACT score shows a
fairly strong, positive, linear relationship.
17
Example 3.4
strong, positive, linear relationship
strong, negative, linear relationship
strong curved relationship
18
Example 3.4 (continued)
weak, positive, linear relationship
weak, negative, linear relationship
no relationship
19
Example 3.5
  • Run Chart - Suppose that we plot the number of
    typing errors made per minute by a typist against
    time.

Pattern: a steady increase until minute 10, then a
sharp drop followed by another steady increase. It
was discovered that the typist was given a
five-minute break after minute 10.
20
3.2 Quantiles
  • The p quantile, denoted as Q(p), is the number
    such that a fraction p of the distribution lies
    to the left of Q(p) and a fraction 1 - p lies to
    the right of Q(p).

[Figure: relative frequency distribution of height 1/2; horizontal axis ticks at 2/3, 4/3, 2; Q(1/3) = 2/3 marked]
21
Quantiles (Book)
  • For empirical distributions, the above definition
    translates to
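  • For an ordered data set $x_1 \le x_2 \le \cdots \le x_n$
  • 1. For $i = 1, 2, \ldots, n$, the $p = (i - 0.5)/n$ quantile of
    the data set is
  • the $i$th smallest data point, $x_i$. That is,
    $Q\left(\frac{i - 0.5}{n}\right) = x_i$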

22
Quantiles (Book)
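  • 2. For any $p$ not equal to $(i - 0.5)/n$ for some integer
    $i \le n$, such
  • that $0.5/n \le p \le (n - 0.5)/n$, the $p$ quantile is obtained by
    linear
  • interpolation between the two values of $Q\left(\frac{i - 0.5}{n}\right)$
  • with corresponding $(i - 0.5)/n$ that bracket $p$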

23
Finding Quantiles
  • General procedure for finding the p quantile of
    an empirical distribution
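  • Order data values $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$
  • Set $i = np + 0.5$
  • If $i = 1, 2, \ldots, n$, then $Q(p) = x_{(i)}$
  • otherwise, $Q(p) = x_{(\lfloor i \rfloor)} + (i - \lfloor i \rfloor)\left(x_{(\lceil i \rceil)} - x_{(\lfloor i \rfloor)}\right)$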
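
A minimal Python sketch of this procedure follows, reading the index rule as i = np + 0.5 and the "otherwise" case as linear interpolation between the two neighbouring ordered values. The function name `quantile` and the rounding guard against floating-point noise are assumptions of this sketch, not part of the slides.

```python
import math


def quantile(data, p):
    """Empirical p quantile: set i = n*p + 0.5, then take x_(i) or interpolate."""
    xs = sorted(data)                      # step 1: order the data
    n = len(xs)
    i = round(n * p + 0.5, 9)              # step 2 (rounded to absorb float noise)
    if not 1 <= i <= n:
        raise ValueError("p must lie between 0.5/n and (n - 0.5)/n")
    lo = math.floor(i)
    frac = i - lo
    if frac == 0:                          # i is an integer: Q(p) = x_(i)
        return float(xs[lo - 1])
    # otherwise interpolate between x_(floor(i)) and x_(ceil(i))
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


# Example: a small data set of 5 values, where Q(0.5) is the middle value.
values = [12, 3, 8, 2, 5]
print(quantile(values, 0.5))
print(quantile(values, 0.4))
```

The same function is not how every statistics package defines quantiles; several conventions exist, and this one simply follows the i = np + 0.5 rule stated above.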

24
Example 3.5
  • Ten batteries were tested to determine how long
    the batteries would last (hrs) under normal
    conditions. Below are the ten values that were
    obtained
  • 100, 120, 80, 90, 95, 115, 120, 110, 105, 95
  • Give values for Q(.35) and Q(.42)
  • Give the values for Q(.68) and Q(.90)
  • Give the values for Q(.25), Q(.50) and Q(.75)
  • Step 1: Order the data
  • 80, 90, 95, 95, 100, 105, 110, 115, 120, 120

25
Example 3.5a
  • Give the values for Q(.35) and Q(.42)

26
Example 3.5b
  • Give the values for Q(0.68) and Q(0.90)

27
Example 3.5c
  • Give the values for Q(0.25),Q(.50) and Q(0.75)
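
For reference, here is the arithmetic for parts a through c, worked under the assumption that the index rule is i = np + 0.5 with n = 10 and that non-integer i calls for linear interpolation between the neighbouring ordered values. It is worth checking each line by hand against the ordered data.

```latex
\begin{align*}
Q(.35):\quad & i = 10(.35) + 0.5 = 4   && Q(.35) = x_{(4)} = 95 \\
Q(.42):\quad & i = 10(.42) + 0.5 = 4.7 && Q(.42) = 95 + 0.7\,(100 - 95) = 98.5 \\
Q(.68):\quad & i = 10(.68) + 0.5 = 7.3 && Q(.68) = 110 + 0.3\,(115 - 110) = 111.5 \\
Q(.90):\quad & i = 10(.90) + 0.5 = 9.5 && Q(.90) = 120 + 0.5\,(120 - 120) = 120 \\
Q(.25):\quad & i = 10(.25) + 0.5 = 3   && Q(.25) = x_{(3)} = 95 \\
Q(.50):\quad & i = 10(.50) + 0.5 = 5.5 && Q(.50) = 100 + 0.5\,(105 - 100) = 102.5 \\
Q(.75):\quad & i = 10(.75) + 0.5 = 8   && Q(.75) = x_{(8)} = 115
\end{align*}
```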

28
Quantile Terminology
  • Special quantiles
  • Q(.25) = Q1, 1st quartile, lower quartile
  • Q(.5) = Q2, 2nd quartile, median
  • Q(.75) = Q3, 3rd quartile, upper quartile
  • Special values associated with quartiles
  • Inter-quartile range: IQR = Q3 - Q1
  • Upper fence: UF = Q3 + 1.5(IQR)
  • Lower fence: LF = Q1 - 1.5(IQR)

29
Boxplots
  • Another tool used to illustrate the distribution
  • Steps for making a boxplot
  • Order the observed data values
  • Find Q1, Q2, Q3, IQR, UF and LF
  • Draw a box that spans the IQR
  • Divide the box at the median (Q2)
  • Draw asterisks (or dots) for any data values less
    than the lower fence and any values greater than
    the upper fence
  • Draw a line from the sides of the box to the
    smallest value greater than the LF and the
    largest value smaller than the UF

30
Example 3.6
  • Draw a boxplot based on the 15 ordered values
    below
  • 75, 80, 80, 85, 90, 95, 95, 100, 105, 110, 110,
    115, 120, 120, 125
  • Find the necessary values

31
Example 3.6
  • Q1 = 86.25, Q2 = 100, Q3 = 113.75, so
  • IQR = 113.75 - 86.25 = 27.5 and 1.5(IQR) = 41.25
  • UF = Q(.75) + 1.5(IQR) = 155
  • LF = Q(.25) - 1.5(IQR) = 45
  • Identify all values outside the upper and lower
    fences: none
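
The numbers needed for this boxplot can be checked with a short Python sketch. The helper names `quantile` and `boxplot_summary` are hypothetical, the quantile rule assumed is i = np + 0.5 with linear interpolation, and values that sit exactly on a fence are treated as inside it, which the slides do not specify.

```python
import math


def quantile(data, p):
    """Empirical p quantile using i = n*p + 0.5 with linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    i = round(n * p + 0.5, 9)
    lo = math.floor(i)
    frac = i - lo
    if frac == 0:
        return float(xs[lo - 1])
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


def boxplot_summary(data):
    """Quartiles, IQR, fences, whisker ends, and outliers for a boxplot."""
    q1, q2, q3 = (quantile(data, p) for p in (0.25, 0.50, 0.75))
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Assumption: a value exactly on a fence counts as inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "Q1": q1, "Q2": q2, "Q3": q3, "IQR": iqr,
        "LF": lower_fence, "UF": upper_fence,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


values = [75, 80, 80, 85, 90, 95, 95, 100, 105, 110, 110, 115, 120, 120, 125]
summary = boxplot_summary(values)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With no values outside the fences, the whiskers simply run to the smallest and largest observations.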

32
Example 3.6
33
Q-Q Plots
  • Q-Q plot: Quantile-Quantile plot
  • Used for two data sets
  • We plot Q(p) for data set 1 (denoted Q1(p))
    versus Q(p) for data set 2 (denoted Q2(p))
  • We will only deal with two data sets of the same
    size
  • A straight line indicates that the two data sets
    have the same distributional shape.

34
Example 3.7
  • Data Set 1: 1, 2, 3, 4, 5
  • Data Set 2: 6, 7, 8, 9, 10

35
Example 3.8
  • Data set 1: 1, 5, 7, 8, 9, 10
  • Data set 2: -10, -9, -8, -7, -5, -1
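
When the two data sets have the same size n, both share the same p values, so the Q-Q plot reduces to pairing the ith smallest value of one set with the ith smallest value of the other. The Python sketch below builds those pairs for the two examples above and checks whether they fall on a straight line. The names `qq_pairs` and `is_straight_line` and the collinearity test are assumptions made for illustration.

```python
def qq_pairs(data1, data2):
    """Pair the i-th smallest value of each data set (equal sizes only)."""
    if len(data1) != len(data2):
        raise ValueError("this sketch only handles data sets of the same size")
    return list(zip(sorted(data1), sorted(data2)))


def is_straight_line(pairs, tol=1e-9):
    """True if every point is collinear with the first and last points."""
    (x0, y0), (x1, y1) = pairs[0], pairs[-1]
    return all(
        abs((x - x0) * (y1 - y0) - (y - y0) * (x1 - x0)) <= tol
        for x, y in pairs
    )


example_3_7 = qq_pairs([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
example_3_8 = qq_pairs([1, 5, 7, 8, 9, 10], [-10, -9, -8, -7, -5, -1])

print(example_3_7, is_straight_line(example_3_7))
print(example_3_8, is_straight_line(example_3_8))
```

In the first example the second data set is the first shifted by 5, so the shapes match; in the second the long tails point in opposite directions, so the points bend away from a line.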

36
Normal Probability Plot
  • A normal probability plot is a type of Q-Q plot
    that allows us to determine if the distribution
    of our data is bell-shaped (normal).
  • A straight line is an indication that our data is
    normal/bell-shaped.
  • An S-shaped line indicates that our data is
    skewed.

37
Table 3.10
  • Table 3.10 (page 89) in the book gives some
    quantiles for a distribution that is known to be
    bell-shaped.
  • The numbers in the body of the table give Q(p)
    for p given by the margins of the table.
  • Q(.23) = -.74
  • Row .2 and Column .03 give the p = 0.23 quantile as
    -.74
  • Q(.79) = .81
  • Row .7 and Column .09 give the p = 0.79 quantile as
    .81

38
Example 3.9
  • Annual incomes (in thousands of dollars) for 8
    families (in a common geographical location) are
    given below: 23, 31, 43, 47, 51, 58, 67, 83
  • Does this data appear to be from a bell-shaped
    distribution?
  • Remember $p = (i - 0.5)/n$
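
The points for a normal probability plot of these incomes can be computed without the printed table. The sketch below assumes p = (i - 0.5)/n for the ith smallest value and uses the standard normal quantile function from Python's `statistics` module in place of Table 3.10; the function name `normal_plot_pairs` is hypothetical.

```python
from statistics import NormalDist


def normal_plot_pairs(data):
    """Pair each ordered value with the standard normal quantile at p = (i - 0.5)/n."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (x, std_normal.inv_cdf((i - 0.5) / n))
        for i, x in enumerate(xs, start=1)
    ]


incomes = [23, 31, 43, 47, 51, 58, 67, 83]  # thousands of dollars
pairs = normal_plot_pairs(incomes)

for income, z in pairs:
    print(f"{income:>3}  {z:6.2f}")
```

Plotting income against the normal quantile and judging how close the points are to a straight line answers the question posed in the example.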

39
Example 3.9
Bell-shaped distribution?
40
3.3 Numerical Measures
  • Measures of location
  • Median
  • Same as Q(.5)
  • Not affected by skew or outliers (extreme
    observations)
  • Sample Mean
  • For data $x_1, x_2, \ldots, x_n$, the mean is
    given as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  • Strongly affected by skew or outliers

41
Example 3.10
  • Data set 1: 2, 3, 5, 8, 12
  • Median = 5
  • Mean = (2 + 3 + 5 + 8 + 12)/5 = 6
  • Data set 2: 2, 3, 5, 8, 102
  • Median = 5
  • Mean = (2 + 3 + 5 + 8 + 102)/5 = 24
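
The same comparison in Python, using the standard library's `statistics` module, shows how a single extreme value moves the mean but not the median. The variable names are just for illustration.

```python
from statistics import mean, median

data_set_1 = [2, 3, 5, 8, 12]
data_set_2 = [2, 3, 5, 8, 102]  # same data with one extreme observation

for name, data in (("Data set 1", data_set_1), ("Data set 2", data_set_2)):
    # The median depends only on the middle ordered value here;
    # the mean uses every value, so the outlier pulls it upward.
    print(f"{name}: median = {median(data)}, mean = {mean(data)}")
```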

42
Measures of Spread
  • IQR
  • Measures the spread of the middle half of the
    data
  • Not sensitive to skew or outliers
  • Range
  • Highly sensitive to outliers

43
Measures of Spread
  • Sample Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
  • How much the data is spread from the sample mean,
    $\bar{x}$.
  • Sensitive to outliers or skew
  • Sample Standard Deviation: $s = \sqrt{s^2}$
  • Sensitive to outliers or skew

44
Example 3.11
  • Same data as example 3.9: 23, 31, 43, 47,
    51, 58, 67, 83
  • Find the range
  • Find the mean
  • Find the variance
  • Find the standard deviation
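
These four summaries can be checked with Python's `statistics` module, assuming the sample variance uses the n - 1 divisor (which is what `statistics.variance` computes). The variable names are illustrative only.

```python
from statistics import mean, stdev, variance

incomes = [23, 31, 43, 47, 51, 58, 67, 83]  # thousands of dollars

data_range = max(incomes) - min(incomes)  # largest minus smallest value
x_bar = mean(incomes)                     # sample mean
s_squared = variance(incomes)             # sample variance, divisor n - 1
s = stdev(incomes)                        # sample standard deviation

print(f"range    = {data_range}")
print(f"mean     = {x_bar}")
print(f"variance = {s_squared:.3f}")
print(f"std dev  = {s:.3f}")
```

Note that the variance is in squared units (thousands of dollars squared), while the standard deviation is back in the units of the data.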

45
Understanding Standard Deviation
  • Chebyshev's Theorem
  • For any data set and any number k larger than 1,
    a fraction of at least $1 - 1/k^2$ of the
    data are within $ks$ of $\bar{x}$.
  • At least 3/4 of the data will be within 2 standard
    deviations of the mean, at least 8/9 of the data
    will be within 3 standard deviations of the mean, etc.
  • Standard deviation acts as a ruler
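
A quick numerical check of the theorem, using the eight incomes from Example 3.9, is sketched below. It assumes the bound takes the usual form 1 - 1/k^2 and that s is the sample standard deviation; the function name `fraction_within` is hypothetical.

```python
from statistics import mean, stdev


def fraction_within(data, k):
    """Fraction of the data lying within k sample standard deviations of the mean."""
    x_bar = mean(data)
    s = stdev(data)
    inside = [x for x in data if abs(x - x_bar) <= k * s]
    return len(inside) / len(data)


incomes = [23, 31, 43, 47, 51, 58, 67, 83]

for k in (1.5, 2, 3):
    bound = 1 - 1 / k**2  # Chebyshev's guaranteed minimum fraction
    actual = fraction_within(incomes, k)
    print(f"k = {k}: at least {bound:.3f} guaranteed, {actual:.3f} observed")
```

The observed fractions are typically well above the bound, since the theorem has to hold for every possible data set.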

46
Statistics vs. Parameters
  • Numerical summaries of sample data are called
    statistics.
  • Numerical summaries of population data are called
    parameters.
  • Often represented by Greek letters

47
Plots of Summary Statistics
  • Example 9 from the book
  • Three different glues are tested with three
    different types of wood (3x3 factorial study) and
    the mean strength is calculated (based on 3
    observations) for each combination.

48
Example 3.12
  • A plot of the means categorized by glue and wood
    shows that pine and fir have similar gluing
    properties with pine being stronger. The gluing
    properties of oak are much different (opposite
    trend).

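[Figure: plot of mean strength (vertical axis ticks at 100, 150, 200, 250) by glue type (white, cascamite, carpenter), with one line each for oak, pine, and fir]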
49
3.4 Statistics for Qualitative Data
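  • The fraction of items in the sample with a
    particular characteristic is
    $\hat{p} = \frac{\text{number of items in the sample with the characteristic}}{\text{number of items in the sample}}$
  • The sample mean occurrences per unit or item is
    $\hat{u} = \frac{\text{total number of occurrences}}{\text{total number of units or items}}$
  • $\hat{u}$ is closer in meaning to $\bar{x}$ than to $\hat{p}$.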

50
Example 3.13
  • A random sample of students from ISU contains
    210 freshmen, 171 sophomores, 182 juniors, and
    115 seniors.
  • Find $\hat{p}$ for each of the classifications.
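
One way to carry out this calculation is sketched below in Python, taking the sample fraction for a class to be its count divided by the total sample size. The dictionary layout and variable names are assumptions for illustration.

```python
class_counts = {
    "freshmen": 210,
    "sophomores": 171,
    "juniors": 182,
    "seniors": 115,
}

n = sum(class_counts.values())  # total number of students sampled

# Sample fraction for each classification: count in class / sample size.
p_hat = {name: count / n for name, count in class_counts.items()}

for name, fraction in p_hat.items():
    print(f"{name:<11} {class_counts[name]:>3}/{n} = {fraction:.4f}")
```

Because every student falls in exactly one classification, the four fractions add up to 1.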

51
Example 3.14
  • When studying the number of towns reporting power
    outages caused by thunderstorms, it is found that
    there were 8 storms in which no outages were
    reported, 2 storms in which 3 outages were
    reported, and 5 storms in which 1 outage was
    reported. Find $\hat{u}$.
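
A short Python sketch of this calculation follows, treating each storm as one unit and the mean occurrences per unit as total outages divided by total storms. The mapping from outages-per-storm to number of storms and the variable names are assumptions for illustration.

```python
# Key: outages reported in a storm; value: number of storms with that count.
storms_by_outages = {0: 8, 3: 2, 1: 5}

total_storms = sum(storms_by_outages.values())
total_outages = sum(outages * storms for outages, storms in storms_by_outages.items())

# Sample mean occurrences per unit: total occurrences / total units.
u_hat = total_outages / total_storms

print(f"{total_outages} outages over {total_storms} storms: u-hat = {u_hat:.3f}")
```

Unlike a sample fraction, this quantity is a rate per storm and could exceed 1 if outages were more frequent.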

52
Plotting Qualitative Data
  • Bar Charts: same as histograms, but without
    intervals
  • Segmented Bar Charts: each bar is divided
    between different levels of an additional
    variable
  • Run Chart: taken on categorical times
  • Example: daily
  • Read more in section 3.4.2