Chapter 6. Descriptive Statistics - PowerPoint PPT Presentation

Provided by: queueK
1
Chapter 6. Descriptive Statistics
  • 6.1 Experimentation
  • 6.2 Data Presentation
  • 6.3 Sample Statistics
  • 6.4 Examples

2
  • Data: a mixture of nature and noise.
  • Is the noise manageable?
  • Ideally, the noise can be represented by a
    probability distribution.
  • Statistical inference: the science of deducing
    properties of an underlying probability
    distribution from data.
  • Can we obtain information on the underlying
    probability distribution?
  • The information is given in the form of
    (functions of) data.

3
Figure 6.1 The relationship between probability
theory and statistical inference
4
6.1 Experimentation
6.1.1 Samples
  • Population: the set of all possible
    observations available from a particular
    probability distribution.
  • Sample: a subset of a population.
  • Random sample: a sample whose elements are
    chosen at random from the population.
  • A sample should be representative of the
    population.
  • Types of observations: numerical and categorical
    (nominal)
  • - Numerical observation: either integers or
    real numbers.
  • - Categorical observation: e.g., a machine
    breakdown classified as mechanical, electrical,
    or misuse.
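
The idea of a random sample can be sketched in a few lines of Python. This is only an illustration: the population below is a hypothetical list of labelled items, and `draw_random_sample` is a helper name introduced here, not something from the slides.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Draw n distinct elements uniformly at random (without replacement)."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: 100 labelled items (an assumption for illustration).
population = [f"item-{i}" for i in range(100)]
sample = draw_random_sample(population, 10, seed=1)
print(sample)
```

Every element of the population has the same chance of being selected, which is what makes the sample a plausible representative of the population.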

5
6.1.2 Examples
  • Example 1: Machine breakdowns
  • Suppose that an engineer in charge of the
    maintenance of a machine keeps records on the
    breakdown causes over a period of a year.
  • Suppose that 46 breakdowns were observed by the
    engineer (see Figure 6.2).
  • What is the population from which this sample is
    drawn?
  • Factors to consider when checking whether the
    data are representative:
  • Quality of operators
  • Workload on the machine
  • Peculiarities of the observation period (e.g.,
    more rainy days than in other years)

6
Figure 6.2 Data set of machine breakdowns
How representative is this year's data set of
future years?
7
Figure 6.4 Data set of defective computer chips
  • Example 2: Defective computer chips
  • The chip boxes are selected at random.

8
  • Points to check on data
  • What is the data type?
  • Are the data representative?
  • How is the randomness of the data realized?
  • Statistical problem
  • What is the population from which the data are
    sampled?

9
6.2 Data presentation
  • 6.2.1 Bar and Pareto charts
  • 6.2.2 Pie charts
  • 6.2.3 Histograms
  • 6.2.4 Outliers

10
Figure 6.7 Bar chart of machine breakdowns data
set
  • A bar chart uses bars to represent the data.
  • The length of each bar is proportional to the
    frequency of its category.

11
Figure 6.9 Pareto chart of customer complaints
for Internet company
  • A Pareto chart is a bar chart where the
    categories are arranged in order of decreasing
    frequency.
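
As a rough sketch, the ordering behind a Pareto chart is just a frequency count sorted from largest to smallest. The observations below are hypothetical (they are not the data of Figure 6.2 or Figure 6.9), and `pareto_order` is a name made up for this example.

```python
from collections import Counter

def pareto_order(observations):
    """Return (category, count) pairs sorted by decreasing frequency."""
    counts = Counter(observations)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical categorical sample (assumption for illustration only).
sample = ["mechanical", "electrical", "mechanical",
          "misuse", "mechanical", "electrical"]
pareto = pareto_order(sample)

# Text-mode bar chart: bar length is proportional to the frequency.
for category, count in pareto:
    print(f"{category:<12}{'#' * count} {count}")
```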

12
Figure 6.12 Pie chart for machine breakdowns data
set
  • Pie charts emphasize the proportion of each
    category.

13
Figure 6.14 Histogram of computer chips data set
  • Histograms are used to represent numerical data.

14
Figure 6.16 Histograms of metal cylinder
diameter data set with different bandwidths
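
The effect of the bin width on a histogram can be sketched as follows. The data values are hypothetical integers chosen for illustration (not the metal cylinder diameters), and `histogram_counts` is a helper introduced here; it assumes every observation is at least `start`.

```python
import math

def histogram_counts(data, start, width):
    """Count observations in bins [start + j*width, start + (j+1)*width)."""
    counts = {}
    for x in data:
        j = math.floor((x - start) / width)   # index of the bin containing x
        counts[j] = counts.get(j, 0) + 1
    n_bins = max(counts) + 1
    return [counts.get(j, 0) for j in range(n_bins)]

# Hypothetical data set (assumption for illustration only).
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
narrow = histogram_counts(data, start=0, width=2)
wide = histogram_counts(data, start=0, width=5)
print(narrow)
print(wide)
```

With the narrow bins the isolated value 9 shows up as a separate bar after an empty bin; with the wide bins it is absorbed into the second bar, so the choice of width changes what the histogram reveals.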
15
Figure 6.18 A histogram with positive (or right)
skewness
The right-hand tail is longer and flatter than
the left-hand tail
16
Figure 6.19 A histogram with negative (or left)
skewness
17
Figure 6.20 A histogram for a bimodal
distribution
18
Figure 6.21 Histogram of a data set with a
possible outlier
An outlier is an observation that does not come
from the same distribution as the main body of
the sample.
19
6.3 Sample statistics
  • 6.3.1 Sample mean of a data set $x_1, \ldots, x_n$:
    $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
  • 6.3.2 Sample median
  • the value in the middle of the ordered data
    points $x_{(1)} \le \cdots \le x_{(n)}$
  • ex) if $n = 2k+1$ (odd), the sample median is
    $x_{(k+1)}$
  • if $n = 2k$ (even), the sample median is
    $(x_{(k)} + x_{(k+1)})/2$
  • 6.3.3 Sample trimmed mean
  • A trimmed mean is obtained by deleting some of
    the largest and some of the smallest data
    observations.
  • Usually a 10% trimmed mean is employed, where
    the top 10% of the data points are removed
    together with the bottom 10% of the data points.
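
The three location measures above can be sketched in Python as follows. The data set is hypothetical (it is not the illustrative data of Figure 6.22), and rounding the number of trimmed points down is an assumption of this sketch, since the slides do not say how to handle a non-integer 10%.

```python
def sample_mean(data):
    """Arithmetic average of the observations."""
    return sum(data) / len(data)

def sample_median(data):
    """Middle ordered value (odd n) or average of the two middle values (even n)."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

def trimmed_mean(data, proportion=0.10):
    """Mean after dropping the given proportion of points from each end."""
    xs = sorted(data)
    # Number of points removed per end, rounded down (assumption); the tiny
    # offset guards against floating-point results such as 2.9999999.
    k = int(len(xs) * proportion + 1e-9)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

# Hypothetical positively skewed data set (assumption for illustration).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sample_mean(data), sample_median(data), trimmed_mean(data))
```

On this data the single large value pulls the sample mean well above the median, while the trimmed mean discards it and stays close to the median.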

20
Figure 6.22 Illustrative data set
21
Figure 6.23 Relationship between the sample mean,
median, and trimmed mean for positively and
negatively skewed data sets
22
  • 6.3.4 Sample mode
  • For categorical or discrete data, the sample
    mode may be used to denote the category or data
    value that contains the largest number of data
    observations.
  • 6.3.5 Sample variance ($s^2$):
    $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$,
    where $\bar{x}$ is the sample mean
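
A minimal sketch of the sample mode and the sample variance is given below. The transcript does not preserve the variance formula, so the usual definition with divisor n - 1 is assumed here; the function names and the categorical data are also introduced only for illustration.

```python
from collections import Counter

def sample_mode(data):
    """Return every value that attains the largest count, in sorted order."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

# Hypothetical categorical observations (assumption for illustration).
causes = ["mechanical", "electrical", "mechanical", "misuse"]
print(sample_mode(causes))

# Numerical data set that also appears in the percentile example of this chapter.
print(sample_variance([2, 5, 6, 7, 8]))
```

The sample standard deviation $s$ is the square root of this quantity.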

23
  • 6.3.6 The 100p-th Sample Percentile (p-th Sample
    Quantile) and sample quartiles
  • The 100p-th sample percentile is the value such
    that at least 100p percent of the data are less
    than or equal to it and at least 100(1-p) percent
    are greater than or equal to it. If two data
    values satisfy the condition, the 100p-th sample
    percentile is the arithmetic average of these two
    values.
  • Example: the data set 2, 5, 6, 7, 8
  • - p = 0.1, i.e., the 10th percentile: the sample
    percentile is 2
  • - p = 0.2, i.e., the 20th percentile: the sample
    percentile is (2+5)/2 = 3.5
  • The 25th sample percentile: the first quartile
  • The 50th sample percentile: the second quartile,
    or the sample median
  • The 75th sample percentile: the third quartile
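
The percentile definition above can be checked on the data set 2, 5, 6, 7, 8 with a short sketch. It applies the stated condition literally to the data values and averages when two of them qualify; `sample_percentile` is a name made up here, and statistical libraries often use other interpolation rules, so their results may differ.

```python
def sample_percentile(data, p):
    """100p-th sample percentile per the definition above (0 < p < 1)."""
    xs = sorted(data)
    n = len(xs)
    eps = 1e-9  # tolerance for floating-point products such as n * p
    # Data values v with at least 100p% of the data <= v
    # and at least 100(1-p)% of the data >= v.
    candidates = sorted({
        v for v in xs
        if sum(x <= v for x in xs) >= n * p - eps
        and sum(x >= v for x in xs) >= n * (1 - p) - eps
    })
    # One qualifying value: return it. Two: return their average.
    return (candidates[0] + candidates[-1]) / 2

data = [2, 5, 6, 7, 8]
print(sample_percentile(data, 0.1))    # 10th percentile
print(sample_percentile(data, 0.2))    # 20th percentile
print([sample_percentile(data, p) for p in (0.25, 0.5, 0.75)])  # quartiles
```

For this data set the sketch reproduces the two values worked out above (2 and 3.5) and gives 5, 6, and 7 as the three quartiles.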

24
  • 6.3.7 Boxplots

Figure 6.24 Boxplot of a data set
25
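  • 6.3.8 Coefficient of variation (CV)
  • measures the spread of the data relative to the
    middle value: $CV = s / \bar{x}$, where $s$ is the
    sample standard deviation and $\bar{x}$ is the
    sample mean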
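
A brief sketch of the coefficient of variation follows. It assumes the usual definition, the sample standard deviation divided by the sample mean, which the transcript does not spell out, and uses Python's standard `statistics` module (whose `stdev` uses the n - 1 divisor).

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the sample mean (assumed definition)."""
    return statistics.stdev(data) / statistics.mean(data)

data = [2, 5, 6, 7, 8]
rescaled = [10 * x for x in data]   # same data in different units

cv = coefficient_of_variation(data)
cv_rescaled = coefficient_of_variation(rescaled)
print(cv, cv_rescaled)
```

Because both the standard deviation and the mean scale with the unit of measurement, the two values agree, which is why the CV is useful for comparing the spread of data sets measured on different scales.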

26
  • Recall Chebyshev's inequality
  • Let
  • Then,
  • In general,
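  • Theorem (the weak law of large numbers)
  • Let $X_1, X_2, \ldots$ be a sequence of i.i.d.
    random variables, each having mean $\mu$ and
    variance $\sigma^2$.
  • Then, for any $\epsilon > 0$,
    $P(|\bar{X}_n - \mu| > \epsilon) \to 0$ as
    $n \to \infty$, where
    $\bar{X}_n = (X_1 + \cdots + X_n)/n$.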

27
  • (proof)
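
The proof itself is not preserved in this transcript. The following is a sketch of the standard argument via Chebyshev's inequality, which may differ in detail from the one on the slide. The notation is an assumption: $X_1, X_2, \ldots$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, and $\bar{X}_n$ is the average of the first $n$ of them.

```latex
% Chebyshev's inequality (standard form): for a random variable Y with
% mean m and finite variance v, and any c > 0,
\[
  P\bigl(|Y - m| \ge c\bigr) \le \frac{v}{c^{2}} .
\]
% Mean and variance of the sample average (the variance step uses independence):
\[
  E[\bar{X}_n] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu ,
  \qquad
  \operatorname{Var}(\bar{X}_n)
    = \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(X_i)
    = \frac{\sigma^{2}}{n} .
\]
% Apply Chebyshev's inequality with Y = \bar{X}_n and c = \epsilon:
\[
  P\bigl(|\bar{X}_n - \mu| > \epsilon\bigr)
    \le P\bigl(|\bar{X}_n - \mu| \ge \epsilon\bigr)
    \le \frac{\sigma^{2}}{n\,\epsilon^{2}}
    \longrightarrow 0
  \quad \text{as } n \to \infty .
\]
```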

28
  • Homework 6
  • Read Chapter 6.
  • Review Chapters 1 through 5.
  • Midterm Exam
  • Date: 10.24 (Wed)
  • Time: 9:00-11:00