Chapter 6. Descriptive Statistics


Provided by: queueK


6.1 Experimentation
6.2 Data Presentation
6.3 Sample Statistics
6.4 Examples

Data: a mixture of nature and noise.
Is the noise manageable?
Ideally, the noise can be represented by a
probability distribution.
Statistical inference:
the science of deducing properties of an
underlying probability distribution from data.
Can we obtain information on the underlying
probability distribution?
The information is given in the form of
(functions of) the data.

3
Figure 6.1 The relationship between probability
theory and statistical inference
4
6.1 Experimentation
6.1.1 Samples

Population: the set of all the possible
observations available from a particular
probability distribution.
Sample: a subset of a population.
Random sample: a sample whose elements are
chosen at random from the population.
A sample should be representative of the
population.
Types of observations: numerical and categorical
(nominal)
- numerical observation: either integers or
real numbers
- categorical observation: e.g., a machine
breakdown classified as mechanical, electrical,
or misuse
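As a rough illustration of drawing a random sample, the sketch below treats the integers 1 to 100 as a stand-in population (a hypothetical choice, not data from the slides) and picks 10 elements so that every element is equally likely to be chosen.

```python
import random

# Hypothetical population: the integers 1..100 stand in for
# "all the possible observations". This is not data from the slides.
population = list(range(1, 101))

# A fixed seed makes the sketch repeatable; real sampling would not fix it.
rng = random.Random(0)

# random.sample draws without replacement, so no element is chosen twice.
sample = rng.sample(population, k=10)
print(sample)
```

Whether such a sample is representative still depends on how the population itself was defined, which is the question the examples that follow raise.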

5
6.1.2 Examples

Example 1: Machine breakdowns
Suppose that an engineer in charge of the
maintenance of a machine keeps records of the
breakdown causes over a period of a year.
Suppose that 46 breakdowns were observed by the
engineer (see Figure 6.2).
What is the population from which this sample is
drawn?
Factors to consider when checking how
representative the data are:
- Quality of operators
- Workload on the machine
- Peculiarities of the observation period (e.g.,
more rainy days than in other years)

6
Figure 6.2 Data set of machine breakdowns
How representative is this year's data set of
future years?
7
Figure 6.4 Data set of defective computer chips

Example 2: Defective computer chips
The chip boxes are selected at random.

Points to check on data
What is the data type?
Are the data representative?
How is the randomness of the data realized?
Statistical problem
What is the population from which the data are
sampled?

9
6.2 Data presentation

6.2.1 Bar and Pareto charts
6.2.2 Pie charts
6.2.3 Histograms
6.2.4 Outliers

10
Figure 6.7 Bar chart of machine breakdowns data
set

A bar chart uses bars to represent the data.
The length of each bar is proportional to the
frequency of its category.

11
Figure 6.9 Pareto chart of customer complaints
for Internet company

A Pareto chart is a bar chart where the categories
are arranged in order of decreasing frequency.
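The ordering behind a Pareto chart can be sketched in a few lines. The complaint records below are hypothetical (they are not the data behind Figure 6.9), and breaking ties alphabetically is an assumption made only to keep the output deterministic.

```python
from collections import Counter

# Hypothetical complaint records, not the data from Figure 6.9.
records = ["billing", "speed", "outage", "speed", "billing", "speed", "support"]

counts = Counter(records)

# Sort categories by decreasing frequency; ties are broken alphabetically
# (an assumption made so the order is deterministic).
pareto_order = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Text rendering: bar length is proportional to the frequency.
for category, freq in pareto_order:
    print(f"{category:<8} {'#' * freq} {freq}")
```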

12
Figure 6.12 Pie chart for machine breakdowns data
set

Pie charts emphasize the proportion of each
category.
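The proportions a pie chart displays are simply each category's share of the total. In the sketch below the counts are hypothetical (they are not the Figure 6.2 breakdown data); each share is converted to the angle of its slice.

```python
# Hypothetical category counts, not the Figure 6.2 data.
counts = {"mechanical": 20, "electrical": 10, "misuse": 10}

total = sum(counts.values())

# Share of the total for each category, and the matching slice angle in degrees.
shares = {category: n / total for category, n in counts.items()}
angles = {category: 360 * share for category, share in shares.items()}

for category in counts:
    print(f"{category:<10} {shares[category]:.0%} {angles[category]:.0f} degrees")
```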

13
Figure 6.14 Histogram of computer chips data set

Histograms are used to represent numerical data.

14
Figure 6.16 Histograms of metal cylinder
diameter data set with different bandwidths
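To see how the choice of bandwidth changes a histogram, the sketch below bins the same small data set with two different bin widths. The data are hypothetical integers (not the metal cylinder diameters), and the helper name `histogram_counts` is invented for this illustration.

```python
from collections import Counter

def histogram_counts(data, width):
    """Count integer observations in bins [j*width, (j+1)*width).

    Returns a dict mapping each bin's left edge to its frequency.
    """
    bins = Counter(x // width for x in data)
    return {j * width: bins[j] for j in sorted(bins)}

# Hypothetical integer measurements, not the cylinder diameter data.
data = [1, 3, 4, 7, 8, 8, 12, 15, 19]

narrow = histogram_counts(data, width=5)   # four bins
wide = histogram_counts(data, width=10)    # two bins
print(narrow)
print(wide)
```

A narrow bandwidth shows more detail but noisier bars; a wide one smooths the shape and can hide features such as a second mode.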
15
Figure 6.18 A histogram with positive (or right)
skewness
The right-hand tail is longer and flatter than
the left-hand tail
16
Figure 6.19 A histogram with negative (or left)
skewness
17
Figure 6.20 A histogram for a bimodal
distribution
18
Figure 6.21 Histogram of a data set with a
possible outlier
An outlier is an observation which is not from
the distribution from which the main body of the
sample is collected.
19
6.3 Sample statistics

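6.3.1 Sample mean of a data set $x_1, \ldots, x_n$:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
6.3.2 Sample median
the value in the middle of the ordered data
points $x_{(1)} \le \cdots \le x_{(n)}$
ex) if $n = 2k+1$ (odd), the sample median is $x_{(k+1)}$
if $n = 2k$ (even), the sample median is $(x_{(k)} + x_{(k+1)})/2$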
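A minimal sketch of both statistics, assuming the usual conventions: the mean is the sum divided by n, and the median is the middle ordered value when n is odd or the average of the two middle ordered values when n is even. The function names are invented for this illustration; the odd-length data set 2, 5, 6, 7, 8 is the one used in Section 6.3.6.

```python
def sample_mean(data):
    """Arithmetic average of the observations."""
    return sum(data) / len(data)

def sample_median(data):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    xs = sorted(data)
    n = len(xs)
    k = n // 2
    if n % 2 == 1:
        # n = 2k + 1: the (k+1)-th ordered value, which is index k when counting from 0
        return xs[k]
    # n = 2k: average of the k-th and (k+1)-th ordered values
    return (xs[k - 1] + xs[k]) / 2

print(sample_mean([2, 5, 6, 7, 8]))    # 5.6
print(sample_median([2, 5, 6, 7, 8]))  # 6
print(sample_median([2, 5, 6, 7]))     # 5.5
```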
6.3.3 Sample trimmed mean
A trimmed mean is obtained by deleting some of
the largest and some of the smallest data
observations.
Usually a 10% trimmed mean is employed, where the
top 10% of the data points are removed together
with the bottom 10% of the data points.
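The trimming step can be sketched as follows. The data are hypothetical, the function name is invented for this illustration, and rounding the number of trimmed points down when 10% of n is not a whole number is an assumption; the slides do not say how that case is handled.

```python
def trimmed_mean(data, proportion=0.10):
    """Mean after dropping the given proportion of points from each end."""
    xs = sorted(data)
    # Number of points to drop from each end. Rounding down is an assumption.
    m = int(len(xs) * proportion)
    kept = xs[m:len(xs) - m]
    return sum(kept) / len(kept)

# Hypothetical data with one very large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print(sum(data) / len(data))  # ordinary mean: 14.5
print(trimmed_mean(data))     # 10% trimmed mean: 5.5 (drops 1 and 100)
```

The large value pulls the ordinary mean far above most of the data, while the trimmed mean stays close to the bulk of the observations.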

20
Figure 6.22 Illustrative data set
21
Figure 6.23 Relationship between the sample mean,
median, and trimmed mean for positively and
negatively skewed data sets
22

6.3.4 Sample mode
For categorical or discrete data, the sample
mode may be used to denote the category or data
value that contains the largest number of data
observations.
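Finding the mode amounts to counting each category and taking the most frequent one. The sketch below returns every category tied for the largest count, which is a design choice of this illustration; the records are hypothetical and the function name is invented.

```python
from collections import Counter

def sample_modes(data):
    """Return all values that occur the largest number of times."""
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, c in counts.items() if c == top]

# Hypothetical breakdown records, not the Figure 6.2 data.
records = ["mechanical", "electrical", "mechanical", "misuse"]
print(sample_modes(records))          # ['mechanical']
print(sample_modes([1, 1, 2, 2, 3]))  # [1, 2] (a tie)
```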
6.3.5 Sample variance ($s^2$)
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$, where $\bar{x}$ is the sample mean
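A minimal sketch of the sample variance, assuming the common definition that divides the sum of squared deviations from the sample mean by n - 1 (the slide transcript does not preserve the formula, so the divisor is an assumption). The function name is invented for this illustration.

```python
def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed divisor)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

# For 2, 5, 6, 7, 8 the mean is 5.6 and the squared deviations sum to 21.2,
# so the variance is 21.2 / 4 = 5.3 (up to floating-point rounding).
print(sample_variance([2, 5, 6, 7, 8]))
```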

6.3.6 The 100p-th Sample Percentile (p-th Sample
Quantile) and sample quartiles
The 100p-th sample percentile is the value such
that at least 100p percent of the data are less
than or equal to it and at least 100(1-p) percent
are greater than or equal to it. If two data
values satisfy the condition, the 100p-th sample
percentile is the arithmetic average of these two
values.
Example: the data set 2, 5, 6, 7, 8
- p = 0.1, i.e., the 10th percentile: the sample
percentile is 2
- p = 0.2, i.e., the 20th percentile: the sample
percentile is (2 + 5)/2 = 3.5
The 25th sample percentile: the first quartile
The 50th sample percentile: the second quartile,
or the sample median
The 75th sample percentile: the third quartile
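The definition above can be checked directly in code: collect the data values that satisfy both conditions and average them. The function name is invented for this illustration, and converting p through `Fraction(str(p))` is an implementation choice made here so that the comparisons are exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

def sample_percentile(data, p):
    """100p-th sample percentile, computed straight from the definition."""
    xs = sorted(data)
    n = len(xs)
    p = Fraction(str(p))  # exact arithmetic, e.g. 0.2 becomes 1/5
    candidates = [
        x for x in sorted(set(xs))
        if sum(y <= x for y in xs) >= p * n           # at least 100p% are <= x
        and sum(y >= x for y in xs) >= (1 - p) * n    # at least 100(1-p)% are >= x
    ]
    # One qualifying data value, or two whose average is taken.
    return sum(candidates) / len(candidates)

data = [2, 5, 6, 7, 8]
print(sample_percentile(data, 0.1))   # 2.0
print(sample_percentile(data, 0.2))   # 3.5
print(sample_percentile(data, 0.25))  # 5.0 (first quartile)
print(sample_percentile(data, 0.5))   # 6.0 (median)
print(sample_percentile(data, 0.75))  # 7.0 (third quartile)
```

Statistical software often uses interpolation rules that differ from this definition, so library percentile functions may return slightly different values for the same data.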