Chapters 2 and 3 - PowerPoint PPT Presentation

Title:

Chapters 2 and 3

Slides: 66

Provided by: mickey9

Transcript and Presenter's Notes

1
Chapters 2 and 3

Descriptive Methods

After collecting our data, we want to
get a better understanding of its various
aspects.
Data can be described numerically or
graphically.

3
Numerically Descriptive Methods

Numerical Data: sample mean, sample median,
sample standard deviation, range, etc.
Categorical Data: sample counts or sample
proportions

4
Graphical Descriptive Methods

Numerical Data: histogram, boxplot, dotplot, stem
plot, etc.
Categorical Data: bar chart, pie chart, frequency
tables

5
Describing Numerical Data

The center of the data can be described
by the sample mean, the sample median, or
the sample mode.
The sample mean (xbar) is the usual average, and the
sample median is the middle number after sorting.
The mode is the number that occurs most often.

Suppose our data is
4 1 9 2 5
The sample mean is (4 + 1 + 9 + 2 + 5)/5 = 21/5 = 4.2.
The sample median is the middle number after sorting
(1 2 4 5 9), which is 4.
If the sample size is even, the median is the
average of the 2 middle numbers.

Suppose that in the previous dataset,
9 was misreported as 99. Then the sample median
remains at 4 but the sample mean is now 22.2.
The sample mean is more sensitive to unusual
observations, known as outliers.
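
The effect of the misreported value can be checked with a short Python sketch. It uses only the standard-library statistics module; the variable names are illustrative and do not come from the slides.

```python
from statistics import mean, median

# Data from the slide, and the same data with 9 misreported as 99.
original = [4, 1, 9, 2, 5]
misreported = [4, 1, 99, 2, 5]

mean_original = mean(original)            # 21/5
median_original = median(original)        # middle value after sorting
mean_misreported = mean(misreported)      # 111/5
median_misreported = median(misreported)  # still the middle value

print("original:   ", mean_original, median_original)
print("misreported:", mean_misreported, median_misreported)
```

The median is unchanged because only the largest value moved, while the mean is pulled toward the outlier.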

8
One of the marks is the sample mean and the other
is the sample median. Which one corresponds to
the green mark?
9
The mode

Ex: 2 1 5 4 5. The mode is 5.
Ex: 2 1 5 1 5. The modes are 1 and 5.
Ex: 2 1 3 8 4. There is no mode.
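
The three mode examples can be reproduced with a small helper. This is a sketch, not a standard library function: the name `modes` is hypothetical, and it follows the slide's convention that a sample in which every value occurs once has no mode. It assumes the data is non-empty.

```python
from collections import Counter

def modes(data):
    """Return the most frequent values, or [] when no value repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: treated as "no mode"
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([2, 1, 5, 4, 5]))
print(modes([2, 1, 5, 1, 5]))
print(modes([2, 1, 3, 8, 4]))
```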

The sample standard deviation, s, is
a measure of how spread out the data is.
The sample variance is s^2.
We could also use the range as a measure of
the variability.
Range = Max - Min
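
As a rough illustration, the sketch below computes these three measures for the earlier data 4 1 9 2 5. The slides do not spell out the variance formula, so the code assumes the usual sample definition, which divides the sum of squared deviations by n - 1. The function name is illustrative.

```python
from math import sqrt

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed definition)."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

data = [4, 1, 9, 2, 5]
s2 = sample_variance(data)           # sample variance
s = sqrt(s2)                         # sample standard deviation
data_range = max(data) - min(data)   # Range = Max - Min

print(s2, s, data_range)
```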

11
As the points move away from xbar (the mark
in the center), the standard deviation
increases. Note: The ranges of the last 3 are
about the same. The range can stay the same
while the variance increases.
12
Sample Final Problem
13
Sample Final Problem
14
Sample Final Problem
15
Describing Categorical Data

To describe categorical data, there are
only 2 statistics of interest: sample
counts and sample proportions.
Ex: Suppose 1 out of 20 people has
gum disease. The sample count is 1 and
the sample proportion is 1/20.

16
Statistic vs. Parameter

A statistic is a quantity associated with
the sample and a parameter is a quantity
associated with the population.

17
(No Transcript)
18
(No Transcript)
19
Example

A company manufactures bricks. They are
interested in their mean breaking strength.
Can they determine the average breaking strength
of 10 bricks?
Can they determine the mean breaking strength of
all bricks produced?

20
Example

Walgreens records the price of prescriptions
bought at their stores.
Can Walgreens determine the mean cost of all
prescriptions bought at Walgreens?
Can Walgreens determine the mean cost of all
prescriptions bought in the US?

21
Example 1

Use the calculator to find the following
statistics for the data below:
10 15 5 22 38 51
Statistics: sample mean, sample variance, sample
median, range
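
For readers without the calculator, the same four statistics can be obtained with Python's standard-library statistics module. This is only a stand-in for the calculator steps, which the transcript does not show.

```python
from statistics import mean, median, variance

data = [10, 15, 5, 22, 38, 51]

sample_mean = mean(data)
sample_var = variance(data)          # sample variance (n - 1 divisor)
sample_median = median(data)         # average of the 2 middle values, since n is even
data_range = max(data) - min(data)

print(sample_mean, sample_var, sample_median, data_range)
```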

22
Example 2

Find the sample mean and sample standard
deviation for the following data:
2 2 2 2 2 5 5 5 7 7 7 7 8 8 8 8 8 8
To simplify putting the data into the calculator,
the next table will be useful.

Value   Frequency
2       5
5       3
7       4
8       6

Frequency refers to the number of times
each value occurs in the sample.
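
The calculation from value/frequency pairs can be sketched as follows. The frequencies are tallied directly from the 18 observations listed in Example 2; the n - 1 divisor for the variance is the usual sample definition and is an assumption here, since the slides rely on the calculator.

```python
from math import sqrt

# value -> frequency, tallied from the Example 2 data
freq = {2: 5, 5: 3, 7: 4, 8: 6}

n = sum(freq.values())  # total sample size
xbar = sum(value * count for value, count in freq.items()) / n
s2 = sum(count * (value - xbar) ** 2 for value, count in freq.items()) / (n - 1)
s = sqrt(s2)

print(n, xbar, s)
```

Each value contributes to the sums once per occurrence, which is why it is weighted by its frequency.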

24
Example 3

Instead of knowing the
actual observations, we
only know the intervals
and the number of
observations in each.
Again, obtain the sample
mean and sample sd.
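
The table for this example did not survive in the transcript, so the sketch below uses made-up intervals and counts purely for illustration. It also assumes the common approach of treating every observation in an interval as if it sat at the interval's midpoint, which gives approximate values only; the slides do not confirm that this is the intended method.

```python
from math import sqrt

# HYPOTHETICAL intervals and counts; not the data from the slide.
# Each entry is ((lower, upper), number of observations).
grouped = [((0, 10), 3), ((10, 20), 5), ((20, 30), 2)]

# Assumption: represent each interval by its midpoint.
midpoints = [((low + high) / 2, count) for (low, high), count in grouped]

n = sum(count for _, count in midpoints)
xbar = sum(mid * count for mid, count in midpoints) / n
s2 = sum(count * (mid - xbar) ** 2 for mid, count in midpoints) / (n - 1)
s = sqrt(s2)

print(n, xbar, s)
```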

25
Graphically Describing Numerical Data

A histogram splits the data into intervals called
bins or classes. The number (frequency) or
percentage (relative frequency) of observations
in each interval is recorded. This is the height
of each bin.

26
Create a frequency histogram

Data
1.2 1.8
3.1 0.4
0.2 4.8
1.5 2.1
2.9 3.7

27
Create a relative frequency histogram

Data
1.2 1.8
3.1 0.4
0.2 4.8
1.5 2.1
2.9 3.7
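
The counts behind both histograms can be tallied with a few lines of Python. The slides do not state which bins to use, so the sketch assumes bins of width 1 starting at 0, written [0, 1), [1, 2), and so on; a different choice of bins would give different heights.

```python
from math import floor

data = [1.2, 1.8, 3.1, 0.4, 0.2, 4.8, 1.5, 2.1, 2.9, 3.7]
bin_width = 1.0  # ASSUMED; the slide does not specify the bins

# Bin i covers [i * bin_width, (i + 1) * bin_width); data is non-negative.
n_bins = floor(max(data) / bin_width) + 1
frequency = [0] * n_bins
for x in data:
    frequency[floor(x / bin_width)] += 1

# Relative frequency = count in the bin divided by the sample size.
relative_frequency = [count / len(data) for count in frequency]

for i, (count, rel) in enumerate(zip(frequency, relative_frequency)):
    low, high = i * bin_width, (i + 1) * bin_width
    print(f"[{low:.0f}, {high:.0f}): count={count} relative={rel:.1f} {'*' * count}")
```

The two histograms have the same shape; only the scale on the vertical axis differs.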

28
The height of this bin is at approximately 18
which means there are 18 observations between 140
and 160.
These numbers on the vertical axis are all counts
which makes this a frequency histogram.
This bin ranges from 140 to 160.
29
Heights of Volcanoes
30

How many volcanoes are in the sample?
How many volcanoes are more than 8000 feet tall?
What percentage of the volcanoes are less than
4000 feet tall?
How many volcanoes are between 4000 and 6000 feet
tall?

31
Boxplots

The histogram on the
right has been split into 4
pieces so that each
consists of 25% of the
data.
The marks where the
pieces are split are used to
create the boxplot.

32
(No Transcript)
33
The minimum (min) is approximately 11.
The first quartile (Q1) is approximately 17.
The second quartile (Q2) is approximately 20.
The third quartile (Q3) is approximately 27.
The maximum (max) is approximately 35.
These numbers are called the 5-number summary for
a boxplot.
34
Outliers

Outliers show up as
circles. In this case, it
is now the max.
This is the largest
observation that is
NOT an outlier.

35
Find the following

Q1
Q2
Q3
Range

Note: The Interquartile Range (IQR) is Q3 - Q1.
36
Shapes

The shape of the distribution of the data
can be classified in 3 ways:
Skewed Left
Skewed Right
Symmetric

37
Skewed Right

Most of the data (perhaps 50% or so) is on the
left and as you move to the right, the
observations become more and more sparse.

38
Skewed Left

This is basically opposite of skewed right data.
Most of the data is on the right and is more and
more sparse as we move to the left.

39
Symmetric

For symmetric data, we expect the histogram and
boxplot to be symmetric.
For the boxplot, we should see these distances
being approximately equal.

40
Dot Plots

A dot plot places a dot for each observation.
For the dotplot above, approximately what is
the sample size?
the sample median?
the range?

41
Stem Plots
Stems Leaves

For the stem plot on the
left, what is
the sample size?
range?
sample median?

42
Bar Chart

Approximately how many Toyotas are in the sample?
Can we all agree the shape is skewed left?

43
Pie Chart

If this is based on a sample of 250,
approximately
how many say they are somewhat interested in
professional soccer?

44
Z-scores

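A z-score for an observation x is defined as
z = (x - mean) / (standard deviation)
You can use either the population or
sample quantities here. That is,
z = (x - µ)/σ, using the population mean and standard deviation, or
z = (x - xbar)/s, using the sample mean and standard deviation.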

45
The z-score for 180 is (180 - 173.59)/19.46 =
0.329, and the z-score for 110 is
(110 - 173.59)/19.46 = -3.27. 110 is more standard
deviations from the mean than 180 is, even though
its z-score is negative.
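
The comparison can be reproduced with a short Python sketch, using the mean 173.59 and standard deviation 19.46 from the slide. The function name is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_180 = z_score(180, 173.59, 19.46)
z_110 = z_score(110, 173.59, 19.46)

print(round(z_180, 3), round(z_110, 2))

# Distance from the mean is judged by the absolute value of the z-score.
print(abs(z_110) > abs(z_180))
```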
46
Example

A data set has a mean of 200 and a
standard deviation of 30. For a data value of
245, what is the z-score?

47
Percentiles
60% of the distribution is shaded, which means 40%
remains unshaded.
This value is the 60th percentile, P60.
In general, the rth percentile is the value with
r% of the data or distribution below it.
48
Finding the rth percentile

Example: Find the 70th percentile of the
sample below.
29 29 30 31 31 32 32 32 32 32
32 33 33 33 33 34 34 34 34 36
36 37 38 38 38 39 39 43
If the data is not already sorted as it is above,
do that first.

There are n = 28 observations.
The 70th percentile is found by
n(0.7) = 28(0.7) = 19.6
Since 19.6 is not a whole number, go up to
the next integer, 20. The 70th percentile is
the 20th number from the bottom, 36.

The 25th percentile is found by
n(0.25) = 28(0.25) = 7
Since this is a whole number, the 25th
percentile is found by averaging the values in
the 7th and 8th positions. That is, the 25th
percentile is (32 + 32)/2 = 32.
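
The rule worked through above can be written as a short Python function. This is a sketch of the slides' rule only; other textbooks and software use different percentile conventions. The function name is illustrative, and r is assumed to be a whole number strictly between 0 and 100 so that the position n(r/100) can be computed with integer arithmetic.

```python
def percentile(data, r):
    """r-th percentile (integer r, 0 < r < 100) using the rule from the slides."""
    values = sorted(data)
    n = len(values)
    # n * r / 100 split into whole part and remainder, avoiding float rounding.
    position, remainder = divmod(n * r, 100)
    if remainder:
        # Not a whole number: go up to the next position (1-based position + 1).
        return values[position]
    # Whole number: average the values in 1-based positions `position` and `position + 1`.
    return (values[position - 1] + values[position]) / 2

sample = [29, 29, 30, 31, 31, 32, 32, 32, 32, 32,
          32, 33, 33, 33, 33, 34, 34, 34, 34, 36,
          36, 37, 38, 38, 38, 39, 39, 43]

print(percentile(sample, 70))
print(percentile(sample, 25))
```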

51
For the sample below

n = 40
32 33 38 39 40 41
42 43 44 44 45 46
46 47 48 48 49 53
53 54 55 55 55 56
58 58 59 59 60 61
61 62 63 64 67 68
68 69 72 74

Find the following percentiles
P13
P35

52
Normal Distribution
This distribution has mean 10 and standard
deviation 1.9.
This distribution has mean 2 and standard
deviation 3.
The mean is denoted by µ and the standard
deviation by σ.
53
Empirical Rule

For a data set having a distribution that is
approximately bell-shaped, the following 3
properties apply:
About 68% of the data fall within 1 standard
deviation of the mean.
About 95% of the data fall within 2 standard
deviations of the mean.
About 99.7% of the data fall within 3 standard
deviations of the mean.

54
Approximate Percentages
55
Since this data looks normal, we can use the
Empirical Rule to conclude that approximately 95%
of the observations are between 173.59 -
2(19.46) = 134.67 and 173.59 + 2(19.46) =
212.51.
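
The interval arithmetic can be sketched in Python as below. The function and dictionary names are illustrative; the percentages are the approximate Empirical Rule values and apply only when the data is roughly bell-shaped.

```python
# Approximate share of the data within k standard deviations of the mean.
EMPIRICAL_COVERAGE = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_interval(mean, sd, k):
    """Return (mean - k*sd, mean + k*sd)."""
    return mean - k * sd, mean + k * sd

low, high = empirical_interval(173.59, 19.46, 2)
print(f"About {EMPIRICAL_COVERAGE[2]}% of observations fall between {low:.2f} and {high:.2f}")
```

Passing k = 1 or k = 3 gives the 68% and 99.7% intervals in the same way.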
56

Consider and .
The z-score for 10.63 is _____.
The z-score for 8.222 is ______.

What then are the z-scores for the following?
The z-score for is _____.
The z-score for is _____.
The z-score for is _____.
The z-score for is _____.
The z-score for is _____.
The z-score for is _____.

58
Example

Birth weights are approximately bell-shaped
with mean 3410 g and sd 520 g.
Approximately what percentage of the birth
weights fall between 2370 and 4450 grams?
Between what 2 values will approximately 68% of
the birth weights fall?

59
Example

The length of time car owners keep their cars
is bell-shaped with mean 7.513 years and
standard deviation 2.47 years.
Approximately what percentage of car owners keep
their cars between 5.043 and 9.983 years?
Between what 2 values do approximately 99.7% of
car owners keep their cars (in years)?

60
Match the symbol to the word.

Average
Sample Size
Population Mean
Sample Mean
Sample Variance
Sample Std. Dev.
Population Variance
Mean

What remains are other types of graphs
you can obtain. I will let you read about these
on your own.
Histogram for discrete data
Frequency Polygon
Ogive Curve
Pareto Chart

62
Discrete Data

The only observations in the sample are
1,2,3,4,5,6 and no others.
Notice that the numbers are in the middle of the
intervals.

63
Frequency Polygon

Rather than having rectangles, there's a single
point for each interval, placed at the height of
its frequency.
You then draw lines from one point to the
next.

64
Ogive (Pronounced oh-jive)
Approximately 12 of the numbers in the sample are
less than or equal to 2.
You could make rectangles as in a histogram if
you wanted to.
65
Pareto Chart