Chapters 2 and 3 - PowerPoint PPT Presentation

About This Presentation
Title:

Chapters 2 and 3

Description:

Walgreens records the price of prescriptions bought at their stores. ... Can Walgreens determine the mean cost of all prescriptions bought in the US? ... – PowerPoint PPT presentation

Slides: 66
Provided by: mickey9

Transcript and Presenter's Notes

Title: Chapters 2 and 3


1
Chapters 2 and 3
  • Descriptive Methods

2
  • After collecting our data, we want to get a better understanding of its various aspects.
  • Data can be described numerically or graphically.

3
Numerically Descriptive Methods
  • Numerical Data: sample mean, sample median,
    sample standard deviation, range, etc.
  • Categorical Data: sample counts or sample
    proportions

4
Graphical Descriptive Methods
  • Numerical Data: histogram, boxplot, dotplot, stem
    plot, etc.
  • Categorical Data: bar chart, pie chart, frequency
    tables

5
Describing Numerical Data
  • The center of the data can be described by the sample mean x̄, the sample median, or the sample mode.
  • The sample mean x̄ is the usual average and the sample median is the middle number after sorting.
  • The mode is the number that occurs most often.

6
  • Suppose our data is
  • 4 1 9 2 5
  • Sample mean = (4 + 1 + 9 + 2 + 5)/5 = 21/5 = 4.2
  • Sample median = the middle number after sorting (1 2 4 5 9) = 4
  • If the sample size is even, the median is the average of the 2 middle numbers.
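
The calculation above can be checked with a minimal Python sketch using the standard-library statistics module. The sixth observation in the even-sized sample is hypothetical and is added only to illustrate the averaging rule.

```python
from statistics import mean, median

data = [4, 1, 9, 2, 5]

sample_mean = mean(data)      # (4 + 1 + 9 + 2 + 5) / 5
sample_median = median(data)  # middle value of the sorted data 1 2 4 5 9

# With an even sample size, median() averages the two middle values.
even_data = [4, 1, 9, 2, 5, 7]   # hypothetical sixth observation (7) added
even_median = median(even_data)  # sorted: 1 2 4 5 7 9, so (4 + 5) / 2

print(sample_mean, sample_median, even_median)
```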

7
  • Suppose that in the previous dataset, 9 was misreported as 99. Then the sample median remains at 4 but the sample mean is now (4 + 1 + 99 + 2 + 5)/5 = 22.2.
  • The sample mean is more sensitive to unusual observations, known as outliers.
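
A short sketch of the same comparison in Python, assuming the standard-library statistics module, shows the mean moving while the median stays put:

```python
from statistics import mean, median

original = [4, 1, 9, 2, 5]
misreported = [4, 1, 99, 2, 5]  # the 9 recorded as 99

# The outlier pulls the mean upward but leaves the median unchanged.
for label, data in (("original", original), ("misreported", misreported)):
    print(label, "mean =", mean(data), "median =", median(data))
```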

8
One of the marks is the sample mean and the other
is the sample median. Which one corresponds to
the green mark?
9
The mode
  • Ex: 2 1 5 4 5. The mode is 5.
  • Ex: 2 1 5 1 5. The modes are 1 and 5.
  • Ex: 2 1 3 8 4. There is no mode.
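
The three cases above can be reproduced with a small sketch. The helper name modes is hypothetical; it wraps statistics.multimode (Python 3.8 or later) and follows the slides' convention of reporting no mode when every value occurs exactly once.

```python
from statistics import multimode

def modes(data):
    """Return the sorted modes, or [] when no value repeats (the slides' 'no mode')."""
    if len(set(data)) == len(data):
        return []  # every value occurs exactly once
    return sorted(multimode(data))

print(modes([2, 1, 5, 4, 5]))  # one mode
print(modes([2, 1, 5, 1, 5]))  # two modes
print(modes([2, 1, 3, 8, 4]))  # no mode
```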

10
  • The sample standard deviation, s, is a measure of how spread out the data is.
  • The sample variance is s².
  • We could also use the range as a measure of the variability.
  • Range = Max - Min
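
As a rough illustration, these three measures of spread can be computed in Python. The data set is reused from slide 6 purely as an example; statistics.variance and statistics.stdev use the sample (n - 1) divisor.

```python
from statistics import stdev, variance

data = [4, 1, 9, 2, 5]  # data from slide 6, reused for illustration

s2 = variance(data)                 # sample variance, divides by n - 1
s = stdev(data)                     # sample standard deviation, square root of s2
data_range = max(data) - min(data)  # Range = Max - Min

print(s2, s, data_range)
```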

11
As the points move away from x̄ (the mark
in the center), the standard deviation
increases. Note: The ranges of the last 3 are
about the same. The range can stay the same
while the variance increases.
12
Sample Final Problem
13
Sample Final Problem
14
Sample Final Problem
15
Describing Categorical Data
  • To describe categorical data, there are only 2 statistics of interest: sample counts and sample proportions.
  • Ex: Suppose 1 out of 20 people in a sample has gum disease. The sample count is 1 and the sample proportion is 1/20 = 0.05.

16
Statistic vs. Parameter
  • A statistic is a quantity associated with the sample and a parameter is a quantity associated with the population.

17
(No Transcript)
18
(No Transcript)
19
Example
  • A company manufactures bricks. They are interested in their mean breaking strength.
  • Can they determine the average breaking strength
    of 10 bricks?
  • Can they determine the mean breaking strength of
    all bricks produced?

20
Example
  • Walgreens records the price of prescriptions bought at their stores.
  • Can Walgreens determine the mean cost of all
    prescriptions bought at Walgreens?
  • Can Walgreens determine the mean cost of all
    prescriptions bought in the US?

21
Example 1
  • Use the calculator to find the following statistics for the data below:
  • 10 15 5 22 38 51
  • sample mean, sample variance, sample median, range
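
The slide asks for a calculator, but the same four statistics can be cross-checked with a short Python sketch (an alternative to the calculator, not the method the slides prescribe):

```python
from statistics import mean, median, variance

data = [10, 15, 5, 22, 38, 51]

sample_mean = mean(data)
sample_variance = variance(data)    # divides by n - 1
sample_median = median(data)        # average of the two middle values, since n is even
data_range = max(data) - min(data)

print(sample_mean, sample_variance, sample_median, data_range)
```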

22
Example 2
  • Find the sample mean and sample standard deviation for the following data:
  • 2 2 2 2 2 5 5 5 7 7 7 7 8 8 8 8 8 8
  • To simplify putting the data into the calculator, the next table will be useful.

23
  • Frequency refers to the number of times each value occurs in the sample.
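
The frequency-table idea can be sketched in Python. The table itself did not survive in this transcript, so the frequencies below are tallied directly from the 18 observations in Example 2; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

# value -> frequency, tallied from the Example 2 data
freq = {2: 5, 5: 3, 7: 4, 8: 6}

n = sum(freq.values())
freq_mean = sum(x * f for x, f in freq.items()) / n
# Sample variance: frequency-weighted squared deviations divided by n - 1
freq_var = sum(f * (x - freq_mean) ** 2 for x, f in freq.items()) / (n - 1)
freq_sd = sqrt(freq_var)

# Cross-check against the fully expanded list of observations
expanded = [x for x, f in freq.items() for _ in range(f)]
print(n, freq_mean, freq_sd, stdev(expanded))
```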

24
Example 3
  • Instead of knowing the actual observations, we only know the intervals and the number of observations in each.
  • Again, obtain the sample mean and sample standard deviation.

25
Graphically Describing Numerical Data
  • A histogram splits the data into intervals called
    bins or classes. The number (frequency) or
    percentage (relative frequency) of observations
    in each interval is recorded. This is the height
    of each bin.

26
Create a frequency histogram
  • Data
  • 1.2 1.8
  • 3.1 0.4
  • 0.2 4.8
  • 1.5 2.1
  • 2.9 3.7

27
Create a relative frequency histogram
  • Data
  • 1.2 1.8
  • 3.1 0.4
  • 0.2 4.8
  • 1.5 2.1
  • 2.9 3.7
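
A minimal sketch of both histograms for this data follows. The slides' bins are not given in the transcript, so bins of width 1 from 0 to 5 are an assumption, and the text bars stand in for a real plot.

```python
data = [1.2, 1.8, 3.1, 0.4, 0.2, 4.8, 1.5, 2.1, 2.9, 3.7]
edges = [0, 1, 2, 3, 4, 5]  # assumed bin edges; each bin is [left, right)

# Frequency: number of observations falling in each bin
counts = [0] * (len(edges) - 1)
for x in data:
    for i in range(len(counts)):
        if edges[i] <= x < edges[i + 1]:
            counts[i] += 1
            break

# Relative frequency: share of the sample in each bin
relative = [c / len(data) for c in counts]

for i, (c, r) in enumerate(zip(counts, relative)):
    print(f"[{edges[i]}, {edges[i + 1]})  {'#' * c:<4} count={c}  rel={r:.1f}")
```

The two histograms have the same shape; only the vertical scale differs (counts versus proportions).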

28
The height of this bin is at approximately 18
which means there are 18 observations between 140
and 160.
These numbers on the vertical axis are all counts
which makes this a frequency histogram.
This bin ranges from 140 to 160.
29
Heights of Volcanoes
30
  • How many volcanoes are in the sample?
  • How many volcanoes are more than 8000 feet tall?
  • What percentage of the volcanoes are less than
    4000 feet tall?
  • How many volcanoes are between 4000 and 6000 feet
    tall?

31
Boxplots
  • The histogram on the right has been split into 4 pieces so that each consists of 25% of the data.
  • The marks where the pieces are split are used to create the boxplot.

32
(No Transcript)
33
The minimum (min) is approximately 11.
The first quartile (Q1) is approximately 17.
The second quartile (Q2) is approximately 20.
The third quartile (Q3) is approximately 27.
The maximum (max) is approximately 35.
These numbers are called the five-number summary for
a boxplot.
34
Outliers
  • Outliers show up as circles. In this case, the outlier is now the max.
  • This is the largest observation that is NOT an outlier.

35
Find the following
  • Q1
  • Q2
  • Q3
  • Range

Note: The interquartile range (IQR) is Q3 - Q1.
36
Shapes
  • The shape of the distribution of the data
  • can be classified in 3 ways
  • Skewed Left
  • Skewed Right
  • Symmetric

37
Skewed Right
  • Most of the data (perhaps 50% or so) is on the
    left and as you move to the right, the
    observations become more and more sparse.

38
Skewed Left
  • This is basically opposite of skewed right data.
    Most of the data is on the right and is more and
    more sparse as we move to the left.

39
Symmetric
  • For symmetric data, we expect the histogram and
    boxplot to be symmetric.
  • For the boxplot, we should see these distances
    being approximately equal.

40
Dot Plots
  • A dot plot places a dot for each observation.
  • For the dotplot above, approximately what is
  • the sample size?
  • the sample median?
  • the range?

41
Stem Plots
Stems Leaves
  • For the stem plot on the left, what is
  • the sample size?
  • the range?
  • the sample median?

42
Bar Chart
  • Approximately how many Toyotas are in the sample?
  • Can we all agree the shape is skewed left?

43
Pie Chart
  • If this is based on a sample of 250, approximately how many say they are somewhat interested in professional soccer?

44
Z-scores
  • A z-score for an observation x is defined as
  • z = (x - mean) / (standard deviation)
  • You can use either the population or sample quantities here. That is,
  • z = (x - µ)/σ or z = (x - x̄)/s

45
The z-score for 180 is (180 - 173.59)/19.46 =
0.329 and the z-score for 110 is
(110 - 173.59)/19.46 = -3.27. 110 is more standard
deviations from the mean than 180 is, even though
its z-score is negative.
46
Example
  • A data set has a mean of 200 and a standard deviation of 30. For a data value of 245, what is the z-score?
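
A z-score is the observation minus the mean, divided by the standard deviation. As a minimal sketch, the hypothetical helper z_score below applies that to the numbers on these slides:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Values from the slides: mean 173.59 with sd 19.46, and mean 200 with sd 30
print(round(z_score(180, 173.59, 19.46), 3))
print(round(z_score(110, 173.59, 19.46), 3))
print(z_score(245, 200, 30))
```

Comparing absolute values shows which observation is farther from the mean in standard-deviation units, regardless of sign.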

47
Percentiles
60% of the distribution is shaded, which means 40%
remains unshaded.
This value is the 60th percentile, P60.
In general, the rth percentile is the value with
r% of the data or distribution below it.
48
Finding the rth percentile
  • Example: Find the 70th percentile of the sample below.
  • 29 29 30 31 31 32 32 32 32 32
  • 32 33 33 33 33 34 34 34 34 36
  • 36 37 38 38 38 39 39 43
  • If the data is not already sorted as it is above, do that first.

49
  • There are n = 28 observations.
  • The 70th percentile is found by
  • n(0.7) = 28(0.7) = 19.6
  • Since 19.6 is not a whole number, go up to the next integer, 20. The 70th percentile is the 20th number from the bottom, 36.

50
  • The 25th percentile is found by
  • n(0.25) = 28(0.25) = 7
  • Since this is a whole number, the 25th percentile is found by averaging the values in the 7th and 8th positions. That is, the 25th percentile is (32 + 32)/2 = 32.
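
The rule worked through on the two slides above can be sketched as a small Python function. The name percentile is illustrative, and this follows the slides' rule only; statistical software often uses other percentile conventions that can give slightly different answers.

```python
def percentile(data, r):
    """r-th percentile (r an integer from 1 to 99) using the rule from the slides."""
    if not data:
        raise ValueError("data must not be empty")
    if not 1 <= r <= 99:
        raise ValueError("r must be an integer between 1 and 99")
    values = sorted(data)  # sort first, as the slides instruct
    n = len(values)
    # Split n * r / 100 into its whole part and remainder with exact integer arithmetic
    whole, remainder = divmod(n * r, 100)
    if remainder:
        # Not a whole number: go up to the next position (1-based whole + 1)
        return values[whole]
    # Whole number: average the values in positions whole and whole + 1 (1-based)
    return (values[whole - 1] + values[whole]) / 2

sample = [29, 29, 30, 31, 31, 32, 32, 32, 32, 32,
          32, 33, 33, 33, 33, 34, 34, 34, 34, 36,
          36, 37, 38, 38, 38, 39, 39, 43]

print(percentile(sample, 70), percentile(sample, 25))
```

Passing r as an integer percent avoids floating-point surprises such as 28 * 0.7 evaluating to slightly less than 19.6.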

51
For the sample below
  • n = 40
  • 32 33 38 39 40 41
  • 42 43 44 44 45 46
  • 46 47 48 48 49 53
  • 53 54 55 55 55 56
  • 58 58 59 59 60 61
  • 61 62 63 64 67 68
  • 68 69 72 74
  • Find the following percentiles
  • P13
  • P35

52
Normal Distribution
This distribution has mean 10 and standard
deviation 1.9.
This distribution has mean 2 and standard
deviation 3.
The mean is denoted by µ and the standard
deviation by σ.
53
Empirical Rule
  • For a data set having a distribution that is approximately bell-shaped, the following 3 properties apply:
  • About 68% of the data fall within 1 standard
    deviation of the mean.
  • About 95% of the data fall within 2 standard
    deviations of the mean.
  • About 99.7% of the data fall within 3 standard
    deviations of the mean.

54
Approximate Percentages
55
Since this data looks normal, we can use the
Empirical Rule to conclude that approximately 95%
of the observations are between 173.59 -
2(19.46) = 134.67 and 173.59 + 2(19.46) =
212.51.
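
The interval above can be reproduced with a short sketch. The function name empirical_interval and the lookup table are illustrative; the percentages are the approximate Empirical Rule values and apply only to roughly bell-shaped data.

```python
# Approximate share of bell-shaped data within k standard deviations of the mean
EMPIRICAL_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_interval(mean, sd, k):
    """Return (lower, upper, approximate percent) for mean +/- k standard deviations."""
    if k not in EMPIRICAL_PERCENT:
        raise ValueError("k must be 1, 2, or 3")
    return mean - k * sd, mean + k * sd, EMPIRICAL_PERCENT[k]

lower, upper, percent = empirical_interval(173.59, 19.46, 2)
print(f"about {percent}% of observations lie between {lower:.2f} and {upper:.2f}")
```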
56
  • Consider and .
  • The z-score for 10.63 is _____.
  • The z-score for 8.222 is ______.

57
  • What then are the z-scores for the following?
  • The z-score for is _____.
  • The z-score for is _____.
  • The z-score for is _____.
  • The z-score for is _____.
  • The z-score for is _____.
  • The z-score for is _____.

58
Example
  • Birth weights are approximately bell-shaped with mean 3410 g and sd 520 g.
  • Approximately what percentage of the birth
    weights fall between 2370 and 4450 grams?
  • Between what 2 values will approximately 68% of
    the birth weights fall?

59
Example
  • The length of time car owners keep their cars is bell-shaped with mean 7.513 years and standard deviation 2.47 years.
  • Approximately what percentage of car owners keep
    their cars between 5.043 and 9.983 years?
  • Between what 2 values (in years) do approximately 99.7% of
    car owners keep their cars?

60
Match the symbol to the word.
  • Average
  • Sample Size
  • Population Mean
  • Sample Mean
  • Sample Variance
  • Sample Std. Dev.
  • Population Variance
  • Mean

61
  • What remains are other types of graphs you can obtain. I will let you read about these on your own.
  • Histogram for discrete data
  • Frequency Polygon
  • Ogive Curve
  • Pareto Chart

62
Discrete Data
  • The only observations in the sample are
    1,2,3,4,5,6 and no others.
  • Notice that the numbers are in the middle of the
    intervals.

63
Frequency Polygon
  • Rather than having rectangles, there's a single
    point that represents the height at which the
    frequency occurs.
  • And then you draw lines from one height to the
    next.

64
Ogive (Pronounced oh-jive)
Approximately 12 of the numbers in the sample are
less than or equal to 2.
You could make rectangles as in a histogram if
you wanted to.
65
Pareto Chart
  • Put simply, a Pareto chart is nothing more than a
    special bar chart.
  • It's for categorical data.
  • The bars are sorted in order of frequencies.