Title: Business Statistics for Managerial Decision Making
2. Introduction
- Descriptive Statistics
  - Methods that organize and summarize data aid in effective presentation and increased understanding.
  - Bar charts, tabular displays, various plots of economic data, averages and percentages.
  - Often the individuals or objects studied by an investigator come from a much larger collection, and the researcher's interest goes beyond just data summarization.
3. Introduction
- Population
  - The entire collection of individuals or objects about which information is desired.
- Sample
  - A subset of the population selected in some prescribed manner for study.
4. Introduction
- Inferential Statistics
  - Involves generalizing from a sample to the population from which it was selected.
  - This type of generalization involves some risk, since a conclusion about the population will be reached on the basis of available, but incomplete, information.
  - An important aspect in the development of inference techniques involves quantifying the associated risks.
5. Individuals and variables
- Individuals
  - are the objects described by a set of data.
  - They may be people, but they may also be business firms, common stocks, or other objects.
- A Variable
  - is any characteristic of an individual.
  - A variable can take different values for different individuals.
6. Categorical and Quantitative Variables
- A categorical variable places an individual into one of several groups or categories.
- A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense.
- The distribution of a variable tells us what values it takes and how often it takes these values.
7. Example
8. Example
9. Discrete and Continuous Variables
- With numerical data (quantitative variables), it is useful to make a further distinction.
  - Numerical data is discrete if the possible values are isolated points on the number line.
  - Numerical data is continuous if the set of possible values forms an entire interval on the number line.
10. Stem plot
- To make a stem plot:
  - Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
  - Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.
  - Write each leaf in the row to the right of its stem, in increasing order out from the stem.
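The three steps above can be sketched in Python. This is a minimal illustration, not a standard routine. The function name `stem_plot` is hypothetical, and the sketch assumes non-negative whole-number observations with a one-digit leaf. It also lists every stem between the smallest and the largest, including stems with no leaves, so that gaps in the data stay visible.

```python
def stem_plot(values):
    """Return stem plot rows for non-negative integers (leaf = final digit)."""
    # Split each observation into a stem (all but the last digit) and a leaf (last digit).
    pairs = [divmod(v, 10) for v in values]
    stems = [s for s, _ in pairs]
    rows = []
    # Stems run down the page from smallest (top) to largest (bottom).
    for stem in range(min(stems), max(stems) + 1):
        # Leaves are written in increasing order out from the stem.
        leaves = sorted(leaf for s, leaf in pairs if s == stem)
        rows.append(f"{stem:>3} | " + " ".join(str(leaf) for leaf in leaves))
    return rows

if __name__ == "__main__":
    # Made-up values, for illustration only.
    for row in stem_plot([12, 15, 21, 23, 23, 37, 40, 48]):
        print(row)
```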
11. Stem plot
12. Frequency Distribution
- A frequency distribution for categorical data is a table that displays the categories, frequencies, and relative frequencies.
- Example
  - The increasing emphasis on exercise has resulted in an increase in sport-related injuries. A listing of the 82 sample observations would look something like this:
  - F, Sp, Sp, Co, F, L, F, Ch, De, L, Sp, Di, St, Cn, ...
13. Frequency Distribution
- The following coding is used:
  - Sp = sprain, St = strain, Di = dislocation,
  - Co = contusion, L = laceration,
  - Cn = concussion, F = fracture,
  - Ch = chronic, De = dental
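A frequency distribution for categorical data can be tallied in a few lines with Python's `collections.Counter`. The sketch is illustrative only. `frequency_table` is a hypothetical helper, and the listing above shows just the first 14 of the 82 coded injuries, so these counts do not reproduce the full table.

```python
from collections import Counter

def frequency_table(observations):
    """Return (category, frequency, relative frequency) rows, most frequent first."""
    counts = Counter(observations)
    n = len(observations)
    # Relative frequency = frequency / total number of observations.
    return [(cat, freq, freq / n) for cat, freq in counts.most_common()]

# Only the first 14 coded injuries from the listing; the full sample has 82.
listed = ["F", "Sp", "Sp", "Co", "F", "L", "F", "Ch", "De", "L", "Sp", "Di", "St", "Cn"]
for cat, freq, rel in frequency_table(listed):
    print(f"{cat:<3} {freq:>3} {rel:.3f}")
```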
14. Frequency Distribution
15. Bar Graph
16. Pie Chart
17. Frequency Distribution for Discrete Numerical Data
- Discrete numerical data almost always results from counting.
  - In such cases, each observation is a whole number.
  - For example, if the possible values are 0, 1, 2, 3, ..., then these are listed in a column, and a running tally is kept as a single pass is made through the data.
18. Frequency Distribution for Discrete Numerical Data
- Example
  - A sample of 708 bus drivers employed by public corporations was selected, and the number of traffic accidents in which each was involved during a 4-year period was determined. A listing of the 708 sample observations would look something like this:
  - 3, 0, 6, 0, 0, 2, 1, 4, 1, ...
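The single-pass running tally can be sketched the same way. Assumptions:

- `discrete_frequency` is a hypothetical helper.
- Values are non-negative whole numbers.
- Only the nine observations listed above are used, so the result is not the frequency distribution of all 708 drivers.

Every value from 0 up to the largest observed value is listed, including values that never occur.

```python
from collections import Counter

def discrete_frequency(values):
    """Frequency and relative frequency for each whole number from 0 to the maximum."""
    counts = Counter(values)  # one pass through the data, like a running tally
    n = len(values)
    # List every possible value 0, 1, 2, ... up to the largest observed, zeros included.
    return [(k, counts[k], counts[k] / n) for k in range(max(values) + 1)]

# Only the first nine observations shown above; the full sample has 708 drivers.
excerpt = [3, 0, 6, 0, 0, 2, 1, 4, 1]
for value, freq, rel in discrete_frequency(excerpt):
    print(value, freq, round(rel, 3))
```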
19. Frequency Distribution for Discrete Numerical Data
20. Bar Graph
21. Frequency Distributions for Continuous Data
- The difficulty with continuous data, such as observations on the unemployment rate by state, is that there are no natural categories.
- Therefore we define our own categories by marking off intervals on the horizontal unemployment-rate axis, as pictured below.
  - [Figure: unemployment-rate axis from 1.00 to 9.00, marked off into intervals]
22. Frequency Distributions for Continuous Data
- If the smallest rate were 1.5 and the largest were 8.9, we might use intervals of width 1, with the first one starting at 1 and the last one ending at 9.
- Each data value should fall in exactly one of these intervals.
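One way to put each value into exactly one interval is sketched below. The slides do not say which class a boundary value such as 2.0 belongs to. This sketch assumes classes closed on the left and open on the right, [1, 2), [2, 3), ..., [8, 9), which is one common convention. The helper names are hypothetical.

```python
import math

def class_interval(x, start=1.0, width=1.0):
    """Return the (lower, upper) class containing x, with classes [lower, upper)."""
    k = math.floor((x - start) / width)  # 0 for the first class, 1 for the second, ...
    lower = start + k * width
    return (lower, lower + width)

def class_frequencies(values, start=1.0, end=9.0, width=1.0):
    """Count the values in each class [start, start + width), ..., [end - width, end)."""
    n_classes = round((end - start) / width)
    counts = [0] * n_classes
    for x in values:
        k = math.floor((x - start) / width)
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} lies outside [{start}, {end})")
        counts[k] += 1  # each value lands in exactly one class
    return counts

print(class_interval(1.5), class_interval(8.9))  # (1.0, 2.0) (8.0, 9.0)
print(class_interval(2.0))                       # (2.0, 3.0): a boundary value goes up
```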
23. Frequency Distributions for Continuous Data
24. Frequency Distributions for Continuous Data
25. Histograms
- Mark the boundaries of the class intervals on a horizontal axis.
- Draw a vertical scale marked with either relative frequencies or frequencies.
- The rectangle corresponding to a particular interval is drawn directly above the interval.
- The height of each rectangle is then the class frequency or relative frequency.
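Turning class frequencies into histogram heights can be sketched as a text chart. The helper `histogram_rows` is hypothetical. The chart is turned on its side: classes run down the page, and bar lengths stand in for the rectangle heights. Heights are relative frequencies, and the class counts in the usage lines are made up for illustration.

```python
def histogram_rows(counts, start, width, bar_width=40):
    """Text histogram on its side: one bar per class, length proportional to relative frequency."""
    n = sum(counts)
    rows = []
    for k, count in enumerate(counts):
        lower = start + k * width
        rel = count / n  # height of the rectangle drawn above this class
        bar = "#" * round(rel * bar_width)
        rows.append(f"{lower:g}-{lower + width:g} | {bar} {rel:.2f}")
    return rows

# Hypothetical class counts, for illustration only.
for row in histogram_rows([2, 5, 3], start=1, width=1, bar_width=20):
    print(row)
```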
26. Histograms
27. Histograms
28. Examining a Distribution
- In any graph of data, look for the overall pattern and for striking deviations from that pattern.
- You can describe the overall pattern of a histogram by its shape, center, and spread.
- An important kind of deviation is an outlier, an individual value that falls outside the overall pattern.
29. Symmetric and Skewed Distributions
- A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
- A distribution is skewed to the right if the right side of the histogram (containing the half of the observations with larger values) extends much farther out than the left side.
- It is skewed to the left if the left side of the histogram extends much farther out than the right side.
30. Symmetric Distribution
31. Skewed to the Right
32. Symmetric Distribution
33. Numerical Summary Measures
- Describing the center of a data set:
  - Mean
  - Median
- Describing the variability in a data set:
  - Variance, standard deviation
  - Quartiles
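34. The Mean
- To find the mean of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, \ldots, x_n$, their mean is
  $$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
- In more compact notation,
  $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$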
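The formula translates directly into code. This minimal sketch uses made-up observations. Python's standard `statistics.mean` computes the same quantity and could be used instead.

```python
import statistics

def mean(xs):
    """x-bar = (x1 + x2 + ... + xn) / n."""
    return sum(xs) / len(xs)

data = [2, 4, 4, 5, 7]  # made-up observations
print(mean(data))             # 4.4
print(statistics.mean(data))  # same value from the standard library
```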
35. The Median
- The median M is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger. To find the median of a distribution:
  - Arrange all observations in order of size, from smallest to largest.
  - If the number of observations n is odd, the median M is the center observation in the ordered list.
  - If the number of observations n is even, the median M is the mean of the two center observations in the ordered list.
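The three steps can be sketched as a minimal Python function. The function name and the sample values are illustrative only. Python's `statistics.median` follows the same odd/even rule.

```python
def median(xs):
    """Median M: sort, then take the center value (odd n) or the mean of the two center values (even n)."""
    s = sorted(xs)          # arrange observations from smallest to largest
    n = len(s)
    mid = n // 2
    if n % 2 == 1:          # odd n: the center observation
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even n: mean of the two center observations

print(median([7, 1, 4]))      # 4
print(median([7, 1, 4, 10]))  # 5.5
```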
36. The Quartiles Q1 and Q3
- To calculate the quartiles:
  - Arrange the observations in increasing order and locate the median M in the ordered list of observations.
  - The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.
  - The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.
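The "median of each half" rule above can be sketched as follows. When n is odd, the median's own position is excluded from both halves. Statistical software uses several different quartile definitions; Python's `statistics.quantiles`, for example, defaults to a different interpolation method. Quartiles from other tools may therefore differ slightly from those computed this way. The helper names are hypothetical, and the sketch assumes at least two observations.

```python
def _median(s):
    """Median of an already sorted, non-empty list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(xs):
    """Return (Q1, M, Q3) using the 'median of each half' rule (needs n >= 2)."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]          # observations to the left of the median's position
    upper = s[(n + 1) // 2 :]    # observations to the right of the median's position
    return _median(lower), _median(s), _median(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7]))     # (2, 4, 6)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.5, 4.5, 6.5)
```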
37. The Five Number Summary and Box-Plot
- The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, the five-number summary is:
  - Minimum, Q1, M, Q3, Maximum
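The five-number summary can be built with the same "median of each half" quartile rule. This sketch uses `statistics.median` from Python's standard library. The function name and the data are illustrative, and the sketch assumes at least two observations.

```python
import statistics

def five_number_summary(xs):
    """(Minimum, Q1, M, Q3, Maximum); quartiles are medians of the lower and upper halves."""
    s = sorted(xs)
    n = len(s)
    q1 = statistics.median(s[: n // 2])          # observations left of the median's position
    q3 = statistics.median(s[(n + 1) // 2 :])    # observations right of the median's position
    return (s[0], q1, statistics.median(s), q3, s[-1])

print(five_number_summary([7, 1, 5, 3, 9, 11, 2]))  # (1, 2, 5, 9, 11)
```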
38. The Five Number Summary and Box-Plot
- A box-plot is a graph of the five-number summary.
  - A central box spans the quartiles.
  - A line in the box marks the median.
  - Lines extend from the box out to the smallest and largest observations.
- Box-plots are most useful for side-by-side comparison of several distributions.
39. Example
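40. The Standard Deviation s
- The variance $s^2$ of a set of observations is an average of the squares of the deviations of the observations from their mean, dividing by n - 1 rather than n. In symbols, the variance of n observations $x_1, x_2, \ldots, x_n$ is
  $$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$
- or, more compactly,
  $$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$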
41. The Standard Deviation s
- The standard deviation s is the square root of the variance $s^2$:
  $$s = \sqrt{s^2} = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
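Both formulas can be sketched directly in Python. The hand-written versions should agree, up to floating-point rounding, with the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1. The observations are made up for illustration.

```python
import math
import statistics

def variance(xs):
    """Sample variance s^2: squared deviations from the mean, summed, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def std_dev(xs):
    """Standard deviation s: the square root of the variance."""
    return math.sqrt(variance(xs))

data = [2, 4, 4, 5, 7]  # made-up observations
print(variance(data), std_dev(data))
# Standard-library equivalents, for comparison (also divide by n - 1):
print(statistics.variance(data), statistics.stdev(data))
```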
42. Choosing a Summary
- The five-number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with extreme outliers. Use $\bar{x}$ and s only for reasonably symmetric distributions that are free of outliers.
43. Strategies for Exploring Data
- Plot the data.
  - Make a graph, usually a histogram or a stem-plot.
- Look at the distribution of the variable for
  - overall pattern (shape, center, spread).
  - striking deviations such as outliers.
- Calculate a numerical summary to briefly describe center and spread.
- Describe the overall pattern with a smooth curve.
44. Density Curves
- Sometimes the overall pattern (the distribution of the variable) of a large number of observations is so regular that we can describe it by a smooth curve, called a density curve.
- The curve is a mathematical model for the distribution.
45. Density Curve
- Histogram of the city gas mileage (miles per gallon) of 856 model-year 2001 motor vehicles.
- The smooth curve, the density curve, shows the overall shape of the distribution.
46. Density Curve
- The proportion of cars with gas mileage less than 20, read from the histogram, is:
47. Density Curve
- The proportion of cars with gas mileage less than 20, read from the density curve, is 0.410.
- The area under the density curve gives a good approximation of the areas given by the histogram.
48. Density Curve
- A density curve is a curve that
  - is always on or above the horizontal axis.
  - has area exactly 1 underneath it.
- A density curve describes the overall pattern of a distribution.
- The area under the curve and above any range of values is the proportion of all observations that fall in that range.
49. Median and Mean of a Density Curve
- The median of a density curve is the point that divides the area under the curve in half.
50. Median and Mean of a Density Curve
- The mean of a density curve is the balance point, at which the curve would balance if made of solid material.
51. Median and Mean of a Density Curve
- The median and mean are the same for a symmetric density curve.
- They are both at the center of the curve.
52. Median and Mean of a Density Curve
- The mean of a skewed curve is pulled away from the median in the direction of the long tail.
53. Normal Density Curve
- These density curves, called normal curves, are
  - symmetric
  - single-peaked
  - bell-shaped
- Normal curves describe normal distributions.
54. Normal Density Curve
- The exact density curve for a particular normal distribution is described by giving its mean μ and its standard deviation σ.
- The mean is located at the center of the symmetric curve, and it is the same as the median.
- The standard deviation σ controls the spread of a normal curve.
55. Normal Density Curve
56. The 68-95-99.7 Rule
- Although there are many normal curves, they all have common properties. In particular, all Normal distributions obey the following rule.
- In a normal distribution with mean μ and standard deviation σ:
  - 68% of the observations fall within σ of the mean μ.
  - 95% of the observations fall within 2σ of μ.
  - 99.7% of the observations fall within 3σ of μ.
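The percentages in the rule are rounded versions of exact Normal probabilities. For any Normal distribution, the proportion within k standard deviations of the mean equals the standard Normal probability P(-k < z < k). That probability can be written with the error function as erf(k/√2). A short Python check follows; the function name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any Normal X, computed as erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```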
57. The 68-95-99.7 Rule
58. The 68-95-99.7 Rule
59. Standard Normal Distribution
- The standard Normal distribution is the Normal distribution N(0, 1) with mean μ = 0 and standard deviation σ = 1.
60. The standard Normal Table
- What is the area under the standard normal curve between z = 0 and z = 2.3?
- Compact notation:
  $$P(0 < z < 2.3) = 0.9893 - 0.5 = 0.4893$$
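Table A gives the area to the left of z, which is the standard Normal cumulative distribution function. As a sketch, Python's `statistics.NormalDist` (Python 3.8 or later) can stand in for the table when checking this calculation.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# Like Table A, cdf(z) is the area under the curve to the LEFT of z.
left_of_2_3 = std_normal.cdf(2.3)
area = left_of_2_3 - std_normal.cdf(0)  # cdf(0) is exactly 0.5
print(round(left_of_2_3, 4), round(area, 4))  # 0.9893 0.4893
```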
61. Finding the area under a normal curve
- State the problem in terms of the observed variable x.
- Standardize x to restate the problem in terms of a standard normal variable z.
- Draw a picture to show the area under the standard Normal curve.
- Find the required area under the standard Normal curve using Table A and the fact that the total area under the curve is 1.
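Steps 2 and 4 of this procedure can be sketched as a small function. Steps 1 and 3 (stating the problem and drawing the picture) remain pen-and-paper work. The function name `normal_area` is hypothetical. It uses the exact standard Normal CDF from `statistics.NormalDist` (Python 3.8+) instead of Table A. Table lookups round z to two decimals, so results can differ from table-based answers in the third or fourth decimal place.

```python
from statistics import NormalDist

def normal_area(mu, sigma, lower=float("-inf"), upper=float("inf")):
    """Proportion of a N(mu, sigma) variable x that falls between lower and upper."""
    # Step 2: standardize each bound with z = (x - mu) / sigma.
    z_lo = (lower - mu) / sigma
    z_hi = (upper - mu) / sigma
    # Step 4: area under the standard Normal curve between z_lo and z_hi,
    # i.e. (area to the left of z_hi) - (area to the left of z_lo).
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z_hi) - std_normal.cdf(z_lo)

print(round(normal_area(0, 1, -1, 1), 4))  # 0.6827, the "68" in the 68-95-99.7 rule
```

Applied to the stock-index example on the following slides, `normal_area(12, 16.5, upper=0)` gives about 0.233 and `normal_area(12, 16.5, 12, 50)` gives about 0.489. These are close to the table-based answers of 0.2327 and 0.4893.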
62. Example
- The annual rate of return on stock indexes (which combine many individual stocks) is approximately Normal. Since 1954, the Standard & Poor's 500 stock index has had a mean yearly return of about 12%, with a standard deviation of 16.5%. Take this Normal distribution to be the distribution of yearly returns over a long period. The market is down for the year if the return on the index is less than zero. In what proportion of years is the market down?
63. Example
- State the problem
  - Call the annual rate of return for the Standard & Poor's 500 stock index x. The variable x has the N(12, 16.5) distribution. We want the proportion of years with x < 0.
- Standardize
  - Subtract the mean, then divide by the standard deviation, to turn x into a standard Normal z:
  $$x < 0 \iff z = \frac{x - 12}{16.5} < \frac{0 - 12}{16.5} \approx -0.73$$
64. Example
- Draw a picture to show the standard normal curve with the area of interest shaded.
- Use the table
  - The proportion of observations less than z = -0.73 is 0.2327.
  - The market is down on an annual basis about 23.27% of the time.
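65. Example
- What percent of years have an annual return between 12% and 50%?
- State the problem
  - $12 \le x \le 50$
- Standardize
  - $\frac{12 - 12}{16.5} \le \frac{x - 12}{16.5} \le \frac{50 - 12}{16.5}$, that is, $0 \le z \le 2.30$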
66. Example
- Draw a picture.
- Use the table.
  - The area between 0 and 2.30 is the area below 2.30 minus the area below 0:
  - 0.9893 - 0.50 = 0.4893, so about 48.93% of years have an annual return between 12% and 50%.
67. Finding a Value when Given a Proportion
- Sometimes we may want to find the observed value with a given proportion of observations above or below it.
- To do this, use Table A backward: find the given proportion in the body of the table, read the corresponding z from the left column and top row, then unstandardize to get the observed value.
68. Example
- Miles per gallon ratings of compact cars (2001 model year) follow approximately the N(25.7, 5.88) distribution. How many miles per gallon must a vehicle get to place in the top 10% of all 2001 model year compact cars?
69. Example
- We want to find the miles per gallon rating x with area 0.1 to its right under the Normal curve with mean 25.7 and standard deviation 5.88. That is the same as finding the miles per gallon rating x with area 0.9 to its left.
70. Example
- Look in the body of Table A for the entry closest to 0.9. It is 0.8997. This is the entry corresponding to z = 1.28.
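71. Example
- Unstandardize to transform the solution from z back to the original x scale:
  $$x = \mu + z\sigma = 25.7 + (1.28)(5.88) \approx 33.2$$
- A compact car must get about 33.2 miles per gallon or more to place in the top 10%.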
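Using Table A backward corresponds to the inverse of the standard Normal CDF. The sketch below uses Python's `statistics.NormalDist.inv_cdf` (Python 3.8+). It returns the exact z rather than the nearest table entry.

```python
from statistics import NormalDist

mu, sigma = 25.7, 5.88
z = NormalDist().inv_cdf(0.90)   # z with area 0.90 to its left (Table A: about 1.28)
x = mu + z * sigma               # unstandardize: x = mu + z * sigma
print(round(z, 2), round(x, 1))  # 1.28 33.2
```

Calling `NormalDist(25.7, 5.88).inv_cdf(0.90)` would perform the standardize and unstandardize steps in a single call.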
72. Standard Normal Distribution
- If a variable x has any normal distribution N(μ, σ) with mean μ and standard deviation σ, then the standardized variable
  $$z = \frac{x - \mu}{\sigma}$$
  has the standard Normal distribution.
- This standardized value is often called a z-score.
73. The standard Normal Table
- Table A is a table of areas under the standard Normal curve. The table entry for each value z is the area under the curve to the left of z.
- Or you can use the applet at the following site:
  - http://www.stat.sc.edu/~west/applets/normaldemo.html
74. The standard Normal Table
- What is the area under the standard normal curve to the right of z = -2.15?
- Compact notation:
  $$P(z > -2.15) = 1 - 0.0158 = 0.9842$$
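The same complement rule can be checked with `statistics.NormalDist` (Python 3.8+), again as a stand-in for Table A.

```python
from statistics import NormalDist

std_normal = NormalDist()     # N(0, 1)
left = std_normal.cdf(-2.15)  # Table A entry: area to the left of z = -2.15
right = 1 - left              # total area is 1, so right area = 1 - left area
print(round(left, 4), round(right, 4))  # 0.0158 0.9842
```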