Title: Business Statistics August 2003
1Business Statistics, August 2003
- Professor Vijay Mehrotra
- San Francisco State University
2What's the Deal With the Prof?
- New to SFSU Faculty This Year After Long Business Career
- Extensive Early Professional Experience
- Caddy, security guard, phone operator, smuggler, busboy, waiter, and shoe shine guy
- M.S. and Ph.D. in Operations Research From Stanford University
- Possible Investigation?
- Years of consulting
- Best way to avoid schoolwork
- Also known for magazine column
- Was It Something I Said
- http://www.lionhrtpub.com/ORMS.shtml
3What's the Deal With the Prof?
- Professional Career Summary
- 1987-1992: Grad Student, Consultant
- Clients: IBM, HP, National Semiconductor, PG&E
- 1993-1994: Consultant, DFI
- Developing large software models for transportation operations analysis
- 1994-2002: CEO, Onward Inc.
- Operations management consulting firm
- Grew from 3 founders to 28 total staff
- 2002-2003: VP, Blue Pumpkin Software
- Our products focus on forecasting and scheduling people in call center operations
- 2003-present: Faculty, San Francisco State
- Very happy to be here!
4My Teaching Approach
- The Big Ideas
- Key Terms
- Concepts
- PowerPoint, Blackboard
- Examples
- Blackboard, Excel
- Allow Time for Q&A
- Working Through Good Problems
5My Expectations of You
- Before Class
- Do the recommended reading for this class
- Work on recommended homework from last class
- During Class
- Come to class - be on time
- Take notes, ask questions
- NO CELL PHONES
- After Class
- Reading, Problems
- Stay Current on Material
- This really helps!
- ABSOLUTELY No Cheating
- All infractions will be dealt with very harshly
6Outline of Course Topics
- Overview, Variability, Data, Measures, and Graphs (2 weeks)
- Text Chapters 1-3
- Probability Concepts and Distributions (4 weeks)
- Sampling and Estimation (2 weeks)
- Hypothesis Testing (3 weeks)
- Regression Analysis (3 weeks)
7Section 1: Overview, Variability, Data, Measures, and Graphs
8The Big Picture for This Class
Ignorance
Uncertainty
Risk
Certainty
9The Big Ideas: Section 1
- There is more data being gathered than ever
- ...but the world is full of variability and uncertainty
- What to try to control/influence?
- What to try to understand?
- In business, we use Probability and Statistics to
- Reduce confusion, deal with complexity, understand uncertainty → increased business opportunity
- We begin with some basic ways of looking at data
- Statistical measures, graphical views
- Learn how to defend against statistical liars
- Calculators and spreadsheets are everywhere
- ...so deal with it!
10Outline Section 1: Getting Into It
- Uncertainty and Variability
- Data and Why We Care About It
- Refresher on Summation
- Standard Statistical Measures
- Graphical Views of Data
11Confusing, Complex, Uncertain
- What is the chance that Vijay will make it from
home to SFSU in less than 30 minutes today?
Every day this week?
12Confusing, Complex, Uncertain
- VARIABILITY - Controllable
- What can we control (or influence)?
- Choose whether to drive, BART, or bus
- Choose time of day
- Get to the bus or BART stop on time
13Confusing, Complex, Uncertain
- VARIABILITY - Uncontrollable
- What is out of our hands?
- Driver, Vehicle, System State
- Red Lights, Other Drivers' Behavior
- Weather, Number of Other Passengers
14Confusing, Complex, Uncertain
- DEMAND: Our factory will produce 100,000 units next month. How likely are we to sell all of them?
- SUPPLY: We have estimated demand from each of our distributors. How many units should we produce? Will it change if we change our price?
- PRICE: If we increase our prices by 10%, what is the overall effect on our revenues?
15Confusing, Complex, Uncertain
- Increases in
- Competition
- Speed of Change
- Speed of Response
- International Trade
- Outsourcing
- Customization
- Systems and Efficiency
- Bottom Line
- Increased Pressure on Business
- Increased Emphasis on Reducing Controllable Variability
- Increased Uncontrollable Variability
16Outline Section 1: Getting Into It
- Uncertainty and Variability
- Data and Why We Care About It
- Refresher on Summation
- Standard Statistical Measures
- Graphical Views of Data
17Basic Questions About Data
- What is data?
- Where do you get it?
- What is it good for?
18What is Data?
- Data Warehousing Institute Definition
- Facts, numbers, or text that can be processed to produce information, usually through a calculator or computer.
- Textbook Definition
- Uh, well, it's data, you know
- Vijay's Definition
- Numerical input for analysis
- NOTE: 90% of the world's data is held in an unstructured fashion. What do you think of this?
19Where Does Data Come From?
20Why Care About Data and Statistics?
Ignorance
Uncertainty
Risk
Certainty
Data typically has both a cost and a value. Which one is greater?
21Many People Work Hard to Get Good Data
- Customer Relationship Management
- Sales Force Performance Tracking
- Call Center Data Collection
- Operations
- Sales Order Processing
- Bill of Materials
- Order Status Tracking
- Government
- Census Bureau
- Consumer Price Index
22Many People Work Hard to Get Good Data
- NIELSEN PEOPLE METER is programmed with the age and gender of each household member. Viewers enter their code when they begin watching; visitors can log their presence as well. The meter records which channels are tuned by sensing the frequencies emitted by the cable box, TV or videocassette recorder.
- EVERY DAY, in some 5,000 homes throughout the U.S., People Meters gather data on who watched what, when and for how long.
- AT STAGGERED TIMES throughout the night, all the meters call Nielsen's mainframe computer system in Dunedin, Fla., and transfer their daily viewing records via modem.
- BY MORNING, Nielsen has assembled and processed its sample of the nation's viewing behavior. TV executives and other subscribers can log in to Nielsen's data network to learn which shows were hits and which flopped.
- EVERY WEEK subscribers receive a detailed report chronicling how many Nielsen household viewers were watching television during any given quarter hour and how specific programs fared against their competition.
- This COSTS A LOT OF MONEY to do.
Source: Edgar W. Aust, senior vice president of engineering and technology for Nielsen Media Research in Dunedin, Fla.
23Nielsen Media Research
In 1936 engineer Arthur C. Nielsen, Sr., attended
a demonstration at the Massachusetts Institute of
Technology of a mechanical device that could keep
a record of the station to which a radio was
tuned at any given moment. Nielsen bought the
technology practically on the spot and six years
later launched the Nielsen Radio Index, which
analyzed the listening habits of 800 homes.
Later, he adapted the same technology to the new
medium of television, creating a ratings system
that nearly all American broadcasters use today
to help determine the popularity of their
programs. Over the years, Nielsen Media Research
has used several methods to collect viewing
information, including surveys and volunteer
diaries. In 1986 the company supplanted these
with an electronic device called a People Meter.
The meter is now connected to televisions and
telephone lines in about 5,000 households
throughout the U.S. Nielsen households are
selected from a sample that is statistically
representative of the television-viewing
population. Each household receives nominal
compensation--about $50 and occasional gifts--for
their cooperation. In order to keep the sample
representative, viewers can participate for only
two years. As they watch TV, volunteers press
buttons to indicate their presence. The People
Meter records the gender and age of each viewer,
as well as the time spent watching each channel
frequency. Every night the device transmits that
household's data by modem to Nielsen's central
computer in Florida, which assembles the data
into a ratings database. To meet the changing
needs of broadcasters and sponsors, the
technology continues to evolve. In 1986 Nielsen
introduced a system that uses computerized
pattern recognition to identify particular
commercials as they are broadcast. Future
versions of the People Meter now under
development will monitor codes embedded into
digital TV signals to verify which programs are
on the air. They will also use image-recognition
computers to identify viewers the moment they hit
the couch. Source: Edgar W. Aust, senior vice
president of engineering and technology for
Nielsen Media Research in Dunedin, Fla.
24Classifying Data Types
- Discrete: Values can be represented as separate, distinct points on a number line
- Number of customer visits to a store
- Number of shares traded in a day
- Continuous: Possible values represented as a continuum on a number line
- Weight of a shipment
- Height of the players on an NBA team
- Time spent manufacturing a product
25Classifying Data Types
- Nominal Data: Numbers that label qualitative differences
- Citizenship Variable
- 1 = US Citizen, 2 = Foreign Citizen
- Ordinal Data: Assigned numbers that indicate rank order
- Example: Grade Points
26Classifying Data Types
- Interval Data -- Intervals between numbers can be compared, but not ratios
- Calendar Years, Fahrenheit Temperatures
- Ratio Data -- Ratios and Intervals can be compared in a meaningful way
- Height, weight, length, time
27Sample Data
28From Data to Statistics
Ignorance
Uncertainty
Risk
Certainty
29Outline Section 1: Getting Into It
- Uncertainty and Variability
- Data and Why We Care About It
- Refresher on Summation
- Standard Statistical Measures
- Graphical Views of Data
30Quick Refresher: Subscripts and Summations
- We will often deal with a list of measurements or observations. A subscript identifies a particular observation in the list.
- Examples: $X_2$, $X_7$, $W_3$
- A summation sign ($\Sigma$) indicates addition.
- Example: $\sum X$ means the sum of all the values of X
31Quick Refresher: Rules of Summation
- $\sum cX = c \sum X$
- $\sum c = nc$
- $\sum (X + Y) = \sum X + \sum Y$
- c = a constant
- n = total number of observations
- But note: $\sum XY \ne \sum X \sum Y$
- $\sum X^2 \ne (\sum X)^2$
- $\sum (X + Y)^2 \ne \sum X^2 + \sum Y^2$
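The summation rules can be checked numerically. The sketch below uses two short lists and a constant that are made up purely for illustration (they are not from the course data); it confirms the three rules that hold and shows that the three tempting shortcuts give different numbers.

```python
# Hypothetical values, chosen only to exercise the summation rules.
X = [2, 1, 6, 2]
Y = [3, 3, 7, 5]
c = 4
n = len(X)

# Rules that hold
rule_constant_multiple = sum(c * x for x in X) == c * sum(X)
rule_sum_of_constant = sum(c for _ in range(n)) == n * c
rule_sum_of_sums = sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)

# Shortcuts that do NOT hold in general
sum_of_products = sum(x * y for x, y in zip(X, Y))
product_of_sums = sum(X) * sum(Y)

sum_of_squares = sum(x ** 2 for x in X)
square_of_sum = sum(X) ** 2

sum_of_squared_sums = sum((x + y) ** 2 for x, y in zip(X, Y))
sum_sq_x_plus_sum_sq_y = sum(x ** 2 for x in X) + sum(y ** 2 for y in Y)

print(rule_constant_multiple, rule_sum_of_constant, rule_sum_of_sums)
print(sum_of_products, "vs", product_of_sums)
print(sum_of_squares, "vs", square_of_sum)
print(sum_of_squared_sums, "vs", sum_sq_x_plus_sum_sq_y)
```

The gap between the sum of squares and the square of the sum is what makes the computational formula for the variance work later in this section.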
32Quick Refresher: Applying the Rules in Different Ways
33Outline Section 1: Getting Into It
- Uncertainty and Variability
- Data and Why We Care About It
- Refresher on Summation
- Standard Statistical Measures
- Graphical Views of Data
34How to Describe a Set of Data?
- One Variable
- Measures of Central Tendency
- What is average, typical, most likely, normal, expected, common, predictable for this group?
- We are going to add one more salesperson to our company. How much more revenue will we get?
- Measures of Dispersion
- How spread out, dispersed, diffuse, varied, different are these values from one another?
- Are all of our factories doing about the same or are there significant differences? Why?
36Measures of Central Tendency
- Mean (or Average) = $\sum X / n$
- Known as $\bar{X}$ for a sample.
- Known as $\mu$ for a population.
- Median: the middle value of the sorted data, $X_{(n+1)/2}$
- Mode: the most frequently observed value
- (Note: n is the sample size)
37Example: Number of separate accounts that each customer has with our bank
Raw Data: 2,1,6,2,3,3,7,5,2,4,5,4,6,6,7,6,3,2,3,6,3,5,6,5,6,2,7,3
Mean: $\bar{X} = \sum X / n = 120 / 28 = 4.29$ accounts
Sorted Data: 1,2,2,2,2,2,3,3,3,3,3,3,4,4,5,5,5,5,6,6,6,6,6,6,6,7,7,7
Median: $X_{(n+1)/2} = X_{14.5} = 4.5$ accounts (the average of the 14th and 15th sorted values, 4 and 5)
Mode: 6 accounts
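The same three measures can be reproduced in a few lines. This is a minimal sketch using Python's standard `statistics` module on the raw data from this slide; the variable names are illustrative.

```python
from statistics import median, mode

# Raw data from the slide: number of accounts per customer
accounts = [2, 1, 6, 2, 3, 3, 7, 5, 2, 4, 5, 4, 6, 6, 7,
            6, 3, 2, 3, 6, 3, 5, 6, 5, 6, 2, 7, 3]

n = len(accounts)                  # sample size
sample_mean = sum(accounts) / n    # sum of X divided by n
sample_median = median(accounts)   # averages the two middle values when n is even
sample_mode = mode(accounts)       # most frequently observed value

print(n, sum(accounts))
print(round(sample_mean, 2), sample_median, sample_mode)
```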
38Measures of Dispersion: Why Do We Care??
- Baseball Example
- Penny's Team Batting Average: .290
- Joe's Team Batting Average: .290
- Who would you rather play against? Do you know?
39Measures of Dispersion: Why Do We Care??
40Measures of Dispersion
- VARIANCE
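- Known as $s^2$ for a sample
- Known as $\sigma^2$ for a population
- What does that mean??
$s^2 = \sum (X - \bar{X})^2 / (n - 1)$
or
$\sigma^2 = \sum (X - \mu)^2 / N$
(Note: n is the sample size and N is the population size)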
41Calculating the Variance
X      $\bar{X}$   $X - \bar{X}$   $(X - \bar{X})^2$
2      4.29        -2.29           5.22
1      4.29        -3.29           10.80
6      4.29        1.71            2.94
2      4.29        -2.29           5.22
3      4.29        -1.29           1.65
3      4.29        -1.29           1.65
7      4.29        2.71            7.37
5      4.29        0.71            0.51
2      4.29        -2.29           5.22
4      4.29        -0.29           0.08
5      4.29        0.71            0.51
4      4.29        -0.29           0.08
6      4.29        1.71            2.94
6      4.29        1.71            2.94
7      4.29        2.71            7.37
6      4.29        1.71            2.94
3      4.29        -1.29           1.65
2      4.29        -2.29           5.22
3      4.29        -1.29           1.65
6      4.29        1.71            2.94
3      4.29        -1.29           1.65
5      4.29        0.71            0.51
6      4.29        1.71            2.94
5      4.29        0.71            0.51
6      4.29        1.71            2.94
2      4.29        -2.29           5.22
7      4.29        2.71            7.37
3      4.29        -1.29           1.65
Total
120                0               91.71
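The table's bottom line (a sum of squared deviations of 91.71) can be reproduced directly. The sketch below also computes the variance with the usual textbook shortcut, $s^2 = (\sum X^2 - (\sum X)^2 / n) / (n - 1)$; assuming that this is the "computational formula" the next slide refers to is a guess, since that slide's formula did not survive.

```python
# Accounts-per-customer data from the earlier example slide
accounts = [2, 1, 6, 2, 3, 3, 7, 5, 2, 4, 5, 4, 6, 6, 7,
            6, 3, 2, 3, 6, 3, 5, 6, 5, 6, 2, 7, 3]

n = len(accounts)
x_bar = sum(accounts) / n

# Definitional route: sum the squared deviations, then divide by n - 1
deviations = [x - x_bar for x in accounts]
sum_sq_dev = sum(d ** 2 for d in deviations)
s2_definitional = sum_sq_dev / (n - 1)

# Shortcut route (assumed form of the computational formula)
sum_x = sum(accounts)
sum_x2 = sum(x ** 2 for x in accounts)
s2_computational = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(round(sum(deviations), 10))   # deviations from the mean sum to zero
print(round(sum_sq_dev, 2))
print(round(s2_definitional, 2), round(s2_computational, 2))
```

Both routes give the same sample variance; the shortcut only needs the running totals of X and of X squared, which is why it suits calculators and spreadsheets.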
42Using the Computational Formula for Calculating
the Variance
43The Standard Deviation
- The standard deviation is the square root of the variance. It is called s for a sample, or $\sigma$ for a population.
- For the example: $s = \sqrt{3.4} \approx 1.84$
- One use of the standard deviation is the 3-Sigma Rule. This rule says that it is very unusual to find any observations in the data greater than the mean plus 3 times s, or any observations less than the mean minus 3 times s.
- GE's 6 Sigma Program
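As a quick illustration of the 3-Sigma Rule, the sketch below computes s for the accounts data from the earlier example and looks for observations outside the band from the mean minus 3s to the mean plus 3s. The variable names are illustrative only.

```python
from statistics import mean, stdev

# Accounts-per-customer data from the earlier example slide
accounts = [2, 1, 6, 2, 3, 3, 7, 5, 2, 4, 5, 4, 6, 6, 7,
            6, 3, 2, 3, 6, 3, 5, 6, 5, 6, 2, 7, 3]

x_bar = mean(accounts)
s = stdev(accounts)          # sample standard deviation (divides by n - 1)

lower = x_bar - 3 * s        # mean minus 3 times s
upper = x_bar + 3 * s        # mean plus 3 times s

# Observations that the rule would flag as very unusual
unusual = [x for x in accounts if x < lower or x > upper]

print(round(s, 2), round(lower, 2), round(upper, 2), unusual)
```

For this data set every observation falls inside the band, so nothing is flagged.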
44Other Measures of Variation
- Range: highest minus lowest value
- Example: Range of ages in playground = 7 years (oldest) - 1 year (youngest) = 6 years
- Mean Absolute Deviation (MAD)
The MAD is the average distance from the mean.
45Calculating the MAD
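The worked table for this slide is not recoverable, so here is a minimal sketch of the calculation instead. It applies the definition given earlier (the average distance from the mean) to the accounts data from the earlier example; choosing that data set is an assumption made for illustration.

```python
# Accounts-per-customer data from the earlier example slide
accounts = [2, 1, 6, 2, 3, 3, 7, 5, 2, 4, 5, 4, 6, 6, 7,
            6, 3, 2, 3, 6, 3, 5, 6, 5, 6, 2, 7, 3]

n = len(accounts)
x_bar = sum(accounts) / n

# Range: highest minus lowest value
data_range = max(accounts) - min(accounts)

# MAD: average of the absolute distances from the mean
abs_deviations = [abs(x - x_bar) for x in accounts]
mad = sum(abs_deviations) / n

print(data_range, round(mad, 2))
```

Taking absolute values matters: without them the deviations cancel and always sum to zero.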
46Percentiles: Somewhere Between Central Tendency and Variation
- A percentile shows the position of a value
- The pth percentile is the value such that at least p% of all values in the data set are at or below it and at least (100 - p)% are at or above it.
- Arrange the data in ascending order.
- Compute a value $i = (p/100)n$, where p is the percentile to be calculated and n is the number of data items.
- If i is not an integer, round up. The next integer greater than i is the subscript of the pth percentile.
- If i is an integer, then the pth percentile is approximated by $(X_i + X_{i+1})/2$
47Examples: Calculating Percentiles
Sorted data: 1,2,2,2,2,2,3,3,3,3,3,3,4,4,5,5,5,5,6,6,6,6,6,6,6,7,7,7
Estimate the 75th percentile: $i = (p/100)n = (75/100)(28) = 21$ (i is an integer), so the 75th percentile $\approx (X_{21} + X_{22})/2 = (6 + 6)/2 = 6$
Estimate the 19th percentile: $i = (p/100)n = (19/100)(28) = 5.32$ (i is not an integer), so round up to 6: the 19th percentile $\approx X_6 = 2$
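The percentile rule from the previous slide translates into a short function. This is a sketch of that specific rule, not of the interpolation methods that spreadsheets and statistics libraries often use, so results can differ slightly from those tools. The function name `percentile` and the restriction to 0 < p < 100 are choices made here for illustration.

```python
import math

def percentile(data, p):
    """pth percentile using the round-up / average-two-neighbours rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    xs = sorted(data)                 # arrange the data in ascending order
    n = len(xs)
    scaled = p * n                    # equals 100 * i; avoids float round-off for whole-number p
    if scaled % 100 == 0:
        i = int(scaled // 100)        # i is an integer: average X_i and X_(i+1)
        return (xs[i - 1] + xs[i]) / 2
    i = math.ceil(scaled / 100)       # i is not an integer: round up
    return xs[i - 1]                  # subscripts start at 1, list indexes at 0

accounts = [2, 1, 6, 2, 3, 3, 7, 5, 2, 4, 5, 4, 6, 6, 7,
            6, 3, 2, 3, 6, 3, 5, 6, 5, 6, 2, 7, 3]

print(percentile(accounts, 75), percentile(accounts, 19), percentile(accounts, 50))
```

The 50th percentile from this rule matches the median found earlier for the same data.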
48Other Common Terms
- Quartiles
- The 25th percentile is the first quartile
- The 50th percentile is the second quartile
- The 75th percentile is the third quartile
- The 100th percentile is the fourth quartile
- Deciles
- The 10th percentile is the first decile
- The 20th percentile is the second decile,
- etc.
49Ethics: Everybody Tries Managing the Statistics
"Virtually all of the published studies have been criticized as biased and methodologically flawed. To promote one therapy over another, some doctors claim success rates that are based on small numbers of favorable outcomes that are misleading."
Jerome Groopman, "The Prostate Paradox," The New Yorker, May 29, 2000
50Using Data/Stats Deceptively
- Hiding the real story
- What is Said
- There is some risk in this deal. However, the average return in the next three years is $12,400,129.
- What is NOT Said
- There's a 10% chance that we will have a loss of $5,000,000
- How to Defend
- Understand what is being said
- Learn to look at data
51Using Data/Stats Deceptively
- Sneaky Graphical Views
- 8 mm vs 4.7 mm
- How to Defend
- Understand what is actually being graphed!
- Read The Visual Display of Quantitative
Information by Edward Tufte
52Frequency Data Grouped Data
- Frequency Distributions
- Absolute Frequencies f(X)
- Relative Frequencies p(X)
- Cumulative Frequencies
- Calculating the mean, variance, and standard deviation with grouped data
- This will lead us into graphical views of data
53But first, a little math: Weighted Averages
Formula: $\bar{x}_w = \sum w x / \sum w$
where x represents the values of the variable and w represents the weight on each value.
Example: Calculating GPA
Course      Units   Grade   Grade Points
Comp. Sci   2       C       2
English     5       A       4
Math        3       B       3
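The GPA example can be worked through as a weighted average, with the units as the weights w and the grade points as the values x. The sketch below assumes the standard form of the formula, the sum of w times x divided by the sum of w.

```python
# (course, units, grade points) from the GPA example
courses = [
    ("Comp. Sci", 2, 2),   # grade C
    ("English",   5, 4),   # grade A
    ("Math",      3, 3),   # grade B
]

total_units = sum(units for _, units, _ in courses)
weighted_points = sum(units * points for _, units, points in courses)

gpa = weighted_points / total_units                         # weighted average
unweighted = sum(points for _, _, points in courses) / len(courses)

print(total_units, weighted_points, round(gpa, 2), round(unweighted, 2))
```

The weighted result is higher than the simple average of the three grades because the A was earned in the course with the most units.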
54Definitions
- Absolute Frequency f(X)
- A count of the number of times that a particular value of the variable X occurs
- Relative Frequency p(X)
- The fraction or percentage of times that a particular value of X occurs
- Histograms and Frequency Curves
- Graphs of frequencies of X
55Example: Overdue Mortgage Data
Record of number of months overdue: 1,2,2,2,2,2,3,3,3,3,3,3,4,4,5,5,5,5,6,6,6,6,6,6,6,7,7,7
Delinquency Level (Months)
X       f(X)    p(X)
1       1       0.04  (= 1/28)
2       5       0.18  (= 5/28)
3       6       0.21  (= 6/28)
4       2       0.07  (= 2/28)
5       4       0.14  (= 4/28)
6       7       0.25  (= 7/28)
7       3       0.11  (= 3/28)
Total   28      1.00
56Cumulative Frequencies
Delinquency Levels (Months)
X       f(X)    p(X)    Cum f(X)   Cum p(X)
1       1       0.04    1          0.04
2       5       0.18    6          0.22
3       6       0.21    12         0.43
4       2       0.07    14         0.50
5       4       0.14    18         0.64
6       7       0.25    25         0.89
7       3       0.11    28         1.00
Total   28      1.00
Cumulative absolute frequency measures the number of subjects at or below the indicated value of X. Cumulative relative frequency measures the proportion (or percentage) of subjects at or below the indicated value of X. It also gives an estimate of the percentile.
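Both frequency tables can be rebuilt from the raw record of months overdue. This is a minimal sketch using `collections.Counter`; the row layout (X, f, p, cumulative f, cumulative p) mirrors the table columns, and the variable names are illustrative.

```python
from collections import Counter

# Record of number of months overdue, from the overdue mortgage example
months_overdue = [1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4,
                  5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7]

n = len(months_overdue)
counts = Counter(months_overdue)      # absolute frequency f(X) per value

rows = []
cum_f = 0
for x in sorted(counts):
    f = counts[x]
    cum_f += f                        # running total: subjects at or below x
    rows.append((x, f, f / n, cum_f, cum_f / n))

for x, f, p, cf, cp in rows:
    print(f"{x:>2} {f:>3} {p:>6.2f} {cf:>4} {cp:>6.2f}")
```

Computed from the counts, the cumulative proportion for X = 2 is 6/28, which rounds to 0.21; the 0.22 in the table comes from adding the already-rounded p(X) values 0.04 and 0.18.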
57Calculating the Mean of a Frequency Distribution
Using Absolute Frequencies
58Calculating the Mean of a Frequency Distribution
Using Relative Frequencies
59Calculating the Variance and Standard Deviation
- Using absolute frequencies, f(X)
- $s^2 = \sum (X - \bar{X})^2 \cdot f(X) / (n - 1)$
- Using relative frequencies, p(X)
- $\sigma^2 = \sum (X - \mu)^2 \cdot p(X)$
- Note: The standard deviation is, as before, the square root of the variance.
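A sketch of these calculations on the overdue mortgage frequency table follows. It assumes the mean formulas implied by the two preceding slide titles, namely the sum of X times f(X) divided by n, and the sum of X times p(X); those slides' own formulas did not survive, so treat these as the standard forms rather than the slides' exact notation.

```python
import math

# Absolute frequencies f(X) from the overdue mortgage example
freq = {1: 1, 2: 5, 3: 6, 4: 2, 5: 4, 6: 7, 7: 3}
n = sum(freq.values())                          # total number of observations

# Mean from absolute and from relative frequencies
mean_from_f = sum(x * f for x, f in freq.items()) / n
rel = {x: f / n for x, f in freq.items()}       # relative frequencies p(X)
mean_from_p = sum(x * p for x, p in rel.items())

# Variance with absolute frequencies (sample form, divides by n - 1)
s2 = sum((x - mean_from_f) ** 2 * f for x, f in freq.items()) / (n - 1)

# Variance with relative frequencies (population form)
sigma2 = sum((x - mean_from_p) ** 2 * p for x, p in rel.items())

s = math.sqrt(s2)                               # standard deviation
print(round(mean_from_f, 2), round(s2, 2), round(sigma2, 2), round(s, 2))
```

The two variance figures differ because the relative-frequency form effectively divides by n while the absolute-frequency form divides by n - 1.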
60Calculating the Var and StDev: Computational Formula
IMPORTANT NOTE:
N = total number of observations
n = total number of data groups
$M_i$ = class midpoint for group i
61Calculating the Var and StDev: Computational Formula
STEP 1: Calculate the Sample Mean
62Calculating the Var and StDev: Computational Formula
STEP 2: Calculate the Squares of the Class Midpoints
63Calculating the Var and StDev: Computational Formula
STEP 3: Calculate the Products and the Sum of Products
64Calculating the Var and StDev: Computational Formula
STEP 4: Compute the Value from (3) and (4)
$S^2 = 58.48$
65Outline Section 1: Getting Into It
- Uncertainty and Variability
- Data and Why We Care About It
- Refresher on Summation
- Standard Statistical Measures
- Graphical Views of Data
66Batting Average Comparisons
67Fundamental Graphs and Plots
- Basic Frequency Plots
- Histograms
- Pareto Charts
- Pie Charts
- Cumulative Frequency Plots
- Time Series Plots
- LATER Scatter Plots
68Frequency Plots: Histograms, Pie Charts, and Pareto Plots
- Basic Concepts of Frequency
- Absolute Frequency
- How many in this group?
- Relative Frequency
- What % in this group?
- Cumulative Frequency
- Only applicable for ordered data
- How many in this group and below?
- More Examples From the World of Baseball!
69Ladies and Gentlemen, Your 2003 San Francisco
Giants!
70Ladies and Gentlemen, Your 2003 San Francisco
Giants!
- Frequency Plot
- X-axis: group names or ranges, Y-axis: # or %
71Ladies and Gentlemen, Your 2003 San Francisco
Giants!
- Pareto Plot: Frequency Plot with groups ordered based on relative number of observations
72Ladies and Gentlemen, Your 2003 San Francisco
Giants!
- Pie Chart
- Typically used for %
- Different visual image
- Total pie = 100%
- What's wrong with this picture?
73Ladies and Gentlemen, Your 2003 San Francisco
Giants!
- Grouping
- Helps to identify trends
- Whenever possible, define groups that are of significant size
- When a % looks odd to you, ask questions
- Sample size?
- Group definitions?
74Cumulative Frequency
- Grouping
- Helps us to assess level of concentration
- How much market share do the top 3 chemical companies have?
- Useful for very basic risk estimates
- What are my chances of bringing in no less than $4,000,000 based on my historical sales data?
75Cumulative Frequency: Example 1
76Cumulative Frequency: Example 1
77Cumulative Frequency: Example 2
78Cumulative Frequency: Example 2
Cumulative Distribution of Regional Sales
Revenues
79But first, a little math: A Note on Transforming Variables
- Suppose you have two variables, x and y, such that $y = ax + b$
$\bar{y} = a\bar{x} + b$
$\mathrm{VAR}(y) = a^2 \, \mathrm{VAR}(x)$
$\mathrm{STD\ DEV}(y) = |a| \, \mathrm{STD\ DEV}(x)$
- Example 1: The average wholesale price of a bottle of wine at Kermit's Restaurant is $6, with a standard deviation of $2. The retail price that the customer pays is equal to the wholesale price plus a markup of 150% plus a $5 corkage fee. What are the mean and standard deviation of the retail prices?
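One way to work Example 1 is sketched below. It rests on an assumption about the wording: "wholesale price plus a markup of 150%" is read as retail = wholesale + 1.5 x wholesale + 5, so a = 2.5 and b = 5. If the markup were meant differently, only the value of a would change.

```python
# Wholesale price statistics given in the example (dollars)
mean_x = 6.0
sd_x = 2.0

# Assumed reading of the pricing rule: y = a*x + b
a = 1 + 1.5     # wholesale price plus a 150% markup (assumption, see lead-in)
b = 5.0         # corkage fee in dollars

mean_y = a * mean_x + b          # the mean shifts and scales
var_y = a ** 2 * sd_x ** 2       # the variance scales by a squared; b drops out
sd_y = abs(a) * sd_x             # the standard deviation scales by |a|

print(mean_y, var_y, sd_y)
```

Under that reading, the corkage fee raises every retail price by the same amount, so it moves the mean but adds nothing to the spread.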
80Summary Section 1: Getting Into It
- Today, uncertainty is a certainty
- If we have all the data, we can
- Plot it
- Calculate descriptive statistics
- Mean and variance are key ones
- Make judgement calls and go on with our lives
- How often do we have all the data??
- Not very often!!!
- So what can we possibly do if we don't have all the data?
- This is where we're going next!