Business Statistics August 2003 - PowerPoint PPT Presentation

Provided by: jcala

Transcript and Presenter's Notes


1
Business Statistics, August 2003
  • Professor Vijay Mehrotra
  • San Francisco State University

2
What's the Deal With the Prof?
  • New to SFSU Faculty This Year After Long Business
    Career
  • Extensive Early Professional Experience
  • Caddy, security guard, phone operator, smuggler,
    busboy, waiter, and shoe shine guy
  • M.S. and Ph.D. in Operations Research From
    Stanford University
  • Possible Investigation?
  • Years of consulting
  • Best way to avoid schoolwork
  • Also known for magazine column
  • Was It Something I Said
  • http://www.lionhrtpub.com/ORMS.shtml

3
What's the Deal With the Prof?
  • Professional Career Summary
  • 1987-1992: Grad Student, Consultant
  • Clients: IBM, HP, National Semiconductor, PG&E
  • 1993-1994: Consultant, DFI
  • Developing large software models for
    transportation operations analysis
  • 1994-2002: CEO, Onward Inc.
  • Operations management consulting firm
  • Grew from 3 founders to 28 total staff
  • 2002-2003: VP, Blue Pumpkin Software
  • Our products focus on forecasting and scheduling
    people in call center operations
  • 2003-present: Faculty, San Francisco State
  • Very happy to be here!

4
My Teaching Approach
  • The Big Ideas
  • Key Terms
  • Concepts
  • Powerpoint, Blackboard
  • Examples
  • Blackboard, Excel
  • Allow Time for Q&A
  • Working Through Good Problems

5
My Expectations of You
  • Before Class
  • Do the recommended reading for this class
  • Work on recommended homework from last class
  • During Class
  • Come to class - be on time
  • Take notes, ask questions
  • NO CELL PHONES
  • After Class
  • Reading, Problems
  • Stay Current on Material
  • This really helps!
  • ABSOLUTELY No Cheating
  • All infractions will be dealt with very harshly

6
Outline of Course Topics
  • Overview, Variability, Data, Measures, and Graphs
    (2 weeks)
  • Text Chapters 1-3
  • Probability Concepts and Distributions (4 Weeks)
  • Sampling and Estimation (2 weeks)
  • Hypothesis Testing (3 weeks)
  • Regression Analysis (3 weeks)

7
Section 1: Overview, Variability, Data, Measures,
and Graphs

8
The Big Picture for This Class
Ignorance
Uncertainty
Risk
Certainty
9
The Big Ideas: Section 1
  • There is more data being gathered than ever
  • but the world is full of variability and
    uncertainty
  • What to try to control/influence?
  • What to try to understand?
  • In business, we use Probability and Statistics
    to
  • Reduce confusion, deal with complexity,
    understand uncertainty → increased business
    opportunity
  • We begin with some basic ways of looking at data
  • Statistical measures, graphical views
  • Learn how to defend against statistical liars
  • Calculators and spreadsheets are everywhere
  • so deal with it!

10
Outline, Section 1: Getting Into It
  • Uncertainty and Variability
  • Data and Why We Care About It
  • Refresher on Summation
  • Standard Statistical Measures
  • Graphical Views of Data

11
Confusing, Complex, Uncertain
  • What is the chance that Vijay will make it from
    home to SFSU in less than 30 minutes today?
    Every day this week?

12
Confusing, Complex, Uncertain
  • VARIABILITY - Controllable
  • What can we control (or influence)?
  • Choose whether to drive, BART, or bus
  • Choose time of day
  • Get to the bus or BART stop on time

13
Confusing, Complex, Uncertain
  • VARIABILITY - Uncontrollable
  • What is out of our hands?
  • Driver, Vehicle, System State
  • Red Lights, Other Drivers' Behavior
  • Weather, Number of Other Passengers

14
Confusing, Complex, Uncertain
  • DEMAND: Our factory will produce 100,000 units
    next month. How likely are we to sell all of
    them?
  • SUPPLY: We have estimated demand from each of
    our distributors. How many units should we
    produce? Will it change if we change our price?
  • PRICE: If we increase our prices by 10%, what is
    the overall effect on our revenues?

15
Confusing, Complex, Uncertain
  • Increases in
  • Competition
  • Speed of Change
  • Speed of Response
  • International Trade
  • Outsourcing
  • Customization
  • Systems and Efficiency
  • Bottom Line
  • Increased Pressure on Business
  • Increased Emphasis on Reducing Controllable
    Variability
  • Increased Uncontrollable Variability

16
Outline, Section 1: Getting Into It
  • Uncertainty and Variability
  • Data and Why We Care About It
  • Refresher on Summation
  • Standard Statistical Measures
  • Graphical Views of Data

17
Basic Questions About Data
  • What is data?
  • Where do you get it?
  • What is it good for?

18
What is Data?
  • Data Warehousing Institute Definition
  • Facts, numbers, or text that can be processed to
    produce information, usually through a calculator
    or computer.
  • Textbook Definition
  • Uh, well, it's data, you know
  • Vijay's Definition
  • Numerical input for analysis
  • NOTE: 90% of the world's data is held in an
    unstructured fashion. What do you think of this?

19
Where Does Data Come From?
  • The Data Fairy?

20
Why Care About Data and Statistics?
Ignorance
Uncertainty
Risk
Data typically has both a cost and a value.
Which one is greater?
Certainty
21
Many People Work Hard to Get Good Data
  • Customer Relationship Management
  • Sales Force Performance Tracking
  • Call Center Data Collection
  • Operations
  • Sales Order Processing
  • Bill of Materials
  • Order Status Tracking
  • Government
  • Census Bureau
  • Consumer Price Index

22
Many People Work Hard to Get Good Data
  • NIELSEN PEOPLE METER is programmed with the age
    and gender of each household member. Viewers
    enter their code when they begin watching;
    visitors can log their presence as well. The
    meter records which channels are tuned by sensing
    the frequencies emitted by the cable box, TV or
    videocassette recorder.
  • EVERY DAY, in some 5,000 homes throughout the
    U.S., People Meters gather data on who watched
    what, when and for how long.
  • AT STAGGERED TIMES throughout the night, all the
    meters call Nielsen's mainframe computer system
    in Dunedin, Fla., and transfer their daily
    viewing records via modem.
  • BY MORNING, Nielsen has assembled and processed
    its sample of the nation's viewing behavior. TV
    executives and other subscribers can log in to
    Nielsen's data network to learn which shows were
    hits and which flopped.
  • EVERY WEEK subscribers receive a detailed report
    chronicling how many Nielsen household viewers
    were watching television during any given quarter
    hour and how specific programs fared against
    their competition.
  • This COSTS A LOT OF MONEY to do.

Source: Edgar W. Aust, senior vice president of
engineering and technology for Nielsen Media
Research in Dunedin, Fla.
23
Nielsen Media Research
In 1936 engineer Arthur C. Nielsen, Sr., attended
a demonstration at the Massachusetts Institute of
Technology of a mechanical device that could keep
a record of the station to which a radio was
tuned at any given moment. Nielsen bought the
technology practically on the spot and six years
later launched the Nielsen Radio Index, which
analyzed the listening habits of 800 homes.
Later, he adapted the same technology to the new
medium of television, creating a ratings system
that nearly all American broadcasters use today
to help determine the popularity of their
programs. Over the years, Nielsen Media Research
has used several methods to collect viewing
information, including surveys and volunteer
diaries. In 1986 the company supplanted these
with an electronic device called a People Meter.
The meter is now connected to televisions and
telephone lines in about 5,000 households
throughout the U.S. Nielsen households are
selected from a sample that is statistically
representative of the television-viewing
population. Each household receives nominal
compensation--about $50 and occasional gifts--for
their cooperation. In order to keep the sample
representative, viewers can participate for only
two years. As they watch TV, volunteers press
buttons to indicate their presence. The People
Meter records the gender and age of each viewer,
as well as the time spent watching each channel
frequency. Every night the device transmits that
household's data by modem to Nielsen's central
computer in Florida, which assembles the data
into a ratings database. To meet the changing
needs of broadcasters and sponsors, the
technology continues to evolve. In 1986 Nielsen
introduced a system that uses computerized
pattern recognition to identify particular
commercials as they are broadcast. Future
versions of the People Meter now under
development will monitor codes embedded into
digital TV signals to verify which programs are
on the air. They will also use image-recognition
computers to identify viewers the moment they hit
the couch.
Source: Edgar W. Aust, senior vice president of
engineering and technology for Nielsen Media
Research in Dunedin, Fla.
24
Classifying Data Types
  • Discrete: Values can be represented as separate,
    distinct points on a number line
  • Number of customer visits to a store
  • Number of shares traded in a day
  • Continuous: Possible values are represented as a
    continuum on a number line
  • Weight of a shipment
  • Height of the players on an NBA team
  • Time spent manufacturing a product

25
Classifying Data Types
  • Nominal Data: Numbers that label qualitative
    differences
  • Citizenship Variable
  • 1 = US Citizen, 2 = Foreign Citizen
  • Ordinal Data: Assigned numbers that indicate
    rank order
  • Example: Grade Points

26
Classifying Data Types
  • Interval Data -- Intervals between numbers can be
    compared, but not ratios
  • Calendar Years, Fahrenheit Temperatures
  • Ratio Data -- Ratios and Intervals can be
    compared in a meaningful way
  • Height, weight, length, time

27
Sample Data
28
From Data to Statistics
Ignorance
Uncertainty
Risk
Certainty
29
Outline, Section 1: Getting Into It
  • Uncertainty and Variability
  • Data and Why We Care About It
  • Refresher on Summation
  • Standard Statistical Measures
  • Graphical Views of Data

30
Quick Refresher: Subscripts and Summations
  • We will often deal with a list of measurements or
    observations. A subscript identifies a
    particular observation in the list.
  • Examples: $X_2$, $X_7$, $W_3$
  • A summation sign ($\sum$) indicates addition.
  • Example: $\sum X$ means the sum of all the values
    of X

31
Quick Refresher: Rules of Summation
  • $\sum cX = c \sum X$
  • $\sum c = nc$
  • $\sum (X + Y) = \sum X + \sum Y$
  • c = a constant
  • n = total number of observations
  • But note: $\sum XY$ does not equal $\sum X \cdot \sum Y$
  • $\sum X^2$ does not equal $(\sum X)^2$
  • $\sum (X + Y)^2$ does not equal $\sum X^2 + \sum Y^2$
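The rules above can be checked numerically. The following is a minimal sketch; the lists `X` and `Y` and the constant `c` are made-up values chosen only for illustration and do not come from the slides.

```python
# Illustrative values only (not from the course data).
X = [2, 1, 6, 3]
Y = [5, 4, 7, 1]
c = 3
n = len(X)

# Rules that hold for any data:
rule_constant_factor = sum(c * x for x in X) == c * sum(X)
rule_sum_of_constant = sum(c for _ in X) == n * c
rule_sum_of_sums = sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)

# Tempting "rules" that do NOT hold in general:
sum_of_products = sum(x * y for x, y in zip(X, Y))        # sum of XY
product_of_sums = sum(X) * sum(Y)                         # (sum X)(sum Y)
sum_of_squares = sum(x ** 2 for x in X)                   # sum of X^2
square_of_sum = sum(X) ** 2                               # (sum X)^2
sum_sq_of_sums = sum((x + y) ** 2 for x, y in zip(X, Y))  # sum of (X+Y)^2
sum_sq_separately = sum_of_squares + sum(y ** 2 for y in Y)

print(rule_constant_factor, rule_sum_of_constant, rule_sum_of_sums)
print(sum_of_products, "vs", product_of_sums)
print(sum_of_squares, "vs", square_of_sum)
print(sum_sq_of_sums, "vs", sum_sq_separately)
```

The three valid rules evaluate to `True`, while each "vs" pair prints two different numbers.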

32
Quick Refresher: Applying the Rules in Different
Ways
33
Outline, Section 1: Getting Into It
  • Uncertainty and Variability
  • Data and Why We Care About It
  • Refresher on Summation
  • Standard Statistical Measures
  • Graphical Views of Data

34
How to Describe a Set of Data?
  • One Variable
  • Measures of Central Tendency
  • What is average, typical, most likely,
    normal, expected, common, predictable for
    this group?
  • We are going to add one more salesperson to our
    company. How much more revenue will we get?
  • Measures of Dispersion
  • How spread out, dispersed, diffuse,
    varied, different are these values?
  • Are all of our factories doing about the same or
    are there significant differences? Why?

35
How to Describe a Set of Data?
  • One Variable
  • Measures of Central Tendency
  • What is average, typical, most likely,
    normal, expected, common, predictable for
    this group?
  • We are going to add one more salesperson to our
    company. How much more revenue will we get?
  • Measures of Dispersion
  • How spread out, dispersed, diffuse,
    varied, different are these values from one
    another?
  • Are all of our factories doing about the same or
    are there significant differences? Why?

36
Measures of Central Tendency
  • Mean (or Average): $\sum X / n$
  • Known as $\bar{X}$ for a sample.
  • Known as $\mu$ for a population.
  • Median: the middle value, $X_{(n+1)/2}$
  • Mode: the most frequently observed value
  • (Note: n is the sample size)

37
Example: Number of separate accounts that each
customer has with our bank
Raw Data: 2,1,6,2,3,3,7,5,2,4,5,4,6,6,7,
6,3,2,3,6,3,5,6,5,6,2,7,3
Mean: $\bar{X} = \sum X / n = 120 / 28 = 4.29$ accounts
Sorted Data: 1,2,2,2,2,2,3,3,3,3,3,3,
4,4,5,5,5,5,6,6,6,6,6,6,6,7,7,7
Median: $X_{(n+1)/2} = X_{14.5} = 4.5$ accounts
Mode: 6 accounts
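As a quick check of the example above, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative; the data is the raw list from the slide.

```python
from statistics import mean, median, mode

# Raw data from the slide: number of accounts per customer.
accounts = [2, 1, 6, 2, 3, 3, 7, 5, 2, 4, 5, 4, 6, 6, 7,
            6, 3, 2, 3, 6, 3, 5, 6, 5, 6, 2, 7, 3]

n = len(accounts)                 # sample size
total = sum(accounts)             # sum of X
sample_mean = mean(accounts)      # sum of X divided by n
sample_median = median(accounts)  # average of the two middle values when n is even
sample_mode = mode(accounts)      # most frequently observed value

print(f"n = {n}, sum = {total}")
print(f"mean = {sample_mean:.2f}, median = {sample_median}, mode = {sample_mode}")
```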
38
Measures of Dispersion: Why Do We Care??
  • Baseball Example
  • Penny's Team Batting Average: .290
  • Joe's Team Batting Average: .290
  • Who would you rather play against? Do you
    know?

39
Measures of Dispersion: Why Do We Care??
40
Measures of Dispersion
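  • VARIANCE
  • Known as $s^2$ for a sample
  • Known as $\sigma^2$ for a population
  • What does that mean??

$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
or
$s^2 = \frac{\sum X^2 - (\sum X)^2 / n}{n - 1}$
(Note: n is the sample size)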
41
Calculating the Variance
$X$      $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
2        4.29         -2.29            5.22
1        4.29         -3.29            10.80
6        4.29         1.71             2.94
2        4.29         -2.29            5.22
3        4.29         -1.29            1.65
3        4.29         -1.29            1.65
7        4.29         2.71             7.37
5        4.29         0.71             0.51
2        4.29         -2.29            5.22
4        4.29         -0.29            0.08
5        4.29         0.71             0.51
4        4.29         -0.29            0.08
6        4.29         1.71             2.94
6        4.29         1.71             2.94
7        4.29         2.71             7.37
6        4.29         1.71             2.94
3        4.29         -1.29            1.65
2        4.29         -2.29            5.22
3        4.29         -1.29            1.65
6        4.29         1.71             2.94
3        4.29         -1.29            1.65
5        4.29         0.71             0.51
6        4.29         1.71             2.94
5        4.29         0.71             0.51
6        4.29         1.71             2.94
2        4.29         -2.29            5.22
7        4.29         2.71             7.37
3        4.29         -1.29            1.65
Total:
120                   0                91.71
42
Using the Computational Formula for Calculating
the Variance
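The two routes to the sample variance can be compared in a few lines. This is a sketch under one assumption: that the "computational formula" in the slide title is the usual textbook form $s^2 = (\sum X^2 - (\sum X)^2 / n) / (n - 1)$. The data is the bank-accounts example.

```python
from math import sqrt

accounts = [2, 1, 6, 2, 3, 3, 7, 5, 2, 4, 5, 4, 6, 6, 7,
            6, 3, 2, 3, 6, 3, 5, 6, 5, 6, 2, 7, 3]
n = len(accounts)
x_bar = sum(accounts) / n

# Definitional route: sum the squared deviations from the mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in accounts)
var_definitional = sum_sq_dev / (n - 1)

# Computational route (assumed textbook form): needs only sum X and sum X^2.
sum_x = sum(accounts)
sum_x2 = sum(x * x for x in accounts)
var_computational = (sum_x2 - sum_x ** 2 / n) / (n - 1)

std_dev = sqrt(var_definitional)

print(f"sum of squared deviations = {sum_sq_dev:.2f}")
print(f"variance (definitional)   = {var_definitional:.2f}")
print(f"variance (computational)  = {var_computational:.2f}")
print(f"standard deviation        = {std_dev:.2f}")
```

Both routes give the same variance. The computational form avoids building the column of deviations, which matters when working by hand or on a calculator.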
43
The Standard Deviation
  • The standard deviation is the square root of the
    variance. It is called s for a sample, or $\sigma$ for
    a population.
  • For the example: $s = \sqrt{3.4} = 1.84$
  • One use of the standard deviation is the 3-Sigma
    Rule. This rule says that it is very unusual to
    find any observations in the data greater than
    the mean plus 3 times s, and also any
    observations less than the mean minus 3 times s.
  • GE's 6 Sigma Program
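A minimal sketch of the 3-Sigma Rule applied to the bank-accounts data follows. Using that data set here is an assumption made for illustration; the slide does not say which data the rule was demonstrated on.

```python
from statistics import mean, stdev

accounts = [2, 1, 6, 2, 3, 3, 7, 5, 2, 4, 5, 4, 6, 6, 7,
            6, 3, 2, 3, 6, 3, 5, 6, 5, 6, 2, 7, 3]

x_bar = mean(accounts)
s = stdev(accounts)      # sample standard deviation (divides by n - 1)

lower = x_bar - 3 * s    # mean minus 3 times s
upper = x_bar + 3 * s    # mean plus 3 times s

# Observations outside the band are the "very unusual" ones.
unusual = [x for x in accounts if x < lower or x > upper]

print(f"s = {s:.2f}, band = ({lower:.2f}, {upper:.2f})")
print("unusual observations:", unusual)
```

For this data no observation falls outside the band, which is what the rule leads us to expect.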

44
Other Measures of Variation
  • Range: highest minus lowest value
  • Example: Range of ages in playground = 7 years
    (oldest) - 1 year (youngest) = 6 years
  • Mean Absolute Deviation (MAD): $\mathrm{MAD} = \frac{\sum |X - \bar{X}|}{n}$

The MAD is the average distance from the mean.
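As a sketch, the range and the MAD can be computed as below. Two assumptions are made: the MAD is taken as the sum of absolute deviations from the mean divided by n, and the bank-accounts data is reused for illustration.

```python
accounts = [2, 1, 6, 2, 3, 3, 7, 5, 2, 4, 5, 4, 6, 6, 7,
            6, 3, 2, 3, 6, 3, 5, 6, 5, 6, 2, 7, 3]
n = len(accounts)
x_bar = sum(accounts) / n

# Range: highest minus lowest value.
data_range = max(accounts) - min(accounts)

# MAD: average absolute distance of each observation from the mean.
mad = sum(abs(x - x_bar) for x in accounts) / n

print(f"range = {data_range}, MAD = {mad:.2f}")
```

Unlike the variance, the MAD stays in the original units (accounts) without needing a square root.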
45
Calculating the MAD
46
Percentiles: Somewhere Between Central
Tendency and Variation
  • A percentile shows the position of a value
  • The pth percentile is the value such that at
    least p% of all values in the data set are at or
    below it and at least (100-p)% are at or above
    it.
  • Arrange the data in ascending order.
  • Compute a value $i = (p/100)n$, where p is the
    percentile to be calculated and n is the number
    of data items.
  • If i is not an integer, round up. The next
    integer greater than i is the subscript of the
    pth percentile.
  • If i is an integer, then the pth percentile is
    approximated by $(X_i + X_{i+1})/2$

47
Examples: Calculating Percentiles
1,2,2,2,2,2,3,3,3,3,3,3,4,4,5,5,5,5,6,6,6,6,6,6,6,
7,7,7
Estimate the 75th percentile: $i = (p/100)n = (75/100)(28) = 21$ (i is an integer)
75th percentile $\approx (X_{21} + X_{22})/2 = (6 + 6)/2 = 6$
Estimate the 19th percentile: $i = (p/100)n = (19/100)(28) = 5.32$ (i is not an integer)
19th percentile $\approx X_6 = 2$
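The percentile procedure can be sketched as a small function. The name `percentile` and the restriction to whole-number p strictly between 0 and 100 are choices made here, not part of the slides. Integer arithmetic is used so that the test for whether i is an integer is exact.

```python
def percentile(values, p):
    """Estimate the pth percentile with the round-up / average rule.

    Assumes p is a whole number with 0 < p < 100 (an illustrative limit).
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")

    data = sorted(values)               # arrange in ascending order
    n = len(data)
    i, remainder = divmod(p * n, 100)   # i is (p/100)n rounded down

    if remainder:
        # i is not an integer: round up to subscript i + 1 (1-based),
        # which is index i in a 0-based Python list.
        return data[i]
    # i is an integer: average X_i and X_(i+1) (1-based subscripts).
    return (data[i - 1] + data[i]) / 2


accounts = [2, 1, 6, 2, 3, 3, 7, 5, 2, 4, 5, 4, 6, 6, 7,
            6, 3, 2, 3, 6, 3, 5, 6, 5, 6, 2, 7, 3]

print(percentile(accounts, 75))   # i = 21, an integer
print(percentile(accounts, 19))   # i = 5.32, rounded up to 6
print(percentile(accounts, 50))   # same as the median
```

Other textbooks and software packages use slightly different interpolation rules, so results from a spreadsheet may not match this rule exactly.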
48
Other Common Terms
  • Quartiles
  • The 25th percentile is the first quartile
  • The 50th percentile is the second quartile
  • The 75th percentile is the third quartile
  • The 100th percentile is the fourth quartile
  • Deciles
  • The 10th percentile is the first decile
  • The 20th percentile is the second decile,
  • etc.

49
Ethics: Everybody Tries Managing the Statistics
"Virtually all of the published studies have
been criticized as biased and methodologically
flawed. To promote one therapy over another, some
doctors claim success rates that are based on
small numbers of favorable outcomes that are
misleading."
Source: Jerome Groopman, "The Prostate
Paradox," The New Yorker, May 29, 2000
50
Using Data/Stats Deceptively
  • Hiding the real story
  • What is Said
  • "There is some risk in this deal. However, the
    average return in the next three years is
    $12,400,129."
  • What is NOT Said
  • "There's a 10% chance that we will have a loss of
    $5,000,000."
  • How to Defend
  • Understand what is being said
  • Learn to look at data

51
Using Data/Stats Deceptively
  • Sneaky Graphical Views
  • 8 mm vs 4.7 mm
  • How to Defend
  • Understand what is actually being graphed!
  • Read The Visual Display of Quantitative
    Information by Edward Tufte

52
Frequency Data: Grouped Data
  • Frequency Distributions
  • Absolute Frequencies f(X)
  • Relative Frequencies p(X)
  • Cumulative Frequencies
  • Calculating the mean, variance, and standard
    deviation with grouped data
  • This will lead us into graphical views of data

53
But first, a little math: Weighted Averages
Formula: $\bar{x}_w = \frac{\sum w x}{\sum w}$
where x represents the values of the variable and
w represents the weight on each value.
Example: Calculating GPA
Course       Units    Grade    Grade Points
Comp. Sci    2        C        2
English      5        A        4
Math         3        B        3
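A minimal sketch of the GPA example follows, treating the course units as the weights and the grade points as the values (an assumed but natural reading of the table). The helper name `weighted_average` is illustrative.

```python
def weighted_average(values, weights):
    """Sum of weight times value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# (course, units, grade points) from the example table.
courses = [("Comp. Sci", 2, 2), ("English", 5, 4), ("Math", 3, 3)]

units = [u for _, u, _ in courses]          # weights
grade_points = [g for _, _, g in courses]   # values

gpa = weighted_average(grade_points, units)
simple_mean = sum(grade_points) / len(grade_points)

print(f"weighted GPA = {gpa:.2f}, unweighted mean = {simple_mean:.2f}")
```

The weighted GPA is higher than the plain average of the three grades because the A was earned in the course with the most units.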
54
Definitions
  • Absolute Frequency f(X)
  • A count of the number of times that a particular
    value of the variable X occurs
  • Relative Frequency p(X)
  • The fraction or percentage of times that a
    particular value of X occurs
  • Histograms and Frequency Curves
  • Graphs of frequencies of X

55
Example: Overdue Mortgage Data
Record of # of Months Overdue: 1,2,2,2,2,2,3,3,3,3,
3,3,4,4,5,5,5,5,6,6,6,6,6,6,6,7,7,7
Delinquency Level (Months)
X        f(X)    p(X)
1        1       0.04  (= 1/28)
2        5       0.18  (= 5/28)
3        6       0.21  (= 6/28)
4        2       0.07  (= 2/28)
5        4       0.14  (= 4/28)
6        7       0.25  (= 7/28)
7        3       0.11  (= 3/28)
Total    28      1.00
56
Cumulative Frequencies
Delinquency Levels (Months)
X        f(X)    p(X)    Cum f(X)    Cum p(X)
1        1       0.04    1           0.04
2        5       0.18    6           0.22
3        6       0.21    12          0.43
4        2       0.07    14          0.50
5        4       0.14    18          0.64
6        7       0.25    25          0.89
7        3       0.11    28          1.00
Total    28      1.00
Cumulative absolute frequency measures the number
of subjects at or below the indicated value of
X. Cumulative relative frequency measures the
proportion (or percentage) of subjects at or
below the indicated value of X. It also gives an
estimate of the percentile.
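The frequency columns can be rebuilt from the raw records with a short sketch like the one below. It uses `collections.Counter` and `itertools.accumulate` from the standard library; the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

# Months overdue for each of the 28 mortgages in the example.
months_overdue = [1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4,
                  5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7]
n = len(months_overdue)

counts = Counter(months_overdue)
levels = sorted(counts)               # distinct values of X, ascending

f = [counts[x] for x in levels]       # absolute frequencies f(X)
p = [fx / n for fx in f]              # relative frequencies p(X)
cum_f = list(accumulate(f))           # cumulative absolute frequencies
cum_p = [c / n for c in cum_f]        # cumulative relative frequencies

print("X  f(X)  p(X)  Cum f  Cum p")
for x, fx, px, cf, cp in zip(levels, f, p, cum_f, cum_p):
    print(f"{x}  {fx:4d}  {px:.2f}  {cf:5d}  {cp:.2f}")
```

Because this sketch accumulates the exact counts before rounding, the cumulative relative frequency for X = 2 prints as 0.21 (6/28), whereas adding the already-rounded values 0.04 and 0.18 gives 0.22.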
57
Calculating the Mean of a Frequency Distribution
Using Absolute Frequencies
58
Calculating the Mean of a Frequency Distribution
Using Relative Frequencies
59
Calculating the Variance and Standard Deviation
  • Using absolute frequencies, f(X)
  • $s^2 = \sum (X - \bar{X})^2 \cdot f(X) / (n - 1)$
  • Using relative frequencies, p(X)
  • $\sigma^2 = \sum (X - \mu)^2 \cdot p(X)$
  • Note: The standard deviation is, as before, the
    square root of the variance.
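Both versions can be sketched with the overdue-mortgage frequency table. The mean formulas used here, the sum of X times f(X) divided by n and the sum of X times p(X), are the standard ones and are assumed to match the preceding slides.

```python
from math import sqrt

# Absolute frequencies f(X) from the overdue-mortgage example.
freq = {1: 1, 2: 5, 3: 6, 4: 2, 5: 4, 6: 7, 7: 3}
n = sum(freq.values())

# Absolute frequencies: sample mean and sample variance (divide by n - 1).
x_bar = sum(x * f for x, f in freq.items()) / n
s2 = sum((x - x_bar) ** 2 * f for x, f in freq.items()) / (n - 1)
s = sqrt(s2)

# Relative frequencies: treat p(X) as the whole distribution (no n - 1).
rel = {x: f / n for x, f in freq.items()}
mu = sum(x * p for x, p in rel.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in rel.items())
sigma = sqrt(sigma2)

print(f"absolute: mean = {x_bar:.2f}, s^2 = {s2:.2f}, s = {s:.2f}")
print(f"relative: mean = {mu:.2f}, sigma^2 = {sigma2:.2f}, sigma = {sigma:.2f}")
```

The means agree, but the relative-frequency variance is slightly smaller because it divides by n rather than n - 1.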

60
Calculating the Var and StDev: Computational
Formula
IMPORTANT NOTE: N = total number of
observations; n = total number of data groups; $M_i$ =
class midpoint for group i
61
Calculating the Var and StDev: Computational
Formula
STEP 1: Calculate the Sample Mean
62
Calculating the Var and StDev: Computational
Formula
STEP 2: Calculate the Squares of the Class
Midpoints
63
Calculating the Var and StDev: Computational
Formula
STEP 3: Calculate the Products and the Sum of
Products
64
Calculating the Var and StDev: Computational
Formula
STEP 4: Compute the Value from (3) and (4)
$S^2 = 58.48$
65
Outline, Section 1: Getting Into It
  • Uncertainty and Variability
  • Data and Why We Care About It
  • Refresher on Summation
  • Standard Statistical Measures
  • Graphical Views of Data

66
Batting Average Comparisons
67
Fundamental Graphs and Plots
  • Basic Frequency Plots
  • Histograms
  • Pareto Charts
  • Pie Charts
  • Cumulative Frequency Plots
  • Time Series Plots
  • LATER Scatter Plots

68
Frequency Plots: Histograms, Pie Charts, and
Pareto Plots
  • Basic Concepts of Frequency
  • Absolute Frequency
  • How many in this group?
  • Relative Frequency
  • What % in this group?
  • Cumulative Frequency
  • Only applicable for ordered data
  • How many in this group and below?
  • More Examples From the World of Baseball!

69
Ladies and Gentlemen, Your 2003 San Francisco
Giants!
70
Ladies and Gentlemen, Your 2003 San Francisco
Giants!
  • Frequency Plot
  • X-axis: group names or ranges; Y-axis: # or %

71
Ladies and Gentlemen, Your 2003 San Francisco
Giants!
  • Pareto Plot - Frequency Plot with groups ordered
    based on relative number of observations

72
Ladies and Gentlemen, Your 2003 San Francisco
Giants!
  • Pie Chart
  • Typically used for percentages
  • Different visual image
  • Total pie = 100%
  • What's wrong with this picture?

73
Ladies and Gentlemen, Your 2003 San Francisco
Giants!
  • Grouping
  • Help to identify trends
  • Whenever possible, define groups that are of
    significant size
  • When a number looks odd to you, ask questions
  • sample size?
  • Group definitions?

74
Cumulative Frequency
  • Grouping
  • Helps us to assess level of concentration
  • How much market share do the top 3 chemical
    companies have?
  • Useful for very basic risk estimates
  • What are my chances of bringing in no less than
    $4,000,000 based on my historical sales data?

75
Cumulative Frequency: Example 1
76
Cumulative Frequency: Example 1
77
Cumulative Frequency: Example 2
78
Cumulative Frequency: Example 2
Cumulative Distribution of Regional Sales
Revenues
79
But first, a little math: A Note on
Transforming Variables
  • Suppose you have two variables, x and y, such
    that $y = ax + b$

$\bar{y} = a\bar{x} + b$
$\mathrm{VAR}(y) = a^2 \, \mathrm{VAR}(x)$
$\mathrm{STD\ DEV}(y) = |a| \cdot \mathrm{STD\ DEV}(x)$
  • Example 1: The average wholesale price of a
    bottle of wine at Kermit's Restaurant is $6, with
    a standard deviation of $2. The retail price that
    the customer pays is equal to the wholesale price
    plus a markup of 150% plus a $5 corkage fee. What
    are the mean and standard deviation of the retail
    prices?
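One way to work Example 1 is sketched below. It assumes that "wholesale price plus a markup of 150%" means retail = wholesale + 1.5 × wholesale + 5, so that a = 2.5 and b = 5. The function name is illustrative.

```python
def transform_mean_sd(a, b, mean_x, sd_x):
    """Mean and standard deviation of y = a*x + b, given those of x."""
    mean_y = a * mean_x + b    # the constant b shifts the mean
    sd_y = abs(a) * sd_x       # b has no effect on the spread
    return mean_y, sd_y


a = 1 + 1.50   # wholesale price plus a 150% markup (assumed reading)
b = 5          # corkage fee in dollars

mean_retail, sd_retail = transform_mean_sd(a, b, mean_x=6, sd_x=2)
var_retail = sd_retail ** 2    # equals a^2 times the wholesale variance

print(f"mean retail price = ${mean_retail:.2f}")
print(f"std dev of retail price = ${sd_retail:.2f}")
```

Under this reading the corkage fee raises every price by the same amount, so it moves the mean but leaves the standard deviation untouched.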

80
Summary, Section 1: Getting Into It
  • Today, Uncertainty is a certainty
  • If we have all the data, we can
  • Plot it
  • Calculate descriptive statistics
  • Mean and variance are key ones
  • Make judgement calls and go on with our lives
  • How often do we have all the data??
  • Not very often!!!
  • So what can we possibly do if we don't have all
    the data?
  • This is where we're going next!