Title: Introduction to Data Analysis and Decision Making
1Introduction to Data Analysis and Decision Making
2Data Analysis
- Describing data and datasets
- Making inferences from data and datasets
- Searching for relationships in data and datasets
3Decision Making
- Optimization
- Decision analysis with uncertainty
- Sensitivity Analysis
4Uncertainty
- Measuring uncertainty
- Modeling and simulation
5What is Management Science?
- Logical, systematic approach to decision making using quantitative methods.
- Science: scientific methods used to solve business-related problems.
- Goal for this class: logically approach and solve many different problems.
6Management Science Approach to Problem Solving
- Observation
- Definition of the Problem
- Constructing the Model
- Solving the Model/problem
- Implementation of Solution
- (process is never really complete)
7Observation
- Identify the problem
- "Problem" does not imply that there is something wrong with the process
- "Problem" could imply a need for improvement
8Definition of the Problem
- Clearly define problem
- Prevents incorrect/inappropriate solution
- Listing goals could be helpful
9Constructing the Model
- Represents the problem in abstract form
- Schematic, scale, or mathematical relationship between variables (equation)
- Ex: Income = Hours Worked x Pay
10Components of the Model
- Variable/Decision Variables
- Independent
- Dependent
- Objective Function
- Parameter
- Constraints
11Model Solution
- Same as solving the problem
- Ex: Z = 20X - 5X
- subject to
- 4X = 100
- Solution
- X = 25, so Z = 20(25) - 5(25) = 375
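The example above can be checked with a few lines of Python. This is a minimal sketch, assuming the model reads Z = 20X - 5X subject to 4X = 100, which reproduces the slide's X = 25 and Z = 375. The function and parameter names are illustrative and do not come from the slides.

```python
def solve_model(gain_per_unit=20, cost_per_unit=5,
                resource_per_unit=4, resource_available=100):
    """Solve Z = gain*X - cost*X subject to resource_per_unit*X = resource_available.

    Parameter names are hypothetical labels for the slide's coefficients.
    """
    x = resource_available / resource_per_unit      # the equality constraint fixes X
    z = gain_per_unit * x - cost_per_unit * x       # objective value at that X
    return x, z


x, z = solve_model()
print(f"X = {x:g}, Z = {z:g}")
```

Because the single constraint is an equality, there is nothing to optimize here: the constraint determines X and the objective function only evaluates it.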
12Implementation of Solution
- Solution aids us in making a decision but does
not constitute the actual decision making.
13Example
- Blue Ridge Hot Tubs manufactures and sells hot tubs. The company needs to decide how many hot tubs to produce during the next production cycle. The company buys prefabricated fiberglass hot tub shells from a local supplier and adds a pump and tubing to the shells to create its hot tubs. The company has 200 pumps available. Each hot tub requires 9 hours of labor. The company expects to have 1,566 production labor hours during the next production cycle. A profit of $350 will be earned on each hot tub sold. The company is confident that all of the hot tubs will sell. The question is, how many should be produced if the company wants to maximize profits during the next production cycle?
14Msci Approach to Problem Solving
- Problem: Determine the number of hot tubs to produce
- Definition: Maximize profit within the constraints of the labor hours and materials available
- Model: Max Z = 350X
- subject to
- 9X <= 1,566 labor hours
- Solution: X = 174, Z = 350(174) = $60,900 (the 200 available pumps are not binding, since 174 < 200)
- Implementation: Recommend making 174 hot tubs
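The hot tub model can be sketched in Python as follows. This is an illustration, not the course's method: it assumes one pump per hot tub (the example only says that "a pump" is added to each shell) and whole tubs only. The function name `max_hot_tubs` is hypothetical.

```python
def max_hot_tubs(pumps=200, labor_hours=1566, hours_per_tub=9, profit_per_tub=350):
    """Return (tubs to produce, total profit) for the single-product model.

    Assumption: each tub uses exactly one pump, so both the pump stock and
    the labor hours cap production.
    """
    limit_by_labor = labor_hours // hours_per_tub   # whole tubs the labor allows
    limit_by_pumps = pumps                          # one pump per tub (assumed)
    x = min(limit_by_labor, limit_by_pumps)         # profit rises with X, so produce the max
    return x, profit_per_tub * x


tubs, profit = max_hot_tubs()
print(f"Produce {tubs} hot tubs for a profit of ${profit:,}")
```

Since profit per tub is positive, the best X is simply the largest value that satisfies every constraint; here labor (174 tubs) is the tighter limit.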
15A Generic Mathematical Model
Y = f(X1, X2, ..., Xk)
Where
Y = dependent variable (a bottom-line performance measure)
Xi = independent variables (inputs having an impact on Y)
f(.) = function defining the relationship between the Xi and Y
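As a small illustration of the generic form, the earlier income example can be written as a function of two independent variables. This is only a sketch; the pay rate of 15.0 is a made-up value.

```python
def income(hours_worked, pay_rate):
    """Y = f(X1, X2): income (dependent) from hours worked and pay rate (independent)."""
    return hours_worked * pay_rate


# Vary one input while holding the other fixed (pay rate is a hypothetical value)
for hours in (20, 30, 40):
    print(hours, income(hours, pay_rate=15.0))
```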
16Categories of Mathematical Models
Model Category | Form of f(.) | Independent Variables | OR/MS Techniques
Prescriptive | known, well-defined | known or under decision maker's control | LP, Networks, IP, CPM, EOQ, NLP, GP, MOLP
Predictive | unknown, ill-defined | known or under decision maker's control | Regression Analysis, Time Series Analysis, Discriminant Analysis
Descriptive | known, well-defined | unknown or uncertain | Simulation, PERT, Queueing, Inventory Models
17Example Spring Mills
- 280 observations
- Three variables per observation
- Relatively large dataset
18Background Information
- Spring Mills produces and distributes a wide variety of manufactured goods. It has a large number of customers.
- Spring Mills classifies these customers as small, medium, or large, depending on the volume of business each does with them.
- Recently they have noticed a problem with accounts receivable. They are not getting paid by their customers in as timely a manner as they would like. This obviously costs them money.
19RECEIVE.XLS
- Spring Mills has gathered data on 280 customer accounts.
- For each of these accounts the data set lists three variables:
- Size: the size of the customer (coded 1 for small, 2 for medium, 3 for large).
- Days: the number of days since the customer was billed.
- Amount: the amount the customer owes.
- What information can we obtain from this data?
20Summary Measures for Combined Data
21Scatterplot: Amount vs Days, All Customers
22Scatterplot: Amount vs Days, Small Customers
23Scatterplot: Amount vs Days, Medium Customers
24Scatterplot: Amount vs Days, Large Customers
25Analysis -- continued
- There is obviously a lot going on here, and it is evident from the charts. We point out the following:
- There are considerably fewer large customers than small or medium customers.
- The large customers tend to owe considerably more than small or medium customers.
- The small customers do not tend to be as long overdue as the large and medium customers.
- There is no apparent relationship between Days and Amount for the small customers, but there is a definite positive relationship between these variables for the medium and large customers.
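The kind of per-group summary behind these observations can be sketched in Python. The rows below are made-up placeholders in the (Size, Days, Amount) layout described earlier; they are NOT the Spring Mills data, and the helper names `pearson` and `summarize` are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Placeholder rows (size code, days since billed, amount owed); not real data.
records = [
    (1, 5, 200), (1, 12, 180), (1, 8, 260), (1, 15, 210),
    (2, 10, 400), (2, 20, 650), (2, 30, 900),
    (3, 12, 1500), (3, 25, 2600), (3, 18, 2000),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def summarize(rows):
    """Group rows by size code and report count, means, and Days/Amount correlation."""
    groups = defaultdict(list)
    for size, days, amount in rows:
        groups[size].append((days, amount))
    summary = {}
    for size, pairs in sorted(groups.items()):
        days = [d for d, _ in pairs]
        amounts = [a for _, a in pairs]
        summary[size] = {
            "count": len(pairs),
            "mean_days": mean(days),
            "mean_amount": mean(amounts),
            "corr_days_amount": pearson(days, amounts),
        }
    return summary


summary = summarize(records)
for size, stats in summary.items():
    print(size, stats)
```

Counts per group, mean Amount, mean Days, and the Days/Amount correlation correspond to the four observations listed above; with the real 280 accounts the same functions would apply unchanged.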
26Findings
- If Spring Mills really wants to decrease receivables, it might want to target the medium-sized customer group, from which it is losing the most interest.
- Or it could target the large customers because they owe the most on average.
- The most appropriate action depends on the cost and effectiveness of targeting any particular customer group. However, the analysis presented here gives the company a much better picture of what's currently going on.
27Modeling and Models
- Graphical models
- Algebraic models
- Spreadsheet models
28The Modeling Process
- Define the problem
- Collect and summarize data
- Formulate a model
- Verify the model
- Select one or more suitable decisions
- Present the results to the organization
- Implement the model and update through time
29Describing Data: The Basics
30Descriptive vs Inferential Statistics
- Descriptive statistics
- The process of applying a method of analysis to a set of data in order to better understand the information contained within.
- Inferential statistics
- Using a (sub)set of data (a sample) to predict behavior of a larger set of data (the population).
31Population
- Definition
- Set of existing units (usually people, objects, transactions, or events), or
- Every element in a group that is the subject of interest
- Depends upon the problem or situation
- Examples
- College students, Honda Accords, cash sales
32Population Parameters and Sample Statistics
A population parameter is a number calculated from all the population measurements that describes some aspect of the population.
The population mean, denoted μ, is a population parameter and is the average of the population measurements.
A point estimate is a one-number estimate of the value of a population parameter.
A sample statistic is a number calculated using sample measurements that describes some aspect of the sample.
33Measures of Central Tendency
Mean, μ: The average or expected value
Median, Md: The middle point of the ordered measurements
Mode, Mo: The most frequent value
34The Mean
Population: X1, X2, ..., XN
Population mean: μ = (X1 + X2 + ... + XN) / N
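The three measures of central tendency can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up measurements; the values are illustrative only.

```python
from statistics import mean, median, mode

values = [2, 3, 3, 5, 7]   # made-up measurements, treated as the whole population

print("mean  :", mean(values))     # sum of the values divided by N
print("median:", median(values))   # middle point of the ordered values
print("mode  :", mode(values))     # most frequent value
```

Here the mean (4) sits above the median and mode (both 3) because the larger values 5 and 7 pull the average upward.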
35Relationships Among Mean, Median and Mode
36Variables
- Definition
- Characteristic or property of an individual population unit
- Particular characteristics or properties may vary among units in a population
- Examples
- Starting salary of MBA college graduates
- Price of peanut butter at grocery stores
37Measurement
- Definition
- The process of quantifying information
- Quantitative variables
- Test scores, product and process measurements, survey results, etc.
- Qualitative variables
- Product rating, arbitrary scales, etc.
38Sample
- Definition
- Subset of the units of the population
- Example
- 100 GPAs from all finance majors
- Tool wear on 3 machines out of 45 machines
- Notes
- A random sample implies no statistical bias
- A census includes all population members
39Statistical Inference
- Definition
- Estimation, prediction, or other generalizations about a population based on information contained in a sample.
- Example
- Based on a 5-year sample of similar weather patterns, predicting the chance of rain today.
40Reliability of the Inference
- Four items discussed thus far allow for statistical inference:
- A population, variable(s) of interest, a sample, and an inference.
- Fifth item: a measure of the reliability of the inference.
- How good the inference is, i.e. how much confidence can we place in the inference?
41Example
- The approval rating of the President: what does it really mean?
- Uses a sample from the population to infer the percentage of the population that approves of his overall performance.
- Implies that 55% of the population approves of the president's performance, plus or minus 5%, i.e. between 50% and 60%.
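One common way to attach a "plus or minus" figure to a sample proportion is the normal-approximation margin of error. The slides do not say how the margin in the example was obtained, so the formula choice, the 95% z-value of 1.96, and the sample counts below (220 approvals out of 400 respondents) are all assumptions made for illustration.

```python
from math import sqrt


def approval_interval(approve, n, z=1.96):
    """Point estimate and approximate interval for a population proportion.

    Uses the normal approximation p +/- z * sqrt(p * (1 - p) / n); z = 1.96
    corresponds to roughly 95% confidence. This is an assumed method, not
    one stated in the slides.
    """
    p = approve / n
    margin = z * sqrt(p * (1 - p) / n)
    return p, margin


# Hypothetical poll: 220 of 400 respondents approve
p, margin = approval_interval(220, 400)
print(f"estimate {p:.1%}, margin {margin:.1%}, "
      f"interval {p - margin:.1%} to {p + margin:.1%}")
```

The margin is the "measure of reliability" from the previous slide: a larger sample shrinks it, so the same point estimate can deserve more or less confidence.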
42Process Statistics
- A process transforms inputs into outputs
- A manufacturing process which transforms aluminum sheet into aluminum cans.
- A service process which offers financial advice based on a customer's input.
- Samples are obtained from a process and statistical procedures can then be applied to make inferences about the process itself.
43Sampling a Process
Process: A sequence of operations that takes inputs (labor, raw materials, methods, machines, and so on) and turns them into outputs (products, services, and the like).
A process is in statistical control if it
displays constant level and constant variation.
44Types of Data
- Data can be classified into four types
- Nominal
- Ordinal
- Interval
- Ratio
45Nominal Data
- Classify the members of the sample into categories (Categorical Data).
- Examples
- An individual's religious affiliation
- Gender of applicants
- An individual's political party affiliation
- No mathematical properties, i.e. numerical values are only codes.
46Ordinal Data
- Units of the sample can be ordered with respect to the variable of interest.
- Examples
- Size of rental cars.
- Ranking of microbrews with respect to taste.
- Ranking of consumer preferences for a product.
- No mathematical properties in that the difference between ranking values is meaningless.
47Interval Data
- Sample measurements enable comparisons between members of the sample, i.e. the differences between measurements have meaning.
- Examples
- Temperature or pressure readings.
- Machine speeds
- Can add and subtract but cannot multiply or divide, because the origin (zero point) has no meaning.
48Ratio Data
- Equal distances between numbers imply equal distances between the values of the characteristic being measured, and zero represents the absence of the characteristic being measured.
- Examples
- Sales revenue for a product or service.
- Unemployment rate.
49Classes of Data
- Data can be classified as either being
- Qualitative data: nominal, ordinal, or
- Quantitative data: interval, ratio.
- Numerical data can also be discrete (countable) or continuous.
- Spreadsheet (or Database)
- Variable (or Field)
- Observation (or Record)
50Describing Data: Graphs and Tables
51Displaying Data
- For both Qualitative and Quantitative Data
- Pie Charts
- Bar Graphs (Bar Charts)
- Histograms
- Frequency Tables
- Stem and Leaf Diagrams
52Pie Chart Example
- 1999 Cigarette Sales (in billions) by company
- Philip Morris, 211.8
- Reynolds, 189.7
- Brown and Williamson, 69.1
- Lorillard, 48.6
- American, 43.9
- Liggett, 29.8
53Bar Graph Example
- 1999 Cigarette Sales (in billions) by company
- Philip Morris, 211.8
- Reynolds, 189.7
- Brown and Williamson, 69.1
- Lorillard, 48.6
- American, 43.9
- Liggett, 29.8
54Histogram Example
- Percentage of Sales Revenue spent on Advertising for a sample of 35 Fortune 500 companies (class, with frequency in parentheses)
- 1 to 3 (4)
- 3 to 5 (9)
- 5 to 7 (11)
- 7 to 9 (8)
- 9 to 11 (3)
55Measurement Classes
- Intervals are called measurement classes
- A count of the members of a measurement class is the frequency.
- The proportion of members in a measurement class is the relative frequency. For a given interval, this proportion is calculated by dividing the frequency of the measurement class by the sample size.
56Relative Frequency
- Frequency Table
- Divide range into intervals of equal size.
- Count the number of sample members that fall
within the ranges.
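The two steps above can be sketched as a small Python function. The data values are made up, and the convention that each class includes its lower bound but not its upper bound is an assumption; the slides do not say how boundary values are assigned.

```python
def frequency_table(data, low, width, n_classes):
    """Count how many values fall in each equal-width class [lower, upper).

    Values outside the overall range are ignored. The half-open class
    convention is an assumed choice.
    """
    counts = [0] * n_classes
    for value in data:
        index = int((value - low) // width)     # which class the value lands in
        if 0 <= index < n_classes:
            counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Made-up sample, binned into five classes of width 2 starting at 1
sample = [1.5, 2.2, 3.1, 4.8, 5.0, 6.3, 6.9, 8.4, 10.2]
for lower, upper, count in frequency_table(sample, low=1, width=2, n_classes=5):
    print(f"{lower} to {upper}: {count}")
```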
57Relative Frequency Histogram Example
- Percentage of Sales Revenue spent on Advertising for a sample of 35 Fortune 500 companies
- 1 to 3 (4/35 = 0.114)
- 3 to 5 (9/35 = 0.257)
- 5 to 7 (11/35 = 0.314)
- 7 to 9 (8/35 = 0.229)
- 9 to 11 (3/35 = 0.086)
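The relative frequencies in this example follow directly from the class counts given on the histogram slide. A short Python check, using only those counts (the variable names are illustrative):

```python
# Class labels and frequencies from the advertising histogram example
classes = ["1 to 3", "3 to 5", "5 to 7", "7 to 9", "9 to 11"]
frequencies = [4, 9, 11, 8, 3]

n = sum(frequencies)                          # sample size
relative = [f / n for f in frequencies]       # frequency divided by sample size

for label, f, r in zip(classes, frequencies, relative):
    print(f"{label}: {f}/{n} = {r:.3f}")
```

The relative frequencies sum to 1, which is a quick sanity check that every observation was assigned to exactly one class.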
58Stem and Leaf Diagrams
- Data is displayed graphically
- The stem is the portion of the data to the left of the decimal point.
- The leaf is the portion of data to the right of the decimal point.
- Graphical representation much like a histogram.
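A stem-and-leaf display following the definition above can be sketched in Python. This assumes non-negative values recorded to one decimal place; the sample values are made up and the function name is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return display rows: stem = digits left of the decimal point,
    leaf = the single digit to its right (one-decimal data assumed)."""
    rows = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)               # work in tenths to avoid float noise
        stem, leaf = divmod(tenths, 10)
        rows[stem].append(leaf)
    stems = range(min(rows), max(rows) + 1)  # include empty stems to keep the shape
    return [f"{s} | {' '.join(map(str, rows.get(s, [])))}".rstrip() for s in stems]


for row in stem_and_leaf([2.1, 2.7, 3.4, 3.4, 3.9, 5.0]):
    print(row)
```

Turned on its side, the rows of leaves form the bars of a histogram with a class width of 1, while still showing every individual value.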
59The Effect of Measurement Class Size on a Histogram
- A histogram showing greater detail can be obtained by
- Decreasing class size (which increases the number of classes), or
- Increasing sample size (which increases the number of members in each class).
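The effect of class size can be seen by binning the same data twice with different class widths. This is a sketch on made-up values; it assumes every value lies between `low` and `high`, and the function name `class_counts` is hypothetical.

```python
def class_counts(data, low, high, width):
    """Count values per equal-width class over [low, high].

    Assumes low <= value <= high for every value; a value equal to `high`
    is placed in the last class.
    """
    n_classes = int(round((high - low) / width))
    counts = [0] * n_classes
    for v in data:
        i = min(int((v - low) // width), n_classes - 1)
        counts[i] += 1
    return counts


data = [1.2, 1.8, 2.5, 3.1, 3.3, 4.0, 4.4, 4.9, 5.6, 6.2, 6.8, 7.5]  # made-up sample

wide = class_counts(data, low=1, high=9, width=4)     # 2 classes, coarse picture
narrow = class_counts(data, low=1, high=9, width=2)   # 4 classes, more detail
print("width 4:", wide)
print("width 2:", narrow)
```

Halving the class width doubles the number of classes and reveals where values concentrate within each wide class, but each class then holds fewer members, which is why a larger sample helps when finer classes are used.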
60Excel and StatPro Add-in Demonstration
- Frequency tables
- Histograms
- Scatterplots
- Time series plots