Lecture 1: Thu, Sept 5

Provided by: str2

1
Lecture 1 Thu, Sept 5

Introduction/Syllabus (web page)
Today's material
Key Statistical Concepts
Types of Data
Pie and Bar Charts
Histograms, Stem-and-Leaf Plots
Scatter Plots
Intro to JMP-IN (Xr 2.94 2.95)
Homework Assignment

2
Key Definitions

Statistics: the art of data analysis. Involves
classifying, summarizing, organizing, and
interpreting numerical information.
Population: the set of all items of interest in a
statistical problem.
Sample: a subset of items in the population.
Descriptive Statistics: a body of methods used to
summarize and organize the characteristics of
sample data.
Inferential Statistics: a body of methods used to
draw inferences about characteristics of
populations based on sample data.

Variable: a characteristic or property of an
individual item of a population or sample.
Observation: the value assigned to a variable.
Parameter: a descriptive measure of a population.
Statistic: a descriptive measure of a sample.
Statistical Inference: the process of making an
estimate, prediction or decision about a
population based on information contained in a
sample.
Measure of Reliability: a statement about the
degree of uncertainty associated with a
statistical inference.

4
Example: Cola Wars

Cola wars is the popular term for the intense
competition between Coca-Cola and Pepsi displayed
in their marketing campaigns. Their campaigns
have featured movie and television stars, rock
videos, athletic endorsements, and claims of
consumer preference based on taste tests.
Suppose, as part of a Pepsi marketing campaign,
1,000 cola consumers are given a blind taste test
(i.e., a taste test in which the two brand names
are disguised). Each consumer is asked to state
their gender, age and a preference for brand A or
brand B.

a. Describe the population.
b. Describe the variables of interest.
c. Describe the sample.
d. Describe the inference about the taste
preference.
e. Assume the cola preferences of 1,000 consumers
were indicated in a taste test. Describe how the
reliability of an inference concerning the
preferences of all cola consumers in the Pepsi
bottler's marketing region could be measured.

6
Solutions

a. Population of interest: the collection or set
of all cola consumers.
b. Variables of interest: gender, age and cola
preference.
c. Sample: 1,000 cola consumers selected from the
population of all cola consumers.
d. Inference of interest: generalization of the
cola preferences of the 1,000 sampled consumers
to the population of all cola consumers. In
particular, the preferences of the consumers in
the sample can be used to estimate the percentage
of all cola consumers who prefer each brand.

e. When the preferences of 1,000 consumers are
used to estimate the preferences of all
consumers in the region, the estimate will not
exactly mirror the preferences of the population.
For example, if the taste test shows that 56% of
the 1,000 consumers chose Pepsi, it does not
follow (nor is it likely) that exactly 56% of all
cola drinkers in the region prefer Pepsi.
Nevertheless, we can use sound statistical
reasoning (which is presented later in the
course) to ensure that our sampling procedure
will generate estimates that are almost certainly
within a specified limit of the true percentage
of all consumers who prefer Pepsi.
For example, such reasoning might assure us
that the estimate of the preference for Pepsi
from the sample is almost certainly within 5% of
the actual population preference. The implication
is that the actual preference for Pepsi is
between 51% (i.e., 56 - 5) and 61% (i.e., 56 + 5),
that is, 56% ± 5%. This interval represents a
measure of reliability for the inference.
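The idea that a sample of 1,000 lands close to the population value can be illustrated with a small simulation. This is only a sketch, not the reasoning developed later in the course: it assumes a hypothetical population in which exactly 56% prefer Pepsi, and the function name and parameters are invented for illustration.

```python
import random

def simulate_sample_proportions(true_p, n, replications, seed=0):
    """Draw repeated samples of size n and return each sample's proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = []
    for _ in range(replications):
        prefers_pepsi = sum(1 for _ in range(n) if rng.random() < true_p)
        estimates.append(prefers_pepsi / n)
    return estimates

# ASSUMPTION: the true population preference is 56% (not known in practice).
estimates = simulate_sample_proportions(true_p=0.56, n=1000, replications=500)

# Share of samples whose estimate falls within 5 percentage points of the truth.
within = sum(1 for p_hat in estimates if abs(p_hat - 0.56) <= 0.05)
share_within = within / len(estimates)
print(f"share of samples within 5 points: {share_within:.3f}")
```

Under these assumptions nearly every simulated sample falls inside the 5-point band, which is the sense in which the interval measures reliability.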

8
Types of Data (Chapter 2)

Quantitative Data: obtained when the variable
being observed takes numerical values.
Qualitative Data: obtained when the variable
being observed can only be categorized into
different groups (classes).
Ranked Data: obtained when the variable is
categorized into different groups, but the groups
are ranked.

9
Questions

In the Cola Wars example, what type of data are
the variables of interest?
Gender
Age
Cola preference
Give one example of each type of data: numerical,
categorical, ranked.

10
Types of data - examples

Interval data
  Age    Income
  55     75000
  42     68000
  . .    . .
  Weight gain
  10
  5
  . .
Nominal data
  Person    Marital status
  1         married
  2         single
  3         single
  . .       . .
  Computer  Brand
  1         IBM
  2         Dell
  3         IBM
  . .       . .

11
Types of data - examples

Interval data
  Age    Income
  55     75000
  42     68000
  . .    . .
  Weight gain
  10
  5
  . .
Nominal data
With nominal data, all we can do is calculate
the proportion of data that falls into each
category.
             IBM   Dell   Compaq   Other   Total
  Count      25    11     8        6       50
  Percent    50%   22%    16%      12%

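The proportions in the computer-brand table can be reproduced by counting categories. The sketch below rebuilds raw observations from the slide's counts (25, 11, 8, 6); the variable names are illustrative.

```python
from collections import Counter

# Raw nominal observations rebuilt from the slide's counts:
# IBM 25, Dell 11, Compaq 8, Other 6 (50 computers in total).
brands = ["IBM"] * 25 + ["Dell"] * 11 + ["Compaq"] * 8 + ["Other"] * 6

counts = Counter(brands)                 # frequency of each category
total = sum(counts.values())
proportions = {brand: count / total for brand, count in counts.items()}

for brand, share in proportions.items():
    print(f"{brand:<8}{counts[brand]:>4}{share:>8.0%}")
```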
12
Types of data analysis

Knowing the type of data is necessary to properly
select the technique to be used when analyzing
data.
Type of analysis allowed for each type of data:
Interval data: arithmetic calculations
Nominal data: counting the number of observations
in each category
Ordinal data: computations based on an ordering
process
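A rough sketch of the three kinds of computation, one per data type. All values below are made up for illustration, and the rating scale is a hypothetical ordinal variable that does not appear in the lecture.

```python
from collections import Counter
from statistics import mean, median_low

# Interval data: arithmetic such as averaging is meaningful.
ages = [55, 42, 38, 61]                      # hypothetical ages
mean_age = mean(ages)

# Nominal data: only counting per category is meaningful.
marital_status = ["married", "single", "single"]
status_counts = Counter(marital_status)

# Ordinal data: use the ordering only, for example to find a median category.
scale = ["poor", "fair", "good", "excellent"]   # hypothetical ranked scale
ratings = ["poor", "good", "fair", "excellent", "good"]
ranks = [scale.index(r) for r in ratings]       # position of each rating
median_rating = scale[median_low(ranks)]

print(mean_age, dict(status_counts), median_rating)
```

The median of the ordinal variable is found from ranks alone; averaging the labels themselves would not be meaningful.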

13
Cross-Sectional/Time-Series Data

Cross-sectional data are collected at a certain
point in time. Examples:
Marketing survey (observe preferences by gender,
age)
Test scores in a statistics course
Starting salaries of an MBA program's graduates
Time-series data are collected over successive
points in time. Examples:
Weekly closing price of gold
Amount of crude oil imported monthly

14
Graphical Techniques for Qualitative Data

How to summarize? Count how many times each value
occurs in the data and compute the proportion of
observations that take each value.
Pie Chart: a circle divided into a number of
slices that represent the various categories such
that the size of each slice is proportional to
the percentage corresponding to that category.
Bar Chart: uses bars to represent the frequencies
(or relative frequencies) such that the height of
each bar equals the frequency or relative
frequency of each of the categories.
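As a sketch of the two definitions above, slice angles and bar lengths can be computed directly from category frequencies. The counts reuse the computer-brand example from the earlier slide; the text-based bars are only a stand-in for a real plotting package.

```python
# Category frequencies taken from the computer-brand example.
counts = {"IBM": 25, "Dell": 11, "Compaq": 8, "Other": 6}
total = sum(counts.values())

# Pie chart: each slice's angle is its relative frequency times 360 degrees.
slice_angles = {category: 360 * n / total for category, n in counts.items()}

# Bar chart: each bar's length equals the category's frequency.
for category, n in counts.items():
    print(f"{category:<8}{'#' * n} {n}")

for category, angle in slice_angles.items():
    print(f"{category:<8}{angle:6.1f} degrees")
```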

15
Turboprop Airplanes

In 1994, a spate of small aircraft crashes made
the safety of turboprop airplanes an issue. As
part of an analysis of different types of
accidents, Airjet Ltd determined where accidents
occurred for both turboprop airplanes and jets in
the period 1984-1993. The data are stored using
the following format:

Results for turboprops are stored in column 1
(n = 260). Results for jets are stored in column 2
(n = 298).
Identify the type of data stored in each column.
Use two pie charts to summarize these data.
Does it appear that turboprop airplanes and jets
have similar accident patterns?

18
Graphical Techniques for Quantitative Data

Frequency Distribution: a table that groups data
in non-overlapping intervals called classes and
records the number of observations (frequencies)
in each class.

Frequency Histogram: created by drawing
rectangles. The bases of the rectangles
correspond to the class intervals, and the height
of each rectangle equals the number of
observations in that class.
Stem-and-Leaf Display: similar to a histogram, but
with each observation represented by a leaf. (see
description next page)
Ogive: the graphical representation of the
cumulative relative frequency distribution.
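A minimal sketch of building a frequency distribution and the cumulative relative frequencies that an ogive plots. The data are hypothetical, and the class convention (lower limit included, upper limit excluded) is an assumption; textbooks differ on which endpoint belongs to a class.

```python
from itertools import accumulate

def frequency_distribution(data, lower, width, num_classes):
    """Count observations in classes [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        index = int((x - lower) // width)
        if 0 <= index < num_classes:
            counts[index] += 1
    return counts

# Hypothetical observations, invented for illustration.
data = [12, 15, 21, 22, 25, 27, 31, 33, 34, 36, 38, 41, 44, 47, 58]
counts = frequency_distribution(data, lower=10, width=10, num_classes=5)

# Ogive values: running total of frequencies divided by the sample size.
cumulative_relative = [c / len(data) for c in accumulate(counts)]

for i, (n, cum) in enumerate(zip(counts, cumulative_relative)):
    low = 10 + 10 * i
    print(f"{low:>3} to under {low + 10:<3} {'*' * n:<6} {n:>2}  {cum:.2f}")
```

Each row of stars is one histogram rectangle turned on its side; the last column gives the heights of the ogive at the upper class limits.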

20
Shapes of histograms
Symmetry

There are four typical shape characteristics

21
Shapes of histograms
Skewness
Negatively skewed
Positively skewed
22
Modal classes

A modal class is the one with the largest number
of observations.
A unimodal histogram

The modal class
23
Modal classes
A bimodal histogram
A modal class
A modal class
24
Bell shaped histograms

Many statistical techniques require that the
population be bell shaped.
Drawing the histogram helps verify the shape of
the population in question.

25
Example: MBA Salaries

The table contains the top salary offer (in
thousands of dollars) received by each member of
a sample of 50 MBA students who recently
graduated from the Graduate School of Management
at Rutgers, the state university of New Jersey.

26
MBA Salary Data
27
Frequency Distribution
28
Histogram of MBA Salaries
29
Shapes of Histograms

Symmetric: a histogram that looks identical on
both sides of a line drawn down the middle
Positively skewed: a histogram with a long tail
extending to the right
Negatively skewed: a histogram with a long tail
extending to the left
Bell-shaped: a histogram that looks like a bell
Number of modal classes: the number of distinct
peaks in a histogram
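The direction of skew can also be checked numerically. The sketch below uses the moment coefficient of skewness, a technique the slides do not mention and which is swapped in here only as an illustration; its sign is positive for a long right tail and negative for a long left tail. All data are invented.

```python
from statistics import mean

def skewness(data):
    """Moment coefficient of skewness: third central moment / variance^1.5."""
    m = mean(data)
    n = len(data)
    second = sum((x - m) ** 2 for x in data) / n
    third = sum((x - m) ** 3 for x in data) / n
    return third / second ** 1.5

# Hypothetical data sets, invented for illustration.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]   # long tail to the right
left_tailed = [-x for x in right_tailed]          # mirror image
symmetric = [1, 2, 3, 4, 5]

for name, values in [("right-tailed", right_tailed),
                     ("left-tailed", left_tailed),
                     ("symmetric", symmetric)]:
    print(f"{name:<13}{skewness(values):+.2f}")
```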

30
Stem-and-Leaf Plot

Split each datum into a stem and a leaf
Stem: the first part of the number
Leaf: the last digit of the number
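A short sketch of the stem-and-leaf split for non-negative whole numbers, taking the last digit as the leaf and the remaining digits as the stem. The values are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 35 -> stem 3, leaf 5
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical observations, invented for illustration.
values = [23, 27, 31, 35, 35, 38, 42, 44, 46, 51]
plot = stem_and_leaf(values)

# Print every stem in range so that empty stems still appear as gaps.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Read sideways, the rows of leaves form a histogram with class width 10, while still showing every individual observation.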

31
Stem-and-Leaf Example 2

32
Histogram and Stem-and-Leaf
33
Cumulative Frequency Distribution
34
Histogram and Ogive Plot
35
Example: Production

In order to estimate how long it will take to
produce a particular product, a manufacturer will
study the relationship between production time
per unit and the number of units that have
been produced. The line or curve characterizing
this relationship is called a learning curve
(Adler and Clark, Management Science, Mar 1991).
Twenty-five employees, all of whom were
performing the same production task for the 10th
time, were observed. Each person's task
completion time (in minutes) was recorded. The
same 25 employees were observed again the 30th
time they performed the same task and the 50th
time they performed the task. The resulting
completion times are shown in the table below.

Use a statistical software package to construct a
frequency histogram for each of the three data
sets.
Compare the histograms. Does it appear that the
relationship between task completion time and the
number of times the task is performed is in
agreement with the observations noted above about
production processes in general? Explain.

38
Graphical Techniques for 2 Quantitative Variables

Scatter Plot
Graphical method to describe the relationship
between two quantitative variables.
Two-dimensional plot, with one variable's values
plotted along the vertical axis and the other's
along the horizontal axis.

39
Typical Patterns of Scatter Diagrams
Negative linear relationship
Positive linear relationship
No relationship
Negative nonlinear relationship
Nonlinear (concave) relationship
This is a weak linear relationship. A nonlinear
relationship seems to fit the data better.
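The patterns above can be given a rough numerical counterpart with the correlation coefficient, which is not introduced in this lecture and is used here only as an illustrative sketch. It measures the direction and strength of a linear relationship, so it can be near zero even when a clear curved relationship exists. All data points are invented.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
rising = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]     # positive linear pattern
falling = [12.0, 10.1, 7.9, 6.2, 3.8, 2.1]   # negative linear pattern
curved = [(x - 3.5) ** 2 for x in xs]        # U-shaped, nonlinear pattern

for name, ys in [("rising", rising), ("falling", falling), ("curved", curved)]:
    print(f"{name:<8}{pearson_r(xs, ys):+.3f}")
```

The curved series shows why the scatter plot should be drawn first: its linear correlation is essentially zero although the two variables are clearly related.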
40
House Sales and Mortgage Levels

The economics department of a national investment
banking firm is conducting a study to determine
how house sales are related to mortgage rate
levels. The number of houses sold and the average
monthly mortgage rate were recorded for 36
months.

a. Draw a scatter diagram for these data with
number of houses sold on the vertical axis.
b. Describe the relationship between mortgage
rates and number of homes sold.

42
Graphing the Relationship Between Two Nominal
Variables

We create a contingency table.
This table lists the frequency for each
combination of values of the two variables.
We can create a bar chart that represents the
frequency of occurrence of each combination of
values.

43
Contingency table

Example 2.8
To conduct an efficient advertising campaign,
the relationship between occupation and
newspaper readership is studied. The following
table was created:

44
Contingency table

Solution
If there is no relationship between occupation
and newspaper read, the bar charts describing the
frequency of readership of newspapers should look
similar across occupations.

45
Bar charts for a contingency table
Blue-collar workers prefer the Star and the Sun.
White-collar workers and professionals mostly
read the Post and the Globe and Mail.
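A sketch of how a contingency table and its row proportions might be built from paired nominal observations. The category names follow the example, but the observations are invented and far fewer than in the real table, so the numbers carry no meaning beyond illustration.

```python
from collections import Counter

# Hypothetical (occupation, newspaper) pairs, invented for illustration.
observations = [
    ("Blue collar", "Star"), ("Blue collar", "Sun"), ("Blue collar", "Star"),
    ("Blue collar", "Post"),
    ("White collar", "Post"), ("White collar", "Globe and Mail"),
    ("White collar", "Post"), ("White collar", "Sun"),
    ("Professional", "Globe and Mail"), ("Professional", "Post"),
    ("Professional", "Globe and Mail"), ("Professional", "Star"),
]

table = Counter(observations)                   # frequency of each combination
occupations = sorted({occ for occ, _ in observations})
papers = sorted({paper for _, paper in observations})

# Row proportions: the readership profile within each occupation.
profiles = {}
for occ in occupations:
    row_total = sum(table[(occ, paper)] for paper in papers)
    profiles[occ] = {paper: table[(occ, paper)] / row_total for paper in papers}

for occ, profile in profiles.items():
    shares = "  ".join(f"{paper}: {share:.2f}" for paper, share in profile.items())
    print(f"{occ:<13}{shares}")
```

If occupation and newspaper were unrelated, the printed profiles would look roughly alike from row to row; clear differences between rows suggest a relationship.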
46
2.6 Describing Time-Series Data

Data can be classified according to when they
are collected.
Cross-sectional data are all collected at the
same time.
Time-series data are collected at successive
points in time.
Time-series data are often depicted on a line
chart (a plot of the variable over time).

47
Line Chart

Example 2.9
The total amounts of income tax paid by
individuals in 1987 through 1999 are listed
below.
Draw a graph of these data and describe the
information produced.

48
Line Chart
For the first five years, total tax was
relatively flat. From 1993 there was a rapid
increase in tax revenues.
Line charts can also be used to describe time
series of nominal data.
49
Homework Assignment 1