Lecture 1: Thu, Sept 5 - PowerPoint PPT Presentation

1
Lecture 1 Thu, Sept 5
  • Introduction/Syllabus (web page)
  • Today's material
  • Key Statistical Concepts
  • Types of Data
  • Pie and Bar Charts
  • Histograms, Stem-and-Leaf Plots
  • Scatter Plots
  • Intro to JMP-IN (Xr 2.94, 2.95)
  • Homework Assignment

2
Key Definitions
  • Statistics: the art of data analysis. Involves
    classifying, summarizing, organizing, and
    interpreting numerical information.
  • Population: the set of all items of interest in a
    statistical problem.
  • Sample: a subset of items in the population.
  • Descriptive Statistics: a body of methods used to
    summarize and organize the characteristics of
    sample data.
  • Inferential Statistics: a body of methods used to
    draw inferences about characteristics of
    populations based on sample data.

3
  • Variable: a characteristic or property of an
    individual item of a population or sample.
  • Observation: the value assigned to a variable.
  • Parameter: a descriptive measure of a population.
  • Statistic: a descriptive measure of a sample.
  • Statistical Inference: the process of making an
    estimate, prediction, or decision about a
    population based on information contained in a
    sample.
  • Measure of Reliability: a statement about the
    degree of uncertainty associated with a
    statistical inference.

4
Example: Cola Wars
  • Cola wars is the popular term for the intense
    competition between Coca-Cola and Pepsi displayed
    in their marketing campaigns. Their campaigns
    have featured movie and television stars, rock
    videos, athletic endorsements, and claims of
    consumer preference based on taste tests.
    Suppose, as part of a Pepsi marketing campaign,
    1,000 cola consumers are given a blind taste test
    (i.e., a taste test in which the two brand names
    are disguised). Each consumer is asked to state
    their gender, age and a preference for brand A or
    brand B.

5
  • a. Describe the population.
  • b. Describe the variables of interest.
  • c. Describe the sample.
  • d. Describe the inference about the taste
    preference.
  • e. Assume the cola preferences of 1,000 consumers
    were indicated in a taste test. Describe how the
    reliability of an inference concerning the
    preferences of all cola consumers in the Pepsi
    bottler's marketing region could be measured.

6
Solutions
  • a. Population of interest: the collection or set
    of all cola consumers.
  • b. Variables of interest: gender, age, and cola
    preference.
  • c. Sample: 1,000 cola consumers selected from the
    population of all cola consumers.
  • d. Inference of interest: generalization of the
    cola preferences of the 1,000 sampled consumers
    to the population of all cola consumers. In
    particular, the preferences of the consumers in
    the sample can be used to estimate the percentage
    of all cola consumers who prefer each brand.

7
  • e. When the preferences of 1,000 consumers are
    used to estimate the preferences of all
    consumers in the region, the estimate will not
    exactly mirror the preferences of the population.
    For example, if the taste test shows that 56% of
    the 1,000 consumers chose Pepsi, it does not
    follow (nor is it likely) that exactly 56% of all
    cola drinkers in the region prefer Pepsi.
  • Nevertheless, we can use sound statistical
    reasoning (which is presented later in the
    course) to ensure that our sampling procedure
    will generate estimates that are almost certainly
    within a specified limit of the true percentage
    of all consumers who prefer Pepsi.
  • For example, such reasoning might assure us
    that the estimate of the preference for Pepsi
    from the sample is almost certainly within 5% of
    the actual population preference. The implication
    is that the actual preference for Pepsi is
    between 51% (i.e., 56 - 5) and 61% (i.e., 56 + 5),
    that is, (56 ± 5)%. This interval represents a
    measure of reliability for the inference.
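
The interval arithmetic in part e can be sketched in a few lines of Python. The `interval` helper simply reproduces the slide's 56 ± 5 calculation. The `normal_margin` helper is an assumption that does not come from the slides: it uses the common normal approximation for a sample proportion with z = 1.96 (roughly 95% confidence), whereas the course's own method is presented later.

```python
import math


def interval(estimate, margin):
    """Return (low, high) for estimate +/- margin, both in percent."""
    return estimate - margin, estimate + margin


def normal_margin(p_hat, n, z=1.96):
    """Approximate margin of error (as a proportion) for a sample proportion.

    Assumption (not from the slides): simple random sample and the usual
    normal approximation; z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# The slide's illustrative numbers: 56% preferred Pepsi, margin of 5%.
low, high = interval(56, 5)
print(f"Interval: {low}% to {high}%")

# Hypothetical check of how large the margin might be for n = 1,000.
margin = normal_margin(0.56, 1000)
print(f"Approximate margin: {100 * margin:.1f} percentage points")
```

With the slide's numbers the first call gives 51 and 61. Under the stated assumptions, the approximate margin for 1,000 consumers is somewhat smaller than the 5% used on the slide for illustration.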

8
Types of Data (Chapter 2)
  • Quantitative Data are obtained when the variable
    being observed takes numerical values.
  • Qualitative Data are obtained when the variable
    being observed can only be categorized into
    different groups (classes).
  • Ranked Data are obtained when the variable is
    categorized into different groups and the groups
    are ranked.

9
Questions
  • In the Cola Wars example, what type of data are
    the variables of interest?
  • Gender
  • Age
  • Cola preference
  • Give one example of each type of data: numerical,
    categorical, ranked.

10
Types of data - examples
Interval data
  Age   Income
  55    75000
  42    68000
  ...
  Weight gain
  10
  5
  ...
Nominal data
  Person   Marital status
  1        married
  2        single
  3        single
  ...
  Computer   Brand
  1          IBM
  2          Dell
  3          IBM
  ...
11
Types of data - examples
Interval data
  Age   Income
  55    75000
  42    68000
  ...
  Weight gain
  10
  5
  ...
Nominal data
With nominal data, all we can do is calculate
the proportion of data that falls into each
category.
  Brand     Frequency   Percent
  IBM       25          50
  Dell      11          22
  Compaq    8           16
  Other     6           12
  Total     50
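
As a minimal sketch of the "proportion in each category" calculation, the Python below takes the computer-brand frequencies from this slide and converts them to shares. The function and variable names are illustrative only.

```python
def proportions(counts):
    """Return the share of observations that falls into each category."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Frequencies from the computer-brand example on this slide.
brand_counts = {"IBM": 25, "Dell": 11, "Compaq": 8, "Other": 6}
brand_shares = proportions(brand_counts)

for brand, share in brand_shares.items():
    print(f"{brand:<8}{brand_counts[brand]:>4}{share:>8.0%}")
```

The shares reproduce the percentages on the slide (50, 22, 16, and 12).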
12
Types of data analysis
  • Knowing the type of data is necessary to properly
    select the technique to be used when analyzing
    data.
  • Type of analysis allowed for each type of data:
  • Interval data: arithmetic calculations
  • Nominal data: counting the number of observations
    in each category
  • Ordinal data: computations based on an ordering
    process

13
Cross-Sectional/Time-Series Data
  • Cross-sectional data are collected at a certain
    point in time. Examples:
  • Marketing survey (observe preferences by gender,
    age)
  • Test scores in a statistics course
  • Starting salaries of an MBA program's graduates
  • Time-series data are collected over successive
    points in time. Examples:
  • Weekly closing price of gold
  • Amount of crude oil imported monthly

14
Graphical Techniques for Qualitative Data
  • How to summarize? Count how many times each value
    of the data occurs, and compute the proportion of
    observations that take that value.
  • Pie Chart: a circle divided into a number of
    slices that represent the various categories such
    that the size of each slice is proportional to
    the percentage corresponding to that category.
  • Bar Chart: uses bars to represent the frequencies
    (or relative frequencies) such that the height of
    each bar equals the frequency or relative
    frequency of each of the categories.
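
The two chart definitions above come down to simple arithmetic, sketched here in Python. The counts reuse the computer-brand frequencies from the earlier nominal-data slide. The helper names, and the choice to draw bars as rows of `#` characters rather than with a plotting library, are illustrative assumptions.

```python
def pie_slice_angles(counts):
    """Angle in degrees of each pie slice, proportional to its share."""
    total = sum(counts.values())
    return {category: 360 * count / total for category, count in counts.items()}


def text_bar_chart(counts):
    """One text bar per category; bar length equals the frequency."""
    return [f"{category:<8}{'#' * count} {count}" for category, count in counts.items()]


# Frequencies from the computer-brand example earlier in the lecture.
brand_counts = {"IBM": 25, "Dell": 11, "Compaq": 8, "Other": 6}

angles = pie_slice_angles(brand_counts)
for line in text_bar_chart(brand_counts):
    print(line)
```

A category holding half of the observations, such as IBM here, gets half of the circle (180 degrees).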

15
Turboprop Airplanes
  • In 1994, a spate of small aircraft crashes made
    the safety of turboprop airplanes an issue. As
    part of an analysis of different types of
    accidents, Airjet Ltd determined where accidents
    occurred for both turboprop airplanes and jets in
    the period 1984-1993. The data are stored using
    the following format

16
  • Results for turboprops are stored in column 1
    (n = 260). Results for jets are stored in column 2
    (n = 298).
  • Identify the type of data stored in each column.
  • Use two pie charts to summarize these data.
  • Does it appear that turboprop airplanes and jets
    have similar accident patterns?

18
Graphical Techniques for Quantitative Data
  • Frequency Distribution: a table that groups data
    in non-overlapping intervals called classes and
    records the number of observations (frequencies)
    in each class.
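
A frequency distribution can be built by assigning each observation to exactly one class. The Python below is a minimal sketch under two assumptions: the classes have equal width, and each class includes its lower limit but not its upper limit. The data values are hypothetical and are not taken from the lecture.

```python
def frequency_distribution(data, lower, width, n_classes):
    """Count observations in equal-width, non-overlapping classes.

    Class i covers [lower + i*width, lower + (i+1)*width).
    Values outside all classes are ignored in this sketch.
    """
    counts = [0] * n_classes
    for x in data:
        index = int((x - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return [
        (lower + i * width, lower + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]


# Hypothetical values (e.g. salary offers in thousands of dollars).
offers = [52, 58, 61, 67, 69, 73, 75, 78, 84, 91]
classes = frequency_distribution(offers, lower=50, width=10, n_classes=5)

for low, high, count in classes:
    print(f"{low} to under {high}: {count}")
```

Because the classes do not overlap, the class frequencies add up to the number of observations.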

19
  • Frequency Histogram: created by drawing
    rectangles. The bases of the rectangles
    correspond to the class intervals, and the height
    of each rectangle equals the number of
    observations in that class.
  • Stem-and-Leaf Display: similar to a histogram, but
    with each observation represented by a leaf (see
    the description on a later slide).
  • Ogive: the graphical representation of the
    cumulative relative frequency distribution.
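
The values plotted in an ogive are running totals of the class frequencies divided by the number of observations. The Python below is a sketch of that calculation; the class frequencies are hypothetical and the function name is illustrative.

```python
from itertools import accumulate


def cumulative_relative_frequencies(class_counts):
    """Running share of observations at or below each class."""
    total = sum(class_counts)
    return [running / total for running in accumulate(class_counts)]


# Hypothetical class frequencies for five consecutive classes.
class_counts = [2, 3, 3, 1, 1]
ogive_heights = cumulative_relative_frequencies(class_counts)
print(ogive_heights)
```

The heights never decrease, and the last one is always 1 because every observation has been counted by the final class.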

20
Shapes of histograms
Symmetry
  • There are four typical shape characteristics

21
Shapes of histograms
Skewness
Negatively skewed
Positively skewed
22
Modal classes
  • A modal class is the one with the largest number
    of observations.
  • A unimodal histogram

The modal class
23
Modal classes
A bimodal histogram
A modal class
A modal class
24
Bell shaped histograms
  • Many statistical techniques require that the
    population be bell shaped.
  • Drawing the histogram helps verify the shape of
    the population in question

25
Example: MBA Salaries
  • The table contains the top salary offer (in
    thousands of dollars) received by each member of
    a sample of 50 MBA students who recently
    graduated from the Graduate School of Management
    at Rutgers, the state university of New Jersey.

26
MBA Salary Data
27
Frequency Distribution
28
Histogram of MBA Salaries
29
Shapes of Histograms
  • Symmetric: a histogram that looks identical on
    both sides of a line drawn down the middle
  • Positively skewed: a histogram with a long tail
    extending to the right
  • Negatively skewed: a histogram with a long tail
    extending to the left
  • Bell-shaped: a histogram that looks like a bell
  • Number of modal classes: the number of distinct
    peaks in a histogram

30
Stem-and-Leaf Plot
  • Split each datum into a stem and a leaf
  • Stem: the first part of the number
  • Leaf: the last digit of the number
  • Examples
  • ?
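
The stem/leaf split described above can be sketched in Python. This sketch assumes non-negative whole numbers, with the last digit as the leaf and everything before it as the stem. The data values are hypothetical and do not come from the lecture.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by stem (all but last digit) and leaf (last digit)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(sorted(stems.items()))


# Hypothetical two-digit observations.
data = [52, 58, 61, 67, 69, 73, 75, 78, 84, 91]
plot = stem_and_leaf(data)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row has the shape of a histogram bar turned on its side, but the individual observations can still be read off the leaves.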

31
Stem-and-Leaf Example 2
  • ?

32
Histogram and Stem-and-Leaf
33
Cumulative Frequency Distribution
34
Histogram and Ogive Plot
35
Example: Production
  • In order to estimate how long it will take to
    produce a particular product, a manufacturer will
    study the relationship between production time
    per unit and the number of units that have
    been produced. The line or curve characterizing
    this relationship is called a learning curve
    (Adler and Clark, Management Science, Mar 1991).
  • Twenty-five employees, all of whom were
    performing the same production task for the 10th
    time, were observed. Each person's task
    completion time (in minutes) was recorded. The
    same 25 employees were observed again the 30th
    time they performed the same task and the 50th
    time they performed the task. The resulting
    completion times are shown in the table below.

36
  • Use a statistical software package to construct a
    frequency histogram for each of the three data
    sets.
  • Compare the histograms. Does it appear that the
    relationship between task completion time and the
    number of times the task is performed is in
    agreement with the observations noted above about
    production processes in general? Explain.

38
Graphical Techniques for 2 Quantitative Variables
  • Scatter Plot
  • Graphical method to describe the relationship
    between two quantitative variables
  • Two-dimensional plot, with one variable's values
    plotted along the vertical axis and the other's
    along the horizontal axis.

39
Typical Patterns of Scatter Diagrams
Negative linear relationship
Positive linear relationship
No relationship
Negative nonlinear relationship
Nonlinear (concave) relationship
This is a weak linear relationship. A nonlinear
relationship seems to fit the data better.
40
House Sales and Mortgage Levels
  • The economics department of a national investment
    banking firm is conducting a study to determine
    how house sales are related to mortgage rate
    levels. The number of houses sold and the average
    monthly mortgage rate were recorded for 36
    months.

41
  • a. Draw a scatter diagram for these data with
    number of houses sold on the vertical axis.
  • b. Describe the relationship between mortgage
    rates and number of homes sold.

42
Graphing the Relationship Between Two Nominal
Variables
  • We create a contingency table.
  • This table lists the frequency for each
    combination of values of the two variables.
  • We can create a bar chart that represents the
    frequency of occurrence of each combination of
    values.
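
Building a contingency table amounts to counting how often each combination of the two variables occurs. The Python below is a minimal sketch; the observations are hypothetical, with category names borrowed from the occupation and newspaper example that follows, and they are not the lecture's data.

```python
from collections import Counter


def contingency_table(pairs):
    """Count how often each (first variable, second variable) combination occurs."""
    return Counter(pairs)


# Hypothetical (occupation, newspaper) observations, for illustration only.
observations = [
    ("Blue collar", "Star"),
    ("Blue collar", "Sun"),
    ("Blue collar", "Star"),
    ("White collar", "Post"),
    ("White collar", "Globe and Mail"),
    ("Professional", "Globe and Mail"),
    ("Professional", "Post"),
    ("Professional", "Globe and Mail"),
]
table = contingency_table(observations)

for (occupation, paper), count in sorted(table.items()):
    print(f"{occupation:<14}{paper:<16}{count}")
```

A `Counter` returns 0 for combinations that never occur, which matches an empty cell in the table.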

43
Contingency table
  • Example 2.8
  • To conduct an efficient advertising campaign,
    the relationship between occupation and
    newspaper readership is studied. The following
    table was created.

44
Contingency table
  • Solution
  • If there is no relationship between occupation
    and newspaper read, the bar charts describing the
    frequency of readership of newspapers should look
    similar across occupations.
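
One way to make the "look similar across occupations" comparison concrete is to convert each occupation's row of counts into relative frequencies and compare the rows. The Python below is a sketch of that step; the counts are hypothetical and are not the values from Example 2.8.

```python
def row_proportions(table):
    """Convert each row of counts into relative frequencies that sum to 1."""
    result = {}
    for row, counts in table.items():
        total = sum(counts.values())
        result[row] = {column: n / total for column, n in counts.items()}
    return result


# Hypothetical readership counts by occupation, for illustration only.
table = {
    "Blue collar": {"Star": 30, "Sun": 50, "Post": 10, "Globe and Mail": 10},
    "White collar": {"Star": 10, "Sun": 10, "Post": 40, "Globe and Mail": 40},
}
profiles = row_proportions(table)

for occupation, shares in profiles.items():
    print(occupation, {paper: round(share, 2) for paper, share in shares.items()})
```

If the two variables were unrelated, the rows of proportions would be roughly the same. In this made-up table they differ sharply, which is the pattern that would suggest a relationship.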

45
Bar charts for a contingency table
Blue-collar workers prefer the Star and the Sun.
White-collar workers and professionals mostly
read the Post and the Globe and Mail.
46
2.6 Describing Time-Series Data
  • Data can be classified according to when they
    are collected.
  • Cross-sectional data are all collected at the
    same time.
  • Time-series data are collected at successive
    points in time.
  • Time-series data are often depicted on a line
    chart (a plot of the variable over time).

47
Line Chart
  • Example 2.9
  • The total amounts of income tax paid by
    individuals in 1987 through 1999 are listed
    below.
  • Draw a graph of these data and describe the
    information produced.

48
Line Chart
For the first five years, total tax was
relatively flat. From 1993, there was a rapid
increase in tax revenues.
Line charts can be used to describe nominal data
time series.
49
Homework Assignment 1
  • Due next Thursday, Sept 19, at the start of
    class.
  • Full assignment will be posted on the Stat 101
    web page this Thu at 5pm.