Summarizing Relationships among variables

Provided by: Rafael120

1
Lecture 3-2
  • Summarizing Relationships among variables

2
Numerical measures for summarizing the
relationship between two variables
  • To see which numerical measures we need to
    represent relationships between variables, consider
    the following three pairs of scatter plots.

3
Example 1: Relationships between the returns of
different stocks
[Figure: Scatter Plot I and Scatter Plot II; axes labeled Stock A return, Stock B return, Stock C return, and Stock D return]

4
Example 1 (Continued)
  • Scatter Plot I shows a positive relationship
    while scatter plot II shows a negative
    relationship.
  • We need a numerical measure that shows the
    direction of the relationship.
  • For this purpose, we use covariance.

5
Example 2: Relationships between advertisement
spending and revenue
Product I shows a clear linear relationship
between advertisement spending and revenue,
while product II does not show much of a
relationship. We need a numerical measure
that shows the strength of the linear relationship
between two variables. We use the
correlation coefficient.
6
Example 3: Number of promotions and sales
Promotion seems to be more effective for Product
A than for Product B, in the sense that an additional
promotion brings a greater increase in revenue
(i.e., the slope is steeper). To measure the
effectiveness of the promotion, we use
regression analysis.
7
Numerical measures for summarizing relationships
  • This lecture covers the following topics:
  • Covariance
  • Correlation coefficient
  • Regression Analysis

8
Covariance
  • Covariance is a numerical measure that shows the
    direction of the relationship between two
    variables.
  • Covariance is one of the most fundamental
    numerical measures of the relationship between
    two variables. It appears in many areas
    (e.g., computations involving the returns of a
    portfolio of stocks).
  • In the following slides, we will learn the logic
    behind the derivation of covariance.

9
How to measure the direction of the relationship
[Figure: two scatter plots, each divided into four boxes (Box I, Box II, Box III, Box IV); axes labeled x, y and w, z. One plot shows a positive relationship, the other a negative relationship.]
10
How to measure the direction of the relationship
  • From the previous two scatter plots, notice that:
  • When two variables show a positive relationship,
    there are more data points in Box I and Box III
    than in Box II and Box IV.
  • When two variables show a negative relationship,
    there are more data points in Box II and Box IV
    than in Box I and Box III.
  • We use these facts to measure the direction of
    the relationship.

11
How to measure the direction of the relationship
Example
  • The data show the relationship between the
    number of promotions and revenue. (It is the same
    data set used in the previous class. Revenue is
    now denoted in units of 1000 yen.)
  • Suppose you want to know if there is a positive
    relationship between these two variables. The next
    slide shows the scatter plot of this relationship.

12
How to measure the direction of the relationship
Example, contd
  • The number of promotions and revenue appear to have
    a positive relationship.
  • Notice that most of the data points are in either
    Box I or Box III.
  • What can we say about Box I and Box III? See the
    next slide.

13
How to measure the direction of the relationship
Example, contd
  • For each data point, you can compute the
    distances from the means.
  • Then we can notice that, for any data point in
    Box I, both of the distances are positive. For
    any data point in Box III, both of the distances
    are negative.
  • See the next slide.

Distance from the mean of X: X - (the mean of X)
Distance from the mean of Y: Y - (the mean of Y)
14
(No Transcript)
15
How to measure the direction of the relationship
Example, contd
  • For a data point in Box I, the distances from the
    means are both positive. That is, both $(X - \bar{X})$
    and $(Y - \bar{Y})$ are positive.
  • Therefore, if we multiply the two distances
    together, we will have a positive number.
  • For a data point in Box III, the distances from the
    means are both negative. That is, $(X - \bar{X})$ and
    $(Y - \bar{Y})$ are both negative.
  • Therefore, if we multiply the two distances
    together, we will again have a positive number.
  • Now, what can we say about Box II and Box IV? See
    the next slide.

16
(No Transcript)
17
How to measure the direction of the relationship
Example, contd
  • For any points in box II and box IV, one distance
    will be positive and the other distance will be
    negative. So if we multiply them together, we
    will have a negative number.

18
How to measure the direction of the relationship
Example, contd
  • Suppose that, for each data point, you compute the
    distances from the means and multiply them
    together, and that you then sum all the
    multiplied distances. If the resulting
    number is positive, this roughly indicates that
    there are more data points in Box I and Box III
    than in Box II and Box IV. This in turn indicates
    that the data show a positive relationship. If the
    resulting number is negative, this indicates a
    negative relationship.
  • This is the basic idea of measuring the direction
    of the relationship between two variables, and
    this is the first step to compute Covariance.
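
The idea on this slide can be sketched in a few lines of Python. The numbers below are made up for illustration and are not the lecture's data set; the variable names are likewise assumptions.

```python
from statistics import mean

# Hypothetical data (NOT the lecture's data set).
x = [1, 2, 3, 4, 5]          # e.g. number of promotions
y = [12, 15, 18, 14, 21]     # e.g. revenue

x_bar, y_bar = mean(x), mean(y)

# For each data point, multiply its two distances from the means.
products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]

# Positive products come from Box I / Box III,
# negative products come from Box II / Box IV.
n_positive = sum(1 for p in products if p > 0)
n_negative = sum(1 for p in products if p < 0)

# The sign of the sum indicates the direction of the relationship.
total = sum(products)
print(products, n_positive, n_negative, total)
```

Here three points give a positive product and one gives a negative product, and the sum is positive, which points to a positive relationship.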

19
Computation of the Sample Covariance
  • The sample covariance is computed in the
    following way.
  • Compute the mean for each variable.
  • For each observation, and for each variable,
    compute the distances from the means, i.e.
    compute $(X - \bar{X})$ and $(Y - \bar{Y})$. Then multiply
    them together.
  • Sum all the multiplied differences.
  • Divide the sum of the multiplied differences by
    n-1, (that is the number of observations minus
    1).
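
The four steps above can be written as a short Python function. This is a minimal sketch: the function name `sample_covariance` and the data are illustrative assumptions, not the lecture's spreadsheet.

```python
from statistics import mean

def sample_covariance(x, y):
    """Sample covariance, following the four steps on this slide."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    x_bar, y_bar = mean(x), mean(y)              # step 1: the means
    products = [(xi - x_bar) * (yi - y_bar)      # step 2: multiplied distances
                for xi, yi in zip(x, y)]
    total = sum(products)                        # step 3: sum them
    return total / (len(x) - 1)                  # step 4: divide by n - 1

# Hypothetical data (NOT the lecture's data set).
promotions = [1, 2, 3, 4, 5]
revenue = [12, 15, 18, 14, 21]
cov = sample_covariance(promotions, revenue)
print(cov)
```

For this made-up sample the sum of the multiplied distances is 17 and n - 1 is 4, so the covariance is 4.25.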

20
Computation of Sample Covariance: Exercise
  • Open the "Computation of Covariance" data set.
  • Using the data on the sheet "data 1", compute the
    covariance between the number of promotions and
    the revenue.

21
Exercise, contd
  • The covariance between the number of promotions
    and revenue is 2561.8
  • Positive covariance indicates that the number of
    promotions and revenue have a positive
    relationship.

22
Characteristics of Covariance
  • If covariance is positive, the two variables have
    a positive relationship
  • If covariance is negative, the two variables have
    a negative relationship.
  • A large value of covariance does not indicate
    that the two variables have a strong linear
    relationship.

23
A note on Covariance
  • One may be tempted to conclude that if the
    covariance is larger, the relationship between
    two variables is stronger (in the sense that they
    have stronger linear relationship)
  • However, this is not true. To see this, go over
    the next example.

24
A note on Covariance, example
  • Open the data set "Computation of Covariance",
    worksheet "data 2". Compute the covariance between
    variables X and Y.
  • (Data 2 is in fact the same as data 1.
    The only difference is that the revenue is measured in
    units of 1000 yen in data 1, while it is measured in
    yen in data 2.)

25
Example, contd
  • The covariance for data 2 is 2561805. Compare
    this with the covariance for data 1, which was
    2561.8.
  • Even though data 1 and data 2 show exactly the same
    relationship, the covariance for data 2 is much
    larger. This is simply because the unit of
    measurement for revenue differs between data
    1 and data 2.
  • This shows that a larger covariance does not mean
    a stronger relationship. (In this particular
    example, relationship is exactly the same.)
  • To show the strength of the relationship, we use
    Correlation coefficient.
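
The unit effect can be reproduced with a small Python sketch. The data are made up (not the lecture's data 1 and data 2), and the correlation coefficient is computed as the covariance divided by the product of the two sample standard deviations, the definition given on the later slides.

```python
from statistics import mean, stdev

def sample_covariance(x, y):
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)

def sample_correlation(x, y):
    # covariance divided by the product of the sample standard deviations
    return sample_covariance(x, y) / (stdev(x) * stdev(y))

# Hypothetical data: the same revenue in two different units.
promotions = [1, 2, 3, 4, 5]
revenue_thousand_yen = [12, 15, 18, 14, 21]
revenue_yen = [1000 * r for r in revenue_thousand_yen]

cov_thousand = sample_covariance(promotions, revenue_thousand_yen)
cov_yen = sample_covariance(promotions, revenue_yen)
r_thousand = sample_correlation(promotions, revenue_thousand_yen)
r_yen = sample_correlation(promotions, revenue_yen)

print(cov_thousand, cov_yen)   # covariance is 1000 times larger in yen
print(r_thousand, r_yen)       # correlation is the same in both units
```

Changing the unit multiplies the covariance by 1000 but leaves the correlation coefficient unchanged, which is why the latter is used to judge strength.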

26
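Sample Correlation Coefficient: The measure of the
strength of linear relationship
  • The correlation coefficient between X and Y, denoted
    as rxy, is computed as
    \[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]
    where s_{xy} is the sample covariance between X and Y,
    and s_x and s_y are the sample standard deviations of
    X and Y.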

27
Characteristics of Correlation Coefficient
  • The correlation coefficient ranges from -1 to 1,
    with
  • rxy = 1 indicating a perfect positive linear
    relationship: the X and Y points would plot an
    increasing straight line.
  • rxy = 0 indicating no linear relationship between
    X and Y.
  • rxy = -1 indicating a perfect negative linear
    relationship: the X and Y points would plot a
    decreasing straight line.
  • Positive correlations indicate positive or
    increasing linear relationships, with values
    closer to 1 indicating data points closer to a
    straight line and values closer to 0 indicating
    greater deviations from a straight line.
  • Negative correlations indicate decreasing linear
    relationships, with values closer to -1 indicating
    points closer to a straight line and values closer
    to 0 indicating greater deviations from a straight
    line.
  • Correlation coefficient is not the slope of the
    relationship.

28
Correlation Coefficient: Exercise
  • Open "Computation of Covariance". Compute the
    correlation coefficient between the number of
    promotions and revenue for both data 1 and data 2.

29
Correlation Coefficient: Exercise
  • Exercise 1: Open the data set "Correlation
    Coefficient Exercise 1". This data set shows the
    relationship between advertisement cost and
    revenue for two different products. First,
    produce a scatter plot for each product. Then
    compute the correlation coefficient for each product.
  • Exercise 2: Open the data set "Correlation
    Coefficient Exercise 2". This data set contains
    two pairs of variables. First, make a scatter
    plot for each pair in a single graph. Second,
    compute the correlation coefficient for each pair of
    variables.

30
Exercise 1, Answer
Product I shows a strong positive linear
relationship between advertisement cost and
revenue. The correlation coefficient is 0.95, which
is close to 1. Product II does not show much of a
linear relationship. The correlation coefficient
is about 0.05, which is close to 0.
31
Exercise 2 (Answer)
32
Correlation Coefficient Exercise 2 (Answer)
  • First, for both pairs, the correlation
    coefficients are -1. This means that the
    relationships are perfectly (negatively) linear
    for both pairs of variables.
  • Also note that, even though the slope for the
    pair I is much steeper, the correlation
    coefficients are the same for both pairs. This
    shows that correlation coefficient is not the
    slope of the relationship.
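
The same point can be checked numerically. The sketch below uses two made-up, perfectly linear series (not the exercise's data) with very different slopes; both give a correlation coefficient of -1.

```python
from statistics import mean, stdev

def sample_correlation(x, y):
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical data: two exact straight lines with different slopes.
x = [1, 2, 3, 4, 5]
steep = [100 - 10 * xi for xi in x]      # slope -10
shallow = [100 - 0.5 * xi for xi in x]   # slope -0.5

r_steep = sample_correlation(x, steep)
r_shallow = sample_correlation(x, shallow)
print(r_steep, r_shallow)  # both equal -1 up to floating-point rounding
```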

33
Correlation Coefficient
  • To get a better feel for the correlation
    coefficient, see the following slides.

34
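Scatter Plots and Correlation (Figure 3.6)
[Figure: scatter plot of Y against X] (a) r = .8
35
Scatter Plots and Correlation (Figure 3.6)
[Figure: scatter plot of Y against X] (b) r = -.8
36
Scatter Plots and Correlation (Figure 3.6)
[Figure: scatter plot of Y against X] (c) r = 0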
37
Understanding the mathematical notation for the
covariance and correlation coefficient.
  • This is a typical data format used to
    describe two variables.
  • Using this format, we would like to represent the
    covariance and the correlation coefficient in
    mathematical notation.

38
Understanding the mathematical notation for the
sample covariance and sample correlation
coefficient.
Covariance is computed by summing the last column,
then dividing the sum by (n-1). The
mathematical notation for the covariance is given
on the next slide.
39
Mathematical Notation for the sample covariance
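The mathematical notation for covariance between
variable X and variable Y, denoted by either
Cov(X,Y) or sxy, is given as
\[ s_{xy} = \mathrm{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]
  • where x_i and y_i are the observed values, \bar{x} and
    \bar{y} are the sample means, and n is the sample
    size.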

40
Mathematical Notation for the sample correlation
coefficient
  • The sample correlation coefficient, rxy, is
    computed by the equation
    \[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]

s_x is the standard deviation of variable X. s_y is
the standard deviation of variable Y.
41
3. Ordinary Least Square estimation: A Regression
Analysis
  • This is the scatter plot we saw in Lecture 3-1.
    From the graph, we can see that promotion is more
    effective for product A than for product B.
  • Then, how do we measure the effectiveness of
    promotions?
  • The correlation coefficient cannot be used for this
    purpose, since it is not a measure of the slope.

42
Ordinary Least Square estimation: A Regression
Analysis
To measure the effectiveness of promotion for
each product, we use regression analysis. In
this handout, we will talk about a type of
regression analysis called Ordinary Least Square
Estimation.
43
Ordinary Least Square Estimation
  • Ordinary Least Square (OLS) estimation is a
    method to find the linear equation that best fits
    the data. The left-hand graph is a simple scatter
    plot of the relationship between the number of
    promotions and the revenue from product A. The
    right-hand graph shows the OLS estimate of
    the linear relationship between the number of
    promotions and the revenue for product A.
  • The next several slides show the logic behind OLS
    estimation.

44
Ordinary Least Square Estimation(Two variable
case)
  • Ordinary Least Square Estimation assumes that the
    number of promotions and the revenue from the
    product have the following relationship.
    \[ \text{(revenue)} = \beta_0 + \beta_1 \, \text{(number of promotions)} \]
  • More generally, ordinary least square estimation
    assumes that, between variable Y and variable X,
    there is the following linear relationship.
    \[ Y = \beta_0 + \beta_1 X \]
  • An equation like the ones above, which describes a
    relationship among variables, is called a model,
    or regression equation. The model above
    contains two parameters, β0 and β1, that are
    defined as model coefficients. The coefficient
    β0 is the intercept on the Y-axis and the
    coefficient β1 is the slope. (The slope is the
    change in Y for every unit change in X.)

45
Ordinary Least Square Estimation(Two variable
case)
  • Ordinary Least Square Estimation is a method to
    find (estimate) the values for β0 and β1 that
    make the equation fit the data best.
  • The criterion used to choose (estimate) the values for
    β0 and β1 is described in the following slides.

46
Ordinary Least Square Estimation: Criterion to
estimate the parameter values
We choose (estimate) the values for β0 and β1
so that the sum of the squared vertical distances from
the estimated line to each data point is minimized.
(This is why the method is called ordinary
least square estimation.) Excel estimates these
values automatically.
[Figure: scatter plot of Y (sales from Product A) against X (number of promotions) with the estimated line; e_i marks the distance from the line to the ith data point (x_i, y_i)]
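
The slides leave the minimization to Excel. For the two-variable case, the minimizing values have a standard closed form: the slope is the sum of multiplied distances from the means divided by the sum of squared distances of X from its mean, and the intercept is the mean of Y minus the slope times the mean of X. The Python sketch below applies this to made-up data (not the lecture's data set); the function names are illustrative assumptions.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared vertical distances."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx              # estimated slope
    b0 = y_bar - b1 * x_bar     # estimated intercept
    return b0, b1

def sum_squared_errors(x, y, b0, b1):
    """Sum of squared distances e_i between each y_i and the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data (NOT the lecture's data set).
x = [1, 2, 3, 4, 5]
y = [12, 15, 18, 14, 21]
b0, b1 = ols_fit(x, y)
best = sum_squared_errors(x, y, b0, b1)
print(b0, b1, best)
```

Moving either coefficient away from the fitted values, for example calling `sum_squared_errors(x, y, b0 + 0.5, b1)`, gives a larger sum of squares, which is the sense in which the fitted line is "best".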
47
Ordinary Least Square Estimation Using Excel,
Example
Excel can estimate the linear equation model and
draw the line at the same time. The estimated
β0 = 105827 and β1 = 99060. Exercise: Open the data set
"OLS Exercise 1-Promotion and Sales" and reproduce
this figure.
48
Things we can do with OLS
  • Using the estimated equation, we can
  • Find the effect of promotion on the revenue for
    product A.
  • Forecast revenue for different number of
    promotions.
  • Find the number of promotions necessary to
    achieve your sales goal.

49
Effect of promotion on the sales of product A
The estimated slope parameter β1 is the estimated
effect of promotion on the revenue from product
A. β1 = 99,060 means that if you increase the
number of promotions by one, the revenue would
increase by 99,060 on average.
50
Forecasting Revenue
  • The estimated equation can be used to forecast
    revenue for different numbers of promotions.
  • Suppose that you would like to know what the
    expected revenue from product A would be if the number
    of promotions is 12. The expected revenue given
    that the number of promotions equals 12 can be computed
    as
  • (Expected revenue when number of promotions
    is 12) = 99060 × 12 + 105827 = 1,294,547
  • So you would expect the revenue to be roughly 1.3
    million yen.
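
The forecast is a direct evaluation of the estimated equation. A minimal Python sketch, using the coefficient values reported on the slides (the function name is an assumption):

```python
BETA0 = 105827   # estimated intercept reported on the slides
BETA1 = 99060    # estimated slope reported on the slides

def forecast_revenue(promotions):
    """Expected revenue for a given number of promotions."""
    return BETA0 + BETA1 * promotions

revenue_at_12 = forecast_revenue(12)
print(revenue_at_12)  # 1294547
```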

51
Finding the number of promotions that achieve
sales goal
  • Suppose that you would like to achieve sales of
    3,000,000. How many promotions are necessary to
    achieve this goal?
  • To answer this question, simply solve the
    following equation for X.
  • 3,000,000 = 99060X + 105827
  • X = (3,000,000 - 105827) / 99060 ≈ 29.2
  • Therefore, if you would like to achieve at least
    3,000,000, you would need to utilize promotion 30
    times.
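
Solving for X and rounding up to a whole number of promotions can be sketched in Python as follows, again using the coefficient values reported on the slides (the function name is an assumption):

```python
import math

BETA0 = 105827   # estimated intercept reported on the slides
BETA1 = 99060    # estimated slope reported on the slides

def promotions_needed(goal):
    """Return the exact solution for X and the smallest whole number
    of promotions whose forecast revenue reaches the goal."""
    exact = (goal - BETA0) / BETA1
    return exact, math.ceil(exact)

exact, whole = promotions_needed(3_000_000)
print(round(exact, 1), whole)  # 29.2 30
```

Rounding up rather than to the nearest integer matters here: 29 promotions would forecast slightly less than the goal.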

52
Exercise
  • Open the data set "OLS Exercise 1-Promotion and Sales".
    Plot the relationship between the number of promotions
    and revenue for product A and product B.
  • Estimate the following equation
  • (revenue) = β0 + β1 (number of promotions)
  • separately for product A and product B
    using OLS.
  • Is the effect of promotion different for product
    A and product B?
  • What would the revenue from product B be if the
    number of promotions is 12?
  • Suppose the sales goal for product B is
    1,000,000. How many promotions are necessary to
    achieve this goal?

53
More Topics on Ordinary Least Square Estimations
  • The graph above shows the relationship between
    advertisement cost and revenue, along with the
    estimated linear equation.
  • The estimated slope coefficient is 13.4, which
    means that for every additional 1000 yen you spend on
    advertisement, revenue increases by 13.4 thousand
    yen on average.

54
More Topics on Ordinary Least Square Estimations
However, the graph seems to indicate that there
is not much of a relationship between advertisement
spending and revenue. When we estimate a linear
equation, we typically would like to know whether
advertisement has any effect on the revenue. To
answer such a question, just estimating β0 and β1
is not enough. We need more information.
55
More Topics on Ordinary Least Square Estimations
To answer the question "Would the
advertisement have any impact on the revenue?",
we use the concept of hypothesis testing with
t-statistics. This is the topic for the next
class.
56
Topics to be covered next week
  • We will cover several more topics on ordinary
    least square estimation, which include:
  • Testing whether advertisement spending has any
    effect on revenue, using t-statistics.
  • Ordinary Least Square estimation when there are
    more explanatory variables.
  • Ordinary Least Square estimation when you have
    panel data (repeated observations over time).
  • Analyzing the effect of a policy change (e.g., the
    introduction of a new tax, a change in compensation
    scheme, etc.) using OLS.

57
This lecture note covered topics from
  • Textbook 3.2
  • Textbook 3.3 (p65-p67)