General Announcement (01.07.2004) - PowerPoint PPT Presentation

Slides: 41
Provided by: sami128
Transcript and Presenter's Notes

1
  • General Announcement (01.07.2004)
  1. All course slides, solutions to quizzes and
    solutions to assignments 1 and 3 as well as
    practice questions on probability have been
    uploaded on the Intranet.
  2. All quizzes and assignments have been corrected.
    The corrected documents can be seen by
    approaching Mr. Parmanand Bhuye in the Secretarial
    section on the ground floor (in front of reception).
    You can report totalling mistakes, if any.
  3. The mark sheet for the above will be put on the notice
    board by this evening.

2
  • QUANTITATIVE METHODS 1

SAMIR K. SRIVASTAVA
3
Correlation and Regression
  • Univariate vs. Bivariate data (Multivariate)
  • More than one attribute for each member of
    the population.
  • Height and Weight
  • Absenteeism and Production
  • Advertising Expenditure and Sales Volume
  • Unemployment and Crime Rate
  • Rainfall and Food Production
  • Web Site Visitor Profile

4
Correlation and Regression
  • Are the two attributes related to each other?
  • Can we use one to predict the other?
  • Can we change one to control the other?
  • Predictor Variable and Response Variable
  • Relationship may be
  • Positive or negative (or nonexistent)
  • Weak or strong
  • Two variables are said to be correlated if the value
    of one is indicative of the value of the other.

5
Organizing Bivariate Data: Scatter Plots
[Figure: example scatter plots of negatively correlated, positively correlated, loosely correlated, strongly correlated, and uncorrelated data]
6
Measuring the Strength of Correlation
  • Can we define a quantitative measure of strength
    of correlation?
  • Covariance is such a measure.
  • Looks similar to variance.
  • Can be positive as well as negative.
  • When will it have a positive vs. negative value?
  • A High vs. Low value?
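The questions above can be explored with a small sketch. It assumes the sample covariance with an n - 1 divisor (the slide's exact formula is not preserved in this transcript), and the data are invented purely for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data, invented for illustration only.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 61, 70, 74]   # tends to rise with hours studied
errors_made = [9, 8, 6, 3, 2]       # tends to fall with hours studied

cov_up = sample_covariance(hours_studied, exam_score)
cov_down = sample_covariance(hours_studied, errors_made)
print(cov_up, cov_down)  # positive for the first pair, negative for the second
```

Covariance is positive when above-average values of one variable tend to accompany above-average values of the other, and negative when they tend to accompany below-average values.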

7
Measuring the Strength of Correlation
8
Coefficient of Correlation
  • Suppose we wish to measure the strength of
    correlation on a scale of 0 to 1
  • Is Covariance an appropriate measure?
  • What if we multiply all X values by a constant?
  • The measure should not be affected by change of
    scale.
  • Coefficient of Correlation
  • $r = \sigma_{xy} / (\sigma_x \sigma_y)$
  • Value of r lies between -1 and 1
  • Values close to 0 indicate little or no
    correlation
  • Values close to 1 or -1 indicate a very strong
    correlation.
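A short sketch of the change-of-scale point, using invented height and weight figures: converting heights from metres to centimetres multiplies the covariance by 100 but leaves r unchanged. The function names are illustrative, and the n - 1 divisor is an assumption (it cancels in r either way).

```python
import math

def covariance(xs, ys):
    """Sample covariance with an n - 1 divisor (an assumed convention)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Coefficient of correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data, invented for illustration only.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [50, 58, 63, 72, 80]
heights_cm = [100 * h for h in heights_m]   # same heights, different unit

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
r_m = correlation(heights_m, weights_kg)
r_cm = correlation(heights_cm, weights_kg)
print(cov_m, cov_cm)   # covariance changes with the unit
print(r_m, r_cm)       # r does not
```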

9
Illustration
          x      y    x−x̄    y−ȳ   (x−x̄)²   (y−ȳ)²   (x−x̄)(y−ȳ)
       1.25    125   -0.90     45   0.8100     2025       -40.50
       1.75    105   -0.40     25   0.1600      625       -10.00
       2.25     65    0.10    -15   0.0100      225        -1.50
       2.00     85   -0.15      5   0.0225       25        -0.75
       2.50     75    0.35     -5   0.1225       25        -1.75
       2.25     80    0.10      0   0.0100        0         0.00
       2.70     50    0.55    -30   0.3025      900       -16.50
       2.50     55    0.35    -25   0.1225      625        -8.75
Sum   17.20    640    0.00      0   1.5600     4450       -79.75
Mean   2.15     80

Sxx = 1.560, Syy = 4450, Sxy = -79.75
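As a check on the illustration, the following sketch recomputes the three sums from the x and y columns and combines them into the coefficient of correlation. It assumes the usual form r = Sxy / sqrt(Sxx · Syy), which the transcript does not spell out on this slide.

```python
import math

# x and y columns from the illustration.
x = [1.25, 1.75, 2.25, 2.00, 2.50, 2.25, 2.70, 2.50]
y = [125, 105, 65, 85, 75, 80, 50, 55]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and of cross-products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(s_xx, 4), round(s_yy, 4), round(s_xy, 4))
print(round(r, 3))  # about -0.957, a strong negative correlation
```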
10
Correlation and Causation
  • Is there a causal relationship between the two
    variables?
  • Rainfall and food production
  • Absenteeism and production
  • Advertising expenditure and sales
  • Strange, spurious or nonsense correlations
  • Teachers' salaries and liquor sales
  • Divorce rate and death rate (negative
    correlation)

11
Correlation and Causation
  • Spurious correlation is due to a third, lurking
    variable
  • Economic growth → higher salaries, higher liquor
    consumption
  • Age → older couples have fewer divorces, but a
    higher death rate.
  • To establish causation between variables,
    establish
  • Consistency (relationship true in a variety of
    contexts)
  • Responsiveness (change in one precedes change in
    other)
  • Mechanism (manner in which change in X changes Y)

12
Regression
  • Francis Galton introduced the term in 1877
  • Height of children → mean of population
  • Predicting one variable from another
  • Relationships of association, not causal
  • Relating variables mathematically
  • Linear or non-linear
  • Bivariate: linear, between two variables

13
Bivariate Regression Assumptions
  • Assumptions for bivariate regression
  • 1. Random sample
  • Ideally N > 20
  • But different rules of thumb exist. (10, 30,
    etc.)
  • 2. Variables are linearly related
  • i.e., the mean of Y increases linearly with X
  • Check scatter plot for general linear trend
  • Watch out for non-linear relationships (e.g.,
    U-shaped)

14
Bivariate Regression Assumptions
  • 3. Y is normally distributed for every outcome
    of X in the population
  • Conditional normality
  • Ex.: Years of Education (X), Job Prestige (Y)
  • Suppose we look only at a sub-sample: X = 12
    years of education
  • Is a histogram of Job Prestige approximately
    normal?
  • What about for people with X = 4? X = 16?
  • If all are roughly normal, the assumption is met

15
Two Possible Regressions
16
Simple Linear Regression: An Example
  • For a sample of 8 employees, a personnel
    director has collected the following data on
    ownership of company stock, y, versus years with
    the firm, x.
  • x: 6 12 14 6 9 13 15 9
  • y: 300 408 560 252 288 650 630 522
  • (a) Determine the least squares regression line
    and interpret its slope. (b) For an employee who
    has been with the firm 10 years, what is the
    predicted number of shares of stock owned?

17
An Example, cont.
          x        y       xy     x²
          6      300     1800     36
         12      408     4896    144
         14      560     7840    196
          6      252     1512     36
          9      288     2592     81
         13      650     8450    169
         15      630     9450    225
          9      522     4698     81
Sum      84     3610   41,238    968
Mean   10.5   451.25

18
An Example, cont.
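  • Slope: $b_1 = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{41{,}238 - 8(10.5)(451.25)}{968 - 8(10.5)^2} = \frac{3{,}333}{86} \approx 38.8$
  • y-Intercept: $b_0 = \bar{y} - b_1\bar{x} = 451.25 - 38.756(10.5) \approx 44.3$
  • So the best-fit linear model, rounding to the
    nearest tenth, is $\hat{y} = 44.3 + 38.8x$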

19
An Example, cont.
  • Interpretation of the slope: for every
    additional year an employee works for the firm,
    the employee acquires an estimated 38.8 additional
    shares of stock.
  • If x = 10, the point estimate for the number of
    shares of stock that this employee owns is, from the
    rounded model $\hat{y} = 44.3 + 38.8x$,
    $\hat{y} = 44.3 + 38.8(10) = 432.3$
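The worked example can be reproduced with a short sketch. It uses the eight (x, y) pairs given for the employees and the computational form of the least squares formulas built from Σxy and Σx²; the variable names are illustrative.

```python
# Years with the firm (x) and shares of stock owned (y) for the 8 employees.
years = [6, 12, 14, 6, 9, 13, 15, 9]
shares = [300, 408, 560, 252, 288, 650, 630, 522]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(shares) / n
sum_xy = sum(x * y for x, y in zip(years, shares))
sum_x2 = sum(x * x for x in years)

# Least squares slope and intercept.
slope = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
intercept = y_bar - slope * x_bar

# Point estimate for an employee with 10 years at the firm.
predicted = intercept + slope * 10

print(round(slope, 1), round(intercept, 1))  # 38.8 44.3
print(round(predicted, 1))                   # about 431.9 with unrounded coefficients
```

With the coefficients rounded to the nearest tenth first, the same prediction is 44.3 + 38.8 × 10 = 432.3; the small gap is rounding only.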

20
Using the Regression Equation
  • Before using the regression model, we need to
    assess how well it fits the data.
  • If we are satisfied with how well the model fits
    the data, we can use it to make predictions for
    y.
  • Illustration
  • Predict the selling price of a three-year-old Car
    with 40,000 km on the odometer

21
Bivariate Regression Assumptions
22
Estimating the Coefficients
  • The estimates are determined by
  • drawing a sample from the population of interest,
  • calculating sample statistics,
  • producing a straight line that cuts into the data.

The question is: which straight line fits best?
23
Ordinary Least Squares
  • 1. "Best fit" means the differences between actual
    values (Yi) and predicted values (Ŷi) are a
    minimum
  • But positive differences offset negative ones
  • 2. OLS minimizes the sum of the squared
    differences (or errors)

24
Least Squares Method
The best line is the one that minimizes the sum
of squared vertical differences between the
points and the line.
Let us compare two lines
The second line is horizontal
The smaller the sum of squared differences the
better the fit of the line to the data.
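The comparison of two candidate lines can be sketched as follows. The four data points and both lines are invented for illustration (the slide's own figure is not preserved here); the second line is horizontal, as on the slide.

```python
def sum_squared_errors(points, intercept, slope):
    """Sum of squared vertical differences between the points and a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)

# Hypothetical data points, invented for illustration only.
points = [(1, 2.0), (2, 3.0), (3, 5.0), (4, 6.0)]

# Line 1: y = 0.5 + 1.4x (sloped). Line 2: y = 4 (horizontal, at the mean of y).
sse_sloped = sum_squared_errors(points, intercept=0.5, slope=1.4)
sse_flat = sum_squared_errors(points, intercept=4.0, slope=0.0)

print(sse_sloped, sse_flat)
better = "sloped" if sse_sloped < sse_flat else "horizontal"
print(better)  # the line with the smaller sum of squares fits better
```

For these points the sloped line wins by a wide margin, because the horizontal line ignores the upward trend in the data.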
25
Assumptions of OLS regression
  1. Model is linear in parameters
  2. The residuals are normally distributed
  3. The residuals have constant variance
  4. The expected value of the residuals is always
    zero
  5. The residuals are independent from one another
  6. The X values are precise
  7. The independent variables are not too strongly
    collinear
  • If these assumptions are satisfied, then the OLS
    estimator is unbiased and has the minimum variance of
    all unbiased estimators.
  • How can we test these assumptions?
  • If assumptions are violated,
  • what does this do to our conclusions?
  • how do we fix the problem?

26
The Model
  • The first order linear model or a simple
    regression model, $y = \beta_0 + \beta_1 x + \varepsilon$
  • y = dependent variable
  • x = independent variable
  • β0 = y-intercept
  • β1 = slope of the line
  • ε = error variable

27
Least Squares Method
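To calculate the estimates of the coefficients
that minimize the sum of squared differences between
the data points and the line, use the formulas
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$, $b_0 = \bar{y} - b_1 \bar{x}$
28
Least Squares Method
Now we define
$S_{xx} = \sum (x_i - \bar{x})^2$, $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$
29
Least Squares Method
Then
$b_1 = S_{xy} / S_{xx}$, $b_0 = \bar{y} - b_1 \bar{x}$
The estimated simple linear regression equation
that estimates the equation of the first order
linear model is
$\hat{y} = b_0 + b_1 x$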
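A minimal sketch of the least squares calculation, assuming the standard estimates b1 = Sxy / Sxx and b0 = ȳ − b1·x̄ (the slide's formula images are not preserved in this transcript). The function name is illustrative, and the data are the x and y columns of the earlier correlation illustration.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# x and y columns from the correlation illustration.
x = [1.25, 1.75, 2.25, 2.00, 2.50, 2.25, 2.70, 2.50]
y = [125, 105, 65, 85, 75, 80, 50, 55]

b0, b1 = least_squares_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(round(b0, 2), round(b1, 2))
print(round(sum(residuals), 6))  # least squares residuals sum to (numerically) zero
```

The negative slope agrees in sign with the negative Sxy of that illustration, as it must, since b1 and Sxy share the same numerator.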
30
Error Variable Required Conditions
  • The error ε is a critical part of the regression
    model.
  • Five requirements involving the distribution of ε
    must be satisfied.
  • The mean of ε is zero: E(ε) = 0.
  • The standard deviation of ε is a constant ($\sigma_\varepsilon$)
    for all values of x.
  • The errors are independent.
  • The errors are independent of the independent
    variable x.
  • The probability distribution of ε is normal.

31
Standard error of estimate
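  • If $\sigma_\varepsilon$ is small, the errors tend to be close to
    zero (close to the mean error). Then, the model
    fits the data well.
  • Therefore, we can use $\sigma_\varepsilon$ as a measure of the
    suitability of using a linear model.
  • An unbiased estimator of $\sigma_\varepsilon^2$ is given by
    $s_\varepsilon^2 = \mathrm{SSE}/(n - 2)$, where SSE is the sum of
    squared residuals.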
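As a rough illustration, the standard error of estimate can be computed for the stock-ownership example from earlier in the deck. The sketch assumes the usual estimator, SSE divided by n − 2, and refits the line inside the block so that it runs on its own.

```python
import math

# Years with the firm (x) and shares owned (y) from the earlier example.
years = [6, 12, 14, 6, 9, 13, 15, 9]
shares = [300, 408, 560, 252, 288, 650, 630, 522]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(shares) / n
s_xx = sum((x - x_bar) ** 2 for x in years)
s_yy = sum((y - y_bar) ** 2 for y in shares)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, shares))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# SSE: sum of squared residuals around the fitted line.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, shares))

# Standard error of estimate, with n - 2 degrees of freedom.
std_error = math.sqrt(sse / (n - 2))
print(round(std_error, 1))  # roughly 91.5 shares
```

Whether roughly 91.5 shares counts as "small" depends on the scale of y; here the mean holding is 451.25 shares, so the typical error is about a fifth of the mean.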

32
Assessing the Model
  • The least squares method will produce a
    regression line whether or not there is a linear
    relationship between x and y.
  • Consequently, it is important to assess how well
    the linear model fits the data.
  • Several methods are used to assess the model
  • Testing and/or estimating the coefficients.
  • Using descriptive measurements.

33
Outliers
  • An outlier is an observation that is unusually
    small or large.
  • Several possibilities need to be investigated
    when an outlier is observed
  • There was an error in recording the value.
  • The point does not belong in the sample.
  • The observation is valid.
  • Identify outliers from the scatter diagram and
    remove them.

34
Practice Problem
  • A car dealer wants to find the relationship
    between the odometer reading and the selling
    price of used cars.
  • A random sample of 100 cars is selected, and the
    data recorded.
  • Find the regression line.

35
Solution
  • We need to calculate several statistics first

where n = 100.
36
Coefficient of Determination
  • A measure of the
  • Strength of the linear relationship between x and
    y.
  • The larger the value of r², the more the value of
    y depends in a linear way on the value of x.
  • Amount of variation in y that is related to
    variation in x.
  • Ratio of variation in y that is explained by the
    regression model divided by the total variation
    in y.

37
Coefficient of determination
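  • To understand the significance of this
    coefficient note
    $\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

Overall variability in y = variability explained by the regression model + variability due to the error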
38
Coefficient of determination
  • When we want to measure the strength of the
    linear relationship, we use the coefficient of
    determination.
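A brief sketch of the coefficient of determination for the stock-ownership example, assuming the common definition r² = 1 − SSE/SST, where SST is the total variation in y. For a simple linear regression this equals the square of the coefficient of correlation, which the block also checks.

```python
import math

# Years with the firm (x) and shares owned (y) from the earlier example.
years = [6, 12, 14, 6, 9, 13, 15, 9]
shares = [300, 408, 560, 252, 288, 650, 630, 522]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(shares) / n
s_xx = sum((x - x_bar) ** 2 for x in years)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, shares))
sst = sum((y - y_bar) ** 2 for y in shares)      # total variation in y

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, shares))

r_squared = 1 - sse / sst                        # share of variation explained
r = s_xy / math.sqrt(s_xx * sst)                 # coefficient of correlation
print(round(r_squared, 2))  # about 0.72
```

Read this as: about 72% of the variation in shares owned is related to variation in years with the firm, under the linear model.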

39
Conclusion
  1. Used scatter diagram to visualize relationship
    between two variables
  2. Learnt the use of correlation analysis
  3. Described the linear regression model
  4. Explained ordinary least-squares method in
    generating equation
  5. Learnt the limitations of regression and
    correlation analysis

40
Thank You !