WFM 5201: Data Management and Statistical Analysis

1
WFM 5201: Data Management and Statistical Analysis
Lecture-6: Correlation and Regression Analysis
  • Akm Saiful Islam

Institute of Water and Flood Management (IWFM)
Bangladesh University of Engineering and Technology (BUET)
June 2008
2
Correlation
  • Correlation is concerned with describing the
    direction (positive or negative) and strength of
    a relationship between two variables.
  • Correlation makes no distinction between the two
    variables (it is a measure of how they vary
    jointly), whereas regression theory depends on a
    dependent variable being affected by an
    error-free independent variable.

3
Correlation coefficient
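  • The direction and strength of the relationship
    can be expressed by means of a correlation
    coefficient r, which is mathematically defined
    as the sum of cross products of deviations
    divided by the square root of the product of the
    sums of squared deviations for X and Y.
  • The sum of cross products of deviations is
    Σ (X − X̄)(Y − Ȳ), where X̄ and Ȳ are the sample
    means.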

4
Correlation coefficient
  • The sum of squared deviations for X is Σ (X − X̄)²
  • The sum of squared deviations for Y is Σ (Y − Ȳ)²

5
Pearson's r
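The formulas on these slides did not survive in the transcript. The standard definition of Pearson's r, written with the quantities named on the preceding slides, is sketched below. The labels SP_XY, SS_X and SS_Y are assumed here for convenience and may differ from the notation on the original slides.

```latex
\[
SP_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}), \qquad
SS_X = \sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad
SS_Y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
\]
\[
r = \frac{SP_{XY}}{\sqrt{SS_X \, SS_Y}}
\]
```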
6
Correlation coefficient
  • A correlation coefficient varies from -1 to 1:
  • -1 indicates a perfect negative relationship
    (one variable increases while the other
    decreases),
  • 0 indicates no linear relationship,
  • 1 indicates a perfect positive relationship.
  • The absolute size of the correlation indicates
    the strength of the relationship; for example, a
    correlation coefficient of -0.89 indicates a
    stronger relationship than a coefficient of 0.60.
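As a minimal sketch of how r behaves at the extremes, the following computes Pearson's r directly from the sums of deviations. The function name `pearson_r` and the sample data are made up for illustration and do not come from the lecture.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross products and sums of squared deviations.
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sp_xy / sqrt(ss_x * ss_y)


# Made-up illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])  # y rises with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
print(r_pos, r_neg)

# Strength is compared on absolute value, so -0.89 is stronger than 0.60.
print(abs(-0.89) > abs(0.60))
```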

7
Linear Regression
  • Regression is primarily concerned with using the
    relationship for the purpose of predicting one
    variable from knowledge of the other
  • Correlation, on the other hand, is primarily
    concerned with discovering whether or not a
    relationship exists in the first place, and then
    specifying the strength and direction of this
    relationship.

8
Linear Regression
  • The simple linear regression equation is given
    as Y = b0 + b1 X, where
  • Y dependent variable (predicted)
  • X given data
  • b0 intercept of regression line
  • b1 slope of regression line

Fitting this line is also known as the least squares method
9
Regression line
10
Coefficient of Regression
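The formulas for this slide are missing from the transcript. A common closed form for the least-squares coefficients is b1 = SP_XY / SS_X and b0 = mean(Y) − b1 · mean(X). The sketch below assumes that form; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Return (b0, b1) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    if ss_x == 0:
        raise ValueError("slope is undefined when x is constant")
    b1 = sp_xy / ss_x           # slope
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1


# Made-up points lying exactly on y = 1 + 2x (not from the lecture).
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```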
11
Coefficient of Determination
  • The decomposition of the sample variation of Y
    leads to a measure of the "goodness of fit",
    which is known as the coefficient of
    determination and denoted by R².

Note
12
Coefficient of determination
  • R² is a measure commonly used to describe how
    well the sample regression line fits the
    observed data.
  • Range: 0 ≤ R² ≤ 1
  • 0 indicates the poorest fit and 1 the best fit
    of the regression model
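One way to compute the coefficient of determination is 1 − SSE/SST, where SSE is the sum of squared residuals and SST the total sum of squares of Y about its mean. This is a sketch under the assumption that the predictions come from a least-squares line with an intercept; `r_squared` is a hypothetical helper name and the numbers are made up.

```python
def r_squared(y, y_hat):
    """Coefficient of determination, computed as 1 - SSE / SST."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and equal in length")
    mean_y = sum(y) / len(y)
    sst = sum((v - mean_y) ** 2 for v in y)            # total variation
    sse = sum((v - p) ** 2 for v, p in zip(y, y_hat))  # unexplained variation
    if sst == 0:
        raise ValueError("R^2 is undefined when y is constant")
    return 1 - sse / sst


# Made-up values illustrating the two ends of the range.
y = [1.0, 2.0, 3.0]
best = r_squared(y, [1.0, 2.0, 3.0])    # predictions match the data
poorest = r_squared(y, [2.0, 2.0, 2.0])  # predictions are just the mean
print(best, poorest)
```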

13
Exercise-1: Fit a regression equation between Boro
production and rainfall and find R²

Year      Boro Production   Rainfall
1975-76   424536            216
1976-77   152273            319
1977-78   437007            164
1978-79   278287            141
1979-80   417225            237
1980-81   500207            197
1981-82   395940            255
1982-83   418170            221
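One possible way to work the exercise is sketched below in plain Python, taking rainfall as X and Boro production as Y (an assumption based on the wording of the exercise). The units are not stated in the source, so the coefficients are reported without units. Variable names are illustrative.

```python
# Data from the exercise table; units are not stated in the source.
rainfall = [216, 319, 164, 141, 237, 197, 255, 221]
production = [424536, 152273, 437007, 278287, 417225, 500207, 395940, 418170]

n = len(rainfall)
mean_x = sum(rainfall) / n
mean_y = sum(production) / n

# Sums of deviations used by the least-squares formulas.
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rainfall, production))
ss_x = sum((x - mean_x) ** 2 for x in rainfall)
ss_y = sum((y - mean_y) ** 2 for y in production)

b1 = sp_xy / ss_x           # slope
b0 = mean_y - b1 * mean_x   # intercept

# R^2 as 1 - SSE / SST, using the fitted values.
fitted = [b0 + b1 * x for x in rainfall]
sse = sum((y - f) ** 2 for y, f in zip(production, fitted))
r2 = 1 - sse / ss_y

print(f"production = {b0:.1f} + ({b1:.2f}) * rainfall")
print(f"R^2 = {r2:.3f}")
```

With these eight observations the fitted slope is negative, roughly -916 production units per unit of rainfall, and R² is about 0.21, so rainfall alone explains only a modest share of the variation in production.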
14
Deviations or Errors
  • The sum of squares of these deviations from the
    fitted line is Σ (Y − Ŷ)², where Ŷ is the value
    predicted by the regression line.

Total deviation = Explained deviation + Unexplained deviation
15
Total, explained, and unexplained deviation
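For a least-squares line with an intercept, the same split holds for the sums of squares: total = explained + unexplained. The sketch below checks this numerically on made-up data; the names `sst`, `ssr` and `sse` are labels assumed here, not taken from the slides.

```python
# Made-up data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares fit.
b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
    (a - mean_x) ** 2 for a in x
)
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * a for a in x]

sst = sum((b - mean_y) ** 2 for b in y)            # total
ssr = sum((p - mean_y) ** 2 for p in y_hat)        # explained by the line
sse = sum((b - p) ** 2 for b, p in zip(y, y_hat))  # unexplained (residual)

print(sst, ssr, sse)
print(abs(sst - (ssr + sse)) < 1e-9)
```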
16
Regression diagnostics
  • Patterns for residual plots: (a) satisfactory,
    (b) funnel, (c) double bow, (d) non-linear
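The patterns listed above are judged by plotting residuals against fitted values. The following is a minimal sketch of how the residuals themselves might be computed; plotting is left out, and the data and names are made up for illustration.

```python
# Made-up data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares fit.
b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
    (a - mean_x) ** 2 for a in x
)
b0 = mean_y - b1 * mean_x

fitted = [b0 + b1 * a for a in x]
residuals = [b - f for b, f in zip(y, fitted)]

# Pairs of (fitted value, residual) are what a residual plot displays.
for f, e in zip(fitted, residuals):
    print(f"{f:.2f}  {e:+.2f}")

# With an intercept in the model, least-squares residuals sum to zero
# (up to floating-point rounding).
print(abs(sum(residuals)) < 1e-9)
```

A roughly even scatter of residuals around zero corresponds to the satisfactory case; a spread that widens with the fitted value corresponds to the funnel pattern.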