West Philadelphia and South Philadelphia Housing Correlations - PowerPoint PPT Presentation

Slides: 19
Provided by: Tom3302

Transcript and Presenter's Notes


1
West Philadelphia and South Philadelphia Housing
Correlations
  • Tom DiMarco
  • CIS 61 Programming in Matlab
  • June 27, 2006

2
Where was the data from?
  • University of Pennsylvania Databank on West
    Philadelphia and Pilot Project for South
    Philadelphia.
  • Excel format files (.xls).
  • Grouped by census tracts, the grouping the Census
    uses.
  • Show geographic areas of West and SW Philadelphia
    on the Internet at
    http://westphillydata.library.upenn.edu/

3
How did I analyze the data?
  • Used the Excel Data Analysis Tool.
  • Used Matlab originally with my own function,
    based on a formula I found in a book which
    defines R² as
  • $R^2 = \dfrac{\beta^{\top} X^{\top} Y - (\sum Y)^2 / n}{Y^{\top} Y - (\sum Y)^2 / n}$
  • Used the regress function, which gave back R²,
    p-value, F-statistic, and the error variance
    estimate (mse).
  • Used Yilian's function for learning and
    predicting for mse.
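
The R² formula on this slide can be sketched for the one-feature case as follows. This is a minimal illustration in Python rather than the Matlab code from the talk, which is not shown. The function names are hypothetical, and it assumes both the numerator and the denominator subtract the correction term (ΣY)²/n.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y):
    """R^2 = (b'X'y - (sum y)^2 / n) / (y'y - (sum y)^2 / n)."""
    n = len(y)
    b0, b1 = fit_simple_regression(x, y)
    # With a column of ones in X, b'X'y expands to b0 * sum(y) + b1 * sum(x * y).
    b_xt_y = b0 * sum(y) + b1 * sum(xi * yi for xi, yi in zip(x, y))
    correction = sum(y) ** 2 / n
    y_t_y = sum(yi * yi for yi in y)
    return (b_xt_y - correction) / (y_t_y - correction)
```

For perfectly linear data such as `r_squared([1, 2, 3, 4], [2, 4, 6, 8])` the result is 1 up to rounding.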

4
What method did the Matlab program use?
  • Had some read functions to read in the housing
    data. They took the name of the housing file to
    read from.
  • Had the function generateFeatureComparison, the
    main function, which took the housing data, the
    housing zeros vector, and the name of an Excel
    file, and returned R².
  • This function read the feature data, found its
    zeros, and took the union of the feature zeros
    and the housing zeros. Dropping those entries
    gave the two matrices to compare without the
    zero data. It then called linearRegression, my
    own function built on the formula given
    earlier, and then called regress to check how
    the regression worked and to get a p-value.
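
The zero-filtering step described on this slide might look roughly like the sketch below. It is written in Python for illustration, not taken from the Matlab program. The helper names are hypothetical, and it assumes a value of exactly 0 marks a tract with missing data.

```python
def zero_indices(values):
    """Positions where a value is exactly zero (assumed to mean missing data)."""
    return {i for i, v in enumerate(values) if v == 0}


def drop_zero_rows(housing, feature):
    """Remove every tract where either the housing value or the feature is zero."""
    if len(housing) != len(feature):
        raise ValueError("housing and feature must have the same length")
    # Union of the two zero sets: a tract is dropped if either series is zero there.
    bad = zero_indices(housing) | zero_indices(feature)
    keep = [i for i in range(len(housing)) if i not in bad]
    return [housing[i] for i in keep], [feature[i] for i in keep]
```

Taking the union rather than the intersection keeps the two series aligned, so each remaining housing value is still paired with the feature value from the same tract.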

5
What was special about the program?
  • It drew 150 samples with replacement, similar to
    what we did in lab.
  • It had data normalization, possibly to make the
    data more centralized and the correlations more
    apparent.
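
Drawing 150 samples with replacement can be sketched as below. This is an illustrative Python version, not the program from the talk. The function name and the `seed` parameter are assumptions, and it assumes each draw picks a whole tract so that the housing value and the feature stay paired.

```python
import random


def resample_pairs(x, y, n_samples=150, seed=None):
    """Draw n_samples (x, y) pairs with replacement, keeping each pair together."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rng = random.Random(seed)
    # The same index may be drawn more than once: that is sampling with replacement.
    idx = [rng.randrange(len(x)) for _ in range(n_samples)]
    return [x[i] for i in idx], [y[i] for i in idx]
```

Passing a fixed `seed` makes a run repeatable, which helps when comparing R² values across normalization methods.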

6
What was used for Data Normalization?
  • Three methods were used.
  • Decimal Scaling, which uses the formula
  • $y = x / 10^k$, where $k$ is the smallest integer
    such that $\max(|y|) < 1$
  • this produces values between 0 and 1 for
    non-negative data.
  • Data Min Maxing, which uses the formula
  • $X' = \dfrac{X - \min(X)}{\max(X) - \min(X)} (n_{\max} - n_{\min}) + n_{\min}$
  • this produces values in $[n_{\min}, n_{\max}]$; I
    chose -2 and 2, just like Yilian did in her
    example paper.
  • Data Zero Mean was the one I primarily used to
    find the results, as this was Yilian's
    suggestion. The formula is
  • $X' = (X - \operatorname{mean}(X)) / \operatorname{std}(X)$;
    this produces values with mean 0 and standard
    deviation 1.
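
The three normalization methods can be sketched side by side. This is a Python illustration under stated assumptions, not the Matlab code from the talk: the function names are hypothetical, decimal scaling takes k as the smallest non-negative integer that brings every magnitude below 1, and the zero-mean version uses the sample standard deviation (n - 1), which is also the default of Matlab's `std`.

```python
import statistics


def decimal_scaling(values):
    """Divide by 10**k, k being the smallest non-negative integer with max(|y|) < 1."""
    max_abs = max(abs(v) for v in values)
    k = 0
    while max_abs / 10 ** k >= 1:
        k += 1
    return [v / 10 ** k for v in values]


def min_max(values, new_min=-2.0, new_max=2.0):
    """Rescale linearly so the smallest value maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling needs at least two distinct values")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def zero_mean(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    if sigma == 0:
        raise ValueError("zero-mean scaling needs non-constant data")
    return [(v - mu) / sigma for v in values]
```

Min-max and zero-mean scaling are both shift-and-rescale transforms, so the R² of a one-feature linear fit is the same under either of them.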

7
What did I find out? From 1990 West Philadelphia
Data
  • Income is surprisingly not the greatest predictor
    in these sections of Philadelphia as to where
    people live.
  • R² was only .492 and mse was 10.4.
  • Rent is a decent predictor of housing value.
  • R² was .682 and mse was 11.2.
  • Homeownership was a decent predictor of housing
    value.
  • R² was .661 and mse was 6.3.

8
What did I find out? From 2000 West Philadelphia
Data
  • 12 features.
  • Housing value vs. ethnicity (White, Black, Other,
    and Hispanic) did not show a correlation, except
    for Black, which had an R² of .483 and mse of
    11.6.
  • Living in the same residence or geographic area
    for a period of 5 years did have a significant
    correlation, with R² around 1 and mse at 0.37.

9
What did I find out? From 2000 West Philadelphia
Data
  • High School correlation was relatively high, with
    R² of .874 and mse of 7.
  • Non Manual Worker, Unemployment, Overcrowding,
    and Poverty had significant correlations, with R²
    near 1 and mse of about .38.
  • Vacancy had an R² of .446 and mse of 11.5.
  • Number of Family units did not have much of a
    correlation.

10
What did I find out? From 2000 West Philadelphia
Data
  • Average Family Size had a significant
    correlation, with R² near 1 and mse of .34.
  • So the conclusion of my findings is that Same
    House (Past 5 Years), Same County (Past 5
    Years), Non Manual Workers, Overcrowding, and
    Average Family Size all have remarkably high
    correlations. Then comes HS Graduate, and lastly
    Blacks and Vacancy.

11
What did I find out? From 2000 Southwest
Philadelphia Data
  • In the interest of brevity I will just state the
    conclusions for SW Philadelphia. The features
    with the highest correlation were Same House
    (Past 5 Years), Same County (Past 5 Years),
    From Another County (5 Years Ago), High School
    Graduate, Non Manual Workers, Unemployment,
    Overcrowding, Poverty, and Vacancy.

12
What did I find out? From 2000 Southwest
Philadelphia Data
  • Housing again had a stronger correlation, perhaps
    from the neighborhood being better, with R² at
    .709 and mse at 9.3.
  • Rent had an R² of .711 and mse of 9.8.
  • Conclusion: many of the features have R² close to
    one in both West Philadelphia and SW
    Philadelphia, with Housing and Rent very close
    to each other.
  • Vacancy was the only real exception, perhaps
    because SW Philadelphia is a better
    neighborhood.

13
What did I learn from the Matlab program?
  • Matlab handles matrix operations superbly.
  • Normalization is a powerful technique to get the
    data within a range where it can be analyzed.
  • The data normalization methods gave about the
    same results, except for Decimal Scaling, which
    gave somewhat different answers. The other two
    methods worked quite well.

14
What happened with the Excel Data Analysis for
West Philadelphia 1990?
  • I analyzed the three features together, which
    gave an R² of .744.
  • The p-value for rent came out above 0.05.
  • So I reran the analysis without rent and built a
    linear equation from the remaining features.
  • This was
  • $Y = \beta_1 X_1 + \beta_2 X_2 + \beta_0$, so
  • $Y = 3.3920\,X_1 - 942.0553\,X_2 + 23472.1603$,
    where $X_1$ is Income and $X_2$ is Home
    Ownership.
  • This linear equation had an R² of .735.
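
To show how the fitted 1990 equation would be applied, here is a small Python sketch that plugs values into it. The coefficients are the ones quoted on the slide. The function name is hypothetical, the units of the inputs are not stated in the talk, and the sketch assumes the intercept enters with a plus sign.

```python
# Coefficients quoted on the slide for the 1990 West Philadelphia model.
INCOME_COEF = 3.3920
HOME_OWNERSHIP_COEF = -942.0553
INTERCEPT = 23472.1603  # assumed to be added, not subtracted


def predict_house_value(income, home_ownership):
    """Evaluate Y = 3.3920 * income - 942.0553 * home_ownership + 23472.1603."""
    return INCOME_COEF * income + HOME_OWNERSHIP_COEF * home_ownership + INTERCEPT
```

With both inputs at zero the function returns the intercept, which is a quick way to check that the signs were entered as intended.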

15
What happened with the Excel Data Analysis for
West Philadelphia 2000?
  • I analyzed the 12 features.
  • This multilinear regression had an R² of .744.
  • The p-values were below 0.05 except for HS Grad
    and Blacks, so I reran the Excel analysis
    without those features and came up with a linear
    equation
  • $Y = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_8 X_8 + \beta_0$
  • $Y = 2012.0562\,X_1 + 3706.9261\,X_2 - 15026997.65\,X_3 - 10776738.57\,X_4 - 7618762.411\,X_5 - 42139526.18\,X_6 - 4198729.442\,X_7 + 5117.7525\,X_8 + 9102241.005$,
    where $X_1$ is Whites, $X_2$ Hispanic, $X_3$ Same
    House (5 Years), $X_4$ Same County (5 Years),
    $X_5$ Non-Manual, $X_6$ Unemployment, $X_7$ Avg.
    Family Size, and $X_8$ Change in House Units.
  • This regression had an R² of .700.

16
What happened with the Excel Data Analysis for
Southwest Philadelphia 2000?
  • I analyzed the seven features, which had an R² of
    .829.
  • The p-values were all above 0.05 except for HS
    Graduate.
  • A linear model could not really be formed.
  • Maybe there were not enough samples.

17
What are some conclusions?
  • Income is not that big a factor, maybe because
    these are not upscale neighborhoods (one
    obviously needs income, however).
  • Rent is a good predictor of house value.
  • Homeownership is a good predictor of house value.
  • The social features all had high correlations
    (Same House (Past 5 Years), Same County (Past 5
    Years), From Another County (5 Years Ago),
    High School Graduate, Non Manual Workers,
    Unemployment, Overcrowding, Poverty, and
    Vacancy for SW Phila).
  • Ethnicity was not really a factor.

18
What does running Matlab look like?
  • Run the Project script for West Philadelphia.
  • Show the Matlab program and how the R² and mse
    values are generated.