Title: Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods
- Julie Sungsoon Hwang
- Department of Geography, University of Washington
- Jean-Claude Thill
- Department of Geography, State University of New York at Buffalo
November 10, 2005, North American Meetings of the Regional Science Association International
2. Outline
- Research objectives
- Methodology specification
- Methodology illustration
- Evaluating the performance of fuzzy clustering
- Conclusions
3. Research objectives
- Demonstrate the use of the fuzzy c-means (FCM) algorithm for delineating housing submarkets
- Comparison to K-means
- Discuss empirical characteristics of FCM applied to given applications, in particular the choice of parameters
- Cluster validity index
4. Challenges
- Are the boundaries of clusters crisp?
5. Methodology specification
6.
- Our task is to group census tracts into homogeneous housing submarkets within a metropolitan area
- Using the fuzzy c-means algorithm
- In order to examine whether fuzzy set-based clustering can do a better job
- Implemented in 85 metropolitan areas
- Most of the data sets are public (e.g., 2000 Census)
- The whole procedure is automated in GIS
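7. Methodology flow chart
[Flow chart, for each metropolitan area: an n × m matrix of candidate variables (x1, x2, x3, ..., xm) is reduced to an n × k matrix of significant variables (y1, y2, ..., yk); cluster analysis on the k selected variables then yields c submarkets as a membership matrix with columns U1, U2, ..., Uc, where Uj is membership to cluster j. The chart shows a crisp example (rows 1 0 0; 0 1 0; 0 1 0; 0 0 1) and a fuzzy example (rows 0.85 0.05 0.10; 0.12 0.80 .. 0.05; 0.02 0.74 0.12; 0.40 0.03 0.50). Other labels: National, Regional, Metro, Local; (c n).]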
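The relation between the two membership tables in the flow chart can be illustrated with a minimal sketch. It assumes that a crisp partition is obtained by assigning each tract to its largest fuzzy membership; the slides do not state this rule, and the function name `harden` is hypothetical. The numbers are the fuzzy values shown in the chart.

```python
import numpy as np

# Fuzzy memberships copied from the flow chart (columns U1, U2, Uc).
# The chart elides intermediate clusters, so rows need not sum to 1.
U_fuzzy = np.array([
    [0.85, 0.05, 0.10],
    [0.12, 0.80, 0.05],
    [0.02, 0.74, 0.12],
    [0.40, 0.03, 0.50],
])

def harden(U):
    """Return a 0/1 matrix with a single 1 at each row's largest membership."""
    crisp = np.zeros_like(U, dtype=int)
    crisp[np.arange(U.shape[0]), U.argmax(axis=1)] = 1
    return crisp

U_crisp = harden(U_fuzzy)
print(U_crisp)
```

Under this assumption the hardened matrix matches the crisp example in the chart, while the last row (0.40 versus 0.50) shows the kind of ambiguity that a crisp assignment hides.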
8. Explanatory variables for house price
Var_Name | Variable Definition | Data | Year | Spatial Unit
Socioeconomic/demographic Characteristics of Residents
pcincome | per capita income | Census | 2000 | Census Tract
college | college degree | Census | 2000 | Census Tract
managep | management workers | Census | 2000 | Census Tract
prodp | production workers | Census | 2000 | Census Tract
famcpchl | family with children | Census | 2000 | Census Tract
nfmalone | nonfamily living alone | Census | 2000 | Census Tract
black_p | black | Census | 2000 | Census Tract
nhwht_p | non-hispanic white | Census | 2000 | Census Tract
nativebr | native born | Census | 2000 | Census Tract
Structural Characteristics of Housing Units
medroom | median number of rooms | Census | 2000 | Census Tract
hudetp | detached housing unit | Census | 2000 | Census Tract
yrhublt | median year structure built | Census | 2000 | Census Tract
Locational Characteristics (Amenities) of Neighborhoods
ptratio | pupil to teacher ratio | NCES | 2002 | School District
schexp | school expenditure per student | NCES | 2002 | School District
vrlcrime | violent crime rate | FBI | 2003 | Designated Place
prpcrime | property crime rate | FBI | 2003 | Designated Place
jobacm | job accessibility (Hansen 1959) | CTPP | 2000 | Census Tract
NCES: National Center for Education Statistics
FBI: annual report, Crime in the U.S. 2003
CTPP: Census Transportation Planning Package
Dependent variable: median home value of owner-occupied housing units
9. Study set: 85 metropolitan areas
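10. What is fuzzy c-means (FCM)?
- Clustering method that minimizes the following objective function:
$$J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, \lVert x_k - v_i \rVert^2$$
where
$x_k$: vector of data point k, $1 \le k \le n$
$v_i$: center of cluster i, $1 \le i \le c$
$u_{ik}$: membership degree of data point k with cluster i, $u_{ik} \in [0, 1]$
$m$: fuzziness amount associated with assigning data point k to cluster i, $1 \le m < \infty$
- Updates cluster means $v_i$ and membership degrees $u_{ik}$ until the algorithm converges:
$$v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m x_k}{\sum_{k=1}^{n} (u_{ik})^m} \tag{III-3a}$$
$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)} \right]^{-1} \tag{III-3b}$$
Source: Bezdek 1981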
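The alternating updates (III-3a) and (III-3b) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' GIS implementation: the function name `fcm`, the random initialization, the maximum-absolute-difference stopping rule, and the toy two-blob data are all hypothetical choices. The sketch requires m > 1 because the exponent 2/(m-1) is undefined at m = 1, the crisp K-means limit.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means on X (n x p). Returns U (n x c) and centers V (c x p)."""
    if m <= 1.0:
        raise ValueError("this sketch needs m > 1 (m = 1 is the crisp limit)")
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of a point sum to 1

    def centers(U):
        W = U ** m
        return (W.T @ X) / W.sum(axis=0)[:, None]    # update (III-3a)

    for _ in range(max_iter):
        V = centers(U)
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                        # guard against zero distance
        # Update (III-3b); distances are scaled by the row minimum so the
        # negative power cannot overflow. The scaling cancels on normalization.
        inv = (d / d.min(axis=1, keepdims=True)) ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() <= eps
        U = U_new
        if converged:
            break
    return U, centers(U)

# Toy data (hypothetical): two well-separated groups of 30 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(3.0, 0.3, (30, 2))])
U, V = fcm(X, c=2, m=1.4)
print(U[:3].round(3))
print(V.round(2))
```

Each row of `U` gives one observation's degree of membership in every cluster, which is what allows a tract on a submarket boundary to belong partly to both sides.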
11. FCM missing elements
- Optimal number of clusters c
- Optimal fuzziness amount m
[Diagram labels: m, c, FCM]
12. Extended fuzzy c-means algorithm
- Step 1: Initialize the parameters related to fuzzy partitioning: $c = 2$ ($2 \le c \le c_{max}$), $m = 1$ ($1 \le m \le m_{max}$), where c is an integer and m is a real number. Fix $m_{inc}$, where $m_{inc}$ is the incremental value of m ($0 < m_{inc} \le 0.1$). Fix the cut-off threshold $\varepsilon_L$. Choose a validity index v.
- Step 2: Given c and m, initialize $U^{(0)}$ so that it is a fuzzy partition matrix. Then at step l, l = 0, 1, 2, ...:
- Step 3: Calculate the c fuzzy cluster centers $\{v_i^{(l)}\}$ with (III-3a) and $U^{(l)}$.
- Step 4: Update $U^{(l+1)}$ using (III-3b) and $\{v_i^{(l)}\}$.
- Step 5: Compare $U^{(l)}$ to $U^{(l+1)}$ in a convenient matrix norm: if $\lVert U^{(l+1)} - U^{(l)} \rVert \le \varepsilon_L$, go to Step 6; otherwise return to Step 3.
- Step 6: Compute the validity index for the given c and m.
- Step 7: If $c < c_{max}$, then increase $c \leftarrow c + 1$ and go to Step 3; otherwise go to Step 8.
- Step 8: If $m < m_{max}$, then increase $m \leftarrow m + m_{inc}$ and go to Step 3; otherwise go to Step 9.
- Step 9: Obtain the optimal validity index from the values computed in Step 6, the optimal number of clusters c, and the optimal fuzziness exponent m. The optimal fuzzy partition U is obtained given c and m.
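The search in Steps 1 through 9 amounts to a grid search over c and m that keeps the partition with the best validity index. The sketch below is one way to express it, under several assumptions that are not in the slides: the names `fcm`, `xie_beni`, and `extended_fcm` are hypothetical, the Xie-Beni index is used in its common form with squared memberships, the membership matrix is re-initialized for every (c, m) pair, and m starts at 1 + minc rather than 1 because (III-3b) is undefined at m = 1.

```python
import numpy as np

def fcm(X, c, m, eps=1e-4, max_iter=200, seed=0):
    """Plain fuzzy c-means (Steps 2 to 5); returns memberships U and centers V."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    centers = lambda U: ((U ** m).T @ X) / (U ** m).sum(axis=0)[:, None]
    for _ in range(max_iter):
        V = centers(U)                                     # (III-3a)
        d = np.fmax(np.linalg.norm(X[:, None] - V[None], axis=2), 1e-12)
        inv = (d / d.min(axis=1, keepdims=True)) ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)       # (III-3b)
        done = np.abs(U_new - U).max() <= eps              # Step 5
        U = U_new
        if done:
            break
    return U, centers(U)

def xie_beni(X, U, V):
    """Compactness over separation; smaller values indicate a better partition."""
    d2 = ((X[:, None] - V[None]) ** 2).sum(axis=2)
    sep = ((V[:, None] - V[None]) ** 2).sum(axis=2)
    sep = sep[~np.eye(len(V), dtype=bool)].min()
    return (U ** 2 * d2).sum() / (X.shape[0] * sep)

def extended_fcm(X, c_max=4, m_max=2.0, m_inc=0.1):
    """Steps 6 to 9: evaluate every (c, m) pair and keep the smallest index."""
    best = None
    for m in np.arange(1.0 + m_inc, m_max + 1e-9, m_inc):
        for c in range(2, c_max + 1):
            U, V = fcm(X, c, float(m))
            score = xie_beni(X, U, V)
            if np.isfinite(score) and (best is None or score < best[0]):
                best = (float(score), c, round(float(m), 2), U)
    return best

# Toy data (hypothetical): two well-separated groups.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(6.0, 0.3, (30, 2))])
score, c_opt, m_opt, U_opt = extended_fcm(X)
print(c_opt, m_opt, round(score, 4))
```

Because every (c, m) pair requires a full FCM run, the cost grows with the product of the two grids, which is one reason to keep the increment of m coarse.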
13. Cluster validity indices
- Partition coefficient
- Partition entropy
- SVi index (w is set to 2 in this study)
- Xie-Beni index
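The formulas for these indices did not survive in the slide text, so the sketch below uses their commonly cited definitions as an assumption: the partition coefficient is the mean of squared memberships, the partition entropy is the mean membership entropy, and the Xie-Beni index divides membership-weighted squared distances by n times the smallest squared distance between centers. The function names are hypothetical, and the SVi index is left out because its definition cannot be recovered from the slides.

```python
import numpy as np

def partition_coefficient(U):
    """Mean of squared memberships: 1 for a crisp partition, 1/c for a uniform one."""
    return (U ** 2).sum() / U.shape[0]

def partition_entropy(U):
    """Mean membership entropy: 0 for a crisp partition, log(c) for a uniform one."""
    safe = np.clip(U, 1e-12, 1.0)          # treat 0 * log(0) as 0
    return -(U * np.log(safe)).sum() / U.shape[0]

def xie_beni(X, U, V):
    """Compactness divided by separation; smaller is better."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    sep = sep[~np.eye(len(V), dtype=bool)].min()
    return (U ** 2 * d2).sum() / (X.shape[0] * sep)

# Small hand-checkable example (hypothetical data): two tight pairs on a line.
X = np.array([[0.0], [2.0], [10.0], [12.0]])
V = np.array([[1.0], [11.0]])
U_crisp = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
U_uniform = np.full((4, 2), 0.5)

print(partition_coefficient(U_crisp), partition_coefficient(U_uniform))
print(partition_entropy(U_crisp), partition_entropy(U_uniform))
print(xie_beni(X, U_crisp, V))
```

The partition coefficient and partition entropy depend only on U, whereas the Xie-Beni index also uses the data and the cluster centers.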
14. Determining c and m
- Selected validity indices are calibrated over the study set
- The Xie-Beni index is recommended as a validity index
- The average m is 1.38
15. Histogram of m for FCM
16.
17. Median home value of Buffalo, NY
18. Dimensionality of Buffalo housing market
Hedonic regression equation of median home value in Buffalo, NY
Predictor | Coefficient | Standard Error | t-statistic | p-value
Constant | -1455768 | 164417 | -8.85 | 0.000
Per capita income | 2.3667 | 0.2791 | 8.48 | 0.000
college degree | 88221 | 11346 | 7.78 | 0.000
family couple with children | 65735 | 18775 | 3.50 | 0.001
detached housing unit | -31260 | 5527 | -5.66 | 0.000
Housing age (year) | 692.88 | 80.26 | 8.63 | 0.000
non-hispanic white | 11186 | 3914 | 2.86 | 0.005
native born status | 130039 | 31111 | 4.18 | 0.000
Job accessibility | -0.05266 | 0.02227 | -2.36 | 0.019
Adjusted R-squared: 84.3%
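As a consistency check on the regression table, each t-statistic should equal the coefficient divided by its standard error. The short sketch below recomputes them from the reported values; the dictionary layout and variable names are illustrative choices, and the numbers are copied from the table.

```python
# (coefficient, standard error, reported t) copied from the Buffalo regression table
rows = {
    "Constant": (-1455768, 164417, -8.85),
    "Per capita income": (2.3667, 0.2791, 8.48),
    "college degree": (88221, 11346, 7.78),
    "family couple with children": (65735, 18775, 3.50),
    "detached housing unit": (-31260, 5527, -5.66),
    "Housing age (year)": (692.88, 80.26, 8.63),
    "non-hispanic white": (11186, 3914, 2.86),
    "native born status": (130039, 31111, 4.18),
    "Job accessibility": (-0.05266, 0.02227, -2.36),
}

# t = coefficient / standard error
t_computed = {name: coef / se for name, (coef, se, _) in rows.items()}

for name, (_, _, t_reported) in rows.items():
    print(f"{name:28s} computed {t_computed[name]:7.2f}   reported {t_reported:7.2f}")
```

All nine recomputed values agree with the reported ones to two decimal places.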
19. Optimal number of housing submarkets c, optimal fuzziness amount m, Buffalo, NY
c \ m | 1.0 | 1.1 | 1.2 | 1.3 | 1.4 | 1.5 | 1.6 | 1.7 | 1.8 | 1.9
2 | 0.4735 | 0.4570 | 0.4380 | 8.0983 | 10.4115 | 12.5478 | 14.4334 | 16.0634 | 17.4645 | 18.6721
3 | 0.4136 | 0.3889 | 0.3460 | 0.3385 | 10.7864 | 12.9137 | 14.7939 | 16.4217 | 17.8290 | 19.0553
4 | 0.7802 | 0.7116 | 0.6080 | 0.5241 | 1.3154 | 6.8837 | 7.4807 | 8.0441 | 8.5632 | 9.0391
5 | 0.5560 | 0.5622 | 0.5940 | 0.6121 | 0.4683 | 0.3404 | 0.6489 | 0.6850 | 0.7206 | 0.7555
6 | 0.6223 | 0.7578 | 1.0187 | 0.8173 | 0.6907 | 1.3393 | 1.4074 | 1.4819 | 1.5595 | 1.6382
7 | 0.8836 | 0.6903 | 0.6881 | 0.6016 | 0.6148 | 0.9515 | 2.4397 | 2.6306 | 2.8317 | 3.0383
8 | 0.5981 | 0.5888 | 0.5703 | 0.5232 | 0.3992 | 0.7381 | 0.8910 | 1.2388 | 1.2926 | 1.3538
9 | 0.9645 | 0.6160 | 0.4836 | 0.4866 | 0.8449 | 1.4020 | 1.4198 | 1.8317 | 1.8639 | 1.9161
10 | 0.7053 | 0.6004 | 0.6619 | 0.5873 | 0.5868 | 1.3465 | 1.5081 | 1.6875 | 1.8215 | 1.8591
Optimal c | 3 | 3 | 3 | 3 | 8 | 5 | 5 | 5 | 5 | 5
Values in the cells are the Xie-Beni index for the given c and m.
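The selection of c and m from this table can be reproduced with a short sketch. The Xie-Beni index is better when smaller, so the optimal c for each m is the row with the column minimum, and the overall optimum is the smallest cell. The array below is copied from the table; the variable names are illustrative.

```python
import numpy as np

m_values = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
c_values = list(range(2, 11))

# Xie-Beni index for Buffalo, NY: rows are c = 2..10, columns are m = 1.0..1.9
xb = np.array([
    [0.4735, 0.4570, 0.4380, 8.0983, 10.4115, 12.5478, 14.4334, 16.0634, 17.4645, 18.6721],
    [0.4136, 0.3889, 0.3460, 0.3385, 10.7864, 12.9137, 14.7939, 16.4217, 17.8290, 19.0553],
    [0.7802, 0.7116, 0.6080, 0.5241, 1.3154, 6.8837, 7.4807, 8.0441, 8.5632, 9.0391],
    [0.5560, 0.5622, 0.5940, 0.6121, 0.4683, 0.3404, 0.6489, 0.6850, 0.7206, 0.7555],
    [0.6223, 0.7578, 1.0187, 0.8173, 0.6907, 1.3393, 1.4074, 1.4819, 1.5595, 1.6382],
    [0.8836, 0.6903, 0.6881, 0.6016, 0.6148, 0.9515, 2.4397, 2.6306, 2.8317, 3.0383],
    [0.5981, 0.5888, 0.5703, 0.5232, 0.3992, 0.7381, 0.8910, 1.2388, 1.2926, 1.3538],
    [0.9645, 0.6160, 0.4836, 0.4866, 0.8449, 1.4020, 1.4198, 1.8317, 1.8639, 1.9161],
    [0.7053, 0.6004, 0.6619, 0.5873, 0.5868, 1.3465, 1.5081, 1.6875, 1.8215, 1.8591],
])

# Best c for each m: the row index of the minimum in each column
best_c_per_m = [c_values[i] for i in xb.argmin(axis=0)]

# Overall optimum: the smallest cell in the whole table
i, j = np.unravel_index(xb.argmin(), xb.shape)
c_opt, m_opt, xb_opt = c_values[i], m_values[j], float(xb[i, j])

print(best_c_per_m)
print(c_opt, m_opt, xb_opt)
```

The column minima reproduce the last row of the table, and the overall minimum falls at c = 3 and m = 1.3. The runner-up cell, c = 5 at m = 1.5, is only slightly larger (0.3404 versus 0.3385), so the choice between three and five submarkets is a close one on this index.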
20. Buffalo housing submarkets
c = 3, m = 1.3
21. Evaluating the performance of fuzzy clustering
22. Compare FCM with K-means (KM)
- Compare the sum of squared errors derived from KM (m = 1) and FCM (m = m*, the optimal fuzziness amount) given c
- Fuzzy clustering outperforms crisp clustering
23. Conclusions
- Fuzzy set theory provides a mechanism for handling the uncertainty involved in classification tasks
- The fuzzy c-means algorithm is of practical use in delineating housing submarkets
- Fuzzy set theory needs further attention in social science fields
- More work on the choice of parameters is needed