Title: Using Demographic Variables to Identify Profit-Earning Facility Locations
1Using Demographic Variables to Identify
Profit-Earning Facility Locations
2Objectives
- By analyzing data from different sources, we would
like to extract knowledge and create a decision
support system for high-level business
executives.
- In particular, based on derived indicators, we
would like to locate stations of a particular
brand that have a high probability of being highly
profitable in the future.
3Outline
- Decision support and analysis architecture
- Data sources
- Catchment area models
- Traffic modelling
- Correlation analysis
- Variable distribution analysis
- Future work
4Data sources and analysis architecture
[Figure: architecture diagram; surviving labels: GSD, Decision support]
5Data to be analyzed
- Locations and revenues for 314 Q8 stations
- Locations for approx. 1900 competitors
- Top-level road network with traffic load
- Total street network
- conzoom variables
- Number of households with cars
- Income
- Number of long distance commuters
- Locations and types (size) of retailers
- Commuter behavior
SA
6Defining possible indicators
- Location, location, location! Spatial
density indicators
- Segment Denmark into areas served by a station
- Count variables for segments and associate these
spatial density indicators with the corresponding
stations
Station X
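The counting step can be sketched as follows. This is a minimal illustration, assuming the simplest possible segmentation (each small area cell belongs to the station nearest by straight-line distance). The function name `assign_to_nearest_station`, the coordinates, and the household counts are all made up for the example and do not come from the project data.

```python
import math

def assign_to_nearest_station(stations, cells):
    """Sum each cell's count into the station nearest by straight-line distance.

    stations: dict mapping a station name to (x, y) in metres.
    cells: iterable of (x, y, count) tuples, e.g. households per grid cell.
    """
    totals = {name: 0 for name in stations}
    for x, y, count in cells:
        nearest = min(stations, key=lambda name: math.dist(stations[name], (x, y)))
        totals[nearest] += count
    return totals

# Made-up station positions (metres) and made-up household counts per cell.
stations = {"A": (0.0, 0.0), "B": (10000.0, 0.0)}
cells = [(1000.0, 500.0, 120), (9000.0, -200.0, 80), (4000.0, 0.0, 50)]

households = assign_to_nearest_station(stations, cells)
print(households)
```

The resulting per-station totals play the role of a spatial density indicator for each station.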
7How to segment service area among gas stations?
- Non-overlapping segmentation
- based on straight distance
- based on road distance
- Problem: Areas assigned to gas stations are
uneven and intuitively unjustified!
- Overlapping segmentation
- Fixed area circles
- Centered and off-centered
- Problem: Very similar catchment areas!
- Based on human intuition we select the
off-centered segmentation approach with radius
4500 m and maximum offset 3000 m.
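The slides do not state how the circle centre is offset, so the sketch below uses a hypothetical rule purely for illustration: the centre is pushed directly away from the nearest competitor, further when the competitor is closer, and never by more than the maximum offset. Only the radius (4500 m) and the maximum offset (3000 m) come from the slides; the offset rule, the function names, and all coordinates and counts are assumptions.

```python
import math

RADIUS_M = 4500.0      # radius stated in the slides
MAX_OFFSET_M = 3000.0  # maximum offset stated in the slides

def off_centre(station, competitors, radius=RADIUS_M, max_offset=MAX_OFFSET_M):
    """Hypothetical rule: shift the circle centre away from the nearest competitor."""
    if not competitors:
        return station
    nearest = min(competitors, key=lambda c: math.dist(station, c))
    d = math.dist(station, nearest)
    if d == 0.0:
        return station  # no direction to shift in
    # Assumed rule: closer competitor, larger shift, capped at max_offset.
    offset = min(max_offset, max(0.0, radius - d))
    ux = (station[0] - nearest[0]) / d
    uy = (station[1] - nearest[1]) / d
    return (station[0] + ux * offset, station[1] + uy * offset)

def count_in_circle(centre, cells, radius=RADIUS_M):
    """Sum the counts of all cells whose position lies inside the circle."""
    return sum(n for x, y, n in cells if math.dist(centre, (x, y)) <= radius)

# Made-up positions (metres) and made-up household counts.
station = (0.0, 0.0)
competitors = [(2000.0, 0.0), (0.0, 9000.0)]
cells = [(-6000.0, 0.0, 100), (3000.0, 0.0, 40), (0.0, 1000.0, 60)]

centre = off_centre(station, competitors)
households = count_in_circle(centre, cells)
print(centre, households)
```

Because the circles of neighbouring stations may overlap, a cell can be counted for more than one station under this scheme.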
8Off-centered circular catchment areas
9How to segment service area?
- Though the service areas considered take
competition into account, they disregard
information about the underlying road network.
- Solution
- Equal driving-distance areas
- Equal driving-time areas
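An equal driving-distance area could be computed roughly as below. This is a sketch, assuming the road network is available as a weighted graph (a dict of dicts with segment lengths in metres) and using Dijkstra's algorithm with a distance cutoff. The graph, the node names, and the function `reachable_within` are invented for the example; replacing lengths with travel times would give an equal driving-time area instead.

```python
import heapq

def reachable_within(graph, start, max_distance):
    """Return {node: driving distance} for all nodes within max_distance of start."""
    dist = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbour, length in graph[node].items():
            nd = d + length
            if nd <= max_distance and nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return dist

# Made-up road network: segment lengths in metres, listed in both directions.
graph = {
    "station": {"a": 2000, "c": 7000},
    "a": {"station": 2000, "b": 2000},
    "b": {"a": 2000, "c": 2000},
    "c": {"b": 2000, "station": 7000},
}

area = reachable_within(graph, "station", 4500)
print(area)
```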
10Modelling traffic (not performed in experiments)
- 3 types of traffic
- Passing-through traffic induced by travellers and
tourists (irregular or rather seasonal)
- Traffic induced by commuters (regular)
- Local traffic
- Data available for modelling
- Number of persons commuting daily between
municipalities, parishes or conzoom clusters
- Average number of cars travelling on road
segments of Denmark (highways and main roads
only)
- Number of households with one or more cars
- Number of individuals travelling more than 50 km
to work and back
- Assume that people who have cars use them
11Commuter traffic
- Assume that in a given area the distribution of
people travelling from that area to another area
is uniform.
- To calculate the road usage we assume that people
take the shortest route possible and travel to the
center of the destination area.
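Since this traffic model was not run in the experiments, the following is only a sketch of how the two assumptions might translate into a road-usage calculation. Each area is represented by its centre node, every commuter flow is routed along the shortest path between centres, and the flow is added to each road segment on that path. The graph, the flows, and the function names are hypothetical, and the network is assumed to be connected.

```python
import heapq
from collections import defaultdict

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list from source to target."""
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, length in graph[node].items():
            nd = d + length
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(queue, (nd, neighbour))
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])  # raises KeyError if target is unreachable
    return path[::-1]

def road_usage(graph, flows):
    """Add each commuter flow to every road segment on its shortest path."""
    load = defaultdict(int)
    for (origin, destination), commuters in flows.items():
        path = shortest_path(graph, origin, destination)
        for a, b in zip(path, path[1:]):
            load[frozenset((a, b))] += commuters
    return dict(load)

# Made-up area centres, road lengths (km) and daily commuter counts.
graph = {
    "A": {"B": 5, "C": 12},
    "B": {"A": 5, "C": 5},
    "C": {"A": 12, "B": 5},
}
flows = {("A", "C"): 100, ("A", "B"): 30}

usage = road_usage(graph, flows)
for segment, commuters in usage.items():
    print(sorted(segment), commuters)
```

In this toy network the direct A to C road carries no commuters, because the route through B is shorter.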
12Experiments and results
- Setup
- Off-centered circular catchment areas with R =
4500 m
- Derived variables
- competition
- traffic load
- population density
- demographic
- travel behaviour
- Target variables
- sales volume
- station characteristic
- performance (actual profit / budget)
- Correlation analysis
- Between derived and target variables, but
- What are our target variables?
- Variable distribution analysis
- Can we find variables that separate good and bad
performing gas stations?
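The correlation step might look like the sketch below. The slides do not say which correlation coefficient was used; Pearson's coefficient is assumed here. The variable names and all numbers are made up and serve only to show the mechanics.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up derived variables (one value per station) and a made-up target.
derived = {
    "households": [500, 1200, 2000, 2600],
    "competitors": [5, 3, 4, 1],
}
sales_volume = [1.0, 2.0, 2.9, 4.1]

correlations = {name: pearson(values, sales_volume) for name, values in derived.items()}
for name, r in correlations.items():
    print(f"{name}: {r:+.2f}")
```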
13Correlations between spatial density indicators
and target variables
[Figure: derived variables (competition, traffic load, pop. density, demographics, travel behaviour) against target variables (sales volume, station characteristics, performance); the area of interest is marked.]
14What are our target variables?
- Sales volume or Performance?
There is only low correlation, if any, between
sales volume and performance!
15Good stations, bad stations
- Ultimate goal is to find measures that are useful
for separating well-performing stations from
badly-performing stations.
- Separate gas stations based on performance index
(actual profit / budget)
- We hope to see crisper correlations
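Ranking stations by the performance index and taking the best and worst groups can be sketched as follows. The station records and the field names `profit` and `budget` are hypothetical; the example uses groups of one station only to keep the data small.

```python
def performance_index(station):
    """Performance index as defined in the slides: actual profit / budget."""
    return station["profit"] / station["budget"]

def split_by_performance(stations, n):
    """Return the n best and the n worst stations by performance index.

    The two groups overlap if n exceeds half the number of stations.
    """
    ranked = sorted(stations, key=performance_index, reverse=True)
    return ranked[:n], ranked[-n:]

# Made-up profit and budget figures.
stations = [
    {"name": "A", "profit": 120.0, "budget": 100.0},
    {"name": "B", "profit": 80.0, "budget": 100.0},
    {"name": "C", "profit": 150.0, "budget": 160.0},
    {"name": "D", "profit": 90.0, "budget": 60.0},
]

top, bottom = split_by_performance(stations, 1)
print([s["name"] for s in top], [s["name"] for s in bottom])
```

Correlations can then be computed separately within each group.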
16Correlations for the Top 25 stations
17Correlations for the Bottom 25 stations
18Variable distribution analysis
- Correlations are a good way to find interesting
but not necessarily useful variables
- Useful variables have a different distribution
for good and bad performing gas stations
- Visually inspect interesting variables for
different performance areas: fuel, retail, wash
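As a numerical companion to visual inspection, the distribution of one variable can be binned separately for good and bad stations and the shares per bin compared. This is only a sketch; the bin width, the function name `binned_share`, and the household counts are made up.

```python
from collections import Counter

def binned_share(values, bin_width):
    """Share of values falling in each bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    total = len(values)
    return {edge: counts[edge] / total for edge in sorted(counts)}

# Made-up household counts in the catchment areas of good and bad stations.
good = [1200, 1800, 2500, 2600]
bad = [300, 700, 1100, 2100]

good_dist = binned_share(good, 1000)
bad_dist = binned_share(bad, 1000)

for edge in sorted(set(good_dist) | set(bad_dist)):
    print(edge, good_dist.get(edge, 0.0), bad_dist.get(edge, 0.0))
```

A variable is a candidate for being useful when the two columns differ clearly, as they do in the lowest bin of this toy data.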
19Household-variable for fuel performance
There is a better chance to have a good
performance in dense household areas!
20Household income 80-100-variable for fuel
performance
There is a better chance to have a good
performance where the household income is high!
21Quick pump count-variable for fuel performance
Seems like the optimal number of quick pumps to
have a good performance on fuel is 6!
22Household income 80-100-variable for retail
performance
There is a better chance to have a good
performance where the household income is high!
23Pump count-variable for retail performance
Seems like the optimal number of pumps to have a
good performance on retail is 6!
24Quick pump count-variable for retail performance
Moreover the 6 pumps should be quick pumps!
25So we have found some useful variables that have
a potential to separate well-performing gas
stations from badly-performing ones!
26How to exploit useful variables?
- Individual variables are usually not robust for
prediction
- In general a combination of a few variables
enhances prediction performance dramatically
- Machine learning methods are available for this
task
- Regression
- Decision trees
- Support vector machines
- Neural networks
- Genetic algorithms
- Vector quantization, Clustering, Self-organizing
maps
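To illustrate what combining a few variables means, the sketch below fits an ordinary least-squares linear regression on two indicators, chosen here only because regression is the first method in the list. The function `fit_linear` is an illustrative helper, and the data are synthetic and constructed to be exactly linear, so the result says nothing about real stations.

```python
def fit_linear(rows, targets):
    """Ordinary least squares with intercept, solved via the normal equations.

    rows: list of feature tuples; targets: list of target values.
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefficients, row):
    """Apply the fitted linear model to one feature tuple."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))

# Synthetic data generated from target = 1 + 2 * x1 + 3 * x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
targets = [1, 3, 4, 6, 8]

coefficients = fit_linear(rows, targets)
print(coefficients, predict(coefficients, (3, 2)))
```

With real station data the fit would not be exact, and prediction quality would have to be checked on stations held out from the fit.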
27Future work
- Implement more sophisticated catchment areas
based on underlying road network
- Refine performance measures
- Further investigate robustness of variables
- Incorporate commuter travel behaviour
- Use machine learning methods to optimize
prediction performance
- Directly incorporate competitors in the model
28Summary
- Devised some catchment area models
- Defined several spatial density measures based on
various data sources
- Performed correlation analysis
- Found some interesting variables
- Performed distribution analysis
- Found some useful variables
- Some of the variables have intuitive meaning
- Sketched out future work to reach optimal
prediction performance.