Title: Hotel Demand
1Estimating Demand in the Hotel Industry by Mining
User-Generated and Crowdsourced Content
Anindya Ghose (with P. Ipeirotis and B. Li)
Stern School of Business, New York University
2Before We Start
- How can I find a 5-star hotel in Miami that is near the interstate highway, has easy access to the beach, sits in an area with lots of nightlife, and also offers a great price for what it provides?
3Customer Search in Travel Search Engines
- Rudimentary ranking facilities use a single criterion (e.g., name, price per night, class, customer reviews).
- They largely ignore:
  - the multidimensional preferences of consumers
  - the location and service characteristics of the hotels
4Introduction
Customers try to identify hotels with particular characteristics, e.g., location and service. How do they search?
- Location (e.g., near the beach, near downtown)
- Service (e.g., free internet access, 24-hour fitness center)
Location and service influence desirability, which in turn drives demand.
No empirical studies have focused on location, service, and hotel demand. Which characteristics matter, and how do we get the data?
5Research Agenda
Problem: Locate the hotels that satisfy specific criteria and offer the best value for the money.
Challenge: Need to quantify the economic weight of the location- and service-based characteristics of hotels.
Method: Combine structural modeling of demand estimation with text mining of user-generated content, on-demand annotations using crowd-sourcing, and image classification to identify and measure hotel characteristics.
6New Ranking Approach for Hotels
- Consumers ideally like the best product shown first on the screen.
  - Best product = highest value for money
- Consumers gain utility from product characteristics (WTP).
- Consumers lose utility by paying for the product (price).
- Value for money = difference of the two
- Transaction data from travel search engines
- Compute the consumer surplus for each hotel: the value of its location and service characteristics minus price.
- Rank according to value for money.
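The ranking rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Hotel` class, the field names, and the dollar figures are all hypothetical, and it assumes willingness to pay has already been expressed in the same units as price.

```python
from dataclasses import dataclass

@dataclass
class Hotel:
    name: str
    wtp: float    # hypothetical willingness to pay for the hotel's characteristics
    price: float  # price per night, in the same units as wtp

def value_for_money(hotel):
    # Value for money = utility gained from characteristics minus utility lost to price.
    return hotel.wtp - hotel.price

def rank_by_value(hotels):
    # Highest value for money first.
    return sorted(hotels, key=value_for_money, reverse=True)

hotels = [
    Hotel("A", wtp=220.0, price=180.0),  # surplus 40
    Hotel("B", wtp=150.0, price=90.0),   # surplus 60
    Hotel("C", wtp=300.0, price=310.0),  # surplus -10
]
ranking = [h.name for h in rank_by_value(hotels)]
```

Note that the most expensive hotel is not necessarily ranked last, and the cheapest is not necessarily ranked first; only the difference matters.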
7Main Data Travelocity hotel reservations
- Our technique is validated on a unique panel dataset based on 1,500 different hotels located in the United States over 3 months (2008/11 - 2009/02).
- We supplemented this dataset with data from Microsoft, Tripadvisor, Geonames, Amazon, and Google.
8Identification of Hotel Characteristics
- An online anonymous survey
- 100 users on Amazon Mechanical Turk (AMT/MTurk)
- "What characteristics do you consider to be the most important when you choose a hotel?"
9Identification of Hotel Characteristics
Location-based hotel characteristics
- Near the Beach
- Near the Lake/River
- Near Public Transportation
- Near Downtown
- Near Interstate Highway
- Number of External Amenities (i.e., near Restaurants/Shops/Bars/Markets)
- Safe Neighborhood
- Number of local competitors
- Convention center, airport, etc
10Identification of Hotel Characteristics
Service-based hotel characteristics
- Hotel Class
- Number of Internal Amenities (aggregation of 23 hotel internal amenities, e.g., free breakfast, business center, high-speed internet, swimming pool, parking)
- Customer Review (Count, Valence, Text)
  - Text mining of hotel reviews on both Travelocity and Tripadvisor
The service-based hotel characteristics data were
crawled from www.Tripadvisor.com.
11Acquiring Location-based Characteristics
However, not all location-based characteristics can be derived in the same way, e.g., Near the Beach vs. Near Restaurants.
12Acquiring Location-based Characteristics
(1) Commercial characteristics (e.g., Near Restaurants/Shops/Bars/Markets) are computed via local search queries using the Virtual Earth Interactive SDK, a new generation of interactive online mapping service that provides both a main mapping site and a JavaScript API.
13Acquiring Location-based Characteristics
(2) Geographical characteristics with rich textural information are derived by image classification with Gabor feature extraction.
- 256 × 256 pixels → 49 overlapping regions
SVM Classification Accuracy 0.912
SVM Classification Accuracy 0.807
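The region-based Gabor step can be illustrated with a small, dependency-free sketch. Everything specific here is an assumption made for illustration: the 64-pixel window with a 32-pixel stride (one tiling that happens to yield 49 overlapping regions on a 256 × 256 image), the kernel size, the two orientations, and the function names. The resulting feature vectors would then be passed to an SVM classifier, which is not shown.

```python
import math

def overlapping_regions(size=256, window=64, stride=32):
    # Assumed tiling: 64-pixel windows with 50% overlap give a 7 x 7 grid (49 regions).
    starts = range(0, size - window + 1, stride)
    return [(top, left) for top in starts for left in starts]

def gabor_kernel(theta, ksize=9, sigma=2.0, wavelength=4.0):
    # Real part of a Gabor filter: Gaussian envelope times a cosine carrier along theta.
    half = ksize // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def region_features(image, top, left, kernels, window=64, step=8):
    # One feature per kernel: mean absolute filter response sampled on a coarse grid.
    half = len(kernels[0]) // 2
    features = []
    for kernel in kernels:
        total, count = 0.0, 0
        for cy in range(top + half, top + window - half, step):
            for cx in range(left + half, left + window - half, step):
                response = sum(kernel[dy + half][dx + half] * image[cy + dy][cx + dx]
                               for dy in range(-half, half + 1)
                               for dx in range(-half, half + 1))
                total += abs(response)
                count += 1
        features.append(total / count)
    return features

# Synthetic image: vertical stripes with a 4-pixel period.
image = [[1.0 if (col % 4) < 2 else 0.0 for col in range(256)] for _ in range(256)]
kernels = [gabor_kernel(0.0), gabor_kernel(math.pi / 2)]
features = [region_features(image, top, left, kernels)
            for top, left in overlapping_regions()]
```

On the striped test image, the kernel whose carrier runs across the stripes responds far more strongly than the perpendicular one, which is the kind of texture signal a classifier can exploit.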
14Acquiring Location-based Characteristics
(3) Geographical characteristics too hard even for image classification algorithms are classified using on-demand human annotation through an AMT survey.
- 4 different zoom levels for each location, shown to 5 Turkers
- Public Transportation, Lake/River, Highway
(4) Characteristics related to neighborhood safety are acquired from the FBI online statistics (http://www.FBI.com).
- City Annual Crime Rate over the last 6 years
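For the human-annotation step in (3), the five Turker answers per location have to be combined into one label. The slides do not say how this is done; the sketch below assumes a simple majority vote, and the hotel identifiers and votes are made up.

```python
from collections import Counter

def majority_label(votes):
    # Returns the most common answer and the fraction of annotators who gave it.
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)

# Hypothetical answers from 5 annotators to "Is this hotel near a highway?"
annotations = {
    "hotel_1": [True, True, False, True, True],
    "hotel_2": [False, False, True, False, False],
}
near_highway = {hotel: majority_label(votes)[0] for hotel, votes in annotations.items()}
```

Using an odd number of annotators, as with the 5 Turkers here, avoids ties on yes/no questions.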
16Summary Statistics
17Acquiring Hotel Characteristics
Goal: Locate the hotel with specific criteria and the best value for the money. Estimate the economic value of those characteristics.
- What characteristics?
- Collection of the data
18Framework of Structural Model
- First, the consumer finds a subset of hotels that matches her own travel category.
- Each hotel belongs to one of the following travel categories: Family Trip, Business Trip, Romantic Trip, Tourists Trip, Trip with Kids, Trip with Seniors, Pets Friendly, and Disabilities Friendly.
- To capture heterogeneity in consumers' travel categories, we introduce an idiosyncratic taste shock similar in flavor to the BLP (1995) model.
- Second, once the consumer has picked a specific travel category, she makes a decision based on her evaluation of the quality of the hotels.
- Pure characteristics model (Berry and Pakes 2007) to capture the differentiation among hotels within the same category.
- Summary: Combine BLP (1995) and Berry and Pakes (2007).
19Structural Modeling
We propose a two-step random-coefficient structural model in the following form:
- jk represents hotel j with category type k (1 ≤ k ≤ 7).
- β and α are random coefficients that capture consumers' heterogeneous tastes towards different observed hotel characteristics, X, and towards price per night, P.
- ξ represents the set of hotel characteristics that are unobservable to the econometrician.
- ε with a superscript k represents a travel-category-level taste shock with a Type-I EV distribution.
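A utility specification consistent with the definitions above can be written as follows. This is a sketch, not a transcription: the symbol names for the price coefficient (α) and the unobserved characteristics (ξ) follow common BLP notation and are assumed, as are the consumer index i and the exact placement of indices.

```latex
u_{ij}^{k} \;=\; X_{jk}\,\beta_i \;+\; \alpha_i\,P_{jk} \;+\; \xi_{jk} \;+\; \varepsilon_{i}^{k}
```

Here $X_{jk}$ and $P_{jk}$ are the observed characteristics and price per night of hotel $j$ in category $k$, $\beta_i$ and $\alpha_i$ are consumer $i$'s random coefficients, $\xi_{jk}$ is the unobserved hotel characteristic, and $\varepsilon_{i}^{k}$ is the category-level Type-I extreme value taste shock. With this sign convention, price sensitivity shows up as a negative $\alpha_i$.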
20Estimation
- Step 1: Calculating market share.
- Step 2: Solving mean utility.
  - Solution is based on the contraction mapping technique.
- Step 3: Solving the variance of β and α.
  - Instrumental variables (IV) for price: average price of the same-star-rating hotels in the same market/other markets (Hausman 1994).
  - Form a GMM objective function using moment conditions.
  - Minimize the GMM objective function.
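The price instrument in Step 3 can be illustrated with a small sketch. It shows only the other-markets variant (average price of hotels with the same star rating in different markets); the record layout, the function name, and the prices are hypothetical.

```python
from collections import defaultdict

def other_market_price_iv(hotels):
    # Running [price sum, count] per star rating and per (star rating, market).
    by_star = defaultdict(lambda: [0.0, 0])
    by_star_market = defaultdict(lambda: [0.0, 0])
    for h in hotels:
        for key, table in ((h["stars"], by_star),
                           ((h["stars"], h["market"]), by_star_market)):
            table[key][0] += h["price"]
            table[key][1] += 1

    instruments = {}
    for h in hotels:
        total, n = by_star[h["stars"]]
        own_total, own_n = by_star_market[(h["stars"], h["market"])]
        others = n - own_n
        # No instrument is available when no other market has this star rating.
        instruments[h["id"]] = (total - own_total) / others if others else None
    return instruments

hotels = [
    {"id": "a", "market": "NYC", "stars": 4, "price": 300.0},
    {"id": "b", "market": "LA", "stars": 4, "price": 200.0},
    {"id": "c", "market": "SF", "stars": 4, "price": 250.0},
    {"id": "d", "market": "LA", "stars": 3, "price": 120.0},
]
iv = other_market_price_iv(hotels)
```

The idea behind this kind of instrument is that prices of comparable hotels elsewhere share cost shocks with the focal hotel but are not driven by its local demand shocks.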
21Estimation
Step 1: Calculating market share.
- Market share for a hotel within a travel category type (PCM-based model)
- Market share for each travel category type as a whole (BLP-based model)
- Final market share for a hotel with a travel category type
22Estimation
Within-category Market share
Market share for a particular category
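One natural way to combine the two components into the final share is as a product, as in nested choice models. The notation below is assumed for illustration ($s_{j|k}$ for the within-category share, $s_k$ for the category share) and is not taken from the slides.

```latex
s_{jk} \;=\; s_{j \mid k} \times s_{k}
```

For example, if a hotel captures 10% of bookings within the Business Trip category and that category accounts for 30% of all bookings, the hotel's final share under this assumption is 3%.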
23Estimation
Step 2: Solving mean utility.
- Solve for the mean utility such that the model-predicted market share equals the observed market share.
- Solution is based on the contraction mapping technique.
Step 3: Solving the variance of β and α.
- Instrumental variables (IV) for price: average price of the same-star-rating hotels in the same market.
- Form a GMM objective function using moment conditions.
- Iterate steps 1, 2, and 3 to minimize the GMM objective function.
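The contraction mapping in Step 2 can be sketched as follows. To keep the example self-contained, a plain logit share function with an outside good stands in for the model's actual share computation from Step 1; that substitution, the function names, and the example shares are assumptions.

```python
import math

def logit_shares(delta):
    # Stand-in share function: plain logit with an outside good of mean utility 0.
    exp_delta = [math.exp(d) for d in delta]
    denom = 1.0 + sum(exp_delta)
    return [e / denom for e in exp_delta]

def solve_mean_utility(observed, predict=logit_shares, tol=1e-12, max_iter=1000):
    # BLP-style update: delta <- delta + log(observed share) - log(predicted share).
    delta = [0.0] * len(observed)
    for _ in range(max_iter):
        predicted = predict(delta)
        updated = [d + math.log(s) - math.log(p)
                   for d, s, p in zip(delta, observed, predicted)]
        if max(abs(u - d) for u, d in zip(updated, delta)) < tol:
            return updated
        delta = updated
    raise RuntimeError("contraction mapping did not converge")

observed_shares = [0.2, 0.3]  # hypothetical; the outside good holds the remaining 0.5
delta = solve_mean_utility(observed_shares)
```

For the plain logit case the fixed point has a closed form, log(share) minus log(outside share), which makes it a convenient check that the iteration is implemented correctly.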
24Identification (BLP (1995) and PCM (2007) models)
- (i) Monotonicity: s_j is weakly increasing and continuous in ξ_j and weakly decreasing in ξ_{-j}, where ξ_{-j} is the unobserved characteristics of the rival products.
- (ii) Linearity of utility in ξ: if ξ for every good increases by an equal amount, then no market share changes.
- (iii) Substitutes with some other good: every product must be a strict substitute with some other good.
25Economic Value of Characteristics
Samples: (I) at least 1 review from either TA or TL; (II) reviews > 5; (III) reviews > 10.
26Hotel Characteristic Impact
- Positive Impact
- Beach
- Interstate Highway
- Downtown
- Public Transportation
- Hotel Class
- Hotel External Amenities
- Hotel Internal Amenities
- Negative Impact
- Price
- Annual crime rate
- Number of competitors
- Lake
- Spelling errors
- Syllables
- Complexity
- Subjectivity
27Marginal Effects
28Marginal Effects
29Robustness Checks
- Sample consisting of those hotels that have at least one review from either Travelocity or TripAdvisor.
- Estimations after extracting individual service features from the text of reviews.
- Estimations with hotel brand, convention center, distance from airport, etc.
- Estimations with Google Trends data to control for endogeneity of WoM and sales.
- Estimations with the BLP (1995) and PCM (2007) models.
- Estimations across only those cities where all location features are present.
30Robustness Test (I) - Using Alternative
Sample Split
Samples: (IV) at least 1 review from TA; (V) at least 1 review from TL; (VI) at least 1 review from both.
31Robustness Test (II) - Using an Alternative
Model - BLP
32Text mining method to extract score service
features
- Use a POS (part-of-speech) tagger to identify frequently mentioned nouns and noun phrases, which we consider candidate hotel features.
- Cluster them into sets of similar nouns and noun phrases, using WordNet and a context-sensitive hierarchical agglomerative clustering algorithm (Manning and Schutze 1999).
- We keep the top 5 features, since they cover 80% of the hotels in our data: hotel staff, food quality, bathroom, parking facilities, and bed quality.
- Extract all the adjectives and adverbs that are used to evaluate the individual features.
- Use AMT to create the ontology with scores for each evaluation phrase (Ghose et al. 2008).
- AMT workers look at the evaluation phrase paired with the product feature and assign a grade from -3 (strongly negative) to 3 (strongly positive) to the evaluation.
- Drop the highest and lowest evaluation scores, and use the average of the remaining evaluations as the externally imposed score.
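The final scoring rule (drop the highest and lowest grade, average the rest) is easy to make concrete. The function name and the example grades below are hypothetical; the -3 to 3 range comes from the grading scale described above.

```python
def imposed_score(grades):
    # Needs at least 3 grades so that something remains after trimming.
    if len(grades) < 3:
        raise ValueError("need at least 3 grades")
    if any(g < -3 or g > 3 for g in grades):
        raise ValueError("grades must lie between -3 and 3")
    trimmed = sorted(grades)[1:-1]  # drop one lowest and one highest grade
    return sum(trimmed) / len(trimmed)

# Hypothetical grades from 5 AMT workers for the phrase "friendly" + "hotel staff"
score = imposed_score([3, 1, 2, -2, 3])
```

Trimming the extremes limits the influence of a single careless or adversarial worker on the score of an evaluation phrase.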
33Robustness Test (III) - Using Additional
Features
Consistent with Pakes (2003) and Archak et al.
(2008)
34Model Fit With UGC vs. Without UGC
35Model Validation
36Counterfactual Experiments
- Simulate a dataset with 6,000 observations based on the distribution of the original hotel data.
- Compute the corresponding utility for hotels in the simulated dataset based on our model, using our prior set of estimates.
37Counterfactual Experiments (1)
1. Marginal Effects Under Different Location Environments.
- Goal
  - Examine the robustness of the rank order of marginal effects of the location features in areas with no beach, no transportation, etc.
- Treatment
  - Generate 6 derivative samples by assuming each of the 6 location features (beach, downtown, highway, lake, trans, external) to be absent, one at a time.
  - Re-compute the corresponding utility for hotels, with the corresponding absent location feature set to zero.
  - Re-estimate with the updated utilities and the remaining features.
- Finding
  - The rank order of marginal effects for the remaining location features stays consistent with our original baseline estimates.
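The treatment in this experiment (switch off one location feature at a time and recompute utilities) can be sketched as below. The coefficient values are placeholders, not the estimates from this study, and the linear utility function and record layout are simplifying assumptions; the re-estimation step is not shown.

```python
LOCATION_FEATURES = ["beach", "downtown", "highway", "lake", "trans", "external"]

def utility(hotel, coef):
    # Simplified mean utility: linear in the observed characteristics and price.
    return sum(coef[name] * hotel[name] for name in coef)

def derivative_samples(hotels, features=LOCATION_FEATURES):
    # One copy of the sample per location feature, with that feature set to zero.
    return {absent: [{**h, absent: 0} for h in hotels] for absent in features}

# Placeholder coefficients for illustration only.
coef = {"beach": 0.5, "downtown": 0.3, "highway": 0.2, "lake": -0.1,
        "trans": 0.4, "external": 0.05, "price": -0.01}
hotels = [{"beach": 1, "downtown": 0, "highway": 1, "lake": 0,
           "trans": 0, "external": 3, "price": 100.0}]

samples = derivative_samples(hotels)
utilities = {absent: [utility(h, coef) for h in sample]
             for absent, sample in samples.items()}
```

Each derivative sample leaves the original records untouched, so the baseline utilities remain available for comparison.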
38Counterfactual Experiments (2)
2. Effects of Competition Under Different Location Environments.
- Goal
  - Examine the effect on demand from the entry of one local competitor under different location environments.
- Treatment
  - Consider 2 different types of location feature combinations:
    - Type 1: beach and highway (typical west/south coast setting)
    - Type 2: downtown, transportation, and external amenities (typical big city setting)
  - Generate 2 derivative samples correspondingly, by assuming the unrelated location features in each of the two types to be absent (value = zero).
  - Re-compute the utility and re-estimate the model.
- Finding
  - Demand drop in big city is 1.5 times larger than that in coastline.
39Counterfactual Experiments (3)
3. Effects of Changes in Pricing Policy Under Different Location Environments.
- Goal
  - Examine how a price change will affect hotel demand under different location environments.
- Treatment
  - Consider the same two derivative samples as in Experiment (2).
  - Assume a price cut of 20%.
- Finding
  - The increase in demand is lower in the big city than on the coastline.
Consumers in big cities are less sensitive to price.
40- 3 - Effects of competition under different location environments.

Location environment                          # of Competitors increases by 1    Price Cut 20%
(i) Beach and Highway                         -0.46                              1.43
(ii) Downtown, Transportation and Amenity     -0.70                              1.18
Baseline                                      -0.59                              2.31
41Value for Money Based Ranking
- We propose a ranking approach for hotels based on the value for money of each hotel for consumers on an aggregate level.
- This ranking idea is based on how much extra value consumers can obtain after paying for that hotel.
- If a hotel provides a comparably higher value for money for consumers on an aggregate level, then it should appear near the top of our ranking list.
- Higher-ranked hotels can provide consumers with higher surplus (WTP) value, and thus should be recommended to consumers more often.
42Results Based on Consumer Surplus Estimation
(Best Value for Money)
43Ranking Evaluation - User Study
(1) Comparison with blinded lists: hide all the titles and conduct pairwise comparisons with each of the other 9 competing alternatives.
44Ranking Evaluation - User Study
New York City, Los Angeles, San Francisco, Orlando, New Orleans, Salt Lake City
45Ranking Evaluation - User Study
Explanations from users:
- Diversity: 30% 5-star, 40% 4-star, 30% 3-star (and lower)
- Price is not the only factor; multidimensional preferences are taken into account.
Our reasoning: Based on qualitative opinions of users, diversity is indeed an important factor that improves the satisfaction of consumers. Our economic-based ranking approach seems to introduce diversity more naturally.
46On-going Work
Derive personalized consumer surplus by incorporating consumer demographics (e.g., age group, travel purpose).
47Personalized Model
- Examine the interaction effect between consumer demographics and hotel characteristics.
- Derive personalized ranking based on individual utility, conditional on consumer demographics (e.g., age group, travel purpose).
48Weights of Hotel Characteristics Based on
Different Travel Purposes
Consumers with different travel purposes assign
different weight distributions on the same set of
hotel characteristics.
49User Study
Experiment 2: Blind pairwise comparisons with 100 anonymous AMT users. Baseline: generalized CS-based ranking (for an average consumer). E.g., business trip and family trip AMT user study results in the NYC experiment.
Conclusion: Personalized CS-based ranking is overwhelmingly preferred.
Reasoning: It captures consumers' specific expectations and dovetails with their real purchase motivation.
50Estimation Results Capture Consumers' Real Motivation
E.g., in the user study, business travelers indicated that they prefer a quiet interior environment and easy access to highways and public transportation. This was fully captured in our estimation results; see (b).
51Conclusion
- We empirically estimate the economic impact of hotel characteristics, using:
  - user-generated and crowd-sourced content
  - structural modeling, automated image classification, automatic text mining, and on-demand surveys
New Ranking System for Hotels on Travel Search Engines
http://hyperion.stern.nyu.edu/mturk/travel.html
52AMT demographics survey
- Surveyed AMT workers about their place of origin and residence, gender, age, education, income, marital status, household size, and number of children.
- We also asked them about the time that they spend every week on AMT, the amount of work that they complete, the payment they receive, and their reasons for participating on AMT.
- To ensure consistency in results, we conducted the survey six times, once a month in 2009.
- The results of the surveys suggest that AMT participants are representative of the overall Internet population.
- We also asked them about their experience with visits to the online travel search engines Tripadvisor and Travelocity.