Title: Automatic Identification of User Goals in Web Search
1Automatic Identification of User Goals in Web
Search
- Uichin Lee, Zhenyu Liu, Junghoo Cho
Computer Science Department, UCLA
{uclee, vicliu, cho}_at_cs.ucla.edu
2Motivation
- Users have different goals for Web search
- Reach the homepage of an organization (e.g., UCLA)
- Learn about a topic (e.g., simulated annealing)
- Download online music, etc.
- Can we identify the user goal for a Web search automatically?
- Improve and customize search results based on the identified user goal
3Two high-level user-goals
- Navigational query
- Reach a Web site the user already has in mind (e.g., UCLA Library)
- Informational query
- Visit multiple sites to learn about a particular topic (e.g., Simulated Annealing)
- Based on [Broder02, RoseLevinson04]
- Navigational and informational are common in both studies
4Exploiting identified user goals
- Tailored weighting/ranking mechanism
- Navigational queries
- Emphasize anchor texts [Craswell01, Kang03] and URL path [Westerveld01]
- Informational queries
- Emphasize page content [Kang03] and IR techniques (query expansion, relevance feedback, pseudo relevance feedback, etc.)
- Tailored result presentation
- Informational queries
- Clustered search results [Etzioni99, Zeng04, Kummamuru04]
- Targeted ads / answers
5Outline
- Are query goals predictable?
- Human-subject study
- How can we predict user goals automatically?
- Anchor-link distribution
- User-click distribution
- How effective are our features?
- Experimental evaluation
6Are query goals predictable?
- Search engines see only a few keywords
- No explicit indication of goals by users
- Can we predict the user goal simply from the keywords?
- Human subject study
- 50 most popular Google queries from UCLA CS
- 28 participants (grad students) from UCLA CS
- Ask subjects to indicate the likely goal of each query if they had issued it
- Do most subjects agree on a particular goal?
7Human subject study results
- i(q): the fraction of participants who judge query q as informational
- e.g., i(q) = 0.038 for UCLA Library
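As a minimal sketch of how i(q) could be computed from questionnaire responses: the label strings, the function name `informational_fraction`, and the sample votes are assumptions for illustration, not material from the study.

```python
def informational_fraction(judgments):
    """Return i(q): the share of judgments for one query marked informational.

    `judgments` holds one label per participant. The label strings
    "informational" and "navigational" are hypothetical.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    informational = sum(1 for label in judgments if label == "informational")
    return informational / len(judgments)


# Hypothetical responses for one query: 1 informational vote out of 4.
votes = ["navigational", "navigational", "informational", "navigational"]
print(informational_fraction(votes))
```

A value near 0 means the subjects largely agree that the query is navigational, and a value near 1 that it is informational.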
8Human subject study results
- Figure annotations: 43.5% software names, 30.4% person names
9Human subject study results
- After removing software and person-name queries
10Human subject study summary
- Majority of queries have predictable goals
- Interestingly, most ambiguous queries tend to be on a certain set of topics
- Topic-based ambiguity detection may be possible
- Treat ambiguous queries differently from others
11Outline
- Are query goals predictable?
- Human-subject study
- How can we predict user goals automatically?
- How effective are our features?
- Experimental evaluation
12How to predict user goal?
- UCLA Library vs. Simulated Annealing
- Navigational vs. informational
- Semantic analysis necessary?
- Our idea: use information provided implicitly by Web users
- Web-link structure
- User-click behavior
13Web-link structure
- Anchor-link distribution to quantify the link
structure
[Figure: Web pages www.ucla.edu/library.html, repositories.cdlib.org/uclalib/, and www.library.ucla.edu]
14Web-link structure
[Figure: Anchor-link distribution for query UCLA Library, covering www.ucla.edu/library.html, repositories.cdlib.org/uclalib/, and www.library.ucla.edu]
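One way to build such a distribution, sketched here under assumptions: collect the links whose anchor text matches the query, count the links per destination, and rank destinations by their share of links. The input layout (a list of `(anchor_text, destination_url)` pairs), the case-insensitive exact match, and the function name are hypothetical choices, and the link counts in the example are made up.

```python
from collections import Counter


def anchor_link_distribution(links, query):
    """Rank link destinations by the share of matching anchors pointing at them.

    `links` is an iterable of (anchor_text, destination_url) pairs.
    Matching anchors to the query by case-insensitive equality is an
    assumption of this sketch.
    """
    wanted = query.strip().lower()
    counts = Counter(
        dest for text, dest in links if text.strip().lower() == wanted
    )
    total = sum(counts.values())
    if total == 0:
        return []  # no anchor statistics for this query
    # most_common() orders destinations from most to least linked.
    return [(dest, n / total) for dest, n in counts.most_common()]


# Made-up link counts, for illustration only.
links = (
    [("UCLA Library", "www.library.ucla.edu")] * 6
    + [("UCLA Library", "www.ucla.edu/library.html")] * 1
    + [("ucla library", "repositories.cdlib.org/uclalib/")] * 1
    + [("UCLA", "www.ucla.edu")] * 5
)
for dest, share in anchor_link_distribution(links, "UCLA Library"):
    print(f"{share:.3f}  {dest}")
```

Returning an empty list for a query with no matching anchors keeps the missing-statistics case explicit instead of hiding it behind a division by zero.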
15Anchor-link distribution for sample queries
[Figure: anchor-link distributions for the sample queries UCLA Library (navigational) and Simulated Annealing (informational)]
16User-click behavior
- Click distribution to quantify past user-click
behavior
[Figure: Click distribution for the navigational query UCLA Library]
17User-click behavior (contd)
[Figure: click distributions for UCLA Library (navigational) and Simulated Annealing (informational)]
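A click distribution can be sketched in a similar spirit from a click log. The record layout `(query, clicked_url)`, the function name, and the log entries below are assumptions for illustration. The slides do not describe the actual log format.

```python
from collections import Counter, defaultdict


def click_distributions(click_log):
    """Map each query to its click shares, sorted from most to least clicked.

    `click_log` is an iterable of (query, clicked_url) records; this
    record layout is an assumption of the sketch.
    """
    per_query = defaultdict(Counter)
    for query, url in click_log:
        per_query[query][url] += 1
    distributions = {}
    for query, counts in per_query.items():
        total = sum(counts.values())
        distributions[query] = [n / total for _, n in counts.most_common()]
    return distributions


# Made-up click log, for illustration only.
log = (
    [("UCLA Library", "www.library.ucla.edu")] * 9
    + [("UCLA Library", "www.ucla.edu/library.html")] * 1
    + [("Simulated Annealing", f"example.org/page{i}") for i in range(4)]
)
dists = click_distributions(log)
print(dists["UCLA Library"])         # concentrated on one result by construction
print(dists["Simulated Annealing"])  # spread over four results by construction
```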
18Capturing the shape of distributions
- Possible numeric features for f(x)
- Mean $\mu$
- Median
- Skewness: $\int (x - \mu)^3 f(x)\,dx / \sigma^3$
- How asymmetric f(x) is
- Kurtosis: $\int (x - \mu)^4 f(x)\,dx / \sigma^4$
- How peaked f(x) is
- Single linear regression
- Median is the most effective measurement for both
anchor-link distribution and click distribution
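The features above can be sketched for a ranked, discrete distribution as follows. Two conventions here are assumptions, since the slides do not state them: the moments treat rank k as the point x = k, replacing the integrals with sums, and the median is interpolated by letting rank k occupy the interval [k - 1, k). Under that convention the median falls below 1.0 exactly when the top-ranked item holds more than half of the mass. The function names are hypothetical.

```python
def interpolated_median(probs):
    """Median of a ranked distribution, with rank k spread over [k - 1, k)."""
    cumulative = 0.0
    for k, p in enumerate(probs):  # k = 0 corresponds to rank 1
        if p > 0 and cumulative + p >= 0.5:
            return k + (0.5 - cumulative) / p
        cumulative += p
    raise ValueError("shares must sum to 1")


def shape_features(probs):
    """Mean, median, skewness and kurtosis of a ranked distribution.

    probs[k - 1] is the share of links (or clicks) at rank k, and the
    shares are expected to sum to 1.
    """
    ranks = range(1, len(probs) + 1)
    mean = sum(x * p for x, p in zip(ranks, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(ranks, probs))
    sigma = variance ** 0.5

    def standardized_moment(order):
        if sigma == 0:
            return float("nan")  # undefined when all mass sits on one rank
        total = sum((x - mean) ** order * p for x, p in zip(ranks, probs))
        return total / sigma ** order

    return {
        "mean": mean,
        "median": interpolated_median(probs),
        "skewness": standardized_moment(3),
        "kurtosis": standardized_moment(4),
    }


# Made-up shares: one concentrated and one flat distribution.
print(shape_features([0.8, 0.1, 0.1]))
print(shape_features([0.25, 0.25, 0.25, 0.25]))
```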
19Evaluation of features
- Based on 30 queries from the human subject study
- Except software and person-name queries
- Each query is associated with a distinct user goal
- Anchor-link distribution for each query
- Based on 60M pages crawled from the Web
- Click distribution for each query
- Based on Google-result click behavior from UCLA
CS during April 2004 - September 2004
20Goal-prediction graph (synthetic)
[Figure: synthetic goal-prediction graph for a hypothetically effective feature, with a threshold separating navigational from informational queries]
21Prediction graph: median of anchor-link dist.
- Navigational iff median < $\theta_1$ = 1.0
- Navigational queries: the vast majority of links point to the #1 anchor destination
- Prediction accuracy: 80.0%
[Figure: prediction graph with threshold $\theta_1 = 1.0$ separating navigational from informational queries]
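The threshold rule on this slide, and the way its accuracy could be measured against human-labeled goals, can be sketched as below. The function names and the sample `(median, goal)` pairs are hypothetical. Only the 1.0 threshold comes from the slide, and the numbers printed here say nothing about the reported results.

```python
def predict_goal(median, threshold=1.0):
    """Label a query navigational iff its median falls below the threshold."""
    return "navigational" if median < threshold else "informational"


def accuracy(examples, threshold=1.0):
    """Share of (median, true_goal) pairs that the threshold rule gets right."""
    correct = sum(
        1 for median, goal in examples if predict_goal(median, threshold) == goal
    )
    return correct / len(examples)


# Hypothetical (median, goal) pairs, not data from the study.
examples = [
    (0.6, "navigational"),
    (1.4, "navigational"),   # misclassified by the rule
    (2.5, "informational"),
    (3.1, "informational"),
]
print(accuracy(examples))
```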
22Prediction graph: combining the two features
- Linear combination with equal weights
- Navigational iff (median of click dist.) + (median of anchor-link dist.) < $\theta_1 + \theta_2$ (= 2.0)
- Prediction accuracy: 90%
[Figure: prediction graph with threshold $\theta_1 + \theta_2 = 2.0$ separating navigational from informational queries]
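A minimal sketch of the combined rule, assuming both medians have already been computed for the query. The function name and the sample values are hypothetical. The equal weights and the 2.0 threshold are taken from the slide.

```python
def predict_goal_combined(click_median, anchor_median, threshold=2.0):
    """Navigational iff the equal-weight sum of the two medians is below the threshold."""
    score = click_median + anchor_median
    return "navigational" if score < threshold else "informational"


# Hypothetical medians, for illustration only.
print(predict_goal_combined(0.6, 1.2))  # sum 1.8 is below 2.0
print(predict_goal_combined(1.5, 2.5))  # sum 4.0 is not below 2.0
```

In the first made-up case the anchor-link median alone (1.2) would not pass a 1.0 threshold, but the low click median pulls the sum under 2.0, which illustrates how one feature can compensate for the other.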
23Comparison with previous work
- Three features in Kang and Kim 03
- Anchor usage rate
- Query term distribution
- Term-dependence
24Summary
- Two effective features for goal identification
- Anchor-link distribution (Web-link structure) and click distribution (user-click behavior)
- Achieved an overall accuracy of 90% on a benchmark query set
- More details in the paper
25Future work
- Evaluate on a larger and less biased query set
- Handle queries with insufficient anchor/click statistics
- Learn patterns from queries whose goals are clear
- Predict search intentions on a finer granularity
- Informational queries can be further classified, e.g., directed, undirected, advice, list, etc. [Rose04]
- Analyze the contents of Web pages that users have clicked/viewed
- Linguistic methods
26Thank you
27Questionnaire design
- 1st version: direct classification by subjects
- Navigational vs. informational
- Some confusion
- Alan Kay home page other pages
- "Have a site in mind?" vs. "Plan to visit one site?"
- 2nd version
- Have a site in mind. Intend to visit only that site
- Have a site in mind. But willing to visit others
- Have no site in mind. Willing to visit anything relevant