Title: Automatic Construction of Multifaceted Browsing Interfaces
1Automatic Construction of Multifaceted Browsing
Interfaces
- Wisam Dakka Columbia University
- Panagiotis G. Ipeirotis NYU
- Kenneth R. Wood MSR Cambridge
2Why Guided Navigation?
- Typical search for a product name
- Multifaceted hierarchies are superior to single, monolithic hierarchies
- Allow users to browse across multiple dimensions
- Expose the contents of the underlying collection and can help users locate items of interest more quickly
3Roadmap
- Identify manually the dimensions/facets that can be used to browse a collection
- Type, country, price, grape, diet
- A technique for extracting facets
- Create manually the hierarchies for each dimension
- The countries hierarchy
- An efficient construction algorithm
- Ranking categories within hierarchies
- How to show the best categories first?
- Ranking schemes
- Extensive experiments
4Extracting Important Navigational Facets: Motivation
- Many collections with metadata organized across different facets
- Corbis royalty-free collection
- A set of 36,820 annotated images
- Each image has a title, a free-text description, and a set of associated keywords
- Total of 65,521 keywords, mainly assigned to 14 out of the 38 facets
- And many others, like the wine collection we used earlier
- The task: for a given image with its metadata, extract a set of proper facets or dimensions
- Idea: classify keywords in the appropriate facets
- Cat and dog under animal
- Mountain and fields under topographic feature
How do we extract such facets?
5Basic Idea for Extracting Important Navigational
Facets
- Given a collection of objects and associated metadata, where each object is assigned to a list of facets (dimensions)
- Train a classifier that, given an object and its metadata, predicts a list of facets
- Run the classifier on a new set of objects with no assigned facets, to identify the frequently used facets
- Use the discovered facets for the guided navigation
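The train, run, and count steps above can be sketched as follows. This is a minimal illustration, not the classifier used in the talk: the "model" is a plain keyword-to-facet lookup, and the function names, the `min_share` threshold, and the toy data are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train(labeled_objects):
    """Learn keyword -> facet votes from objects whose keywords carry facet labels."""
    votes = defaultdict(Counter)
    for keywords in labeled_objects:
        for keyword, facet in keywords.items():
            votes[keyword][facet] += 1
    # Keep the most frequently seen facet for each keyword.
    return {kw: counts.most_common(1)[0][0] for kw, counts in votes.items()}

def predict(model, keywords):
    """Return the set of facets predicted for one unlabeled object."""
    return {model[kw] for kw in keywords if kw in model}

def frequent_facets(model, unlabeled_objects, min_share=0.5):
    """Facets predicted for at least `min_share` of the unlabeled objects."""
    counts = Counter()
    for keywords in unlabeled_objects:
        counts.update(predict(model, keywords))
    n = len(unlabeled_objects)
    return sorted(f for f, c in counts.items() if c / n >= min_share)

# Hypothetical toy data: each labeled object maps its keywords to facets.
labeled = [
    {"cat": "animal", "mountain": "topographic feature"},
    {"dog": "animal", "fields": "topographic feature"},
    {"cat": "animal", "beach": "topographic feature"},
]
unlabeled = [["dog", "beach"], ["cat"], ["mountain", "sunset"], ["tiger"]]

model = train(labeled)
facets = frequent_facets(model, unlabeled)
```

Note that the unseen keyword "tiger" receives no facet at all, since a pure lookup has nothing to fall back on for words it has not seen.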
6The Classifier: Straightforward Approach
- Classifying keywords in appropriate facets
- Cat or dog -> animal
- Cannot generalize
7The Classifier: Expansion Using WordNet
- Capturing the meanings of other words using hypernyms
- Cat: feline, carnivore, mammal, animal, living being, object, entity
- Can generalize, but cannot disambiguate
Figure: keywords are expanded with their WordNet hypernyms and mapped to facets (Fields and Mountain to Topographic, Dog to Animal); one keyword remains ambiguous between Animal and Computer Device.
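A minimal sketch of hypernym expansion, assuming a tiny hand-written table in place of WordNet. The `HYPERNYM` table and both function names are hypothetical; the "cat" chain mirrors the one on the slide, and the "dog" entries are made up for illustration.

```python
# Toy stand-in for WordNet: each term maps to its direct hypernym (hypothetical data).
HYPERNYM = {
    "cat": "feline", "feline": "carnivore", "carnivore": "mammal",
    "mammal": "animal", "animal": "living being", "living being": "object",
    "object": "entity",
    "dog": "canine", "canine": "carnivore",
}

def hypernym_chain(word):
    """Walk up the toy hierarchy and collect every hypernym of `word`."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def facet_for(word, known_facets):
    """Return the first known facet among the word and its hypernyms, else None."""
    for term in [word] + hypernym_chain(word):
        if term in known_facets:
            return term
    return None

cat_chain = hypernym_chain("cat")
dog_facet = facet_for("dog", {"animal", "topographic feature"})
```

Here "dog" reaches the animal facet even if it never appeared in training, which is the generalization the slide refers to. In the real WordNet a word has one hypernym chain per sense, which is why expansion alone cannot decide between senses.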
8The Classifier: Capturing the Context
- Keywords associated with the same object give valuable clues
- Can disambiguate
Figure: choosing between the Animal and Computer Device facets using the hypernyms of an associated keyword (feline, carnivore, mammal, animal, living being, object, entity).
9Building the Classifier: Text Classification Problem
- We can map our problem to a classical text classification problem
- Each example
- Represents a keyword in an object
- Has a list of classes (facets) assigned to the keyword
- Has three vector representations
- The keyword itself
- The expansion from WordNet using hypernyms
- The context: other keywords assigned to the same object and their hypernyms
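The three representations of one example could be assembled roughly as below. This is a sketch under assumptions: the hypernym lists are a hand-written stand-in for WordNet, the keyword "mouse" is an illustrative choice that does not come from the slides, and the term sets would still need to be turned into numeric vectors before being passed to a classifier.

```python
def example_features(keyword, object_keywords, hypernyms):
    """Build the three term sets for one (keyword, object) training example."""
    own = {keyword}
    expansion = set(hypernyms.get(keyword, []))
    context = set()
    for other in object_keywords:
        if other != keyword:
            # Context = the other keywords of the same object plus their hypernyms.
            context.add(other)
            context.update(hypernyms.get(other, []))
    return {"keyword": own, "hypernyms": expansion, "context": context}

# Hypothetical hypernym lists standing in for WordNet lookups.
toy_hypernyms = {
    "mouse": ["rodent", "mammal", "animal"],
    "cat": ["feline", "carnivore", "mammal", "animal"],
    "cheese": ["dairy product", "food"],
}

features = example_features("mouse", ["mouse", "cheese", "cat"], toy_hypernyms)
```

Keeping the three sets separate, rather than merging them into one bag of words, lets a classifier weight the keyword, its expansion, and its context differently.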
10Efficient Hierarchy Construction
- Once we have identified the facets, we need to navigate within each facet
- The subsumption algorithm (Sanderson and Croft, SIGIR 1999)
- Improved version of the subsumption algorithm
- For the best values of the different parameters, the algorithm runs 3 times faster than the original subsumption algorithm
- Good integration with relational databases
- Extensive set of experiments
- Details in paper
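For orientation, the original subsumption test can be sketched as follows: term x subsumes term y when x appears in most of the objects that contain y, but y does not appear in every object that contains x. This is the baseline test only, not the improved version presented in the talk; the function name, the 0.8 threshold, and the toy data are assumptions for the example.

```python
from itertools import permutations

def subsumption_pairs(docs, threshold=0.8):
    """Return sorted (parent, child) pairs where parent subsumes child."""
    # Map each term to the set of object ids that contain it.
    postings = {}
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            postings.setdefault(term, set()).add(doc_id)

    pairs = []
    for x, y in permutations(postings, 2):
        both = len(postings[x] & postings[y])
        p_x_given_y = both / len(postings[y])  # how often x accompanies y
        p_y_given_x = both / len(postings[x])  # how often y accompanies x
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.append((x, y))
    return sorted(pairs)

# Hypothetical keyword sets for five objects.
docs = [
    {"europe", "france"},
    {"europe", "france"},
    {"europe", "italy"},
    {"europe"},
    {"asia", "japan"},
]
pairs = subsumption_pairs(docs)
```

The pairwise loop is quadratic in the number of terms, which gives a sense of why a faster construction algorithm is worth having.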
11Ranking Methods
- Ranking categories is important and difficult
- Important: limited cognitive ability to understand presented info
- Difficult: lack of explicit user goals while browsing
- Maximize Coverage: maximizes the number of objects that are covered by the displayed, top-k categories
- Frequency-based and set-cover schemes
- Structure and user effort to find items
- Structure: considers the structure of the underlying hierarchy and the respective effort that the user has to put in to locate items of interest
- Merit-based
12Ranking Methods: Maximize Coverage
- Frequency-based Ranking (Baseline)
- Users first see the categories with the greatest wealth of information
- Low-ranked categories represent only a small fraction of the collection
- An easy scheme to implement
- Is it optimal?
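A small sketch of frequency-based ranking, with a coverage check that hints at the answer to the question above. The function names and the toy wine categories are assumptions for illustration.

```python
def rank_by_frequency(categories, k):
    """Top-k category names by number of objects (ties broken alphabetically)."""
    ranked = sorted(categories, key=lambda name: (-len(categories[name]), name))
    return ranked[:k]

def coverage(categories, chosen):
    """Fraction of all objects that fall under at least one chosen category."""
    covered = set().union(*(categories[name] for name in chosen))
    total = set().union(*categories.values())
    return len(covered) / len(total)

# Hypothetical categories mapped to the ids of the objects they contain.
categories = {
    "red": {1, 2, 3, 4, 5},
    "france": {1, 2, 3, 4},
    "white": {6, 7, 8},
    "italy": {5, 9},
}

top2 = rank_by_frequency(categories, 2)
top2_coverage = coverage(categories, top2)
```

In this toy data the two largest categories overlap almost entirely, so the top-2 list covers only 5 of the 9 objects.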
13Ranking Methods: Maximize Coverage
- Set-cover Ranking
- Maximizing the number of distinct objects covered by the top-k ranked categories
- A well-known NP-complete problem
- The optimal solution is unnecessarily expensive, and generates non-monotonic rankings
- A greedy algorithm for approximating the set-cover problem
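The greedy approximation can be sketched as below: at each step, pick the category that adds the most objects not yet covered. The function name and the toy data are assumptions; this is the textbook greedy heuristic, not necessarily the exact variant used in the talk.

```python
def rank_by_greedy_cover(categories, k):
    """Greedy ranking: repeatedly pick the category with the largest new coverage."""
    covered, ranking = set(), []
    remaining = dict(categories)
    while remaining and len(ranking) < k:
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        ranking.append(best)
        covered |= remaining.pop(best)
    return ranking, covered

# Hypothetical categories mapped to the ids of the objects they contain.
categories = {
    "red": {1, 2, 3, 4, 5},
    "france": {1, 2, 3, 4},
    "white": {6, 7, 8},
    "italy": {5, 9},
}

ranking, covered = rank_by_greedy_cover(categories, 2)
```

Because each pick extends the previous list, the top-k ranking is always a prefix of the top-(k+1) ranking, so the ordering stays monotonic as k grows. For the top-k coverage objective, this greedy choice is known to reach at least a (1 - 1/e) fraction of the optimal coverage.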
14Ranking Methods: Structure
- Merit-based Ranking
- Ranks higher the categories that enable users to access their contents with the smallest cost, on average
- We start by defining the cost function T(Ci): the time to reach an object starting from node Ci in the hierarchy
- The time for reading the category headings
- The time spent on correcting mistakes
- The time for browsing the correct sub-tree
- Now let us define the Merit score based on T(Ci)
15Merit-based Ranking
- The metric is similar to the F1-measure
- o(C): number of distinct objects classified under C
- Can be computed very efficiently in a bottom-up fashion
- Favors categories with low cost and large number of objects
- Using the merit of each category, we can rank categories appropriately, putting first categories that have good hierarchy structures under them and provide access to a large number of objects
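The slides do not reproduce the cost and merit formulas, so the sketch below only illustrates the shape of the computation under loudly stated assumptions: the cost model (read every child heading, then browse an average child subtree, with no mistake-correction term) and the merit score (harmonic mean of the covered fraction and 1/T(C)) are hypothetical stand-ins, not the definitions from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Category:
    name: str
    objects: set = field(default_factory=set)    # objects classified directly here
    children: list = field(default_factory=list)

def distinct_objects(cat):
    """o(C): distinct objects under C, gathered bottom-up from the children."""
    covered = set(cat.objects)
    for child in cat.children:
        covered |= distinct_objects(child)
    return covered

def cost(cat, read_time=1.0):
    """Assumed T(C): read each child heading, then browse an average child."""
    if not cat.children:
        return read_time
    avg_child = sum(cost(c, read_time) for c in cat.children) / len(cat.children)
    return read_time * len(cat.children) + avg_child

def merit(cat, collection_size):
    """Assumed F1-style score: harmonic mean of coverage and inverse cost."""
    coverage = len(distinct_objects(cat)) / collection_size
    cheapness = 1.0 / cost(cat)
    return 2 * coverage * cheapness / (coverage + cheapness)

# Hypothetical categories in a collection of 10 objects.
flat = Category("red", objects={1, 2, 3, 4, 5, 6, 7, 8})
deep = Category("region", children=[
    Category("france", {1, 2, 3}),
    Category("italy", {4, 5, 6}),
    Category("spain", {7, 8, 9}),
])

flat_merit = merit(flat, 10)
deep_merit = merit(deep, 10)
```

Under these assumptions the flat category scores higher even though it covers one object fewer, because its contents are reachable at a much lower cost.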
16Evaluation: Settings
- Datasets
- Corbis: royalty-free collection
- XMLTV: television programs broadcast over 261 channels in NYC
- DMOZ: real web pages from the Open Directory
- Extracting important navigational facets
- Facet classifier using SVM with linear kernels and Ripper
- Efficient hierarchy construction
- See paper
- Ranking categories in a hierarchy
- Frequency-based, set-cover, merit-based
17Extracting Important Navigational Facets: Results using SVM and Ripper
- Baseline
- 10% (F1), slightly above random classification
- Adding hypernyms: 71% (F1)
- Adding associated keywords
- Ripper
- Investigate whether rule-based assignments are sufficient
- High-level WordNet hypernyms
- 55% (F1), significantly worse than SVM
- Some classes (facets) work well with simple, rule-based assignment of terms to facets
- Generic Animals (93.3%)
- Action Process Activity (35.9%)
SVM with hypernyms and associated keywords
F1: harmonic mean of precision and recall
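For reference, the F1 measure can be computed from raw counts as in this short sketch. The function name and the example counts are made up for illustration and are not results from the talk.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; 0.0 when nothing is correct."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: precision 0.8, recall 0.5.
score = f1_score(true_positives=8, false_positives=2, false_negatives=8)
```

Because the harmonic mean is pulled toward the smaller value, a classifier cannot reach a high F1 by trading away recall for precision or the reverse.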
18Ranking: Quality of the Generated Hierarchies
- How do the structural properties affect the browsing experience?
- Coverage: the fraction of reachable objects in the hierarchy
- Other properties
- Average path length: shorter paths are preferable
- Average branching factor: users can decide faster which category is best with small branching factors
- Can we combine these metrics meaningfully?
- Cost: the time to reach an object in the hierarchy
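The structural properties listed above can be measured on a small hierarchy as in the following sketch. The exact definitions are assumptions made for the example: path length is taken as the number of edges from the root to each leaf, and the branching factor is averaged over nodes that have children. All names and the toy data are hypothetical.

```python
def hierarchy_stats(children, objects, root, collection_size):
    """Coverage, average root-to-leaf path length, and average branching factor."""
    reachable, leaf_depths, fanouts = set(), [], []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        reachable |= objects.get(node, set())
        kids = children.get(node, [])
        if kids:
            fanouts.append(len(kids))
            stack.extend((kid, depth + 1) for kid in kids)
        else:
            leaf_depths.append(depth)
    return {
        "coverage": len(reachable) / collection_size,
        "avg_path_length": sum(leaf_depths) / len(leaf_depths),
        "avg_branching": sum(fanouts) / len(fanouts) if fanouts else 0.0,
    }

# Hypothetical hierarchy over a collection of 8 objects.
children = {"wine": ["red", "white"], "red": ["merlot", "syrah"]}
objects = {"merlot": {1, 2}, "syrah": {3}, "white": {4, 5}}

stats = hierarchy_stats(children, objects, "wine", collection_size=8)
```

The three numbers pull in different directions (a flatter tree shortens paths but raises the branching factor), which is the motivation for folding them into a single cost.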
19Coverage of Ranking Methods
- Set-cover consistently covers a larger fraction of the collection
- As expected, merit-based performs slightly worse than set-cover
20Cost of Ranking Methods
- Merit-based consistently performs better than the other approaches, decreasing by 10-50% the time needed to locate items of interest
21Ranking: Conclusions
- Merit-based performs very well and offers fast access to the contents of the collection
- Merit-based rankings are efficient to implement on top of relational database systems, while the set-cover rankings typically take longer to compute
22Summary
- Automatically constructing multifaceted hierarchies
- A technique for extracting facets
- An efficient construction algorithm
- Ranking categories in hierarchy
- Frequency-based, set-cover, merit-based schemes
- Extensive experiments
- Automatic construction of multifaceted interfaces is feasible, and generates high-quality hierarchies
23Future Work
- Exploring different ways of presenting the hierarchies to expose the contents of the collection in efficient ways
- Integrating better browsing and searching in multifaceted databases
- Indexing structures to support concurrent searching and browsing
24Thank you for your time