Automatic Construction of Multifaceted Browsing Interfaces

1
Automatic Construction of Multifaceted Browsing
Interfaces

Wisam Dakka, Columbia University
Panagiotis G. Ipeirotis, NYU
Kenneth R. Wood, MSR Cambridge

2
Why Guided Navigation?

Typical search for a product name

Multifaceted hierarchies are superior to
single, monolithic hierarchies
Allow users to browse across multiple dimensions
Expose the contents of the underlying collection
and can help users more quickly locate items of
interest

3
Roadmap

Identify manually the dimensions/facets that can
be used to browse a collection
Type, country, price, grape, diet
A technique for extracting facets
Create manually the hierarchies for each
dimension
The countries hierarchy
An efficient construction algorithm
Ranking categories within hierarchies
How to show the best categories first?
Ranking schemes
Extensive experiments

4
Extracting Important Navigational Facets:
Motivation

Many collections with metadata organized across
different facets
Corbis royalty-free collection
A set of 36,820 annotated images
Each image has a title, a free-text description,
and a set of associated keywords
Total of 65,521 keywords, mainly assigned to 14
out of the 38 facets
And many others, like the wine collection we used
earlier
The task: for a given image with its metadata,
extract a set of proper facets or dimensions
Idea: classify keywords into the appropriate facets
Cat and dog under animal
Mountain and fields under topographic feature

How do we extract such facets?

5
Basic Idea for Extracting Important Navigational
Facets

Given a collection of objects and associated
metadata, where each object is assigned to a list
of facets (dimensions)
Train a classifier that, given an object and its
metadata, predicts a list of facets
Run the classifier on a new set of objects with
no assigned facets, to identify the frequently
used facets
Use the discovered facets for the guided
navigation

6
The Classifier: Straightforward Approach

Classifying keywords into appropriate facets
Cat or dog -> animal
Cannot generalize

7
The Classifier: Expansion Using WordNet

Capturing the meanings of other words using
hypernyms
Cat: feline, carnivore, mammal, animal, living
being, object, entity
Can generalize, but cannot disambiguate

[Figure: keywords expanded with WordNet hypernyms
and mapped to facets: Fields and Mountain to
Topographic, Dog to Animal; one keyword remains
ambiguous between Animal and Computer Device]

8
The Classifier: Capturing the Context

Keywords associated with the same object give
valuable clues
Can disambiguate

[Figure: the context keywords and their hypernyms
(feline, carnivore, mammal, animal, living being,
object, entity) resolve the choice between Animal
and Computer Device]

9
Building the Classifier: Text Classification
Problem

We can map our problem to a classical text
classification problem
Each example
Represents a keyword in an object
Has a list of assigned classes (facets) to the
keyword
Has three vector representations
The keyword itself
The expansion from WordNet using hypernyms
The context - other keywords assigned to same
object and their hypernyms
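
The three representations listed on this slide can be sketched as three bag-of-words vectors per keyword. This is a minimal illustration only: `TOY_HYPERNYMS` is a hypothetical hand-written stand-in for WordNet, and `build_example` is an illustrative name, not something from the talk.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: keyword -> chain of hypernyms.
TOY_HYPERNYMS = {
    "cat": ["feline", "carnivore", "mammal", "animal"],
    "dog": ["canine", "carnivore", "mammal", "animal"],
    "mountain": ["natural elevation", "geological formation", "object"],
}


def build_example(keyword, object_keywords):
    """Return the three bag-of-words views for one keyword of one object."""
    keyword_vec = Counter([keyword])
    hypernym_vec = Counter(TOY_HYPERNYMS.get(keyword, []))
    # Context: the other keywords of the same object plus their hypernyms.
    context_vec = Counter()
    for other in object_keywords:
        if other == keyword:
            continue
        context_vec[other] += 1
        context_vec.update(TOY_HYPERNYMS.get(other, []))
    return {
        "keyword": keyword_vec,
        "hypernyms": hypernym_vec,
        "context": context_vec,
    }


example = build_example("cat", ["cat", "dog", "mountain"])
```

Each view could then be fed to a text classifier as a separate feature block; how the talk's classifier actually combines them is not stated here.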

10
Efficient Hierarchy Construction

Once we have identified the facets, we need to
navigate within each facet
The subsumption algorithm (Sanderson and Croft,
SIGIR 1999)
Improved version of the subsumption algorithm
For the best values of the different parameters, the
algorithm runs 3 times faster than the original
subsumption algorithm
Good integration with relational databases
Extensive set of experiments
Details in paper
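
For orientation, the original subsumption test can be sketched as follows. It assumes the commonly quoted criterion from the cited subsumption paper (term x subsumes term y when P(x|y) >= 0.8 and P(y|x) < 1); the threshold, the function name, and the toy data are assumptions for illustration, and this is the quadratic baseline, not the improved version described in the talk.

```python
from itertools import permutations


def subsumption_pairs(docs_by_term, threshold=0.8):
    """Return (parent, child) pairs where parent subsumes child."""
    pairs = []
    for x, y in permutations(sorted(docs_by_term), 2):
        dx, dy = docs_by_term[x], docs_by_term[y]
        if not dx or not dy:
            continue
        both = len(dx & dy)
        p_x_given_y = both / len(dy)  # how often x appears where y appears
        p_y_given_x = both / len(dx)  # how often y appears where x appears
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.append((x, y))
    return pairs


# Toy collection: term -> set of object ids annotated with that term.
docs = {
    "animal": {1, 2, 3, 4, 5},
    "dog": {1, 2},
    "cat": {3, 4},
    "mountain": {5, 6},
}
pairs = subsumption_pairs(docs)
```

On this toy data, "animal" becomes the parent of "cat" and "dog", while "mountain" stays unattached because it co-occurs with "animal" in only half of its objects.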

11
Ranking Methods

Ranking categories is important and difficult
Important: limited cognitive ability to
understand presented info
Difficult: lack of explicit user goals while
browsing
Maximize Coverage: maximizes the number of
objects that are covered by the displayed top-k
categories
Frequency-based and set-cover schemes
Structure and user effort to find items
Structure: considers the structure of the
underlying hierarchy and the respective effort
that the user has to put in to locate items of
interest
Merit-based

12
Ranking Methods: Maximize Coverage

Frequency-based Ranking (Baseline)
Users see first categories with the greatest
wealth of information
Low ranked categories represent only a small
fraction of the collection
An easy scheme to implement
Is it optimal?

13
Ranking Methods: Maximize Coverage

Set-cover Ranking
Maximizing the cardinality of the top-k ranked
categories
A well-known NP-complete problem
The optimal solution is unnecessarily expensive,
and generates non-monotonic ranking
A greedy algorithm for approximating the
set-cover problem
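
The difference between the two coverage-oriented schemes can be seen on a small example. The sketch below is illustrative only: the function names and the toy categories are assumptions, and the greedy routine is the standard approximation for maximum coverage, not necessarily the exact variant used in the talk.

```python
def frequency_ranking(categories, k):
    """Rank categories by size (largest first) and keep the top k."""
    ranked = sorted(categories, key=lambda c: (-len(categories[c]), c))
    return ranked[:k]


def greedy_cover_ranking(categories, k):
    """Repeatedly pick the category that adds the most uncovered objects."""
    covered, ranked = set(), []
    remaining = dict(categories)
    while remaining and len(ranked) < k:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        ranked.append(best)
        covered |= remaining.pop(best)
    return ranked


def coverage(categories, ranked):
    """Number of distinct objects reachable through the ranked categories."""
    return len(set().union(*(categories[c] for c in ranked)))


# Toy categories: name -> set of object ids.
cats = {
    "red wine": {1, 2, 3, 4, 5},
    "french red": {1, 2, 3, 4},
    "white wine": {6, 7, 8},
}
freq = frequency_ranking(cats, 2)
greedy = greedy_cover_ranking(cats, 2)
```

Here the frequency-based top-2 picks two heavily overlapping categories and reaches 5 objects, while the greedy top-2 reaches all 8, which is the kind of gap the "Is it optimal?" question on the previous slide points at.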

14
Ranking Methods: Structure

Merit-based Ranking
Ranks higher categories that enable users to
access their contents with the smallest cost, on
average
We start by defining the cost function T(Ci): the
time to reach an object starting from node Ci in
the hierarchy
The time for reading the category headings
The time spent correcting mistakes
The time for browsing the correct sub-tree
Now let us define the Merit score based on T(Ci)

15
Merit-based Ranking

The metric is similar to the F1-measure
o(C): number of distinct objects classified
under C
Can be computed very efficiently in a bottom-up
fashion
Favors categories with low cost and large
number of objects
Using the merit of each category, we can rank
categories appropriately, putting first
categories that have good hierarchy structures
under them and provide access to a large number
of objects

16
Evaluation Settings

Datasets
Corbis royalty-free collection
XMLTV: television programs broadcast over 261
channels in NYC
DMOZ: real web pages from the Open Directory
Extracting important navigational facets
Facet classifier using SVM with linear kernels
and Ripper
Efficient hierarchy construction
See paper
Ranking categories in a hierarchy
Frequency-based, set-cover, merit-based

17
Extracting Important Navigational Facets: Results
using SVM and Ripper

Baseline
10% (F1), slightly above random classification
Adding hypernyms: 71% (F1)
Adding associated keywords
Ripper
Investigates whether rule-based assignments are
sufficient
Uses high-level WordNet hypernyms
55% (F1), significantly worse than SVM
Some classes (facets) work well with simple,
rule-based assignment of terms to facets
Generic Animals (93.3%)
Action Process Activity (35.9%)

[Figure: SVM with hypernyms and associated keywords]
F1: harmonic mean of precision and recall
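
As a reminder of how the F1 figures on this slide are defined, here is a minimal sketch of the standard formula. The function name is illustrative, and returning 0.0 when both inputs are zero is a convention chosen here, not something stated in the talk.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.5, 0.5)
skewed = f1_score(1.0, 0.0)
```

Because it is a harmonic mean, F1 stays low unless precision and recall are both high.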
18
Ranking: Quality of the Generated Hierarchies

How do the structural properties affect the
browsing experience?
Coverage: the fraction of reachable objects in
the hierarchy
Other properties
Average path length: shorter paths are preferable
Average branching factor: users can decide faster
which category is best with small branching
factors
Can we combine these metrics meaningfully?
Cost: the time to reach an object in the
hierarchy
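
The structural properties on this slide can be computed with a single traversal. The sketch below rests on assumptions of its own: path length is taken as the number of edges from the root to the category holding an object, branching factor is averaged over internal nodes only, and all names and the toy hierarchy are hypothetical.

```python
def hierarchy_stats(children, objects_at, root, collection_size):
    """Compute coverage, average path length, and average branching factor."""
    reachable, depths, fanouts = set(), [], []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        kids = children.get(node, [])
        if kids:
            fanouts.append(len(kids))  # only internal nodes count
        for obj in objects_at.get(node, ()):
            reachable.add(obj)
            depths.append(depth)
        stack.extend((kid, depth + 1) for kid in kids)
    return {
        "coverage": len(reachable) / collection_size,
        "avg_path_length": sum(depths) / len(depths) if depths else 0.0,
        "avg_branching": sum(fanouts) / len(fanouts) if fanouts else 0.0,
    }


# Toy hierarchy: object 5 belongs to the collection but is not reachable.
children = {"root": ["animal", "topographic"], "animal": ["cat", "dog"]}
objects_at = {"cat": [1, 2], "dog": [3], "topographic": [4]}
stats = hierarchy_stats(children, objects_at, "root", collection_size=5)
```

These three numbers pull in different directions (a flatter hierarchy shortens paths but raises the branching factor), which is why the slide asks for a single combined cost.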

19
Coverage of Ranking Methods

Set-cover consistently covers a larger fraction of
the collection
As expected, merit-based performs slightly worse
than set-cover

20
Cost of Ranking Methods

Merit-based consistently performs better than the
other approaches, decreasing by 10-50% the time
needed to locate items of interest

21
Ranking: Conclusions

Merit-based performs very well and offers fast
access to the contents of the collection.
Merit-based rankings are efficient to implement
on top of relational database systems, while the
set-cover rankings typically take longer to
compute

22
Summary

Automatically constructing multifaceted
hierarchies
A technique for extracting facets
An efficient construction algorithm
Ranking categories in hierarchy
Frequency-based, set-cover, merit-based schemes
Extensive experiments
Automatic construction of multifaceted interfaces
is feasible, and generates high-quality
hierarchies

23
Future Work

Exploring different ways of presenting the
hierarchies to expose the contents of the
collection in efficient ways
Integrating better browsing and searching in
multifaceted databases
Indexing structures to support concurrent
searching and browsing

24
Thank you for your time
