Title: Query Result Clustering for Object-level Search
1. Query Result Clustering for Object-level Search
Seung-won Hwang (POSTECH)
Joint work with Jongwuk Lee (POSTECH), Zaiqing Nie, and Ji-rong Wen (MSRA)
2. Outline
- Motivation
- Observation
- Preliminaries
- Algorithm
- Experiments
3. Motivation (1 / 4)
- Given a query, search engines retrieve relevant results.
- Personalization: maximize the satisfaction of a particular user.
- Diversification: minimize the dissatisfaction of varying user intents.
(Figure: the same "Canon 5D" results satisfy one user ("Good!!") but may not satisfy another ("Good??").)
4. Motivation (2 / 4)
- Query result organization
- Provide end-users with a succinct overview of relevant results.
- e.g., a topic hierarchy or topic terms on a map
(Figure: with organized results, the "Canon 5D" results satisfy both users ("Good!!").)
5. Motivation (3 / 4)
- Document-level search
- Documents as an information unit
- e.g., Microsoft Live Search
- Object-level search
- Web objects as an information unit
- e.g., Microsoft Libra, Product Search
- More concise results for object queries
6. Motivation (4 / 4)
- Query result clustering for object-level search
- Visualize a graph for object-level summarization.
- Center: the query object; Nodes: relevant objects; Edges: relationships between objects
- Users can easily recognize relevant objects, then drill down into their interests.
7. Motivation (4 / 4) - Demo
8. Why Object-level Search Is Challenging
- Documents
- A vector of term frequencies (homogeneous)
- Well-agreed similarity measure
- Objects
- A vector of attribute values (heterogeneous)
- No single agreed-upon similarity measure
(Figure: documents are compared via TF-IDF vectors, while objects must be compared over features of differing importance, e.g., sensor size, optical zoom, resolution, weight.)
9. Observation (1 / 2)
- Feature-based similarity
- Depends on data-specific and intent-specific characteristics.
- Needs a measure that identifies both a relevant feature set and the corresponding distance.
- e.g., DSLR cameras: sensor size, optical zoom; compact cameras: resolution, weight
10. Observation (2 / 2)
- Exploit the intuition of subspace clustering to identify relevant objects in different subspaces.
(Figure: one cluster forms in the AB subspace, another in the BC subspace.)
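To make the subspace intuition concrete, here is a minimal sketch (with hypothetical normalized feature values, not data from the paper) of two objects that look dissimilar in the full feature space but identical in the AB subspace:

```python
import numpy as np

# Hypothetical objects with features (A, B, C), normalized to [0, 1].
x = np.array([0.2, 0.3, 0.9])
y = np.array([0.2, 0.3, 0.1])

full_dist = np.linalg.norm(x - y)        # Euclidean distance over all features
ab_dist = np.linalg.norm(x[:2] - y[:2])  # distance restricted to the AB subspace

print(full_dist, ab_dist)  # the pair is far apart overall but identical in AB
```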
11. Preliminaries (1 / 4)
- Challenging issues in subspace clustering
- Expensive to enumerate all 2^d - 1 possible subspaces
- Hard to select a desirable subspace and distance among all subspaces
(Figure: features F1 = sensor size, F2 = optical zoom, F3 = weight; the desirable subspace is unknown.)
12. Preliminaries (2 / 4)
- Possible solution
- Introduce parameters to save the cost of enumerating all subspaces.
- Rmin: the minimum distance on a feature
- dmin: the minimum number of subspaces
- e.g., Rmin = 0.5, dmin = 2
(Figure: feature-based similarity matrix; darker cells indicate closer pairs.)
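The (Rmin, dmin) parameterization can be sketched as follows; the function name and the toy camera vectors are illustrative assumptions, not from the paper:

```python
import numpy as np

def feature_similar(x, y, r_min=0.5, d_min=2):
    # Two objects count as feature-similar when they lie within r_min
    # on at least d_min individual features, so the clustering never
    # has to enumerate all 2^d - 1 subspaces explicitly.
    close = np.abs(np.asarray(x) - np.asarray(y)) <= r_min
    return int(close.sum()) >= d_min

# Toy cameras described by (sensor size, optical zoom, weight) in [0, 1].
a = [0.9, 0.8, 0.3]
b = [0.7, 0.9, 0.9]
print(feature_similar(a, b))           # close on 2 of 3 features -> True
print(feature_similar(a, b, d_min=3))  # -> False
```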
14. Preliminaries (3 / 4)
- Problem
- Clustering results heavily depend on the parameters Rmin and dmin.
- It is hard to find desirable parameter settings.
- Our solution
- Exploit co-occurrence as votes reflecting the wisdom of crowds.
15. Preliminaries (4 / 4)
- Co-occurrence similarity
- Pros: serves as ground truth reflecting creators' intuition.
- Cons: includes inconsistent meanings for different characteristics.
- Needs a complementary measure to disambiguate different characteristics.
- e.g., (1, 2): similar DSLRs; ((1, 2), 6): DSLRs and high-end compact cameras
(Figure: co-occurrence matrix)
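A minimal sketch of how such a co-occurrence signal could be counted, assuming each input list holds objects that appear together (e.g., on the same page); the names and ids are illustrative:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(lists):
    # Count, for every unordered pair of objects, how many lists
    # contain both; this is the "votes" signal from content creators.
    counts = Counter()
    for items in lists:
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts

pages = [[1, 2, 6], [1, 2], [2, 6]]
counts = cooccurrence_counts(pages)
print(counts[(1, 2)], counts[(1, 6)])  # 1 and 2 co-occur twice; 1 and 6 once
```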
17. Algorithm (1 / 4)
- Co-occurrence similarity
- Provides ground truth for the relationships between objects.
- Does not distinguish different relationships.
- Feature-based similarity
- Disambiguates inconsistent information for different relationships.
- Cluster quality heavily depends on parameters.
18. Algorithm (2 / 4)
- Associate co-occurrence with feature-based similarity
- Use co-occurrence similarity to determine the order of merging clusters.
- Provides a property less sensitive to specific parameters.
- Use feature-based similarity to disambiguate relationships with different characteristics.
- Only merge objects with consistent relationships.
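A compact union-find sketch of this merging idea (co-occurrence fixes the merge order, feature consistency gates each merge); the predicate and object ids are toy assumptions, not the paper's exact procedure:

```python
def cluster(pairs, feature_ok):
    # pairs: object pairs sorted by co-occurrence, strongest first.
    # feature_ok: predicate that accepts only feature-consistent pairs.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in pairs:
        if find(a) != find(b) and feature_ok(a, b):
            parent[find(a)] = find(b)  # merge only consistent relationships
    return {x: find(x) for x in parent}

# Toy setup: objects 1 and 2 are DSLRs; object 6 is a compact camera.
labels = cluster([(1, 2), (2, 6), (1, 6)], lambda a, b: (a < 3) == (b < 3))
print(labels[1] == labels[2], labels[1] == labels[6])  # True False
```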
20. Algorithm (3 / 4)
(Figure: the co-occurrence matrix and the feature-based similarity matrix (Rmin = 0.5, dmin = 2) are combined into clustering results.)
21. Algorithm (4 / 4)
- Parameter setting
- Abstracted as a multivariate interpolation problem over (Rmin, dmin).
- A linear loosening is the baseline, but it incurs unnecessary computation.
- Improved loosening tuning
- Conservative loosening
- Use single-linkage clustering.
- Estimate dmin.
- Aggressive loosening
- Use the distribution of pairwise feature-based distances.
- Estimate Rmin and dmin as median values.
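One way to read "aggressive loosening" as code, a sketch under the assumption that medians over the pairwise per-feature distance distribution are used (the paper's exact estimator may differ):

```python
import numpy as np

def estimate_parameters(X):
    # X: (n_objects, n_features) array of normalized feature values.
    n = len(X)
    diffs = np.abs(X[:, None, :] - X[None, :, :])  # per-feature distances
    pair_diffs = diffs[np.triu_indices(n, k=1)]    # one row per object pair
    r_min = float(np.median(pair_diffs))           # median per-feature distance
    # median number of features on which a pair falls within r_min
    d_min = int(np.median((pair_diffs <= r_min).sum(axis=1)))
    return r_min, d_min

X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
r_min, d_min = estimate_parameters(X)
print(r_min, d_min)
```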
22. Experiments (1 / 6)
- Real-life user study
- Conducted with 32 participants (MSRA interns and POSTECH students)
- Cameras: Canon PowerShot SD850, Nikon D80, Nikon D2Xs
- Laptops: Lenovo ThinkPad R61, T60, T61
- HAC: uses only co-occurrence. HARP: uses only feature-based similarity.
23. Experiments (2 / 6)
- Synthetic datasets
- Parameter settings
- Quality metrics
- CE (Clustering Error): the ideal value is 0 (minimized).
- F1-value, FF1-value: the ideal value is 1 (maximized).
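As an illustration of pairwise clustering metrics, here is a sketch of a pairwise F1 score, under the assumption that a pair counts as positive when a labeling puts both objects in one cluster (the paper's exact CE and FF1 definitions may differ):

```python
from itertools import combinations

def pairwise_f1(truth, predicted):
    # A pair of objects is "positive" when a labeling places both
    # objects in the same cluster; F1 compares the two pair sets.
    def same_pairs(labels):
        return {(i, j) for (i, a), (j, b) in combinations(enumerate(labels), 2)
                if a == b}
    t, p = same_pairs(truth), same_pairs(predicted)
    hits = len(t & p)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(p), hits / len(t)
    return 2 * precision * recall / (precision + recall)

print(pairwise_f1([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 for a perfect clustering
```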
24. Experiments (3 / 6)
- Varying average feature size
- Hydra-Adaptive: uses aggressive loosening with Hydra.
(Figure: results approach the ideal values.)
25. Experiments (4 / 6)
26. Experiments (5 / 6)
- Note: when sp = 0, co-occurrence exists for all possible pairs.
27. Experiments (6 / 6)
- Efficiency over cardinality and dimensionality
28. Q & A
Thank you!!