Evaluating Hierarchical Clustering of Search Results

1
Evaluating Hierarchical Clustering of Search Results
SPIRE 2005, Buenos Aires

Departamento de Lenguajes y
Sistemas Informáticos
UNED, Spain
Juan Cigarrán
Anselmo Peñas
Julio Gonzalo
Felisa Verdejo
nlp.uned.es

2
Overview

Scenario
Assumptions
Features of a Good Hierarchical Clustering
Evaluation Measures
Minimal Browsing Area (MBA)
Distillation Factor (DF)
Hierarchy Quality (HQ)
Conclusion

3
Scenario

Complex information needs
Compile information from different sources
Inspect the whole list of documents
More than 100 documents
Help the user to
Find the relevant topics
Discriminate relevant from irrelevant documents
Approach
Hierarchical clustering based on Formal Concept Analysis (FCA)

6
Problem

How to define and measure the quality of a
hierarchical clustering?
How to compare different clustering approaches?

7
Previous assumptions

Each cluster contains only those documents fully
described by its descriptors
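In Formal Concept Analysis terms, this assumption says that a cluster's documents are exactly those carrying all of its descriptors. The sketch below shows that derivation plus a naive enumeration of the resulting (documents, descriptors) concepts. It assumes a small hypothetical document-descriptor table, and the names `CONTEXT`, `extent`, `intent` and `formal_concepts` are illustrative, not the authors' implementation. Real FCA tools use far more efficient algorithms such as NextClosure.

```python
from itertools import combinations

# Hypothetical document-descriptor context (toy data, not from the slides).
CONTEXT = {
    "d1": {"physics"},
    "d2": {"physics", "nuclear"},
    "d3": {"physics", "nuclear"},
    "d4": {"physics", "astro"},
}


def extent(descriptors, context=CONTEXT):
    """Documents that carry ALL the given descriptors (the cluster's content)."""
    return frozenset(d for d, terms in context.items() if descriptors <= terms)


def intent(docs, context=CONTEXT):
    """Descriptors shared by every document in `docs` (all descriptors if empty)."""
    shared = set().union(*context.values())
    for d in docs:
        shared &= context[d]
    return frozenset(shared)


def formal_concepts(context=CONTEXT):
    """Naively enumerate (extent, intent) pairs by closing every descriptor subset."""
    terms = sorted(set().union(*context.values()))
    concepts = set()
    for size in range(len(terms) + 1):
        for combo in combinations(terms, size):
            docs = extent(frozenset(combo), context)
            concepts.add((docs, intent(docs, context)))
    return concepts


for docs, terms in sorted(formal_concepts(), key=lambda c: -len(c[0])):
    print(sorted(docs), sorted(terms))
```

On this toy context the enumeration yields four concepts, for example ({d2, d3}, {nuclear, physics}). Each one is a candidate cluster whose content is fully determined by its descriptors.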

8
Previous assumptions

Open world perspective

9
Good Hierarchical Clustering

The content of the clusters
Clusters should not mix relevant with non-relevant information

10
Good Hierarchical Clustering

The hierarchical arrangement of the clusters
Relevant information should lie along the same path

11
Good Hierarchical Clustering

The number of clusters
The number of clusters should be substantially lower than the number of documents
How clusters are described
Cognitive load of reading a cluster description
Ability to predict the relevance of the
information that it contains (not addressed here)

12
Evaluation Measures

Criterion
Minimize the browsing effort for finding ALL
relevant information
Baseline
The original document list returned by a search
engine

13
Evaluation Measures

Consider
Content of clusters
Hierarchical arrangement of clusters
Size of the hierarchy
Cognitive load of reading a document (in the baseline): Kd
Cognitive load of reading a node descriptor (in the hierarchy): Kn
Requirement
Relevance assessments are available

14
Minimal Browsing Area (MBA)

The minimal set of nodes the user has to traverse to find ALL the relevant documents while minimising the number of irrelevant ones
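The definition can be made concrete with a brute-force search on a toy hierarchy. This is a minimal sketch under several assumptions not stated on the slide: the hierarchy is a tree (an FCA lattice is more general), each node displays a hypothetical set of documents, browsing starts at the root and a node can only be opened after its parent, and ties in irrelevant documents are broken by the number of nodes. `PARENT`, `DOCS_AT` and `minimal_browsing_area` are illustrative names.

```python
from itertools import combinations

# Hypothetical toy hierarchy (a tree): node -> parent, None for the root.
PARENT = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}

# Hypothetical documents displayed directly at each node.
DOCS_AT = {
    "root": {"d1"},
    "A": {"d2"},
    "B": {"d3", "d4"},
    "A1": {"d5", "d6"},
    "A2": {"d7"},
}


def is_browsable(nodes):
    """True if the set contains the root and the parent of every other node in it."""
    return "root" in nodes and all(
        PARENT[n] in nodes for n in nodes if PARENT[n] is not None
    )


def minimal_browsing_area(relevant):
    """Brute-force search over node subsets (only feasible for toy hierarchies).

    Among browsable node sets whose documents cover every relevant document,
    return one that shows the fewest irrelevant documents, breaking ties by
    the number of nodes.
    """
    nodes = list(PARENT)
    best, best_key = None, None
    for size in range(1, len(nodes) + 1):
        for combo in combinations(nodes, size):
            chosen = set(combo)
            if not is_browsable(chosen):
                continue
            shown = set().union(*(DOCS_AT[n] for n in chosen))
            if not relevant <= shown:
                continue
            key = (len(shown - relevant), len(chosen))
            if best_key is None or key < best_key:
                best, best_key = chosen, key
    return best


mba = minimal_browsing_area({"d2", "d5", "d6"})
print(sorted(mba))  # ['A', 'A1', 'root']
```

Here the MBA shows four documents (d1, d2, d5, d6), three of them relevant, so the single irrelevant document d1 is the unavoidable price of passing through the root.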

15
Distillation Factor (DF)

Ability to isolate relevant information compared with the original document list (gain factor: DF > 1)

Equivalent to $DF(L) = \frac{\mathrm{Precision}_{MBA}}{\mathrm{Precision}_{L}}$

Considers only the cognitive load of reading
documents
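Because the MBA contains every relevant document, the precision ratio reduces to the number of listed documents over the number of documents in the MBA. Charging the same cost Kd per document, Kd·|L| / (Kd·|D_MBA|), gives the same value, since Kd cancels. The sketch below assumes that reading and uses the numbers from the worked example on the next slide (4 relevant documents, 7 in the list, 5 in the MBA). The function name is hypothetical.

```python
from fractions import Fraction


def distillation_factor(n_list_docs, n_mba_docs, n_relevant):
    """DF(L) = Precision(MBA) / Precision(original list).

    The MBA contains every relevant document, so the ratio reduces to
    n_list_docs / n_mba_docs.
    """
    if not 0 < n_relevant <= n_mba_docs <= n_list_docs:
        raise ValueError("expected 0 < relevant <= MBA docs <= list docs")
    precision_list = Fraction(n_relevant, n_list_docs)
    precision_mba = Fraction(n_relevant, n_mba_docs)
    return precision_mba / precision_list


df = distillation_factor(n_list_docs=7, n_mba_docs=5, n_relevant=4)
print(df, float(df))  # 7/5 1.4
```

Exact fractions keep the result comparable with the slide's 7/5 rather than introducing rounding.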

16
Distillation Factor (DF)

Example

Precision (original list) = 4/7
Precision (MBA) = 4/5
DF(L) = (4/5) / (4/7) = 7/5 = 1.4

17
Distillation Factor (DF)

Counterexample

Bad clustering with good DF
Extend the DF measure to account for the cognitive cost of making browsing decisions → Hierarchy Quality (HQ)
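The counterexample figure did not survive in this transcript. One plausible degenerate case, offered as a hypothetical illustration rather than the slide's own example: a root whose children are one-document clusters. The MBA then opens only the relevant leaves, so DF reaches its maximum |L|/|R|, yet the user must consider as many node descriptors as there are documents, which violates the requirement that the number of clusters be substantially lower than the number of documents. The function name and numbers are assumptions.

```python
from fractions import Fraction


def singleton_hierarchy_costs(n_docs, n_relevant):
    """Hypothetical degenerate hierarchy: a root whose children are one-document clusters.

    Returns (DF, number of node descriptors the user must consider).
    """
    n_mba_docs = n_relevant            # the MBA opens only the leaves holding relevant documents
    df = Fraction(n_docs, n_mba_docs)  # |L| / |D_MBA|: the highest DF attainable
    descriptors_to_consider = n_docs   # every leaf label under the root has to be read
    return df, descriptors_to_consider


df, descriptors = singleton_hierarchy_costs(n_docs=100, n_relevant=10)
print(df, descriptors)  # 10 100
```

DF rewards this hierarchy because it counts only document reads. The descriptor count in the second value is exactly the cost that DF ignores.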

18
Hierarchy Quality (HQ)

Assumption
When a node (in the MBA) is explored, all its lower neighbours have to be considered: some will in turn be explored, others will be discarded
Nview: the set of lower neighbours of each node belonging to the MBA
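Under this assumption, Nview is simply the union of the lower neighbours (children) of the MBA nodes. A minimal sketch, assuming a hypothetical hierarchy stored as a node-to-children map and a hypothetical MBA of {root, A, A1}; in an FCA lattice the same map would hold each concept's lower covers.

```python
# Hypothetical hierarchy: node -> list of lower neighbours (children).
CHILDREN = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": [],
    "A1": [],
    "A2": [],
}


def n_view(mba_nodes, children=CHILDREN):
    """Union of the lower neighbours of every node in the MBA."""
    viewed = set()
    for node in mba_nodes:
        viewed.update(children.get(node, []))
    return viewed


print(sorted(n_view({"root", "A", "A1"})))  # ['A', 'A1', 'A2', 'B']
```

B and A2 end up in Nview even though they are not explored: their descriptors still have to be read in order to discard them.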

19
Hierarchy Quality (HQ)

Kn and Kd are directly related to the retrieval scenario in which the experiments take place
The researcher must tune K = Kn/Kd before conducting the experiment
HQ > 1 indicates that the clustering improves on the original list
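The HQ formula itself is not in this transcript. The sketch below assumes the natural extension of DF suggested by these slides: the baseline costs Kd per listed document, while the hierarchy costs Kd per document in the MBA plus Kn per node descriptor in Nview. This gives HQ = Kd·|L| / (Kd·|D_MBA| + Kn·|Nview|) = |L| / (|D_MBA| + K·|Nview|) with K = Kn/Kd. The function name and the example value |Nview| = 4 are hypothetical.

```python
from fractions import Fraction


def hierarchy_quality(n_list_docs, n_mba_docs, n_view_nodes, k):
    """Assumed HQ = |L| / (|D_MBA| + K * |Nview|), where K = Kn / Kd.

    This is Kd*|L| / (Kd*|D_MBA| + Kn*|Nview|) with numerator and denominator divided by Kd.
    """
    baseline_cost = Fraction(n_list_docs)
    browsing_cost = n_mba_docs + Fraction(k) * n_view_nodes
    return baseline_cost / browsing_cost


# Hypothetical numbers: 7 listed documents, 5 in the MBA, 4 descriptors in Nview.
print(hierarchy_quality(7, 5, 4, Fraction(1, 4)))  # 7/6: better than the list
print(hierarchy_quality(7, 5, 4, Fraction(1, 2)))  # 1: no gain over the list
```

With K = 0 the expression collapses to DF. This matches the slides' framing of HQ as DF extended with browsing costs. Raising K shows how the same clustering can fall from an improvement (HQ > 1) to no gain as descriptors become more expensive to read.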

20
Hierarchy Quality (HQ)

Example

21
Conclusions and Future Work

Framework for comparing different clustering
approaches taking into account
Content of clusters
Hierarchical arrangement of clusters
Cognitive load to read document and node
descriptions
Adaptable to the retrieval scenario in which
experiments take place
Future work
Conduct user studies to compare their results
with the automatic evaluation
Results will reflect the quality of the
descriptors
Will be used to fine-tune the Kd and Kn parameters

Thank you!
