Title: INEX 2002 2006: Understanding XML Retrieval Evaluation
1INEX 2002 - 2006 Understanding XML Retrieval
Evaluation
Mounia Lalmas and Anastasios Tombros Queen Mary,
University of London Norbert Fuhr University of
Duisburg-Essen
2XML retrieval vs. document retrieval(Retrieval
of structured vs. unstructured documents)
- No predefined unit of retrieval
- Dependency of retrieval units
- Aims of XML retrieval
- Not only to find relevant elements
- But those at the appropriate level of granularity
XML retrieval allows users to retrieve document
components that are more focused, e.g. a
subsection of a book instead of an entire book.
3Outline
- Collections
- Topics
- Retrieval tasks
- Relevance and assessment procedures
- Metrics
4Evaluation of XML retrieval INEX
- Promote research and stimulate development of XML
information access and retrieval, through - Creation of evaluation infrastructure and
organisation of regular evaluation campaigns for
system testing - Building of an XML information access and
retrieval research community - Construction of test-suites
- Collaborative effort ? participants contribute to
the development of the collection - End with a yearly workshop, in December, in
Dagstuhl, Germany - INEX has allowed a new community in XML
information access to emerge
5INEX Background
- Since 2002
- Sponsored by DELOS Network of Excellence for
Digital Libraries under FP5 and FP6 IST
programme - Mainly dependent on voluntary efforts
- Coordination is distributed for tasks and tracks
- 64 participants in 2005 80 in 2006
- Main Institutions involved in Coordination for
2006
- Queensland University of Technology, AUS
- University of California, Berkeley, USA
- Royal School of LIS, DK
- Queen Mary, University of London, UK
- University of Duisburg-Essen, DE
- INRIA-Rocquencourt, FR
- Yahoo! Research
- Microsoft Research Cambridge, UK
- Max-Planck-Institut fur Informatik, DE
- University of Amsterdam, NL
- University of Otago, NZ
- University of Waterloo
- CWI, NL
- Carnegie Mellon University, USA
- IBM Research Lab, IL
- University of Minnesota Duluth, USA
- University of Paris 6, FR
6Document collections
7Topics
8Two types of topics in INEX
- Content-only (CO) topics
- ignore document structure
- simulates users, who do not have any knowledge of
the document structure or who choose not to use
such knowledge - Content-and-structure (CAS) topics
- contain conditions referring both to content and
structure of the sought elements - simulate users who do have some knowledge of the
structure of the searched collection
9CO topics 2003-2004
- lttitlegt
- "Information Exchange", "XML", "Information
Integration" - lt/titlegt
- ltdescriptiongt
- How to use XML to solve the information
exchange (information integration) problem, - especially in heterogeneous data sources?
- lt/descriptiongt
- ltnarrativegt
- Relevant documents/components must talk about
techniques of - using XML to solve information exchange
(information integration) - among heterogeneous data sources where the
structures of participating - data sources are different although they might
use the same ontologies - about the same content.
- lt/narrativegt
10CAS topics 2003-2004
- lttitlegt
- //article(./fm//yr '2000' OR ./fm//yr
'1999') AND about(., '"intelligent transportation
system"')//secabout(.,'automation vehicle') - lt/titlegt
- ltdescriptiongt
- Automated vehicle applications in articles
from 1999 or 2000 about intelligent
transportation systems. - lt/descriptiongt
- ltnarrativegt
- To be relevant, the target component must be
from an article on intelligent transportation
systems published in 1999 or 2000 and must
include a section which discusses automated
vehicle applications, proposed or implemented, in
an intelligent transportation system. - lt/narrativegt
11COS topics 2005-2006
- lttitlegtmarkov chains in graph related
algorithmslt/titlegt - ltcastitlegt//article//secabout(.,"markov
chains" algorithm graphs) lt/castitlegt - ltdescriptiongtRetrieve information about the use
of markov chains in - graph theory and in graphs-related
algorithms. - lt/descriptiongt
- ltnarrativegtI have just finished my Msc. in
mathematics, in the field - of stochastic processes. My research was in a
subject related to - Markov chains. My aim is to find possible
implementations of my - knowledge in current research. I'm mainly
interested in - applications in graph theory, that is,
algorithms related to graphs - that use the theory of markov chains. I'm
interested in at - least a short specification of the nature of
implementation (e.g. - what is the exact theory used, and to which
purpose), hence the - relevant elements should be sections,
paragraphs or even abstracts - of documents, but in any case, should be part
of the content of the - document (as opposed to, say, vt, or bib).
- lt/narrativegt
12Expressing structural constraints NEXI
- Narrowed Extended XPath I
- INEX Content-and-Structure (CAS) Queries
- Specifically targeted for content-oriented XML
search (i.e. aboutness)
//articleabout(.//title, apple) and
about(.//sec, computer)
13Retrieval Tasks
14Retrieval tasks
- Ad hoc retrieval
- a simulation of how a library might be used and
involves the searching of a static set of XML
documents using a new set of topics - Ad hoc retrieval for CO topics
- Ad hoc retrieval for CAS (S) topics
- Core task
- identify the most appropriate granularity XML
elements to return to the user, with or without
structural constraints
15CO retrieval task (2002 - )
- Specification
- make use of the CO topics
- retrieves the most specific elements and only
those, which are relevant to the topic - no structural constraints regarding the
appropriate granularity - must identify the most appropriate XML elements
to return to the user - Two main strategies
- Focused strategy
- Thorough strategy
16Focused strategy (2005 - )
- Specification
- find the most exhaustive and specific element
on a path within a given document containing
relevant information and return to the user only
this most appropriate unit of retrieval - no overlapping elements
- preference for specificity over exhaustivity
17Thorough strategy (2002 - )
- Specification
- core system's task underlying most XML retrieval
strategies, which is to estimate the relevance of
potentially retrievable elements in the
collection - overlap problem viewed as an interface and
presentation issues - challenge is to rank elements appropriately
- Task that most XML approaches performed up to
2004 in INEX.
18Fetch Browse - 2005
- Document ranking, and in each document, element
ranking - Query wordnet information retrieval
19Fetch Browse - 2006
- Document ranking, and in each document
- All in context task rank relevant elements, no
overlap allowed (actual refinement of fetch
Browse) - Best in context task identify the one element
from where to start reading in the document - Likely to be the two tasks in INEX 2007
20Retrieval strategies - to recap
- Focussed assume that user prefers a single
element that is the most relevant. - Thorough assume that user prefers all highly
relevant elements. - All In Context assume that user interested in
highly relevant elements that are contained only
within highly relevant articles. - Best In Context assume that user interested in
the best entry points, one per article, of highly
relevant articles
21Relevance and assessment procedures
22Relevance in XML retrieval
- smallest component (specificity) that is highly
relevant (exhaustivity) - specificity extent to which a document
component is focused on the information need,
while being an informative unit - exhaustivity extent to which the information
contained in a document component satisfies the
information need.
article
Query XML retrieval evaluation
XML retrieval evaluation
s1 s2 s3
XML retrieval
XML evaluation
ss1 ss2
23Relevance in XML retrieval INEX 2003 - 2004
article
XML retrieval evaluation
s1 s2 s3
XML retrieval
XML evaluation
ss1 ss2
- Relevance (0,0) (1,1) (1,2) (1,3) (2,1) (2,2)
(2,3) (3,1) (3,2) (3,3) - exhaustivity how much the section discusses
the query 0, 1, 2, 3 - specificity how focused the section is on the
query 0, 1, 2, 3 - If a subsection is relevant so must be its
enclosing section, ...
24Relevance assessment task
- Pooling technique
- Completeness
- Rules that force assessors to assess related
elements - E.g. element assessed relevant ? its parent
element and children elements must also be
assessed -
- Consistency
- Rules to enforce consistent assessments
- E.g. Parent of a relevant element must also be
relevant, although to a different extent - E.g. Exhaustivity increases going up specificity
increases going down -
25Quality of assessments
- Very laborious assessment task, eventually
impacting on the quality of assessments - Interactive study shows that assessors agreement
levels are high only at extreme ends of the
relevance scale (very vs. not relevant) - Statistical analysis of 2004 data showed that
comparisons of approaches would lead to same
outcomes using a reduced scale - A simplified assessment procedure based on
highlighting
26Relevance in XML - 2005
- specificity defined continuous scale defined as
ratio (in characters) of the highlighted text to
element size.
- Exhaustivity
- Highly exhaustive (2)
- Partly exhaustive (1)
- Not exhaustive (0)
- Too Small (?)
New assessment procedure led to better quality
assessments
27Latest analysis
- Statistical analysis on the INEX 2005 data
- The exhaustivity 31 scale is not needed in most
scenarios to compare XML retrieval approaches - too small may be simulated by some threshold
length - INEX 2006 used only the specificity dimension to
measure relevance - The same highlighting approach is used
- Use of a highlighting procedure simplifies
everything and is enough to properly compare
the effectiveness of XML retrieval systems
28Metrics
29Measuring effectiveness Metrics
- A research problem in itself!
- Quantizations reflecting preference scenarios
- Metrics
- inex_eval - official INEX metric through 2004
- inex_eval_ng (consider overlap size)
- ERR (expected ratio of relevant units)
- XCG (XML cumulative gain) - official INEX metric
2005 - t2i (tolerance to irrelevance)
- PRUM (Precision Recall with User Modelling)
- HiXEval
- ..
30Near-misses
XML retrieval allows users to retrieve document
components that are more focussed, e.g. a
section of a book instead of an entire
book BUT what about if the chapter or one
the subsections is returned?
(3,1)
(3,2)
(3,3)
(1,3)
(exhaustivity, specificity) as defined in 2004
31Retrieve the best XML elements according to
content and structure criteria (2004 scale)
- Most exhaustive and the most specific (3,3)
- Near misses (3,3) (2,3) (1,3) ? specific
- Near misses (3, 3) (3,2) (3,1) ? exhaustive
- Near misses (3, 3) (2,3) (1,3) (3,2) (3,1)
(1,2)
near-misses
32Quantization functions - reward near misses (2004
scale)
- Strict - no reward
-
- General - some rewards
33Other INEX tracks
- Interactive (2004 - 2006)
- Relevance feedback (2004 - 2006)
- Natural language query processing (2004 - 2006)
- Heterogeneous collection (2004 - 2006)
- Multimedia track (2005 - )
- Document mining (2005 - ) together with PASCAL
network - http//xmlmining.lip6.fr/ - User - case studies (2006)
- XML entity ranking (2006 - )
- Other tracks under discussion for 2007, including
a book search track
34Looking Forward
- Much recent work on evaluation
- Larger more realistic collection - Wikipedia
- More assessed topics!
- Better suite for analysis and reusability
- Better understanding of
- information needs and retrieval scenarios
- measuring effectiveness
- Introduction of a passage retrieval task in INEX
2007
Questions?