Main Mono and Bilingual Tasks: Track Organisation and Results Analysis

Description:

Title: SAPIR Kick-off Meeting
Subject: UPD Contribution to WP2
Author: Nicola Ferro
Last modified by: Nicola Ferro
Created Date: 11/18/2001 4:54:53 PM

Slides: 33
Provided by: NicolaF150

Transcript and Presenter's Notes


1
Main Mono and Bilingual Tasks Track
Organisation and Results Analysis
2
Outline
3
CLEF Infrastructure: DIRECT
4
Information Hierarchy
  • experimental collections and the experiments are
    data, since they are the raw, basic elements
    needed for any further investigation
  • performance measurements are information, since
    they are the result of computations and
    processing on the data
  • descriptive statistics and the hypothesis tests
    are knowledge, since they are a further
    elaboration of the information carried by the
    performance measurements
  • theories, models, algorithms, and techniques are
    wisdom, since they provide interpretation,
    explanation, and formalization of the content of
    the previous levels.
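The first three levels of this hierarchy can be illustrated with a minimal sketch. It assumes average precision as the performance measure, which the slide does not name, and the run and the relevance judgments below are hypothetical values made up for the illustration.

```python
from statistics import mean


def average_precision(ranked_docs, relevant_docs):
    """Average precision of one ranked list against a set of relevant documents."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant_docs:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant_docs) if relevant_docs else 0.0


# Data: a hypothetical experiment (ranked lists) and relevance judgments per topic.
run = {"T1": ["d1", "d2", "d3", "d4"], "T2": ["d5", "d6", "d7"]}
qrels = {"T1": {"d1", "d3"}, "T2": {"d6"}}

# Information: one performance measurement per topic, computed from the data.
ap_per_topic = {topic: average_precision(run[topic], qrels[topic]) for topic in run}

# Knowledge: a descriptive statistic computed over the measurements.
mean_ap = mean(ap_per_topic.values())
```

The wisdom level (theories, models, algorithms) is what explains why one run scores higher than another, so it is not something a few lines of code can produce.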

5
Approach to the Evaluation (1/2)
  • Introduce a conceptual model
      • it makes clear which entities make up the
        information space of an evaluation campaign,
        together with their features and relationships
      • logical models can be derived from it to manage
        and preserve the experimental data
      • commonly agreed data formats for exchanging
        information can be derived from it
  • Develop common metadata formats
      • they give meaning to the data, and thereby
        enable their sharing and re-use
      • they make it possible to keep track of the
        lineage of the managed information
  • Adopt a unique identification mechanism
      • it allows for explicit citation of and easy
        access to the scientific data, and it supports
        the enrichment of the scientific data

6
Approach to the Evaluation (2/2)
  • Provide common tools for statistical analyses
      • they allow for judging whether measured
        differences between retrieval methods can be
        considered statistically significant
      • a uniform way of performing statistical analyses
        on experiments makes the analysis and assessment
        of the experiments comparable too
  • Design and develop a Digital Library System (DLS)
    for IR scientific data
      • it is well suited for managing and making
        accessible the scientific data and the
        experiments produced during the course of an
        evaluation campaign
      • it also provides tools for analyzing, comparing,
        and citing the scientific data of an evaluation
        campaign, as well as for curating, preserving,
        annotating, enriching, and promoting the re-use
        of those data
  • Give organizations responsible for evaluation
    initiatives an active role in this process
      • they should take a leadership role in developing
        a comprehensive strategy for long-lived digital
        data collections and guide the research community
        through this process in order to improve the way
        research is done
      • they should also define guiding principles,
        policies, and best practices for making use of
        the scientific data produced during the
        evaluation campaign itself
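As an illustration of the statistical analyses mentioned in the first point, here is a minimal sketch of a paired randomization (sign-flip) test on per-topic scores. The slides do not say which tests the common tools provide, so the choice of test is an assumption, and the function name and the scores are hypothetical.

```python
import random


def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided p-value estimate for the mean per-topic difference of two runs."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both runs must be scored on the same topics")
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the sign of each difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(flipped) >= observed:
            extreme += 1
    return extreme / trials


# Hypothetical per-topic scores for two retrieval methods on ten topics.
method_a = [0.52, 0.61, 0.48, 0.70, 0.55, 0.66, 0.59, 0.73, 0.50, 0.64]
method_b = [0.41, 0.47, 0.39, 0.52, 0.46, 0.49, 0.43, 0.58, 0.38, 0.45]
p_value = paired_randomization_test(method_a, method_b)
```

A small p-value suggests that the measured difference is unlikely to be due to chance on this topic set. Applying one shared routine to every experiment is what makes the resulting assessments comparable.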

7
Internationalization of the User Interface
Bulgarian: Petya Osenova, Kiril Simov
Czech: Pavel Pecina
English: Marco Dussin
French: Jacques Savoy
German: Thomas Mandl
Indonesian: Mirna Adriani
Italian: Marco Dussin
Portuguese: Paulo Rocha, Diana Santos
Spanish: Julio Villena Román
8
Identification: Digital Object Identifiers (DOI)
Example: 10.2415/AH-BILI-X2BG-CLEF2007.JHU-APL.APLBIENBGTD4
  • DOIs
      • allow us to uniquely identify a digital object
      • are persistent and actionable
      • are aimed especially at intellectual property
  • We assign DOIs to
      • collections - prefix 10.2453
      • topics - prefix 10.2452
      • experiments - prefix 10.2415
      • pools - prefix 10.2454
      • statistical tests - prefix 10.2455

http://www.medra.org
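A minimal sketch of how the prefix of such an identifier tells which kind of resource it names. The prefix table is taken from the list on this slide; the function name and the returned dictionary are illustrative assumptions, and the internal structure of the suffix is left uninterpreted because the slide does not define it.

```python
# Prefixes as listed on the slide.
PREFIX_TO_RESOURCE = {
    "10.2453": "collection",
    "10.2452": "topic",
    "10.2415": "experiment",
    "10.2454": "pool",
    "10.2455": "statistical test",
}


def describe_doi(doi):
    """Split a DOI into prefix and suffix and look up the resource type."""
    prefix, separator, suffix = doi.partition("/")
    if not separator or not suffix:
        raise ValueError(f"not a DOI of the form prefix/suffix: {doi!r}")
    return {
        "prefix": prefix,
        "suffix": suffix,
        # Prefixes outside the table are reported as unknown, not rejected.
        "resource": PREFIX_TO_RESOURCE.get(prefix, "unknown"),
    }


info = describe_doi("10.2415/AH-BILI-X2BG-CLEF2007.JHU-APL.APLBIENBGTD4")
```

For the example identifier from the slide, the prefix 10.2415 marks it as an experiment.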
9
DOI Resolution
http://dx.doi.org
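Because DOIs are actionable, an identifier can be turned into a link by appending it to the resolver address. The sketch below only builds the URL and performs no network request; the function name is an illustrative assumption.

```python
from urllib.parse import quote

RESOLVER = "http://dx.doi.org"  # resolver address given on the slide


def resolver_url(doi):
    """Build the resolver URL for a DOI, percent-encoding unsafe characters."""
    # The slash between prefix and suffix is kept as a path separator.
    return f"{RESOLVER}/{quote(doi, safe='/')}"


url = resolver_url("10.2415/AH-BILI-X2BG-CLEF2007.JHU-APL.APLBIENBGTD4")
```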
10
Experiment Metrics
11
Experiment Statistics
12
Experiment Plots
13
Task Statistics
14
Task Plots
15
Appendices (1/2)
16
Appendices (2/2)
17
Track Overview
18
Participation
22 participants from 12 countries
19
Participation by Country
20
Tasks and Collections
  • Monolingual and bilingual tasks were principally
    offered for Central European languages:
    Bulgarian, Czech and Hungarian
  • Topics in 16 languages
      • European languages: Bulgarian, Czech, English,
        French, Hungarian, Italian and Spanish
      • non-European languages (for X2EN): Amharic,
        Chinese, Indonesian, Oromo
      • Indian sub-task: Bengali, Hindi, Marathi, Tamil
        and Telugu

Language    Task                              Collection
Bulgarian   Monolingual BG, Bilingual X2BG    Sega 2002, Standart 2002, Novinar 2002
Czech       Monolingual CS, Bilingual X2CS    Mlada fronta DNES 2002, Lidové Noviny 2002
Hungarian   Monolingual HU, Bilingual X2HU    Magyar Hirlap 2002
English     Bilingual X2EN (Indian sub-task)  LA Times 2002
21
Participation by Task
172 submitted runs
Disappointing participation
22
Runs by Source Language
23
Monolingual Tasks
24
Monolingual Bulgarian
25
Monolingual Czech
26
Monolingual Hungarian
27
Monolingual English
28
Approaches to Monolingual Retrieval
  • Main emphasis
      • stemming
      • morphological analysis
      • relevance feedback

Morphological Lemmatizer
29
Bilingual Tasks
30
Bilingual X → English
31
Approaches to Bilingual X2EN
  • Main emphasis
      • bilingual dictionaries
      • machine translation
      • coverage of lexicons
      • use of pivot languages

The best bilingual English system achieves about 88% of
the performance of the best monolingual system
  • bilingual dictionaries and pivot languages
  • query expansion with RF
  • parallel corpora
  • translation ambiguity resolution with a
    graph-based approach
  • lexicon coverage with a pattern-based approach
  • Afaan Oromo stemmer
  • stop list creation
  • bilingual Oromo-English dictionary creation
  • Bilingual Hungarian to English
  • bilingual dictionary
  • exploiting Wikipedia to remove improbable
    translations
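The comparison between the best bilingual and the best monolingual system on this slide is a ratio of two effectiveness scores. A minimal sketch, assuming the figure is the bilingual score expressed as a percentage of the monolingual one; the measure is not named on the slide, and the two scores below are hypothetical values chosen only to reproduce a figure of 88.

```python
def relative_performance(bilingual_score, monolingual_score):
    """Bilingual effectiveness as a percentage of the monolingual baseline."""
    if monolingual_score <= 0:
        raise ValueError("the monolingual baseline must be positive")
    return 100.0 * bilingual_score / monolingual_score


# Hypothetical scores, not results from the track.
best_bilingual = 0.44
best_monolingual = 0.50
percentage = relative_performance(best_bilingual, best_monolingual)
```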

32
Bilingual X2EN Indian Subtask