
1
The Domain-Specific Track at CLEF 2008
Vivien Petras, Stefan Baerisch
GESIS Social Science Information Centre, Bonn, Germany
Aarhus, Denmark, September 17, 2008
2
Outline
  • The Domain-Specific Task
  • Collections & Controlled Vocabularies
  • Participants, Runs & Relevance Assessments
  • Themes
  • Outlook

3
The Domain-Specific Task
  • CLIR (cross-language information retrieval) on structured scientific document collections
    • social science domain
    • bibliographic metadata
    • controlled vocabularies for subject description
  • Leverage the vocabularies for
    • search
    • query expansion
    • translation

4
The Domain-Specific Task
  • Tasks
    • Monolingual against German, English or Russian
    • Bilingual against German, English or Russian
    • Multilingual against combined collection
  • Topics
    • 25 topics in standard TREC format (title, desc, narr)
    • suggestions from 28 subject specialties in the Social Sciences
    • translated from German → English, Russian
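
As a rough illustration of how the title/desc/narr fields become queries, the following Python sketch parses topic blocks and joins chosen fields into a free-text query. It assumes CLEF-style topic files with closed `<top>`, `<num>`, `<title>`, `<desc>` and `<narr>` tags; classic TREC files with unclosed tags would need a different parser. The names `Topic`, `parse_topics` and `to_query` and the sample topic are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Topic:
    num: str
    title: str
    desc: str
    narr: str


# Assumption: each topic is a <top>...</top> block with closed
# <num>, <title>, <desc> and <narr> tags (CLEF-style, not classic TREC).
_TOP_RE = re.compile(r"<top>(.*?)</top>", re.S)
_FIELDS = ("num", "title", "desc", "narr")


def _field(block: str, tag: str) -> str:
    """Return the whitespace-normalised text of one tag, or '' if absent."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", block, re.S)
    return " ".join(m.group(1).split()) if m else ""


def parse_topics(text: str) -> list[Topic]:
    """Parse every <top> block in a topic file into a Topic record."""
    return [Topic(*(_field(block, tag) for tag in _FIELDS))
            for block in _TOP_RE.findall(text)]


def to_query(topic: Topic, fields: tuple[str, ...] = ("title", "desc")) -> str:
    """Concatenate the chosen (non-empty) topic fields into one query string."""
    return " ".join(getattr(topic, f) for f in fields if getattr(topic, f))


if __name__ == "__main__":
    # Hypothetical sample topic, not taken from the CLEF 2008 topic set.
    sample = """<top><num>X-01</num><title>counseling for the aged</title>
    <desc>Find documents on counseling services
    for elderly people.</desc><narr>Hypothetical narrative.</narr></top>"""
    for t in parse_topics(sample):
        print(t.num, "->", to_query(t))
```

Title-only or title+desc runs are then just different `fields` tuples passed to `to_query`.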

5
Collections
6
Controlled Vocabularies
  • 5 different subject-describing terminologies
    • Thesaurus for the Social Sciences (GIRT-DE, -EN)
    • Thesaurus of Sociological Indexing Terms (CSA-SA)
    • INION Thesaurus (ISISS)
    • Social Sciences Classification (GIRT-DE, -EN)
    • Sociological Abstracts Classification (CSA-SA)

7
Controlled Vocabularies & Mapping Tools
  • Translation
    • GIRT German ↔ GIRT English, GIRT Russian
    • INION Russian ↔ INION English
  • Term mappings
    • equivalent terms in vocabularies
    • GIRT German / English ↔ CSA-SA English
    • GIRT German ↔ INION Russian
    • e.g. counseling for the aged → Counseling Elderly
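
A minimal sketch of applying such a term mapping to a query, assuming the mapping is available as a plain dictionary from lower-cased source terms to target terms. The table name `GIRT_EN_TO_CSA_SA` and its single entry (the slide's example) are hypothetical stand-ins; this does not reproduce the actual GESIS mapping files or tools. The same lookup pattern would serve for bilingual thesaurus translation (e.g. GIRT German to GIRT English) by swapping in a translation table.

```python
# Hypothetical excerpt of a mapping table (source vocabulary -> target vocabulary).
# Only the example pair shown on the slide is included; real GIRT/CSA-SA
# mappings are not reproduced here. Keys are lower-cased for lookup.
GIRT_EN_TO_CSA_SA: dict[str, list[str]] = {
    "counseling for the aged": ["Counseling Elderly"],
}


def map_terms(terms: list[str], table: dict[str, list[str]],
              keep_unmapped: bool = True) -> list[str]:
    """Replace each term by its mapped target terms.

    Unmapped terms are kept verbatim (keep_unmapped=True) or dropped.
    Duplicates are removed while preserving first-seen order.
    """
    out: list[str] = []
    for term in terms:
        targets = table.get(term.lower())
        if targets:
            candidates = targets
        elif keep_unmapped:
            candidates = [term]
        else:
            candidates = []
        for c in candidates:
            if c not in out:
                out.append(c)
    return out


if __name__ == "__main__":
    query = ["Counseling for the aged", "pension"]
    print(map_terms(query, GIRT_EN_TO_CSA_SA))
    print(map_terms(query, GIRT_EN_TO_CSA_SA, keep_unmapped=False))
```

Here the first call yields `['Counseling Elderly', 'pension']` and the second `['Counseling Elderly']`; keeping unmapped terms is the safer default when the mapping covers only part of the query.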

8
Participants
6 groups
9
Runs
10
Relevance Assessments

In Russian collection 1 topic without relevant docs
3 topics without relevant docs
11
Relevance Assessments: Best MAP (mean average precision)
German topics: English 0.2751, Russian 0.2357
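
MAP is the mean over topics of average precision (AP), where AP for one topic is (1/|R|) times the sum of precision@k over every rank k that holds a relevant document. A minimal Python sketch of that standard definition, assuming a run is a ranked list of document IDs per topic and the qrels are sets of relevant IDs. Skipping topics that have no relevant documents at all (cf. the topics without relevant docs noted above) is an assumption of this sketch; the official CLEF evaluation may have handled them differently.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """AP = (1/|R|) * sum of precision@k over ranks k holding a relevant doc."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    # Relevant documents that were never retrieved contribute 0.
    return total / len(relevant)


def mean_average_precision(run: dict[str, list[str]],
                           qrels: dict[str, set[str]]) -> float:
    """Mean AP over topics; topics without relevant docs are skipped (assumption)."""
    topics = [t for t, rel in qrels.items() if rel]
    if not topics:
        return 0.0
    return sum(average_precision(run.get(t, []), qrels[t]) for t in topics) / len(topics)


if __name__ == "__main__":
    # Toy data, not CLEF results.
    run = {"T1": ["d1", "d2", "d3", "d4"], "T2": ["d5", "d6"]}
    qrels = {"T1": {"d1", "d3"}, "T2": {"d6"}, "T3": set()}
    print(round(mean_average_precision(run, qrels), 4))
```

For the toy data, T1 has AP = (1/1 + 2/3) / 2 = 5/6, T2 has AP = (1/2) / 1 = 1/2, and T3 is skipped, so MAP = (5/6 + 1/2) / 2 = 2/3, printed as 0.6667.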
12
Themes - Retrieval models
  • Lucene (Xtrieval Chemnitz, Darmstadt)
  • Semantic relatedness from Wikipedia / Wiktionary (Darmstadt)
  • Language Models (Amsterdam)
  • Vector space (EasyIR, Hug)
  • Probabilistic Logistic Regression (Cheshire)
  • Comparison of Vector Space, LM, Probabilistic, DFR (Unine)
  • Data fusion
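
The slide names data fusion without detail. One widely used family is score-based fusion such as CombSUM and CombMNZ; the sketch below shows that technique only, assuming each run is a dict from document ID to retrieval score and that scores are min-max normalised per run before summing. It is not necessarily what any participant used, and `min_max` and `fuse` are hypothetical names.

```python
from collections import defaultdict


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one run's scores to [0, 1]; a constant run maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(runs: list[dict[str, float]], mnz: bool = False) -> list[tuple[str, float]]:
    """CombSUM (or CombMNZ if mnz=True) over min-max normalised runs."""
    total: dict[str, float] = defaultdict(float)
    count: dict[str, int] = defaultdict(int)
    for run in runs:
        for doc, s in min_max(run).items():
            total[doc] += s
            count[doc] += 1
    if mnz:
        # CombMNZ: multiply the summed score by the number of runs that retrieved the doc.
        total = {doc: s * count[doc] for doc, s in total.items()}
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    run_a = {"d1": 10.0, "d2": 5.0, "d3": 0.0}  # toy scores
    run_b = {"d2": 3.0, "d4": 1.0}
    print(fuse([run_a, run_b]))
    print(fuse([run_a, run_b], mnz=True))
```

CombMNZ rewards documents retrieved by several runs (here `d2`), which is the usual motivation for preferring it over plain CombSUM when runs overlap only partly.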

13
Themes - Query Expansion
  • Blind Feedback (Rocchio)
  • idf-window BF (infrequent terms near search term)
  • Thesaurus Lookup
  • Thesaurus as pivot language (double translation)
  • Google (text snippets)
  • Wikipedia (frequent terms from top-ranked articles)
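
Blind feedback with Rocchio can be sketched as follows: treat the top-ranked documents of an initial run as relevant, add their centroid to the query vector (q' = alpha * q + beta * centroid), and keep the strongest new terms. This minimal Python sketch assumes pre-tokenised documents and uses raw term frequencies instead of tf-idf; it drops the negative (gamma) component because blind feedback has no judged non-relevant documents. The defaults alpha=1.0, beta=0.75 and 10 new terms are illustrative, not values reported by any participant, and `rocchio_expand` is a hypothetical name.

```python
from collections import Counter


def rocchio_expand(query: list[str], feedback_docs: list[list[str]],
                   alpha: float = 1.0, beta: float = 0.75,
                   n_new_terms: int = 10) -> dict[str, float]:
    """Blind-feedback Rocchio: q' = alpha*q + beta*centroid(top docs).

    Simplifications (assumptions): raw term-frequency vectors instead of
    tf-idf, and no negative (gamma) component.
    """
    q = Counter(query)
    centroid: Counter = Counter()
    for doc in feedback_docs:
        for term, tf in Counter(doc).items():
            centroid[term] += tf / len(feedback_docs)

    weights: dict[str, float] = {t: alpha * w for t, w in q.items()}
    for term, w in centroid.items():
        weights[term] = weights.get(term, 0.0) + beta * w

    # Keep every original term, plus the n_new_terms highest-weighted new ones.
    new_terms = sorted((t for t in weights if t not in q),
                       key=lambda t: (-weights[t], t))[:n_new_terms]
    return {t: weights[t] for t in list(q) + new_terms}


if __name__ == "__main__":
    top_docs = [["elderly", "care", "care"], ["counseling", "care", "home"]]  # toy docs
    print(rocchio_expand(["elderly", "counseling"], top_docs, n_new_terms=1))
```

With the toy documents, "care" is added with weight 0.75 * 1.5 = 1.125, while both original terms rise to 1.375 because they also occur in the feedback documents.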

14
Themes - Translation
  • Google AJAX Language API
  • Commercial Software (Systran, LEC)
  • Bilingual thesaurus look-up
  • ML retrieval → thesaurus look-up
  • Wikipedia (Cross-language links)

15
Summary & Outlook
  • Enough interest for 2009?
  • Different corpora
  • Different tasks
    • full topic run (125 topics)
    • controlled vocabulary terms as results (not documents)
    • robust task
  • Full-text retrieval with open access literature

16
Domain-Specific Track: http://www.gesis.org/en/research/information_technology/clef_ds.htm
Vocabulary Mappings: http://www.gesis.org/en/research/information_technology/komohe.htm
Email: vivien.petras_at_gesis.org