1. The Domain-Specific Track at CLEF 2008
Vivien Petras, Stefan Baerisch
GESIS Social Science Information Centre, Bonn, Germany
Aarhus, Denmark, September 17, 2008
2. Outline
- The Domain-Specific Task
- Collections and Controlled Vocabularies
- Participants, Runs and Relevance Assessments
- Themes
- Outlook
3. The Domain-Specific Task
- CLIR on structured scientific document collections
- social science domain
- bibliographic metadata
- controlled vocabularies for subject description
- Leverage for
  - search
  - query expansion
  - translation
4. The Domain-Specific Task
- Tasks
  - Monolingual against German, English or Russian
  - Bilingual against German, English or Russian
  - Multilingual against combined collection
- Topics
  - 25 topics in standard TREC format (title, desc, narr)
  - suggestions from 28 subject specialties in the Social Sciences
  - translated from German → English, Russian
5. Collections
6. Controlled Vocabularies
- 5 different subject-describing terminologies
  - Thesaurus for the Social Sciences (GIRT-DE, -EN)
  - Thesaurus of Sociological Indexing Terms (CSA-SA)
  - INION Thesaurus (ISISS)
  - Social Sciences Classification (GIRT-DE, -EN)
  - Sociological Abstracts Classification (CSA-SA)
7. Controlled Vocabularies Mapping Tools
- Translation
  - GIRT German ↔ GIRT English, GIRT Russian
  - INION Russian ↔ INION English
- Term mappings
  - equivalent terms in vocabularies
  - GIRT German / English ↔ CSA-SA English
  - GIRT German ↔ INION Russian
  - counseling for the aged → Counseling Elderly
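The term mappings above can be read as a lookup from a source-vocabulary term to its equivalent terms in a target vocabulary. A minimal sketch, assuming a plain in-memory dictionary: the table `TERM_MAPPINGS` and the function `map_terms` are hypothetical names, and only the single entry shown is taken from the slide (its target is kept exactly as it appears there).

```python
# Hypothetical mapping table between two controlled vocabularies.
# Only this one entry comes from the slide; real mapping files are far larger.
TERM_MAPPINGS = {
    "counseling for the aged": ["Counseling Elderly"],
}


def map_terms(terms, mappings):
    """Replace each source term by its mapped target terms.

    Terms without a mapping are passed through unchanged (an assumption;
    a real system might drop them or fall back to free-text translation).
    """
    mapped = []
    for term in terms:
        mapped.extend(mappings.get(term.lower(), [term]))
    return mapped


if __name__ == "__main__":
    print(map_terms(["Counseling for the aged", "migration"], TERM_MAPPINGS))
```

Lower-casing the lookup key is a design choice for this sketch; it assumes the mapping table stores its source terms in lower case.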
8. Participants
6 groups
9. Runs
10. Relevance Assessments
In Russian collection 1 topic without
relevant docs 3 topics without relevant docs
11. Relevance Assessments Best MAP
German topics English 0.2751 Russian 0.2357
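For readers unfamiliar with the measure, MAP (mean average precision) averages, over topics, the precision observed at the rank of each relevant document. The sketch below is an illustration only and not the evaluation tool used by the track; the function names and the toy data are assumptions. It skips topics without relevant documents, which is also an assumption of this sketch.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one topic; ranked_ids must not contain duplicates."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """Mean of average precision over topics that have relevant documents.

    run:   topic id -> ranked list of document ids
    qrels: topic id -> collection of relevant document ids
    """
    scores = [
        average_precision(run.get(topic, []), relevant)
        for topic, relevant in qrels.items()
        if relevant  # assumption: topics without relevant docs are skipped
    ]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    toy_run = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["d5", "d6"]}
    toy_qrels = {"t1": {"d1", "d3"}, "t2": {"d6"}, "t3": set()}
    print(round(mean_average_precision(toy_run, toy_qrels), 4))
```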
12. Themes - Retrieval models
- Lucene (Xtrieval Chemnitz, Darmstadt)
- Semantic relatedness Wikipedia / Wiktionary (Darmstadt)
- Language Models (Amsterdam)
- Vector space (EasyIR, Hug)
- Probabilistic Logistic Regression (Cheshire)
- Comparison Vector Space, LM, Probabilistic, DFR (Unine)
- Data fusion
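The slide does not say which data fusion method was used. As one common illustration, the sketch below implements CombSUM with min-max score normalization: each run's scores are scaled to [0, 1] and summed per document. The choice of CombSUM, the function names and the toy scores are all assumptions made for this example.

```python
def min_max_normalize(scores):
    """Scale one run's scores to [0, 1]; a constant run maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs):
    """Fuse several runs (doc id -> score) by summing normalized scores.

    Returns (doc id, fused score) pairs, best first; ties are broken by doc id.
    """
    fused = {}
    for run in runs:
        if not run:  # an empty run contributes nothing
            continue
        for doc, score in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    run_a = {"d1": 10.0, "d2": 5.0, "d3": 0.0}  # e.g. a vector space run
    run_b = {"d2": 0.9, "d3": 0.1}              # e.g. a language model run
    for doc, score in comb_sum([run_a, run_b]):
        print(doc, round(score, 3))
```

Normalization matters here because scores from different retrieval models are not on a comparable scale.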
13. Themes - Query Expansion
- Blind Feedback (Rocchio)
- idf-window BF (infrequent terms near search term)
- Thesaurus Lookup
- Thesaurus as pivot language double translation
- Google (text snippets)
- Wikipedia (frequent terms from top-ranked articles)
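Blind feedback in the Rocchio style, the first item above, can be sketched as follows: the top-ranked documents of an initial search are assumed relevant, their term weights are averaged into a centroid, and the strongest new terms are added to the query. This is a simplified, positive-feedback-only version. The function name, the `alpha` and `beta` defaults and the use of raw term counts in place of tf-idf weights are assumptions of this sketch, not the participants' settings.

```python
from collections import Counter


def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, n_expand=5):
    """Return a reweighted, expanded query as a dict term -> weight.

    query_terms:   list of terms in the original query
    feedback_docs: list of token lists, the top-ranked documents of the
                   initial search (assumed relevant, hence "blind")
    """
    query_vec = Counter(query_terms)
    new_query = {term: alpha * weight for term, weight in query_vec.items()}
    if not feedback_docs:
        return new_query

    # Centroid of the feedback documents (raw term counts, averaged).
    totals = Counter()
    for doc in feedback_docs:
        totals.update(doc)
    centroid = {t: beta * c / len(feedback_docs) for t, c in totals.items()}

    # Boost original query terms that also occur in the feedback documents.
    for term in query_vec:
        new_query[term] += centroid.get(term, 0.0)

    # Add only the n_expand strongest terms that are new to the query.
    candidates = sorted(
        (t for t in centroid if t not in query_vec),
        key=lambda t: (-centroid[t], t),
    )
    for term in candidates[:n_expand]:
        new_query[term] = centroid[term]
    return new_query


if __name__ == "__main__":
    docs = [
        ["youth", "unemployment", "labour", "market"],
        ["labour", "market", "policy"],
        ["youth", "labour"],
    ]
    print(rocchio_expand(["youth", "unemployment"], docs, n_expand=2))
```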
14. Themes - Translation
- Google AJAX language API
- Commercial Software (Systran, LEC)
- Bilingual thesaurus look-up
- ML retrieval → thesaurus look-up
- Wikipedia (Cross-language links)
15. Summary and Outlook
- Enough interest for 2009?
- Different corpora
- Different tasks
  - full topic run (125 topics)
  - result controlled vocabulary terms (not documents)
  - robust task
- Full-text retrieval with open access literature
16. Domain-Specific Track: http://www.gesis.org/en/research/information_technology/clef_ds.htm
Vocabulary Mappings: http://www.gesis.org/en/research/information_technology/komohe.htm
Email: vivien.petras_at_gesis.org