Title: SphereSearch


2
Outline
  • Where existing search engines fail
  • SphereSearch Concepts
  • Transformation and Annotation
  • Query Language and Scoring
  • Experimental Evaluation
  • Summary

3
Example query 1
Which professors from Saarbrücken do research on XML?
Different terminology in query and Web pages:
Director of Department 5: DBS & IS
Professor at Saarland University
Abstraction Awareness
4
Example query 2
Conferences about XML in Norway 2005
Context Awareness
5
Example query 3
What are the publications of Max Planck?
Max Planck should be an instance of the concept person,
not of the concept institute
Concept Awareness
6
SphereSearch Concepts
Goal: Increase recall and precision for hard
queries on linked and heterogeneous data
  • Unified search for unstructured, semistructured,
    structured data from heterogeneous sources
  • Graph-based model, including links
  • Annotation engines from NLP to recognize classes
    of named entities (persons, locations, dates, ...)
    for concept-aware queries
  • Flexible yet simple abstraction-aware query
    language with context-aware scoring
  • Compactness-based scores

7
Some Related Work
  • Web query languages, e.g., W3QS [VLDB 95], WebOQL
    [ICDE 95], ...
  • Web IR with thesauri, e.g., Qiu et al. [SIGIR 93],
    Liu et al. [SIGIR 04], ...
  • XML IR, e.g., XXL [WebDB 00], XIRQL
    [SIGIR 01], XSearch [VLDB 03], XRank [SIGMOD 03], ...
  • Information extraction, e.g., Lixto, KnowItAll, ...
  • Advanced Web graph IR, e.g., BANKS [ICDE 02],
    Hristidis et al. [VLDB 03], ...

8
Outline
  • Where existing search engines fail
  • SphereSearch Concepts
  • Transformation and Annotation
  • Query Language and Scoring
  • Experimental Evaluation
  • Current and Future Work

9
Unifying Search on Heterogeneous Data
(Diagram not transcribed; the only surviving label is "Databases".)
10
Heuristic Transformation of HTML
Goal: Transform layout tags to semantic
annotations
  • Headlines: <h1>Experiments</h1><h2>Settings</h2>We
    evaluated...<h2>Results</h2>Our system...
  • Patterns: <b>Topic</b>XML
  • Rules for tables, lists, ...
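
The headline rule above can be illustrated with a minimal Python sketch. It assumes that each headline text becomes the name of an element that wraps the content up to the next headline of the same or a higher level; the slides do not show the target format, so the output shape and the function name `headings_to_sections` are illustrative assumptions, not the SphereSearch implementation.

```python
import re

def headings_to_sections(html: str) -> str:
    """Turn <h1>..<h6> headlines into nested elements named after the headline text.

    Assumption: a headline opens a section that ends at the next headline
    of the same or a higher level (hypothetical rule for illustration).
    """
    heading_re = re.compile(r"<h([1-6])>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
    out = []
    stack = []  # (level, tag) of the sections that are currently open
    pos = 0
    for m in heading_re.finditer(html):
        text = html[pos:m.start()].strip()
        if text:
            out.append(text)  # plain content between two headlines
        level = int(m.group(1))
        # Close every open section at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            out.append(f"</{stack.pop()[1]}>")
        # Make the headline text usable as a tag name.
        tag = re.sub(r"\W+", "_", m.group(2).strip())
        out.append(f"<{tag}>")
        stack.append((level, tag))
        pos = m.end()
    tail = html[pos:].strip()
    if tail:
        out.append(tail)
    while stack:  # close whatever is still open at the end of the document
        out.append(f"</{stack.pop()[1]}>")
    return "".join(out)

html = ("<h1>Experiments</h1><h2>Settings</h2>We evaluated..."
        "<h2>Results</h2>Our system...")
result = headings_to_sections(html)
print(result)
```

A real transformation would also need the pattern and table/list rules mentioned on the slide; this sketch covers headlines only.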

11
(Almost) Generic XML Data Model
  • <Professor> Gerhard Weikum <Course>
    IR </Course> Saarbrücken <Research>
    XML </Research></Professor>

Node 1: docid=1, tag=Professor, content=Gerhard Weikum Saarbrücken
Node 2: docid=1, tag=Course, content=IR
Node 3: docid=1, tag=Research, content=XML
Automatic annotation of important concepts
(persons, locations, dates, money amounts) with
tools from Information Extraction
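
As a rough illustration of this data model, the following Python sketch maps the Professor example to one node per element, with a document id, the tag, and the element's own text. The `Node` class and `to_nodes` function are hypothetical names chosen for this example; numbering nodes in document order is an assumption that happens to match the node numbers on the slide.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    docid: int
    tag: str
    content: str

def to_nodes(xml_text: str, docid: int) -> list[Node]:
    """Create one node per XML element, numbered in document order."""
    root = ET.fromstring(xml_text)
    nodes = []
    for i, elem in enumerate(root.iter(), start=1):
        # An element's own content is its leading text plus the text that
        # follows each of its children (the children's "tail" text).
        parts = [elem.text or ""] + [child.tail or "" for child in elem]
        content = " ".join(" ".join(parts).split())  # normalize whitespace
        nodes.append(Node(i, docid, elem.tag, content))
    return nodes

doc = ("<Professor> Gerhard Weikum <Course> IR </Course> Saarbrücken "
       "<Research> XML </Research></Professor>")
nodes = to_nodes(doc, docid=1)
for n in nodes:
    print(n)
```

Parent-child edges between the nodes are omitted here; a graph-based model would store them alongside the node records.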
12
Information Extraction (IE)
  • Named Entity Recognition (NER)
  • Named entity: abstract datatype or concept
    (location, person, ..., IP address)
  • Mature (out-of-the-box products, e.g. GATE/ANNIE)
  • Extensible

The Pelican Hotel in Salvador, operated
by Roberto Cardoso, offers comfortable rooms
starting at 100 a night, including breakfast.
Please check in before 7pm.
The <company> Pelican Hotel </company>
in <location> Salvador </location>, operated
by <person> Roberto Cardoso </person>,
offers comfortable rooms starting at <price> 100
</price> a night, including breakfast. Please
check in before <time> 7pm </time>.
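
To make the effect of such an annotation step concrete, here is a toy Python sketch that reproduces the tagging of the hotel example with a small gazetteer and two regular expressions. It is not how GATE/ANNIE works internally; the rule list `RULES` and the function `annotate` are made up for illustration, and the patterns are tailored to this one sentence.

```python
import re

# Hypothetical, hand-written rules: (concept tag, regular expression).
RULES = [
    ("company", r"Pelican Hotel"),          # gazetteer entry
    ("location", r"Salvador"),              # gazetteer entry
    ("person", r"Roberto Cardoso"),         # gazetteer entry
    ("price", r"\b\d+(?= a night)"),        # number directly before "a night"
    ("time", r"\b\d{1,2}(?:am|pm)\b"),      # clock times such as 7pm
]

def annotate(text: str) -> str:
    """Wrap every match of every rule in an element named after its concept."""
    for tag, pattern in RULES:
        text = re.sub(pattern,
                      lambda m, t=tag: f"<{t}>{m.group(0)}</{t}>",
                      text)
    return text

plain = ("The Pelican Hotel in Salvador, operated by Roberto Cardoso, "
         "offers comfortable rooms starting at 100 a night, including "
         "breakfast. Please check in before 7pm.")
annotated = annotate(plain)
print(annotated)
```

A production annotator would rely on trained or rule-based NER components rather than fixed strings, which is what the "out-of-the-box products" bullet refers to.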
13
Unifying Search on Heterogeneous Data
(Diagram not transcribed; the only surviving label is "Databases".)
14
Annotation-Aware Data Model
  • <Professor> Gerhard Weikum
    <Course>IR</Course> Saarbrücken
    <Research>XML</Research></Professor>

Node 1: docid=1, tag=Professor, content=Gerhard Weikum Saarbrücken
Node 2: docid=1, tag=Course, content=IR
Node 3: docid=1, tag=Research, content=XML
Annotation introduces new tags
15
Data Model for Links
16
Architecture
(Diagram labels, from top to bottom)
  • Search Engine
  • INDEX
  • Sample annotations shown in the diagram:
    FROM: SIGIR, SUBJECT: Notification
    Date: 15-18 August, Event: SIGIR
    Location: Frankfurt, Location: Salvador, Time: 1315
    Location: Salvador, Price: 89
    Location: Salvador
    Person: Schenkel
  • IE Processor with Annotators: Annotation Module PRICE,
    Annotation Module DATE, Annotation Module LOCATION
  • Adapters: XML Adapter, EMail Adapter, Web Portal Adapter,
    Web Adapter
  • Sources: SIGIR Website, Hotel Website, Tourist Guide (XML),
    Flight Schedule, Graupmann Homepage
17
Outline
  • Where existing search engines fail
  • SphereSearch Concepts
  • Transformation and Annotation
  • Query Language and Scoring
  • Experimental Evaluation
  • Current and Future Work

18
SphereSearch Queries
  • Extended keyword queries
      • similarity conditions: ~professor, ~Saarbrücken
      • concept-based conditions: person=Max Planck,
        location=Trondheim
      • grouping
      • join conditions
  • Ranked results with context-aware scoring

19
Score Aggregation: SphereScore
(Figure not transcribed; example query "research XML".)
  • Weighted aggregation of local scores in the
    environment of an element (sphere score)

Context awareness
Rewards proximity of terms and compactness of
term distribution
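
One way to picture the sphere score is the following Python sketch. It assumes that the environment of an element is the set of nodes within a fixed graph distance, and that local scores are damped by a factor `alpha` per step of distance; the slide only says "weighted aggregation", so the damping scheme, `alpha`, and `max_dist` are illustrative assumptions.

```python
from collections import deque

def sphere_score(graph, local, start, alpha=0.5, max_dist=2):
    """Sum local scores around `start`, weighted by alpha ** distance.

    graph: adjacency lists {node: [neighbors]} (treated as given)
    local: local score of each node for one query condition
    """
    dist = {start: 0}
    queue = deque([start])
    score = 0.0
    while queue:  # breadth-first search up to max_dist
        node = queue.popleft()
        d = dist[node]
        score += (alpha ** d) * local.get(node, 0.0)
        if d < max_dist:
            for nb in graph.get(node, ()):
                if nb not in dist:
                    dist[nb] = d + 1
                    queue.append(nb)
    return score

# Professor example: node 1 (Professor) with children 2 (Course), 3 (Research).
graph = {1: [2, 3], 2: [1], 3: [1]}
local_xml = {3: 1.0}  # only the Research node matches "XML" locally

score_research = sphere_score(graph, local_xml, 3)
score_professor = sphere_score(graph, local_xml, 1)
score_course = sphere_score(graph, local_xml, 2)
print(score_research, score_professor, score_course)
```

With this weighting, an element close to a matching node still receives part of its score, which is how proximity of terms gets rewarded.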
20
Similarity Conditions
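Similarity conditions like ~professor,
~Saarbrücken
Disambiguation
Query expansion:
$\delta\text{-exp}(x) = \{\, w \mid \mathrm{sim}(x, w) > \delta \,\}$
Local score: weighted max over all expansion
terms
$s_L(e, \sim\!\text{professor}) = \max_{t \in \delta\text{-exp}(\text{professor})} \mathrm{sim}(\text{professor}, t) \cdot s_L(e, t)$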
Abstraction awareness
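
The expansion set and the weighted-max local score can be sketched in Python as follows. The similarity table `SIM` stands in for a thesaurus and its values are invented for this example; `delta_exp` and `local_score_sim` are hypothetical names.

```python
# Invented similarity values; a real system would take them from a thesaurus.
SIM = {
    "professor": {
        "professor": 1.0,
        "lecturer": 0.8,
        "director": 0.5,
        "student": 0.2,
    }
}

def delta_exp(x, sim, delta):
    """All terms w with sim(x, w) > delta."""
    return {w for w, s in sim.get(x, {}).items() if s > delta}

def local_score_sim(x, element_scores, sim, delta):
    """Weighted max of the element's local scores over all expansion terms."""
    return max(
        (sim[x][t] * element_scores.get(t, 0.0) for t in delta_exp(x, sim, delta)),
        default=0.0,
    )

# Local scores s_L(e, t) of one element e for plain terms t (made up).
element_scores = {"director": 0.8, "student": 1.0}

expansion = delta_exp("professor", SIM, delta=0.4)
score = local_score_sim("professor", element_scores, SIM, delta=0.4)
print(sorted(expansion), score)
```

Here "student" is excluded by the threshold even though the element matches it strongly, while "director" contributes with its similarity as a weight.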
21
Concept-based conditions
Goal: Exploit explicit (tags) and automatic
annotations in documents
location=Trondheim
Allows similarity and range queries (for
annotated concepts) like location=Trondheim,
1970<date<1980 with concept-specific distance measures
Concept awareness
22
Query Groups
Goal: Related terms should occur in the same
context
  • Group conditions that relate to the same
    entity: professor teaching IR research XML
  • professor T(teaching IR) R(research XML)
  • SphereScore computed for each group
  • Find compact sets with one result for each group

23
Scores for Query Results
  • Query result R: one result per query group

Compactness: 1 / size of a minimal spanning tree
Context awareness
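
A minimal Python sketch of this compactness measure follows. It assumes that pairwise distances between the result nodes (one node per query group) are already known, and it takes the "size" of the spanning tree to be its total edge weight; both points are assumptions, since the slide does not define them. The tree is computed with Prim's algorithm.

```python
def mst_weight(nodes, dist):
    """Total edge weight of a minimum spanning tree (Prim's algorithm).

    dist maps frozenset({u, v}) to the distance between u and v and must
    contain an entry for every pair of nodes.
    """
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    in_tree = {nodes[0]}
    total = 0.0
    while len(in_tree) < len(nodes):
        # Cheapest edge that connects the tree to a node outside of it.
        weight, nxt = min(
            (dist[frozenset((u, v))], v)
            for u in in_tree
            for v in nodes
            if v not in in_tree
        )
        total += weight
        in_tree.add(nxt)
    return total

def compactness(nodes, dist):
    """1 / size of the minimal spanning tree; infinite for a single node."""
    size = mst_weight(nodes, dist)
    return 1.0 / size if size > 0 else float("inf")

# One result node per group of "professor T(teaching IR) R(research XML)".
result_nodes = ["professor", "teaching", "research"]
distances = {
    frozenset(("professor", "teaching")): 1.0,
    frozenset(("professor", "research")): 1.0,
    frozenset(("teaching", "research")): 2.0,
}
c = compactness(result_nodes, distances)
print(c)
```

Results whose nodes lie close together in the graph get a small tree and therefore a high compactness.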
24
Join conditions
Goal: Connect results of different query groups
  • A(research, XML)
  • B(VLDB 2005 paper)
  • A.person=B.person
  • Depending on database size and application, joins are
      • precomputed, or
      • computed during query execution

(Figure not transcribed; surviving labels: groups A and B, nodes
"VLDB 2005", "research XML Ralf Schenkel", "2004 2005 R.Schenkel",
edge labels 1.0 and 0.9.)
  • Join conditions do not change the score for a
    node
  • Join conditions create a new link with a specific
    weight

25
Score for Join Conditions
  • Join condition A.T=B.S
  • For all nodes n1 with type T, n2 with type S, add
    edge (n1,n2) with weight $(\mathrm{sim}(n_1, n_2))^{-1}$
  • sim(n1,n2): content-based similarity
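
The edge construction for a join condition can be sketched in Python as below. The slide does not say which content-based similarity is used, so token-level Jaccard similarity is swapped in here as a stand-in; skipping pairs with similarity 0 (where the inverse is undefined) is likewise an assumption, and `join_edges` is a hypothetical name.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity, used as a stand-in for sim(n1, n2)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def join_edges(nodes_t, nodes_s, sim=jaccard):
    """Add an edge (n1, n2) with weight 1 / sim(n1, n2) for each node pair.

    nodes_t, nodes_s: lists of (node_id, content) with types T and S.
    Pairs with similarity 0 are skipped (assumption for this sketch).
    """
    edges = {}
    for id1, content1 in nodes_t:
        for id2, content2 in nodes_s:
            s = sim(content1, content2)
            if s > 0:
                edges[(id1, id2)] = 1.0 / s
    return edges

# Person nodes found for group A and group B (contents made up).
persons_a = [(1, "Ralf Schenkel")]
persons_b = [(2, "Ralf Schenkel"), (3, "R. Schenkel"), (4, "Gerhard Weikum")]

edges = join_edges(persons_a, persons_b)
print(edges)
```

Because the weight is the inverse similarity, a near-identical pair adds a short link, so joined results stay compact, while the scores of the nodes themselves are left unchanged.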

26
Outline
  • Where existing search engines fail
  • SphereSearch Concepts
  • Transformation and Annotation
  • Query Language and Scoring
  • Experimental Evaluation
  • Current and Future Work

27
Setup for Experiments
No existing benchmark (INEX, TREC, ...) fits
  • Three corpora:
      • Wikipedia
      • extended Wikipedia with links to IMDB
      • extended DBLP corpus with links to homepages
  • 50 queries like:
      • A(actor birthday 1970<date<1980) western
      • G(California,governor) M(movie)
      • A(Madonna,husband) B(director) A.person=B.director
  • Opponent: keyword queries with standard
    TF/IDF-based score (roughly a simplified Google)

28
Incremental Language Levels
SSE-Join (join conditions)
SSE-QG (query groups)
SSE-CV (concept-based conditions)
SSE-basic (keywords, SphereScores)
29
Experimental Results on Wikipedia
30
Experimental Results on Wiki and DBLP
  • SphereScores better than local scores
  • New SSE features nearly double precision

31
Current and Future Work
  • Improve graphical user interface
  • Refined type-specific similarity measures (like
    geographic distances) [SIGIR-WS 2005]
  • Deep Web search through automatic portal queries
  • Parameter tuning with relevance feedback
  • Efficiency of query evaluation through
    precomputation and integrated top-k (TopX talk
    this afternoon)

32
Thank you!