Title: SemSearch: A Search Engine for the Semantic Web
1. SemSearch: A Search Engine for the Semantic Web
- Yuangui Lei, Victoria Uren, Enrico Motta
- Knowledge Media Institute
- The Open University
- {y.lei, v.s.uren, e.motta}@open.ac.uk
2. Outline
- Research background
- SemSearch overview
- Query interface
- Search process
- Implementation examples
- Conclusions and future work
3. Research background
- Semantic search: extending traditional search with semantic web technology
- Exploiting the explicit meaning of documents (i.e., ontology-based metadata)
- Current semantic search tools
- Form-based, e.g., SHOE, Magnet
- View-based, e.g., GRQL, SQoogle, Ontogator, Falcon-S
- QA-based, e.g., AquaLog, ORAKEL
- Keyword-based, e.g., TAP, Squiggle, DOSE
4. Support for ordinary end users
- Form-based tools
- Forms are intuitive
- Issues: knowledge overhead, scalability
- View-based tools
- Support for domain understanding, query refinement
- Complex queries can be specified
- The query construction process can be tedious and time-consuming
- QA-based tools
- Easy to use
- Issue: heavy NLP
- Keyword-based tools
- Easy to post queries; quick response
- Typically one keyword only; general knowledge of the problem domain required
5. The goal of our search engine
- Hide the complexity of semantic search from end users
- Low barrier to access: easy to post queries
- Avoiding the form-based routine
- Avoiding the view-based search routine
- Dealing with relatively complex queries
- Supporting multiple keywords
- Precise and self-explanatory results
- Results satisfy user queries
- Results are easy to understand
- Quick response
- Avoiding linguistic processing
6. SemSearch Architecture
End users
Google-like User Interface Layer
- Google-like query interface
Text Search Layer
- Semantic entity indexing engine
- Semantic entity search engine
Semantic Query Layer
- Formal query construction engine
Formal Query Language Layer (SPARQL, SeRQL, etc.)
Semantic Data Layer
7. The Google-like query interface
- Extending the traditional keyword search languages by allowing the specification of
- The queried subject
- The combination of keywords
- Three operators are used
- The operator ":" captures the query subject
- "and" / "or" specify the combination of keywords
- Query formats
- One keyword: finding entities that have relations with the keyword match(es)
- Multiple keywords: subject:keyword1 and/or keyword2 and/or keyword3, e.g., <news:phd students>, <paper:john and enrico>
- Advantages
- More flexible than form-based query interfaces
- More powerful than state-of-the-art keyword-based semantic search interfaces
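The query format above can be illustrated with a small parser. This is a minimal sketch, not SemSearch's own code: the names `ParsedQuery` and `parse_query` are hypothetical, and it assumes the subject is everything before the first ":" and that "and"/"or" appear as whole words between keyword phrases.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedQuery:
    subject: Optional[str]  # text before ":"; None for one-keyword queries
    keywords: List[str]     # keyword phrases, in order
    operators: List[str]    # "and"/"or" between consecutive keywords


def parse_query(text: str) -> ParsedQuery:
    """Split 'subject:kw1 and/or kw2 ...' into its parts (hypothetical helper)."""
    subject, sep, rest = text.partition(":")
    if not sep:
        # No ":" operator, so treat the whole input as a single keyword.
        return ParsedQuery(None, [text.strip()], [])
    # Split on whole-word "and"/"or"; the capture group keeps the operators.
    parts = re.split(r"\s+(and|or)\s+", rest.strip(), flags=re.IGNORECASE)
    keywords = [p.strip() for p in parts[0::2]]
    operators = [p.lower() for p in parts[1::2]]
    return ParsedQuery(subject.strip(), keywords, operators)
```

For instance, `parse_query("paper:john and enrico")` yields the subject `paper`, the keywords `john` and `enrico`, and one `and` operator.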
8. The search process
- Step 1: making sense of the user queries
- Step 2: translating user queries into formal queries
- Step 3: querying the back-end semantic data repository
- Step 4: ranking
9. Making sense of user queries
- Finding out the meaning of keywords
- Class, e.g., the keyword phd students
- Relation, e.g., author
- Instance, e.g., Enrico, KMi director
- Method: text search
- Labels (rdfs:label)
- Short literals are also used in the case of instance matching
- When searching for KMi director, the matching instances can be picked up.
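The text-search step can be sketched with a tiny in-memory inverted index. This is only an illustration under stated assumptions: the prototype is based on Lucene, whereas this stand-in uses plain Python sets, and the entities, URIs, and function names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Entity(NamedTuple):
    uri: str
    kind: str               # "class", "relation", or "instance"
    texts: Tuple[str, ...]  # rdfs:label values, plus short literals for instances


def build_index(entities: List[Entity]) -> Dict[str, set]:
    """Map each lower-cased token to the entities whose texts contain it."""
    index = defaultdict(set)
    for entity in entities:
        for text in entity.texts:
            for token in text.lower().split():
                index[token].add(entity)
    return index


def find_matches(index: Dict[str, set], keyword: str) -> List[Entity]:
    """Return entities whose indexed texts together contain every keyword token."""
    tokens = keyword.lower().split()
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits, key=lambda e: e.uri)


# Hypothetical sample data, for illustration only.
ENTITIES = [
    Entity("ex:PhD-Student", "class", ("phd students",)),
    Entity("ex:has-author", "relation", ("author",)),
    Entity("ex:enrico-motta", "instance", ("Enrico Motta", "KMi director")),
]
INDEX = build_index(ENTITIES)
```

With this data, the keyword `KMi director` is matched through the instance's short literal rather than its label. Exact token matching is a simplification; a real text search engine would also handle stemming and scoring.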
10. Translating user queries into formal queries
- Input: semantic entity matches of the search keywords
- Each keyword -> multiple matches
- Output: formal queries which reflect the user query
- One user query -> multiple formal queries
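Why one user query fans out into several formal queries can be seen by enumerating the ways of picking one match per keyword. A minimal sketch, assuming each combination leads to one formal query; the match identifiers are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def enumerate_combinations(matches: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """Return one tuple per way of choosing a single match for every keyword."""
    keywords = list(matches)  # dicts preserve insertion order
    return list(product(*(matches[k] for k in keywords)))


# Hypothetical matches for the query <news:phd students>.
MATCHES = {
    "news": ["class:News-Item"],
    "phd students": ["class:PhD-Student", "instance:phd-students-page"],
}
COMBINATIONS = enumerate_combinations(MATCHES)
```

Here one match for `news` and two for `phd students` give two combinations, so two candidate formal queries.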
11. Simple queries
- There are only two keywords involved: <subject:keyword>
- Fixed number of combination types
12. A template example
- Pattern: Subject -> Class Cs, Keyword -> Class Ck
- Results: <Is, Relation, Ik> associated with exploratory links.
- Example: news stories about phd students
- <news KMi success, mentions-person, Tom-Heath>
- A simplified template in Sesame SeRQL:
  select Is, R, Ik
  from {Is} rdf:type {Cs},
       {Ik} rdf:type {Ck},
       {Is} R {Ik}
  union
  select Is, R, Ik
  from {Is} rdf:type {Cs},
       {Ik} rdf:type {Ck},
       {Ik} R {Is}
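Filling the class-class template for two concrete classes might look like the sketch below. It assumes SeRQL's curly-brace node syntax and full URIs in angle brackets; the function name `class_class_query` and the example URIs are hypothetical.

```python
def class_class_query(subject_class: str, keyword_class: str) -> str:
    """Fill the class-class template with two class URIs and return SeRQL text."""

    def one_direction(first: str, second: str) -> str:
        # Doubled braces in the f-strings produce literal "{" and "}".
        return (
            "select Is, R, Ik "
            f"from {{Is}} rdf:type {{<{subject_class}>}}, "
            f"{{Ik}} rdf:type {{<{keyword_class}>}}, "
            f"{{{first}}} R {{{second}}}"
        )

    # The union covers relations pointing in either direction.
    return one_direction("Is", "Ik") + " union " + one_direction("Ik", "Is")


# Hypothetical class URIs for "news stories about phd students".
QUERY = class_class_query("http://example.org/News", "http://example.org/PhD-Student")
```

Both halves of the union share the same type constraints and differ only in the direction of the relation `R`.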
13. Complex queries
- subject:keyword1 and/or keyword2 and/or ...
- Instances of the subject which either have relations with all the keywords or have relations with some of the keywords.
- Operational problem: the number of combinations grows large when there are many keywords involved and there are lots of matches for each keyword.
- Rules for combination reduction
- Only considering the subject keyword as class entities
- Choosing the closest matches possible
- Choosing the most specific class matches among the class matches.
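The three reduction rules can be sketched as a filter over one keyword's matches. This is an illustration under assumptions, not the engine's actual logic: "closest" is approximated by keeping the `top_k` highest text-search scores, the class hierarchy is a simple child-to-parent map, and all names and URIs are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    uri: str
    kind: str     # "class", "relation", or "instance"
    score: float  # text-search score; higher means a closer match


def most_specific(classes: List[Match], parents: Dict[str, str]) -> List[Match]:
    """Drop any class that is an ancestor of another matched class."""
    ancestors = set()
    for match in classes:
        node = parents.get(match.uri)
        # Stop at the root, or at a node whose chain was already recorded.
        while node is not None and node not in ancestors:
            ancestors.add(node)
            node = parents.get(node)
    return [m for m in classes if m.uri not in ancestors]


def reduce_matches(matches, parents, is_subject=False, top_k=2):
    """Apply the three reduction rules to one keyword's matches."""
    if is_subject:
        # Rule 1: the subject keyword is only considered as a class.
        matches = [m for m in matches if m.kind == "class"]
    # Rule 2: keep only the closest matches (assumed here: top_k by score).
    matches = sorted(matches, key=lambda m: -m.score)[:top_k]
    # Rule 3: among class matches, keep only the most specific ones.
    classes = [m for m in matches if m.kind == "class"]
    keep = set(most_specific(classes, parents))
    return [m for m in matches if m.kind != "class" or m in keep]


# Hypothetical hierarchy and matches.
PARENTS = {"ex:PhD-Student": "ex:Student", "ex:Student": "ex:Person"}
MATCHES = [
    Match("ex:Student", "class", 0.9),
    Match("ex:PhD-Student", "class", 0.8),
    Match("ex:student-handbook", "instance", 0.7),
]
```

Applied to a subject keyword, these rules leave only `ex:PhD-Student`, since `ex:Student` is its ancestor and the instance match is discarded.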
14. Query construction
- Head block: what needs to be retrieved, i.e., <Is, r, Ikx>
- Body block: how to retrieve the triples
- Condition block: conditions that need to be satisfied
- The construction algorithm constructs queries by walking through all the appropriate matches.
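The three blocks can be pictured as a small container that is filled step by step and then joined into a query string. A minimal sketch, assuming SeRQL-style `select ... from ... where ...` output; the class name `QueryBlocks` and the example URI are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryBlocks:
    head: List[str] = field(default_factory=list)        # variables to retrieve
    body: List[str] = field(default_factory=list)        # path expressions
    conditions: List[str] = field(default_factory=list)  # constraints to satisfy

    def compose(self) -> str:
        """Join the blocks into one SeRQL-style query string."""
        query = f"select {', '.join(self.head)} from {', '.join(self.body)}"
        if self.conditions:
            query += " where " + " and ".join(self.conditions)
        return query


# Hypothetical blocks for a subject class Cs and one instance match.
BLOCKS = QueryBlocks(
    head=["Is", "r", "Ik1"],
    body=["{Is} rdf:type {Cs}", "{Is} r {Ik1}"],
    conditions=["Ik1 = <http://example.org/enrico-motta>"],
)
```

Keeping the blocks separate until the end lets each matched keyword contribute to the head, body, and conditions independently.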
15. Query construction algorithm
[Flowchart: Yes/No decision branches connect the following steps]
- Initializing the query blocks
- Adding query blocks for class-class relations retrieval
- Adding query blocks for class-property relations retrieval
- Adding blocks for class-instance relations retrieval
- Composing queries using the blocks
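One way to read these steps is as a loop over the keyword matches that adds a different kind of block depending on what each keyword matched. The sketch below is an assumption-laden illustration: the branching conditions are assumed to test the kind of matched entity, only one relation direction and "and" semantics are covered, and `build_query` and the URIs are hypothetical.

```python
from typing import List, Tuple


def build_query(subject_class: str, keyword_matches: List[Tuple[str, str]]) -> str:
    """Build one query for a subject class and one (kind, uri) match per keyword."""
    # Initialise the blocks with the subject pattern.
    head = ["Is"]
    body = [f"{{Is}} rdf:type {{<{subject_class}>}}"]
    conditions = []
    for i, (kind, uri) in enumerate(keyword_matches, start=1):
        if kind == "class":
            # Class-class: subject instances related to instances of the class.
            head += [f"R{i}", f"Ik{i}"]
            body += [f"{{Ik{i}}} rdf:type {{<{uri}>}}", f"{{Is}} R{i} {{Ik{i}}}"]
        elif kind == "property":
            # Class-property: subject instances that have a value for the property.
            head += [f"Ik{i}"]
            body += [f"{{Is}} <{uri}> {{Ik{i}}}"]
        elif kind == "instance":
            # Class-instance: subject instances related to one specific instance.
            head += [f"R{i}", f"Ik{i}"]
            body += [f"{{Is}} R{i} {{Ik{i}}}"]
            conditions += [f"Ik{i} = <{uri}>"]
        else:
            raise ValueError(f"unknown match kind: {kind}")
    # Compose the final query from the collected blocks.
    query = f"select {', '.join(head)} from {', '.join(body)}"
    if conditions:
        query += " where " + " and ".join(conditions)
    return query
```

A query such as paper:john and enrico, with both keywords matched as instances, would add two class-instance blocks and two conditions.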
16. Implementation
- Based on Lucene and Sesame
- The prototype has been applied in the KMi domain and the ESWC conference
- http://semanticweb.kmi.open.ac.uk/
- http://search.eswc06.org/
- An experimental evaluation has been carried out in the context of the KMi semantic web portal.
17. Simple query example
18. Refinement support
19. Complex query example
20. Conclusions
- A keyword-based semantic search engine has been developed
- Google-like query interface
- Supporting relatively complex queries
- Providing relatively quick response
- Future work
- Ranking
- Support for domain understanding
- Semantic matching
- Query refinement
21. Thanks for your attention!