Problems in Semantic Search - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Problems in Semantic Search


1
Problems in Semantic Search
Krishnamurthy Viswanathan and Varish Mulwad
{krishna3, varish1} AT umbc DOT edu
2
Agenda
  • Introduction
  • Swoogle
  • Cool things others do
  • Swoogle facts/figures
  • Our ideas
  • References

3
  • Why is Semantic Search significant?

4
Swoogle
  • Swoogle is a search engine for Semantic Web (SW)
    documents
  • It offers the following services:
  • Search SW ontologies and documents
  • Search SW terms, i.e. URIs that have been defined
    as classes and properties
  • Provide metadata of SW documents and support
    browsing the Semantic Web

5
Swoogle
  • Swoogle supports two relevant query types:
  • Ontology: searches a small collection that
    consists only of Semantic Web ontologies
  • Document: searches all SW documents. This search
    space is much larger
  • Swoogle indexes only the document's URL, the
    terms defined in the document, explicit
    descriptions of the document, and the
    namespaces used by the document

6
Swoogle capabilities
  • Web search
  • Basic metadata: e.g. url, desc, ns, etc.
  • Document metadata: hasEncoding, hasLength, etc.
  • RDF metadata: hasGrammar, hasCntTriple, etc.
  • Advanced search using Lucene features
  • REST-based services: compose an HTTP GET query
    and retrieve the results in the form of RDF/XML

7
Examples of REST queries
  • A query is represented as a URL
  • REST_QUERY = SERVICE_URI ? PARAMS
  • Example: search SW documents which are classified
    as ontologies (ontoRatio > 0)
  • queryType: e.g. search_swd_ontology
  • searchString: user constructed (see manual)
  • Key
  • http://logos.cs.umbc.edu:8080/swoogle31/q?queryType=search_swd_ontology&searchString=person&key=demo
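
A query URL of this shape can be assembled programmatically. The following is a minimal sketch using only the Python standard library. The helper name `build_swoogle_query` is hypothetical; the endpoint and the parameter names (`queryType`, `searchString`, `key`) are taken from the example on this slide.

```python
from urllib.parse import urlencode

# Service endpoint as it appears in the slide's example URL.
SERVICE_URI = "http://logos.cs.umbc.edu:8080/swoogle31/q"


def build_swoogle_query(query_type: str, search_string: str, key: str = "demo") -> str:
    """Compose SERVICE_URI ? PARAMS (hypothetical helper, not part of Swoogle)."""
    params = {
        "queryType": query_type,        # e.g. search_swd_ontology
        "searchString": search_string,  # user-constructed search string
        "key": key,                     # access key; "demo" is used on the slide
    }
    # urlencode joins the pairs with "&" and percent-encodes the values.
    return f"{SERVICE_URI}?{urlencode(params)}"


url = build_swoogle_query("search_swd_ontology", "person")
print(url)
```

Per the capabilities slide, an HTTP GET on such a URL is expected to return the results as RDF/XML.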

8
  • Cool things other semantic search engines do

9
Sindice
  • Sindice is a Semantic Web search engine created
    at the Digital Enterprise Research Institute (DERI)
  • Interesting things to note about Sindice
  • Architecture
  • Indexing

10
Sindice
  • Sindice uses the paradigms of cloud computing for
    its architecture
  • Sindice uses Hadoop / Nutch to distribute
    crawling across multiple machines
  • Collected data is stored in HBase, a
    distributed column store

11
Sindice
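  • Sindice indexes based on:
  • Inverse Functional Properties (IFP)
  • URIs
  • Literals (Keywords)
  • IFP: an OWL property whose value uniquely
    identifies its subject (e.g. foaf:mbox), comparable
    to a cardinality restriction on the inverse property
  • Benefits: faster retrieval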
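
The three kinds of lookup listed above can be illustrated with a small in-memory index that maps each key to the documents mentioning it. This is only a sketch under stated assumptions, not Sindice's actual implementation: the class `LookupIndex`, its methods, and the example documents are all hypothetical, and triples are simplified to tuples with a flag marking literal objects.

```python
from collections import defaultdict

FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"  # declared inverse functional in FOAF
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


class LookupIndex:
    """Hypothetical index: key -> set of URLs of documents that mention the key."""

    def __init__(self, ifps):
        self.ifps = set(ifps)                # properties treated as inverse functional
        self.by_uri = defaultdict(set)       # resource URI -> documents
        self.by_ifp = defaultdict(set)       # (property, value) -> documents
        self.by_keyword = defaultdict(set)   # lowercase word -> documents

    def add_document(self, doc_url, triples):
        # Each triple is (subject, predicate, object, object_is_literal).
        for s, p, o, o_is_literal in triples:
            self.by_uri[s].add(doc_url)
            if p in self.ifps:
                self.by_ifp[(p, o)].add(doc_url)
            if o_is_literal:
                for word in o.lower().split():
                    self.by_keyword[word].add(doc_url)
            else:
                self.by_uri[o].add(doc_url)

    def lookup_ifp(self, prop, value):
        return set(self.by_ifp.get((prop, value), ()))

    def lookup_keyword(self, word):
        return set(self.by_keyword.get(word.lower(), ()))


index = LookupIndex(ifps=[FOAF_MBOX])
index.add_document("http://example.org/a.rdf", [
    ("http://example.org/a#me", FOAF_MBOX, "mailto:alice@example.org", False),
    ("http://example.org/a#me", FOAF_NAME, "Alice Smith", True),
])
index.add_document("http://example.org/b.rdf", [
    ("http://example.org/b#alice", FOAF_MBOX, "mailto:alice@example.org", False),
])

# Both documents describe the same person under different URIs;
# the shared IFP value is what ties them together.
same_person_docs = index.lookup_ifp(FOAF_MBOX, "mailto:alice@example.org")
print(sorted(same_person_docs))
```

Because every lookup is a single dictionary access, retrieval cost does not grow with the number of indexed documents, which is one way to read the "faster retrieval" benefit.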

12
Watson: A Gateway to the Semantic Web
  • From the Knowledge Media Institute at the
    Open University in the UK
  • Interesting things to note about Watson
  • Considers implicit semantic relationships
  • Quality of semantic documents
  • Rich access to semantic data

13
Watson
  • Implicit relationships between semantic web
    documents
  • Equivalence (Duplicate detection)
  • Quality of Semantic Documents
  • Richer access to Semantic Data
  • Web Interface for Humans
  • SPARQL endpoint
  • Java/SOAP and REST APIs

14
Others
  • Semantic Web Search Engine (SWSE)
  • Pipelined architecture for crawling and indexing
  • Improved index and storage structure
  • Falcons
  • Class subsumption reasoning
  • Includes a Triple Store
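
Class subsumption reasoning, as listed for Falcons above, means that a query restricted to a class should also match objects typed with any of its subclasses. The sketch below illustrates the idea only and is not Falcons' implementation; the function names, class names, and instance data are hypothetical.

```python
from collections import defaultdict


def all_subclasses(sub_class_of, root):
    """Return `root` plus every class that is transitively a subclass of it.

    `sub_class_of` is an iterable of (subclass, superclass) pairs,
    i.e. simplified rdfs:subClassOf statements.
    """
    children = defaultdict(set)
    for sub, sup in sub_class_of:
        children[sup].add(sub)

    seen = {root}
    stack = [root]
    while stack:
        cls = stack.pop()
        for child in children.get(cls, ()):
            if child not in seen:  # also guards against cycles
                seen.add(child)
                stack.append(child)
    return seen


def instances_of(types, sub_class_of, cls):
    """Objects whose asserted type is `cls` or any subclass of `cls`."""
    wanted = all_subclasses(sub_class_of, cls)
    return {obj for obj, t in types if t in wanted}


# Hypothetical class hierarchy and rdf:type assertions.
hierarchy = [
    ("Student", "Person"),
    ("GradStudent", "Student"),
    ("Professor", "Person"),
]
types = [
    ("alice", "GradStudent"),
    ("bob", "Professor"),
    ("cs101", "Course"),
]

people = instances_of(types, hierarchy, "Person")
print(sorted(people))
```

Here a search for Person finds alice and bob even though neither is typed as Person directly.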

15
PowerAqua
  • Multi-ontology-based QA system powered by
    PowerMap and Watson
  • Takes inputs in the form of NL queries
  • Factual queries that can be expressed as one or
    more linguistic triples
  • Common wh-questions

16
PowerAqua
  • Key challenges in answering
    NL questions
  • Locating the ontologies relevant to a particular
    query
  • Identifying semantically sound relationships
  • Combining information from multiple queries

17
Swoogle facts/figures
  • The search engine components currently run on 4
    machines
  • These machines host the crawler, the Lucene
    index, the MySQL database, etc., and access files over NFS
  • Approximately 20,000 pages are accessed by
    Swoogle every day (which get queued)
  • About 1,731,371 pure SW documents have been
    discovered

18
Swoogle facts/figures
  • The Swoogle crawler has a large queue of documents to
    be crawled and indexed
  • Swoogle accesses metadata and index files over
    NFS, which makes information retrieval slower

19
Our Ideas: Research and Engineering
  • Acquire new hardware
  • Parallelize Swoogle
  • Focus on a particular domain
  • Project Swoogle as a search engine for agents

20
Our Ideas: Research and Engineering
  • Improve Swoogle's indexing scheme
  • Analyze Swoogle's ranking scheme
  • Use of Swoogle metadata
  • Improve the usability of the website
  • Google-like services

21
References
  • Li Ding et al., "Swoogle: A Search and Metadata
    Engine for the Semantic Web", Proceedings of the
    Thirteenth ACM Conference on Information and
    Knowledge Management, November 2004.
  • P. Mika, G. Tummarello, "Web Semantics in the
    Clouds", IEEE Intelligent Systems, Volume 23,
    Issue 5 (September 2008).
  • E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H.
    Stenzhorn, G. Tummarello, "Sindice.com: A
    document-oriented lookup index for open linked
    data", International Journal of Metadata,
    Semantics and Ontologies, 3(1), 2008.
  • Mathieu d'Aquin et al., "Watson: A Gateway for
    the Semantic Web", Poster session of the European
    Semantic Web Conference, ESWC 2007.
  • Gong Cheng, Weiyi Ge, Honghan Wu, Yuzhong Qu,
    "Searching Semantic Web Objects Based on Class
    Hierarchies", WWW 2008 Workshop on Linked Data
    on the Web, 2008.

22
  • Questions?
