The Lucene Search Engine: Powerful, Flexible - PowerPoint PPT Presentation


Provided by: Ani546


Transcript and Presenter's Notes

Title: The Lucene Search Engine: Powerful, Flexible

1
The Lucene Search Engine: Powerful, Flexible
FREE!!

2
Introduction

Lucene is an open-source API maintained by the
Apache Software Foundation's Jakarta Project.
It is implemented in Java, with ports for
C, .NET, Perl and Python.
Because it is written in a modular fashion, it
gives a developer a tremendous amount of freedom
to decide how to use it to suit an application.
The deciding factor for a search engine is its
effectiveness, which is measured by two factors:
Recall (loosely, accuracy): The percentage of the
relevant documents in the collection that are
actually returned for a particular query.
Precision: The percentage of the returned
documents that are actually relevant to the
particular query.
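In information retrieval the two measures above are usually called recall and precision. The minimal sketch below computes both from sets of document IDs. The class name RetrievalMetrics, the sample IDs and the choice to return 0.0 for an empty set are assumptions made for illustration; it uses Set.of, so it needs Java 9 or later.

```java
import java.util.HashSet;
import java.util.Set;

public class RetrievalMetrics {
    // Precision: fraction of the retrieved documents that are relevant.
    static double precision(Set<String> relevant, Set<String> retrieved) {
        if (retrieved.isEmpty()) return 0.0; // convention chosen for this sketch
        return (double) overlap(relevant, retrieved) / retrieved.size();
    }

    // Recall: fraction of the relevant documents that were retrieved.
    static double recall(Set<String> relevant, Set<String> retrieved) {
        if (relevant.isEmpty()) return 0.0; // convention chosen for this sketch
        return (double) overlap(relevant, retrieved) / relevant.size();
    }

    // Number of documents that are both relevant and retrieved.
    private static int overlap(Set<String> relevant, Set<String> retrieved) {
        Set<String> both = new HashSet<>(retrieved);
        both.retainAll(relevant);
        return both.size();
    }

    public static void main(String[] args) {
        // Hypothetical judgments: 4 relevant docs; the engine returned 3, of which 2 are relevant.
        Set<String> relevant = Set.of("d1", "d2", "d3", "d4");
        Set<String> retrieved = Set.of("d1", "d2", "d9");
        System.out.println("precision = " + precision(relevant, retrieved)); // 2 of 3 retrieved
        System.out.println("recall    = " + recall(relevant, retrieved));    // 2 of 4 relevant = 0.5
    }
}
```

The two measures pull against each other: returning more documents tends to raise recall and lower precision, which is why an engine is judged on both.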

3
Search Engine Concepts

There are two paths through a search engine: the
index path and the query path.
The index path shows how the index gets filled
with documents.
The documents are fed to an analyzer, which
breaks their text into terms; the IndexWriter
then stores those terms in the index.

The query path shows how the index is queried
for documents.
The same analyzer is applied to the user's query
to derive a set of terms, which are in turn
passed to the IndexSearcher to search the index.
Because an index rarely holds the entire
document, the search returns a set of Hits, where
each hit represents what is retained about a
document within the index.
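The two paths can be illustrated with a toy in-memory inverted index. This is not Lucene's implementation: the class TinySearchEngine, its lowercase-and-split analyzer and its AND-only search are assumptions made for the sketch. It does show the slide's two key points: the same analyze method runs on both the index path and the query path, and a hit returns only what the index retained (here a title), not the full document.

```java
import java.util.*;

public class TinySearchEngine {
    // Toy stand-in for an Analyzer: lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // The index: term -> ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // What the index retains about each document: only a title, not the body.
    private final List<String> storedTitles = new ArrayList<>();

    // Index path: analyze the document body and record each term.
    int addDocument(String title, String body) {
        int id = storedTitles.size();
        storedTitles.add(title);
        for (String term : analyze(body)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Query path: run the SAME analyzer over the query, then intersect postings (AND).
    List<String> search(String query) {
        Set<Integer> result = null;
        for (String term : analyze(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        List<String> hits = new ArrayList<>();
        if (result != null) {
            for (int id : result) hits.add(storedTitles.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        TinySearchEngine engine = new TinySearchEngine();
        engine.addDocument("doc-a", "Lucene is a Java search library");
        engine.addDocument("doc-b", "Python ports of Lucene exist");
        System.out.println(engine.search("LUCENE java")); // [doc-a]
    }
}
```

Because the query "LUCENE java" passes through the same analyzer, it matches the lowercased terms stored at index time; analyzing the two paths differently is a common cause of missed hits.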

5
Understanding indexing strategies

Original text:
When the user starts the application for the
first time, Scishare will use the default
settings that identify the user called PSEUDO
USER. Pseudo users are provided with
automatically generated X.509 certificates and
have access to public resources.
After removing stop words:
user starts application first time, Scishare use
default settings identify user called PSEUDO
USER. Pseudo users provided automatically
generated X.509 certificates access public
resources.
After stemming:
user start application first time, Scishare use
default setting identify user called PSEUDO
USER. Pseudo user provided automatically
generated X.509 certificate access public
resource.
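A rough sketch of the two steps shown above, chained the way an analyzer chains them. The stop list is hypothetical and contains only the words dropped in the example. The "stemmer" is a crude suffix stripper (ies to y, drop a plural s), far simpler than a real algorithm such as Porter's. Everything is lowercased, as Lucene's StandardAnalyzer also does. The class name TextAnalysis is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TextAnalysis {
    // Hypothetical stop list: just the function words dropped in the example above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "when", "the", "for", "will", "that", "are", "with", "and", "have", "to"));

    // Lowercase, split on anything other than letters, digits and '.', then trim
    // leading/trailing dots so "X.509" survives but "resources." becomes "resources".
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9.]+")) {
            String token = raw.replaceAll("^\\.+|\\.+$", "");
            if (!token.isEmpty()) tokens.add(token);
        }
        return tokens;
    }

    // Crude suffix stripper (NOT the Porter algorithm): "ies" -> "y", drop a plural "s".
    static String stem(String word) {
        if (word.length() > 4 && word.endsWith("ies")) {
            return word.substring(0, word.length() - 3) + "y";
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Analysis chain: tokenize -> remove stop words -> stem.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (!STOP_WORDS.contains(token)) terms.add(stem(token));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Pseudo users are provided with automatically "
                + "generated X.509 certificates and have access to public resources."));
    }
}
```

Run on the second sentence of the example, it yields [pseudo, user, provided, automatically, generated, x.509, certificate, access, public, resource]. Both steps shrink the set of distinct terms the index must hold, at the cost of no longer distinguishing, say, "user" from "users" at search time.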

7
Lucene

Lucene provides a set of classes that an
application uses according to its requirements.
You start by indexing your documents. To index
documents, you write a method that performs the
following steps:
Gather a list of files to be indexed.
Create a Document object for each file, reading
its contents through an InputStream.
Create an instance of Analyzer. This could be the
included StandardAnalyzer or one as sophisticated
as you can make it.
Create an IndexWriter with the following: a
location for the index, the Analyzer instance
just created, and a flag that tells it whether to
create a new index (true, overwriting any existing
one) or add to an existing one (false).
Add the Document objects to the IndexWriter using
the addDocument method.

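import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.demo.FileDocument; // from the Lucene demo sources
import org.apache.lucene.index.IndexWriter;

// Lucene 1.x-era API: IndexWriter(String path, Analyzer a, boolean create)
private void indexFiles() throws Exception {
    // create = true: build a new index in lib/indexInfo, replacing any old one
    IndexWriter writer = new IndexWriter("lib/indexInfo",
            new StandardAnalyzer(), true);
    indexDocs(writer, new File("lib/parseDoc"));
    writer.optimize(); // merge index segments for faster searching
    writer.close();
}

// Recursively add every regular file under 'file' to the index
public static void indexDocs(IndexWriter writer, File file)
        throws Exception {
    if (file.isDirectory()) {
        String[] files = file.list();
        if (files == null) return; // directory could not be read
        for (int i = 0; i < files.length; i++) {
            indexDocs(writer, new File(file, files[i]));
        }
    } else {
        writer.addDocument(FileDocument.Document(file));
    }
}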

9
Queries

Create a QueryParser by passing in the default
field to search (as a String) and an instance of
Analyzer.
Call parse on the QueryParser to get a Query
object.
Initialize an IndexSearcher with the location of
the index you wish to search.
Pass the Query object into the search method of
the IndexSearcher, which returns a Hits object:
a ranked list of the matching Document objects.

The built-in query parser supports most queries,
but if it is insufficient, you can always fall
back on the query-building classes Lucene
provides. The query parser can parse queries like
these:
free AND "text search": Search for documents
containing "free" and the phrase "text search".
+text search: Search for documents containing
"text" and preferentially containing "search".
giants -football: Search for "giants" but omit
documents containing "football".
author:pillai java: Search for documents
containing "pillai" in the author field and
"java" in the body.
Lucene also lets you write your own Analyzer to
accommodate the sophistication desired for a
particular application.
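To make the required/prohibited/optional semantics of the query examples concrete, here is a toy evaluator over an in-memory map of documents. It is not Lucene's QueryParser or scoring: the class ToyQuery, its score (the total count of query terms in a document) and its limits (no phrases, AND/OR keywords or field:term clauses) are all assumptions of this sketch. It reads "+term" as required, "-term" as prohibited, and a bare term as optional, where an optional term only raises a document's rank.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ToyQuery {
    // Returns matching document ids, highest score first (ties broken by id).
    static List<String> search(Map<String, String> docs, String query) {
        List<String> required = new ArrayList<>();
        List<String> prohibited = new ArrayList<>();
        List<String> optional = new ArrayList<>();
        for (String clause : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (clause.length() > 1 && clause.startsWith("+")) required.add(clause.substring(1));
            else if (clause.length() > 1 && clause.startsWith("-")) prohibited.add(clause.substring(1));
            else if (!clause.isEmpty()) optional.add(clause);
        }

        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            List<String> terms = Arrays.asList(
                    doc.getValue().toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
            if (!terms.containsAll(required)) continue;                  // "+term" must appear
            if (prohibited.stream().anyMatch(terms::contains)) continue; // "-term" must not
            int score = 0; // total occurrences of required and optional terms
            for (String t : required) score += Collections.frequency(terms, t);
            for (String t : optional) score += Collections.frequency(terms, t);
            // With no required clause, a document must match at least one optional term.
            if (required.isEmpty() && score == 0) continue;
            scores.put(doc.getKey(), score);
        }

        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a)); // higher score first
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        docs.put("d1", "The giants won the football game");
        docs.put("d2", "Giants of science, giants of search");
        docs.put("d3", "Full text search with Lucene");
        docs.put("d4", "Text processing tools");
        System.out.println(search(docs, "giants -football")); // [d2]
        System.out.println(search(docs, "+text search"));     // [d3, d4]
    }
}
```

For "+text search", both d3 and d4 contain the required "text", but d3 ranks first because it also contains the optional "search", which matches the "preferentially containing" behavior described for that query.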

12
Applications

13
Conclusion

The primary goal for Lucene is "simplicity
without loss of power or performance."
Lucene's design leaves the user in charge of the
functions he or she needs to know about
(selecting and retrieving documents, storing the
index data) and hides the details of how the
underlying search engine works.
Because Lucene is an API, it can be used
effectively to index an e-mail inbox, a
database, or a set of news feeds. Its
applications are limited only by how you choose
to use it.
http://jakarta.apache.org/lucene/docs/index.html

14
THANK YOU
