A Specialised Search Engine for Neuroscience WebPages

1
A Specialised Search Engine for Neuroscience
WebPages
NeuroSearch
  • Fatma Y. ELDRESI (MPhil)
  • Systems Analysis / Programming Specialist, AGOCO
  • Part-time lecturer at the University of Garyounis
  • Fatmaeldresi_at_hotmail.com

2
Contents
Introduction
Components of NeuroSearch and its Architecture
Implementation
Software lifecycle: (1) WebCrawler Engine, (2) Indexer Engine,
(3) Query Engine, (4) Re-Crawler Engine (Specialised Crawler)
Challenges
Testing
Conclusions
3
Introduction
What is a Search Engine?
  • A server or a collection of servers dedicated to
    indexing internet web pages, storing the results
    and returning lists of pages that match
    particular queries.
  • Conventional search engines generate their indexes
    in different ways:
  • Google uses a spider
  • Yahoo uses a directory
  • NeuroSearch uses a spider and the Advance Knowledge
4
Introduction cont..
Defining the problem
  • Why is a specialised search engine needed?
  • The Web has no centralised organisation and holds
    a huge, mixed collection of information
  • It is updated continuously, without a standard format
  • Pages are extensively linked

Therefore, establishing standard measures for
relevance is a very challenging task.
  • In addition,
  • (1) users have difficulty choosing the
    relevant keywords
  • (2) professionals sometimes fail in their search
    and get disappointing results, because the
    retrieved pages are sometimes unrelated to, or
    different from, what they are looking for.

The Objective
  • Create a specialised search engine (i.e. one using
    Advance Knowledge) to read web documents
  • Index and update all the content on the local
    server
  • Answer the queries from the local database
  • Update the system at a constant interval
5
Components of NeuroSearch
It has two components: (1) Search/Crawler Engine and
(2) Query Engine.
6
Components explained
(Diagram labels: Query Engine, Crawler Engine)
7
NeuroSearch Architecture Model
8
Implementation and Case Study
  • Creating the database using Access DB.
  • Implementing all parts of NeuroSearch using the
    Java language and SQL.

9
NeuroSearch Database
The Advance Knowledge
10
The Advance Knowledge Case Study: Neuroscience
(Vision)
NeuroSearch uses advance knowledge about
Neuroscience (Vision) as a case study.
Data mining is then applied to the domain knowledge
of Vision to construct the keywords and the
relations between them.
This knowledge is stored in the database and
categorised by number. Related knowledge is
categorised in the same way and stored in the
database in the form of a data network.
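
The keyword network described on this slide could be modelled roughly as follows. This is a minimal in-memory sketch, not the NeuroSearch schema: the class name, the method names and the category numbers are all hypothetical, and the real system keeps this data in Access tables rather than in Java maps.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the advance-knowledge tables.
class AdvanceKnowledgeSketch {
    // keyword -> category number (the numbers are illustrative only)
    private final Map<String, Integer> category = new LinkedHashMap<>();
    // keyword -> related keywords (undirected edges of the "data network")
    private final Map<String, Set<String>> related = new LinkedHashMap<>();

    void addKeyword(String keyword, int categoryNumber) {
        String key = keyword.toLowerCase();
        category.put(key, categoryNumber);
        related.computeIfAbsent(key, k -> new LinkedHashSet<>());
    }

    // Links two already-known keywords in both directions.
    void relate(String a, String b) {
        String ka = a.toLowerCase();
        String kb = b.toLowerCase();
        if (!category.containsKey(ka) || !category.containsKey(kb)) {
            throw new IllegalArgumentException("both keywords must be added first");
        }
        related.get(ka).add(kb);
        related.get(kb).add(ka);
    }

    boolean isKnown(String keyword) {
        return category.containsKey(keyword.toLowerCase());
    }

    Integer categoryOf(String keyword) {
        return category.get(keyword.toLowerCase());
    }

    Set<String> relatedTo(String keyword) {
        Set<String> found = related.get(keyword.toLowerCase());
        return found == null ? Collections.<String>emptySet() : found;
    }

    public static void main(String[] args) {
        AdvanceKnowledgeSketch knowledge = new AdvanceKnowledgeSketch();
        knowledge.addKeyword("eye", 1);
        knowledge.addKeyword("retina", 1);
        knowledge.addKeyword("visual cortex", 2);
        knowledge.relate("eye", "retina");
        knowledge.relate("retina", "visual cortex");
        System.out.println(knowledge.relatedTo("retina"));
    }
}
```

Storing relations in both directions keeps the "related keyword" lookup a single map access, at the cost of writing each relation twice.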
11
Software lifecycle
Crawler Engine
Consists of: (1) WebCrawler/Spider Engine, (2) Indexer
Engine, (3) Re-Crawler (specialised)
12
WebCrawler (Spider)
  • 1) This is a general web crawler that can
    download any kind of web page. It works as follows.
  • 2) It fetches a URL, retrieves all its web pages
    and saves them on the local drive.
  • 3) In addition, the WebCrawler has to get through
    the proxy firewall (i.e. on the Newcastle
    University LAN) before downloading any web site.
  • 4) The crawler performs a breadth-first search,
    which means it collects a list of all the links
    that are on the current page before it follows
    any of the links to a new page.
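
The breadth-first order described on this slide can be sketched as below. This is an illustration under stated assumptions rather than the NeuroSearch crawler itself: `fetchLinks` is a hypothetical hook standing in for "download the page and extract its links", and `maxPages` is an added safety limit that the slides do not mention.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

class BreadthFirstCrawlSketch {
    // Visits pages level by level: every link on the current page is queued
    // before any of those links is followed.
    static List<String> crawl(String seed,
                              Function<String, List<String>> fetchLinks,
                              int maxPages) {
        List<String> visitOrder = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visitOrder.size() < maxPages) {
            String url = frontier.remove();
            visitOrder.add(url); // a real crawler would save the page here
            for (String link : fetchLinks.apply(url)) {
                if (seen.add(link)) { // add() is false for already-seen URLs
                    frontier.add(link);
                }
            }
        }
        return visitOrder;
    }

    public static void main(String[] args) {
        // Tiny hypothetical link graph used instead of real downloads.
        Map<String, List<String>> web = new HashMap<>();
        web.put("a", Arrays.asList("b", "c"));
        web.put("b", Arrays.asList("d"));
        web.put("c", Arrays.asList("d", "a"));
        List<String> order = crawl("a",
                url -> web.getOrDefault(url, Collections.<String>emptyList()), 10);
        System.out.println(order);
    }
}
```

The `seen` set is what stops the crawler from looping when pages link back to each other, which matters because web pages are extensively interlinked.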
13
WebCrawler - real challenge
Challenge 1: connecting to the WWW and accessing
private websites.
Solution 1: the crawler has to let its socket
connect to the proxy server first.
Challenge 2: connecting this socket onward to the
WWW.
Solution 2: with a direct socket, the GET method
only needs the file name. In this case, however,
the GET command has to take the full URL.
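
The difference between a direct GET and a GET sent through a proxy can be sketched as follows. This is a hedged illustration, not the original code: the class and method names are hypothetical, HTTP/1.0 is an assumed protocol version, and the URL is assumed to carry no user-info part.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class ProxyRequestSketch {
    // Builds the request text written to the socket.
    static String buildGet(URI url, boolean viaProxy) {
        String path = url.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (url.getRawQuery() != null) {
            path += "?" + url.getRawQuery();
        }
        // Direct connection: path only. Through a proxy: the full URL,
        // so the proxy knows which server to contact.
        String target = viaProxy
                ? url.getScheme() + "://" + url.getRawAuthority() + path
                : path;
        return "GET " + target + " HTTP/1.0\r\n"
                + "Host: " + url.getRawAuthority() + "\r\n"
                + "\r\n";
    }

    // Opens a socket to connectHost (the proxy, or the web server itself),
    // sends the request and returns the raw response including headers.
    static String fetch(String connectHost, int connectPort, URI url, boolean viaProxy)
            throws IOException {
        try (Socket socket = new Socket(connectHost, connectPort)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildGet(url, viaProxy).getBytes(StandardCharsets.ISO_8859_1));
            out.flush();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                response.write(buffer, 0, n);
            }
            return new String(response.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        URI url = URI.create("http://example.org/vision/index.html");
        System.out.print(buildGet(url, false)); // request line uses the path only
        System.out.print(buildGet(url, true));  // request line uses the full URL
    }
}
```

When going through a proxy, `fetch` would be called with the proxy's host and port and `viaProxy` set to true. The proxy address itself is site-specific and is not given in the slides.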
14
Indexer Engine
1) First, it searches the web page using its
advance knowledge. The web page is deleted
if it is not related to the case study subject.
2) If it is related to the case study subject
(neuroscience), the indexer collects the
following information from the document:
3) all the keywords it contains, how many times
each is repeated, the title and the contents. It
then saves them in the database for later display
in the query result and for other calculations.
4) The Ranking Method
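
The keyword-counting step of the indexer can be sketched as below. It is a simplified illustration under assumptions: matching is plain case-insensitive substring search (so "eye" also matches "eyes"), a page counts as related if it contains at least one known keyword, and all names are hypothetical. The ranking method itself is not described in the slides and is not reproduced here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class IndexerSketch {
    // Counts non-overlapping, case-insensitive occurrences of each known
    // keyword in the page text. Keywords absent from the page are omitted.
    static Map<String, Integer> countKeywords(String pageText, List<String> knownKeywords) {
        String text = pageText.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : knownKeywords) {
            String k = keyword.toLowerCase(Locale.ROOT);
            if (k.isEmpty()) {
                continue;
            }
            int count = 0;
            int from = text.indexOf(k);
            while (from >= 0) {
                count++;
                from = text.indexOf(k, from + k.length());
            }
            if (count > 0) {
                counts.put(k, count);
            }
        }
        return counts;
    }

    // Assumed relevance rule: keep the page if any known keyword occurs.
    static boolean isRelated(Map<String, Integer> counts) {
        return !counts.isEmpty();
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("eye", "retina", "cortex");
        Map<String, Integer> counts =
                countKeywords("The eye and the Retina. Eye movements.", keywords);
        System.out.println(counts + " related=" + isRelated(counts));
    }
}
```

A page for which `isRelated` returns false would be the one the indexer deletes; for a related page, the counts are what would be stored alongside its title and contents.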
15
Query Engine
It has an interface to accept keywords from the
user.
It searches for the query keywords in the index
database and returns the result in HTML format.
It gives the user two choices: display only
the most relevant results, or the whole result
set, which includes the related results.
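
The two display choices can be sketched as follows. This is an in-memory illustration only: the real Query Engine reads the index database with SQL, the `Page` class and method names are hypothetical, and ordering direct matches by repeat count is an assumption, since the slides do not spell out the ranking method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

class QuerySketch {
    // Hypothetical index row: one crawled page with its keyword repeat counts.
    static final class Page {
        final String url;
        final String title;
        final Map<String, Integer> counts;

        Page(String url, String title, Map<String, Integer> counts) {
            this.url = url;
            this.title = title;
            this.counts = counts;
        }
    }

    // includeRelated=false: only pages containing the query keyword.
    // includeRelated=true: those pages first, then pages that contain
    // only a related keyword.
    static List<Page> search(List<Page> index, String keyword,
                             Set<String> relatedKeywords, boolean includeRelated) {
        String k = keyword.toLowerCase();
        List<Page> direct = new ArrayList<>();
        List<Page> relatedOnly = new ArrayList<>();
        for (Page p : index) {
            if (p.counts.containsKey(k)) {
                direct.add(p);
            } else if (includeRelated && containsAny(p, relatedKeywords)) {
                relatedOnly.add(p);
            }
        }
        // Assumed ordering: most repetitions of the query keyword first.
        direct.sort(Comparator.comparingInt((Page p) -> p.counts.get(k)).reversed());
        direct.addAll(relatedOnly);
        return direct;
    }

    private static boolean containsAny(Page p, Set<String> keywords) {
        for (String k : keywords) {
            if (p.counts.containsKey(k)) {
                return true;
            }
        }
        return false;
    }

    static String toHtml(List<Page> results) {
        StringBuilder html = new StringBuilder("<ul>\n");
        for (Page p : results) {
            html.append("  <li><a href=\"").append(escape(p.url)).append("\">")
                .append(escape(p.title)).append("</a></li>\n");
        }
        return html.append("</ul>\n").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        List<Page> index = Arrays.asList(
                new Page("http://a.example/", "A", Collections.singletonMap("eye", 3)),
                new Page("http://c.example/", "C", Collections.singletonMap("retina", 2)));
        System.out.print(toHtml(search(index, "eye", Collections.singleton("retina"), true)));
    }
}
```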
16
Query Result: this is indeed an edge over
other conventional search engines
17
Re-Crawling
1) The WebCrawler is specialised for any subject
created in the advance knowledge in the database.
It achieves this by reading the URLs from the
index database using SQL.
2) Its interface allows special users to decide
whether to continue crawling a website or cancel it.
3) This part of the software aims to update the
index with newly found links. This makes it
easier to search and crawl websites related to
any advance knowledge subject.
18
Testing phase
  • The test phase requires comparing the first 10
    ranked results of each NeuroSearch query with the
    first 10 results of the same query in another
    search engine, such as Google.
  • Keyword categories: general keywords, specific
    keywords, abbreviation keywords, combined keywords,
    abbreviation combined keywords
  • 20 tests for each category
  • Total of 1000 tests
19
Testing cont..
Ranking query test results in General Keywords

First 10 | Google                  | NeuroSearch
results  | Rank  Keyword  Repeated | Rank  Keyword  Repeated  Related-keyword  Repeated  Quality/percentage
---------+-------------------------+---------------------------------------------------------------------
1        | 0     0        0        | 10    1        3         53               3         37
2        | 10    1        3        | 10    1        3         51               3         27
3        | 0     0        0        | 10    1        3         37               3         36
4        | 0     0        0        | 10    1        3         37               3         33.6
5        | 0     0        0        | 10    1        3         34               3         36.7
6        | 0     0        0        | 10    1        3         29               3         38.4
7        | 0     0        0        | 10    1        3         28               3         38.1
8        | 0     0        0        | 10    1        3         28               3         38
9        | 0     0        0        | 10    1        3         28               3         24.9
10       | 0     0        0        | 10    1        3         28               3         13.8
Average 10 10 100 100

Table 1 (Query 1): Ranking query test result in
General Keywords (Eye)
20
Testing cont..
Chart 1: Average keyword performance in
category-based test results (Google)
Chart 2: Average keyword performance in
category-based test results (NeuroSearch)
21
Analysing the search engines' ranking results
by category

Table 4: Average ranking performance of the engines
in category-based query test results
22
Analysing the average ranking performance of the
engines in category-based query test results
t-test: result analysis
The t-test is used to compare two groups' scores on the same variable (p value < .05). The result indicates that NeuroSearch has a statistically significantly higher mean score in the ranking results across all categories (100) than Google (52.35). The negative t values reflect the direction of the difference between the two means, with Google scoring lower than NeuroSearch.
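
For readers unfamiliar with the statistic, a two-sample t value can be computed as sketched below. The slides do not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the input arrays are hypothetical numbers chosen only to make the arithmetic easy to follow. They are not the NeuroSearch or Google scores.

```java
class TTestSketch {
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Unbiased sample variance (divides by n - 1); needs at least 2 values.
    static double sampleVariance(double[] x) {
        double m = mean(x);
        double sumSq = 0.0;
        for (double v : x) {
            sumSq += (v - m) * (v - m);
        }
        return sumSq / (x.length - 1);
    }

    // Welch's t statistic: (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b).
    static double welchT(double[] a, double[] b) {
        double standardError = Math.sqrt(
                sampleVariance(a) / a.length + sampleVariance(b) / b.length);
        return (mean(a) - mean(b)) / standardError;
    }

    public static void main(String[] args) {
        double[] lower = {1, 2, 3};  // hypothetical scores
        double[] higher = {2, 4, 6}; // hypothetical scores
        System.out.println(welchT(lower, higher)); // negative: first mean is smaller
        System.out.println(welchT(higher, lower)); // same magnitude, positive
    }
}
```

Swapping the two arguments flips the sign of the result without changing its magnitude, so the sign only records which group was subtracted from which.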
23
Visual representation
Chart 3: Average category-based ranking
performance of the engines
Chart 4: Average keyword-based performance of the
engines in the documents of the category-based
query test results
24
Conclusion
Although NeuroSearch uses a simple algorithm to
judge page quality compared with other conventional
search engines, it proves to be very powerful in
obtaining relevant results, particularly if its
advance knowledge is built by a specialist (domain
knowledge), e.g. oil, medicine, arts, etc.
25
Reference (example..)
  • Wandell, Brian A. Foundations of Vision.
    Sunderland, Massachusetts, USA, 1995.
  • Brin, S. and L. Page. The Anatomy of a
    Large-Scale Hypertextual Web Search Engine.
    Seventh International World Wide Web Conference;
    Computer Science Department, Stanford University,
    Stanford, CA 94305, USA, 1998.

26
Thank You !
Ready for Questions!!!