Title: A Specialised Search Engine for Neuroscience WebPages
1A Specialised Search Engine for Neuroscience
WebPages
NeuroSearch
- Fatma Y. ELDRESI (MPhil)
- Systems Analysis / Programming Specialist, AGOCO
- Part-time lecturer at the University of Garyounis
- Fatmaeldresi_at_hotmail.com
2Contents
Introduction
Components of NeuroSearch and its Architecture
Implementation
Software lifecycle: (1) WebCrawler Engine, (2) Indexer Engine, (3) Query Engine, (4) Re-Crawler Engine (Specialised Crawler)
Challenges
Testing
Conclusions
3Introduction
What is a Search Engine?
- A server or a collection of servers dedicated to indexing internet web pages, storing the results and returning lists of pages which match particular queries.
- Conventional search engines generate indexes
- Google, using a spider
- Yahoo, using a directory
- NeuroSearch, using a spider together with the advance knowledge
4Introduction cont..
Defining the problem
- Why is a specialised search engine needed?
- The Web has no centralised organisation and holds a huge, mixed collection of information
- It is updated continuously, without a standard format
- Pages are extensively linked
Therefore, establishing standard measures for relevance is a very challenging task
- In addition,
- (1) users have difficulty choosing the relevant keywords
- (2) professionals sometimes fail in their search and get disappointing results, because the retrieved pages are sometimes unrelated to, or different from, what they are looking for.
The Objective
- Create a specialised search engine (i.e. one with advance knowledge) to read web documents
- Index and update all the content on the local server
- Answer the queries from the local database
- Update the system at regular intervals
5Components of NeuroSearch
It has two components:
1. Search/Crawler Engine
2. Query Engine
6Components explained
[Diagram: the Query Engine and the Crawler Engine]
7NeuroSearch Architecture Model
8Implementation and Case Study
- Creating the database using Access DB.
- Implementing all parts of NeuroSearch using the Java language and SQL.
9NeuroSearch Database
The Advance Knowledge
10The advance knowledge case study: Neuroscience (Vision)
NeuroSearch uses advance knowledge about neuroscience (vision) as a case study.
Data mining of the vision domain knowledge is then used to construct the keywords and the relations between them.
This knowledge is stored in the database and categorised by number. Related knowledge is categorised in the same way and stored in the database in the form of a data network.
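As a rough illustration of keywords that are categorised by number and linked into a data network, the sketch below keeps the same information in memory. The class name `AdvanceKnowledge`, its methods and the sample keyword "retina" are assumptions made for this example; the slides only state that the real data lives in database tables.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the advance-knowledge tables. */
class AdvanceKnowledge {
    // keyword -> category number (the "categorised by number" part)
    private final Map<String, Integer> categories = new LinkedHashMap<>();
    // keyword -> related keywords (the "data network" part)
    private final Map<String, Set<String>> network = new LinkedHashMap<>();

    private static String normalise(String keyword) {
        return keyword.toLowerCase(Locale.ROOT);
    }

    /** Registers a keyword under a numbered category. */
    void addKeyword(String keyword, int category) {
        String key = normalise(keyword);
        categories.put(key, category);
        network.computeIfAbsent(key, k -> new LinkedHashSet<>());
    }

    /** Links two already registered keywords in both directions. */
    void relate(String first, String second) {
        String a = normalise(first);
        String b = normalise(second);
        if (!categories.containsKey(a) || !categories.containsKey(b)) {
            throw new IllegalArgumentException("both keywords must be registered first");
        }
        network.get(a).add(b);
        network.get(b).add(a);
    }

    boolean contains(String keyword) {
        return categories.containsKey(normalise(keyword));
    }

    /** Returns the category number, or null for an unknown keyword. */
    Integer category(String keyword) {
        return categories.get(normalise(keyword));
    }

    /** Returns the keywords linked to the given one (empty if unknown). */
    Set<String> relatedTo(String keyword) {
        return network.getOrDefault(normalise(keyword), Set.of());
    }
}
```

After registering "eye" and "retina" in category 1 and calling `relate("eye", "retina")`, a lookup of either keyword returns the other as related knowledge.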
11Software lifecycle
The Crawler Engine consists of:
1. WebCrawler/Spider Engine
2. Indexer Engine
3. Re-Crawler (specialised)
12WebCrawler (Spider)
1) This is a general web crawler, which can download any kind of web page.
2) It fetches a URL, retrieves all of its web pages and saves them on the local drive.
3) In addition, the WebCrawler has to pass through the proxy firewall (i.e. on the Newcastle University LAN) before downloading any web site.
4) The crawler performs a breadth-first search, which means it collects a list of all the links that are on the current page before it follows any of the links to a new page.
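The breadth-first behaviour described above can be sketched as follows. This is a minimal illustration, not the NeuroSearch code: the class name `BreadthFirstCrawler`, the `linkExtractor` callback (which stands in for downloading a page and extracting its links) and the `maxPages` limit are all assumptions added for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

class BreadthFirstCrawler {
    /**
     * Visits pages in breadth-first order starting from seedUrl.
     * linkExtractor is a hypothetical callback that fetches a page and
     * returns the links found on it; maxPages caps the crawl.
     */
    static List<String> crawl(String seedUrl,
                              Function<String, List<String>> linkExtractor,
                              int maxPages) {
        List<String> visitedInOrder = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(seedUrl);
        seen.add(seedUrl);
        while (!frontier.isEmpty() && visitedInOrder.size() < maxPages) {
            String url = frontier.remove();
            visitedInOrder.add(url);
            // Collect every link on the current page before following any of them.
            for (String link : linkExtractor.apply(url)) {
                if (seen.add(link)) { // add() is false for links already queued
                    frontier.add(link);
                }
            }
        }
        return visitedInOrder;
    }
}
```

The FIFO queue is what makes the traversal breadth-first: links found on the current page wait behind links found earlier, so a whole level is visited before the next one starts.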
13WebCrawler - real challenge
Challenge 1: connecting to the WWW and accessing private websites.
Solution 1: the crawler has to let its socket connect to the proxy server first.
Challenge 2: connecting this socket onward to the WWW.
Solution 2: over a direct socket, the GET method only needs the file name. In this case, however, the GET command has to take the full URL.
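The difference between the two GET forms can be sketched as below. The class `ProxyRequestBuilder`, the use of HTTP/1.0 and the proxy host and port parameters are assumptions for illustration; the slides do not show the actual request code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class ProxyRequestBuilder {
    /** Builds the head of an HTTP/1.0 GET request, either direct or for a proxy. */
    static String buildGet(URI target, boolean viaProxy) {
        String path = target.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (target.getRawQuery() != null) {
            path += "?" + target.getRawQuery();
        }
        // Direct connection: only the file name (path) is sent.
        // Through a proxy: the full URL is sent so the proxy knows where to forward.
        String requestTarget = viaProxy ? target.toString() : path;
        String host = target.getPort() == -1
                ? target.getHost()
                : target.getHost() + ":" + target.getPort();
        return "GET " + requestTarget + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n\r\n";
    }

    /** Connects to the proxy first, then asks it for the full URL. */
    static String fetchViaProxy(String proxyHost, int proxyPort, URI target) throws IOException {
        try (Socket socket = new Socket(proxyHost, proxyPort)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildGet(target, true).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // With HTTP/1.0 the server closes the connection after the response.
            return new String(socket.getInputStream().readAllBytes(), StandardCharsets.ISO_8859_1);
        }
    }
}
```

Note that the socket is opened to the proxy, not to the target site; the target appears only inside the request line.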
14Indexer Engine
1) First, it searches the web page using its advance knowledge. The web page is deleted if it is not related to the case study subject.
2) If the page is related to the case study subject (neuroscience), the indexer collects the following information from the document:
3) All the keywords it contains, how many times each is repeated, the title and the contents. It then saves them in the database for later display in the query result and for other calculations.
4) The ranking method
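The keyword counting step of the indexer might look like the sketch below. The class `KeywordIndexer`, the single-word lowercase keywords and the rule that one keyword hit makes a page related are assumptions for the example; the slides do not spell out the relevance test or the ranking method, so neither is reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class KeywordIndexer {
    /**
     * Counts how often each known keyword occurs as a whole word in the page text.
     * Keywords are assumed to be single lowercase words.
     */
    static Map<String, Integer> countKeywords(String pageText, Set<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Split on any run of characters that are not letters or digits.
        for (String token : pageText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (keywords.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Assumed relevance rule: a page is related if it mentions at least one keyword. */
    static boolean isRelated(Map<String, Integer> counts) {
        return !counts.isEmpty();
    }
}
```

A page whose counts come back empty would be the one the indexer deletes; any other page would be stored with its counts, title and contents.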
15Query Engine
It has an interface to accept keywords from the user.
It searches for the query keywords in the index database and returns the results in HTML format.
It gives the user two choices: display only the most relevant results, or display the whole result set, which includes the related results.
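The two display choices can be sketched against an in-memory index as shown below. Everything named here is hypothetical: the `Page` class, treating pages that contain the query keyword itself as the most relevant results, and ordering by repeat count are assumptions for the example, since the slides do not describe the actual ranking. HTML escaping is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class QuerySketch {
    /** One indexed page: URL, title and keyword repeat counts. */
    static final class Page {
        final String url;
        final String title;
        final Map<String, Integer> counts;

        Page(String url, String title, Map<String, Integer> counts) {
            this.url = url;
            this.title = title;
            this.counts = counts;
        }
    }

    /**
     * Returns pages containing the keyword; when includeRelated is true,
     * pages containing only related keywords are returned as well.
     */
    static List<Page> search(List<Page> index, String keyword,
                             Set<String> relatedKeywords, boolean includeRelated) {
        List<Page> hits = new ArrayList<>();
        for (Page page : index) {
            boolean direct = page.counts.containsKey(keyword);
            boolean viaRelated = includeRelated
                    && relatedKeywords.stream().anyMatch(page.counts::containsKey);
            if (direct || viaRelated) {
                hits.add(page);
            }
        }
        // Assumed ordering: more repeats of the query keyword first.
        hits.sort((a, b) -> Integer.compare(
                b.counts.getOrDefault(keyword, 0), a.counts.getOrDefault(keyword, 0)));
        return hits;
    }

    /** Renders the hits as an ordered HTML list of links. */
    static String toHtml(List<Page> hits) {
        StringBuilder html = new StringBuilder("<ol>\n");
        for (Page page : hits) {
            html.append("  <li><a href=\"").append(page.url).append("\">")
                .append(page.title).append("</a></li>\n");
        }
        return html.append("</ol>").toString();
    }
}
```

Calling `search` with `includeRelated` set to false corresponds to the first choice, and with true to the second.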
16Query Result
This is indeed an advantage over other conventional search engines
17Re-Crawling
1- The WebCrawler is specialised in any subject created in the advance knowledge in the database. It achieves this by reading the URLs from the index database using SQL.
2- Its interface allows the special users to decide whether to continue crawling the website or to cancel it.
3- This part of the software aims to update the index when a new link is found. This makes it easier to search and crawl websites related to any advance knowledge subject.
18Testing phase
- The test phase requires comparing the first 10 ranked results of each NeuroSearch query with the first 10 results of the same query in another search engine, such as Google.
Keyword categories:
- general keywords
- specific keywords
- abbreviation keywords
- combined keywords
- abbreviation combined keywords
20 tests for each category
Total of 1000 tests
19Testing cont..
Ranking query test results in General Keywords
                 | Google                  | NeuroSearch
First 10 results | Rank  Keyword  Repeated | Rank  Keyword  Repeated  Related-keyword  Repeated  Quality/percentage
1                | 0     0        0        | 10    1        3         53               3         37
2                | 10    1        3        | 10    1        3         51               3         27
3                | 0     0        0        | 10    1        3         37               3         36
4                | 0     0        0        | 10    1        3         37               3         33.6
5                | 0     0        0        | 10    1        3         34               3         36.7
6                | 0     0        0        | 10    1        3         29               3         38.4
7                | 0     0        0        | 10    1        3         28               3         38.1
8                | 0     0        0        | 10    1        3         28               3         38
9                | 0     0        0        | 10    1        3         28               3         24.9
10               | 0     0        0        | 10    1        3         28               3         13.8
Average 10 10 100 100
Table 1 (Query 1): Ranking query test results in General Keywords (Eye)
20Testing cont..
Chart 1: Average keyword performance in the category-based test results (Google)
Chart 2: Average keyword performance in the category-based test results (NeuroSearch)
21Analysing the search engines' ranking results by category
Table 4. Average ranking performance of the engines in the category-based query test results
22Analysing the average ranking performance of the engines in the category-based query test results
Result analysis (t-test)
The t-test is used to compare two groups' scores on the same variable (p value < .05). The result indicates that NeuroSearch has a statistically significantly higher mean score in the ranking results across all categories (100) than Google (52.35). The negative t values reflect the direction of the difference, with Google's mean below NeuroSearch's; on their own they do not show an inverse relation between the two engines' results.
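For readers unfamiliar with the statistic, the sketch below computes a two-sample t value. The slides do not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption here, as are the class and method names; no data from the study is reproduced.

```java
class TStatistic {
    /** Arithmetic mean of the sample. */
    static double mean(double[] sample) {
        double sum = 0.0;
        for (double value : sample) {
            sum += value;
        }
        return sum / sample.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double sampleVariance(double[] sample) {
        double m = mean(sample);
        double squares = 0.0;
        for (double value : sample) {
            squares += (value - m) * (value - m);
        }
        return squares / (sample.length - 1);
    }

    /** Welch's t: difference of means divided by its standard error. */
    static double welchT(double[] first, double[] second) {
        double standardError = Math.sqrt(
                sampleVariance(first) / first.length
              + sampleVariance(second) / second.length);
        return (mean(first) - mean(second)) / standardError;
    }
}
```

The sign of the result depends only on which group is passed first: swapping the arguments flips the sign without changing the magnitude.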
23Visual representation
Chart 3: Average category-based ranking performance of the engines
Chart 4: Average of the keywords found in the documents in the query test results (category-based queries), by engine
24Conclusion
Although NeuroSearch uses a simple algorithm to judge page quality compared with other conventional search engines, it proves to be very powerful in obtaining relevant results, particularly if its advance knowledge is built by a specialist (domain knowledge), e.g. oil, medical, arts, etc.
25Reference (example..)
- Wandell, Brian A. Foundations of Vision. Sunderland, Massachusetts, USA, 1995.
- Brin, S. and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. The Seventh Annual International WWW Conference. Computer Science Department, Stanford University, Stanford, CA 94305, USA, 1998.
26Thank You !
Ready for Questions!!!