Online Search Engine MED SEARCH

Slides: 19
Provided by: Sae70

1
Online Search Engine MED SEARCH
  • By Group 20

2
Introduction
  • The goal of this project is to build a search
    engine that serves data from XML documents
    containing medical data.
  • We developed an efficient online search engine
    for MED.
  • The user needs a valid PMID (the unique
    identification number of each provided XML
    document) or a keyword from the document.
  • Once a PMID or any keyword is provided, the user
    gets the relevant data.

3
Class Diagram
4
How does the Indexer actually work?
  • The XML document is parsed so that its contents
    can be used.
  • The program that does this is called
    Indexer.java.


if (totalWord.charAt(i) == '<') test = false;
if (totalWord.charAt(i) == '>') test = true;
//builds words one character at a time
if (test && totalWord.charAt(i) != '>') word = word + totalWord.charAt(i);
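
The flag-based loop above can be sketched as a small self-contained method. This is only an illustration of the idea, not the original Indexer.java: the class name TagStripper, the method stripTags, and the choice to emit a space at each closing '>' are assumptions made here.

```java
// Hypothetical helper; names are illustrative and not taken from Indexer.java.
final class TagStripper {
    /** Returns xml with everything between '<' and '>' removed. */
    static String stripTags(String xml) {
        StringBuilder text = new StringBuilder();
        boolean outsideTag = true; // plays the role of the slide's `test` flag
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (c == '<') {
                outsideTag = false;
            } else if (c == '>') {
                outsideTag = true;
                text.append(' '); // assumption: keep words on either side of a tag apart
            } else if (outsideTag) {
                text.append(c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<PMID>123</PMID><ArticleTitle>DNA repair</ArticleTitle>"));
    }
}
```

Appending a space where a tag ends keeps the text of adjacent elements from running together, which matters once the result is split into words.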
5
  • The indexer also tokenizes each word, treating
    characters such as the following as delimiters
    so that they are ignored.

StringTokenizer st = new StringTokenizer(word,
    "," + "@" + "-" + "." + "\"" + "(" + ")" + "?" + " "
    + "" + "\n" + "\t" + "\r" + "" + "" + "/" + ""
    + "" + "'" + "", false);
6
  • The indexer runs through all parsed words and
    writes them to another file, preferably a .txt or
    .doc file, together with the number of the
    page/document in which each parsed term appeared.
  • For example, if the word dna appears in
    page/document 67, then in the .txt/.doc file it
    appears as dna 67.

element = st.nextToken().toLowerCase();
vector.addElement("\n" + element + " " + count);
7
Storing the Word in an Index
element = st.nextToken().toLowerCase();
vector.addElement("\n" + element + " " + count);
  • The above code stores the lowercased word with
    the page number attached, e.g. dna 206.
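
A minimal sketch of how such "word number" entries could be produced for several documents follows. The class IndexBuilder, the method addDocument, and the reduced delimiter set are hypothetical. The sort step is also an assumption: the slides do not show it, but a divide-and-search lookup only works on ordered entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch; not the original Indexer.java.
final class IndexBuilder {
    // Assumed delimiter set; the slide's full list did not survive extraction.
    static final String DELIMITERS = ",@-.\"()? \n\t\r/'";

    /** Adds one "word documentNumber" entry to index for every token of text. */
    static void addDocument(List<String> index, String text, int documentNumber) {
        StringTokenizer st = new StringTokenizer(text, DELIMITERS, false);
        while (st.hasMoreTokens()) {
            index.add(st.nextToken().toLowerCase() + " " + documentNumber);
        }
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        addDocument(index, "DNA repair", 67);
        addDocument(index, "Repair of DNA", 206);
        Collections.sort(index); // assumption: entries are sorted before searching
        index.forEach(System.out::println);
    }
}
```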

8
Writing the XML pages
if (readTemp.indexOf("<PubmedArticle>") != -1) {
    //test = true;
    count++;
    FileWriter file = new FileWriter(count + ".xml", true);
}
  • The above code reads through the entire XML
    dataset and every time it reads the tag
    <PubmedArticle> it creates a new XML document.
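
The splitting step can be illustrated in memory, without touching the file system. In this sketch the class ArticleSplitter and its split method are hypothetical, and it assumes each opening and closing PubmedArticle tag sits on its own line. The slide's code writes each article to a numbered .xml file with FileWriter instead of collecting strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the original writes count + ".xml" files instead.
final class ArticleSplitter {
    /** Returns one string per <PubmedArticle>...</PubmedArticle> element. */
    static List<String> split(List<String> lines) {
        List<String> articles = new ArrayList<>();
        StringBuilder current = null; // null while outside an article
        for (String line : lines) {
            if (line.contains("<PubmedArticle>")) {
                current = new StringBuilder(); // a new document starts here
            }
            if (current != null) {
                current.append(line).append('\n');
            }
            if (current != null && line.contains("</PubmedArticle>")) {
                articles.add(current.toString());
                current = null;
            }
        }
        return articles;
    }
}
```

The position of each chunk in the returned list corresponds to the running count that the slide uses as the file name.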

9
Doing the Search at Runtime
  • The search first checks the middle of the
    indexed file. If the keyword is not found there,
    it continues in the top half of the vector when
    the keyword sorts earlier, or in the bottom half
    of the vectored file when it sorts later.

//perform divide and search sort technique
while (lower + 1 != upper) {
    middle = (lower + upper) / 2;
    String tmpElement = (String) vector.elementAt(middle);
    StringTokenizer tmpToken = new StringTokenizer(tmpElement);
    String tmpString = tmpToken.nextToken();

10
    if (wordIn.compareTo(tmpString) < 0) upper = middle;
    else if (wordIn.compareTo(tmpString) > 0) lower = middle;
    else if (wordIn.equals(tmpString)) {
        found = true;
        upper = lower + 1;
    }
}
int mid = middle + 1;
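
For comparison, here is a compact binary search over sorted "word number" entries. It is a sketch of the general technique rather than the servlet's exact loop: the class IndexSearch, the method find, and the inclusive lower/upper bounds are choices made for this example.

```java
import java.util.List;

// Hypothetical sketch of a binary search over "word number" entries.
final class IndexSearch {
    /** Returns the position of an entry whose word equals wordIn, or -1 if none. */
    static int find(List<String> index, String wordIn) {
        int lower = 0;
        int upper = index.size() - 1;
        while (lower <= upper) {
            int middle = (lower + upper) >>> 1;
            String word = index.get(middle).split(" ")[0]; // text before the number
            int cmp = wordIn.compareTo(word);
            if (cmp < 0) {
                upper = middle - 1; // keyword sorts earlier: search the top half
            } else if (cmp > 0) {
                lower = middle + 1; // keyword sorts later: search the bottom half
            } else {
                return middle;
            }
        }
        return -1;
    }
}
```

When a word occurs in several documents, the returned position can be any one of its entries, so the neighbouring entries still have to be examined.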
11
  • As soon as a match is found for the keyword, the
    SearchPageServlet looks for the word within the
    top, middle, or bottom section of the vector,
    depending on where the word was found.

if (found) {
    String tmpElement = (String) vector.elementAt(middle);
    StringTokenizer tmpToken = new StringTokenizer(tmpElement, "," + " ", false);
    String tmpWord = tmpToken.nextToken();
    String tmpPage = tmpToken.nextToken();
    vectorPage.addElement(tmpPage);
    ...
12
String tmpElement2 = (String) vector.elementAt(mid);
StringTokenizer tmpToken2 = new StringTokenizer(tmpElement2, "," + " ", false);
String tmpWord2 = tmpToken2.nextToken();
String tmpPage2 = tmpToken2.nextToken();

13
  • Reading the page numbers attached to all matched
    words.

//adds the page numbers of the words that match the user input
while (tmpWord.equals(tmpWord2)) {
    middle++;
    String tmpE = (String) vector.elementAt(middle);
    StringTokenizer tmp = new StringTokenizer(tmpE, "," + " ", false);
    tmpWord = tmp.nextToken();
    tmpPage = tmp.nextToken();
    mid++;
    String tmpE2 = (String) vector.elementAt(mid);
    StringTokenizer tmpT2 = new StringTokenizer(tmpE2, "," + " ", false);
    tmpWord2 = tmpT2.nextToken();
}
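
One way to gather every page number for a matched word is to step back to its first entry and then read forward while the word stays the same. The sketch below assumes a sorted list of "word number" entries and a known hit position. PageCollector and pagesFor are hypothetical names, and the backward step is an addition of this sketch, since the loop shown on the slide only moves forward.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; assumes index is sorted and hit is a valid position.
final class PageCollector {
    /** Returns the page numbers of all entries sharing the word found at hit. */
    static List<String> pagesFor(List<String> index, int hit) {
        String word = index.get(hit).split(" ")[0];
        int first = hit;
        // Walk back to the first entry for this word.
        while (first > 0 && index.get(first - 1).split(" ")[0].equals(word)) {
            first--;
        }
        List<String> pages = new ArrayList<>();
        // Walk forward, collecting page numbers until the word changes.
        for (int i = first; i < index.size(); i++) {
            String[] parts = index.get(i).split(" ");
            if (!parts[0].equals(word)) {
                break;
            }
            pages.add(parts[1]);
        }
        return pages;
    }
}
```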
14
  • Every time the program finds a matching word, it
    adds it to the vector, and the matches are
    formatted for display as a list on the website.
  • If no matching keywords are found in the indexed
    file, the website reports that no matching
    results were found.
  • This happens when the vector is empty, since
    every matching word found is stored in the
    vector.
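
The display step might look like the following sketch, which turns the collected page numbers into an HTML list or a "no results" message when the collection is empty. ResultFormatter, toHtml, the message wording, and the link text are assumptions. The number + ".xml" link target follows the file naming used when the XML pages are written.

```java
import java.util.List;

// Hypothetical sketch of the result formatting; not the servlet's actual output.
final class ResultFormatter {
    /** Formats page numbers as an HTML list, or a message if there are none. */
    static String toHtml(List<String> pages) {
        if (pages.isEmpty()) {
            return "<p>No matching results found.</p>";
        }
        StringBuilder html = new StringBuilder("<ul>\n");
        for (String page : pages) {
            html.append("  <li><a href=\"").append(page).append(".xml\">Document ")
                .append(page).append("</a></li>\n");
        }
        return html.append("</ul>").toString();
    }
}
```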

15
  • From the listed search results, the user can
    select the link that best suits their search.
  • As soon as they click the link on the website,
    the XML document, styled with XSL, is displayed.

16
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
    xmlns="http://www.w3.org/TR/REC-html40"
    result-ns="">
<xsl:template match="/">
<HTML>
<HEAD>
<TITLE>Pubmed Articles</TITLE>
</HEAD>
<BODY BGCOLOR="#000000" TEXT="#FFFFFF">
<TABLE BORDER="1">
<TR ALIGN="LEFT">
<TH>PMID</TH><TH>Year</TH><TH>Title</TH><TH>Affiliation</TH><TH>Abstract</TH>
</TR>
<xsl:for-each select="PubmedArticleSet/PubmedArticle">
<TR ALIGN="LEFT" VALIGN="TOP">
17
  • This XSL style sheet isolates each XML document
    by identifying the tags <PubmedArticle> and
    </PubmedArticle>, and displays the set of
    information for each document independently, as
    a separate page.

<TD><xsl:value-of select="MedlineCitation/PMID"/></TD>
<TD><xsl:value-of select="MedlineCitation/Article/Journal/JournalIssue/PubDate/Year"/>
<xsl:value-of select="MedlineCitation/Article/Journal/JournalIssue/PubDate/Month"/></TD>
<TD><FONT SIZE="2"><xsl:value-of select="MedlineCitation/Article/ArticleTitle"/></FONT></TD>
<TD><FONT SIZE="2"><xsl:value-of select="MedlineCitation/Article/Affiliation"/></FONT></TD>
<TD><FONT SIZE="2"><xsl:value-of select="MedlineCitation/Article/Abstract/AbstractText"/></FONT></TD>
</TR>
</xsl:for-each>
</TABLE>
</BODY>
</HTML>
</xsl:template>
</xsl:stylesheet>
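
The slides do not say where the style sheet is applied. As a rough illustration, the sketch below runs a cut-down version of it on the server side with the JDK's javax.xml.transform API. It uses the XSLT 1.0 namespace rather than the working-draft namespace shown on the slide, because the standard Java processor handles XSLT 1.0. The class XslDemo, the method render, and the reduced two-column table are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch; a reduced XSLT 1.0 version of the slide's style sheet.
final class XslDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'>"
      + "<TABLE BORDER='1'>"
      + "<xsl:for-each select='PubmedArticleSet/PubmedArticle'>"
      + "<TR><TD><xsl:value-of select='MedlineCitation/PMID'/></TD>"
      + "<TD><xsl:value-of select='MedlineCitation/Article/ArticleTitle'/></TD></TR>"
      + "</xsl:for-each>"
      + "</TABLE>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the style sheet to xml and returns the resulting HTML. */
    static String render(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }
}
```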
18
Website
  • http://unix.aml.yorku.ca:8080/w04_g20/searchPage.html